Basics Uncovered...

Find a fast solution

2009-09-16T05:51:00.000-07:00

I came across this problem recently and found it interesting.

Say you have an input array and an output array of equal length. The input array holds random integers, and you need to populate the output array according to the following criterion.

The element at index i of the output array will be the product of all the elements in the input array, excluding the element at position i.

i.e. output[k] = product of input[i] for i = 0..n-1, where i ≠ k

Write the code / algorithm that will work the FASTEST.
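
One possible approach, sketched here as an assumption rather than as the intended answer: make two passes, first collecting the product of everything to the left of each index and then multiplying in the product of everything to the right. That is O(n) and needs no division, so zeros in the input are not a problem. The class and method names are made up for illustration, and overflow of long is ignored.

```java
import java.util.Arrays;

final class ProductExceptSelf {
    // output[k] = product of input[i] for all i != k, in O(n) time without division.
    static long[] productExceptSelf(int[] input) {
        int n = input.length;
        long[] output = new long[n];
        long running = 1;
        // First pass: output[k] holds the product of everything to the left of k.
        for (int k = 0; k < n; k++) {
            output[k] = running;
            running *= input[k];
        }
        running = 1;
        // Second pass: multiply in the product of everything to the right of k.
        for (int k = n - 1; k >= 0; k--) {
            output[k] *= running;
            running *= input[k];
        }
        return output;
    }

    public static void main(String[] args) {
        // prints [24, 12, 8, 6]
        System.out.println(Arrays.toString(productExceptSelf(new int[] {1, 2, 3, 4})));
    }
}
```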

2009-05-18T19:28:00.000-07:00

Two return values?

Hey! I happened to think about the following possibility in Java and wondered how good or bad it is to do both: return a value from a method and also modify the mutable contents of an object referred to by a reference passed as a parameter.

Team whoWonTheMatch(Team pCSK, Team pKKR) {

...
//some look up
...
pCSK.setWon(true);
...
return pKKR;
//btw the match wasn't a tie!

}

Now look at this... do you get the point? I came across some code today. We wanted to build database queries dynamically: the user can pass any conditions by which he wants to limit the rows (we say he can pass filters to the query). The method that led to this post modified a StringBuilder passed to it by adding the where clauses, and returned a Map holding the name=value pairs to be filled into those where clauses. So 5 minutes into the code review the people present were wondering, "yeah... the method returns the Map as per the user's preferences (his choice of filters), but where is the query built?" ... It was already done inside the same method. Did the method return two values?

Is this a good design? Am I missing something? Should we adopt the above model? Should this be discussed at all? Correct me if I am wrong.

BTW, last night Kolkata Knight Riders (KKR) stunned the Chennai Super Kings (CSK) by chasing 189 off the last ball.
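
For comparison, here is a minimal sketch of one alternative: return both results together in a small holder object, so that nothing passed in gets modified behind the caller's back. Everything here (the FilteredQuery class, the build method, the ":name" placeholder style) is hypothetical and not the code from the review; it also assumes the filter names are trusted column names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FilteredQuery {
    final String sql;
    final Map<String, Object> parameters;

    FilteredQuery(String sql, Map<String, Object> parameters) {
        this.sql = sql;
        this.parameters = parameters;
    }

    // Builds the query text and the parameter map, and returns both together.
    static FilteredQuery build(String baseSelect, Map<String, Object> filters) {
        StringBuilder sql = new StringBuilder(baseSelect);
        Map<String, Object> parameters = new LinkedHashMap<>();
        String separator = " WHERE ";
        for (Map.Entry<String, Object> filter : filters.entrySet()) {
            // Hypothetical named-placeholder style: column = :column
            sql.append(separator).append(filter.getKey()).append(" = :").append(filter.getKey());
            parameters.put(filter.getKey(), filter.getValue());
            separator = " AND ";
        }
        return new FilteredQuery(sql.toString(), parameters);
    }
}
```

The caller then reads result.sql and result.parameters, and the method signature itself says that two things come back.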
http://content.cricinfo.com/ipl2009/content/story/404882.html

git - a stupid content tracker

2008-11-08T11:03:00.000-08:00

To start with, source code configuration management is basically tracking your source code along with the changes made to it. Each consistent change is recorded and can be tracked as history. This way there is no single copy of a file; it is maintained as a series of changes. This is how source code is maintained in any project, both open source and closed source.

"Maintaining code is equally important to writing good code"

Some of the version control systems that I can think of are CVS, SVN, Perforce and some other commercial ones. I have worked with CVS and SVN extensively. :) In fact I was introduced to SVN first and then asked to move to CVS. Then I realised what a pain CVS was. Why do we need another version control system now? Are there any major flaws in CVS or SVN? I wouldn't say "Yes". But a few things can be improved. Like....

SVN and CVS have one single source repository! Put simply, all developers on the code base commit their changes to the server 34.56.7.67. So it's a single point of failure. What if this server goes down? How will you check in or update your code base?
Branching is not so easy, and merging is definitely not so easy. I was asked to create a new branch in my CVS repository (called a tag) for checking in code for a future release feature. It's such a painful thing.
Sharing code. Say your peer is working on the same feature, and you need his code to merge and do integration testing to see if things are fine before your checkin. Today this is possible only by asking him to check in first and then updating from the repository.
My source code is so big that maintaining one or two copies eats up my hard disk space, since they are two physical copies of the same files.
Oh! You work on multiple projects, one on CVS and another on SVN, and find it tough to remember the commands for both.
Yeah! The Eclipse plugin for CVS on Linux sucks. It has never shown me a diff correctly or updated/merged code with ease.
Thinking of more... (welcome and contribute...)

Git to the rescue! Git is distributed source control management (SCM) software. So what's "distributed" in terms of SCM? The source tree is replicated in many places rather than on a single server, and each replica is a repository. Anyone can check in anywhere and check out anywhere. :) Boy! Isn't that messy? I would say "No", it's more flexible now. So finally, who has the master copy of the project? Maybe the master guy would have it. Take Linux kernel development: Linus would have the master source copy. Gregkh is a kernel maintainer under Linus, and he in turn has his own copy. Say I'm a kernel developer and I need to send a patch to the kernel source code for a bug. I will do a checkin and ask Gregkh to pull from my repository and see if things are fine in his branch. If things are fine, Linus will in turn pull it from Gregkh, or Gregkh can collect all commits for a week and ask Linus to pull on Friday evening so that Linus can merge them over the weekend. This forms a network of trust. If there was an issue with my fix, Gregkh will throw the code back to me saying the fix was bad. If tomorrow Linus's repository has a bad network connection so that I cannot update my code, I can pull the latest changes from Gregkh too.

Branching and merging are among the most impressive features of git. In git a branch is created with almost 0 bytes. Yes, zero bytes (technically about 40 bytes for internal housekeeping: a branch is just a reference holding the SHA-1 of a commit). Files that do not change on the branch are not stored again; only the content that changes from master (the main branch) adds new objects. But in the case of CVS or SVN, a branch usually means a separate physical copy of the files on your disk. Say your project has 6000 files and you have created a new branch for a new feature. How many files will be edited? I would guess some 100. So why maintain a copy of the other unchanged files in this branch? git has an intelligent merge algorithm that merges in seconds. Linus claims it to be one of the fastest. Auto merge is beautiful and helps you resolve conflicts easily.

Each checkin is stored as a snapshot identified by a SHA-1 hash, and content that did not change is shared with earlier commits rather than stored again. This saves a lot of disk space, and git uses compression (including delta compression when it packs objects) to keep the repository small, even where the same history bloats a CVS or SVN repository. Say you have the source code on your laptop and you commute 2 hours daily to your office: you can sit in the car, commit to local branches on your own system, and then push all the changes together to the main repository once you reach the office. ;)

git also comes with a nice GUI, gitk, that shows the tree structure of the branches and the commits that were made. It's also handy for searching commit comments and looking at diffs.

Oh wait, I'm already using SVN at my office. I cannot ask my CTO to adopt git for upcoming projects. Hold your peace: git can work over CVS and SVN like a wrapper. You make all the changes to your source code on your desktop and maintain it using git. Everything happens through git, and your back end in CVS or SVN gets updated as normal commits. :)

References:

Git Home Page

Google Tech Talk by Linus Torvalds. I should admit I got inspired by this.

#git on freenode

Git Community Book - really comprehensive.

Git on Wikipedia
GitCasts

Exploring the topics least spoken about - 2

2008-10-30T12:53:00.000-07:00

So when I say that the memory space is reclaimed in the sweep phase, there is more than one way of reclaiming it. You can mark the memory as free space and use it to allocate new objects. This is what typically happens in the Mark and Sweep algorithm. There is another variant of Mark and Sweep called the Mark and Compact algorithm. Instead of just marking the space as free, during the compact phase all the active objects are moved together and all the free space is grouped together. So what is the advantage of doing this? Well, the obvious advantage is faster object allocation. If all the free space is grouped together, memory for new objects can be allocated from contiguous blocks, and since there is no fragmentation the allocator does not have to hunt for a hole that is big enough.
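
To make the compact phase concrete, here is a toy sketch. It assumes the heap is just an array of slots in which null means free space, which is nothing like a real JVM heap; the CompactingHeap name is made up. A real collector also has to update every reference that points to a moved object, which this sketch leaves out.

```java
final class CompactingHeap {
    final String[] slots; // null means the slot is free

    CompactingHeap(String[] slots) {
        this.slots = slots;
    }

    // Slides every live object towards index 0 so that all free space forms one block.
    // Returns the index of the first free slot, where the next allocation can go.
    int compact() {
        int next = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                String object = slots[i];
                slots[i] = null;
                slots[next++] = object;
            }
        }
        return next;
    }
}
```

After compaction, allocating is just a matter of bumping the returned index forward, which is why allocation gets cheaper.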

OK, so is this the best GC algorithm available? The answer is no. There are other variants that use the concept of generational garbage collection. The common observation about object lifetimes is that an application typically creates a huge number of short-lived, small objects. Only a few objects live longer and only a few objects are bigger. So how can we take advantage of this observation? Well, the whole heap space is divided into three generations: young generation, old generation and permanent generation. The young generation, as the name suggests, holds the newly created objects. When an object stays in the young generation for a long time, it is moved (tenured) to the old generation. So what is this permanent generation? The permanent generation holds objects that are not meant to be garbage collected very often. For example, each compiled class has a corresponding class object representing the class in the Java runtime environment. These objects are not meant to be garbage collected often, so they live in the permanent generation.

The young generation heap space is further divided into eden space and survivor space. Eden space is the first point of entry into the heap for any new object. If an object is still active during a garbage collection, it is moved to the survivor space. When an object has stayed in the survivor space for a certain number of garbage collection cycles, it is tenured to the old generation. If an object is too big to be accommodated in the eden space or survivor space, it is allocated directly in the old generation. Typically, the young generation is smaller but garbage collected often. The old generation is given more space but is garbage collected less often.

Exploring the topics least spoken about - 1

2008-10-30T12:22:00.000-07:00

There are certain topics in Java that are not discussed in detail in a lot of places. If you are a new grad, chances are quite high that you haven't heard about these topics. So I would like to talk about them in the following set of posts.

The first thing that I am planning to write about is Garbage Collection, GC for short. So what is this GC? To answer this, let's see how memory management works in traditional programming languages like C and C++. In C and C++, variables, methods, classes and objects all require some definite amount of memory space. Since memory is a precious and limited resource, memory that is allocated to an object has to be reclaimed once the object is destroyed. In C and C++, the developer has the freedom to allocate memory to an object and at the same time has the responsibility to reclaim the unused memory. In this way, memory is made available for the next object that is created. But life is not always beautiful. What happens if the developer doesn't claim the unused memory back? It results in a memory leak, and the memory usage of the application keeps growing. This is not all. What happens if a developer claims back the memory allocated to an object while the object is still being used? This results in corrupt memory pointers, also known as dangling pointers.

In an effort to relieve the developer of all these nightmares, an automated memory management system was incorporated into the Java runtime environment. With this system, the developer doesn't need to take responsibility for claiming back the unused space. So how does this system work? Well, memory is allocated to an object when the "new" keyword is used to create it. Java has a special area for allocating memory to objects, and this area is called the "heap". But in Java programs there are also variables that hold references to the objects. So where are the variables allocated? Variables referencing the objects are allocated on the method stack, and a variable goes out of scope when its stack frame is removed from the stack. In other words, when execution exits the method, the variables inside the method scope are destroyed.
So back to the heap and allocation. Each object is allocated space in the Java heap and stays there as long as it is used. Once the object is no longer used by anyone, the memory allocated for it is claimed back by the memory management system, aka GC. So how do you know if an object is used or not? The simplest way would be to keep track of all the references to the object. A reference count is attached to each object: whenever a reference points to the object, the count is increased; when a variable is destroyed or goes out of scope, the count is decreased; and when the object's reference count reaches 0, the object is no longer used and can be claimed back. OK, this method looks simple, though all that incrementing and decrementing of reference counts adds some overhead. Still, it looks simple and doable. But unfortunately, if we use a "reference counting" garbage collection scheme, Java performance might be bad, and of course the science of garbage collection has grown tremendously over the years, so we do have more advanced garbage collection techniques. Reference counting is not used in any JVM implementation that I know of.
Well, that brings us to defining the relationship between the variables on the stack and the objects on the heap. Variables on the stack reference objects on the heap. So if we analyze all the variables on the stack and follow the references that each of them points to on the heap, we can identify all the objects in the heap that are active. We can mark all these objects as active and reclaim the heap space occupied by the objects that are not active, i.e. inactive objects. This particular scheme is called the "Mark and Sweep" algorithm. The phase in which the variable references are traced and the objects are marked as active is called the mark phase, and the phase in which the space is claimed back is called the sweep phase. So what about the disadvantages of this scheme? Well, during the mark phase, to ensure that active object tracing is done without any discrepancy, the execution of the main application program is stopped while the mark algorithm runs. This is called "stop the world" garbage collection, and it causes a pause in the execution of the application program. This is a typical disadvantage of the mark and sweep algorithm. So what is the starting point of this mark or tracing operation? The starting point is the active stack frame on the method stack. Each variable in the active stack frame is traversed, then the objects it references, and so on.
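
Here is a toy sketch of the two phases, just to make the idea concrete. It assumes a made-up ToyHeap in which the "roots" list stands in for the variables on the stack; a real JVM collector is far more involved. One thing it shows nicely: two objects that only reference each other are still reclaimed, because nothing reachable from a root points to them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class ToyHeap {
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    final List<Node> objects = new ArrayList<>(); // every allocated object
    final List<Node> roots = new ArrayList<>();   // stands in for variables on the stack

    Node allocate(String name) {
        Node node = new Node(name);
        objects.add(node);
        return node;
    }

    // Mark phase: everything reachable from a root is active.
    void mark() {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (!node.marked) {
                node.marked = true;
                pending.addAll(node.references);
            }
        }
    }

    // Sweep phase: drop unmarked objects and clear the marks for the next cycle.
    int sweep() {
        int reclaimed = 0;
        for (Iterator<Node> it = objects.iterator(); it.hasNext(); ) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false;
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }
}
```
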
More about other garbage collection schemes in the following posts....

How to secure wireless network

2008-08-04T04:21:00.000-07:00

I'm mainly writing down the points for securing a "HOME" wireless network.

1. Change SSID
Many -- almost all -- wireless routers come with a default SSID (service set identifier) that is more or less unique to the vendor. For example, a Linksys router uses "linksys" as its default SSID. So if you are using the default SSID, you give away information about your wireless network. It's like broadcasting "Hey, I am a wireless router and you can connect to me. And yeah, by the way, I am a Linksys router". That may not be much, but it tells a potential hacker which router you are using and may help him narrow down which network to try to hack or use -- maybe get on the net, find out the possible attacks on that model and try them.

Better to set the SSID to some random junk characters. Copy the SSID and save it for later use.

This may not be much security, but it's good not to offer low-hanging fruit that encourages someone to actually pick it.

2. Hide SSID
Hide the SSID by disabling SSID broadcast, so that other people can't see it. They will not see your network and will not try to connect to it.

3. Change router's password
Again, most routers are shipped with a default password. For example, a Linksys router has admin as its default password. And it's very easy to get the default password for these boxes. Just google for "default password" and find the vendor and model. So if you keep the default password, someone can log in to your router and change the configuration, maybe steal all the bandwidth or do much more than that.

4. Disable remote administration
By disabling remote administration, you are actually saying "No one can change my router without connecting to it through a wire." That's pretty cool. No one out there can change your wireless router settings. That's very important.

5. Carefully upgrade firmware
Upgrade the firmware only if you really need to. Check the change log and find out whether there is really some critical upgrade that you should install. Secondly, never upgrade the firmware over wireless; always connect physically to the router and then upgrade the firmware.

6. Enable encryption
The most important point, and a must. Enable encryption on your router so that no one can sniff the packets and know what you are doing, or be a man in the middle. And it would be good to use an intelligent password -- a long string mixing upper case and lower case letters and numbers -- so that it's practically impossible, or at least hard enough, to break. Maybe use some kind of random key generator to do that (and save the password somewhere safe and secure).
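
If you don't have a key generator at hand, something like the following small sketch would do. It assumes an alphanumeric key is acceptable to your router; the KeyGenerator class name and the alphabet are my own choices, and SecureRandom is the standard Java class for cryptographically strong random numbers.

```java
import java.security.SecureRandom;

final class KeyGenerator {
    // Assumed alphabet: upper case, lower case and digits.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a random key of the requested length drawn from ALPHABET.
    static String randomKey(int length) {
        StringBuilder key = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            key.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomKey(32));
    }
}
```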

----
That will do a lot to make your network secure. There is other stuff too, like "MAC address filtering", but that may not be of much use because the MAC address is sent in the clear with every wireless frame, so it can be sniffed and then spoofed. So ...... anyway, do that if you wish.

Cannot detect USB drives on Windows?

2008-07-23T22:10:00.000-07:00

Recently on my home PC (Windows XP) none of my USB drives were being detected: not external hard drives, not the iPod (which does get detected by iTunes, but not through Explorer), not even a USB stick. Googling around led me to this Microsoft support link, which helped me resolve the exact issue. It also lists the possible places where detection can fail and workarounds for them. Another trick I learned is to connect the USB drive directly to the USB ports on the motherboard rather than through a USB extension cable.

Hope this helps the next time you face this.

Another known trick in the book, this one for reducing startup time, is to configure your options in the 'Startup' tab of 'msconfig'. Start->Run->msconfig.

If you know more tricks of this sort, send them in.

S#%t! I broke the build

2008-02-16T10:46:00.000-08:00

A usual scenario in big projects, where code is written and maintained by a big crowd of geeks & nerds, is that someone breaks the system once in a while. These are usually termed 'build breaks'. It could be because of compilation, or it could be because of linking. A common reason is "it compiled well on my system but broke when committed". In the end, we spend more hours fixing these build breaks than we put into developing the feature.

Could this be avoided? NO. But breaks can be identified and rectified soon. Here is a tool from Apache that could help you with that. 'Apache Continuum' is one such tool that helps in triggering automated builds for your project. It provides a simple configuration portal that is easy to deploy, so you can start your timely builds. Try it!

Some features that have impressed me are:

Support for various source repositories.
It works with Maven projects and can launch a cmd job.
Notifications, from mail to IRC to chat clients, seem to be comprehensive.
Keeping track of the check-ins and some intelligence in triggering a build.

How to write maintainable code

2007-12-24T03:25:00.001-08:00

On the contrary, do you really want to write maintainable code?? Code which can be maintained by anyone makes you eligible for being jobless!!

So better be safe and write code which can be maintained only by YOU, so that it helps you save your job :D But don't go so far that the complete program has to be rewritten (and you will not get a second chance to do it!!)

Don't take that seriously. I hope you understand what the DOs and DON'Ts should be!!!

I remember we had a teacher for AI (Artificial Intelligence) who had a different style of teaching. He would start with a wrong statement and convince everyone that he was correct. Finally, when everyone was with him, he would tell us that he was bluffing and point out what we had missed that would have proved him wrong.

In one such class, he made the statement that every question can be answered with "Yes" or "No". But by that time the students understood his style!! So one of the students asked whether he could answer his question with "Yes" or "No".
The question was simple: "Have you stopped beating your wife??"

On the same note, here is a nice article on "How to write maintainable code" which teaches the DON'Ts by presenting them as "DO it".

NP and Weirdness!

2007-12-15T09:20:00.001-08:00

I just feel very weird about this whole NP class of problems. There are so many problems belonging to this class and yet they are all the same. I can take one NP-complete problem and reduce it to another NP-complete problem in polynomial time. To put that in plain words: if I have an algorithm to solve one NP-complete problem, I can use that algorithm to solve the other NP problems. I just have to convert one problem to the other in polynomial time! Even more weird is this: there is no known polynomial-time algorithm for these problems. And to make things weirder still, though there is no known efficient algorithm to solve such a problem, given a solution we can verify its correctness in polynomial time. That is, if you give me a solution to the problem, I can verify that it really is a solution, and I can do this verification in polynomial time.

These are not the only weird things about the NP class of problems. There are a lot more. There is no known proof that either proves or disproves that NP is equal to P. But still, any computer engineer will say that NP is not equal to P. If you ask the reason, he will say that if NP were equal to P then there should be a way to solve the NP problems in polynomial time. What does this mean? It just means that a computer engineer's ego won't let him accept that he doesn't know how to solve these problems. So he just says there is no efficient solution for this class of problems because he can't find one!
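
To see what "verify in polynomial time" looks like, take subset sum, a well-known NP-complete problem: given a list of numbers, is there a subset that adds up to a target? Finding the subset is the hard part, but checking a claimed answer is a single pass. This is only an illustrative sketch; the SubsetSumVerifier name and the choice of passing the answer as a list of indices are my own assumptions.

```java
import java.util.HashSet;
import java.util.Set;

final class SubsetSumVerifier {
    // Checks a claimed solution: the chosen indices must be distinct, in range,
    // and the numbers at those indices must add up to the target.
    static boolean verify(int[] numbers, int[] chosenIndices, long target) {
        Set<Integer> seen = new HashSet<>();
        long sum = 0;
        for (int index : chosenIndices) {
            if (index < 0 || index >= numbers.length || !seen.add(index)) {
                return false;
            }
            sum += numbers[index];
        }
        return sum == target;
    }
}
```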

Isn't it really very weird?

Java Performance tips & Programming practice for Newbies

2007-06-28T18:51:00.000-07:00

Of late, I have started working on Axion DB, a Java based embedded database engine. I was working on various performance improvements for Axion DB. I was amazed at the speed of Axion DB on JDK 5, and I was sure that Axion would be faster still on JDK 6. So we started testing the database with large datasets. My machine is a Dell D620 laptop with a dual core processor and 2 GB RAM. We tested with a CSV file of 1 million rows, and even then Axion didn't show any hiccups. So we tested table creation in Axion for a CSV file of 10 million rows (~4 min) and compared it with Oracle's SQL Loader (1 min and 5 sec). Then we realized that Axion was about 4X slower than Oracle SQL Loader.

The first thought that came to our mind was that whatever we do, we can't beat Oracle SQL Loader, and we are not even going to match its performance. They must be using C/C++ and this is Java, and there's nothing much that can be done about it. Out of curiosity, we started profiling Axion to find the hot spots. When we profiled the application, lightning struck. We had a severe performance bottleneck at the place where we parse each line and create the column values.

These were the reasons for the bottleneck:
1. Lots of method calls.
2. Lots of heavyweight object creation.
3. Improper use of the for loop.
4. IO bottleneck.
5. No multithreading.
6. Unnecessary method calls.

From these, I learnt a few essential points. I will try to explain them each.

1. If you are doing low-level work where speed is absolutely important, then don't use the Java built-in wrapper classes. Say, for example, you are using the Integer class: just think of all the overhead you are creating by using the wrapper. You should always use the primitives instead of the wrappers if you want performance. How many of us think about choosing a primitive int or long over Integer or Long in our day-to-day Java applications? We don't, because the Java application that we are creating is fast enough not to bother us. So we don't think twice before using the built-in wrapper classes. But in a database kind of scenario, you should be using primitives to achieve speed and performance. They are plain values that are manipulated directly, without an object around them, which gives the maximum possible performance. We use Apache commons-primitives to achieve this. There must be a couple of other open source primitive collection libraries available. Try using one of them.
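
A small sketch of the difference, using nothing but the JDK (the BoxingDemo class is made up for illustration). Both methods compute the same sum, but the boxed version unboxes and re-boxes on every addition and works on a list of Integer objects, while the primitive version works on plain values. No timings are claimed here; measure on your own data.

```java
import java.util.List;

final class BoxingDemo {
    // Primitive version: plain int values and a long accumulator, no objects involved.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // Wrapper version: each += unboxes total and value, adds them, and boxes the result again.
    static long sumBoxed(List<Integer> values) {
        Long total = 0L;
        for (Integer value : values) {
            total += value;
        }
        return total;
    }
}
```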

2. Take a look at the following piece of code.

private long getLength(String stringObj) {
    long len = 0;
    for (int i = 0; i < stringObj.length(); i++) {
        len++;
    }
    return len;
}

OK, don't ask me why you would write this getLength() method for a string. It's the simplest example that came to my mind. Back to the problem: can you find the performance bottleneck here? This is a perfectly normal loop that we write in almost every program, but it is a big blunder. Think of this method getting called for a very, very big string. Look at the for() statement. The comparison section says i < stringObj.length(). Assume stringObj is of length 1 million. Then stringObj.length() is going to be called a million times. For each call, the JVM has to invoke the method and push the method-related entry onto the stack, then pop the method-related metadata after the call. If this is done a million times, think of the overhead.

If this kind of method call is unavoidable, then that's a different issue altogether. But check the above method. It's very clear that we only need the length of the string object to compare with the current index. So why not do it like this?

private long getLength(String stringObj) {
    long len = 0;
    // length() is evaluated once, in the loop initializer, and cached in I.
    for (int i = 0, I = stringObj.length(); i < I; i++) {
        len++;
    }
    return len;
}

OK, now look at the method. The length() method is called only once, at the beginning of the for loop. It is not going to be called a million times. This certainly improves the performance by 2X to 3X, and a 2X to 3X improvement makes a big impact when processing a million rows.

Now, is there any other optimization that can be made to this? Go through the method again. Assume that this particular getLength() method is going to be called a million times. If you check the method, you have a len variable which holds the length. You loop through the string, count the length and finally return len. OK, let me write this method in a different way.

private long getLength(String stringObj) {
    long len = 0;
    // len doubles as the loop counter, so no separate index is needed.
    for (int I = stringObj.length(); len < I; len++);
    return len;
}

Now, as you can see, the variable 'i' used as an index is eliminated. I don't mean to say that you are always making this kind of mistake. I just mean to say that if we code with proper care, Java can give you really good performance.

3. The third big point is to allocate memory judiciously. Just because Java does the cleanup automatically, it doesn't mean that we can program anything and expect the JVM to do the optimization for us. Of course the JVM does optimize, but why leave it to others when you can do it yourself? If you are sure that you don't need a long, don't use it. Use int. If you feel you don't need an int, then use short. Think twice before you declare a variable.

4. Take a look at the following code.

private String getString(char[] charArray) {
    for (char character : charArray) {
        if (isDelimiter(character)) {
            // do something.
        }
        if (isQuoted(character)) {
            // do something.
        }
        if (isEOF(character)) {
            // do something
        }
    }
    return new String(charArray); // placeholder so the sketch compiles; the real result is elided
}

boolean isDelimiter(char c) {
    return (c == ',');
}

// A Java char is never negative, so the EOF marker is compared as an int
// (the way Reader.read() returns it).
boolean isEOF(int c) {
    return (c == -1);
}

boolean isQuoted(char c) {
    return (c == '\"');
}

Now, assume this piece of code is going to be called a million times. Then think about this code again. Object orientation in Java helps us tremendously. It helps with modularization, code reusability, etc. But all those things are not always useful. If you are coding for performance, rethink that. It's better to write inline code wherever possible, since it avoids the overhead of a method call (pushing onto and popping from the method stack, etc.). If you look at the above piece of code, the method isQuoted checks if the given character is the double quote character, the method isEOF checks if the value is -1, and the method isDelimiter checks if the character is a comma. None of this needs to be done in a method. It's not a complex piece of code that benefits from being made a method so that it can be reused easily. It can be inlined directly, as given below.

private String getString(char[] charArray) {
    for (char character : charArray) {
        // check for comma character
        if (character == ',') {
            // do something.
        }

        // check for quote character
        if (character == '\"') {
            // do something.
        }

        // check for EOF (only meaningful when the value came from Reader.read() as an int)
        if (character == -1) {
            // do something
        }
    }
    return new String(charArray); // placeholder so the sketch compiles; the real result is elided
}

Though the above doesn't look as elegant as the other one, it is going to be faster than the other piece of code. So keep in mind, elegant code is not always the best performing code.

5. Don't take risks that you can easily avoid. In case of coding an application that needs high degree of reliability, don't take risks. Check the following piece of code.

private boolean isNullString(String string) {
    return (string.equals(""));
}

So, this looks like an absolutely normal piece of code. But still there's a hidden trap here. What if the string which is passed is null? So try to rephrase the code as given below.

private boolean isNullString(String string) {
    return ("".equals(string));
}

This should work now without any null pointer exceptions.

6. Of course, IO is always a performance issue. But now in Java, with NIO, it's really simple and efficient to do IO related tasks. Always use buffering if you need performance. If you use FileInputStream directly, the JVM is going to issue a file read system call every time you read a byte of data. With buffering, a whole buffer is read in one go, and only when there's no data left in the buffer does the JVM issue a system call to read from the disk.
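
A minimal sketch of the buffering point, using only standard java.io classes. The BufferedCount class, the line-counting task and the 64 KB buffer size are arbitrary choices for illustration. The loop still asks for one byte at a time, but most of those reads are served from the in-memory buffer rather than from the file.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferedCount {
    // Counts newline bytes in a file, reading byte by byte through a buffer.
    static long countLines(String path) throws IOException {
        long lines = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 64 * 1024)) {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }
}
```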

7. Always synchronize on a lock before waiting on the lock. Also, try to use wait with time. Else your thread might wait indefinitely, if there's no one else to notify this thread. Also use wait in a loop and check on a condition which is expected to be updated by other threads.
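
Put together, those three rules look roughly like the sketch below. The Mailbox class is made up for illustration; the lock, the condition (message != null) and the deadline handling are one way of doing it, not the code from Axion.

```java
final class Mailbox {
    private final Object lock = new Object();
    private String message; // the condition: non-null once a message is available

    void put(String value) {
        synchronized (lock) {
            message = value;
            lock.notifyAll();
        }
    }

    // Returns the message, or null if none arrived within timeoutMillis.
    String take(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {            // synchronize before waiting
            while (message == null) {    // wait in a loop, re-checking the condition
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return null;         // timed out, nobody notified us
                }
                lock.wait(remaining);    // wait with a timeout
            }
            String result = message;
            message = null;
            return result;
        }
    }
}
```

The loop matters because a thread can wake up without the condition being true, and the remaining time is recomputed so that repeated wake-ups cannot extend the overall timeout.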

We also parallelized the file read and used multiple threads to read the same file in a faster manner. After all these performance tuning in Axion DB, we could finally create the table for 10 million rows of CSV file in flat 45 Seconds! Can you believe it! Even I am Awe Stuck! I can vouch that if you code carefully and judiciously you can almost match C/C++ performance(Of course, its hard to exceed compiled code(C/C++) performance with interpreted(Java) code) with all the hotspot GC and other advanced technologies available in Java.

As a newbie in the industry, these were certainly a few of the best programming practices that I learnt from my experience at Sun. I was making almost all of these mistakes when I came out of college. Fixing them helped me improve application performance tremendously. Hope this post helps other newbies!

Java

2007-05-11T23:42:00.000-07:00

After a long time from the Sun man!

One on String buffer
One on Weak links in Java

Yahoo!

2007-01-14T06:48:00.000-08:00

Original post dated: 14th Jan 2007
Guess the redirection is not handled. Hope Yahoo! fixes this. Here is one such case where the URL is not redirected to the error page/home page.

Repost Dated: 30th Jan 2007
The above URL was fixed; it now gets redirected to the Yahoo search page.

Learn Shallow and Deep Copy - In Minutes

2006-11-10T08:06:00.000-08:00

Recently I was talking with my HOD, and he gave this example to explain "Shallow Copy" and "Deep Copy" in C++!

Consider, I went into the examination hall unprepared, having full faith in Srihari (my friend) for my answers. Srihari is a gifted person who had prepared very well for the exam and was sure to take the test with ease. I was sitting diagonally across from him to escape the eyes of the invigilator and finish the job. Once the exam started, Srihari was composing his answers in great flow. He was writing neatly and well spaced.

Hiding behind all the heads in front of me, I started to peep into his paper and copy the answers in a hurry. I didn't mind about the handwriting and line spacing. I could read his paper once and spit it out in my paper without finding out the real context of it. In the process I copied this line exactly as it is: "The architectural diagram is given in the Figure 2 of Page 10."

I call this particular line a "pointer": a pointer to Figure 2 on Page 10 of Srihari's paper. If I had been lucky, the figure would have been in the same place in my paper too. But it was not. In my paper it was Figure 1 on Page 4. Here comes the point...

If there hadn't been a pointer line like this, the plain copying I did, which is a shallow copy, would have been fine. With the pointer, I should have made it refer to the correct figure in my own paper. This is deep copying: whenever a class has a pointer member, the copy must duplicate what it points to and make its own pointer refer to the new location.
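The same idea in code, as a small sketch. The story is about C++, but this is written in Java to match the other posts here; the AnswerSheet class is invented for the example, and its array reference plays the role of the "pointer" line.

```java
import java.util.Arrays;

class AnswerSheet {
    // The array reference plays the role of the "pointer" in the story.
    int[] figure;

    AnswerSheet(int[] figure) {
        this.figure = figure;
    }

    // Shallow copy: the new sheet points at the very same figure.
    AnswerSheet shallowCopy() {
        return new AnswerSheet(figure);
    }

    // Deep copy: the figure itself is duplicated and the copy points at its own.
    AnswerSheet deepCopy() {
        return new AnswerSheet(Arrays.copyOf(figure, figure.length));
    }
}
```

Changing the original's figure afterwards shows up in the shallow copy but not in the deep copy, which is exactly the "wrong page number" problem from the exam hall.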

Modular Application Development

2006-06-21T23:42:00.000-07:00

In the earlier days, programs were primarily structured. The spaghetti code style of old Fortran and Basic was replaced by the structured programming style of the C language. But this didn't solve all the problems of programmers. As the code got bigger, writing and maintaining it became very difficult. Imagine a single C file containing tens of thousands of lines of code with thousands of functions! Perhaps a few people would have to be assigned explicitly just to maintain the code! Well, then came the Object Oriented Programming style. This proved to be a great solution for managing the complexity of coding and code maintenance.

Thousands of functions were split into thousands of classes. This also gave the application programmer the ability to reuse code with restricted access: accessing the public methods of a class while the abstractions implemented with private methods stay hidden. This proved to be a great gift to programmers.

So, we try to develop an application with this Object Oriented Programming style. We come across a lot of difficulties when developing an application. Application development is surely a different experience altogether compared to writing simple standalone files.

To start with, any application needs an installer. It needs a basic framework runtime environment where it can load and run the files. If you are planning to do this in an Object Oriented way, you might have to create thousands of classes and use the methods provided by each other's APIs to create your application.

But think of application development in this way. You create a small module which performs some work. You make that a separate executable entity. Once this is done, you can import this small module into another module to do that work for you. In this way, you keep developing small modules which each perform some work independently, and as and when you need, you use these modules to create your application. This particular style of application development is called Modular Programming.

Being a Sun guy, I will encourage the usage of NetBeans. NetBeans provides you with a skeleton application framework, that is, it provides all the basic functionality required by any application, like the GUI, menu items, tool bars, etc. In addition to this, you need only a few clicks to create your module. The runtime environment needed by your application to run all the programs is provided by the NetBeans infrastructure. All this makes the life of an application developer much simpler. Above all, it is an open source product! Try it out! NetBeans link

Why Java Will Always Be Slower than C++

2006-03-16T08:31:00.000-08:00

Neatly composed article... Read here...

Thanks to Arula for pointing me to this.

Signal Handler

2006-02-15T06:45:00.001-08:00

Consider that you have written a C program. Compile it and run it on a Linux machine. While the program is executing, press Ctrl + C and the program execution stops. Suppose someone asks you to prevent this kind of termination of your program. What will you do?

First let us see what exactly happens when you press Ctrl + C. When Ctrl + C is pressed, a signal is generated. Now, what is a signal? Signals are software interrupts sent to a process to notify it about some event. They interrupt whatever the process is doing at that time and force it to handle them immediately. Each signal has an integer number that represents it (1, 2 and so on), as well as a symbolic name that is usually defined in the file /usr/include/signal.h.

You can write a signal handler for a particular signal. This signal handler function gets called when that particular signal occurs. The operating system transfers control directly to the signal handler, and once the handler has finished executing, control returns to whatever line of code was executing previously. So how do you send signals? Well, Ctrl + C is just one method of sending a signal. We can do the same by using the kill command or through a system call like kill(PID, SIGNAL).

To write your own signal handler, you need to use the following function.

void (*signal(int sig, void (*func)(int)))(int);

I know this looks pretty complex, but the usage is not all that difficult. The signal function actually returns a function pointer. The first argument is the signal number for which this particular signal handler has been written. Next is the pointer to the signal handler function. The signal handler function should take only one integer argument and should not return anything.

Eg:
signal (SIGINT, sig_int_handler);

void sig_int_handler(int sig)
{
// your code goes here.
}

The signal call returns the previous signal handler pointer in case of success or SIG_ERR in case of failure, so check the return value.
if (signal (SIGINT, sig_int_handler) == SIG_ERR)
{
printf("Error handling signal");
exit(1);
}

So suppose you have written your signal handler to catch SIGINT (say it prints some warning message) and program execution continues. When you run your program and Ctrl + C is pressed (which sends SIGINT), the signal handler function is called, so a warning is displayed and the program execution continues. When the user presses Ctrl + C again, the program terminates. What happened? Why did it terminate?

Every signal has a default action which is taken when you don't provide a user-defined signal handler function. But here we have provided our signal handler, so what's the problem? Well, with the traditional (System V) behaviour of signal(), once the signal handler function is called, the handler for that signal is reset to the default. So how do we overcome this problem? It is simple: in your signal handler function, set the signal handler back to the user-defined function again.

void sig_int_handler(int sig)
{
signal (SIGINT, sig_int_handler);
//your code goes here.
}

Once a signal arrives, the signal handler is called and the handler is reset to the default. Now control comes to the signal handler function, where we say that the signal handler for SIGINT is sig_int_handler, thereby setting the handler again. This avoids termination of the program when Ctrl + C is pressed two or more times.

One concern here: if signals arrive at a rate faster than the time taken for the process to enter the signal handler function and reset the handler back to the user-defined function, then the default action may run and terminate the program!

Puzzles to puzzle you!!

2006-02-04T23:21:00.000-08:00

Q1) There are five houses in a row, each of different color, and inhabited by five different people of different nationalities, with different pets, favorite drinks and favorite sports. Use the clues below to determine who owns the monkey and who drinks water.

1. The Englishman lives in the red house.
2. The Spaniard owns the dog.
3. Coffee is drunk in the green house.
4. The Russian drinks tea.
5. The green house is immediately to the right of the white house.
6. The hockey player owns hamsters.
7. The football player lives in the yellow house.
8. Milk is drunk in the middle house.
9. The American lives in the first house to the left.
10. The table tennis player lives in the house next to the man with the fox.
11. The football player lives next to the house where the horse is kept.
12. The basketball player drinks orange juice.
13. The Japanese likes baseball.
14. The American lives next to the blue house.

Q2) Ten students, sitting in 2 rows of 5 each, took their 500-point final exam in advanced calculus. The students’ scores were all multiples of ten, with no two of them receiving the same score. Use the following clues and the professor’s seating chart below to determine which student sat in which seat and the test score each student earned.

1. Hugh sat next to both Ida and the student making 82%, which was the lowest grade on the test.
2. George and the student scoring 470 sat in diagonally opposite corner seats.
3. Chuck sat somewhere between Bill and the student scoring 410, although these 3 students are not necessarily in the same row. Similarly, Ann sat somewhere between Eve and the student scoring 490.
4. The sum of the scores of the students sitting in the first column is 880.
5. Jerry’s score was 10 points better than Dolly’s but 50 points less than Frank’s.
6. The average score of those in the Column 2 is the same as that of those in Column 4, but is 5 points less than the average of those in Column 3.
7. The student with the lowest score of those in the first row sat directly in front of the student with the highest score of those in the second row.
8. The average test score of those in the first row is 46 points higher than the average of those in the second row.

A Discussion on Stack ADT

2005-12-10T23:58:00.000-08:00

"Stack" is a very basic and neatly designed Abstract Data Type. Structurally, the Stack ADT can be described simply as an ADT which follows the LIFO (last in, first out) servicing model. It is very similar to a stack of plates: when you want to add a plate, you put it on the top of the stack, and when you want to remove a plate, you likewise take it from the top.

Advantage :
It is easy to implement.
The accessor methods ( PUSH & POP) are much simpler than the accessor methods of most other ADTs.
Easy to handle the data using Stack ADT.

Implementation Details :
Stacks are usually implemented with arrays. Arrays are the most primitive data type, available in almost any programming language. In a stack, we simply abstract the access to the array and restrict it to the last element (the topmost element of the stack). Using an array, we can implement the basic accessor methods, PUSH and POP, in O(1) time: we simply store the last element's index and access it using array indexing, which takes constant time.

Assume the stack is implemented using an array Stack[] of size N, and the index of the topmost element is stored in "top". The parameter "o" refers to a generic object, which can be anything from int to float.

Other supporting methods ( can be implemented easily for any array):
size() - returns the current size of the Stack.
isEmpty() - returns boolean value telling whether the Stack is empty or not.

PUSH(o):
    if Stack.size() == N then
        throw stackFullError
    else
        top <- top + 1
        Stack[top] <- o

POP():
    if Stack.isEmpty() then
        throw stackEmptyError
    else
        o <- Stack[top]
        Stack[top] <- null
        top <- top - 1
        return o
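The PUSH and POP pseudocode above translates fairly directly into Java. This is only a sketch: the class name ArrayStack and the use of IllegalStateException for the two error cases are choices made for the example.

```java
class ArrayStack {
    private final Object[] stack; // fixed capacity N, as in the pseudocode
    private int top = -1;         // index of the topmost element, -1 when empty

    ArrayStack(int capacity) {
        stack = new Object[capacity];
    }

    int size() {
        return top + 1;
    }

    boolean isEmpty() {
        return top < 0;
    }

    // O(1): one bounds check and one array write.
    void push(Object o) {
        if (size() == stack.length) {
            throw new IllegalStateException("stack full");
        }
        stack[++top] = o;
    }

    // O(1): one array read, then the slot is cleared.
    Object pop() {
        if (isEmpty()) {
            throw new IllegalStateException("stack empty");
        }
        Object o = stack[top];
        stack[top--] = null; // clear the slot so the object can be collected
        return o;
    }
}
```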

The problem with the above implementation is that the array size must be fixed at the beginning. If the stack turns out to hold few elements, space is wasted, and if it turns out to hold many, the stack can crash when array indexing exceeds the upper limit.

Solution:
Thus it is better to use a linked list in place of an array when space efficiency is our main concern.
Traditionally, we hold a link to the head of the linked list and update elements at the end opposite to the head. But the problem with using a linked list this way is that the accessor methods turn out to be O(t), where t is the current size of the stack, because the whole list must be walked to reach that end. The main aspect of a stack is that it allows access at only one end of the ADT, so a reference to the far end is not necessary at all. Holding a pointer to the topmost end of the stack improves the accessor methods' running time. Thus, we can simply store the linked list in reverse order, so that the head of the list is the top of the stack.

Linked List Structure:

topmost -> n-1 -> n-2 -> ........ -> bottommost

Here, by holding a pointer to the topmost element, we can implement the PUSH and POP methods in O(1) time. Thus the stack turns out to be an efficient ADT even when using a linked list. By storing the elements in reverse order in the linked list, we get a Stack ADT that is both space efficient and time efficient.
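A minimal sketch of the reverse-order linked list described above, with the head of the list acting as the top of the stack. LinkedStack and Node are hypothetical names used only for this example.

```java
class LinkedStack<T> {
    // One list cell: a value plus a link to the element beneath it.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<T> topmost; // head of the list is the top of the stack
    private int size;

    // O(1): the new node simply becomes the head.
    void push(T value) {
        topmost = new Node<>(value, topmost);
        size++;
    }

    // O(1): unlink the head and return its value.
    T pop() {
        if (topmost == null) {
            throw new IllegalStateException("stack empty");
        }
        T value = topmost.value;
        topmost = topmost.next;
        size--;
        return value;
    }

    boolean isEmpty() {
        return topmost == null;
    }

    int size() {
        return size;
    }
}
```

Nothing is preallocated, so the stack grows and shrinks with its contents, at the cost of one extra reference per element.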

SQL Server: minus operator

2005-11-22T07:03:00.000-08:00

Let's see some database stuff.

Minus operator in Oracle: Consider two tables T1 and T2. T1-T2 or T1 [minus] T2 means return all rows in T1 that are NOT in T2.

Consider the tables as,
T1
Col1    Col2
1       A
2       B
3       C

T2
Col1    Col2
1       A
2       B
3       X

As per Oracle,
Query: Select * from T1 minus Select * from T2
Result: 3    C
Query: Select * from T2 minus Select * from T1
Result: 3    X

Minus operator in SQL Server: Unfortunately, SQL Server does not support the MINUS keyword (SQL Server 2005 and later provide the equivalent EXCEPT operator). But here is a workaround to simulate it in SQL Server.
Query:
Select * from T1 where
IsNull(cast(Col1 as varchar), '') +
IsNull(cast(Col2 as varchar), '')
Not in
(Select IsNull(cast(Col1 as varchar), '') +
IsNull(cast(Col2 as varchar), '') from T2)

Result: 3    C

Explanation: The "In" operator works as usual, but it can be applied only to a single column. Hence the basic idea is to concatenate all the columns of T1 into a single value, do the same for the columns of T2, and then filter with "Not in". The "cast" converts column values to varchar, and "IsNull" replaces NULL values with an empty string so that the concatenation does not itself become NULL. Note that plain concatenation can make different rows look equal (for example '1' + '23' and '12' + '3'), so adding a separator between the columns is safer. This is one such idea of doing it.

Java: String & String Buffer usage.

2005-11-21T23:22:00.000-08:00

While performing string manipulations, like searching for a character in a string and replacing it with a given character, our natural choice would be to use String. But this is not advisable.

This is because String objects are immutable: when we make any modification to an existing string, a new String object is created by the JVM. Hence for four replacements in a string, four new String objects are created, giving a total of five String objects in place of one. So the better option is to use StringBuffer. A StringBuffer doesn't create new objects; instead it makes the modification in its own internal buffer. Hence StringBuffer is more space efficient than String for this kind of work.

String literals in Java are stored in the string pool maintained by the JVM, which is consulted each time a string literal is encountered. A StringBuffer is created and manipulated on the heap like any other object.
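As a small illustration of doing the replacements inside one buffer, consider the sketch below. ReplaceSketch and replaceChars are made-up names for this example.

```java
class ReplaceSketch {
    // Replaces every occurrence of 'from' with 'to' inside one StringBuffer,
    // so no intermediate String object is created per replacement.
    static String replaceChars(String input, char from, char to) {
        StringBuffer buffer = new StringBuffer(input);
        for (int i = 0; i < buffer.length(); i++) {
            if (buffer.charAt(i) == from) {
                buffer.setCharAt(i, to); // modifies the buffer in place
            }
        }
        return buffer.toString(); // one final String at the end
    }
}
```

Only two objects are involved regardless of how many characters get replaced: the buffer and the final String.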

Example:

{
String strFirstName, strLastName;
strFirstName = "Rakesh";
strFirstName = "Sharma";
strLastName = "Rakesh";
}

What happens in the string pool is this: strFirstName first points to a location containing the value "Rakesh". Then it points to a new location containing "Sharma". When strLastName is assigned the value "Rakesh", no new memory location is created; instead it points to the "Rakesh" which was created earlier.
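The sharing of literals can be observed with a reference comparison. The sketch below uses == deliberately to compare object identity (normal code should compare strings with equals); the class and method names are invented for the example.

```java
class StringPoolSketch {
    // Returns true when both variables refer to the same pooled literal object.
    static boolean literalsShareStorage() {
        String strFirstName = "Rakesh";
        String strLastName = "Rakesh";
        return strFirstName == strLastName; // reference comparison, on purpose
    }

    // new String(...) always allocates a separate object on the heap.
    static boolean newStringSharesStorage() {
        String pooled = "Rakesh";
        String fresh = new String("Rakesh");
        return pooled == fresh;
    }
}
```

The first method returns true because identical literals resolve to one pooled object, while the second returns false because new String(...) creates a distinct object with the same contents.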

regex package in java & handling wild card characters

2005-11-21T23:10:00.000-08:00

Java offers the java.util.regex package to handle regular expressions. Suppose you take a string from the user and search for it in the database using a regular expression; you may run into problems handling wildcard characters. The regex package can compile a regular expression from any string that you pass. But the wildcard character * has a different meaning in a regular expression from our normal interpretation.

Take for an example,

The search string entered : a*b

In a regular expression, a*b means any number of 'a's (including none) followed by a 'b'.
So in this case, valid matches could be

b, ab, aab, aaab, i.e., 'a' repeated n times followed by 'b'.

But what the user wants could be 'a' followed by any character, any number of times, and finally a 'b'. To convert this search string into the proper format for the regex package, replace '*' with '.*'. Similarly, '?' should be replaced by '.'

So, a search string "a*b" (in our normal sense, meaning 'a' followed by any character any number of times, followed by 'b') should be modified to "a.*b" before passing it on to the regex package. If this modified string is compiled into a regular expression, that regular expression performs the normal wildcard operation.

Similarly, a search string "a?b" (meaning 'a' followed by any one character, followed by 'b') should be converted into "a.b" before passing it on to the regex package for compiling the regular expression.
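The conversion can be sketched as below. One addition beyond the post, labeled here as such: every character other than '*' and '?' is quoted with Pattern.quote, so that a literal '.' or '+' typed by the user is not treated as a regex operator. WildcardSketch and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class WildcardSketch {
    // Converts a user wildcard string ('*' = any run of characters,
    // '?' = exactly one character) into a compiled regular expression.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so characters like '.' or '+' stay literal.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole candidate string matches the wildcard expression.
    static boolean matches(String wildcard, String candidate) {
        return toPattern(wildcard).matcher(candidate).matches();
    }
}
```

With this, matches("a*b", "axyzb") and matches("a?b", "acb") succeed, while matches("a?b", "ab") fails because '?' demands exactly one character.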

File Organisation in Web Applications

2005-11-21T22:47:00.000-08:00

In web applications, you may have different kinds of files: servlet files (.java files which define all the servlet methods), JSP pages (used for presentation, i.e., displaying information; they can be used in place of HTML to provide dynamic content, unlike HTML's static content, by embedding Java commands in the page), .properties files which contain the configuration details as (key, value) pairs, and other .java files (which contain the business logic).

To keep the file organisation clean, we can separate the presentation files from the business logic files and make them independent of each other. We should not include any business logic, like reading from a database, writing to a database or any other such action, in JSP files. Though it is always possible to accommodate any kind of Java command and operation in a JSP file, it is not advisable to do so. In industry practice, the JSP page and the .java file which has the business logic may be developed by two different people, and the JSP professional needs no knowledge of Java. So it is better to avoid embedding complex business logic in JSP files. A minimal amount of Java in a JSP page for the purpose of presentation is acceptable.

So we can keep all the business operations in .java files as a separate package and call them from the servlet files as and when needed. All the constants used in the application can be put in a .java file declared as an interface, and every file that needs these constants can implement that interface for efficient space management. Moreover, it is better to handle all exceptions at one particular level. Best practice is to handle the exceptions in the .java files which contain the business logic, instead of handling them in the servlet files. This gives a greater degree of independence between the servlet files and the business logic files.

All the configuration information can be stored in the .properties file and read from there. Changing the configuration is then easy: only the (key, value) pair in the .properties file needs to be changed.
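Reading such a file takes only a few lines with java.util.Properties. In the sketch below, the class name, the key "db.url" and the fallback value are all invented for illustration; a real application would pass a reader over its own .properties file.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

class ConfigSketch {
    // Loads key=value pairs from any Reader (for example a FileReader
    // over the application's .properties file).
    static Properties load(Reader source) throws IOException {
        Properties config = new Properties();
        config.load(source);
        return config;
    }

    // Hypothetical key name and fallback, for illustration only.
    static String databaseUrl(Properties config) {
        return config.getProperty("db.url", "jdbc:unknown");
    }
}
```

Changing the database then means editing one line of the .properties file, with no change to the Java source.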

Example:

Problem Description :
You are asked to create a simple web application which fetches the data from the backend database and displays all the entries on the JSP page.

Possible file organisation:

Now, your file organisation can be like,

rootdir/
rootdir/web/
rootdir/web/jsps/
rootdir/web/jsps/"anyjspfile".jsp
rootdir/web/WEB-INF/lib/
rootdir/web/WEB-INF/lib/"anylibfile".jar
rootdir/web/WEB-INF/classes/
rootdir/web/WEB-INF/classes/"anyconfigurationfile".properties
rootdir/src/"package1"/
rootdir/src/"package1"/"any servlet/portlet file".java
rootdir/src/"package2"/
rootdir/src/"package2"/"any business logic implementation file".java

Here your "jsps" directory holds all your JSP files. You can have a
rootdir/web/jsps/help/ directory to hold your help JSP files.

Then you can organise all your servlet/portlet files under one package.

All your business logic implementation files (here, that could be the .java file that implements the JDBC methods to connect to the database, retrieve the data and perform all other database operations) can be placed in another package.

This organisation lets package2 be used independently in any other application when needed. You can import this package wherever it is needed.

Moreover, all the constants used in the web application can be added to an interface, maybe called "applicationname"Constants.java, and this interface can be implemented by whichever file needs to use these constants.

If you want to create a package out of this application, you can add the following files and directories.
rootdir/build.xml
rootdir/proto/
rootdir/proto/"packagename".snippet

Using the "ant" command, you can "build" the application using the build.xml file. The snippet file lists all the files that have to be added to the package, along with the file access permission for each of them. You can write a script to read the snippet file and create the package accordingly!

This is only one possible method, but it is widely used in industry practice. If you feel this organisation is wrong, please correct me!

Null Pointer Exception

2005-11-21T22:35:00.000-08:00

Right now, I'm doing my project at Sun Microsystems. An interesting thing that I learned here with respect to the null pointer exception in Java is as follows:

When you encounter a NullPointerException (the exception thrown by the JVM when you perform some operation or call some method on an object reference which is null), you should not try to handle that exception, i.e., DON'T try...catch the null pointer exception. That is nothing but patching the problem, and it isn't the right way. Instead, avoid performing any operation on a null object.

Eg:

object.method();

Assume this statement generates a null pointer exception.

Wrong method :

try {
object.method();
} catch ( Exception Ex) {
// handle the exception
}

Correct Method :

if ( object != null) {
object.method();
}

C++: What is Slicing problem?

2005-11-21T20:51:00.000-08:00

This makes a clear explanation!

CPlusPlus Gems