Monday, December 24, 2007

How to write maintainable code

On the contrary, do you really want to write maintainable code? Code that anyone can maintain makes you a candidate for unemployment!

So better to be safe and write code that only YOU can maintain, so that it helps you keep your job :D But don't go so far that the whole program has to be rewritten (you will not get a second chance to do it!).

Don't take that seriously; I hope you understand what the DOs and DON'Ts should be!

I remember we had a teacher for AI (Artificial Intelligence) who had a different style of teaching. He would start with a wrong statement and convince everyone that he was correct. Finally, when everyone was with him, he would reveal that he was bluffing and tell us the points we had missed that would have proved him wrong.

In one such class, he claimed that every question can be answered with "Yes" or "No". But by that time the students understood his style! So one of the students asked whether he could answer his question with "Yes" or "No".
The question was simple: "Have you stopped beating your wife?"

On the same note, here is a nice article on "How to write maintainable code" which teaches the DON'Ts by presenting them as DOs.


Saturday, December 15, 2007

NP and Weirdness!

I just feel very weird about this whole NP class of problems. There are so many problems in NP, and yet the NP-complete ones are, in a sense, all the same. I can take one NP-complete problem and reduce it to another NP-complete problem in polynomial time. In plain words: if I have an algorithm that solves one NP-complete problem, I can use it to solve every other problem in NP; I just have to convert one problem into the other in polynomial time! Even weirder is this: there is no known polynomial-time algorithm for any of these problems. And to make things weirder still, although no efficient algorithm is known, a proposed solution can be checked in polynomial time. That is, if you give me a candidate solution, I can verify in polynomial time whether it really is a solution.
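To make the "easy to verify" half concrete, here is a minimal sketch using subset sum, a well-known NP-complete problem: given some numbers and a target, is there a subset that adds up to the target? The class name SubsetSumVerifier and its method are hypothetical, chosen only for illustration. Finding the subset may take exponential time with known algorithms, but checking a proposed subset is a single pass.

```java
class SubsetSumVerifier {
    // Checks a proposed certificate: the chosen indices must be in range,
    // must not repeat, and the selected numbers must add up to the target.
    // Runs in time linear in the input size.
    static boolean verify(int[] numbers, int[] chosenIndices, long target) {
        boolean[] used = new boolean[numbers.length];
        long sum = 0;
        for (int index : chosenIndices) {
            if (index < 0 || index >= numbers.length || used[index]) {
                return false; // not a valid subset
            }
            used[index] = true;
            sum += numbers[index];
        }
        return sum == target;
    }

    public static void main(String[] args) {
        int[] numbers = {3, 34, 4, 12, 5, 2};
        // Candidate solution: indices 2, 5 and 0 select 4, 2 and 3.
        System.out.println(verify(numbers, new int[] {2, 5, 0}, 9));
    }
}
```

The verifier says nothing about how the candidate was found; that is exactly the gap between checking and solving.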

These are not the only weird things about the NP class of problems. There are a lot more. There is no known proof that NP is equal to P, and none that NP is not equal to P. Still, almost any computer engineer will say that NP is not equal to P. If you ask for the reason, he will say that if NP were equal to P, there would be a way to solve the NP problems in polynomial time, and nobody has found one. What does this mean? It just means that a computer engineer's ego won't let him accept that he doesn't know how to solve these problems. So he just says there is no efficient solution for this class of problems because he can't find one!

Isn't it really very weird?

Thursday, June 28, 2007

Java Performance tips & Programming practice for Newbies

Of late, I have started working on Axion DB, a Java-based embedded database engine. I was working on various performance improvements for Axion DB. I was amazed at the speed of Axion DB on JDK 5, and I was sure that Axion would be faster still on JDK 6. So we started testing the database with large datasets. My machine is a Dell D620 laptop with a dual-core processor and 2 GB of RAM. We tested with a CSV file of 1 million rows, and even then Axion didn't show any hiccups. So we tested table creation in Axion for a CSV file of 10 million rows (~4 min) and compared it with Oracle's SQL Loader (1 min 5 sec). Then we realized that Axion was almost 4X slower than Oracle SQL Loader.

The first thought that came to our minds was that whatever we did, we couldn't beat Oracle SQL Loader, or even match its performance. They must be using C/C++, this is Java, and there's not much that can be done about it. Out of curiosity, we started profiling Axion to find the hot spots. When we profiled the application, lightning struck. We had a severe performance bottleneck at the place where we parse each line and create the column values.

These were the reasons for the bottleneck:
1. A lot of method calls.
2. A lot of heavyweight object creation.
3. Improper use of for loops.
4. IO bottleneck.
5. No multithreading.
6. Unnecessary method calls.

From these, I learnt a few essential points. I will try to explain each of them.

1. If you are doing low-level work where speed is absolutely critical, don't use Java's built-in wrapper classes. Say, for example, you are using the Integer class: just think of all the overhead you create by wrapping every value in an object. You should always use primitives instead of wrappers if you want performance. How many of us think about the difference between int and Integer, or long and Long, in our day-to-day Java applications? (Java has no unsigned int or unsigned long, so the choice really is between the signed primitives and their wrappers.) We don't think about it because the application we are creating is fast enough not to bother us, so we don't think twice before using the wrapper classes. But in a database kind of scenario, you should use primitives to achieve speed. A primitive is stored as a raw value, with no object allocation, no reference to follow, and no boxing or unboxing, which gives the best possible performance. We use Apache commons-primitives, which provides collections of primitives, to achieve this. There must be a couple of other open-source primitive collection libraries available. Try using one of them.
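As a rough illustration of the wrapper overhead, here is a small sketch using only the standard library (it does not use commons-primitives, and the class name BoxingCost is hypothetical). Both methods compute the same sum; the boxed one unboxes and re-boxes its accumulator on every pass through the loop.

```java
class BoxingCost {
    // Accumulates in a primitive long: no objects are created.
    static long sumPrimitive(int n) {
        long total = 0;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Accumulates in a Long wrapper: each += unboxes the old value,
    // adds, and boxes the result into a Long again.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long start = System.nanoTime();
        long a = sumPrimitive(n);
        long mid = System.nanoTime();
        long b = sumBoxed(n);
        long end = System.nanoTime();
        System.out.println("same result: " + (a == b));
        System.out.println("primitive ns: " + (mid - start));
        System.out.println("boxed ns:     " + (end - mid));
    }
}
```

The timings printed by main are only indicative; a single run without warm-up is not a proper benchmark, so treat the numbers as a hint rather than a measurement.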

2. Take a look at the following piece of code.

private long getLength(String stringObj) {
    long len = 0; // start at 0 so the result equals the string length
    for (int i = 0; i < stringObj.length(); i++) {
        len++;
    }
    return len;
}

OK, don't ask me why you would write a getLength() method for a string again. It's the simplest example that came to mind. Back to the problem: can you find the performance bottleneck here? This is a perfectly normal loop that we write in almost every program, but it hides a blunder. Think of this method being called for a very, very big string. Take a look at the for() statement. The comparison section says i < stringObj.length(). Assume stringObj has a length of 1 million. Then stringObj.length() is going to be called a million times. For each call, the JVM has to invoke the method and push a frame for it onto the stack, and after the call it has to pop that frame off again (unless the JIT compiler manages to inline the call). If this is done a million times, think of the overhead.

If such method calls are unavoidable, that's a different issue altogether. But check the method above. It's very clear that we only need the length of the string object to compare with the current index. So why not do it like this?

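private long getLength(String stringObj) {
    long len = 0; // start at 0 so the result equals the string length
    // n holds the length, so length() is called only once
    for (int i = 0, n = stringObj.length(); i < n; i++) {
        len++;
    }
    return len;
}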

OK, now look at the method. The length() method is called only once, at the beginning of the for loop, not a million times. This can improve the performance of such a loop by 2X to 3X, and an improvement of that size makes a big impact when processing a million rows.

Now, is there any other optimization that can be made here? Go through the method again. Assume that this particular getLength() method is going to be called a million times. The method has a len variable which holds the length; you loop over the string, count up the length, and finally return len. OK, let me write this method in a different way.

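private long getLength(String stringObj) {
    long len = 0; // start at 0 so the result equals the string length
    // len doubles as the loop counter; the loop body is intentionally empty
    for (int n = stringObj.length(); len < n; len++);
    return len;
}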

Now, as you can see, the variable 'i' used as an index is eliminated. I don't mean to say you always make this kind of mistake. I just mean to say that if we code with proper care, Java can give really good performance.
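The three loop shapes discussed in this point can be put side by side in one small sketch. The class and method names here are hypothetical, and each counter starts from zero so that every variant returns exactly String.length(); the point is only to show that the variants are interchangeable in behaviour, so any difference you measure comes from the loop shape.

```java
class LoopBoundSketch {
    // Calls length() in the loop condition on every pass.
    static long lengthCallEachPass(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            len++;
        }
        return len;
    }

    // Reads length() once and keeps it in a local variable.
    static long lengthHoisted(String s) {
        long len = 0;
        for (int i = 0, n = s.length(); i < n; i++) {
            len++;
        }
        return len;
    }

    // Drops the separate index and uses the counter itself as loop variable.
    static long lengthNoIndex(String s) {
        long len = 0;
        for (int n = s.length(); len < n; len++) {
            // nothing to do: the increment is the work
        }
        return len;
    }

    public static void main(String[] args) {
        String sample = "hello";
        System.out.println(lengthCallEachPass(sample));
        System.out.println(lengthHoisted(sample));
        System.out.println(lengthNoIndex(sample));
    }
}
```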

3. The third big point is to allocate memory judiciously. Just because Java cleans up automatically doesn't mean we can write anything and expect the JVM to optimize it for us. Of course the JVM does optimize, but why leave it to others when you can do it yourself? If you are sure that you don't need a long, don't use it. Use int. If you feel you don't need an int, use short. The savings add up mainly in large arrays and in fields of objects that are created in bulk. Think twice before you declare a variable.
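A back-of-the-envelope sketch of what the type choice means for a column of 10 million values is below. The class name ArrayFootprint is hypothetical, and the figures cover only the raw element data; array headers and alignment are ignored. It assumes Java 8 or later for the BYTES constants.

```java
class ArrayFootprint {
    // Raw payload of an array: element count times element width in bytes.
    static long payloadBytes(long elements, int bytesPerElement) {
        return elements * bytesPerElement;
    }

    public static void main(String[] args) {
        long rows = 10_000_000L;
        System.out.println("long[]  bytes: " + payloadBytes(rows, Long.BYTES));
        System.out.println("int[]   bytes: " + payloadBytes(rows, Integer.BYTES));
        System.out.println("short[] bytes: " + payloadBytes(rows, Short.BYTES));
    }
}
```

For a single local variable the difference is negligible; it is the multiplication by the row count that makes the choice matter.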

4. Take a look at the following code.

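private String getString(char[] charArray) {
    for (char character : charArray) {
        if (isDelimiter(character)) {
            // do something.
        }
        if (isQuoted(character)) {
            // do something.
        }
        if (isEOF(character)) {
            // do something
        }
    }
    return new String(charArray); // placeholder result; the real body is elided
}

boolean isDelimiter(char c) {
    return (c == ',');
}

boolean isEOF(char c) {
    // A char never equals the int -1, so compare with the cast sentinel
    // (assumes the reader stored read()'s -1 as (char) -1).
    return (c == (char) -1);
}

boolean isQuoted(char c) {
    return (c == '\"');
}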

Now assume this piece of code is going to be called a million times, and think about it again. Object orientation in Java helps us tremendously with modularization, code reuse, and so on. But those things are not always what you need. If you are coding for performance, rethink them. In a hot loop it is often better to write inline code wherever possible, since it avoids the overhead of a method call, such as pushing and popping a frame on the method stack. In the code above, the method isQuoted checks whether the given character is a double quote, isEOF checks whether the char value is -1, and isDelimiter checks whether the char is a comma. None of this needs to be in a method. These are not complex pieces of code that benefit from being wrapped in a method for reuse. They can be inlined directly as given below.

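private String getString(char[] charArray) {
    for (char character : charArray) {
        // check for comma character
        if (character == ',') {
            // do something.
        }

        // check for quote character
        if (character == '\"') {
            // do something.
        }

        // check for EOF (a char never equals the int -1, so compare with the cast sentinel)
        if (character == (char) -1) {
            // do something
        }
    }
    return new String(charArray); // placeholder result; the real body is elided
}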

Though the above doesn't look as elegant as the first version, it is going to be faster. So keep in mind that elegant code is not always the best-performing code.
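For a version of the inlined loop that actually runs, here is a minimal sketch that splits one CSV line into fields with the comma and quote checks written inline. The class InlineCsvSplit and its splitLine method are hypothetical and are not Axion's parser; the sketch assumes double quotes only toggle a quoted section and does not handle escaped quotes or line breaks inside fields.

```java
import java.util.ArrayList;
import java.util.List;

class InlineCsvSplit {
    // Splits one line into fields; commas inside double quotes do not split.
    static List<String> splitLine(char[] line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0, n = line.length; i < n; i++) {
            char c = line[i];
            if (c == '"') {
                // quote check, inlined: toggle the quoted state, drop the quote
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                // delimiter check, inlined: close the current field
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("a,\"b,c\",d".toCharArray()));
    }
}
```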

5. Don't take risks that you can easily avoid. When coding an application that needs a high degree of reliability, don't take risks. Check the following piece of code.

private boolean isNullString(String string) {
    return (string.equals(""));
}

So, this looks like an absolutely normal piece of code. But there's a hidden trap here: what if the string passed in is null? The call string.equals("") then throws a NullPointerException. So rephrase the code as given below.

private boolean isNullString(String string) {
    return ("".equals(string));
}

This works without any NullPointerException: for a null argument, "".equals(string) simply returns false.
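The difference between the two forms is easy to check with a short sketch. The class and method names below are hypothetical; the two comparison methods mirror the two versions of isNullString above.

```java
class NullSafeEquals {
    // Calls equals on the argument: throws if the argument is null.
    static boolean isEmptyUnsafe(String s) {
        return s.equals("");
    }

    // Calls equals on the literal: a null argument just compares unequal.
    static boolean isEmptySafe(String s) {
        return "".equals(s);
    }

    // Reports whether the unsafe form throws for a null argument.
    static boolean unsafeThrowsOnNull() {
        try {
            isEmptyUnsafe(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("safe form on null: " + isEmptySafe(null));
        System.out.println("unsafe form throws on null: " + unsafeThrowsOnNull());
    }
}
```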

6. Of course, IO is always a performance issue. But now in Java with NIO, it's really simple and efficient to do IO-related tasks. Always use buffering if you need performance. If you read directly from a FileInputStream one byte at a time, the JVM issues a file read system call for every byte. With buffering, a whole buffer is read in one go, and the JVM issues another system call to read from the disk only when the buffer is empty.
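A minimal sketch of the buffered approach, using the classic java.io streams, could look like this. The class name, the helper methods, and the 64 KB buffer size are all assumptions made for illustration; the byte-at-a-time read() calls are served from the in-memory buffer, and the underlying file is touched only when the buffer runs dry.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferedReadSketch {
    // Counts the bytes in a file by reading one byte at a time through a buffer.
    static long countBytes(File file) {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file), 64 * 1024)) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temporary file of the given size so the sketch can be run as is.
    static File writeSample(int size) {
        try {
            File file = File.createTempFile("buffered-read", ".bin");
            file.deleteOnExit();
            try (OutputStream out = new FileOutputStream(file)) {
                out.write(new byte[size]);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBytes(writeSample(1000)));
    }
}
```

Removing the BufferedInputStream wrapper leaves the result unchanged but sends every read() straight to the file, which is the slow case described above.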

7. Always synchronize on a lock before waiting on it; calling wait() without holding the lock's monitor throws IllegalMonitorStateException. Also, try to use wait with a timeout. Otherwise your thread might wait indefinitely if no one else notifies it. Also call wait in a loop that checks a condition which other threads are expected to update.
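Put together, the three rules in this point (hold the lock, wait with a timeout, re-check the condition in a loop) might look like the sketch below. ReadyFlag and its methods are hypothetical names; the class is just a one-shot flag that one thread sets and another waits for.

```java
class ReadyFlag {
    private final Object lock = new Object();
    private boolean ready = false; // the condition, guarded by lock

    // Called by the producing thread: update the condition, then notify.
    void markReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    // Waits until the flag is set or the timeout expires.
    // Returns true if the flag was set, false on timeout or interruption.
    boolean awaitReady(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {                 // rule 1: hold the lock before wait()
            while (!ready) {                  // rule 3: re-check the condition in a loop
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;             // timed out
                }
                try {
                    lock.wait(remaining);     // rule 2: wait with a timeout
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        ReadyFlag flag = new ReadyFlag();
        new Thread(flag::markReady).start();
        System.out.println("ready: " + flag.awaitReady(5000));
    }
}
```

The loop matters because wait() can return without the condition being true, for example after a timeout or a spurious wakeup, so the condition is the only thing the waiting thread should trust.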

We also parallelized the file read and used multiple threads to read the same file faster. After all this performance tuning in Axion DB, we could finally create the table for a 10-million-row CSV file in a flat 45 seconds! Can you believe it? Even I am awestruck! I can vouch that if you code carefully and judiciously, you can almost match C/C++ performance with the HotSpot JIT compiler, the garbage collector, and the other advanced technologies available in Java. (Of course, it is hard to exceed the performance of natively compiled C/C++ code with Java code running on a virtual machine.)

As a newbie in the industry, these were certainly some of the best programming practices that I learnt from my experience at Sun. I was committing almost all of these mistakes when I came out of college. Fixing them helped me improve application performance tremendously. Hope this post helps other newbies!
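One possible way to read a single file with several threads is sketched below: split the file into byte ranges and let each thread scan its own range. This is an assumption about the general technique, not a description of how Axion does it, and all names are hypothetical. To keep the sketch short, each worker only counts newline bytes, which gives a correct row count no matter where the range boundaries fall.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLineCount {
    // Counts '\n' bytes in the byte range [start, end) of the file.
    static long countNewlines(File file, long start, long end) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(start);
            byte[] buf = new byte[64 * 1024];
            long remaining = end - start;
            long count = 0;
            while (remaining > 0) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // file ended early
                }
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
                remaining -= n;
            }
            return count;
        }
    }

    // Splits the file into one byte range per thread and sums the partial counts.
    static long countLines(File file, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long size = file.length();
            long chunk = (size + threads - 1) / threads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long start = Math.min(size, t * chunk);
                final long end = Math.min(size, start + chunk);
                parts.add(pool.submit(() -> countNewlines(file, start, end)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("parallel read failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Writes a temporary CSV file with the given number of rows.
    static File writeSample(int rows) {
        try {
            File file = File.createTempFile("parallel-read", ".csv");
            file.deleteOnExit();
            try (OutputStream out = new FileOutputStream(file)) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < rows; i++) {
                    sb.append("row,").append(i).append('\n');
                }
                out.write(sb.toString().getBytes("US-ASCII"));
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines(writeSample(1000), 4));
    }
}
```

A real loader would also have to parse the rows, which means aligning each range to the next line break; that part is left out here.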

Friday, May 11, 2007

Java

After a long time from the Sun man!

One on String buffer
One on Weak links in Java

Sunday, January 14, 2007

Yahoo!

Original post dated: 14th Jan 2007
Guess the redirection is not handled. Hope Yahoo! fixes this. Here is one such case where the URL is not redirected to an error page or the home page.

Repost Dated: 30th Jan 2007
The above URL was fixed; it now gets redirected to the Yahoo search page.