Sunday, May 23, 2010

Java versus Google Go - Part 2

The new Go programming language from Google is very interesting because it attempts to bring some of the benefits of VM-based languages, such as garbage collection and dynamic interfaces, to the world of compiled languages. I am considering porting one of my projects to Go, but before diving in, I would like to explore Go by writing a few small programs and comparing these with the Java versions.

Without further ado, here is a very simple program that reads a file and outputs its lines to the console. First, let's look at the Java version:
package org.majumdar;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class CatFile {

        public static void main(String[] args) {
                if (args.length == 0) {
                        usage();
                        return;
                }
                BufferedReader reader = null;
                try {
                        reader = new BufferedReader(new FileReader(args[0]));
                        String line;
                        while ((line = reader.readLine()) != null) {
                                System.out.println(line);
                        }
                } catch (Exception e) {
                        System.err.println("error: " + e.getMessage());
                } finally {
                        close(reader);
                }
        }

        private static void close(Reader reader) {
                if (reader == null)
                        return;
                try {
                        reader.close();
                } catch (IOException e) {
                        // Ignore: nothing useful can be done if close() fails
                }
        }
 
        private static void usage() {
                System.out.println("usage: CatFile ");
        }
}


Now, the same program implemented in Go:
package main

import (
        "bufio"
        "fmt"
        "io"
        "os"
)

func usage() {
        fmt.Printf("usage: catfile \n")
}

func main() {
        if len(os.Args) < 2 {
                usage()
                return
        }
        // os.Open opens the named file for reading only
        f, err := os.Open(os.Args[1])
        if err != nil {
                fmt.Printf("error: %s\n", err)
                return
        }
        // Schedule f.Close() to run when main returns
        defer f.Close()
        r := bufio.NewReader(f)
        for {
                line, err := r.ReadString('\n')
                // ReadString can return a final line that lacks '\n' together
                // with io.EOF, so print whatever was read before checking err
                fmt.Print(line)
                if err == io.EOF {
                        break
                }
                if err != nil {
                        fmt.Printf("error: %s\n", err)
                        break
                }
        }
}
I am really not sure which one of the two is more readable.

The main differences in the two programs are in how errors are handled, and how resources are cleaned up.

Java offers the finally clause of a try block for cleaning up resources; Go instead provides the defer statement, which schedules a function call to run when the enclosing function returns. The Go approach doesn't offer much programmer control over when the cleanup occurs, because deferred calls always run at function exit. With a try block, the programmer decides where the cleanup code goes.
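To make the point about placement concrete, here is a small Java sketch (the class and method names are made up for illustration). The finally block runs as soon as the try block exits, so the reader is closed before the lines are processed, even though the method carries on afterwards. In Go, a deferred call runs only when the surrounding function returns, so getting the same effect means moving the reading into a separate function.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class EarlyClose {
    /**
     * Reads every line from the source, closes it in a finally block as soon as
     * reading is finished, and only then processes the lines.
     */
    public static int countNonBlankLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            // Runs when the try block exits, not when the method returns
            reader.close();
        }
        // The reader is already closed here, so slow processing
        // of the lines does not keep the underlying file open
        int count = 0;
        for (String line : lines) {
            if (line.trim().length() > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNonBlankLines(new StringReader("a\n\nb\n"))); // 2
    }
}
```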

Error handling in Java is based on exceptions. Go doesn't have exception handling yet, although some form of it is planned. The authors of Go seem opposed to exceptions as a mechanism for error handling; their argument is that the try-catch-finally construct makes code convoluted and encourages programmers to label ordinary errors as exceptions. My personal preference is for the Java approach because it forces you to handle the error condition. By convention in Java (although the language does not enforce this), error conditions are indicated via exceptions and not by return values.
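The difference between the two conventions can be seen inside the JDK itself. The sketch below (method names are illustrative; it needs Java 7 for java.nio.file) contrasts java.io.File.delete(), which reports failure as a boolean that the caller is free to ignore, with java.nio.file.Files.delete(), which throws a checked IOException that the compiler makes the caller either catch or declare.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ErrorStyles {
    /** Return-value style: File.delete() reports failure as false, which is easy to ignore. */
    public static boolean deleteWithReturnValue(Path path) {
        return path.toFile().delete();
    }

    /** Exception style: Files.delete() throws, and the checked IOException must be handled or declared. */
    public static String deleteWithException(Path path) {
        try {
            Files.delete(path);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "no such file";
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("errors");
        Path missing = dir.resolve("missing.txt");
        // Nothing forces the caller to look at the boolean; this failure goes unnoticed
        deleteWithReturnValue(missing);
        System.out.println(deleteWithReturnValue(missing)); // false
        System.out.println(deleteWithException(missing));   // no such file
        Files.delete(dir);
    }
}
```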

I think with either approach you can write bad code that doesn't handle errors properly. In Java, you can do this by handling the exception incorrectly; in Go, if you forget to check for an error condition, the program will probably fail at runtime in an unexpected way.

My initial thoughts are that I prefer the try-catch-finally approach to the Go approach, both for error handling and for resource cleanup. Of course the Java approach isn't perfect; for example, the usefulness of checked exceptions is doubtful, and there could be better support for resource cleanup (in fact, this is coming in Java 7).
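For comparison, here is how CatFile might look with the try-with-resources statement that Java 7 introduces. This is a sketch rather than a drop-in replacement: the class name CatFile7 and the cat() helper are illustrative, and the helper writes to an Appendable so it can be exercised without a real file. The reader is closed on every exit path, so the separate close() helper and the finally block disappear.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class CatFile7 {
    /** Copies every line from source to out; try-with-resources closes the reader on every exit path. */
    public static void cat(Reader source, Appendable out) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: CatFile7 <filename>");
            return;
        }
        try {
            // FileNotFoundException is an IOException, so one catch covers open and read errors
            cat(new FileReader(args[0]), System.out);
        } catch (IOException e) {
            System.err.println("error: " + e.getMessage());
        }
    }
}
```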

The programs listed above are trivial, and the comparison is not really fair, as programs this small do not bring out the strengths and weaknesses of either language. I am hoping to compare two additional programs: a simple TCP/IP server implementation and a Lock Scheduler implementation. I have the Java versions of these, and am hoping to write the Go versions in the next few days.

Sunday, May 02, 2010

A Simple IOC Container

All of the SimpleDBM modules are designed with constructor-based dependency injection in mind, but so far these dependencies have been coded manually. I want to move away from setting up dependencies by hand, and was therefore looking for a small IOC container that would serve my needs. Unfortunately, all the available dependency injection frameworks appear to be bloated and huge: PicoContainer used to be small, but is now a 308k jar; the Spring Framework with all its dependencies is huge; Google Guice has an Android edition that is 403k. Considering that SimpleDBM itself is about 632k in size, I don't fancy adding external libraries that would double or treble its size.

As my requirements are tiny (I only need support for singletons and constructor-based dependency injection), I decided to roll my own. The core of the IOC Container is implemented in just three files, consisting of fewer than 300 lines of code. Here are the links to these files:
Of course it would be nice to reuse other libraries and not have to write my own, but on the plus side, you can't beat a home-made solution when it comes to size. As I have also removed the dependency on Log4J, SimpleDBM is now totally self-contained, with no external dependencies other than the standard libraries that ship with JDK 5. I find this liberating.
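To give an idea of how little is needed for singletons plus constructor injection, here is a minimal container sketch. It is not the SimpleDBM implementation; the class TinyContainer, its bind()/get() methods and the example components are hypothetical, and it assumes every component has exactly one public constructor. It sticks to JDK 5 features and uses only java.lang.reflect.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TinyContainer {
    // Requested type (often an interface) -> concrete class to instantiate
    private final Map<Class<?>, Class<?>> bindings = new HashMap<Class<?>, Class<?>>();
    // Concrete class -> its single shared instance (every component is a singleton)
    private final Map<Class<?>, Object> singletons = new HashMap<Class<?>, Object>();
    // Classes currently being constructed, used to detect circular dependencies
    private final Set<Class<?>> inProgress = new HashSet<Class<?>>();

    public synchronized <T> void bind(Class<T> type, Class<? extends T> impl) {
        bindings.put(type, impl);
    }

    public synchronized <T> T get(Class<T> type) {
        Class<?> impl = bindings.containsKey(type) ? bindings.get(type) : type;
        Object instance = singletons.get(impl);
        if (instance == null) {
            if (!inProgress.add(impl)) {
                throw new IllegalStateException("circular dependency involving " + impl.getName());
            }
            try {
                instance = create(impl);
            } finally {
                inProgress.remove(impl);
            }
            singletons.put(impl, instance);
        }
        return type.cast(instance);
    }

    // Constructor injection: call the only public constructor, resolving each parameter from the container
    private Object create(Class<?> impl) {
        Constructor<?>[] constructors = impl.getConstructors();
        if (constructors.length != 1) {
            throw new IllegalStateException(impl.getName() + " needs exactly one public constructor");
        }
        Class<?>[] parameterTypes = constructors[0].getParameterTypes();
        Object[] arguments = new Object[parameterTypes.length];
        for (int i = 0; i < parameterTypes.length; i++) {
            arguments[i] = get(parameterTypes[i]);
        }
        try {
            return constructors[0].newInstance(arguments);
        } catch (Exception e) {
            throw new IllegalStateException("cannot create " + impl.getName(), e);
        }
    }

    // Hypothetical components used only to demonstrate the container
    public interface Store {
        String name();
    }

    public static class MemoryStore implements Store {
        public String name() {
            return "memory";
        }
    }

    public static class Server {
        private final Store store;

        public Server(Store store) {
            this.store = store;
        }

        public Store store() {
            return store;
        }
    }

    public static void main(String[] args) {
        TinyContainer container = new TinyContainer();
        container.bind(Store.class, MemoryStore.class);
        Server server = container.get(Server.class);
        System.out.println(server.store().name());                        // memory
        System.out.println(server.store() == container.get(Store.class)); // true
    }
}
```

Circular dependencies are reported as an IllegalStateException rather than overflowing the stack. Because get() is synchronized, one container can be shared between threads, at the cost of serializing component creation.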

Sunday, April 25, 2010

Proposed license boilerplate

Given below is the boilerplate license notice that will be added to SimpleDBM source files from version 2 onwards. It is based on the boilerplate used by Mozilla.org. Note that I decided to add the LGPL to the mix as well, so that SimpleDBM V2 will be triple-licensed. Hopefully that will ensure compatibility with the vast majority of Open Source licenses.

/**
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
 *
 * Contributor(s):
 *
 * The Original Software is SimpleDBM (www.simpledbm.org).
 * The Initial Developer of the Original Software is Dibyendu Majumdar.
 *
 * Portions Copyright 2005-2010 Dibyendu Majumdar. All Rights Reserved.
 *
 * The contents of this file are subject to the terms of the
 * Apache License Version 2 (the "APL"). You may not use this
 * file except in compliance with the License. A copy of the
 * APL may be obtained from:
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the APL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the APL, the GPL or the LGPL.
 *
 * Copies of GPL and LGPL may be obtained from:
 * http://www.gnu.org/licenses/license-list.html
 */

Monday, April 19, 2010

Multi-licensing

The next version of SimpleDBM will be available under GPLv2, as now, and also under the Apache License. Dual licensing will allow people to use SimpleDBM in more flexible ways.

Network client server API released

I am pleased to finally publish the 1.0.18-ALPHA release of SimpleDBM. This release has the following changes:
  • A network client server implementation that allows SimpleDBM to run as a standalone database server to which clients can connect remotely.
  • A sample application that demonstrates the use of the network API. The sample implements a simple discussion forum; the front end has been created using Google Web Toolkit.
The version 1.x codebase is now entering the maintenance phase, as I am not going to add any new features to this version. I will start work on version 2.x, which will allow me to refactor some of the modules as blogged previously.

Sunday, April 18, 2010

Licensing revisited

In a previous post I wrote about why I preferred the GPL license for SimpleDBM. But I am no longer sure; my intention was always to ensure that SimpleDBM can be used by anyone without worrying about licensing issues, and I have no desire to put restrictions on other people's work. So if someone enhances SimpleDBM, they should be free to do whatever they like with their enhancement, and although it would be nice if they contributed back, I don't insist on it. But this philosophy is very different from that of the GPL, which requires that enhancements distributed to others also be licensed under the GPL.

I also did not fully understand the restrictions that the GPL imposes on linking with another library. Users of SimpleDBM should definitely not have to change their license or adopt the GPL just to be able to use SimpleDBM in their applications.

I am seriously considering changing the SimpleDBM license to another one, probably the Apache License Version 2.

Saturday, April 10, 2010

Roadmap

I have been thinking about how SimpleDBM should evolve. When I started the project my intention was to eventually add support for SQL, but now I can't see this happening in the near term. SimpleDBM is not aimed at competing with other SQL databases; SQL is nice because of the ease with which tables can be queried, joined, etc., but implementing an SQL layer is quite a lot of work, which I am not able to put in right now.

Another subject that has interested me right from the beginning is multi-version concurrency. Unfortunately, I have not really found a satisfactory way of implementing it. The two main approaches are those taken by Oracle and PostgreSQL, which I have previously compared in a short paper. The Oracle approach is problematic because it requires page-level redo/undo in the transaction system, whereas SimpleDBM's BTree implementation uses logical undo, allowing undo to be applied to a different page from the original. I do not like the PostgreSQL approach to MVCC either, as it does not support versioning in indexes.

Instead of adding large features such as these, I shall perhaps focus on the many smaller changes that I have been mulling over for some time now:
  • Refactor the code so that the modularity of SimpleDBM can be exploited better. This involves separately packaging the API from the implementation, and allowing implementations of individual modules to be easily swapped in.
  • Add statistics gathering so that useful metrics can be captured. Some work has been done in this area.
  • Improve performance and scalability of the lock manager.
  • Add support for sequences.
  • Add support for reverse indexes.
  • Add JMX monitoring capability.
  • Refactor the type system - and make the type system a first class component of the core engine. This needs a bit of explanation. When I first started SimpleDBM, my strategy was that the core database engine should be typeless, and should allow a type system to be plugged in. The core engine should treat records and keys as blobs of data, and not worry about their internal structure. This strategy allowed me to develop the core engine without first having to define a type system. However, it has meant that some things are less efficient - for instance, row updates cause the entire before and after images of the row to be logged. Another area of concern is the ability to compress data within pages, which is hard to do without some knowledge of the structure of the data inside the records.
  • Carry forward the work of making most types immutable, including row types. A row builder class can be provided to create rows, but once constructed the row should be immutable.
  • Carry on improving the documentation.
  • Improve the test cases, and the test coverage.
  • Create a single-threaded version which can run on small devices.
  • Add support for nested read-only transactions; these are useful for carrying out foreign key checks, should those be added in future.
  • Ensure the embedded and network APIs are interchangeable, and that clients can swap between the two without having to change any code. At present, the network API is completely separate from the embedded API.
  • Create a full blown sample application - work is ongoing to create this.
  • Try to raise awareness about SimpleDBM and build a community of users and developers.
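The row builder idea from the list above might look something like the sketch below. ImmutableRow and its Builder are hypothetical names, not the SimpleDBM API, and columns are held as plain Objects rather than typed fields. Note that the row is only as immutable as its column values, so those should themselves be immutable types such as Long or String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** A row whose column values cannot be changed once it has been built. */
public final class ImmutableRow {
    private final Object[] columns;

    private ImmutableRow(Object[] columns) {
        this.columns = columns;
    }

    public int size() {
        return columns.length;
    }

    public Object get(int index) {
        return columns[index];
    }

    @Override
    public String toString() {
        return Arrays.toString(columns);
    }

    /** Mutable builder used to assemble the column values before the row is frozen. */
    public static final class Builder {
        private final List<Object> values = new ArrayList<Object>();

        public Builder add(Object value) {
            values.add(value);
            return this;
        }

        public ImmutableRow build() {
            // toArray() returns a fresh array, so later changes to the builder cannot affect the row
            return new ImmutableRow(values.toArray());
        }
    }

    public static void main(String[] args) {
        ImmutableRow.Builder builder = new ImmutableRow.Builder().add(42L).add("hello");
        ImmutableRow row = builder.build();
        builder.add("ignored");         // does not change the row already built
        System.out.println(row);        // [42, hello]
        System.out.println(row.size()); // 2
    }
}
```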

Wednesday, April 07, 2010

Sample Network Application

It is taking longer than I anticipated to create a sample application. The main hurdle has been mastering Google Web Toolkit well enough to create the user interface. I am working from the sample Mail application in the GWT distribution, but my code is increasingly diverging from it.

First, here is a screenshot of the web UI. Apologies for the rough edges; I am not a UI developer, and building user interfaces is a chore for me.

The basic UI is working; I have a stub server application waiting to be hooked up to the backend.

The UI is built using the MVP (Model-View-Presenter) pattern, except that I don't use an EventBus: as I am the sole developer, the added complexity of a bus and its associated event mechanisms is not warranted. A RequestProcessor class handles the presentation logic.

I have been thinking about how to create the primary keys of some of the tables. I have settled on a special table that holds sequences; each sequence has a name and a long value. As reverse indexes are not yet supported in SimpleDBM, I came up with the idea of a decreasing sequence: each new row gets a smaller key than the one before, so scanning the index in ascending key order returns newer data before older data. This goes to show that we can live with almost any limitation; a bit of thinking gives a solution to the problem!
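The reasoning behind the decreasing sequence can be checked with a few lines of Java. In this sketch a java.util.TreeMap stands in for a BTree index that can only be scanned in ascending key order, and the starting value Long.MAX_VALUE is an arbitrary assumption; each newer post gets a smaller key, so an ascending scan returns the newest post first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class NewestFirst {
    /**
     * Assigns keys from a decreasing sequence (newer posts get smaller keys) and then
     * scans in ascending key order, which yields the newest post first.
     */
    public static List<String> scanAscending(String... postsInArrivalOrder) {
        long sequence = Long.MAX_VALUE; // hypothetical starting value
        // TreeMap stands in for a BTree index that can only be scanned in ascending order
        TreeMap<Long, String> index = new TreeMap<Long, String>();
        for (String post : postsInArrivalOrder) {
            index.put(sequence--, post);
        }
        return new ArrayList<String>(index.values());
    }

    public static void main(String[] args) {
        System.out.println(scanAscending("first", "second", "third")); // [third, second, first]
    }
}
```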

As sequences never need to be rolled back, the sequence generator can execute its own small transaction whenever the sequence needs decrementing. To make this efficient, sequence values could be allocated in chunks, but for now I will simply decrement one at a time.
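Chunked allocation could be sketched as follows. The Store interface is a hypothetical hook standing in for the sequence table: its reserve() method is assumed to lower the stored value by a whole chunk in its own small, committed transaction and to return the value held before the change. Because values are never given back, a crash can only leave gaps in the sequence, never duplicates.

```java
public class DecreasingSequence {
    /**
     * Hypothetical persistence hook: in its own small committed transaction, lowers the
     * stored value by count and returns the value it held before the change.
     */
    public interface Store {
        long reserve(String name, int count);
    }

    private final Store store;
    private final String name;
    private final int chunkSize;
    private long next;     // next value to hand out from the current chunk
    private int remaining; // values left in the current chunk

    public DecreasingSequence(Store store, String name, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.store = store;
        this.name = name;
        this.chunkSize = chunkSize;
    }

    public synchronized long nextValue() {
        if (remaining == 0) {
            // One storage round trip per chunk rather than one per value
            next = store.reserve(name, chunkSize);
            remaining = chunkSize;
        }
        remaining--;
        return next--;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the sequence table
        final long[] stored = {1000L};
        Store memory = new Store() {
            public long reserve(String name, int count) {
                long before = stored[0];
                stored[0] = before - count;
                return before;
            }
        };
        DecreasingSequence seq = new DecreasingSequence(memory, "post_id", 2);
        for (int i = 0; i < 3; i++) {
            System.out.println(seq.nextValue()); // 1000, then 999, then 998
        }
        System.out.println("stored value: " + stored[0]); // 996
    }
}
```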

There is also nothing like really using a system to discover bugs: I found that the Long column type was missing the functionality to set a Long value!

Friday, April 02, 2010

Testing concurrent programs

Testing concurrent programs is particularly hard, as the interleaving of multiple threads of execution greatly multiplies the number of possible code and data access paths. It is quite challenging to write test cases that properly exercise such scenarios; usually only a handful can be tested.

A unique tool that helps with testing for concurrency bugs is an IBM product named ConTest. ConTest does not generate any new test cases, but if you already have multi-threaded test scenarios, it increases the likelihood of triggering bugs by introducing random pauses into thread execution.
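ConTest inserts its noise automatically by instrumenting the program under test; the sketch below only imitates the idea by hand, to show why random pauses help. A hypothetical noise() method occasionally yields or sleeps between reading and writing a shared counter, which widens the race window so that lost updates show up far more often than they otherwise would. The numbers printed vary from run to run.

```java
import java.util.Random;

public class NoiseDemo {
    private static final Random RANDOM = new Random(); // java.util.Random is thread-safe
    private int count;

    /** Occasionally yields or sleeps to perturb thread interleavings, in the spirit of ConTest's noise. */
    static void noise() {
        int choice = RANDOM.nextInt(10);
        if (choice == 0) {
            Thread.yield();
        } else if (choice == 1) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Racy read-modify-write: noise between the read and the write makes lost updates likely. */
    void unsafeIncrement() {
        int value = count;
        noise();
        count = value + 1;
    }

    /** The same operation under a lock, so no update can be lost. */
    synchronized void safeIncrement() {
        int value = count;
        noise();
        count = value + 1;
    }

    /** Starts the given number of threads, each incrementing the shared counter, and returns the final count. */
    public static int run(final boolean safe, int threads, final int increments) throws InterruptedException {
        final NoiseDemo demo = new NoiseDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < increments; j++) {
                        if (safe) {
                            demo.safeIncrement();
                        } else {
                            demo.unsafeIncrement();
                        }
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return demo.count; // join() makes the workers' writes visible here
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("expected: " + 4 * 500);
        System.out.println("unsafe:   " + run(false, 4, 500)); // usually less than expected
        System.out.println("safe:     " + run(true, 4, 500));
    }
}
```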

A while ago, I tried running ConTest against the SimpleDBM test suite and found that execution slowed considerably, so expect your test cases to take much longer to complete.

Sunday, March 28, 2010

Global Loggers

It seems that all existing logging libraries assume that you want a single global Log Manager, tied to the class loader. Only static methods are provided to access the Log Manager or Logger instances.

I have been rigorously removing all static objects from SimpleDBM, so that the entire object graph of SimpleDBM is rooted in the main Server object. Doing this not only makes the code more robust, it also allows multiple instances of SimpleDBM to coexist in the same class loader without conflict. But this model has broken down in the Logger implementation, which is a wrapper for either Log4J or JDK Logging; neither of these allows non-global instances of the Log Manager.

Much as I loathe doing this, it seems the only solution is to roll my own...
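A home-grown logger need not be much work if all that is required is a logger that hangs off the Server object instead of a static registry. Here is a minimal sketch of the idea; InstanceLogging, LogManager, Logger and Level are hypothetical names (not Log4J or java.util.logging classes), and output simply goes to a PrintStream supplied by the owner. Since there is no static state apart from the nested type declarations, two managers with different settings can coexist in the same class loader.

```java
import java.io.PrintStream;
import java.util.concurrent.ConcurrentHashMap;

/** A minimal sketch of logging with no static state: each server instance owns its own LogManager. */
public class InstanceLogging {
    public enum Level { DEBUG, INFO, WARN, ERROR }

    public static final class LogManager {
        private final PrintStream out;
        private volatile Level threshold;
        private final ConcurrentHashMap<String, Logger> loggers = new ConcurrentHashMap<String, Logger>();

        public LogManager(PrintStream out, Level threshold) {
            this.out = out;
            this.threshold = threshold;
        }

        public void setThreshold(Level threshold) {
            this.threshold = threshold;
        }

        /** Returns the logger with this name, creating it on first use; each manager has its own cache. */
        public Logger getLogger(String name) {
            Logger existing = loggers.get(name);
            if (existing != null) {
                return existing;
            }
            Logger created = new Logger(this, name);
            Logger raced = loggers.putIfAbsent(name, created);
            return raced != null ? raced : created;
        }
    }

    public static final class Logger {
        private final LogManager manager;
        private final String name;

        Logger(LogManager manager, String name) {
            this.manager = manager;
            this.name = name;
        }

        public boolean isEnabled(Level level) {
            return level.compareTo(manager.threshold) >= 0;
        }

        public void log(Level level, String message) {
            if (isEnabled(level)) {
                manager.out.println("[" + level + "] " + name + ": " + message);
            }
        }
    }

    public static void main(String[] args) {
        // Two independent "servers" in the same class loader, configured differently
        LogManager first = new LogManager(System.out, Level.DEBUG);
        LogManager second = new LogManager(System.out, Level.ERROR);
        first.getLogger("lock").log(Level.INFO, "printed");
        second.getLogger("lock").log(Level.INFO, "suppressed by the second manager's threshold");
    }
}
```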

Is anyone else facing this issue?

Life is too short

Life is too short to master multiple programming languages and tools. So while I would love to learn the new Go Programming Language, Python, AJAX, and a few other cool new things, I keep going back to what I already know: the Java programming language. I can write a small utility faster in Java than in Python, not because Java is particularly productive, but because I do not have to spend time figuring out how to do things in Java.

This is where I think something like the Google Web Toolkit is a godsend for someone like me. Being able to create a web user interface in Java, without having to know the intricacies of AJAX, JSP, JavaServer Faces, etc., is way too cool. Of course, there is a learning curve here too, but it is less steep because the language is already familiar.

I am using GWT to create a small application to demonstrate the use of SimpleDBM. I am just loving the experience. Kudos to the Google team for coming out with GWT!

Sunday, March 21, 2010

Network client/server update

I am still testing the new network client server implementation. Apologies for the delay in publishing this enhancement.

I am also working on a simple sample web application to illustrate the network API, and learning all about GWT (Google Web Toolkit), which is a lot of fun.

Java versus Google Go

An interesting new systems programming language was announced by Google at the end of last year: the Go Programming Language (www.golang.org). Is this what Java should have been?

Go is, of course, both new and old. It is a new language that derives a lot from the past work done by its creators at Bell Labs; you can even see copyright notices from Plan 9 etc. all over the place. Therefore, although the language is new, it is built upon years of experience.

In general I like the new language. Two features are particularly nice:
  • Any object can be cast to an interface as long as the object implements the methods of the interface (sorry for using Java terminology here).
  • Goroutines are cool because they overcome the problem created by equating a thread with a flow of execution. In other languages, if your thread blocks, that flow of execution halts and the thread can do nothing else. In Go, a goroutine that blocks is moved off its thread, and another goroutine takes its place. This will be very good for servers that need to multiplex processing over a limited set of threads.
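The problem described in the second bullet is easy to reproduce with a standard Java thread pool. In this sketch (class and method names are illustrative) one task blocks waiting for a signal that only a second task can send. With a single pool thread, the second task sits in the queue behind the blocked one, so the signal arrives too late; with two threads it gets through. The Go runtime avoids this particular trap by running another goroutine on the thread while the first one is blocked.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BlockedPoolDemo {
    /**
     * Submits a task that blocks waiting for a signal, then a second task that sends it.
     * Returns true if the first task received the signal before its 500 ms timeout.
     */
    public static boolean signalDelivered(int poolThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        try {
            final CountDownLatch signal = new CountDownLatch(1);
            Future<Boolean> waiter = pool.submit(new Callable<Boolean>() {
                public Boolean call() throws InterruptedException {
                    // Occupies a pool thread until signalled or until the timeout expires
                    return signal.await(500, TimeUnit.MILLISECONDS);
                }
            });
            pool.submit(new Runnable() {
                public void run() {
                    signal.countDown();
                }
            });
            return waiter.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // One thread: the signalling task is queued behind the blocked task
        System.out.println("1 thread:  " + signalDelivered(1)); // false
        // Two threads: the signalling task gets its own thread and releases the waiter
        System.out.println("2 threads: " + signalDelivered(2));
    }
}
```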
There are a few ugly things too. My dislikes are the built-in allocation functions new() and make(), and the pointer type. Why this mess? Java is so neat: you have either primitives or references. References are like pointers except that you cannot do any pointer operations with them.

The top missing feature is an exception handling mechanism. I have programmed for many years in C and now in Java, and I can tell you that it is far easier to create robust error handling in Java. Of course, checked exceptions were a mistake (I have changed my mind about them), and I think Go should avoid them.

I wish I had the luxury of rewriting SimpleDBM in Go. It would be an interesting and fun thing to do. But I have better things to do... I am hoping that I can at least create a network client in Go, so that it is possible to talk to the SimpleDBM server from Go.