Saturday, December 29, 2007

Implementing System Catalog or Data Dictionary

It is the classic chicken and egg problem.

I would like to implement the system catalog in database tables. However, the system catalog will contain information that is a pre-requisite for working with tables. I do not know how other database implementations handle this cyclic relationship. It is made more difficult for SimpleDBM because the core database engine has no knowledge of the data dictionary or the type-system. These are implemented by add-on modules.

I considered following options for implementing the system catalog:

1. Simplest option would be to pre-define the schema for a database using a configuration file, which can be loaded at database startup. This may be adequate for many use cases, but implies that the database schema is statically defined, and tables cannot be created or dropped dynamically.

2. Second option is to implement the catalog as dynamic tables. The catalog tables would be pre-created when the database is created, and their structure would be known to the database manager. When a regular table is created, a row would be added to the system catalog table, to record the structure of the new table. The table creation would be handled as a standalone transaction (covering both the container creation and updating of the system catalog table), to ensure that it is completed before any attempt is made to access the new table. Index creation would be handled similarly.

When a regular table is accessed, its type structure will have to be retrieved from the catalog tables. This means some recursive locking, which I am worried about because so far, SimpleDBM has not needed to carry out locking during system restarts. Perhaps, we could optimize away the need for locking because SimpleDBM locks containers anyway when they are accessed, which means that there is already a lock that protects the table definition. Unfortunately, at present the SimpleDBM API does not support a read uncommitted isolation mode - which would be necessary to avoid obtaining locks on catalog tables when they are being read.

To avoid accessing the catalog tables every time a container is accessed, the first time a table is accessed, its definition would be retrieved and cached in memory for subsequent use.

3. A third option is to maintain the table definitions in special containers, and manage updates to these containers without going through the buffer manager. Here is a tentative design:

The structure of a table or index will be stored in a special container. This may be a single shared container or one container for each table or index. The multiple container option has the advantage that a simple naming convention could be used to quickly locate the appropriate container.

The database module will perform special reads and writes to these containers, bypassing the Buffer Manager, and Lock Manager.

The problem is how to guarantee that the data in the special container is updated atomically as part of the transaction that created the table or index. One way of achieving this would be to generate PostCommitActions for updating the metadata when a table or index is created or dropped. The PostCommitAction would be executed even during restart recovery, and would therefore ensure that the table or index definition is atomically updated as part of the transaction that created the table.

When a table or index is accessed, its definition can be retrieved and cached. Subsequent access will refer to the cache.

Tuesday, December 04, 2007

Java on Mac OS X

I posted elsewhere that I now own an iMac and am doing most of my Java development on it. The big problem with Java on Mac is that the JDK is pretty out of date. The official version that comes with Mac OS X 10.4 Tiger is 1.5.0_07. This version seems buggy, as the JVM seems to hang when running test cases for one of my projects. I was getting quite frustrated, and in the end decided to run my test cases under OpenSolaris, which I installed as a VM using VMWare Fusion.

Fortunately, Apple has recently made available a pre-release version 1.5.0_13 from its Developer Connection web site. You need to be registered to be able to download this release. The good news is that it fixes the bug that was causing my tests to hang the JVM. I think it has something to do with the java.util.concurrency classes. Ofcourse, if you don't use these, you would not have faced the problems I was facing.

The long gap between the Java releases on Mac seems unacceptable, especially since the JDK is so buggy. The other problem is the lack of support from major vendors for the Mac platform. I would have liked to install and use IBM WebSphere and Rational products on my Mac, but it isn't amongst the supported platforms. Linux has much better support now-a-days.

Fast forward

It has been sometime since I last blogged about general programming topics. My day job has kept me from the joys of exploring the JPA. But many other things have happened to me in the meantime, life just goes on, and there is never enough time to catch up on everything.

For a start, I want to retitle this blog, by removing the Java from it. Let this just be a Programmer's Blog.