Tuesday, March 31, 2009

Dwarfs standing on the shoulders of giants

As I mentioned in a previous post, when building SimpleDBM I have used design ideas that I have picked from various technical research papers as well as other OpenSource projects. I would like to list some of the influences here, and acknowledge my indebtedness.

Papers from IBM's System-R research project have influenced the role of the database engine in SimpleDBM. I have named the core engine as RSS in honor of the System-R Relational Storage System (RSS).

A general reference that has proved invaluable is the classic Transaction Processing: Concepts and Techniques, by Jim Gray and Andreas Reuter. The Lock Manager and the Buffer Manager modules are based upon descriptions in this book.

The Lock Manager also uses ideas from the OpenSource project Shore. The handling of lock conversions and the deadlock detector are based upon code from this project.

The Transaction Manager is based upon the algorithms published in the ARIES series of papers by C. Mohan of IBM. I am also indebted to Mohan for the implementation of lock isolation modes, and implementation of free space management in containers.

The BTree implementation is based upon the algorithms described by Ibrahim Jaluta et al in the paper Concurrency control and recovery for balanced B-link trees.

The idea of using the write ahead log records for normal execution of forward actions is taken from Apache Derby. This is a neat idea that ensures that both normal and recovery processes use the same code for performing updates to the database. Essentially all updates are executed through log records, even during normal execution.

Projects that I have learned from but that haven't directly influenced the implementation include:

Delayed release

The next BETA release of SimpleDBM has been delayed due to my desire to re-factor some of the code. The re-factoring has been focused on following:
  • Removing all singletons - even Loggers. Unfortunately, Log4J and other logging packages all use singletons, so this is not completely possible unless I replace the entire logging package.
  • Ensuring that the serialization mechanism uses constructor based object initialization rather than retrieving state after construction.
  • Above enables the use of final fields in many objects, allowing them to be immutable.
All of these changes are designed to increase robustness, and also to ensure that multiple instances of SimpleDBM can co-exist within the same JVM, and even the same Class Loader, without conflict.