Wednesday, November 29, 2006

Why another database manager?

A friend asked me recently why I spent my time implementing a DBMS when there are already a number of open source databases. It is a good question, because clearly I am not breaking any new ground here. In fact, I find myself incapable of inventing great new technology; most of what I am implementing is well-known stuff. Going by C.A.R. Hoare's definitions of engineers and scientists, I am a software engineer rather than a scientist. It seems that this project is pure self-indulgence, when I could be spending my time more fruitfully, either contributing to projects like Apache Derby or working on something more relevant.

I guess that, if I am honest with myself, I have to admit there is an element of self-indulgence here. But there is also some utility. The knowledge I gain in the process benefits me in my working life. Apart from that, I think that my DBMS implementation is better documented and easier to understand than other open source implementations, for a couple of reasons:
  1. I use well-established algorithms, which are well documented in the computer science literature. I am also putting more effort into documentation than is typical of many open source projects.
  2. The system is decomposed into well-defined, loosely coupled modules. I find that most other implementations are far more tightly integrated, and therefore difficult to understand. I have traded off performance and efficiency in favour of ease of understanding.
There are still no books that describe how to build a real DBMS and also show you, with real code, how DBMS features such as transactions, locking, recovery and B-trees work. The only book that comes close is Transaction Processing: Concepts and Techniques, but the sample code it contains is not complete. In some ways, my project provides a sample implementation of many of the techniques described in that book.

Finally, there is the question of pride. When I started this project many years ago in C++, I never thought I could do it. I had no training in this field, and no access to people who do this type of stuff at work. Having come this far, it seems a shame to give up.

3 comments:

Anonymous said...

Hi Dibyendu

Just to let you know, I did the same in C++ many years ago (between 1992 and 1996) to build a Unix rehosting product for the Bull mainframe TP monitor. (I could send you the slides if you would like a historical perspective.)

So I am very pleased to see that you have redone your own C++ work in Java, because I was doing the same... and you are totally right, it is a shame to give up having come so far... maybe we can join forces some day to complete the big picture...

Regards

my sourceforge id: francisandre

Greg Burd said...

How is this different from Oracle Berkeley DB Java Edition? Were you unaware of its existence? Was the license too restrictive for some reason (if so, how/why)? Was there some technical reason not to choose/use it? Was this purely a learning exercise? Did you study Berkeley DB or Berkeley DB Java Edition when you considered your implementation strategy/techniques?

I'm just curious to know how we could have found you and given you a "wheel" before you re-invented it. Unless it was purely for the joy of making your own wheel from scratch, which I can understand.

:-)

-greg

Greg Burd | Senior Product Manager | Oracle Berkeley DB

Dibyendu said...

Hi Greg,

SimpleDBM and Oracle Berkeley DB Java Edition have a lot in common. I think that when I started porting SimpleDBM to Java in 2005, I wasn't aware of Berkeley DB Java Edition.

I studied many open source databases, including Berkeley DB, PostgreSQL, MySQL, Apache Derby and Shore, and for a time even considered abandoning SimpleDBM in favor of Apache Derby. But having started it and invested time in it, it seemed a waste to let it go. I enjoyed the challenge as well, to be honest.

Regards