Sunday, October 29, 2006

Unit testing and MVCC in Berkeley DB

After a break of several months, I am about to start working on SimpleDBM again. I am determined not to add any new functionality until existing functionality is thoroughly tested and fully documented.

Currently I am working on improving the unit test cases for the BTree module. This is easily the most complex module in SimpleDBM, and producing comprehensive unit test cases is a significant task in its own right. Still, the effort will be worthwhile, since a database system is only useful if it is completely reliable.

It was exciting to learn that Oracle has added support for multi-version concurrency control (MVCC) to Berkeley DB. I haven't looked at the implementation in great detail, but a cursory look suggests that the changes are mainly in the Buffer Pool. To support MVCC, the Buffer Pool has been enhanced to hold multiple versions of pages. Readers can obtain a snapshot of the database using older versions of pages, and thus avoid obtaining read locks on the data being read. Unlike Oracle's own database, which reconstructs older versions of pages using the undo log, the Berkeley DB implementation appears to be a simple memory-based versioning solution. The likely downside is that it will not scale to large workloads, since it will require huge amounts of memory if the number of pages being versioned grows significantly.
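
To make the idea concrete, here is a minimal Python sketch of a buffer pool that keeps multiple versions of each page in memory. This is not Berkeley DB's code or API: the class `VersionedBufferPool`, its methods, and the logical commit timestamps are all hypothetical names chosen for illustration, and the sketch ignores eviction, latching, and uncommitted changes.

```python
from collections import defaultdict


class VersionedBufferPool:
    """Toy in-memory page cache that keeps several versions of each page.

    Hypothetical illustration only; not how Berkeley DB is implemented.
    """

    def __init__(self):
        # page_id -> list of (commit_ts, contents), oldest version first
        self._versions = defaultdict(list)
        self._clock = 0  # logical commit timestamp

    def write(self, page_id, contents):
        """Install a new committed version of a page; return its timestamp."""
        self._clock += 1
        self._versions[page_id].append((self._clock, contents))
        return self._clock

    def snapshot(self):
        """A snapshot is simply the commit timestamp at the time it is taken."""
        return self._clock

    def read(self, page_id, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts.

        No lock is taken: the reader just picks an older version if needed.
        """
        for commit_ts, contents in reversed(self._versions.get(page_id, [])):
            if commit_ts <= snapshot_ts:
                return contents
        return None  # the page did not exist when the snapshot was taken

    def prune(self, oldest_active_snapshot):
        """Drop versions that no active snapshot can still see."""
        for versions in self._versions.values():
            keep_from = 0
            for i, (commit_ts, _) in enumerate(versions):
                if commit_ts <= oldest_active_snapshot:
                    keep_from = i  # newest version visible to the oldest reader
            del versions[:keep_from]

    def version_count(self):
        """Total number of page versions currently held in memory."""
        return sum(len(v) for v in self._versions.values())
```

A reader that takes a snapshot before a page is rewritten keeps seeing the old contents, while `version_count()` shows the cost: every rewritten page adds another copy in memory until `prune()` can discard it, which is the scaling concern mentioned above.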