Saturday, January 28, 2006

On licensing

SimpleDBM is licensed under GPL V2 or later. I decided to use the GPL because I believe in the values that the GNU movement stands for. It is a pity that so much FUD is generated regarding the GPL, and a greater pity that there is such a proliferation of OpenSource licences. If the GPL were business-unfriendly, Linux would never have been successful.

When GPL V3 is finally released, I will adopt it for SimpleDBM.

On a side note, I finally managed to get around to implementing a few things that were long on my TODO list:
  1. The Log Manager now automatically deletes older archive log files.
  2. There is a background thread for generating Checkpoints.
  3. Rollback to a Savepoint will discard PostCommitActions that were scheduled after the Savepoint was created. This means that if you create a Savepoint, drop a container, and then roll back to the Savepoint, the drop action will be discarded.
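
To make the Savepoint behaviour in item 3 concrete, here is a minimal sketch. The class and method names are hypothetical and are not SimpleDBM's actual API; the sketch assumes only that a Savepoint can remember how many post-commit actions had been scheduled when it was created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are NOT SimpleDBM's actual API.
public class SavepointSketch {

    /** A toy transaction that tracks nothing except scheduled post-commit actions. */
    static class Transaction {
        private final List<String> postCommitActions = new ArrayList<>();

        /** A savepoint is just the number of actions scheduled so far. */
        int createSavepoint() {
            return postCommitActions.size();
        }

        void schedulePostCommitAction(String action) {
            postCommitActions.add(action);
        }

        /** Discards every action scheduled after the savepoint was created. */
        void rollbackTo(int savepoint) {
            postCommitActions.subList(savepoint, postCommitActions.size()).clear();
        }

        List<String> pendingActions() {
            return new ArrayList<>(postCommitActions);
        }
    }

    public static void main(String[] args) {
        Transaction trx = new Transaction();
        trx.schedulePostCommitAction("drop container 1");
        int savepoint = trx.createSavepoint();
        trx.schedulePostCommitAction("drop container 2");
        trx.rollbackTo(savepoint);
        // Only the action scheduled before the savepoint survives.
        System.out.println(trx.pendingActions());
    }
}
```

Actions scheduled before the Savepoint are untouched; only the later ones are dropped.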

Thursday, January 26, 2006

@GeneratedValue limitations

When I saw that the specification of sequence generation has been separated from the @Id annotation, I thought that this meant that sequence generators can be used for any column, and not just for surrogate primary keys. This, however, does not appear to be the case, at least in Glassfish, where if you try to use a sequence generator on any column that is not defined as a surrogate primary key, you get an Exception. You cannot use sequence generators in composite primary keys either.

Update: I could be wrong here because I am getting inconsistent results. More details are in the bug report I have filed with the Glassfish team.

Monday, January 23, 2006

Priorities for January

This month I am working on finishing the Developer's Guide, and I also plan to update the Javadoc documentation. While updating the documentation I realized that I should not have used the term BTree when defining the interface for the Index Module. A BTree is an implementation strategy for Indexes, so it is better to use a more generic term when specifying the interface. I am refactoring the code to correct this.

My priorities for January and February are to:
  1. Complete the documentation.
  2. Augment JUnit test cases.
  3. Tie up loose ends and produce a usable RSS component.

If I complete all this by February, from March onwards I shall start working on building the next layer of the system, i.e., type system, system catalogs, tables and indexes with multiple attributes.

Mapping Relationships in EJB 3.0

Rather than create my own schema for testing EJB 3.0 features, I decided to use the schema used in TPC-C Benchmarks. This approach has the benefit that it forces me to design an object-relational mapping for a pre-existing schema, which is probably what most developers will do in practice.

If you are designing a new schema, then bear in mind that EJB 3.0 favours a design where each table has a surrogate primary key, i.e., a meaningless numeric key that is either auto-generated or generated using a sequence. Using surrogate primary keys makes it easier to map the primary key of the table, as you can simply mark the relevant field using the @Id annotation. It also makes it easier to map relationships, as the EntityManager is able to exploit natural Has-A relationships (Composite design pattern) between objects. For example, in the TPC-C schema, an Order has a one-to-many relationship with OrderLines. This can be mapped as follows:
  1. An Order object should contain a Collection of OrderLine objects.
  2. An OrderLine object should have a reference to the Order that it is related to.

Given below is the relevant code:

public class Order {
    Set<OrderLine> orderLines;

    @OneToMany(mappedBy="order")
    public Set<OrderLine> getOrderLines() {
        return orderLines;
    }
}

public class OrderLine {
    Order order;

    @ManyToOne
    @JoinColumn(name="OL_O_ID", referencedColumnName="O_ID")
    public Order getOrder() {
        return order;
    }
}

An important point you must note here is that the OrderLine Entity does not contain an explicit field for the foreign key column "OL_O_ID" that references the primary key column "O_ID" in Order; the Persistence Engine automatically uses the appropriate field from the Order entity when inserting rows in the OrderLine table.

The situation becomes more complex when tables are not designed to use surrogate primary keys. In this scenario, it is often necessary to use composite primary keys.

I will use the Customer and District entities as examples because both have composite primary keys and there exists a many-to-one relationship between a Customer and a District. First of all, let's recall that to map a composite primary key, we need to create a primary key class that identifies the fields that form part of the composite key. This primary key class is then associated with the Entity using the @IdClass annotation. The Entity itself also contains the fields used in the primary key; these are annotated using @Id and @Column. Since I described how this works in my previous post, I will only show the resulting key fields in the Customer entity here:

public class Customer {

    @Id
    @Column(name="C_ID", nullable=false)
    public int getCustomerId() {
        return customerId;
    }

    @Id
    @Column(name="C_D_ID", nullable=false)
    public int getDistrictId() {
        return districtId;
    }

    @Id
    @Column(name="C_W_ID", nullable=false)
    public int getWarehouseId() {
        return warehouseId;
    }
}

Now in order to map the relationships, we add the following:

public class Customer {
    District district;

    @ManyToOne
    @JoinColumns({
        @JoinColumn(name="C_W_ID", referencedColumnName="D_W_ID"),
        @JoinColumn(name="C_D_ID", referencedColumnName="D_ID")
    })
    public District getDistrict() {
        return district;
    }
}

public class District {

    Set<Customer> customers;

    @OneToMany(mappedBy="district")
    public Set<Customer> getCustomers() {
        return customers;
    }
}

The problem is that now there are two ways of updating the foreign key columns in the Customer entity, because the foreign key columns are present in the referenced District entity, but also present as fields in the entity in order to satisfy the requirements for a composite primary key. If you try to execute this code, you will encounter an Exception such as this (in Glassfish):

Multiple writable mappings exist for the field [TPCC.CUSTOMER.C_W_ID]. Only one may be defined as writable, all others must be specified read-only.

Multiple writable mappings exist for the field [TPCC.CUSTOMER.C_D_ID]. Only one may be defined as writable, all others must be specified read-only.

To resolve this problem, you need to modify the Customer Entity definition and ensure that the columns TPCC.CUSTOMER.C_W_ID and TPCC.CUSTOMER.C_D_ID are marked as read-only, by setting insertable=false and updatable=false:

public class Customer {

    @Id
    @Column(name="C_D_ID", nullable=false,
            insertable=false, updatable=false)
    public int getDistrictId() {
        return districtId;
    }

    @Id
    @Column(name="C_W_ID", nullable=false,
            insertable=false, updatable=false)
    public int getWarehouseId() {
        return warehouseId;
    }
}

Thursday, January 19, 2006

EJB 3.0 Entities are not POJOs

Contrary to appearances, EJB 3.0 Entities are not POJOs as they contain hidden data and functionality in order to implement some of the specification requirements. I found this out when I was stepping through some code and out of curiosity, inspected the entities returned by the EntityManager.

When you implement two Entities that are related, for example, in a one-to-many relationship, you typically have a Collection class in the first entity that contains references to instances of the second entity. For example, an Order entity may contain a Collection of OrderLine entities. EJB 3.0 supports "lazy fetches", which means that the data for the Collection is not fetched until required. To support this, the Collection class contains extra functionality and data - you can see this if you inspect the Entity using a debugger. When you try to access the collection, this hidden functionality is triggered, and data is fetched to populate the Collection. This process carries on recursively for all referenced entities.

Lazy fetching is clearly desirable because you do not want data to be unnecessarily fetched if you do not intend to use it. However, the problem is that the Entity is not fully initialized until all the data is fetched. This poses a particular problem if you want to use the Entities outside of their "managed" environment. Any attempt to access the uninitialized data may fail. Section 3.2.4 of the EJB 3.0 specification (PFD) spells out exactly what you can safely access.
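
The idea of a Collection that carries hidden loading logic can be sketched in plain Java. This is only an illustration of the technique: the LazyList class and its loader are hypothetical, and this is not how Toplink or Glassfish actually implement lazy fetching.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration; not the persistence provider's real collection class.
public class LazyListSketch {

    /** A list that defers loading its contents until the first access. */
    static class LazyList<T> extends AbstractList<T> {
        private final Supplier<List<T>> loader;
        private List<T> delegate; // stays null until something reads the list

        LazyList(Supplier<List<T>> loader) {
            this.loader = loader;
        }

        boolean isLoaded() {
            return delegate != null;
        }

        private List<T> loaded() {
            if (delegate == null) {
                // A real provider would issue SQL here, which is why this
                // step can fail once the entity is detached.
                delegate = loader.get();
            }
            return delegate;
        }

        @Override
        public T get(int index) {
            return loaded().get(index);
        }

        @Override
        public int size() {
            return loaded().size();
        }
    }

    public static void main(String[] args) {
        LazyList<String> orderLines =
                new LazyList<>(() -> Arrays.asList("line 1", "line 2"));
        System.out.println(orderLines.isLoaded()); // nothing fetched yet
        System.out.println(orderLines.size());     // first access triggers the load
        System.out.println(orderLines.isLoaded());
    }
}
```

To the caller the object looks like an ordinary List, which is exactly why the hidden state is easy to overlook.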

See the recent thread at the Glassfish Discussion Forums on this issue, and how it affects client access.

One of the questions that will be debated in future is whether EJB Entities should be exposed outside the Data Access layer. For example, should you expose entities in your Business Logic interfaces, or to your clients? Since EJB 3.0 Entities are touted as POJOs, many developers will assume that they can use these objects outside of the Data Access layer. However, such use is fraught with danger, due to the semantics of detached entities.

In my view, the EJB specification should require that detached entities are "cleaned" and made POJOs. This would lead to more predictable behaviour, and fewer surprises.

Tuesday, January 17, 2006

Some comments on EJB 3.0 PFD

In a previous post, I complained about lack of clarity in the specification of the @IdClass annotation. Well, it seems that this lack of clarity extends to anything to do with composite primary keys. Here is what the specification has to say about composite primary keys:

"Composite primary keys typically arise when mapping from legacy databases when the database key is comprised of several columns."

Perhaps this explains why there aren't any real examples to show how Entity relationships ought to be mapped when composite primary keys are involved.

Monday, January 16, 2006

@GeneratedValue annotation support in Glassfish

I reported previously that Glassfish does not yet support the new @GeneratedValue annotation specified in the EJB 3.0 PFD. I downloaded the latest build (b33) of Glassfish today, and while testing my code, found that the new build supports @GeneratedValue. It is a pity that one has to find this out by trial and error; the Glassfish team should put up release notes with each build that cover feature changes since the last build.

Anyway, the old syntax for the @Id annotation allowed attributes for specifying Id Generators. The new method requires the Id Generator to be specified separately using a @GeneratedValue annotation. Here is a comparison between the old approach and the new approach:

Old approach:

@Id(generate=GeneratorType.SEQUENCE, generator="SEQ_GEN")
public Long id;

New approach:

@Id
@GeneratedValue(strategy=GenerationType.SEQUENCE, generator="SEQ_GEN")
public Long id;

I think the new approach is cleaner. It also enables an Id Generator to be used in fields that are not necessarily the primary key.

Overall, the fact that Glassfish seems to be following the draft specification closely is great, as it enables developers to get familiar with the new technology. Thankfully, the specs are now close to final version, so changes ought to be relatively small.

Talking of changes, I hit another one. The syntax for @TableGenerator has changed. The table is no longer specified using @Table; instead, table and schema are ordinary string attributes. Here's an example of the new syntax:

@TableGenerator(name = "DISTRICT_ID_SEQUENCE", 
table = "IDGENERATOR", schema = "TPCC",
pkColumnName = "SEQ_NAME", valueColumnName = "SEQ_COUNT",
allocationSize = 10)
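
The effect of allocationSize can be sketched without a database. The sketch below is a simplification under stated assumptions: a Map stands in for the IDGENERATOR table, the Generator class is hypothetical, and the exact numbering a real provider produces may differ. It shows only the general idea that one table update reserves a whole block of ids.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation; a Map stands in for the IDGENERATOR table.
public class TableGeneratorSketch {

    static class Generator {
        private final Map<String, Long> table; // SEQ_NAME -> SEQ_COUNT
        private final String name;
        private final int allocationSize;
        private long next;  // next id to hand out
        private long limit; // first id beyond the currently reserved block
        private int tableUpdates;

        Generator(Map<String, Long> table, String name, int allocationSize) {
            this.table = table;
            this.name = name;
            this.allocationSize = allocationSize;
        }

        long nextId() {
            if (next == limit) {
                // Reserve a new block of ids with a single update of the table.
                long start = table.getOrDefault(name, 0L);
                table.put(name, start + allocationSize);
                tableUpdates++;
                next = start + 1;
                limit = start + allocationSize + 1;
            }
            return next++;
        }

        int tableUpdates() {
            return tableUpdates;
        }
    }

    public static void main(String[] args) {
        Generator g = new Generator(new HashMap<>(), "DISTRICT_ID_SEQUENCE", 10);
        long last = 0;
        for (int i = 0; i < 25; i++) {
            last = g.nextId();
        }
        System.out.println("last id = " + last
                + ", table updates = " + g.tableUpdates());
    }
}
```

A larger allocationSize means fewer round trips to the generator table, at the cost of gaps in the id sequence if reserved ids go unused.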

Saturday, January 14, 2006

Multiversion Concurrency

Some time ago, I promised to write about some of the techniques used in Oracle. A very early draft of a paper on Multi-Version concurrency is now available. It discusses MVCC implementations in PostgreSQL and Oracle.

SimpleDBM does not implement MVCC on purpose, as I wanted to understand traditional implementations before attempting to implement MVCC. Perhaps one day, a different version of SimpleDBM will implement MVCC.

I am very keen on ensuring that this paper is accurate in its description of the PostgreSQL and Oracle implementations. I would also like to add descriptions of other DBMSes like Firebird and MySQL/InnoDB.

Friday, January 13, 2006

Documentation moving to LaTeX

I am ashamed to say that I just discovered LaTeX. Of course, I knew about TeX but never thought I would use it ... well, I have just converted the SimpleDBM Reference Manual to LaTeX, and I love the results. The output is so much better, and the document looks professional. Here's the link to the PDF output.

Thursday, January 12, 2006

Regarding EJB 3.0 Entity lifecycle

EJB 3.0 Entities exist in one of 4 states: new, managed, detached, and removed.

The state transitions of an Entity depend upon the type of operations on the Entity and the life-cycles of associated Persistence and Transaction Contexts.

When an entity is first instantiated using the Java new operator, its state is "new".

If you obtain an entity instance via EntityManager.find() or EntityManager.getReference(), or through a Query, its state depends upon whether there is an active Persistence Context or not. If there is one, then the Entity is in "managed" state, otherwise in "detached" state.

When you invoke EntityManager.persist() on a "new", "managed" or "removed" Entity, it becomes "managed". Note that you cannot invoke persist() on a "detached" Entity.

When you invoke EntityManager.merge() on a "new" or "detached" Entity, a "managed" copy of the Entity is created. You cannot invoke merge() on a "removed" Entity. Invoking merge() on a "managed" Entity leaves it in "managed" state.

When you invoke EntityManager.remove() on a "managed" Entity, its new state becomes "removed". Attempts to remove "new" or "removed" entities are ignored; however, it is an error to try to remove a "detached" entity.

If the Persistence Context is TRANSACTION scoped, then all "managed" entities within the Persistence Context become "detached" when the associated transaction ends. If the Persistence Context is scoped as EXTENDED, then the entities remain "managed" even after the transaction ends.

In terms of Entity state, "new" and "detached" states are similar in that the Persistence Context has no knowledge of such entities.
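
The transitions described above can be summarised as a small state machine. This is a sketch of the rules as stated in this post, not code from the specification or from Glassfish; the enum and method names are hypothetical, and the use of IllegalArgumentException for disallowed operations is an assumption of the sketch.

```java
// Hypothetical summary of the rules described in this post.
public class EntityLifecycleSketch {

    enum State { NEW, MANAGED, DETACHED, REMOVED }

    enum Operation { PERSIST, MERGE, REMOVE }

    /** Returns the resulting state, or throws if the operation is not allowed. */
    static State apply(State state, Operation op) {
        switch (op) {
            case PERSIST:
                if (state == State.DETACHED) {
                    throw new IllegalArgumentException("cannot persist a detached entity");
                }
                return State.MANAGED;
            case MERGE:
                if (state == State.REMOVED) {
                    throw new IllegalArgumentException("cannot merge a removed entity");
                }
                // For new and detached entities this is the state of the managed copy.
                return State.MANAGED;
            case REMOVE:
                if (state == State.DETACHED) {
                    throw new IllegalArgumentException("cannot remove a detached entity");
                }
                // Removing new or already removed entities is ignored.
                return state == State.MANAGED ? State.REMOVED : state;
            default:
                throw new AssertionError(op);
        }
    }

    /** TRANSACTION scope detaches managed entities at transaction end; EXTENDED does not. */
    static State afterTransactionEnd(State state, boolean extendedScope) {
        return (state == State.MANAGED && !extendedScope) ? State.DETACHED : state;
    }

    public static void main(String[] args) {
        State s = apply(State.NEW, Operation.PERSIST);
        System.out.println(s);
        System.out.println(afterTransactionEnd(s, false));
        System.out.println(afterTransactionEnd(s, true));
    }
}
```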

The rules governing the lifecycle of Persistence Contexts are complex; I will explore them in another post.

The EntityManager interface provides two mechanisms for persisting an Entity: persist() and merge(). The differences between these two interfaces are subtle, and in my view, a single interface would have been better and less confusing. In terms of usage, I would recommend using merge() at all times.

Tuesday, January 10, 2006

Using @IdClass in Glassfish

Given below is an example of how to use @IdClass in Glassfish. Note that the primary key class must not have any members other than the Id fields, not even serialVersionUID, even though the primary key class is defined as Serializable.

@Entity
@IdClass(WarehousePK.class)
@Table(name = "WAREHOUSE", schema = "TPCC")
public class Warehouse {

    String name;
    String city;

    @Id
    @Column(name = "W_NAME", length = 10)
    public String getName() {
        return name;
    }

    @Id
    @Column(name = "W_CITY", length = 20)
    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }

    public void setName(String name) {
        this.name = name;
    }
}

public class WarehousePK implements Serializable {

    String name = "";
    String city = "";

    public WarehousePK() {
    }

    public String getCity() {
        return city;
    }

    public String getName() {
        return name;
    }

    public void setCity(String city) {
        this.city = city;
    }

    public void setName(String name) {
        this.name = name;
    }

    public boolean equals(Object arg0) {
        if (arg0 == this) {
            return true;
        } else if (arg0 instanceof WarehousePK) {
            return name.equals(((WarehousePK) arg0).name)
                    && city.equals(((WarehousePK) arg0).city);
        }
        return false;
    }

    public int hashCode() {
        return name.hashCode() ^ city.hashCode();
    }
}

The primary key class is useful when searching for objects by their primary key. Example:

    WarehousePK key = new WarehousePK();
    key.setName("LONDON WAREHOUSE");
    Warehouse w = em.find(Warehouse.class, key);

When creating new entities or updating existing entities, you should access the properties within the entity as normal. Example:

    Warehouse w = new Warehouse();

Problems with EJB 3.0 @IdClass specification

Note that comments below are based upon the Proposed Final Draft specification of EJB 3.0.

I feel that the specification of @IdClass is not at all clear. There are several problems:

1. It is not clear what the class that represents the composite primary key is used for.

Is it just an informative construct designed to tell the Entity Manager which fields of the Entity represent the key?

How and when are clients expected to use this class?

From trial and error and guesswork I have deduced that the class that represents the primary key essentially serves two purposes:

a) First to tell the Entity Manager which fields of an Entity correspond to the composite primary key.
b) Secondly, for use in the EntityManager.find() and EntityManager.getReference() methods, as the second argument.

It is also not clearly stated anywhere whether the primary key class needs to be annotated in any manner. Clearly, if it is used as @EmbeddedId then it must be annotated using @Embeddable, but if it is used as @IdClass then no annotation is needed.

2. There aren't any examples illustrating the use of @IdClass. There are a few code snippets that depict @IdClass usage, but these do not adhere to the requirements defined for primary key classes.

Section 9.1.13 shows an entity called Employee, but does not show the EmployeePK class.

Section 9.1.31 (page 190) shows a primary key class EmpPK, but this class does not adhere to the rules, i.e., it is not Serializable and does not implement hashCode() and equals().
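
Why equals() and hashCode() matter for a primary key class can be shown with a plain HashMap. This is only an illustration with hypothetical class names; it assumes, as one plausible reason for the rule, that a persistence provider looks up entities by key value in some kind of map.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key classes, used only to illustrate value-based equality.
public class PrimaryKeyEqualitySketch {

    /** Follows the rules: Serializable, with equals() and hashCode(). */
    static class KeyWithEquality implements Serializable {
        final String name;
        final String city;

        KeyWithEquality(String name, String city) {
            this.name = name;
            this.city = city;
        }

        @Override
        public boolean equals(Object o) {
            if (o == this) {
                return true;
            }
            if (!(o instanceof KeyWithEquality)) {
                return false;
            }
            KeyWithEquality other = (KeyWithEquality) o;
            return name.equals(other.name) && city.equals(other.city);
        }

        @Override
        public int hashCode() {
            return name.hashCode() ^ city.hashCode();
        }
    }

    /** Breaks the rules: inherits identity-based equals() and hashCode(). */
    static class KeyWithoutEquality implements Serializable {
        final String name;
        final String city;

        KeyWithoutEquality(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    public static void main(String[] args) {
        Map<Object, String> cache = new HashMap<>();
        cache.put(new KeyWithEquality("LONDON WAREHOUSE", "LONDON"), "warehouse 1");
        cache.put(new KeyWithoutEquality("LONDON WAREHOUSE", "LONDON"), "warehouse 2");

        // A fresh key with the same values finds the entry only when
        // equality is based on the key's values.
        System.out.println(cache.get(new KeyWithEquality("LONDON WAREHOUSE", "LONDON")));
        System.out.println(cache.get(new KeyWithoutEquality("LONDON WAREHOUSE", "LONDON")));
    }
}
```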

Monday, January 09, 2006

Using bind variables in generated SQLs

I noticed that the SQL statements generated by the Glassfish/Toplink implementation of EJB 3.0 persistence use literal values instead of bind variables. Here are a few examples:

Here is the generated SQL when creating a new entity:

Now, look at the SQL generated for a query:


Warehouse w = (Warehouse) em.createQuery(
        "SELECT w FROM Warehouse w WHERE w.name = :wname")
        .setParameter("wname", "LONDON WAREHOUSE")
        .getSingleResult();

Generated SQL:

Finally, have a look at the UPDATE SQL:

SET W_STREET_1 = 'Braham Street', W_VERSION = 2
WHERE ((W_ID = 1) AND (W_VERSION = 1))

Clearly, the default implementation always uses literals in SQL statements. I haven't yet found a way to tell Glassfish/Toplink to use bind variables instead.

Fortunately, since Glassfish and Toplink are both OpenSource, I can probably find out how to do this by looking at the source code.

Why EntityTransaction?

When managing entities outside a J2EE container, you need to obtain an instance of EntityTransaction to manage transactions. Here is an example:

EntityManager em = null;
try {
    em = Persistence.createEntityManagerFactory("em1")
            .createEntityManager();
    System.err.println("Created entity manager");
    Warehouse w = new Warehouse();

    EntityTransaction t = em.getTransaction();

    boolean success = false;
    t.begin();
    try {
        em.persist(w);
        success = true;
    } finally {
        if (success) {
            t.commit();
        } else {
            t.rollback();
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (em != null) {
        em.close();
    }
}

I would like to understand the rationale for introducing a new abstraction for transaction management when the UserTransaction interface already exists. By introducing a different mechanism for out-of-container applications, it will be harder to write code that is environment agnostic ... unless you write a wrapper to hide the differences.
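
A wrapper of the kind mentioned above might look like the following sketch. The TransactionControl interface and the inTransaction helper are hypothetical names of my own; real adapters would delegate to EntityTransaction or UserTransaction, and they are omitted here so that the example runs without a persistence provider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical wrapper; not part of EJB 3.0 or Glassfish.
public class TransactionWrapperSketch {

    /** The minimal operations shared by both transaction mechanisms. */
    interface TransactionControl {
        void begin() throws Exception;
        void commit() throws Exception;
        void rollback() throws Exception;
    }

    /** Runs the work in a transaction: commit on success, rollback on failure. */
    static <T> T inTransaction(TransactionControl tx, Callable<T> work) throws Exception {
        tx.begin();
        boolean success = false;
        try {
            T result = work.call();
            success = true;
            return result;
        } finally {
            if (success) {
                tx.commit();
            } else {
                tx.rollback();
            }
        }
    }

    /** Stand-in implementation that only records which calls were made. */
    static class RecordingTransaction implements TransactionControl {
        final List<String> calls = new ArrayList<>();

        public void begin() {
            calls.add("begin");
        }

        public void commit() {
            calls.add("commit");
        }

        public void rollback() {
            calls.add("rollback");
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingTransaction tx = new RecordingTransaction();
        String result = inTransaction(tx, () -> "saved");
        System.out.println(result + " " + tx.calls);
    }
}
```

Application code then depends only on the wrapper, and the choice between the two transaction mechanisms is confined to the adapter that is plugged in.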

Sunday, January 08, 2006

Experiments with EJB 3.0 @Id annotation

I have been messing around with @Id annotation in EJB 3.0 for generating primary keys.

Firstly, there appears to be a change in the latest version (Proposed Final Draft) of the EJB 3.0 specification which is not supported in Glassfish yet. The latest version requires a separate annotation called @GeneratedValue to define the type of Id generator. I downloaded Eclipse Dali yesterday, and this seems to support the new method.

The previous syntax supported five generator types - NONE, AUTO, SEQUENCE, IDENTITY and TABLE. In the new syntax, NONE is no longer required, as it is implied if there is no @GeneratedValue annotation. The remaining four generator types are supported.

By default, if no generator type is specified, then NONE is implied. This means that the developer must set the Id property or field correctly before attempting to persist the entity.

I have some trouble coming to terms with the SEQUENCE and IDENTITY generators. Not all databases support both of these, so it seems to me that if you annotate your Entity with either of these then you are going to end up with non-portable code. My recommendation is to avoid these.

IDENTITY generator type is meant to be used when the underlying DBMS supports the IDENTITY column type - this is an autoincrement column that is maintained by the DBMS itself. For example, you can create a table in Apache Derby with a column defined as "GENERATED ALWAYS AS IDENTITY". Note that systems that support this also support a mechanism for retrieving the last generated IDENTITY value. This method is not the equivalent of using a trigger to populate the primary key - as is sometimes done in Oracle. With the trigger approach, there is no support for retrieving the last generated IDENTITY value.

The SEQUENCE generator is meant for systems that support Sequences. Oracle and PostgreSQL are two that I know of.

The TABLE generator, as its name implies, relies upon a special table for generating sequences. This is a portable method, as it does not rely upon native generators. Here is an example annotation that specifies a table generator:

table=@Table(name = "SEQUENCE", schema = "DIBYENDU"),
Finally, there is the AUTO generator type. This is meant to use the best strategy available in the DBMS. Unfortunately, Glassfish seems to insist on using a table generator, at least when I tested this with Apache Derby. It also assumes that you have created a table called SEQUENCE in the default schema for the JDBC Connection ID - there is no way to specify the schema. The expected table structure is not documented as far as I can see, but you can guess it by looking at the SQL logs. You can also use the DDL generation facility in Glassfish to get the basic DDL for creating this table.

Speaking of the DDL generation facility in Glassfish, I think it is not very useful at the moment. The DDL generated loses information - for example, it does not respect any @Column attributes other than name.

Saturday, January 07, 2006

On Checked Exceptions

Here are a couple of my blogs on Checked Exceptions:
  1. Why I favour Checked Exceptions.
  2. Technique for ensuring method signature stability.

I recommend this excellent blog about Java API Design Guidelines, by Eamonn McManus. I agree with most of it except the bit about using Unchecked Exceptions. Eamonn provides a link to a presentation by Joshua Bloch, which covers the same subject.

Friday, January 06, 2006

More on Exceptions

In a previous post, I blogged about why I favour Checked Exceptions over Unchecked ones. Today, I'll talk about how I am circumventing some of the issues with Checked Exceptions.

The big advantage with Checked Exceptions is that the method signature tells you what Exceptions are likely to be thrown. Great as this is, it is also a liability, because any change in the Exception specification can break client code.

SimpleDBM comprises several modules. In SimpleDBM, the module is the unit of reusability. Each module has an API which is represented by a set of Interfaces, and one or more implementations. An important objective is to make each Module's API stable so that as the code evolves, other modules are not impacted by changes in the API. This is where Exception specification becomes important.

Let us assume there are two modules, A and B, and also assume that B depends upon A. Now, suppose that some methods in A's API throw an exception called ExceptionA and some of B's methods throw ExceptionB. Since B's methods call A's methods, when defining B's methods, we have the following options:

  1. Allow B's methods to throw ExceptionA.
  2. Catch ExceptionA in B's methods and wrap it in ExceptionB.

The problem with the first approach is that it makes B's API unstable. What if A's methods start throwing a new Exception type?

The problem with the second approach is that ExceptionA has been wrapped and cannot be caught by a client. A client that invokes B may want to handle ExceptionA in a specific manner. In the first option, the client could write the following code, but with the second option, this is not possible:

try {
    // call B's API
} catch (ExceptionA e) {
    // Catch and handle ExceptionA
}

What we want is to preserve the information that A threw ExceptionA, but still avoid having ExceptionA in B's method signatures.

The solution is to wrap ExceptionA with a specific sub-class of ExceptionB, rather than plain ExceptionB. Let us call this sub-class ExceptionBExceptionA. The trick is that methods in B should only be specified to throw ExceptionB. This is okay because ExceptionBExceptionA is a sub-class of ExceptionB. However, now clients can catch ExceptionBExceptionA and handle this particular exception, while ignoring other instances of ExceptionB.

try {
    // call B's API
} catch (ExceptionBExceptionA e) {
    // Catch and handle ExceptionBExceptionA
}

Not all exceptions thrown by A need be wrapped in this manner - only those that are specifically useful to the client.

In SimpleDBM, each module defines its own Exception class. Methods of the module can only throw instances of the module Exception. However, where necessary, sub-classes of the Exception are created that represent more specific information, sometimes wrapping Exceptions thrown by other modules.
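
Here is a self-contained sketch of the technique, using the exception names from the discussion above. The methods callA, callB and client, and the "lock timeout" message, are hypothetical stand-ins and are not taken from SimpleDBM.

```java
// Sketch of the wrapping technique; method names and messages are hypothetical.
public class ExceptionWrappingSketch {

    static class ExceptionA extends Exception {
        ExceptionA(String message) {
            super(message);
        }
    }

    static class ExceptionB extends Exception {
        ExceptionB(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** The specific sub-class of ExceptionB that wraps an ExceptionA. */
    static class ExceptionBExceptionA extends ExceptionB {
        ExceptionBExceptionA(ExceptionA cause) {
            super("module A failed: " + cause.getMessage(), cause);
        }
    }

    /** Stands in for a method in module A's API. */
    static void callA(boolean fail) throws ExceptionA {
        if (fail) {
            throw new ExceptionA("lock timeout");
        }
    }

    /** Stands in for a method in module B's API: the signature names only ExceptionB. */
    static void callB(boolean failInA) throws ExceptionB {
        try {
            callA(failInA);
        } catch (ExceptionA e) {
            throw new ExceptionBExceptionA(e);
        }
    }

    /** A client that handles A's failure specifically, without B's API mentioning it. */
    static String client(boolean failInA) {
        try {
            callB(failInA);
            return "ok";
        } catch (ExceptionBExceptionA e) {
            return "handled A's failure: " + e.getCause().getMessage();
        } catch (ExceptionB e) {
            return "generic B failure";
        }
    }

    public static void main(String[] args) {
        System.out.println(client(false));
        System.out.println(client(true));
    }
}
```

If module A later adds a new exception type, only the body of callB changes; its signature, and therefore every client that compiles against it, stays the same.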

Thursday, January 05, 2006

Using Glassfish EJB 3.0 Persistence in J2SE

UPDATED 7th May 2006
One of the great things about EJB 3.0 is its support for persistence in J2SE environments. This is great for trying out various persistence features without the overhead of developing an application which must be deployed to an application server before it can be tested.

To use EJB persistence in your J2SE program, you need a couple of jar files that come with the Glassfish distribution. These are:

  1. /glassfish/lib/toplink-essentials.jar
  2. /glassfish/lib/toplink-essentials-agent.jar

These jar files are available as a separate standalone bundle here.

You require the toplink-essentials.jar during development. At runtime, you need to add the following option to your Java command line:


This will automatically add the toplink-essentials.jar to your classpath.

There is another preliminary step that you need to be aware of. You need to create a file named persistence.xml in a directory called META-INF. This directory must be in your classpath. Given below is an example of a persistence.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="1.0" xmlns="">
  <persistence-unit name="em1" transaction-type="RESOURCE_LOCAL">
    <properties>
      <property name="toplink.jdbc.driver" value="oracle.jdbc.driver.OracleDriver" />
      <property name="toplink.jdbc.url" value="jdbc:oracle:thin:@localhost:1521:sample" />
      <property name="toplink.jdbc.user" value="scott" />
      <property name="toplink.jdbc.password" value="tiger" />
      <property name="toplink.logging.level" value="INFO" />
    </properties>
  </persistence-unit>
</persistence>

If you set the logging level to FINEST you can see the SQL statements being generated by Toplink.

To access the EJB 3.0 EntityManager in your program, add lines similar to the following:

  EntityManager em = Persistence.createEntityManagerFactory("em1")
          .createEntityManager();

Wednesday, January 04, 2006

EJB 3.0

I have recently started exploring EJB 3.0. I am using the latest builds of Glassfish, Sun's OpenSource Application Server. Glassfish contains an OpenSource version of Oracle's Toplink product.

I had stayed away from EJB so far, as I was not convinced that the benefits of EJB outweighed the complexities it introduced into code. I had been looking at alternative lightweight frameworks such as SpringFramework as a substitute for EJB. EJB 3.0 has, however, converted me. In my view, once production implementations of EJB 3.0 are available, there will be fewer occasions to use alternative frameworks.

As I learn more about EJB 3.0 and Glassfish, I will post my findings here. Stay tuned.


This is where I intend to blog about programming in general, and Java programming in particular.