Thursday, January 19, 2006

EJB 3.0 Entities are not POJOs

Contrary to appearances, EJB 3.0 Entities are not POJOs: they contain hidden data and functionality needed to implement some of the specification's requirements. I found this out when I was stepping through some code and, out of curiosity, inspected the entities returned by the EntityManager.

When you implement two Entities that are related, for example, in a one-to-many relationship, you typically have a Collection in the first entity that contains references to instances of the second entity. For example, an Order entity may contain a Collection of OrderLine entities. EJB 3.0 supports "lazy fetches", which means that the data for the Collection is not fetched until required. To support this, the Collection instance that the persistence provider gives you contains extra functionality and data - you can see this if you inspect the Entity using a debugger. When you try to access the collection, this hidden functionality is triggered and the data is fetched to populate the Collection. This process carries on recursively for all referenced entities.
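To make the mechanism concrete, here is a minimal sketch in plain Java of how such a wrapper might behave. It is not taken from any persistence provider: the class name LazyList, its loader field and the isInitialized() method are all hypothetical, and the "fetch" is just a Supplier standing in for a database query.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a provider's lazy collection wrapper.
class LazyList<E> extends AbstractList<E> {
    private final Supplier<List<E>> loader; // hidden data: how to fetch the contents
    private List<E> loaded;                 // stays null until the first access

    LazyList(Supplier<List<E>> loader) {
        this.loader = loader;
    }

    boolean isInitialized() {
        return loaded != null;
    }

    // Hidden functionality: the first access triggers the "fetch".
    private List<E> data() {
        if (loaded == null) {
            loaded = new ArrayList<>(loader.get());
        }
        return loaded;
    }

    @Override
    public E get(int index) {
        return data().get(index);
    }

    @Override
    public int size() {
        return data().size();
    }
}

class LazyFetchDemo {
    public static void main(String[] args) {
        // The Supplier plays the role of a query for an Order's OrderLines.
        LazyList<String> orderLines =
                new LazyList<>(() -> Arrays.asList("line-1", "line-2"));
        System.out.println("fetched before access? " + orderLines.isInitialized());
        System.out.println("size: " + orderLines.size()); // triggers the load
        System.out.println("fetched after access? " + orderLines.isInitialized());
    }
}
```

To the calling code the object looks like an ordinary List, which is why the extra state only becomes visible in a debugger.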

Lazy fetching is clearly desirable because you do not want data to be unnecessarily fetched if you do not intend to use it. However, the problem is that the Entity is not fully initialized until all the data is fetched. This poses a particular problem if you want to use the Entities outside of their "managed" environment. Any attempt to access the uninitialized data may fail. Section 3.2.4 of the EJB 3.0 specification (PFD) spells out exactly what you can safely access.
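The failure mode can be sketched in plain Java as well. This is only an illustration under assumptions: DetachableLazyList and its detach() method are hypothetical names, and throwing IllegalStateException is just one possible behaviour, since what actually happens depends on the persistence provider.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical lazy collection that loses its data source when detached.
class DetachableLazyList<E> extends AbstractList<E> {
    private Supplier<List<E>> loader; // set to null once the entity is detached
    private List<E> loaded;           // null until the data has been fetched

    DetachableLazyList(Supplier<List<E>> loader) {
        this.loader = loader;
    }

    // Simulates the entity leaving its managed environment.
    void detach() {
        loader = null;
    }

    private List<E> data() {
        if (loaded == null) {
            if (loader == null) {
                // Assumption: the sketch fails loudly; a real provider may differ.
                throw new IllegalStateException(
                        "collection was not fetched before detachment");
            }
            loaded = new ArrayList<>(loader.get());
        }
        return loaded;
    }

    @Override
    public E get(int index) {
        return data().get(index);
    }

    @Override
    public int size() {
        return data().size();
    }
}

class DetachedAccessDemo {
    public static void main(String[] args) {
        DetachableLazyList<String> fetched =
                new DetachableLazyList<>(() -> Arrays.asList("line-1"));
        fetched.size();   // fetched while still managed
        fetched.detach();
        System.out.println(fetched.get(0)); // safe: the data is already loaded

        DetachableLazyList<String> unfetched =
                new DetachableLazyList<>(() -> Arrays.asList("line-1"));
        unfetched.detach(); // never accessed while managed
        try {
            unfetched.size();
        } catch (IllegalStateException e) {
            System.out.println("access failed: " + e.getMessage());
        }
    }
}
```

The two lists are indistinguishable by type, so nothing in the code tells you which one is safe to touch after detachment.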

See the recent thread at the Glassfish Discussion Forums on this issue, and how it affects client access.

One of the questions that will be debated in future is whether EJB Entities should be exposed outside the Data Access layer. For example, should you expose entities in your Business Logic interfaces, or to your clients? Since EJB 3.0 Entities are touted as POJOs, many developers will assume that they can use these objects outside of the Data Access layer. However, such use is fraught with danger, due to the semantics of detached entities.

In my view, the EJB specification should require that detached entities be "cleaned" and made POJOs. This would lead to more predictable behaviour, and fewer surprises.
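As a rough illustration of what "cleaning" could mean for a single collection, here is a sketch in plain Java. Everything in it is an assumption made for the example: the TrackedList wrapper, the DetachCleaner helper, and in particular the choice to turn an unfetched collection into null rather than, say, an empty list.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical lazy wrapper, standing in for a provider's collection class.
class TrackedList<E> extends AbstractList<E> {
    private final Supplier<List<E>> loader;
    private List<E> loaded; // null until fetched

    TrackedList(Supplier<List<E>> loader) {
        this.loader = loader;
    }

    boolean isInitialized() {
        return loaded != null;
    }

    private List<E> data() {
        if (loaded == null) {
            loaded = new ArrayList<>(loader.get());
        }
        return loaded;
    }

    @Override
    public E get(int index) {
        return data().get(index);
    }

    @Override
    public int size() {
        return data().size();
    }
}

// Hypothetical helper that a container could apply when detaching an entity.
class DetachCleaner {
    // Returns a plain ArrayList copy of fetched data, or null for unfetched
    // data (an assumption of this sketch). It never triggers a fetch itself.
    static <E> List<E> clean(List<E> source) {
        if (source instanceof TrackedList
                && !((TrackedList<?>) source).isInitialized()) {
            return null;
        }
        return new ArrayList<>(source);
    }
}

class CleanDemo {
    public static void main(String[] args) {
        TrackedList<String> fetched =
                new TrackedList<>(() -> Arrays.asList("line-1", "line-2"));
        fetched.size(); // fetch while managed
        List<String> plain = DetachCleaner.clean(fetched);
        System.out.println(plain.getClass().getName() + " " + plain);

        TrackedList<String> unfetched =
                new TrackedList<>(() -> Arrays.asList("line-1"));
        System.out.println(DetachCleaner.clean(unfetched));
    }
}
```

The cleaned result carries no hidden state, which also means it carries nothing that would tell the container what was never fetched.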


Emmanuel Bernard said...

This would make them unreattachable, which is not desirable.

Emmanuel Bernard said...

More precisely, this would make them unreattachable unless you define some crazy rules that would lead to non-POJO behavior (like lazy collections being replaced by null or so).

Dibyendu said...

Not sure what you mean. Could you clarify a bit?

Rob Jellinghaus said...

Emmanuel is a member of the core Hibernate team, so he's talking Hibernate. The Hibernate community has a long history with this issue.

I think what Emmanuel means more particularly is that the special collection wrappers contain state that is used when you reattach the entity to a persistence manager, so the persistence manager can efficiently determine which collection contents changed while the object was disconnected.

Turning the object into a POJO in general does not work, since you could -- depending on your collection mappings -- wind up eagerly loading much of your entire database!

In one system here at work, we implemented an XML-RPC serializer that basically converts uninitialized objects into "null". This means that you can't reattach those objects later, but that's not a use case we have to support.

Dibyendu said...

Well, the spec says that you must not attempt to access attributes of detached entities that are marked as lazy. Therefore, you cannot change these attributes anyway. So I don't see how the Entity Manager can use this information.

I think that having hidden data in detached entities is a terrible idea. Firstly, the spec says that you must not try to access state that is marked lazy. As a developer, how are you going to make sure that you are always doing the right thing? Secondly, all this hidden stuff breaks serialization. Thirdly, what happens when the object has been sent outside the environment - such as to another layer?

I do not have experience with Hibernate, but I am assuming that most users of Hibernate access the Entities within the same container - i.e., Business logic and Presentation logic co-exist with Data access logic. However, this may not be true in large-scale enterprise applications.

If detached entities are made POJOs then users get predictable behaviour. The POJOs become re-usable in other layers, so you don't have to use workarounds as you have done. As for re-attaching a detached object - how many use-cases require this? Users who do want to re-attach detached entities can merge a POJO back into the "managed state", and then refresh it. This would be equivalent to re-attaching a detached entity, right?

Stephen Connolly said...

Seems to me that what's needed is a separate class of (lightish) collections which can be used for detached entities.

When detaching the entity, the container replaces the managed collections with these detached collections.

Extending the idea, replace the unfetched properties with Lazy surrogates.

Now when the detached objects are to be re-merged, we have the collections with the extra sync info.

Additionally, the client can extract information about what parts of the entity it has been given (by testing for Lazy surrogates, or by querying the LazyCollection as to how many objects it has been provided with).

Emmanuel Bernard said...

The "EM" is not "you" of course. You should distinguish between the spec user and the spec implementor.

As a developer you're sure you're doing the right thing (i.e. not accessing lazy data) because an exception is raised if you try to. This avoids sneaky bugs in applications.

You assume wrong. Of course people use Hibernate in multi-tier / multi-container / multi-VM setups. This just works without breaking serialization.

Reattaching is clearly an important feature that gets rid of the DTO anti-pattern.