db4o: Object-Identity and High-Level-Caching

In the last two posts (#1, #2) we never used anything like an id for any object. So how the hell does db4o identify objects? How does it ensure that existing objects are updated?

(All posts of this series: the basics, activation, object-identity, transactions, persistent classes, single container concurrency, Queries in Java, C# 2.0, client-server concurrency, transparent persistence, adhoc query tools)


By Equality?

Let’s assume db4o manages objects by equality. We store an object in the database. Then we retrieve it and create an equal copy of it. Of course the hash-code method and the equals-method are properly written. Then we store the copy. Now we can make a prediction: if the database managed objects by equality, it would still contain only one element, because we stored an equal copy.

var obj1 = new SimpleObject("Content of Obj1");
db.Store(obj1);
var theObjectFromTheDB = (from SimpleObject o in db select o).Single();
var copyButEqual = new SimpleObject(theObjectFromTheDB);
AssertEquals(theObjectFromTheDB, copyButEqual);
db.Store(copyButEqual);
var objInDB = (from SimpleObject o in db select o);
// If db4o tracked objects by equality, the database would still hold only one instance.
// But this assertion fails, since db4o doesn't use equality.
AssertEquals(1, objInDB.Count());

Well, the second assertion fails, so the assumption that db4o manages objects by equality is wrong.
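To make the difference between "equal" and "the same instance" concrete, here is a minimal sketch without any database involved. The SimpleObject below is only a hypothetical stand-in for the class used in this post (the real one lives in SimpleObject.cs and may look different); it just assumes a Content property, a copy constructor and content-based equality.

```csharp
using System;

// Hypothetical stand-in for the SimpleObject used in this post.
public sealed class SimpleObject
{
    public string Content { get; set; }

    public SimpleObject(string content) { Content = content; }

    // Copy constructor: creates an equal, but distinct, instance.
    public SimpleObject(SimpleObject other) : this(other.Content) { }

    // Two SimpleObjects are equal when their content is equal.
    public override bool Equals(object obj) =>
        obj is SimpleObject other && Content == other.Content;

    public override int GetHashCode() =>
        Content == null ? 0 : Content.GetHashCode();
}

public static class EqualityVersusIdentity
{
    public static void Main()
    {
        var original = new SimpleObject("Content of Obj1");
        var copyButEqual = new SimpleObject(original);

        Console.WriteLine(original.Equals(copyButEqual));           // True: same content
        Console.WriteLine(ReferenceEquals(original, copyButEqual)); // False: two instances
    }
}
```

So a copy can pass every equality check and still be a different object in memory, which is exactly the case the test above sets up.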


By Identity?

Well, if it isn’t equality it has to be the object-identity, right? Again a little test-case. We retrieve an object from the database. We change the content of that object. After that, we retrieve it again from the database. Then we compare both values; they should be the same, since we changed the same object. Furthermore we check whether the two objects are the same instance in memory.

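var theObjectFromTheDB = (from SimpleObject o in db select o).First();
theObjectFromTheDB.Content = "changed-content";
var theObjectFromTheDBSecondTime = (from SimpleObject o in db select o).First();
// The second query sees the change, because it returns the very same instance.
AssertEquals("changed-content", theObjectFromTheDBSecondTime.Content);
AssertTrue(ReferenceEquals(theObjectFromTheDB, theObjectFromTheDBSecondTime));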

Well, our assumptions are confirmed. Both are actually the same object in memory. We conclude that db4o manages objects by their identity in memory.


How db4o Manages Objects

So, we know that when we work with db4o it uses the object-identity to track known objects. But the object-identity only exists while your application is running. What about the objects which are stored on your hard disk? Well, each persisted object in the database has a unique internal object-id. When db4o loads an object into memory, it remembers which object in memory belongs to which internal object-id and vice versa. So when you call .Store(), db4o checks whether it has loaded or stored that object previously. If it has, it updates that object. Otherwise it’s a new object. You can actually access the internal object-id if you really need to. But in a normal use case you never need to.
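The bookkeeping described above can be sketched as a small two-way map. This is a toy model and not db4o code: ToyIdentityMap, its methods and the id counter are all made up for illustration, and unlike db4o it holds strong references to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public sealed class SimpleObject
{
    public string Content;
    public SimpleObject(string content) { Content = content; }
}

// Toy model, not db4o code: every name here is hypothetical.
public sealed class ToyIdentityMap
{
    // Compares keys by reference, so Equals/GetHashCode overrides are ignored.
    private sealed class ByReference : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
        int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
    }

    private readonly Dictionary<object, long> idByObject =
        new Dictionary<object, long>(new ByReference());
    private readonly Dictionary<long, object> objectById = new Dictionary<long, object>();
    private long nextId = 1;

    // Returns true if the instance was already known (an update),
    // false if it was unknown and just got a fresh id (a new object).
    public bool Store(object obj, out long id)
    {
        if (idByObject.TryGetValue(obj, out id)) return true;
        id = nextId++;
        idByObject[obj] = id;
        objectById[id] = obj;
        return false;
    }

    // The "vice versa" direction: from internal id back to the instance.
    public object GetById(long id) => objectById[id];
}

public static class IdentityMapDemo
{
    public static void Main()
    {
        var map = new ToyIdentityMap();
        var joe = new SimpleObject("Joe");
        var twin = new SimpleObject("Joe");

        bool known = map.Store(joe, out long id);
        Console.WriteLine($"{known} {id}");   // False 1: new object
        known = map.Store(joe, out id);
        Console.WriteLine($"{known} {id}");   // True 1: same instance, update
        known = map.Store(twin, out id);
        Console.WriteLine($"{known} {id}");   // False 2: different instance, new object
    }
}
```

The important design choice is the reference-based comparer: the map never asks the object whether it is equal to something else, it only asks whether it has seen this exact instance before.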

Now since db4o needs to remember the objects and object-ids anyway, it uses this mapping as a ‘cache’. As long as you use the same object-container, db4o will always return the same objects. This avoids unnecessary reads from the database: objects which are already in memory are not loaded again. Of course db4o uses weak-references for object-tracking. This means that db4o doesn’t eat your memory, but only keeps references to objects you actually use.
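As a rough sketch of such a weak-reference ‘cache’, assuming nothing about how db4o implements it internally: ToyReferenceCache, its fake "disk" dictionary and the DiskReads counter are all hypothetical names. The only real API used is .NET’s WeakReference<T>.

```csharp
using System;
using System.Collections.Generic;

public sealed class SimpleObject
{
    public string Content;
    public SimpleObject(string content) { Content = content; }
}

// Toy model, not db4o code: every name here is hypothetical.
public sealed class ToyReferenceCache
{
    // Stands in for the database file.
    private readonly Dictionary<long, string> disk = new Dictionary<long, string>();

    // Weak references: the cache alone never keeps an object alive.
    private readonly Dictionary<long, WeakReference<SimpleObject>> loaded =
        new Dictionary<long, WeakReference<SimpleObject>>();

    public int DiskReads { get; private set; }

    public void Put(long id, string content) { disk[id] = content; }

    public SimpleObject Load(long id)
    {
        // Still referenced somewhere in the application: hand out the same instance.
        if (loaded.TryGetValue(id, out var weak) && weak.TryGetTarget(out var cached))
            return cached;

        // Never loaded, or already garbage collected: read it again.
        DiskReads++;
        var fresh = new SimpleObject(disk[id]);
        loaded[id] = new WeakReference<SimpleObject>(fresh);
        return fresh;
    }
}

public static class ReferenceCacheDemo
{
    public static void Main()
    {
        var cache = new ToyReferenceCache();
        cache.Put(1, "Joe");

        var first = cache.Load(1);
        var second = cache.Load(1);

        Console.WriteLine(ReferenceEquals(first, second)); // True: same instance
        Console.WriteLine(cache.DiskReads);                // 1: second load hit the cache
    }
}
```

Once the application drops its last reference to an object, the garbage collector is free to reclaim it, and the next Load simply reads it again.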


The Magic Clone

Well now we know that db4o keeps a map to remember the objects. So this should work, right?

var simpleObject = new SimpleObject("Joe");
using (IObjectContainer db = StartDataBase())
{
    db.Store(simpleObject);
}
using (IObjectContainer db = StartDataBase())
{
    simpleObject.Content = "JoeUpdated";
    db.Store(simpleObject);
    var objInDB = (from SimpleObject o in db select o);
    // So, it should be only one, since db4o tracks by references, right? But it fails.
    // Because the IObjectContainer tracks the objects. And we use a new object container here.
    AssertEquals(1, objInDB.Count());
}

But surprisingly there are now two SimpleObjects in the database. So no object-identity-tracking? Did I just lie? No! This example is a common mistake when you start with db4o. The IObjectContainer is actually the one which keeps track of all the loaded objects. So when you use a persisted object in multiple object-containers, the object will be stored twice, because each object-container works in isolation from the others.

So the object-container normally lives as long as your work-scope. In an embedded, single-user application it may live as long as the application runs. In a web-application it may live as long as a user-session. Each object has to be loaded and stored by the same object-container. An object must not ‘cross’ the object-container-boundary. For Hibernate-Users: The object-container is your new Hibernate-Session ;).
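Why the boundary matters can be shown with a toy model in which each container has its own private table of known instances. Again this is not db4o code: ToyFile, ToyContainer and their members are hypothetical, and Load is simplified so that it always creates a new instance. The table uses .NET’s ConditionalWeakTable, which compares keys by reference and does not keep them alive.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public sealed class SimpleObject
{
    public string Content;
    public SimpleObject(string content) { Content = content; }
}

// Stands in for the database file that every container opens.
public sealed class ToyFile
{
    public readonly Dictionary<long, string> Records = new Dictionary<long, string>();
    public long NextId = 1;
}

// Hypothetical container: its table of known instances is private to it.
public sealed class ToyContainer
{
    private sealed class Id { public long Value; }

    private readonly ToyFile file;
    private readonly ConditionalWeakTable<SimpleObject, Id> known =
        new ConditionalWeakTable<SimpleObject, Id>();

    public ToyContainer(ToyFile file) { this.file = file; }

    public void Store(SimpleObject obj)
    {
        if (!known.TryGetValue(obj, out Id id))
        {
            // This container has never seen the instance: new record.
            id = new Id { Value = file.NextId++ };
            known.Add(obj, id);
        }
        file.Records[id.Value] = obj.Content;
    }

    // Loads one record and remembers the instance it hands out.
    public SimpleObject Load(long recordId)
    {
        var obj = new SimpleObject(file.Records[recordId]);
        known.Add(obj, new Id { Value = recordId });
        return obj;
    }
}

public static class ContainerBoundaryDemo
{
    public static void Main()
    {
        var file = new ToyFile();
        var joe = new SimpleObject("Joe");

        new ToyContainer(file).Store(joe);        // first container: record 1
        joe.Content = "JoeUpdated";
        new ToyContainer(file).Store(joe);        // second container: unknown, record 2
        Console.WriteLine(file.Records.Count);    // 2: the magic clone

        var container = new ToyContainer(file);
        var loaded = container.Load(1);           // loaded and stored by the same container
        loaded.Content = "JoeFixed";
        container.Store(loaded);
        Console.WriteLine(file.Records.Count);    // 2: record 1 was updated in place
        Console.WriteLine(file.Records[1]);       // JoeFixed
    }
}
```

The second container has an empty table, so from its point of view the object is brand new. Loading the object through the same container that later stores it is what turns the store into an update.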

magic copy

Next Time

So everything is wonderful, perfect and nothing goes wrong in fantasy land. Unfortunately we’re living in the real world. So next time I’ll explain the transaction-handling =/

Example-Source-Code: Program.cs, SimpleObject.cs

3 thoughts on “db4o: Object-Identity and High-Level-Caching”

  1. Daniel

    I really enjoyed reading this series of blog posts. I learned a lot. Its amusing somehow, that you are able to explain db4o in this manner. I am looking forward for your post in the future.

  2. Joy

    Great way of using simple comics to illustrate the different points you are making. Makes it easier to understand and keeps our attention as well.