Last time we took our first steps with RavenDB. I already mentioned that RavenDB is a document database (see also Wikipedia). However, in our examples we just stored objects. So where are the documents?
Object to JSON Document
What happens when we pass an object to the session’s Store method? Under the covers, the RavenDB client library serializes that object to a JSON document. So your object graph is converted into a document.
That has a few implications. A document is a hierarchical data structure. Therefore your object should be hierarchically structured and shouldn’t contain any circular references. If you try to store an object with circular references you will get an exception.
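To make the object-to-document conversion concrete, here is a minimal sketch. It uses System.Text.Json as a stand-in serializer, which is an assumption for illustration only: the RavenDB client does its own serialization, and the exact JSON and the exception type it produces may differ. The Person and City classes, and the Friend property that exists only to build a cycle, are hypothetical.

```csharp
using System;
using System.Text.Json;

// Hypothetical entity classes, for illustration only.
public class City
{
    public string Name { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public City City { get; set; }
    public Person Friend { get; set; } // only here to build a cycle
}

public static class SerializationSketch
{
    // Serializes the whole object graph into one JSON text.
    public static string ToDocument(Person person)
    {
        return JsonSerializer.Serialize(person);
    }

    // Returns true when the stand-in serializer rejects the object graph.
    public static bool IsRejected(Person person)
    {
        try
        {
            JsonSerializer.Serialize(person);
            return false;
        }
        catch (JsonException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var person = new Person { Name = "Gamlor", City = new City { Name = "Vals" } };

        // The city ends up nested inside the person's JSON, not as a separate text.
        Console.WriteLine(ToDocument(person));

        // A circular reference cannot be expressed as a hierarchy.
        person.Friend = person;
        Console.WriteLine("Cycle rejected: {0}", IsRejected(person));
    }
}
```

The nested city in the printed JSON is the point to take away: a referenced object becomes part of its parent's document.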
When you pass in an object which references other objects, all referenced objects are serialized into a single document. What does that mean for us? It means that the child objects are not separate ‘units-of-storage’ in the database. You can only access the document and then peek into it. Let me demonstrate this with a simple example. We store persons with the city they’re living in. For this we create a person class and a city class. Then we store a person with its city. Now we can easily query for persons, but not for the city. The reason is that the city was embedded into the person document, and therefore there’s no city document.
    using (var session = documentStore.OpenSession())
    {
        // We store a person and its city
        session.Store(new Person("Gamlor", new City("Vals")));
        session.SaveChanges();
    }

    using (var session = documentStore.OpenSession())
    {
        // We can get the person, because it's in a person document
        var hasPeopleInDB = session.Query<Person>().Any();
        Console.Out.WriteLine("Do we have people-documents in the db? {0}",
            hasPeopleInDB ? "Yes, we do" : "No, we dont");

        // However the city isn't in its own document
        var hasCitiesInDB = session.Query<City>().Any();
        Console.Out.WriteLine("Do we have city-documents in the db? {0}",
            hasCitiesInDB ? "Yes, we do" : "No, we dont");
    }

Document-Design
We’ve seen that RavenDB stores documents. Now the question is how we split our domain model across different documents. The general rule of thumb is to split your documents in such a way that they fit the operations of your application. By that I mean that a single operation shouldn’t have to meddle with hundreds of documents. Instead, a document should contain all the data needed for the most common operations.
Let’s look at an example, an oversimplified online shop. We store customers, orders and order-items.
Now how do we split that up? Everything in one document? Each entity in its own document? Let’s think about which operations we do in a shop and which entities we manipulate in those operations:
- Registering customer –> customer entity
- Changing customer data –> customer entity
- New ‘shopping’-tour –> order entity
- Adding and removing item –> order and order-item entity
- Showing shopping list –> order and order-item entity
- Send the order / finish shopping –> order and order-item entity
After taking a look at the operations, it looks like either the customer entity is touched, or the order together with its order-item entities. Therefore I suggest storing the customer in one document and the order with its order-items in another document.
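The split suggested above could be sketched with classes like the following. These are hypothetical stand-ins, not the entity classes used for the examples in this series: one document per customer, and one document per order with its items embedded. The order points to its customer only by id, which the next section explains. The member names CustomerId and AddToOrder follow the snippets in this post; everything else, including the made-up id value, is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Stored as its own document.
public class Customer
{
    public Customer(string name) { Name = name; }
    public string Id { get; set; }
    public string Name { get; set; }
}

// Never stored on its own; always embedded in an order document.
public class OrderItem
{
    public OrderItem(string name) { Name = name; }
    public string Name { get; set; }
}

// Stored as its own document, including all of its items.
public class Order
{
    public Order() { Items = new List<OrderItem>(); }
    public string Id { get; set; }
    public string CustomerId { get; set; }
    public IList<OrderItem> Items { get; private set; }
    public void AddToOrder(OrderItem item) { Items.Add(item); }
}

public static class DesignSketch
{
    public static void Main()
    {
        var order = new Order { CustomerId = "customers/1" }; // made-up id
        order.AddToOrder(new OrderItem("Magic Unicorn"));
        Console.WriteLine("Order for {0} has {1} item(s)", order.CustomerId, order.Items.Count);
    }
}
```

With this shape, adding an item or showing the shopping list touches exactly one order document, and changing customer data touches exactly one customer document.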
Referencing Documents
We’ve decided to split up our entities into multiple documents. But how do documents reference each other? That’s done by storing the id of the referenced document. Each document has an id by which it can be referenced. When your entity has a property ‘Id’, RavenDB will by convention put the id there. That way we get the id of a document and can use it for references, like in this example, where the order references the customer by its id. (Here are the entity-classes for the example)
    var customer = new Customer("Gamlor");
    session.Store(customer);

    // After storing we have a valid id
    var firstOrder = new Order() { CustomerId = customer.Id };
    firstOrder.AddToOrder(new OrderItem("Magic Unicorn"));
    session.Store(firstOrder);

    session.SaveChanges();

Later on we can load the referenced documents by id. However, we should remind ourselves that, when possible, an operation should be able to work more or less on one document. If we load hundreds of documents by reference, we are doing something wrong or our problem is a bad fit for a document database.
    var order = session.Query<Order>().First();
    var customer = session.Load<Customer>(order.CustomerId);

Batch-Loading Referenced Documents
Of course, in reality there will be places where we need to load referenced documents. The code above creates an additional round trip to the database to load the referenced document. Network round trips are costly, which is why we might want to get referenced documents in one go. We can do this by explicitly telling RavenDB to include referenced documents, like this:
    var order = session.Query<Order>()
        .Customize(x => x.Include<Order>(o => o.CustomerId)) // Also load the customer
        .First();
    // The customer was included with the query, so this Load needs no extra round trip
    var customer = session.Load<Customer>(order.CustomerId);

Don’t fear the Denormalization
As you might have already noticed, documents are not normalized! We pack things together in a document, and some data are stored redundantly. For example, when we store blog posts we certainly embed the tags in the same document. Above we’ve taken a look at references. Often we need only a few things from a referenced document. For example, in our web shop we want to show the user name for the current order we are piling together. Instead of loading the referenced customer document every time, we could just store this information redundantly in our order document.
For example we can create a class which holds the id and a name. This class is used to represent a reference to another document, but also copies the name of that document. That way we don’t need to do any document lookup as long as we only need the name:
    public interface INamedObject
    {
        string Id { get; set; }
        string Name { get; set; }
    }

    internal class Customer : INamedObject
    {
        public Customer(string name)
        {
            Name = name;
        }

        public string Id { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }

    internal class Order
    {
        public Order()
        {
            Items = new List<OrderItem>();
        }

        public string Id { get; set; }
        public DenormalizedReference<Customer> CustomerReference { get; set; }
        public IList<OrderItem> Items { get; private set; }
    }

    // Denormalized reference, which stores the name of the named object.
    public class DenormalizedReference<T> where T : INamedObject
    {
        public string Id { get; set; }
        public string Name { get; set; }

        public static implicit operator DenormalizedReference<T>(T doc)
        {
            return new DenormalizedReference<T> { Id = doc.Id, Name = doc.Name };
        }
    }

    var customer = new Customer("Gamlor");
    session.Store(customer);

    var firstOrder = new Order()
    {
        // This is automatically converted to a named-reference
        // due to the magic of the implicit cast operator.
        // Now the order has the id and the name of the customer document
        CustomerReference = customer
    };
    firstOrder.AddToOrder(new OrderItem("Magic Unicorn"));
    session.Store(firstOrder);

    // Later on:
    var order = session.Query<Order>().First();
    // no need for loading the customer document as long as we only need the name
    var customerName = order.CustomerReference.Name;
    Console.Out.WriteLine(customerName);

Conclusion & Next Time
This time we’ve looked at documents and how we can split up data in different documents. Next time we will look at RavenDB’s queries and indexes because they behave quite differently than in most databases.

#1 by Jacqueline on July 1, 2011 - 8:58 am
Great series. I’m looking forward to the next article! I’m also curious what Ayende thinks
#2 by gamlerhart on July 1, 2011 - 11:14 am
Thanks. No idea what Ayende is thinking. Maybe he’s secretly planning the next awesome thing =D
#3 by Peter on July 1, 2011 - 6:05 pm
Great series so far. You did a really nice job explaining the denormalized references.
Ayende is at least retweeting your posts.