RavenDB: Queries and Indexes

So far we’ve covered the very basics and the document-database nature of RavenDB. This time we take a look at queries and indexes. I’ve already mentioned that RavenDB behaves differently from most other databases in this respect.

Queries Go Hand in Hand with Indexes

What do we define an index on in most databases? Usually we put the index on the stored data itself: on a column in a relational database, or on a field in document and object databases. A query may then use the index to speed up processing. In most real applications we need such indexes, otherwise queries are just too slow.

In RavenDB indexing works differently: RavenDB indexes the query instead of the fields in documents. Basically, RavenDB takes our query, analyses it and derives an index that can answer it. Such indexes are called ‘dynamic indexes’. Every query we run either builds a new dynamic index or reuses an existing one. RavenDB tries to be smart about the indexes it builds: when a query is executed over and over again, its index becomes a permanent index, while less frequently used queries only get a temporary index. Put together, in RavenDB a query is just a way to create (or reuse) a ‘dynamic’ index. Because of this behavior, RavenDB nearly always talks about indexes rather than queries, and from now on in this blog series I will also refer to ‘indexes’ most of the time.
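To make the idea of ‘indexing the query’ more concrete, here is a toy C# model. It is not how RavenDB is implemented: `DynamicIndexRegistry`, `IndexFor` and the promotion threshold are hypothetical, and the `Auto/...` names are only meant to resemble the style of RavenDB’s automatic index names. The model reduces a query to its collection and the fields it filters on, reuses one index per distinct field set, and marks an index as permanent once it has been used often enough.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, NOT RavenDB's implementation): a query is reduced to the
// collection and the set of fields it filters on, and that set names a "dynamic index".
public sealed class DynamicIndexRegistry
{
    private readonly int _promotionThreshold;                                  // hypothetical heuristic
    private readonly Dictionary<string, int> _usage = new Dictionary<string, int>();
    private readonly HashSet<string> _permanent = new HashSet<string>();

    public DynamicIndexRegistry(int promotionThreshold)
    {
        if (promotionThreshold < 1)
            throw new ArgumentOutOfRangeException(nameof(promotionThreshold));
        _promotionThreshold = promotionThreshold;
    }

    // Returns the index that answers a query on `collection` filtering on `fields`,
    // creating it on first use and promoting it after enough uses.
    public string IndexFor(string collection, params string[] fields)
    {
        // Sort the fields so that "Name, City" and "City, Name" share one index.
        string name = "Auto/" + collection + "/By" +
                      string.Join("And", fields.OrderBy(f => f, StringComparer.Ordinal));

        _usage.TryGetValue(name, out int count);   // count is 0 for a brand-new index
        _usage[name] = count + 1;

        if (_usage[name] >= _promotionThreshold)
            _permanent.Add(name);                  // frequently used: keep it around

        return name;
    }

    public bool IsPermanent(string indexName) => _permanent.Contains(indexName);

    public int IndexCount => _usage.Count;
}
```

For example, `new DynamicIndexRegistry(promotionThreshold: 3).IndexFor("Customers", "Name")` returns `"Auto/Customers/ByName"`, and queries filtering on `City` and `Name` in either order land on the same index. The real server uses its own, more involved rules for reuse and promotion.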

Queries Always Go Via an Index

When Is Stuff Indexed?

Another difference is when RavenDB indexes a document. Most databases update their indexes as part of every insert or update, which also means that indexes slow down inserts and updates.

RavenDB indexes documents in the background. When we store or update a document, RavenDB puts it in a queue. A background task then picks it up and updates all existing indexes. However, this also means that an index (and remember, a query always uses an index) might return stale results: when we store a document and then immediately query an index, the document may not be in it yet.
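The write path and the resulting stale read can be sketched with another toy model, assuming a single collection of customers indexed by name. `ToyStore`, `QueryByName`, `RunIndexingStep` and the batch size are hypothetical; the background task is replaced by an explicit method call so that the behavior is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, not RavenDB code): writes go into document storage and a
// queue; a separate indexing step drains the queue later, like a background task.
public sealed class ToyStore
{
    private readonly Dictionary<string, string> _documents = new Dictionary<string, string>(); // id -> name
    private readonly Queue<string> _pending = new Queue<string>();                             // ids not yet indexed
    private readonly Dictionary<string, string> _nameIndex = new Dictionary<string, string>(); // id -> indexed name

    // Storing is cheap: no index work happens on the write path.
    public void Store(string id, string name)
    {
        _documents[id] = name;
        _pending.Enqueue(id);
    }

    // Queries only see what the index has processed so far; isStale reports pending work.
    public IReadOnlyList<string> QueryByName(string name, out bool isStale)
    {
        isStale = _pending.Count > 0;
        return _nameIndex.Where(e => e.Value == name)
                         .Select(e => e.Key)
                         .OrderBy(id => id, StringComparer.Ordinal)
                         .ToList();
    }

    // Stand-in for the background indexer: processes up to batchSize queued documents.
    public void RunIndexingStep(int batchSize)
    {
        for (int i = 0; i < batchSize && _pending.Count > 0; i++)
        {
            string id = _pending.Dequeue();
            _nameIndex[id] = _documents[id];
        }
    }
}
```

Calling `Store("customers/1", "John")` and then `QueryByName("John", out var stale)` returns an empty list with `stale == true`; after `RunIndexingStep(10)` the same query returns `customers/1` with `stale == false`.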

RavenDB Indexes in the Background

Deal With the Stale Results

Now we have to deal with potentially stale results from an index. First of all, we can force RavenDB to return an accurate result: we wait until our last write has been indexed and only then get the results. There are two ways to achieve this. The first is to turn it on for a particular query with the ‘WaitForNonStaleResultsAsOfLastWrite’ option. There are additional options available, which you can explore yourself.

// Include all changes since the last 'SaveChanges'
var johns = (from c in session.Query<Customer>()
			.Customize(q => q.WaitForNonStaleResultsAsOfLastWrite())
		where c.Name == "John"
		select c).Take(5);

Alternatively, we can change the default consistency level for the session or even for the whole document store.

// Include all changes since the last 'SaveChanges' for this session
session.Advanced.Conventions.DefaultQueryingConsistency = ConsistencyOptions.QueryYourWrites;
var johns = (from c in session.Query<Customer>()
			 where c.Name == "John"
			 select c).Take(5);
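Conceptually, ‘as of last write’ means waiting only for our own writes, not for every write on the server. The sketch below models this with increasing write numbers (in the spirit of RavenDB’s etags), but `EtagIndexModel` and its methods are hypothetical, and the ‘waiting’ is simulated by running indexing steps instead of blocking on a server.

```csharp
using System;
using System.Collections.Generic;

// Toy model (hypothetical): every write gets an increasing number ("etag"), and the
// index remembers the highest etag it has processed so far.
public sealed class EtagIndexModel
{
    private long _lastEtag;                                   // highest etag handed out to a write
    private long _indexedUpTo;                                // highest etag the index has processed
    private readonly Queue<long> _pending = new Queue<long>();

    // Records a write and returns its etag; a session would remember this value.
    public long Write()
    {
        _lastEtag++;
        _pending.Enqueue(_lastEtag);
        return _lastEtag;
    }

    // One step of the simulated background indexer.
    public void IndexOne()
    {
        if (_pending.Count > 0)
            _indexedUpTo = _pending.Dequeue();
    }

    // Non-stale "as of" a cutoff: every write up to and including cutoffEtag is indexed.
    // Later writes (for example by other users) are deliberately not waited for.
    public bool IsNonStaleAsOf(long cutoffEtag) => _indexedUpTo >= cutoffEtag;

    // Simulated wait: runs indexing steps until the cutoff is reached; returns the step count.
    public int WaitForNonStaleAsOf(long cutoffEtag)
    {
        if (cutoffEtag > _lastEtag)
            throw new ArgumentOutOfRangeException(nameof(cutoffEtag), "No write with this etag exists yet.");

        int steps = 0;
        while (!IsNonStaleAsOf(cutoffEtag))
        {
            IndexOne();
            steps++;
        }
        return steps;
    }
}
```

If our session writes first and another user writes right after, waiting as of our write needs only one indexing step, and the other user’s write may still be unindexed when our query returns.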

Force The Most Current Index

Choose the Right Consistency

Now you probably think: ‘I’ll just use the highest consistency level possible for all my queries/indexes.’ But then we also pay the much higher cost of waiting for the index to be updated. It’s much better to think about which parts of our application can deal with stale results and which cannot.

Staleness in the Right Places

Let’s look at an example. Say we build a simple blog / news site and think about the consistency it needs. On the public website stale results shouldn’t be an issue. Why? Because you as a visitor can’t tell the difference between the ‘super-latest’ articles and the ones published a few seconds ago. When an article shows up a few seconds later, it makes no difference to you. Of course we also have an administration backend, and for the website administrator who is editing articles the story is different. He has just edited an article and wants to see his changes immediately. If he saw stale content, he would probably think that his changes were lost. The conclusion? We use the ‘QueryYourWrites’ consistency for the administration backend, while we allow stale results on the public website. This also makes a lot of sense performance-wise: most traffic comes from visitors, and there we don’t spend any time making sure results are not stale. Only for the few website administrators do we spend a little more time getting the most recent data.
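The per-area policy from this example can be sketched as a toy as well. `SiteArea`, `ArticleSite`, `Publish` and `ListTitles` are hypothetical names, not RavenDB API. For simplicity the admin backend waits for all pending articles rather than only its own writes, and `IndexingStepsWaited` counts the cost that only admin queries pay.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical names throughout; this is a model of the policy, not RavenDB code.
public enum SiteArea { PublicSite, AdminBackend }

public sealed class ArticleSite
{
    private readonly Queue<string> _pending = new Queue<string>(); // published, not yet indexed
    private readonly List<string> _indexed = new List<string>();   // what queries can see

    // Total indexing steps that queries had to wait for.
    public int IndexingStepsWaited { get; private set; }

    public void Publish(string title) => _pending.Enqueue(title);

    // Simulated background indexer: indexes one queued article.
    public void IndexOne()
    {
        if (_pending.Count > 0)
            _indexed.Add(_pending.Dequeue());
    }

    public IReadOnlyList<string> ListTitles(SiteArea area)
    {
        // Only the admin backend waits for the index (comparable to QueryYourWrites);
        // public visitors get whatever the index contains right now, possibly stale.
        if (area == SiteArea.AdminBackend)
        {
            while (_pending.Count > 0)
            {
                IndexOne();
                IndexingStepsWaited++;
            }
        }
        return _indexed.ToList();
    }
}
```

Right after `Publish("Hello RavenDB")`, a public listing is still empty and costs nothing, while an admin listing already contains the article after waiting one indexing step.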

Permanent Indexes

So far we have only used dynamic indexes, which are created when we run a regular query. RavenDB also allows creating permanent, named indexes. Such an index is stored and maintained on the server until it is explicitly removed. We can create these permanent indexes, give them a name and later use them directly. First we create an index definition in a class like this:

public class CustomersWithJohnAsName : AbstractIndexCreationTask<Customer>
{
	public CustomersWithJohnAsName()
	{
		Map = customers => from customer in customers
						where customer.Name.Contains("John")
						select new { customer.Name };
	}
}
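What the `Map` above computes can be approximated with LINQ to Objects: the map runs over every customer document and produces one index entry per match. This is a hypothetical in-memory approximation, not the server’s indexing engine; the `Customer` class with `Id` and `Name` properties and the `MapIndexModel` helper are assumptions made for the sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed document shape for the sketch.
public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical approximation of the map: one (Id, Name) entry per matching customer.
public static class MapIndexModel
{
    public static List<(string Id, string Name)> Build(IEnumerable<Customer> customers) =>
        (from customer in customers
         where customer.Name != null && customer.Name.Contains("John")  // ordinal, case-sensitive
         select (customer.Id, customer.Name)).ToList();
}
```

Note that `Contains("John")` also matches names such as ‘Johnny’, so an index like this holds every customer whose name contains ‘John’, not only customers named exactly ‘John’.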

After that we tell RavenDB which assembly contains our index definitions, and RavenDB creates all the indexes defined in that assembly.
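In the RavenDB .NET client this registration is typically a single call such as `IndexCreation.CreateIndexes(typeof(CustomersWithJohnAsName).Assembly, documentStore);` (from `Raven.Client.Indexes`), where `documentStore` is your initialized document store. The idea behind registering a whole assembly is discovery by reflection, which the toy below sketches. `ToyIndexTask`, `ToyIndexCreation` and the two sample definitions are hypothetical stand-ins; the real client does more, such as sending each definition to the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for an index-definition base class; each definition knows its name.
public abstract class ToyIndexTask
{
    public virtual string IndexName => GetType().Name;
}

// Two hypothetical index definitions living in the same assembly.
public sealed class CustomersWithJohnAsNameTask : ToyIndexTask { }
public sealed class OrdersByTotalTask : ToyIndexTask { }

public static class ToyIndexCreation
{
    // Finds every concrete ToyIndexTask subclass in the assembly, instantiates it and
    // returns the index names, sorted, that would be registered.
    public static IReadOnlyList<string> DiscoverIndexes(Assembly assembly) =>
        assembly.GetTypes()
                .Where(t => t.IsClass && !t.IsAbstract && typeof(ToyIndexTask).IsAssignableFrom(t))
                .Select(t => (ToyIndexTask)Activator.CreateInstance(t))
                .Select(task => task.IndexName)
                .OrderBy(n => n, StringComparer.Ordinal)
                .ToList();
}
```

Scanning by assembly means a new index definition is picked up as soon as its class exists, without having to list it anywhere else.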

The index definition can contain all kinds of details. It can be a simple query like the one above or a complex map-reduce operation. I’m not going into details here due to lack of expertise and to keep the blog post short =). After we’ve created such an index, we can use it:

var result = session.Query<Customer>("CustomersWithJohnAsName").Take(5);
foreach (var customer in result)
{
    Console.Out.WriteLine(customer);
}

Permanent indexes are certainly useful for very advanced index tuning, or when we want to define a ‘view’ that returns certain data from our database. By the way, when a dynamic index is used enough times, RavenDB promotes it to a permanent index, which we can treat like any other permanent index.

Permanent Indexes

Conclusion

Now we’ve covered how queries and indexes work in RavenDB. Most of the time we can just query RavenDB like any other database, and we never have to fear that a query runs slowly because of a missing index. On the other hand, we need to be aware of stale results. Next time we look at another important design detail of RavenDB, which protects the database from killing itself due to programming mistakes. Stay tuned =).