RavenDB: Preventing Database Slaughter

Now that we’ve covered the basics, documents and queries, let’s take a short look at another ‘feature’ of RavenDB. Have you ever had this issue? You’ve developed a nice application which runs fine during development. Then you go to a production system and your application becomes slow or even dies under higher load. The database administrator comes to your team, angry, and asks who’s killing the database. What happened? Often these issues are related to bad queries and unbounded result sets. Let’s see what RavenDB does to prevent these situations.

No Unbounded Result Sets

What do most databases do when you run a query like ‘give me all .NET related blog posts’? Well, they try to fulfill that request. What’s the issue with that? Often the result set is too big. When a result set that is too large hits your application, the application will choke on it. How do we prevent that? We add a limit to the result set size. However, how many times have you forgotten to add that limit in your application? I think it’s a very common mistake to forget it and get an unbounded result set. Well, RavenDB just doesn’t allow unbounded result sets. Let’s see it in action (btw: The domain classes are here, the insert code here):

var dotNetBlogPosts = from p in session.Query<BlogPost>()
			where p.Category == ".NET"
			select p;
Console.Out.WriteLine("Number of blog posts {0}",dotNetBlogPosts.Count());
var countPosts = 0;
foreach (var blogPost in dotNetBlogPosts)
{
	// do something
	countPosts++;
}
Console.Out.WriteLine("Iterated through {0} posts", countPosts);

When we run the code we get the following output:

Number of blog posts 2000
Iterated through 128 posts

So when you don’t specify a limit, a result will contain at most 128 entries. Therefore you should always specify how many entries you want. For that, just use the Take() and Skip() operators on your query:

foreach (var blogPost in dotNetBlogPosts.Take(50))
{
	// do something
	countPosts++;
}

Limits Are Your Friend


Server Side Result Set Limitation

Let’s run another experiment: We just specify an insanely high limit for our query. Like this:

foreach (var blogPost in dotNetBlogPosts.Take(5000))
{
	// do something
	countPosts++;
}

What we notice is that RavenDB only returns 1024 documents from the server. This prevents giant result sets from being transferred across the wire. If you really want more data, you need to page through it with Skip() and Take(). Alternatively, you could configure the server to allow a higher limit, but that’s not the recommended way to get larger result sets.
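Paging with Skip() and Take() can be wrapped in a small helper. The following is only a sketch: ReadInPages is a hypothetical name, and it works against any IQueryable<T>, so the in-memory stand-in in the usage note below takes the place of a real session.Query<BlogPost>(). It assumes the page size stays at or below the server-side limit. Otherwise a truncated page would be mistaken for the last one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Hypothetical helper: walks a query page by page so that no single
    // request asks for more than pageSize results.
    public static IEnumerable<T> ReadInPages<T>(IQueryable<T> query, int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize));

        var skipped = 0;
        while (true)
        {
            // Each iteration issues one bounded query: Skip(...).Take(pageSize).
            var page = query.Skip(skipped).Take(pageSize).ToList();

            foreach (var item in page)
                yield return item;

            // A short (or empty) page means there is nothing left to read.
            if (page.Count < pageSize)
                yield break;

            skipped += page.Count;
        }
    }
}
```

A call could look like Paging.ReadInPages(Enumerable.Range(0, 250).AsQueryable(), 100), which reads three pages of 100, 100 and 50 items. Keep in mind that against a real server every page is a separate request, so a long paging loop also eats into the number of requests a session is allowed to make.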

Also the server has its limits


Request Limitations

One of the common performance issues in applications is doing too many requests and round trips to the database (especially feared as the ‘N+1 Selects’ pitfall with ORMs). This often happens by accident, because we call different methods which all do some operations on the database. So what happens when we do that with RavenDB? Let’s find out. In this example we load a few blog posts and then go off and load all comments for each blog post.

var dotNetBlogPosts = from p in session.Query<BlogPost>()
					  where p.Category == ".NET"
					  select p;
foreach (var blogPost in dotNetBlogPosts.Take(50))
{
	var blogId = blogPost.Id;
	// This is most of the time not that obvious but hidden
	// in further methods
	var commentsForPost = from c in session.Query<Comment>()
						  where c.BlogPostId == blogId 
						  select c;
	DoSomethingWithComments(commentsForPost);
}

In this situation RavenDB will throw an exception telling us that we reached the request limit. By default each session can only make 30 requests to the server. When it makes more, RavenDB will complain. And most of the time we really shouldn’t need that many requests. For example, when we want to load related documents, we can tell RavenDB that directly and avoid additional requests, as we saw in this blog post. If we really need more requests, we can also increase the limit:

session.Advanced.Conventions.MaxNumberOfRequestsPerSession = 50;
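Raising the limit treats the symptom. Another way out of the loop above is to ask for the comments of all posts in a single query and group them afterwards. This is a different technique from loading related documents together with the parent, and the sketch below only shows its shape on plain LINQ: the Comment class is a stand-in with just the BlogPostId property from the example, CommentsByPost is a hypothetical name, and whether a given query provider can translate the ‘ID is one of these’ filter is an assumption you need to check for your client version.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the article's Comment class; only BlogPostId is taken
// from the example above, Text is made up for illustration.
public class Comment
{
    public string BlogPostId { get; set; }
    public string Text { get; set; }
}

public static class CommentBatching
{
    // Hypothetical helper: one filter over all post IDs instead of one
    // query per post, then an in-memory grouping by post ID.
    public static ILookup<string, Comment> CommentsByPost(
        IQueryable<Comment> comments, IEnumerable<string> postIds)
    {
        var ids = postIds.ToList();
        return comments
            .Where(c => ids.Contains(c.BlogPostId))
            .ToLookup(c => c.BlogPostId);
    }
}
```

Inside the loop, lookup[blogPost.Id] then yields the comments for that post without another round trip, and a post without comments simply yields an empty sequence. The result set limits still apply to this single query, so it needs a sensible Take() of its own.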

Request Limits


Timeouts

Another protection in RavenDB is that operations have timeouts. Let’s run another example. We first insert tons of blog posts and then use a dynamic index to get non-stale results:

for (int i = 0; i < 50000; i++)
{
	var blogPost = new BlogPost("Programming-Blah")
					   {
						   Content = "Tech stuff",
						   Category = ".NET"
					   };
	session.Store(blogPost);
}
session.SaveChanges();

var dotNetBlogPosts = from p in session.Query<BlogPost>()
					  .Customize(o=>o.WaitForNonStaleResultsAsOfLastWrite())
					  where p.Category == ".NET"
					  select p;
foreach (var blogPost in dotNetBlogPosts.Take(50))
{
	// do something with blogs
}

What will happen? We almost certainly get a timeout exception telling us that it took too long to get non-stale results for that dynamic index. That’s because the background indexing hasn’t finished yet for all the newly inserted documents. This also clearly demonstrates that RavenDB isn’t built for write-heavy scenarios, but for read-heavy applications.
This can also occur when the server has just been started, because then it first needs to build up all the indexes used by the application. However, in everyday use that should rarely happen.

Timeouts


Conclusion & Next Time

We’ve seen that RavenDB aggressively protects itself and our application from accidental overload scenarios. That also means that we need to ensure that we limit the result sets to reasonable sizes.

I’m not yet sure what the topic of the next blog post will be, but there will be one for sure.
