Database performance is a crucial factor in web application performance, and can mean the difference between a responsive web application and a slow one. Here, we summarise methods for identifying database performance issues, and how to approach fixing them.
Benchmarking Overall Performance
First, it is worth establishing whether database queries are a performance bottleneck, or whether you should be focusing your efforts on something else. There are a number of ways to do this, and two that I’ve found simple and effective are
and add the following middleware class to
Now, you can simply append
?prof to your application URLs to profile the code run to generate the page. This gives a quick way of telling whether any particular methods are consuming disproportionate resources. Sample output is shown below.
1 2 3 4 5 6 7
See the snippetscream documentation for more details.
The django-debug-toolbar is another easy-to-use profiler (and has many other functions). To install:
And add the following to your
1 2 3
Now, when you visit your site in a browser, the SQL query view will display queries that have been executed and the time taken.
So, you can now judge for yourself whether your application is slow executing queries or if there is some other performance bottleneck. To deal with database issues, read on.
A closer look at SQL query generation in Django
Now that you have a couple of tools by which to measure performance, let’s look at some examples of optimising database performance. We’ll use the following simple models for our discussion.
1 2 3 4 5 6 7 8
Fetching Multiple Models at Once
For our first example, assume we want to print a list of articles with the author’s name. We could do the following:
1 2 3
How many SQL queries would this generate? Let’s use
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
For each article, there will be a separate query to retrieve details about the user. We can avoid this by using
select_related, which tells
Django’s ORM to select related models in the same query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Here, we see that the article’s owner is retrieved using the
JOIN clause. So in this example, use of
select_related has halved the number of queries executed.
One thing that should be remembered about
select_related is that it does not work for many-to-many fields.
In our example, that means that article tags would not be fetched in a single query.
For these model attributes, there is a new method included in Django 1.4 called
prefetch_related. Django 1.4 is currently an alpha release, download it here.
Let’s look at how
prefetch_related reduces the number of queries with our example classes. First, observe what happens if we access article tags without prefetching:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Separate queries are generated to retrieve the tags. And now with tag prefetching:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
As we can see, all tags are fetched in a singlee query using an
IN statement. So,
prefetch_related are two effective ways of reducing the number of queries needed for accessing models.
Other techniques for reducing generated queries
If you find that your models are too complex to benefit from the above methods, you can always start writing your own SQL. Of course, this comes with the usual disclaimers: your code may be less portable between databases, and may be harder to maintain.
One method of doing this is to use the
extra() method on a queryset like so:
This can be useful if you need, for example, a nested
SELECT clause. The other option is to use raw queries, which are documented here.
One last point worth mentioning is to be careful with the use of
iterator(), which loads only a subset of a query set into memory. If you then iterate over this queryset, this can generate lots of queries. Here’s a longer discussion.
Hopefully this post has provided a useful discussion of query generation in Django and how you can optimise database access in your application.