Blueprint Forge.

Building things with computers.

Measuring & Optimising Django Database Performance

Database performance is a crucial factor in web application performance, and can mean the difference between a responsive web application and a slow one. Here, we summarise methods for identifying database performance issues, and how to approach fixing them.

Benchmarking Overall Performance

First, it is worth establishing whether database queries are a performance bottleneck, or whether you should be focusing your efforts on something else. There are a number of ways to do this, and two that I’ve found simple and effective are django-snippetscream and django-debug-toolbar.

Install django-snippetscream:

1
pip install django-snippetscream

and add the following middleware class to settings.py:

1
MIDDLEWARE_CLASSES = MIDDLEWARE_CLASSES + ('snippetscream.ProfileMiddleware',)

Now, you can simply append ?prof to your application URLs to profile the code run to generate the page. This gives a quick way of telling whether any particular methods are consuming disproportionate resources. Sample output is shown below.

1
2
3
4
5
6
7
72063 function calls (68833 primitive calls) in 0.267 seconds

Ordered by: internal time, call count

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
8084    0.034    0.000    0.034    0.000 env/lib/python2.7/posixpath.py:60(join)
8515    0.031    0.000    0.039    0.000 env/lib/python2.7/posixpath.py:130(islink)

See the snippetscream documentation for more details.

The django-debug-toolbar is another easy-to-use profiler (and has many other functions). To install:

1
pip install django-debug-toolbar

And add the following to your settings.py:

1
2
3
MIDDLEWARE_CLASSES = MIDDLEWARE_CLASSES + ('debug_toolbar.middleware.DebugToolbarMiddleware',)
INSTALLED_APPS = INSTALLED_APPS + ('debug_toolbar',)
INTERNAL_IPS = ('127.0.0.1',)

Now, when you visit your site in a browser, the SQL query view will display queries that have been executed and the time taken.

Django debug toolbar database query view

So, you can now judge for yourself whether your application is slow executing queries or if there is some other performance bottleneck. To deal with database issues, read on.

A closer look at SQL query generation in Django

Now that you have a couple of tools by which to measure performance, let’s look at some examples of optimising database performance. We’ll use the following simple models for our discussion.

1
2
3
4
5
6
7
8
class Article(models.Model):
    title = models.CharField(max_length=255)
    owner = models.ForeignKey(User)
    tags = models.ManyToManyField('Tag')
    content = models.TextField()

class Tag(models.Model):
    name = models.CharField(max_length=255)

Fetching Multiple Models at Once

For our first example, assume we want to print a list of articles with the author’s name. We could do the following:

1
2
3
for article in Article.object.all():
    print article.title
    print article.owner.name

How many SQL queries would this generate? Let’s use debugsqlshell:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`title`,
       `testapp_article`.`content`
FROM `testapp_article`  [0.13ms]

SELECT `auth_user`.`id`,
       `auth_user`.`username`,
       `auth_user`.`first_name`,
       `auth_user`.`last_name`,
       `auth_user`.`email`,
       `auth_user`.`password`,
       `auth_user`.`is_staff`,
       `auth_user`.`is_active`,
       `auth_user`.`is_superuser`,
       `auth_user`.`last_login`,
       `auth_user`.`date_joined`
FROM `auth_user`
WHERE `auth_user`.`id` = 1  [0.84ms]

etc...

For each article, there will be a separate query to retrieve details about the user. We can avoid this by using select_related, which tells Django’s ORM to select related models in the same query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
for article in Article.objects.all().select_related():
    print article.owner.id

SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`title`,
       `testapp_article`.`content`,
       `auth_user`.`id`,
       `auth_user`.`username`,
       `auth_user`.`first_name`,
       `auth_user`.`last_name`,
       `auth_user`.`email`,
       `auth_user`.`password`,
       `auth_user`.`is_staff`,
       `auth_user`.`is_active`,
       `auth_user`.`is_superuser`,
       `auth_user`.`last_login`,
       `auth_user`.`date_joined`
FROM `testapp_article`
INNER JOIN `auth_user` ON (`testapp_article`.`owner_id` = `auth_user`.`id`)  [0.39ms]

Here, we see that the article’s owner is retrieved using the JOIN clause. So in this example, use of select_related has halved the number of queries executed.

One thing that should be remembered about select_related is that it does not work for many-to-many fields. In our example, that means that article tags would not be fetched in a single query. For these model attributes, there is a new method included in Django 1.4 called prefetch_related. Django 1.4 is currently an alpha release, download it here.

Let’s look at how prefetch_related reduces the number of queries with our example classes. First, observe what happens if we access article tags without prefetching:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
for article in Article.objects.all().select_related():
    print article.tags.all()
   ...:
SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`content`
FROM `testapp_article`  [0.50ms]

SELECT `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` = 1 LIMIT 21  [1.36ms]

[<Tag: Tag object>, <Tag: Tag object>]
SELECT `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` = 2 LIMIT 21  [0.36ms]

[<Tag: Tag object>, <Tag: Tag object>]
SELECT `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` = 3 LIMIT 21  [0.36ms]

Separate queries are generated to retrieve the tags. And now with tag prefetching:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
for article in Article.objects.all().prefetch_related('tags'):
    print article.tags.all()
   ...:
SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`content`
FROM `testapp_article`  [0.21ms]

SELECT (`testapp_article_tags`.`article_id`) AS `_prefetch_related_val`,
       `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` IN (1,
                                              2,
                                              3)  [121.34ms]

As we can see, all tags are fetched in a singlee query using an IN statement. So, select_related and prefetch_related are two effective ways of reducing the number of queries needed for accessing models.

Other techniques for reducing generated queries

If you find that your models are too complex to benefit from the above methods, you can always start writing your own SQL. Of course, this comes with the usual disclaimers: your code may be less portable between databases, and may be harder to maintain.

One method of doing this is to use the extra() method on a queryset like so:

1
for article in Article.objects.all().extra('raw SQL here')

This can be useful if you need, for example, a nested SELECT clause. The other option is to use raw queries, which are documented here.

One last point worth mentioning is to be careful with the use of iterator(), which loads only a subset of a query set into memory. If you then iterate over this queryset, this can generate lots of queries. Here’s a longer discussion.

Conclusion

Hopefully this post has provided a useful discussion of query generation in Django and how you can optimise database access in your application.

I’ll close with a couple of useful resources: some useful tips on writing performant models, and the official django optimization page.