Blueprint Forge.

Building things with computers.

A Django Developer’s First Experience of Play

Having used Django (and other Python web frameworks) on and off for three years, I came across the Play framework some time ago. However, it’s only recently that I’ve been able to use it for a project. Note that this post is written from a Django developer’s perspective, but I’m not attempting a “Django vs Play” comparison. Instead, the following notes describe how a Django developer might approach the framework and map their existing experience onto it.

Play looks great for a number of reasons: it’s simple to set up, has a wide range of libraries and plugins that come out of the box, and has an active community surrounding it.

Initial Project Setup

Project setup is simple, and generates the necessary files and directory structure.

play new [app]

Replaces the familiar startproject command.

A range of libraries are already packaged with Play, including Joda Time, google-gson, Log4j, test runners, and database connectors. “Convention over configuration” is often cited as one of Play’s aims.

IDE Support

Creating an Eclipse project configuration for your project happens through the eclipsify command. Pydev, on the other hand, requires some manual setup, particularly if you are using different interpreters with virtualenv. Though simple, this can become tedious. And of course, static analysis capability is likely to differ significantly because of the different type systems, but that’s a discussion outside the scope of this article.

Although this is not directly related to IDE support, it’s also worth mentioning that both frameworks support automatic application reloading based on file changes.

Model Declaration

Play exposes its own JPA (Java Persistence API) interface. The entity manager is already configured and can be easily accessed (e.g. for transactions).

import javax.persistence.Entity;
import play.db.jpa.Model;

@Entity
public class TestEntity extends Model {

    public String title;
    public Integer count;
}

As you can see, model declaration is concise, though the OO architect in you might be shouting about encapsulation. Model fields are accessed directly as model.attribute, but getters and setters are automatically generated and used behind the scenes, and can easily be overridden. As a Python developer, you might already be missing descriptors and the @property decorator!
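For comparison, here is the kind of thing a Django developer might write in Python to get the same “plain attribute access, custom logic underneath” behaviour, using @property (a hypothetical sketch, not part of either framework):

```python
class TestEntity(object):
    """Hypothetical Python counterpart to the Play model above."""

    def __init__(self, title, count=0):
        self._title = title
        self.count = count

    @property
    def title(self):
        return self._title

    @title.setter
    def title(self, value):
        # custom logic runs on every assignment, with no explicit setX() call
        self._title = value.strip()


entity = TestEntity('My Title')
entity.title = '  Updated  '
print(entity.title)  # attribute syntax, but the setter ran: 'Updated'
```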

A pre-configured entity manager is provided, but can easily be adjusted. In fact, my first application used an existing database, so I needed to declare models but disable DDL generation. This was easily done in application.conf.
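If I remember the setting correctly, this is a one-line change in application.conf (check the Play JPA documentation for the exact key and values in your version):

```
# stop Hibernate creating or updating tables for an existing database
jpa.ddl=none
```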

Templating

Generally, Play’s templating engine can be described as similar to Django’s.

One nice feature is the %{ }% (script) tag that allows you to write scripts (with variable assignment, etc.) Of course, you don’t want to be doing any heavy lifting in the templates, but it can be very useful, e.g.:

%{
    nameCaps = name.toUpperCase();
}%

<h1>${nameCaps}</h1>

Java object extensions (methods added to an object when used in a template) are another useful feature. To take the example from the documentation, the format method applied to java.lang.Number gives the formatted result:

 <p>Total: ${order.total.format('## ###,00')}</p>

See the Play template documentation for more details on these and other features.

Test Framework Integration

Both Django and Play feature excellent test framework integration. Play has integrated the selenium test framework, and also allows tests to be run from the browser, a very handy feature.

Screenshot of the selenium test runner in Play Framework

Selenium support will ship in Django 1.4, but is not currently available.

Admin Interface

A great help in Django development is the admin interface. While it requires a little more setup, the CRUD module in Play gives a simple browser-based way of managing entities.

Screenshot of the CRUD interface in Play Framework

See more in the CRUD documentation.

Conclusion

Both frameworks have many strengths, and it is encouraging to see highly active development in both. There are a few features in Play which might make a Django developer envious, such as built-in modules for OAuth/OAuth2 authentication and WebSocket support. However, the frameworks are certainly very evenly matched.

Measuring & Optimising Django Database Performance

Database performance is a crucial factor in web application performance, and can mean the difference between a responsive web application and a slow one. Here, we summarise methods for identifying database performance issues, and how to approach fixing them.

Benchmarking Overall Performance

First, it is worth establishing whether database queries are a performance bottleneck, or whether you should be focusing your efforts on something else. There are a number of ways to do this, and two that I’ve found simple and effective are django-snippetscream and django-debug-toolbar.

Install django-snippetscream:

pip install django-snippetscream

and add the following middleware class to settings.py:

MIDDLEWARE_CLASSES = MIDDLEWARE_CLASSES + ('snippetscream.ProfileMiddleware',)

Now, you can simply append ?prof to your application URLs to profile the code run to generate the page. This gives a quick way of telling whether any particular methods are consuming disproportionate resources. Sample output is shown below.

72063 function calls (68833 primitive calls) in 0.267 seconds

Ordered by: internal time, call count

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
8084    0.034    0.000    0.034    0.000 env/lib/python2.7/posixpath.py:60(join)
8515    0.031    0.000    0.039    0.000 env/lib/python2.7/posixpath.py:130(islink)

See the snippetscream documentation for more details.
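Incidentally, this style of table comes from Python’s built-in profiler; a standalone cProfile sketch (independent of Django, with a deliberately slow toy function) produces similar output:

```python
import cProfile
import io
import pstats


def slow_join(parts):
    # deliberately naive: repeated string concatenation
    result = ''
    for p in parts:
        result = result + p
    return result


profiler = cProfile.Profile()
profiler.enable()
slow_join(['x'] * 10000)
profiler.disable()

# render the same "ncalls / tottime / cumtime" table shown above
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('tottime').print_stats(5)  # top 5 entries by internal time
print(stream.getvalue())
```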

The django-debug-toolbar is another easy-to-use profiler (and has many other functions). To install:

pip install django-debug-toolbar

And add the following to your settings.py:

MIDDLEWARE_CLASSES = MIDDLEWARE_CLASSES + ('debug_toolbar.middleware.DebugToolbarMiddleware',)
INSTALLED_APPS = INSTALLED_APPS + ('debug_toolbar',)
INTERNAL_IPS = ('127.0.0.1',)

Now, when you visit your site in a browser, the SQL query view will display queries that have been executed and the time taken.

Django debug toolbar database query view

So, you can now judge for yourself whether your application is slow executing queries or if there is some other performance bottleneck. To deal with database issues, read on.

A closer look at SQL query generation in Django

Now that you have a couple of tools by which to measure performance, let’s look at some examples of optimising database performance. We’ll use the following simple models for our discussion.

class Article(models.Model):
    title = models.CharField(max_length=255)
    owner = models.ForeignKey(User)
    tags = models.ManyToManyField('Tag')
    content = models.TextField()

class Tag(models.Model):
    name = models.CharField(max_length=255)

Fetching Multiple Models at Once

For our first example, assume we want to print a list of articles with the author’s name. We could do the following:

for article in Article.objects.all():
    print article.title
    print article.owner.name

How many SQL queries would this generate? Let’s use debugsqlshell:

SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`title`,
       `testapp_article`.`content`
FROM `testapp_article`  [0.13ms]

SELECT `auth_user`.`id`,
       `auth_user`.`username`,
       `auth_user`.`first_name`,
       `auth_user`.`last_name`,
       `auth_user`.`email`,
       `auth_user`.`password`,
       `auth_user`.`is_staff`,
       `auth_user`.`is_active`,
       `auth_user`.`is_superuser`,
       `auth_user`.`last_login`,
       `auth_user`.`date_joined`
FROM `auth_user`
WHERE `auth_user`.`id` = 1  [0.84ms]

etc...

For each article, there will be a separate query to retrieve details about the user. We can avoid this by using select_related, which tells Django’s ORM to select related models in the same query:

for article in Article.objects.all().select_related():
    print article.owner.id

SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`title`,
       `testapp_article`.`content`,
       `auth_user`.`id`,
       `auth_user`.`username`,
       `auth_user`.`first_name`,
       `auth_user`.`last_name`,
       `auth_user`.`email`,
       `auth_user`.`password`,
       `auth_user`.`is_staff`,
       `auth_user`.`is_active`,
       `auth_user`.`is_superuser`,
       `auth_user`.`last_login`,
       `auth_user`.`date_joined`
FROM `testapp_article`
INNER JOIN `auth_user` ON (`testapp_article`.`owner_id` = `auth_user`.`id`)  [0.39ms]

Here, we see that the article’s owner is retrieved using the JOIN clause. So in this example, use of select_related has halved the number of queries executed.

One thing that should be remembered about select_related is that it does not work for many-to-many fields. In our example, that means that article tags would not be fetched in a single query. For these model attributes, there is a new method included in Django 1.4 called prefetch_related. Django 1.4 is currently an alpha release, download it here.

Let’s look at how prefetch_related reduces the number of queries with our example classes. First, observe what happens if we access article tags without prefetching:

for article in Article.objects.all().select_related():
    print article.tags.all()
   ...:
SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`content`
FROM `testapp_article`  [0.50ms]

SELECT `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` = 1 LIMIT 21  [1.36ms]

[<Tag: Tag object>, <Tag: Tag object>]
SELECT `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` = 2 LIMIT 21  [0.36ms]

[<Tag: Tag object>, <Tag: Tag object>]
SELECT `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` = 3 LIMIT 21  [0.36ms]

Separate queries are generated to retrieve the tags. And now with tag prefetching:

for article in Article.objects.all().prefetch_related('tags'):
    print article.tags.all()
   ...:
SELECT `testapp_article`.`id`,
       `testapp_article`.`owner_id`,
       `testapp_article`.`content`
FROM `testapp_article`  [0.21ms]

SELECT (`testapp_article_tags`.`article_id`) AS `_prefetch_related_val`,
       `testapp_tag`.`id`,
       `testapp_tag`.`name`
FROM `testapp_tag`
INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`)
WHERE `testapp_article_tags`.`article_id` IN (1,
                                              2,
                                              3)  [121.34ms]

As we can see, all tags are fetched in a single query using an IN clause. So, select_related and prefetch_related are two effective ways of reducing the number of queries needed when accessing models.

Other techniques for reducing generated queries

If you find that your models are too complex to benefit from the above methods, you can always start writing your own SQL. Of course, this comes with the usual disclaimers: your code may be less portable between databases, and may be harder to maintain.

One method of doing this is to use the extra() method on a queryset like so:

for article in Article.objects.extra(where=['raw SQL here']):

This can be useful if you need, for example, a nested SELECT clause. The other option is to use raw queries, which are documented here.

One last point worth mentioning is to be careful with the use of iterator(). It streams results without populating the queryset’s result cache, so if you iterate over the same queryset again, the query will be executed again against the database. Here’s a longer discussion.

Conclusion

Hopefully this post has provided a useful discussion of query generation in Django and how you can optimise database access in your application.

I’ll close with a couple of useful resources: some useful tips on writing performant models, and the official django optimization page.

Remote Debugging Django Applications With Pydevd

Important disclaimer: enabling remote debugging of applications is a potential security risk. The following method should only be used in a development environment, and on a network you control. Use at your own risk.

Most django application issues can be successfully debugged using a local runserver instance on your development machine. But there are times when it is advantageous to be able to debug your application in an environment that more closely resembles production.

For instance, I recently encountered a caching issue that was caused by an incorrect Cache-Control header. The fault was at the application level, and it was useful to be able to trace the request/response behaviour in the full context of the web stack.

This post discusses debugging django, but the same technique can be used to debug any mod_wsgi application.

Environments and Remote Debugging

On my development machine, a set of VMs replicates the web stack. This includes load balancer, web cache, application server, and database machine. runserver is used to test the application locally, then I use fabric to push code to the local VMs.

To debug code when it is running on the server, you can use Aptana’s Pydev plugin for Eclipse. This allows you to run a debug session on your local machine which listens for connections. With the correct setup on your application server, you can step through your application as if it were running locally.

Screenshot of pydev remote debugger running in eclipse

Setup

First, the pydev debugger needs to be accessible to your application. You could use a shared filesystem for this, or you could clone the github repository (though in theory this might introduce compatibility issues). Whichever method you choose, the pydevd module needs to be added to $PYTHONPATH, which is done in your wsgi file. See the django documentation on deployment with wsgi if you haven’t yet created this.

import sys
# append location of pydev module to $PYTHONPATH
sys.path.append('/srv/site/plugins/org.python.pydev.debug_2.2.4.2011110216/pysrc/')

After that, invoking the pydevd debugger is simple:

import pydevd
pydevd.settrace(IP_ADDRESS)

Where IP_ADDRESS is the IP address of machine where you are running Eclipse. You can omit this if you are running both Eclipse and Apache on the same host.

Ensuring Eclipse has Access to Application Code

The last thing to do is to ensure that the debugger can display the source code it is stepping through. This is done by modifying pydevd_file_utils.py on your server like so:

PATHS_FROM_ECLIPSE_TO_PYTHON = [(r'/home/matt/site', r'/srv/site')]

The first element of the tuple is the path to the source code on the host where Eclipse is running. The second element gives the location on the application server. If you don’t do this, Eclipse will prompt you for the location of the source files when the debug session starts.
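Conceptually, the mapping is just a prefix substitution. Here is an illustrative sketch of the idea (not pydevd’s actual implementation):

```python
# mirrors the PATHS_FROM_ECLIPSE_TO_PYTHON tuple shown above
ECLIPSE_TO_SERVER = [('/home/matt/site', '/srv/site')]


def to_server_path(eclipse_path, mappings=ECLIPSE_TO_SERVER):
    """Translate a path on the Eclipse host to its server-side equivalent."""
    for local_prefix, remote_prefix in mappings:
        if eclipse_path.startswith(local_prefix):
            return remote_prefix + eclipse_path[len(local_prefix):]
    return eclipse_path  # unmapped paths pass through unchanged


print(to_server_path('/home/matt/site/app/views.py'))  # /srv/site/app/views.py
```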

Once your Eclipse debug session is listening for connections, make sure you have reloaded your application:

service apache2 reload

Now, just visit the site in your browser. You should see the pydev debugger receive the connection and spring into action.

Conclusion

Remote debugging is relatively simple to set up, and can be extremely useful for debugging certain issues.

To organise the code changes shown above, it might be helpful to create a remote-debug branch containing the necessary changes to your wsgi file. Since you won’t be developing on this branch, merging master periodically should incur minimal overhead.

There are a host of other debugging methods for mod_wsgi, including interactive command-line debugger and a browser-based shell. These are described in the mod_wsgi documentation. There’s also a good discussion of how you can set up single and multi-threaded debugging on the mod_wsgi Google Group.

You may also want to view the pydev documentation on remote debugging, which was used in writing this article.

Python: Injecting Mock Objects for Powerful Testing

The code discussed in this post can be found in the github repository.

While writing a module to handle Google ClientLogin recently, I wanted to test error handling by simulating error responses from the server. A simple but powerful way of doing this is to use the patching ability of the mock module.

The patching ability allows you to replace objects in scope with mocks so that different side effects or return values can be defined. Note that ‘object’ is used here in the pythonic sense – referring to entities such as modules and functions as well as class instances.
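As a minimal standalone illustration of the mechanics (this sketch patches a stdlib function rather than requests, and uses unittest.mock, which the mock package later became):

```python
import json

try:
    from unittest import mock  # Python 3.3+
except ImportError:
    import mock  # the original third-party package

# within the with-block, json.dumps is replaced by a mock with a canned value
with mock.patch.object(json, 'dumps') as mock_dumps:
    mock_dumps.return_value = '"stubbed"'
    result = json.dumps({'real': 'data'})

print(result)          # '"stubbed"' - the mock's canned value
print(json.dumps([]))  # '[]' - original behaviour restored outside the block
```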

This is best illustrated by a real example, so in this post we’re going to mock the requests module and generate the exceptions described in the documentation when a request is made.

Our example module sends credentials to Google’s ClientLogin service in order to receive an authentication token, required for accessing certain Google services (such as C2DM). If you are interested, you can read more about ClientLogin on the Google Code page.

So, to request an authentication token from the ClientLogin service, you POST a set of parameters including email and password to the service endpoint. Here is a cut-down version of the code that initiates the authentication request:

import requests
from requests.exceptions import RequestException

class Authenticator(object):

    @staticmethod
    def generate_token(email, password, source,
                       account_type='GOOGLE', service='ac2dm'):

        """Returns an auth token for the supplied service."""

        base_url = 'https://www.google.com/accounts/ClientLogin'

        headers = {
            'content-type': 'application/x-www-form-urlencoded',
        }

        payload = {
            'Email': email,
            'Passwd': password,
            'source': source,
            'accountType': account_type,
            'service': service,
        }

        try:
            response = requests.post(base_url, data=payload, headers=headers)
        except RequestException:
            raise

        return response.content  # token extraction elided in this cut-down version

If the requests.post method throws an exception, we simply raise it to the caller rather than handling it immediately. In the requests module, RequestException is the base class from which others (like ConnectionError) inherit. We could improve our approach to exception handling but it is sufficient for this example.

These exceptions will be thrown from our mocked class, which is patched into the above code using a context manager:

import unittest2 as unittest

import requests
from requests.exceptions import RequestException, ConnectionError
from mock import patch
from clientlogin import Authenticator

class AuthenticationTests(unittest.TestCase):

    def test_connection_error(self):
        with patch.object(requests, 'post') as mock_method:
            mock_method.side_effect = ConnectionError
            Authenticator.generate_token('[email protected]','password','tag')

Here, we have patched the post method in the requests module to throw a ConnectionError. You can think of this as code injection, with the with statement defining its scope: the real method is restored when the block exits.

In the real test method, we assert the exception was raised with another context manager:

def test_connection_error(self):
    with patch.object(requests, 'post') as mock_method:
        with self.assertRaises(RequestException):
            mock_method.side_effect = ConnectionError
            Authenticator.generate_token('[email protected]','password','tag')

Here, we assert that the ConnectionError exception is raised to the caller, but we could easily have asserted a different condition. We could, for instance, have verified some exception handling logic.

As we’ve seen, mocking objects and methods in this manner is a simple but powerful way of running your code under different simulated conditions, allowing thorough testing of error-handling logic.

You can see the full module including tests and usage instructions at the github repository. For more information on the mock module, the full documentation is available.

A Short Explanation of Hypermedia Controls in RESTful Services

This article roughly follows on from my previous post on REST, which discusses some points relating to the first two levels of the Richardson Maturity Model. Since we’re only going to discuss the top level of the model in this post (shown below), the other levels are greyed out.

Diagram of the Richardson Maturity Model showing three elements of RESTful services: resources, http verbs and hypermedia controls

Introducing Hypermedia Controls

Hypermedia as the Engine of Application State (HATEOAS) is an element of the REST architectural style, which describes how a consumer can control a given web application using hypermedia controls.

Let’s cut straight to the chase: what do we mean by hypermedia controls?

When I say hypertext, I mean the simultaneous presentation of information and controls such that the information becomes the affordance through which the user (or automaton) obtains choices and selects actions… Hypertext does not need to be HTML on a browser. Machines can follow links when they understand the data format and relationship types. –Roy Fielding

This explanation, from Roy Fielding’s widely cited blog post REST APIs must be Hypertext driven mentions the “simultaneous presentation of information and controls”. To simplify, we can imagine a hyperlink that our consumer is able to understand and manipulate.

So that’s the first word of HATEOAS. What does “Engine of Application State” mean? Quite literally, it means that the hyperlinks mentioned above can be used to control (drive) the web application. If you haven’t already drawn a state machine for your application, now’s the time!

Take, for instance, a RESTful webservice that allows virtual machines to be controlled. The Sun Cloud API is one example of such a service – the examples below are based loosely on the format it uses. If a given virtual machine is stopped, it may be started. Or, if a machine is already running, it may be stopped.

Example request for a machine representation:

GET /machines/1/
Host: example.com
Accept: application/xml

Example response:

HTTP/1.1 200 OK
Content-Type: application/xml
<status>stopped</status>
<link rel="start" method="post" href="machines/1?op=start" />

Note the link tag. This is a hypermedia control which tells the consumer how it could start the VM by POSTing to machines/{id}?op=start. (For simplicity I’ve omitted details such as the Content-Length header and the XML declaration.)
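A consumer can discover the available action by parsing the link element rather than hard-coding the URL. A sketch using Python’s standard library (the body is the example response above, wrapped in a root element so it parses as one document):

```python
import xml.etree.ElementTree as ET

# simplified response body from the example above
body = """
<machine>
  <status>stopped</status>
  <link rel="start" method="post" href="machines/1?op=start" />
</machine>
"""

doc = ET.fromstring(body)
status = doc.findtext('status')
# select the control by its rel attribute, not by a hard-coded URL
link = doc.find("link[@rel='start']")
if status == 'stopped' and link is not None:
    print(link.get('method'), link.get('href'))  # post machines/1?op=start
```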

If the requested VM is running, the response would be:

HTTP/1.1 200 OK
Content-Type: application/xml
<status>running</status>
<link rel="stop" method="post" href="machines/1?op=stop" />

When our VM is running, the resource at machines/1/ contains a rel link to stop it. Both operations would probably return 202 Accepted in the case of success, as it is unlikely they would run synchronously.

In a nutshell, this is the concept of HATEOAS – these hypermedia resources allow us to control application state.

Note that the operation type in the query parameter could also have been a separate resource (such as machines/1/stop). There are arguments for either approach but it isn’t necessary to delve into this to understand the concept of HATEOAS.
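The transitions in this example amount to a small state machine, which could be sketched in Python like so (a purely illustrative toy, not part of any real service):

```python
# valid transitions: current state -> {action: next state}
TRANSITIONS = {
    'stopped': {'start': 'running'},
    'running': {'stop': 'stopped'},
}


def apply_action(state, action):
    """Follow a transition, mirroring how a client follows a rel link."""
    try:
        return TRANSITIONS[state][action]
    except KeyError:
        raise ValueError('cannot %s a %s machine' % (action, state))


print(apply_action('stopped', 'start'))  # running
print(apply_action('running', 'stop'))   # stopped
```

The links a representation exposes are exactly the keys available from its current state: a stopped machine advertises only “start”, a running one only “stop”.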

Why Hypermedia Controls are Important

In the example above, the representation of a resource also gives us a way to manipulate the resource. In this way, HATEOAS allows certain aspects of the application to be self-describing, fostering loose coupling between service and consumer. If we consider that a service may evolve over time, then this loose coupling could allow the client to evolve with fewer issues than if all controls were hard-coded.

A useful resource that covers this in more detail is Wayne Lee’s presentation Why HATEOAS. Craig McClanahan also discusses the benefits of HATEOAS for service discovery and integration.

This post has necessarily glossed over many details of how HATEOAS is implemented, but has hopefully served as a useful introduction to the notion. I recommend REST in Practice as a resource for going into the subject in more detail. Additionally, the RESTful Services Cookbook contains details on implementation.

Further sources:

  1. Roy Fielding: REST APIs must be hypertext-driven
  2. Ian Bicking on Hypertext Driven URLs
  3. Interesting Stack Overflow discussion