Category Archives: Django

Links

There is a new version of gunicorn, 19.0, which has a couple of significant changes, including some interesting workers (gthread and gaiohttp) and actually responding to signals properly, which will make it work with Heroku.

The HTTP RFC, 2616, is now officially obsolete. It has been replaced by a bunch of RFCs from 7230 to 7235, covering different parts of the specification. The new RFCs look loads better, and it’s worth having a look through them to get familiar with them.

Some kind person has produced a recommended set of SSL directives for common webservers, which provide an A+ on the SSL Labs test, while still supporting older IEs. We’ve struggled to find a decent config for SSL that provides broad browser support, whilst also having the best levels of encryption, so this is very useful.

A few people are still struggling with Git.  There are lots of git tutorials around the Internet, but this one from Git Tower looks like it might be the best for the complete beginner. You know it’s for noobs, of course, because they make a client for the Mac :)

I haven’t seen a lot of noise about this, but the EU has outlawed pre-ticked checkboxes.  We have always recommended that these are not used, since they are evil UX, but now there’s an argument that might persuade everyone.

Here is a really nice post about splitting user stories. I think we are pretty good at this anyhow, but this is a nice way of describing the approach.

@monkchips gave a talk at IBM Impact about the effect of Mobile First. I think we’re on the right page with most of these things, but it’s interesting to see mobile called-out as one of the key drivers for these changes.

I’d not come across the REST Cookbook before, but here is a decent summary of how to treat PUT vs POST when designing RESTful APIs.

Fastly have produced a spectacularly detailed article about how to get tracking cookies working with Varnish.  This is very relevant to consumer facing projects.

This post from ThoughtWorks, The Software Testing Cupcake, is absolutely spot on, and I think accurately describes an important anti-pattern in testing.

As an example for how to make unit tests less fragile, this is a decent description of how to isolate tests, which is a key technique.

The examples are Ruby, but the principle is valid everywhere. Still on unit testing, Facebook have open sourced a Javascript unit testing framework called Jest. It looks really very good.

A nice implementation of “sudo mode” for Django. This ensures the user has recently entered their password, and is suitable for protecting particularly valuable assets in a web application like profile views or stored card payments.

If you are using Redis directly from Python, rather than through Django’s cache wrappers, then HOT Redis looks useful. This provides atomic operations for compound Python types stored within Redis.

A Different View (part 2)

In the previous post, we saw how we could use a single, raw query in Django to combat ORMs’ tendency to generate query explosions within loops. We used raw() with joins and some column renaming to ensure all the data we needed came back in one go. We had to modify the property names in the template slightly (e.g. book.jacket.image became book.jacket_image) and the result was a RawQuerySet which had a fair number of limitations, but we avoided the typical order-of-magnitude increase in query count.

Super models

supermodel - by Jamie Beck and Kevin Burg

There is a way we can use a raw query to get all the benefits described in the previous post and return a real QuerySet. We need a Django model and so need an underlying table. A much underused feature of SQL databases is the view. Views are virtual tables derived from queries. Most are currently implemented as read-only, but they provide many benefits – again another blog post would be needed to explain them all – but they include:

  • adding a level of abstraction, effectively an API (which is also available to other applications)
  • protecting the application from underlying table and column changes
  • giving fine-grained control of permissions, especially row-level
  • providing a way to efficiently add calculated columns such as line_total = quantity * price
  • avoiding de-normalising things which should remain derived from other data
  • avoiding the not-insubstantial cost of repeatedly sending the same complicated queries over the network
  • enabling the database to pre-compile and cache the privilege-checks and query plans

So we could define a database view (with a few extra foreign-key columns for use later):

CREATE VIEW vbook AS
SELECT
  book.id,
  book.id AS book_id,
  book.title,
  jacket.id AS jacket_id,
  jacket.image AS jacket_image,
  author.id AS author_id,
  author.name AS author_name,
  shelf."position"
FROM
  book
  NATURAL JOIN shelf
  NATURAL JOIN author
  NATURAL JOIN jacket

and then map a Django model to it, using Meta.db_table = 'vbook', and setting Meta.managed = False to tell Django not to maintain the table definition.

(note again the use of natural join for simplicity (and short table names) and that they don’t actually work as-is with Django’s table/column naming)

We’ve done this in the past for some of our projects, and shoe-horning the view-creation script into Django/buildout was always the fiddly bit.

Recently, we used django-postgres which makes the model-view mapping a bit simpler. You subclass pg.View instead of models.Model and set the sql property to your query, e.g.

class vBook(pg.View):
    title = models.CharField(max_length=100)
    jacket_image = models.ImageField(null=True)
    author_name = models.CharField(max_length=100)
    shelf_position = models.IntegerField()
 
    sql = BOOK_DETAILS

manage.py sync_pgviews then creates the views for the models in the specified application. It’s quite a lightweight tool, and simply builds and issues the SQL CREATE VIEW statements needed to wrap each view-based model’s sql query.

(There is a Kickstarter project under way which should also add view handling to Django)

We now have an almost fully-fledged Django model. We can query the objects against an effectively de-normalised table, while still retaining an essentially normalised schema. We can chain refinement methods to the resulting QuerySet to further filter and order the objects, including select_related() if necessary (although it would always be preferable, given time, to put the extra join into the view). We can have custom object managers, Meta options, introspection and all the other features of a Django model, even some read-only admin which would be good for reporting.

Note that we removed the

WHERE shelf.position <= 10

from the view definition. We can define the view without such filters and leave it to the Django application to apply them, e.g.

vbooks = vBook.objects.filter(shelf__position__lte = 10)

This gives a more flexible model, more akin to the original Book model, although you could keep the WHERE clauses in if you wanted to logically partition a table into multiple views/models.

Agile

Another great reason for using such view-based models is to postpone decisions about optimising bits of the system until we know more about it. As an example, let’s say that a book needs to have an average rating that should be calculated from the entries in a ratings table. This could accurately be calculated using SQL’s AVG function, but that might be expensive and so we might need to pre-calculate it and possibly store it outside of the database. That would require us to design, build, test, document, and maintain:

  • an external data source capable of handling the unknown load and volume
  • a way to link a book to the external data source, possibly using new drivers
  • a way of updating the external data source for a book, duplicating the rating table entries
  • a way to synchronise the two data storage systems, possibly involving cron jobs and queues
  • a way to build and sustain the data source and links in whatever production environment we use, e.g. AWS

All of this would add complexity, up-front R&D and ongoing maintenance. The results would not be as fresh or reliable as the database results, and not necessarily any faster. Using AVG in a view leaves us scope to replace how it is calculated at a later date, once we know more about the performance.
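Following the sql-module convention from the previous post, the view’s query could simply gain the average as a sub-select. This is only a sketch: the rating table and its score column are assumed names, not part of the schema shown earlier.

```python
# sql/books.py -- a sketch; the rating table and its score column are
# assumed names, not part of the schema shown above.
VBOOK_WITH_RATING = """
SELECT
  book.id,
  book.id AS book_id,
  book.title,
  (SELECT AVG(rating.score)
     FROM rating
    WHERE rating.book_id = book.id) AS average_rating
FROM
  book
"""
```

Because average_rating is then just another column on the view, the Django model only needs an ordinary read-only field for it.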

Often, caching the book results using Django’s cache is enough and we can stick with the basic approach. Premature optimisation could well be a huge waste of effort, and the view-based models let us defer those decisions and get started very quickly.

Abstract

Picasso - Skull and Pitcher

Defining views early in a project could also be a way to postpone building complex parts of the system. Let’s say a book has a complicated ‘cost’ attribute and we’re not sure yet how it will be calculated or where all the data will come from but we need to start to display it during the initial iterations. We could add the column to the book view on day one and worry about where it comes from later. e.g.

CREATE VIEW vbook AS
SELECT
  book.id,
  book.id AS book_id,
  book.title,
  6.283185 AS cost, --todo: use a cost function here instead
  ...

And then vbook.cost can be used in the knowledge that the reference won’t change. Also, if the cost calculation is defined within the view and it needs to change post-production, the view can be recreated while the application is running with no migration or down-time.

More modelling

We can further enhance the view-based model and add relationships to give us back the all-too-convenient dot-notation – still useful if we’re careful to use it outside of large loops. We should make sure any relationships don’t give Django the impression that it needs to try to maintain integrity – it doesn’t need to since the underlying table only has virtual rows. We can do this using on_delete=models.DO_NOTHING, e.g.

class vBook(pg.View):
    title = models.CharField(max_length=100)
    jacket_image = models.ImageField(null=True)
    author_name = models.CharField(max_length=100)
    shelf_position = models.IntegerField()
 
    author = models.ForeignKey(Author,
                               on_delete=models.DO_NOTHING)
             #needs author_id to be returned by the sql
    sql = BOOK_DETAILS

These are, of course, complete Django relationships and so we can access them from any direction, e.g.

my_author.vbook_set.all()

would return an author’s books as vBook objects using a single extra query. You could even go further and use prefetch_related('vbook_set') when fetching authors. Django treats these models just like any other.

We can’t save or create using such view-based models – the database will reject that for anything but the simplest views, so we still use the underlying table-based models to do that. But we can link the models together, with a one-to-one relationship, to make things easier, e.g.

class vBook(pg.View):
    book = models.OneToOneField(Book,
                                on_delete=models.DO_NOTHING)
           #needs book_id to be returned by the sql
    title = models.CharField(max_length=100)
    jacket_image = models.ImageField(null=True)
    author_name = models.CharField(max_length=100)
    shelf_position = models.IntegerField()
 
    author = models.ForeignKey(Author,
                               on_delete=models.DO_NOTHING)
             #needs author_id to be returned by the sql
 
    sql = BOOK_DETAILS

Now, given a vbook instance, we can update the underlying book, for example:

vbook.book.title = 'A Tale of Two Cities'
vbook.book.save()

And also, given a book instance, we can ask for all the pre-joined book information:

book.vbook.author_name

There are still some questions about how best to use these models, e.g. how to share methods with underlying table-based models and how best to name fields from other models.

Deferred Optimisation

Once we’ve defined view-based models to bring together the core concepts of the system, we can build the application to use them for reading and the base-table models for writing. We then have a variety of options to greatly optimise the system at a later date.

Here are some ways we can optimise database views. The important points are:

  • just by using views to join tables, we’ve already made great savings in the number of database queries needed
  • the ideas below can be applied later, once we know if and where further optimisation is needed
  • none of these would require any changes to the application code using the data: they’re true implementation changes, and database views give us such a clear separation of interface and implementation that the application could stay running!

Pull from other tables

For example, at a later point, we could modify the view to join to a table containing pre-calculated ratings and return that instead of the AVG sub-select.
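For instance, the AVG sub-select could give way to a join against a hypothetical book_rating table holding the pre-calculated values – the application sees the same average_rating column either way:

```sql
-- Hypothetical: replace the AVG sub-select with a join to a
-- pre-calculated book_rating table; the column names the
-- application relies on are unchanged.
CREATE OR REPLACE VIEW vbook AS
SELECT
  book.id,
  book.id AS book_id,
  book.title,
  book_rating.average_rating
FROM
  book
  NATURAL JOIN book_rating
```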

Pull from external systems

We could use PostgreSQL’s foreign data wrappers to join to external data sources. It might be useful in itself to map a Django model to such a foreign table without an intermediate view, if all the data is foreign.

Use database functions

Database functions can be written in a number of languages including PL/pgSQL, Python, Ruby and Javascript. They are pre-compiled and very efficient, not least because they run in situ: they can access the data without it first being returned over the network, and can filter out unwanted results early.

Load-balance across read-only replicas

PostgreSQL has the concept of standby/slave servers that can be kept synchronised with a master server. These can be made available for read-only queries to spread the load. Since we can be sure that view-based models are only used for reading, we could use Django’s database router to route all queries against those models to the slave servers.
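A minimal router sketch follows. The 'replica' database alias and the set of view-model names are assumptions about your settings, not anything Django mandates:

```python
# A sketch of a database router that sends reads of view-based models
# to a read-only replica. The 'replica' alias and the model names are
# assumptions -- adjust them to match your DATABASES setting.
VIEW_MODEL_NAMES = {'vbook'}

class ViewModelRouter:
    def db_for_read(self, model, **hints):
        # model._meta.model_name is the lowercased model class name
        if model._meta.model_name in VIEW_MODEL_NAMES:
            return 'replica'
        return None  # fall through to the default database

    def db_for_write(self, model, **hints):
        return None  # view-based models are never written to anyway
```

The router is then listed in the DATABASE_ROUTERS setting.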

Materialise

Views are virtual, except when they’re not. PostgreSQL 9.3 added materialised views.

CREATE MATERIALIZED VIEW vbook AS
SELECT ...

These are still declared as derived tables but the data in them is physically stored for much faster access: the results are effectively cached in the database. These are useful if the application can handle some staleness, and the only change necessary is in the way the view is declared – the application needn’t know or care. Such views can be refreshed periodically (although before 9.4 the refresh takes out a table lock). Indexes can be added to materialised views, potentially giving massive performance increases, especially for calculated columns.
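Refreshing is a single statement, which could be run from a cron job or scheduled task:

```sql
-- Re-populate the stored results; before 9.4 this takes a table lock
-- for the duration of the refresh.
REFRESH MATERIALIZED VIEW vbook;
```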

Although I think materialisation is a last resort, it’s a very powerful one. I’ve added a MaterializedView class to isotoma/django-postgres which should help create them, though it could do with some more testing and options to control refreshing.

This really gives the best of both worlds: a single query can provide Django objects with the speed expected from de-normalised storage, but derived by the database from a normalised schema.

Summary

Django’s ORM makes accessing related objects simple and convenient. However, when using an ORM, accessing related objects in loops often leads to an explosion of supporting queries, which can go unnoticed during development but which can lead to poor performance. Django’s ORM has some methods that try to alleviate the problem but they have limitations. We can use raw SQL to efficiently join related information in the database to avoid these query explosions, as well as giving us more powerful ways to group and summarise data. I think these SQL queries should be given more prominence in our projects.

We can go further and push the raw SQL into the database by declaring virtual tables (database views). We can then map Django models onto these virtual tables to also give an extra layer of abstraction on top of the base models. This lets us defer implementation decisions and provides lots of ways to optimise the system at a later stage, once we know more about its performance, without affecting the application code.

A Different View (materialised)

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

A Different View


One of the consequences of the mismatches between databases and ORMs is that iterating over objects can generate explosions of database queries. Often, these explosions go unnoticed because the individual queries are fast and the number of concurrent users is small, especially during development. The explosions are sometimes hard to detect, either because developers don’t look for them or the tools that can expose the explosions don’t report everything (e.g. the Django debug toolbar doesn’t report the queries generated by ajax calls).

Exploding

Django does a good job of lazily evaluating what it can, but the simple act of using the convenient dot-notation inside a loop can turn a single database query into a cascade of supporting queries.

For example, given this book model:

class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.ForeignKey(Author)
    jacket = models.ForeignKey(Jacket)
    shelf = models.ForeignKey(Shelf)

and this books query:

books = Book.objects.filter(shelf__position__lte = 10)

which returns 10 rows. Then this template:

{% for book in books %}
  {{ book.jacket.image }} {{ book.title }} 
  by {{ book.author.name }}
{% endfor %}

would issue a single query to get all 10 rows, but would then generate 2 additional queries per iteration to get the jacket and the author for each book. So a total of 21 queries. You really need to use the Django debug toolbar to realise this – install it now and keep your eye on it until the very end of the project: it only takes a late-breaking, well-intentioned dot to cause an explosion.

A good way to detect when explosions happen is to check the number of queries in your unit tests.

self.assertNumQueries(self.BOOKS_QUERY_COUNT, self.client.get, reverse('books'))

Add these at the start of your project. It means you have to keep increasing the counts while developing, but it does make it obvious when something has exploded.

Remember, every one of those queries involves the following steps:

  • create a connection to the database (tcp/ip – handshake – username, password) (set CONN_MAX_AGE in Django 1.6 to help reduce this overhead)
  • send the SQL query over the network
  • parse the SQL query
  • check the table and column read/write permissions
  • optimise the query plan based on the latest statistics
  • start a transaction
  • perform the query (checking for other users’ locks)
  • return the resulting metadata and rows over the network
  • commit the transaction
  • close the connection
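The connection overhead in the first step can be amortised with persistent connections (Django 1.6+). A settings sketch – the engine and database name here are illustrative:

```python
# settings.py -- reuse database connections between requests
# (Django 1.6+). Engine and database name are illustrative.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'books',
        'CONN_MAX_AGE': 600,  # seconds; 0 (the default) closes per request
    }
}
```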

Joining

A far better approach is to get the database to do the work. It can join together the jacket and author information much more efficiently (orders of magnitude more efficiently – but that’s another topic).

Django added select_related() to try to address this problem. Given a field name, it passes the required table to the database to be joined with the original queryset, e.g.

books = Book.objects.filter(
    shelf__position__lte=10
).select_related('jacket', 'author')

It is limited though, since it only works with some relationships in some directions, and it pulls in all the columns which is usually very wasteful, especially if large text fields are present. Adding defer() or only() to work around this can help, but isn’t pretty, e.g.

books = Book.objects.filter(
    shelf__position__lte=10
).select_related(
    'jacket', 'author'
).defer(
    'author__biography',
    'author__summary',
    'author__resume',
)

prefetch_related() was added later to help with more complex relationships but still issues multiple queries and does the ‘joining’ in Python. Besides that, it also has a number of caveats and special interactions with other methods. Trying to combine these with annotations for aggregate counts and such would lead to some fragile, hard-to-construct and, I think, hard-to-read syntax that can fail even for slightly complicated queries.

So instead, we could issue the following query:

books = Book.objects.raw("""
  SELECT
    book.id, book.title,
    jacket.image AS jacket_image,
    author.name AS author_name
  FROM book
    NATURAL JOIN shelf NATURAL JOIN author NATURAL JOIN jacket
  WHERE shelf.position <= 10
""")

(note: I’ll use natural join throughout for simplicity, and short table names for that matter, but they don’t actually work as-is with Django’s table/column naming)

And then, a slightly modified template:

{% for book in books %}
  {{ book.jacket_image }} {{ book.title }} 
  by {{ book.author_name }}
{% endfor %}

would only issue 1 query, regardless of the number of books.

Adding further related information to the original template will dramatically increase the number of queries, whereas joining another table to the raw SQL version will add none. Similarly, returning 20 books instead of 10 would just about double the number of queries for the original version (so 41 for our example), whereas the raw one would still only issue a single query.

As well as avoiding query explosions, there are other benefits to using raw SQL. A full explanation needs another blog post, but some of the major ones are:

  • it’s the only way to do anything but the simplest of aggregation queries.
  • it only returns what is required (assuming you avoid SELECT *). Pulling every column, including large blobs, over the network when they’re not referenced wastes CPU, temporary sort/cache space and network bandwidth.
  • you can use easy-to-read comparisons (e.g. position <= 10 instead of shelf__position__lte = 10).
  • if you need/want, you can use database-specific features, such as function calls, range queries, NULL ordering, etc.
  • checking the query plan is easier – there’s no need to set a breakpoint and then print the queryset.query.
  • it returns consistent results, whereas multiple queries, even in the same transaction, can see different data depending on what’s been committed by concurrent users (unless we used the serializable isolation level, which we don’t).

Formatting

I’ve tried a number of ways to embed SQL queries into Django modules:

...raw("""
SELECT ...
FROM ...
WHERE ...
""")

is jarring.

...raw("""SELECT ...
          FROM ...
          WHERE ...
       """)

is better, but adds too much whitespace and makes it harder to edit.

...raw("""SELECT ..."""
       """FROM ..."""
       """WHERE ...""")

avoids the extra whitespace but doesn’t ensure enough whitespace unless you’re careful to add a space at the end of each line.
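The missing-whitespace trap with adjacent string literals is easy to demonstrate:

```python
# Adjacent string literals are concatenated with nothing in between,
# so a forgotten trailing space silently corrupts the SQL.
broken = ("SELECT id"
          "FROM book")   # -> "SELECT idFROM book"
fixed = ("SELECT id "
         "FROM book")    # -> "SELECT id FROM book"
```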

None are great and none of them provide an easy way to copy and paste the query, which is necessary to tune it and check its plan in a database client.

Elevating

For a better approach, I suggest creating an sql directory (alongside models.py – it is that important) and adding python modules in there to hold the named query declarations, e.g. sql/books.py:

BOOK_DETAILS = """
SELECT
  book.id,
  book.title,
  jacket.image AS jacket_image,
  author.name AS author_name
FROM
  book
  NATURAL JOIN shelf
  NATURAL JOIN author
  NATURAL JOIN jacket
WHERE
  shelf.position <= 10
"""

then it can be called like this:

from sql.books import BOOK_DETAILS
 
books = Book.objects.raw(BOOK_DETAILS)

Clicking the BOOK_DETAILS reference in any decent IDE will take you to the query. Now that the query has a bit more respect it can more easily be copied and pasted into other tools, and formatted as you’d like, and re-used without repetition.

One more note about formatting. Although you could put Python comments before the declaration, I suggest keeping the query comments inside the query because they may be useful on the server, e.g. when viewing the database logs. So instead of:

#Get top 10 book summaries
#(including author and jacket details)
BOOK_DETAILS = """
SELECT
...
"""

use:

BOOK_DETAILS = """
/*
Get top 10 book summaries
(including author and jacket details)
*/
SELECT
...
"""

Downsides

There are some downsides with raw(), however.

  • You need some SQL knowledge of course, and an understanding of how Django links tables together, but I think you should have that anyway.
  • If your SQL does introduce some non-standard syntax (which shouldn’t be the case for straightforward joins) it could make your application less portable.  Although I’d argue that some features are worth that sacrifice, and would hope the likelihood of switching to a different database would be low if you’re already using a good one.
  • Even though raw() ensures only 1 query is used, that query must still be sent over the network and parsed every time it’s run.
  • The real problem is that raw() returns a RawQuerySet, not a QuerySet, so no further refinement methods can be applied (e.g. filter, order_by) – though arguably they’re better off being added to the SQL.

We can do better, as I’ll explain in part 2


Reviewing Django REST Framework

Recently, we used Django REST Framework to build the backend for an API-first web application. Here I’ll attempt to explain why we chose REST Framework and how successfully it helped us build our software.

Why Use Django REST Framework?

RFC-compliant HTTP Response Codes

Clients (javascript and rich desktop/mobile/tablet applications) will more than likely expect your REST service endpoint to return status codes as specified in the HTTP/1.1 spec. Returning a 200 response containing {'status': 'error'} goes against the principles of HTTP and you’ll find that HTTP-compliant javascript libraries will get their knickers in a twist. In our backend code, we ideally want to raise native exceptions and return native objects; status codes and content should be inferred and serialised as required.

If authentication fails, REST Framework serves a 401 response. Raise a PermissionDenied and you automatically get a 403 response. Raise a ValidationError when examining the submitted data and you get a 400 response. POST successfully and get a 201, PATCH and get a 200. And so on.
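The mapping can be summarised like this – a deliberately simplified sketch for illustration, not REST Framework’s actual implementation (which lives in its exception handler):

```python
# A simplified sketch of the exception-to-status mapping described
# above -- illustrative only, not REST Framework's own code.
class NotAuthenticated(Exception): pass
class PermissionDenied(Exception): pass
class ValidationError(Exception): pass

STATUS_FOR_EXCEPTION = [
    (NotAuthenticated, 401),
    (PermissionDenied, 403),
    (ValidationError, 400),
]

def status_for(exc):
    """Return the HTTP status for a raised exception, 500 by default."""
    for exc_type, code in STATUS_FOR_EXCEPTION:
        if isinstance(exc, exc_type):
            return code
    return 500
```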

Methods

You could PATCH an existing user profile with just the field that was changed in your UI, DELETE a comment, PUT a new shopping basket, and so on. HTTP methods exist so that you don’t have to encode the nature of your request within the body of your request. REST Framework has support for these methods natively in its base ViewSet class which is used to build each of your endpoints; verbs are mapped to methods on your view class which, by default, are implemented to do everything you’d expect (create, update, delete).
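For the standard router’s two URL patterns, the verb-to-action mapping can be summarised as follows (a summary of the behaviour described above, not framework code):

```python
# How the default router maps HTTP verbs onto ViewSet actions --
# a summary table, not REST Framework's own code.
COLLECTION_ACTIONS = {   # e.g. /jobs
    'get': 'list',
    'post': 'create',
}
ITEM_ACTIONS = {         # e.g. /jobs/42
    'get': 'retrieve',
    'put': 'update',
    'patch': 'partial_update',
    'delete': 'destroy',
}
```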

Accepts

The base ViewSet class looks for the Accept header and encodes the response accordingly. You need only specify which formats you wish to support in your settings.py.
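A settings sketch – the renderer class path is REST Framework’s own; which renderers you list is up to you:

```python
# settings.py -- the formats offered during content negotiation
# against the request's Accept header.
REST_FRAMEWORK = {
    'DEFAULT_RENDERER_CLASSES': [
        'rest_framework.renderers.JSONRenderer',
    ],
}
```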

Serializers are not Forms

Django Forms do not provide a sufficient abstraction to handle object PATCHing (only PUT) and cannot encode more complex, nested data structures. The latter limitation lies with HTTP, not with Django Forms; HTTP forms cannot natively encode nested data structures (both application/x-www-form-urlencoded and multipart/form-data rely on flat key-value formats). Therefore, if you want to declaratively define a schema for the data submitted by your users, you’ll find life a lot easier if you discard Django Forms and use REST Framework’s Serializer class instead.

If the consumers of your API wish to use PATCH rather than PUT, and chances are they will, you’ll need to account for that in your validation. The REST Framework ModelSerializer class adds fields that map automatically to Model Field types, in much the same way that Django’s ModelForm does. Serializers also allow nesting of other Serializers for representing fields from related resources, providing an alternative to referencing them with a unique identifier or hyperlink.

More OPTIONS

Should you choose to go beyond an AJAX-enabled site and implement a fully-documented, public API then best practice and an RFC or two suggest that you make your API discoverable by allowing OPTIONS requests. REST Framework allows an OPTIONS request to be made on every endpoint, for which it examines request.user and returns the HTTP methods available to that user, and the schema required for making requests with each one.

OAuth2

Support for OAuth 1 and 2 is available out of the box and OAuth permissions, should you choose to use them, can be configured as a permissions backend.

Browsable

REST framework provides a browsable HTTP interface that presents your API as a series of forms that you can submit to. We found it incredibly useful for development but found it a bit too rough around the edges to offer as an aid for third parties wishing to explore the API. We therefore used the following snippet in our settings.py file to make the browsable API available only when DEBUG is set to True:

if DEBUG:
    REST_FRAMEWORK['DEFAULT_RENDERER_CLASSES'].append(
        'rest_framework.renderers.BrowsableAPIRenderer'
    )

Testing

REST Framework gives you an APITestCase class which comes with a modified test client. You give this client a dictionary and encoding and it will serialise the request and deserialise the response. You only ever deal in python dictionaries and your tests will never need to contain a single instance of json.loads.

Documentation

The documentation is of a high quality. It copies the Django project’s three-pronged approach to documentation – tutorial, topics, and API reference – so Django buffs will find it familiar and easy to parse. The tutorial quickly gives readers the feeling of accomplishment, the high-level topic-driven core of the documentation allows readers to quickly get a solid understanding of how the framework should be used, and the method-by-method API documentation is very detailed, frequently offering examples of how to override existing functionality.

Project Status

At the time of writing the project remains under active development. The roadmap is fairly clear and the chap in charge has a solid grasp of the state of affairs. Test coverage is good. There’s promising evidence in the issue history that creators of useful but non-essential components are encouraged to publish their work as new, separate projects, which are then linked to from the REST Framework documentation.

Criticisms

Permissions

We found that writing permissions was messy and we had to work hard to avoid breaking DRY. An example is required. Let’s define a ViewSet representing both a resource collection and any document from that collection:

views.py:

class JobViewSet(ViewSet):
    """
    Handles both URLS:
    /jobs
    /jobs/(?P<id>\d+)/$
    """
    serializer_class = JobSerializer
    permission_classes = (IsAuthenticated, JobPermission)
 
    def get_queryset(self):
        if self.request.user.is_superuser:
            return Job.objects.all()
 
        return Job.objects.filter(
            Q(applications__user=self.request.user) |
            Q(reviewers__user=self.request.user)
        )

If the Job collection is requested, the queryset from get_queryset() will be run through the serializer_class and returned as an HTTPResponse with the requested encoding.

If a Job item is requested and it is in the queryset from get_queryset(), it is run through the serializer_class and served. If a Job item is requested and is not in the queryset, the view returns a 404 status code. But we want a 403.

So if we define that JobPermission class, we can fail the object permission test, resulting in a 403 status code:

permissions.py:

class JobPermission(BasePermission):
    def has_object_permission(self, request, view, obj):
        if obj in Job.objects.filter(
            Q(applications__user=request.user) |
            Q(reviewers__user=request.user)):
            return True
        return False

Not only have we duplicated the logic from the view’s get_queryset() method (we could admittedly reuse view.get_queryset(), though the method and its underlying query would still be executed twice), but if we don’t duplicate it, the client is sent a completely misleading response code.

The neatest way to solve this issue seems to be to use the DjangoObjectPermissionsFilter together with the django-guardian package. Not only will this allow you to define object permissions independently of your views, it’ll also allow you to filter querysets using the same logic. Disclaimer: I’ve not tried this solution, so it might be a terrible thing to do.

Nested Resources

REST Framework is not built to support nested resources of the form /baskets/15/items. It requires that you keep your API flat, of the form /baskets/15 and /items/?basket=15.

We did eventually choose to implement some parts of our API using nested URLs, but it was hard work: we had to alter public method signatures and the data types of public attributes within our subclasses, and we needed heavily modified Router, Serializer, and ViewSet classes. It is worth noting that REST Framework deserves praise for making each of these components so pluggable.

Very specifically, the biggest issue preventing us pushing our nested resources components upstream was REST Framework’s decision to make lookup_field on the HyperlinkedIdentityField and HyperlinkedRelatedField a single string value (e.g. “baskets”). To support any number of parent collections, we had to create a NestedHyperlinkedIdentityField with a new lookup_fields list attribute, e.g. ["baskets", "items"].
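To make the lookup_fields idea concrete without any framework machinery, resolving the URL kwargs for a nested route amounts to walking each field path on the object. This is an illustrative, framework-free sketch; the helper name and behaviour are ours, not REST Framework API:

```python
# Framework-free sketch of the lookup_fields idea described above.
# nested_url_kwargs is a hypothetical helper, not REST Framework API.
def nested_url_kwargs(obj, lookup_fields):
    """Build the URL kwargs a nested route needs, one per lookup field."""
    kwargs = {}
    for field in lookup_fields:
        value = obj
        for part in field.split("__"):  # follow relations, Django-style
            value = getattr(value, part)
        kwargs[field] = value
    return kwargs
```

With a single lookup_field only the last hop is possible; a list of fields is what lets the URL carry every parent collection’s identifier.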

Conclusions

REST Framework is great. It has flaws but continues to mature as an increasingly popular open source project. I’d whole-heartedly recommend that you use it for creating full, public APIs, and also for creating a handful of endpoints for the bits of your site that need to be AJAX-enabled. It’s as lightweight as you need it to be and most of what it does, it does extremely well.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

Django Class-Based Generic Views: tips for beginners (or things I wish I’d known when I was starting out)

Django is renowned for being a powerful web framework with a relatively shallow learning curve, making it easy to get into as a beginner and hard to put down as an expert. However, when class-based generic views arrived on the scene, they were met with a lukewarm reception from the community: some said they were too difficult, while others bemoaned a lack of decent documentation. But if you can power through the steep learning curve, you will see they are also incredibly powerful and produce clean, reusable code with minimal boilerplate in your views.py.

So to help you on your journey with CBVs, here are some handy tips I wish I had known when I first started learning all about them. This isn’t a tutorial, but more a set of side notes to refer to as you are learning; information which isn’t necessarily available or obvious in the official docs.

Starting out

If you are just getting to grips with CBVs, the only view you need to worry about is TemplateView. Don’t try anything else until you can make a ‘hello world’ template and view it on your dev instance. This is covered in the docs. Once you can handle that, keep reading the docs and make sure you understand how to subclass a ListView and DetailView to render model data into a template.

OK, now we’re ready for the tricky stuff!

Customising CBVs

Once you have the basics down, you will find that most of your work revolves around subclassing the built-in class-based generic views and overriding one or two methods. At the start of your journey, it is not very obvious what to override to achieve your goals, so remember:

  • If you need to get some extra variables into a template, use get_context_data()
  • If it is a low-level permissions check on the user, you probably want dispatch()
  • If you need to do a complicated database query on a DetailView, ListView etc, try get_queryset()
  • If you need to pass some extra parameters to a form when constructing it via a FormView, UpdateView etc, try get_form() or get_form_kwargs()

ccbv.co.uk

If you haven’t heard of ccbv.co.uk, go there and bookmark it now. It is possibly the most useful reference out there for working with class-based generic views. When you are subclassing views and trying to work out which methods to override, and the official docs just don’t seem to cut it, ccbv.co.uk has your back. If it wasn’t for that site, I think we would all be that little bit grumpier about using CBVs.

Forms

CBVs cut a LOT of boilerplate code out of the process of writing forms. You should already be using ModelForms wherever you can to save effort, and there are generic class-based views available (CreateView/UpdateView) that allow you to plug in your ModelForms and reduce your boilerplate code even further. Always use this approach if you can. If your form does not map to a particular model in the database, use FormView.

Permissions

If you want to put some guards on your view e.g. check if the user is logged in, check they have a certain permission etc, you will usually want to do it on the dispatch() method of the view. This is the very first method that is called in your view, so if a user shouldn’t have access then this is the place to intercept them:

from django.core.exceptions import PermissionDenied
from django.views.generic import TemplateView
 
class NoJimsView(TemplateView):
    template_name = 'secret.html'
 
    def dispatch(self, request, *args, **kwargs):
        if request.user.username == 'jim':
            raise PermissionDenied # HTTP 403
        return super(NoJimsView, self).dispatch(request, *args, **kwargs)

Note: If you just want to restrict access to logged-in users, you can wrap the dispatch() method with Django’s login_required decorator (applied via method_decorator). This is covered in the docs, and it may be sufficient for your purposes, but I usually end up having to modify it to handle AJAX requests nicely as well.

Multiple inheritance

Once you start subclassing and overriding generic views, you will probably find yourself needing multiple inheritance. For example, perhaps you want to extend your “No Jims” policy (see above) to several other views. The best way to achieve this is to write a small Mixin and inherit from it along with the generic view. For example:

from django.core.exceptions import PermissionDenied
from django.views.generic import TemplateView
 
class NoJimsMixin(object):
    def dispatch(self, request, *args, **kwargs):
        if request.user.username == 'jim':
            raise PermissionDenied # HTTP 403
        return super(NoJimsMixin, self).dispatch(request, *args, **kwargs)
 
class NoJimsView(NoJimsMixin, TemplateView):
    template_name = 'secret.html'
 
class OtherNoJimsView(NoJimsMixin, TemplateView):
    template_name = 'other_secret.html'

Now you have entered the world of Python’s multiple inheritance and Method Resolution Order. Long story short: order is important. If you inherit from two classes that both define a foo() method, your new class will use the one from the parent class that appears first in the list. So in the above example, if your NoJimsView class listed TemplateView before NoJimsMixin, Django would use TemplateView’s dispatch() method instead of NoJimsMixin’s. But as written above, not only does NoJimsMixin’s dispatch() get called first, but when you call super(NoJimsMixin, self).dispatch(), it calls TemplateView’s dispatch() method. How I wish I had known this when I was learning about CBVs!
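The ordering is easy to verify with plain Python, no Django required. The classes below are stand-ins for the real generic views, invented purely to demonstrate the MRO:

```python
# Stand-in classes demonstrating the MRO behaviour described above.
calls = []

class FakeTemplateView(object):
    def dispatch(self, request):
        calls.append("TemplateView")
        return "rendered response"

class NoJimsMixin(object):
    def dispatch(self, request):
        calls.append("NoJimsMixin")
        # super() resolves to the *next* class in the MRO, which for
        # NoJimsView below is FakeTemplateView, not object.
        return super(NoJimsMixin, self).dispatch(request)

class NoJimsView(NoJimsMixin, FakeTemplateView):
    pass

response = NoJimsView().dispatch(request=None)
# calls is now ["NoJimsMixin", "TemplateView"]: the mixin runs first,
# then delegates to the view via super().
```

Swap the base classes of NoJimsView around and the mixin’s dispatch() never runs at all, which is exactly the bug described above.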

View/BaseView/Mixin

As you browse around the docs, code and ccbv.co.uk, you will see references to Views, BaseViews and Mixins. They are largely a naming convention in the Django code: a BaseView is like a View except it doesn’t have a render_to_response() method, so it won’t render a template. Almost all Views inherit from a corresponding BaseView and add a render_to_response() method e.g. DetailView/BaseDetailView, UpdateView/BaseUpdateView etc. This is useful if you are subclassing from two Views, because it means you can choose which one renders the final output. It is also useful if you want to render to JSON, say in an AJAX response, and don’t need HTML rendering at all (in this case you’d need to provide your own render_to_response() method that returns an HttpResponse).

Mixin classes provide a few helper methods, but can’t be used on their own, as they are not full Views.

So in short, if you are just subclassing one thing, you will usually subclass a View. If you want to manually render a non-HTML response, you probably need a BaseView. If you are inheriting from multiple classes, you will need a combination of some or all of View, BaseView and Mixin.

A final note on AJAX

Django is not particularly good at serving AJAX requests out of the box, and once you start trying to use CBVs to do AJAX form submissions, things get quite complicated.

The docs offer some help with this in the form of a Mixin you can copy and paste into your code, which gives you JSON responses instead of HTML. You will also need to pass CSRF tokens in your POST requests, and again there is an example of how to do this in the docs.

This should be enough to get you started, but I often find myself having to write some extra Mixins, and that is before even considering the javascript code on the front end to send requests and parse responses, complete with handling of validation and transport errors. Here at Isotoma, we are working on some tools to address this, which we hope to open-source in the near future. So watch this space!

Conclusion

In case you hadn’t worked it out, we at Isotoma are fans of Django’s class-based generic views. They are definitely not straightforward for newcomers, but hopefully with the help of this article and other resources (did I mention ccbv.co.uk?), it’ll be plain sailing before you know it. And once you get what they’re all about, you won’t look back.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

API First

Recently, we were faced with the task of writing an API-first web application in order to support future mobile platform development. Here’s a summary of the project from the point of view of one of the developers.

Agile API

For the first couple of iterations, we had problems demonstrating the project progress to the customer at the end of iteration meetings. The customer on this project was extremely understanding and reasonably tech-savvy but despite that, he remained uninterested in the progress of the API and became quite concerned by the lack of UI progress. Although we were busy writing and testing the API code sitting just beneath the surface, letting the customer watch our test suite run would have achieved nothing. It was frustrating to find that, when there was nothing for the customer to click around on, we couldn’t get the level of engagement and collaboration we would typically achieve. In the end, we had to rely on the wireframes from the design process which the customer had signed off on to inform our technical decisions and, to allay the customer’s fears, we ended up throwing together some user interfaces which lacked any functionality purely to give the illusion of progress.

On the plus side, once we had written enough of our API to know that it was fit for purpose, development on the front-end began and progressed very rapidly; most of the back-end validation was already in place, end-points were well defined, and the comprehensive integration tests we’d written served as a decent how-to-use manual for our API.

Extra Work

Developing the application API-first took more work and more lines of code than it would have required if implemented as a typical post-back website.

Each interface had to be judged by its general usefulness rather than by its suitability for one particular bit of functionality alluded to by our wireframes or specification. Any view that called upon a complex or esoteric query had to instead be implemented using querystring filters or a peculiar non-generic endpoint.

In a typical postback project with private, application-specific endpoints, we’d be able to pick and choose the HTTP verbs relevant to the template we’re implementing; our generic API, however, required considerably more thought. For each resource and collection, we had to think carefully about the permissions structure for each HTTP method, and the various circumstances in which the endpoint might be used.

We wrote around 4000 lines of integration test code just to pin down the huge combination of HTTP methods and user permissions, though I sincerely doubt that all of those combinations are required by the web application. Had we not put in the extra effort, however, we’d have risked making our API too restrictive to future potential consumers.
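The combinatorial blow-up is easy to see with illustrative numbers (the role names below are invented for the example, not taken from the project):

```python
from itertools import product

# Hypothetical roles; the point is the multiplication, not the names.
methods = ["GET", "POST", "PUT", "PATCH", "DELETE"]
roles = ["anonymous", "authenticated", "owner", "superuser"]
targets = ["collection", "item"]

# Every endpoint needs a permissions decision (and ideally a test)
# for each combination of target, method and caller.
cases = list(product(targets, methods, roles))
print(len(cases))  # 40 combinations per resource
```

Multiply that by a few dozen resources and the 4000 lines of test code stop looking excessive.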

In terms of future maintainability, I’d say that each new generic endpoint will require a comparable amount of otherwise-unnecessary consideration and testing of permissions and HTTP methods.

Decoupling

Having such an explicitly documented split between the front end and back end was actually very beneficial. Both sides were developed and tested against the API we’d designed and documented. For over a month, I worked solely on the back end and my colleague worked solely on the front, and we found this division of labour was an incredibly efficient way to work. By adhering to the HTTP 1.1 specification, using the full range of available HTTP verbs and response codes, and to our endpoint specification, we required far less interpersonal coordination than would typically be the case.

Beyond CRUD

The two major issues we found with generic CRUD endpoints were (1) performing a complex data query, and (2) updating multiple resources in a single transaction.

To a certain extent we managed to solve the first problem using querystrings, with keys representing fields on the resource. For all other cases, and also to solve the second problem, we used an underused yet still perfectly valid REST resource archetype: the controller, used to model a procedural concept.

We used controller endpoints on a number of occasions to accommodate things like /invitations/3/accept (“accept” represents the controller) which would update the invitation instance and other related user instances, as well as sending email notifications.

Where we needed to support searching, we added procedures to collections, of the form /applicants/search, to which we returned members of the collection (in this example “applicants”) which passed a case-insensitive containment test based on the given key.
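Stripped of the Django machinery, the containment test itself amounts to something like the sketch below (in the real view the equivalent would be an `icontains` queryset filter; the dicts stand in for model instances):

```python
def search(collection, key, term):
    """Case-insensitive containment test over one field of each member."""
    term = term.lower()
    return [item for item in collection if term in item[key].lower()]
```

The controller endpoint simply exposes this over GET, returning the matching members of the collection in the usual serialized form.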

Conclusion

API-first required extra implementation effort and a carefully-considered design. We found it was far easier and more efficient to implement as a generic, decoupled back-end component than in the typical creation process (model -> unit test -> url -> view -> template -> integration test), with the front-end being created completely independently.

In the end, we wrote more lines of code and far more lines of integration tests. The need to stringently adhere to the HTTP specification for our public API really drove home the benefits of using the full range of methods and status codes.

In case you’re curious, we used Marionette to build the front-end, and Django REST Framework to build the back end.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

Content types and Django CMS

Screenshot of the new ENB website

The new ENB website

One of our latest projects to go live is a new website for the English National Ballet. Part of a major rebrand, we completely replaced their old PHP site with a new content-managed site powered by Django CMS.

Django CMS is very flexible, largely due to its minimalistic approach. It provides no page templates out of the box, so you can construct your HTML from the ground up. This is great if you want to make a CMS with a really strong design, because there is very little interference from the framework. However, its minimalistic approach also means that you sometimes have to write extra code to tie all the content together.

A good example of this is content types. In Django CMS, there is only one content type: Page. It has certain fields associated with it e.g. title, slug, published. Any other information that appears on a page comes courtesy of plugins. The default Django CMS plugins give you everything you need to add arbitrary text, images and video to a page. But what if you want more fields for your page? Let’s say, for example, you are representing a ballet production and you want category, thumbnail and summary text fields, which don’t appear on the page itself but are needed for listings elsewhere on the site?

We decided to create a special “metadata” plugin to be added to the production pages, that would only be visible to content editors and not end users. This was seen as the best solution that achieved our goal while maintaining a decent user experience for the editors.

The plugin model looks something like this:

class ProductionDetails(CMSPlugin):
    summary = models.CharField(max_length=200) # Short summary, shown in listings
    image = FilerImageField() # Thumbnail image, shown in listings
    audiences = models.ManyToManyField(Audience) # Categorisation

Note the use of django-filer for the image field. This is simply the best add-on I have encountered for dealing with image uploads and the inevitable cropping and resizing of said images. You can also use cmsplugin-filer (by the same author) to replace the standard image plugin that comes with Django CMS.

Now querying the database for, say, the first 10 productions for a family audience (audience id 3) is as simple as:

ProductionDetails.objects.filter(audiences=3, placeholder__page__published=True)[:10]

So now we have a plugin model that we can query, and we don’t need a template as we don’t want it to appear on the actual page, right? Wrong. We still want to provide a good user experience for the editors, and this includes looking at a page in edit mode and being able to tell whether the page already has the plugin or not. So we use request.toolbar.edit_mode in the template to decide whether to render the plugin:

{% load thumbnail %}
 
{% if request.toolbar.edit_mode %}
<div id="production-details">
 <img src="{% thumbnail instance.image 100x100 crop upscale subject_location=instance.image.subject_location %}" />
 <p>Summary: {{ instance.summary }}</p>
 <p>Audiences: {{ instance.audiences.all|join:', ' }}</p>
</div>
{% endif %}

Now this information will only appear if an editor has activated the inline editing mode while looking at the page. If they look at the page and the information is missing, they know they need to add the plugin!

This solution works quite well for us, although it is still fairly easy to create a page and forget to give it any metadata. Ideally it would be mandatory to add a metadata plugin. Perhaps the subject of a future blog post!

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

Running a Django (or any other) dev instance over HTTPS

Being able to run your dev instance over HTTPS is really useful: you might spot some weird bug that would have bitten you in production, and if you do find one, you can debug it much more easily. Googling for this subject resulted in several different tutorials using stunnel, but all of them broke in some way on my machine running Ubuntu Maverick. So here is how I got stunnel working – perhaps it will help someone else too:

sudo aptitude install stunnel
sudo su -
cd /etc
mkdir stunnel
cd stunnel
openssl req -new -x509 -days 365 -nodes -out stunnel.pem -keyout stunnel.pem
openssl gendh 2048 >> stunnel.pem
chmod 600 stunnel.pem
logout
cd

Now create a file called dev_https with the following text:

pid=
foreground=yes
debug = 7

[https]
accept=8443
connect=8000
TIMEOUTclose=1

Note: this assumes your web server is running on port 8000. If it’s not, change the value of “connect” to the appropriate port.

Finally, run:

sudo stunnel4 dev_https

Now if you go to https://localhost:8443/, you should see your HTTPS-enabled dev instance!

Note: To properly simulate an HTTPS connection in Django, you should also set the environment variable HTTPS=on. Without this, request.is_secure() will return False. You could set it at the same time as starting your dev instance e.g:

HTTPS=on python manage.py runserver

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

Scaffolding template tags for Django forms

We love Django here at Isotoma, and we love using Django’s awesome form classes to generate self-generating, self-validating, [X]HTML forms.

However, in practically every new Django project I find myself doing the same thing over and over again (and I know others do too): breaking the display of a Django form instance up into individual fields, with appropriate mark-up wrappers.

Effectively I keep recreating the output of BaseForm.as_p/as_ul/as_table with template tags and mark-up.

For example, outputting a login form, rather than doing:

{{ form.as_p }}

We would do:

<p>
{% if form.username.errors %}
  {% for error in form.username.errors %}
    {{ error }}
  {% endfor %}
{% endif %}
{{ form.username.label }} {{ form.username }}
</p>
<p>
{% if form.password.errors %}
  {% for error in form.password.errors %}
    {{ error }}
  {% endfor %}
{% endif %}
{{ form.password.label }} {{ form.password }}
</p>

Why would you want to do this? There are several reasons, but generally it’s to apply custom mark-up to a particular element (notice I said mark-up, not styling; styling can be done with the generated field IDs), to completely customise the output of the form (using <div>s instead etc.), and because some designers tend to prefer this way of looking at a template.

“But”, you might say, “Django already creates all this for us with the handy as_p/as_ul/as_table methods, can you just take the output from that?”
Well, yes, in fact on a project a couple of weeks ago that’s exactly what I did, outputting as_p in a template, and then editing the source chucked out in a browser.
Which gave me the idea to create a simple little tool to do this for me, but with the Django template tags for dynamically outputting the field labels and fields themselves.

I created django-form-scaffold to do just this, and now I can do this from a Python shell:

>>> from dfs import scaffold
>>> from MyProject.MyApp.forms import MyForm
>>> form = MyForm()
>>> # We can pass either an instance of our form class
>>> # or the class itself, but better to pass an instance.
>>> print scaffold.as_p(form)

{% if form.email.errors %}{% for error in form.email.errors %}
{{ error }}{% endfor %}{% endif %}
<p>{{ form.email.label }} {{ form.email }}</p>
{% if form.password1.errors %}{% for error in form.password1.errors %}
{{ error }}{% endfor %}{% endif %}
<p>{{ form.password1.label }} {{ form.password1 }}</p>
{% if form.password2.errors %}{% for error in form.password2.errors %}
{{ error }}{% endfor %}{% endif %}
<p>{{ form.password2.label }} {{ form.password2 }}</p>

Copy and paste this into a template, tweak, and Robert’s your mother’s brother.

As well as as_p(), the dfs.scaffold module also has the equivalent functions as_ul() and as_table(), plus an extra as_div() function.
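As a toy illustration of what such a generator does (this is not the actual django-form-scaffold code, just a sketch of the idea), the core is string assembly per field:

```python
# Toy sketch of a form scaffolder; not the real django-form-scaffold code.
def scaffold_as_p(field_names, form_var="form"):
    """Emit repetitive per-field template mark-up, ready to paste and tweak."""
    chunks = []
    for name in field_names:
        f = "%s.%s" % (form_var, name)
        chunks.append(
            "{%% if %(f)s.errors %%}{%% for error in %(f)s.errors %%}\n"
            "{{ error }}{%% endfor %%}{%% endif %%}\n"
            "<p>{{ %(f)s.label }} {{ %(f)s }}</p>" % {"f": f}
        )
    return "\n".join(chunks)
```

The real tool introspects the form instance for its field names rather than taking a list, but the output is the same shape as shown above.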

Returning an actual proper real life HTTP code from a Django error page

Go to a non existent page on a Django site and you will (hopefully) be met with a friendly error page telling you not to panic, everything is OK and all you’ve done is mistyped the URL or something.

If it’s your thing, you may be interested enough to see what the actual HTTP code for the page is in the header; chances are that it will be a 200 rather than a 404, as the default handler just passes the dealings on to the HttpResponse class.

Generally speaking this is fine, but there are situations where an accurate code would be very handy, as I found out the other day when I was trying to detect whether a file had been uploaded to a remote server. Scraping the resultant HTML for “Page not found” is not my idea of a robust solution.
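A toy comparison shows why (the response pairs here are made up): a status-code check keeps working however the body changes, while scraping breaks on any custom error page, and on any 200 page that happens to contain the phrase.

```python
# Toy illustration: status codes vs body-scraping for "does this page exist?"
def exists_by_status(status, body):
    return status != 404

def exists_by_scraping(status, body):
    return "Page not found" not in body

# A 200 page that merely mentions the phrase, and a 404 with a custom body:
ok_page = (200, "<h1>Help: what to do if you see 'Page not found'</h1>")
custom_404 = (404, "<h1>Whoops, nothing here</h1>")
```

The scraping approach gets both of these wrong, which is exactly why the server should emit a real 404.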

So, instead, pass the error page’s HTML into the respective class by putting something like this in urls.py:


from django.http import HttpResponseNotFound, HttpResponseServerError
from django.template.loader import render_to_string

handler404 = 'urls.return_404'
handler500 = 'urls.return_500'

def return_404(request):
    return HttpResponseNotFound(
        render_to_string("errors/404.html"))

def return_500(request):
    return HttpResponseServerError(
        render_to_string("errors/500.html"))

Fullest of props to PiotrLegnica at Stack Overflow for this most elegant of solutions.

Edit: After further examination (see the comments) the default handlers do act as expected, but you’re still restricted to where you put your error templates, i.e. the root of the templates directory.
To my mind, it’s neater if you can specify a dedicated location.