Why mobile last?

In his post “Mobile Last?”, Jonathan Stark recounts his experiences at a recent hackathon, where teams were given 48 hours to build an innovative web application. While he ensured as usual that the site he built looked and worked great on any device, he was dismayed that “only one of the top ten winning entries was even a little bit mobile friendly.” He concludes that “the vast majority of web professionals have not truly embraced mobile.”

Jonathan points out, correctly, that mobile devices are already the default connected device for most people, and that poor mobile experiences are driving users towards native apps. After all, he says, working mobile-first is not that hard, and asks:

Are you working mobile-first? If not, why not? If you are working mobile-first, how do you like it? What were the biggest challenges you faced in making the transition from desktop? What platforms are you targeting on mobile? Web? Native iOS/Android? Something else?

I thought I’d write a quick response.

I’d guess the top reason why front-end web developers are not working mobile-first is that they are usually not the visual designers for the sites they’re building, and the designs they receive from the visual designers will usually be desktop designs. Visual designers:

  • are likely to be more set in their ways of designing for the desktop, and unaware of the “mobile first” philosophy
  • are working in tools that are not responsive (such as Photoshop), where creating mobile mockups as well means doubling the effort

Clients are also slow to adapt. Unless they are conceiving of their project as primarily a mobile site/app, they’ll expect to be sold the design on the strength of a desktop mockup. This is what most visual designers are used to delivering.

Finally, the front-end developer and everyone else on the team will be working on desktop systems, where it’s far easier to view the work in progress in desktop browsers. It takes extra effort to view the site in a mobile device or emulation thereof.

I’m usually the front-end developer in this equation. I am given a clear visual spec for the desktop, and it’s left up to me to make it responsive. It’s usually not that difficult, but it means starting by implementing the visual spec I have, i.e. the desktop design, and adapting it via media queries afterwards, resulting in a “mobile last” process.

Mobile last is somewhat inefficient, as it means redefining many rules. (It’s “subtractive”, where mobile first is “additive”.) But in my experience it’s not that bad. In my two most recent projects the responsive.less include file (containing all the styles dependent on media queries) accounted for only 179 of 1492 lines of CSS (12%) and 641 of 5047 (13%), and added at most three or four extra days. It’s also a conceptually simple way of working. The “layering” of a mobile-first stylesheet can make it a lot harder to interpret and maintain.

I’m currently implementing quite a complex visual design, provided to me, as usual, as desktop mockups. I have been attempting to follow a mobile-first approach, but have found it extremely difficult, given that I don’t know at the outset what a polished design should look like at mobile sizes (since I don’t have mockups for it). I’ve ended up following a mobile-first approach in some areas of the CSS, and mobile-last in others. Rather than a single responsive.less include right at the end, every one of the 20-odd Less files is riddled with media queries. The whole thing is, I have to admit, a lot more complex and confusing.

If I were starting a new project where I am also the visual designer, or a project where the visual design is relatively simple, then I’d probably work mobile-first. (Photoshop plays quite a small role in such cases, as my ideal process is to do much of the design in the browser.) But that, alas, wouldn’t be a typical project.

I hope this provides some insights into why so many developers are currently not yet working mobile first. What are your experiences? Let me know in the comments below or @fjordaan

On December 11 Jonathan will be redesigning his personal site in front of a live audience – you can sign up for the webcast here. I’m sure it will be a very interesting demonstration of mobile-first development.

My experiences are with websites and web apps, rather than native iOS or Android. I test on iPhone and Android smartphones, 7″ tablets and 10″ tablets (both iOS and Android). I also use the “Responsive Design View” on Firefox and Chrome’s mobile device emulation.

Links

There is a new version of gunicorn, 19.0, which has a couple of significant changes, including some interesting workers (gthread and gaiohttp) and actually responding to signals properly, which will make it work with Heroku.

The HTTP RFC, 2616, is now officially obsolete. It has been replaced by a bunch of RFCs from 7230 to 7235, covering different parts of the specification. The new RFCs look loads better, and it’s worth having a look through them to get familiar with them.

Some kind person has produced a recommended set of SSL directives for common webservers, which provide an A+ on the SSL Labs test, while still supporting older IEs. We’ve struggled to find a decent config for SSL that provides broad browser support, whilst also having the best levels of encryption, so this is very useful.

A few people are still struggling with Git.  There are lots of git tutorials around the Internet, but this one from Git Tower looks like it might be the best for the complete beginner. You know it’s for noobs, of course, because they make a client for the Mac :)

I haven’t seen a lot of noise about this, but the EU has outlawed pre-ticked checkboxes.  We have always recommended that these are not used, since they are evil UX, but now there’s an argument that might persuade everyone.

Here is a really nice post about splitting user stories. I think we are pretty good at this anyhow, but this is a nice way of describing the approach.

@monkchips gave a talk at IBM Impact about the effect of Mobile First. I think we’re on the right page with most of these things, but it’s interesting to see mobile called-out as one of the key drivers for these changes.

I’d not come across the REST Cookbook before, but here is a decent summary of how to treat PUT vs POST when designing RESTful APIs.

Fastly have produced a spectacularly detailed article about how to get tracking cookies working with Varnish.  This is very relevant to consumer facing projects.

This post from ThoughtWorks, The Software Testing Cupcake, is absolutely spot on, and I think accurately describes an important aspect of testing.

As an example of how to make unit tests less fragile, this is a decent description of how to isolate tests, which is a key technique. The examples are in Ruby, but the principle is valid everywhere.

Still on unit testing, Facebook have open sourced a Javascript unit testing framework called Jest. It looks really very good.

A nice implementation of “sudo mode” for Django. This ensures the user has recently entered their password, and is suitable for protecting particularly valuable assets in a web application like profile views or stored card payments.

If you are using Redis directly from Python, rather than through Django’s cache wrappers, then HOT Redis looks useful. This provides atomic operations for compound Python types stored within Redis.

Selecting off-the-shelf software (or, how to avoid buying a lemon)

Here at Isotoma we often get asked what the differences are between off-the-shelf and bespoke software, and how to decide which is the right approach for a given situation.

Generally software that has been written specifically for you (by companies like us) is referred to as “custom” or “bespoke”, while software that is customised, configured and deployed to many customers is referred to as “boxed”, “off-the-shelf” or “COTS” (Commercial Off-The-Shelf).

Isotoma is a bespoke software development company. This means that we take generic frameworks and build applications tailored to our customers’ needs on top of those frameworks, rather than having a single product of our own that meets a particular need that we resell to our customers (for example a content management system or an ecommerce system).

We’re very upfront in our answer to the “bespoke or off-the-shelf” question; if there’s off-the-shelf software that meets your needs we strongly recommend you go that route. While bespoke software brings huge benefits in the right situation it also has limitations, complexities and costs. You need to be sure that it’s right for you before going down the bespoke road.

Equally you need to be sure that you’re assessing the right things (not just features!) when considering off-the-shelf software for your project. All software investments bring risks to an organisation, and the risks of off-the-shelf software can often be subtler and harder to spot than those of bespoke software, yet they are every bit as impactful.

At its best, buying off-the-shelf software can be an extremely cost-effective way of getting the features you need and of future-proofing your software investment. But at its worst it can be a way of tying yourself in to a closed environment that is expensive to customise and hard to migrate away from.

Any off-the-shelf product works best when the requirements it serves are shared by lots of organisations. There the economy of scale means that every customer gets a great product without having to pay for the development of all the features. We wouldn’t ever suggest, for example, building a blogging platform from scratch, nor a word processor or an accounting package. These are all problems shared by countless individuals and organisations around the globe where product companies are easily able to serve everyone’s needs.

However, whenever an organisation’s requirements for the product diverge substantially from the needs of its primary customers, problems very rapidly occur, meaning increased cost, a compromised experience or, more often than not, both.

When selecting a product consider how much customisation you will need to make the product really sing for your organisation. And consider how you would be affected if you simply couldn’t have half of those customisations you’re imagining (regardless of whether this was through technical limitations or lack of budget for the necessary changes).

Next it’s important to think about the existing customer base of the product. If the product has thousands of customers, many like you, all successfully using it then you can probably feel comfortable that a) the product really is likely to meet your needs and b) the team behind it are likely to be around and supporting the product for the lifetime of your use of it.

Taking on very niche off-the-shelf products, or off-the-shelf products from small suppliers, can be extremely risky. The worst case is that under the hood each installation is in fact an alteration of the underlying base product, with the “licence fee” being used to cover the cost of the developers implementing the features the salesman told you already existed. Here you risk expensive maintenance, delays in implementation and a high rate of bugs: you were sold something off-the-shelf (and assumed the timescales that go with that), yet in fact custom software is being developed on your behalf, at cut rates and in double-quick time.

You should carefully assess any product with a very small number of installed customers. Unless you are extremely confident that it meets your needs entirely unchanged, or you are happy to invest in the product’s development over the long term, it should probably be avoided. These are often only products in the sense of “hey, we’ve made this thing for so-and-so, I bet others like them will need it too.”

Potentially as troublesome is the risk of delays or exorbitant fees if the resources of the team behind the product become stretched. If you need to call upon consultancy or professional services will the team be able to support you in a timely and cost effective manner?

Finally it’s worth considering the impact on your business processes. In general products work well for organisations that are just setting out, as any limitations imposed by a boxed product do not conflict directly with the way the organisation wants to do business. In addition, very young organisations do not entirely understand their requirements and so having a set of predefined features given to them allows them to develop their own business processes alongside those features.

Older organisations tend to struggle with products, particularly at the cheaper (and less customisable) end of the spectrum. They have a clear understanding of their requirements and are looking to create efficiency within their organisation. It is no longer acceptable to have predefined solutions forced upon the business, as it is exactly those predefined solutions that are causing the inefficiencies that the company is trying to drive out. In this case bespoke software has huge advantages, being tailored exactly to the organisation’s understanding of its own needs.

So you must consider the scope and scale of changes to your business processes that the new software will bring. How substantial are they? Are you ready for them? Will your users support you in making them? And can you drive them through at the same time as running a new software project?

With all the above in mind, how do you make sure that an off-the-shelf product will meet your needs and that you’re not investing in a lemon?

Here are some questions worth asking during the selection process. There are no right answers, but by having the conversation with potential vendors at this stage you may flush out some problems that you would otherwise have uncovered much later in the project:

  • How many paying customers do you already have for the version you are selling to me?
  • How many paying customers of our installation size and type do you have?
  • Can I talk to at least 2 of your customers about their experiences of the product and your services?
  • What is the average customisation cost of an implementation?
  • How much do you charge per day/per hour for professional services?
  • What is the average turnaround time for professional services requests?
  • Is there a VAR (Value Added Reseller) or partner network around the product that I can tap into if I need extra help with customisation/integration?
  • Is there an API that I can build on and use to integrate with our other systems? (if yes, can we see the documentation before we sign up?)
  • How easy is it for me to export my data from your product in a way that’s possible to import into an alternative?
  • Can I host the product myself? If so, what are the minimum hardware specifications for an installation of our size and type, and who is responsible for performance issues should they occur?
  • Do you offer a support agreement of any sort? What SLA options do you offer and how much might a support agreement cost?
  • When was the last major version released, and how many customers upgraded (and how many do you still have running the old version)?
  • What was the average upgrade cost when customers moved to this major version?
  • When do you plan to release the next major version?
  • Is there a user group for the product that I can talk to or join?
  • How are new features for future versions decided?
  • Do features I request and pay for get rolled out to other customers in future versions?
  • And perhaps most importantly: what is the roadmap for the product over the next 18 to 24 months?

Of course, there are equally detailed lists of questions you should be asking when selecting a bespoke development partner or when selecting an open source project, but both of those are for another day. Hopefully this list at least helps a little with the decision making process.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

The problem with Backing Stores, or what is NoSQL and why would you use it anyway

Durability is something that you normally want somewhere in a system: the data should survive reboots, crashes and the other sorts of things that routinely happen to real-world systems.

Over the many years that I have worked in system design, there has been a recurring thorny problem of how to handle this durable data.  What this means in practice when building a new system is the question “what should we use as our backing store?”.  Backing stores are often called “databases”, but everyone has a different view of what database means, so I’ll try and avoid it for now.

In a perfect world a backing store would be:

  • Correct
  • Quick
  • Always available
  • Geographically distributed
  • Highly scalable

While we can do these things quite easily these days with the stateless parts of an application, doing them with durable data is non-trivial. In fact, in the general case, it’s impossible to do all of these things at once (The CAP theorem describes this quite well).

This has always been a challenge, but as applications move onto the Internet, and as businesses become more geographically distributed, the problem has become more acute.

Relational databases (RDBMSes) have been around a very long time, but they’re not the only kind of database you can use. There have always been other kinds of store around, but the so-called NoSQL Movement has had particular prominence recently. This champions the use of new backing stores not based on the relational design, and not using SQL as a language. Many of these have radically different designs from the sort of RDBMS system that has been widely used for the last 30 years.

When and how to use NoSQL systems is a fascinating question, and I put forward our thinking on this. As always, it’s kind of complicated.  It certainly isn’t the case that throwing out an RDBMS and sticking in Mongo will make your application awesome.

Although they are lumped together as “NoSQL”, this is not actually a useful definition, because there is very little that all of these have in common. Instead I suggest that there are these types of NoSQL backing store available to us right now:

  • Document stores – MongoDB, XML databases, ZODB
  • Graph databases – Neo4j
  • Key/value stores – Dynamo, BigTable, Cassandra, Redis, Riak, Couch

These are so different from each other that lumping them in to the same category together is really quite unhelpful.

Graph databases

Graph databases have some very specific use cases, for which they are excellent, and probably a lot of utility elsewhere. However, for our purposes they’re not something we’d consider generally, and I’ll not say any more about them here.

Document stores

I am pretty firmly in the camp that Document stores, such as MongoDB, should never be used generally either (for which I will undoubtedly catch some flak). I have a lot of experience with document databases, particularly ZODB and dbxml, and I know whereof I speak.

These databases store “documents” as schema-less objects. What we mean by a “document” here is something that is:

  • self-contained
  • always required in its entirety
  • more valuable than the links between documents or its metadata

My experience is that although often you may think you have documents in your system, in practice this is rarely the case, and it certainly won’t continue to be the case. Often you start with documents, but over time you gain more and more references between documents, and then you gain records and all sorts of other things.

Document stores are poor at handling references, and because of the requirement to retrieve things in their entirety you denormalise a lot. The end result of this is loss of consistency, and eventually doom with no way of recovering consistency.

We do not recommend document stores in the general case.

Key/value stores

These are the really interesting kind of NoSQL database, and I think these have real potential as general-purpose alternatives to the RDBMS options. However, there is no magic bullet and you need to choose when to use them carefully.

You have to be careful when deciding to build something without an RDBMS. An RDBMS delivers a huge amount of value in a number of areas, and for all sorts of reasons. Many of the reasons are not because the RDBMS architecture is necessarily better but because they are old, well-supported and well-understood.

For example, PostgreSQL (our RDBMS of choice):

  • has mature software libraries for all platforms
  • has well-understood semantics for backup and restore, which work reliably
  • has mature online backup options
  • has had decades of performance engineering
  • has well understood load and performance characteristics
  • has good operational tooling
  • is well understood by many developers

These are significant advantages over newer stores, even if they might technically be better in specific use cases.

All that said, there are some definite reasons you might consider using a key/value store instead of an RDBMS.

Reason 1: Performance

Key/value stores often naively appear more performant than RDBMS products, and you can see some spectacular performance figures in direct comparisons. However, none of them really provide magic performance increases over RDBMS systems; what they do provide is different tradeoffs. You need to decide where your performance tradeoffs lie for your particular system.

In practice what key/value stores mostly do is provide some form of precomputed cache of your data, by making it easy (or even mandatory) to denormalize your data, and by providing the performance characteristics to make pre-computation reasonable.

If you have a key/value store that has high write throughput characteristics, and you write denormalized data into it in a read-friendly manner then what you are actually doing is precomputing values. This is basically Just A Cache. Although it’s a pattern that is often facilitated by various NoSQL solutions, it doesn’t depend on them.

RDBMS products are optimised for correctness and query performance; write performance takes second place to these. This means they are often not a good place to implement a pre-computed cache (where you often write values you never read).

It’s not insane to combine an RDBMS as your master source of data with something like Redis as an intermediate cache.  This can give you most of the advantages of a completely NoSQL solution, without throwing out all of the advantages of the RDBMS backing store, and it’s something we do a lot.
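To illustrate, here is a minimal sketch of that pattern using redis-py. The key scheme, timeout and the compute_summary_from_rdbms helper are assumptions for the example, not part of any of the products mentioned:

import json

import redis

r = redis.StrictRedis(host="localhost", port=6379)
CACHE_TTL = 300  # seconds: an arbitrary staleness budget

def get_book_summary(book_id):
    # Read-through cache: serve the precomputed value from Redis,
    # falling back to the RDBMS, which stays the master source of data.
    key = "book-summary:%s" % book_id
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    summary = compute_summary_from_rdbms(book_id)  # hypothetical helper
    r.setex(key, CACHE_TTL, json.dumps(summary))
    return summary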

Reason 2: Distributed datastores

If you need your data to be highly available and distributed (particularly geographically) then an RDBMS is probably a poor choice. It’s just very difficult to do this reliably and you often have to make some very painful and hard-to-predict tradeoffs in application design, user interface and operational procedures.

Some of these key/value stores (particularly Riak) can really deliver in this environment, but there are a few things you need to consider before throwing out the RDBMS completely.

Availability is often a tradeoff one can sensibly make.  When you understand quite what this means in terms of cost, both in design and operational support (all of these vary depending on the choices you make), it is often the right tradeoff to tolerate some downtime occasionally.  In practice a system that works brilliantly almost all of the time, but goes down in exceptional circumstances, is generally better than one that is in some ways worse all of the time.

If you really do need high availability though, it is still worth considering a single RDBMS in one physical location with distributed caches (just as with the performance option above).  Distribute your caches geographically, offload work to them and use queue-based fanout on write. This gives you eventual consistency, whilst still having an RDBMS at the core.

This can make sense if your application has relatively low write throughput, because all writes can be sent to the single location RDBMS, but be prepared for read-after-write race conditions. Solutions to this tend to be pretty crufty.
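Sketching the write path under those assumptions (the channel name, payload and worker behaviour are all invented for illustration):

import json

import redis

r = redis.StrictRedis(host="localhost", port=6379)

def update_book_title(book, new_title):
    # All writes go to the single-location master RDBMS.
    book.title = new_title
    book.save()
    # Fan out on write: tell the cache updaters in each location to
    # rebuild their local precomputed values. Until they do, remote
    # readers may see the old title - the read-after-write race
    # mentioned above.
    r.publish("book-updates", json.dumps({"id": book.id,
                                          "title": new_title}))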

Reason 3: Application semantics vs SQL

NoSQL databases tend not to have an abstraction like SQL. SQL is decent in its core areas, but it is often really hard to encapsulate some important application semantics in SQL.

A good example of this is asynchronous access to data as part of calculations. It’s not uncommon to need to query external services, but SQL really isn’t set up for this. Although there are some hacky workarounds, if you have a microservice architecture you may find SQL really doesn’t do what you need.

Another example is staleness policies.  These are particularly problematic when you have distributed systems with parts implemented in other languages such as Javascript, for example if your client is a browser or a mobile application and it encapsulates some business logic.

Endpoint caches in browsers and mobile apps need to honour the same staleness policies as your backing store, so you end up implementing the same policies in Javascript and then again in SQL, and maintaining both. These are hard to maintain and test at the best of times. If you can implement them in fewer places, or fewer languages, that is a significant advantage.

In addition, it is a practical case that we’re not all SQL gurus. Having something that is suboptimal in some cases but where we are practically able to exploit it more cheaply is a rational economic tradeoff.  It may make sense to use a key/value store just because of the different semantics it provides – but be aware of how much you are losing without including an RDBMS, and don’t be surprised if you end up reintroducing one later as a platform for analysis of your key/value data.

Reason 4: Load patterns

NoSQL systems can exhibit very different performance characteristics from SQL systems under real loads. Having some choice in where load falls in a system is sometimes useful.

For example, if you have something that scales front-end webservers horizontally easily, but you only have one datastore, it can be really useful to have the load occur on the application servers rather than the datastore – because then you can distribute load much more easily.

Although this is potentially less efficient, it’s far easier, and often cheaper, to spin up more application servers at times of high load than it is to scale a database server on the fly.

Also, SQL databases tend to have far better read performance than write performance, so fan-out on write (where you might have 90% writes to 10% reads as a typical load pattern) is probably better implemented using a different backing store that has different read/write performance characteristics.

Which backing store to use, and how to use it, is the kind of decision that can have huge ramifications for every part of a system.  This post has only had an opportunity to scratch the surface of this subject and I know I’ve not given some parts of it the justice they deserve – but hopefully it’s clear that every decision has tradeoffs and there is no right answer for every system.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

A Different View (part 2)

In the previous post, we saw how we could use a single, raw query in Django to combat ORMs’ tendency to generate query explosions within loops. We used raw() with joins and some column renaming to ensure all the data we needed came back in one go. We had to modify the property names in the template slightly (e.g. book.jacket.image became book.jacket_image) and the result was a RawQuerySet which had a fair number of limitations but, we avoided the typical order-of-magnitude increase in query count.

Super models

[Image: supermodel, by Jamie Beck and Kevin Burg]

There is a way we can use a raw query, to get all the benefits described in the previous post, and return a real QuerySet. We need a Django model and so need an underlying table. A much underused feature of SQL databases is the view. Views are virtual tables derived from queries. Most are currently implemented as read-only, but they provide many benefits – again another blog post would be needed to explain them all – but they include:

  • adding a level of abstraction, effectively an API (which is also available to other applications)
  • protecting the application from underlying table and column changes
  • giving fine-grained control of permissions, especially row-level
  • providing a way to efficiently add calculated columns such as line_total = quantity * price
  • avoiding de-normalising things which should remain derived from other data
  • avoiding the not-insubstantial cost of repeatedly sending the same complicated queries over the network
  • enabling the database to pre-compile and cache the privilege-checks and query plans

So we could define a database view (with a few extra foreign-key columns for use later):

CREATE VIEW vbook AS
SELECT
  book.id,
  book.id AS book_id,
  book.title,
  jacket.id AS jacket_id,
  jacket.image AS jacket_image,
  author.id AS author_id,
  author.name AS author_name,
  shelf."position"
FROM
  book
  NATURAL JOIN shelf
  NATURAL JOIN author
  NATURAL JOIN jacket

and then map a Django model to it, using Meta.db_table = 'vbook' and setting Meta.managed = False to tell Django not to maintain the table definition.

(note again the use of natural join for simplicity (and short table names) and that they don’t actually work as-is with Django’s table/column naming)
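As a rough sketch, the mapped model might look something like this (the field types are guesses based on the view’s columns, not taken from a real project):

from django.db import models

class VBook(models.Model):
    book_id = models.IntegerField()
    title = models.CharField(max_length=100)
    jacket_image = models.ImageField(null=True)
    author_name = models.CharField(max_length=100)
    position = models.IntegerField()

    class Meta:
        db_table = 'vbook'  # point the model at the view...
        managed = False     # ...and tell Django not to maintain it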

We’ve done this in the past for some of our projects, and shoe-horning the view-creation script into Django/buildout was always the fiddly bit.

Recently, we used django-postgres which makes the model-view mapping a bit simpler. You subclass pg.View instead of models.Model and set the sql property to your query, e.g.

class vBook(pg.View):
    title = models.CharField(max_length=100)
    jacket_image = models.ImageField(null=True)
    author_name = models.CharField(max_length=100)
    shelf_position = models.IntegerField()
 
    sql = BOOK_DETAILS

manage.py sync_pgviews then creates the views for the models in the specified application. It’s quite a lightweight tool, and simply builds and issues the SQL CREATE VIEW statements needed to wrap each view-based model’s sql query.

(There is a Kickstarter project under way which should also add view handling to Django)

We now have an almost fully-fledged Django model. We can query the objects against an effectively de-normalised table, while still retaining an essentially normalised schema. We can chain refinement methods to the resulting QuerySet to further filter and order the objects, including select_related() if necessary (although it would always be preferable, given time, to put the extra join into the view). We can have custom object managers, Meta options, introspection and all the other features of a Django model, even some read-only admin which would be good for reporting.

Note that we removed the

WHERE shelf.position <= 10

from the view definition. We can define the view without such filters and leave it to the Django application to apply them, e.g.

vbooks = vBook.objects.filter(shelf__position__lte = 10)

This gives a more flexible model, more akin to the original Book model, although you could keep the WHERE clauses in if you wanted to logically partition a table into multiple views/models.

Agile

Another great reason for using such view-based models is to postpone decisions about optimising bits of the system before we know anything about it. As an example, let’s say that a book needs to have an average rating that should be calculated from the entries in a ratings table. This could accurately be calculated using SQL’s AVG function, but that might be expensive and so we might need to pre-calculate it and possibly store it outside of the database. That would require us to design, build, test, document, and maintain:

  • an external data source capable of handling the unknown load and volume
  • a way to link a book to the external data source, possibly using new drivers
  • a way of updating the external data source for a book, duplicating the rating table entries
  • a way to synchronise the two data storage systems, possibly involving cron jobs and queues
  • a way to build and sustain the data source and links in whatever production environment we use, e.g. AWS

All of this would add complexity, up-front R&D and ongoing maintenance. The results would not be as fresh or reliable as the database results, and not necessarily any faster. Using AVG in a view leaves us scope to replace how it is calculated at a later date, once we know more about the performance.
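For example, the calculated column could be added to the view with a correlated sub-select along these lines (the rating table and its score column are assumptions for illustration):

CREATE VIEW vbook AS
SELECT
  book.id,
  book.title,
  (SELECT AVG(rating.score)
     FROM rating
    WHERE rating.book_id = book.id) AS average_rating
FROM
  book
  NATURAL JOIN shelf
  NATURAL JOIN author
  NATURAL JOIN jacket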

Often, caching the book results using Django’s cache is enough and we can stick with the basic approach. Premature optimisation could well be a huge waste of effort, and the view-based models let us defer those decisions and get started very quickly.

Abstract

[Image: Picasso – Skull and Pitcher]

Defining views early in a project could also be a way to postpone building complex parts of the system. Let’s say a book has a complicated ‘cost’ attribute and we’re not sure yet how it will be calculated or where all the data will come from, but we need to start displaying it during the initial iterations. We could add the column to the book view on day one and worry about where it comes from later, e.g.

CREATE VIEW vbook AS
SELECT
  book.id,
  book.id AS book_id,
  book.title,
  6.283185 AS cost, --todo: use a cost function here instead
  ...

And then vbook.cost can be used in the knowledge that the reference won’t change. Also, if the cost calculation is defined within the view and it needs to change post-production, the view can be recreated while the application is running with no migration or down-time.

More modelling

We can further enhance the view-based model and add relationships to give us back the all-too-convenient dot-notation – still useful if we’re careful to use it outside of large loops. We should make sure any relationships don’t give Django the impression that it needs to try to maintain integrity – it doesn’t need to since the underlying table only has virtual rows. We can do this using on_delete=models.DO_NOTHING, e.g.

class vBook(pg.View):
    title = models.CharField(max_length=100)
    jacket_image = models.ImageField(null=True)
    author_name = models.CharField(max_length=100)
    shelf_position = models.IntegerField()
 
    author = models.ForeignKey(Author,
                               on_delete=models.DO_NOTHING)
    # needs author_id to be returned by the sql

    sql = BOOK_DETAILS

These are, of course, complete Django relationships and so we can access them from any direction, e.g.

my_author.vbook_set.all()

would return an author’s books as vBook objects using a single extra query. You could even go further and use select_related('vbook') when getting the author. Django treats these models just like any other.

We can’t save or create using such view-based models – the database will reject that for anything but the simplest views, so we still use the underlying table-based models to do that. But we can link the models together, with a one-to-one relationship, to make things easier, e.g.

class vBook(pg.View):
    book = models.OneToOneField(Book,
                                on_delete=models.DO_NOTHING)
    # needs book_id to be returned by the sql

    title = models.CharField(max_length=100)
    jacket_image = models.ImageField(null=True)
    author_name = models.CharField(max_length=100)
    shelf_position = models.IntegerField()
 
    author = models.ForeignKey(Author,
                               on_delete=models.DO_NOTHING)
    # needs author_id to be returned by the sql
 
    sql = BOOK_DETAILS

Now, given a vbook instance, we can update the underlying book, for example:

vbook.book.title = 'A Tale of Two Cities'
vbook.book.save()

And also, given a book instance, we can ask for all the pre-joined book information:

book.vbook.author_name

There are still some questions about how best to use these models, e.g. how to share methods with underlying table-based models and how best to name fields from other models.

Deferred Optimisation

Once we’ve defined view-based models to bring together the core concepts of the system, we can build the application to use them for reading and the base-table models for writing. We then have a variety of options to greatly optimise the system at a later date.

Here are some ways we can optimise database views. The important points are:

  • just by using views to join tables, we’ve already made great savings in the number of database queries needed
  • the ideas below can be applied later, once we know if and where further optimisation is needed
  • none of these would require any changes to the application code using the data: they’re true implementation changes, and database views give us such a clear separation of interface and implementation that the application could stay running!

Pull from other tables

For example, at a later point, we could modify the view to join to a table containing pre-calculated ratings and return that instead of the AVG sub-select.

Pull from external systems

We could use PostgreSQL’s foreign data wrappers to join to external data sources. It might be useful in itself to map a Django model to such a foreign table without an intermediate view, if all the data is foreign.

Use database functions

Database functions can be written in a number of languages including PL/pgSQL, Python, Ruby and Javascript. They are pre-compiled and very efficient, not least because they can access the data without it needing to be returned first, and, being in situ, they can filter out unwanted results.

Load-balance across read-only replicas

PostgreSQL has the concept of standby/slave servers that can be kept synchronised with a master server. These can be made available for read-only queries to spread the load. Since we can be sure that view-based models are only used for reading, we could use Django’s database router to route all queries against those models to the slave servers.
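A sketch of such a router, assuming a 'replica' entry in the DATABASES setting and some way of recognising view-based models (the _is_view_model marker here is invented for the example):

class ViewReplicaRouter(object):
    # Route reads of view-based models to a read-only replica;
    # returning None means "no opinion", so other routers or the
    # default database are used.

    def db_for_read(self, model, **hints):
        if getattr(model, '_is_view_model', False):  # hypothetical marker
            return 'replica'
        return None

    def db_for_write(self, model, **hints):
        return None  # writes always go to the default database

The router would then be listed in the DATABASE_ROUTERS setting.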

Materialise

Views are virtual, except when they’re not. PostgreSQL 9.3 added materialised views.

CREATE MATERIALIZED VIEW vbook AS
SELECT ...

These are still declared as derived tables but the data in them is physically stored for much faster access: the results are effectively cached in the database. These are useful if the application can handle some staleness, and the only change necessary is in the way the view is declared – the application needn’t know or care. Such views can be refreshed periodically (although before 9.4 the refresh takes out a table lock). Indexes can be added to materialised views, potentially giving massive performance increases, especially for calculated columns.
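Refreshing is a single statement, e.g.

REFRESH MATERIALIZED VIEW vbook;

-- PostgreSQL 9.4+, requires a unique index on the view:
REFRESH MATERIALIZED VIEW CONCURRENTLY vbook;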

Although I think materialisation is a last resort, it’s a very powerful one. I’ve added a MaterializedView class to isotoma/django-postgres which should help create them, though it could do with some more testing and options to control refreshing.

This really gives the best of both worlds: a single query can provide Django objects with the speed expected from de-normalised storage, but derived by the database from a normalised schema.

Summary

Django’s ORM makes accessing related objects simple and convenient. However, when using an ORM, accessing related objects in loops often leads to an explosion of supporting queries, which can go unnoticed during development but which can lead to poor performance. Django’s ORM has some methods that try to alleviate the problem but they have limitations. We can use raw SQL to efficiently join related information in the database to avoid these query explosions, as well as giving us more powerful ways to group and summarise data. I think these SQL queries should be given more prominence in our projects.

We can go further and push the raw SQL into the database by declaring virtual tables (database views). We can then map Django models onto these virtual tables to also give an extra layer of abstraction on top of the base models. This lets us defer implementation decisions and provides lots of ways to optimise the system at a later stage, once we know more about its performance, without affecting the application code.


About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

A Different View


One of the consequences of the mismatches between databases and ORMs is that iterating over objects can generate explosions of database queries. Often, these explosions go unnoticed because the individual queries are fast and the number of concurrent users is small, especially during development. The explosions are sometimes hard to detect, either because developers don’t look for them or the tools that can expose the explosions don’t report everything (e.g. the Django debug toolbar doesn’t report the queries generated by ajax calls).

Exploding

Django does a good job of lazily evaluating what it can, but the simple act of using the convenient dot-notation inside a loop can turn a single database query into a cascade of supporting queries.

For example, given this book model:

class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.ForeignKey(Author)
    jacket = models.ForeignKey(Jacket)
    shelf = models.ForeignKey(Shelf)

and this books query:

books = Book.objects.filter(shelf__position__lte = 10)

which returns 10 rows. Then this template:

{% for book in books %}
  {{ book.jacket.image }} {{ book.title }} 
  by {{ book.author.name }}
{% endfor %}

would issue a single query to get all 10 rows, but would then generate 2 additional queries per iteration to get the jacket and the author for each book. So a total of 21 queries. You really need to use the Django debug toolbar to realise this – install it now and keep your eye on it until the very end of the project: it only takes a late-breaking, well-intentioned dot to cause an explosion.

A good way to detect when explosions happen is to check the number of queries in your unit tests.

self.assertNumQueries(self.BOOKS_QUERY_COUNT, self.client.get, reverse('books'))

Add these at the start of your project. It means you have to keep increasing the counts while developing, but it does make it obvious when something has exploded.

Remember, every one of those queries involves the following steps:

  • create a connection to the database (tcp/ip – handshake – username, password) (set CONN_MAX_AGE in Django 1.6 to help reduce this overhead)
  • send the SQL query over the network
  • parse the SQL query
  • check the table and column read/write permissions
  • optimise the query plan based on the latest statistics
  • start a transaction
  • perform the query (checking for other users’ locks)
  • return the resulting metadata and rows over the network
  • commit the transaction
  • close the connection

Joining

A far better approach is to get the database to do the work. It can join together the jacket and author information much more efficiently (orders of magnitude more efficiently – but that’s another topic).

Django added select_related() to try to address this problem. Given a field name, it passes the required table to the database to be joined with the original queryset, e.g.

books = Book.objects.filter(shelf__position__lte = 10
                           ).select_related('jacket',
                                            'author')

It is limited though, since it only works with some relationships in some directions, and it pulls in all the columns which is usually very wasteful, especially if large text fields are present. Adding defer() or only() to work around this can help, but isn’t pretty, e.g.

books = Book.objects.filter(shelf__position__lte = 10
                           ).select_related('jacket', 'author'
                           ).defer('author__biography',
                                   'author__summary',
                                   'author__resume')

prefetch_related() was added later to help with more complex relationships but still issues multiple queries and does the ‘joining’ in Python. Besides that, it also has a number of caveats and special interactions with other methods. Trying to combine these with annotations for aggregate counts and such would lead to some fragile, hard-to-construct and, I think, hard-to-read syntax that can fail even for slightly complicated queries.

So instead, we could issue the following query:

books = Book.objects.raw("""
  SELECT
    book.id, book.title,
    jacket.image AS jacket_image,
    author.name AS author_name
  FROM book
    NATURAL JOIN shelf NATURAL JOIN author NATURAL JOIN jacket
  WHERE shelf.position <= 10
""")

(note: I’ll use natural join throughout for simplicity, and short table names for that matter, but they don’t actually work as-is with Django’s table/column naming)

And then, a slightly modified template:

{% for book in books %}
  {{ book.jacket_image }} {{ book.title }} 
  by {{ book.author_name }}
{% endfor %}

would only issue 1 query, regardless of the number of books.

Adding further related information to the original template will dramatically increase the number of queries, whereas joining another table to the raw SQL version will add none. Similarly, returning 20 books instead of 10 would just about double the number of queries for the original version (so 41 for our example), whereas the raw one would still only issue a single query.

As well as avoiding query explosions, there are other benefits to using raw SQL. A full explanation needs another blog post, but some of the major ones are:

  • it’s the only way to do anything but the simplest of aggregation queries.
  • it only returns what is required (assuming you avoid SELECT *). Pulling every column, including large blobs, over the network when they’re not referenced wastes CPU, temporary sort/cache space and network bandwidth.
  • you can use easy-to-read comparisons (e.g. position <= 10 instead of shelf__position__lte = 10).
  • if you need/want, you can use database-specific features, such as function calls, range queries, NULL ordering, etc.
  • checking the query plan is easier – there’s no need to set a breakpoint and then print the queryset.query.
  • it returns consistent results, whereas multiple queries, even in the same transaction, can see different data depending on what’s been committed by concurrent users (unless we used the serializable isolation level, which we don’t).

Formatting

I’ve tried a number of ways to embed SQL queries into Django modules:

...raw("""
SELECT ...
FROM ...
WHERE ...
""")

is jarring.

...raw("""SELECT ...
          FROM ...
          WHERE ...
       """)

is better, but adds too much whitespace and makes it harder to edit.

...raw("""SELECT ..."""
       """FROM ..."""
       """WHERE ...""")

avoids the extra whitespace but doesn’t ensure enough whitespace unless you’re careful to add a space at the end of each line.

None are great and none of them provide an easy way to copy and paste the query, which is necessary to tune it and check its plan in a database client.

Elevating

For a better approach, I suggest creating an sql directory (alongside models.py – it is that important) and creating python modules in there to hold the named query declarations, e.g. sql/books.py:

BOOK_DETAILS = """
SELECT
  book.id,
  book.title,
  jacket.image AS jacket_image,
  author.name AS author_name
FROM
  book
  NATURAL JOIN shelf
  NATURAL JOIN author
  NATURAL JOIN jacket
WHERE
  shelf.position <= 10
"""

then it can be called like this:

from sql.books import BOOK_DETAILS
 
books = Book.objects.raw(BOOK_DETAILS)

Clicking the BOOK_DETAILS reference in any decent IDE will take you to the query. Now that the query has a bit more respect, it can more easily be copied and pasted into other tools, formatted as you’d like, and re-used without repetition.

One more note about formatting. Although you could put Python comments before the declaration, I suggest keeping the query comments inside the query because they may be useful on the server, e.g. when viewing the database logs. So instead of:

#Get top 10 book summaries
#(including author and jacket details)
BOOK_DETAILS = """
SELECT
...
"""

use:

BOOK_DETAILS = """
/*
Get top 10 book summaries
(including author and jacket details)
*/
SELECT
...
"""

Downsides

There are some downsides with raw(), however.

  • You need some SQL knowledge of course, and an understanding of how Django links tables together, but I think you should have that anyway.
  • If your SQL does introduce some non-standard syntax (which shouldn’t be the case for straightforward joins) it could make your application less portable.  Although I’d argue that some features are worth that sacrifice, and would hope the likelihood of switching to a different database would be low if you’re already using a good one.
  • Even though raw() ensures only 1 query is used, that query must still be sent over the network and parsed every time it’s run.
  • The real problem is that raw() returns a RawQuerySet, not a QuerySet, so no further refinement methods can be applied (e.g. filter, order_by) – though arguably they’re better off being added to the SQL.

We can do better, as I’ll explain in part 2.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

Using mock.patch in automated unit testing

Mocking is a critical technique for automated testing. It allows you to isolate the code you are testing, which means you test what you think you are testing. It also makes tests less fragile because it removes unexpected dependencies.

However, creating your own mocks by hand is fiddly, and some things are quite difficult to mock unless you are a metaprogramming wizard. Thankfully Michael Foord has written a mock module, which automates a lot of this work for you, and it’s awesome. It’s included in Python 3, and is easily installable in Python 2.

Since I’ve just written a test case using mock.patch, I thought I could walk through the process of how I approached writing the test case and it might be useful for anyone who hasn’t come across this.

It is important to decide when you approach writing an automated test what level of the system you intend to test. If you think it would be more useful to test an orchestration of several components then that is an integration test of some form and not a unit test. I’d suggest you should still write unit tests where it makes sense for this too, but then add in a sensible sprinkling of integration tests that ensure your moving parts are correctly connected.

Mocks can be useful for integration tests too, however the bigger the subsystem you are mocking the more likely it is that you want to build your own “fake” for the entire subsystem.

You should design fake implementations like this as part of your architecture, and consider them when factoring and refactoring. Often the faking requirements can drive out some real underlying architectural requirements that are not clear otherwise.

Whereas unit tests should test very limited functionality, I think integration tests should be much more like smoke tests and exercise a lot of functionality at once. You aren’t interested in isolating specific behaviour, you want to make it break. If an integration test fails, and no unit tests fail, you have a potential hotspot for adding additional unit tests.

Anyway, my example here is a Unit Test. What that means is we only want to test the code inside the single function being tested. We don’t want to actually call any other functions outside the unit under test. Hence mocking: we want to replace all function calls and external objects inside the unit under test with mocks, and then ensure they were called with the expected arguments.

Here is the code I need to test, specifically the ‘fetch’ method of this class:

class CloudImage(object):
 
    __metaclass__ = abc.ABCMeta
 
    blocksize = 81920
    def __init__(self, pathname, release, arch):
        self.pathname = pathname
        self.release = release
        self.arch = arch
        self.remote_hash = None
        self.local_hash = None
 
    @abc.abstractmethod
    def remote_image_url(self):
        """ Return a complete url of the remote virtual machine image """
 
    def fetch(self):
        remote_url = self.remote_image_url()
        logger.info("Retrieving {0} to {1}".format(remote_url, self.pathname))
        try:
            response = urllib2.urlopen(remote_url)
        except urllib2.HTTPError:
            raise error.FetchFailedException("Unable to fetch {0}".format(remote_url))
        local = open(self.pathname, "w")
        while True:
            data = response.read(self.blocksize)
            if not data:
                break
            local.write(data)

I want to write a test case for the ‘fetch’ method. I have elided everything in the class that is not relevant to this example.

Looking at this function, I want to test that:

  1. The correct URL is opened
  2. If an HTTPError is raised, the correct exception is raised
  3. Open is called with the correct pathname, and is opened for writing
  4. Read is called successive times, and that everything returned is passed to local.write, until a False value is returned

I need to mock the following:

  1. self.remote_image_url()
  2. urllib2.urlopen()
  3. open()
  4. response.read()
  5. local.write()

This is an abstract base class, so we’re going to need a concrete implementation to test; my test module therefore contains one. I’ve implemented the other abstract methods, but they’re not shown.

class MockCloudImage(base.CloudImage):
    def remote_image_url(self):
        return "remote_image_url"

Because there are other methods on this class I will also be testing, I create an instance of it in setUp as a property of my test case:

class TestCloudImage(unittest2.TestCase):

    def setUp(self):
        self.cloud_image = MockCloudImage("pathname", "release", "arch")

Now I can write my test methods.

I’ve mocked self.remote_image_url, now I need to mock urllib2.urlopen() and open(). The other things to mock are things returned from these mocks, so they’ll be automatically mocked.

Here’s the first test:

    @mock.patch('urllib2.urlopen')
    @mock.patch('__builtin__.open')
    def test_fetch(self, m_open, m_urlopen):
        m_urlopen().read.side_effect = ["foo", "bar", ""]
        self.cloud_image.fetch()
        self.assertEqual(m_urlopen.call_args, mock.call('remote_image_url'))
        self.assertEqual(m_open.call_args, mock.call('pathname', 'w'))
        self.assertEqual(m_open().write.call_args_list, [mock.call('foo'), mock.call('bar')])

The mock.patch decorators replace the specified functions with mock objects within the context of this function, and then unmock them afterwards. The mock objects are passed into your test, in the order in which the decorators are applied (bottom to top).

Now we need to make sure our read calls return something useful. Retrieving any property or method from a mock returns a new mock, and the new returned mock is consistently returned for that method. That means we can write:

m_urlopen().read

To get the read call that will be made inside the function. We can then set its “side_effect” – what it does when called. In this case, we pass it an iterable and it will return each of those values on successive calls.

Now we can call our fetch method, which will terminate because read eventually returns an empty string.

Now we just need to check each of our methods was called with the appropriate arguments, and hopefully that’s pretty clear how that works from the code above. It’s important to understand the difference between:

m_open.call_args

and

m_open().write.call_args_list

The first holds the arguments passed to “open(…)”. The second is the list of arguments passed, call by call, to:

local = open(); local.write(...)

Another test method, testing the exception case, is very similar:

    @mock.patch('urllib2.urlopen')
    @mock.patch('__builtin__.open')
    def test_fetch_httperror(self, m_open, m_urlopen):
        m_urlopen.side_effect = urllib2.HTTPError(*[None] * 5)
        self.assertRaises(error.FetchFailedException, self.cloud_image.fetch)

You can see I’ve created an instance of the HTTPError exception class (with dummy arguments) and set it as the side_effect of calling urlopen().

Now we can assert our method raises the correct exception.

Hopefully you can see how the mock.patch decorator saved me a spectacular amount of grief.

If you need to, mock.patch can be used as a context manager as well, using “with”, giving similar behaviour. This is particularly useful in setUp methods, where you can use the with block to construct the system under test against mocked dependencies, without the patch being applied globally.
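
As an illustration, here’s a sketch of the first test rewritten in the context-manager form; it behaves the same as the decorated version above:

    def test_fetch_inline(self):
        with mock.patch('urllib2.urlopen') as m_urlopen:
            with mock.patch('__builtin__.open') as m_open:
                m_urlopen().read.side_effect = ["foo", "bar", ""]
                self.cloud_image.fetch()
                self.assertEqual(m_open.call_args, mock.call('pathname', 'w'))
        # Both patches are removed as soon as their with blocks exit,
        # so nothing leaks into other tests.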

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

What Heartbleed means for you

On the 8th April a group of security researchers published information about a newly discovered exploit in a popular encryption library. With some marketing panache, they called this exploit “Heartbleed”.

A huge number of Internet services were vulnerable to this exploit, and although many of them have now been patched, many remain vulnerable. In particular, the bug was in an open source library used by many of the very largest and most popular sites, which were directly affected.

Attention on the exploit has so far focused on the possible use of the exploit to obtain “key material” from affected sites, but there are some more immediate ramifications and you need to act to protect yourself.

Unfortunately the attack will also reveal other random bits of the webserver’s memory, which can include usernames, passwords and cookies. Obtaining this information will allow attackers to log into these services as you, and then conduct the more usual kinds of fraud and identity theft.

Once the dust has settled (so later today on the 9th, or tomorrow on the 10th) you should go and change every single one of your passwords. Start with the passwords you’ve used recently and with high-value services.

It’s probably a good idea to clear all your cookies too once you’ve done this, to force you to re-login to every service with your new password.

You should also log out of every single service on your phone, and then log back in, to get new session cookies. If you are particularly paranoid, wipe your phone and reinstall. Mobile app session cookies are likely to be a very popular vector for this attack.

This is an enormous amount of work, but you can use it as an opportunity to set some decent random passwords for every service and adopt a tool like LastPass, 1Password or KeePass while you are at it.

Most people are hugely vulnerable to password disclosure because they share passwords between accounts, and the entire world of black-hats is out there right now slurping passwords off every webserver they can get them from. There is going to be a huge spike in fraud and identity theft soon, and you want to make sure you are not a victim of it.

The Man-In-The-Middle Attack

In simple terms, the exploit can reveal a site’s private SSL keys, allowing an attacker to impersonate the site: they can show the padlock icon that demonstrates a secure connection, even though they control your connection.

They can only do this if they also manage to somehow make your browser connect to their computers for the request. This can normally only be done by either controlling part of your connection directly (hacking your router maybe), or by “poisoning” your access to the Domain Name Service with which you find out how to reach a site (there are many ways to do this, but none of them are trivial).

You can expect Internet security types to be fretting about this one for a long time to come, and there are likely to be some horrific exploits against some high-profile sites executed by some of the world’s most skilled hackers. If they do it well enough, we may never hear of it.

This exploit is going to have huge ramifications for server operators and system designers, but in practical terms there is very little most people can do to mitigate the risk for their own browsing.


Reviewing Django REST Framework

Recently, we used Django REST Framework to build the backend for an API-first web application. Here I’ll attempt to explain why we chose REST Framework and how successfully it helped us build our software.

Why Use Django REST Framework?

RFC-compliant HTTP Response Codes

Clients (javascript and rich desktop/mobile/tablet applications) will more than likely expect your REST service endpoint to return status codes as specified in the HTTP/1.1 spec. Returning a 200 response containing {‘status’: ‘error’} goes against the principles of HTTP and you’ll find that HTTP-compliant javascript libraries will get their knickers in a twist. In our backend code, we ideally want to raise native exceptions and return native objects; status codes and content should be inferred and serialised as required.

If authentication fails, REST Framework serves a 401 response. Raise a PermissionDenied and you automatically get a 403 response. Raise a ValidationError when examining the submitted data and you get a 400 response. POST successfully and get a 201, PATCH and get a 200. And so on.
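
For instance, here’s a minimal sketch – the endpoint and the permission name are hypothetical – of raising a native exception and letting REST Framework infer the response:

from django.core.exceptions import PermissionDenied
from rest_framework import viewsets
from rest_framework.response import Response

class BasketViewSet(viewsets.ViewSet):
    def retrieve(self, request, pk=None):
        # 'shop.view_basket' is a made-up permission for this sketch
        if not request.user.has_perm('shop.view_basket'):
            raise PermissionDenied  # serialised for us as a 403
        return Response({'id': pk})  # a plain 200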

Methods

You could PATCH an existing user profile with just the field that was changed in your UI, DELETE a comment, PUT a new shopping basket, and so on. HTTP methods exist so that you don’t have to encode the nature of your request within the body of your request. REST Framework has support for these methods natively in its base ViewSet class which is used to build each of your endpoints; verbs are mapped to methods on your view class which, by default, are implemented to do everything you’d expect (create, update, delete).

Accepts

The base ViewSet class looks at the Accept header and encodes the response accordingly. You need only specify which formats you wish to support in your settings.py.
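
As a sketch, that configuration amounts to something like this in settings.py (the renderer list itself is whatever your API needs to support):

REST_FRAMEWORK = {
    'DEFAULT_RENDERER_CLASSES': [
        'rest_framework.renderers.JSONRenderer',
        # XMLRenderer shipped with the releases current at the time of writing
        'rest_framework.renderers.XMLRenderer',
    ],
}

This is the same list that the browsable API snippet further down appends to.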

Serializers are not Forms

Django Forms do not provide a sufficient abstraction to handle object PATCHing (only PUT) and cannot encode more complex, nested data structures. The latter limitation lies with HTTP, not with Django Forms; HTTP forms cannot natively encode nested data structures (both application/x-www-form-urlencoded and multipart/form-data rely on flat key-value formats). Therefore, if you want to declaratively define a schema for the data submitted by your users, you’ll find life a lot easier if you discard Django Forms and use REST Framework’s Serializer class instead.

If the consumers of your API wish to use PATCH rather than PUT, and chances are they will, you’ll need to account for that in your validation. The REST Framework ModelSerializer class adds fields that map automatically to Model Field types, in much the same way that Django’s ModelForm does. Serializers also allow nesting of other Serializers for representing fields from related resources, providing an alternative to referencing them with a unique identifier or hyperlink.
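
A sketch of that nesting, assuming hypothetical Basket and Item models:

from rest_framework import serializers
from shop.models import Basket, Item  # hypothetical models

class ItemSerializer(serializers.ModelSerializer):
    class Meta:
        model = Item
        fields = ('id', 'name', 'price')

class BasketSerializer(serializers.ModelSerializer):
    # Embed full item representations, rather than ids or hyperlinks
    items = ItemSerializer(many=True, read_only=True)

    class Meta:
        model = Basket
        fields = ('id', 'owner', 'items')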

More OPTIONS

Should you choose to go beyond an AJAX-enabled site and implement a fully-documented, public API then best practice and an RFC or two suggest that you make your API discoverable by allowing OPTIONS requests. REST Framework allows an OPTIONS request to be made on every endpoint, for which it examines request.user and returns the HTTP methods available to that user, and the schema required for making requests with each one.

OAuth2

Support for OAuth 1 and 2 is available out of the box and OAuth permissions, should you choose to use them, can be configured as a permissions backend.

Browsable

REST framework provides a browsable HTTP interface that presents your API as a series of forms that you can submit to. We found it incredibly useful for development but found it a bit too rough around the edges to offer as an aid for third parties wishing to explore the API. We therefore used the following snippet in our settings.py file to make the browsable API available only when DEBUG is set to True:

if DEBUG:
    REST_FRAMEWORK['DEFAULT_RENDERER_CLASSES'].append(
        'rest_framework.renderers.BrowsableAPIRenderer'
    )

Testing

REST Framework gives you an APITestCase class which comes with a modified test client. You give this client a dictionary and encoding and it will serialise the request and deserialise the response. You only ever deal in python dictionaries and your tests will never need to contain a single instance of json.loads.
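
A sketch of what that looks like in practice (the endpoint and payload are hypothetical):

from rest_framework import status
from rest_framework.test import APITestCase

class BasketTests(APITestCase):
    def test_create_basket(self):
        # The client serialises the dict; we read the deserialised
        # response via response.data – no json.loads in sight
        response = self.client.post('/baskets/', {'owner': 1}, format='json')
        self.assertEqual(response.status_code, status.HTTP_201_CREATED)
        self.assertEqual(response.data['owner'], 1)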

Documentation

The documentation is of a high quality. It copies the Django project’s three-pronged approach to documentation – tutorial, topics, and API reference – so Django buffs will find it familiar and easy to parse. The tutorial quickly gives readers a feeling of accomplishment, the high-level topic-driven core of the documentation allows readers to quickly build a solid understanding of how the framework should be used, and the method-by-method API documentation is very detailed, frequently offering examples of how to override existing functionality.

Project Status

At the time of writing the project remains under active development. The roadmap is fairly clear and the chap in charge has a solid grasp of the state of affairs. Test coverage is good. There’s promising evidence in the issue history that creators of useful but non-essential components are encouraged to publish their work as new, separate projects, which are then linked to from the REST Framework documentation.

Criticisms

Permissions

We found that writing permissions was messy and we had to work hard to avoid breaking DRY. An example is required. Let’s define a ViewSet representing both a resource collection and any document from that collection:

views.py:

class JobViewSet(ViewSet):
    """
    Handles both URLs:
    /jobs
    /jobs/(?P<id>\d+)/$
    """
    serializer_class = JobSerializer
    permission_classes = (IsAuthenticated, JobPermission)

    def get_queryset(self):
        if self.request.user.is_superuser:
            return Job.objects.all()

        return Job.objects.filter(
            Q(applications__user=self.request.user) |
            Q(reviewers__user=self.request.user)
        )

If the Job collection is requested, the queryset from get_queryset() will be run through the serializer_class and returned in an HTTP response with the requested encoding.

If a Job item is requested and it is in the queryset from get_queryset(), it is run through the serializer_class and served. If a Job item is requested and is not in the queryset, the view returns a 404 status code. But we want a 403.

So if we define that JobPermission class, we can fail the object permission test, resulting in a 403 status code:

permissions.py:

class JobPermission(permissions.BasePermission):
    def has_object_permission(self, request, view, obj):
        return obj in Job.objects.filter(
            Q(applications__user=request.user) |
            Q(reviewers__user=request.user))

We have duplicated the logic from the view’s get_queryset() method (we could admittedly reuse view.get_queryset(), but the method and its underlying query would still be executed twice); yet if we don’t duplicate it, the client is sent a completely misleading response code.

The neatest way to solve this issue seems to be to use the DjangoObjectPermissionsFilter together with the django-guardian package. Not only will this allow you to define object permissions independently of your views, it’ll also allow you filter querysets using the same logic. Disclaimer: I’ve not tried this solution, so it might be a terrible thing to do.

Nested Resources

REST Framework is not built to support nested resources of the form /baskets/15/items. It requires that you keep your API flat, of the form /baskets/15 and /items/?basket=15.
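
The flat layout falls out naturally from the router; a sketch, with hypothetical ViewSets:

from rest_framework import routers

router = routers.DefaultRouter()
# Assumes each ViewSet defines a queryset; otherwise pass base_name
router.register(r'baskets', BasketViewSet)  # /baskets/ and /baskets/15/
router.register(r'items', ItemViewSet)      # /items/?basket=15 via a filter

urlpatterns = router.urls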

We did eventually choose to implement some parts of our API using nested URLs; however, it was hard work, and we had to alter public method signatures and the data types of public attributes within our subclasses. We required heavily modified Router, Serializer, and ViewSet classes. It is worth noting that REST Framework deserves praise for making each of these components so pluggable.

Very specifically, the biggest issue preventing us pushing our nested resources components upstream was REST Framework’s decision to make lookup_field on the HyperlinkedIdentityField and HyperlinkedRelatedField a single string value (e.g. “baskets”). To support any number of parent collections, we had to create a NestedHyperlinkedIdentityField with a new lookup_fields list attribute, e.g. ["baskets", "items"].

Conclusions

REST Framework is great. It has flaws but continues to mature as an increasingly popular open source project. I’d whole-heartedly recommend that you use it for creating full, public APIs, and also for creating a handful of endpoints for the bits of your site that need to be AJAX-enabled. It’s as lightweight as you need it to be and most of what it does, it does extremely well.


Django Class-Based Generic Views: tips for beginners (or things I wish I’d known when I was starting out)

Django is renowned for being a powerful web framework with a relatively shallow learning curve, making it easy to get into as a beginner and hard to put down as an expert. However, when class-based generic views arrived on the scene, they were met with a lukewarm reception from the community: some said they were too difficult, while others bemoaned a lack of decent documentation. But if you can power through the steep learning curve, you will see they are also incredibly powerful and produce clean, reusable code with minimal boilerplate in your views.py.

So to help you on your journey with CBVs, here are some handy tips I wish I had known when I first started learning all about them. This isn’t a tutorial, but more a set of side notes to refer to as you are learning; information which isn’t necessarily available or obvious in the official docs.

Starting out

If you are just getting to grips with CBVs, the only view you need to worry about is TemplateView. Don’t try anything else until you can make a ‘hello world’ template and view it on your dev instance. This is covered in the docs. Once you can handle that, keep reading the docs and make sure you understand how to subclass a ListView and DetailView to render model data into a template.
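
If it helps, the ‘hello world’ amounts to little more than this (the template name is your choice):

from django.views.generic import TemplateView

class HelloView(TemplateView):
    template_name = 'hello.html'  # renders this template, nothing more

# urls.py:
#   url(r'^hello/$', HelloView.as_view()),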

OK, now we’re ready for the tricky stuff!

Customising CBVs

Once you have the basics down, you will find that most of your work revolves around subclassing the built-in class-based generic views and overriding one or two methods. At the start of your journey, it is not very obvious what to override to achieve your goals, so remember:

  • If you need to get some extra variables into a template, use get_context_data() (there’s a sketch after this list)
  • If it is a low-level permissions check on the user, you probably want dispatch()
  • If you need to do a complicated database query on a DetailView, ListView etc, try get_queryset()
  • If you need to pass some extra parameters to a form when constructing it via a FormView, UpdateView etc, try get_form() or get_form_kwargs()
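
Here’s a sketch pulling a couple of those overrides together on a ListView (the Article model and its published field are hypothetical):

from django.views.generic import ListView
from myapp.models import Article  # hypothetical model

class ArticleListView(ListView):
    model = Article

    def get_queryset(self):
        # Narrow the query rather than writing a function-based view
        return Article.objects.filter(published=True)

    def get_context_data(self, **kwargs):
        # Call super() first so the built-in context survives
        context = super(ArticleListView, self).get_context_data(**kwargs)
        context['article_count'] = self.get_queryset().count()
        return context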

ccbv.co.uk

If you haven’t heard of ccbv.co.uk, go there and bookmark it now. It is possibly the most useful reference out there for working with class-based generic views. When you are subclassing views and trying to work out which methods to override, and the official docs just don’t seem to cut it, ccbv.co.uk has your back. If it wasn’t for that site, I think we would all be that little bit grumpier about using CBVs.

Forms

CBVs cut a LOT of boilerplate code out of the process of writing forms. You should already be using ModelForms wherever you can to save effort, and there are generic class-based views available (CreateView/UpdateView) that allow you to plug in your ModelForms and reduce your boilerplate code even further. Always use this approach if you can. If your form does not map to a particular model in the database, use FormView.
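
A sketch of that plumbing, again with a hypothetical Article model:

from django import forms
from django.views.generic.edit import CreateView
from myapp.models import Article  # hypothetical model

class ArticleForm(forms.ModelForm):
    class Meta:
        model = Article
        fields = ('title', 'body')

class ArticleCreateView(CreateView):
    # The view builds, validates and saves the form for us
    form_class = ArticleForm
    template_name = 'article_form.html'
    success_url = '/articles/'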

Permissions

If you want to put some guards on your view (e.g. checking the user is logged in, or that they have a certain permission), you will usually want to do it in the dispatch() method of the view. This is the very first method that is called in your view, so if a user shouldn’t have access then this is the place to intercept them:

from django.core.exceptions import PermissionDenied
from django.views.generic import TemplateView
 
class NoJimsView(TemplateView):
    template_name = 'secret.html'
 
    def dispatch(self, request, *args, **kwargs):
        if request.user.username == 'jim':
            raise PermissionDenied # HTTP 403
        return super(NoJimsView, self).dispatch(request, *args, **kwargs)

Note: If you just want to restrict access to logged-in users, there is the login_required decorator, which you can wrap around the dispatch() method (using method_decorator). This is covered in the docs, and it may be sufficient for your purposes, but I usually end up having to modify it to handle AJAX requests nicely as well.

Multiple inheritance

Once you start subclassing and overriding generic views, you will probably find yourself needing multiple inheritance. For example, perhaps you want to extend your “No Jims” policy (see above) to several other views. The best way to achieve this is to write a small Mixin and inherit from it along with the generic view. For example:

class NoJimsMixin(object):
    def dispatch(self, request, *args, **kwargs):
        if request.user.username == 'jim':
            raise PermissionDenied # HTTP 403
        return super(NoJimsMixin, self).dispatch(request, *args, **kwargs)
 
class NoJimsView(NoJimsMixin, TemplateView):
    template_name = 'secret.html'
 
class OtherNoJimsView(NoJimsMixin, TemplateView):
    template_name = 'other_secret.html'

Now you have entered the world of python’s multiple inheritance and Method Resolution Order. Long story short: order is important. If you inherit from two classes that both define a foo() method, your new class will use the one from the parent class listed first. So in the example above, if NoJimsView listed TemplateView before NoJimsMixin, django would use TemplateView’s dispatch() method instead of NoJimsMixin’s. As written, though, NoJimsMixin’s dispatch() is called first, and when it calls super(NoJimsMixin, self).dispatch(), the MRO hands control on to TemplateView’s dispatch() method. How I wish I had known this when I was learning about CBVs!

View/BaseView/Mixin

As you browse around the docs, code and ccbv.co.uk, you will see references to Views, BaseViews and Mixins. They are largely a naming convention in the django code: a BaseView is like a View except it doesn’t have a render_to_response() method, so it won’t render a template. Almost all Views inherit from a corresponding BaseView and add a render_to_response() method, e.g. DetailView/BaseDetailView, UpdateView/BaseUpdateView etc. This is useful if you are subclassing from two Views, because it means you can choose which one renders the final output. It is also useful if you want to render to JSON, say in an AJAX response, and don’t need HTML rendering at all (in this case you’d need to provide your own render_to_response() method that returns an HttpResponse).
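
For example, a sketch of a JSON-only detail view built on BaseDetailView (the model is hypothetical):

import json

from django.http import HttpResponse
from django.views.generic.detail import BaseDetailView
from myapp.models import Article  # hypothetical model

class ArticleJSONView(BaseDetailView):
    model = Article

    def render_to_response(self, context, **response_kwargs):
        # BaseDetailView has already done the lookup; we just encode
        # self.object instead of rendering a template
        payload = {'id': self.object.pk, 'title': self.object.title}
        return HttpResponse(json.dumps(payload),
                            content_type='application/json')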

Mixin classes provide a few helper methods, but can’t be used on their own, as they are not full Views.

So in short, if you are just subclassing one thing, you will usually subclass a View. If you want to manually render a non-HTML response, you probably need a BaseView. If you are inheriting from multiple classes, you will need a combination of some or all of View, BaseView and Mixin.

A final note on AJAX

Django is not particularly good at serving AJAX requests out of the box, and once you start trying to use CBVs to do AJAX form submissions, things get quite complicated.

The docs offer some help with this in the form of a Mixin you can copy and paste into your code, which gives you JSON responses instead of HTML. You will also need to pass CSRF tokens in your POST requests, and again there is an example of how to do this in the docs.

This should be enough to get you started, but I often find myself having to write some extra Mixins, and that is before even considering the javascript code on the front end to send requests and parse responses, complete with handling of validation and transport errors. Here at Isotoma, we are working on some tools to address this, which we hope to open-source in the near future. So watch this space!

Conclusion

In case you hadn’t worked it out, we at Isotoma are fans of Django’s class-based generic views. They are definitely not straightforward for newcomers, but hopefully with the help of this article and other resources (did I mention ccbv.co.uk?), it’ll be plain sailing before you know it. And once you get what they’re all about, you won’t look back.
