Author Archives: Wes Mason

Scaffolding template tags for Django forms

We love Django here at Isotoma, and we love using Django’s awesome form classes to generate self-generating, self-validating, [X]HTML forms.

However, in practically every new Django project I find myself doing the same thing over and over again (and I know others do too): breaking the display of a Django form instance up into individual fields, with appropriate mark-up wrappers.

Effectively I keep recreating the output of BaseForm.as_p/as_ul/as_table with template tags and mark-up.

For example, outputting a login form, rather than doing:

{{ form.as_p }}

We would do:

<p>
{% if form.username.errors %}
  {% for error in form.username.errors %}
    {{ error }}
  {% endfor %}
{% endif %}
{{ form.username.label }} {{ form.username }}
</p>
<p>
{% if form.password.errors %}
  {% for error in form.password.errors %}
    {{ error }}
  {% endfor %}
{% endif %}
{{ form.password.label }} {{ form.password }}
</p>

Why would you want to do this? There are several reasons, but generally it’s to apply custom mark-up to a particular element (notice I said mark-up, not styling, that can be done with the generated field IDs), as well as completely customising the output of the form (using <div>‘s instead etc.), and also because some designers tend to prefer this way of looking at a template.

“But”, you might say, “Django already creates all this for us with the handy as_p/as_ul/as_table methods, can you just take the ouput from that?”
Well, yes, in fact on a project a couple of weeks ago that’s exactly what I did, outputting as_p in a template, and then editing the source chucked out in a browser.
Which gave me the idea to create a simple little tool to do this for me, but with the Django template tags for dynamically outputting the field labels and fields themselves.

I created django-form-scaffold to do just this, and now I can do this from a Python shell:

>>> from dfs import scaffold
>>> from MyProject.MyApp.forms import MyForm
>>> form = MyForm()
>>> # We can pass either an instance of our form class
>>> # or the class itself, but better to pass an instance.
>>> print scaffold.as_p(form)

{% if form.email.errors %}{% for error in form.email.errors %}
{{ error }}{% endfor %}{% endif %}
<p>{{ form.email.label }} {{ form.email }}</p>
{% if form.password1.errors %}{% for error in form.password1.errors %}
{{ error }}{% endfor %}{% endif %}
<p&gtl{{ form.password1.label }} {{ form.password1 }}</p>
{% if form.password2.errors %}{% for error in form.password2.errors %}
{{ error }}{% endfor %}{% endif %}
<p>{{ form.password2.label }} {{ form.password2 }}</p>

Copy and paste this into a template, tweak, and Robert’s your mother’s brother.

As well as as_p(), the dfs.scaffold module also has the equivalent functions as_ul(), as_table, and an extra as_div() function.

Querying Webtrends ODBC from the command line with WebtrendsQT

As I alluded to yesterday, and in my post about SQLAWebtrends, I’ve recently been doing a lot of work with the Webtrends analytics service, concerned mostly with getting data out of it via the old Windows ODBC drivers.

While turn around on new data available from reports could cause Methuselah to yawn, it could still be exceedingly time consuming loading up a spreadsheet app, defining queries in an ODBC query builder, and waiting for data to populate sheets; or at best writing several Python functions to query the last data; I would still have to spend tedious amounts of time tweaking and re-tweaking queries for different reports and/or datasets.

This lead me to make WebtrendsQT, a psql/mysql-like command line query tool for Webtrends using pyODBC.

WebtrendsQT is mostly just the ODBC extra tool provided by pyODBC, with some WT-specific changes. Namely the introduction of a “\p” command, which issues the {Call wtGetProfileList()} stored procedure against the WTSystem schema (via the system_cursor property), returning a list of profiles.
Similarly do_l (the handler for “\l”) instead of listing real schemas, lists the Webtrends ODBC equivalent templates.

do_c (“\c”) will work as you’d expect, taking a “schema” (e.g. template), and changing cursor to point to it, but also takes profile GUID as an optional first option to switch both profile and template (profiles define the data source and which report templates are available).

It took me some time to figure out that PyODBC‘s lovely columns() method wouldn’t work with the Webtrends driver, as some metadata isn’t provided by the driver and causes a segfault. Instead my hack is to use the DB API Cursor.description to get name and type details for columns on a table, unfortunately in order to get this information I need a cursor that specifically targets the table in question; and to get around this I make a simple query against the table that won’t return any information, but will still return a cursor:

@memoized()
def get_columns(self, name):
    columns = [['Column name', 'Type', 'Size',]]
    row = self.cursor.execute(
        'SELECT * FROM %s LIMIT 0' % (name,)
    ).fetchone()
    for r in row.cursor_description:
        columns.append(
            [r[0],
            self.db_types[r[1]],
            r[3],]
        )
    return columns

cursor_description is PyODBC’s special “always available even after query-set has been closed” reference to the cursor.description instance.

Unlike pyDBCLI.extras.odbc, WebtrendsQT takes a set of arguments rather than a single DSN string, due to the ODBC driver requiring a specific set of details to connect.

You most likely just want to install and run the tool under Windows, which if you have any experience with Python on Windows should be easy enough using easy_install or the included setup.py; if however you don’t have any Python-Windows experience and just want to get up and running with WebtrendsQT, the FAQ has a 5 step simple guide, including a pre-rolled pair of Windows scripts, that will install everything and create a batch script with all the Python paths set up to use.
When installed just type wtqt in the cmd.exe Window, provided by the batch script, and away you go.

C:\Users\test\Desktop> wtqt

ERROR: Must have a profile GUID, -p

Usage: wtqt.py [-u <user>] [-p <pass>] -d <system DSN> -h <host> [-P <port>] -t <template> -p <profile>

Options:
  -d, --systemdsn: Predefined system DSN
  -p, --profile : Webtrends profile GUID
  -t, --template : Template/schema
  -h, --host : Webtrends web instance
  -P, --port : Optional server port (default: 80)
  -u, --username: Optional username
  -k, --password: Optional password

Writing interactive command line DB query tools with pyDBCLI

While some people can’t navigate a relational database without reaching for a GUI, the vast majority of us spend a good proportion of our lives inside interactive command line interfaces, such as psql or mysql.

What do you do, however, if there isn’t a CLI query tool available for the DB you’re working with?
I had this exact same problem recently with a project, the main reason for the lack of any such tooling is because I wasn’t actually dealing with a real DB at all, but an ODBC interface to a web service that exposed reporting data as if it were tables in a DB. Rather than spend the rest of my life messing around with queries in a hooked up spreadsheet, or repeatedly writing one off Python snippets, I decided to write my own psql-like tool.

Thus the first version of pyDBCLI was born; well not really, my first tool really only handled querying Webtrends (I’ll save that for another post), while pyDBCLI is a base class for making such tools as long as you have a DB API compatible cursor to query.
pyDBCLI is based on the fantastic cmd.Cmd, so you can extend pretty much exactly as you would would Cmd, except with some extra properties such as cursor and multi_prompt are provided.

In order to make a Cmd based tool behave more like psql I ended up overriding the parseline method with regular expressions to handle escaped commands such as “\d” and “\c”, fuzzing them if not escaped, or un-escaping them if escaped so that Cmd’s unmodified parseline method can handle parsing and dispatching to defined do_* methods.

The other main change was modifying the default method to dispatch command lines to the a query method for querying against the DB API cursor; as well as this default and several other commands will detect an unfinished SQL query and wait for a finishing character (“;” by default) before sending it (or doing anything) else. This is what the multi_prompt property is for, the prompt property is replaced with multi_prompt when a query spans more than one line, and is set back again when the query is finished and executed.

2 example tools are bundled with pyDBCLI, in the extras package:

  • odbc – a tool to query an ODBC exposed data source, using PyODBC; takes PyODBC compatible DSN strings.
  • litecli – a tool to query a SQLite database; SQLite has it’s own CLI tool to do this, which is very well rounded and much better than litecli, but this is provided as a fairly functional example tool.

Tomorrow I’ll discuss the original reason I whipped up pyDBCLI: creating a tool for querying Webtrends, via ODBC, quickly and with more ease.

Querying Webtrends analytics via ODBC with SQLAlchemy

Webtrends is a web traffic analytics package, similar to Google Analytics. Recently we had the requirement of being able to pull data out of reports on a Webtrends instance.
Luckily enough they have a nice RESTful data extraction API; not so luckily it is only available for Webtrends Analytics 9 instances, while we were limited in our requirements to Webtrends 8.

Prior to Webtrends 9 the official data extraction method is a Windows-only ODBC driver, primarily used for connecting Excel spreadsheets and Microsoft ADO applications. The driver provides a pseudo-relational-database interface to pre-made reports on a Webtrends instance, which you can query using a simple SQL-subset.
Notice I use qualifiers like “pseudo” and “interface”, that’s because what’s really going on in the background is the driver makes an HTTP call (with details such as the SQL being sent) to a web service on the Webtrends instance, and the web service returns some binary data representing data in the queried report, which the driver then returns as a table. The reports themselves aren’t actually real tables in a real database, although I’m sure this is how they’re represented somewhere in the Webtrends system, what we get back is a set of aggregated data normally used to display pretty graphs and bar charts in the web interface.

To make life easier for us, as we were already using SQLAlchemy for querying PostgreSQL tables, and we would need to mock our Webtrends data at some point, it made sense to be able to use SQLAlchemy for all the data objects; with that in mind I made the SQLAWebtrends dialect.
After installing SQLAWebtrends you can create ORM classes matching the Webtrends reports you want to query, and then, using a specially formatted DSN, run queries against them as you would any other DB.

Considerations in making a dialect for Webtrends:

  • Nearly all the special features of SQLAlchemy from full unicode support, to field binding and various row counting hacks, need to be disabled as the feature-set provided by the Windows ODBC driver are extremely limited.
  • Method for getting meta-data such as table names and columns needed to be overridden and done using Microsoft ADO compatible method.
  • Some combination of PyODBC’s column pre-binding and the Webtrends driver’s complete lack of features means any attempt at binding will fail, so being executing queries I needed to “unbind” them, replacing ? placeholders with actual data, and relying on inbuilt filtering method to provide any sort of field escaping/filtering.
  • LIMIT clauses are also extremely, erm, limited for want of better words. So that needed overriding too.
  • Finally, SQLAlchemy likes to wrap every value, include names, in quotes; Webtrends doesn’t like this, and quite frankly borks, so we disable this functionality too.
  • Bonus point: PyODBC rocks.

Caveats for use:

  • Your SQLAlchemy models for Webtrends reports shouldn’t include primary key columns, because frankly there probably aren’t any unique primary keys in the reports, but this doesn’t matter as SQLAlchemy won’t care as long as Webtrends doesn’t complain (which it won’t).
  • As with any other SQLAlchemy model you can call properties in the ORM class anything you like, but the underlying table column names need to match up to the column names in the Webtrends report.
  • The ODBC driver and web service don’t support JOINs, so you can’t use these with your ORM models either.
  • The iterator wrapper around the PyODBC cursor instance returned by queries will only ever returns one row unless you call .yield_per(1) on the query-set. I haven’t had time to figure out why this is the case, but I suspect it’s something to do with row pre-buffering, which is disabled as a consequence of yield_per.
  • Every now and again you’ll see rows with lots of blank values in them, except that any number values (measures in Webtrends) will be higher. If you look closely these are actually sums of all the values following it up until the next row of seemingly blank data. These are aggregated summary rows, displaying sums of the data for that particular subset of data (depending on which field is the “dimension” for the report, a sort of primary key used in creating reports). Unless you’re after this data as well, I find the best thing is to just do a quick check for so many blank fields and skip these rows.

Example using models and running a query:

from sqlalchemy.orm import mapper, sessionmaker
from sqlalchemy import create_engine
from sqlalchemy import String, MetaData, Column, Table

# You would probably setup your username, password
# host etc. here, including the profile and template
# you want to query.

# Create the DB connection
engine = create_engine(
    "webtrends+pyodbc://%s:%s@%s:80/%s?dsn=Webtrends&profile_guid=%s" %
        {user, password, host, template, profile}
)
metadata = MetaData(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()

# Table schema
wt_user_report = Table('UsersByPages', metadata,
    Column('User', String, nullable=True),
    Column('PagesURLs', String, nullable=True),
    Column('PageHits', String, nullable=True),
    Column('TimePeriod', String, nullable=True),
    Column('StartDate', String, nullable=True),
    Column('EndDate', String, nullable=True)
)

# ORM class
class WTUserReport(object):
    pass
mapper(WTUserReport, wt_user_report)

# Create a query
query = session.query(WTUserReport).filter(
    TimePeriod="2010.m06.d22"
).yield_per(1) # Remember we need this for the iterator to work

# Iterate over the query-set and print some columns
for r in query:
    print "User %s hit %s %s times" % (
        r.User,
        r.PagesURLs,
        r.PageHits,
)

Getting the name of the current view in a Django template

Recently I have been working with generalised/reusable Django base templates, and I had the need to know the name of the current view using the template.
As it turns out, there currently aren’t any simple methods of doing this out there, and some of the nicer ones at their best make use of monkey patching.

After some playing around I found the cleanest, and quite simple, way is to use a custom template context processor.
A context processor provides extra values to bind to a template instance (it’s context; similar to including items in the data dict when calling render_to_response()).

By using a custom context processor we can assign a ‘current_view’ value to the template context, containing the namespaced name of the current view.
That just leaves one problem: how do we find the current view from within the request context.
inspect to the rescue!

Update: see Alexander Dutton’s comment for the even cleaner way by using middleware instead of stack inspection.

With inspect.stack() and inspect.getmodule() we can both grab the a stack of frames (e.g. the execution frames up to and including the current execution frame, which should be our context processor instance), and the module the frame is in.

from inspect import stack, getmodule

def ContextWithView(request):
    """Template context with current_view value,
    a string with the full namespaced django view in use.
    """
    # Frame 0 is the current frame
    # So assuming normal usage the frame of the view
    # calling this processor should be Frame 1
    name = getmodule(stack()[1][0]).__name__
    return {
        'current_view': "%s.%s" % (name, stack()[1][3]),
    }

To use this in a view, just import it and then pass it as the context in your render_to_response() call:

"""main/my_site/views/misc.py - misc views"""

from main.my_site.context_processors import ContextWithView

def my_view(request):
    # Do some stuff
    return render_to_response(
        'my_template.html',
        {},
        context_instance=ContextWithView(request)
    )

From within a template you can then use current_view, which for the above should output something like “main.my_site.views.misc.my_view”

Misc.

If you’re using RequestContext with your templates to make sure request, user etc. are always available in your templates, you can chain context processor calls like so:

from inspect import stack, getmodule
from django.template import RequestContext

def ContextWithView(request):
    """Template context with current_view value,
    a string with the full namespaced django view in use.
    """
    d = RequestContext(request)
    # Frame 0 is the current frame
    # So assuming normal usage the frame of the view
    # calling this processor should be Frame 1
    name = getmodule(stack()[1][0]).__name__
    d['current_view'] = "%s.%s" % (name, stack()[1][3])
    return d

If, like me, you then need to render a reverse URL using the ‘current_view’ variable, you’ll find the the {% url %} tag in Django only works on static strings, and not variables, so we need to make a custom tag to do this for us:

"""main/my_site/templatetags/extras.py
"""

from django.core.urlresolvers import reverse
from django import template

register = template.Library()

@register.simple_tag
def var_url(view, *args, **kwargs):
    return reverse(view, args=args, kwargs=kwargs)

{% comment %}
To use {% var_url %} in your template
just import and use as you would {% url %}
{% endcomment %}

{% load extras %}

{% block content %}
The URL for this view is <em>{% var_url current_view %}</em>.
{% endblock content %}

Of Python, memcached and decorators: easy-peasy function caching

Following on from channam’s Memcached in 2 minutes, I’ve been working on a decorator to make life even simpler than this, memorised.
The most popular use-case for using memcached in Python apps is to cache the return value of a function or method. Over and over again you’ll find yourself doing something like:

mc = memcached.Client(['localhost:11211'])
def get_something(mc=mc):
    value = mc.get('something')
    if value is not None:
        return value
    else:
        # Do something else ...
        return 'hello world'

So in the interests of DRY, why not reduce that down to a reusable pattern?
We can do this using a decorator, to replace our function with a function that instead checks for the existance of some sort of key, returns the value, otherwise delegates to the actual function for output (and then caches this output back into memcache).

The simplest way to add reusable implementations of patterns such as these onto existing functions in Python is to use a decorator, effectively a function wrapper that replaces (or returns a replacement) function in place of the original call.
How does this help? Well using this we can intercept a call to the decorated function, generate a signature for use as a key in memcache, check if the item is available in the cache, if so return that, otherwise grab the output from the function, pop it into memcache using our generated key, and finally return this value.

The biggest challenge we face is generating the unique (at least to that particularly function call) key to reference in memcache. The way I’ve found best to do this is to use a combination of module, class name and key attributes (if it’s a method of an instance or decorated with @classmethod), function name, and call arguments, in this form: <module>.<class>[<key attributes>]::(<arguments>)

In order to do this we need information about both the arguments, and if it’s a method we also need information about the class the method is bound to. Ordinarily these are both easy tasks using the inspect module, and the im_self attribute (funnily enough, referencing self) that bound methods contain.
However, a quick explanation of how the @<decorator> shortcut tag works, and indeed how decorators work, reveals a slight kink in this assumption. For example take the following:

class Test:
    @testfunc
    def test(self, arg1):
        pass

This simple bit of syntactic sugar is actually equal in functionality to the following (which is how you had to do it before Python 2.4, see PEP-318):

class Test:
    def test(self, arg1):
        pass
    test = testfunc(test)

To explain further, as I said before, decorators are just function which take other function instances and wrap around or replace them, and they are applied at the time of definition, and not as you might think at first calltime.
This is the reason the decorator function returns a function, as it is first called and instanced at this point, but any arguments to the function it is wrapping are passed at call time as a call to the function returned by the decorator.

If this confuses you, don’t worry, it’s not actually that important right now, except for the fact that because the function instance is not passed in at calltime, it means it is not bound, and loses it’s frame (e.g. how it is called, from which instance etc.).
This means we can no longer use im_self and several of the class functions in the inspect module. What we can do however is cheat and use the fact that bound methods always pass in their bound object instance or class instance as the first argument of a call, the ‘self’ argument.

memorised.decorators.memorise() uses the following trick, to first check if ‘self’ or ‘cls’ (the standard first parameter of an @classmethod) is there, and then using the *args list of passed in arguments to access the first parameter and grab either .__class__._name for an object instance’s class name or .__name__ for class instances:

# Get the list of arg names from func_code
argnames = fn.func_code.co_varnames[:fn.func_code.co_argcount]


. . .

if classmethod:
    # Get the class name from the cls argument
    class_name = args[0].__name__
else:
    # Get the class name from the self argument
    class_name = args[0].__class__.__name__

By then merging *args and **kwargs, we can build a hash key of this particular function call. Next just create a handy MD5 hash of this string using hashlib.md5, and then do our memcache, checks pretty much as above in the first example.

Using memorise() to replace the first example, we get:

mc = memcached.Client(['localhost:11211'])
@memorise(mc=mc)
def get_something():
    return 'hello world'

Notice I’m still defining the mc variable to be a memcached.Client instance, memorise() does handle do this itself, either by using the default localhost:11211 server setting or by accepting a list of servers (via an argument named ‘mc_servers’). However, this isn’t ideal as the memcached.Client instance would be created every time a function definition is decorated with memorise() (which could be lots), so best to pass an instance in each time.
Not to mention the fact using dependency injection like this over any other way of keeping the instance (e.g. singleton) is much cleaner.

Another point of interest is that we need to always include the call parenthesis even when not passing in any arguments to memorise(), e.g. @memorise() and not @memorise, as you would expect from decorators such as @classmethod. This is best of the way arguments are passed to both a decorator, and then to the function being decorated. There are workarounds for this problem, but up until now I haven’t seen one that can be used with class-based decorators (which memorise() is). I hope to solve this in a future release, so expect a follow up post on using optional arguments with class-based decorators sometime in the near future.

Finally I’ll finish with a more realistic example of using this to decorate methods on a Django Model:

class BlogUser(User):
    objects = UserManager()
    
    def __unicode__(self):
        return u'%s' self.get_full_name();

    @property
    @memorise(parent_keys=['id'], mc=mc)
    def posts(self):
        Post.objects.filter(creator=self)

And there you have it, any posts by that user will be cached for as long as memcached’s cachetime is set, or until memorised.utils.uncache() is used to clear down the cache for that method.