Tag Archives: Python

Writing interactive command line DB query tools with pyDBCLI

While some people can’t navigate a relational database without reaching for a GUI, the vast majority of us spend a good proportion of our lives inside interactive command line interfaces, such as psql or mysql.

What do you do, however, if there isn’t a CLI query tool available for the DB you’re working with?
I had this exact same problem recently with a project, the main reason for the lack of any such tooling is because I wasn’t actually dealing with a real DB at all, but an ODBC interface to a web service that exposed reporting data as if it were tables in a DB. Rather than spend the rest of my life messing around with queries in a hooked up spreadsheet, or repeatedly writing one off Python snippets, I decided to write my own psql-like tool.

Thus the first version of pyDBCLI was born; well not really, my first tool really only handled querying Webtrends (I’ll save that for another post), while pyDBCLI is a base class for making such tools as long as you have a DB API compatible cursor to query.
pyDBCLI is based on the fantastic cmd.Cmd, so you can extend pretty much exactly as you would would Cmd, except with some extra properties such as cursor and multi_prompt are provided.

In order to make a Cmd based tool behave more like psql I ended up overriding the parseline method with regular expressions to handle escaped commands such as “\d” and “\c”, fuzzing them if not escaped, or un-escaping them if escaped so that Cmd’s unmodified parseline method can handle parsing and dispatching to defined do_* methods.

The other main change was modifying the default method to dispatch command lines to the a query method for querying against the DB API cursor; as well as this default and several other commands will detect an unfinished SQL query and wait for a finishing character (“;” by default) before sending it (or doing anything) else. This is what the multi_prompt property is for, the prompt property is replaced with multi_prompt when a query spans more than one line, and is set back again when the query is finished and executed.

2 example tools are bundled with pyDBCLI, in the extras package:

  • odbc – a tool to query an ODBC exposed data source, using PyODBC; takes PyODBC compatible DSN strings.
  • litecli – a tool to query a SQLite database; SQLite has it’s own CLI tool to do this, which is very well rounded and much better than litecli, but this is provided as a fairly functional example tool.

Tomorrow I’ll discuss the original reason I whipped up pyDBCLI: creating a tool for querying Webtrends, via ODBC, quickly and with more ease.

On Ubuntu Python, Exceptions and unnecessary imports

A few days ago Alexander Limi (one of Plone’s founders) tweeted the following:

Ubuntu Python: Raise an exception, import 190 modules: http://bit.ly/bCxlhC – this is why you don’t want to use the system Python.

Now this gets my goat on a few points. First up, why the hell would I not want to use the system python? If I’m using any sane distribution I’ll have package management and security updates, and any flaw in Python will be patched, packaged and tested by people that are far smarter than me. Upgrading the Python that ships with the Plone Unified Installer just isn’t going to be as easy, however you play it. And that’s without the risk of the Plone community moving on to more exciting things, leaving their version of Python unsupported.

Secondly, there’s a fatal flaw in the original blog post to which limi refers. Yes, on the desktop, Ubuntu imports 190 packages when an exception is raised. As the author explains this is to enable Apport to provide as much information to the Ubuntu devs about application failures. What the author does not mention is that this doesn’t happen on the Server edition of Ubuntu. Why would it? Apport is designed to handle desktop application failures and to improve the end user experience. It isn’t installed by default on the server edition, because it isn’t needed.

On my Karmic desktop:

Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> len(sys.modules)
35
>>> raise KeyError
Traceback (most recent call last):
  File "", line 1, in 
KeyError
>>> len(sys.modules)
225

On my Karmic server:

Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> len(sys.modules)
32
>>> raise KeyError
Traceback (most recent call last):
  File "", line 1, in 
KeyError
>>> len(sys.modules)
32

Being quick to condemn Ubuntu, and their packaging of Python, doesn’t do anyone any good. Think before you tweet. And don’t go live on a desktop distro.

Getting the name of the current view in a Django template

Recently I have been working with generalised/reusable Django base templates, and I had the need to know the name of the current view using the template.
As it turns out, there currently aren’t any simple methods of doing this out there, and some of the nicer ones at their best make use of monkey patching.

After some playing around I found the cleanest, and quite simple, way is to use a custom template context processor.
A context processor provides extra values to bind to a template instance (it’s context; similar to including items in the data dict when calling render_to_response()).

By using a custom context processor we can assign a ‘current_view’ value to the template context, containing the namespaced name of the current view.
That just leaves one problem: how do we find the current view from within the request context.
inspect to the rescue!

Update: see Alexander Dutton’s comment for the even cleaner way by using middleware instead of stack inspection.

With inspect.stack() and inspect.getmodule() we can both grab the a stack of frames (e.g. the execution frames up to and including the current execution frame, which should be our context processor instance), and the module the frame is in.

from inspect import stack, getmodule

def ContextWithView(request):
    """Template context with current_view value,
    a string with the full namespaced django view in use.
    """
    # Frame 0 is the current frame
    # So assuming normal usage the frame of the view
    # calling this processor should be Frame 1
    name = getmodule(stack()[1][0]).__name__
    return {
        'current_view': "%s.%s" % (name, stack()[1][3]),
    }

To use this in a view, just import it and then pass it as the context in your render_to_response() call:

"""main/my_site/views/misc.py - misc views"""

from main.my_site.context_processors import ContextWithView

def my_view(request):
    # Do some stuff
    return render_to_response(
        'my_template.html',
        {},
        context_instance=ContextWithView(request)
    )

From within a template you can then use current_view, which for the above should output something like “main.my_site.views.misc.my_view”

Misc.

If you’re using RequestContext with your templates to make sure request, user etc. are always available in your templates, you can chain context processor calls like so:

from inspect import stack, getmodule
from django.template import RequestContext

def ContextWithView(request):
    """Template context with current_view value,
    a string with the full namespaced django view in use.
    """
    d = RequestContext(request)
    # Frame 0 is the current frame
    # So assuming normal usage the frame of the view
    # calling this processor should be Frame 1
    name = getmodule(stack()[1][0]).__name__
    d['current_view'] = "%s.%s" % (name, stack()[1][3])
    return d

If, like me, you then need to render a reverse URL using the ‘current_view’ variable, you’ll find the the {% url %} tag in Django only works on static strings, and not variables, so we need to make a custom tag to do this for us:

"""main/my_site/templatetags/extras.py
"""

from django.core.urlresolvers import reverse
from django import template

register = template.Library()

@register.simple_tag
def var_url(view, *args, **kwargs):
    return reverse(view, args=args, kwargs=kwargs)

{% comment %}
To use {% var_url %} in your template
just import and use as you would {% url %}
{% endcomment %}

{% load extras %}

{% block content %}
The URL for this view is <em>{% var_url current_view %}</em>.
{% endblock content %}

Monitoring Your Power Usage

I was lucky enough to attend LUG Radio and Ogg Camp recently and was inspired by a talk given by Andy Standford-Clark and his house that twitters. I had been sent a power monitor after switching power supplier, but never done much with it. Watching Andy’s energetic performance inspired my to dig it out and have a go at getting the data off it. Also the thought of adding twitter to anything was too strong to resist.

My unit is a branded version of Classic Current Cost unit with a standard transmitter. The first hurdle was to get the transmitter and the monitor talking again. The monitor has three buttons at the base, pressing and holding the middle button syncs the unit to the transmitter. It is possible to associate more than one transmitter to the monitor, the Classic model I believe can talk to 3 transmitters.

Next issue was connecting it to a PC, the unit didn’t come with a data cable. The output from the monitor is slow trickle of XML over a RJ45 serial connector. I found a suitable cable on eBay from Current Cost RJ45 to USB. They so sell the same cables on Amazon but on eBay the postage is free.

Connecting the cable is simple. To get access the feed is slightly more complex but luckily there are some great articles on the web. Connecting under Linux is fairly straight forward, there is fantastic article at http://www.linuxuk.org/2008/12/currentcost-and-ubuntu/. This explains how make the USB device available to read from as a device. Next you can use a Perl script from http://www.jibble.org/currentcost/ or read on for the Python I wrote to interpret the XML. I used the python-serial package from Karmic Koala Ubuntu.

#!/usr/bin/env python

import serial
from lxml import etree

# open the device, 2400 is baud rate
ser = serial.Serial('/dev/ttyUSB0', 2400, timeout=1)

while (True):
    reading = ser.readline()
    if reading:
        try:
            # use lxml to extract what we care about
            xml = etree.fromstring(reading)
            print xml.xpath('/msg/ch1/watts')[0].text
            print xml.xpath('/msg/tmpr')[0].text
        except:
            pass

This will print the watts and the temperature as the data is sent from the monitor via the serial cable.

Without any processing the input from the monitor looks like (replace the hrr with hr – I was losing the fight with html tags…):



    
        00002
        205102
    
    CC02037771
    01429
    00000
    00000
    18.2


For some more detailed information check out: http://mungbean.org/blog/?p=477 and also http://knolleary.net/2008/05/05/power-graphing/ for some tips on graphing.

To finish off below is a script to log the data into a sqlite database:

#!/usr/bin/env python

import serial
import twitter
import sqlite3

from sqlalchemy import *
from sqlalchemy.orm import sessionmaker, mapper

from datetime import datetime
from lxml import etree
from lxml.etree import XMLSyntaxError
ser = serial.Serial('/dev/ttyUSB0', 2400, timeout=1)

while (True):
    reading = ser.readline()
    if reading:
        try:
            xml = etree.fromstring(reading)
            watts = xml.xpath('/msg/ch1/watts')[0].text
            temp = xml.xpath('/msg/tmpr')[0].text
            now = datetime.now()
            time = now.strftime("%Y%m%d%H%M%S")

            metadata = MetaData()
            engine = create_engine('sqlite:////home/yourusername/Desktop/power.db')
            metadata.bind = engine
            Session = sessionmaker(bind=engine, autoflush=True)
            session = Session()
            session.autocommit = False

            # create a class to map to
            class Power(object): pass
            table = Table('power', metadata, autoload=True)
            mapper(Power, table)

            entry = Power()
            entry.datetime = time
            entry.watts = watts
            entry.temperature = temp
            session.add(entry)
            session.commit()
            
            break
        except XMLSyntaxError:
            pass

Of Python, memcached and decorators: easy-peasy function caching

Following on from channam’s Memcached in 2 minutes, I’ve been working on a decorator to make life even simpler than this, memorised.
The most popular use-case for using memcached in Python apps is to cache the return value of a function or method. Over and over again you’ll find yourself doing something like:

mc = memcached.Client(['localhost:11211'])
def get_something(mc=mc):
    value = mc.get('something')
    if value is not None:
        return value
    else:
        # Do something else ...
        return 'hello world'

So in the interests of DRY, why not reduce that down to a reusable pattern?
We can do this using a decorator, to replace our function with a function that instead checks for the existance of some sort of key, returns the value, otherwise delegates to the actual function for output (and then caches this output back into memcache).

The simplest way to add reusable implementations of patterns such as these onto existing functions in Python is to use a decorator, effectively a function wrapper that replaces (or returns a replacement) function in place of the original call.
How does this help? Well using this we can intercept a call to the decorated function, generate a signature for use as a key in memcache, check if the item is available in the cache, if so return that, otherwise grab the output from the function, pop it into memcache using our generated key, and finally return this value.

The biggest challenge we face is generating the unique (at least to that particularly function call) key to reference in memcache. The way I’ve found best to do this is to use a combination of module, class name and key attributes (if it’s a method of an instance or decorated with @classmethod), function name, and call arguments, in this form: <module>.<class>[<key attributes>]::(<arguments>)

In order to do this we need information about both the arguments, and if it’s a method we also need information about the class the method is bound to. Ordinarily these are both easy tasks using the inspect module, and the im_self attribute (funnily enough, referencing self) that bound methods contain.
However, a quick explanation of how the @<decorator> shortcut tag works, and indeed how decorators work, reveals a slight kink in this assumption. For example take the following:

class Test:
    @testfunc
    def test(self, arg1):
        pass

This simple bit of syntactic sugar is actually equal in functionality to the following (which is how you had to do it before Python 2.4, see PEP-318):

class Test:
    def test(self, arg1):
        pass
    test = testfunc(test)

To explain further, as I said before, decorators are just function which take other function instances and wrap around or replace them, and they are applied at the time of definition, and not as you might think at first calltime.
This is the reason the decorator function returns a function, as it is first called and instanced at this point, but any arguments to the function it is wrapping are passed at call time as a call to the function returned by the decorator.

If this confuses you, don’t worry, it’s not actually that important right now, except for the fact that because the function instance is not passed in at calltime, it means it is not bound, and loses it’s frame (e.g. how it is called, from which instance etc.).
This means we can no longer use im_self and several of the class functions in the inspect module. What we can do however is cheat and use the fact that bound methods always pass in their bound object instance or class instance as the first argument of a call, the ‘self’ argument.

memorised.decorators.memorise() uses the following trick, to first check if ‘self’ or ‘cls’ (the standard first parameter of an @classmethod) is there, and then using the *args list of passed in arguments to access the first parameter and grab either .__class__._name for an object instance’s class name or .__name__ for class instances:

# Get the list of arg names from func_code
argnames = fn.func_code.co_varnames[:fn.func_code.co_argcount]


. . .

if classmethod:
    # Get the class name from the cls argument
    class_name = args[0].__name__
else:
    # Get the class name from the self argument
    class_name = args[0].__class__.__name__

By then merging *args and **kwargs, we can build a hash key of this particular function call. Next just create a handy MD5 hash of this string using hashlib.md5, and then do our memcache, checks pretty much as above in the first example.

Using memorise() to replace the first example, we get:

mc = memcached.Client(['localhost:11211'])
@memorise(mc=mc)
def get_something():
    return 'hello world'

Notice I’m still defining the mc variable to be a memcached.Client instance, memorise() does handle do this itself, either by using the default localhost:11211 server setting or by accepting a list of servers (via an argument named ‘mc_servers’). However, this isn’t ideal as the memcached.Client instance would be created every time a function definition is decorated with memorise() (which could be lots), so best to pass an instance in each time.
Not to mention the fact using dependency injection like this over any other way of keeping the instance (e.g. singleton) is much cleaner.

Another point of interest is that we need to always include the call parenthesis even when not passing in any arguments to memorise(), e.g. @memorise() and not @memorise, as you would expect from decorators such as @classmethod. This is best of the way arguments are passed to both a decorator, and then to the function being decorated. There are workarounds for this problem, but up until now I haven’t seen one that can be used with class-based decorators (which memorise() is). I hope to solve this in a future release, so expect a follow up post on using optional arguments with class-based decorators sometime in the near future.

Finally I’ll finish with a more realistic example of using this to decorate methods on a Django Model:

class BlogUser(User):
    objects = UserManager()
    
    def __unicode__(self):
        return u'%s' self.get_full_name();

    @property
    @memorise(parent_keys=['id'], mc=mc)
    def posts(self):
        Post.objects.filter(creator=self)

And there you have it, any posts by that user will be cached for as long as memcached’s cachetime is set, or until memorised.utils.uncache() is used to clear down the cache for that method.

Textual Log Analysis using Python

Here at isotoma, we have a company irc channel that is used for general communication, chattering and link sharing.
Everyone joins it at the start of the day, and keeps up to date with what’s going on, and who’s talking about what.
lolcats are occasionally mentioned.

Now, having logs of the channel reaching many megabytes, I was curious as to the text statistics produced by this channel, who has what reading age, and how much they’ve talked in comparison to other people.
While I won’t release the actual statistics I’ve gathered for the channel, I did think it’d be cool to release the script I wrote to do the analysis itself.

It uses the Natural Language Toolkit (NLTK), and the readability contrib module for it. It’s not particularly nice code (inline html generation and other nastiness), but it does work. I’ll attempt to release a cleaned up version when I get some more time to work on it.

Currently, it expects a log in the format from znc, the irc bouncer software that I use, although it can be modified easily by altering the timestamp_count to the correct number to skip the timestamp. It also expects nicks to be surrounded in ‘<’ and ‘>’. I _did_ say it wasn’t particularly nice code.

However, code style issues aside, it is a demonstration and example of using NTLK and the readability module on real world data, and the output is kind of cool. Especially when you find out that the ircbot has a higher reading age than you.

Find the source attached.

log-analyser