Archive for the 'Python' Category

On Ubuntu Python, Exceptions and unnecessary imports

A few days ago Alexander Limi (one of Plone’s founders) tweeted the following:

Ubuntu Python: Raise an exception, import 190 modules: http://bit.ly/bCxlhC – this is why you don’t want to use the system Python.

Now this gets my goat on a few points. First up, why the hell would I not want to use the system python? If I’m using any sane distribution I’ll have package management and security updates, and any flaw in Python will be patched, packaged and tested by people that are far smarter than me. Upgrading the Python that ships with the Plone Unified Installer just isn’t going to be as easy, however you play it. And that’s without the risk of the Plone community moving on to more exciting things, leaving their version of Python unsupported.

Secondly, there’s a fatal flaw in the original blog post to which Limi refers. Yes, on the desktop, Ubuntu imports 190 modules when an exception is raised. As the author explains, this is to enable Apport to provide as much information as possible to the Ubuntu devs about application failures. What the author does not mention is that this doesn’t happen on the Server edition of Ubuntu. Why would it? Apport is designed to handle desktop application failures and to improve the end user experience. It isn’t installed by default on the Server edition, because it isn’t needed.

On my Karmic desktop:

Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> len(sys.modules)
35
>>> raise KeyError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError
>>> len(sys.modules)
225

On my Karmic server:

Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> len(sys.modules)
32
>>> raise KeyError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError
>>> len(sys.modules)
32
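
To see which modules an exception hook pulls in, rather than just counting them, the two sys.modules snapshots can be diffed. The following is a minimal sketch using only the standard library; the helper names modules_loaded_by and trigger_excepthook are made up for illustration, and the hook is called by hand here instead of relying on the interactive prompt.

```python
import sys


def modules_loaded_by(action):
    """Return the sorted names of modules that appear in sys.modules while action() runs."""
    before = set(sys.modules)
    action()
    return sorted(set(sys.modules) - before)


def trigger_excepthook():
    """Raise a KeyError and hand it to whatever sys.excepthook is installed."""
    try:
        raise KeyError("demo")
    except KeyError:
        sys.excepthook(*sys.exc_info())


if __name__ == "__main__":
    new_modules = modules_loaded_by(trigger_excepthook)
    print("%d modules imported by the exception hook" % len(new_modules))
    for name in new_modules:
        print(name)
```

Run on both machines, the printed list should make it obvious whether a custom hook such as Apport's is responsible for the extra imports.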

Being quick to condemn Ubuntu, and their packaging of Python, doesn’t do anyone any good. Think before you tweet. And don’t go live on a desktop distro.

Compiling Python2.3 on Karmic

We still have some (now very old) Plone 2.1 sites running.  Up until Ubuntu 8.10 python2.3 was still available from package management (albeit with limited support), but as of Ubuntu 9.04 python2.3 was removed entirely.

To recreate a development environment for these older sites means downloading Python 2.3 and compiling it by hand.  Unfortunately it’s not a default build any more – as it says on the download page:

Since the release candidate, we received various reports that the this release may fail to build on current operating systems, in particular on OS X. We have made no attempt to fix these problems, as the release is targeted for systems that were current at the time Python 2.3 was originally released. For more recent systems, you might have to come up with work-arounds.

For Ubuntu 9.10 (Karmic Koala) to compile python2.3 you’ll need to disable buffer overflow checking by doing the following:

wget http://www.python.org/ftp/python/2.3.7/Python-2.3.7.tgz
tar zxvf Python-2.3.7.tgz
cd Python-2.3.7
./configure --prefix=/usr/local BASECFLAGS=-U_FORTIFY_SOURCE
make
sudo make install

Getting the name of the current view in a Django template

Recently I have been working with generalised/reusable Django base templates, and I needed to know the name of the current view from within the template.
As it turns out, there currently aren’t any simple ways of doing this, and even the nicer ones rely on monkey patching.

After some playing around I found the cleanest, and quite simple, way is to use a custom template context processor.
A context processor provides extra values to bind to a template instance (its context; similar to including items in the data dict when calling render_to_response()).

By using a custom context processor we can assign a ‘current_view’ value to the template context, containing the namespaced name of the current view.
That just leaves one problem: how do we find the current view from within the request context?
inspect to the rescue!

Update: see Alexander Dutton’s comment for the even cleaner way by using middleware instead of stack inspection.

With inspect.stack() and inspect.getmodule() we can grab both a stack of frames (i.e. the execution frames up to and including the current execution frame, which should be our context processor), and the module each frame is in.

from inspect import stack, getmodule

def ContextWithView(request):
    """Template context with current_view value,
    a string with the full namespaced django view in use.
    """
    # Frame 0 is the current frame
    # So assuming normal usage the frame of the view
    # calling this processor should be Frame 1
    name = getmodule(stack()[1][0]).__name__
    return {
        'current_view': "%s.%s" % (name, stack()[1][3]),
    }

To use this in a view, just import it and then pass it as the context in your render_to_response() call:

"""main/my_site/views/misc.py - misc views"""

from django.shortcuts import render_to_response

from main.my_site.context_processors import ContextWithView

def my_view(request):
    # Do some stuff
    return render_to_response(
        'my_template.html',
        {},
        context_instance=ContextWithView(request)
    )

From within a template you can then use current_view, which for the above should output something like “main.my_site.views.misc.my_view”
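
The middleware route mentioned in the update avoids stack inspection altogether, because Django hands the view callable to a middleware's process_view() hook. What follows is a rough sketch of that idea, not the code from the comment: the names CurrentViewMiddleware, request.current_view and current_view_processor are hypothetical, and it assumes the old-style middleware classes of the Django versions current when this was written.

```python
class CurrentViewMiddleware(object):
    """Old-style Django middleware that records the dotted name of the view."""

    def process_view(self, request, view_func, view_args, view_kwargs):
        # Django calls this hook just before it calls the view itself
        request.current_view = "%s.%s" % (view_func.__module__, view_func.__name__)
        return None  # returning None tells Django to carry on as normal


def current_view_processor(request):
    """Context processor exposing the recorded name as 'current_view'."""
    return {'current_view': getattr(request, 'current_view', None)}
```

Under those assumptions the middleware would be listed in MIDDLEWARE_CLASSES and the processor in TEMPLATE_CONTEXT_PROCESSORS, and the value would only reach templates rendered with a RequestContext.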

Misc.

If you’re using RequestContext with your templates to make sure request, user etc. are always available in your templates, you can chain context processor calls like so:

from inspect import stack, getmodule
from django.template import RequestContext

def ContextWithView(request):
    """Template context with current_view value,
    a string with the full namespaced django view in use.
    """
    d = RequestContext(request)
    # Frame 0 is the current frame
    # So assuming normal usage the frame of the view
    # calling this processor should be Frame 1
    name = getmodule(stack()[1][0]).__name__
    d['current_view'] = "%s.%s" % (name, stack()[1][3])
    return d

If, like me, you then need to render a reverse URL using the ‘current_view’ variable, you’ll find that the {% url %} tag in Django only works on static strings, and not variables, so we need to make a custom tag to do this for us:

"""main/my_site/templatetags/extras.py
"""

from django.core.urlresolvers import reverse
from django import template

register = template.Library()

@register.simple_tag
def var_url(view, *args, **kwargs):
    return reverse(view, args=args, kwargs=kwargs)

{% comment %}
To use {% var_url %} in your template
just import and use as you would {% url %}
{% endcomment %}

{% load extras %}

{% block content %}
The URL for this view is <em>{% var_url current_view %}</em>.
{% endblock content %}

Monitoring Your Power Usage

I was lucky enough to attend LUG Radio and Ogg Camp recently and was inspired by a talk given by Andy Stanford-Clark about his house that twitters. I had been sent a power monitor after switching power supplier, but had never done much with it. Watching Andy’s energetic performance inspired me to dig it out and have a go at getting the data off it. Also, the thought of adding Twitter to anything was too strong to resist.

My unit is a branded version of the Classic Current Cost unit with a standard transmitter. The first hurdle was to get the transmitter and the monitor talking again. The monitor has three buttons at the base; pressing and holding the middle button syncs the unit to the transmitter. It is possible to associate more than one transmitter with the monitor; I believe the Classic model can talk to 3 transmitters.

The next issue was connecting it to a PC, as the unit didn’t come with a data cable. The output from the monitor is a slow trickle of XML over an RJ45 serial connector. I found a suitable Current Cost RJ45 to USB cable on eBay. They also sell the same cables on Amazon, but on eBay the postage is free.

Connecting the cable is simple. Getting access to the feed is slightly more complex, but luckily there are some great articles on the web. Connecting under Linux is fairly straightforward; there is a fantastic article at http://www.linuxuk.org/2008/12/currentcost-and-ubuntu/. This explains how to make the USB device available to read from as a device. Next you can use a Perl script from http://www.jibble.org/currentcost/ or read on for the Python I wrote to interpret the XML. I used the python-serial package from Ubuntu Karmic Koala.

#!/usr/bin/env python

import serial
from lxml import etree

# open the device, 2400 is baud rate
ser = serial.Serial('/dev/ttyUSB0', 2400, timeout=1)

while True:
    reading = ser.readline()
    if reading:
        try:
            # use lxml to extract what we care about
            xml = etree.fromstring(reading)
            print(xml.xpath('/msg/ch1/watts')[0].text)
            print(xml.xpath('/msg/tmpr')[0].text)
        except (etree.XMLSyntaxError, IndexError):
            # skip partial lines and messages without these elements
            pass

This will print the watts and the temperature as the data is sent from the monitor via the serial cable.
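
The parsing step can be tried without the monitor attached. The sketch below uses the standard library's ElementTree in place of lxml and a made-up sample string containing only the two elements the script reads; parse_reading is a hypothetical helper, not part of the script above.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape the script expects (/msg/ch1/watts and /msg/tmpr)
SAMPLE = "<msg><ch1><watts>01429</watts></ch1><tmpr>18.2</tmpr></msg>"


def parse_reading(line):
    """Return (watts, temperature) from one line of monitor output, or None."""
    try:
        root = ET.fromstring(line)
    except ET.ParseError:
        return None  # partial or corrupt line
    watts = root.findtext('ch1/watts')
    temperature = root.findtext('tmpr')
    if watts is None or temperature is None:
        return None  # well-formed XML, but not a reading we recognise
    return int(watts), float(temperature)


print(parse_reading(SAMPLE))
```

Returning None for anything unrecognised keeps the read loop simple: it can skip the line and wait for the next one.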

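Without any processing, the input from the monitor looks like this:

<msg>
    <date>
        <dsb>00002</dsb>
        <hr>20</hr><min>51</min><sec>02</sec>
    </date>
    <src><name>CC02</name><id>03777</id><type>1</type></src>
    <ch1><watts>01429</watts></ch1>
    <ch2><watts>00000</watts></ch2>
    <ch3><watts>00000</watts></ch3>
    <tmpr>18.2</tmpr>
</msg>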

For some more detailed information check out: http://mungbean.org/blog/?p=477 and also http://knolleary.net/2008/05/05/power-graphing/ for some tips on graphing.

To finish off, below is a script to log the data into a SQLite database:

#!/usr/bin/env python

import serial

from sqlalchemy import MetaData, Table, create_engine
from sqlalchemy.orm import sessionmaker, mapper

from datetime import datetime
from lxml import etree
from lxml.etree import XMLSyntaxError

# open the device, 2400 is baud rate
ser = serial.Serial('/dev/ttyUSB0', 2400, timeout=1)

while (True):
    reading = ser.readline()
    if reading:
        try:
            xml = etree.fromstring(reading)
            watts = xml.xpath('/msg/ch1/watts')[0].text
            temp = xml.xpath('/msg/tmpr')[0].text
            now = datetime.now()
            time = now.strftime("%Y%m%d%H%M%S")

            metadata = MetaData()
            engine = create_engine('sqlite:////home/yourusername/Desktop/power.db')
            metadata.bind = engine
            Session = sessionmaker(bind=engine, autoflush=True)
            session = Session()
            session.autocommit = False

            # create a class to map to
            class Power(object): pass
            table = Table('power', metadata, autoload=True)
            mapper(Power, table)

            entry = Power()
            entry.datetime = time
            entry.watts = watts
            entry.temperature = temp
            session.add(entry)
            session.commit()

            break
        except XMLSyntaxError:
            pass
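
The script above reflects an existing power table (autoload=True), so the table has to be created beforehand. As a rough sketch of the storage side on its own, here is a version using only the standard library's sqlite3 module. The column names mirror the attributes set in the script; the column types, the helper names and the in-memory database are assumptions made for the example.

```python
import sqlite3
from datetime import datetime


def open_power_db(path):
    """Open the database and create the power table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS power ("
        "datetime TEXT, watts INTEGER, temperature REAL)"
    )
    return conn


def log_reading(conn, watts, temperature, when=None):
    """Insert one reading, timestamped in the same format as the script above."""
    when = when or datetime.now()
    conn.execute(
        "INSERT INTO power (datetime, watts, temperature) VALUES (?, ?, ?)",
        (when.strftime("%Y%m%d%H%M%S"), int(watts), float(temperature)),
    )
    conn.commit()


conn = open_power_db(":memory:")  # swap in a file path to keep the data
log_reading(conn, "01429", "18.2")
print(conn.execute("SELECT watts, temperature FROM power").fetchall())
```

Opening the connection once, outside the read loop, also avoids rebuilding the engine and mapping for every reading.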

Of Python, memcached and decorators: easy-peasy function caching

Following on from channam’s Memcached in 2 minutes, I’ve been working on a decorator to make life even simpler than this, memorised.
The most popular use-case for using memcached in Python apps is to cache the return value of a function or method. Over and over again you’ll find yourself doing something like:

import memcache

mc = memcache.Client(['localhost:11211'])
def get_something(mc=mc):
    value = mc.get('something')
    if value is not None:
        return value
    else:
        # Do something else ...
        return 'hello world'

So in the interests of DRY, why not reduce that down to a reusable pattern?
We can do this using a decorator, to replace our function with one that instead checks for the existence of some sort of key and returns the cached value if it is there, otherwise delegates to the actual function for output (and then caches this output back into memcache).

The simplest way to add reusable implementations of patterns such as these onto existing functions in Python is to use a decorator: effectively a function wrapper that replaces the original function (or returns a replacement for it).
How does this help? Well using this we can intercept a call to the decorated function, generate a signature for use as a key in memcache, check if the item is available in the cache, if so return that, otherwise grab the output from the function, pop it into memcache using our generated key, and finally return this value.
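
The flow just described can be sketched in a few lines. This is not memorised's implementation: FakeCache is a dict-backed stand-in so the example runs without a memcached server, cache_result is a hypothetical name, and the key is a plain readable string rather than a hash.

```python
import functools


class FakeCache(object):
    """Dict-backed stand-in offering the get()/set() calls used in the text."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, time=0):
        self.store[key] = value  # expiry time is ignored in this stand-in
        return True


def cache_result(mc, time=0):
    """Decorator factory: cache the wrapped function's return value in mc."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = "%s.%s(%r, %r)" % (
                fn.__module__, fn.__name__, args, sorted(kwargs.items()))
            value = mc.get(key)
            if value is None:
                # cache miss: call the real function and remember the result
                value = fn(*args, **kwargs)
                mc.set(key, value, time)
            return value
        return wrapper
    return decorator


calls = []


@cache_result(FakeCache())
def get_something(name):
    calls.append(name)
    return 'hello %s' % name


get_something('world')
get_something('world')
print(len(calls))  # the second call is served from the cache
```

A key like this one contains spaces, which real memcached keys may not, and that is one reason a hashed key is the safer choice.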

The biggest challenge we face is generating the unique (at least to that particular function call) key to reference in memcache. The way I’ve found best to do this is to use a combination of module, class name and key attributes (if it’s a method of an instance or decorated with @classmethod), function name, and call arguments, in this form: <module>.<class>[<key attributes>]::<function>(<arguments>)

In order to do this we need information about both the arguments, and if it’s a method we also need information about the class the method is bound to. Ordinarily these are both easy tasks using the inspect module, and the im_self attribute (funnily enough, referencing self) that bound methods contain.
However, a quick explanation of how the @<decorator> shortcut tag works, and indeed how decorators work, reveals a slight kink in this assumption. For example take the following:

class Test:
    @testfunc
    def test(self, arg1):
        pass

This simple bit of syntactic sugar is actually equal in functionality to the following (which is how you had to do it before Python 2.4, see PEP-318):

class Test:
    def test(self, arg1):
        pass
    test = testfunc(test)

To explain further, as I said before, decorators are just functions which take other function instances and wrap around or replace them, and they are applied at the time of definition, and not, as you might think at first, at call time.
This is the reason the decorator function returns a function: the decorator is called once, at this point, but any arguments to the function it is wrapping are passed at call time, as a call to the function returned by the decorator.

If this confuses you, don’t worry, it’s not actually that important right now, except for the fact that because the function instance is passed in at definition time rather than call time, it is not bound, and loses its frame (i.e. how it is called, from which instance etc.).
This means we can no longer use im_self and several of the class functions in the inspect module. What we can do however is cheat and use the fact that bound methods always pass in their bound object instance or class as the first argument of a call, the ‘self’ argument.

memorised.decorators.memorise() uses the following trick: it first checks whether ‘self’ or ‘cls’ (the standard first parameter of an @classmethod) is there, and then uses the *args list of passed in arguments to access the first parameter and grab either .__class__.__name__ for an object instance’s class name or .__name__ for classes:

# Get the list of arg names from func_code
argnames = fn.func_code.co_varnames[:fn.func_code.co_argcount]

. . .

if classmethod:
    # Get the class name from the cls argument
    class_name = args[0].__name__
else:
    # Get the class name from the self argument
    class_name = args[0].__class__.__name__
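
For something that can be run as it stands, here is a self-contained sketch of the same trick. It is an illustration rather than the code from memorised: with_class_name is a hypothetical decorator, and it reads fn.__code__ (available from Python 2.6 onwards) where the excerpt above uses fn.func_code.

```python
import functools


def with_class_name(fn):
    """Wrap fn so each call returns (class name or None, fn's result)."""
    code = fn.__code__
    argnames = code.co_varnames[:code.co_argcount]

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        class_name = None
        if argnames and args:
            if argnames[0] == 'self':
                # instance method: the first argument is the instance
                class_name = args[0].__class__.__name__
            elif argnames[0] == 'cls':
                # classmethod: the first argument is the class itself
                class_name = args[0].__name__
        return class_name, fn(*args, **kwargs)
    return wrapper


class Test(object):
    @with_class_name
    def method(self):
        return 'instance'

    @classmethod
    @with_class_name
    def factory(cls):
        return 'class'


@with_class_name
def plain(value):
    return value


print(Test().method())
print(Test.factory())
print(plain(3))
```

Note that this relies purely on the naming convention: a method whose first parameter is called something other than self or cls would be treated as a plain function.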

By then merging *args and **kwargs, we can build a key string for this particular function call. Next we just create a handy MD5 hash of this string using hashlib.md5, and then do our memcache checks pretty much as in the first example.
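
As a rough illustration of that step, the sketch below joins the pieces into a readable signature and hashes it. The function name make_cache_key and the exact signature layout are assumptions for the example, not the format memorised uses internally.

```python
import hashlib


def make_cache_key(module, class_name, func_name, args, kwargs):
    """Build a readable signature for one call, then hash it into a safe key."""
    parts = [repr(a) for a in args]
    # sort keyword arguments so their order does not change the key
    parts += ["%s=%r" % (k, kwargs[k]) for k in sorted(kwargs)]
    signature = "%s.%s::%s(%s)" % (module, class_name, func_name, ", ".join(parts))
    return hashlib.md5(signature.encode('utf-8')).hexdigest()


print(make_cache_key('blog.models', 'BlogUser', 'posts', (1,), {'page': 2}))
```

Hashing gives a fixed-length key with no awkward characters. One caveat is that repr() of an arbitrary object often includes its memory address, which is why keying on chosen attributes of the instance is more reliable than keying on the instance itself.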

Using memorise() to replace the first example, we get:

mc = memcache.Client(['localhost:11211'])
@memorise(mc=mc)
def get_something():
    return 'hello world'

Notice I’m still defining the mc variable to be a memcache.Client instance. memorise() can do this itself, either by using the default localhost:11211 server setting or by accepting a list of servers (via an argument named ‘mc_servers’). However, this isn’t ideal, as a memcache.Client instance would be created every time a function definition is decorated with memorise() (which could be lots), so it’s best to pass an instance in each time.
Not to mention the fact using dependency injection like this over any other way of keeping the instance (e.g. singleton) is much cleaner.

Another point of interest is that we always need to include the call parentheses, even when not passing in any arguments to memorise(), e.g. @memorise() and not @memorise, unlike decorators such as @classmethod, which are used without them. This is because of the way arguments are passed first to a decorator, and then to the function being decorated. There are workarounds for this problem, but up until now I haven’t seen one that can be used with class-based decorators (which memorise() is). I hope to solve this in a future release, so expect a follow up post on using optional arguments with class-based decorators sometime in the near future.
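
For function-based decorators, one common workaround looks roughly like the sketch below. It does not solve the class-based case, and the decorator name cached and its prefix option are invented for the example.

```python
import functools


def cached(fn=None, prefix=''):
    """Cache results in a local dict; usable as @cached or @cached(prefix=...)."""
    def decorate(func):
        store = {}

        @functools.wraps(func)
        def wrapper(*args):
            key = prefix + repr(args)
            if key not in store:
                store[key] = func(*args)
            return store[key]
        return wrapper

    if fn is not None:
        return decorate(fn)   # bare form: Python passed the function straight in
    return decorate           # called form: Python will call the result with the function


@cached
def double(x):
    return x * 2


@cached(prefix='triple:')
def triple(x):
    return x * 3


print(double(2))
print(triple(2))
```

The trick only works if options are always passed by keyword, since a positional first argument would be mistaken for the function being decorated.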

Finally I’ll finish with a more realistic example of using this to decorate methods on a Django Model:

class BlogUser(User):
    objects = UserManager()

    def __unicode__(self):
        return u'%s' % self.get_full_name()

    @property
    @memorise(parent_keys=['id'], mc=mc)
    def posts(self):
        return Post.objects.filter(creator=self)

And there you have it, any posts by that user will be cached for as long as memcached’s cachetime is set, or until memorised.utils.uncache() is used to clear down the cache for that method.

Memcached in 2 Minutes

So you need to use memcached with Python? Below is a brief intro.

First install everything you need. I run Ubuntu, so the python-memcached and memcached packages were required. python-memcached is available from www.tummy.com, and memcached itself from the memcached site.

Start the memcached server if it’s not already running:

/usr/bin/memcached -m 64 -p 11211 -u nobody -l 127.0.0.1

The above gets you a 64 MB server, more than enough to play on.

Next, some Python. The below is from the interpreter:

>>> import memcache
>>> memc = memcache.Client(['127.0.0.1:11211'])
>>> class test():
...     def __init__(self):
...         self.message = "Hello, world"
...
>>> t = test()
>>> memc.set('testname', t, 120)
True
>>> got = memc.get('testname')
>>> got.message
'Hello, world'

The above instantiates an object, saves it to memcached with set(), and then gets it back using get(). In the set() call we specify the number 120; this is the number of seconds the object should be held in the cache. The usage is pretty simple: try to fetch from memcached; if that fails, fetch from your data source and then save the result ready for next time.
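
That usage pattern can be sketched as a small helper. FakeClient below is a dict-backed stand-in so the example runs without a server, and get_or_load is a hypothetical name; with the real library you would pass in a memcache.Client instead.

```python
class FakeClient(object):
    """Dict-backed stand-in for memcache.Client; the expiry time is ignored."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)  # None on a miss, like the real client

    def set(self, key, value, time=0):
        self.data[key] = value
        return True


def get_or_load(client, key, loader, seconds=120):
    """Return the cached value for key, calling loader() only on a miss."""
    value = client.get(key)
    if value is None:
        value = loader()  # cache miss: go to the real data source
        client.set(key, value, seconds)
    return value


loads = []


def load_message():
    loads.append(1)
    return "Hello, world"


memc = FakeClient()
print(get_or_load(memc, 'testname', load_message))
print(get_or_load(memc, 'testname', load_message))
print(len(loads))
```

Because a miss is signalled by None, a loader that legitimately returns None would be called every time; wrapping such values in a tuple is one way around that.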

Using Select Boxes with WebDriver

This is a little update to my previous WebDriver post. The code below allows you to set the content of select boxes by accessing the element directly, in this case using XPath to find the element.

#!/usr/bin/env python

from webdriver_firefox.webdriver import FirefoxLauncher
from webdriver_firefox.webdriver import WebDriver

driver = WebDriver()
driver.get("http://cassandra.appspot.com/")

elements = driver.find_elements_by_xpath(
"/html/body/div[@id='container']/div[@id='search']/form[@id='searchForm']/div/select")

select_box = elements[0]
options = select_box.find_elements_by_tag_name("option")

for option in options:
    print option.get_text()

# sets the select box to last.fm Username
options[2].set_selected()

Textual Log Analysis using Python

Here at isotoma, we have a company irc channel that is used for general communication, chattering and link sharing.
Everyone joins it at the start of the day, and keeps up to date with what’s going on, and who’s talking about what.
lolcats are occasionally mentioned.

Now, having logs of the channel reaching many megabytes, I was curious as to the text statistics produced by this channel, who has what reading age, and how much they’ve talked in comparison to other people.
While I won’t release the actual statistics I’ve gathered for the channel, I did think it’d be cool to release the script I wrote to do the analysis itself.

It uses the Natural Language Toolkit (NLTK), and the readability contrib module for it. It’s not particularly nice code (inline html generation and other nastiness), but it does work. I’ll attempt to release a cleaned up version when I get some more time to work on it.

Currently, it expects a log in the format from znc, the irc bouncer software that I use, although it can be modified easily by altering the timestamp_count to the correct number to skip the timestamp. It also expects nicks to be surrounded in ‘<’ and ‘>’. I _did_ say it wasn’t particularly nice code.

However, code style issues aside, it is a demonstration and example of using NLTK and the readability module on real world data, and the output is kind of cool. Especially when you find out that the ircbot has a higher reading age than you.
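
To give a flavour of the parsing side without NLTK, here is a much simpler sketch that counts words per nick. It is not the attached script: the assumed line shape ("[HH:MM:SS] <nick> message"), the regular expression and the function name words_per_nick are all illustrative.

```python
import re
from collections import defaultdict

# Assumed line shape: "[12:34:56] <nick> message text"
LINE_RE = re.compile(r'^\[[^\]]*\]\s+<([^>]+)>\s+(.*)$')


def words_per_nick(lines):
    """Return a dict mapping each nick to the number of words they have said."""
    counts = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # joins, parts and other non-chat lines
        nick, message = match.groups()
        counts[nick] += len(message.split())
    return dict(counts)


sample = [
    "[09:00:01] <alice> morning all",
    "[09:00:05] <bob> has anyone seen the lolcat link",
    "[09:00:09] *** carol joined",
    "[09:00:12] <alice> yes",
]
print(words_per_nick(sample))
```

The per-nick text collected this way is the kind of input a readability measure would then be run over.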

Find the source attached.

log-analyser

D-Bus Fun

Use the following to make the music player Banshee queue a track ready for playback.

import dbus
bus = dbus.SessionBus()
player_queue = bus.get_object("org.bansheeproject.Banshee",
"/org/bansheeproject/Banshee/SourceManager/PlayQueue")
player_queue.EnqueueUri("/home/channam/Music/Jonathon Coulton/Code Monkey.mp3",True)

For more D-Bus info I recommend the D-Bus tutorial.

Uniquifying Delicious Links

We use delicious a lot here at isotoma, but given that we’re all likely to mostly bookmark the same links, we thought it’d be cool to filter the stream for a single copy of each link.

Click here

This site should pull the streams for the entered users and give you a single copy of each link posted over the chosen number of days (or 100 items, whichever is smaller).
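
The filtering itself is simple enough to sketch. This is not the site's code: it assumes each bookmark has already been fetched into a dict with a 'url' key, and unique_links is a made-up name.

```python
def unique_links(entries, limit=100):
    """Keep the first entry seen for each URL, stopping at limit entries."""
    seen = set()
    unique = []
    for entry in entries:
        url = entry['url']
        if url in seen:
            continue  # somebody already bookmarked this one
        seen.add(url)
        unique.append(entry)
        if len(unique) >= limit:
            break
    return unique


stream = [
    {'user': 'alice', 'url': 'http://example.com/a'},
    {'user': 'bob', 'url': 'http://example.com/a'},
    {'user': 'bob', 'url': 'http://example.com/b'},
]
print([entry['url'] for entry in unique_links(stream)])
```

Keeping the first occurrence means that, if the merged stream is sorted by date, each link is credited to whoever posted it first in that ordering.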

I hope it’s of use.