PloneConf2010 – and there’s more

I’m here at PloneConf2010 too, and I’ve mostly been to different talks from Mitch, so here’s my write-up as well.  It’s taken me a while to find the time to write this up properly!

Calvin Hendryx-Parker: Enterprise Search in Plone using Solr

We’ve been using Solr for a few years here at Isotoma, but so far we’ve not integrated it with Plone.  Plone’s built-in Catalog is actually pretty good, but one thing it doesn’t do fantastically well is full-text search.  It is passable in English, but has very limited stemming support, which makes it terrible in other languages.

Calvin presented their experience of using Solr with Plone. They developed their own software to integrate the Plone catalog with Solr, instead of using collective.solr, which up till then was the canonical way of connecting them. Their new product alm.solrindex sounds significantly better than collective.solr.  Based on what I’ve heard here, you should definitely use alm.solrindex.

To summarise how this all hangs together: you need an instance of Solr installed somewhere that you can use.  You can deploy a Solr instance specifically for each site, in which case you can deploy it through buildout.  Solr is written in Java, and runs inside various Java application servers.

You can also run a single Solr server for multiple Plone sites – in which case you partition the Solr database.

You then configure Solr, telling it how to index and parse the fields in your content. No configuration of this is required within Plone.  In particular you configure the indexes in Solr not in Plone.

Then install alm.solrindex in your Plone site and delete all the indexes that you wish to use with Solr. alm.solrindex will create new indexes by inspecting Solr.

Then reindex your site, and you’re done!  It supports a lot of more complex use cases, but in this basic case you get top-end full text indexing at quite low cost.

Dylan Jay, PretaWeb: FunnelWeb

Funnelweb sounds invaluable if you want to convert an existing non-Plone site into a Plone site, with the minimum effort.

Funnelweb is a tool based on transmogrifier. Transmogrifier provides a “pipeline” concept for transforming content. Pipeline stages can be inserted into a pipeline, and these stages then have the ability to change the content in various ways.
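The pipeline idea can be sketched in plain Python with generators. This is only an illustration of the concept, not transmogrifier’s actual API: the stage names and the item keys below are made up for the example.

```python
# A toy content pipeline: each stage is a generator that consumes items
# from the previous stage and yields (possibly modified) items onward.
# None of these names come from transmogrifier itself.

def source(pages):
    """First stage: emit one dict per page."""
    for path, html in pages:
        yield {"_path": path, "text": html}


def strip_whitespace(items):
    """Tidy the body text of every item."""
    for item in items:
        item["text"] = item["text"].strip()
        yield item


def drop_empty(items):
    """A stage may also remove items from the stream."""
    for item in items:
        if item["text"]:
            yield item


def run_pipeline(items, stages):
    """Chain the stages together in order and collect the result."""
    for stage in stages:
        items = stage(items)
    return list(items)


pages = [("/about", "  <p>About us</p>  "), ("/empty", "   ")]
result = run_pipeline(source(pages), [strip_whitespace, drop_empty])
print(result)
```

Because each stage only sees an iterator of items, stages can be added, removed or reordered without the others knowing.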

Dylan wrote funnelweb to use transmogrifier and provide a harness for running it in a managed way over existing websites.  The goal is to create a new Plone site, using the content from existing websites.

Funnelweb uploads remotely to Plone over XML-RPC, which means none of transmogrifier needs to be installed in a Plone site, which is a significant advantage.  It is designed to be deployed using buildout, so a script will be provided in your build that executes the import.

A bunch of pipeline steps are provided to simplify the process of importing entire sites.  In particular funnelweb has a clustering algorithm that attempts to identify which parts of pages are content and which are templates.  This can be configured by providing xpath expressions to identify page sections, and then extract content from them for specific content fields.

It supports the concept of ordering and sorts, so that Ordered Folder types are created correctly.  It supports transmogrify.siteanalyser.attach to put attachments closer to pages and transmogrify.siteanalyser.defaultpage to detect index pages in collections and to make them folder indexes in the created sites.

Finally it supports relinking, so that pages get sane urls and all links to those pages are correctly referenced.

Richard Newbury: The State of Plone Caching

The existing caching solution for Plone 3 is CacheFu, which is now pretty long in the tooth.  I can remember being introduced to CacheFu by Geoff Davis at the Archipelago Sprint in 2006, where it was a huge improvement on the (virtually non-existent) support for HTTP caching in Plone.

It also contains a bunch of design decisions that have proved problematic over time, particularly the heavy use of monkeypatching.

This talk was about the new canonical caching package for Plone, plone.app.caching. It was built by Richard Newbury, based on an architecture from the inimitable Martin Aspeli.

This package is already being used on high-volume sites with good results, and from what I saw here the architecture looks excellent.  It should be easy to configure for the general cases and allows sane extension of software to provide special-purpose caching configuration (which is quite a common requirement).

It provides a basic knob to control caching, where you can select strong, moderate or weak caching.

It can provide support for the two biggest issues in cache engineering: composite views (where a page contains content from multiple sources with different potential caching strategies) and split views (where one page can be seen by varying user groups who cannot be identified entirely from a tuple of URL and headers listed in Vary).
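To see why split views are awkward, it may help to look at what a shared cache has to go on. The sketch below is a toy, not how plone.app.caching or any particular proxy works: the cache_key function, the header values and the cookie name are all assumptions made for the example.

```python
# A shared cache can only tell responses apart by the URL and by the
# request headers named in Vary. This toy key function shows the effect.
# All names and values here are illustrative.

def cache_key(url, request_headers, vary):
    """Build a key from the URL plus the value of each header in Vary."""
    varied = tuple(
        (name.lower(), request_headers.get(name, "")) for name in sorted(vary)
    )
    return (url, varied)


editor = {"Accept-Encoding": "gzip", "Cookie": "__ac=editor-session"}
anonymous = {"Accept-Encoding": "gzip"}

# Varying only on Accept-Encoding: both users map to the same entry,
# so one of them could be served the page meant for the other.
same = (cache_key("/news", editor, ["Accept-Encoding"])
        == cache_key("/news", anonymous, ["Accept-Encoding"]))

# Varying on Cookie separates them, but then every session gets its
# own entry and the shared cache does very little.
split = (cache_key("/news", editor, ["Accept-Encoding", "Cookie"])
         != cache_key("/news", anonymous, ["Accept-Encoding", "Cookie"]))

print(same, split)
```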

It provides support for nginx, apache, squid and varnish.  Richard recommends you do not use buildout recipes for Varnish, but I think our recipe isotoma.recipe.varnish would be OK, because it is sufficiently configurable.  We have yet to review the default config with plone.app.caching though.

Richard recommended some tools as well:

  • funkload for load testing
  • browsermob for real browsers
  • HttpFox instead of LiveHttpHeaders
  • Firebug, natch
  • remember that hitting refresh and shift-refresh force caches to refresh.  Do not use them while testing!

Jens Klein: Plone is so semantic, isn’t it?

Jens introduced a project he’s been working on called Interactive Knowledge Stack (IKS), funded by the EU.  This project is to provide an open source Java component for Content Management Systems in Europe to help the adoption of Semantic concepts online.  The tool they have produced is called FISE. The name is pronounced like an Aussie would say “phase” ;)

FISE provides a RESTful interface to allow a CMS to associate semantic statements with content.  This allows us to say, for example that item X is in Paris, and in addition we can state that Paris is in France.  We can now query for “content in France” and it will know that this content is in France.
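The Paris example can be sketched with a toy triple store. This only illustrates the kind of inference involved; it is not the FISE interface, and the predicate name and data are invented.

```python
# A toy triple store with one transitive relation, "is-in".
# Illustration only; none of this is the FISE API.

triples = {
    ("item-x", "is-in", "Paris"),
    ("Paris", "is-in", "France"),
    ("item-y", "is-in", "Berlin"),
    ("Berlin", "is-in", "Germany"),
}


def located_in(subject, place, facts):
    """True if subject is in place, following is-in links transitively."""
    seen = set()
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, predicate, o in facts:
            if s == current and predicate == "is-in":
                if o == place:
                    return True
                frontier.append(o)
    return False


def content_in(place, facts):
    """All subjects that are directly or indirectly in place."""
    subjects = {s for s, _, _ in facts}
    return sorted(s for s in subjects if located_in(s, place, facts))


print(content_in("France", triples))
```

Nothing states directly that item-x is in France; the query finds it by following the link through Paris.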

They provide a generic Python interface to FISE which is usable from within Plone.  In addition it provides a special index type that integrates with the Plone Catalog to update the FISE triple store with the information found in the content.  It can provide triples based on hierarchical relationships found in the Plone database (page X is-a-child-of folder Y).

Jens would like someone to integrate the Aloha editor into Plone, which would allow much easier control by editors of semantic statements made about the content they are editing.

Querying Webtrends ODBC from the command line with WebtrendsQT

As I alluded to yesterday, and in my post about SQLAWebtrends, I’ve recently been doing a lot of work with the Webtrends analytics service, concerned mostly with getting data out of it via the old Windows ODBC drivers.

While the turnaround on new data becoming available from reports could cause Methuselah to yawn, working with it was exceedingly time-consuming too: loading up a spreadsheet app, defining queries in an ODBC query builder, and waiting for data to populate sheets. Even in the best case, with several Python functions written to query the latest data, I would still spend tedious amounts of time tweaking and re-tweaking queries for different reports and datasets.

This led me to make WebtrendsQT, a psql/mysql-like command line query tool for Webtrends using pyODBC.

WebtrendsQT is mostly just the ODBC extra tool provided by pyODBC, with some WT-specific changes. Namely the introduction of a “\p” command, which issues the {Call wtGetProfileList()} stored procedure against the WTSystem schema (via the system_cursor property), returning a list of profiles.
Similarly do_l (the handler for “\l”) instead of listing real schemas, lists the Webtrends ODBC equivalent templates.

do_c (“\c”) will work as you’d expect, taking a “schema” (e.g. template), and changing cursor to point to it, but also takes profile GUID as an optional first option to switch both profile and template (profiles define the data source and which report templates are available).

It took me some time to figure out that pyODBC’s lovely columns() method wouldn’t work with the Webtrends driver: some metadata isn’t provided by the driver, which causes a segfault. Instead, my hack is to use the DB API Cursor.description to get name and type details for the columns of a table. Unfortunately, to get this information I need a cursor that specifically targets the table in question. To get around this I make a simple query against the table that won’t return any information, but will still return a cursor:

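@memoized()
def get_columns(self, name):
    # Header row, followed by one [name, type, size] row per column.
    columns = [['Column name', 'Type', 'Size']]
    # LIMIT 0 returns no data but still gives us a cursor for the table.
    row = self.cursor.execute(
        'SELECT * FROM %s LIMIT 0' % (name,)
    ).fetchone()
    # Each description entry starts (name, type_code, display_size, internal_size, ...).
    for r in row.cursor_description:
        columns.append(
            [r[0],
             self.db_types[r[1]],
             r[3]]
        )
    return columns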

cursor_description is PyODBC’s special “always available even after query-set has been closed” reference to the cursor.description instance.
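The same trick can be tried without a Webtrends install. The sketch below uses the standard library’s sqlite3 module as a stand-in for the ODBC driver, on the assumption that any DB API driver fills in cursor.description after a SELECT; the table and column names are made up. Note that with sqlite3 a zero-row query makes fetchone() return None, so this version reads description from the cursor rather than from a row.

```python
import sqlite3

# sqlite3 stands in for the Webtrends ODBC driver here. Both follow the
# Python DB API, so cursor.description is populated after a SELECT even
# when no rows come back. The table and columns are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, visits INTEGER)")

cursor = conn.execute("SELECT * FROM pages LIMIT 0")

# Each description entry is a 7-item sequence; item 0 is the column name.
column_names = [entry[0] for entry in cursor.description]
first_row = cursor.fetchone()

print(column_names)
print(first_row)
```

sqlite3 only fills in the column name in each entry; other drivers, pyODBC included, also supply type and size information in the remaining slots.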

Unlike pyDBCLI.extras.odbc, WebtrendsQT takes a set of arguments rather than a single DSN string, due to the ODBC driver requiring a specific set of details to connect.

You most likely just want to install and run the tool under Windows. If you have any experience with Python on Windows, that should be easy enough using easy_install or the included setup.py. If you don’t have any Python-on-Windows experience and just want to get up and running with WebtrendsQT, the FAQ has a simple 5-step guide, including a pre-rolled pair of Windows scripts that will install everything and create a batch script with all the Python paths set up.
Once installed, just type wtqt in the cmd.exe window provided by the batch script, and away you go.

C:\Users\test\Desktop> wtqt

ERROR: Must have a profile GUID, -p

Usage: wtqt.py [-u <user>] [-p <pass>] -d <system DSN> -h <host> [-P <port>] -t <template> -p <profile>

Options:
  -d, --systemdsn: Predefined system DSN
  -p, --profile : Webtrends profile GUID
  -t, --template : Template/schema
  -h, --host : Webtrends web instance
  -P, --port : Optional server port (default: 80)
  -u, --username: Optional username
  -k, --password: Optional password

Installing Postgis on Ubuntu Karmic Koala (9.10)

Karmic has Postgres 8.4 as its default version, and 8.3 can prove tricky to install.
Unfortunately, not all of the contrib packages were upgraded to run on the new version in time.

One of these is PostGIS, the geodatabase extensions. You have two options: build it from source, or install the Lucid (10.04) package.
Obviously, Lucid isn’t released yet, but I have had success with just installing the package. Your Mileage May Vary.

i386 Package
AMD Package

Hope this helps someone!
–t

Django templates derived from the view docstring for rapid prototyping

The rather verbose title says it all really.

We had a need the other week to churn out the skeleton of a site to see how the different areas fitted together.
As it was being written in Django anyway I put together this quick ‘n dirty utility that renders the reStructuredText docstring of a view to the returned response, so you can quickly put in page furniture and links to other views without having to go to the effort of creating templates.


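from docutils.core import publish_string
from django.http import HttpResponse


def docview(fn):
    """Replace a view with one that renders its reStructuredText docstring."""
    # Render once, at decoration time; the docstring never changes.
    html = publish_string(source=fn.__doc__, writer_name='html')

    def view(request, *args, **kwargs):
        # Build a fresh response per request rather than sharing one instance.
        return HttpResponse(html)

    return view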

Then decorate your method:


@docview
def fast_view(request):
    """
===========
A Fast View
===========

With links to:

 * `An even faster view </faster_view>`_
 * `Somewhere else <http://isotoma.com>`_
 
    """

Which will render:

A Fast View

With links to:

  • An even faster view
  • Somewhere else

Code Coverage in Testing

Getting an idea of how much code your tests cover gets harder the more complex your code is. I have found coverage.py from Ned Batchelder to be a great tool.

The usage confused me slightly for a while, so I thought I’d share with you how I used it.

The first step was to create my Python test. I created mine using the standard unittest framework in Python. I then ran ‘coverage -e -x test_mytest_file.py’, which erases any previously collected data and executes the test.

To extract the coverage of the test run ‘coverage -b -i -d htmlcov ../module_being_tested.py’. This creates a directory of HTML output containing an index.html. Open this in your browser and you will be able to see the overall percentage of covered code. On this page is a list of the modules you tested. Clicking on a module highlights in red the statements not covered by your test.
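For anyone wanting something to try this on, here is a minimal sketch of a test file of the kind described. The classify function and the test are hypothetical, and the function under test is kept in the same file only so the example is self-contained; in the setup above it lives in a separate module.

```python
import unittest


def classify(n):
    """Toy function under test (hypothetical, for illustration only)."""
    if n < 0:
        return "negative"  # never exercised by the test below
    return "non-negative"


class ClassifyTest(unittest.TestCase):
    def test_positive(self):
        self.assertEqual(classify(5), "non-negative")


# Run the suite explicitly so the file works when executed as a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The test passes, but the “negative” branch is never executed, which is exactly the sort of line a coverage report should flag as not covered.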

Google Contact API

I’m quite a Google fan and enjoy playing with their APIs. For me one of the big attractions of Google is the quality and availability of the APIs.

Below is a bit of code that fetches the postal address of a contact from your Google contacts. It took me ages to figure out, so I hope posting it here will cut down the learning curve. And yes, I’m aware it’s only a simple thing :)

You will need the Google GData libraries, available from Google. I also use lxml to process the feed. I can’t recommend lxml enough. If you’re using Python and XML or HTML, use lxml.

The following has been lifted straight from an interpreter session, so fire up Python first and paste the below in, remembering to change the username and password.

import atom
import gdata.contacts
import gdata.contacts.service
from lxml import etree

# Log in with your own Google account details.
gd_client = gdata.contacts.service.ContactsService()
gd_client.email = 'jo@gmail.com'
gd_client.password = 'password'
gd_client.source = 'exampleCo-exampleApp-1'
gd_client.ProgrammaticLogin()

# Fetch the contacts feed and read the first contact's postal address.
feed = gd_client.GetContactsFeed()
entry = feed.entry[0]
entry
entry.postal_address[0].text  # e.g. '123 Fake Street'

Memcached in 2 Minutes

So you need to use memcached with Python? Below is a brief intro.

First install everything you need. I run Ubuntu, so the python-memcached and memcached packages were required. python-memcached is available from www.tummy.com; memcached itself is available from the memcached site.

Start the memcached server if it’s not already running:

/usr/bin/memcached -m 64 -p 11211 -u nobody -l 127.0.0.1

The above gets you a 64 MB server, more than enough to play on.

Next, some Python. The below is from the interpreter:

>>> import memcache
>>> memc = memcache.Client(['127.0.0.1:11211'])
>>> class test():
...     def __init__(self):
...         self.message = "Hello, world"
...
>>> t = test()
>>> memc.set('testname', t, 120)
True
>>> got = memc.get('testname')
>>> got.message
'Hello, world'

The above instantiates an object, saves it to memcached with set(), and then gets it back using get(). In the set() call we specify the number 120; this is the number of seconds the object should be held in the cache. The usage is pretty simple: try to fetch from memcached; if that fails, fetch from your data source and then save the result ready for next time.
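That fetch-or-load pattern can be sketched as follows. To keep it runnable without a memcached server, FakeCache is a stand-in that mimics only the get() and set() calls used above; it and load_document() are assumptions for the example, not part of python-memcached.

```python
# Fetch-or-load with a stand-in client. FakeCache mimics only get() and
# set() so the sketch runs without a memcached server; load_document()
# is a hypothetical slow data source.

class FakeCache(object):
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss

    def set(self, key, value, time=0):
        self._data[key] = value  # expiry is ignored in this stand-in
        return True


calls = []


def load_document(doc_id):
    calls.append(doc_id)  # record each trip to the "database"
    return {"id": doc_id, "body": "Hello, world"}


def fetch(cache, doc_id):
    key = "doc-%s" % doc_id
    doc = cache.get(key)
    if doc is None:  # miss: go to the data source, then cache the result
        doc = load_document(doc_id)
        cache.set(key, doc, 120)
    return doc


cache = FakeCache()
first = fetch(cache, 42)
second = fetch(cache, 42)
print(first == second, len(calls))
```

With a real server you would pass a memcache.Client in place of FakeCache; the second fetch is then served from the cache until the 120 seconds run out.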

Return of the FOUC

So, generally Firefox 3 did not have much impact on our sites. Besides the Kupu problem Andy mentioned, there were just a few small CSS-related glitches, but overall we did not have to scramble to roll out patches. However, on a site we’re currently working on, Firefox 3 showed a serious glitch I hadn’t seen for a long time: the Flash of Unstyled Content or FOUC, first described back in 2001. The page content displays un-styled for a split second before the stylesheet kicks in.

The bluerobot article is now outdated, but the problem occasionally still rears its head on more modern browsers. Since I’ve been using Firefox 3, I’ve been seeing it quite often, where Firefox 2 doesn’t. (I’m not saying there’s anything wrong with FF3, it’s probably just being less lax than FF2.) This article has a good explanation. In a nutshell, you see a FOUC whenever a script tries to access properties like scrollHeight or offsetWidth before the stylesheet has loaded. The solution is simply to have the CSS links above the JavaScript links in the HTML.
Here’s an example (until they fix it, anyway): getcloser.com, an ambitious HMV-sponsored social connector based on entertainment interests, designed by the clever bods at LBi. You can see in the <head> there’s 78KB of JavaScript followed by 151KB of CSS. Switch it round, and it’ll go away.

Facebook advertising

We’ve been running a Facebook “social ad” for Forkd for the last few days. I’ve been interested in what Facebook advertising could do for our customers for a while, and thought that the best way to find out was to try it for ourselves. Some interesting things have arisen from it, too:

  • It’s a very simple ad format – hard to really make compelling. Despite that, and having made little effort in the ad design, we’re getting 1 in 2,000 click throughs, which feels to me like a good number.
  • We’re getting roughly 1 new account sign up for every 8 clicks. Again, this feels like a good number. Both of these make me think that a good advert well targeted on Facebook is definitely worth the money.
  • Side effects were the discovery that our click throughs dropped by 50% over the weekend, but that the sign up rate improved to 1 in 4. From this, I guess, we can infer that a) Facebook is primarily a work diversionary tactic, and b) people in that diversionary mode are clicking with less purpose.

All in all then? I’ll be recommending that customers do it, but I’d ask them to think about only running campaigns at the weekends, particularly if the landing page from a click requires a little reading for the reader to understand the benefits.
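The arithmetic behind that recommendation can be checked quickly. This is a rough sketch using only the figures quoted above; reading “dropped by 50%” as a halved click-through rate, and assuming the ad is paid for per click, are both assumptions.

```python
from fractions import Fraction

# Figures quoted above. Treating "dropped by 50%" as a halved
# click-through rate is an assumption.
weekday_ctr = Fraction(1, 2000)
weekday_signup = Fraction(1, 8)
weekend_ctr = weekday_ctr / 2
weekend_signup = Fraction(1, 4)


def impressions_per_signup(ctr, signup_rate):
    """How many ad impressions it takes, on average, to get one sign up."""
    return 1 / (ctr * signup_rate)


weekday = impressions_per_signup(weekday_ctr, weekday_signup)
weekend = impressions_per_signup(weekend_ctr, weekend_signup)
clicks_weekday = 1 / weekday_signup
clicks_weekend = 1 / weekend_signup

print(weekday, weekend)
print(clicks_weekday, clicks_weekend)
```

On these numbers an impression is worth the same on either kind of day, but each sign up takes half as many clicks at the weekend, so if you pay per click the cost per sign up halves.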

Sleevenotez and the Beeb

Rather a long time ago Tristan of BBC Audio R&D announced public last.fm accounts for the 4 main BBC music stations. At the time I thought we should probably set up accounts on Sleevenotez for these stations so that people (us included, to be honest) could track what was being played on the radio. Finally, finally, I’ve got my finger out and done it:

  • Radio1: username: bbcradio1, password: bbcradio1
  • Radio2: username: bbcradio2, password: bbcradio2
  • 1 Extra: username: bbc1extra, password: bbc1extra
  • 6 Music: username: bbc6music, password: bbc6music

Enjoy!