Author Archives: Doug Winter

Links

There is a new version of gunicorn, 19.0, which has a couple of significant changes, including some interesting workers (gthread and gaiohttp) and actually responding to signals properly, which will make it work with Heroku.

The HTTP RFC, 2616, is now officially obsolete. It has been replaced by a bunch of RFCs from 7230 to 7235, covering different parts of the specification. The new RFCs look loads better, and it’s worth having a look through them to get familiar with them.

Some kind person has produced a recommended set of SSL directives for common webservers, which provide an A+ on the SSL Labs test, while still supporting older IEs. We’ve struggled to find a decent config for SSL that provides broad browser support, whilst also having the best levels of encryption, so this is very useful.

A few people are still struggling with Git.  There are lots of git tutorials around the Internet, but this one from Git Tower looks like it might be the best for the complete beginner. You know it’s for noobs, of course, because they make a client for the Mac :)

I haven’t seen a lot of noise about this, but the EU has outlawed pre-ticked checkboxes.  We have always recommended that these are not used, since they are evil UX, but now there’s an argument that might persuade everyone.

Here is a really nice post about splitting user stories. I think we are pretty good at this anyhow, but this is a nice way of describing the approach.

@monkchips gave a talk at IBM Impact about the effect of Mobile First. I think we’re on the right page with most of these things, but it’s interesting to see mobile called-out as one of the key drivers for these changes.

I’d not come across the REST Cookbook before, but here is a decent summary of how to treat PUT vs POST when designing RESTful APIs.

Fastly have produced a spectacularly detailed article about how to get tracking cookies working with Varnish.  This is very relevant to consumer facing projects.

This post from ThoughtWorks is absolutely spot on, and I think accurately describes an important aspect of testing: The Software Testing Cupcake.

As an example of how to make unit tests less fragile, this is a decent description of how to isolate tests, which is a key technique. The examples are Ruby, but the principle is valid everywhere.

Still on unit testing, Facebook have open sourced a Javascript unit testing framework called Jest. It looks really very good.

A nice implementation of “sudo mode” for Django. This ensures the user has recently entered their password, and is suitable for protecting particularly valuable assets in a web application like profile views or stored card payments.

If you are using Redis directly from Python, rather than through Django’s cache wrappers, then HOT Redis looks useful. This provides atomic operations for compound Python types stored within Redis.

The problem with Backing Stores, or what is NoSQL and why would you use it anyway

Durability is something that you normally want somewhere in a system: where the data will survive reboots, crashes, and other sorts of things that routinely happen to real world systems.

Over the many years that I have worked in system design, there has been a recurring thorny problem of how to handle this durable data.  What this means in practice when building a new system is the question “what should we use as our backing store?”.  Backing stores are often called “databases”, but everyone has a different view of what database means, so I’ll try and avoid it for now.

In a perfect world a backing store would be:

  • Correct
  • Quick
  • Always available
  • Geographically distributed
  • Highly scalable

While we can do these things quite easily these days with the stateless parts of an application, doing them with durable data is non-trivial. In fact, in the general case, it’s impossible to do all of these things at once (The CAP theorem describes this quite well).

This has always been a challenge, but as applications move onto the Internet, and as businesses become more geographically distributed, the problem has become more acute.

Relational databases (RDBMSes) have been around a very long time, but they’re not the only kind of database you can use. There have always been other kinds of store around, but the so-called NoSQL Movement has had particular prominence recently. This champions the use of new backing stores not based on the relational design, and not using SQL as a language. Many of these have radically different designs from the sort of RDBMS system that has been widely used for the last 30 years.

When and how to use NoSQL systems is a fascinating question, and here I put forward our thinking on it. As always, it’s kind of complicated.  It certainly isn’t the case that throwing out an RDBMS and sticking in Mongo will make your application awesome.

Although they are lumped together as “NoSQL”, this is not actually a useful definition, because there is very little that all of these have in common. Instead I suggest that there are these types of NoSQL backing store available to us right now:

  • Document stores – MongoDB, XML databases, ZODB
  • Graph databases – Neo4j
  • Key/value stores – Dynamo, BigTable, Cassandra, Redis, Riak, Couch

These are so different from each other that lumping them in to the same category together is really quite unhelpful.

Graph databases

Graph databases have some very specific use cases, for which they are excellent, and probably a lot of utility elsewhere. However, for our purposes they’re not something we’d consider generally, and I’ll not say any more about them here.

Document stores

I am pretty firmly in the camp that Document stores, such as MongoDB, should never be used generally either (for which I will undoubtedly catch some flak). I have a lot of experience with document databases, particularly ZODB and dbxml, and I know whereof I speak.

These databases store “documents” as schema-less objects. What we mean by a “document” here is something that is:

  • self-contained
  • always required in its entirety
  • more valuable than the links between documents or its metadata.

My experience is that although often you may think you have documents in your system, in practice this is rarely the case, and it certainly won’t continue to be the case. Often you start with documents, but over time you gain more and more references between documents, and then you gain records and all sorts of other things.

Document stores are poor at handling references, and because of the requirement to retrieve things in their entirety you denormalise a lot. The end result of this is loss of consistency, and eventually doom with no way of recovering consistency.

We do not recommend document stores in the general case.

Key/value stores

These are the really interesting kind of NoSQL database, and I think these have a real general potential when held up against the RDBMS options.  However, there is no magic bullet and you need to choose when to use them carefully.

You have to be careful when deciding to build something without an RDBMS. An RDBMS delivers a huge amount of value in a number of areas, and for all sorts of reasons. Many of the reasons are not because the RDBMS architecture is necessarily better but because they are old, well-supported and well-understood.

For example, PostgreSQL (our RDBMS of choice):

  • has mature software libraries for all platforms
  • has well-understood semantics for backup and restore, which work reliably
  • has mature online backup options
  • has had decades of performance engineering
  • has well understood load and performance characteristics
  • has good operational tooling
  • is well understood by many developers

These are significant advantages over newer stores, even if the newer stores might technically be better in specific use cases.

All that said, there are some definite reasons you might consider using a key/value store instead of an RDBMS.

Reason 1: Performance

Key/value stores often naively appear more performant than RDBMS products, and you can see some spectacular performance figures in direct comparisons. However, none of them really provide magic performance increases over RDBMS systems; what they do is provide different tradeoffs. You need to decide where your performance tradeoffs lie for your particular system.

In practice what key/value stores mostly do is provide some form of precomputed cache of your data, by making it easy (or even mandatory) to denormalize your data, and by providing the performance characteristics to make pre-computation reasonable.

If you have a key/value store that has high write throughput characteristics, and you write denormalized data into it in a read-friendly manner then what you are actually doing is precomputing values. This is basically Just A Cache. Although it’s a pattern that is often facilitated by various NoSQL solutions, it doesn’t depend on them.

RDBMS products are optimised for correctness and query performance, and write performance takes second place to these.  This means they are often not a good place to implement a pre-computed cache (where you often write values you never read).

It’s not insane to combine an RDBMS as your master source of data with something like Redis as an intermediate cache.  This can give you most of the advantages of a completely NoSQL solution, without throwing out all of the advantages of the RDBMS backing store, and it’s something we do a lot.
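A minimal sketch of this combination, under loud assumptions: sqlite3 stands in for the RDBMS and a plain dict stands in for Redis, and the table and function names are made up for illustration. The point is only the shape: writes go to the master store and refresh a precomputed value, reads hit the cache first.

```python
import sqlite3

# Stand-ins, both assumptions for illustration: sqlite3 plays the RDBMS
# (the master copy) and a plain dict plays the key/value cache (e.g. Redis).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
cache = {}


def _total_from_master(customer):
    """Compute the value the slow way, from the master store."""
    row = db.execute(
        "SELECT SUM(amount) FROM orders WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] or 0  # SUM over no rows is NULL, so fall back to 0


def add_order(customer, amount):
    """Write to the master store, then refresh the precomputed total."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO orders VALUES (?, ?)", (customer, amount))
    cache["total:" + customer] = _total_from_master(customer)


def customer_total(customer):
    """Read the precomputed value; fall back to the master on a cache miss."""
    key = "total:" + customer
    if key not in cache:
        cache[key] = _total_from_master(customer)
    return cache[key]
```

If the cache is lost it can always be rebuilt from the master, which is the property you give up when the key/value store is the only copy.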

Reason 2: Distributed datastores

If you need your data to be highly available and distributed (particularly geographically) then an RDBMS is probably a poor choice. It’s just very difficult to do this reliably and you often have to make some very painful and hard-to-predict tradeoffs in application design, user interface and operational procedures.

Some of these key/value stores (particularly Riak) can really deliver in this environment, but there are a few things you need to consider before throwing out the RDBMS completely.

Availability is often a tradeoff one can sensibly make.  When you understand quite what this means in terms of cost, both in design and operational support (all of these vary depending on the choices you make), it is often the right tradeoff to tolerate some downtime occasionally.  In practice a system that works brilliantly almost all of the time, but goes down in exceptional circumstances, is generally better than one that is in some ways worse all of the time.

If you really do need high availability though, it is still worth considering a single RDBMS in one physical location with distributed caches (just as with the performance option above).  Distribute your caches geographically, offload work to them and use queue-based fanout on write. This gives you eventual consistency, whilst still having an RDBMS at the core.

This can make sense if your application has relatively low write throughput, because all writes can be sent to the single location RDBMS, but be prepared for read-after-write race conditions. Solutions to this tend to be pretty crufty.
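The race is easy to see in a toy model. In this sketch everything is a stand-in chosen for illustration: a dict for the single-location RDBMS, two dicts for regional caches, and an in-process queue for the message queue that carries the fanout.

```python
import queue

master = {}                              # stand-in for the single RDBMS
regional_caches = {"eu": {}, "us": {}}   # stand-ins for distributed caches
fanout = queue.Queue()                   # stand-in for a message queue


def write(key, value):
    """All writes go to the master; cache updates are only queued."""
    master[key] = value
    fanout.put((key, value))


def read(region, key):
    """Reads are served from the nearest cache and may be stale."""
    return regional_caches[region].get(key)


def drain_fanout():
    """Apply queued updates to every cache (normally done by workers)."""
    while not fanout.empty():
        key, value = fanout.get()
        for cache in regional_caches.values():
            cache[key] = value
```

A `read` issued between `write` and `drain_fanout` returns the old value (here `None`), which is exactly the read-after-write race; once the queue has drained, every region agrees again.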

Reason 3: Application semantics vs SQL

NoSQL databases tend not to have an abstraction like SQL. SQL is decent in its core areas, but it is often really hard to encapsulate some important application semantics in SQL.

A good example of this is asynchronous access to data as parts of calculations. It’s not uncommon to need to query external services, but SQL really isn’t set up for this. Although there are some hacky workarounds, if you have a microservice architecture you may find SQL really doesn’t do what you need.

Another example is staleness policies.  These are particularly problematic when you have distributed systems with parts implemented in other languages such as Javascript, for example if your client is a browser or a mobile application and it encapsulates some business logic.

Endpoint caches in browsers and mobile apps need to represent the same staleness policies you might have in your backing store and you end up implementing the same staleness policies in Javascript and then again in SQL, and maintaining them. These are hard to maintain and test at the best of times. If you can implement them in fewer places, or fewer languages, that is a significant advantage.

In addition, it is a practical case that we’re not all SQL gurus. Having something that is suboptimal in some cases but where we are practically able to exploit it more cheaply is a rational economic tradeoff.  It may make sense to use a key/value store just because of the different semantics it provides – but be aware of how much you are losing without including an RDBMS, and don’t be surprised if you end up reintroducing one later as a platform for analysis of your key/value data.

Reason 4: Load patterns

NoSQL systems can exhibit very different performance characteristics from SQL systems under real loads. Having some choice in where load falls in a system is sometimes useful.

For example, if you have something that scales front-end webservers horizontally easily, but you only have one datastore, it can be really useful to have the load occur on the application servers rather than the datastore – because then you can distribute load much more easily.

Although this is potentially less efficient, it’s very easy and often cheap to spin up more application servers at times of high load than it is to scale a database server on the fly.

Also, SQL databases tend to have far better read performance than write performance, so fan-out on write (where you might have 90% writes to 10% reads as a typical load pattern) is probably better implemented using a different backing store that has different read/write performance characteristics.

Which backing store to use, and how to use it, is the kind of decision that can have huge ramifications for every part of a system.  This post has only had an opportunity to scratch the surface of this subject and I know I’ve not given some parts of it the justice they deserve – but hopefully it’s clear that every decision has tradeoffs and there is no right answer for every system.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

Using mock.patch in automated unit testing

Mocking is a critical technique for automated testing. It allows you to isolate the code you are testing, which means you test what you think you are testing. It also makes tests less fragile because it removes unexpected dependencies.

However, creating your own mocks by hand is fiddly, and some things are quite difficult to mock unless you are a metaprogramming wizard. Thankfully Michael Foord has written a mock module, which automates a lot of this work for you, and it’s awesome. It’s included in Python 3, and is easily installable in Python 2.

Since I’ve just written a test case using mock.patch, I thought I could walk through the process of how I approached writing the test case and it might be useful for anyone who hasn’t come across this.

It is important to decide when you approach writing an automated test what level of the system you intend to test. If you think it would be more useful to test an orchestration of several components then that is an integration test of some form and not a unit test. I’d suggest you should still write unit tests where it makes sense for this too, but then add in a sensible sprinkling of integration tests that ensure your moving parts are correctly connected.

Mocks can be useful for integration tests too, however the bigger the subsystem you are mocking the more likely it is that you want to build your own “fake” for the entire subsystem.

You should design fake implementations like this as part of your architecture, and consider them when factoring and refactoring. Often the faking requirements can drive out some real underlying architectural requirements that are not clear otherwise.
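As a rough illustration of what such a fake can look like, here is a sketch for a hypothetical blob-storage subsystem. The class, its `put`/`get` interface and the function under test are all invented for this example; the idea is only that the fake honours the same interface as the real client, including its failure mode.

```python
class FakeBlobStore:
    """In-memory stand-in for a hypothetical remote blob store.

    It keeps the same put/get interface the real client would have, so
    code under test only ever depends on that interface.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        self._blobs[name] = bytes(data)

    def get(self, name):
        try:
            return self._blobs[name]
        except KeyError:
            raise LookupError("no such blob: " + name)


def archive_report(store, name, lines):
    """Example code under test: it only relies on the store interface."""
    store.put(name, "\n".join(lines).encode("utf-8"))
    return len(store.get(name))
```

Because the store is passed in, an integration test can hand over `FakeBlobStore()` while production code passes the real client.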

Whereas unit tests should test very limited functionality, I think integration tests should be much more like smoke tests and exercise a lot of functionality at once. You aren’t interested in isolating specific behaviour, you want to make it break. If an integration test fails, and no unit tests fail, you have a potential hotspot for adding additional unit tests.

Anyway, my example here is a Unit Test. What that means is we only want to test the code inside the single function being tested. We don’t want to actually call any other functions outside the unit under test. Hence mocking: we want to replace all function calls and external objects inside the unit under test with mocks, and then ensure they were called with the expected arguments.

Here is the code I need to test, specifically the ‘fetch’ method of this class:

class CloudImage(object):
 
    __metaclass__ = abc.ABCMeta
 
    blocksize = 81920
    def __init__(self, pathname, release, arch):
        self.pathname = pathname
        self.release = release
        self.arch = arch
        self.remote_hash = None
        self.local_hash = None
 
    @abc.abstractmethod
    def remote_image_url(self):
        """ Return a complete url of the remote virtual machine image """
 
    def fetch(self):
        remote_url = self.remote_image_url()
        logger.info("Retrieving {0} to {1}".format(remote_url, self.pathname))
        try:
            response = urllib2.urlopen(remote_url)
        except urllib2.HTTPError:
            raise error.FetchFailedException("Unable to fetch {0}".format(remote_url))
        local = open(self.pathname, "w")
        while True:
            data = response.read(self.blocksize)
            if not data:
                break
            local.write(data)

I want to write a test case for the ‘fetch’ method. I have elided everything in the class that is not relevant to this example.

Looking at this function, I want to test that:

  1. The correct URL is opened
  2. If an HTTPError is raised, the correct exception is raised
  3. Open is called with the correct pathname, and is opened for writing
  4. Read is called successive times, and that everything returned is passed to local.write, until a False value is returned

I need to mock the following:

  1. self.remote_image_url()
  2. urllib2.urlopen()
  3. open()
  4. response.read()
  5. local.write()

This is an abstract base class, so we’re going to need a concrete implementation to test. In my test module therefore I have a concrete implementation to use. I’ve implemented the other abstract methods, but they’re not shown.

class MockCloudImage(base.CloudImage):
    def remote_image_url(self):
        return "remote_image_url"

Because there are other methods on this class I will also be testing, I create an instance of it in setUp as a property of my test case:

class TestCloudImage(unittest2.TestCase):

    def setUp(self):
        self.cloud_image = MockCloudImage("pathname", "release", "arch")

Now I can write my test methods.

I’ve mocked self.remote_image_url, now I need to mock urllib2.urlopen() and open(). The other things to mock are things returned from these mocks, so they’ll be automatically mocked.

Here’s the first test:

    @mock.patch('urllib2.urlopen')
    @mock.patch('__builtin__.open')
    def test_fetch(self, m_open, m_urlopen):
        m_urlopen().read.side_effect = ["foo", "bar", ""]
        self.cloud_image.fetch()
        self.assertEqual(m_urlopen.call_args, mock.call('remote_image_url'))
        self.assertEqual(m_open.call_args, mock.call('pathname', 'w'))
        self.assertEqual(m_open().write.call_args_list, [mock.call('foo'), mock.call('bar')])

The mock.patch decorators replace the specified functions with mock objects within the context of this function, and then unmock them afterwards. The mock objects are passed into your test, in the order in which the decorators are applied (bottom to top).
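The bottom-to-top ordering is easy to get wrong, so here is a small standalone check. It assumes Python 3, where the module is available as unittest.mock, and patches two os functions chosen purely for illustration.

```python
import os
from unittest import mock


@mock.patch("os.getcwd")      # applied last, so its mock is the second argument
@mock.patch("os.listdir")     # applied first, so its mock is the first argument
def check_order(m_listdir, m_getcwd):
    m_listdir.return_value = ["a.txt"]
    m_getcwd.return_value = "/somewhere"
    # Inside the function both names resolve to the mocks.
    return os.listdir("."), os.getcwd()


result = check_order()
# Once the function returns, the real functions are restored.
unpatched = not isinstance(os.getcwd, mock.Mock)
```

If the two parameters were swapped, the return values would be set on the wrong mocks and `result` would contain mock objects instead of the configured values.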

Now we need to make sure our read calls return something useful. Retrieving any property or method from a mock returns a new mock, and the new returned mock is consistently returned for that method. That means we can write:

m_urlopen().read

This gives us the read call that will be made inside the function. We can then set its “side_effect” – what it does when called. In this case, we pass it an iterable (a list) and it will return each of those values in turn, one per call.
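Both behaviours can be checked in isolation. This sketch assumes Python 3's unittest.mock and builds the mock by hand rather than through mock.patch; the name m_urlopen simply mirrors the test above.

```python
from unittest import mock

m_urlopen = mock.MagicMock()

# Calling a mock returns its return_value, which is the same object every
# time, so attribute lookups on the result are stable too.
same_read = m_urlopen().read is m_urlopen().read

# A list as side_effect hands out one value per call, in order.
m_urlopen().read.side_effect = ["foo", "bar", ""]
chunks = [m_urlopen().read(10) for _ in range(3)]
```

Here `chunks` ends up as the three configured values, with the empty string last, which is what lets a read loop like the one in fetch terminate.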

Now we call our fetch method, which will terminate because read eventually returns an empty string.

Now we just need to check each of our methods was called with the appropriate arguments, and hopefully it’s pretty clear how that works from the code above. It’s important to understand the difference between:

m_open.call_args

and

m_open().write.call_args_list

The first is the arguments passed to the most recent call to “open(…)”. The second is the list of arguments passed, call by call, to:

local = open(); local.write(...)
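A standalone sketch of the distinction, assuming Python 3's unittest.mock. It uses m_open.return_value, which is the same object that m_open() returns but can be reached without recording an extra call on m_open.

```python
from unittest import mock

m_open = mock.MagicMock()

# What the code under test would do:
local = m_open("pathname", "w")
local.write("foo")
local.write("bar")

# call_args is the most recent call made to the mock itself.
open_call = m_open.call_args
# call_args_list on the child mock records every call to local.write(...).
write_calls = m_open.return_value.write.call_args_list
```

`open_call` compares equal to `mock.call("pathname", "w")`, while `write_calls` is a two-element list holding one `mock.call` per write.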

Another test method, testing the exception is now very similar:

    @mock.patch('urllib2.urlopen')
    @mock.patch('__builtin__.open')
    def test_fetch_httperror(self, m_open, m_urlopen):
        m_urlopen.side_effect = urllib2.HTTPError(*[None] * 5)
        self.assertRaises(error.FetchFailedException, self.cloud_image.fetch)

You can see I’ve created an instance of the HTTPError exception class (with dummy arguments), and this is the side_effect of calling urlopen().

Now we can assert our method raises the correct exception.
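The same pattern in miniature, as a hedged sketch for Python 3's unittest.mock. FetchFailed and this cut-down fetch function are hypothetical stand-ins for the real exception and method, and IOError stands in for HTTPError to keep the example self-contained.

```python
from unittest import mock


class FetchFailed(Exception):
    """Hypothetical application error, standing in for FetchFailedException."""


def fetch(opener, url):
    """Translate a low-level IOError from the opener into FetchFailed."""
    try:
        return opener(url)
    except IOError:
        raise FetchFailed("Unable to fetch {0}".format(url))


# An exception instance as side_effect is raised when the mock is called.
m_opener = mock.Mock(side_effect=IOError("boom"))
```

Calling `fetch(m_opener, some_url)` now raises FetchFailed, and `m_opener.assert_called_once_with(some_url)` confirms the opener was actually reached.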

Hopefully you can see how the mock.patch decorator saved me a spectacular amount of grief.

If you need to, it can be used as a context manager as well, with “with”, giving similar behaviour. This is useful in setUp functions particularly, where you can use the with context manager to create a mocked closure used by the system under test only, and not applied globally.
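A small sketch of the context manager form, assuming Python 3's unittest.mock; the function under test and the patched target are chosen only for illustration.

```python
import os
from unittest import mock


def current_dir_name():
    """Code under test: depends on os.getcwd()."""
    return os.path.basename(os.getcwd())


# The patch is active only inside the with block.
with mock.patch("os.getcwd") as m_getcwd:
    m_getcwd.return_value = "/srv/project"
    patched = current_dir_name()

# Outside the with block the real os.getcwd is back in place.
restored = not isinstance(os.getcwd, mock.Mock)
```

The mock object remains available after the block, so assertions such as `m_getcwd.assert_called_once_with()` can still be made once the patch has been undone.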


What Heartbleed means for you

On the 8th April a group of security researchers published information about a newly discovered exploit in a popular encryption library. With some marketing panache, they called this exploit “Heartbleed”.

A huge number of Internet services were vulnerable to this exploit, and although many of them have now been patched, many remain vulnerable. In particular, this was an Open Source library and so many of the very largest and most popular sites were directly affected.

Attention on the exploit has so far focused on the possible use of the exploit to obtain “key material” from affected sites, but there are some more immediate ramifications and you need to act to protect yourself.

Unfortunately the attack will also reveal other random bits of webserver’s memory, which can include usernames, passwords and cookies. Obtaining this information will allow attackers to log into these services as you, and then conduct more usual fraud and identity theft.

Once the dust has settled (so later today on the 9th, or tomorrow on the 10th) you should go and change every single one of your passwords. Start with the passwords you’ve used recently and high value services.

It’s probably a good idea to clear all your cookies too once you’ve done this, to force you to re-login to every service with your new password.

You should also log out of every single service on your phone, and then log back in, to get new session cookies. If you are particularly paranoid, wipe your phone and reinstall. Mobile app session cookies are likely to be a very popular vector for this attack.

This is an enormous amount of work, but you can use it as an opportunity to set some decent random passwords for every service and adopt a tool like LastPass, 1Password or KeePass while you are at it.

Most people are hugely vulnerable to password disclosure because they share passwords between accounts, and the entire world of black-hats are out there right now slurping passwords off every webserver they can get them from. There is going to be a huge spike in fraud and identity theft soon, and you want to make sure you are not a victim to it.

The Man-In-The-Middle Attack

In simple terms, obtaining a site’s key material would allow an attacker to impersonate the site’s SSL certificate, so they can show the padlock icon that demonstrates a secure connection, even though they control your connection.

They can only do this if they also manage to somehow make your browser connect to their computers for the request. This can normally only be done by either controlling part of your connection directly (hacking your router maybe), or by “poisoning” your access to the Domain Name Service with which you find out how to reach a site (there are many ways to do this, but none of them are trivial).

You can expect Internet security types to be fretting about this one for a long time to come, and there are likely to be some horrific exploits against some high-profile sites executed by some of the world’s most skilled hackers. If they do it well enough, we may never hear of it.

The impact of this exploit is going to have huge ramifications for server operators and system designers, but there is very little in practical terms that most people can do to mitigate this risk for their own browsing.


PloneConf2010 – and there’s more

I’m here at PloneConf2010 too, and I’ve been to mostly different talks to Mitch, so here’s my write up too.  It’s taken a bit of time to find the time to write this up properly!

Calvin Hendryx-Parker: Enterprise Search in Plone using Solr

We’ve been using Solr for a few years here at Isotoma, but so far we’ve not integrated it with Plone.  Plone’s built-in Catalog is actually pretty good, however one thing it doesn’t do fantastically well is full-text search.  It is passable in English, but has very limited stemming support – which makes it terrible in other languages.

Calvin presented their experience of using Solr with Plone. They developed their own software to integrate the Plone catalog with Solr, instead of using collective.solr, which up till then was the canonical way of connecting them. Their new product alm.solrindex sounds significantly better than collective.solr.  Based on what I’ve heard here, you should definitely use alm.solrindex.

To summarise how this all hangs together, you need an instance of Solr installed somewhere that you can use.  You can deploy a Solr instance specifically for each site, in which case you can deploy it through buildout.  Solr is Java, and runs inside various Java application servers.

You can also run a single Solr server for multiple Plone sites – in which case you partition the Solr database.

You then configure Solr, telling it how to index and parse the fields in your content. No configuration of this is required within Plone.  In particular you configure the indexes in Solr not in Plone.

Then install alm.solrindex in your Plone site and delete all the indexes that you wish to use with Solr. alm.solrindex will create new indexes by inspecting Solr.

Then reindex your site, and you’re done!  It supports a lot of more complex use cases, but in this basic case you get top-end full text indexing at quite low cost.

Dylan Jay, PretaWeb: FunnelWeb

Funnelweb sounds invaluable if you want to convert an existing non-Plone site into a Plone site, with the minimum effort.

Funnelweb is a tool based on transmogrifier. Transmogrifier provides a “pipeline” concept for transforming content. Pipeline stages can be inserted into a pipeline, and these stages then have the ability to change the content in various ways.
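To make the pipeline idea concrete, here is a sketch of the general pattern using plain Python generators. It is not transmogrifier's actual API; the stage names and the dictionary shape of each item are assumptions made for illustration.

```python
def strip_titles(items):
    """Pipeline stage: tidy up the title of each item."""
    for item in items:
        item["title"] = item["title"].strip()
        yield item


def drop_empty(items):
    """Pipeline stage: discard items with no body text."""
    for item in items:
        if item.get("body"):
            yield item


def run_pipeline(source, stages):
    """Chain the stages so each consumes the output of the previous one."""
    for stage in stages:
        source = stage(source)
    return list(source)
```

Stages can be added, removed or reordered by editing the list passed to `run_pipeline`, which is the property that makes a pipeline convenient for content migration.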

Dylan wrote funnelweb to use transmogrifier and provide a harness for running it in a managed way over existing websites.  The goal is to create a new Plone site, using the content from existing websites.

Funnelweb uploads remotely to Plone over XML-RPC, which means none of transmogrifier needs to be installed in a Plone site, which is a significant advantage.  It is designed to be deployed using buildout, so a script will be provided in your build that executes the import.

A bunch of pipeline steps are provided to simplify the process of importing entire sites.  In particular funnelweb has a clustering algorithm that attempts to identify which parts of pages are content and which are templates.  This can be configured by providing xpath expressions to identify page sections, and then extract content from them for specific content fields.

It supports the concept of ordering and sorts, so that Ordered Folder types are created correctly.  It supports transmogrify.siteanalyser.attach to put attachments closer to pages and transmogrify.siteanalyser.defaultpage to detect index pages in collections and to make them folder indexes in the created sites.

Finally it supports relinking, so that pages get sane urls and all links to those pages are correctly referenced.

Richard Newbury: The State of Plone Caching

The existing caching solution for Plone 3 is CacheFu, which is now pretty long in the tooth.  I can remember being introduced to CacheFu by Geoff Davis at the Archipelago Sprint in 2006, where it was a huge improvement on the (virtually non-existent) support for HTTP caching in Plone.

It is showing its age, and contains a bunch of design decisions that have proved problematic over time, particularly the heavy use of monkeypatching.

This talk was about the new canonical caching package for Plone, plone.app.caching. It was built by Richard Newbury, based on an architecture from the inimitable Martin Aspeli.

This package is already being used on high-volume sites with good results, and from what I saw here the architecture looks excellent.  It should be easy to configure for the general cases and allows sane extension of software to provide special-purpose caching configuration (which is quite a common requirement).

It provides a basic knob to control caching, where you can select strong, moderate or weak caching.

It can provide support for the two biggest issues in cache engineering: composite views (where a page contains content from multiple sources with different potential caching strategies) and split views (where one page can be seen by varying user groups who cannot be identified entirely from a tuple of URL and headers listed in Vary).

It provides support for nginx, apache, squid and varnish.  Richard recommends you do not use buildout recipes for Varnish, but I think our recipe isotoma.recipe.varnish would be OK, because it is sufficiently configurable.  We have yet to review the default config with plone.app.caching though.

Richard recommended some tools as well:

  • funkload for load testing
  • browsermob for real browsers
  • HttpFox instead of LiveHttpHeaders
  • Firebug, natch
  • remember that hitting refresh and shift-refresh force caches to refresh.  Do not use them while testing!

Jens Klein: Plone is so semantic, isn’t it?

Jens introduced a project he’s been working on called Interactive Knowledge Stack (IKS), funded by the EU.  This project is to provide an open source Java component for Content Management Systems in Europe to help the adoption of Semantic concepts online.  The tool they have produced is called FISE. The name is pronounced like an aussie would say “phase” ;)

FISE provides a RESTful interface to allow a CMS to associate semantic statements with content.  This allows us to say, for example that item X is in Paris, and in addition we can state that Paris is in France.  We can now query for “content in France” and it will know that this content is in France.
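The Paris example can be sketched with a tiny in-memory triple store. This is not FISE's interface, just an illustration of the inference involved; the predicate name "located-in" and the item identifiers are made up.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    ("item-x", "located-in", "Paris"),
    ("Paris", "located-in", "France"),
    ("item-y", "located-in", "Berlin"),
    ("Berlin", "located-in", "Germany"),
}


def located_in(subject, place):
    """True if subject is in place, directly or via a chain of statements."""
    seen = set()
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue  # guard against cycles in the statements
        seen.add(current)
        for s, p, o in triples:
            if s == current and p == "located-in":
                if o == place:
                    return True
                frontier.append(o)
    return False


def content_in(place, items):
    """Return the items that are, transitively, located in place."""
    return sorted(i for i in items if located_in(i, place))
```

Asking `content_in("France", ["item-x", "item-y"])` finds item-x even though no statement links it to France directly.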

They provide a generic Python interface to FISE which is usable from within Plone.  In addition it provides a special index type that integrates with the Plone Catalog to allow for updating the FISE triple store with the information found in the content.  It can provide triples based on hierarchical relationships found in the Plone database (page X is-a-child-of folder Y).

Jens would like someone to integrate the Aloha editor into Plone, which would allow much easier control by editors of semantic statements made about the content they are editing.

QCon London 2010

A couple of us went to QCon London last week, which as usual had some excellent speakers and some cutting edge stuff.  QCon bills itself as “enterprise software development conference designed for team leads, architects and project management”, but it has a reputation for being an awful lot more interesting than that.  In particular it covers a lot of cutting-edge work in architecture.

Scale, scale, scale

What that means in 2010 is scale, scale, scale – how do you service a bazillion people?  In summary, nobody really has a clue.  There were presentations from Facebook, Skype, BBC, Sky and others on how they’ve scaled out, as well as presentations on various architectural patterns that lend themselves to scale.

Everyone has done it differently using solutions tailored to their specific problem-space, pretty much all using Open Source technology but generally building something in-house to help them manage scale.  This is unfortunate – it would be lovely to have a silver bullet for the scale problem.

Functional languages

From the academics there is a strong consensus that functional languages are the way forward, with loads of people championing Erlang.  I’m a big fan of Erlang myself, and we’ve got a few Erlang coders here at Isotoma.

There was also some interesting stuff on other functional approaches to concurrency, in Haskell specifically and in general.  One of the great benefits of functional languages is their ability to defer execution through lazy evaluation, which showed some remarkable performance benefits compared with more traditional data synchronisation approaches.  I’d have to wave my hands to explain it better, sorry.

Real-world solutions

Erlang is being used in production in some big scale-outs now too: the BBC are using CouchDB, which they gave a glowing report.

Skype are using Postgres (our preferred RDBMS here) and achieving remarkable scale using pretty simple technologies like pgbouncer.  The architect speaking for Skype said one of their databases had 60 billion rows, spread over 64 servers, and that it was performing fine.  That’s a level of scale that’s outside what you’d normally consider sane.

They did need a dedicated team of seriously clever people though – and that’s one of the themes from all the really big shops who talked, that they needed large, dedicated teams of very highly-paid engineers.  Serious scale right now is not an off-the-shelf option.

NoSQL

Erlang starred in one of the other big themes being discussed, NoSQL databases.  We’ve had our own experience with these here, specifically using Oracle’s dbXML, with not fantastic results.  XML is really not suited to large scale performance unfortunately.  Some of the other databases being talked about now, though, are Cassandra from Facebook, CouchDB, and Voldemort from LinkedIn.

None of these are silver bullets either though – many of them do very little heavy lifting for you – often your application needs custom consistency or transaction handling, or you get unpredictable caching (i.e. “eventual consistency”).  You need to architect around your users’ actual requirements; you can’t use an off-the-shelf architecture and deploy it for everyone.

The need to design around your users was put very eloquently by Udi Dahan in his Command-Query Responsibility Segregation talk.  This was excellent, and it was pleasant to discover that an architecture we’d already derived ourselves from first principles (which I can’t talk about yet) had an actual name and everything!  In particular he concentrated on divining User Intent rather than throwing in your normal GUI toolkit for building UIs – he took data grids to pieces, and championed the use of asynchronous notification.  The idea of a notification stream as part of a call-centre automation system, rather than hitting F5 to reload repeatedly, was particularly well told.

DevOps, Agile and Kanban

Some of the other tracks were particularly relevant to us.  The DevOps movement attempts to make it easier for development and operations teams to work closely together.  For anyone who has worked in this industry this will be a familiar issue – development and ops have different definitions of success, and different expectations from their customers.  When these come into conflict, everyone gets hurt.

There was a great presentation from Simon Stewart of webdriver fame about his role as a Software Engineer in Test (SET) at Google, where they have around one SET to 7 or 8 developers to help productionise the software, provide a proper test plan and generally improve the productivity and quality of code by applying ops and automated testing principles to development.

One of the things we’ve experienced a lot here over the last year, as we’ve grown, is that there are a lot of bottlenecks, pinch points and pain in areas outside development too.  Agile addresses a lot of the issues in a development team, but doesn’t address any of the rest of the process of going from nothing to running software in production.  We’ve experienced this with pain in QA, productionisation, documentation, project management, specification – in fact every area outside actual coding!

Lean Kanban attempts to address this, with methods adopted from heavy industry. I’m not going to talk about it here, but there’s definitely a role for this kind of process management, if you can get your customer on-side.

Training and Software Craftsmanship

Finally what I think was the most interesting talk of the conference and one directly relevant to my current work, Jason Gorman gave a fantastic talk about a training scheme he is running with the BBC to improve software craftsmanship using peer-review.  I’ll be trying this out at Isotoma, and I’ll blog about it too!

Some thoughts on concurrency

In response to an earlier post, Duncan McGreggor, over on the Twisted blog, has asked us to expand a bit on where we think Twisted may be lacking in its support for concurrency. I’m afraid this has turned into a meandering essay, since I needed to reference so much background. It does come to the point eventually…

An unsolved problem

To many people it must seem as though “computers” are a solved problem. They seem to improve constantly, they do many remarkable things and the Internet, for example, is a wonder of the modern world. Of course there are screw ups, especially in large IT projects, and these are generally blamed on incompetent officials and greedy consulting firms and so on.

Although undoubtedly officials are incompetent and consultants are greedy, these projects are often crippled by the failure of industry to recognise that some of the core problems of systems design are unsolved. Concurrency is one of the major areas where they fall down. Building an IT system to service a single person is straightforward. Rolling that same system out to service hundreds of thousands is not.

It may seem odd to people outside the world of software, but concurrency (“doing several things at once”) is *still* one of the hot topics in software architecture and language design. Not only is it not a solved problem, there’s still a lot of disagreement on what the problem even *is*.

Here’s a typical scenario in IT systems rollout. Every experienced engineer will have been involved in this. The project seemed to be going OK, the software was substantially complete and people were talking about live dates. So the developers chuck it over the wall to the systems guys, so they can run some tests to work out how much hardware they’ll need.

And the answer comes back something like “we’re going to need one server per user” or “it falls over with four simultaneous users”. And I can tell you, if you get *that far* and discover this, the best option is to flee. Run for the hills and don’t look back.

Two worlds

There has always been a distinction between the worlds of academia and industry. Academics frame problems in levels of theoretical purity, and then address them in the abstract. Industry is there to solve immediate problems on the ground, using the tools that are available.

Academics have come up with a thousand ways to address concurrency, and a lot of these were dreamt up in the early days of computing. All the things I’m going to talk about here were substantially understood in the eighties. But these days it takes twenty years for something to make it from academia to something industry can use, and that time lag is increasing.

Industry only really cares about its tooling. The fact that academics have dreamt up some magic language that does really cool stuff is of no interest if there isn’t an ecosystem big enough to use. That ecosystem needs trained developers, books, training courses, compilers, interpreters, debuggers, profilers and of course huge systems libraries to support all the random crap every project needs (oh, it’s just like the last project except we need to write iCalendar files *and* access a remote MIDI music device). It also needs actual physical “tin” on which to run the code, and the characteristics of the tin make a lot of difference.

Toy academic languages are *no use*, as far as most of industry is concerned, for solving their problems. If you can’t go and get five hundred contractors with it on their CV, then you’re stuck.

The multicore bombshell

So, all industry has these days, really, is C++ and Java. C++ is still very widely used, but Java is gaining ground rapidly, and one of the reasons for this is its support for concurrency. I’ll quote Steve Yegge:

But it’s interesting because C++ is obviously faster for, you know, the short-running [programs], but Java cheated very recently. With multicore! This is actually becoming a huge thorn in the side of all the C++ programmers, including my colleagues at Google, who’ve written vast amounts of C++ code that doesn’t take advantage of multicore. And so the extent to which the cores, you know, the processors become parallel, C++ is gonna fall behind.

But for now, Java programs are getting amazing throughput because they can parallelize and they can take advantage of it. They cheated! Right? But threads aside, the JVM has gotten really really fast, and at Google it’s now widely admitted on the Java side that Java’s just as fast as C++.

His point here is vitally important. The reason Java is gaining is not an abstract language reason, it’s because of a change in the architecture of computers. Most new computers these days are multicore. They have more than one CPU on the processor die. Java has fundamental support for threading, which is one approach to concurrency, and so some programs can take advantage of the extra cores. On a quad-core machine, with the right program, Java can run up to four times faster than single-threaded C++. A win, right?

Well here’s a comment from the master himself, Don Knuth:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX….

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

(via Ted Tso)

Hardware designers are threatening to increase the numbers of cores massively. Right now you get two, four maybe eight core systems. But soon maybe hundreds of cores. This is important.

The problems with threading

Until recently, if you’d said to pretty much any developer that concurrency was an unsolved problem, they’d look at you like you were insane. Threading was the answer – everyone knew that. It’s supported in all kernels in all major Operating Systems. Any serious software used threads widely to handle all sorts of concurrency, and hey it was easy – Java, for example, provides primitives in the language itself to manage synchronisation and all the other stuff you need.

But then some people started realising that it wasn’t quite so good as it seemed. Steve Yegge again:

I do know that I did write a half a million lines of Java code for this game, this multi-threaded game I wrote. And a lot of weird stuff would happen. You’d get NullPointerExceptions in situations where, you know, you thought you had gone through and done a more or less rigorous proof that it shouldn’t have happened, right?

And so you throw in an “if null”, right? And I’ve got “if null”s all over. I’ve got error recovery threaded through this half-million line code base. It’s contributing to the half million lines, I tell ya. But it’s a very robust system.

You can actually engineer these things, as long as you engineer them with the certain knowledge that you’re using threads wrong, and they’re going to bite you. And even if you’re using them right, the implementation probably got it wrong somewhere.

It’s really scary, man. I don’t… I can’t talk about it anymore. I’ll start crying.

This is a pretty typical experience of anyone who has coded something serious with threads. Weird stuff happens. You get deadlocks and breakage and just utterly confusing random stuff.
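
For anyone who hasn’t been bitten yet, here is a minimal sketch of the kind of care involved, in Python rather than Java. The names are made up for the example. The shared counter is only reliably correct because every update holds a lock; take the lock out and the read-add-write steps of different threads are free to interleave, and updates can be lost.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        # counter += 1 is a read, an add and a write; without the lock two
        # threads can interleave those steps and lose an update.
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

One lock around one integer is the easy case. The trouble starts when there are dozens of locks and the order in which they are taken begins to matter.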

And you know, all those times your Windows system just goes weird, and stuff hangs and crashes and all sorts. I’m willing to bet a good proportion of those are due to errors in threading.

In reality threads are hard. It’s sort of accepted wisdom these days (at least amongst some of the community) that threads are actually too hard. Too hard for most programmers anyhow.

Python

We’re Python coders, so Python is obviously of particular interest to us. We also write concurrent systems. Python’s creator (Guido van Rossum) took an approach to threading which has become pretty standard in most modern “dynamic” languages. Rather than ensure the whole Python core is “thread-safe” he introduced a Global Interpreter Lock. This means that in practice when one thread is doing something it’s often impossible for the interpreter to context switch to other threads, because the whole interpreter is locked.

It certainly means threads in Python are massively less useful than they are in, say, Java. For a lot of people this has doomed Python – “what no threads!?” they cry, and then move on. Which is a shame, because threads are not the only answer, and as I’ve said I don’t even think they are a good answer.

Enter Twisted. Twisted is single-threaded, so it avoids all of the problems of threads. Concurrency is handled cooperatively, with separate subsystems within your program yielding control, either voluntarily or when they would block (i.e. when they are waiting for input).
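
The cooperative idea can be sketched with plain Python generators. To be clear, this is not Twisted’s API (Twisted is built around a reactor and callbacks); `task` and `run` are hypothetical names, and the sketch only shows the voluntary-yield half of the story, not yielding on blocking I/O.

```python
from collections import deque


def task(name, steps, log):
    """A subsystem that does a little work, then voluntarily yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run(tasks):
    """Round-robin scheduler: run each task until it yields, then move on."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished, drop it
        queue.append(current)


log = []
run([task("a", 2, log), task("b", 3, log)])
# log is now ["a:0", "b:0", "a:1", "b:1", "b:2"]
```

Everything runs in one thread, so a task can only be interrupted at a `yield`. That is why there are no locks anywhere in the sketch.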

This model fits a large proportion of programming problems very effectively, and it’s much more efficient than threads. So how does this handle multicore? Pretty effectively right now. We design our software in such a way that core parts can be run separately and scaled by adding more of them (“horizontal” scaling in the parlance). Our soon-to-be-released CMS, Exotypes, works this way, using multiple processes to exploit multiple cores.

This is a really effective approach. We can run say six processes, load balance between them and it takes great advantage of the hardware. Because we’ve designed it to work this way, we can even scale across multiple physical computers, giving us a lot of potential scale.
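
As a rough illustration of the load balancing half of this, here is a round-robin sketch in Python. It is not how Exotypes does it; the class name and the port numbers are invented for the example, and in practice the balancing would more likely be done by a separate proxy in front of the processes.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next backend process in turn."""

    def __init__(self, backends):
        self._backends = cycle(backends)

    def pick(self):
        return next(self._backends)


# Six hypothetical worker processes, identified here by port number.
backends = [8001 + i for i in range(6)]
balancer = RoundRobinBalancer(backends)

assignments = [balancer.pick() for _ in range(12)]
load = Counter(assignments)
```

Because the processes share nothing, the list of backends could just as well point at other machines, which is what makes the multi-computer case fall out for free.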

But what of machines of the future? Over a hundred cores, run a hundred processes? Over a thousand? At large numbers of cores the multi-process model breaks down too. In fact I don’t think any commonly deployed OS will handle this sort of hardware well at all, except for specialised applications. This is where I think Twisted falls down, through no fault of its own. I just suspect, like Don Knuth, that the hardware environment of the future is one that’s going to be extremely challenging for us to work in.

Two worlds, reprise

Of course, these issues have been addressed in academia, and I think, to finally answer Duncan’s question, that the long term solution to concurrency has to be addressed as part of the language. The only architecture that I think will handle it is the sort of thing represented in Erlang – lightweight processes that share no state.
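
For readers who haven’t met the model, here is a toy sketch of share-nothing processes in Python. It is an illustration of the shape only: all the names are hypothetical, and it runs in a single thread, whereas Erlang’s runtime schedules its processes across cores for you.

```python
from collections import deque


class Process:
    """A lightweight process: private state plus a mailbox, nothing shared."""

    def __init__(self, handler, state):
        self.handler = handler
        self.state = state
        self.mailbox = deque()

    def send(self, message):
        self.mailbox.append(message)


def run(processes):
    """Deliver queued messages one at a time until every mailbox is empty."""
    while any(p.mailbox for p in processes):
        for p in processes:
            if p.mailbox:
                # The handler returns the new state rather than mutating
                # anything another process can see.
                p.state = p.handler(p.state, p.mailbox.popleft())


def counter(state, message):
    kind, payload = message
    if kind == "add":
        return state + payload
    if kind == "report":
        payload.send(("total", state))  # payload is the process to reply to
    return state


def collector(state, message):
    return state + [message]


tally = Process(counter, 0)
sink = Process(collector, [])
for n in (1, 2, 3):
    tally.send(("add", n))
tally.send(("report", sink))
run([tally, sink])
```

The only way one process can affect another is by sending it a message, so there is nothing to lock and no reason the two couldn’t run on different cores or different machines.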

Erlang addresses the challenge of multicore computers fantastically well, but as a language for writing real programs it has some huge gaps. I don’t think it’s Erlang that’s going to win, but it’s going to be a language with many of its features.

First, Erlang is purely functional, with no object-oriented structures. Pretty much every coder in the world has been trained in, and is familiar with, the OO paradigm. For a language to gain traction it’s going to need to support this. This is quite compatible with Erlang’s concurrency model, and shouldn’t be too hard to support.

It also needs a decent library. Right now, the Erlang library ecosystem is, well, sparse.

Finally it needs wide adoption.

So, gods of the machines, I want something that’s got OCaml’s functional OO, Erlang’s concurrency and distribution, Python’s syntax and Python’s standard library. And I want you to bribe people to use it.

If you can do all this, not only will we be able to support multicore, but we might also, finally, be able to actually build a large IT system that actually works.

Big thanks to Rapidswitch

Our server ISP is RapidSwitch. It’s unfortunate, but most ISPs are pretty poor – there aren’t enough good people to go around, margins are very tight and it’s just the kind of work where it’s hard to keep good people. Last night RapidSwitch showed that at least not all ISPs are poor.

Because of the OpenSSL issues yesterday, we chose to reboot all of our servers last night, to ensure every service was using the new SSL code. A new kernel image came down too yesterday, and a number of our machines had the updated kernel.

We rebooted a number of machines on one cluster simultaneously… and they didn’t come back. We requested a KVM session, but in the meantime one of RapidSwitch’s engineers had noticed 4 of our machines were down simultaneously, so he went and took a look. Proactivity!

He worked out what had happened, and raised a ticket for us, telling us that the new Debian kernel was incompatible with our Network Cards. We asked him to manually boot the machines into the previous kernel, and they came back up without a hitch. Clue!

He then said RapidSwitch were aware of this issue and they were offering a free PCI network card to anyone who needed one. Planning!

Frankly this is unheard of in my experience. Massively well done guys – that’s what I call service.

Debian’s OpenSSL Disaster

Many of you will know by now of the serious security problems revealed yesterday by Debian, the Linux distribution. We use Debian exclusively for our server platform, so we had to react very quickly to this issue, and make sure our systems were secure again. I think we’ve made all of the necessary changes now to ensure we’re safe from this particular problem.

I have also made some attempt to get to the bottom of what actually went on, and I’ll record it here for posterity. If any of the below is wrong, please let me know!

What Happened

The story, basically, is this. In April 2006 bug #363516 was raised, suggesting that openssl wasn’t clean for valgrind. Valgrind is a package that detects problems in C code, and is widely used to help ensure software is correct. Valgrind reported some errors with openssl, and the reporter wanted to be able to use valgrind with openssl.

At that bug URL a change is discussed to the openssl codebase. The general feeling from the bug discussion is that making this change isn’t a good idea, but then a patch was applied on 4th May 2006. There are two instances of the specific issue in the bug, one in ssleay_rand_add and one in ssleay_rand_bytes.

In the meantime, a discussion took place on the openssl-dev list. This mentions the same two lines, and on the 1st May ulf@openssl.org says he is in favour of removing them.

The patch amends the two lines suggested.

The problem, as I understand it, was a misunderstanding by the Debian Developer who made the change. The change to ssleay_rand_bytes was fine – that line added some uninitialised memory into the entropy pool, but the software doesn’t rely on it for security, so removing it is harmless.

But the other change, in ssleay_rand_add, is a complete disaster. It alters the seeding for the random number generator in key generation, a serious flaw.

This reduces the keyspace to a few hundred thousand possible keys. It’s possible to generate all these keys in a few hours, and brute force a machine that’s using public key authentication with a compromised key in a few minutes, potentially. This is a security disaster of the first water, considering the number of organisations (such as ours) that rely on public key authentication for a lot of our inter-machine security. This also affected key generation for self-signed email, web certificates, private networks, anonymous proxy networks and all sorts of other things. The cleaning up is going to take some time, and cost an awful lot. Some people are going to be compromised by this, and a lot of machines may be broken into.
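
To show why a small keyspace is fatal, here is a toy sketch in Python. It is not OpenSSL’s code. It assumes, as I understand the flaw, that the process ID was effectively the only varying input left to the generator, and it uses Python’s `random.Random` as a stand-in for key generation. The function names and the 128-bit “key” are invented for the example.

```python
import random

MAX_PID = 32768  # default upper bound for process IDs on Linux


def weak_key(pid):
    """Stand-in for key generation whose only varying input is the PID."""
    return random.Random(pid).getrandbits(128)


def brute_force(target_key):
    """Try every possible PID until the generated key matches."""
    for pid in range(1, MAX_PID):
        if weak_key(pid) == target_key:
            return pid
    return None


victim_key = weak_key(12345)
recovered_pid = brute_force(victim_key)
```

The key looks like 128 bits of randomness, but an attacker only has to try a few tens of thousands of seeds to reproduce it. Multiply that by the handful of architectures and key types in use and you arrive at the sort of total mentioned above.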

Some background on how distributions work

Debian has a vast number of packages under management. They produce these packages by taking source packages from “upstream” (the people who maintain the software) and modifying it to fit the rules and goals of the distribution.

Some of these changes are for compatibility – for example, using standard file locations or configuration systems. Some of them are mechanical changes to do with integration with the build process. Quite a few changes are bug fixes.

It’s recommended that bug fixes be coordinated with upstream – send patches back to them, so everyone in the community can benefit from the changes.

Whose Fault Was It

After going through the above, it’s pretty clearly the DD (Debian Developer) in question’s fault. Although he suggested making changes on the openssl-dev list, and got an affirmative from someone on the project, it was pretty clear in the response that this was “if this helps with debugging it’s a good idea” not “I’ve closely read the code in question, and I agree”.

The DD should have also submitted a patch back to the openssl guys. They’d have spotted the error and screamed blue murder. He was a bit lazy and thoughtless here, and I imagine right now he wishes he could crawl into a hole and die.

What to do about it

Debian are getting badly slammed for this but it is worth keeping some perspective. We, and many others, use Debian because of its long history of excellent package quality. This is a result both of their culture (which is aggressively perfectionist) and their selection criteria for developers, which weeds out many dodgy ones. We are proud to use Debian, and will continue to do so.

DDs are generally conscientious, knowledgeable and dedicated to their work. I have no reason to believe this DD was any different. Even conscientious, knowledgeable and dedicated people make mistakes. This is what process is for, to help mitigate human error. I think there was clearly a lack of process.

Two things would have really helped: code review internal to Debian and code review by upstream. I don’t think it’s unreasonable that for security critical packages Debian should require both for non-critical changes to these packages. Even critical changes should be reviewed as soon as possible.

Internal code review is impractical for every package, since it requires a good understanding of the code in question, and would impose a huge workload – but for critical packages I think it’s a necessity.

Upstream review is potentially tricky too. Some upstreams don’t have the time or inclination to participate. There is also often a lot of friction between distributions and upstream, since they have very different goals. This isn’t a problem that can be easily resolved – these groups really do have different goals and values, and sometimes irreconcilable differences arise. But for the good of their eventual users they need to work together to help stop this sort of problem occurring.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.

Inflection Points

Some of you may know Tim Bray. He’s been a major player in some important technologies of the present (XML) and the future (Atom). He also has a really good blog.

He’s posted a good summary of some of the big issues in software and systems architecture. These are some of the points that occupy anyone involved in longer-term technology strategy, and it’s sobering to see them listed together like that. These are very exciting times to be in technology – but it’s probably easier now than it has ever been to back the wrong horse.

A lot of these issues are ones that we struggle with here at Isotoma, and as Chief Software Architect it’s my job to try and anticipate some of these trends, for the benefit of our clients. This seems like a good opportunity to respond to Tim, and to show how we’re thinking about technology strategy.

(Apologies if I lapse at times into gobbledegook. Some of the things I’ll talk about are just plain technical. I’ll try and link them appropriately, so at least there’s some context.)

Programming Languages

Up till not too damn long ago, for a big serious software project you could pick Java or .NET or, if you really liked pain, C++. Today you’d be nuts not to look seriously at PHP, Python, and Ruby. What’s the future mind-share of all these things? I have no idea, but that decision is being made collectively by the community right now.

He’s absolutely right, but obviously we’ve done rather more than look seriously. We’ve been a pretty much pure Python shop right from the outset. We use some PHP, when it makes sense, but Python is our clear choice, and it’s one we’re more than happy with. It’s a significant competitive advantage for us, in all sorts of ways.

Python has delivered for us in developer productivity, and on a larger scale it’s delivered in elegance – it scales very well as a developer language. Also, perhaps unlike Ruby, it scales very well in terms of performance, so I’m comfortable building very large systems in Python.

As Tim says, the community is deciding right now what to choose. There’s never quite an outright winner in language terms, but I’m comfortable having bet on Python, and it wouldn’t surprise me if, along with JavaScript, it became one of the top languages of the next ten or fifteen years. The only caveat to this is below under “Processors”.

Databases

No, I don’t think relational databases are going away anytime soon. But I think that SQL’s brain-lock on the development community for the past couple of decades has been actively harmful, and I’m glad that it’s now OK to look at alternatives.

Will the non-relational alternatives carve out a piece of the market? I suspect so, but that decision is being made by the community, right now.

Brain-lock is right. It’s been the case for twenty years that for every single IT project your architect would open his toolbox and would pull out an RDBMS.

Relational Databases are still suited to a whole suite of applications and classes of problem. When you have highly structured data and very tight performance criteria they’re going to be a good choice for a long time to come. But for many of the problems they’ve been used to solve they are terminally ill-suited.

We’ve been using ZODB as part of Zope since 2004 (and I used it myself for several years before that). ZODB has some excellent characteristics for whole classes of problem that an RDBMS has problems with. It’s a lot more flexible, and its hierarchical nature provides a natural fit for web projects.

More recently we’ve been making heavy use of DB XML, Oracle’s Open Source XML database. This is a fantastic product, and it’s a much better model for most of the applications we build. A good example, oddly, would be Forkd which we built using a traditional RDBMS. If we’d had then the experience of DB XML that we have now, there’s no question that we’d have used it for Forkd. Fitting recipes into a relational database is an exercise in ultimately pointless contortion.

I’m very confident XML databases are going to be huge.

Network Programming

CORBA is dead. DCOM is dead. WS-* is coughing its way down the slope to dusty death. REST, they say, is the way to go. Which I believe, actually. Still, there’s not much yet in tooling or best practices or received wisdom or blue-suit consultants or the other apparatus of a mainstream technology.

So what are they going to be teaching the kids, a few years hence, the right way is to build an application across a network full of heterogeneous technology? That’s being worked out by the community, right now.

The lack of tooling and blue-suit consultants (ignore my blue suit for the second) is, I think, a good thing. REST is not a technology stack, it’s an architectural style. It pares down the network programming model to fit the harsh realities of a stateless, highly concurrent, open system. We’re big fans of REST, and it’s a natural fit for how we work.

It’s not the whole story though, and there are a whole bunch of recurring problems in RESTful interfaces that are awaiting smart people to solve them. There’s some good work going on with URI Templates and PATCH, and of course Atom, which I think are part of the solution.

Some relatively common orchestrations are horribly contorted in REST too, and it wouldn’t surprise me if we see some tooling here, to handle specific cases of lock acquisition and release and so forth.

Processors

Moore’s law is still holding, but the processors get wider not faster. Now that the best and the brightest have spent a decade building and debugging threading frameworks in Java and .NET, it’s increasingly starting to look like threading is a bad idea; don’t go there. I’ve personally changed my formerly-pro-threading position on this 180º since joining Sun four years ago.

We still haven’t figured out the right way for ordinary people to program many-core processors; check out the inconclusive results of my Wide Finder project last year. (By the way, I’ve now got an Internet-facing T2000 all of my own and will be re-launching Wide Finder as soon as I get some data staged on it; come one, come all).

And I can’t even repeat my crack about the right answer being worked out right now, because I’m not actually sure that anyone has a grip on it just yet. But we’re sure enough at an inflection point.

We’re a lot further down this particular inflection curve than most, I think. We make heavy use of Twisted, a single-threaded cooperatively multitasking network programming system that specifically addresses the threading problem.
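To make “single-threaded cooperative multitasking” a bit more concrete, here is a minimal sketch. It deliberately uses Python’s standard-library asyncio rather than Twisted itself, so it is a stand-in for the style and not Twisted’s API; the names `worker`, `main` and `run` are made up for illustration. Each task runs until it explicitly yields to the event loop, so shared state can be updated without locks.

```python
import asyncio


async def worker(name, steps, log, totals):
    """Do `steps` units of work, yielding to the event loop after each one."""
    for i in range(steps):
        # No lock needed: nothing else runs until this task awaits.
        totals["count"] += 1
        log.append((name, i))
        # Explicitly hand control back to the event loop (cooperative yield).
        await asyncio.sleep(0)


async def main():
    log = []
    totals = {"count": 0}
    # Both workers run in the same thread, interleaved by the event loop.
    await asyncio.gather(
        worker("a", 3, log, totals),
        worker("b", 3, log, totals),
    )
    return log, totals


def run():
    """Hypothetical entry point: run the event loop to completion."""
    return asyncio.run(main())


if __name__ == "__main__":
    log, totals = run()
    print(log)
    print(totals)
```

The trade-off is the usual one for this style: a task that never yields starves everything else, which is why blocking calls have to be kept out of the event loop.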

I don’t think it’s the whole answer though, but nor is Erlang, which Tim championed in his Wide Finder project, with fascinating results.
Erlang has some marvellous attributes when it comes to large scale concurrent systems, and I’m very impressed with it. But adopting Erlang throws too much away, I think, losing the large-scale structural advantages of the Object Oriented approach that is pretty much the default for software architecture today.

Perhaps something like Stackless is the longer-term solution here. An OO, message-passing, naturally distributed language using Python syntax and standard library, but with some core functional changes (variables not being variable, for example), could be the answer.

Or maybe even Mozart, which solves a lot of these problems too. It’s the current first-year MIT language [update: this is probably a lie, see comments], so expect to hear more of it in time.

Tim is right though, nobody really knows the answer here. All we know is that it certainly isn’t traditional multi-threaded programming, a la Java or C++.

Web Development

Used to be, it was Java EE or Perl or ASP.NET. Now all of a sudden it’s PHP and then Rails and a bunch of other frameworks bubbling up over the horizon; not a month goes by that I don’t see a bit of buzz over something that includes the term “Rails-like”.

It seems obvious to me that pretty soon there’s going to be a Rails++ that combines the good ideas from RoR with some others that will be obvious once we see them.

Also, that some of those “Rails-like” frameworks, even if they’re not a huge step forward, will get some real market share because they’ll have some combination of minor advantages.

Once again, I can’t say it’s being worked out right now, because for right now I see a pretty uniform picture of Rails’ market share advancing steadily. It won’t last.

We use a couple of Rails-like frameworks ourselves, TurboGears being the most obviously MVC. The big idea in Rails, and similar frameworks, is the combination of MVC with an Object Relational layer. Since, as I’ve said, I don’t think the Relational stuff is needed at all, there’s an obvious first place where Rails and friends should look. Ditch the RDBMS.

Second, MVC maps pretty well to a lot of applications, and it’s a natural architectural style for a lot of people. MVC isn’t the only architectural style though, and it’s not necessarily the best fit for every application. The well-documented problems at Twitter, for example, I think just show a poor fit between MVC and Twitter’s fascinating (and well chosen) Jabber back-end. I know for certain, I’d not have used anything Rails-like for that.

I think it’s likely that the notional “Rails++” will not be MVC, nor will it have an Object Relational layer. I think Rails, and its imitators, are just not suited long-term to the challenges of scale and distribution. That said, they clearly work well for a whole host of small and medium-sized projects right now.

Business Models

Servers, they’re easy to understand. Blue-suited salesmen sell them to CIOs a few hundred thousand dollars’ worth at a time, they get loaded into data centers where they suck up too much power and HVAC.

Well, unless you’re gonna do your storage and compute and load-balancing and so on out in the cloud. Are you? The CIOs and data-center guys are wrestling this problem to the ground right now.

And as for software, used to be you shipped binaries on magnetic media and charged ’em a right-to-use license. Nope, nowadays it’s open-source and they download it for free and you charge them a support contract. Nope, that was last century; maybe the software’s all going to be out there in the cloud and you never download anything, just pay to use what’s there.

Personally, I don’t think any of those models are actually going to go away. But which works best where? The market’s working that out, right now.

Obviously we’re Open Source throughout, which takes us a long long way down this road already. We’ve got one secret squirrel project that’s successfully deployed using a massive Amazon EC2 back-end too, but I can’t say more about it.

Let’s just say the massive economic advantages of the cloud are so conclusive that this is an obvious bet. Something like Google’s AppEngine is only a first step down this road, but it’s visionary and appropriate. And it’s in Python ;)

Desktops

As I wrote a couple of months ago: how long can the public and private sector IT management continue to go on ignoring the fact that in OS X and Ubuntu, there are not one but two alternatives to the Windows desktop that are more reliable, more secure, more efficient, and cheaper? More or less everybody now has a friend or relative that’s on Mac or Linux and is going to be wondering why their desktop can’t be that slick.

What’s going to happen? I don’t know, but it’s going to be dramatic once we get to the tipping point, and I think we’re approaching it right now.

We use Ubuntu throughout, on all our desktops and laptops. Well, nearly. The machine that runs Sage for accounting is Windows. And our front-end guys need Windows or OS X to run Photoshop and Flash. But everything else? Ubuntu works really well, and saves us an absolute fortune.

Will It Always Be Like This?

You know, just maybe. Our mastery of the IT technologies is still in its brawny youth, with lots of low-hanging fruit to be snatched and big advances to be made. And these days, with the advent of blogs and unconferences and all those new communication channels, our thought leaders are busy chattering at each other about all these problems all the time, 24/7/365. The gap between the leading edge and technology that’s actually deployed in the enterprise is as wide as it’s ever been and to me, that feels like a recipe for permanent disruption. Cowabunga!

Our industry has the greatest community of practice that has existed, perhaps, in the history of mankind. Every profession has its conferences and papers and journals, but only in our part of IT is it normal to share and discuss all of our work, all of the time, even to the extent of giving away the very code we write.

I can’t see an end to this cycle of innovation yet – it’s just too damned valuable to everyone concerned. Cowabunga indeed :)