
May 17, 2008

Some thoughts on concurrency

In an earlier post over on the Twisted blog, Duncan McGreggor asked us to expand a bit on where we think Twisted may be lacking in its support for concurrency. I’m afraid this has turned into a meandering essay, since I needed to reference so much background. It does come to the point eventually…

An unsolved problem

To many people it must seem as though “computers” are a solved problem. They seem to improve constantly, they do many remarkable things and the Internet, for example, is a wonder of the modern world. Of course there are screw ups, especially in large IT projects, and these are generally blamed on incompetent officials and greedy consulting firms and so on.

Although undoubtedly officials are incompetent and consultants are greedy, these projects are often crippled by the failure of industry to recognise that some of the core problems of systems design remain unsolved. Concurrency is one of the major areas where they fall down. Building an IT system to service a single person is straightforward. Rolling that same system out to service hundreds of thousands is not.

It may seem odd to people outside the world of software, but concurrency (“doing several things at once”) is still one of the hot topics in software architecture and language design. Not only is it not a solved problem, there’s still a lot of disagreement on what the problem even is.

Here’s a typical scenario in IT systems rollout, one every experienced engineer will have been involved in. The project seemed to be going OK, the software was substantially complete and people were talking about live dates. So the developers chuck it over the wall to the systems guys, so they can run some tests to work out how much hardware they’ll need.

And the answer comes back something like “we’re going to need one server per user” or “it falls over with four simultaneous users”. And I can tell you, if you get that far and discover this, the best option is to flee. Run for the hills and don’t look back.

Two worlds

There has always been a distinction between the worlds of academia and industry. Academics frame problems in levels of theoretical purity, and then address them in the abstract. Industry is there to solve immediate problems on the ground, using the tools that are available.

Academics have come up with a thousand ways to address concurrency, and a lot of these were dreamt up in the early days of computing. All the things I’m going to talk about here were substantially understood in the eighties. But these days it takes twenty years for something to make it from academia to something industry can use, and that time lag is increasing.

Industry only really cares about its tooling. The fact that academics have dreamt up some magic language that does really cool stuff is of no interest if there isn’t an ecosystem big enough to use. That ecosystem needs trained developers, books, training courses, compilers, interpreters, debuggers, profilers and of course huge systems libraries to support all the random crap every project needs (oh, it’s just like the last project except we need to write iCalendar files and access a remote MIDI music device). It also needs actual physical “tin” on which to run the code, and the characteristics of the tin make a lot of difference.

Toy academic languages are no use, as far as most of industry is concerned, for solving their problems. If you can’t go and get five hundred contractors with it on their CV, then you’re stuck.

The multicore bombshell

So, all industry has these days, really, is C++ and Java. C++ is still very widely used, but Java is gaining ground rapidly, and one of the reasons for this is its support for concurrency. I’ll quote Steve Yegge:

But it’s interesting because C++ is obviously faster for, you know, the short-running [programs], but Java cheated very recently. With multicore! This is actually becoming a huge thorn in the side of all the C++ programmers, including my colleagues at Google, who’ve written vast amounts of C++ code that doesn’t take advantage of multicore. And so the extent to which the cores, you know, the processors become parallel, C++ is gonna fall behind.

But for now, Java programs are getting amazing throughput because they can parallelize and they can take advantage of it. They cheated! Right? But threads aside, the JVM has gotten really really fast, and at Google it’s now widely admitted on the Java side that Java’s just as fast as C++.

His point here is vitally important. The reason Java is gaining is not an abstract language reason, it’s because of a change in the architecture of computers. Most new computers these days are multicore. They have more than one CPU on the processor die. Java has fundamental support for threading, which is one approach to concurrency, and so some programs can take advantage of the extra cores. On a quad-core machine, with the right program, Java can run up to four times faster than single-threaded C++. A win, right?

Well here’s a comment from the master himself, Don Knuth:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX….

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years. (via Ted Tso)

Hardware designers are threatening to increase the numbers of cores massively. Right now you get two-, four- or maybe eight-core systems, but soon there may be hundreds of cores. This is important.

The problems with threading

Until recently, if you’d said to pretty much any developer that concurrency was an unsolved problem, they’d have looked at you like you were insane. Threading was the answer - everyone knew that. It’s supported by the kernels of all major operating systems. Any serious software used threads widely to handle all sorts of concurrency, and hey it was easy - Java, for example, provides primitives in the language itself to manage synchronisation and all the other stuff you need.

But then some people started realising that it wasn’t quite so good as it seemed. Steve Yegge again:

I do know that I did write a half a million lines of Java code for this game, this multi-threaded game I wrote. And a lot of weird stuff would happen. You’d get NullPointerExceptions in situations where, you know, you thought you had gone through and done a more or less rigorous proof that it shouldn’t have happened, right?

And so you throw in an “if null”, right? And I’ve got “if null”s all over. I’ve got error recovery threaded through this half-million line code base. It’s contributing to the half million lines, I tell ya. But it’s a very robust system.

You can actually engineer these things, as long as you engineer them with the certain knowledge that you’re using threads wrong, and they’re going to bite you. And even if you’re using them right, the implementation probably got it wrong somewhere.

It’s really scary, man. I don’t… I can’t talk about it anymore. I’ll start crying.

This is a pretty typical experience of anyone who has coded something serious with threads. Weird stuff happens. You get deadlocks and breakage and just utterly confusing random stuff.

And you know, all those times your Windows system just goes weird, and stuff hangs and crashes and all sorts. I’m willing to bet a good proportion of those are due to errors in threading.

In reality threads are hard. It’s sort of accepted wisdom these days (at least amongst some of the community) that threads are actually too hard. Too hard for most programmers anyhow.

Python

We’re Python coders, so Python is obviously of particular interest to us. We also write concurrent systems. Python’s creator (Guido van Rossum) took an approach to threading that has become pretty standard in most modern “dynamic” languages. Rather than ensure the whole Python core is “thread-safe”, he introduced a Global Interpreter Lock. This means that in practice only one thread can execute Python code at a time: while one thread holds the lock the interpreter cannot run any of the others, so threads cannot spread Python code across several cores.

It certainly means threads in Python are massively less useful than they are in, say, Java. For a lot of people this has doomed Python - “what no threads!?” they cry, and then move on. Which is a shame, because threads are not the only answer, and as I’ve said I don’t even think they are a good answer.

Enter Twisted. Twisted is single-threaded, so it avoids all of the problems of threads. Concurrency is handled cooperatively, with separate subsystems within your program yielding control, either voluntarily or when they would block (i.e. when they are waiting for input).
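To make the cooperative model concrete, here is a minimal sketch. It uses the standard library's asyncio rather than Twisted itself, purely so that it runs without extra dependencies; the `subsystem` and `run_demo` names are made up for illustration. Two "subsystems" share one thread and only hand over control at an explicit `await`.

```python
import asyncio
import threading


async def subsystem(name, steps, log):
    """A hypothetical subsystem that hands control back after every step."""
    for step in range(steps):
        # Record who ran, which step, and on which OS thread.
        log.append((name, step, threading.get_ident()))
        # Explicit yield point: other subsystems only run when we reach an await.
        await asyncio.sleep(0)


async def main():
    log = []
    # Both subsystems are scheduled on the same event loop, in the same thread.
    await asyncio.gather(subsystem("a", 3, log), subsystem("b", 3, log))
    return log


def run_demo():
    return asyncio.run(main())


if __name__ == "__main__":
    for name, step, thread_id in run_demo():
        print(name, step, thread_id)
```

The two subsystems interleave step by step, yet every entry in the log carries the same thread id. Nothing is ever interrupted halfway through a step, which is why the usual locking headaches mostly go away.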

This model fits a large proportion of programming problems very effectively, and it’s much more efficient than threads. So how does this handle multicore? Pretty effectively right now. We design our software in such a way that core parts can be run separately and scaled by adding more of them (“horizontal” scaling in the parlance). Our soon-to-be-released CMS, Exotypes, works this way, using multiple processes to exploit multiple cores.

This is a really effective approach. We can run say six processes, load balance between them and it takes great advantage of the hardware. Because we’ve designed it to work this way, we can even scale across multiple physical computers, giving us a lot of potential scale.
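As a rough sketch of the idea (not how Exotypes actually does it), the following spreads a list of jobs round-robin across several independent worker processes. The worker script, the squaring "job" and the `run_sharded` name are all invented for illustration; a real deployment would put a load balancer in front of long-running server processes instead.

```python
import subprocess
import sys

# Hypothetical worker: reads one integer per line, prints its square and its PID.
WORKER_SOURCE = """
import os, sys
for line in sys.stdin:
    n = int(line)
    print(n * n, os.getpid(), flush=True)
"""


def run_sharded(jobs, process_count):
    """Round-robin `jobs` across `process_count` separate OS processes."""
    shards = [jobs[i::process_count] for i in range(process_count)]
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in shards
    ]
    # Hand every worker its shard first, so that they all run at the same time.
    # (Fine for small inputs; large ones would need non-blocking I/O.)
    for worker, shard in zip(workers, shards):
        worker.stdin.write("".join(f"{n}\n" for n in shard))
        worker.stdin.close()
    results, pids = [], set()
    for worker in workers:
        for line in worker.stdout:
            value, pid = line.split()
            results.append(int(value))
            pids.add(int(pid))
        worker.stdout.close()
        worker.wait()
    return sorted(results), pids


if __name__ == "__main__":
    squares, pids = run_sharded(list(range(8)), 4)
    print(squares, "computed by", len(pids), "processes")
```

Because the workers share nothing but their input and output streams, the same shape carries over to workers on other machines, which is what makes this kind of horizontal scaling attractive.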

But what of machines of the future? Over a hundred cores, run a hundred processes? Over a thousand? At large numbers of cores the multi-process model breaks down too. In fact I don’t think any commonly deployed OS will handle this sort of hardware well at all, except for specialised applications. This is where I think Twisted falls down, through no fault of its own. I just suspect, like Don Knuth, that the hardware environment of the future is one that’s going to be extremely challenging for us to work in.

Two worlds, reprise

Of course, these issues have been addressed in academia, and I think, to finally answer Duncan’s question, that the long term solution to concurrency has to be addressed as part of the language. The only architecture that I think will handle it is the sort of thing represented in Erlang - lightweight processes that share no state.
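For readers who haven't met the model, here is a toy imitation of "lightweight processes that share no state", written in Python with asyncio. It is only a sketch of the style: asyncio tasks are not isolated or preemptively scheduled the way Erlang processes are, and the `counter_process` and `demo` names are made up for illustration. The point is that the counter's state is private and the only way to reach it is by sending a message to its mailbox.

```python
import asyncio


async def counter_process(mailbox):
    """Owns `total` privately; other code can only send it messages."""
    total = 0
    while True:
        kind, value = await mailbox.get()
        if kind == "add":
            total += value
        elif kind == "get":
            # For "get", `value` is the caller's own reply mailbox.
            await value.put(total)
        elif kind == "stop":
            return


async def demo():
    mailbox = asyncio.Queue()
    replies = asyncio.Queue()
    process = asyncio.create_task(counter_process(mailbox))
    for n in (1, 2, 3):
        await mailbox.put(("add", n))
    # Messages are handled in order, so this reply reflects all three adds.
    await mailbox.put(("get", replies))
    total = await replies.get()
    await mailbox.put(("stop", None))
    await process
    return total


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

There are no locks anywhere in this code, because there is nothing shared to lock. That is the property that lets the model spread across many cores, or many machines, without the programmer having to reason about interleavings.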

Erlang addresses the challenge of multicore computers fantastically well, but as a language for writing real programs it has some huge gaps. I don’t think it’s Erlang that’s going to win, but it’s going to be a language with many of its features.

First, Erlang is a functional language, with no object-oriented structures. Pretty much every coder in the world has been trained in, and is familiar with, the OO paradigm. For a language to gain traction it’s going to need to support this. This is quite compatible with Erlang’s concurrency model, and shouldn’t be too hard to support.

It also needs a decent library. Right now, the Erlang library ecosystem is, well, sparse.

Finally it needs wide adoption.

So, gods of the machines, I want something that’s got OCaml’s functional OO, Erlang’s concurrency and distribution, Python’s syntax and Python’s standard library. And I want you to bribe people to use it.

If you can do all this, not only will we be able to support multicore, but we might also, finally, be able to actually build a large IT system that actually works.

August 22, 2007

Interesting times in image processing

You’ve all seen the awesome Photosynth demo on TED. (If you haven’t, do.)

There’re several interesting things there. I especially liked the infinite-resolution Seadragon demo, with the startling claim that the only thing that should limit the speed of the application is the number of pixels being thrown around on-screen: not the size of the underlying image.

But the star of the show is arguably the ability to recognise features in photos, in order to composite them together intelligently. I presume the same technology would allow the computer to recognise known features or places, adding a semantic layer that images currently lack. Imagine having your holiday snaps automatically tagged with the correct placenames, landmarks and from that, geodata.

Simultaneously, lots of companies (and certainly lots of government agencies) are working on facial recognition. You can already use Riya to search and tag your photo collection.

I expect this to be offered by Picasa, Flickr, iPhoto etc. within 2 years or so, whether they buy the technology, or develop it from scratch. And I certainly expect it to help power Google searches, within the same timeframe. (In the meantime, they’re building up a semantic layer around photos by other means, e.g. the delightful Image Labeler.)

I’ll leave it for a different post (or commenters) to explore the implications for privacy.

Actually recognising faces (but not identity) in photos is already becoming common in cameras, e.g. my Canon Ixus 70. In tests, it actually recognised the faces of sandstone angels in a cemetery, and in the office we were able to draw rudimentary faces on paper that the camera recognised.

Riya also extended their technology into shopping, their proof-of-concept like.com allowing you to search for shoes, handbags, clothes etc. on the basis of “likeness”. I don’t think it’s a silver bullet for the online shopping experience, but certainly valuable (to the user, and to them as a business).

Here are another couple of interesting things. Given sufficient processing power, and numbers of photos (and that’s not hard these days), you can perform what seems like magic.

  • Scene completion using millions of photographs
    “The algorithm patches up holes in images by finding similar image regions in the database that are not only seamless but also semantically valid.”
  • Content-Aware Image Sizing
    “It demonstrates a software application that resizes images in such a way that the content of the image is preserved intelligently.” Has to be seen to be believed.
  • Reconstructing 3D models from 2D photographs, e.g. Fotowoosh

These are all things that our brains are capable of doing without thinking, but we are gradually developing the processing power, the visual memory (repository of images), and clever algorithms to make it possible.
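The content-aware resizing trick is less magical than it looks. The technique is generally known as seam carving: score every pixel by how much it differs from its neighbours, then use dynamic programming to find the connected top-to-bottom path of pixels with the lowest total score, and delete it. Below is a toy sketch on a greyscale image held as a list of lists. The energy function and all the names are my own simplifications, not the code behind the demo.

```python
def energy(image):
    """Score each pixel by its absolute difference from its four neighbours."""
    h, w = len(image), len(image[0])
    scores = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            p = image[y][x]
            neighbours = (
                image[y][max(x - 1, 0)],
                image[y][min(x + 1, w - 1)],
                image[max(y - 1, 0)][x],
                image[min(y + 1, h - 1)][x],
            )
            scores[y][x] = sum(abs(p - q) for q in neighbours)
    return scores


def find_vertical_seam(scores):
    """Return, for each row, the column of the cheapest connected seam."""
    h, w = len(scores), len(scores[0])
    cost = [row[:] for row in scores]
    # Each cell's cost is its own score plus the cheapest reachable cell above.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Walk back up from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(image):
    """Make the image one pixel narrower by deleting its cheapest seam."""
    seam = find_vertical_seam(energy(image))
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]


if __name__ == "__main__":
    picture = [[0, 0, 0, 9, 0] for _ in range(3)]
    print(remove_vertical_seam(picture))
```

In the little test picture the bright column is the "content", and the seam that gets removed comes from the flat area to its left. Repeat the removal and the image keeps shrinking while the interesting parts stay put.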

January 09, 2007

Chandler, and the hardness of software

CIO Insight magazine have a good interview with Scott Rosenberg, who was involved in the Chandler project. In many ways Chandler seems to me like a poster-boy for the Agile movement. If Chandler had been approached in a more agile manner they might not only have shipping code by now, but I think the shipping code would be for a very different piece of software.

Chandler made quite a splash when the project was launched because the project lead was Mitch Kapor, creator of Lotus 1-2-3, Notes and the ill-fated but ambitious Agenda, and one of the leading lights in software development. One of the key things in an open source project is the project lead and Mitch is a proper heavyweight. He attracted a lot of very good developers and the project got moving with quite a fanfare.

I remember when the Chandler project was announced. Back then it was a fantastic idea, although ambitious. I still think the idea has a lot of legs. The idea was to produce a Microsoft Outlook killer, but using good UI and engineering design techniques. If you analyse it, Outlook is a pretty terrible application. Most of its users have probably never even thought about it, but the application only barely satisfies common use cases, and those with a lot of laborious work on the part of the user. It is all most people have used though, and people tend not to be terribly introspective about their software.

Mozilla Thunderbird is just as bad, and even less ambitious than Outlook. As an email client, neither of them compares at all well with the power and features of mutt, for example. Mutt however takes ages to learn and is ultimately limited by the capabilities of a terminal.

So, Chandler was supposed to change all this. Bring the sensibilities of a spreadsheet (i.e. a thinly disguised programming environment) to Groupware. To be a Notes for the 21st Century. Satisfy the Real Needs of Groupware users.

This is a Hard Problem. It has all the features of a project that disappears up its own arse, just as Mozilla did. There is a huge scope for architecture with such a large problem, and so that’s what they did: Architecture. Just as Mozilla did. Even though they sensibly chose Python to develop Chandler (unlike C and the homegrown XUL for Mozilla), they indulged in huge amounts of Big Design Up Front, which means that the environment and probably the users are now well ahead of where Chandler was aiming.

Maybe, like the Mozilla project, it will suddenly emerge from obscurity to take over the world. They will have a much more difficult environment for launch than Mozilla did however. The browser incumbent, Internet Explorer, was so appallingly bad it made a very easy target. The incumbents for Chandler will not only be better but will be hosted, browser accessed applications — which provides a set of behaviours that Chandler will find it impossible to replicate.

All of this may not sound like a software problem, but instead a marketing problem. To call it that is to miss much of the real difficulty in software development. The most intractable problem is not avoiding building buggy software. The problem is in building the software that people actually want.

October 29, 2006

Queues are Databases?

Arnon Rotem-Gal-Oz mentions a thesis by Jim Gray in his post Queues Are Databases?. I had the opportunity to architect a large system that relied entirely on a large network of queues earlier in the year. It seemed natural to build the queue on top of a database.

Arnon is making the assumption here that the “database” is an RDBMS, which is where I think the real difference lies. Further to this he really seems to mean that the queue is in fact a single table - that seems to be the implication in the followup Queues Are Databases: Round Two. (You might need to hack around with that page to see all the content because one of the adverts has gone mad and eaten it.)

That is definitely wrong in my view, because of the issues in factoring objects. The underlying queue structure shouldn’t impose factoring requirements on the objects in the queue.

I used a class that multiply inherited from an Axiom Item and a standard Twisted DeferredQueue. Although very simple, that provides all of the primitives of a persistent queue, and very cheaply to boot.
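To show how little is needed, here is a minimal sketch of a persistent queue. It is not the Axiom and DeferredQueue class described above: it is a stand-in built on the standard library's sqlite3, and the `PersistentQueue` name, the table layout and the JSON encoding are all assumptions made for illustration. The queue stores each item as an opaque payload, so it imposes no factoring on the objects it carries.

```python
import json
import sqlite3


class PersistentQueue:
    """A FIFO queue whose contents survive restarts (illustrative only)."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS queue "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, item):
        # The item is stored whole; the queue never looks inside it.
        with self._db:
            self._db.execute(
                "INSERT INTO queue (payload) VALUES (?)", (json.dumps(item),)
            )

    def get(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        with self._db:
            row = self._db.execute(
                "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self._db.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            return json.loads(row[1])

    def close(self):
        self._db.close()


if __name__ == "__main__":
    queue = PersistentQueue(":memory:")  # use a file path for real persistence
    queue.put({"job": "resize", "width": 640})
    queue.put(["any", "shape", "of", "object"])
    print(queue.get(), queue.get(), queue.get())
```

Pointing the constructor at a file instead of `:memory:` means whatever was queued before a crash is still there afterwards. A Twisted version would return Deferreds from `get` rather than None, which is what DeferredQueue gave me for free.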

October 28, 2006

Verbs in REST

More good stuff from lesscode.org with Useful and Useless REST. The verb issue is one I’ve come across quite a few times, without ever really deciding what I think. Ryan’s comments on the semantics for proxies are interesting, and might be the real decider.

The guys at lesscode write some real interesting stuff, but I do wish they’d do their bickering in private.

September 29, 2006

Allen Holub Reading List

A good reading list (http://www.netvibes.com/) from Allen Holub. I’ve got a bunch of these books already, and just bought a few more. If I remember I’ll blog about them.

September 26, 2006

If you have ten million test cases, you probably missed one

So it was, so it always will be. Jim Horning's tales from the early years of computing provide a useful touchstone when you're bogged down in something that seems so terribly modern... I'm refactoring a wedge load of code at the moment, and my test coverage is not good enough. "If you have ten million test cases, you probably missed one."

September 21, 2006

On UML

I’ve been reading Arnon Rotem-Gal-Oz at Dr. Dobb’s Journal for a few months now, and he makes very interesting reading. He writes on various aspects of architecture technique and principles, and even if I don’t always agree with him his posts are always well thought out.

He made a point in his latest post about UML, though, which I agree with most strongly. He is a believer in UML As a Sketch, as am I. This position seems quite unusual amongst a lot of the people I deal with, for cultural reasons.

One of the things I find very interesting about the difference between proprietary software and open source is the very different approach to architecture. Because of the top-down nature of proprietary software projects, there is often someone appointed as Architect who can then, more or less, direct development. In this environment UML thrives, with huge pieces of analysis and design being done using UML, which is then given to developers to build (often with stubs for all the code generated automatically by modelling tools).

In Open Source software, there is no way any one person can be ‘architect’ in the same sense. Your developers can generally do their own thing. The only way to make them do something one way is to demonstrate that it’s best - and the only way to do that is to write some code, and show it working. It’s easy to argue about models, but hard to argue with running code.

Some of the most impressively architected Open Source projects I’ve seen are designed on IRC and mailing lists by groups of very highly skilled developers. They don’t use UML, in part because their medium is text only. In fact, UML seems to be generally regarded as a crutch for the feeble - real developers talk in code. During the tool-heavy eighties it was generally thought that large complex systems could only be developed using ever more large and complex support systems, but these guys most definitely refute that.

I am, unfortunately, one of the feeble, and I find UML can really help me think. Often I pop open MagicDraw, start knocking out a few diagrams and within 10 minutes the solution has crystallised in my head, and I can drop the diagrams and start coding. But the visual nature of the design, and the constraints it imposes on how you approach the problem are really useful. So, on this point at least, I agree with Mr Rotem-Gal-Oz, even if often the only person I am communicating with is myself!

July 08, 2006

Frameworks v Libraries

There has been some venting on this subject in the past, but here’s a good post by Arnon Rotem-Gal-Oz on the subject, with a summary of the differences.

July 01, 2006

Tools

One of the perennial points of discussion about coding is about which tools to use. I have been an inveterate user of one or another vi clone for almost twenty years, from elvis on RISC OS to vim on Ubuntu. Vim does fulfil many of the roles of an IDE, especially when running on *nix, where you get so many IDE features for free (ctags, make etc).

However, I have finally been seduced by an IDE, Wing. The Zope debugging was the killer feature for me, but the rest of it is excellent. And it even has vi keybindings!

For UML, I use the excellent MagicDraw. A few years ago I tried UML editors exhaustively and I’m certain this is the best of the lot. In particular it supports Robustness Diagrams, something missing from a lot of editors.

This week I was introduced to OxygenXML, and I’m very impressed. One project I am working on uses XSLT and XPath heavily, and Oxygen’s XSLT debugging is truly awesome. I’m using it now for editing KID templates too, and it works really well.

Of course, these are a tad more resource intensive than good old vim. Running MagicDraw, Wing and OxygenXML together would need a shade under 4GB to avoid swapping entirely. None of my machines can take more than 2GB, so there is a little bit of swapping sometimes ;)

May 18, 2006

ACM Queue Interview with Werner Vogels

An excellent interview with Werner Vogels, Amazon’s CTO. It’s very good to see someone doing things differently and so successfully, especially when it comes to structuring their business.

Of course there’s a lot of cheerleading in there, and I’m sure they suffer from a lot of organisational problems that he doesn’t talk about - but all that said, the way they have aligned their business architecture, personnel structure and technology architecture along with their strategy is a great example of proper strategic thinking.

His emphasis on testing being the really difficult thing also bears out my own experience with large distributed systems - just orchestrating a valid test can be a huge amount of work, and validating the test’s synthetic loads against real world load is a very skilled job. It’s worth putting the hours in here though.

April 26, 2006

Agile Alliance

Isotoma are now members of the Agile Alliance! We are committed to agile software development and the benefits it brings, and I am hoping our membership will make us more effective at delivering valuable and successful software.