Yesterday I presented my session "Just Keep Sending the Messages" at ACCU 2011. A number of audience members came up afterwards and thanked me for a great session, so I can only surmise it went down well. Which is good. I thought I'd put down some of my reflections.
The aim of the session was to inspire people to appreciate that shared-memory multi-threading is like heap and stack: it is a resource we know underpins our applications, but it is not something that should be explicitly manipulated in our application programs. Our application programs should be using higher level abstractions and thereby avoid all the incomprehensibility of using locks, semaphores, monitors, etc. The goal of the session was to explain the Actor Model, the Dataflow Model and Communicating Sequential Processes (CSP) as higher level models for concurrent and (in particular) parallel programs, and to show some small examples of actual use in small applications. Of course nothing ever goes quite to plan.
I had originally intended to be very much demonstration oriented and to use a twin-Xeon workstation with an NVIDIA CUDA-enabled card for all the demonstrations, but I couldn't get the graphics card to do any CUDA stuff. So the idea of showing actors and dataflow working with a collection of CPUs and GPUs went out of the window. Sad, because this is the future for all computing. Plan B had been to use two laptops to show clustered multicore parallelism and the issue of dealing with communications costs between actors and dataflow operators. Sadly, though, I couldn't get simple Scala RemoteActor code to compile, and I didn't have time to switch to creating some Akka examples. The MPI examples worked reliably, though, so at least I had some Fortran, C, C++ and Python examples to rely on. None of this involved Java, Groovy and GPars - all my ready-made examples for the JVM were focused on single-machine multicore, and I didn't have the opportunity to get Hadoop working, nor Pervasive DataRush in distributed mode.
The first couple of days of the conference showed clearly that the arrival of a standard thread model in C++0x, and the increased acceptance that multicore parallelism has to be harnessed rather than worried about, meant people were thinking about shared-memory multithreading and missing the importance of asynchronous function calls, futures and other such higher level abstract models. I therefore switched to a dynamically generated Plan C, which was to do a little more explanatory, almost tutorial, material and de-emphasize the structured demonstration.
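For readers who have not met asynchronous function calls and futures, here is a minimal sketch in Python using the standard `concurrent.futures` module. It was not part of the session; the function names `word_count` and `count_all` are made up purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Count the whitespace-separated words in one piece of text."""
    return len(text.split())


def count_all(texts, workers=4):
    """Count the words in every text, one asynchronous call per text."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() is the asynchronous function call: it returns a Future
        # immediately rather than waiting for word_count to finish.
        futures = [pool.submit(word_count, text) for text in texts]
        # result() blocks until that particular call has completed.
        return sum(future.result() for future in futures)


if __name__ == "__main__":
    print(count_all(["just keep sending", "the messages"]))
```

The calling code never creates a thread or takes a lock; the pool of threads is hidden infrastructure behind the futures.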
So, after a quick rewrite of the slides the night before, I did 60 minutes as a presentation (slides are here) and then went into total risk mode by asking the audience which example languages they wanted to look at in demonstration, given that the two examples were calculating "Pi by Quadrature" and "the sleeping barber problem". Scala was requested first, so we looked at an actor implementation of "Pi by Quadrature". This and many, many other variants of the same code can be found in a Bazaar branch: for branching with Bazaar the URL is http://www.russel.org.uk/Bazaar/Pi_Quadrature, for Web browsing there is a Loggerhead instance running at http://www.russel.org.uk:8080/Pi_Quadrature. Having looked at that and shown it scaling reasonably - as much as is possible to show on a dual core hyperthreaded processor pretending to be four processors (hyperthreads seem to be a real waste of time) - I asked the audience for the next language to look at and someone shouted Fortran, perhaps not realizing that I had the Fortran/MPI version ready and waiting. So the audience got to look at the traditional HPC view of "message passing", multicore and cluster parallelism. Sadly, though, it took far too long to get the right mpirun command to execute. But we did get to see all six (!) cores across two laptops working on the same problem. There were far too few demonstrations and I never did show the Pervasive DataRush or the Go examples, but time had run out.
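To give a flavour of the message-passing structure of "Pi by Quadrature", here is a minimal sketch in Python. It is not the code from the Bazaar branch: it assumes the usual formulation (approximating the integral of 4/(1+x²) from 0 to 1 with the midpoint rule), and the names `worker` and `pi_by_quadrature` and the use of `queue.Queue` as a mailbox are choices made just for this illustration.

```python
import queue
import threading


def worker(mailbox, results):
    """Actor-like worker: process messages from its mailbox until told to stop."""
    while True:
        message = mailbox.get()
        if message is None:  # sentinel message: no more work
            break
        start, end, delta = message
        total = 0.0
        for i in range(start, end):
            x = (i + 0.5) * delta  # midpoint of the i-th strip
            total += 1.0 / (1.0 + x * x)
        results.put(total)  # reply with a message, not via shared state


def pi_by_quadrature(n=1_000_000, workers=4):
    """Approximate pi by sending slices of the integral to workers."""
    delta = 1.0 / n
    results = queue.Queue()
    mailboxes = [queue.Queue() for _ in range(workers)]
    threads = [threading.Thread(target=worker, args=(mailbox, results))
               for mailbox in mailboxes]
    for thread in threads:
        thread.start()
    slice_size = n // workers
    for k, mailbox in enumerate(mailboxes):
        start = k * slice_size
        # The last worker also takes any leftover strips.
        end = n if k == workers - 1 else start + slice_size
        mailbox.put((start, end, delta))
        mailbox.put(None)
    total = sum(results.get() for _ in range(workers))
    for thread in threads:
        thread.join()
    return 4.0 * delta * total


if __name__ == "__main__":
    print(pi_by_quadrature())
```

The only things the workers and the coordinator share are the queues, so there are no explicit locks anywhere in the application code. Note that with threads in standard CPython this shows the structure rather than a speed-up, because of the global interpreter lock; to see real scaling the same shape would need processes or something like MPI.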
I asked the audience if they were interested enough to go and investigate actors and dataflow more and to treat shared-memory multi-threading as hidden infrastructure (so as to avoid locks, semaphores, monitors, etc.), and everyone seemed to be enthusiastically saying yes. Result.