Russel Winder, Concertant LLP
As we are all aware, processor manufacturers stopped increasing the clock speed of processors, buses, and memory to get increased performance. The main problem is the inability to dissipate the heat generated by increasingly fast processors. We are now getting multicore processors and are being told that having more cores is the modern way of increasing the number of instructions executed per second.
Now every computer system is a parallel processor: welcome to the brave new world of parallelism. No more mere timesharing multitasking: now we have multicore and real parallelism. The big problem is that programming languages and platforms are not really up to the job of supporting programmers in creating applications that utilize all the cores. Put another way, software development doesn't have the right tools. Given the many years of parallel programming experience in the HPC and mainframe server communities, why is there a problem?
In HPC, raw processing performance is all that matters. This is the realm of individual, bespoke software; of C, C++, Fortran, and the birthplace of OpenMP and MPI. The resulting code is generally very specific to a given platform and requires huge resources to port to a different platform. Of course each of those platforms costs many, many millions of pounds. For these people, multicore is going to be a boon since it means more parallelism and hence more performance.
Historically, mainframe servers have largely been about transaction processing (compute server, database, Web server, or some combination thereof) with the emphasis on throughput and responsiveness. This is very different from the computational needs of HPC. Yet mainframes have had to become parallel processing engines (think blades) in order to meet the demands of the services and applications available. For servers, Java, or at least the Java Virtual Machine (JVM), is increasingly the de facto dominant platform. Harnessing parallelism in a JVM context is about ensuring that transactions do not interact, and managing them independently. This allows the JVM to make use of all processors available via its standard thread-pool model: harnessing parallelism is about programmers carefully designing and coding their algorithms with a set of goals that are almost orthogonal to those of the HPC community. For these people multicore is going to be a boon because it means more parallelism and hence more performance. It is no coincidence that that same phrase finished the last paragraph!
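The server-side thread-pool model described above can be sketched in a few lines of Python (one of the languages the article names). The request data and `handle_request` function here are hypothetical stand-ins; the point is that because the transactions share no mutable state, the pool can spread them across all available cores without any explicit synchronization.

```python
# Sketch of the server-style thread-pool model: independent "transactions"
# that share no mutable state, handed to a pool of worker threads.
# handle_request and the request ids are hypothetical stand-ins.
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    # Each transaction works only on its own data: no shared state,
    # so no locking is needed between workers.
    return f"response to request {request_id}"

with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handle_request, range(8)))
```

The programmer's job in this style is exactly what the paragraph describes: designing the transactions so that they do not interact, and leaving the distribution over threads to the platform.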
So why is there a problem and what is the problem? We have two communities both of which are already familiar with parallelism and are already dealing with it. Surely all that is needed is for their skill and expertise to be transferred to people writing end-user applications. Well, no. The experience of the HPC community is only really relevant to large, mathematically-oriented computations. End users are generally not interested in running weather forecasting models, or computational fluid dynamics (CFD) calculations. Experience on the mainframe is not really relevant either, since it revolves around avoiding synchronization of threads: the successful use of parallelism there means not allowing threads to interact or share data.
The imperative programming world, exemplified by C, C++, Java, C#, Python, Groovy, etc. has decided that threads are the way of managing concurrency in software. Java really took this view on board by making threads an integral part of the platform and programming language from the very outset. C++ is following this route as well: the next C++ standard will specify a threads API, based on, but not identical to, the pthreads library that has been used in the C and C++ communities for many years. So if programming languages support threads and operating systems support threads, why is there a problem?
The average programmer is not good at programming threads. This is not an indictment of programmers. It is a comment on the ability of all programmers to understand and work with complex multi-threaded systems. It is hard. Very hard. This means that the tool is not suitable for the purpose being asked of it: current programming languages and threads are not good tools for writing programs that run on parallel hardware. The core of the difficulty is that shared-memory concurrency is not an easy model to work with. Many years ago the programming languages Occam and Erlang recognized this fact and chose to work with distributed memory models and message passing. This makes managing concurrency very much easier since there is no need for shared-memory synchronization. In both Occam and Erlang, working with large numbers of independent processes communicating with each other is the norm. This means programs using these languages really harness parallelism, including multicore systems, very easily indeed.
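The difficulty of the shared-memory model can be seen in the smallest possible example, sketched here in Python. Two threads updating one counter must agree on a lock; forget it, and the read-modify-write of `counter += 1` can interleave between threads and silently lose updates. The counter and thread counts are illustrative, not from any real system.

```python
# Sketch of why shared-memory concurrency is hard: two threads updating
# one shared counter must synchronize, or increments can be lost.
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:         # without this lock, counter += 1 is a data race:
            counter += 1   # the read-modify-write can interleave and lose updates

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, counter is deterministically 200000; without it, anything up
# to 200000 is possible, and the bug only shows up under load, intermittently.
```

It is exactly this kind of non-deterministic, load-dependent failure that message-passing designs such as Occam's and Erlang's avoid by construction: no shared mutable state, so no lock to forget.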
So should we all stop working with C, C++, Java, C#, Python, Groovy, etc. and immediately rewrite all applications in Occam or (more likely) Erlang? Clearly not. Occam is a language kept alive in academia because it is a good source of high-quality PhD theses. Erlang is used commercially but it is a niche language. Erlang could become a mainstream language, but I think the commercial pressures of the moment are such that this is unlikely. What then is the way forward? Well, let us keep C, C++, Java, C#, Python, Groovy, etc., but let us think of threads as being a hidden abstraction, not a tool for day-to-day use.
All computers execute machine code, but nobody programs at that level these days; we have better abstractions. Programmers used to program in assembly language, but very few people do these days; again, we have better abstractions. People think that operating systems manage parallelism with threads and that therefore programming languages need to support them. The problem is that in a shared-memory parallel world, threads may be the implementation technology but they are the wrong abstraction for programming. Almost certainly what is needed is for programmers not to use threads but instead to use distributed memory and message passing as an abstraction layered on top of shared memory and threading. Given the experience of Occam and Erlang, and the use of MPI in HPC, this would seem like a good idea, well worth investigating.
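What "message passing layered on top of shared memory and threading" might look like can be sketched in Python. The `Actor` class below is a hypothetical illustration, not a real library API: each actor owns a thread and a mailbox internally, but the programmer only ever calls `send()`, never touches a thread or a lock.

```python
# Sketch of message passing as an abstraction layered on top of threads:
# an "actor" hides its thread and mailbox behind send()/stop().
# The Actor class is a hypothetical illustration, not a real library API.
import threading
from queue import Queue

class Actor:
    def __init__(self, behaviour):
        self._mailbox = Queue()          # the only channel into the actor
        self._behaviour = behaviour
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:          # sentinel: shut down
                break
            self._behaviour(message)     # all state handled on the actor's own thread

    def send(self, message):
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()

results = []
doubler = Actor(lambda n: results.append(n * 2))
for n in range(4):
    doubler.send(n)
doubler.stop()
print(results)   # → [0, 2, 4, 6]
```

The threads and the shared-memory queue are still there underneath, just as machine code is still there beneath a compiler; the programmer simply never sees them.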