_In a comment to my posting "Python Adapts to the Multicore Era", Sam Aaron asked about the arguably contradictory position of mentioning Clojure when arguing for a processes and channels model for Python. He also raised the question of the future of Python's GIL in a world dominated by parallelism. I thought I would respond with a full posting rather than just a comment._
Clojure emphasizes software transactional memory (STM), threads and agents as its tools for handling concurrency and parallelism, rather than the processes and channels used in Scala, Go, Python-CSP, PyCSP and GPars, which variously implement CSP (Communicating Sequential Processes) or the Actor Model. For me STM is really a bit of "sticking plaster" to make shared-memory multi-threading more viable than it is with explicit locks, monitors and semaphores. However, others think STM has a promising future.
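To make the contrast concrete, here is a minimal sketch in Python. The standard library has no STM, so the first function shows the explicit-lock style that STM is meant to improve upon, and the second shows the channel style. Threads stand in for processes here purely to keep the example short, and the function names `count_with_lock` and `count_with_channel` are made up for illustration.

```python
import queue
import threading


def count_with_lock(thread_count=4, increments=1000):
    """Shared-memory style: every thread mutates one counter under a lock."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:  # omitting this lock is the classic shared-memory bug
                total += 1

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_channel(thread_count=4, increments=1000):
    """Message-passing style: each worker owns its state and sends a result."""
    channel = queue.Queue()

    def work():
        local = 0  # private to this worker, so no lock is needed
        for _ in range(increments):
            local += 1
        channel.put(local)

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for t in threads:
        t.start()
    # Collect exactly one message per worker, then wait for them to finish.
    total = sum(channel.get() for _ in range(thread_count))
    for t in threads:
        t.join()
    return total
```

Both functions compute the same total. The difference is where the coordination lives: in the first it is spread across every access to the shared counter, in the second it is confined to the channel.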
_In surveying the field of STM you may come across implementations that talk of storing values in databases. These are not implementations of STM, they are implementations of persistent data storage, which is something very different. Databases generally have transactional state, and the abstract concept of a transaction is the same as in STM, but the realization is something very different - or should be._
The real questions that drive thinking about STM and threads vs. processes and channels are:
- Which computational model best allows programmers to express the parallelism in their application?
- Which computational model provides the smaller translation distance between the expression of the application in code and the execution model of the machine executing the application?

Of course, if the application is inherently and fundamentally sequential then none of this really matters, but then such applications will not execute any faster until there is a return to increasing the speed of individual processors. The point is that we should not try to parallelize fundamentally sequential applications just to make them faster.

So which is better: STM and threads, or processes and channels?
As stated so baldly, the question is probably answered by "choose whichever suits you"; there really isn't any other answer to such a general question with no context. We need to restrict the context so as to be comparing things a little better.
Clojure operates on the JVM, which promotes threads and a single global virtual machine viewpoint. The best comparison then is with Scala (and Akka) and GPars, which supports Groovy-based actors and CSP. There are also STM implementations for Java and Scala, which would help the comparison. To be honest though, no amount of philosophizing is going to result in any truly useful indicators. Data is what is needed. So there need to be experiments implementing a number of different problems using this set of languages and libraries, and then the following questions need addressing: which programs are the easiest to write; which programs are the easiest to comprehend, both for the author and for people other than the author; and which programs are the most efficient and speedy of execution. I have not yet done such experiments, and am unlikely to as I do not have the resources just now.

Of course there is the question of whether Clojure is the version of Lisp that will finally allow Lisp to really make the big time, or whether Clojure will slide into relative obscurity as all other Lisps have. Lisp has the property of being the most distinctive approach to programming languages whilst at the same time never really catching on. This should not of course affect the core "which of STM and threads vs. processes and channels is better for parallelism" debate.
What about natively compiled languages? Well there is Go, which supports processes and channels, animated with its goroutines; there is C++0x with threads, futures and asynchronous function calls; and there is Haskell, which supports STM. The problem is, of course, that Haskell has a totally different computational model to Go and C++, so would it be a valid comparison? In a sense yes, since Haskell is claiming to compete against C, C++, etc. So it is probably worth doing. Of course there are STM implementations for C++ and even C (Intel have a C++ STM system, but it remains an experimental rather than production feature of the Intel compiler), so they should be added to the mix. Of course the same argument about data and experimentation applies here: no experimentation, no data, no conclusions, just unsubstantiated opinions.
What about Python? Well here there is the GIL (global interpreter lock), at least for the standard CPython implementation. This means that a single PVM (Python Virtual Machine) can be executing only a single thread of Python code at any one time. Thus, there is no immediate potential for parallelism. There are two solutions to this: write things as extensions in C++ or C so that the GIL can be released by a thread, or use multiple PVMs. Using C++ and C extensions is not a generally viable approach to parallelism in Python. It has its rightful place and is incredibly useful in that place, but it is not where most Python code is. So Python effectively mandates a process and channel approach to parallelism. Hence the multiprocessing package in the standard Python distribution (from 2.6 onwards, with backports to 2.4 and 2.5), or Parallel Python. This means that Python naturally gravitates towards CSP and the Actor Model for concurrency, to the effective exclusion of STM. So if the GIL is to remain in CPython, then multiple PVMs, processes, message passing, etc. are the way of structuring Python applications. This means CSP and actors will be core to the future of Python in the increasingly multicore, and hence parallel, world.
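As a rough illustration of the "multiple PVMs plus channels" structure, here is a minimal sketch using the standard `multiprocessing` package. Each worker is a separate process with its own interpreter and its own GIL, and the only communication is via queues. The names `square_worker` and `run_pipeline`, and the squaring workload, are hypothetical and chosen only to keep the example small.

```python
import multiprocessing as mp


def square_worker(inbox, outbox):
    """Read numbers from inbox until a None sentinel arrives; send back squares."""
    for n in iter(inbox.get, None):
        outbox.put((n, n * n))


def run_pipeline(numbers, worker_count=2):
    """Fan the numbers out to worker processes and gather the results."""
    numbers = list(numbers)
    inbox = mp.Queue()
    outbox = mp.Queue()
    workers = [
        mp.Process(target=square_worker, args=(inbox, outbox))
        for _ in range(worker_count)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        inbox.put(n)
    for _ in workers:
        inbox.put(None)  # one shutdown sentinel per worker
    # Drain the results before joining, so no worker blocks on a full queue.
    results = dict(outbox.get() for _ in numbers)
    for w in workers:
        w.join()
    return dict(sorted(results.items()))


if __name__ == "__main__":
    # The guard matters: some start methods re-import this module in each child.
    print(run_pipeline([1, 2, 3, 4]))
```

No state is shared between the workers, so there is nothing for a lock or a transaction to protect. The cost is that every value crossing a queue is pickled and copied between processes.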
The real upshot of this posting is that there needs to be some experimentation organized to move things away from pure argumentation and creating results by arguing loudest and longest. The STM and threads vs. processes and channels debate needs some work done not just arguments made. Except with Python where, whilst the GIL is present, STM doesn't really have much of a place.