A PostScript version of this whole document is available here
The LaTeX source is available here
Russel's original mail:
To: A.McEwan@lpac.ac.uk, J.Poole@cs.ucl.ac.uk, G.Roberts@cs.ucl.ac.uk
Subject: A thought on loading.
Phone: +44 (0)171 380 7293
Fax: +44 (0)171 387 1397
Date: Sun, 16 Apr 95 10:44:32 +0100
From: Russel Winder <R.Winder@cs.ucl.ac.uk>

I think the contents of this file probably need adding into the "bugs" or "todo" database.

Consider the following scenario. I want to write a UC++ program (perhaps the Sieve) that executes locally but puts all the prime number objects on the DEC mpp. This raises a number of issues:

1. It would be sensible to be able to force loading of the prime number objects onto the DEC mpp in the UC++ source. Mian's idea was to put the Internet name string into the file. This is a bit non-portable but does work. The alternative, relying on the UCConfig file and on clauses, is more portable but is open to breaking through user error, either in the on clauses or in the UCConfig file (the need for synchronisation between them is dangerous).

2. It should be possible to mark machines specified in the UCConfig file as not usable as part of the round robin allocation strategy. We only want certain objects (using an on clause) to be loaded on certain machines. In the above we don't want general objects loaded on the DEC mpp.

3. We need to be able to amend allocation strategies. In fact we need a mechanism for user defined allocation strategies. There are two routes here:

   - UC++ defines a set of allocation strategy options and provides a (perhaps file based) mechanism for selecting between them; or

   - The user has to program the allocation in their UC++ code -- this requires the user to be able to find the number of real machines available to them and also their type (remember Terry's discussion of graphics boxes and the requirement for group loading -- a set of boxes offering the same services for a set of objects).

There was a fourth but I can't remember.

1 is an issue of how the system is to be used but is not critical immediately.

3 is probably important but (apart from ensuring that the library can tell the user program how many real processors there are) is probably not critical immediately.

2 is something I had overlooked until now (even if someone had already mentioned it) and is I think crucial now.

Russel.
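
Point 2 in the mail above can be illustrated in plain C++ rather than UC++. The following is only a sketch under stated assumptions: the `Machine` and `Allocator` types, the `roundRobin` flag, and the host names used with them are hypothetical and do not come from the UC++ library or the real UCConfig format. It shows a round robin allocator that skips machines marked as reserved, while an explicit placement (standing in for an on clause) can still reach them.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one configured machine (not the UCConfig format).
struct Machine {
    std::string host;
    bool roundRobin;  // false: reachable only through an explicit placement
};

class Allocator {
public:
    explicit Allocator(std::vector<Machine> machines)
        : machines_(std::move(machines)) {}

    // Explicit placement, standing in for an on clause with a machine number.
    // Throws std::out_of_range if the index is invalid.
    const Machine& placeOn(std::size_t index) const {
        return machines_.at(index);
    }

    // Default placement: cycle over the machines, skipping reserved ones.
    const Machine& placeDefault() {
        for (std::size_t tried = 0; tried < machines_.size(); ++tried) {
            assert(next_ < machines_.size());
            const Machine& m = machines_[next_];
            next_ = (next_ + 1) % machines_.size();
            if (m.roundRobin) return m;
        }
        throw std::runtime_error("no machine available for round robin");
    }

    // Lets the user program ask how many machines are configured (point 3).
    std::size_t machineCount() const { return machines_.size(); }

private:
    std::vector<Machine> machines_;
    std::size_t next_ = 0;
};
```

With three machines of which the second is reserved, successive calls to `placeDefault()` alternate between the first and third, while `placeOn(1)` still returns the reserved machine.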
To: Russel Winder <R.Winder@cs.ucl.ac.uk>
cc: A.McEwan@lpac.ac.uk, J.Poole@cs.ucl.ac.uk, G.Roberts@cs.ucl.ac.uk
Subject: Re: A thought on loading.
In-reply-to: Your message of "Sun, 16 Apr 95 10:44:32 BST."
Date: Tue, 18 Apr 95 12:13:55 +0100
From: Jonathan Poole <J.Poole@cs.ucl.ac.uk>

Russel,

Re: your message about loading, and mapping the machines specified in the on clause to real machines. I have given thought to this, but have not put forward any specific points as it has not arisen.

My view is that we need at least one level of indirection between the compilation and the running: we want a single set of executables that will run on different configurations.

At present each object is sent to a particular machine, though the particular mapping of virtual machine number to machine is set only at runtime, based on the UCConfig file info. This latter flexibility is very important, as we want the programmer to be able to tweak the particular configuration at runtime. At present the mapping is many-to-one: many objects can be put on one machine -- but we can't do many-to-many.

I think the argument to the on clause should be not a machine number but a group number: thus we might have

    C* c = activenew C on FARM;
    W* w = activenew W on MASPAR;

where FARM might be a group of machines with different addresses. In the UCConfig file we would have

    FARM x.cs /cs/research/..../ file1.exe
    FARM y.cs /cs/..... / file2.exe
    FARM ....
    FARM
    MASPAR jupiter.lpac.ac.uk /usr/maspar/uc++/ maspar.exe
    GRAPHICS ...
    GRAPHICS ...
    GRAPHICS ...

and so on.

Of course these group ids might be numbers rather than symbolic names, or strings. We might have a group called DEFAULT that is used for machines that are not given another group name, and is used for objects not given an explicit on clause -- though in practice I don't believe it would ever really be useful to be able to not specify the on clause. We might also have other predefined machine names such as CONSOLE, EXCEPTIONHANDLER and so on.

A particular machine could also be part of more than one group, presumably.

This idea is not completely thought out yet, but I believe it is completely general, and answers all the points in RW's mail. If it could be combined with the "parallel slackness" that can come from lightweight objects, I believe it would allow a complete separation between compilation of code and parallel configuration.

A possible extension would be to allow wildcarding, so we could have a hierarchy of machines such as

    FARM:GROUPA:MACHINE1 blah.blah.blah /...
    FARM:GROUPA:MACHINE2 blah.blah.blah /...

etc, so we could say things like activenew on FARM:*:* or on FARM:GROUPA:*, so we get finer control of which objects go on the same machine -- which allows us to express ideas like "these objects can be spread out as much as possible, but if they are clustered together, it is better to put this subset on one machine as there is more communication between them" and so on. I'm not sure that this extension is necessary at this stage, however, and it would seem to be easy enough to extend to this later if the need is found.

Jonathan
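
The group idea in the reply above could be prototyped along the following lines in plain C++. This is a rough sketch, not the UC++ implementation: the `Entry` and `GroupTable` types and the `matches` function are hypothetical names, the four-field line layout (group, host, directory, executable) is an assumption read off the example lines in the mail, and the choice that `*` matches exactly one component of a hierarchical name is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One configuration entry; the field layout is assumed, not specified.
struct Entry {
    std::string host;
    std::string dir;
    std::string exe;
};

class GroupTable {
public:
    // Read lines of the form "GROUP host dir exe"; incomplete lines are skipped.
    void load(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            std::string group;
            Entry e;
            if (fields >> group >> e.host >> e.dir >> e.exe)
                groups_[group].members.push_back(e);
        }
    }

    // Stand-in for "activenew C on GROUP": next member of the group, round robin.
    const Entry& place(const std::string& group) {
        auto it = groups_.find(group);
        if (it == groups_.end())
            throw std::out_of_range("unknown group: " + group);
        Group& g = it->second;
        assert(!g.members.empty());  // groups are only created by push_back
        const Entry& chosen = g.members[g.next];
        g.next = (g.next + 1) % g.members.size();
        return chosen;
    }

private:
    struct Group {
        std::vector<Entry> members;
        std::size_t next = 0;
    };
    std::map<std::string, Group> groups_;
};

// Match a hierarchical name such as "FARM:GROUPA:MACHINE1" against a pattern
// such as "FARM:*:*". Assumption: "*" matches exactly one component.
bool matches(const std::string& pattern, const std::string& name) {
    std::istringstream p(pattern), n(name);
    std::string pc, nc;
    while (true) {
        bool hasP = static_cast<bool>(std::getline(p, pc, ':'));
        bool hasN = static_cast<bool>(std::getline(n, nc, ':'));
        if (!hasP || !hasN) return hasP == hasN;  // both must end together
        if (pc != "*" && pc != nc) return false;
    }
}
```

Because the group-to-machine mapping lives entirely in the table loaded at runtime, the same executable could be pointed at a different configuration without recompiling, which is the separation the mail argues for.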