I recently had the good fortune to gain access to the most powerful (publicly known) computer in the UK: ARCHER. This Cray XC30 has 118,000 processing cores and is a workhorse for many large scientific projects requiring massively parallel data processing. My requirements are somewhat more modest, but it was still an interesting experience to wrangle the supercomputer (well, a mere 20 of its roughly 5,000 24-core processing nodes) and try to apply its awesome power to some Artificial Life.
As you might expect, everything is geared towards massively parallel MPI jobs, so my hand-rolled TCP-based task farm was a bit out of place. Parallel (compute) nodes are unable to break out of their network to fetch new work units, background processes for the task master are forbidden on login nodes, and the workings of inter-node networking are opaque. Given all this, I opted for a within-node task farm: 23 workers (later 47, with hyperthreading) and one master process on a single node, communicating over the loopback interface. By requesting multiple nodes via an array job I was able to launch 20 such runs with a single job script.
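To make the within-node arrangement concrete, here is a minimal Python sketch of a master and its workers talking over the loopback interface. It is not my actual task farm: `run_farm`, `evaluate`, `AUTHKEY` and the "ready"/"result" message protocol are all hypothetical names invented for illustration, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"task-farm"  # hypothetical shared secret for the connection handshake


def evaluate(unit):
    # Placeholder for the real work (e.g. one simulation run).
    return unit * unit


def worker(address):
    # Connect to the master over loopback and keep asking for work.
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(("ready", None))
        while True:
            unit = conn.recv()
            if unit is None:  # master has no work left
                break
            conn.send(("result", (unit, evaluate(unit))))


def serve(conn, pending, results, lock):
    # Master side: one thread per worker connection.
    with conn:
        while True:
            kind, payload = conn.recv()
            with lock:
                if kind == "result":
                    unit, value = payload
                    results[unit] = value
                next_unit = pending.pop() if pending else None
            conn.send(next_unit)
            if next_unit is None:
                return


def run_farm(units, n_workers):
    pending = list(units)
    results = {}
    lock = threading.Lock()
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX/Linux
    # Port 0 lets the OS pick a free port on the loopback interface.
    with Listener(("127.0.0.1", 0), backlog=n_workers, authkey=AUTHKEY) as listener:
        procs = [ctx.Process(target=worker, args=(listener.address,))
                 for _ in range(n_workers)]
        for proc in procs:
            proc.start()
        threads = []
        for _ in procs:
            conn = listener.accept()
            thread = threading.Thread(target=serve,
                                      args=(conn, pending, results, lock))
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
        for proc in procs:
            proc.join()
    return results
```

On a 24-core node, something like `run_farm(work_units, 23)` would correspond to the 23-workers-plus-master layout described above. Binding to `127.0.0.1` keeps all traffic on the node, so nothing depends on how the inter-node network behaves.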
The sheer scale of this parallelism dwarfs anything I’ve had access to up to now: normally I get 3-4 workers per job, not 47. The increase did not disappoint; runs that normally take 3-4 weeks are done in 2-3 days. The only slight downsides are the contention (this is a busy system, and my first 24h job waited 58h in the queue before starting) and the limited job run time (24h maximum for a standard job). In practice the run-time limit meant only a slight upgrade of my task farm to handle restarts more gracefully, a change that was long overdue anyway.
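For the curious, restart handling under a wall-time limit can be as simple as checkpointing finished work units to disk and skipping them when the job is resubmitted. The following Python sketch shows one way to do that; it is an illustration under my own assumptions rather than the code I actually run, and the function names and the JSON file layout are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    # Return previously saved results, or an empty dict on a fresh start.
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


def save_checkpoint(path, results):
    # Write to a temporary file first, then rename it into place, so a job
    # killed mid-write never leaves a truncated checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(results, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def remaining_units(all_units, results):
    # JSON object keys are strings, so compare on the string form of each unit.
    return [unit for unit in all_units if str(unit) not in results]
```

On restart the master would call `load_checkpoint`, hand out only `remaining_units(...)`, and call `save_checkpoint` as results arrive. How often to save is a trade-off between I/O overhead and the amount of work lost when the 24h limit hits.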