As published in the Top500 Supercomputer Blog
Maybe I’m getting old, but the petascale era of supercomputing still feels new to me. On the other hand, the recent decommissioning of IBM’s Roadrunner,
the world’s first petaflopper, suggests otherwise. Roadrunner booted up at the Department of Energy’s Los Alamos National Laboratory five years
ago in 2008. Its retirement last week marks the approximate mid-point between the first petaflop system and the first exaflop one — assuming,
of course, you’re an exascale optimist.
A five-year working life for a $100 million-plus supercomputer might seem like a waste of silicon and copper, especially considering Roadrunner was
still the 22nd fastest machine on the planet when Los Alamos pulled the plug in late March. But its lifespan is probably about average for the
industry. There certainly have been longer-lived supercomputers in existence. For instance, ASCI Red, the first teraflop system, enjoyed a nine-year
lifespan (1997-2006) before Sandia National Laboratories shut it down. But the pace of technology and the exceptional size of these systems generally
translate into short, if dazzling, lives.
In its design, though, Roadrunner was a major departure from previous supercomputers. It was one of the first mixed-processor systems, pairing twelve
thousand IBM PowerXCell 8i coprocessors with six thousand standard dual-core x86 CPUs. The PowerXCell 8i was a souped-up version of the PlayStation 3’s
Cell processor; its eight special vector units delivered prodigious amounts of double precision floating point performance for its day – over 100
gigaflops per chip.
The result was the first supercomputer that managed over one Linpack petaflop. Roadrunner captured the number one spot on the TOP500 list in June 2008
and held it for three consecutive lists, until November 2009.
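For a rough sense of where that petaflop came from, the chip counts above can be multiplied out. The sketch below uses only the rounded figures quoted in this article (twelve thousand coprocessors, 100 gigaflops each, taken as a floor); the function and constant names are illustrative, and the result is a theoretical peak for the coprocessors alone, not a Linpack measurement.

```python
# Back-of-envelope estimate using only the rounded figures quoted in the article.
COPROCESSORS = 12_000      # "twelve thousand" PowerXCell 8i chips (rounded)
GFLOPS_PER_CHIP = 100      # "over 100 gigaflops per chip", taken as a floor


def coprocessor_peak_petaflops(chips: int, gflops_per_chip: float) -> float:
    """Aggregate double-precision peak of the coprocessors, in petaflops."""
    return chips * gflops_per_chip / 1_000_000  # 1 petaflop = 1,000,000 gigaflops


peak = coprocessor_peak_petaflops(COPROCESSORS, GFLOPS_PER_CHIP)
print(f"Coprocessor peak: at least {peak:.1f} petaflops")
```

On these assumptions the coprocessors alone account for roughly 1.2 peak petaflops, which fits with a measured Linpack result of just over one.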
Although Roadrunner wasn’t particularly energy hoggish, even by 2013 standards, its total power draw of more than 2.3 MW made it an expensive beast
to feed. Cielo, the new 1.4 petaflop Cray supercomputer that will take over some of Roadrunner’s computational duties at Los Alamos, chews up nearly
4 MW, but being a straight-up x86 system, is considerably easier to program.
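The power comparison is easier to see as flops per watt. The sketch below is illustrative only: it plugs in the approximate figures quoted above (about one petaflop at 2.3 MW for Roadrunner, 1.4 petaflops at 4 MW for Cielo), and it assumes the two performance numbers are comparable, which the article does not confirm. The function name is hypothetical.

```python
def megaflops_per_watt(petaflops: float, megawatts: float) -> float:
    """Convert a system's performance and power draw into megaflops per watt."""
    megaflops = petaflops * 1e9   # 1 petaflop = 1e9 megaflops
    watts = megawatts * 1e6       # 1 megawatt = 1e6 watts
    return megaflops / watts


# Approximate figures from the article; treat the results as rough estimates.
roadrunner = megaflops_per_watt(1.0, 2.3)
cielo = megaflops_per_watt(1.4, 4.0)

print(f"Roadrunner: ~{roadrunner:.0f} megaflops per watt")
print(f"Cielo:      ~{cielo:.0f} megaflops per watt")
```

By this rough measure Roadrunner lands somewhat ahead of its x86 successor, consistent with the point that programmability, rather than raw efficiency, was its weak spot.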
The difficulty of developing code for the specialty PowerXCell 8i silicon and that processor’s lack of a roadmap (the rumored PowerXCell 32i sequel was
canceled in 2009) relegated IBM’s super to a one-off machine. But the heterogeneous design of Roadrunner foreshadowed the coming age of HPC accelerators,
which rose to prominence during Roadrunner’s reign. In 2006, NVIDIA brought general-purpose GPUs and its CUDA programming environment into the
HPC fold, attracting FLOPS-hungry users willing to endure a certain amount of programming pain. More recently, Intel chipped in with its manycore
Xeon Phi, offering an x86-flavored coprocessor with a more CPU-like software development model.
In retrospect, Roadrunner could be viewed as something of a design cul-de-sac, created by the artificial goal of the petaflop milestone. But it’s
notable that even in the contrived race to a quadrillion flops, something of worth endured. Although the PowerXCell 8i was a commercial dead end,
x86/accelerator combo servers took off and are now sold by every HPC system vendor, IBM included.
For the time being, accelerators offer the only commodity-based technology that delivers multi-petaflops of supercomputing in reasonable power envelopes,
not to mention tiny systems with multi-teraflops capability. The energy efficiency of these accelerators, compared to standard processors, is driving
the technology into mainstream HPC and is stretching the number of FLOPS that can be squeezed into a datacenter or into a deskside cluster.
As a result, the petascale era might end up being known as the decade of accelerators. It’s worth remembering that Roadrunner was there first. Rest
in peace.