WHY THIS MATTERS IN BRIEF
As supercomputers become increasingly powerful they will help us realise new breakthroughs in everything from drug discovery and energy generation to materials and space travel, and everything in between.
It’s the most powerful computing platform on Earth, for now at least – until tomorrow’s DNA and quantum computers make it look like something from the Stone Age. I am, of course, talking about Summit, the US’s latest and largest supercomputer, which is a beast no matter which way you cut it. It’s also the first supercomputer that’s been specifically optimised for Artificial Intelligence (AI), making it a game changer on two huge fronts. Last year, for example, the supercomputer it’s replacing, named Titan, became the first to take just a single day to create complex AI deep learning models that comfortably beat those built by human experts on accuracy and performance. And Summit is at least five times more powerful. It also potentially paves the way to start running some intriguing new algorithms, such as ones that simulate the entire human brain, and to create the first generation of super-intelligent machines. All of which is just the tip of the iceberg.
According to Oak Ridge National Laboratory (ORNL) in the US, where Summit resides, the supercomputer, which fills a server room the size of two tennis courts, can spit out answers to a staggering 200 Quadrillion – that’s 200 followed by 15 zeros – calculations per second. Or, in computer speak, 200 Petaflops. This raw power is all thanks to its massive arrays of GPUs and CPUs, and it’s a marvel of engineering, powered by a total of 27,648 Nvidia Tesla V100 GPUs, which one day we could see replaced by new revolutionary “Intelligent Processing Units,” or IPUs for short, and 9,216 IBM Power CPUs.
Just to put things into perspective, the Tesla V100s have 5,120 CUDA cores each, giving Summit a total of 141,557,760 CUDA cores. These are arranged across 4,608 nodes, where each node is configured with dual IBM Power9 CPUs and six Tesla V100 GPUs. Add to this over 250 Petabytes of storage and the machine is a monster.
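Those hardware figures all hang together, and it’s easy to check them with a few lines of arithmetic – a quick sketch, using only the node counts and per-node configuration quoted above:

```python
# Cross-check Summit's hardware figures from the per-node configuration.
NODES = 4_608
GPUS_PER_NODE = 6          # Tesla V100s per node
CPUS_PER_NODE = 2          # dual IBM Power9 CPUs per node
CUDA_CORES_PER_V100 = 5_120

total_gpus = NODES * GPUS_PER_NODE
total_cpus = NODES * CPUS_PER_NODE
total_cuda_cores = total_gpus * CUDA_CORES_PER_V100

print(total_gpus)        # 27648
print(total_cpus)        # 9216
print(total_cuda_cores)  # 141557760
```

The totals match the quoted 27,648 GPUs, 9,216 CPUs and 141,557,760 CUDA cores exactly.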
“If every person on Earth completed one calculation per second, it would take the world population 305 days to do what Summit can do in 1 second,” said ORNL in a statement, “or, put another way, if one person were to run the calculations, hypothetically, it would take them 2.3 trillion days, or 6.35 billion years.”
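ORNL’s comparison checks out if you run the numbers yourself – a quick sanity check, assuming a world population of roughly 7.6 billion (the figure their arithmetic implies):

```python
FLOPS = 200e15            # Summit's peak: 200 Petaflops
POPULATION = 7.6e9        # approximate world population (assumption)
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

# Everyone on Earth doing one calculation per second:
days_for_everyone = FLOPS / POPULATION / SECONDS_PER_DAY
print(round(days_for_everyone))  # 305

# One person doing one calculation per second:
days_for_one = FLOPS / SECONDS_PER_DAY
print(days_for_one / 1e12)                  # ~2.3 trillion days
print(days_for_one / DAYS_PER_YEAR / 1e9)   # ~6.3 billion years
```

Both of ORNL’s figures – 305 days for the whole planet, and roughly 2.3 trillion days or 6.35 billion years for one person – fall straight out of the division.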
By comparison, the previous holder of the world’s fastest title, the Chinese Sunway TaihuLight, can perform 93 Quadrillion calculations a second, or “just” 93 Petaflops, while humming away inside China’s National Supercomputing Center in Wuxi.
Summit is what’s known as an IBM AC922 system and is made up of 4,608 computer servers, each of which has bunches of processors, but what’s actually going on inside those processors is what makes the difference.
“Summit’s computer architecture is quite different from what we have had before [in Titan],” says Daniel Jacobson, a computational biologist at ORNL who is working on Summit. “For one thing, the computer uses the new Tensor Core feature in its GPUs, which are designed specifically for applications focusing on AI, deep learning, and machine learning, and [are] fast.”
Basically, unlike older computer chips, these chips are optimised for a special type of mathematical operation on matrices – rectangular grids of numbers with rules for adding, subtracting and multiplying the rows and columns. Computers running AI programs often learn using so-called neural networks, which have tens or hundreds of layers, where the results of calculations in the lower levels of the “stack” feed into the layers above. And this process relies heavily on matrices.
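To make the matrix idea concrete, here’s a minimal sketch in plain Python – no real deep learning framework, and the weights are made up purely for illustration – showing how each layer of a neural network is essentially a matrix multiply whose result feeds the layer above:

```python
# A toy two-layer "network": each layer is a matrix-vector multiply
# followed by a simple nonlinearity. Weights are invented for illustration.

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Zero out negative values - a common neural-network nonlinearity."""
    return [max(0.0, x) for x in vector]

layer1 = [[0.5, -0.2, 0.1],
          [0.3,  0.8, -0.5]]   # maps 3 inputs to 2 hidden values
layer2 = [[1.0, -1.0]]         # maps 2 hidden values to 1 output

inputs = [1.0, 2.0, 3.0]
hidden = relu(matvec(layer1, inputs))  # lower layer feeds the one above
output = matvec(layer2, hidden)
print(output)
```

Summit’s Tensor Cores exist to do exactly these multiply-and-accumulate operations in hardware, on far larger matrices and thousands at a time.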
“This is a brand new feature that has allowed us to break the Exascale barrier,” Jacobson said, referring to a processing speed that’s over a billion billion calculations per second.
In addition, Summit has loads of superfast memory (RAM) available on each of its nodes, where localised calculations can take place.
“Each node on Summit has 512 GB of RAM and the network that communicates between nodes uses adaptive routing, and is thus incredibly fast, which helps us scale the calculation across all the nodes very efficiently,” Jacobson said. Adaptive routing means Summit has some flexibility in how it runs calculations, sort of like networks of brain cells connected by synapses.
And though expensive – very expensive, with a New York Times report putting the cost of the new supercomputer at over $200 million – Summit could help scientists make breakthroughs in everything from defence and energy to healthcare, materials and space.
“There are many, many scientific uses of this sort of supercomputing capacity,” Jacobson said. “Whether this is for new discoveries for bioenergy or new discoveries for precision medicine, many things are now possible that simply weren’t before.”
“For instance, just as AI programs are being co-opted to learn to pick out cats from images,” said Jack Wells, the director of science at ORNL, “these AI programs running on Summit could learn to pick out and categorise all kinds of data, ranging from those in biological sciences to physics, such as detections of neutrinos and other particles.”
“Something new that’s happening, is it’s going to be at the intersection of machine learning and simulation science, because this machine is going to be able to do both of those things in a very significant way,” added Wells.
Summit’s placement as the “world’s fastest” isn’t exactly official yet, because the Top500 supercomputer rankings haven’t been updated – though that will be a formality when the list is refreshed later this month. I, for one, am looking forward to reading about the exciting breakthroughs, especially in science, that Summit will now make possible.