Aurora supercomputer is now fully operational, available to researchers

by Pelican Press

Argonne National Laboratory said this week that its Aurora supercomputer is now fully operational and available to the scientific community. The machine, which was announced in 2015 and has faced massive delays, offers over 1 ExaFLOPS of FP64 performance for simulation as well as 11.6 ExaFLOPS of mixed-precision performance for artificial intelligence and machine learning.

“We are ecstatic to officially deploy Aurora for open scientific research,” said Michael Papka, director of the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility. “Early users have given us a glimpse of Aurora’s vast potential. We are eager to see how the broader scientific community will use the system to transform their research.” 

The availability of Aurora for open scientific research effectively amounts to Argonne's formal acceptance of the system, an important milestone for the troubled machine. Initially planned for 2018, Aurora missed that target due to Intel's decision to discontinue its Xeon Phi processors. After the machine was re-architected, the project faced further setbacks from delays with Intel's 7nm process technology, pushing the completion date to 2021 and then again to 2023.

Even after the hardware was installed in June 2023, it took several months for the system to become fully operational and achieve exascale performance, which it finally reached in May 2024. Yet for well over half a year after that, the system was available only to select researchers.


While Aurora is not the most powerful supercomputer for simulations, as its FP64 performance barely exceeds one ExaFLOPS, it is the most powerful system for AI: it achieves 11.6 mixed-precision ExaFLOPS on the HPL-MxP benchmark.

“A big target for Aurora is training large language models for science,” said Rick Stevens, Argonne associate laboratory director for Computing, Environment and Life Sciences. “With the AuroraGPT project, for example, we are building a science-oriented foundation model that can distill knowledge across many domains, from biology to chemistry. One of the goals with Aurora is to enable researchers to create new AI tools that help them make progress as fast as they can think — not just as fast as their computations.” 

Some of the first research projects using Aurora are detailed simulations of intricate systems, such as the human circulatory system, nuclear reactors, and supernova explosions. The machine's enormous performance also makes it instrumental in processing data from major research facilities, such as Argonne's Advanced Photon Source (APS) and CERN's Large Hadron Collider.

“The projects running on Aurora represent some of the most ambitious and innovative science happening today,” said Katherine Riley, ALCF director of science. “From modeling extremely complex physical systems to processing huge amounts of data, Aurora will accelerate discoveries that deepen our understanding of the world around us.”

On the hardware side, Aurora clearly impresses. The supercomputer comprises 166 racks, each holding 64 blades, for a total of 10,624 blades. Each blade contains two Xeon Max processors, each with 64 GB of onboard HBM2E memory, and six Intel Data Center GPU Max ‘Ponte Vecchio’ GPUs, all cooled by a specialized liquid-cooling system.

In total, Aurora has 21,248 CPUs with over 1.1 million high-performance x86 cores, 19.9 PB of DDR5 memory, and 1.36 PB of HBM2E memory attached to the CPUs. It also features 63,744 GPUs optimized for AI and HPC, equipped with a combined 8.16 PB of HBM2E memory. For storage, Aurora uses 1,024 nodes with solid-state drives, offering 220 PB of total capacity and 31 TB/s of bandwidth. The machine relies on HPE's Shasta supercomputer architecture with Slingshot interconnects.
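Those system-wide totals follow directly from the per-blade figures. The minimal Python sketch below multiplies them out; note that the 52 cores per Xeon Max CPU and 128 GB of HBM2E per ‘Ponte Vecchio’ GPU are assumptions drawn from Intel's published part specifications, not figures from Argonne's announcement.

```python
# Sanity-check Aurora's published totals from the per-blade figures.
RACKS = 166
BLADES_PER_RACK = 64
CPUS_PER_BLADE = 2
GPUS_PER_BLADE = 6
HBM_PER_CPU_GB = 64    # stated in the article
CORES_PER_CPU = 52     # assumption: Xeon Max 9470
HBM_PER_GPU_GB = 128   # assumption: Data Center GPU Max 1550

blades = RACKS * BLADES_PER_RACK          # 10,624 blades
cpus = blades * CPUS_PER_BLADE            # 21,248 CPUs
gpus = blades * GPUS_PER_BLADE            # 63,744 GPUs
cores = cpus * CORES_PER_CPU              # ~1.1 million x86 cores
cpu_hbm_pb = cpus * HBM_PER_CPU_GB / 1e6  # ~1.36 PB of CPU-attached HBM2E
gpu_hbm_pb = gpus * HBM_PER_GPU_GB / 1e6  # ~8.16 PB of GPU HBM2E

print(f"blades: {blades:,}  CPUs: {cpus:,}  GPUs: {gpus:,}")
print(f"x86 cores: {cores:,}")
print(f"CPU HBM2E: {cpu_hbm_pb:.2f} PB  GPU HBM2E: {gpu_hbm_pb:.2f} PB")
```

Under those assumptions, the printed totals match the figures above: 10,624 blades, 21,248 CPUs with roughly 1.1 million cores, 63,744 GPUs, about 1.36 PB of CPU-attached HBM2E, and about 8.16 PB of GPU memory.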


