HPC Innovations Blaze the Zettascale Energy Trail

Rob Johnson
6 min read · Jul 19, 2022

As hardware advances continue at a rapid pace, many HPC experts and much of the industry have their eyes set on zettascale. Getting there, though, is a journey with challenging obstacles. As in the past, next-generation HPC will require faster processors, accelerators, and more to eliminate data-transport bottlenecks. One of zettascale’s biggest challenges, however, is accommodating its electrical power requirements. Some exascale systems built with today’s technologies can devour as much electricity as a small town, and estimates suggest zettascale systems could require power in the multi-gigawatt range (a rough back-of-the-envelope estimate follows the list below). That means the power problem requires multiple vectors of attack:

· Power requirements: During the May 2022 International Supercomputing Conference, presenter Jeff McVeigh of Intel noted that data centers could consume between 3 and 7% of global energy production by 2030 [1], with computing infrastructure being a top driver of new electricity use. Other estimates are far more aggressive, suggesting that data centers could devour a staggering 20% of global electricity by 2025 [2]. Accelerating workloads in an environmentally friendly and energy-efficient way will take innovation.

· Minimizing data movement: We’ve witnessed dramatic improvements in memory density and other storage media over the years. However, HPC + AI scenarios need other ways to minimize data movement for greater electrical efficiency. During his ISSCC presentation, Dr. Frank Hady of Intel noted that a huge opportunity exists to save system energy by keeping the distance data travels between memory and processing as short as possible.

· Cooling techniques: As HPC + AI systems increase in size and density, their cooling systems must work harder to keep system components within acceptable operating temperature ranges. The industry needs to adopt new approaches to thermal management.
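
To see why the multi-gigawatt figure above is plausible, here is the promised back-of-the-envelope sketch. The numbers are illustrative assumptions rather than projections: the efficiency is in the neighborhood of today’s most efficient exascale-class systems (roughly 50 gigaflops per watt), and the 50 MW target is simply a stand-in for a large but realistic facility power budget.

```c
/* Back-of-the-envelope sketch (illustrative assumptions, not a projection):
 * how much power would a zettascale machine draw at today's efficiency? */
#include <stdio.h>

int main(void) {
    double zettaflops = 1e21;           /* 1 zettaFLOP/s target */
    double gflops_per_watt = 50.0;      /* assumed exascale-era efficiency */
    double flops_per_watt = gflops_per_watt * 1e9;

    double watts = zettaflops / flops_per_watt;
    printf("Estimated power at today's efficiency: %.0f gigawatts\n", watts / 1e9);

    /* To fit in a 50 MW facility budget, efficiency must improve by roughly: */
    double target_watts = 50e6;
    printf("Required efficiency gain: about %.0fx\n", watts / target_watts);
    return 0;
}
```

Even with generous rounding, that works out to a couple of orders of magnitude of efficiency improvement, which is why every vector above matters.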

Reducing power requirements and data movement

As we strive toward zettascale, improving HPC systems’ performance-per-watt ratio is necessary. Those designing HPC components and architectures must figure out ways to reduce or eliminate data movement by keeping the data and processors closer together. Today, many different approaches help to address that problem.

Semiconductor manufacturers AMD, Intel, and Nvidia are turning to AI to assist their teams with processor design. With the right experimental data, HPC and AI can model chips, determine which designs are worth exploring further, and minimize the number of physical prototypes needed. AI can also help make processors and other components more energy efficient. Intel, for its part, has explored ways to improve packaging technologies to increase throughput between dies within a package and their off-chip connections. One result of that effort is hybrid bonding (Intel’s Foveros Direct), a copper-to-copper fusion technique that offers significant performance improvements over traditional soldering. By fusing “bumps” that measure less than ten microns each, bonded copper enables an order-of-magnitude greater interconnect density for 3D stacking. Tighter spacing between bumps, with no solder required, helps move information more rapidly with less energy.
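
A quick sketch shows where that order-of-magnitude density claim comes from. The pitch values below are assumptions for illustration (a conventional solder microbump pitch in the mid-30-micron range versus the sub-10-micron bumps mentioned above); because connection density scales with the square of the pitch, a roughly 3.6x linear shrink yields about 13x more connections per square millimeter.

```c
/* Illustrative sketch: interconnect density vs. bump pitch.
 * Both pitch values are assumptions chosen for illustration. */
#include <stdio.h>

static double bumps_per_mm2(double pitch_um) {
    double bumps_per_mm = 1000.0 / pitch_um;   /* bumps along one millimeter */
    return bumps_per_mm * bumps_per_mm;        /* density scales with pitch squared */
}

int main(void) {
    double solder_pitch_um = 36.0;   /* assumed conventional microbump pitch */
    double hybrid_pitch_um = 10.0;   /* sub-10-micron copper-to-copper bonding */

    double solder = bumps_per_mm2(solder_pitch_um);
    double hybrid = bumps_per_mm2(hybrid_pitch_um);

    printf("solder microbumps: ~%.0f connections/mm^2\n", solder);
    printf("hybrid bonding:    ~%.0f connections/mm^2 (about %.0fx denser)\n",
           hybrid, hybrid / solder);
    return 0;
}
```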

Another physical approach to the power challenge is increasing I/O efficiency with High Bandwidth Memory (HBM). HBM is a stacked memory technology for CPUs and GPUs that moves data quickly while using less power, delivering higher bandwidth in a smaller form factor at speeds approaching those of on-chip RAM. The HBM3 standard, published by JEDEC in January 2022, promises up to 819 GB/s of bandwidth per device and allows die densities of up to 32 Gb in stacks up to 16 high, for 64 GB of capacity per device.
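
Those headline numbers follow directly from the published building blocks of the specification, namely a 1024-bit interface running at 6.4 Gb/s per pin and up to sixteen 32 Gb dies per stack. A quick sketch of the arithmetic:

```c
/* Where the HBM3 headline figures come from (per the published spec numbers). */
#include <stdio.h>

int main(void) {
    /* Bandwidth: a 1024-bit interface at 6.4 Gb/s per pin */
    double pins = 1024.0;
    double gbits_per_pin = 6.4;
    double bandwidth_gb_per_s = pins * gbits_per_pin / 8.0;
    printf("Per-device bandwidth: %.1f GB/s\n", bandwidth_gb_per_s);   /* 819.2 */

    /* Capacity: 32 Gb dies in a 16-high stack */
    double die_gbits = 32.0;
    int stack_height = 16;
    double capacity_gb = die_gbits * stack_height / 8.0;
    printf("Max device capacity:  %.0f GB\n", capacity_gb);            /* 64 */
    return 0;
}
```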

Other solutions can also help bring computational tasks closer to data. For example, some vendors offer AI acceleration on-chip. Others make the most of persistent memory and the cache hierarchy through spatial and temporal caching: if a workload uses a particular data variable or instruction set often, it can be assigned to the memory blocks nearest the processor.
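
Here is a minimal, vendor-neutral sketch of why that locality matters. The two loops below do identical arithmetic, but the first walks memory contiguously and stays in cache, while the second strides a full row (32 KB here) between accesses and keeps fetching data from farther away.

```c
/* Minimal illustration of spatial locality: identical work, different loop
 * order. The contiguous traversal stays in cache; the strided one does not. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096

int main(void) {
    double *a = malloc((size_t)N * N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < (size_t)N * N; ++i) a[i] = 1.0;

    double sum1 = 0.0, sum2 = 0.0;

    clock_t t0 = clock();
    for (int i = 0; i < N; ++i)              /* row-major: contiguous accesses */
        for (int j = 0; j < N; ++j)
            sum1 += a[(size_t)i * N + j];
    clock_t t1 = clock();

    for (int j = 0; j < N; ++j)              /* column-major: 32 KB stride per access */
        for (int i = 0; i < N; ++i)
            sum2 += a[(size_t)i * N + j];
    clock_t t2 = clock();

    printf("sums: %.0f %.0f\n", sum1, sum2);
    printf("contiguous: %.3f s  strided: %.3f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}
```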

Near-Memory Computing (NMC) also offers beneficial ways to make the most of available resources. As the name suggests, NMC minimizes the distance data needs to travel between processors and system memory. Since every nanosecond and nanowatt counts in HPC, even small improvements add up to significant savings. NMC is a decades-old concept, but it has taken years for the technology, and the practicality of implementing it, to merit serious consideration by engineers. Beyond the hardware evolution NMC requires, there is a software consideration, too: if a processor and an accelerator need to work together to divvy up the workload, the resulting heterogeneous computing scenario can present challenges for a developer.
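
As a deliberately simplified sketch of that heterogeneous split (not any particular vendor’s approach), here is one way a developer might divide a loop between an accelerator and the host using OpenMP target offload. Note how the map clause spells out exactly which data must travel to the device and back; that explicit movement is the cost near-memory approaches try to shrink.

```c
/* Simplified heterogeneous-computing sketch: half the work is offloaded to an
 * accelerator (if one is present), the rest runs on the host CPU. Compiled
 * without an offloading OpenMP compiler, the pragmas are ignored and the
 * whole loop simply runs on the host. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t n = 1u << 24;
    double *x = malloc(n * sizeof *x);
    if (!x) return 1;
    for (size_t i = 0; i < n; ++i) x[i] = (double)i;

    size_t split = n / 2;   /* deciding where to split is now the developer's problem */

    /* Accelerator half: the map clause makes the data movement explicit. */
    #pragma omp target teams distribute parallel for map(tofrom: x[0:split])
    for (size_t i = 0; i < split; ++i)
        x[i] = x[i] * 2.0 + 1.0;

    /* Host half: regular CPU threads handle the remainder. */
    #pragma omp parallel for
    for (size_t i = split; i < n; ++i)
        x[i] = x[i] * 2.0 + 1.0;

    printf("x[1] = %.1f, x[n-1] = %.1f\n", x[1], x[n - 1]);
    free(x);
    return 0;
}
```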

Interconnects are getting better as well. Compute Express Link (CXL) is an open, standards-based specification, initially developed by Intel over four years and built upon standard PCI Express Gen 5 technology. CXL helps overcome the challenge of memory coherency between processors, GPUs, and other accelerators by facilitating resource sharing for improved system performance. Since 2019, many major technology companies like Microsoft, Google, Facebook, and Cisco have signed on as consortium supporters.
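
As a hedged illustration of what this can look like from software: on Linux, CXL-attached memory expanders commonly surface as a CPU-less NUMA node, so ordinary NUMA tooling can place data on them. The node number below is purely an assumption for illustration; on a real system you would check the topology first (for example with numactl -H).

```c
/* Hedged sketch: allocating a buffer on the NUMA node where a CXL memory
 * expander is assumed to appear (node 1 here is an illustrative guess).
 * Link with -lnuma. */
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    int cxl_node = 1;            /* assumed node id of the CXL-attached memory */
    size_t bytes = 1ull << 30;   /* 1 GiB */

    double *buf = numa_alloc_onnode(bytes, cxl_node);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", cxl_node);
        return 1;
    }

    buf[0] = 42.0;               /* first touch faults pages in on (or near) that node */
    printf("placed %zu bytes on node %d\n", bytes, cxl_node);

    numa_free(buf, bytes);
    return 0;
}
```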

Another new technology offers a parallel file system that speeds up the I/O process. The open-source Distributed Asynchronous Object Storage (DAOS) technology provides a distributed storage solution that taps next-generation non-volatile memory technologies, such as Intel Optane persistent memory and Non-Volatile Memory Express (NVMe) SSDs, to increase performance in distributed environments.

While these approaches make important strides, they do not entirely solve the underlying challenge of zettascale power requirements. Looking further ahead, though, the rise of photonics for data transfer shows exceptional promise. Using light rather than electrical signals to transmit data can provide a major performance boost while cutting power usage by a large margin. The idea of photonics is not new, but its data-transfer and energy-saving benefits are generating excitement about practical applications. For example, Ayar Labs’ Optical I/O chiplet offers up to 1000x the bandwidth density at 1/10th the power of electrical I/O [3]. And Intel recently announced a significant research advancement for speeding communication between compute silicon in data centers and across networks, including a demonstration of a laser array fully integrated on a silicon wafer [4].
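
The power half of that claim is easy to sanity-check with rough numbers. The aggregate bandwidth and the electrical baseline of 5 picojoules per bit below are assumptions chosen purely for illustration; the factor of ten is the vendor’s claim.

```c
/* Illustrative sketch of what "1/10th the power" means at scale.
 * Bandwidth and the electrical energy-per-bit baseline are assumed values. */
#include <stdio.h>

int main(void) {
    double io_terabits_per_sec = 100.0;       /* assumed aggregate off-package I/O */
    double electrical_pj_per_bit = 5.0;       /* illustrative electrical baseline */
    double optical_pj_per_bit = electrical_pj_per_bit / 10.0;   /* the 10x claim */

    /* power (W) = bits/s * joules/bit; Tb/s times pJ/bit cancels neatly to watts */
    double electrical_watts = io_terabits_per_sec * electrical_pj_per_bit;
    double optical_watts    = io_terabits_per_sec * optical_pj_per_bit;

    printf("electrical I/O: %.0f W\n", electrical_watts);   /* 500 W */
    printf("optical I/O:    %.0f W\n", optical_watts);      /* 50 W  */
    return 0;
}
```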

Better cooling systems

Another HPC energy-usage factor is system cooling. One way to reduce the energy consumed by cooling is to choose an ideal place to host the servers. For example, HPCaaS instances hosted by Advania Data Centers benefit from the company locating its HPC infrastructure in Iceland. First, the average temperature in Iceland is around five degrees Celsius, so local temperatures reduce the amount of air conditioning needed. Second, the country’s abundant hydroelectric and geothermal sources make plentiful green power available at a lower cost.

Microsoft has also tried a unique approach to the data center cooling challenge. The company recently shared the story of its underwater data center experiments, which proved quite successful. Because the environment-optimized data center remained sealed in a container under the ocean, free of vibration, server components lasted longer. Plus, the surrounding water addressed the heat-dissipation challenge at very little cost. Intel is investigating fully submersible data center components, in which liquid dissipates heat directly, and has introduced the first open intellectual property (open IP) immersion liquid cooling solution and reference design. Intel also announced plans to invest more than $700 million in a 200,000-square-foot lab in Oregon focused on capabilities related to heating, cooling, and water usage [5].

What’s next?

We’re at an exciting time in the history of HPC. As innovators find ways to coax greater speed and power efficiency out of HPC systems, researchers gain increasingly powerful tools for scientific breakthroughs. Considering the incredible discoveries that exascale-level HPC systems already make possible, it’s thrilling to imagine the scientific leaps to come when new technologies make zettascale HPC a reality.

###

This article was produced as part of Intel’s HPC editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC community through advanced technology. The publisher of the content has final editing rights and determines what articles are published.

Rob Johnson spent much of his professional career consulting for a Fortune 25 technology company. Currently, Rob owns Fine Tuning, LLC, a strategic marketing and communications consulting company based in Portland, Oregon. As a technology, audio, and gadget enthusiast his entire life, Rob also writes for TONEAudio Magazine, reviewing high-end home audio equipment.

[1] https://www.businesswire.com/news/home/20220531005771/en/Intel-Editorial-Accelerated-Innovations-for-Sustainable-Open-HPC

[2] https://www.theguardian.com/environment/2017/dec/11/tsunami-of-data-could-consume-fifth-global-electricity-by-2025

[3] https://ayarlabs.com/

[4] https://www.intel.com/content/www/us/en/newsroom/news/intel-labs-announces-integrated-photonics-research-advancement.html

[5] https://www.intel.com/content/www/us/en/newsroom/news/key-investments-advance-data-center-sustainability.html
