HPC-AI Technology Survey 2023: Systems, CPUs, and Accelerators

EXECUTIVE SUMMARY

Intersect360 Research surveyed the user community for High Performance Computing (HPC) and artificial intelligence (AI) on a wide range of technology issues. The complete study analyzes users’ current computing systems, processing elements, storage systems, networks, operating environments, cloud computing usage, and selected forward-looking trends. Our goal in this analysis is to provide an overview of how HPC-AI systems are configured, including the breadth of technologies most commonly used. The survey audience included members of the worldwide HPC-AI user community spanning industry, government, and academia.

Reports available in the Intersect360 Research HPC-AI Technology Survey series cover the following segments:

  • Systems, CPUs, and Accelerators: including system vendors installed; current and planned installations of, and user preferences for, CPUs and accelerators; system utilization rates; and usage of liquid cooling.
  • Storage and Interconnect Technologies: including total active HPC data; storage configurations spanning on-node, attached storage arrays, and cloud storage; parallel file system usage; system interconnects and speeds; and composable infrastructure.
  • Operating Environments: including installations of operating systems, middleware packages, and developer tools.
  • Cloud Computing: including the current and planned proportions of HPC computing and storage in the public cloud, and the most frequently named cloud vendors.

This report provides a detailed examination of the systems and processing elements that comprise respondents’ HPC-AI infrastructure. We look at the top system vendors for HPC-AI, as well as the distribution of CPUs, GPUs, and other accelerators, and users’ forward-looking preferences for CPUs, GPUs, and the combinations thereof. We also look at trends in cooling for HPC-AI systems, including various approaches to liquid and immersion cooling.

Dell and HPE are still the most commonly named system providers for HPC-AI, but Supermicro and Nvidia are both growing quickly. Nvidia presents an interesting case. Nvidia does design systems and markets them as DGX, HGX, or DGX SuperPOD; however, in this case, we believe Nvidia’s survey share is bolstered by respondents who think of Nvidia as a system vendor, even when the system is integrated and delivered by a traditional server vendor partner. This dynamic highlights Nvidia’s current dominance in HPC-AI consumer mindspace.

The current state of the processing market for HPC-AI, with CPUs dominated by x86 and GPUs dominated by Nvidia, sets up a major pending shift that may soon be unavoidable. The results of this survey highlight instability in the current market dynamic. HPC-AI users are most comfortable with the status quo of x86 CPUs paired with Nvidia GPUs. That pairing may not remain competitive, or even stable, going forward, as the three primary processor vendors (AMD, Intel, and Nvidia) are each developing their own combined CPU-GPU architectures. Furthermore, each has a different path to success: Intel has a commanding historical lead in CPUs, Nvidia dominates in GPUs, and AMD has been first to market with all-AMD CPU-plus-GPU solutions. This survey examines these dynamics in detail.

The rapid rise of AI and the corresponding adoption of accelerators lead to increased computational density and related system-level issues of utilization and cooling. Among survey respondents who have accelerators as part of their HPC-AI environments, 19% say their accelerators are not highly utilized, 30% say the accelerators are not as highly utilized as expected, and 25% say their accelerated nodes are “compute islands” not easily accessed by other systems.

Increased computational density also brings challenges related to system power and cooling. Intersect360 Research surveys have found an increasing trend toward the incorporation of liquid cooling, in various form factors. 34% of respondents say they expect their usage of fully plumbed racks to increase.

Even measured against the backdrop of perpetual advancement, this is a time of unusual upheaval, with the rapid adoption of new technologies to accommodate machine learning and AI. 2024 will be a campaign year for AMD, Intel, and Nvidia for the future of HPC-AI computing, as well as for down-ticket candidates advertising themselves as worthy leaders. The dynamics of the processor wars will have upstream effects on how users configure and purchase systems. The development of integrated CPU-GPU options, and of the software environments surrounding them in complete systems and solutions, is one of the most important trends to watch in 2024 for the HPC-AI market.