Issues Facing the HPC-AI Industry: Insights from the Advisory Committees of the HPC-AI Leadership Organization (HALO)

October 21, 2024 | Addison Snell, Kevin Jackson, Paul Muzio, Steve Conway

EXECUTIVE SUMMARY

The HPC-AI Leadership Organization (HALO) aims to advance scalable computing in both High Performance Computing (HPC) and Artificial Intelligence (AI). Through interviews with members of its Advisory Committees, Intersect360 Research has identified key challenges and opportunities in the HPC-AI landscape.

The design of optimal HPC-AI infrastructure presents complex challenges – including the choice between homogeneous and heterogeneous systems, the integration of diverse processor types, and the balance between AI and traditional HPC needs. These decisions affect user productivity, the reproducibility of results, and adaptability to technological advances.

There is a critical shortage of skilled personnel in computational sciences and HPC-AI system management. Regional disparities also exist, with Asia/Pacific holding an advantage owing to its emphasis on educational programs. Higher compensation in the commercial sector further complicates talent retention in the public and academic sectors.

Portability and ease of use remain significant concerns, as specialization in computational elements often reduces portability. Porting applications across different systems or upgrading software versions requires substantial time and resources, and this work is often viewed as sideways movement rather than advancement.

Accuracy and reproducibility of results are crucial, especially in fields like research, medicine, and engineering. The increasing diversity of chip technology complicates result consistency and verification across different systems.

The market also faces issues of processor suitability, chip supply, and design. Different applications require different processor types, which complicates system design and procurement. High demand for AI-optimized GPUs is influencing market dynamics and potentially skewing HPC system designs.

AI and Large Language Model (LLM) training and use face hurdles including data availability, ownership issues, legal restrictions, and cultural implications. Developing efficient training methods, managing data transfers, and validating results are all ongoing concerns.

System software stacks present three major issues: the impact of “HPC Nationalism” on knowledge exchange, difficulties in integrating AI and traditional HPC support, and the need for improved schedulers and file systems to meet evolving HPC-AI needs.

Sustainability and power consumption are growing concerns. Increasing energy demands may necessitate infrastructure upgrades and could reshape HPC-AI management strategies.

HALO aims to address these challenges through cross-industry collaboration, guiding technology development, and fostering innovation in the HPC-AI field. The organization’s structure – divided into three geographical areas – allows for targeted approaches to regional needs and challenges.

TABLE OF CONTENTS

EXECUTIVE SUMMARY

INTRODUCTION

The HPC-AI Leadership Organization (HALO)

Interviews with HALO Advisory Committee Members

ISSUES FACING THE HPC-AI INDUSTRY

Designing Optimal HPC-AI Infrastructure

Supporting Insights from Intersect360 Research Studies

Figure 1: Accelerators Configured per Node in HPC-AI Systems

Figure 2: HPC-AI Performance Relative to Expectations

Figure 3: Average HPC-AI System Utilization, by Sector and Budget

Human Resources

Portability and Ease of Use

Accuracy and Reproducibility of Results

The Processor Market: Suitability to HPC, Chip Supply, Design Issues

Training and Use of AI/LLM

Supporting Insights from Intersect360 Research Studies

Figure 4: HPC User Engagement with Generative AI

Figure 5: LLM Adoption Among HPC Users

System Software Stacks

Supporting Insights from Intersect360 Research Studies

Figure 6: Programming Languages in Use for HPC-AI

Sustainability

CONCLUSIONS

