As published in HPCwire.

HPC is all about scalability. The most powerful systems. The biggest data sets. The most cores, the most bytes, the most flops, the most bandwidth. HPC scales!

Notwithstanding a few recurring arguments over the last twenty years about scaling up versus scaling out, the definition of scalability hasn’t changed much. Give us more. But in an era of new technological choices driven by a medley of megatrends, including AI, analytics, and cloud, new shades of scalability have emerged, prompting the question: More of what?

Mixed workloads are now at the forefront of computing. Traditional workloads haven’t gone anywhere; they’ve been joined by new ones. Within HPC, that means technical computing in a wide array of domains. HPC is a long-term stable market because these applications keep scaling. They have to, because we never reach a point where science is “solved.” With more powerful systems, scientists and engineers will refine their meshes or add degrees of freedom or otherwise build increased complexity and fidelity into their models, enabling new generations of innovation and discovery. We won’t see the end of scientific inquiry in five years, or in five hundred. Technical computing will march on.

Simultaneously, enterprise computing remains stable as well, though these applications also evolve. ERP, CRM, BI, database: it’s hard to imagine any of these going away entirely, though organizations may change how they are run. In fact, it is the revolution underway in enterprise computing that points to HPC-style scalability, as analytics and machine learning become the underpinnings of digital transformation for the high-end enterprise.

Neither analytics nor machine learning is an entirely new science, but each has gone through a tremendous boom over the past decade. Over 60 percent of HPC users are already running machine learning applications within the same environments as their traditional workloads, according to Intersect360 Research studies, and other analysts have pointed to the trend toward AI in the broader, non-HPC enterprise, in applications spanning logistics, customer service, and process optimization. The resulting state of the datacenter is that organizations are supporting multiple workloads—the traditional and the new—as part of mixed-purpose infrastructures, out of budgets that haven’t increased commensurately. Incorporating analytics and machine learning capabilities into mixed-workload environments is the first pillar of the new scalability.

Technology has evolved to meet these new requirements. Where “industry-standard” clusters were once viewed as fungible, computing-by-the-pound investments, the pendulum has now swung back hard to an era of specialization, with diverse processing elements, networking fabrics, and storage architectures forming heterogeneous environments.

Chief among these technological evolutions has been the broad adoption of GPUs as computational accelerators. With their strong floating-point capabilities, GPUs—primarily those provided by Nvidia—made significant headway into HPC. Five years ago, an Intersect360 Research custom study found that the top 10 applications in HPC, and 35 of the top 50, all offered some form of GPU acceleration. Today, 30 percent of HPC users have GPUs in “broad adoption,” according to a 2020 survey, with an additional 49 percent at “some adoption.” GPUs have also gotten a dramatic boost from the AI wave, as deep neural networks (DNNs) and inference engines have proven well-suited to the architecture.

Processing diversity goes well beyond Nvidia GPUs. Even limiting the discussion to x86-architecture CPUs, AMD Epyc processors are now in a pitched competition with Intel Xeon Scalable CPUs. Meanwhile, ARM-based CPUs are also getting a significant look. Not only are ARM processors at the heart of Fugaku, the world’s most powerful computer according to the semi-annual TOP500 list, but they also recently exceeded IBM Power processors in surveyed HPC usage. And the accelerator game is even more crowded. AMD has its own Radeon GPUs, soon to be integrated with Epyc over AMD Infinity Fabric, while Intel has its forthcoming Xe-HPC GPU. These AMD and Intel offerings will power the first U.S. exascale systems later this year. FPGAs are also growing in deployment (over 20 percent of HPC users, in the most recent Intersect360 Research survey), as are various specialty and custom processors, such as the wafer-scale Cerebras chip.

Survey Data: Adoption of Accelerators in HPC Environments (Intersect360 Research, 2021)

If the processing arena is the most diverse technology space right now, it is far from the only one driving difficult choices for organizations shopping for HPC, analytics, and AI solutions. The incorporation of flash storage in all its forms—burst buffers, all-flash arrays, on-node NVMe, and persistent memory—has created new storage tiers from local to archive, with custom storage software solutions to match any architecture. Networking fabrics, too, offer a range of options, from high-bandwidth interconnects with embedded processing elements to composable, software-defined clouds. The ability to incorporate specialty components at full performance is another modern dimension of scalability.

And speaking of clouds, the ability to run any workload “in the cloud” is an inevitable component of any scalability discussion. The promise of cloud computing to offer “unlimited scalability” on an elastic, utility basis has been one of the primary value propositions of public cloud vendors for over a decade. Nevertheless, most HPC users keep the majority of their work on-premises. While two-thirds of HPC users make at least some use of public cloud, it’s usually for a minority of overall workloads. (See chart.) Hybrid clouds are the norm, and managing workloads and data across both the datacenter and the cloud—possibly multiple datacenters and multiple clouds—is another challenge for today’s IT environments. Add to this complexity the notion that data is often generated at the edge, with computing pushed out to meet the data, and edge-to-core-to-cloud considerations form another major touchpoint for the new scalability.

Survey Data: Percent of HPC Workload in Public Cloud (Intersect360 Research, 2021)

Multiple workloads, multiple technologies, edge-to-core-to-cloud. These are merely the starting points for the compounded factors that they entail. Which vendors? Which standards? Which middleware environments? At a practical level, these discussions involve decisions such as managing storage domain namespaces and availability zones, choosing programming models and migration tools, load balancing composable fabrics for optimal throughput, negotiating data egress fees in long-term service level agreements, and walking the tightrope between new capabilities and proven solutions.

The amazing thing is that these considerations now affect everyone, from HPC to enterprise, from entry-level clusters to the world’s largest supercomputer, all attempting to match technology components to workloads, both on-premises and off. In this era of specialization, the solutions will be as diverse as the questions that can be asked. And when it comes to the perennial question, “Does it scale?”, the answer won’t be as obvious as just adding more.

Scaling across workloads, across technology elements, across protocols and standards and domains, up to the cloud, out to the edge, and back to the core, now and into the future. This is the new scalability. As always, there are tremendous benefits to getting it right, with incredible innovations and insights right around the corner. But in this new era of specialization, scalability isn’t just for the biggest anymore.
