What Is High Performance Computing (HPC)?

February 3, 2022

HPC Defined

High Performance Computing (HPC) refers to the practice of aggregating computing power in a way that delivers much higher horsepower than traditional computers and servers. HPC, or supercomputing, is like everyday computing, only more powerful. It is a way of processing huge volumes of data at very high speeds using multiple computers and storage devices as a cohesive fabric. HPC makes it possible to explore and find answers to some of the world’s biggest problems in science, engineering, and business.

Today, HPC is used to solve complex, performance-intensive problems—and organizations are increasingly moving HPC workloads to the cloud. HPC in the cloud is changing the economics of product development and research because it requires fewer prototypes, accelerates testing, and decreases time to market.

How Does HPC Work?

Some workloads, such as DNA sequencing, are simply too immense for any single computer to process. HPC or supercomputing environments address these large and complex challenges with individual nodes (computers) working together in a cluster (connected group) to perform massive amounts of computing in a short period of time. Creating and removing these clusters is often automated in the cloud to reduce costs.
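
To make the idea concrete, here is a minimal sketch of how work is spread across the nodes of a cluster. It is written in Python with the mpi4py package (an assumption; any MPI binding follows the same pattern) and launched across nodes by an MPI runner such as mpirun, typically under a scheduler like Slurm. Each process, or rank, lands on a core of some node and reports where it is running; real workloads divide their data across ranks in exactly this fashion.

    # hello_cluster.py: a sketch, assuming mpi4py is installed and the job is
    # launched across the cluster with, for example: mpirun -n 8 python hello_cluster.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()           # this process's ID within the cluster-wide job
    size = comm.Get_size()           # total number of cooperating processes
    node = MPI.Get_processor_name()  # hostname of the node this rank landed on

    print(f"Rank {rank} of {size} running on node {node}")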

HPC can run many kinds of workloads, but the two most common are embarrassingly parallel workloads and tightly coupled workloads.

Embarrassingly Parallel Workloads

These are computational problems divided into small, simple, and independent tasks that can be run at the same time, often with little or no communication between them. For example, a company might submit 100 million credit card records to individual processor cores in a cluster of nodes. Processing one credit card record is a small task, and when 100 million records are spread across the cluster, those small tasks can be performed at the same time (in parallel) at astonishing speeds. Common use cases include risk simulations, molecular modeling, contextual search, and logistics simulations.
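
The pattern is easy to see in code. Below is a minimal single-machine sketch in Python using the standard multiprocessing module; the record layout and the score_record rule are hypothetical stand-ins, and on a real cluster the same pattern is scaled out across many nodes by a scheduler rather than one process pool.

    # embarrassingly_parallel.py: a sketch; score_record and the record layout are
    # illustrative assumptions, not a real fraud-detection model.
    from multiprocessing import Pool

    def score_record(record):
        """Score one credit card record; no task depends on any other."""
        card_id, amount = record
        return card_id, amount > 10_000   # placeholder rule: flag large transactions

    if __name__ == "__main__":
        # Stand-in for millions of records; each tuple is (card_id, transaction_amount).
        records = [(i, (i * 37) % 20_000) for i in range(1_000_000)]

        # Workers process records independently and in parallel, with no
        # communication between tasks, which is what makes the workload
        # "embarrassingly parallel."
        with Pool() as pool:
            results = pool.map(score_record, records, chunksize=10_000)

        flagged = sum(1 for _, is_suspicious in results if is_suspicious)
        print(f"Flagged {flagged} of {len(results)} records")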

Tightly Coupled Workloads

These typically take a large, shared workload and break it into smaller tasks that communicate continuously. In other words, the different nodes in the cluster communicate with one another as they perform their processing. Common use cases include computational fluid dynamics, weather forecast modeling, material simulations, automobile collision emulations, geospatial simulations, and traffic management.
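
The sketch below illustrates that communication pattern with a toy one-dimensional heat-diffusion loop in Python, assuming the mpi4py and numpy packages and an MPI launcher (for example, mpirun -n 4 python tightly_coupled.py). Each rank owns a slice of the problem and must exchange its boundary ("halo") values with its neighbors on every iteration; that continuous node-to-node exchange is what makes the workload tightly coupled.

    # tightly_coupled.py: an illustrative sketch, not a production solver.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank owns a slice of the rod, plus one ghost cell on each side.
    n_local = 100
    u = np.zeros(n_local + 2)
    if rank == 0:
        u[0] = 100.0   # fixed hot boundary at the left end of the rod

    for step in range(500):
        # Continuous communication: swap boundary values with neighboring ranks.
        if rank < size - 1:
            u[-1] = comm.sendrecv(u[-2], dest=rank + 1, source=rank + 1)
        if rank > 0:
            u[0] = comm.sendrecv(u[1], dest=rank - 1, source=rank - 1)
        # Local update: explicit diffusion step on this rank's interior cells.
        u[1:-1] += 0.1 * (u[:-2] - 2.0 * u[1:-1] + u[2:])

    # Combine a global result across all ranks (another collective communication).
    hottest = comm.allreduce(u[1:-1].max(), op=MPI.MAX)
    if rank == 0:
        print(f"Hottest interior temperature across the cluster: {hottest:.2f}")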

Why Is HPC Important?

HPC has been a critical part of academic research and industry innovation for decades. HPC helps engineers, data scientists, designers, and other researchers solve large, complex problems in far less time and at less cost than traditional computing.

The primary benefits of HPC are:

  • Reduced physical testing: HPC can be used to create simulations, eliminating the need for physical tests. For example, when studying automobile crashes, it is much easier and less expensive to generate a simulation than to perform a physical crash test.
  • Speed: With the latest CPUs, graphics processing units (GPUs), and low-latency networking fabrics such as remote direct memory access (RDMA), coupled with all-flash local and block storage devices, HPC can perform massive calculations in minutes instead of weeks or months.
  • Cost: Faster answers mean less wasted time and money. Additionally, with cloud-based HPC, even small businesses and startups can afford to run HPC workloads, paying only for what they use and scaling up and down as needed.
  • Innovation: HPC drives innovation across nearly every industry—it's the force behind groundbreaking scientific discoveries that improve the quality of life for people around the world.

HPC Use Cases—Which Industries Use High Performance Computing?

Fortune 1000 companies in nearly every industry employ HPC, and its popularity is growing. According to Hyperion Research, the global HPC market is expected to reach US$44 billion by 2022.

The following are some of the industries using HPC and the types of workloads HPC is helping them perform:

  • Aerospace: Creating complex simulations, such as airflow over the wings of planes
  • Manufacturing: Executing simulations, such as those for autonomous driving, to support the design, manufacture, and testing of new products, resulting in safer cars, lighter parts, more-efficient processes, and innovations
  • Financial technology (fintech): Performing complex risk analyses, high-frequency trading, financial modeling, and fraud detection
  • Genomics: Sequencing DNA, analyzing drug interactions, and running protein analyses to support ancestry studies
  • Healthcare: Researching drugs, creating vaccines, and developing innovative treatments for rare and common diseases
  • Media and entertainment: Creating animations, rendering special effects for movies, transcoding huge media files, and creating immersive entertainment
  • Oil and gas: Performing spatial analyses and testing reservoir models to predict where oil and gas resources are located, and conducting simulations such as fluid flow and seismic processing
  • Retail: Analyzing massive amounts of customer data to provide more-targeted product recommendations and better customer service

Where Is HPC Performed?

HPC can be performed on-premises, in the cloud, or in a hybrid model that involves some of each.

In an on-premises HPC deployment, a business or research institution builds an HPC cluster full of servers, storage solutions, and other infrastructure that they manage and upgrade over time. In a cloud HPC deployment, a cloud service provider administers and manages the infrastructure, and organizations use it on a pay-as-you-go model.

Some organizations use hybrid deployments, especially those that have invested in an on-premises infrastructure but also want to take advantage of the speed, flexibility, and cost savings of the cloud. They can use the cloud to run some HPC workloads on an ongoing basis, and turn to cloud services on an ad hoc basis, whenever queue time becomes an issue on-premises.

What Are the Challenges of On-Premises HPC Deployments?

Organizations with on-premises HPC environments gain a great deal of control over their operations, but they must contend with several challenges, including:

  • Investing significant capital for computing equipment, which must be continually upgraded
  • Paying for ongoing management and other operational costs
  • Suffering a delay, or queue time, of days to months before users can run their HPC workload, especially when demand surges
  • Postponing upgrades to more powerful and efficient computing equipment due to long purchasing cycles, which slows the pace of research and business

In part because of the costs and other challenges of on-premises environments, cloud-based HPC deployments are becoming more popular, with Market Research Future anticipating 21% worldwide market growth from 2017 to 2023. When businesses run their HPC workloads in the cloud, they pay only for what they use and can quickly ramp up or down as their needs change.

To win and retain customers, top cloud providers maintain leading-edge technologies that are specifically architected for HPC workloads, so there is no danger of reduced performance as on-premises equipment ages. Cloud providers offer the newest and fastest CPUs and GPUs, as well as low-latency flash storage, lightning-fast RDMA networks, and enterprise-class security. The services are available all day, every day, with little or no queue time.

HPC Cloud—What Are the Key Considerations When Choosing a Cloud Environment?

Not all cloud providers are created equal. Some clouds are not designed for HPC and can’t provide optimal performance during peak periods when workloads are most demanding. The four traits to consider when selecting a cloud provider are:

  • Leading-edge performance: Your cloud provider should have and maintain the latest generation of processors, storage, and network technologies. Make sure they offer extensive capacity and top-end performance that meets or exceeds that of typical on-premises deployments.
  • Experience with HPC: The cloud provider you select should have deep experience running HPC workloads for a variety of clients. In addition, their cloud service should be architected to deliver optimal performance even during peak periods, such as when running multiple simulations or models. In many cases, bare metal compute instances deliver more consistent and powerful performance than virtual machines.
  • Flexibility to lift and shift: Your HPC workloads need to run the same in the cloud as they do on-premises. After you move workloads to the cloud “as is” in a lift-and-shift operation, the simulation you run next week must produce results consistent with the one you ran a decade ago. This is extremely important in industries where year-to-year comparisons must be made using the same data and computations. For example, the computations for aerodynamics, automobiles, and chemistry haven’t changed, so the results cannot change either.
  • No hidden costs: Cloud services are typically offered on a pay-as-you-go model, so make sure you understand exactly what you’ll be paying for each time you use the service. Many users are surprised by the cost of outbound data movement, or egress—you may know you have to pay per transaction and for data access requests, but egress costs are easily overlooked.

Getting the Results You Expect and Want

Generally, it’s best to look for bare metal cloud services, which offer more control and performance. Combined with RDMA cluster networking, bare metal HPC delivers results identical to those produced by similar hardware running on-premises.

What Is the Future of HPC?

Businesses and institutions across multiple industries are turning to HPC, driving growth that is expected to continue for many years to come. The global HPC market is expected to expand from US$31 billion in 2017 to US$50 billion in 2023. As cloud performance continues to improve and become even more reliable and powerful, much of that growth is expected to be in cloud-based HPC deployments that relieve businesses of the need to invest millions in data center infrastructure and related costs.

In the near future, expect to see big data and HPC converging, with the same large cluster of computers used to analyze big data and run simulations and other HPC workloads. As those two trends converge, the result will be more computing power and capacity for each, leading to even more groundbreaking research and innovation.