High-performance computing (HPC) clusters anywhere [part 2]

Jon Thor Kristinsson

on 19 April 2022

Tags: HPC

In this blog, we look at where and how HPC clusters are currently being deployed, what these deployment methods enable, and who the major players in that space are.

This blog is part of a series that introduces you to the world of HPC:

What is High-performance computing? – Introduction to the concept of HPC
What is supercomputing? A short history of HPC and how it all started with supercomputers
High-performance computing cluster architectures – An overview of HPC cluster architecture
Open source in HPC – An overview of how open source has influenced and driven HPC
High-performance computing (HPC) technologies – what does the future hold?

HPC clusters

HPC clusters can now be deployed almost anywhere. It started with research-focused supercomputing clusters and moved on to dedicated HPC clusters. Thanks to continued improvements in technology, running HPC clusters in the cloud has grown hugely in popularity in recent years. Some organisations combine these options: they run a dedicated, local, cost-optimised cluster and burst into the cloud as needed, taking advantage of hybrid cloud methodologies for clustered computing. And with computing power, use cases, and needs ever growing, some are even turning to running their HPC clusters on the edge.

HPC in the cloud

Advancements in cloud computing, networking, and storage now make it possible to run HPC workloads in the cloud at scales rarely seen before. Many of the public cloud providers have specialised resources with deep foundations in the HPC solution space, available to organisations of all sizes. Cloud computing has been key in delivering HPC to organisations that need to burst or scale beyond what is reasonable with dedicated clusters. It also suits organisations that are just getting started with HPC and want small experimental clusters, but might not have the capacity to maintain the infrastructure investment required for a private cluster. Finally, the cloud provides resources for experimentation or testing that might not be available in an organisation’s dedicated cluster, for example GPUs, FPGAs, or other architectures that are in the early phase of adoption.

Amazon Web Services

AWS has been one of the key players in driving innovation in public cloud services for HPC. Its implementation of the AWS Nitro System was key to eliminating virtualisation overhead and enabling direct access to the underlying host hardware. This drove down latency and increased performance, both of which are vital to running HPC clusters and workloads in a public cloud. To meet the inter-node communication demands of HPC workloads, AWS introduced the Elastic Fabric Adapter (EFA), which reduces latency and increases performance for workloads that communicate across nodes and require a high-performance interconnect. To cover the storage needs of HPC users, Amazon added a specialised storage offering based on Lustre, called Amazon FSx for Lustre. Alongside that, AWS offers scheduling solutions such as AWS ParallelCluster and AWS Batch.

Microsoft Azure

Azure is a key player in driving HPC in the public cloud. It provides strong instance types that use traditional HPC technologies such as InfiniBand, which provides RDMA functionality for low latency and high performance. It also has instance types that reduce the number of cores exposed to the workload, catering to workloads that are primarily limited by memory bandwidth rather than by available cores. Azure even has an offering that delivers supercomputers as a service, its Cray solution, along with HPC-focused storage in the form of Cray ClusterStor.
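To see why exposing fewer cores can help a workload that is limited by memory bandwidth, consider the small sketch below. It is only an illustration: the 256 GB/s figure and the core counts are hypothetical and do not describe any real Azure instance type, and the helper name `bandwidth_per_core` is made up for this example. It also assumes that the node's memory bandwidth is shared evenly among the active cores.

```python
def bandwidth_per_core(node_bandwidth_gbs: float, active_cores: int) -> float:
    """Return the memory bandwidth (GB/s) available to each active core,
    assuming the node's bandwidth is shared evenly among them."""
    if active_cores <= 0:
        raise ValueError("active_cores must be positive")
    return node_bandwidth_gbs / active_cores


# Hypothetical node: 256 GB/s of memory bandwidth in total (assumed value).
NODE_BANDWIDTH_GBS = 256.0

# The total bandwidth stays the same, so every core that is removed
# leaves a larger share for the cores that remain.
for cores in (64, 32, 16):
    share = bandwidth_per_core(NODE_BANDWIDTH_GBS, cores)
    print(f"{cores:>2} cores exposed: {share:.1f} GB/s per core")
```

Under these assumptions, halving the number of exposed cores doubles the bandwidth available to each remaining core, which is the effect such instance types aim for.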

Google Cloud

Google Cloud Platform (GCP) offers pre-configured HPC VMs that are well documented for users. Google takes a very documentation-driven approach to enabling HPC workloads, with clear guides on everything from MPI workloads to HPC images, giving users clear and practical information on how to get the most out of GCP for HPC workloads.

Oracle Cloud Infrastructure

Oracle was an early player in enabling HPC in public clouds. It takes a bare-metal approach, offering bare-metal instance types with ultra-low-latency RDMA networking and delivering a solution close to what one might expect from a dedicated private HPC cluster.

Dedicated private HPC clusters

Dedicated private clusters remain a solid option in HPC for those looking to optimise for cost and control, or to meet very specific data ownership or security requirements. Plenty of solutions exist that give users cloud-like management capabilities for local on-premises resources. The main challenges with private HPC clusters are the high upfront investment and the expertise required. These can be mitigated by working with Canonical and its partners, which gives you access to expert knowledge and solutions that make adoption more feasible. While an upfront investment is still required, this approach could deliver savings in total cost.
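The trade-off between a high upfront investment and lower total cost can be made concrete with a simple break-even calculation. The sketch below is a deliberately simplified model: the function name `break_even_months` and all cost figures are hypothetical, and it ignores factors such as hardware depreciation, utilisation, and staffing changes.

```python
import math
from typing import Optional


def break_even_months(upfront_cost: float,
                      monthly_onprem_cost: float,
                      monthly_cloud_cost: float) -> Optional[int]:
    """Return the first whole month in which the cumulative cost of a private
    cluster is no higher than the cumulative cost of the cloud, or None if
    the private cluster never catches up."""
    monthly_saving = monthly_cloud_cost - monthly_onprem_cost
    if monthly_saving <= 0:
        # Running on-premises costs at least as much per month as the cloud,
        # so the upfront investment is never recovered.
        return None
    return math.ceil(upfront_cost / monthly_saving)


# Purely illustrative figures, not real prices.
months = break_even_months(upfront_cost=500_000,
                           monthly_onprem_cost=10_000,
                           monthly_cloud_cost=35_000)
print(f"Break-even after {months} months")
```

If the monthly running cost of the private cluster is not lower than the cloud cost, the function returns `None`, which reflects the case where a private cluster does not pay off on cost alone.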

Hybrid HPC

Hybrid usage of local and public cloud-based resources has been very popular in the HPC space, giving users the cost optimisation and control of on-premises resources along with the extreme scalability of public cloud-based clusters. By its nature, hybrid cloud usage carries the challenges associated with both public and private cloud clusters, while also bringing many of the potential benefits of both. In a way, it delivers a complementary solution where the negatives of one are mitigated by the positives of the other. The main additional challenge of such a setup is increased complexity, but overall it can bring greater resiliency. With solutions in both public and private cloud, Canonical can help you manage that complexity.
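The idea of bursting into the cloud can be sketched as a simple placement policy: fill the local cluster first and send whatever does not fit to the cloud. The example below is a minimal illustration only. The `Job` class, the `place_jobs` function, and the job names are all hypothetical, and real schedulers weigh many more factors, such as priorities, data locality, and cost.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    nodes: int  # number of nodes the job requests


def place_jobs(jobs: List[Job], local_free_nodes: int) -> Dict[str, str]:
    """Greedy placement: run each job on the local cluster while enough
    nodes remain free, otherwise burst that job to the cloud."""
    placement = {}
    for job in jobs:
        if job.nodes <= local_free_nodes:
            placement[job.name] = "local"
            local_free_nodes -= job.nodes
        else:
            placement[job.name] = "cloud"
    return placement


# Hypothetical queue and a local cluster with 10 free nodes.
queue = [Job("cfd-run", 8), Job("md-sim", 6), Job("post-proc", 2)]
for name, target in place_jobs(queue, local_free_nodes=10).items():
    print(f"{name}: {target}")
```

In this sketch the local cluster absorbs the jobs it has room for, and only the overflow incurs cloud cost, which is the complementary behaviour described above.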

HPC on the edge

Many HPC workloads, especially those that require real-time processing or are extremely latency-sensitive, are now being deployed on the edge. That means they often run in small clusters, or even on a single, very focused computer, often referred to as a high-performance computer (HPC). Despite sharing an abbreviation with high-performance computing, this term refers to a single high-performance computer rather than a cluster. HPC on the edge can be seen in anything from telco edge deployments for 5G to automotive, where a high-performance computer might reside in a car to process very product-focused workloads, such as image recognition or interpreting LiDAR data for autonomous vehicles. The main challenge with HPC on the edge is that limited access can complicate deployment and maintenance, which makes managing it over the lifetime of the solution a bit of a hurdle. Thankfully, we offer solutions that could solve that problem.

HPC with Canonical

Our solutions, such as Ubuntu, are available for high-performance computing needs across clouds, where you have access to an almost unlimited number of computational resources. We offer Charmed OpenStack, which allows you to have your own cost-optimised cloud, delivering the most value for performance with full sovereignty. We also have MAAS for those looking for the ultimate cloud experience in bare-metal cluster management, delivering the ultimate performance and flexibility. Or you can combine any of these and go with a hybrid cloud strategy. Whatever your requirements and whatever the size of the computation, Canonical has the solutions for you. Contact us and we’ll help you map out your needs.

Summary

This blog has introduced the key public cloud players in the HPC cluster space and how they are driving innovation and evolution in the HPC solution space. It has also introduced dedicated private clusters, hybrid cloud clusters, and HPC on the edge.

If you are interested in more information, take a look at the previous blog in the series, “What is High-performance computing (HPC)?”, or at how Scania is Mastering multi-cloud for HPC systems with Juju, or dive into some of our other HPC content.

In the next blog, we will go into the history of HPC and supercomputing, covering how it all started and how it developed into the HPC we see today.
