

AWS and NVIDIA Collaborate to Build Generative AI Applications

Amazon Web Services (AWS) and NVIDIA announced a multi-part collaboration focused on building the world's most scalable, on-demand artificial intelligence (AI) infrastructure, optimized for training increasingly complex large language models (LLMs) and developing generative AI applications.

The joint work features next-generation Amazon Elastic Compute Cloud (Amazon EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs and AWS's state-of-the-art networking and scalability, which will deliver up to 20 exaFLOPS of compute performance for building and training the largest deep learning models. P5 instances will be the first GPU-based instances to take advantage of AWS's second-generation Elastic Fabric Adapter (EFA) networking, which provides 3,200 Gbps of low-latency, high-bandwidth throughput, enabling customers to scale up to 20,000 H100 GPUs in EC2 UltraClusters for on-demand access to supercomputer-class performance for AI.

New supercomputing clusters

New P5 instances build on more than a decade of collaboration between AWS and NVIDIA delivering AI and HPC infrastructure, and follow four previous generations of jointly developed offerings: P2, P3, P3dn, and P4d(e) instances. P5 instances are the fifth generation of AWS offerings powered by NVIDIA GPUs, arriving almost 13 years after AWS's initial deployment of NVIDIA GPUs, which began with the CG1 instances. P5 instances are ideal for training and running inference on the increasingly complex LLMs and computer vision models behind the most demanding and compute-intensive generative AI applications, including question answering, code generation, video and image generation, speech recognition, and more.

Built for both enterprises and startups racing to bring AI-fueled innovation to market in a scalable and secure way, P5 instances feature eight NVIDIA H100 GPUs capable of 16 petaFLOPS of mixed-precision performance, 640 GB of high-bandwidth memory, and 3,200 Gbps of networking connectivity (8x more than the previous generation) in a single EC2 instance. The increased performance of P5 instances accelerates the time to train machine learning (ML) models by up to six times (reducing training time from days to hours), and the additional GPU memory helps customers train larger, more complex models. P5 instances are expected to lower the cost to train ML models by up to 40% compared with the previous generation, offering customers greater efficiency than less flexible cloud offerings or expensive on-premises systems.
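To make the specs above concrete, here is a minimal sketch of how a P5 launch request with EFA networking might be assembled in the shape expected by boto3's EC2 `run_instances` call. The instance size name `p5.48xlarge` and all resource IDs below are illustrative assumptions, not details from this article; actually launching requires boto3, AWS credentials, and P5 capacity in the chosen Region.

```python
# Sketch only: builds the request parameters, does not call AWS.
# Assumed (not from the article): the p5.48xlarge size name and
# the placeholder AMI/subnet/security-group/placement-group IDs.

def p5_launch_params(ami_id, subnet_id, sg_id, placement_group):
    """Return a dict in the shape boto3's ec2.run_instances accepts."""
    return {
        "ImageId": ami_id,
        "InstanceType": "p5.48xlarge",  # 8x NVIDIA H100, 640 GB GPU memory
        "MinCount": 1,
        "MaxCount": 1,
        # An EFA interface provides the low-latency, high-bandwidth
        # fabric described in the article; one interface shown for brevity.
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "InterfaceType": "efa",
            "SubnetId": subnet_id,
            "Groups": [sg_id],
        }],
        # A cluster placement group keeps instances physically close,
        # which matters for multi-node training traffic.
        "Placement": {"GroupName": placement_group},
    }

params = p5_launch_params(
    "ami-0123456789abcdef0", "subnet-0abc", "sg-0abc", "p5-cluster"
)
print(params["InstanceType"])
```

In practice the dict would be passed as keyword arguments, e.g. `boto3.client("ec2").run_instances(**params)`; multi-node training jobs would launch many such instances into the same placement group.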

Amazon EC2 P5 instances are deployed in hyperscale clusters called EC2 UltraClusters, composed of the highest-performance compute, networking, and storage in the cloud. Each EC2 UltraCluster is one of the most powerful supercomputers in the world, enabling customers to run their most complex multi-node ML training and distributed HPC workloads. With the new EC2 P5 instances, customers such as Anthropic, Cohere, Hugging Face, Pinterest, and Stability AI will be able to build and train the largest ML models at scale. Collaboration on additional generations of EC2 instances will help startups, enterprises, and researchers seamlessly scale to meet their ML needs.

New server designs for scalable, efficient AI

Leading up to the release of the H100, NVIDIA and AWS engineering teams with expertise in thermal, electrical, and mechanical fields collaborated on server designs that harness GPUs to deliver AI at scale, with a focus on energy efficiency in AWS infrastructure. GPUs are typically 20 times more energy efficient than CPUs for certain AI workloads, and the H100 is up to 300 times more efficient than CPUs for LLMs.

Building on AWS and NVIDIA's work on server optimization, the companies have begun collaborating on future server designs to increase scaling efficiency through subsequent-generation system designs, cooling technologies, and network scalability.
