Amazon Web Services develops cooling technology for next-generation Nvidia GPUs

Decided third-party solutions were not a "good fit".


Amazon Web Services (AWS) has developed its own cooling solution for the latest generation of Nvidia GPUs.


The cloud giant has deployed Nvidia's Blackwell GPUs and, earlier this month, made UltraServer instances based on the Nvidia GB200 NVL72 system available. Supporting that hardware has required a move to liquid cooling.


In a video posted to YouTube, AWS' VP of compute and machine learning, David Brown, said: "In order to support the incredible compute density with GB200 NVL72 racks, we've moved towards a new first at AWS: liquid cooling.


"Blackwell represents the first liquid cooling hardware platform we've deployed at scale on AWS."


Brown added that AWS looked at the liquid cooling solutions currently available from third-party vendors. The first option would have required AWS to build liquid-cooled data centers from scratch, with long lead times "in the order of years."


"The second option would have been adopting some off-the-shelf solutions, but these didn't scale. They would take up too much data center floorspace, would still require major modifications to data centers, or increase water usage substantially."


AWS instead developed its own "In Row Heat Exchanger" (IRHX) that can be installed without adjusting the air-cooled mechanical design and with "minimal changes" to existing infrastructure.


According to Brown, this means the IRHX can be fitted into AWS's existing data centers.


The IRHX has a water distribution cabinet, a pumping unit, and fan coils. Cooling liquid is pushed from the pumping unit to the servers, and distributed through the chips via a cold plate designed by AWS and Nvidia. The warm liquid then returns to the IRHX through the coils and is cooled by fans, with heat expelled out of the back.


The IRHX is scalable, and coil units can be added or removed as needed.


A blog post shared by AWS noted that it took the company's data center cooling team four months to go from "a whiteboard design to a prototype" and then 11 months to deliver the first unit. "That included time to develop designs, build a supply chain, write control software, test everything, and manufacture systems."


Following the revelation that AWS was using its own cooling solution, Vertiv - a provider of power and cooling solutions to the data center industry - saw its stock take an 11 percent hit on Thursday, July 10.


Bloomberg Intelligence analyst Mustafa Okur noted the potential impact on Vertiv: "Amazon Web Services rolling out its own server liquid-cooling system could weigh on Vertiv’s future growth prospects. Around 10 percent of overall sales come from liquid cooling, we calculate, and AWS may be one of the largest customers."


AWS is no stranger to developing its own hardware and equipment. The company designs its own chips (Graviton, Trainium, and Inferentia), and last year it launched a series of data center components to help its facilities handle the next generation of AI workloads.
