Nvidia Unveils New Blackwell-Powered Systems: A Massive Leap Forward

The DGX SuperPOD, which starts at eight DGX GB200 systems, can scale to tens of thousands of Nvidia Superchips.

Nvidia is introducing its new Blackwell architecture along with new DGX systems built on it, promising significant performance gains over the previous generation.

Nvidia’s current DGX servers range from systems with 8 Hopper processors to configurations with 256 processors, with prices starting at $500,000 and climbing to several million dollars. Nvidia plans a similar lineup for the Blackwell generation, though specific prices have not yet been disclosed.

The Nvidia GB200 NVL72 sits at the top of the new lineup. A multi-node, liquid-cooled, rack-scale system, it targets the most compute-intensive workloads. Each DGX GB200 system is equipped with 36 Grace Blackwell Superchips, comprising 72 Blackwell GPUs and 36 Grace CPUs, all connected by the latest-generation NVLink interconnect. The platform acts as a single massive GPU, delivering 1.4 exaflops of AI performance with 30TB of fast memory.
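A quick back-of-the-envelope check puts those rack-level figures in per-GPU terms. The split below is an illustrative estimate derived from the numbers quoted above, not an official Nvidia specification:

```python
# Per-GPU share of the GB200 NVL72 figures quoted in the article.
# These are illustrative divisions of aggregate numbers, not Nvidia specs.
gpus = 72
total_ai_exaflops = 1.4    # aggregate AI performance quoted for the rack
total_fast_memory_tb = 30  # aggregate "fast memory" quoted for the rack

per_gpu_petaflops = total_ai_exaflops * 1000 / gpus   # exaflops -> petaflops
per_gpu_memory_gb = total_fast_memory_tb * 1000 / gpus

print(f"~{per_gpu_petaflops:.1f} PFLOPS and ~{per_gpu_memory_gb:.0f} GB of fast memory per GPU")
```

That works out to roughly 19 petaflops of AI performance and a bit over 400 GB of fast memory per GPU, which gives a sense of the density packed into a single rack.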

The new DGX systems offer more than raw speed and efficiency; they represent a major shift in inter-chip communication, according to Charlie Boyle, vice president of DGX systems at Nvidia. In large AI training jobs, GPUs can spend a disproportionate amount of time communicating with one another. The memory-based NVLink interconnect speeds up that communication significantly, enabling more efficient operation.

A DGX rack comprises a 44U cabinet containing 18 compute trays, nine switch trays, two power distribution units, a management switch, a liquid-cooling manifold, and an NVLink backplane. This is the first DGX generation to use liquid cooling, an indirect acknowledgment that systems this powerful generate considerable heat. Boyle declined to comment on speculation that the Blackwell processor could draw over 1,000 watts of power.
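The tray count can be reconciled with the chip counts quoted earlier. Assuming two Grace Blackwell Superchips per compute tray (an assumption used here to make the tally work, not a figure stated in the article), the 18 compute trays account for all 36 Superchips:

```python
# Tallying the rack layout against the chip counts quoted earlier.
# superchips_per_tray = 2 is an assumption for this sketch.
compute_trays = 18
superchips_per_tray = 2   # assumed split
gpus_per_superchip = 2    # one Superchip = 2 Blackwell GPUs + 1 Grace CPU
cpus_per_superchip = 1

superchips = compute_trays * superchips_per_tray  # 36 Superchips
gpus = superchips * gpus_per_superchip            # 72 Blackwell GPUs
cpus = superchips * cpus_per_superchip            # 36 Grace CPUs

print(f"{superchips} Superchips -> {gpus} GPUs, {cpus} CPUs")
```

The totals line up with the 36 Superchips, 72 GPUs, and 36 CPUs described for the DGX GB200 system.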

According to Boyle, the move to liquid cooling was driven primarily by efficiency and density: fitting 72 GPUs into a single rack within one unbroken NVLink domain requires a very high-density configuration. The technology is also available to OEM and ODM partners, who may choose different configurations or densities to suit their requirements.

The latest DGX SuperPOD iteration, built with DGX GB200 systems, does not render earlier versions obsolete, but it does offer features exclusive to this generation. Built-in RAS (reliability, availability, serviceability) features in the chips extend into the server, providing predictive maintenance, system health monitoring, and constant tracking of thousands of data points.

Nvidia has also created what it calls the DGX Ready data center program: participating data center partners are prepared to host these systems, including their liquid cooling, with little to no setup effort.

“When these systems are shipped to our customers – and we expect most of these will be sent to colocation data centers – while some customers have their own native liquid systems and some are putting together next generation data centers, we have ensured the process is streamlined for customers that want to adopt this,” he stated.

The new DGX systems are expected to ship later this year.
