Nvidia has announced a significant boost in AI storage speed, claiming an increase of up to nearly 50% in storage read bandwidth. The improvement is attributed to its Spectrum-X Ethernet networking technology, which accelerates data movement for large language model (LLM) workloads.
The Spectrum-X solution combines the Spectrum-4 Ethernet switch with the BlueField-3 SuperNIC smart networking card, using RoCE v2 (RDMA over Converged Ethernet) for remote direct memory access. Specifically, the Spectrum-4 SN5000 switch provides 64 ports of 800 Gbps Ethernet, for a total of 51.2 Tbps of bandwidth. Nvidia's technical improvements include RoCE extensions for adaptive routing and congestion control, which minimize network congestion by sending data packets along the least congested routes.
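To illustrate the idea behind per-packet adaptive routing, here is a minimal sketch (not Nvidia's implementation; the function names and the queue-depth model are assumptions for illustration). Each packet is steered to the egress port with the shallowest queue, in contrast to flow-based hashing, which pins an entire flow to one port:

```python
# Illustrative sketch of per-packet adaptive routing: every packet is
# sent to the currently least-loaded egress port. Hypothetical helper
# names; queue depth is modeled simply as outstanding bytes per port.

def pick_port(queue_depths):
    """Return the index of the egress port with the shallowest queue."""
    return min(range(len(queue_depths)), key=lambda i: queue_depths[i])

def route_packets(packet_sizes, num_ports):
    """Assign each packet (by size in bytes) to a port, least-loaded first."""
    queues = [0] * num_ports          # outstanding bytes per egress port
    assignment = []
    for size in packet_sizes:
        port = pick_port(queues)      # least congested route, per packet
        queues[port] += size
        assignment.append(port)
    return assignment, queues

# Eight 1500-byte packets over four ports spread evenly:
assignment, queues = route_packets([1500] * 8, num_ports=4)
# → assignment [0, 1, 2, 3, 0, 1, 2, 3], queues [3000, 3000, 3000, 3000]
```

The trade-off this sketch makes visible is the one the article describes next: spraying packets across ports balances load well, but it means packets of the same flow can arrive out of order.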
Adaptive routing allows packets to arrive out of order, but the BlueField-3 DPU compensates by reassembling them into the correct sequence, whereas standard RoCE would treat out-of-order arrival as loss and force retransmission. This capability improves bandwidth efficiency, making storage systems significantly more effective than standard RoCE v2 setups.
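The reassembly step can be sketched with a simple reorder buffer (a conceptual illustration, not the BlueField-3 firmware): packets are held by sequence number and delivered in order as soon as each gap is filled, so no retransmission is needed.

```python
# Illustrative reorder buffer: accepts packets in arrival order and
# emits payloads in sequence order, buffering across gaps instead of
# requesting retransmission.

def reassemble(arrivals):
    """arrivals: iterable of (seq_no, payload) pairs in arrival order.
    Returns payloads in sequence order, starting from seq_no 0."""
    buffer = {}
    next_seq = 0
    out = []
    for seq, payload in arrivals:
        buffer[seq] = payload
        while next_seq in buffer:      # flush any contiguous run
            out.append(buffer.pop(next_seq))
            next_seq += 1
    return out

# Packets 0..4 arrive out of order over multiple paths:
reassemble([(2, "c"), (0, "a"), (1, "b"), (4, "e"), (3, "d")])
# → ["a", "b", "c", "d", "e"]
```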
Nvidia conducted tests using its Israel-1 AI supercomputer, comparing the performance of standard networking against the modified Spectrum-X technology. The results showed improvements in read bandwidth ranging from 20% to 48%, while write bandwidth saw enhancements of 9% to 41%.
The company also highlighted the importance of storage in AI processing, emphasizing that because LLMs have massive data requirements, faster data movement is crucial to keep GPUs from idling while waiting for data.
Nvidia collaborates with storage vendors DDN, VAST Data, and WEKA to further integrate and optimize their solutions for the Spectrum-X networking platform.