Interconnect technologies are among the most fundamental aspects of networking, enabling devices to connect in a standardized way. In contemporary data centers, these technologies face unprecedented challenges from the demands of artificial intelligence (AI).
As companies race to build expansive AI systems capable of handling trillion-parameter models, they encounter significant bottlenecks, particularly in moving data quickly among thousands of accelerators. These bottlenecks can stall the parallel processing essential for training the large language models (LLMs) that power generative AI applications.
The Unique Challenges of AI Networking
Traditional networking systems, which were designed for general computing purposes, struggle to address the specific requirements of AI workloads. The demands of AI training and inference introduce networking challenges that differ greatly from conventional data center tasks. The scale and nature of communication within modern AI systems often exceed the limits of existing interconnect technologies.
Key differences include:
- Communication Patterns: Traditional applications tend toward client-server traffic, whereas AI training relies on collective, all-to-all communication in which every GPU exchanges data with every other GPU simultaneously.
- Bandwidth Needs: Growing model parameter counts can multiply network traffic severalfold, because gradients must be synchronized across many accelerators on every training step.
- Latency Sensitivity: AI applications require response times in the sub-microsecond range for training synchronization, compared to the millisecond tolerances acceptable in typical applications.
- Traffic Volume: Large training clusters generate continual data streams that far exceed the traffic demands seen in traditional high-performance computing (HPC) scenarios.
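To make the bandwidth point concrete, here is a back-of-the-envelope estimate of per-step gradient-synchronization traffic for data-parallel training. It assumes a ring all-reduce, where each of N workers transmits roughly 2 × (N − 1)/N times the gradient size per step; the model size and GPU count below are illustrative, not drawn from the article.

```python
def ring_allreduce_bytes_per_worker(param_count: int,
                                    bytes_per_param: int,
                                    num_workers: int) -> float:
    """Bytes each worker transmits per training step under ring all-reduce."""
    gradient_bytes = param_count * bytes_per_param
    return 2 * (num_workers - 1) / num_workers * gradient_bytes

# A 1-trillion-parameter model with 2-byte (fp16) gradients on 1024 GPUs:
traffic = ring_allreduce_bytes_per_worker(10**12, 2, 1024)
print(f"{traffic / 1e12:.2f} TB per worker per step")  # → 4.00 TB per worker per step
```

At thousands of steps per day, traffic on this scale is what pushes clusters toward 400G and 800G links.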
The Three Key Players: Ethernet, InfiniBand, and Omni-Path
In 2025, three core technologies dominate the data center interconnect landscape: Ethernet, InfiniBand, and Omni-Path.
Ethernet’s Evolution
Ethernet has long been the dominant standard in enterprise data centers due to its compatibility and cost-efficiency. However, standard Ethernet struggles with latency and packet loss under the heavy, bursty traffic AI generates. Recent advancements have positioned Ethernet to better accommodate these demands.
- IEEE 802.3df-2024: This landmark standard introduces the 800 Gigabit Ethernet specification, which boasts enhanced flexibility and backward compatibility with existing technologies, ensuring investment protection and enabling smooth transitions.
- Ultra Ethernet Consortium (UEC) 1.0: The UEC aims to optimize Ethernet for AI workloads by introducing modern RDMA implementations and advanced congestion control, eliminating the historical reliance on traditional lossless networks.
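The congestion-control idea behind such efforts can be illustrated with a minimal sketch. This is not the UEC algorithm itself; it is the generic ECN-driven pattern modern RDMA transports build on, where a sender cuts its rate multiplicatively on congestion marks and recovers additively when the path is clear. All rates and factors below are illustrative.

```python
def adjust_rate(rate_gbps: float, ecn_marked: bool,
                min_rate: float = 1.0, max_rate: float = 400.0,
                decrease_factor: float = 0.5,
                increase_step: float = 5.0) -> float:
    """Return the sender's next transmission rate in Gb/s.

    On an ECN congestion mark, back off multiplicatively;
    otherwise, probe for bandwidth additively.
    """
    if ecn_marked:
        return max(min_rate, rate_gbps * decrease_factor)
    return min(max_rate, rate_gbps + increase_step)

rate = 400.0
rate = adjust_rate(rate, ecn_marked=True)   # congestion → 200.0
rate = adjust_rate(rate, ecn_marked=False)  # clear round → 205.0
```

Reacting to congestion signals rather than pausing the link entirely is what lets these designs avoid the strict lossless (priority flow control) requirement of earlier RDMA-over-Ethernet deployments.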
InfiniBand’s Advantage
InfiniBand, developed in the late 1990s specifically for high-performance data center interconnects, supports ultra-low-latency communication and lossless transmission, both critical for large AI workloads.
With its recent evolution to XDR (eXtended Data Rate), InfiniBand maintains its latency advantages while matching Ethernet's bandwidth. Deployments are anticipated to scale to roughly 500,000 endpoints with near-linear performance.
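A rough sanity check on endpoint counts of that order can be done with the classic three-tier fat-tree formula, under which radix-k switches support k³/4 end hosts at full bisection bandwidth. The radix values below are illustrative and not tied to any specific product generation.

```python
def fat_tree_endpoints(radix: int) -> int:
    """Maximum hosts in a 3-tier fat-tree built from radix-k switches."""
    return radix ** 3 // 4

print(fat_tree_endpoints(64))   # → 65536
print(fat_tree_endpoints(128))  # → 524288
```

A radix-128 fat-tree tops out just above 500,000 hosts, which is consistent with the scale cited for next-generation fabrics.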
Omni-Path’s Resurgence
Originally developed by Intel as a competitor to InfiniBand, Omni-Path faced setbacks but has been revitalized through the efforts of Cornelis Networks. The revived technology aims for cost-competitive AI network deployments, focusing on optimizing cost-performance where absolute performance may not be as critical.
The Future of AI Interconnects
As AI is expected to reshape various industries, networking technologies like Ethernet, InfiniBand, and Omni-Path are evolving from mere data conduits into intelligent networks that facilitate advanced computing capabilities. While InfiniBand excels in performance, Ethernet’s openness and the revival of Omni-Path suggest a future of democratized access to high-performance interconnects. Hyperscalers are expected to adopt a hybrid approach that balances innovation with operational efficiency, underscoring the crucial role of interconnections in the evolving landscape of AI.