Most artificial intelligence experts seem to agree that taking the next big leap in the field will depend at least partly on building supercomputers on a once unimaginable scale. At an event hosted by the venture capital firm Sequoia last month, Nick Harris, CEO of a startup called Lightmatter, pitched a technology that could enable this hyperscale computing rethink by letting chips talk directly to one another using light.
Data today generally moves around inside computers—and in the case of training AI algorithms, between chips inside a data center—via electrical signals. Sometimes parts of those interconnections are converted to fiber-optic links for greater bandwidth, but converting signals back and forth between optical and electrical creates a communications bottleneck.
Instead, Lightmatter wants to directly connect hundreds of thousands or even millions of GPUs—those silicon chips that are crucial to AI training—using optical links. Reducing the conversion bottleneck should allow data to move between chips at much higher speeds than is possible today, potentially enabling distributed AI supercomputers of extraordinary scale.
Lightmatter’s innovation, known as Passage, is an optical, or photonic, interconnect built in silicon, which lets its hardware interface directly with the transistors on a silicon chip like a GPU. The company says this should make it possible to move data between chips with 100 times the bandwidth of today’s links.
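To see why that kind of bandwidth matters for AI training, consider the time GPUs spend synchronizing gradients after each training step. The back-of-envelope Python sketch below uses the standard ring all-reduce cost model with purely illustrative numbers; the model size, GPU count, and link speeds are assumptions made for the sake of the example, not figures from Lightmatter.

```python
# Back-of-envelope estimate of gradient-synchronization time per training step.
# All numbers below are illustrative assumptions, not vendor specifications.

def ring_allreduce_seconds(model_bytes: float, num_gpus: int, link_bytes_per_s: float) -> float:
    """Classic ring all-reduce moves roughly 2*(N-1)/N times the model size
    over each GPU's link, so the time is dominated by per-link bandwidth."""
    traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * model_bytes
    return traffic_per_gpu / link_bytes_per_s

model_bytes = 1e12   # assume a 500-billion-parameter model in fp16, about 1 TB of gradients
num_gpus = 100_000   # hypothetical cluster size

for label, bw in [("electrical interconnect, 100 GB/s", 100e9),
                  ("optical interconnect, 100x the bandwidth", 100e9 * 100)]:
    t = ring_allreduce_seconds(model_bytes, num_gpus, bw)
    print(f"{label}: ~{t:.1f} s spent synchronizing per step")
```

Under these assumed numbers, synchronization drops from roughly 20 seconds per step to a fraction of a second, which is the kind of gain that makes much larger clusters practical.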
To provide some context, GPT-4, OpenAI’s most advanced AI algorithm and the backbone of ChatGPT, is rumored to have run on more than 20,000 GPUs. Harris says Passage, slated for completion in 2026, should allow more than a million GPUs to work in parallel on the same AI training run.
Sam Altman, the CEO of OpenAI, was among the attendees at the Sequoia event. He has at times seemed fixated on building bigger, faster data centers to push AI further. In February, The Wall Street Journal reported that Altman has sought up to $7 trillion in funding to manufacture vast quantities of chips for AI. More recently, The Information reported that OpenAI and Microsoft are sketching out plans for a $100 billion data center, code-named Stargate, containing millions of chips. Because electrical interconnects consume so much power, connecting chips on that scale would demand an enormous amount of energy, and it would depend on new ways of linking chips, like the one Lightmatter is pitching.
GlobalFoundries, a semiconductor company that makes chips for clients including AMD and General Motors, recently announced a partnership with Lightmatter. Harris says his company is working not only with the world’s largest semiconductor companies but also with the ‘hyperscalers,’ the biggest cloud companies, which include Microsoft, Amazon, and Google.
Reinventing the wiring that ties together large AI projects is one of the pivotal hardware challenges of the moment, and an opening for Lightmatter and other companies working on it. Many AI researchers see continued scaling of hardware, a fundamental factor behind advances like ChatGPT, as key to future progress in the field, and to reaching the loosely defined goal of artificial general intelligence, or AGI: programs that can match or surpass biological intelligence.
Harris proposes that linking a million chips together with light could provide a basis for algorithms several generations ahead of today’s cutting edge. “Passage, our product, is going to enable AGI algorithms,” he says.
Training large-scale AI algorithms requires substantial data centers, which usually consist of racks filled with thousands of computers running specialized silicon chips, with a web of mostly electrical connections between them. Marshaling an AI training run across so many systems, all linked by wires and switches, is a huge engineering undertaking. Converting between electronic and optical signals also places fundamental limits on the chips’ ability to perform computations as a whole.
Lightmatter’s approach is designed to simplify the tangled traffic inside AI data centers. Normally, Harris says, communication between GPUs has to traverse several layers of switches; in a data center built with Passage, every GPU would have a high-speed connection to every other chip.
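One way to picture the difference is to count network hops. The short Python sketch below compares the worst case for a conventional multi-tier switch fabric against a direct per-GPU optical link; the tier count and topology are simplified assumptions for illustration, not a description of Passage or of any particular data center.

```python
# Compare worst-case switch hops for GPU-to-GPU traffic in two simplified,
# hypothetical topologies. Tier counts are illustrative assumptions.

def hops_switched_fabric(tiers: int) -> int:
    """In a multi-tier switch fabric (e.g., a three-tier Clos network), traffic
    between GPUs in distant racks climbs to the top tier and back down again."""
    return 2 * tiers - 1  # up through each tier, then back down, crossing the top tier once

def hops_direct_optical() -> int:
    """With a dedicated optical link between chips, traffic goes GPU to GPU directly."""
    return 1

print("Three-tier switched fabric, worst case:", hops_switched_fabric(3), "switch hops")
print("Direct optical interconnect:", hops_direct_optical(), "hop")
```

Fewer hops means less latency and less switching hardware sitting between any two chips, which is the simplification Harris is describing.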
Lightmatter’s development of Passage highlights how the recent surge in AI has prompted both big and small businesses to innovate key hardware powering advancements such as OpenAI’s ChatGPT. Nvidia, the foremost provider of GPUs for AI initiatives, showcased its newest chip for AI training, called Blackwell, at its annual conference last month. According to CEO Jensen Huang, Nvidia will sell this GPU as part of a “superchip”, which includes two Blackwell GPUs and a standard CPU processor, interconnected using the company’s new high-speed communication technology, NVLink-C2C.
The chip industry is known for finding ways to wring more computing power out of chips without making them larger, but Nvidia chose to buck that trend. The Blackwell GPUs inside the superchip are twice as powerful as their predecessors, yet they are made by joining two chips together, and they consume considerably more power. That trade-off, along with Nvidia’s efforts to string its chips together with high-speed links, suggests that upgrades to other key components of AI supercomputers, like the one Lightmatter proposes, could become increasingly significant.