Exploring GPU Alternatives: The New Trend Among AI Startups

The GPU has been at the core of AI processing, but many firms believe there may be a superior alternative.

Two decades ago, Nvidia made a strategic decision to expand beyond gaming into high-performance computing (HPC). Much of HPC is math, and a GPU is essentially a massive math coprocessor, with thousands of cores working in parallel.

This decision has yielded substantial benefits for Nvidia. In their latest quarter, they reported an all-time high data center revenue of $14.5 billion, marking a 41% increase from the previous quarter and a 279% leap from the same quarter last year. Their GPUs have become a standard for AI processing, even more prominent than in gaming.
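As a rough sanity check on those growth figures, the implied earlier-quarter revenues can be back-calculated from the $14.5 billion total. The sketch below is illustrative arithmetic only: the function name is hypothetical, and the results are approximations derived from the percentages quoted above, not figures reported by Nvidia.

```python
def implied_base(current: float, pct_increase: float) -> float:
    """Return the earlier value that, grown by pct_increase percent, yields current."""
    return current / (1 + pct_increase / 100)


current_revenue = 14.5  # data center revenue in billions of dollars, from the article

# A 41% increase means current = previous * 1.41
prev_quarter = implied_base(current_revenue, 41)
# A 279% increase means current = year_ago * 3.79
year_ago = implied_base(current_revenue, 279)

print(f"Implied previous quarter: ${prev_quarter:.1f}B")
print(f"Implied year-ago quarter: ${year_ago:.1f}B")
```

Note that a 279% increase corresponds to a multiple of 3.79, not 2.79, which is why the year-ago figure comes out to under $4 billion.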

However, a slew of firms are eyeing Nvidia’s dominant position, including obvious competitors like AMD and Intel as well as numerous startups. These startups claim to have designed better ways of processing large language models (LLMs) and other AI workloads. The list includes SambaNova, Cerebras, GraphCore, Groq, and xAI, among others. Meanwhile, Intel is also pursuing a GPU alternative with its Gaudi3 processor, in addition to its Max GPU series for data centers.

There is vast potential for vendors of AI hardware. According to Precedence Research, the AI hardware market was valued at around $43 billion in 2022 and is expected to reach $240 billion by 2030.
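To put that forecast in perspective, the two endpoints imply a compound annual growth rate that can be worked out directly. This is a minimal sketch assuming smooth year-over-year compounding between 2022 and 2030; the `cagr` helper is a hypothetical name, and the resulting rate is derived here rather than quoted from Precedence Research.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


start_year, end_year = 2022, 2030
start_market = 43.0   # billions of dollars, 2022 figure from the article
end_market = 240.0    # billions of dollars, 2030 projection from the article

rate = cagr(start_market, end_market, end_year - start_year)
print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the market would need to grow by roughly a quarter every year to hit the 2030 projection.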

Glenn O’Donnell, senior vice president and analyst with Forrester Research, says the CPU is not well suited to dedicated workloads such as AI. As a general-purpose processor, it performs a multitude of tasks, such as running the system, that an AI workload does not need.

“There’s energy wasted and circuitry usage that isn’t really required. So what if a specialised, optimised chip was available?” O’Donnell adds. “One excellent example of this is Google’s Tensor Processing Unit, created specifically for the TensorFlow algorithms and the processing needed for TensorFlow analytics. The chip isn’t a compromise; it’s created for this sole function.”

According to Daniel Newman, principal analyst with Futurum Research, GPUs have a similar issue. The GPU was designed in the 1990s for 3-D gaming acceleration and, like the CPU, there’s room for improvement in its efficiency.

At a broader level, the GPU architecture still follows a kernel model, in which a chip executes a single task at a time. This necessitates a host chip to manage multiple models or their elements. The cycle involves significant communication between the processors: disassembling the model into components suitable for each GPU, then reassembling them to construct the base models.

Elmer Morales, founder, CEO, and chief engineer at Ainstein.com, says that when AI and HPC began to take shape, industries adopted GPUs because they were readily accessible and easy to use.

The pitch from vendors of GPU alternatives is that they have devised a better solution.

Rodrigo Liang, co-founder and CEO of SambaNova Systems, explains that GPUs perform decently for general training workloads and that users can learn to deploy them rapidly. Problems arise, however, with extremely large models. GPT-sized models, for instance, require thousands of these chips, and those chips may not run at optimal efficiency.

James Wang, senior product marketing manager at Cerebras Systems, concurs, stating that the GPU chip is simply not large enough. The company’s WSE-2 chip is the size of an album cover. While the Hopper GPU has a couple thousand cores, the WSE-2 boasts 850,000. With this, Cerebras Systems says it has 9,800 times the memory bandwidth of a GPU.

Wang explains that memory capacity determines the size of the model you can train. “If a GPU is your starting point, you are restricted by its size and its memory. If larger models are required, the problem intensifies. Essentially, you have to code around the GPU’s shortcomings,” he asserts.

The GPU is simply too small for enormous models, Morales adds, meaning the model must be divided across thousands of GPU chips for processing. “Ignoring latency, it’s just too small if the model doesn’t fit.” He believes that eighty gigabytes, the memory capacity of an Nvidia H100 GPU, is inadequate for large models.
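The memory argument can be made concrete with a back-of-the-envelope calculation. The sketch below estimates the minimum number of 80 GB GPUs needed just to hold a model's weights. Everything other than the 80 GB figure is an assumption for illustration: the function name is hypothetical, the 175-billion-parameter model size is an example of a GPT-scale model, and 2 bytes per parameter assumes 16-bit weights. Training also needs memory for gradients, optimizer state, and activations, so real deployments require far more chips than this lower bound.

```python
import math


def min_gpus_for_weights(num_params: int,
                         bytes_per_param: int = 2,
                         gpu_memory_gb: int = 80) -> int:
    """Lower bound on GPUs needed to store model weights alone.

    Assumes weights are split evenly across GPUs and ignores gradients,
    optimizer state, activations, and any framework overhead.
    """
    weight_bytes = num_params * bytes_per_param
    gpu_bytes = gpu_memory_gb * 10**9  # decimal gigabytes
    return math.ceil(weight_bytes / gpu_bytes)


# Hypothetical GPT-scale model: 175 billion parameters at 16 bits each
example_params = 175_000_000_000
gpus_needed = min_gpus_for_weights(example_params)
print(f"GPUs needed for weights alone: {gpus_needed}")
```

Even in this best case the model does not fit on one device, which is the point Morales and Wang are making: once a model must be split, the software has to manage the partitioning and the traffic between chips.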

A physically larger chip with more cores and more memory can process a larger portion of a language model on each chip. Consequently, fewer chips are needed overall, which reduces power consumption, a key factor in processing-intensive AI workloads.

Startups Cerebras and SambaNova are more than chipmakers; they are complete system developers, offering both the server hardware and the software stack needed to run applications. Intel, AMD, and Nvidia do the same: these companies are recognized for their silicon but also run considerable software initiatives centered on AI.

This software ecosystem serves a dual purpose: it supports the corresponding hardware and keeps customers on the vendor’s platform. O’Donnell points out that a GPU or CPU on its own is of little use, and attributes Nvidia’s success in this field to the protective “moat” it has built around its CUDA platform. Because of that intertwined software ecosystem, replacing Nvidia GPU hardware with Intel’s hardware is a complex undertaking.

According to Wang, the whole AI industry, from Nvidia to Cerebras, is now embracing open-source software. This approach fosters cross-platform compatibility and averts the kind of vendor or platform lock-in that Nvidia established with CUDA. Clients are free to choose their hardware rather than being pushed to a platform by the software available for it.

“The trend towards open-source is relatively new,” shared Wang. “However, it’s been extremely beneficial for the industry, since it permits everyone to reap the benefits of the investment made by a single individual.”

“We are committed to providing startups and our customers with multiple options, enabling them to use various vendors, redesign and repurpose according to their needs, therefore avoiding network lock-in,” stated Morales of Ainstein. Ainstein utilises Grok systems by the Elon Musk-supported xAI, but their AI agents can operate on all platforms.

O’Donnell asserts that the upcoming stage in AI processing evolution will be the creation of tailor-made, programmable chips, essentially “FPGAs on steroids”. He elaborated, “FPGAs can be redesigned to accomplish different tasks effectively. I believe we’ll begin to see significant advancements in this area, likely in the later part of this decade.”

Morales agrees, indicating that it is not viable for hardware vendors to be restricted to a single model type. “Hardware manufacturers will need to provide comparable programmable chips that can be repurposed to run different models,” he explained. “This will afford consumers the flexibility to utilise a device for any purpose, against any model of their preference. I am confident that this is a trend the industry is moving towards.”

O’Donnell is sceptical that the majority of these startups have a fighting chance to take the lead, particularly against giants like Nvidia and Intel. “However, I believe some will locate their speciality and thrive within that. I don’t foresee any immediate breakthroughs, but who can predict? Some could potentially be acquired simply for their intellectual property,” he concluded.
