A Deep Dive into the Creation of DBRX: The World’s Most Robust Open-Source AI Model

Will Knight

This past Monday, about a dozen engineers and executives at data science and AI company Databricks gathered in conference rooms connected via Zoom to learn if they had succeeded in building a top artificial intelligence language model. The team had spent months, and about $10 million, training DBRX, a large language model similar in design to the one behind OpenAI’s ChatGPT. But they wouldn’t know how powerful their creation was until results came back from the final tests of its abilities.

“We’ve surpassed everything,” Jonathan Frankle, chief neural network architect at Databricks and leader of the team that built DBRX, eventually told the team, which responded with whoops, cheers, and applause emojis. Frankle usually steers clear of caffeine but was taking sips of iced latte after pulling an all-nighter to write up the results.

Databricks will release DBRX under an open source license, allowing others to build on top of its work. Frankle shared data showing that across about a dozen benchmarks measuring the AI model’s ability to answer general knowledge questions, perform reading comprehension, solve vexing logical puzzles, and generate high-quality code, DBRX outperformed every other open source model available.

It beat both Meta’s Llama 2 and Mistral’s Mixtral, two of the most popular open source AI models available today. When the scores were displayed, Databricks CEO Ali Ghodsi cheered, “Yes!” He then asked, “Did we outperform Elon’s venture?” Frankle confirmed that they had indeed surpassed the Grok AI model recently open-sourced by Musk’s xAI, adding wryly, “If we receive a critical tweet from him, I’ll mark it as a triumph.”

To the team’s surprise, on several scores DBRX also came remarkably close to GPT-4, OpenAI’s closed model that powers ChatGPT and is widely regarded as the pinnacle of machine intelligence. “We’ve raised the bar for open source LLMs,” Frankle said with a grin.

By open-sourcing DBRX, Databricks is adding momentum to a movement that challenges the secretive approach of the leading companies in the current generative AI boom. OpenAI and Google keep the code for their GPT-4 and Gemini large language models closely held, but some rivals, notably Meta, have made their models available for others to use. The argument is that open access spurs innovation by putting the technology in the hands of more researchers, entrepreneurs, startups, and established businesses.

Databricks also intends to be open about the work that went into creating its model, something Meta has not done for some key details of its Llama 2 model. The company plans to publish a blog post detailing that effort, and it also invited WIRED to spend time with its engineers as they made key decisions during the final stages of the multimillion-dollar DBRX training run. That offered a glimpse of how complex and challenging it is to build a leading AI model, and also of how recent advances in the field promise to bring costs down. Together with the availability of open models like DBRX, those falling costs suggest that AI development isn’t going to slow down anytime soon.

Ali Farhadi, CEO of the Allen Institute for AI, says greater transparency around the building and training of AI models is badly needed. The field has become increasingly secretive in recent years as companies have sought an edge over competitors. Transparency is especially important when there is concern about the risks that advanced AI models could pose, he says. “I’m very happy to see any effort in openness,” Farhadi says. “I do believe a significant portion of the market will move towards open models. We need more of this.”

Databricks has a reason to be especially open. Although tech giants like Google have rapidly rolled out new AI deployments over the past year, Ghodsi says that many large companies in other industries are yet to widely use the technology on their own data. Databricks hopes to help companies in finance, medicine, and other industries, which he says are hungry for ChatGPT-like tools but also leery of sending sensitive data into the cloud.

“We call it data intelligence—the intelligence to understand your own data,” Ghodsi says. Databricks will customize DBRX for a customer or build a bespoke model tailored to their business from scratch. For major companies, he says, the cost of building something on the scale of DBRX makes perfect sense. “That’s the big business opportunity for us.” Last July, Databricks acquired MosaicML, a startup specializing in building AI models more efficiently, bringing on several of the people who would build DBRX, including Frankle. No one at either company had built something on that scale before.

DBRX, like other large language models, is essentially a giant artificial neural network—a mathematical framework loosely inspired by biological neurons—that has been fed huge quantities of text data. DBRX and its ilk are generally based on the transformer, a type of neural network invented by a team at Google in 2017 that revolutionized machine learning for language.

Not long after the transformer was invented, researchers at OpenAI began training versions of that style of model on ever-larger collections of text scraped from the web and other sources—a process that can take months. Crucially, they found that as the model and data set it was trained on were scaled up, the models became more capable, coherent, and seemingly intelligent in their output.

Seeking ever-greater scale remains an obsession of OpenAI and other leading AI companies. OpenAI CEO Sam Altman has sought $7 trillion in funding to develop AI-specialized chips, according to The Wall Street Journal. But size is not the only thing that matters when creating a language model. Frankle says dozens of important decisions go into building an advanced neural network; some lore about how to train more efficiently can be gleaned from research papers, while other details are shared within the community. Keeping thousands of computers connected by finicky switches and fiber-optic cables working in concert is especially challenging.

“You’ve got these insane [network] switches that do terabits per second of bandwidth coming in from multiple different directions,” Frankle said before the final training run was finished. “It’s mind-boggling even for someone who’s spent their life in computer science.” That Frankle and others at MosaicML are experts in this obscure science helps explain why Databricks’ purchase of the startup last year valued it at $1.3 billion.

The data fed to a model also makes a big difference to the end result—perhaps explaining why it’s the one detail that Databricks isn’t openly disclosing. “Data quality, data cleaning, data filtering, data prep is all very important,” says Naveen Rao, a vice president at Databricks and previously founder and CEO of MosaicML. “These models are really just a function of that. You can almost think of that as the most important thing for model quality.”

Recent advances in AI research have produced architectural tweaks that make modern AI models more efficient. One of the most notable is an architecture known as “mixture of experts,” in which only some parts of a model activate to answer a query, depending on its contents. This makes the model cheaper to train and run. DBRX has around 132 billion parameters, or values that get updated during training; Llama 2 has 70 billion, Mixtral has 45 billion, and Grok has 314 billion. But on average only about 36 billion of DBRX’s parameters are active for a typical query. Databricks says tweaks made to improve the model’s use of the underlying hardware boosted training efficiency by between 30 and 50 percent, and also make the model respond more quickly to queries and consume less energy.
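The core idea of mixture-of-experts routing can be sketched in a few lines. This is a toy illustration, not DBRX’s actual architecture: the expert count, top-k value, and router are all hypothetical, and real systems route inside a neural network rather than over plain Python lists. The point it demonstrates is that a learned “router” picks only a couple of expert sub-networks per token, so most of the model’s parameters sit idle on any given query.

```python
# Toy sketch of mixture-of-experts routing. All names and sizes here are
# hypothetical, for illustration only -- not DBRX's real configuration.
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # total expert sub-networks in the layer
TOP_K = 2         # experts actually activated per token

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(token_features, router_weights):
    """Score every expert for this token, keep only the TOP_K best."""
    scores = [sum(w * x for w, x in zip(row, token_features))
              for row in router_weights]
    probs = softmax(scores)
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    return top, probs

# Hypothetical learned router: one weight row per expert.
feature_dim = 4
router = [[random.gauss(0, 1) for _ in range(feature_dim)]
          for _ in range(NUM_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(feature_dim)]

chosen, probs = route(token, router)
print(f"experts used for this token: {len(chosen)} of {NUM_EXPERTS}")
# Only TOP_K of NUM_EXPERTS experts run per token, which is why a model
# with over a hundred billion total parameters can activate only a
# fraction of them for any single query.
```

Because only the chosen experts’ parameters participate in each forward pass, compute per query scales with the active parameter count rather than the total, which is the economy the article describes.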

Training a large AI model often comes down to decisions that are both emotional and technical. Late in the process, the Databricks team faced a multimillion-dollar choice about how to get the most out of the model.

After two months of training the model on 3,072 powerful Nvidia H100 GPUs rented from a cloud provider, DBRX was already scoring high on several benchmarks, but the team still had roughly another week of supercomputer time left to use.

The team brainstormed on Slack about how to use the remaining week of supercomputer time. Ideas included creating a version of the model tuned to generate computer code, and a much smaller version for hobbyists. Another option was to stop making the model bigger and instead feed it carefully curated data to boost its proficiency on a specific set of capabilities, an approach called curriculum learning. The last option was simply to continue training as is, in hopes of improving the model across the board; one team member fond of this approach dubbed it the “dive in headfirst” option.
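The curriculum-learning option amounts to changing what data the model sees as training progresses. A minimal sketch of such a data schedule follows; the mixture names, proportions, and ramp shape are hypothetical, not Databricks’ actual recipe. It shows the essential mechanic: later training steps sample more heavily from curated, capability-focused data.

```python
# Minimal sketch of a curriculum-style data schedule. The data sources,
# proportions, and linear ramp are hypothetical illustrations -- not the
# actual recipe used to train DBRX.
def data_mix(step, total_steps):
    """Return sampling weights over two hypothetical data pools.

    Early in training the model mostly sees broad web text; as training
    progresses, the share of curated capability-focused data ramps up.
    """
    progress = step / total_steps          # 0.0 at start, 1.0 at the end
    curated = 0.1 + 0.6 * progress         # ramp curated share from 10% to 70%
    return {"web_text": 1.0 - curated, "curated": curated}

early = data_mix(0, 10_000)       # start of training: mostly web text
late = data_mix(10_000, 10_000)   # end of training: mostly curated data
```

The design choice here is that the schedule depends only on training progress, so it can be decided up front and costs nothing at runtime; fancier curricula adjust the mix based on the model’s measured performance instead.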

While the discussion remained friendly, strong opinions bubbled up as different engineers pushed for their favored approach. In the end, Frankle deftly ushered the team toward the data-centric approach. And two weeks later, it appeared to have paid off handsomely. “The curriculum learning was better; it made a meaningful difference,” Frankle says.

Frankle was less successful in predicting other outcomes from the project. He had doubted DBRX would prove particularly good at generating computer code because the team didn’t explicitly focus on that. He even felt sure enough to say he’d dye his hair blue if he was wrong. Monday’s results revealed that DBRX was better than any other open AI model on standard coding benchmarks. “We have a really good code model on our hands,” he said during Monday’s big reveal. “I’ve made an appointment to get my hair dyed today.”

The final version of DBRX is the most powerful AI model released openly to date, available for anyone to use or modify. (At least if they aren’t a company with more than 700 million users, a restriction Meta also places on its own open source AI model Llama 2.) Recent debate about the potential dangers of more powerful AI has sometimes centered on whether making AI models open to anyone could be too risky. Some experts have suggested that open models could too easily be misused by criminals or terrorists intent on committing cybercrime or developing biological or chemical weapons. Databricks says it has already conducted safety tests of its model and will continue to probe it.

Stella Biderman, executive director of EleutherAI, a collaborative research project dedicated to open AI research, says there is little evidence suggesting that openness increases risks. She and others have argued that we still lack a good understanding of how dangerous AI models really are or what might make them dangerous—something that greater transparency might help with. “Oftentimes, there’s no particular reason to believe that open models pose substantially increased risk compared to existing closed models,” Biderman says.

EleutherAI joined Mozilla and roughly 50 other organizations and scholars in signing an open letter to US secretary of commerce Gina Raimondo, asking her to ensure that future AI regulation leaves room for open source AI projects. The letter argued that open models foster economic growth by helping startups and small businesses, and also help accelerate scientific research.

As for Databricks, the company hopes DBRX will do two things: give other AI researchers a new model to explore, along with useful tips for building their own, and help deepen scientific understanding of AI itself. Frankle says his team plans to study how the model changed during that final week of training, which could reveal how a powerful model picks up additional capabilities. “The part that excites me the most is the science we get to do at this scale,” he says.
