OpenAI Unveils “Strawberry”: A Revolutionary AI Model Designed to Tackle Complex Problems Step by Step

OpenAI made the last big breakthrough in artificial intelligence by increasing the size of its models to dizzying proportions when it introduced GPT-4 last year. Today the company announced a new advance that signals a shift in approach: a model that can “reason” logically through many difficult problems and that is significantly smarter than existing AI without a major scale-up.

The new model, dubbed OpenAI o1, can solve problems that stump existing AI models, including OpenAI’s most powerful existing model, GPT-4o. Rather than summon up an answer in one step, as a large language model normally does, it reasons through the problem, effectively thinking out loud as a person might, before arriving at the right result.
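OpenAI has not disclosed how o1 generates its chain of reasoning, but the general idea of eliciting intermediate steps before a final answer can be sketched with the publicly available OpenAI Python SDK. The snippet below is a minimal illustration only; the prompt wording and the use of GPT-4o are assumptions for demonstration, not a description of o1’s internal mechanism.

```python
# A minimal sketch of eliciting step-by-step reasoning from a chat model,
# using the public OpenAI Python SDK. This illustrates the general idea only;
# o1's internal reasoning process is not public, and the prompt wording and
# model choice below are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = ("Alice has 3 times as many apples as Bob. "
            "Together they have 48. How many does Bob have?")

# Conventional one-shot answer: the model replies directly.
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Prompted to reason first: intermediate steps precede the final answer.
stepwise = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + " Reason through the problem step by step, "
                              "then state the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(stepwise.choices[0].message.content)
```

The difference, according to OpenAI, is that o1 produces and refines this kind of intermediate reasoning on its own, rather than needing to be prompted into it.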

“This is what we consider the new paradigm in these models,” Mira Murati, OpenAI’s chief technology officer, tells WIRED. “It is much better at tackling very complex reasoning tasks.”

The new model was code-named Strawberry within OpenAI, and it is not a successor to GPT-4o but rather a complement to it, the company says.

Murati says OpenAI is also developing GPT-5, the next generation of its flagship model, which is expected to be larger than its predecessor. GPT-5 will not rely on the traditional scaling approach alone; it will incorporate the new reasoning technology as well. “There are two paradigms,” Murati says. “The scaling paradigm and this new reasoning paradigm. We are looking to integrate these approaches.”

Large language models (LLMs) like the one underpinning the new system generate responses using extensive neural networks trained on vast amounts of data. They demonstrate impressive language and logic capabilities, yet they sometimes fail at surprisingly simple tasks, such as basic arithmetic problems that require logical, step-by-step thinking.

Murati says OpenAI’s new model uses reinforcement learning, which rewards the system for correct responses and penalizes it for errors, to improve its reasoning. “The model sharpens its thinking and improves its strategies to find answers,” she notes. Reinforcement learning has enabled computers to play games with superhuman skill and to handle practical tasks such as designing computer chips. The technique is also a key ingredient in turning an LLM into a useful, reliable chatbot.
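As a toy illustration of the feedback loop Murati describes, consider reward-weighted sampling over candidate answers. This is a deliberately simplified sketch, not OpenAI’s actual training procedure, and every name and number in it is invented for demonstration.

```python
# Toy sketch of reinforcement learning as Murati describes it: reward correct
# answers, penalize wrong ones, and shift the policy accordingly. This is a
# simplified illustration, not OpenAI's training setup.
import random

CORRECT = "4"                             # the right answer to "2 + 2"
prefs = {"3": 1.0, "4": 1.0, "5": 1.0}    # policy starts out indifferent

for _ in range(2000):
    # Sample an answer in proportion to the current preference weights.
    answer = random.choices(list(prefs), weights=list(prefs.values()))[0]
    # Positive feedback for a correct answer, negative feedback for an error.
    reward = 1.0 if answer == CORRECT else -0.1
    prefs[answer] = max(0.01, prefs[answer] + 0.1 * reward)

print(prefs)  # nearly all of the weight ends up on "4"
```

Real systems update billions of neural-network weights rather than a three-entry table, but the underlying loop of sampling, scoring, and reinforcing is the same in spirit.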

Mark Chen, OpenAI’s vice president of research, demonstrated the new model for WIRED, using it to solve several problems that the company’s previous best model, GPT-4o, cannot. These included an intricate chemistry problem and a mathematical puzzle about the ages of a prince and a princess, which the new model worked through correctly, concluding that the prince is 30 and the princess is 40.
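The article does not reprint the puzzle itself, but a classic riddle consistent with that answer runs: “The princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages.” Treating that phrasing as an assumption, the arithmetic unwinds as follows:

```latex
% Worked solution, assuming the classic phrasing of the riddle above.
% Let p and q be the prince's and princess's current ages.
\begin{aligned}
t_1 &= q - \tfrac{p+q}{2} = \tfrac{q-p}{2}
  &&\text{years ago, the princess was half the sum of their ages;}\\
p - t_1 &= \tfrac{3p-q}{2}
  &&\text{the prince's age at that moment;}\\
t_2 &= (3p - q) - q = 3p - 2q
  &&\text{years from now, the princess is twice that age;}\\
p + t_2 &= 4p - 2q
  &&\text{the prince's age at that future moment;}\\
q &= 4p - 2q \;\Longrightarrow\; 3q = 4p.
\end{aligned}
```

The riddle fixes only the ratio of 4 to 3; a 30-year-old prince and a 40-year-old princess is its natural round solution.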

“The [new] model is learning to think for itself, rather than kind of trying to imitate the way humans would think,” as a conventional LLM does, Chen says.

OpenAI says its new model performs markedly better on a number of problem sets, including ones focused on coding, math, physics, biology, and chemistry. On the American Invitational Mathematics Examination (AIME), a test for math students, GPT-4o solved on average 12 percent of the problems while o1 got 83 percent right, according to the company.

The new model is slower than GPT-4o, and OpenAI says it does not always perform better—in part because, unlike GPT-4o, it cannot search the web and it is not multimodal, meaning it cannot parse images or audio.

Improving the reasoning capabilities of LLMs has been a hot topic in research circles for some time. Indeed, rivals are pursuing similar research lines. In July, Google announced AlphaProof, a project that combines language models with reinforcement learning for solving difficult math problems.

AlphaProof learned to reason about math problems by studying correct answers. The challenge in broadening this kind of learning is that not every problem a model might face comes with a ready-made answer. Chen says OpenAI has made significant strides toward a more general reasoning system. “I do think we have made some breakthroughs there; I think it is part of our edge,” Chen says. “It’s actually fairly good at reasoning across all domains.”

Noah Goodman, a Stanford professor who has published work on improving the reasoning abilities of LLMs, says the key to more generalized training may involve using a “carefully prompted language model and handcrafted data” for training. Being able to consistently trade speed for greater accuracy, he adds, would be a significant advance.

Yoon Kim, an assistant professor at MIT, says that how LLMs solve problems remains somewhat opaque, and that their step-by-step reasoning may differ fundamentally from human intelligence. That distinction matters as the technology becomes more widely adopted. “These are systems that would be potentially making decisions that affect many, many people,” he says. “The larger question is, do we need to be confident about how a computational model is arriving at the decisions?”

The technique OpenAI has introduced could also help make AI models better behaved. Murati says the new model is better at avoiding unpleasant or potentially harmful outputs because it can reason about the consequences of its actions. “If you think about teaching children, they learn much better to align to certain norms, behaviors, and values once they can reason about why they’re doing a certain thing,” she explains.

Oren Etzioni, a former professor at the University of Washington and a prominent expert in artificial intelligence, says it is essential to enable LLMs to carry out multi-step problem-solving, use tools, and tackle complex challenges. “Simply increasing size will not achieve these goals,” he says. He adds that even if reasoning is perfected, challenges such as hallucination and factual accuracy will remain.

Chen says the reasoning approach OpenAI has developed shows that advancing AI need not require ever-greater amounts of computation. “We believe this new approach will allow us to deliver smarter technology more cost-effectively,” he says, “which aligns closely with our foundational goal.”
