When ChatGPT was released in November 2022, it could only be accessed through the cloud because the model behind it was downright enormous.
Today I am running a similarly capable AI program on a MacBook Air, and it isn’t even warm. The shrinkage shows how rapidly researchers are refining AI models to make them leaner and more efficient. It also shows how going to ever larger scales isn’t the only way to make machines significantly smarter.
The model now infusing my laptop with ChatGPT-like wit and wisdom is called Phi-3-mini. It’s part of a family of smaller AI models recently released by researchers at Microsoft. Although it’s compact enough to run on a smartphone, I tested it by running it on a laptop and accessing it from an iPhone through an app called Enchanted that provides a chat interface similar to the official ChatGPT app.
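If you want to try something similar, the basic pattern is simple: pull the openly released Phi-3-mini weights and prompt them locally. The Python sketch below shows one way to do that with Hugging Face’s transformers library; the model ID, settings, and prompt are illustrative assumptions, not the exact setup described above.

```python
# Minimal local-inference sketch for Phi-3-mini using Hugging Face transformers.
# The model ID, prompt, and generation settings are illustrative assumptions.
# Recent transformers releases support Phi-3 natively; older ones may need
# trust_remote_code=True when loading the model.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"  # openly released ~3.8B-parameter model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

chat = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user",
     "content": "Explain in two sentences why a small language model can run on a laptop."},
]

# The pipeline applies the model's chat template before generating.
reply = chat(messages, max_new_tokens=128, do_sample=False)
print(reply[0]["generated_text"][-1]["content"])
```

Apps like Enchanted take the same idea further by wrapping a locally served model in a phone-friendly chat interface, so the model runs on your own hardware rather than in a data center.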
In a paper describing the Phi-3 family of models, Microsoft’s researchers say the model I used measures up favorably to GPT-3.5, the OpenAI model behind the first release of ChatGPT. That claim is based on measuring its performance on several standard AI benchmarks designed to measure common sense and reasoning. In my own testing, it certainly seems just as capable.
Microsoft announced a new “multimodal” Phi-3 model capable of handling audio, video, and text at its annual developer conference, Build, this week. That came just days after OpenAI and Google both touted radical new AI assistants built on top of multimodal models accessed via the cloud.
Microsoft’s Lilliputian family of AI models suggests it’s becoming possible to build all kinds of handy AI apps that don’t depend on the cloud. That could open up new use cases by allowing apps to be more responsive or private. (Offline algorithms are a key piece of the Recall feature Microsoft announced, which uses AI to make everything you ever did on your PC searchable.)
The Phi family also reveals something about the nature of modern AI, and perhaps how it can be improved. Sébastien Bubeck, a researcher at Microsoft involved with the project, tells me the models were built to test whether being more selective about what an AI system is trained on could provide a way to fine-tune its abilities.
Large language models like OpenAI’s GPT-4 or Google’s Gemini that power chatbots and other services are typically spoon-fed huge gobs of text siphoned from books, websites, and just about any other accessible source. Although the practice has raised legal questions, OpenAI and others have found that increasing the amount of text fed to these models, and the amount of computing power used to train them, can unlock new capabilities.
Bubeck, who is interested in the nature of the “intelligence” exhibited by language models, wondered whether paying closer attention to what a model is fed could sharpen its abilities without having to scale up its training data.
Last September, his team took a model roughly one-seventeenth the size of OpenAI’s GPT-3.5 and trained it on “textbook quality” synthetic data generated by a larger AI model, including snippets focused on particular domains such as programming. The resulting model showed surprising abilities for its size. “Surprisingly, what we saw is that we were able to beat GPT-3.5 at coding with this method,” he says. “That was very surprising to us.”
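In rough outline, the recipe looks like this: prompt a larger “teacher” model to write short textbook-style passages, then fine-tune a much smaller “student” model on that synthetic corpus. The sketch below illustrates the idea with off-the-shelf Hugging Face tooling; the specific models, prompts, and training settings are assumptions for illustration, not the ones Microsoft used.

```python
# Hypothetical sketch of the general recipe: a larger "teacher" model writes
# textbook-style synthetic examples, and a small "student" model is fine-tuned on them.
# Model names, prompts, and hyperparameters are illustrative assumptions only.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments, pipeline)
from datasets import Dataset

# Step 1: have a larger model write short, clear lessons on narrow topics.
teacher = pipeline("text-generation", model="microsoft/Phi-3-medium-4k-instruct")

topics = ["binary search", "recursion", "list comprehensions"]
synthetic_texts = []
for topic in topics:
    prompt = [{"role": "user",
               "content": f"Write a short, clear textbook passage with a Python example about {topic}."}]
    out = teacher(prompt, max_new_tokens=400, do_sample=True, temperature=0.7)
    synthetic_texts.append(out[0]["generated_text"][-1]["content"])

# Step 2: tokenize the synthetic corpus and fine-tune a much smaller student model.
student_id = "EleutherAI/pythia-160m"  # stand-in tiny model, not the one Microsoft used
tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)

def tokenize(example):
    enc = tokenizer(example["text"], truncation=True, max_length=512)
    enc["labels"] = enc["input_ids"].copy()  # standard causal-LM objective
    return enc

dataset = Dataset.from_dict({"text": synthetic_texts}).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="tiny-student",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
)
trainer.train()
```

In practice a corpus like this would run to billions of tokens and be heavily filtered for quality, but the shape of the pipeline is the same: curate what goes in, rather than simply scaling up how much goes in.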
Bubeck’s team at Microsoft has turned up other findings with this approach. One experiment showed that training an extremely small model on children’s stories allowed it to produce consistently coherent output, even though AI systems of that size usually generate gibberish when trained in the conventional way. Again, the result suggests that seemingly underpowered AI software can be made useful if it’s taught with the right material.
Bubeck says these results suggest that making future AI systems smarter will take more than simply scaling them up to ever larger sizes. And it seems likely that slimmed-down models like Phi-3 will be an important part of computing’s future. Running AI models “locally” on a smartphone, laptop, or PC reduces the latency and outages that can occur when queries have to travel to the cloud. It keeps your data on your device, and it could enable entirely new applications for AI that aren’t feasible under today’s cloud-centric model, such as AI woven deeply into a device’s operating system.
Apple is widely expected to unveil its long-awaited AI strategy at its WWDC conference next month, and it has previously boasted that its custom hardware and software allow machine learning to happen locally on its devices. Rather than go toe-to-toe with OpenAI and Google in building ever more enormous cloud AI models, it might think different by focusing on shrinking AI down to fit into its customers’ pockets.