The concept of a robot capable of handling various household tasks—ranging from unloading the dryer to folding laundry and tidying up a cluttered table—has often been relegated to the realm of science fiction. A notable representation of this fantasy is Rosey in The Jetsons.
Physical Intelligence, a startup based in San Francisco, is working to make that dream real. The company has demonstrated a single artificial intelligence model, trained on an unusually large amount of robot data, that can perform all of those household chores.
The achievement raises the possibility of bringing the kind of broad, general-purpose competence that AI applications such as ChatGPT display with text into the physical world around us.
The emergence of large language models (LLMs)—versatile learning algorithms that digest extensive data from literature and online sources—has significantly enhanced the functionality of chatbots. Meanwhile, Physical Intelligence aims to replicate this success in the physical realm by training a similar algorithm using vast amounts of robotic data.
“We have a versatile method that leverages information from various forms and types of robots, akin to how language models are trained,” states the company’s CEO, Karol Hausman.
In the past eight months, the company has focused on building its foundational model, known as π0 or pi-zero. This model was developed using extensive data collected from different robots performing a range of household tasks, often with human operators remotely guiding the robots to facilitate the learning process.
Physical Intelligence, often shortened to PI or π, was founded earlier this year by a group of prominent robotics researchers to pursue a new approach to robotics inspired by breakthroughs in AI's language abilities.
“The volume of data we are using for training is greater than any robotics model ever created, by a substantial margin, as far as we know,” remarks Sergey Levine, a cofounder of Physical Intelligence and an associate professor at UC Berkeley. “It’s not on the level of ChatGPT, but perhaps it is somewhere near GPT-1,” he continues, alluding to the first large language model created by OpenAI in 2018.
Videos recently released by Physical Intelligence show a range of robots performing household tasks with striking proficiency. One wheeled robot retrieves clothes from a dryer. A robotic arm clears a table loaded with cups and plates. Two robotic arms collaborate to fold laundry. The company's algorithm can also construct a cardboard box, with a robot carefully bending the sides and fitting the pieces together.
Courtesy of Physical Intelligence
According to Hausman, folding clothes poses a significant obstacle for robots, as it requires a deep understanding of the physical world. This task involves handling a diverse array of flexible items that can deform and crumple in unpredictable ways.
The algorithm incorporates some unexpectedly human-like behaviors; for instance, it shakes T-shirts and shorts to ensure they lie flat.
Hausman points out that the algorithm isn't flawless. Like modern chatbots, these robots occasionally make surprising and humorous mistakes. When tasked with loading eggs into a carton, for example, one robot packed in too many and forced the carton shut. In another instance, a robot abruptly hurled a box off a table instead of filling it.
The development of robots with broader capabilities is not merely a concept found in science fiction; it also represents a significant commercial opportunity.
Even with striking advances in AI over recent years, robots remain remarkably limited. The robots used in factories and warehouses typically follow rigidly scripted routines, with no ability to perceive their surroundings or adapt on the fly. The rare industrial robots that can see and manipulate objects manage only a few tasks with little finesse, largely because they lack general physical intelligence.
If robots were more versatile, they could take on a broader array of industrial jobs after just a few demonstrations. And to work effectively in the highly variable and chaotic environments of human households, they would need far more general capabilities.
The general enthusiasm surrounding advances in AI has fueled optimism about breakthroughs in robotics. Tesla, the electric-car company led by Elon Musk, is developing a humanoid robot called Optimus. Musk has recently suggested that the robot could sell for $20,000 to $25,000 and might be capable of performing most tasks by 2040.
Past attempts to teach robots complex tasks have mostly centered on training one robot for one specific task, because the learning did not seem to transfer. Recent academic research has shown, however, that with sufficient scale and fine-tuning, skills can transfer across tasks and across different robotic systems. A notable 2023 Google project called Open X-Embodiment demonstrated this by sharing robot learning across 22 different robots at 21 research laboratories.
A significant challenge for Physical Intelligence is that far less robot data is available for training than the vast amounts of text used to build large language models. The company must therefore generate its own data and devise techniques for learning more from a smaller dataset. To build π0, it combined vision-language models, which are trained on both images and text, with diffusion modeling, a technique borrowed from AI image generation, to enable a more general kind of learning.
For robots to effectively accept any chore requested by a human, this learning process must be greatly expanded. “There’s still a long way to go, but we have something that you can think of as scaffolding that illustrates things to come,” Levine says.