AI Pioneer Fei-Fei Li Is Betting That Spatial Intelligence Is the Next Frontier


According to market-fixated tech pundits and professional skeptics, the artificial intelligence bubble has popped, and winter’s back. Fei-Fei Li isn’t buying that. In fact, Li—who earned the sobriquet the “godmother of AI”—is betting on the contrary. She’s on a part-time leave from Stanford University to cofound a company called World Labs. While current generative AI is language-based, she sees a frontier where systems construct complete worlds with the physics, logic, and rich detail of our physical reality. It’s an ambitious goal, and despite the dreary nabobs who say progress in AI has hit a grim plateau, World Labs is on the funding fast track. The startup is perhaps a year away from having a product—and it’s not clear at all how well it will work when and if it does arrive—but investors have pitched in $230 million and are reportedly valuing the nascent startup at a billion dollars.

More than a decade ago, Li helped AI turn a corner by creating ImageNet, a bespoke database of digital images that allowed neural nets to get significantly smarter. She feels that today’s deep-learning models need a similar boost if AI is to create actual worlds, whether they’re realistic simulations or totally imagined universes. Future George R.R. Martins might compose their dreamed-up worlds as prompts instead of prose, creating realms you could then render and wander around in. “The physical world for computers is seen through cameras, and the computer brain behind the cameras,” Li says. “Turning that vision into reasoning, generation, and eventual interaction involves understanding the physical structure, the physical dynamics of the physical world. And that technology is called spatial intelligence.” World Labs calls itself a spatial intelligence company, and its fate will help determine whether that term becomes a revolution or a punch line.

Li has been obsessing over spatial intelligence for years. While everyone was going gaga over ChatGPT, she and a former student, Justin Johnson, were excitedly gabbling in phone calls about AI’s next iteration. “The next decade will be about generating new content that takes computer vision, deep learning, and AI out of the internet world, and gets them embedded in space and time,” says Johnson, who is now an assistant professor at the University of Michigan.

Li decided to start a company early in 2023, after a dinner with Martin Casado, a pioneer in virtual networking who is now a partner at Andreessen Horowitz. That’s the VC firm notorious for its near-messianic embrace of AI. Casado sees AI as being on a path similar to that of computer games, which started with text, moved to 2D graphics, and now have dazzling 3D imagery. Spatial intelligence will drive the change. Eventually, he says, “You could take your favorite book, throw it into a model, and then you literally step into it and watch it play out in real time, in an immersive way.” The first step to making that happen, Casado and Li agreed, is moving from large language models to large world models.

Li began assembling a team, with Johnson as a cofounder. Casado suggested two more people—one was Christoph Lassner, who had worked at Amazon, Meta’s Reality Labs, and Epic Games. He is the inventor of Pulsar, a rendering scheme that led to a celebrated technique called 3D Gaussian Splatting. That sounds like an indie band at an MIT toga party, but it’s actually a way to synthesize scenes, as opposed to one-off objects. Casado’s other suggestion was Ben Mildenhall, who had created a powerful technique called NeRF—neural radiance fields—that transmogrifies 2D pixel images into 3D graphics. “We took real-world objects into VR and made them look perfectly real,” he says. He left his post as a senior research scientist at Google to join Li’s team.
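(For the technically curious: the heart of NeRF, as laid out in the 2020 paper Mildenhall coauthored, is a single volume-rendering integral. In the sketch below, $\mathbf{r}(t)$ is a point along a camera ray, $\mathbf{d}$ is the ray’s viewing direction, $\sigma$ is the density the neural network predicts at that point, $\mathbf{c}$ is its predicted color, and $t_n$ and $t_f$ bound the ray’s near and far range.)

$$
C(\mathbf{r}) \;=\; \int_{t_n}^{t_f} T(t)\,\sigma\big(\mathbf{r}(t)\big)\,\mathbf{c}\big(\mathbf{r}(t),\mathbf{d}\big)\,dt,
\qquad
T(t) \;=\; \exp\!\left(-\int_{t_n}^{t}\sigma\big(\mathbf{r}(s)\big)\,ds\right)
$$

In practice the integral is approximated by sampling discrete points along each ray, and the network is trained by comparing the rendered color $C(\mathbf{r})$ against the pixels of real photographs.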

One obvious goal of a large world model would be imbuing, well, world-sense into robots. That indeed is in World Labs’ plan, but not for a while. The first phase is building a model with a deep understanding of three-dimensionality, physicality, and notions of space and time. Next will come a phase where the models support augmented reality. After that, the company can take on robotics. If this vision is fulfilled, large world models will improve autonomous cars, automated factories, and maybe even humanoid robots.

That’s a long way away, and no slam dunk. World Labs promises a product in 2025. When I pressed the founders on exactly what the product would be and who the projected customers were—stuff like how World Labs will make money—they emphasized that they’re just ramping up. “There are a lot of boundaries to push, a lot of unknowns,” says Li. “Of course, we’re the best team in the world to figure out these unknowns.”

Casado is a little more specific. As with ChatGPT or Anthropic’s Claude, he notes, a model can be the product—a platform that others either use directly or that hosts other apps. Customers might include game companies or movie studios. I remember writing about how Pixar used to spend endless resources on things like monster fur or the movement of water. Imagine doing that with a one-sentence prompt.

World Labs is not the only company tackling what some are calling physical AI. “Building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today,” Nvidia CEO Jensen Huang said earlier this year. I wrote recently about a company called Archetype that was also pursuing that line. But Casado insists that the ambition, talent, and vision of World Labs are unique. “I’ve been investing for almost 10 years, and this is the single best team I’ve ever, ever run across,” he says. It’s common for a VC to boost his bets, but he’s putting more than money into this one: For the first time since he became a VC, he’s a part-time team member, spending a day a week at the company.

Other VC firms are also chipping in, including Radical Ventures, NEA, and (surprise) Nvidia’s venture capital arm, as well as an all-star list of angels that features Marc Benioff, Reid Hoffman, Jeff Dean, Eric Schmidt, Ron Conway, and Geoff Hinton. (So you’ve got the godfather of AI backing the field’s godmother.) Susan Wojcicki also invested before her untimely passing last month.

Can all those smart people be wrong? Of course. You don’t have to squint too hard to see how the promises of World Labs overlap with a recent buzzword that debuzzed rather dramatically: the metaverse. The World Labs founders argue that the short-lived craze was premature, a blip based on some promising hardware that didn’t have the right interactive content. Large world models, they imply, could solve that problem. Presumably, none of those worlds would visualize AI as stuck on a plateau.

Last year, Fei-Fei Li published The Worlds I See, a mix of memoir and artificial intelligence insight. I wrote about the book, and my conversations with her, in a piece titled “Fei-Fei Li Started an AI Revolution by Seeing Like an Algorithm.” Now she aspires to build worlds not yet seen.

Li is reserved about her personal life. Though uncomfortable speaking publicly about herself, she skillfully tells the story of an immigrant who moved to the U.S. at age 16 without knowing English and surmounted challenge after challenge to become a leading figure in AI. She has served as director of the Stanford AI Lab and as chief scientist of AI and machine learning at Google Cloud. The book intertwines her personal story with the evolution of AI, like a double helix, reflecting on how we see ourselves and how technology frames that perspective. “The hardest world to see is ourselves,” Li observes.

The most significant part of her story is the development and deployment of ImageNet. Li describes her resolve in the face of colleagues’ skepticism that millions of images could ever be annotated across such a vast range of categories. The seemingly impossible task was accomplished with the help of many, including workers on Amazon’s Mechanical Turk. She ties the achievement back to her personal life, particularly her supportive parents, who encouraged her to chase her scientific dreams rather than settle for a lucrative corporate job. The project stood as a testament to their sacrifices.

Tom asks, “Smartphone etiquette went from cautious early norms to today’s ubiquitous use. How do you foresee the etiquette of AR headgear shaping up in public spaces?”

Hi, Tom. Thanks for the question. Etiquette for AR won’t be as straightforward as it is with phones, where it’s obvious when a screen has captured someone’s attention. Augmented reality will hit its stride when it can be built into products like Meta’s popular Ray-Ban glasses, which currently lack AR displays but are expected to get them. At that point, much of what we now do on our phones will move to heads-up displays.

It won’t be immediately obvious that people are engaging more with TikTok, texts, and Candy Crush behind their sunglasses than with the people around them. Public spaces may not look like everyone is mentally elsewhere, but they will be. I foresee haptics becoming crucial: a buzz to tell people their train is boarding, that they’re blocking an entryway, or that they’ve just been pickpocketed. And expect exchanges like this at many a restaurant table: “Did you catch what I just said?” [Silence.] “ARE YOU LISTENING TO ME?” [Brief pause, a tap on the side panel of the glasses.] “Yes, of course I’m listening.”

My etiquette forecast? People might start texting each other even when they’re physically together, because messages delivered straight to the eyeball and earpiece will feel more immersive than speech. Today’s complaints about mobile phone use will seem minor compared to the disruptions that lie ahead.

You can submit questions to mail@wired.com. Please include ASK LEVY in the subject line.

How can it get any hotter? Just wait.

Here’s everything announced at Apple’s September event.

While the iPhone 16 got the attention, AirPods that act like hearing aids might have been Apple’s most significant move.

Residents of a Texas oil town aren’t so neighborly when a bitcoin mine moves in.

According to Mark Cuban, Mark Cuban is not having a midlife crisis.

