Written by Will Knight
A week after its algorithms advised people to eat rocks and put glue on pizza, Google admitted Thursday that it needed to make adjustments to its bold new generative AI search feature. The episode highlights the risks of Google’s aggressive drive to commercialize generative AI—and also the treacherous and fundamental limitations of that technology.
Google’s AI Overviews feature draws on Gemini, a large language model like the one behind OpenAI’s ChatGPT, to generate written answers to some search queries by summarizing information found online. The current AI boom is built around LLMs’ impressive fluency with text, but the software can also use that facility to put a convincing gloss on untruths or errors. Using the technology to summarize online information promises to make search results easier to digest, but it is hazardous when online sources are contradictory or when people may use the information to make important decisions.
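In broad strokes, features like this follow a retrieval-augmented generation pattern: the system fetches snippets from web search results, packs them into the model’s prompt, and asks the LLM to write a summary grounded in those snippets. The sketch below is a minimal illustration of that pattern, not Google’s implementation; fetch_snippets and call_llm are hypothetical stand-ins for a search index and an LLM API.

```python
# Minimal sketch of retrieval-augmented summarization (illustrative only;
# not Google's implementation). fetch_snippets and call_llm are
# hypothetical stand-ins for a search index and an LLM API.

def fetch_snippets(query: str) -> list[dict]:
    """Stand-in for a web search index: returns source snippets for a query."""
    return [
        {"url": "https://example.com/a", "text": "Cheese slides off pizza..."},
        {"url": "https://example.com/b", "text": "A joke suggesting glue..."},
    ]

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM API call; a real system would call a model here."""
    return "A short answer synthesized from the numbered sources."

def answer_query(query: str) -> str:
    snippets = fetch_snippets(query)
    # The model sees only what retrieval hands it, so a satirical article or
    # a joke Reddit comment in the snippets can surface in the final answer.
    sources = "\n".join(
        f"[{i}] {s['url']}\n{s['text']}" for i, s in enumerate(snippets, 1)
    )
    prompt = (
        "Answer the question using only the numbered sources, citing them.\n\n"
        f"{sources}\n\nQuestion: {query}\nAnswer:"
    )
    return call_llm(prompt)

print(answer_query("how do I keep cheese from sliding off pizza?"))
```

The failure mode in the glue-pizza episode follows directly from this design: if retrieval surfaces a joke, the summarizer has no independent way to know it is one.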
“You can get a quick snappy prototype now fairly quickly with an LLM, but to actually make it so that it doesn’t tell you to eat rocks takes a lot of work,” says Richard Socher, who made key contributions to AI for language as a researcher and, in late 2021, launched an AI-centric search engine called You.com.
Socher says LLMs are difficult to wrangle because they lack a real understanding of the world and because the web is riddled with unreliable information. In some cases, he adds, it is better to give users several perspectives than a single answer.
In a recent blog post, Liz Reid, Google’s head of search, wrote that the company tested AI Overviews extensively before launch. But she said errors like the rock-eating and glue-pizza answers, which drew on a satirical article and a joking Reddit comment respectively, had prompted further changes. Google says those adjustments include better detection of nonsensical queries and less reliance on user-generated content.
Socher says You.com routinely avoids the kinds of errors displayed by Google’s AI Overviews because his company has developed about a dozen techniques to keep LLMs from misbehaving when they are used for search.
“We achieve greater accuracy because we invest heavily in it,” says Socher. Among other things, he says, You.com uses a custom-built web index designed to help LLMs steer clear of incorrect information, selects from a range of different LLMs depending on the query, and employs a citation mechanism that can surface and explain contradictory sources. Even so, getting AI search right remains tricky. On Friday, WIRED found that You.com failed on a query it has been shown to handle correctly in the past, incorrectly stating that no African nation’s name begins with the letter “K.”
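Socher doesn’t spell out how those safeguards work internally. As a rough illustration only, assuming they amount to source filtering, per-query model routing, and conflict detection, such guardrails might compose like this (every name and the allowlist here is hypothetical):

```python
# Hypothetical sketch of LLM-search guardrails of the kind Socher describes;
# You.com has not published its implementation, so every detail is assumed.

TRUSTED_DOMAINS = {"who.int", "nih.gov", "reuters.com"}  # assumed allowlist

def filter_sources(snippets: list[dict]) -> list[dict]:
    """Keep only snippets from curated domains, a crude stand-in for a
    custom-built web index that steers the model away from bad sources."""
    return [s for s in snippets if s["domain"] in TRUSTED_DOMAINS]

def pick_model(query: str) -> str:
    """Route queries to different models; a real router might use a
    trained classifier rather than keyword matching."""
    if any(w in query.lower() for w in ("dose", "symptom", "treatment")):
        return "high-accuracy-medical-model"
    return "general-purpose-model"

def sources_conflict(snippets: list[dict]) -> bool:
    """Stand-in for a citation feature that flags disagreement so the UI
    can show multiple perspectives instead of a single confident answer."""
    return len({s["claim"] for s in snippets}) > 1

snippets = [
    {"domain": "who.int", "claim": "safe in small doses"},
    {"domain": "jokes.example", "claim": "eat one rock a day"},  # filtered out
]
usable = filter_sources(snippets)
print(pick_model("what is a safe ibuprofen dose?"))
print("show multiple perspectives" if sources_conflict(usable) else "one answer")
```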
Google’s decision to weave generative AI into its flagship product is part of its response to the upheaval set off by OpenAI’s release of ChatGPT in November 2022. Shortly after ChatGPT debuted, Microsoft used its partnership with OpenAI to add similar technology to its Bing search engine. Despite Bing’s own problems with AI inaccuracies and odd responses, Microsoft CEO Satya Nadella framed the move as a challenge to Google, saying, “I want people to know we made them dance.”
Getting AI-generated answers to be consistently accurate is an inherently hard problem.
Some critics believe Google rushed its AI upgrade, especially given that search results can inform medical and financial decisions. “I’m surprised they launched it as it is for as many queries—I thought they’d be more cautious,” says Barry Schwartz, news editor at Search Engine Land, a publication that tracks the search industry. He says the company should have better anticipated that some people would intentionally try to trip up AI Overviews, and that showing the results by default on Google’s most important product demanded more care.
Lily Ray, an SEO consultant who was a beta tester of AI Overviews’ precursor, known as Search Generative Experience, says she wasn’t surprised by the errors. “I think it’s virtually impossible for it to always get everything right,” she says, calling such mistakes an inevitable pitfall of the technology.
Even if blatant errors like suggesting people eat rocks become less common, AI search can fail in subtler ways. Ray has documented problems with AI Overviews, including summaries that sometimes draw on poor sources, such as sites from another region or even defunct websites. She says that could make the results less useful for people hunting for product recommendations, for instance. Those who work on optimizing content for Google’s search algorithm are still trying to understand what’s going on. “Within our industry right now, the level of confusion is off the charts,” she says.
Even if industry experts and consumers get more familiar with how the new Google search behaves, don’t expect it to stop making mistakes. Daniel Griffin, a search consultant and researcher who is developing tools to make it easy to compare different AI-powered search services, says that Google faced similar problems when it launched Featured Snippets, which answered queries with text quoted from websites, in 2014.
Griffin says he expects Google to iron out some of the most glaring problems with AI Overviews, but that it’s important to remember no one has solved the problem of LLMs failing to grasp what is true, or their tendency to fabricate information. “It’s not just a problem with AI,” he says. “It’s the web, it’s the world. There’s not really a truth, necessarily.”