As generative AI systems like OpenAI’s ChatGPT and Google’s Gemini become more advanced, they are increasingly being put to work. Startups and tech companies are building AI agents and ecosystems on top of the systems that can complete boring chores for you: think automatically making calendar bookings and potentially buying products. But as the tools are given more freedom, it also increases the potential ways they can be attacked.
Now, in a demonstration of the risks of connected, autonomous AI ecosystems, a group of researchers have created one of what they claim are the first generative AI worms—which can spread from one system to another, potentially stealing data or deploying malware in the process. “It basically means that now you have the ability to conduct or to perform a new kind of cyberattack that hasn’t been seen before,” says Ben Nassi, a Cornell Tech researcher behind the research.
Nassi, along with fellow researchers Stav Cohen and Ron Bitton, created the worm, dubbed Morris II, as a nod to the original Morris computer worm that caused chaos across the internet in 1988. In a research paper and website shared exclusively with WIRED, the researchers show how the AI worm can attack a generative AI email assistant to steal data from emails and send spam messages—breaking some security protections in ChatGPT and Gemini in the process.
The research, conducted in test environments and not against a publicly available email assistant, indicates that Large Language Models (LLMs) are becoming increasingly multimodal, able to generate not just text but images and video as well. Although generative AI worms haven’t been spotted in the wild yet, multiple researchers argue that they present a security risk that startups, developers, and tech companies should remain aware of.
Typically, generative AI systems function through prompts – text instructions that guide the tools to answer a question or create an image. These prompts, however, can also be manipulated to work against the system. Jailbreaks can cause the system to disengage its safety rules and produce toxic or hateful content, while prompt injection attacks can feed a chatbot covert instructions. For instance, an aggressor may conceal text on a webpage instructing an LLM to function as a scammer and gather your bank information.
The researchers constructed the generative AI worm using a so-called “adversarial self-replicating prompt.” This is a prompt that prompts the generative AI model to produce another prompt in its response. Essentially, the AI system is compelled to generate further instruction sets in its replies. According to the researchers, this is broadly akin to traditional SQL injection and buffer overflow attacks.
To demonstrate how the worm functions, the researchers devised an email system capable of transmitting and receiving messages utilizing generative AI, using ChatGPT, Gemini, and an open-source LLM, LLaVA. They then discovered two methods to exploit the system – either by using a text-based self-replicating prompt or by embedding a self-replicating prompt within an image.
Steven Levy
Aarian Marshall
Byron Tau
Aarian Marshall
In one example, researchers portraying themselves as attackers composed an email containing a dangerous text prompt. This prompts, in essence, “taints” the database of an email assistant utilizing retrieval-augmented generation (RAG), a method for LLMs to acquire supplementary data from external sources. Upon retrieval by the RAG in response to a user enquiry, the email is then sent to GPT-4 or Gemini Pro to formulate the answer, causing a “jailbreak” of the GenAI service and ultimately extracting data from the emails, according to Nassi. Nassi further explains, “The resulting response containing confidential user data subsequently corrupts new hosts when utilized as a reply to an email sent to a new customer and subsequent storage within the new client’s database.”
In the second method as described by the researchers, an image containing an embedded harmful prompt leads to the email assistant forwarding the message. In Nassi’s words, “When the self-replicating prompt is encoded within the image, any kind of image, whether spam, abusive material, or even propaganda, can be passed on to new clients following the original email dispatch.”
In a video showcasing the research, the email system can be observed forwarding a message numerous times. Further, the researchers claimed their ability to extract data from the emails. They stated, “This data can range from names, phone numbers, credit card details, social security numbers, and any other classified data,” according to Nassi.
While the research does bypass safety measures of ChatGPT and Gemini, the researchers insist that this serves as an alarm regarding “substandard architectural design” within the greater AI landscape. Regardless, they have communicated their findings to both Google and OpenAI. A representative from OpenAI mentions that the team appears to have discovered how to exploit prompt-injection vulnerabilities by relying on unchecked or unfiltered user input, and adds the company is striving to make its systems “more robust”. They recommend that developers “employ methods that ensure they are not dealing with harmful input”. Google, however, refrained from commenting on the research findings. Messages shared by Nassi with WIRED indicate that the company’s researchers have requested a discussion on the topic.
Steven Levy
Aarian Marshall
Byron Tau
Aarian Marshall
While the demonstration of the worm takes place in a largely controlled environment, multiple security experts who reviewed the research say that the future risk of generative AI worms is one that developers should take seriously. This particularly applies when AI applications are given permission to take actions on someone’s behalf—such as sending emails or booking appointments—and when they may be linked up to other AI agents to complete these tasks. In other recent research, security researchers from Singapore and China have shown how they could jailbreak 1 million LLM agents in under five minutes.
Sahar Abdelnabi, a researcher at the CISPA Helmholtz Center for Information Security in Germany, who worked on some of the first demonstrations of prompt injections against LLMs in May 2023 and highlighted that worms may be possible, says that when AI models take in data from external sources or the AI agents can work autonomously, there is the chance of worms spreading. “I think the idea of spreading injections is very plausible,” Abdelnabi says. “It all depends on what kind of applications these models are used in.” Abdelnabi says that while this kind of attack is simulated at the moment, it may not be theoretical for long.
In a paper covering their findings, Nassi and the other researchers say they anticipate seeing generative AI worms in the wild in the next two to three years. “GenAI ecosystems are under massive development by many companies in the industry that integrate GenAI capabilities into their cars, smartphones, and operating systems,” the research paper says.
Despite this, there are ways people creating generative AI systems can defend against potential worms, including using traditional security approaches. “With a lot of these issues, this is something that proper secure application design and monitoring could address parts of,” says Adam Swanda, a threat researcher at AI enterprise security firm Robust Intelligence. “You typically don’t want to be trusting LLM output anywhere in your application.”
Swanda also says that keeping humans in the loop—ensuring AI agents aren’t allowed to take actions without approval—is a crucial mitigation that can be put in place. “You don’t want an LLM that is reading your email to be able to turn around and send an email. There should be a boundary there.” For Google and OpenAI, Swanda says that if a prompt is being repeated within its systems thousands of times, that will create a lot of “noise” and may be easy to detect.
Nassi and the research reiterate many of the same approaches to mitigations. Ultimately, Nassi says, people creating AI assistants
need to be aware of the risks. “This is something that you need to understand and see whether the development of the ecosystem, of the applications, that you have in your company basically follows one of these approaches,” he says. “Because if they do, this needs to be taken into account.”