Understanding AI Code Hallucinations: The Rising Threat of ‘Package Confusion’ Attacks

AI-generated code frequently references third-party libraries that do not exist, according to new research. Attackers can exploit these phantom references to mount supply-chain attacks, publishing malicious packages under the hallucinated names so that unsuspecting software pulls them in, with risks ranging from data theft to unauthorized access.

A study of 16 widely used large language models (LLMs) generated 576,000 code samples and found that roughly 440,000 of the package dependencies they referenced were “hallucinated”—that is, the packages did not exist. Open-source models were especially prone to this, with nearly 22% of their dependencies pointing to non-existent libraries.

These hallucinations compound the danger of dependency confusion attacks, a tactic that tricks software into installing a malicious package masquerading as a legitimate one. An attacker publishes a package under the same name as a trusted dependency but with a higher version number; installers that simply prefer the newest version then fetch the malicious copy, unknowingly exposing users to risk.
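The version-preference mechanism behind this can be sketched with a toy resolver. This is a deliberately simplified model (real installers such as pip apply more nuanced index-priority rules), and the package names and version numbers below are invented for illustration:

```python
# Simplified sketch of why dependency confusion works: a naive resolver
# merges candidates from several indexes and picks the highest version.
# Package names and versions here are invented for illustration.

def parse_version(v):
    """Turn a version string like '1.0.2' into a comparable tuple (1, 0, 2)."""
    return tuple(int(part) for part in v.split("."))

def resolve(name, indexes):
    """Pick the highest-versioned candidate for `name` across all indexes."""
    candidates = []
    for index_name, packages in indexes.items():
        if name in packages:
            candidates.append((parse_version(packages[name]), index_name))
    if not candidates:
        raise LookupError(f"no package named {name!r}")
    version, source = max(candidates)  # tuple comparison: highest version wins
    return source, version

# A trusted package exists at version 1.0.2 on the internal index; an
# attacker publishes the same name publicly with a higher version number.
indexes = {
    "internal": {"acme-utils": "1.0.2"},
    "public":   {"acme-utils": "99.0.0"},   # malicious lookalike
}

source, version = resolve("acme-utils", indexes)
print(source, version)  # the public (malicious) copy wins the resolution
```

The same preference-for-newest logic is what lets a squatted hallucinated name win out once it appears in any index the installer consults.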

Package hallucination is a specific instance of a broader problem in AI: LLMs sometimes produce outputs that are factually incorrect or unsupported by their training data. Researchers coined the term “package hallucination” to name the specific case of references to non-existent packages.

The evaluation covered Python and JavaScript, generating code samples at scale, and found that 19.7% of package references pointed to non-existent packages. Notably, many hallucinated names recurred consistently across repeated queries, making them especially attractive targets for squatting by malicious actors.

Hallucination rates varied by model and language: open-source models hallucinated packages nearly 22% of the time, versus just over 5% for commercial models. The two ecosystems also diverged notably; JavaScript, which hosts a larger and more sprawling package namespace, produced a higher hallucination rate than Python.

These findings underscore that LLM output cannot be trusted at face value. As AI-generated code becomes a larger share of software development, developers must remain vigilant, vetting generated dependencies and critically reviewing code before shipping it, to avoid opening the door to supply-chain compromise.
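One practical safeguard implied here is to verify every generated dependency before installing it. A minimal sketch, assuming a locally maintained allowlist of vetted package names (a real pipeline might instead query the registry directly, e.g. PyPI's JSON API, before trusting a name):

```python
def vet_requirements(requirements, allowlist):
    """Split generated dependency names into vetted and suspect lists.

    `allowlist` is assumed to be a set of package names your team has
    already reviewed; anything outside it needs manual inspection before
    it is ever passed to an installer.
    """
    vetted = [name for name in requirements if name in allowlist]
    suspect = [name for name in requirements if name not in allowlist]
    return vetted, suspect

# Invented example: a dependency list produced by an LLM. The unfamiliar
# name may be a hallucination, or a squatted package, and must be checked.
allowlist = {"requests", "numpy"}
vetted, suspect = vet_requirements(["requests", "secure-json-parse"], allowlist)
print("install:", vetted)
print("review first:", suspect)
```

The design choice to fail closed (unknown names go to `suspect`, never straight to install) is what protects against both hallucinated and maliciously registered packages.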
