In late 2023, researchers uncovered a troubling flaw in OpenAI's GPT-3.5 model. When asked to repeat certain words a thousand times, the model complied at first, then devolved into spitting out incoherent text along with snippets of personal information drawn from its training data, including names, phone numbers, and email addresses. The team worked with OpenAI to ensure the flaw was fixed before disclosing it publicly; it is just one of many problems found in major AI models in recent years.
In a recently announced proposal, more than 30 prominent AI researchers, some of whom helped find the GPT-3.5 flaw, call for better ways to identify and report vulnerabilities in AI models. They propose a structured system under which outside researchers are permitted to probe these models and to publicly disclose any flaws they find.
Shayne Longpre, a PhD candidate at MIT and lead author of the proposal, describes the current environment as "the Wild West": people who try to expose AI vulnerabilities risk repercussions, including legal action for violating a provider's terms of service, so flaws often go unreported even though they could affect users broadly.
Because AI technologies are now so widely deployed, the researchers stress the need for rigorous safety testing. Without it, powerful models can perpetuate harmful biases or produce dangerous outputs, for example by aiding malicious actors or behaving in unintended ways.
The proposal outlines three measures to improve how AI flaws are reported: adopting standardized flaw-report formats, having AI firms provide the infrastructure third-party testers need, and building systems for sharing identified flaws across platforms. The approach borrows from established practice in cybersecurity, where external researchers who disclose bugs enjoy legal protections and well-understood norms.
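The article does not spell out what a standardized flaw report would contain, but a minimal sketch can make the idea concrete. The Python example below is purely illustrative: the field names (model, flaw_type, reproduction_steps, severity, and so on) are assumptions for the sake of the example, not the schema the researchers propose.

```python
# Hypothetical sketch of a standardized AI flaw report.
# Field names and structure are illustrative assumptions,
# not the format proposed by the researchers.
import json
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List


@dataclass
class FlawReport:
    model: str                     # e.g. "gpt-3.5-turbo"
    model_version: str             # provider's version identifier
    flaw_type: str                 # e.g. "training-data leakage", "bias", "jailbreak"
    summary: str                   # short description of the problem
    reproduction_steps: List[str]  # prompts or actions needed to trigger the flaw
    severity: str                  # e.g. "low" / "medium" / "high"
    reported_on: str = field(default_factory=lambda: date.today().isoformat())
    disclosed_publicly: bool = False  # set to True only after the provider ships a fix

    def to_json(self) -> str:
        """Serialize the report so it can be filed with a provider
        or shared with other providers whose models may be affected."""
        return json.dumps(asdict(self), indent=2)


# Example: a report loosely modeled on the GPT-3.5 incident described above.
# The version string is a placeholder, not a real OpenAI release identifier.
report = FlawReport(
    model="gpt-3.5-turbo",
    model_version="placeholder-version",
    flaw_type="training-data leakage",
    summary="Repeating a single word many times causes the model to emit "
            "incoherent text containing personal data from its training set.",
    reproduction_steps=["Ask the model to repeat a chosen word a very large number of times."],
    severity="high",
)
print(report.to_json())
```

A machine-readable format along these lines would also make the third element of the proposal, sharing identified flaws across platforms, straightforward to automate.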
Large AI companies already test their models extensively before release, but they may not have the resources to catch every issue. The researchers' initiative, whose contributors include researchers at MIT and Stanford, aims to establish a formal process through which flaws can be reported and companies held accountable for the integrity of their AI systems. Ruth Appel, a postdoctoral fellow at Stanford, warned that without such a system, users may end up with worse or even dangerous products because flaws go undiscovered.
The call for a structured reporting process comes at a critical moment, given the uncertain future of the US government's AI Safety Institute. Established under the Biden administration to vet the most powerful AI models for serious flaws, the institute now faces budget cuts.
The proposal's authors have already discussed the idea with researchers at major AI firms, including OpenAI and Google, as they push to change how AI models are evaluated and how their flaws are disclosed. The broader aim is greater transparency around AI safety, with stronger protections for users and more reliable AI systems.