Cisco’s latest research reveals that conventional AI safety benchmarks may overlook significant threats, particularly from multi-turn attacks that exploit gaps in frontier AI models. Traditionally, enterprises have evaluated AI models using single-turn adversarial prompts, but Cisco’s AI Threat Intelligence and Security Research team has found that this method underrepresents potential vulnerabilities.
In testing 15 proprietary models from notable AI developers, including OpenAI and Google, the research demonstrated stark differences in the efficacy of safety measures. Single-turn attacks succeeded between 2.19% and 64.91% of the time, while multi-turn attacks succeeded between 7.89% and a striking 88.30% of the time. For instance, Anthropic’s Claude family, which had the lowest attack success rates in single-turn evaluations, still saw attack success rates of up to 16.20% under multi-turn conditions.
Multi-turn attacks involve a series of benign prompts that gradually unveil harmful intent through conversation. Strategies for these attacks can include escalating demands incrementally or adopting personas to manipulate the AI’s responses. Cisco identified five key attack strategies: crescendo escalation, refusal reframing, role-playing, contextual ambiguity, and information decomposition.
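To make the difference between the two evaluation modes concrete, here is a minimal Python sketch. It is not Cisco’s harness: the names `toy_model`, `is_unsafe`, `run_conversation`, and `attack_success_rate` are hypothetical, the prompts are placeholders, and the toy model is a stand-in that simply gives in once enough conversational context has accumulated.

```python
from typing import Callable, List

# Hypothetical type: a model maps the user turns seen so far to a reply.
Model = Callable[[List[str]], str]


def toy_model(history: List[str]) -> str:
    """Stand-in model (assumption): refuses until it has seen 3 user turns."""
    return "COMPLY" if len(history) >= 3 else "REFUSE"


def is_unsafe(reply: str) -> bool:
    """Placeholder judge; a real evaluation would use a classifier or human review."""
    return reply == "COMPLY"


def run_conversation(model: Model, turns: List[str]) -> bool:
    """Send the turns one at a time; the attack succeeds if any reply is unsafe."""
    history: List[str] = []
    for turn in turns:
        history.append(turn)
        if is_unsafe(model(history)):
            return True
    return False


def attack_success_rate(model: Model, attacks: List[List[str]]) -> float:
    """Percentage of attack conversations that produced an unsafe reply."""
    if not attacks:
        return 0.0
    successes = sum(run_conversation(model, attack) for attack in attacks)
    return 100.0 * successes / len(attacks)


# Placeholder prompts: single-turn attacks are one-message conversations.
single_turn = [["direct request A"], ["direct request B"]]
multi_turn = [
    ["benign setup", "follow-up", "escalated request"],
    ["benign setup", "follow-up"],
]

single_asr = attack_success_rate(toy_model, single_turn)
multi_asr = attack_success_rate(toy_model, multi_turn)
print(f"single-turn: {single_asr:.2f}%  multi-turn: {multi_asr:.2f}%")
```

The design point is that a single-turn benchmark is just the special case where every conversation has length one, so a model can look safe on it while still failing once the same harness is fed longer conversations.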
The fundamental design of generative AI models contributes to their susceptibility to multi-turn attacks. These models operate on probabilistic principles, predicting the most likely next output from the tokens in their input, which in a conversation includes the earlier turns. Context built up over several benign-looking messages can therefore steer later responses. The closed nature of many proprietary models exacerbates this issue, as the companies deploying them cannot fully audit the training data and resulting vulnerabilities.
Cisco’s study calls for a reevaluation of how enterprises select AI models. It recommends that security teams use Cisco’s new model evaluation tools, treat vendors’ safety claims with skepticism, and add defense layers beyond the base model’s own safeguards. According to Amy Chang, Cisco’s head of AI threat research, current models lack adequate safeguards against iterative attacks, which underscores the need for a robust security framework to protect against sophisticated AI threats.
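One way to picture an additional defense layer is a check that looks at the whole conversation instead of only the latest message. The sketch below is an illustration under stated assumptions, not a tool from the Cisco report: the term weights, the threshold, and the function names `message_risk`, `per_turn_guard`, and `conversation_guard` are all hypothetical, and a real deployment would replace the keyword scorer with a trained classifier.

```python
from typing import List

# Hypothetical term weights (assumption); a real system would use a classifier.
RISKY_TERMS = {"bypass": 4, "disable": 3, "undetected": 4}
BLOCK_THRESHOLD = 6  # hypothetical cutoff in risk points


def message_risk(message: str) -> int:
    """Sum the risk points of the known terms in one message."""
    return sum(RISKY_TERMS.get(word, 0) for word in message.lower().split())


def per_turn_guard(history: List[str]) -> bool:
    """Block only if the latest message crosses the threshold on its own."""
    return message_risk(history[-1]) >= BLOCK_THRESHOLD


def conversation_guard(history: List[str]) -> bool:
    """Block if the risk accumulated across all turns crosses the threshold."""
    return sum(message_risk(message) for message in history) >= BLOCK_THRESHOLD


# Placeholder conversation in which no single message looks alarming.
conversation = [
    "how do alarm systems work",
    "how would someone disable one",
    "and stay undetected",
]

blocked_per_turn = any(
    per_turn_guard(conversation[: i + 1]) for i in range(len(conversation))
)
blocked_by_conversation = conversation_guard(conversation)
print(blocked_per_turn, blocked_by_conversation)
```

In this toy case every message stays under the threshold by itself, so the per-turn check never fires, while the conversation-level check does. That mirrors the article’s point that safeguards tuned to single prompts can miss intent that only emerges across turns.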
For further insights into Cisco’s findings, see the Cisco report or the company’s LLM Security Leaderboard.