Over the past year, software engineer Jay Prakash Thakur has been experimenting with AI agents capable of independently performing tasks such as ordering meals and building mobile apps. These agentic systems, poised to transform how companies automate customer interactions, also raise a thorny legal question: who is accountable when an agent makes a costly mistake?
AI agents can execute tasks autonomously, letting businesses streamline operations such as handling customer inquiries or processing invoices. Tech giants like Microsoft envision these agents not merely fielding simple queries but managing complex functions with minimal human oversight. The industry anticipates a future in which teams of agents collaborate to replace entire workforces, a shift driven by the significant cost savings it promises. Market research firm Gartner, for instance, predicts that by 2029, agentic AI will autonomously resolve 80 percent of routine customer service issues.
Thakur, who by day works at Microsoft, has been building prototypes with Microsoft's AutoGen framework since his time at Amazon in 2024. His recent projects have underscored a pressing question: who is liable when an AI agent causes financial harm? Errors often arise when agents from different companies miscommunicate, making it difficult to establish where the fault lies. Thakur likens the task to reconstructing a conversation from different people's incomplete notes; often, pinpointing responsibility is nearly impossible.
At a recent legal conference, attorney Benjamin Softness noted that litigation typically targets the companies with the deepest pockets, which means firms will likely have to accept responsibility for their agents' mistakes even when external factors caused the errors. That prospect has already prompted the insurance industry to introduce policies covering AI-related mishaps.
In his experiments, Thakur has built systems that chain multiple agents together with minimal human involvement. One prototype, designed to replace software developers, used one agent to find the coding tools a project needed and another to summarize those tools' usage policies. During testing, the summarizing agent dropped crucial details about a service's usage limits, an omission that could have broken the system in a real deployment.
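For readers unfamiliar with AutoGen, here is a minimal sketch of how such a pipeline is typically wired together. The agent names, prompts, and model configuration are illustrative assumptions, not Thakur's actual code; only the AutoGen classes themselves (AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager) come from the framework's public API.

```python
# Minimal AutoGen (pyautogen) sketch of a multi-agent pipeline in the spirit
# of the prototype described above. Agent names, system prompts, and the
# overall flow are illustrative assumptions, not Thakur's actual code.
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

# One agent searches for candidate tools; another summarizes their usage policies.
tool_finder = AssistantAgent(
    name="tool_finder",
    system_message="Find coding tools that satisfy the stated requirements.",
    llm_config=llm_config,
)
policy_summarizer = AssistantAgent(
    name="policy_summarizer",
    system_message=(
        "Summarize each tool's terms of service. Always include usage limits "
        "and service restrictions."  # the step that silently failed in testing
    ),
    llm_config=llm_config,
)

# A user proxy kicks off the conversation without asking a human for input.
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)

group_chat = GroupChat(
    agents=[user_proxy, tool_finder, policy_summarizer],
    messages=[],
    max_round=8,
)
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)

user_proxy.initiate_chat(
    manager,
    message="Pick a code-search API we can use and summarize its usage policy.",
)
```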
In another project, Thakur simulated an ordering system for a restaurant, where customers could place complex orders through a chatbot. Most transactions ran smoothly, but errors crept in on the more detailed requests. Had real orders contained such mistakes, the consequences could have been severe, such as serving a dish containing an allergen to a customer who had disclosed a food allergy.
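One common safeguard in this kind of system is a deterministic check that sits outside the language model entirely. The sketch below is hypothetical: the Order data model, the MENU_INGREDIENTS table, and the allergen_conflicts helper are invented for illustration and do not describe Thakur's prototype.

```python
# Illustrative guard for the ordering scenario above: before an order is
# confirmed, a deterministic check compares the parsed items against the
# customer's declared allergens. All names and data here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Order:
    items: list[str]
    declared_allergens: set[str] = field(default_factory=set)

# Hypothetical ingredient lookup; in a real system this would come from a menu DB.
MENU_INGREDIENTS = {
    "peanut noodles": {"peanut", "gluten"},
    "garden salad": {"tree nut"},
}

def allergen_conflicts(order: Order) -> list[str]:
    """Return items whose known ingredients intersect the declared allergens."""
    return [
        item
        for item in order.items
        if MENU_INGREDIENTS.get(item, set()) & order.declared_allergens
    ]

order = Order(items=["peanut noodles"], declared_allergens={"peanut"})
if conflicts := allergen_conflicts(order):
    print(f"Blocked for human review: {conflicts}")  # fail closed, not open
```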
Thakur's shopping comparison agent ran into problems as well, a reminder of how fraught the path to automation remains. Even familiar AI systems such as chatbots already make notable errors, from committing companies to unintended legal obligations to mishandling sensitive information. With simpler AI applications, errors are easier to contain because a human confirmation step can be bolted on; in complex multi-agent systems, that safeguard risks defeating the very purpose of reducing human involvement.
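In AutoGen, that human confirmation step corresponds to a single configuration choice on the user proxy. The sketch below reuses the same illustrative model config as before; human_input_mode is a real AutoGen parameter, while the agent names and prompt are invented.

```python
# Sketch: restoring a human checkpoint in AutoGen by changing one setting.
# With human_input_mode="ALWAYS", the proxy pauses at each turn for a person
# to approve, edit, or redirect the agent's next step.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

assistant = AssistantAgent(name="assistant", llm_config=llm_config)

supervised_proxy = UserProxyAgent(
    name="human_reviewer",
    human_input_mode="ALWAYS",   # a person confirms every agent turn
    code_execution_config=False,
)

# The human is prompted in the terminal before each reply is accepted.
supervised_proxy.initiate_chat(assistant, message="Draft a refund-policy reply.")
```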
To manage errors, developers hope to add a "judge" agent that oversees the others and catches mistakes before they escalate. Some experts caution, however, that piling on agents to police other agents could simply replicate the inefficiencies of human bureaucracies.
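In a group-chat framework like AutoGen, one way to approximate that pattern is to add a reviewer agent whose only job is to vet the workers' output. "Judge" is not a built-in AutoGen feature; the prompt and the APPROVED convention below are assumptions for illustration.

```python
# Hypothetical "judge" agent sketch: a reviewer added to the group chat whose
# only role is to flag errors before results reach the user. The prompt and
# escalation rule are assumptions, not a standard AutoGen feature.
from autogen import AssistantAgent, GroupChat, GroupChatManager, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

worker = AssistantAgent(
    name="worker",
    system_message="Carry out the user's task and propose a final answer.",
    llm_config=llm_config,
)
judge = AssistantAgent(
    name="judge",
    system_message=(
        "Review the worker's answer for factual errors, omissions, and policy "
        "violations. Reply APPROVED if it is safe to deliver; otherwise list "
        "the problems so the worker can retry."
    ),
    llm_config=llm_config,
)
user_proxy = UserProxyAgent(
    name="user_proxy", human_input_mode="NEVER", code_execution_config=False
)

chat = GroupChat(agents=[user_proxy, worker, judge], messages=[], max_round=6)
manager = GroupChatManager(groupchat=chat, llm_config=llm_config)
user_proxy.initiate_chat(manager, message="Summarize this vendor's usage limits.")
```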
As the development of intricate AI systems accelerates, determining legal liability will become increasingly important. At the conference, legal experts acknowledged that users who direct agents would likely share responsibility for the agents’ actions. This expectation underscores the necessity for contracts that delineate accountability between users and tech providers. However, negotiating such agreements with large companies may be challenging for everyday consumers.
Amid these concerns, lawyer Dazza Greenwood cautions against releasing systems with high error rates, advocating thorough testing before deployment. For now, Thakur says, users cannot fully rely on agents to handle tasks unattended; the systems need more refinement before they can be trusted to operate safely and reliably on consumers' behalf.