Internet infrastructure company Cloudflare is launching a new set of tools aimed at changing the relationship between AI companies and the websites they scrape data from. Starting today, all of Cloudflare’s customers, including the roughly 33 million using its free services, can monitor and selectively block AI data-scraping bots.
This blocking is done through a suite of free AI auditing tools called Bot Management, the first of which allows real-time monitoring of bots. Users get a dashboard showing which AI bots are visiting their sites and scraping data, including those that attempt to disguise their activity.
“We’ve identified all the AI bots, including those attempting to mask their identities,” says Cloudflare cofounder and CEO Matthew Prince in an interview with WIRED in Lisbon, Portugal, where he has been based in recent months at the company’s European office.
Cloudflare is also launching an expanded bot-blocking service. Earlier this year, the company introduced a feature that let its users block all known AI bots in one sweep. The new service builds on that by letting users fine-tune which bots can access their sites, choosing which to block and which to allow. This more targeted approach will become increasingly valuable as publishers and platforms negotiate with AI companies to let bots crawl their sites without restrictions.
“We want to make it easy for anyone, regardless of their budget or their level of technical sophistication, to have control over how AI bots use their content,” Prince says. Cloudflare labels bots according to their functions, so AI agents used to scrape training data are distinguished from AI agents pulling data for newer search products, like OpenAI’s SearchGPT.
Websites typically try to control how bots crawl their data by updating a text file called robots.txt, which implements the Robots Exclusion Protocol. This file has governed how bots scrape the web for decades. It’s not illegal to ignore robots.txt, but before the age of AI it was generally considered part of the web’s social code to honor the instructions in the file. Since the influx of AI-scraping agents, many websites have attempted to curtail unwanted crawling by editing their robots.txt files. Services like the AI agent watchdog Dark Visitors offer tools to help website owners stay on top of the ever-increasing number of crawlers they might want to block, but they’ve been limited by a major loophole: unscrupulous companies tend to simply ignore or evade robots.txt directives.
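To see why robots.txt is only advisory, here is a minimal sketch using Python’s standard urllib.robotparser module; the crawler names and URLs are illustrative, not drawn from any particular site. A well-behaved bot consults the file’s rules before fetching a page, but nothing in the protocol enforces them, so a scraper can simply skip the check or announce itself under a different user-agent string.

```python
from urllib import robotparser

# A hypothetical robots.txt that turns away two commonly cited AI crawlers
# while leaving the rest of the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks permission before fetching a page...
print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True

# ...but compliance is voluntary: a scraper that never runs this check,
# or that reports a different user-agent string, fetches the page anyway.
```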
According to Dark Visitors founder Gavin King, most of the major AI agents still abide by robots.txt. “That’s been pretty consistent,” he says. But not all website owners have the time or knowledge to constantly update their robots.txt files. And even when they do, some bots will skirt the file’s directives: “They try to disguise the traffic.”
Prince says Cloudflare’s bot-blocking won’t be a command that this kind of bad actor can ignore. “Robots.txt is like putting up a ‘no trespassing’ sign,” he says. “This is like having a physical wall patrolled by armed guards.” Just as it flags other types of suspicious web behavior, like price-scraping bots used for illegal price monitoring, the company has created processes to spot even the most carefully concealed AI crawlers.
Cloudflare also plans to introduce a marketplace where customers can negotiate scraping agreements with AI companies, whether those involve payment or trading credits for AI services in exchange for scraping permissions. “We’re indifferent to the form of the transaction, but we believe there needs to be some mechanism to compensate original content creators,” Prince says. “Compensation could come in various forms, such as dollars, credits, or recognition.”
The exact launch date of the marketplace is still uncertain, but when it arrives it will join a crowded field of projects aimed at facilitating licensing and permissions agreements between AI companies, publishers, and other digital platforms.
Reactions from AI companies to this approach have ranged from acceptance to outright rejection, Prince says, though he declined to name specific companies.
Prince says the project came together quickly, inspired by a conversation with Atlantic CEO Nick Thompson, who described the challenges many publishers face with covert web scrapers. “I am enthusiastic about this initiative,” Thompson says. If even prominent media companies are struggling with scrapers, he notes, independent bloggers and website owners face an even steeper fight.
Cloudflare has long been a cornerstone of web security and a crucial part of the internet’s infrastructure. The company generally tries to stay neutral about the content of the websites it serves, and on the rare occasions it has deviated from that stance, Prince has made clear his reluctance to let Cloudflare decide what content is permissible online.
Here, though, Prince sees Cloudflare as uniquely positioned to make a difference. “The path we’re on isn’t sustainable,” he says, arguing for a future in which creators can be compensated for their contributions.