Think of any topic imaginable that is even vaguely related to raising kids, and there’s probably a post about it on Mumsnet, the long-running, enormously popular, controversy-spurring UK-based parenting forum for mothers. Over its more than two-decade history, Mumsnet has amassed an archive of more than 6 billion words written by its highly engaged user base, on topics such as dirty diapers and lazy husbands. (Not to mention a bonkers rant about dolphins.)
This spring, after Mumsnet discovered that AI companies were scraping its data, the company says it decided to try to strike licensing deals with some of the major players in the space, including OpenAI, which initially expressed willingness to explore an arrangement after Mumsnet first reached out. After talks with OpenAI fell apart, Mumsnet in July announced its intention to pursue legal action.
According to Mumsnet, during those early conversations, an OpenAI strategic partnership lead told the company that datasets over 1 billion words were of interest to the AI giant. Mumsnet’s leadership was excited. “We spent quite some time in a back-and-forth with them,” Mumsnet founder and CEO Justine Roberts tells WIRED. “We had to sign some NDAs, and they wanted a lot of information from us.”
However, more than a month later, OpenAI told Mumsnet that it was no longer interested in partnering at that time, according to an email exchange reviewed by WIRED. When asked why, the OpenAI staffer characterized Mumsnet’s 6-billion-word dataset as too small to warrant a licensing arrangement, Roberts says. The staffer also noted that OpenAI is primarily interested in large datasets that the public cannot already access online, and that it wants datasets that capture broad human experience.
OpenAI echoed this sentiment in a statement to WIRED. “We engage in partnerships for extensive datasets that mirror human society and do not focus solely on data that’s public,” says OpenAI spokesperson Kayla Wood. “We allow publishers and creators to set preferences on how their content and sites interact with AI in search results and in the training of foundational generative AI models.”
Roberts is unhappy with how the talks turned out. She says that OpenAI initially showed particular interest in Mumsnet because its content is written predominantly by women. “It offers rich conversational data,” she says. “Around 90 percent of the discussions are by females, which is quite rare.”
Over the past year, OpenAI has struck data-licensing agreements with media outlets including Vox Media, The Atlantic, Axel Springer, Time, and WIRED’s parent company Condé Nast, as well as with platforms like Reddit. (Automattic, owner of WordPress.com and Tumblr, was reportedly discussing licensing earlier in the year.) However, the specifics of these contracts, including the size of the datasets involved, remain undisclosed.
When WIRED asked how large a dataset must be for OpenAI to consider licensing it, the company declined to provide specifics. However, Wood emphasized that OpenAI’s partnerships with publishers aim to showcase the publishers’ content in its products and drive traffic back to them.
Alex Bestall, the CEO of Rightsify, a music copyright management company, believes that larger labs require significant volumes of data, which might explain why OpenAI pursues major deals over smaller ones.
OpenAI now faces potential copyright infringement litigation in the UK. Mumsnet has accused the company of breaching Mumsnet’s terms of use and violating database rights by extracting substantial parts of a database without authorization.
Mumsnet sent a preliminary letter in July signaling potential legal action. OpenAI later responded, seeking clarification but not denying that it had scraped the data. Mumsnet is still deciding whether to bring its case in the High Court or in a specialized intellectual property court in the UK.
While it weighs its legal options, Mumsnet is also exploring licensing agreements with other AI firms, including discussions with Google and with various new startups that help broker data-licensing agreements.
“I’m quite worried about the ecosystem, where these big LLMs are allowed to march all over small publishers to build their models, and then people have less reasons to go and visit the websites,” Roberts says. “We need to come to some sort of satisfactory arrangement where people are compensated for their work.”
As Mumsnet’s content is largely user-generated, WIRED asked whether it was considering any sort of payment system for users when it does strike deals. Roberts says there is no plan at the moment, but that she would consider it if data licensing for AI became incredibly lucrative down the road.
She says that, based on comments she received after the announcement that Mumsnet was looking into legal action, users by and large understand the company’s aims in licensing their data. “We’re quite concerned about AI being gender-biased,” she says. “There’s something to be said for it being trained on verified female voices.”
Roberts is optimistic about how Mumsnet’s potential legal action will unfold. “We think we have a good chance,” she says. In the US, there have already been dozens of copyright-infringement cases brought against AI companies. In many of the ongoing cases, AI companies are defending themselves by arguing that their actions are shielded by the “fair use” doctrine, which permits the use of copyrighted material without permission in certain circumstances. The UK has a similar concept, which it calls “fair dealing,” but it’s significantly more limited in scope.
Regardless of the outcome, Roberts is glad her platform is taking a stance. “This is probably more about the principle of the thing than anything else,” she says.