FTC Launches Inquiry into Reddit’s Sale of User Data for AI Training

Paresh Dave

Reddit said ahead of its IPO next week that licensing user posts to Google and others for AI projects could bring in $203 million of revenue over the next few years. The community-driven platform was forced to disclose Friday that US regulators already have questions about that new line of business.

In a regulatory filing, Reddit said that it received a letter from the US Federal Trade Commission on Thursday asking about “our sale, licensing, or sharing of user-generated content with third parties to train AI models.” The FTC, the US government’s primary antitrust regulator, has the power to sanction companies found to engage in unfair or deceptive trade practices. The idea of licensing user-generated content for AI projects has drawn questions from lawmakers and rights groups about privacy risks, fairness, and copyright.

Reddit isn’t alone in trying to make a buck off licensing data, including that generated by users, for AI. Programming Q&A site Stack Overflow has signed a deal with Google, the Associated Press has signed one with OpenAI, and Tumblr owner Automattic has said it is working “with select AI companies” but will allow users to opt out of having their data passed along. None of the licensors immediately responded to requests for comment. Reddit also isn’t the only company receiving an FTC letter about data licensing, Axios reported on Friday, citing an unnamed former agency official.

It remains unclear whether the letter to Reddit is related to the review of other companies.

Reddit said in its disclosure on Friday that it does not believe it engaged in unfair or deceptive practices, but warned that responding to any government inquiry can be labor-intensive and costly. According to the filing, the letter indicated that FTC staff were interested in meeting with the company to better understand its plans, and that the agency would request documents and further information as the review continues. The FTC characterized its review as being related to “a confidential inquiry,” as stated in the correspondence sent to Reddit.

Reddit, whose 17 billion posts and comments are seen by AI researchers as a treasure trove for training chatbots to converse, announced a partnership last month to license that content to Google. Neither Reddit nor Google immediately responded to requests for comment. The FTC declined to comment. (Advance Magazine Publishers, the owner of WIRED’s publisher Condé Nast, holds a stake in Reddit.)

AI chatbots like OpenAI’s ChatGPT and Google’s Gemini are considered competitive threats to Reddit, publishers, and other ad-supported businesses built on content. Over the past year, some of those companies have come to see licensing data to AI developers as a way to benefit from the rise of generative AI.

But the use of data harvested online to train AI models has raised a number of questions winding through boardrooms, courtrooms, and Congress. For Reddit and others whose data is generated by users, those questions include who truly owns the content and whether it’s fair to license it out without giving the creator a cut. Security researchers have found that AI models can leak personal data included in the material used to create them. And some critics have suggested the deals could make powerful companies even more dominant.

The Google deal was one of a “small number” of data licensing wins that Reddit has been pitching to investors as it seeks to drum up interest for shares being sold in its IPO. Reddit CEO Steve Huffman in the investor pitch described the company’s data as invaluable. “We expect our data advantage and intellectual property to continue to be a key element in the training of future” AI systems, he wrote.

In a blog post last month about the Reddit AI deal, Google vice president Rajan Patel said tapping the service’s data would provide valuable new information, without being specific about its uses. “Google will now have efficient and structured access to fresher information, as well as enhanced signals that will help us better understand Reddit content and display, train on, and otherwise use it in the most accurate and relevant ways,” Patel wrote.

The FTC had previously shown concern about how data gets passed around in the AI market. In January, the agency announced it was requesting information from Microsoft and its partner and ChatGPT developer OpenAI about their multibillion-dollar relationship. Amazon, Google, and AI chatbot maker Anthropic were also questioned about their own partnerships, the FTC said. The agency’s chair, Lina Khan, described its concern as being whether the partnerships between big companies and upstarts would lead to unfair competition.

Reddit has been licensing data to other companies for a number of years, mostly to help them understand what people are saying about them online. Researchers and software developers have used Reddit data to study online behavior and build add-ons for the platform. More recently, Reddit has contemplated selling data to help algorithmic traders looking for an edge on Wall Street.

Licensing for AI-related purposes is a newer line of business, one Reddit launched after it became clear that the conversations it hosts helped train up the AI models behind chatbots including ChatGPT and Gemini. Reddit last July introduced fees for large-scale access to user posts and comments, saying its content should not be plundered for free.

That move had the consequence of shutting down an ecosystem of free apps and add-ons for reading or enhancing Reddit. Some users staged a rebellion, shutting down parts of Reddit for days. Until the FTC letter arrived, the potential for further user protests had been one of the main risks the company disclosed to potential investors ahead of its trading debut, expected next Thursday.

Updated March 15, 2024, 8 pm EDT: This article was updated to disclose that Advance, owner of WIRED’s publisher Condé Nast, has a stake in Reddit.
