Reddit’s Sale of User Data for AI Training Draws FTC Inquiry

Reddit said ahead of its IPO next week that licensing user posts to Google and others for AI projects could bring in $203 million of revenue over the next few years. The community-driven platform was forced to disclose Friday that US regulators already have questions about that new line of business.

In a regulatory filing, Reddit said that it received a letter from the US Federal Trade Commission on Thursday asking about “our sale, licensing, or sharing of user-generated content with third parties to train AI models.”

The FTC, the US government’s primary antitrust regulator, has the power to sanction companies found to engage in unfair or deceptive trade practices. The idea of licensing user-generated content for AI projects has drawn questions from lawmakers and rights groups about privacy risks, fairness, and copyright.

Reddit isn’t alone in trying to make a buck off licensing data, including that generated by users, for AI. Programming Q&A site Stack Overflow has signed a deal with Google, the Associated Press has signed one with OpenAI, and Tumblr owner Automattic has said it is working “with select AI companies” but will allow users to opt out of having their data passed along. None of the licensors immediately responded to requests for comment. Reddit also isn’t the only company receiving an FTC letter about data licensing, Axios reported on Friday, citing an unnamed former agency official.

It’s unclear whether the letter to Reddit is directly related to reviews of any other companies.

Reddit said in Friday’s disclosure that it does not believe that it engaged in any unfair or deceptive practices but warned that dealing with any government inquiry can be costly and time-consuming. “The letter indicated that the FTC staff was interested in meeting with us to learn more about our plans and that the FTC intended to request information and documents from us as its inquiry continues,” the filing says. Reddit said the FTC letter described the scrutiny as related to “a non-public inquiry.”

Reddit, whose 17 billion posts and comments are seen by AI experts as valuable for training chatbots in the art of conversation, announced a deal last month to license the content to Google. Reddit and Google did not immediately respond to requests for comment. The FTC declined to comment. (Advance Magazine Publishers, parent of WIRED’s publisher Condé Nast, owns a stake in Reddit.)

AI chatbots like OpenAI’s ChatGPT and Google’s Gemini are seen as a competitive threat to Reddit, publishers, and other ad-supported, content-driven businesses. In the past year the prospect of licensing data to AI developers emerged as a potential upside of generative AI for some companies.

But the use of data harvested online to train AI models has raised a number of questions winding through boardrooms, courtrooms, and Congress. For Reddit and others whose data is generated by users, those questions include who truly owns the content and whether it’s fair to license it out without giving the creator a cut. Security researchers have found that AI models can leak personal data included in the material used to create them. And some critics have suggested the deals could make powerful companies even more dominant.
