Reddit CEO Steve Huffman said large language models “wouldn’t exist as we know them” without Reddit’s content. He called the platform’s user-generated data “modern oil” for AI.
Huffman made the comments during an interview at Fast Company’s Most Innovative Companies Summit.
What Huffman Said About Reddit’s Value To AI
Huffman described the position Reddit’s data holds in the AI ecosystem.
Huffman said:
“LLMs wouldn’t exist as we know them without Reddit. Reddit is one of the single largest sources of training data for the LLMs and Reddit continues to be one of the primary sources of both training data and we’re also the most cited, the most cited platform across all models.”
He attributed the citation claim to Profound, a firm that tracks AI citation data.
Huffman explained why AI companies depend on the content.
“There’s no artificial intelligence without actual intelligence. At the end of the day, these models are pretty simple. They’re regurgitating on an absolutely massive scale what they’ve consumed elsewhere and a large portion of that consumption is actually just the human conversation on Reddit because it’s pure and it covers basically every topic imaginable.”
Deals For Some, Lawsuits For Others
Reddit announced data licensing agreements with Google and OpenAI in 2024. Huffman referenced these as Reddit’s original two AI data deals and didn’t announce any additional agreements.
“Since we did the original two deals with Google and OpenAI, that was over two years ago, so we’ve learned a lot. They’ve learned a lot. The whole world’s learned a lot. Especially how valuable Reddit’s data is and how useful it is. And so we’re being I think very deliberate and selective there. But yeah, we’re open and open for business.”
For companies that haven’t agreed to licensing terms, Reddit has taken legal action. The company sued Anthropic in California Superior Court, alleging unauthorized use of Reddit content and violations of Reddit’s terms. Reddit filed a federal lawsuit against Perplexity in the Southern District of New York, along with three data-scraping companies, alleging DMCA anti-circumvention violations and related claims.
Huffman drew a line between the two groups.
“Companies like Google and OpenAI where we had good relationships, we can actually do a deal and put some guard rails on use and access to our data on behalf of our users but then collaborate on making products for the next generation of the internet.”
He added that “not every company is willing to be a collaborative partner and so unfortunately we have to go the other way which is lawsuits.”
Huffman told the audience Reddit’s position on commercial use is simple. “Commercial use of our data requires commercial terms,” he said. Reddit began charging for commercial API access in 2023, a move that preceded the current licensing deals.
Huffman said Reddit still provides free data access to researchers and universities and tries to remain flexible for non-commercial use.
What Changed Reddit’s Openness
According to Huffman, Reddit’s willingness to share data freely changed when the AI industry moved away from open research. As SEJ previously reported, Reddit restricted access for many search engine crawlers while Google remained an exception.
“Historically, Reddit has been like we’re born of the open internet and Reddit has been open and very permissive for access to its data. And honestly, I think we would be in a different place today if the AI companies were still basically open and open source and doing open research.”
Huffman said the issue was that Reddit could no longer track how its data was being used. “People are using our data and we don’t know what it was being used for,” he told the audience.
Beyond commercial terms, Huffman said Reddit wants to prevent its data from being used to identify users, target them with ads, or replace or disintermediate the platform.
Reddit’s Own AI Efforts
Huffman acknowledged what he called a “paradox.” Reddit’s content powers external AI systems, but the company also uses AI across its platform.
The most visible product is Reddit Answers, an LLM-powered search feature. It reads posts and comments, then organizes them into responses built from verbatim user quotes. Huffman noted it’s designed for questions without definitive answers.
“What Reddit Answers does is a couple of things that are unique to Reddit. One, it basically only answers in verbatim quotes from actual people. And then the second thing it does is it tries to present multiple perspectives because the whole point when you’re on Reddit, you want the human perspective.”
Behind the scenes, Reddit uses AI for content moderation and classification. LLMs can evaluate whether a comment crosses into bullying, something Huffman described as previously difficult because of the subjectivity involved.
Huffman presented AI moderation as a way to reduce exposure to the worst content, not as a replacement for Reddit’s community moderation model.
“The worst job on the internet was looking at the worst content on the internet and deciding whether it could be online or not,” Huffman said. “That job just goes away.”
The Gray Area Of AI-Written Posts
Huffman also addressed the issue of users writing content with AI tools and pasting it into Reddit. That’s different from automated bot activity, he stressed.
“The most annoying thing that I see not just on Reddit, but all over the internet is someone who wrote their post or comment with ChatGPT and then pasted it into Reddit. Like, is that a bot? Definitely seems like a bot, but there’s a human behind the thought.”
Huffman cast the issue as one of intent. “It’s crucial to us that there’s a human behind the thought, behind the content, behind the prompt,” Huffman said. But he also noted that “the writing sucks” when users rely on AI to compose their posts.
Rather than creating a policy to address it, Huffman indicated Reddit will let its community handle the issue. Users are already downvoting AI-written content and calling it out in comments. Huffman said Reddit will “empower the users more and the subreddits more to just reject that type of content altogether.”
He compared the broader question to calculators in math class. “Kids these days are just learning to write with AI. What are we going to do about it?” he said. “We kind of have to learn, I think, along with everybody else.”
Why This Matters
Huffman’s comments reinforce Reddit’s pitch that its user discussions are a core input for AI systems.
The AI-written content problem Huffman described is one SEJ covered as part of a broader YouTube AI slop investigation. Reddit’s choice to let community voting handle AI-generated posts, rather than building detection tools, is a different path than platforms that have deployed automated labeling.
Looking Ahead
Huffman told Fast Company that Reddit is “out there speaking to people on a regular basis” about new data deals, though he didn’t hint at a third agreement.
Reddit’s lawsuits against Anthropic and Perplexity are both ongoing. The Anthropic case was the subject of a federal court remand hearing in March.
