Pay to Play, Pay to Scrape
On Wednesday, June 4, 2025, U.S.-based social media behemoth Reddit filed a lawsuit against Anthropic, alleging a breach of Reddit’s terms of use and unfair competition. In this article we contemplate the ethics behind training AI on users' public social media content, what the 'I Agree' button after Terms & Conditions really means, and what potential precedents this lawsuit could set.
DATA USE, BIG DATA, TECH LAW, AI
Tsahai Banks
6/11/2025 · 5 min read


On Wednesday, June 4, 2025, U.S.-based social media behemoth Reddit filed a lawsuit against Anthropic (creator of the popular generative AI chatbot/LLM Claude), alleging a breach of Reddit’s terms of use and unfair competition. Reddit has previously entered into licensing agreements with a number of companies, including OpenAI and Google, for the privilege of training their AI systems on the public commentary of Reddit users. According to Reuters, Google’s licensing agreement with Reddit is valued at around $60 million per year.
Why AI Tools Want to Collect/Scrape from Social Media Platforms
Reddit is an internet forum known for rich conversations on niche topics, where community is formed through information sharing, storytelling and debate. Like much of the social media landscape, Reddit is a great place to mine golden data for AI training.
Setting aside the possibility of bot-generated content, Reddit relies on user-generated content, and its users control how information is sorted and ranked: up-voting or down-voting a post raises or lowers its visibility through a social scoring system. Essentially, as I write this, I just realized that Reddit is a brilliant way to have users sign up to share their data and then work for free to sort it for data brokers in advance. It's a #theoreticalthursday, so when speaking in theory, we are allowed to be as candid with the dark humor as we’d like.
User-generated content gives an AI tool the ability to learn from a large and diverse set of:
Language and writing styles
Writing format
Trending topics and words
Human nuance (tone, slang, cultural undertones, etc.)
Conversational debates & discussions
All with the aim of training the model to create better AI-generated content for humans that sounds human.
☑️ No, I Have Not Read the Terms of Agreement
"I Agree" by Dima Yarovinsky
With the reasons AI tools scrape social media in mind, it is important to revisit the early days of social media, when we posted bags of Doritos, obscure photos of grass…essentially the most bizarre but authentic form of user-generated content. Think 2010s. It was a goldmine for organic content. Many social media platforms have since grown more established, and their user bases have matured and adapted to the social expectations of each platform, but one thing hasn’t changed much in the last decade.
Most of us are still not reading the terms of agreement for social media platforms.
All social media survives on user-generated content. As the age-old adage goes, if a platform is not actually selling anything, then you are the product.
For many of us, the social contract we metaphorically sign in the pursuit of connection, visibility and growth opportunities outweighs the need to be bogged down by legal jargon. The commodification of community means that yes, every comment, like or thought expressed is valuable data that can and will be brokered to the highest bidder. In the golden age of data, the old coding adage of Garbage In, Garbage Out has quickly updated its meaning to Gold In, Gold Out (GIGO). This concept has brought back to the forefront the question of profiteering off of big data and whether we are all conscious of the long-term effects. Whether it be creative works posted online or even your “Pop Pop” (grandad) and Nana’s (grandma) Facebook posts, big data will continue to power tech platforms, as it has done for the last few decades.
Who Can Really Afford to Complain?
Let’s say I paint paintings. In order to drive sales and attract potential buyers, I post pictures of my work online. For the sake of this argument, I won’t specify where I posted, but let’s just speculate that I posted on a public forum where I agreed to lengthy terms and conditions whose longer-term consequences I unfortunately misunderstood.
Yes, I am generating business and gaining a following as I continue to post, but not nearly what I could make if I were paid as part of some kind of creators' fund sourced from billion- to trillion-dollar tech giants licensing my content. In this example, I see two sides of the coin.
Unless the work is legally protected, we have become conditioned to give away our data for free and, apparently via the informed consent of our terms of agreement, have legally consented to do so.
So who can afford to complain about AI training and misuse of data?
Tech giants with the money to spend on investigating data theft or misuse.
Tech giants with the money to spend millions on lawyers.
Tech giants who enter into licensing agreements.
Pay to Play, Pay to Scrape…unfortunately, everyone else, you won’t be getting paid.
I also challenge folks who will suddenly shift their opinion on copyright infringement or misuse of data because of this lawsuit, and not because of the countless regular people and creators who found their work was stolen to train AI tools, to reflect on why. This also speaks to who can afford to complain and who is just expected to stay quiet in the pursuit of the greater good.
What Precedents This Lawsuit Could Set
This lawsuit will be interesting to follow for a number of reasons. On a broader scale, we as the general public get a front row seat to how one might successfully argue for compensation for allegedly stolen data. Although it is still generally considered difficult to prove that a company trained its AI tool on your data without permission, it will be very interesting to see what legal precedents are set in a case brought by a tech giant like Reddit. Because Reddit has far greater resources than the average user, I look forward to seeing how topics such as the following are assessed in this suit:
Technical expert evidence (e.g., linguistic fingerprinting)
Legal and regulatory frameworks
Data privacy laws
Let’s consider these possibilities, think critically and come to our own conclusions in order to create a human-centered future with AI. Share your thoughts in the comments.
Follow Read Write X for more content on human-centered AI and happy #theoreticalthursday.
AI disclosure: This article is 100% human-generated.
All opinions stated in this article are speculative, as Reddit’s lawsuit against Anthropic is ongoing, and all claims made against Anthropic are therefore considered alleged.
Citations:
Reddit sues AI company Anthropic for allegedly ‘scraping’ user comments to train chatbot Claude by Matt O’Brien (Associated Press) https://apnews.com/article/reddit-sues-ai-company-anthropic-claude-chatbot-f5ea042beb253a3f05a091e70531692d
AI is learning from what you said on Reddit, Stack Overflow or Facebook. Are you OK with that? by Matt O’Brien (Associated Press) https://apnews.com/article/genai-training-data-stack-overflow-reddit-9d71b75ec16c78c0d2c51cc46121c1a4
Artist Visualizes the Lengthy “Terms of Service” Agreements of Popular Social Media Apps by Emma Taggart (My Modern Met) https://mymodernmet.com/social-media-policy-infographics-dima-yarovinsky/
Exclusive: Reddit in AI content licensing deal with Google by Anna Tong, Echo Wang and Martin Coulter (Reuters) https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/

