The Number That Should Terrify Google and OpenAI
A new legal strategy by ‘Bad Blood’ author John Carreyrou threatens to turn the AI industry’s cheap settlements into an existential crisis.
On December 22, 2025, the uneasy truce between the world’s most valuable artificial intelligence companies and the creators whose data feeds them was shattered. John Carreyrou, the investigative journalist famous for exposing the Theranos fraud, filed a copyright lawsuit that explicitly rejects the industry’s standard “cost of doing business” model. Instead of joining a class action—where payouts are often diluted to negligible amounts—Carreyrou and five other authors are suing Google, OpenAI, Meta, xAI, Anthropic, and Perplexity individually. Their target? The statutory maximum damages of $150,000 per infringed work.
This development marks a critical turning point in the long-running litigation over how Google, OpenAI, and their rivals train their chatbots. For years, tech giants have relied on the assumption that copyright claims would eventually be resolved through massive, yet manageable, class-action settlements. The recent history of these lawsuits suggests a calculated risk: scrape the data now, settle later. But the figures from late 2025 suggest that the math is about to change drastically for Google and OpenAI.
The chart above illustrates the stark economic disparity driving this new legal strategy. In August 2025, Anthropic settled a major class-action lawsuit for $1.5 billion. While the headline number was large, the actual payout to individual authors averaged roughly $3,000 per book, a mere 2% of the $150,000 statutory maximum per work under the Copyright Act. Carreyrou’s lawsuit explicitly cites this “bargain-basement” rate as the reason for opting out of the class-action model. If this strategy succeeds, the potential liability for companies like Google and OpenAI could skyrocket. At $3,000 apiece, a $1.5 billion settlement implies roughly 500,000 works; the same number of works at the statutory maximum would come to about $75 billion, and the total would reach into the trillions only if several million works were proven to be ingested and each drew the maximum award.
“LLM companies should not be able to so easily extinguish thousands upon thousands of high-value claims at bargain-basement rates.”
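The gap between the settlement rate and the statutory ceiling is simple arithmetic, and a minimal sketch makes it easy to check. The three constants come from the figures quoted in this article. The implied number of works is a back-of-envelope estimate derived from the reported average, not a figure taken from the settlement itself, and the function names are illustrative only.

```python
# Figures quoted in this article (USD).
STATUTORY_MAX_PER_WORK = 150_000      # maximum statutory damages per infringed work
SETTLEMENT_TOTAL = 1_500_000_000      # Anthropic class-action settlement, August 2025
AVG_PAYOUT_PER_BOOK = 3_000           # approximate average payout per book


def payout_share(avg_payout: int, statutory_max: int) -> float:
    """Fraction of the statutory maximum that the average payout represents."""
    return avg_payout / statutory_max


def implied_work_count(settlement_total: int, avg_payout: int) -> int:
    """Rough number of works implied by the total and the average (an estimate)."""
    return settlement_total // avg_payout


def max_exposure(work_count: int, statutory_max: int) -> int:
    """Upper bound on damages if every work drew the maximum award."""
    return work_count * statutory_max


def works_needed_for(target_total: int, statutory_max: int) -> int:
    """Smallest number of works that reaches target_total at the maximum award."""
    return -(-target_total // statutory_max)  # ceiling division on integers


if __name__ == "__main__":
    works = implied_work_count(SETTLEMENT_TOTAL, AVG_PAYOUT_PER_BOOK)
    share = payout_share(AVG_PAYOUT_PER_BOOK, STATUTORY_MAX_PER_WORK)
    print(f"Average payout as share of maximum: {share:.0%}")
    print(f"Implied works in settlement: {works:,}")
    print(f"Same works at the maximum: ${max_exposure(works, STATUTORY_MAX_PER_WORK):,}")
    print(f"Works needed for $1 trillion: {works_needed_for(10**12, STATUTORY_MAX_PER_WORK):,}")
```

The ceiling is what a court may award, not what it must, so these totals are best read as an upper bound on exposure rather than a forecast.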
The YouTube Transcript Controversy
While the Carreyrou lawsuit targets books, a parallel legal battle focuses on the unauthorized use of video content. Central to the allegations against both Google and OpenAI is the use of “Whisper,” OpenAI’s speech recognition tool, to transcribe vast amounts of YouTube data. Reports indicate that OpenAI transcribed over one million hours of YouTube videos to train its GPT-4 model—an act that YouTube CEO Neal Mohan has publicly stated is a “clear violation” of the platform’s terms of service.
Google’s position is complicated by its own dual role. As the owner of YouTube, it is a victim of OpenAI’s scraping; yet, as the developer of the Gemini model, it has faced accusations of using the same video transcripts for its own training purposes. This “double-dipping” has fueled class-action lawsuits filed by creators like David Millette, who argue that both companies have built their empires on stolen time.
To visualize the sheer scale of this data ingestion, consider that one million hours of video is equivalent to roughly 114 years of continuous content. This dwarfs a human lifetime of audiovisual consumption, highlighting the industrial scale of the alleged copyright infringement. This is not merely “reading” the internet; it is a systematic ingestion of human creativity at a velocity no human could replicate.
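The 114-year figure can be verified with a short conversion. The sketch below assumes a 365-day year and round-the-clock playback. The eight-hours-a-day viewer in the second function is a hypothetical added for comparison, not a number from the reporting.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def hours_to_years(hours: float) -> float:
    """Years of nonstop, 24-hour-a-day playback needed to cover the given hours."""
    return hours / (HOURS_PER_DAY * DAYS_PER_YEAR)


def years_at_daily_viewing(hours: float, hours_per_day: float) -> float:
    """Years needed for a viewer who watches a fixed number of hours every day."""
    return hours / (hours_per_day * DAYS_PER_YEAR)


if __name__ == "__main__":
    transcribed_hours = 1_000_000  # figure reported for OpenAI's YouTube transcription
    print(f"Nonstop playback: {hours_to_years(transcribed_hours):.1f} years")
    # Hypothetical viewer watching 8 hours a day, every day.
    print(f"At 8 hours a day: {years_at_daily_viewing(transcribed_hours, 8):.1f} years")
```

Even a viewer who did nothing but watch during every waking hour would need several lifetimes to get through the same material.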
The Widening Net of Defendants
The December 2025 lawsuit is also notable for the breadth of its targets. While earlier suits often focused on a single entity, the new filing names a “who’s who” of the generative AI landscape. This signals that plaintiffs are no longer treating these companies as separate entities with unique defenses, but as a collective industry built on a shared, potentially illegal, foundation of data acquisition.
The trend line is unmistakable: the legal pressure is accelerating, not dissipating. The inclusion of Elon Musk’s xAI for the first time in the Carreyrou suit suggests that no player, regardless of how new or well-capitalized, is immune. The legal arguments are shifting from “fair use” defenses to granular debates over specific sourcing methods, such as the use of “shadow libraries” like LibGen and Z-Library.
“This case concerns a straightforward and deliberate act of theft that constitutes copyright infringement... [The defendants] pirated authors’ works and fed them into the large language models.”
The era of “move fast and break things” is colliding with the reality of statutory damages. If the courts side with Carreyrou and enforce the $150,000 penalty per work, the legal cost attached to training a frontier model could, on paper, exceed the market capitalization of the company building it. For Google and OpenAI, the days of treating copyright lawsuits as a mere operating expense may be coming to an abrupt end.






