The future of AI innovation is increasingly being shaped by a growing global copyright debate. In the U.S., rightsholders are aggressively pursuing legal action against AI companies that use copyrighted material without authorization. Meanwhile, other countries are taking a more permissive stance, allowing AI models to train on massive datasets, including those sourced from pirate libraries. This emerging “copyright schism” could have significant and far-reaching consequences.
Copyright and AI: A Global Divide
Earlier this week, various rightsholder organizations submitted recommendations for the 2025 Special 301 Report. This annual review, compiled by the U.S. Trade Representative, identifies countries that fail to meet U.S. copyright protection standards.
Among the concerns raised, the protection of copyright in the AI sector stood out. Rightsholders emphasized that foreign governments should be mindful of potential copyright infringements, particularly as AI technologies advance.
China, for instance, has been called out for considering a text and data mining (TDM) exception for AI training, a policy that other nations, such as Japan, have already enacted. This has raised alarm bells—not only among copyright holders but also within American tech circles.
Tech Giants vs. Copyright Holders
In the United States, AI training does not benefit from an explicit copyright exception. As a result, major tech companies, including Meta, OpenAI, and Google, are facing high-profile lawsuits for allegedly training their large language models (LLMs) on unauthorized sources, including pirate libraries.
Rightsholders argue that these repositories serve as an AI goldmine, providing access to vast amounts of free, unlicensed text. The key legal question now being debated is whether this practice constitutes copyright infringement or falls under the doctrine of “fair use.”
These lawsuits are expected to take years to resolve. In the meantime, pirate libraries such as Z-Library, LibGen, and Anna’s Archive remain off-limits to U.S. AI companies. In countries with more relaxed copyright laws, however, the situation is very different, and that difference could create a global AI copyright divide with profound implications.
AI Companies and Shadow Libraries
DeepSeek, a Chinese AI company, recently garnered attention for releasing a new, highly efficient AI model. The model, which significantly reduces costs while maintaining high accuracy, has been described as a serious challenge to U.S. dominance in AI development.
While recent DeepSeek publications have become less transparent about their data sources, earlier papers explicitly reference reliance on Anna’s Archive. One study, published in March, states: “We cleaned 860K English and 180K Chinese e-books from Anna’s Archive.”
This is not an isolated case. Anna’s Archive itself has confirmed that many AI teams, including those affiliated with large U.S. and Chinese firms, have reached out for high-speed access to its dataset. The shadow library often collaborates with AI developers in exchange for financial contributions or data trades. While most U.S. companies are hesitant to engage due to legal risks, many international teams have fewer reservations.
The “Forbidden Fruit” of AI Training Data
For AI developers, shadow libraries are an almost irresistible source of knowledge, akin to the biblical “forbidden fruit.” Just as Adam and Eve were drawn to the tree of knowledge, AI developers are tempted by the vast, unlicensed collections these repositories hold.
However, the risks are significant. In the U.S., AI companies face potential lawsuits and heavy penalties if caught using unauthorized data. This legal climate could place American AI development at a disadvantage, limiting access to the wealth of information available elsewhere.
In contrast, companies in countries with more lenient copyright laws are free to train their models on whatever data they can access. This could provide a competitive edge, accelerating AI advancements outside the U.S. and potentially shifting the global balance of technological power.
The AI Copyright Conundrum
This divide raises critical questions about the intersection of copyright law and technological innovation. Should all countries adopt strict copyright policies to level the playing field? Or should the West consider relaxing its copyright rules to keep pace with international AI development?
Rightsholders argue that global AI regulation should be strengthened, ensuring that companies pay for access to copyrighted works. Meanwhile, shadow libraries advocate for a more open approach, arguing that unrestricted access to knowledge could be a strategic advantage for Western AI leadership.
Anna’s Archive, for instance, suggests that “archiving and distributing books should be made fully legal” if the West hopes to stay competitive in AI development.
The Future of AI and Copyright
The coming years will be critical in determining how copyright laws evolve in relation to AI. As legal battles unfold in the U.S. and copyright policies continue to diverge globally, the “copyright schism” could become one of the defining factors in the future of AI development.
Will stricter copyright enforcement stifle AI innovation in certain countries? Or will lenient policies in other regions create a global divide in technological advancement? One thing is certain—the decisions made today will shape the AI landscape for years to come.