Anthropic’s AI Training Ruled as Fair Use: A Californian Court Sets a Precedent
At the end of June, a federal court in California issued a ruling stating that Anthropic (best known for its Claude AI system) did not violate copyright when it used lawfully purchased books to train its AI model.
According to the judge, Anthropic’s use of the books was legitimate because it was deeply transformative: the works were not reproduced or distributed, but used to build a system that learns from the texts to generate new, autonomous responses. Such a model, the Californian court argued, does not replicate the original work but radically changes it to create something different and new.
The Concept of “Fair Use”
To fully grasp the significance of the American decision, it is essential to understand a central legal concept: fair use. This is a provision specific to United States law that allows, under certain circumstances, the use of copyrighted works without needing to obtain a license or the author’s consent.
However, fair use is not a free pass. It is, for all intents and purposes, a legal defense: it can only be invoked after the fact, as a defense argument in a copyright infringement proceeding, and a judge then evaluates on a case-by-case basis whether the principle applies. Four factors come into play: the purpose of the use of the original work (education, research, non-profit activities), the type of work involved (purely informational or highly creative), the amount of the original work used, and finally, the economic effect of the contested use (i.e., whether and how it may harm the original work’s market). By contrast, civil law systems (like Italy’s) do not have this principle: their exceptions to copyright are exhaustive and codified.
Returning to the Californian Decision
The decision acknowledged that if a work was lawfully purchased in a physical format, it is permissible to digitize it and use its content for internal training purposes, even if this involves destroying the physical medium. The core principles are the lawful purchase and a use that is non-competitive with the original work. In this context, AI training is treated as a use that neither diminishes the work’s commercial value nor replicates its purpose, and thus falls within the fair use doctrine provided by U.S. copyright law.
Let’s walk through the four factors as the court evaluated them:
- Purpose and Character of the Use: The court found that Anthropic’s use of the books was not aimed at copying them, but rather at “transforming” them to train a system capable of generating new texts, not replicating existing ones. Digitizing purchased copies was deemed legitimate because it only served to make them available in a digital format for consultation. Conversely, using pirated works to build a library was judged to be clearly illicit.
- Nature of the Work: The judge recognized that the books used by Anthropic are creative works, rich in expression and originality. For this very reason, their use is less justifiable compared to more neutral or informational content. This factor therefore weighed against fair use.
- Amount and Substantiality of the Material Used: In essence, the court considered it legitimate to use large quantities of text to train an AI model because it is technically necessary and not intended to make the original content public or replicate it in its entirety. The use of the material was therefore considered proportional.
- Effects on the Work and Its Market: The court found that the training of Claude did not negatively impact the market for the works used. There was no evidence of a decline in sales or of products that replaced them. Copyright, the judge specified, serves to protect creativity, not to block potential future competitors.
The distinction highlighted by the Californian court in this case is clear. If a work was lawfully purchased, it can be used in a transformative way for training an AI system. If, on the contrary, it comes from illicit sources, any use remains a violation, even when the final purpose is training.
Conclusions
For the technology sector, the Californian court’s decision offers a useful initial legal reference for companies operating in the United States. It is not an indiscriminate green light, but a confirmation that, in that regulatory context, the use of purchased works to train language models can fall within the scope of fair use, provided it adheres to specific criteria. This could incentivize the development of structured datasets from legitimate sources, with increasing attention to documenting the use and origin of materials.
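To illustrate what documenting the origin of training materials might look like in practice, here is a minimal, hypothetical sketch of a provenance record for a training corpus. The field names, accepted acquisition types, and validation check are assumptions made for illustration only; they are not a standard, a legal requirement, or anything prescribed by the ruling.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical provenance record for a single work used in a training corpus.
# Field names and accepted acquisition types are illustrative assumptions.
@dataclass
class TrainingSourceRecord:
    title: str
    rights_holder: str
    acquisition: str          # e.g. "purchased_physical" or "licensed_digital"
    acquisition_date: date
    proof_of_purchase: str    # reference to an invoice or license identifier
    digitized: bool = False
    notes: str = ""

def lacks_lawful_basis(record: TrainingSourceRecord) -> bool:
    """Flag records without a documented lawful acquisition route."""
    return (
        record.acquisition not in {"purchased_physical", "licensed_digital"}
        or not record.proof_of_purchase
    )

# Example: a purchased book that was digitized for internal training use.
corpus: List[TrainingSourceRecord] = [
    TrainingSourceRecord(
        title="Example Novel",
        rights_holder="Example Publisher",
        acquisition="purchased_physical",
        acquisition_date=date(2024, 3, 12),
        proof_of_purchase="INV-2024-0312-001",
        digitized=True,
        notes="Physical copy destroyed after scanning.",
    ),
]

flagged = [r.title for r in corpus if lacks_lawful_basis(r)]
print(f"Records needing review: {flagged or 'none'}")
```

The design intent here is simply to show that provenance can be recorded as structured data and checked automatically, so that every item in a corpus carries evidence of lawful acquisition; any real implementation would depend on the company’s own compliance requirements.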
The American ruling marks a starting point in the global debate, but not a final destination. It is an interesting precedent that is difficult to replicate automatically elsewhere.
The creation of digital archives that do not have transformative elements, or the simple digital reproduction of protected works, especially without a license, entails concrete legal risks. Indeed, transforming a physical text into a digital format can constitute an act of reproduction relevant under copyright law, falling within the exclusive economic exploitation rights granted to the author or rights holder.
The legitimacy of this type of use cannot, therefore, be generalized but must be evaluated on a case-by-case basis. However, in light of this decision, U.S. jurisprudence, at least in California, seems to be moving towards greater openness when digitization serves as a technical step to train an artificial intelligence, and not to replace the original work. This outlines an interpretive line that seeks to balance the protection of creative works with the demands of technological innovation.
Article in collaboration with AW LEGAL
AW LEGAL is a law firm specializing in Intellectual Property, Privacy, and Legal Tech.