MEXICO CITY, Aug. 21 (EL UNIVERSAL).- For some time now, the debate over the use of copyrighted literary works to train artificial intelligence has been on the table. Writers have spoken out against the practice; nevertheless, some companies continue to use books to train their models.
Such is the case of Anthropic, the company behind the AI assistant Claude, which has been trained on books. This has led to a lawsuit from three authors who claim that the firm has built "a multi-million dollar business by stealing hundreds of thousands of copyrighted books."
According to the lawsuit, Anthropic allegedly used Books3, a dataset of more than 196,000 books, to train its Claude language model. Books3 was one of the smaller datasets bundled into The Pile, an open-source dataset that Anthropic allegedly used in Claude's training.
Anthropic confirmed earlier this month that The Pile had been used to train Claude, and although Books3 was removed from the dataset in August last year, the authors say the original version of the dataset is still available.
"Anthropic downloaded and reproduced copies of The Pile and Books3, knowing that these datasets consisted of a wealth of copyrighted content from pirate websites such as Bibliotik," the complaint said.
The plaintiffs are seeking damages from Anthropic and an order to stop the company from using copyrighted content to train Claude.
As of this writing, Anthropic has not issued any comment on the matter.
More lawsuits against Anthropic
This is not the first lawsuit the company has faced. In October last year, Universal Music Group (UMG), Concord Publishing and ABKCO Music & Records sued the AI firm for using "the lyrics of numerous musical compositions" to train Claude.
In that lawsuit, the publishers said that, as a result of this training, the AI was able to generate identical or nearly identical lyrics for around 500 songs.