- OpenAI faces fresh scrutiny of its data collection practices, this time from authors.
- Two authors are suing OpenAI, accusing the company of using their books to train ChatGPT.
- A law professor believes there will be more lawsuits related to copyright and generative AI in the future.
Two award-winning authors recently sued OpenAI, accusing the generative AI company of infringing copyright law by using their published books to train ChatGPT without their consent.
The lawsuit, filed in late June, alleges that ChatGPT’s underlying large language model “ingested” the copyrighted work of the plaintiffs, authors Mona Awad and Paul Tremblay. They argue that ChatGPT’s ability to produce detailed summaries of their work suggests their books were included in the datasets used to train the technology.
The suit is the latest example of the tension between creatives and generative AI tools capable of producing text and images in a matter of seconds. Many workers in creative fields are concerned about how the rapidly evolving technology could affect their careers and livelihoods. And these concerns are increasingly finding expression in legal challenges.
Daniel Gervais, a law professor at Vanderbilt University, told Insider that the authors’ lawsuit is one of only a few copyright cases against generative AI tools across the country. It won’t be the last, he added.
Gervais expects many more authors to sue companies that develop large language models and generative AI as these programs become better at reproducing the style of writers and artists. He believes a flood of legal challenges targeting the output of tools like ChatGPT is on its way across the country.
“This is really about the input,” Gervais said of the lawsuit’s allegations of AI data scraping and training. “The output wave is also coming.”
Proving that the authors in the case suffered financial harm as a result of OpenAI’s data collection practices, as alleged in the complaint, could be difficult. Gervais told Insider that ChatGPT may have obtained Awad and Tremblay’s work from sources other than the authors’ source material, but that it’s possible the bot “ingested” their books, as the lawsuit alleges.
Andres Guadamuz, an AI and copyright expert at the University of Sussex, echoed those concerns, telling Insider that even if the books were included in OpenAI’s training datasets, the company could have legitimately obtained the work by way of another data set that had captured it.
It’s also unlikely ChatGPT would have behaved any differently if it had never ingested the authors’ work, as it scrapes vast amounts of data from the internet, Guadamuz told The Guardian.
The Authors Guild, a US-based group that advocates for authors’ labor rights, published an open letter last week urging CEOs at large tech and AI companies to “get permission” from authors to use their copyrighted works for training generative AI programs and to “reward authors fairly”. The organization told Insider that its letter garnered over 2,000 signatures.
Awad and Tremblay’s lawsuit was filed the same day that OpenAI received another legal complaint alleging that the company had stolen “vast amounts of personal data” that it later fed into ChatGPT. The 157-page lawsuit, which did not give the full names of the 16 plaintiffs, accused the company of “absorbing essentially all data exchanged on the Internet that it could absorb”.
As for Awad and Tremblay’s lawsuit, which was filed in a district court in Northern California, the authors are seeking damages and reimbursement for what they claim are lost profits.
Also included in the filing were documents containing ChatGPT’s summaries of Awad’s novels 13 Ways of Looking at a Fat Girl and Bunny and Tremblay’s The Cabin at the End of the World. Tremblay’s novel was adapted into the M. Night Shyamalan film Knock at the Cabin.
OpenAI and Awad did not respond to Insider’s requests for comment. A representative for Tremblay declined to comment.