OpenAI is facing multiple lawsuits over the use of copyrighted material to train its AI models, including ChatGPT. Several publishers are demanding compensation for the use of their works. Most recently, the Center for Investigative Reporting sued OpenAI, joining other media outlets, such as The New York Times, that have taken similar action.
Training data is crucial for AI development, and companies like OpenAI, Google, and Microsoft are sourcing it aggressively, which often leads to legal disputes. Meta even considered acquiring Simon & Schuster to enlarge its data pool.
OpenAI and Microsoft are accused of using copyrighted materials without permission. Monika Bauerlein, CEO of the Center for Investigative Reporting, criticized this practice as a “violation of copyright.” The lawsuit highlights that 16,793 URLs from Mother Jones appeared in OpenAI’s WebText training set.
The Authors Guild has also filed a class-action lawsuit, claiming OpenAI used data from its members' books. In response to such allegations, OpenAI has signed licensing agreements with major news organizations like The Associated Press and The Wall Street Journal.
OpenAI is considering synthetic data as a potential solution, but ensuring its quality remains a challenge. CEO Sam Altman has expressed optimism that these hurdles can be overcome, provided AI can generate synthetic data of high enough quality.
OpenAI has not commented on the ongoing lawsuits.
Source: OpenAI faces more lawsuits over copyrighted data used to train ChatGPT.