OpenAI deleted NYT copyright case evidence, say lawyers

Probably not intentional, but '150 person-hours' of work were still lost

The New York Times has filed a letter in its copyright infringement case against OpenAI and Microsoft, alerting the court that the ChatGPT maker accidentally deleted a bunch of data that may have been evidence.

The letter [PDF], filed yesterday in the Southern District of New York by lawyers for the Times, asserts that OpenAI engineers deleted "all of News Plaintiffs' programs and search result data" from one of two virtual machines set up for the purpose of allowing the plaintiffs to scour OpenAI training data for copyrighted material.

The lawsuit in question was filed in late 2023, alleging that OpenAI and Microsoft used articles from the Times to train ChatGPT and other models and readily displayed the content of articles from the newspaper when asked - all without permission, the Times claimed.

"OpenAI has provided the News Plaintiffs with two dedicated virtual machines with improved computing resources for performing their searches, and News Plaintiffs have spent an additional 150 person-hours (and even more computing hours) since November 1 searching OpenAI's training data," lawyers Ian Crosby and Steven Lieberman said in the letter.

"While OpenAI was able to recover much of the data that it erased, the folder structure and file names of the News Plaintiffs' work product have been irretrievably lost," the document continued. "Without the folder structure and original file names, the recovered data is unreliable and cannot be used to determine where the News Plaintiffs' copied articles were used to build Defendants' models."

As a result, the plaintiffs have been forced to redo "an entire week's worth of its experts' and lawyers' work," the letter asserted. There's no assertion that OpenAI deleted the data on purpose, mind you, with the Times' lawyers saying that they "have no reason to believe [it] was intentional."

Crosby and Lieberman did note in the letter that the incident "underscore[s] that OpenAI is in the best position to search its own datasets for the News Plaintiffs' works using its own tools and equipment," but argue OpenAI hasn't been receptive to such requests.

"Since the last hearing, the News Plaintiffs have sent OpenAI information for OpenAI to perform two separate searches on the News Plaintiffs' behalf," the lawyers wrote, adding that the requests were sent on November 4 and 13. "To date, the News Plaintiffs have not received results from either those searches, or confirmation that OpenAI has started them."

Because OpenAI hasn't committed to conducting searches "in a timely manner," the Times' lawyers are requesting that the court order OpenAI "to identify and admit which of the News Plaintiffs' works it used" and save the plaintiffs the burden of digging through the digital stacks themselves.

"We disagree with the characterizations made and will file our response soon," OpenAI told The Register, while declining to elaborate on which portion it disagreed with - the deletion claim or the non-response to query requests.
