Why The New York Times is suing Perplexity AI over unauthorized content use
Source: Reuters

The New York Times has filed a significant lawsuit against Perplexity AI, accusing the company of using Times journalism without permission to train its artificial intelligence systems and to generate content that mirrors the newspaper’s reporting, News.Az reports.

The lawsuit is part of a wider global debate about how news publishers, technology companies, and AI developers should interact in an era where data is the core resource behind artificial intelligence. Below is a detailed FAQ explainer that breaks down what happened, why it matters, and how this case could shape the future of news and AI.

What is the lawsuit about?

The core claim from The New York Times is that Perplexity AI used Times articles, investigations, photos, and other proprietary journalism without authorization to train large language models and to answer user queries with material that closely resembles or directly reproduces Times content. According to the lawsuit, this practice violates copyright law and deprives the publisher of fair compensation for its intellectual property.

The Times argues that its journalism requires substantial financial investment in reporting, verification, editorial oversight, and distribution. By allegedly using this work for free, Perplexity is seen as obtaining an unfair commercial advantage.

For Perplexity, which develops an AI search platform that summarizes information from various online sources, the challenge is to defend how its system gathers and processes data across the internet. The case will likely evaluate whether scraping news content for AI training can be legally justified under existing copyright rules or the doctrine of fair use.

Who are the parties involved?

The plaintiff is The New York Times Company, one of the world’s leading news organizations. It publishes a wide range of content, including daily news, investigative reporting, multimedia journalism, and data-driven stories.

The defendant is Perplexity AI, an AI-powered search and answer engine founded in 2022. The company has raised significant funding and positions itself as a next-generation information retrieval tool that can summarize the web for users.

The lawsuit places Perplexity alongside other AI companies facing legal scrutiny, including OpenAI and Meta, which are also dealing with copyright-related claims from publishers and authors.

What exactly does The New York Times accuse Perplexity AI of doing?

The Times outlines several key allegations:

1. Unauthorized scraping of Times content

According to the lawsuit, Perplexity allegedly used automated tools to copy Times articles that sit behind a paywall or have restricted access. The Times states that this content is explicitly prohibited from being scraped by automated systems.

2. Regurgitating copyrighted material

The lawsuit claims that Perplexity tools sometimes output detailed summaries or near-verbatim reproductions of Times articles, including investigative reports. In some examples cited by The Times, Perplexity generated responses that mirrored the newspaper’s writing style, structure, and factual details.

3. Misleading attribution

The Times alleges that Perplexity often gives the impression that it is summarizing the web neutrally, but in practice it may draw heavily from copyrighted sources without mentioning them. The newspaper argues that this erodes proper attribution and could mislead users about the origin of the information.

4. Competitive harm

The lawsuit suggests that users who rely on Perplexity’s summaries may have less incentive to visit The New York Times website or subscribe to its services. This could reduce advertising revenue, subscription income, and overall traffic.

How has Perplexity responded?

Perplexity has denied wrongdoing. The company argues that:

  • Its data collection practices comply with industry standards.

  • The system uses publicly accessible content for training.

  • It respects robots.txt and website access policies.

  • It generates transformative summaries rather than copying material.

Perplexity maintains that its AI is designed to help users discover information more efficiently, not to undermine publishers. The company also states that it is willing to collaborate with news organizations and explore licensing arrangements.
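Respecting robots.txt, one of the practices Perplexity cites, is a long-standing web convention: a site publishes a plain-text file listing which crawlers may fetch which paths, and compliant bots check it before downloading pages. The sketch below illustrates how such a check works using Python’s standard library; the robots.txt content and bot names here are hypothetical, not taken from any real site or from Perplexity’s actual crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a site that blocks everyone from its
# archive and blocks a named AI crawler from the entire site.
robots_txt = """
User-agent: *
Disallow: /archive/

User-agent: ExampleAIBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A generic crawler may fetch ordinary pages but not the archive.
print(parser.can_fetch("GenericBot", "https://example.com/news/story"))    # True
print(parser.can_fetch("GenericBot", "https://example.com/archive/2020"))  # False

# The named AI bot is denied everywhere on the site.
print(parser.can_fetch("ExampleAIBot", "https://example.com/news/story"))  # False
```

Note that robots.txt is voluntary: nothing technically prevents a crawler from ignoring it, which is why disputes like this one turn on what a bot actually fetched rather than on what the file asked.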

Still, the lawsuit sets the stage for a broader debate over what constitutes acceptable training data for AI models and whether publishers should be compensated by technology firms that use their work.

Why is this case important for the news industry?

The Times lawsuit is widely seen as a landmark moment for journalism in the AI era. Many publishers fear that AI systems could undermine their business models in several ways:

1. Loss of traffic and revenue

If AI platforms provide detailed summaries of news articles, users may not click through to the original source. This could jeopardize subscription models, which are crucial for funding modern newsrooms.

2. Erosion of attribution

Publishers worry that AI systems may obscure the origin of reporting, reducing recognition for journalists’ efforts.

3. Competitive disadvantage

AI companies can sometimes deliver faster and broader summaries of content, raising concerns that original news providers might lose relevance.

4. Lack of compensation

Publishers argue that their proprietary content should not be freely used as raw material for AI development. They seek licensing agreements similar to those negotiated with major platforms like Facebook and Google in recent years.

The Times lawsuit may inspire other publishers to take similar action or negotiate licensing deals to protect their intellectual property.

How does this case relate to other global AI copyright disputes?

This lawsuit comes at a time of rapid legal evolution surrounding AI:

  • OpenAI is facing lawsuits from authors, news outlets, and digital creators over alleged unauthorized data use.

  • Meta has been sued by publishers who claim that the company’s AI models were trained on copyrighted content.

  • European regulators are debating new rules for AI data transparency and training dataset disclosures.

  • United States lawmakers are considering updates to copyright law to account for AI training and generated outputs.

The Times versus Perplexity case will likely influence how courts interpret key concepts such as fair use, transformative work, and the legality of web scraping for AI model training.

What does fair use mean in this context?

Fair use is a legal principle in U.S. copyright law that allows limited use of copyrighted material without permission in certain cases, such as commentary, criticism, or education. AI companies often argue that training models on large datasets constitutes a transformative use that does not infringe on original works.

However, publishers like The New York Times contend that:

  • AI models may reproduce their content too closely to be considered transformative.

  • Large-scale scraping is not equivalent to traditional fair use scenarios.

  • Commercial AI products directly compete with their journalism.

Courts will have to determine how far fair use extends when dealing with generative AI trained on vast amounts of copyrighted material.

Could this lawsuit shape the future of AI regulation?

Yes. The case could set important precedents in several areas:

1. Data transparency

AI developers may be required to disclose more information about their training datasets.

2. Licensing obligations

Courts or lawmakers may push AI firms to pay for copyrighted materials used in training.

3. Limits on scraping

Regulations may strengthen restrictions on accessing paywalled or protected content.

4. Accountability for generated outputs

AI companies may face stricter rules about reproductions of proprietary material.

Whatever the outcome, the case is likely to influence how AI developers and news publishers negotiate partnerships in the future.

What are potential outcomes of the lawsuit?

Several possibilities exist:

  • Settlement and licensing agreement – Perplexity could choose to license Times content.

  • Court ruling in favor of The Times – establishing stricter limits on AI training data.

  • Court ruling in favor of Perplexity – reinforcing the notion that large scale data scraping is permissible.

  • Partial ruling – allowing some uses while limiting others.

  • Industry-wide reform – regardless of the ruling, the industry may shift toward new standards.

Given the stakes involved, many legal experts believe the case may end in a negotiated settlement that avoids prolonged litigation.

How will this affect readers and everyday users?

Readers could experience several effects depending on the outcome:

  • AI generated summaries may become more restricted or better attributed.

  • Some AI services may require licensing deals, possibly leading to subscription models.

  • News organizations may adopt their own AI tools to compete.

  • If publishers feel squeezed financially, they may put more content behind paywalls.

Overall, users are likely to see a more structured relationship between content creators and AI platforms.

What happens next?

The legal process will unfold over months or even years. Both sides will present evidence about how Perplexity collected data, how its AI models function, and how closely its outputs resemble copyrighted content. Judges will examine the definitions of fair use, transformation, scraping, and economic harm.

Meanwhile, publishers, lawmakers, and AI companies will continue negotiating the boundaries of content use in the age of artificial intelligence.


News.Az 
