Cloudflare Accuses Perplexity of Masked Web Scraping on a Massive Scale

by Saad Farooq
A growing conflict between AI startup Perplexity and internet infrastructure giant Cloudflare is shedding light on a controversial practice in the world of generative AI: covert web scraping.

Scraping Behind the Scenes

Perplexity, a rising name in the AI search space, has come under fire for allegedly bypassing website restrictions to collect data at scale. According to Cloudflare, the company engaged in deceptive techniques to access content from tens of thousands of domains—content it was explicitly blocked from reaching.

Initially, Perplexity’s bots identified themselves transparently, using user-agent strings like “PerplexityBot.” But after being blocked, Cloudflare claims, the startup switched to stealth mode. The bots began posing as ordinary users by mimicking browsers like Google Chrome on macOS and continuously rotating IP addresses to avoid detection. Even more concerning, the report alleges that Perplexity changed the autonomous system numbers (ASNs) its traffic originated from, a move typically associated with evading security measures.

The scale of the activity, Cloudflare says, reached millions of requests per day.
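
To see why the alleged disguise matters, consider a minimal sketch of the simplest kind of bot blocking: matching a token in the user-agent string. This is not Cloudflare's detection logic, which the report does not detail. The function names, the sample user-agent strings, the documentation-range IP addresses, and the private-use ASNs below are all assumptions made up for illustration.

```python
from collections import defaultdict

# Illustrative blocklist; "PerplexityBot" is the only token taken from the article.
BLOCKED_TOKENS = ("perplexitybot",)

def blocked_by_user_agent(user_agent, blocked_tokens=BLOCKED_TOKENS):
    """Return True if the user-agent string contains a blocked bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked_tokens)

def network_spread(requests):
    """Map each user-agent to (distinct IP count, distinct ASN count)."""
    ips = defaultdict(set)
    asns = defaultdict(set)
    for req in requests:
        ips[req["user_agent"]].add(req["ip"])
        asns[req["user_agent"]].add(req["asn"])
    return {ua: (len(ips[ua]), len(asns[ua])) for ua in ips}

# Hypothetical user-agent strings: one declared crawler, one generic browser.
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
DISGUISED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

# Made-up log entries using documentation IP ranges and private-use ASNs.
SAMPLE_REQUESTS = [
    {"user_agent": DECLARED_UA, "ip": "192.0.2.10", "asn": 64512},
    {"user_agent": DISGUISED_UA, "ip": "198.51.100.7", "asn": 64513},
    {"user_agent": DISGUISED_UA, "ip": "203.0.113.42", "asn": 64514},
    {"user_agent": DISGUISED_UA, "ip": "203.0.113.99", "asn": 64514},
]

if __name__ == "__main__":
    for req in SAMPLE_REQUESTS:
        verdict = "blocked" if blocked_by_user_agent(req["user_agent"]) else "allowed"
        print(req["ip"], verdict)
    print(network_spread(SAMPLE_REQUESTS))
```

In this sketch, only the request that announces itself as a crawler is blocked. The browser-style requests pass the filter, and because they arrive from several IP addresses and more than one ASN, a per-address or per-network block would not catch them all either. Counting addresses and networks per user-agent is not a reliable detector on its own, since many real visitors share the same browser string.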

Crackdown From Cloudflare

In response to what it described as “unauthorized access behavior,” Cloudflare removed Perplexity from its list of verified bots—meaning websites relying on Cloudflare’s services would no longer treat Perplexity’s traffic as legitimate. The company also introduced new defenses to detect and block similar disguised crawlers.

The action follows Cloudflare’s broader initiative to rein in AI companies’ free access to content. In July, the company began automatically blocking AI bots and gave site owners the option to charge for data access, effectively challenging the assumption that all public internet content is free for AI training.

Perplexity Fires Back

Perplexity has rejected the accusations. Spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “publicity stunt,” adding that it was “filled with misunderstandings.” The company did not directly address the technical claims but signaled its disagreement with how Cloudflare characterized its activity.

Larger Implications for AI Data Practices

This dispute is the latest flashpoint in the escalating debate over how AI companies acquire the massive volumes of data needed to train their systems. As more publishers and tech platforms push back against unlicensed data collection, the industry is being forced to confront the legal and ethical boundaries of scraping.

Cloudflare CEO Matthew Prince has previously described unrestricted AI scraping as an “existential threat” to publishers and content creators—a warning that underscores how quickly the landscape is shifting for both AI companies and the broader web ecosystem.

As pressure mounts, this standoff between Cloudflare and Perplexity may serve as a litmus test for how aggressively infrastructure providers are willing to defend content against unauthorized AI data mining.
