WIRED investigated Perplexity AI after Forbes accused the company of stealing its content and found a whole lot of mess. You should definitely read the full article, but here are some key parts:
- Perplexity AI bots ignore robots.txt directives which specifically block them from crawling—and therefore scraping—site content
- They are also bypassing server-side blocks which should return a 403 status code to their requests
- The chatbots paraphrase and summarise content they find from a variety of sites but can get the context wrong, or make things up
- They may also be training their models on the content they scrape in order to present their regurgitated results
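To make the first two points concrete, here is a minimal sketch of what those two kinds of block look like from the site owner's side. It is not how any particular site or Perplexity actually works: the robots.txt contents, the `PerplexityBot` user-agent token, the example URL, and the helper names `may_fetch` and `status_for` are all assumptions for illustration. It uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. The "PerplexityBot" token
# is an assumption used for illustration.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL.

    This is the check a well-behaved crawler is expected to run on itself
    before requesting a page. Nothing on the server enforces it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def status_for(user_agent: str) -> int:
    """Toy server-side block: 403 for a blocked user-agent string, else 200.

    Hypothetical rule. It only matches requests that identify themselves
    with the blocked token.
    """
    blocked_tokens = ("perplexitybot",)
    agent = user_agent.lower()
    return 403 if any(token in agent for token in blocked_tokens) else 200


if __name__ == "__main__":
    url = "https://example.com/article"
    for agent in ("PerplexityBot", "Mozilla/5.0"):
        print(agent, may_fetch(agent, url), status_for(agent))
```

The thing to notice is that robots.txt is a request, not a lock: it only works if the crawler runs a check like `may_fetch` and respects the answer. A server-side block keyed on the user-agent string is stronger, but in this sketch it still depends on the crawler announcing who it is.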
This sentence summed it all up for me:
The magic trick that’s made Perplexity worth 10 figures, in other words, appears to be that it’s both doing what it says it isn’t and not doing what it says it is.
When will this AI grift end?