A sure sign that they are a nefarious company.
Perplexity's firing back assumes website owners distinguish between automated scraping and on-demand scraping.
I don't think most people make that distinction.
And that falls in line perfectly with the typical "assumption of access" all of these "AI" companies make.
Perplexity fired back in their blog.
Pretty tasty.
In other words, they’re assholes.
The only surprising thing to me from this article is that OpenAI actually follows the rules for bot crawlers.
Or they haven't been caught yet.
The article explains that PerplexityBot respects robots.txt, but that Perplexity then sends a different request from a different IP with a different user-agent. OpenAI could very well be using a different method to work around it.
The article explains how they tested for that, and as far as they could tell OpenAI is respecting the rules.
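For anyone wondering why switching the user-agent matters: robots.txt rules are keyed on the user-agent the client declares, and nothing enforces them. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, the `allowed` helper and the browser-style user-agent string are all made up for illustration. This is not taken from the article and says nothing about how any particular crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the declared crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL and user-agent strings (assumptions, not real traffic).
declared = allowed("PerplexityBot", "https://example.com/page")
generic = allowed("Mozilla/5.0", "https://example.com/page")

print("declared crawler allowed:", declared)
print("generic user-agent allowed:", generic)
```

The file only tells a well-behaved client what it should skip. A request that shows up under another user-agent matches the wildcard rule instead, so the site owner has to catch it some other way, such as a firewall rule or traffic fingerprinting.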