Doesn’t sound like retaliation to me; it sounds like their scheduled web crawlers are finding that content they used to index is no longer viewable, and it's thus removed from search results. Pretty standard. My guess is that there were 400 million URLs listed, and as the crawler discovers that they are no longer available, that number will keep dropping to reflect only content that's publicly viewable. If only 500 URLs are now publicly viewable (without logins), then that's what they will index. Google isn’t a search engine for private companies (unless you pay for the service); they're a public search engine, so they make an effort to ensure that only public information is indexed. Some folk game the system (like the old expertsexchange.com), but sooner or later Google drops the hammer.
I don't think Twitter would rate limit the Google indexer, though.
It's probably the increased bounce rate: people click Twitter links in the search results, hit Twitter's login wall, and click back to continue searching instead of creating an account.
I tried to access Twitter by impersonating Googlebot. I was denied. The bots aren't so much rate limited as unable to access tweets, since they don't have a Twitter account.
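For anyone who wants to repeat the experiment, here's a rough sketch of what I mean. The user-agent string is Googlebot's published one; the tweet URL is just a placeholder, and what status code you actually get back is anyone's guess:

```python
import urllib.request
import urllib.error

# Googlebot's documented desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Placeholder tweet URL for illustration.
req = urllib.request.Request(
    "https://twitter.com/jack/status/20",
    headers={"User-Agent": GOOGLEBOT_UA},
)
try:
    with urllib.request.urlopen(req) as resp:
        # 200 would mean the tweet content was served to the fake crawler.
        print(resp.status)
except urllib.error.HTTPError as e:
    # A 403/429 or similar means the impersonation was denied.
    print(e.code)
```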
Impersonating the Googlebot user-agent isn’t a perfect test, though: sites are supposed to verify the crawler's IP as well, either by checking it against Google's published list of crawler addresses or by doing a reverse (PTR) lookup and confirming it resolves back. See https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot
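For illustration, a minimal sketch of that forward-confirmed reverse DNS check (the function name and example IP are mine, not from Google's docs):

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS, per the verification docs linked above."""
    try:
        # PTR lookup: ask DNS what hostname this IP claims to be.
        host, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    # Genuine Google crawlers resolve to googlebot.com or google.com hosts.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Forward lookup: the claimed hostname must resolve back to the
        # same IP, otherwise the PTR record could be spoofed.
        _, _, addresses = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return False
    return ip in addresses

# e.g. is_verified_googlebot("66.249.66.1") should come back True,
# assuming Google's PTR records for that crawler range are intact.
```

A curl from your own machine with a faked user-agent fails both halves of this check, so a site doing it properly can wall off impersonators while still letting the real crawler through.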
Question is whether or not Twitter has competent software developers implementing stuff correctly anymore. Anyone’s guess at this point.