https://mwmbl.org/ has a custom index of user-submitted domains and user-curated results but it's not suitable for daily use yet.
That's how Yahoo originally worked. They gave up on the "directory" because they utterly failed to keep up with the expansion of the world wide web. Google with its automatic crawlers did a much better job of listing new websites.
Yep. Web indexing (or rather internet indexing) for us started with a notebook (the paper-and-pen kind) in the computer room. Later Yahoo and AltaVista joined in (we still used the notebook for the good sites; browser bookmarks did not exist yet). But Google, back when it was still good, wiped all of this off the face of the earth.
Yahoo could go back to the directory approach and advertise as having less SEO garbage and AI slop.
Skipping a few generations there. Manual indexing had stopped being feasible long before Google existed. Engines like the aptly named "webcrawler" would follow the links between sites and let you search the full text of them.
Google came along later with an improvement to how pages are ranked: PageRank, which scores a page based on the link relationships between pages.
There's space in the market for a better search engine. It could be built using an AI-assisted tool for semi-automatic curation: a team of a hundred librarians who could quickly vet the AI's suggestions for quality.
I'm not sure the web is even that big any more, due to all the big web 2.0 social apps like Facebook and Instagram sucking everything into their uncrawlable platforms.
Kagi allows you to save filters (called lenses) for later use, so if you curate your own, you can share them with other people. It would be neat if there were a Lemmy community for that.
It also allows you to boost, deprioritize, or ban domains from your searches. Never seeing Pinterest show up again is great.
You could try something like the huge AI blocklist for uBlock Origin. It cleans "AI" results out of Google, DDG and Bing searches.
I'm not sure how a curated search engine would work in practice, though it's a nice idea. It would just be an enormous task, and vulnerable to manipulation by bad actors if you crowdsource it. I'd love to be proven wrong, though.
As @ironhydroxide@sh.itjust.works said already, a self-hosted SearXNG instance would give you some individual curation capabilities, but it wouldn't be the "for the greater good" project I think you might be looking for.
It looks like that list only focuses on AI images? Which can be useful but probably more narrow than what OP is looking for.
It links to a related project that might be more relevant though.
https://github.com/NotaInutilis/Super-SEO-Spam-Suppressor
The blocklist is over 6 MB, though. Please be aware that I haven't vetted the list yet and don't speak for it.
Good call, thanks for the added link!
Yahoo in the mid-nineties :p
It's certainly possible. It wouldn't even be hard, since early crawlers ran on radically smaller computers and the technology involved is now freely available as open source.
If you limit it to only curated domains, you'll run into limited content and difficulty discovering novel information.
If you need information on Peruvian sand art, you can only find it if you've already added it to the index.
What you might consider is starting with a set of "seed" sites that you trust and fanning out from there. Use something like PageRank to rank encountered sites, and augment that ranking with distance from a known-good domain. A site with a lot of link activity that's also referenced by a site you find credible is probably better than one that's four steps removed.
Human review of sites as they cross some ranking threshold is plausible, since it's easier to look at a list of sites that seem consistently okay and check whether they're slop than to enumerate the sites that aren't slop from scratch.
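A minimal sketch of that idea in Python, with a made-up link graph and made-up domain names (a real version would build the graph from a crawler): PageRank-style scores discounted by link distance from the trusted seeds, plus an arbitrary threshold for queueing human review.

```python
# Toy sketch: rank discovered domains by a PageRank-style link score,
# discounted by their link distance from a set of trusted seed domains.
# Domain names and the link graph are invented for illustration.

SEEDS = {"trusted-blog.example", "good-reference.example"}

# domain -> set of domains it links to (toy data)
links = {
    "trusted-blog.example": {"good-reference.example", "new-site.example"},
    "good-reference.example": {"new-site.example"},
    "new-site.example": {"slop-farm.example"},
    "slop-farm.example": {"slop-farm.example"},
}

def pagerank(graph, damping=0.85, iters=30):
    nodes = set(graph) | {d for outs in graph.values() for d in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, outs in graph.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
        rank = new
    return rank

def seed_distance(graph, seeds):
    # BFS out from the seeds; unreachable domains get a large default distance.
    dist = {s: 0 for s in seeds}
    frontier = list(seeds)
    while frontier:
        nxt = []
        for src in frontier:
            for dst in graph.get(src, ()):
                if dst not in dist:
                    dist[dst] = dist[src] + 1
                    nxt.append(dst)
        frontier = nxt
    return dist

rank = pagerank(links)
dist = seed_distance(links, SEEDS)
REVIEW_THRESHOLD = 0.15  # arbitrary cut-off for queueing human review

for domain in sorted(rank, key=rank.get, reverse=True):
    score = rank[domain] / (1 + dist.get(domain, 10))  # discount by distance from seeds
    flag = "-> queue for human review" if score >= REVIEW_THRESHOLD else ""
    print(f"{domain:25s} score={score:.3f} {flag}")
```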
One of the better ways to gain insight into which results your users find most helpful is, ironically, a non-LLM neural net. Understanding which types of queries lead to which domains should help you guide curation and rank the trustworthy sites users actually pick higher.
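And a minimal sketch of that click-feedback idea, assuming you log (query, domain, satisfied) triples; the log format and domain names are invented, and scikit-learn's small MLP stands in for whatever non-LLM net you'd actually use.

```python
# Toy sketch: learn which domains satisfy which kinds of queries from click
# logs, using a small non-LLM neural net. The log format is an assumption.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical click log: query text, domain shown, did the user stick with it?
log = [
    ("peruvian sand art history", "sandartmuseum.example", 1),
    ("peruvian sand art history", "slop-farm.example", 0),
    ("fix leaking kitchen tap", "plumbingforum.example", 1),
    ("fix leaking kitchen tap", "slop-farm.example", 0),
]

vec = HashingVectorizer(n_features=2 ** 12, alternate_sign=False)
X = vec.transform(q + " " + d for q, d, _ in log)  # query + domain as features
y = [satisfied for _, _, satisfied in log]

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

# Score a candidate domain for a new query; consistently high scores suggest
# domains worth surfacing to curators for review and promotion.
query, candidate = "andean sand painting", "sandartmuseum.example"
prob = model.predict_proba(vec.transform([query + " " + candidate]))[0, 1]
print(f"{candidate} for '{query}': {prob:.2f}")
```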
Someone figured out that if you add profanity to the search query, it avoids the AI responses.
But this won't stop AI-generated articles.
Oh, that's true.
I've been getting by with SwissCows
The tech nerds on HN really like Kagi. It's a paid search engine, but apparently it provides filters and such to curate your search results toward the kinds of sources you're looking for, though I have no personal experience with it. That said, it's infected with AI just like Mojeek, Qwant Next, etc. and the other alternatives. A SearXNG instance might be your best bet.
You can easily turn off the Kagi AI through settings; blocking all of the AI-generated sites would be tedious, though. They currently have a setting that tries to filter AI images from search results; hopefully they'll add one for AI sites in the future.
It only uses AI if you use a question mark. I’ve never once gotten AI search results otherwise.
I too was frustrated with the ever-decreasing quality of search results. I spun up my own SearXNG instance and use that. In my experience it's usually better at finding what I'm looking for.