It's certainly possible. It wouldn't even be hard, since early crawlers ran on radically smaller computers and the technology involved is now freely available as open source.
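Just to make the point concrete, a toy crawler is a few dozen lines of Python with `requests` and `BeautifulSoup`. The seed list and page cap here are placeholders, and a real crawler would also respect robots.txt and rate limits:

```python
# Toy breadth-first crawler: fetch a page, pull out its links, repeat.
# Assumes `requests` and `beautifulsoup4` are installed; seeds are placeholders.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(seed_urls, max_pages=100):
    seen, queue = set(seed_urls), deque(seed_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            resp = requests.get(url, timeout=10, headers={"User-Agent": "toy-crawler"})
        except requests.RequestException:
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        pages[url] = soup.get_text(" ", strip=True)
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The hard parts are scale and spam handling, not the fetching itself.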
If you limit it to only curated domains, you'll run into limited content and have trouble discovering novel information.
If you need information on Peruvian sand art, you can only find it if you've already added a site covering it to the index.
What you might consider is starting with a set of "seed" sites that you trust, and fanning out from there. Use something like PageRank to rank encountered sites, and augment that ranking with distance from a known-good domain (rough sketch below). A site with a lot of link activity that's also referenced by a site you find credible is probably better than one that's four steps removed.
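Roughly what I mean, using `networkx` over a domain-level link graph. The decay factor and the fallback distance are arbitrary numbers picked for illustration, not anything tuned:

```python
import networkx as nx

def rank_with_seed_bias(link_graph, seed_domains, decay=0.5):
    """Combine PageRank with link-distance from trusted seed domains.

    link_graph: nx.DiGraph whose nodes are domains and edges are links.
    seed_domains: the hand-picked domains you already trust.
    """
    base = nx.pagerank(link_graph)

    # Shortest hop count from any seed, following out-links.
    dist = {}
    for seed in seed_domains:
        if seed not in link_graph:
            continue
        for node, d in nx.single_source_shortest_path_length(link_graph, seed).items():
            dist[node] = min(d, dist.get(node, d))

    # A site four hops from anything trusted gets a much smaller boost
    # than one linked directly by a seed; unreachable sites get almost none.
    return {
        node: base[node] * (1 + decay ** dist.get(node, 10))
        for node in link_graph
    }
```

An alternative with the same flavor is personalized PageRank, where you pass the seed set as the `personalization` vector so the random walk keeps restarting from sites you trust.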
Human review of sites as they cross some ranking threshold is plausible, since it's easier to look at a list of sites that seem consistently okay and check whether they're slop than to enumerate the non-slop sites from scratch.
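The review step can be as simple as a queue of high-scoring sites nobody has eyeballed yet; the threshold here is a made-up number:

```python
def review_queue(scores, reviewed, threshold=0.01):
    """Sites whose combined score crossed the threshold but haven't been
    checked yet, highest score first; a human marks each one keep/slop."""
    return sorted(
        (site for site, score in scores.items()
         if score >= threshold and site not in reviewed),
        key=scores.get, reverse=True,
    )
```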
One of the better ways to learn which results your users actually find helpful is, ironically, a non-LLM neural net. Understanding which kinds of queries lead users to which domains helps you guide curation and rank the trustworthy sites people consistently pick higher.
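A rough sketch of that idea with scikit-learn; the click-log format and the hashed bag-of-words features are just assumptions for illustration, and any small learning-to-rank setup would do:

```python
# Small query->domain helpfulness model trained on click logs.
# Hypothetical sketch: assumes a log of (query, domain, clicked) rows.
from sklearn.feature_extraction import FeatureHasher
from sklearn.neural_network import MLPClassifier

def train_click_model(click_log):
    """click_log: iterable of (query, domain, clicked) tuples, clicked in {0, 1}."""
    hasher = FeatureHasher(n_features=2**16, input_type="string")
    X = hasher.transform(
        query.lower().split() + [f"domain={domain}"]
        for query, domain, _ in click_log
    )
    y = [clicked for _, _, clicked in click_log]
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=50)
    model.fit(X, y)
    return hasher, model
```

Feed the predicted helpfulness back into the ranking, and surface the domains people keep choosing as candidates for your curated list.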