this post was submitted on 11 Jan 2026
Technology
I once saw an old lecture where a guy who worked on Yahoo's spam filters noticed that spammers would create accounts and mark their own spam messages as "not spam" (an attempt to trick the filters; a kind of Sybil attack, I guess). Because of the way the spam-filtering models were built and used, this actually made the filtering more effective. It's possible that a wider variety of "poisoned" data can actually help improve models.
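The trick described above can be sketched roughly like this: instead of taking "not spam" feedback at face value, a filter can weight it by account trust and treat a coordinated burst of "not spam" votes from fresh accounts as evidence *for* spam. This is a minimal illustration, not Yahoo's actual system; all names and thresholds here are made up.

```python
# Sketch: invert hostile "not spam" feedback into a spam signal.
# NEW_ACCOUNT_AGE_DAYS and SYBIL_VOTE_THRESHOLD are illustrative values,
# not anything from a real production filter.

NEW_ACCOUNT_AGE_DAYS = 7   # accounts younger than this are low-trust
SYBIL_VOTE_THRESHOLD = 5   # this many low-trust "not spam" votes looks coordinated

def feedback_signal(votes):
    """votes: list of (account_age_days, says_spam: bool) pairs.
    Returns a score; > 0 means 'more likely spam'."""
    score = 0.0
    low_trust_not_spam = 0
    for age_days, says_spam in votes:
        trusted = age_days >= NEW_ACCOUNT_AGE_DAYS
        if says_spam:
            # Spam reports count; trusted reporters count more.
            score += 1.0 if trusted else 0.2
        elif trusted:
            # An honest-looking correction pushes toward "not spam".
            score -= 1.0
        else:
            low_trust_not_spam += 1
    # Many low-trust "not spam" votes on one message flip into a spam signal.
    if low_trust_not_spam >= SYBIL_VOTE_THRESHOLD:
        score += low_trust_not_spam * 0.5
    return score

# Six fresh accounts say "not spam", one old account reports it as spam:
votes = [(1, False)] * 6 + [(400, True)]
print(feedback_signal(votes))  # positive: the Sybil votes backfired
```

The design point is the same as in the anecdote: the attacker's feedback channel becomes just another feature, so flooding it helps the defender.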
I... have my doubts. I don't doubt that a wider variety of poisoned data can improve training by forcing the development of better filters for unusable training data. In itself, that would indeed improve the model.
But in many cases, the point of poisoning is not to corrupt the training data itself but to deny the crawlers access to the real work (and to poison their URL queue along the way, which is something I can demonstrate working). If poison is served instead of the real content, that will hurt the model: even if it filters out the junk, it ends up with less new data to train on.
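The URL-queue poisoning mentioned above is often done with a "link maze": every fake page links only to more fake pages, so a crawler that follows them fills its frontier with junk and never reaches the real content. A minimal sketch, assuming nothing about any particular crawler; the `/maze/` path scheme and link count are invented for illustration.

```python
# Sketch of a crawler link maze: pages whose links are derived
# deterministically from the page's own path, so the maze is effectively
# infinite without storing any state server-side.

import hashlib

def maze_page(path, links_per_page=5):
    """Return HTML for a fake page. Each link is a hash of the current
    path plus an index, so every page leads to fresh, unvisited URLs."""
    out = ["<html><body>"]
    for i in range(links_per_page):
        token = hashlib.sha256(f"{path}/{i}".encode()).hexdigest()[:16]
        out.append(f'<a href="/maze/{token}">{token}</a>')
    out.append("</body></html>")
    return "\n".join(out)

print(maze_page("/maze/start"))  # five links, each opening deeper maze pages
```

A crawler that enqueues these links spends its crawl budget on garbage, which is exactly the denial-of-access effect described above, independent of whether the model later filters the pages out.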