this post was submitted on 08 Jun 2026
Technology
I suppose once the pillaging of everything they can train on is over, it will calm down a bit. But maybe I'm naïve on that front...
AI is already training on slop, because it can't tell its own excrement from actual human creativity.
Isn't AI training on itself a well-known thing to avoid? If I remember correctly, the "performance" goes to shit very quickly when you train a model on its own output.
I doubt serious AI actors will make that mistake.
But that aside, the way they just opened their torrent client and started downloading made me furious.
In France you can still get caught downloading illegally, and it can have serious consequences. But when it comes to AI businesses, copyright holders seem to look the other way. Businesses have more rights than citizens, and it's completely unfair.
AI developers might be smart enough to filter their own output, but AI training is still taking in output from their competitors' models, usually without realizing it.
I read this article carefully, but I don't think it's about the issue I was mentioning.
I was talking about "model collapse", and this seems to be more about multiple models training on similar datasets (shared learning resources).
Yeah, I think we're talking about the same thing. I thought the article I linked was the one I had read about model inbreeding, but now that I look at it a bit closer, it's probably the product of model inbreeding itself. ;) I thought there was an article published this year about the problem, but now I can't find it to save my life. It's possible that I'm hallucinating. My memory is worse than ChatGPT's.