this post was submitted on 24 Feb 2026
47 points (89.8% liked)

Does a license like this exist?

[–] FaceDeer@fedia.io 1 points 1 day ago (1 children)

That generally only happens in cases of overfitting, where the model was trained on a poorly de-duplicated data set that contains many copies of that book (or excerpts, quotes, and so forth). This is considered a flaw by AI trainers and a lot of work goes into sanitizing the training data to prevent it.

[–] XLE@piefed.social 4 points 1 day ago* (last edited 1 day ago) (1 children)

But you're otherwise disgusted by the fact that material is plagiarized without consent to begin with...

...Right, FaceDeer?

[–] FaceDeer@fedia.io -1 points 1 day ago

You went digging through my Reddit comments to find a two-month-old thread; that must have taken a lot of effort. But I'm afraid I don't see what the relevance of it is, aside from a general "it's about AI". The bulk of the comments I wrote there were about water usage.

I'm genuinely puzzled. Are you saying that deduplicating data is "hiding unethical behaviour"? It's actually intended to improve the model's performance. Having a model spit out exact copies of its training data means you've produced a hugely expensive and wasteful re-implementation of copy-and-paste rather than a generative AI. The whole point of generative AI is to produce novel outputs.
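
For anyone wondering what "deduplicating" means in practice, here's a minimal sketch of the simplest form: dropping exact duplicates after light normalization. This is an illustration only, not how any particular lab does it. The function names `normalize` and `deduplicate` are made up for the example, and real pipelines generally also need near-duplicate detection to catch excerpts and quotes, which this does not attempt.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs):
    """Return docs with exact duplicates (after normalization) removed.

    Hypothetical helper for illustration: keeps the first occurrence of
    each document and preserves the original order.
    """
    seen = set()
    unique = []
    for doc in docs:
        # Hash the normalized text so the set stores fixed-size digests
        # instead of whole documents.
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


corpus = ["Call me Ishmael.", "call me   ishmael.", "Another line."]
print(deduplicate(corpus))
```

The second string differs from the first only in case and spacing, so it is treated as a copy and dropped. The fewer repeated copies of a passage the model sees, the less likely it is to memorize that passage verbatim.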