Open source license that doesn’t allow your code to be used for AI data training? (feddit.online)

submitted 1 day ago by cat_fishing@feddit.online to c/opensource@programming.dev

40 comments

Does a license like this exist?

[–] FaceDeer@fedia.io 1 point 1 day ago (1 child)

That generally only happens in cases of overfitting, where the model was trained on a poorly de-duplicated data set that contains many copies of that book (or excerpts, quotes, and so forth). AI trainers consider this a flaw, and a lot of work goes into sanitizing the training data to prevent it.

[–] XLE@piefed.social 4 points 1 day ago* (last edited 1 day ago) (1 child)

But you're otherwise disgusted by the fact that material is plagiarized without consent to begin with...

...Right, FaceDeer?

[–] FaceDeer@fedia.io -1 points 1 day ago

You went digging through my Reddit comments to find a two-month-old thread; that must have taken a lot of effort. But I'm afraid I don't see what its relevance is, aside from a general "it's about AI". The bulk of the comments I wrote there were about water usage.

I'm genuinely puzzled. Are you saying that deduplicating data is "hiding unethical behaviour"? It's actually intended to improve the model's performance. Having a model spit out exact copies of its training data means you've produced a hugely expensive and wasteful re-implementation of copy-and-paste rather than a generative AI. The whole point of generative AI is to produce novel outputs.
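
To make "deduplicating data" concrete, here is a minimal sketch of the idea in Python. It is not any particular lab's pipeline: the `normalize` and `deduplicate` helpers and the sample corpus are made up for illustration, and it only catches copies that are identical after lowercasing and collapsing whitespace. Catching near-duplicates such as excerpts and quotes would need fuzzier matching than this.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(documents):
    """Drop documents whose normalized text was already seen.

    Hypothetical helper: keeps the first occurrence of each document
    and preserves the original order.
    """
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Made-up sample corpus: the second entry is a trivial variant of the first.
corpus = [
    "It was the best of times.",
    "It was  the BEST of times.",
    "Call me Ishmael.",
]
print(deduplicate(corpus))
```

With the repeated copy gone, the model sees that passage once instead of twice, which is the kind of cleanup that makes verbatim memorization less likely.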