this post was submitted on 13 Jan 2026
This is entirely my opinion and I'm likely wrong about many things, but at minimum:
1. The model has to be open source, freely downloadable, runnable, and copyleft, satisfying the distribution license requirements of any copyleft source material. (I'm willing to give a free pass to it simply being copyleft in general, since different copyleft licenses can have different and contradictory distribution requirements, but IMO the leap from permissive to copyleft is the more important part.) I suspect this alone will kill the AI bubble, because as soon as AI companies can't exclusively profit off it, they won't see AI as "the future" anymore.
2. All training data needs to be freely downloadable and independently hosted by the AI creator. It goes without saying that only material you can legally copy and host on your own server can be used as training data. This solves the IP theft issue: IMO, if your work is licensed such that it can be redistributed in its entirety, it should logically also be okay to use it as training data, and if you can't even legally host it on your own server, using it to train AI is off the table. The independently hosted dataset (complete with metadata about where each item came from) also serves as attribution, since you can then search the training data for creators.
3. Pay server owners for the use of their resources. If you're scraping for AI, you at the very least need to give server owners a way to send you bills. And no content can be scraped from the original source more than once; see point 2.
4. Either have a mechanism for tracking acknowledgement and accurately generating references along with the code or, if that's too challenging, I'm personally also okay with a blanket policy where anything AI-generated is public domain. The idea that you can use AI-generated code derived from open source in your proprietary app, and then sue anyone who has the audacity to copy your AI-generated code, is ridiculous and unacceptable.
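To make the hosted-dataset and scrape-once ideas above a bit more concrete, here is a rough sketch of what a provenance manifest could look like. Everything in it (the `Record` fields, the `Manifest` class, the example URL and license string) is made up for illustration and is not any existing tool or standard; it only shows that "search the training data for creators" and "never fetch the same source twice" are cheap to support once each item carries metadata.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata kept alongside each hosted training item."""
    source_url: str  # where the work was originally fetched from
    creator: str     # who to attribute the work to
    license: str     # license that permits redistribution
    sha256: str      # hash of the copy hosted in the dataset


class Manifest:
    """Hypothetical index of the hosted dataset, keyed by source URL."""

    def __init__(self):
        self._by_url = {}

    def needs_fetch(self, source_url):
        # Scrape-once rule: only hit the origin server if we have no copy yet.
        return source_url not in self._by_url

    def add(self, source_url, creator, license, content):
        """Record a fetched work; return False if it was already recorded."""
        if not self.needs_fetch(source_url):
            return False
        digest = hashlib.sha256(content).hexdigest()
        self._by_url[source_url] = Record(source_url, creator, license, digest)
        return True

    def by_creator(self, creator):
        # Attribution lookup: every hosted item credited to this creator.
        return [r for r in self._by_url.values() if r.creator == creator]


manifest = Manifest()
manifest.add("https://example.org/a.txt", "alice", "CC-BY-SA-4.0", b"hello")
works_by_alice = manifest.by_creator("alice")
```

A real dataset would need far more than this (license verification, billing records per server, and so on), but the point is that the attribution and scrape-once requirements are mostly a bookkeeping problem.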