Open Source



Open Source in the age of license laundering (discuss.online)

submitted 1 month ago* (last edited 1 month ago) by francisco_1844@discuss.online to c/opensource@lemmy.ml

17 comments

I recently saw a YouTube video about a service created to change the license of open source software. It works roughly like this:

One agent reads the code and gathers specs.
Another agent, without access to the original code, creates equivalent software from those specs.

In theory, this would allow someone to take any open source software and change its license.

For a large portion of open source this is likely not an issue, because nobody may care about the particular software, but for larger projects I wonder what sort of impact this could have. In particular, I am thinking of open source software whose authors make a living from donations or public support.

Has anyone read about, or thought of, a way to prevent one's code from being relicensed this way?


phailhaus@piefed.social 19 points 1 month ago

The claim that they are doing a clean-room implementation is bullshit. The only reason any of these models can produce working code is that they were trained on every bit of code that could be scraped from the internet. Unless the project you are cloning was released after the model was trained, the model was trained on its code. That code may be a tiny fragment of the training data, but the model still saw it.

med@sh.itjust.works 2 points 1 month ago

An interesting argument would be to require the training data to be shared, to prove the model was never exposed to the original source it's ripping off.

It might help set a precedent that would make this sort of thing less attractive.

francisco_1844@discuss.online 3 points 1 month ago

require the training data to be shared to prove it was never exposed to the original source

I believe there have already been lawsuits showing that these models were trained on copyrighted material without permission and can reproduce it verbatim, yet there have been few if any real consequences for the AI companies. So if they can get away with that against companies that actually have the means to mount a strong lawsuit, the chances of an individual open source author successfully defending their code are slim (very slim, in my opinion).