this post was submitted on 09 Jan 2024

526 points (98.2% liked)

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says (www.theguardian.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

312 comments

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says: Pressure grows on artificial intelligence firms over the content used to train their products

[–] bjoern_tantau@swg-empire.de 125 points 2 years ago (3 children)

Or let's use this opportunity to make copyright much less draconian.

[–] dhork@lemmy.world 86 points 2 years ago* (last edited 2 years ago) (18 children)

Why not both?

I don't understand why people are defending AI companies sucking up all human knowledge by saying "well, yeah, copyrights are too long anyway".

Even if we went back to the pre-1976 term of 28 years, renewable once for a total of 56 years, there are still a ton of recent works that AI is using without any compensation to their creators.

I think it's because people are taking this "intelligence" metaphor a bit too far and think if we restrict how the AI uses copyrighted works, that would restrict how humans use them too. But AI isn't human, it's just a glorified search engine. At least all standard search engines do is return a link to the actual content. These AI models chew up the content and spit out something based on it. It simply makes sense that this new process should be licensed separately, and I don't care if it makes some AI companies go bankrupt. Maybe they can work adequate payment for content into their business model going forward.

[–] deweydecibel@lemmy.world 24 points 2 years ago* (last edited 2 years ago) (1 children)

It shouldn't be cheap to absorb and regurgitate the works of humans the world over in an effort to replace those humans and subsequently enrich a handful of silicon valley people.

Like, I don't care what you think about copyright law and how corporations abuse it, AI itself is corporate abuse.

And unlike copyright, which does serve its intended purpose of helping small time creators as much as it helps Disney, the true benefits of AI are overwhelmingly for corporations and investors. If our draconian copyright system is the best tool we have to combat that, good. It's absolutely the lesser of the two evils.

[–] lolcatnip@reddthat.com 6 points 2 years ago (1 children)

Do you believe it's reasonable, in general, to develop technology that has the potential to replace some human labor?

Do you believe compensating copyright holders would benefit the individuals whose livelihood is at risk?

the true benefits of AI are overwhelmingly for corporations and investors

"True" is doing a lot of work here, I think. From my perspective the main beneficiaries of technology like LLMs and stable diffusion are people trying to do their work more efficiently, people playing around, and small-time creators who suddenly have custom graphics to illustrate their videos, articles, etc. Maybe you're talking about something different, like deep fakes? The downside of using a vague term like "AI" is that it's too easy to accidentally conflate things that have little in common.

[–] EldritchFeminity@lemmy.blahaj.zone 11 points 2 years ago

There are two general groups when it comes to AI in my mind: those whose work would benefit from the increased efficiency AI in various forms can bring, and those who want the rewards of work without putting in the effort of working.

The former include people like artists who could do stuff like creating iterations of concept sketches before choosing one to use for a piece to make that part of their job easier/faster.

Much of the opposition to AI comes from people worrying about/who have been harmed by the latter group. And it all comes down to the way that the data sets are sourced.

These are people who want to use the hard work of others for their own benefit, without giving them compensation; and the corporations fall pretty squarely into this group. As does your comment about "small-time creators who suddenly have custom graphics to illustrate their videos, articles, etc." Before AI, they were free to hire an artist to do that for them. MidJourney, for example, falls into this same category - the developers were caught discussing various artists that they "launder through a fine tuned Codex" (their words, not mine, here for source) for prompts. If these sorts of generators were using opt-in data sets, paying licensing fees to the creators, or some other way to get permission to use their work, this tech could have tons of wonderful uses, like for those small-time creators. This is how music works. There are entire businesses that run on licensing copyright free music out to small-time creators for their videos and stuff, but they don't go out recording bands and then splicing their songs up to create synthesizers to sell. They pay musicians to create those songs.

Instead of doing what the guy behind IKEA did when he thought "people besides the rich deserve to be able to have furniture", they're cutting up Bob Ross paintings to sell as part of their collages to people who want to make art without having to actually learn how to make it or pay somebody to turn their idea into reality. Artists already struggle in a world that devalues creativity (I could make an entire rant on that, but the short is that the starving artist stereotype exists for a reason), and the way companies want to use AI like this is to turn the act of creating art into a commodity even more; to further divest the inherently human part of art from it. They don't want to give people more time to create and think and enjoy life; they merely want to wring even more value out of them more efficiently. They want to take the writings of their journalists and use them to train the AI that they're going to replace them with, like a video game journalism company did last fall with all of the writers they had on staff in their subsidiary companies. They think, "why keep 20 writers on staff when we can have a computer churn out articles for our 10 subsidiaries?" Last year, some guy took a screenshot of a piece of art that one of the artists for Genshin Impact was working on while livestreaming, ran it through some form of image generator, and then came back threatening to sue the artist for stealing his work.

Copyright laws don't favor the small guy, but they do help them protect their work as a byproduct of working for corporate interests. In the case of the Genshin artist, the fact that they were livestreaming their work and had undeniable, recorded proof that the work was theirs and not some rando in their stream meant that copyright law would've been on their side if it had actually gone anywhere rather than some asshole just being an asshole. Trademark isn't quite the same, but I always love telling the story of the time my dad got a cease and desist letter from a company in another state for the name of a product his small business made. So he did some research, found out that they didn't have the trademark for it in that state, got the trademark himself, and then sent them back their own letter with the names cut out and pasted in the opposite spots. He never heard from them again!

[–] AnneBonny@lemmy.dbzer0.com 4 points 2 years ago (4 children)

I don’t understand why people are defending AI companies sucking up all human knowledge by saying “well, yeah, copyrights are too long anyway”.

Would you characterize projects like wikipedia or the internet archive as "sucking up all human knowledge"?

[–] dhork@lemmy.world 15 points 2 years ago (2 children)

In Wikipedia's case, the text is (well, at least so far) written by actual humans. And no matter what you think about the ethics of Wikipedia editors, they are humans also. Human oversight is required for Wikipedia to function properly. If Wikipedia were to go to a model where some AI crawls the web for knowledge and writes articles based on that with limited human involvement, then it would be similar. But that's not what they are doing.

The Internet Archive is on a bit less steady legal ground (see the recent legal actions), but in its favor it is only storing information for archival and lending purposes, and not using that information to generate derivative works which it is then selling. (And it is the lending that is getting it into trouble right now, not the archiving).

[–] phillaholic@lemm.ee 4 points 2 years ago

The Internet Archive has no ground to stand on at all. It would be one thing if they only allowed downloading of orphaned or unavailable works, but that’s not the case.

[–] randon31415@lemmy.world 2 points 2 years ago

Wikipedia has had bots writing articles since the 2000 census information was first published. The 2000 census article writing bot was actually the impetus for Wikipedia to make the WP:bot policies.

[–] MBM@lemmings.world 15 points 2 years ago

Does Wikipedia ever have issues with copyright? If you don't cite your sources or use a copyrighted image, it will get removed

[–] assassin_aragorn@lemmy.world 9 points 2 years ago (1 children)

Wikipedia is free to the public. OpenAI is more than welcome to use whatever they want if they become free to the public too.

[–] afraid_of_zombies@lemmy.world -3 points 2 years ago (1 children)

It is free. They have a paid model with more stuff but the baseline model is more than enough for most things.

[–] assassin_aragorn@lemmy.world 3 points 2 years ago (1 children)

There should be no paid model if they aren't going to pay for training material.

[–] afraid_of_zombies@lemmy.world -4 points 2 years ago (1 children)

There also shouldn't be goal post moving in lemmy threads but yet here we are. Can you move the goalposts back into position for me?

[–] assassin_aragorn@lemmy.world 2 points 2 years ago (1 children)

My position has always been that OpenAI can either pay for training materials or make money solely on advertisements. Having a paid version is completely unacceptable if they aren't paying for training.

[–] afraid_of_zombies@lemmy.world -2 points 2 years ago

OpenAI is more than welcome to use whatever they want if they become free to the public too.

My position has always been

Left the goalposts and went on to gaslighting

[–] afraid_of_zombies@lemmy.world -1 points 2 years ago

The copyright shills in this thread would shut down Wikipedia

[–] lolcatnip@reddthat.com 1 points 2 years ago (3 children)

I don't understand why people are defending AI companies

Because it's not just big companies that are affected; it's the technology itself. People saying you can't train a model on copyrighted works are essentially saying nobody can develop those kinds of models at all. A lot of people here are naturally opposed to the idea that the development of any useful technology should be effectively illegal.

[–] assassin_aragorn@lemmy.world 12 points 2 years ago

This is frankly very simple.

If the AI is trained on copyrighted material and doesn't pay for it, then the model should be freely available for everyone to use.
If the AI is trained on copyrighted material and pays a license for it, then the company can charge people for using the model.

If information should be free and copyright is stifling, then OpenAI shouldn't be able to charge for access. If information is valuable and should be paid for, then OpenAI should have paid for the training material.

OpenAI is trying to have it both ways. They don't want to pay for information, but they want to charge for information. They can't have one without the other.

[–] BURN@lemmy.world 10 points 2 years ago (1 children)

You can make these models just fine using licensed data. So can any hobbyist.

You just can’t steal other people’s creations to make your models.

[–] lolcatnip@reddthat.com -2 points 2 years ago (1 children)

Of course it sounds bad when you use the word "steal", but I'm far from convinced that training is theft, and using inflammatory language just makes me less inclined to listen to what you have to say.

[–] BURN@lemmy.world 9 points 2 years ago (1 children)

Training is theft imo. You have to scrape and store the training data, which amounts to copyright violation based on replication. It’s an incredibly simple concept. The model isn’t the problem here, the training data is.

[–] lolcatnip@reddthat.com -2 points 2 years ago

Training is theft imo.

Then it appears we have nothing to discuss.

[–] dhork@lemmy.world 7 points 2 years ago (1 children)

I am not saying you can't train on copyrighted works at all, I am saying you can't train on copyrighted works without permission. There are fair use exemptions for copyright, but training AI shouldn't apply. AI companies will have to acknowledge this and get permission (probably by paying money) before incorporating content into their models. They'll be able to afford it.

[–] lolcatnip@reddthat.com 0 points 2 years ago (1 children)

What if I do it myself? Do I still need to get permission? And if so, why should I?

I don't believe the legality of doing something should depend on who's doing it.

[–] BURN@lemmy.world 4 points 2 years ago

Yes you would need permission. Just because you’re a hobbyist doesn’t mean you’re exempt from needing to follow the rules.

As soon as it goes beyond a completely offline, personal, non-replicable project, it should be subject to the same copyright laws.

If you purely create a data agnostic AI model and share the code, there’s no problem, as you’re not profiting off of the training data. If you create an AI model that’s available for others to use, then you’d need to have the licensing rights to all of the training data.

[–] hellothere@sh.itjust.works 37 points 2 years ago* (last edited 2 years ago) (1 children)

I'm no fan of the current copyright law - the Statute of Anne was much better - but let's not kid ourselves that some of the richest companies in the world have any desire whatsoever to change it.

[–] Gutless2615@ttrpg.network 5 points 2 years ago (2 children)

My brother in Christ I’m begging you to look just a little bit into the history of copyright expansion.

[–] hellothere@sh.itjust.works 7 points 2 years ago

I am well aware.

[–] LWD@lemm.ee -2 points 2 years ago* (last edited 2 years ago) (1 children)

deleted

[–] Gutless2615@ttrpg.network 3 points 2 years ago (1 children)

I only discuss copyright on posts about AI copyright issues. Yes, brilliant observation. I also talk about privacy issues on privacy relevant posts, labor issues on worker rights related articles and environmental justice on global warming pieces. Truly a brilliant and skewering observation. You're a true internet private eye.

Fair use and pushing back against (corporate serving) copyright maximalism is an issue I am passionate about and engage in. Is that a problem for you?

[–] LWD@lemm.ee -3 points 2 years ago* (last edited 2 years ago) (1 children)

deleted

[+] Gutless2615@ttrpg.network 8 points 2 years ago (1 children)

[deleted]

[–] LWD@lemm.ee 1 points 2 years ago* (last edited 2 years ago) (1 children)

Creators in your circles? Does that include your clients, because apparently small artists hire you?

In my current job I fight back against the tech giants and try to rein in specifically Google, Amazon and Meta with consumer protection regulations.

Well now I'm intrigued.

Exactly how do you prevent your clients from getting their content stolen by a corporation created by Sam Altman, who is worth half a billion dollars on his own?

[+] Gutless2615@ttrpg.network 5 points 2 years ago* (last edited 2 years ago) (1 children)

[deleted]

[–] LWD@lemm.ee 0 points 2 years ago (1 children)

I trust you enough to believe that you aren't stupid, and you know what to say and not say in any given place. That's why I thought it was extremely curious that you never bring up copyright in any thread except when it's about AI. I can understand being a defeatist when working with small creators, of course.

Speaking of which, what do you call theft? After all, with your legal background, doesn't pedantry kick into high gear and remind you the issues are piracy etc?

So are you more focused on targeting the little pirates and not the big rich billionaires in your business, or...?

[–] Gutless2615@ttrpg.network 2 points 2 years ago* (last edited 2 years ago) (1 children)

Not legal advice, not your lawyer, etc etc. But I would likely never suggest someone aggressively pursue individual piracy. You write contracts for your partners. You fight businesses when they breach. You make great work and price it appropriately. You make your wins there and you do everything you can to not find yourself in a courtroom or arbitration if you can avoid it. You’re not winning any friends and you’re not saving yourself any trouble by raging against torrents. Especially for small creators the calculus never (imo) works out in their favor. More often than not, small artists and creators need to be much more concerned about and need help with being able to defend themselves against spurious accusations of infringement by larger corporate IP rent seekers and more-or-less automated systems (again: cyberpunk dystopia).

Speaking personally I find the equating of “copyright infringement” and “theft” ridiculous. One download =/= one “stolen” sale, and it never has. Theft requires depriving the original owner of the property, being able to exercise exclusive control over it. Conceptually it has always broken down when talking about digital goods.

[–] LWD@lemm.ee 0 points 2 years ago (1 children)

So, with all your talk of copyright and AI (and you always make sure to only talk about copyright when AI is involved), have you ever talked about the decision to digitize actors' voices? Since you work for small creators, I imagine you have a history of being against this sort of copyright, as game studios might opt to use someone else's digitized voice instead of a new creator, harming your clients.

[–] Gutless2615@ttrpg.network 0 points 2 years ago* (last edited 2 years ago)

I’m incredibly worried about AI deepfakes and voice cloning for a whole host of reasons. It’s one of the things I think we are collectively least prepared to deal with. The privacy concerns, national security, cyber security - to say nothing of disinformation and yeah, labor impacts - we are fucked and not at all ready for this.

Name and likeness rights, rights of publicity, and privacy rights, though, don’t stem from copyright and don’t require an expansion of copyright to further protect. There’s case law already preventing a business from cloning someone without their permission, and everyone will be paying very close attention to those parts of contracts moving forward, I’d wager. As to wholesale replacing actors and talent with generated content - yeah, I’m very worried a lot of artists and creative people are as fucked as the lawyers and the accountants and writers and everyone else when it comes to job displacement.

Again, despite your really aggressive tone, I’m telling you: we almost certainly agree more than we disagree. It is ghoulish watching studios rush to replace extras and voice actors and resurrect dead actors. True cyberpunk dystopia necromancer shit. I’m hoping that we see more victories won in this genuinely encouraging resurgence of labor (today’s SAG-AFTRA deal notwithstanding) and legislation directly addressing the labor impacts of AI more broadly. Different kinds of guard rails and safety nets. I just don’t think copyright is the answer you think it is to the horrors that we both agree are coming.

[–] Fisk400@feddit.nu 13 points 2 years ago

As long as capitalism exists in society, just being able to go yoink and take everyone's art will never be a practical rule set.