this post was submitted on 25 Aug 2023


Stephen King: My Books Were Used to Train AI (literature.cafe)

submitted 2 years ago* (last edited 2 years ago) by GammaGames@beehaw.org to c/technology@beehaw.org


One prominent author responds to the revelation that his writing is being used to coach artificial intelligence.

By Stephen King

Non-paywalled link: https://archive.li/8QMmu


[–] Phanatik@kbin.social 12 points 2 years ago (2 children)

LLMs have been caught plagiarising works, by the simple nature of how they function. They predict the next word based on an assumed context of the previous words. They're very good at constructing sentences, but often the issue is "where is it getting its information from?" Authors never consented to their works being fed into an optimisation algorithm, and neither did artists when DALL-E was created.

For authors, you buy the book and thus the author is paid, but that's not what happened with ChatGPT.

[–] echodot@feddit.uk 6 points 2 years ago* (last edited 2 years ago) (1 children)

Authors never consented to their works being fed into an optimisation algorithm

Yeah, I know they didn't, but at worst the company owes them 30 bucks for the licence. I don't think copyright law gives authors the right to say who can buy their works, so at the absolute worst, the AI companies stole a book.

To be clear I'm not saying that this should be allowed, I'm just saying that under the current legal system I'm not sure they actually committed that much of a crime. Obviously it needs to be updated, but you do that through political reform (and good luck with that because AI is big bucks), not through the courts.

[–] Phanatik@kbin.social 5 points 2 years ago (1 children)

Copyright Law doesn't talk about who can consume the work. ChatGPT's theft is no different to piracy, and companies have gotten very pissy about their shit being pirated, but when ChatGPT does it (because the piracy is hidden behind its training), it's fine. The individual authors and artists get shafted in the end because their work has been weaponised against them.

[–] FaceDeer@kbin.social 3 points 2 years ago (1 children)

Copyright Law doesn't talk about who can consume the work.

What law does talk about it, then?

[–] Phanatik@kbin.social 3 points 2 years ago (1 children)

That would be a worthwhile question if that was the contention.

[–] FaceDeer@kbin.social 6 points 2 years ago

You seem to be suggesting that training these LLMs is illegal, with things like "ChatGPT's theft" and "the piracy is hidden behind its training".

In order for something to be illegal there has to be a law making it illegal. What law is that?

[–] Duxon@feddit.de 4 points 2 years ago* (last edited 2 years ago) (2 children)

LLMs have been caught plagiarising works

Any source for this? I have never seen that.

I'm highly skeptical about GPT-4 having been directly trained on copyrighted material by Stephen King. Given the sheer amount of publicly available information about his works, including summaries, themes, characters, and critical analyses, a good LLM can appear to plagiarize these works when it actually doesn't. If I'm right, there is no leverage for creators to complain. Just accept that that's the world we're living in now. I don't see why this world will stop the sales of books or movie rights on books, etc.

[–] adespoton@lemmy.ca 4 points 2 years ago* (last edited 2 years ago)

Especially since copyright only protects human-authored works, meaning anything created by an LLM is in the public domain, and the publisher using it loses control of the work.

Of course, this has the potential to be a significant issue, as I can take a copyrighted work, train an LLM using it, and then get it to generate a similar but unique work that is in the public domain. This new work will likely impact the original author’s ability to profit off their original work, thus decreasing the supply of human-created works in the long run.

But it’s currently all legal and above board.

[–] xapr@lemmy.sdf.org 2 points 2 years ago* (last edited 2 years ago) (1 children)

I had heard some mentions of this before too, but didn't recall the exact references. I went searching and found this recent study.

[–] Duxon@feddit.de 1 points 2 years ago (1 children)

Sure, it can plagiarize works it has been trained on. They didn't show in the study, however, that this has occurred for copyright-protected material like fiction books.

[–] xapr@lemmy.sdf.org 2 points 2 years ago

I saw a comment, probably on Mastodon, from an author saying that (I believe) ChatGPT had plagiarized some of his work verbatim. I don't recall if it was a work of fiction or not, although for the purpose of copyright it doesn't matter.

I wouldn't be surprised if it's trained on works of fiction just as much as non-fiction though. From what I've heard, you can ask ChatGPT to write something in the style of particular writers. If it's possible to give a very specific prompt for it to write something with the same plot points as a Stephen King story in the style of Stephen King, I wonder just how close it would look to the original?