Hot take: LLM technology is being purposefully framed as AI to avoid accountability (lemmy.world)

submitted 2 years ago by assassin_aragorn@lemmy.world to c/technology@lemmy.world

Which of the following sounds more reasonable?

I shouldn't have to pay for the content that I use to tune my LLM model and algorithm.
We shouldn't have to pay for the content we use to train and teach an AI.

By calling it AI, corporations are able to advocate for a position that's blatantly pro-corporate and anti-writer/artist, and trick people into supporting it under the guise of technological development.

[–] Zeth0s@lemmy.world 2 points 2 years ago* (last edited 2 years ago) (1 children)

You can absolutely compare AI with students. The problem is that, legally, in many western countries students still have to pay copyright holders of the books they use to learn.

It is purely a copyright discussion. How far does copyright apply? Should the law distinguish between human learning and machine learning? Can we retroactively change the copyright of material available online?

For instance, Copilot is more at risk than an LLM that learned from 4chan, because the licenses are clearer in its case. The problem is that we have no idea which data the big LLMs were trained on, so we can't tell whether some copyright law already applies.

In the end, it is just a legal dispute over companies making money from AI trained on publicly available data (which is not necessarily copyright-free).

[–] assassin_aragorn@lemmy.world -1 points 2 years ago (1 children)

My argument is that an LLM here is reading the content for different reasons than a student would. The LLM uses it to generate text and answer user queries, for cash. The student uses it to learn their field of study, and then apply it to make money. The difference is that the student internalizes the concepts, while the LLM internalizes the text. If you used a different book that covered the same content, the LLM would generate different output, but the student would learn the same thing.

I know it's splitting hairs, but I think it's an important point to consider.

My take is that an LLM algorithm can't freely consume any copyrighted work, even if it's been reproduced online with the consent of the author. The company would need the permission of the author for the express purpose of training the AI. If there's a copyright, it should apply.

You have me thinking about the student comparison, though. College students pay to attend lectures on material that can be found online or in their textbooks. Wouldn't paying for any copyrighted material be analogous to this?

[–] Zeth0s@lemmy.world 1 points 2 years ago* (last edited 2 years ago)

Students and LLMs do the same thing with data, just in different ways. An LLM can learn from more data; a student can understand more concepts, logic, and context.

And students study to make money.

Both LLMs and students map the data into some internal representation, though the two representations are pretty different, because a biological mind is different from an AI.

Regarding your last paragraph, this is exactly the point. What should OpenAI and Microsoft pay, given that they are making a lot of money from other people's work? Currently it is unclear, because OpenAI hasn't released what data they used and because copyright laws do not cover generative AI. We need to wait for interpretations of existing laws and for new ones. But it will surely change soon.