this post was submitted on 26 Jul 2023
        
      
> Data is encoded into mathematical functions in neural network nodes, but it is still encoded data in the same way that an MP3 and WAV of a song are both still the song; the neural network is the medium.
Here: https://www.understandingai.org/p/large-language-models-explained-with
It's not plagiarism by any definition of the word that makes sense. The analogy may not be literal, but it is like suggesting that learning new words from a Harry Potter book means that any book you write going forward is plagiarizing JK Rowling; the training data helps map the words in the model-- it's never used as a blueprint when predicting what word comes next in any given scenario. It's even further away from copyright infringement-- there is no limited right granted that allows an IP holder to say how that IP can be processed. That's just not a thing. You'd have just as much leg to stand on if you suggested that Stephen King had the right to prevent people from reading his books in a room with green walls. You can't just make up new rights. Trademark law is totally insane. I don't know why you even mention it. It doesn't even have the same goals as the others.
> as a software engineer
I am not so sure that this specific role is in any way secure, myself. You may come to the same conclusion after reading that link I provided-- pay attention to how rapidly the LLMs are growing in complexity. I do not wish for anyone to lose their financial security, even a stranger like you, but I can't help but look at the available information and come to that conclusion.
There very much is. Literally all intellectual property law concerns how intellectual property may or may not be used and licensed. For example, one may not record and sell a cover of a song that is in copyright without explicit permission in the form of a mechanical license. In our industry, one may not use code that is covered by the GNU GPL without fulfilling the source code distribution requirements (see: IBM Red Hat drama).
The training data is what gives the LLM value in the problematic situations, so it is very clear that the material is a key component in the business plan and commercial use. This is not an educational, parody, or other exempt fair-use activity. This means that if any data used for training is not licensed appropriately, such use is a clear violation of intellectual property laws, even if not explicitly covered due to the technology not existing when they were written.
I do agree that there are software engineering jobs at risk in the short term, due to management's desire to cut labor while riding the hype train as well as US taxation on R&D. But given the widespread failures found when companies have replaced engineers and others, I have been expecting a wave of desperate re-hiring to occur 1-3 years after the layoffs. The particular segment that I'm involved in is generally considered high-ROI, so it is likely less vulnerable (but no guarantee).
I don't see how QA could be sanely replaced, though. From my experience, it's already frequently under-funded and, as I mentioned, for all the bad in the CRA drafts, one of the positives is that QA-related work is going to be mandatory for software and devices sold in the EU market.
Sorry about the late reply-- I try my best to stay mostly disconnected from the internet on the weekends.
True, but no IP law gives the IP holder the power you're trying to give them. That is what I'm saying. It would need the law to be changed. There is no aspect of IP law that says that you aren't allowed to use the text to train anyone, let alone an LLM.
This does not matter. If I read a book on Six Sigma business practices and then use that knowledge to better structure my business to increase my profits, I don't owe the author of the book anything from that. You're, again, trying very hard to give away your own rights in order to stick it to LLMs. I'm positive IP rights holders would love this new right you want to give them. Perhaps reconsider the implications, though. Simply making money off of the information found in a book does not give the author rights to that money.
Let me ask you this. If you have an epub of a book on your computer and you select it and press Ctrl-C, Ctrl-V-- have you violated copyright laws? You've made a copy, after all.
No worries! Definitely important to have healthy relationships with device usage.
Your statements on the rights of IP owners seem to imply that the vast majority of open-source licenses are meaningless. It also seems to be a parallel to the legal cases brought by the family of Henrietta Lacks, whose cells were cultured without her consent and have been used extensively in research and pharmaceutical development, bringing in significant profits, while neither Ms. Lacks nor her family saw a dime.
Not 1:1, as Lacks involves human research and biomedical research, but it certainly rhymes, as there is a lack of consent and unshared profits.
Depending on the use, possibly. If I intended to sell copies of it, almost definitely. Likewise if I intended to create derivative art that did not fall within the bounds of fair use, without attribution or a license from the holder of the copyright.
Speaking from an ethical, rather than purely legal perspective, profiting off of training an LLM or similar neural network on someone else's work, in a manner that competes with the source work, without their permission or giving them a share of the proceeds, is hard to imagine as ethical in any manner that does not involve extraordinary mental gymnastics.
On the other hand, I would not see anything wrong with doing so for one's personal enjoyment, if there is no harm done to the IP owner.