I'm sympathetic to the NYT, even if it's not reproducing their IP verbatim.
AI companies need to acknowledge that their LLMs would be worthless without training data and compensate/credit the sources appropriately.
It's not just that it circumvents the paywall; it makes up random nonsense and then claims the NYT said it.
I've never got why people don't see this about AI. When it "works", it's just spitting out what a human was paid (a very low wage) to write. When it has to come up with something that hasn't been written, it just slaps nonsense together.
It's not real AI, it's just a next-generation search engine that gives unreliable results.
You just don't notice if you don't already know what you're asking.
Even though these LLMs work by just figuring out the next word (token) that makes sense, they are still able to generate things that no human has ever written before. It isn't just copy-pasting stuff together.
I use GPT-4 on a daily basis for coding, and the way it spits out complex code templates/snippets that are unique to the problem just isn't possible without the model having some level of intelligence. Of course it hallucinates now and then, but so do most coders.
Never gonna happen.
The NYT might win some money based on what Microsoft published, but only to the same extent as if a human wrote that and Microsoft published it. Copyright will never be an issue for training data because training is just scanning text and guessing the next letter. Consuming an entire library to make up anything you ask for is pretty goddamn transformative.
Oh, does the model know the names of characters in a popular book? So do Google and Wikipedia. Try framing a law that's cool with Google having a whole searchable plain-text copy of a book, so it can go 'this book?' when you search for a quote, but forbids OpenAI from having the essence of that book distilled somewhere in its terabyte of inscrutable numbers.
This fight is over.