this post was submitted on 03 Apr 2025
631 points (96.5% liked)

Technology


Artificial Generalized Incompetence

[–] DarkCloud@lemmy.world 21 points 4 days ago* (last edited 4 days ago) (1 children)

All the search engines search the same internet, find similar text, and output it using similar formulas.

[–] MartianSands@sh.itjust.works 29 points 4 days ago (2 children)

Except these AI systems aren't search engines, and people treating them as if they were is really dangerous.

[–] DarkCloud@lemmy.world -5 points 3 days ago* (last edited 3 days ago) (1 children)

They are. They record the data, stealing it. They search it (or characteristics of it), and reprint it (in whole or in part) upon request.

Viewing it as something creative, or as anything other than a glorified remixing machine, is the problem. It's a search engine for creative works they've stolen and reproduce parts of.

They search the data-space of what they're "trained" on (our content, the content of human beings), and reproduce statistically defined elements of it.

They're search engines that have stolen what they're "trained on", and reproduce it as "results" (be that images or written text, it has to come from our collective data. Data we created). It's theft. It's copyright fraud. Same as Google stealing books (which they had to be sued over digitizing, and had to enter into rights agreements over).

Searching and reproducing content they've already recorded (aka stolen without permission) is absolutely part of what they are. Part of what they do.

Don't stan for them or pretend they're creative, intelligent, or doing anything original.

The real lie is that it's "training data". It's not. It's the internet, and it's not training - it's theft, it's stealing and copying (violating copyright). Digital stealing, and processing into a "data set", a representation or repackaging of our original works.

[–] futatorius@lemm.ee 2 points 1 day ago (1 children)

They are.

Their input sides are based on crawling, just as search is.

[–] DarkCloud@lemmy.world 1 points 23 hours ago* (last edited 23 hours ago)

Yeah, and then they convert that to weighted probabilities or a "data space" which they then search during content generation.
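
To make "weighted probabilities" concrete, here's a toy sketch. It assumes a simple bigram table built from word counts, which is my own stand-in for illustration: real language models use neural networks with learned weights, not a lookup table like this. The corpus and function names are made up for the example.

```python
import random
from collections import Counter, defaultdict


def build_bigram_table(text):
    """Count, for each word, which words follow it and how often."""
    words = text.split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def next_word_probabilities(table, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    counts = table.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def generate(table, start, length, seed=0):
    """Sample up to `length` more words, weighted by observed follow-counts."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length):
        counts = table.get(output[-1])
        if not counts:
            break  # the last word never had a follower in the corpus
        choices, weights = zip(*counts.items())
        output.append(rng.choices(choices, weights=weights)[0])
    return " ".join(output)


# Hypothetical toy corpus standing in for "the internet".
corpus = "the cat sat on the mat and the cat ate the fish"
table = build_bigram_table(corpus)

print(next_word_probabilities(table, "the"))
# {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(generate(table, "the", 8))
```

In this toy version every generated word pair already exists somewhere in the corpus, so the output is a reweighted remix of the input. Whether that picture carries over to large neural models is exactly what this thread is arguing about.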