this post was submitted on 03 Apr 2025
Technology
It's searched in training and tagged for use/topic, then that info is processed and filtered through layers. So it's pre-searched, if you will, like meta tags in the early internet.
Then the data is processed into cells, which queries flow through during generation.
Yes it does. The fact that you in particular can't recognize where it comes from doesn't matter. It's still using copyrighted works.
Anyway, you're an AI stan and you're defending theft. You can deny it all day, but it's what you're doing. "It's okay, I'm a software engineer, I'm allowed to defend it."
...as if being a software engineer stops you from also being a dumbass. Of course it doesn't.
You're still putting words in my mouth.
I never said they weren't stealing the data.
I didn't comment on that at all, because it's not relevant to the point I was actually making, which is that people treating the output of an LLM as if it were derived from any factual source at all is really problematic, because it isn't.
Our discussion was never about factuality. You've only just now raised that term for the first time in this discussion. You said search engine. LLMs are in fact searching and reconstructing data based on a probabilistic data space.
...and there are plenty of examples of search engines being sued for the types of data they've explored or digitized.
...also, the implication that search engines are "accurate", or that they don't serve up misinformation and manipulated data, is foolish.