this post was submitted on 03 Apr 2025

631 points (96.5% liked)

Did ChatGPT come up with Trump’s tariff rate formula? AI chatbots ChatGPT, Gemini, Claude and Grok all return the same formula for reciprocal tariff calculations, several X users claim. (cointelegraph.com)

submitted 4 days ago by UnderpantsWeevil@lemmy.world to c/technology@lemmy.world

71 comments

Artificial Generalized Incompetence

[–] DarkCloud@lemmy.world 21 points 4 days ago* (last edited 4 days ago) (1 children)

All the search engines search the same internet, find similar text, and output it using similar formulas.

[–] MartianSands@sh.itjust.works 29 points 4 days ago (2 children)

Except these AI systems aren't search engines, and people treating them like they are is really dangerous.

[–] DarkCloud@lemmy.world -5 points 3 days ago* (last edited 3 days ago) (1 children)

They are. They record the data, stealing it. They search it (or characteristics of it), and reprint it (in whole or in part) upon request.

Viewing it as something creative, or as anything other than a glorified remixing machine, is the problem. It's a search engine for creative works they've stolen and reproduce parts of.

They search the data-space of what they're "trained" on (our content, the content of human beings), and reproduce statistically defined elements of it.

They're search engines that have stolen what they're "trained on", and reproduce it as "results" (be that images or written text, it has to come from our collective data. Data we created). It's theft. It's copyright fraud. Same as Google stealing books (which they had to be sued over the digitizing of, and enter into rights agreements over).

Searching and reproducing content they've already recorded (aka stolen without permission), is absolutely part of what they are. Part of what they do.

Don't stan for them or pretend they're creative, intelligent, or doing anything original.

The real lie is that it's "training data". It's not. It's the internet, and it's not training - it's theft, it's stealing and copying (violating copyright). Digital stealing, and processing into a "data set", a representation or repackaging of our original works.

[–] futatorius@lemm.ee 2 points 1 day ago (1 children)

They are.

Their input sides are based on crawling, just as search is.

[–] DarkCloud@lemmy.world 1 points 23 hours ago* (last edited 23 hours ago)

Yeah, and then they convert that to weighted probabilities or a "data space" which they then search during content generation.

[+] UnderpantsWeevil@lemmy.world -10 points 3 days ago* (last edited 3 days ago) (1 children)

The basic graphing technology used by AI is the same pioneered by Alta Vista and optimized by Google years later. We've added a layer of abstraction through user I/O, such that you get a formalized text response encapsulating results rather than a series of links containing related search terms. But the methodology used to harvest, hash, and sort results is still all rooted in graph theory.

The difference between then and now is that back then you'd search "Horse" in Alta Vista and get a dozen links ranging from ranches and vet clinics to anime and porn. Now, you get a text blob that tries to synthesize all the information in those sources down to a few paragraphs of relevant text.
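
To make the "dozen links" behavior concrete, here is a minimal sketch of a keyword lookup over an inverted index. Everything in it is an assumption made up for illustration: the three-page corpus, the URLs, and the `build_index` and `search` names. It is a toy, not how Alta Vista or Google actually worked.

```python
from collections import defaultdict

# Hypothetical toy corpus: URL -> page text (made up for illustration).
PAGES = {
    "ranch.example/horses": "horse ranch riding lessons",
    "vet.example/equine": "horse vet clinic",
    "anime.example/uma": "anime about a horse girl",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "horse"))
```

The result is always a list of stored documents that contain the query words. Nothing is synthesized, and every hit points back to a specific page.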

[–] MartianSands@sh.itjust.works 10 points 3 days ago (1 children)

That simply isn't true. There's nothing in common between an LLM and a search engine, except insofar as the people developing the LLM had access to search engines and may have used them while gathering training data.

[–] DarkCloud@lemmy.world -2 points 3 days ago* (last edited 3 days ago) (1 children)

"Data gathering" and "training data" are just what they've tricked you into calling it (just like they tried to trick people into calling it an "intelligence").

It's not data gathering, it's stealing. It's not training data, it's our original work.

It's not creating anything, it's searching and selectively remixing the human creative work of the internet.

[–] MartianSands@sh.itjust.works 1 points 3 days ago (1 children)

You're putting words in my mouth, and inventing arguments I never made.

I didn't say anything about whether the training data is stolen or not. I also didn't say a single word about intelligence, or originality.

I haven't been tricked into using one piece of language over another. I'm a software engineer, and I know enough about how these systems actually work to reach my own conclusions.

There is no database tucked away anywhere in the LLM that you could search through to find the phrases it was trained on. It simply doesn't exist.

That isn't to say it's completely impossible for an LLM to spit out something which formed part of the training data, but it's pretty rare. 99% of what it generates doesn't come from anywhere in particular, and you wouldn't find it in any of the sources which were fed to the model in training.
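
As a rough illustration of that point, here is a minimal sketch of a bigram (next-word) model. It is a toy under loud assumptions: the two training sentences and the `train` and `generate` names are made up for this example, and a real LLM is a far larger neural network rather than a table of counts. The sketch only shows the general idea of keeping statistics instead of documents.

```python
import random
from collections import Counter, defaultdict

# Hypothetical two-sentence "training set", made up for illustration.
TRAINING = ["the horse eats hay", "the cow eats grass"]

def train(sentences):
    """Count which word follows which. Only the counts are kept, not the sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng, max_words=10):
    """Sample one word at a time, weighted by the follow-up counts."""
    word, out = "<s>", []
    for _ in range(max_words):
        followers = counts[word]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

MODEL = train(TRAINING)
# Generate with 50 different seeds and keep the outputs not found in the training set.
SAMPLES = {generate(MODEL, random.Random(seed)) for seed in range(50)}
NOVEL = sorted(SAMPLES - set(TRAINING))
print(NOVEL)
```

Sampling from this model can produce "the horse eats grass", a sentence that appears nowhere in the training list, even though every word-to-word transition in it was learned from that list. Both halves of that observation matter to the argument in this thread.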

[–] DarkCloud@lemmy.world 1 points 3 days ago* (last edited 3 days ago) (1 children)

It's searched in training, tagged for use/topic, then that info is processed and filtered through layers. So it's pre-searched, if you will. Like meta tags in the early internet.

Then the data is processed into cells which queries flow through during generation.

> 99% of what it generates doesn't come from anywhere in particular, and you wouldn't find it in any of the sources which were fed to the model in training.

Yes it does. The fact that you in particular can't recognize where it comes from doesn't matter. It's still using copyrighted works.

Anyways you're an AI stan, and defending theft. You can deny it all day, but it's what you're doing. "It's okay, I'm a software engineer I'm allowed to defend it"

...as if being a software engineer doesn't stop you from also being a dumbass. Of course it doesn't.

[–] MartianSands@sh.itjust.works 2 points 3 days ago (1 children)

You're still putting words in my mouth.

I never said they weren't stealing the data.

I didn't comment on that at all, because it's not relevant to the point I was actually making: treating the output of an LLM as if it were derived from any factual source at all is really problematic, because it isn't.

[–] DarkCloud@lemmy.world -1 points 3 days ago* (last edited 3 days ago)

Our discussion was never about the term factuality. You've just now raised that term for the first time in this discussion. You said search engine. They are in fact searching and reconstructing data based on a probabilistic data space.

...and there are plenty of examples of search engines being sued for the types of data they've explored or digitized.

...also, the inference that search engines are "accurate" or don't serve up misinformation and manipulated data is foolish.