182

An in-depth explanation of how LLMs work with a minimum of jargon (open.substack.com)

submitted 2 years ago by wisdomchicken@lemm.ee to c/technology@lemmy.world


[–] ComfortablyGlum@sh.itjust.works 2 points 2 years ago (4 children)

"As a result, no one on Earth fully understands the inner workings of LLMs. Researchers are working to gain a better understanding, but this is a slow process that will take years—perhaps decades—to complete."

Maybe I missed it in the article, but can someone please explain-like-I'm-5 how this is possible?

It's not like we are interacting with a biologic with mysterious chemistry. Everything about LLMs is completely man-made.

[–] Dark_Blade@lemmy.world 7 points 2 years ago (1 children)

From my extremely limited understanding, it’s because of the sheer scale of the data that’s been fed into LLMs. There’s also the (admittedly small) possibility that the people working on LLMs never really took the time to understand what sort of connections the model was drawing between all the datapoints it was trained on, or to get a deeper understanding of how the math worked; they just knew that it did.

As the scale of the project kept growing and LLM companies just kept throwing ‘more data, more neural networks, more hardware!’ into the mix, the black box became…well, blacker and it kept getting harder to figure out the internal ‘logic’ used by the LLM to predict the next word. Now, the people who’re trying to figure it all out are working with extremely large amounts of data with nothing to go off of.

In short, the people making GPT were somehow smart enough to make it, but not smart enough to understand what they were making.

[–] lily33@lemmy.world 5 points 2 years ago* (last edited 2 years ago)

It's not that nobody took the time to understand. Researchers have been trying to "un-blackbox" neural networks pretty much since those have been around. It's just an extremely complex problem.

Logistic regression (which is like a neural network but with just one node) is pretty well understood - but even then sometimes it can learn some pretty unintuitive coefficients and it can be tricky to understand why.

With LLMs - which are enormous by comparison - it's simply not a tractable problem to understand how they work in detail.
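
To make the logistic regression point concrete, here is a minimal sketch of a single "node" learning a coefficient that looks backwards at first glance. Everything in it is an assumption made up for illustration: the synthetic data, the names `signal`, `shared_noise` and `fit_logistic`, and the constants 0.5 and 3.0. The setup is two features that share the same noise. The second feature is positively correlated with the label on its own, but the fitted model should give it a negative weight, because subtracting it from the first feature cancels the shared noise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=1.0, steps=5000):
    """Fit one 'node' (weights + bias) by plain gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)           # predicted probabilities
        err = p - y                      # gradient of log-loss w.r.t. the logit
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

# Synthetic data (hypothetical): the label depends only on `signal`,
# but both features are contaminated by the same `shared_noise`.
rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
shared_noise = rng.normal(size=n)
x1 = signal + shared_noise
x2 = 0.5 * signal + shared_noise
y = (rng.random(n) < sigmoid(3.0 * signal)).astype(float)

X = np.column_stack([x1, x2])
w, b = fit_logistic(X, y)
corr_x2_y = np.corrcoef(x2, y)[0, 1]

print("weights:", w, "bias:", b)
print("correlation of x2 with the label:", corr_x2_y)
```

Looked at in isolation, `x2` goes up when the label goes up, yet its weight comes out negative. With two inputs you can reason your way to why (`x1 - x2` is a clean copy of the signal). With billions of weights feeding into each other, that kind of by-hand reasoning stops scaling.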
