this post was submitted on 26 Feb 2026
66 points (87.5% liked)

Technology

[–] TropicalDingdong@lemmy.world 2 points 14 hours ago

Yeah, I've read that before. I don't necessarily agree with their framework, and even working within it, this article is a challenge to their third bullet.

I'm just not quite ready to rule out the idea that if you can scale single models past a certain boundary, you get fundamentally different, novel behavior. This is consistent with other networked systems, and somewhat consistent with the original performance leaps we saw (the ones I think really matter are from 2019-2023; it's really plateaued since, and what's left is mostly engineering tinkering at the edges). It genuinely could be that eight maxed-out single models in an MoE configuration would show a very different level of performance. We just don't know, because we can't test that with the current generation of hardware.
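To make the MoE idea concrete: the point of the configuration is that a router picks one expert per input, so you pay the compute cost of a single model while holding the capacity of eight. This is a hypothetical numpy sketch of top-1 routing (all shapes and names are made up for illustration, not any particular model's implementation):

```python
import numpy as np

# Toy mixture-of-experts layer: 8 experts, top-1 gating (hypothetical sizes).
rng = np.random.default_rng(0)
d, n_experts = 16, 8
W_gate = rng.normal(size=(d, n_experts)) * 0.1          # router weights
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]

def moe_forward(x):
    logits = x @ W_gate
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                 # softmax over experts
    k = int(np.argmax(probs))                            # route to best expert
    return probs[k] * (x @ experts[k])                   # only one expert runs

x = rng.normal(size=d)
y = moe_forward(x)   # total params ~8x a dense layer, compute ~1x
```

The open question in the comment is whether eight experts, each already scaled to the single-model ceiling, would behave qualitatively differently from one such model alone.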

It's possible there really is something "just around the corner"; possible, but unlikely.

What we need are more efficient models and better harnessing, or a different approach; reinforcement learning applied to RNNs that use transformers has been showing promise.

Could be. I'm not sure tinkering at the edges is going to get us anywhere, and I think I'd agree with the energy-density argument from the Dettmers blog. Relative to intelligent systems, the power-to-compute performance (if you want to frame it like that) is trash. You just can't get there with computation systems like the ones we all currently use.

[–] in_my_honest_opinion@piefed.social 1 points 1 hour ago

I mean, what you're proposing was the initial push of GPT-3. All the experts said these GPTs would only hallucinate more with more resources, and would never do anything more than repeat their training data as word salad posing as novelty. And on a very macro scale, they were correct.

The scaling problem
https://arxiv.org/abs/2001.08361

The scaling hype
https://gwern.net/scaling-hypothesis
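The core claim of the scaling paper linked above is that test loss falls as a smooth power law in parameter count, L(N) ≈ (N_c / N)^α_N. A quick sketch using the fitted constants reported in that paper (α_N ≈ 0.076, N_c ≈ 8.8e13) shows why scaling alone looks so unglamorous: every doubling of parameters buys the same small constant-factor loss reduction, never a jump.

```python
# Power-law loss fit from Kaplan et al. (arXiv:2001.08361):
#   L(N) = (N_c / N) ** alpha_N
alpha_N = 0.076     # fitted exponent from the paper
N_c = 8.8e13        # fitted critical parameter count from the paper

def loss(n_params):
    """Predicted test loss for a model with n_params parameters."""
    return (N_c / n_params) ** alpha_N

# Each doubling of N multiplies the loss by the same factor, 2**-0.076 ≈ 0.95:
ratio = loss(2e11) / loss(1e11)
```

Smooth curves like this are exactly why "hype won out" was contested: the fit predicts steady improvement with scale, but nothing in the formula itself predicts a qualitative phase change.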

Ultimately, hype won out.