this post was submitted on 15 May 2026
155 points (99.4% liked)

Aneurysm Posting

For shitposting by people who can smell burnt toast.

Rules:

  1. Nothing promoting crypto, blockchain or NFTs.
  2. Nothing right wing.
  3. Nothing anti science.
  4. No tankie support.
  5. No TERFS.
  6. No porn.
  7. Must tag AI posts as such.

founded 2 years ago
[–] mercano@lemmy.world 8 points 5 hours ago (3 children)

AI doesn’t see a word as a sequence of letters; it just sees it as a pointer to an entry in a words table.
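
In toy form, that claim looks something like this (made-up mini vocabulary, just to illustrate the "pointer into a table" idea):

```python
# Toy version of "a word is just a pointer into a table":
# the model downstream only ever sees the integer IDs, never the letters.
words_table = {"the": 0, "straw": 1, "berry": 2, "toast": 3}  # made-up vocabulary

def to_ids(tokens):
    return [words_table[t] for t in tokens]

print(to_ids(["the", "straw", "berry"]))  # [0, 1, 2] -- no letters in sight
```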

[–] Viceversa@lemmy.world 5 points 4 hours ago (1 children)

Semantic vectors don't work that way.

[–] lambdabeta@lemmy.ca 5 points 4 hours ago (1 children)

Yeah, if words were actually encoded as one-hot vectors, this would be pretty trivial, but the rest of LLM training would be somewhere between infeasible and impossible. The actual embedding vectors obscure spelling even more.
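
Rough toy contrast, NumPy only, with made-up sizes: in a one-hot vector the position of the single 1 is the token, so you could look its spelling up in the vocabulary; a learned dense embedding is just a short list of floats with no obvious relation to the characters.

```python
# Toy contrast between a one-hot token vector and a learned dense embedding.
# Vocabulary, sizes, and values are all made up for illustration.
import numpy as np

vocab = ["the", "straw", "berry", "toast"]  # tiny pretend vocabulary
vocab_size, embed_dim = len(vocab), 8

token_id = vocab.index("berry")

# One-hot: the index of the single 1 *is* the token, so its spelling
# is recoverable by a straight lookup into the vocabulary.
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

# Dense embedding: one learned row per token; nothing about the
# individual characters is readable off the numbers themselves.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embed_dim))
dense = embedding_table[token_id]

print(one_hot)  # [0. 0. 1. 0.]
print(dense)    # eight arbitrary-looking floats
```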

Side note: last time I checked, current embedding vectors were approximately 40-dimensional... Has that gone up significantly in the last couple of years?

[–] Meron35@lemmy.world 1 point 2 hours ago

A fair bit. EmbeddingGemma is open weights and supports embedding dimensions from 128 up to 768.

It's not as simple as more dimensions = better, though, due to size, efficiency, and context rot limitations (sketch below).

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings - Google Developers Blog - https://developers.googleblog.com/en/introducing-embeddinggemma/
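
If anyone wants to poke at it, here's a minimal sketch with the sentence-transformers library. The model id ("google/embeddinggemma-300m") and the truncate_dim argument are my reading of the linked docs, so treat them as assumptions rather than gospel:

```python
# Sketch: loading EmbeddingGemma at the full and a truncated output size.
# Assumptions: the Hugging Face model id "google/embeddinggemma-300m" and the
# `truncate_dim` argument of sentence-transformers -- check the linked blog
# post / model card for the exact names and any access requirements.
from sentence_transformers import SentenceTransformer

full = SentenceTransformer("google/embeddinggemma-300m")                     # 768-dim output
small = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=128)  # truncated output

text = "an example sentence to embed"
print(full.encode(text).shape)   # (768,)
print(small.encode(text).shape)  # (128,)
```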

[–] echodot@feddit.uk 3 points 4 hours ago

Oh thank God. I was worried that I was really stupid.

[–] chicken@lemmy.dbzer0.com 1 points 3 hours ago

Shouldn't it help that it separated them out with underlines? How does this text break down in terms of tokens?
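
Easy enough to check with a tokenizer. A quick sketch using tiktoken as an example (assuming "underlines" means underscore characters; the sample word is just a stand-in for whatever the post actually used, and other models' tokenizers will split differently):

```python
# Compare how a plain word and an underscore-separated version tokenize.
# The sample strings are stand-ins; results vary by model/tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common example encoding

for text in ["strawberry", "s_t_r_a_w_b_e_r_r_y"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")
```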