this post was submitted on 15 May 2026
155 points (99.4% liked)

Aneurysm Posting

For shitposting by people who can smell burnt toast.

Rules:

  1. Nothing promoting crypto, blockchain or NFTs.
  2. Nothing right wing.
  3. Nothing anti science.
  4. No tankie support.
  5. No TERFS.
  6. No porn.
  7. Must tag AI posts as such.

founded 2 years ago
[–] mercano@lemmy.world 8 points 5 hours ago (3 children)

AI doesn’t see a word as a sequence of letters; it just sees it as a pointer to an entry in a words table.
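
In toy form, that claim looks something like this (made-up mini vocabulary, just to illustrate the "pointer into a table" idea):

```python
# Toy version of "a word is just a pointer into a table":
# the model downstream only ever sees the integer IDs, never the letters.
words_table = {"the": 0, "straw": 1, "berry": 2, "toast": 3}  # made-up vocabulary

def to_ids(tokens):
    return [words_table[t] for t in tokens]

print(to_ids(["the", "straw", "berry"]))  # [0, 1, 2] -- no letters in sight
```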

[–] Viceversa@lemmy.world 5 points 4 hours ago (1 children)

Semantic vectors don't work that way.

[–] lambdabeta@lemmy.ca 5 points 4 hours ago (1 children)

Yeah, if words were actually encoded as one-hot vectors, this would be pretty trivial, but the rest of LLM training would be somewhere between infeasible and impossible. The actual embedding vectors obscure spelling even more.
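
Rough toy contrast, NumPy only, with made-up sizes: in a one-hot vector the position of the single 1 is the token, so you could look its spelling up in the vocabulary; a learned dense embedding is just a short list of floats with no obvious relation to the characters.

```python
# Toy contrast between a one-hot token vector and a learned dense embedding.
# Vocabulary, sizes, and values are all made up for illustration.
import numpy as np

vocab = ["the", "straw", "berry", "toast"]  # tiny pretend vocabulary
vocab_size, embed_dim = len(vocab), 8

token_id = vocab.index("berry")

# One-hot: the index of the single 1 *is* the token, so its spelling
# is recoverable by a straight lookup into the vocabulary.
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

# Dense embedding: one learned row per token; nothing about the
# individual characters is readable off the numbers themselves.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embed_dim))
dense = embedding_table[token_id]

print(one_hot)  # [0. 0. 1. 0.]
print(dense)    # eight arbitrary-looking floats
```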

Side note: last time I checked, current embedding vectors were approximately 40-dimensional... Has that gone up significantly in the last couple of years?

[–] Meron35@lemmy.world 1 point 2 hours ago

A fair bit. EmbeddingGemma is open weights and supports embedding dimensions from 128 up to 768.

It's not as simple as more dimensions = better, though, due to size, efficiency, and context rot limitations (sketch below).

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings - Google Developers Blog - https://developers.googleblog.com/en/introducing-embeddinggemma/
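
If anyone wants to poke at it, here's a minimal sketch with the sentence-transformers library. The model id ("google/embeddinggemma-300m") and the truncate_dim argument are my reading of the linked docs, so treat them as assumptions rather than gospel:

```python
# Sketch: loading EmbeddingGemma at the full and a truncated output size.
# Assumptions: the Hugging Face model id "google/embeddinggemma-300m" and the
# `truncate_dim` argument of sentence-transformers -- check the linked blog
# post / model card for the exact names and any access requirements.
from sentence_transformers import SentenceTransformer

full = SentenceTransformer("google/embeddinggemma-300m")                     # 768-dim output
small = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=128)  # truncated output

text = "an example sentence to embed"
print(full.encode(text).shape)   # (768,)
print(small.encode(text).shape)  # (128,)
```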

[–] echodot@feddit.uk 3 points 4 hours ago

Oh thank God. I was worried that I was really stupid.

[–] chicken@lemmy.dbzer0.com 1 points 3 hours ago

Shouldn't it help that it separated them out with underlines? How does this text break down in terms of tokens?
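
Easy enough to check with a tokenizer. A quick sketch using tiktoken as an example (assuming "underlines" means underscore characters; the sample word is just a stand-in for whatever the post actually used, and other models' tokenizers will split differently):

```python
# Compare how a plain word and an underscore-separated version tokenize.
# The sample strings are stand-ins; results vary by model/tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common example encoding

for text in ["strawberry", "s_t_r_a_w_b_e_r_r_y"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")
```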