this post was submitted on 15 May 2026
162 points (99.4% liked)

Aneurysm Posting


lambdabeta@lemmy.ca 5 points 5 hours ago (1 child)

Yeah, if words were actually encoded as 1-hot vectors this would be pretty trivial, but the rest of LLM training would be somewhere between infeasible and impossible. The actual embedding vectors obscure spelling even more.

Side note: last time I checked, current embedding vectors were approximately 40-dimensional... Has that gone up significantly in the last couple of years?

Meron35@lemmy.world 1 point 2 hours ago

A fair bit. EmbeddingGemma is an open-weights model that outputs 768-dimensional embeddings, and those can be truncated down to as few as 128 dimensions.

It's not as simple as more dimensions = better, though, because of size, efficiency, and context rot limitations.

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings - Google Developers Blog - https://developers.googleblog.com/en/introducing-embeddinggemma/
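
For anyone wondering what a 128-768 range looks like in practice, here is a rough sketch of the usual way a larger embedding gets shrunk: keep the first few components and rescale to unit length. This only works well for models trained so that the leading dimensions carry most of the signal, which the linked blog post says is the case for EmbeddingGemma. The function name `truncate_embedding` and the 8-number toy vector are made up for illustration; nothing here calls the real model.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components of `vec` and rescale to unit length.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy stand-in for a real embedding (a real one would have 768 values).
full = [0.5, -0.3, 0.1, 0.8, 0.05, -0.02, 0.01, 0.03]

# Half the storage, same direction along the first four axes.
small = truncate_embedding(full, 4)
print(len(full), "->", len(small))
```

The trade-off is the one mentioned above: the shorter vector is cheaper to store and compare, but it drops whatever information lived in the discarded components.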