Wait, so tokens aren't just "2 to 4 character" chunks cut from the input anymore? They can be whole words too?
Pretty much. And more.
"The end."
Might be a mere 3 tokens total:
‘"The ‘ ‘end."’ ‘/n/n’
I don’t know about ClosedAI, but the Chinese models in particular (like Qwen, GLM and Deepseek) went crazy optimizing their tokenizers for English, Chinese, or code, with huge vocabs for common words/phrases and even common groupings of words + punctuation/spacing as single tokens. It makes the models more efficient, as the same text counts as far fewer tokens.
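A rough way to see the difference is to run the same text through two tokenizers and compare counts (the checkpoint names here are just examples, and the tokenizers get downloaded from the Hugging Face Hub):

```python
# Hedged sketch: compare how many tokens the same sentence costs under an
# older BPE tokenizer (gpt2) vs. a newer large-vocab one (a Qwen checkpoint).
from transformers import AutoTokenizer

text = "Common words, and even word plus punctuation groupings, can land in the vocab as single tokens."

for name in ["gpt2", "Qwen/Qwen2.5-7B-Instruct"]:  # example model names
    tok = AutoTokenizer.from_pretrained(name)
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: {len(ids)} tokens")
    print(tok.convert_ids_to_tokens(ids))  # inspect the actual pieces
```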
“About 1 token per word” is a decent estimate for a block of text, even including spaces and punctuation.
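And a quick sanity check of that rule of thumb, again assuming cl100k_base and a made-up sample paragraph; the ratio shifts with the tokenizer and the text:

```python
# Rough check of the "about 1 token per word" estimate.
import tiktoken

paragraph = (
    "Tokenizers with large vocabularies map common English words, and even "
    "frequent word plus punctuation groupings, to single tokens."
)

enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption
n_tokens = len(enc.encode(paragraph))
n_words = len(paragraph.split())

print(n_tokens, n_words, round(n_tokens / n_words, 2))  # tokens, words, tokens per word
```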