this post was submitted on 06 May 2025

technology

simontherockjohnson@lemmy.ml 5 points 4 months ago* (last edited 4 months ago)

my money is on the higher hallucination rate being a result of the data being polluted with synthetic information. I think it's model collapse

But that is effectively what's happening with RLMs (reasoning language models) and refeed. An LLM's output is statistically conditioned on everything in its input. For example, RAG setups make the text retrieved from your source documents weigh more heavily on the output. RLM reasoning is a fully automated chain-of-thought (CoT) prompting technique. You don't provide the chain, you don't ask the LLM to create the chain, it just does it all the time for everything. That means the input becomes more polluted with generated text, which reinforces the existing biases in the model.

For example, take the em dash issue. The idea is that LLMs already generate more em dashes than exist in human-written text. Let's say on turn 1 you get an output with em dashes. On turn 2 this is fed back into the machine, which reinforces that over-indexing on em dashes through your prompt. This means turn 2's output is potentially going to have more em dashes, because the input on turn 2 contained output from turn 1 that had more em dashes than normal. Over time your input ends up accumulating the model's biases through the history. The shorter your inputs on each turn and the longer the conversation, the faster the conversation input converges on being mostly LLM-generated text.
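To put rough numbers on that convergence, here's a minimal sketch. It only counts tokens, it doesn't model bias itself, and the function name and the token counts are made up for illustration.

```python
def generated_share(user_tokens: int, model_tokens: int, turns: int) -> list[float]:
    """Share of the input that is model-generated at the start of each turn.

    Assumes (hypothetically) a fixed message size per turn and that the full
    history is fed back in on every turn.
    """
    shares = []
    human = 0
    generated = 0
    for _ in range(turns):
        human += user_tokens                  # the user's new message joins the input
        shares.append(generated / (human + generated))
        generated += model_tokens             # the reply is appended to the history
    return shares


if __name__ == "__main__":
    for turn, share in enumerate(generated_share(20, 200, 5), start=1):
        print(f"turn {turn}: {share:.1%} of the input is LLM-generated")
```

With 20-token prompts and 200-token replies, the input is already about 83% generated text by turn 2 and keeps creeping toward 200/220, roughly 91%. Shorter prompts or longer replies push that ceiling higher.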

When you do this with an RLM, even more output is automatically added to the input by the CoT prompt, so any model biases accumulate in the input even faster.
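A rough way to see the difference is to count how many turns it takes before generated text crosses some share of the input, with and without reasoning tokens. This is a sketch under my own assumptions: the function and the numbers are hypothetical, and it assumes the reasoning text stays in the history, which not every deployment does.

```python
from typing import Optional


def turns_until_share(threshold: float, user_tokens: int, answer_tokens: int,
                      reasoning_tokens: int = 0, max_turns: int = 1000) -> Optional[int]:
    """First turn at which generated text makes up >= threshold of the input.

    Assumption: reasoning tokens are in the input the answer is conditioned on
    and are kept in the history afterwards. Returns None if never reached.
    """
    human = 0
    generated = 0
    for turn in range(1, max_turns + 1):
        human += user_tokens
        generated += reasoning_tokens         # CoT is generated before the answer
        if generated / (human + generated) >= threshold:
            return turn
        generated += answer_tokens            # the answer is appended to the history
    return None


if __name__ == "__main__":
    print(turns_until_share(0.7, user_tokens=50, answer_tokens=200))
    print(turns_until_share(0.7, user_tokens=50, answer_tokens=200, reasoning_tokens=800))
```

With these made-up sizes the non-reasoning conversation crosses 70% on turn 3, while the reasoning one is past it on turn 1, before the first answer is even written.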

Another reason I suspect the CoT refeed rather than training data pollution is that GPT-4.5, which is the latest (Feb 2025) non-reasoning model, seems to have a lower hallucination rate on SimpleQA than o1. If the training data were the issue, we'd see rates closer to o3/o4.

The other big difference between o1 and o3/o4 that may explain the higher rate of hallucinations is that o1's reasoning is not user accessible, and it's purposefully trained to not have safeguards on reasoning, whereas o3 and o4 have public reasoning and reasoning safeguards. I think safeguards may be a significant source of hallucination because they change prompt intent, encoding and output. So on a non-o1 model that safeguard process happens twice per turn, once for reasoning and once for output, and both results are accumulated into the next turn's input. On o1 it happens once per turn, only for output, and then gets accumulated.