Race for AI is making Hindenburg-style disaster ‘a real risk’, says leading expert
(www.theguardian.com)
Hmm. That's probably a pretty straightforward modification for existing LLMs, at least at the token level.
You can obtain token probabilities, so you can give some out-of-band estimate of confidence in a response, down to the token level. You don't really need to change anything for that, just expose some data.
And you could make the AI aware of its own neural net's confidence level: feed the confidence back into the neural net for subsequent tokens, and see if you can get it to take that information into account.
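Rough sketch of what "just expose some data" could look like, assuming the model hands back a list of `(token, logprob)` pairs for the tokens it emitted. That input shape, the function names and the numbers are all made up for illustration, not any particular library's API:

```python
import math

def token_confidences(token_logprobs):
    """Turn (token, logprob) pairs into (token, probability) pairs."""
    return [(tok, math.exp(lp)) for tok, lp in token_logprobs]

def sequence_confidence(token_logprobs):
    """Geometric mean of the token probabilities (exp of the mean logprob)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_lp = sum(lp for _, lp in token_logprobs) / len(token_logprobs)
    return math.exp(mean_lp)

# Hypothetical logprobs for a three-token response (invented numbers).
response = [("The", math.log(0.9)), ("answer", math.log(0.8)), ("4", math.log(0.3))]

for tok, p in token_confidences(response):
    print(f"{tok!r}: {p:.2f}")
print(f"whole response: {sequence_confidence(response):.2f}")
```

The geometric mean is just one way to roll per-token numbers up into a single figure; the minimum token probability would be another reasonable choice.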
https://en.wikipedia.org/wiki/Recurrent_neural_network
That means literally nothing. You can get a wrong answer with 100% token confidence, and a correct one with 0.000001% confidence.
If everything that I've seen in the past has said that 1+1 is 4, then sure, I'm going to say that 1+1 is 4. I will say that 1+1 is 4 and be confident in that.
But if I've seen multiple sources of information that state differing things (say, half of the information that I've seen says that 1+1 is 4 and the other half says that 1+1 is 2), then I can expose that to the user.
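To put toy numbers on that: below, a plain frequency counter stands in for the model (which is not how an LLM actually works, it's only there to show the idea), and entropy measures how split the "training data" is. The function names are made up.

```python
import math
from collections import Counter

def answer_distribution(seen_answers):
    """Relative frequency of each answer in the data that was seen."""
    counts = Counter(seen_answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}

def entropy_bits(dist):
    """Shannon entropy in bits: 0 means no disagreement in the data."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

unanimous = answer_distribution(["4"] * 10)          # every source says 1+1 is 4
split = answer_distribution(["4"] * 5 + ["2"] * 5)   # sources disagree

print(unanimous, entropy_bits(unanimous))  # confident, even though 4 is wrong
print(split, entropy_bits(split))          # disagreement shows up as uncertainty
```

The unanimous case comes out fully confident in the wrong answer, so this only surfaces disagreement in what was seen, not correctness.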
I do think that Aceticon does raise a fair point, that fully capturing uncertainty probably needs a higher level of understanding than an LLM directly generating text from its knowledge store is going to have. For example, having many ways of phrasing a response will also reduce confidence in the response, even if the phrasings are semantically compatible. Being on the edge between saying that, oh...an object is "white" or "eggshell" will also reduce the confidence derived from token probability, even if the two responses are semantically more-or-less identical in the context of the given conversation.
There's probably enough information available to an LLM to do heuristics as to whether two different sentences are semantically equivalent, but you wouldn't be able to do that efficiently with a trivial change.
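A small sketch of the "white" vs. "eggshell" problem, with invented probabilities. The `meaning_of` table is hand-written here, and building that mapping automatically is exactly the non-trivial part, so treat it as a stand-in:

```python
from collections import defaultdict

def top_probability(dist):
    """Confidence you'd report if you only looked at the single best token."""
    return max(dist.values())

def grouped_probability(dist, meaning_of):
    """Sum probabilities over tokens that mean the same thing in context."""
    totals = defaultdict(float)
    for token, p in dist.items():
        # Tokens without an entry in the table form their own group.
        totals[meaning_of.get(token, token)] += p
    return dict(totals)

# Hypothetical next-token distribution for "The wall is ..."
next_token = {"white": 0.45, "eggshell": 0.40, "black": 0.15}
# Hypothetical, hand-written equivalence classes for this conversation.
meaning_of = {"white": "light-coloured", "eggshell": "light-coloured"}

print(top_probability(next_token))
print(grouped_probability(next_token, meaning_of))
```

The best single token sits at 0.45, which looks shaky, while the two near-synonyms together carry about 0.85 of the probability mass.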
You do realise that prompts to and responses from the LLM are not as simple as what you wrote, "1+1=?"? The context window is growing for a reason. And LLMs don't have a two-dimensional probability of the next token?
The problem is that LLMs don't generate "an answer" as a whole. They just generate tokens (generally word-sized, but not always) for the next text element, given the context of all the text elements (the whole conversation) so far, and the confidence level is per-token.
Further, the confidence level is not about logical correctness, it's about "how likely is this token to appear in this context".
So even if you try using token confidence, you still end up stuck on the underlying problem: the LLM's architecture is that of a "realistic text generator", and hence that confidence level is all about "what text comes next" and not at all about the logical elements conveyed via text, such as questions and answers.
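For anyone who wants to see where that per-token number comes from, here is a minimal sketch: the model assigns a raw score to every candidate next token, and a softmax turns those scores into probabilities. The scores below are invented, and nothing in the calculation checks whether a token is true.

```python
import math

def softmax(scores):
    """Convert raw per-token scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the token following "1+1 is".
scores = {"2": 1.0, "4": 3.0, "banana": -2.0}

for tok, p in softmax(scores).items():
    print(f"{tok!r}: {p:.3f}")
```

Whichever token got the highest score from the context ends up with the highest "confidence", right or wrong.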