41
Simply explained: how does GPT work?
(confusedbit.dev)
Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!
Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.
Hope you enjoy the instance!
Rules
Follow the wormhole through a path of communities !webdev@programming.dev
Not quite ELI5 but I'll try "basic understanding of calculus" level.
In very broad terms, the model learns complex relationships between words (or tokens to be specific, explained below) as probabilistic scores. At its simplest, this could mean the likelihood of one word appearing next to another in the massive amounts of text the model was trained with: the words "apple" and "pie" are often found together, so they might have a high-ish score of 0.7, while the words "apple" and "chair" might have a lower score of just 0.2. Recent GPT models consist of several billions of these scores, known as the weights. Once their values have been estabilished by feeding lots of text through the model's training process, they are all that's needed to generate more text.
Without getting into the math too much, this is how a GPT model then uses these numbers to come up with words:
In reality we're not quite so sure what the weights represent to the model exactly, but this is the gist of it. All we know is that they signify the importances or non-importances that the model places on some pattern that was present in the training data. Some of these patterns could be just simple two-word pairs, but many are probably much more complicated. Lots of researchers are currently trying to get a better idea of how these numbers are actually affecting the model's output.