An Analysis of DeepMind's 'Language Modeling Is Compression' Paper
(codeconfessions.substack.com)
It looks like they did it both ways (“raw rate” vs “adjusted rate”).
Yes. They also mention that using such large models for compression is not practical, because their size dwarfs any amount of data you might want to compress. But the result does give a good picture of how well such large models generalize, and how accurately they can predict the next tokens even for image and audio data.
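A minimal sketch of the distinction, under my reading of the thread: the raw rate only divides the compressed output by the original size, while the adjusted rate also charges the model's parameters to the compressed output, since the decoder needs the model to reconstruct the data. All names and numbers below are hypothetical, purely for illustration, not figures from the paper.

    # Illustrative sketch of "raw" vs "adjusted" compression rates
    # (all names and numbers are hypothetical, not taken from the paper).

    def raw_rate(compressed_bytes: int, original_bytes: int) -> float:
        """Compressed size divided by original size; ignores the model."""
        return compressed_bytes / original_bytes

    def adjusted_rate(compressed_bytes: int, original_bytes: int,
                      model_bytes: int) -> float:
        """Adds the model's size to the compressed output, since the
        decoder needs the model to reconstruct the data."""
        return (compressed_bytes + model_bytes) / original_bytes

    if __name__ == "__main__":
        original = 1 * 10**9      # 1 GB benchmark (hypothetical)
        compressed = 150 * 10**6  # 0.15 raw rate (hypothetical)
        model = 140 * 10**9       # e.g. ~70B params at 2 bytes each (hypothetical)

        print(f"raw rate:      {raw_rate(compressed, original):.3f}")
        print(f"adjusted rate: {adjusted_rate(compressed, original, model):.3f}")
        # The adjusted rate blows far past 1.0 because the model alone is
        # much larger than the data being compressed.

This is the sense in which the model's size dwarfs the data: unless the corpus being compressed is vastly larger than the model itself, the adjusted rate is dominated by the parameter count.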