I definitely think that's remarkable. But I don't think scoring high on an external measure like a test is enough to prove the ability to reason. For reasoning, the process matters, IMO.
Reasoning models work by chain-of-thought, which has been shown to provide some false reassurances about their actual process: https://arxiv.org/abs/2305.04388
Maybe passing some math test is enough evidence for you, but I think it matters what's inside the box. For me, it has only proved that tests are a poor measure of the ability to reason.
I'm not saying that we can't ever build a machine that can think. You can do some remarkable things with math. I personally don't think our brains have baked in gradient descent, and I don't think neural networks are a lot like brains at all.
The stochastic parrot is a useful vehicle for criticism and I think there is some truth to it. I also think LLMs display some super impressive emergent features, but I still think they are really far from AGI.