It's shorter than just pasting comments like yours every time we need an example of fragility.
AtmosphericRiversCuomo
The training datasets don't contain the answers; the benchmark is too diverse for that. That's why other models struggled to match human performance until they applied the approach outlined in the paper. This is the benchmark: https://liusida.github.io/ARC/
Hell yeah dude, can't wait because I'm a fucking dickhead too.