
German researchers achieved 71.6% on ARC-AGI using a regular GPU for 2 cents per task. OpenAI's o3 gets 87% but costs $17 per task making it 850x more expensive. (arxiv.org)

submitted 3 months ago* (last edited 3 months ago) by yogthos@lemmy.ml to c/technology@lemmy.ml


That score is seriously impressive because it beats the average human performance of 60.2% and challenges the narrative that you need massive proprietary models to do abstract reasoning. They used a fine-tuned version of Mistral-NeMo-Minitron-8B and brought the inference cost down to about 2 cents per task, which is absurdly cheap compared to OpenAI's o3 model.

The methodology is really clever. They started by nuking the standard tokenizer and stripping it down to just 64 tokens, which stops the model from accidentally merging digits and confusing itself. They also leaned heavily on test-time training, where the model fine-tunes itself on the few example pairs of a specific puzzle for a few seconds before trying to solve the test input. For the actual generation, they ditched standard sampling for a depth-first search that prunes low-probability paths early, so they do not waste compute on obvious dead ends.
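To make the pruned depth-first search idea concrete, here is a minimal sketch. It is not the paper's implementation: `dfs_sample`, `toy_model`, and the `min_prob` threshold are hypothetical names, and the hard-coded toy distribution stands in for a real language model's next-token probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def dfs_sample(
    next_token_probs: Callable[[Prefix], Dict[str, float]],
    min_prob: float,
    eos: str = "<eos>",
    max_len: int = 32,
) -> List[Tuple[Prefix, float]]:
    """Collect every finished sequence whose total probability is >= min_prob."""
    results: List[Tuple[Prefix, float]] = []
    min_logp = math.log(min_prob)

    def visit(prefix: Prefix, logp: float) -> None:
        if len(prefix) >= max_len:
            return
        for token, p in next_token_probs(prefix).items():
            if p <= 0.0:
                continue
            new_logp = logp + math.log(p)
            if new_logp < min_logp:
                # Prune: extending this path can only lower its probability.
                continue
            if token == eos:
                results.append((prefix, math.exp(new_logp)))
            else:
                visit(prefix + (token,), new_logp)

    visit((), 0.0)
    # Most probable candidates first.
    return sorted(results, key=lambda r: r[1], reverse=True)


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM's next-token distribution."""
    if len(prefix) == 0:
        return {"1": 0.7, "2": 0.3}
    if len(prefix) == 1:
        return {"0": 0.6, "<eos>": 0.4}
    return {"<eos>": 1.0}


candidates = dfs_sample(toy_model, min_prob=0.2)
for tokens, prob in candidates:
    print(tokens, round(prob, 2))
```

With this toy model, every path starting with "2" falls below the 0.2 threshold and is cut off, so only the two sequences starting with "1" survive. Unlike random sampling, the search returns each qualifying candidate exactly once.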

The most innovative part of the paper is their Product of Experts selection strategy. Once the model generates a candidate solution, they do not just trust it blindly. They take that solution and re-evaluate its probability across different augmentations of the input, like rotating the grid or swapping colors. If the solution is actually correct, it should look plausible from every perspective, so they calculate the geometric mean of those probabilities to filter out hallucinations. It is basically like the model peer reviewing its own work by looking at the problem from different angles to make sure the logic holds up.
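Here is a rough sketch of why the geometric mean works as a filter. The function names and the probabilities are made up for illustration and are not taken from the paper or its code; in a real run, each number would be the model's probability for the same candidate under a different augmentation (original, rotated, recolored).

```python
import math
from typing import Dict, List


def geometric_mean(probs: List[float]) -> float:
    """Geometric mean, computed in log space for numerical stability."""
    # Clamp to a tiny positive value so log() never sees zero.
    logs = [math.log(max(p, 1e-12)) for p in probs]
    return math.exp(sum(logs) / len(logs))


def select_candidate(probs_by_candidate: Dict[str, List[float]]) -> str:
    """Pick the candidate with the highest geometric-mean probability."""
    return max(
        probs_by_candidate,
        key=lambda name: geometric_mean(probs_by_candidate[name]),
    )


# Hypothetical per-augmentation probabilities for two candidate answers.
probs_by_candidate = {
    "A": [0.9, 0.01, 0.9],  # looks great from two views, implausible from one
    "B": [0.5, 0.5, 0.5],   # moderately plausible from every view
}

best = select_candidate(probs_by_candidate)
print(best)  # picks "B"
```

A plain average would prefer candidate A (about 0.60 versus 0.50), but the geometric mean drops A to roughly 0.20 because a single view that finds it implausible drags the whole product down. That is the "plausible from every perspective" requirement in numeric form.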

What's remarkable is that all of this was done with smart engineering rather than raw compute. You can literally run this tonight on your own machine.

The code is fully open-source: https://github.com/da-fr/Product-of-Experts-ARC-Paper


[–] neon_nova@lemmy.dbzer0.com 3 points 3 months ago (1 children)

I don’t know much about running this on my own computer other than using ollama. Is that what you mean about running it on my own?

[–] yogthos@lemmy.ml 2 points 3 months ago (1 children)

I haven't tried it with ollama, but ollama can download GGUF files directly if you point it to a Hugging Face repo. There are a few other runners like vLLM and llama.cpp, and you can also just run the project directly with Python. I expect the whole Product of Experts algorithm is going to get adopted by all models going forward since it's such a huge improvement, and you can just swap out the current approach.

[–] neon_nova@lemmy.dbzer0.com 1 points 3 months ago (1 children)

So is this a huge breakthrough that’s going to be adopted by AI companies across the board? Or maybe there is some downside.

[–] yogthos@lemmy.ml 2 points 3 months ago (1 children)

Almost certainly, given that it drastically reduces the cost of running models. Whether you run them locally or it's a company selling a service, the benefits here are pretty clear.

[–] neon_nova@lemmy.dbzer0.com 1 points 3 months ago (1 children)

It just sounds too good to be true. So, no critics have claimed downsides to this?

[–] yogthos@lemmy.ml 2 points 3 months ago

I mean, the paper and code are published. This isn't a heuristic, so there's no loss of accuracy. I'm not sure why you're saying this is too good to be true. The whole tech is very new, and there's a lot of low-hanging fruit for optimizations that people are discovering. Every few months some discovery like this is made right now. Eventually, people will pluck all the easy wins and it's going to get harder to dramatically improve performance, but for the foreseeable future we'll be seeing a lot more stuff like this.