Nice! I keep seeing unsloth being used, guess I gotta learn what the heck that is.
So, unsloth is basically an optimization hack for fine-tuning LLMs that got popular because it solves the headaches of running out of VRAM and waiting forever for training to finish. Using this library makes it possible to fine-tune models on a consumer GPU. And it's essentially a drop-in replacement for the standard Hugging Face transformers + peft stack. The API is designed to look almost exactly like Hugging Face's, so you just change your import from AutoModelForCausalLM to FastLanguageModel and you're pretty much good to go.
```python
# Instead of this:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

# You do this:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-bnb-4bit",  # Pre-quantized for speed
    max_seq_length = 2048,
    load_in_4bit = True,
)
```
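The LoRA side of the workflow stays just as close to plain peft. Here's a rough sketch of the next step, going off unsloth's documented examples; the kwargs (r, target_modules, etc.) are from their docs, but treat the values as placeholders that may differ between versions:

```python
# Wrap the loaded model with LoRA adapters -- roughly how unsloth's examples show it.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                                                     # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_alpha = 16,
    lora_dropout = 0,
    use_gradient_checkpointing = "unsloth",                     # their memory-saving checkpointing
)
```

From there you hand the model and tokenizer to a normal trainer (e.g. trl's SFTTrainer) the same way you would with a stock Hugging Face model.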
But under the hood it's doing something much smarter than stock PyTorch, and the secret sauce is actually pretty interesting from a programming perspective. Standard PyTorch relies on its autograd engine to handle backpropagation, which is great for flexibility but heavy on memory because it has to cache intermediate activations. The people who built unsloth looked at the transformer architecture and manually derived the backward pass steps mathematically. Since they aren't relying on the generic autograd engine, they can strip out a ton of overhead. The result is fine-tuning that's roughly 2 to 5x faster and uses about half the memory, without losing any accuracy.
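To get a feel for what "manually deriving the backward pass" means, here's a toy illustration (mine, not unsloth's code, and nothing like their actual fused kernels): a custom torch.autograd.Function where the gradients of a matmul are written out by hand instead of being left to the autograd graph, and where you control exactly which tensors get saved for backward:

```python
import torch

class ManualLinear(torch.autograd.Function):
    """y = x @ W.T with a hand-derived backward pass.
    We decide exactly what gets stashed for backward (just x and W)
    instead of letting autograd record and cache everything it touches."""

    @staticmethod
    def forward(ctx, x, W):
        ctx.save_for_backward(x, W)
        return x @ W.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, W = ctx.saved_tensors
        grad_x = grad_out @ W      # dL/dx = dL/dy . W
        grad_W = grad_out.t() @ x  # dL/dW = (dL/dy)^T . x
        return grad_x, grad_W

# Sanity check: gradients match what autograd would compute for a plain matmul.
x = torch.randn(4, 8, requires_grad=True)
W = torch.randn(16, 8, requires_grad=True)
ManualLinear.apply(x, W).sum().backward()
```

unsloth applies the same idea across whole transformer layers, just with far more aggressive fusing and optimization, which is where the speed and memory wins come from.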
Huh, good to know, thanks! One day I'll move beyond my 1080 and get back into the nitty gritty. As it stands now I'm trying to find the time to properly use my ollama that's wired into n8n to automate a bunch of my home productivity stuff. Feeling really old and slow with how quick this stuff is happening nowadays.
It's pretty hard to keep up with. I find I tend to wait till things make it to mainstream stuff like ollama as well. The effort of setting up something custom is usually not worth it cause it'll probably all be obsolete in a few months anyways. There's basically a lot of low-hanging fruit in terms of optimizations that people are discovering, so we'll probably see things moving really fast for the next few years, but once all the easy improvements are plucked, things will start stabilizing.