Nice! I keep seeing unsloth being used, guess I gotta learn what the heck that is.
So, unsloth is basically an optimization hack for fine-tuning LLMs that got popular because it solves the headaches of running out of VRAM and waiting forever for training to finish. Using this library makes it possible to fine-tune models on a consumer GPU. And it's essentially a drop-in replacement for the standard Hugging Face transformers + peft stack. The API is designed to look almost exactly like Hugging Face's, so you just change your import from AutoModelForCausalLM to FastLanguageModel and you're pretty much good to go.
```python
# Instead of this:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

# You do this:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-bnb-4bit",  # Pre-quantized for speed
    max_seq_length = 2048,
    load_in_4bit = True,
)
```
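The LoRA side of the workflow stays just as close to plain peft. Here's a rough sketch of the next step, going off unsloth's documented examples; the kwargs (r, target_modules, etc.) are from their docs, but treat the values as placeholders that may differ between versions:

```python
# Wrap the loaded model with LoRA adapters -- roughly how unsloth's examples show it.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                                                     # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_alpha = 16,
    lora_dropout = 0,
    use_gradient_checkpointing = "unsloth",                     # their memory-saving checkpointing
)
```

From there you hand the model and tokenizer to a normal trainer (e.g. trl's SFTTrainer) the same way you would with a stock Hugging Face model.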
But under the hood it's doing something much smarter than stock PyTorch, and the secret sauce is actually pretty interesting from a programming perspective. Standard PyTorch relies on its autograd engine to handle backpropagation, which is great for flexibility but heavy on memory because it has to cache intermediate activations. The people who built unsloth looked at the transformer architecture and manually derived the backward pass steps mathematically. Since they aren't relying on the generic autograd engine, they can strip out a ton of overhead. The result is fine-tuning that's roughly 2 to 5x faster and uses about half the memory, without losing any accuracy.
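To get a feel for what "manually deriving the backward pass" means, here's a toy illustration (mine, not unsloth's code, and nothing like their actual fused kernels): a custom torch.autograd.Function where the gradients of a matmul are written out by hand instead of being left to the autograd graph, and where you control exactly which tensors get saved for backward:

```python
import torch

class ManualLinear(torch.autograd.Function):
    """y = x @ W.T with a hand-derived backward pass.
    We decide exactly what gets stashed for backward (just x and W)
    instead of letting autograd record and cache everything it touches."""

    @staticmethod
    def forward(ctx, x, W):
        ctx.save_for_backward(x, W)
        return x @ W.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, W = ctx.saved_tensors
        grad_x = grad_out @ W      # dL/dx = dL/dy . W
        grad_W = grad_out.t() @ x  # dL/dW = (dL/dy)^T . x
        return grad_x, grad_W

# Sanity check: gradients match what autograd would compute for a plain matmul.
x = torch.randn(4, 8, requires_grad=True)
W = torch.randn(16, 8, requires_grad=True)
ManualLinear.apply(x, W).sum().backward()
```

unsloth applies the same idea across whole transformer layers, just with far more aggressive fusing and optimization, which is where the speed and memory wins come from.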
Huh, good to know, thanks! One day I'll move beyond my 1080 and get back into the nitty gritty. As it stands now I'm trying to find the time to properly use my ollama that's wired into n8n to automate a bunch of my home productivity stuff. Feeling really old and slow with how quick this stuff is happening nowadays.
It's pretty hard to keep up with. I find I tend to wait till things make it to mainstream stuff like ollama as well. The effort of setting up something custom is usually not worth it cause it'll probably all be obsolete in a few months anyways. There's basically a lot of low-hanging fruit in terms of optimizations that people are discovering, so we'll probably see things moving really fast for the next few years, but once all the easy improvements are plucked, things will start stabilizing.