Apologies if this seems like a survey post. I'm just learning about fine-tuning and want to get the lay of the land. I don't think I have the money to tune locally, so I might have to rent some VRAM, but I'm curious how much better tuning is compared to something like RAG.

What model? What was your use case? What tuning tool did you use? What is your hardware setup? How large was your training set, and how did you create it? How effective was the model at its tasks before and after tuning?

Thanks!

ejs@piefed.social 5 points 3 months ago

I do AI research for school, and I'm specifically interested in safety alignment. I have studied the original papers for different fine-tuning methods: LoRA is typically the baseline, and many variants exist, notably QLoRA.
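For anyone new to LoRA, the core idea fits in a few lines. The sketch below is a minimal pure-Python illustration, not how any library actually implements it: the pretrained weight W stays frozen, and only two small matrices are trained, A (r × d_in) and B (d_out × r), so the layer computes h = Wx + (alpha/r)·B(Ax). B starts at zero (as in the LoRA paper), so the adapted layer initially behaves exactly like the base layer. QLoRA applies the same idea on top of a base model whose frozen weights are stored in 4-bit precision. The toy matrices, the 4096 × 4096 layer size, and rank 16 are arbitrary illustrative values.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (r x d_in)
    and B (d_out x r) would receive gradient updates during training.
    """
    r = len(A)                       # LoRA rank = number of rows in A
    base = matvec(W, x)              # output of the frozen layer
    delta = matvec(B, matvec(A, x))  # low-rank correction, computed as B(Ax)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """(params updated by full fine-tuning, params updated by LoRA) for one weight matrix."""
    return d_in * d_out, r * (d_in + d_out)


if __name__ == "__main__":
    # Toy 2x3 layer with rank-1 adapters; B is zero-initialised.
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    A = [[0.5, -0.5, 0.1]]
    B = [[0.0], [0.0]]
    x = [1.0, 2.0, 3.0]
    print(lora_forward(W, A, B, x, alpha=16.0))  # same as matvec(W, x) at init

    full, lora = trainable_params(4096, 4096, 16)
    print(f"full: {full:,}  LoRA: {lora:,}  ({lora / full:.2%})")
```

The parameter count is where the savings come from: for one square 4096-wide layer, rank 16 trains well under 1% of the weights a full fine-tune would touch. In practice, libraries such as Hugging Face PEFT or Unsloth handle all of this for you.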

In general, fine-tuning is not practically beneficial for hobby-level use of foundation models. In fact, it comes with many disadvantages; primarily, it is difficult to maintain the model's general intelligence while avoiding overfitting.

If you are trying to adapt a model to a specific task, you will generally find more success using RAG and simply adding more context to the model that way. Don't waste time and compute $$ on training.
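To make "adding more context" concrete: RAG retrieves the passages most relevant to a query and pastes them into the prompt before the model sees it. The sketch below uses a simple bag-of-words cosine similarity as a stand-in for the embedding model and vector database a real setup would use; the documents, the `retrieve`/`build_prompt` names, and the prompt wording are all hypothetical. Tagging each passage with its source ID is what makes citations possible.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> Counter:
    """Lower-case word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: Dict[str, str], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k (doc_id, text) pairs most similar to the query."""
    q = tokenize(query)
    ranked = sorted(docs.items(), key=lambda item: cosine(q, tokenize(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: Dict[str, str], k: int = 2) -> str:
    """Prepend retrieved passages, tagged with their IDs, to the user's question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, docs, k))
    return (
        "Answer using only the context below and cite sources by their [id].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    library = {
        "faq-1": "Our returns window is 30 days from delivery.",
        "faq-2": "Shipping to Canada takes 5 to 7 business days.",
        "blog-7": "Spring collection launches in March with new colours.",
    }
    print(build_prompt("How many days do I have for returns?", library, k=1))
```

Every retrieved passage adds tokens to each request, which is the inference-time cost discussed further down the thread.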

venusaur@lemmy.world 0 points 3 months ago

Thanks! It's interesting that RAG alone would be better than a tuned model. Why is that? What if you have a very specific task, like writing copy based on existing documents, where decisions depend on a set of specific variables?

What if you use RAG and also fine-tune? Is there any benefit there?

Last question: would a fine-tuned model be more energy-efficient than a model using RAG?

ejs@piefed.social 2 points 3 months ago

Honestly, whether the model gets better, and whether to choose RAG or FT, depends heavily on the use case. The most important thing to consider is what sort of changes you want to make to the model. FT is still a good choice if you're looking for strict output formatting (JSON/YAML/...) or for refining the model on highly specific, narrow-domain tasks. RAG is better for knowledge freshness and source citations, and it can greatly reduce hallucinations.
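If strict output formatting is the goal, it helps to measure it before and after tuning rather than eyeballing a few responses. A minimal sketch, assuming the model is asked for a JSON object with a fixed set of keys (the key names and sample outputs below are hypothetical): count the fraction of outputs that parse as a JSON object and contain every required key.

```python
import json
from typing import Iterable, Set


def is_valid(output: str, required_keys: Set[str]) -> bool:
    """True if the output parses as a JSON object containing every required key."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys.issubset(obj.keys())


def compliance_rate(outputs: Iterable[str], required_keys: Set[str]) -> float:
    """Fraction of outputs that satisfy the format; 0.0 for an empty list."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(is_valid(o, required_keys) for o in outputs) / len(outputs)


if __name__ == "__main__":
    keys = {"category", "confidence"}
    base_model_outputs = [
        '{"category": "offer", "confidence": 0.9}',
        'Sure! Here is the JSON: {"category": "offer"}',  # chatty prefix breaks parsing
        '{"category": "rejection"}',                       # missing a required key
    ]
    print(f"base model: {compliance_rate(base_model_outputs, keys):.0%}")
```

Running the same prompts through the base and tuned models and comparing the two rates gives a concrete pre/post number.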

RAG inflates your context window (more tokens) at inference time, which means slower responses and more energy per request, whereas fine-tuning takes a ton of GPU compute up front but keeps token counts smaller at inference. If you're doing 100,000 prompts a day and only need to train once, FT makes more sense; if you're doing 100 prompts a day and your knowledge base is constantly changing, RAG makes the most sense.

It's hard to give a formal estimate of energy efficiency: fine-tuning to a certain training accuracy can take an indeterminate amount of time (and money on rented GPU compute), but it could be the better choice if you expect that up-front cost to pay off over time because you use the model very frequently and only fine-tune once. On the other hand, the RAG route has almost no up-front compute (energy) cost beyond indexing your documents, but it costs slightly more at inference time due to the extra tokens.
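The trade-off above can be framed as a break-even calculation: a one-off fine-tune pays for itself once the inference cost saved by not sending retrieved context exceeds the training cost. The sketch below does that arithmetic in dollars, though the same formula works for energy. Every number in it (training cost, extra tokens per RAG prompt, price per million tokens) is a made-up placeholder rather than a measurement, and it ignores retraining when the data changes.

```python
def break_even_prompts(training_cost: float, extra_tokens_per_prompt: int,
                       cost_per_million_tokens: float) -> float:
    """Number of prompts after which a one-off fine-tune is cheaper than RAG.

    Assumes RAG adds `extra_tokens_per_prompt` of retrieved context to every
    request and that the fine-tuned model needs none of it.
    """
    saving_per_prompt = extra_tokens_per_prompt * cost_per_million_tokens / 1_000_000
    return training_cost / saving_per_prompt


def days_to_break_even(training_cost: float, extra_tokens_per_prompt: int,
                       cost_per_million_tokens: float, prompts_per_day: int) -> float:
    """Convert the break-even prompt count into days at a given daily volume."""
    prompts = break_even_prompts(training_cost, extra_tokens_per_prompt, cost_per_million_tokens)
    return prompts / prompts_per_day


if __name__ == "__main__":
    # Placeholder figures: $50 of rented GPU time, 2,000 extra context tokens
    # per RAG prompt, $0.50 per million tokens of inference.
    for per_day in (100, 100_000):
        days = days_to_break_even(50.0, 2_000, 0.50, per_day)
        print(f"{per_day:>7,} prompts/day -> break-even after {days:,.1f} days")
```

With these placeholder numbers, the 100-prompts-a-day case takes over a year to break even while the 100,000-a-day case breaks even in hours, which is the same intuition as the comparison above.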

What's the specific task you're considering FT for? That's the most important factor in the decision.

venusaur@lemmy.world 1 points 3 months ago

Thanks for the explanation!

The use case is writing marketing communications to match a library of content that a company has already written.

We’re currently using RAG and it’s okay, but I’m wondering how much better it would be if it were tuned.

pyr0ball@lemmy.world 2 points 3 months ago

Yeah, I've done two separate things in this space.

Cover letter fine-tuning: Llama-3.2-3B-Instruct as the base, QLoRA via Unsloth (rank 16, 10 epochs). Trained on ~62 of my own cover letters, exported to GGUF, loaded into Ollama. Fits comfortably on 8GB VRAM with 4-bit quantisation. Noticeably more consistent than prompting a generic model for voice and style matching.
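As a rough sanity check on the 8GB figure: with 4-bit quantisation each base weight takes about half a byte, so a model of roughly 3.2B parameters needs around 1.6 GB just for its weights, leaving headroom for the LoRA adapters, optimizer state, and activations. The sketch below does only that back-of-the-envelope arithmetic. The parameter count is an approximation (check the model card), real 4-bit formats add a little overhead for scaling factors, and everything other than the weights is ignored, so treat the result as a lower bound.

```python
# Approximate storage cost per weight at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Memory for the model weights alone (no activations, KV cache, or optimizer state)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n_params = 3.2e9  # assumed size of a "3B" model
    for precision in ("fp16", "int8", "4bit"):
        print(f"{precision:>5}: ~{weight_memory_gb(n_params, precision):.1f} GB of weights")
```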

Email classification: completely different story. Classifier models for routing emails into categories (rejection, interview scheduled, offer, etc.) don't need a GPU at all. DeBERTa-small runs on CPU in milliseconds. The hard part is the labeling pipeline. We bootstrapped with deterministic heuristics to auto-label high-confidence cases, then routed uncertain ones to a human review queue. Around 2,000 labeled examples were enough for meaningful accuracy.
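The bootstrapping step described above (auto-label the confident cases with deterministic rules, send the rest to a person) might look roughly like the sketch below. The keyword rules, category names, and the "at least two matching phrases and a clear winner" threshold are all hypothetical; the actual heuristics used in that pipeline aren't described.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword rules per category; real rules would be tuned to your inbox.
RULES: Dict[str, List[str]] = {
    "rejection": ["unfortunately", "not moving forward", "other candidates"],
    "interview": ["schedule", "interview", "availability"],
    "offer": ["pleased to offer", "offer letter", "compensation"],
}
MIN_HITS = 2  # phrases that must match before a rule-based label is trusted


def heuristic_label(email: str) -> Tuple[str, int, int]:
    """Return (best category, its phrase hits, runner-up's phrase hits)."""
    text = email.lower()
    scores = {cat: sum(p in text for p in phrases) for cat, phrases in RULES.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    return best, top, runner_up


def bootstrap(emails: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split emails into auto-labelled (text, label) pairs and a human review queue."""
    labelled: List[Tuple[str, str]] = []
    review_queue: List[str] = []
    for email in emails:
        label, top, runner_up = heuristic_label(email)
        if top >= MIN_HITS and top > runner_up:  # confident and unambiguous
            labelled.append((email, label))
        else:
            review_queue.append(email)
    return labelled, review_queue


if __name__ == "__main__":
    inbox = [
        "Unfortunately we are not moving forward with your application.",
        "Can you share your availability to schedule an interview?",
        "Thanks for applying!",
        "We are pleased to offer you a call; please schedule a time.",
    ]
    auto, queue = bootstrap(inbox)
    print(f"{len(auto)} auto-labelled, {len(queue)} sent to review")
```

The auto-labelled pairs become training data for the classifier, and whatever reviewers label from the queue gets added back in, which is how a set like the ~2,000 examples above could be built up.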

vs RAG: for classification, fine-tuning wins cleanly. RAG is better when you need to reason over retrieved documents. If you're making a consistent categorical judgment, you want it baked into the weights, not reconstructed from context at inference time.

I build local-first process pipeline tooling at circuitforge.tech

venusaur@lemmy.world 1 points 3 months ago

Oh, that's really interesting! I'm also interested in the classification case. Can you tell me more, or point me to where I can learn more about DeBERTa? Do you train it the same way, with prompt and response sets? Does it work with any open-source model? I can only run models up to 4B right now.

lunarwingorg@lemmy.world 1 points 1 month ago

Personally, I've adjusted dozens of sampler values and written middleware, llama-server scripts, configuration-loading mechanisms, OpenAI-API-compatible HTTP proxies, and even a Python 3 API for accessing context information and switching models on the fly. I've also created a script for benchmarking local model performance.

But beyond running some scripts that others made to tune a model with specific input parameters, not really. Honestly, I have a lot to learn.

venusaur@lemmy.world 0 points 1 month ago

I've heard a lot about LoRA tools like Unsloth. Have you used anything like that?

lunarwingorg@lemmy.world 1 points 1 month ago

LoRA, yes. Mostly custom scripts downloaded from Hugging Face that automatically handle a lot of complicated stuff. To be honest, I'm not totally sure how it actually works under the hood.

venusaur@lemmy.world 1 points 1 month ago (last edited 1 month ago)

Did you notice a big improvement on the tasks you were tuning for? What is your hardware setup, and how long did it take?

Thanks!

lunarwingorg@lemmy.world 2 points 1 month ago

Unfortunately, I did not notice much of a difference from model tuning, and it took a pretty decent chunk of time. My most powerful PC, which is what I run most models on, is a fairly powerful machine with a single 4090 (the lower-end machines with weaker GPUs run text embedding models). I've had better luck just downloading differently tuned variants of the same model from others.

venusaur@lemmy.world 2 points 1 month ago

Bummer. Do you think it was the training data, or just the nature of fine-tuning? Something else? What were you tuning it for, if you don't mind my asking?

lunarwingorg@lemmy.world 1 points 1 month ago

Just the nature of them being quite old models without proper tool-calling functionality. What actually DID help was setting up middleware and custom Python servers/clients with proper JSON mapping so that the right tools get selected. So, literally zero model tuning was required in the end.
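The middleware approach described here (have the model emit JSON, and let ordinary code map it to the right tool) can be sketched as a small registry plus a dispatcher. This is a hypothetical illustration, not the poster's code: the tool names, the `{"tool": ..., "arguments": {...}}` request shape, and returning errors as data are all assumptions.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("add")
def add(a: float, b: float) -> float:
    return a + b


@tool("word_count")
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(model_output: str) -> Dict[str, Any]:
    """Parse the model's JSON tool request and run the matching function.

    Errors are returned as data so the middleware can feed them back to the model
    instead of crashing.
    """
    try:
        request = json.loads(model_output)
        fn = TOOLS[request["tool"]]
        return {"ok": True, "result": fn(**request.get("arguments", {}))}
    except json.JSONDecodeError:
        return {"ok": False, "error": "output was not valid JSON"}
    except KeyError as exc:
        return {"ok": False, "error": f"unknown tool or missing field: {exc}"}
    except TypeError as exc:
        return {"ok": False, "error": f"bad arguments: {exc}"}


if __name__ == "__main__":
    print(dispatch('{"tool": "add", "arguments": {"a": 2, "b": 3}}'))
    print(dispatch('{"tool": "weather", "arguments": {}}'))
```

Because the mapping lives in plain code, adding or fixing a tool means editing the registry rather than retraining the model, which matches the "zero model tuning" outcome.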

venusaur@lemmy.world 1 points 1 month ago

Got it. Do you think tuning again after calibrating tool calling would be beneficial?

lunarwingorg@lemmy.world 1 points 1 month ago

It has nothing to do with my external calibration stuff, so no.