Been testing the smaller one (Qwen3.5-35B-A3B) with OpenCode for the last couple of hours and I'm very impressed! Still too early to say for sure, but I may actually prefer it over gpt-oss-120b and qwen3-coder-next despite it being much smaller.
Nice one. Is there a modern way of "jailbreaking" these models? I put in a request to write a story, and it generates something like 2500 tokens of "thinking" text, philosophising about how the system prompt and its internal safety guidelines relate, gets lost in internal dialogue, and ultimately decides to weasel out of my prompt and provide a "safe" version. Same thing when using it as a coding assistant for security-related stuff. I can edit its "thoughts", and that seems to help for a few paragraphs, but it's pretty adamant about its weird rules no matter what I do. To be fair, it did ultimately provide the requested test case for the SQL injection, after reasoning to no end about how it shouldn't. But it's a bit hard to squeeze things like that out of it.
Keep an eye on this: https://huggingface.co/heretic-org
I used to use a -heretic abliterated version of gpt-oss-120b, not for any creative reasons but just to reduce the amount of wasted tokens in its thinking, with good results.
(You can turn off thinking mode with the new Qwen models btw - how you do it will depend on how you're hosting it, but basically it's a flag to the chat template. It won't remove the safety guidelines, but it will stop it telling you all about its internal monologue ;).)
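As a rough sketch: with a vLLM-style OpenAI-compatible server, the flag can usually be passed per request via `chat_template_kwargs`. The `enable_thinking` name follows the Qwen3 chat-template convention; check your server and the Qwen3.5 model card for the exact flag.

```python
import json

# Hypothetical request payload for a vLLM-style OpenAI-compatible server.
# "enable_thinking" follows Qwen3's chat-template convention; the Qwen3.5
# template may use a different name, so verify against the model card.
payload = {
    "model": "Qwen3.5-35B-A3B",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "chat_template_kwargs": {"enable_thinking": False},
}

print(json.dumps(payload, indent=2))
```

If you're running the model locally through `transformers` instead, the same flag is typically passed to `tokenizer.apply_chat_template(...)`.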
I just realised this is the much more useful link: https://github.com/p-e-w/heretic?tab=readme-ov-file
I can see at least one -heretic version of a Qwen3.5 model on Huggingface already; can't vouch for quality though.
Thanks! I'll wait a few days, maybe one of these pops up on Huggingface. Are "abliterated" versions alright these days? Last time I downloaded something with that word in the name, it wasn't very good.
I don't follow the discussions on this topic very closely, but as I understand it, there are different ways to achieve the goal, and all impact quality to some extent. Heretic is discussed as one of the SOTA methods. The README posted above states the following, so it seems that Heretic is some sort of next-gen abliteration.
It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024, Lai 2025 (1, 2)), with a TPE-based parameter optimizer powered by Optuna.
Hmmh, thanks. Yeah, I read the README, and they claim it performs better than other methods. I guess I'll find out soon.
qwen3.5 35b-a3b is supposed to outperform qwen3(vl) 235b-a22b