Some caveats: this massively increases memory usage (unless you quantize the KV cache, which needs flash attention (FA) enabled), and it also massively slows down CPU generation once the context gets long.
TBH you just need to not keep a long chat history unless you actually need it.
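To put rough numbers on the memory point, here's my own back-of-the-envelope sketch, assuming llama 3.1 8B's published shape (32 layers, 8 KV heads, head dim 128) and roughly 4.5 bits per element for a q4_0-style quantized cache; treat the figures as ballpark, not benchmarks:

```python
# Back-of-the-envelope KV-cache size for a llama-3.1-8B-shaped model.
# Architecture numbers (32 layers, 8 KV heads, head dim 128) are taken
# from the published model card; they are illustrative assumptions here.

def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # 2x for keys and values, one entry per layer per KV head per position
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

for n_ctx in (4096, 32768, 131072):
    fp16 = kv_cache_bytes(n_ctx) / 2**30                       # fp16 cache
    q4 = kv_cache_bytes(n_ctx, bytes_per_elem=0.5625) / 2**30  # ~4.5 bits/elem, q4_0-ish
    print(f"{n_ctx:>7} ctx: {fp16:5.1f} GiB fp16, {q4:5.1f} GiB quantized")
```

At 128K context that works out to roughly 16 GiB of cache in fp16 versus ~4.5 GiB quantized, which is why the cache quantization (and keeping the context short in the first place) matters so much.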
Thank you, that's useful to know. In your opinion, what context size is the sweet spot for llama 3.1 8B and similar models?
Oh I got you mixed up with the other commenter, apologies.
I'm not sure exactly when llama 3.1 8B starts to degrade at long context, but I wanna say it's well before 128K, which is where other "long context" models start to look much more attractive depending on the task. Right now I am testing Amazon's Mistral finetune, and it seems to be much better at long context than Nemo or llama 3.1.
Honestly as small as you can manage.
Again, you will get much better speeds out of "extreme" MoE models like DeepSeek V2 Lite Chat: https://huggingface.co/YorkieOH10/DeepSeek-V2-Lite-Chat-Q4_K_M-GGUF/tree/main
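Rough intuition for why the MoE models feel so much faster on CPU: token generation is mostly memory-bandwidth bound, and an MoE only has to stream the weights of its active experts per token. A hedged sketch, using DeepSeek-V2-Lite's published ~2.4B active parameters versus a dense 8B, and an assumed dual-channel DDR4 bandwidth figure:

```python
# Crude tokens/s estimate for bandwidth-bound CPU generation: every
# generated token has to stream the active weights from RAM once.
# Bandwidth and bits-per-weight here are assumptions for illustration.

def est_tokens_per_sec(active_params_b, bits_per_weight, mem_bandwidth_gbs):
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return mem_bandwidth_gbs * 1e9 / bytes_per_token

bandwidth = 40  # GB/s, ballpark for dual-channel DDR4
dense_8b = est_tokens_per_sec(8.0, 4.5, bandwidth)   # llama 3.1 8B, ~q4 quant
moe_lite = est_tokens_per_sec(2.4, 4.5, bandwidth)   # DeepSeek-V2-Lite active params
print(f"dense 8B : ~{dense_8b:.1f} tok/s")
print(f"MoE lite : ~{moe_lite:.1f} tok/s")
```

Real numbers depend on prompt processing, threads, and quantization, but the ratio between the two is the point.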
Another thing I'd recommend is running kobold.cpp instead of ollama if you want to get into the nitty gritty of LLMs. It's more customizable and (ultimately) faster on more hardware.
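Part of that nitty-gritty appeal is that kobold.cpp exposes an HTTP API you can poke at directly. A minimal sketch of a client, assuming the default local port (5001 last I checked) and the KoboldAI-style /api/v1/generate endpoint; the exact fields can differ by version, so check the docs for your build:

```python
# Minimal client for a locally running kobold.cpp server.
# Endpoint, port, and field names are assumptions based on the
# KoboldAI-style API; verify them against your kobold.cpp version.
import requests

payload = {
    "prompt": "Explain the KV cache in one paragraph.",
    "max_length": 200,           # tokens to generate
    "max_context_length": 4096,  # keep this small on low-spec hardware
    "temperature": 0.7,
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```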
That's good info for low-spec laptops. Thanks for the software recommendation. Need to do some more research on the model you suggested. I think you confused me with the other guy though. I'm currently working with a six-core Ryzen 2600 CPU and an RX 580 GPU. edit- no worries, we are good, it was still great info for the ThinkPad users!