I hadn't heard of Qwen. I have only used DeepSeek, and not much. What are Qwen's advantages over DeepSeek? And are there any other models from BRICS countries I should look out for? Preferably open source.
And do you recommend a local solution? For which use cases? I have a mid-range gaming laptop. IIRC it has 6 GiB VRAM (NVIDIA).
I've found Qwen is overall similar; the smaller models you can run locally tend to produce somewhat better output in my experience. Another recent open source model that's good at coding is GLM: https://z.ai/blog/glm-4.5
6 GB of VRAM is unfortunately somewhat low; you can run smaller models, but the quality of the output is not amazing.
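As a very rough rule of thumb (an estimate, not a hard number), a model's weight memory is its parameter count times the bytes per weight after quantization, plus some overhead for the KV cache and runtime buffers:

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Back-of-the-envelope memory estimate: params * bits/8, plus ~20%
    overhead for KV cache and runtime buffers. A rough guide only."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at 4-bit quantization just about fits in 6 GB; a 32B model doesn't.
print(f"7B  @ 4-bit: ~{vram_estimate_gb(7, 4):.1f} GB")
print(f"32B @ 4-bit: ~{vram_estimate_gb(32, 4):.1f} GB")
```

So on 6 GB you're realistically looking at models around the 7B range with aggressive quantization.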
How do you stay up to date on LLMs? Do you recommend a web feed (RSS, Atom) for LLM news? My interests:
I find r/localLLaMA on reddit is a pretty good way to keep up. Reddit also has an Easter egg where you can get RSS feeds for communities, e.g.: https://reddit.com/r/localLLaMA.rss
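If you'd rather script it than use a reader, something like this works (assuming the third-party feedparser package):

```python
import feedparser  # pip install feedparser

# Reddit serves any community as a feed if you append .rss to its URL.
feed = feedparser.parse("https://reddit.com/r/localLLaMA.rss")
for entry in feed.entries[:10]:
    print(entry.title, "->", entry.link)
```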
Main models to look at are DeepSeek and Qwen; both have smaller versions that run well locally and produce decent results. The two popular ways to run models are llama.cpp and ollama. Ollama is a bit easier to get started with, and it has CLI tools for downloading and managing models. My experience is that you want at least a 32gb model for decent results if you're doing something like coding tasks. For text generation or making summaries, you can get away with smaller models.
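To give a feel for what running a local model looks like, here's a minimal sketch against Ollama's local HTTP API; it assumes you've already pulled a model (the tag below is just an example):

```python
import requests

# Ollama serves an HTTP API on localhost:11434 by default.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:7b",   # example tag; use whatever you've pulled
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,               # one JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```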
In terms of privacy, if you run the model locally then there's no concern. There's also MCP (Model Context Protocol), which is pretty popular now and allows models to use tools. You can use something like crush to run a model and let it call tools; the tool shows you the command the LLM is trying to run and asks whether you want to execute it. This can be useful for things like pulling information from the web and having the model figure out how to implement, say, API endpoints based on it. It also lets the LLM run tests in a project, fix errors, and so on.
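The confirm-before-run pattern itself is simple. Here's the general shape of it as a sketch (illustrating the idea, not crush's actual code):

```python
import subprocess

def confirm_and_run(command: str) -> str:
    """Show the command the model wants to run and ask before executing it."""
    answer = input(f"Model wants to run: {command!r}. Execute? [y/N] ")
    if answer.strip().lower() != "y":
        return "(skipped by user)"
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr
```

Every tool call the model makes gets routed through something like this, so nothing touches your system without you seeing it first.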
For accuracy, you basically have to know the subject yourself and be able to evaluate the output from the LLM. There's no way to guarantee that anything it produces is correct without a human reviewing it.
Thank you for the subreddit and the Easter Egg! I have added it to Emacs Elfeed.
Regarding model size: when you write "32gb", I hope you either mean 32 billion parameters, or (since you wrote lower case "b") 32 gigabits. Or do I actually need 32GiB VRAM?
For privacy: before I spend a large multiple of the Brazilian monthly minimum wage (electronics are expensive here), I would like to experiment with hosted solutions. I will try to remember to restrict hosted LLMs to non-sensitive code, but I still don't want any leakage to US Big Tech; I fear they can correlate lots of individually non-sensitive data and build a profile of me. I will only use Chinese models (hopefully we will have great Brazilian models in the future), so if they are secure (good encryption, quick patching of vulnerabilities, etc.), I should be safe from US profiling. So, of the Chinese hosted coding agents, which have the best security?
Regarding accuracy, I am aware LLMs cannot be trusted to be correct. But the more accurate one is, the less correction it will need, and fewer iterations too. And actually, sometimes I don't know the subject well, but if the LLM is reasonably reliable, I can just check the answer for obvious errors, then accept a moderate risk of error (if the impact is small).
Oh yeah, I was referring to billions of params there. And if you want to play with a hosted model, I would recommend DeepSeek; their pricing is great and I've found it gets pretty decent results. The way I'd recommend using it is through crush or a similar tool. It's a very different experience from, say, asking it to come up with code in a web chat.
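If you want to poke at the hosted API directly, DeepSeek's endpoint is OpenAI-compatible, so a minimal call looks roughly like this (assuming the openai Python package and a key from their platform):

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API, so the stock client works.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

reply = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize what MCP tool use is in two sentences."}],
)
print(reply.choices[0].message.content)
```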
And yeah, the better the model is at getting things right on the first try, the less hand-holding you need to do. There are also some tricks I've found that help.

One is to get the model to write a plan in markdown for the steps it's going to take. In particular, you can have it generate a mermaidjs diagram, inspect it visually, and then tell it to change step x to do something else.

Another is to write the scaffolding by hand (as sketched below): make the file structure you want, put in function stubs, and then have the LLM fill in the blanks. It's a really effective way to focus it so it doesn't try to get creative. My general experience is that these models are good at implementing tasks that are focused and well defined, but if you let them get creative they can go off track really fast.

One last thing I've found: if the model doesn't get the solution mostly right on the first shot, it's unlikely to converge on a good one. It won't rethink the problem; it will simply bolt kludges onto the specific issues you point out. So you're better off starting from scratch and reframing the problem statement.
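To make the scaffolding idea concrete, the hand-written stubs might look like this (all names made up for illustration):

```python
# Hand-written scaffolding: structure and signatures first, bodies later.
# The LLM is then asked to fill in the NotImplementedError stubs one at a time.

def load_records(path: str) -> list[dict]:
    """Read the input file and return one dict per record."""
    raise NotImplementedError  # TODO: have the LLM implement

def summarize(records: list[dict]) -> dict:
    """Aggregate records into the summary structure the report needs."""
    raise NotImplementedError  # TODO: have the LLM implement

def main() -> None:
    records = load_records("data/input.json")
    print(summarize(records))
```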
It's important to keep in mind that it's a tool you have to spend some time learning the behaviors of, and how to get the most out of it.
In this comment thread you gave great tips! But very few people are likely to find them. Why don't you organize them and write a Lemmy post?
Yeah, I might try doing that if I get a bit of energy. Some of this stuff is obvious in retrospect, but it took me a while to realize I should be doing these things.