this post was submitted on 15 Dec 2025
12 points (100.0% liked)

technology

top 2 comments
[–] peeonyou@hexbear.net 7 points 3 months ago* (last edited 3 months ago)

I wasn't able to get llama.cpp to run it, even after pulling the latest master and rebuilding; it failed with an unknown-architecture error. ChatGPT told me to fetch a specific PR branch and rebuild:

git fetch origin pull/18058/head:nemotron3
git checkout nemotron3

cmake -S . -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j --clean-first --target llama-server

and that did the trick
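For anyone following along: once that build finishes, the server binary lands in build/bin and is started the usual way. The model path and context size below are placeholders I picked for illustration, not values from the comment above:

```shell
# Hypothetical invocation; adjust the model path and sizes to your setup.
# -m    path to the downloaded GGUF file (placeholder filename)
# -ngl  number of layers to offload to the GPU (99 = offload everything)
# -c    context window to allocate
./build/bin/llama-server -m ~/models/model-Q4_K_M.gguf -ngl 99 -c 32768
```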

Also, this thing is flying. I'm using Q4_K_M on my 5090 and I'm getting 220 t/s on average.
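As a rough sense of what 220 t/s means in practice (this is just arithmetic on the figure reported above; the 1,000-token answer length is an arbitrary example):

```python
# Back-of-the-envelope timing at the reported decode speed.
tokens_per_second = 220   # reported average on the 5090
answer_tokens = 1_000     # example length for a fairly long answer

seconds = answer_tokens / tokens_per_second
print(f"{seconds:.1f} s for a {answer_tokens}-token answer")  # ~4.5 s
```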

[–] JoeByeThen@hexbear.net 6 points 3 months ago

1M context window

awooga