this post was submitted on 25 Jul 2024
647 points (100.0% liked)

[–] Sabata11792@ani.social 4 points 9 months ago* (last edited 9 months ago) (1 children)

Some apps let you offload to both GPU and CPU, loading only the active part of the model. I have an old SSD set up as swap that gives me 500 GB of "usable" RAM.

It is horrendously slow and pointless, but you can do it. I got about 2 tokens in 10 minutes with a 70B model on a 1080 Ti before I gave up.

[–] AeonFelis@lemmy.world 4 points 9 months ago (1 children)

Even if they used more powerful hardware than you, the model they ran is still almost 6 times bigger - so if you got two tokens in 10 minutes, one token in 30 minutes for them sounds plausible.
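As a rough back-of-the-envelope check of that estimate, here is a minimal sketch. It assumes time per token grows linearly with model size when everything is bottlenecked on swap, and it rounds "almost 6 times" up to 6. The function names are made up for illustration, and real throughput depends on hardware, quantization, and how much of the model fits in memory.

```python
def minutes_per_token(tokens: float, minutes: float) -> float:
    """Average wall-clock minutes spent per generated token."""
    return minutes / tokens


def scaled_minutes_per_token(base: float, size_ratio: float) -> float:
    """Scale a per-token time by a model size ratio.

    Assumption: per-token time grows linearly with model size,
    which is only a crude guess for a swap-bound setup.
    """
    return base * size_ratio


# Figures from the thread: 2 tokens in 10 minutes on the smaller model.
base = minutes_per_token(tokens=2, minutes=10)

# "Almost 6 times bigger", rounded up to 6 for the estimate.
estimate = scaled_minutes_per_token(base, size_ratio=6)

print(f"{base:.0f} min/token -> about {estimate:.0f} min/token")
```

Under those assumptions the numbers line up: 5 minutes per token on the smaller model becomes roughly 30 minutes per token on the larger one, before accounting for any faster hardware.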

[–] Sabata11792@ani.social 4 points 9 months ago (1 children)

I would have to use an entire 1 TB drive for swap, but I'm sure I could manage 1 token before the heat death of the universe.

[–] AeonFelis@lemmy.world 2 points 9 months ago

I'd worry less about the heat death of the universe and more about your hardware's heat from all that load.