[–] CriticalResist8@lemmygrad.ml 8 points 1 week ago

The article mentions "Packing multiple models per GPU", but also "using a token-level autoscaler to dynamically allocate compute as output is generated, rather than reserving resources at the request level". I'm not sure exactly what that means in practice, but it may hint that there are ways to scale this down.
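If I understand it right, request-level reservation holds a chunk of the GPU for a request's whole lifetime, while token-level autoscaling grabs compute one decode step at a time, which is what would let many models share a GPU between steps. A toy sketch of the difference (everything here is made up for illustration, not Alibaba's actual scheduler):

```python
# Purely illustrative sketch contrasting request-level reservation with
# token-level allocation. All names are hypothetical.
import random

class GpuPool:
    """Toy pool of interchangeable compute 'slices' on one GPU."""
    def __init__(self, total_slices: int):
        self.free = total_slices

    def reserve(self, n: int) -> int:
        if self.free < n:
            raise RuntimeError("pool exhausted")
        self.free -= n
        return n

    def release(self, n: int) -> None:
        self.free += n

def decode_request_level(pool: GpuPool, max_tokens: int) -> list[str]:
    # Capacity is reserved up front and held for the whole request,
    # even between decode steps, so fewer models fit per GPU.
    held = pool.reserve(max_tokens)  # crude worst-case reservation
    try:
        return [f"tok{i}" for i in range(random.randint(1, max_tokens))]
    finally:
        pool.release(held)

def decode_token_level(pool: GpuPool, max_tokens: int) -> list[str]:
    # Capacity is acquired one decode step at a time, so other models
    # packed on the same GPU can use the pool between our steps.
    out = []
    for i in range(max_tokens):
        held = pool.reserve(1)       # one slice, one token
        out.append(f"tok{i}")
        pool.release(held)
        if random.random() < 0.2:    # pretend we hit end-of-sequence
            break
    return out

if __name__ == "__main__":
    pool = GpuPool(total_slices=8)
    print(decode_request_level(pool, max_tokens=4))
    print(decode_token_level(pool, max_tokens=4))
```

The point being: in the second version the pool is free between tokens, so the same slices could be serving other models' decode steps, which is how you'd pack multiple models onto one GPU.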

If not Alibaba, then other researchers will eventually get to it.