nostr:nprofile1qqswlew3yr0ses5slf6gwflmgkkysl926drdfu3f82cxn68srlz3nqgpz4mhxue69uhhyetvv9ujuerpd46hxtnfduhszxrhwden5te0wfjkccte9ejxjum0vfjhjtnyv4mz7qgkwaehxw309aex2mrp0yhxummnw3ezumn9wshsc8za9h wdyt, are two enough?
Replies (1)
There's a pretty big gap between the light models and the heavy ones. Often there's a flagship MoE model and a light version, like GLM-4.5 and GLM-4.5 Air.
4.5 Air fits easily in 96 GB, but full 4.5 needs 200 GB+ just for the weights (no context) when quantized to Q4.
CPU offloading makes them runnable, but at like 10 tok/s, which is pretty lame.
And then there are models like Kimi K2 that are >500 GB quantized.
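Rough back-of-envelope for those sizes, as a sketch only. The parameter counts (about 106B for GLM-4.5 Air, 355B for GLM-4.5, 1T for Kimi K2) and the ~4.5 bits per weight for a Q4-style quant are my assumptions, not measured file sizes, and this ignores KV cache and runtime overhead:

```python
# Weights-only memory estimate: params * bits_per_weight / 8 bytes.
# Assumed values: parameter counts and 4.5 bits/weight are approximations.

def weight_gb(params: float, bits_per_weight: float = 4.5) -> float:
    """Return approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

MODELS = {
    "GLM-4.5 Air": 106e9,  # assumed total parameter count
    "GLM-4.5": 355e9,      # assumed total parameter count
    "Kimi K2": 1e12,       # assumed total parameter count
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name}: ~{weight_gb(params):.0f} GB of weights")
```

That lands around 60 GB, 200 GB, and 560 GB, which lines up with the numbers above. Real quant files vary depending on the quant mix.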
I want a local rig but keep putting it off because the reqs keep changing.
Stacking 5090s is nice because they're 1/4 the price, but stacking 6000s is just a nicer system (noise, power, etc.).
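To put card counts on it, here's a minimal sketch. It assumes 32 GB per 5090 and 96 GB per 6000 (I mean the 96 GB RTX PRO 6000 here; that's an assumption), and it counts weights only, so real setups need headroom for context on top:

```python
import math

# Assumed VRAM per card in GB; adjust if you mean a different SKU.
VRAM_GB = {"5090": 32, "6000": 96}

def cards_needed(weights_gb: float, vram_gb: float) -> int:
    """Minimum number of cards whose combined VRAM holds the weights alone."""
    return math.ceil(weights_gb / vram_gb)

if __name__ == "__main__":
    # 200 GB and 500 GB are the rough Q4 sizes mentioned in this thread.
    for weights in (200, 500):
        for card, vram in VRAM_GB.items():
            print(f"{weights} GB of weights: {cards_needed(weights, vram)} x {card}")
```

So a 200 GB model is 7 of the 5090s versus 3 of the 6000s before any context, which is a big part of why the 6000 box ends up being the nicer system.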