I don't get local AI. Which language model should I run if I've got 16GB of RAM and a Core i5-13500H?
So far the models I've tried have been completely useless. They can't answer simple questions about how to learn certain calisthenics moves.
#ai #localAI #tech #privacy
Replies (1)
It's your GPU that matters most. KoboldCpp (based on llama.cpp) can run models on CPU regardless, though.
You want to use its high-priority mode, because otherwise some threads may land on your E-cores and performance tanks.
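If you launch the backend some other way, you can apply the same trick by hand. A minimal sketch using psutil; it assumes the 13500H's 4 hyperthreaded P-cores show up as logical CPUs 0-7 (the usual enumeration, but verify on your system), and the PID is a placeholder:

```python
# Sketch: pin an inference process to P-cores and raise its priority.
# Assumes logical CPUs 0-7 are the P-core threads on an i5-13500H
# (check your system); the PID in the example call is a placeholder.
import psutil

P_CORE_CPUS = list(range(8))  # hyperthreaded P-cores, typically enumerated first

def prioritize(pid: int) -> None:
    proc = psutil.Process(pid)
    proc.cpu_affinity(P_CORE_CPUS)  # keep threads off the E-cores
    if psutil.WINDOWS:
        proc.nice(psutil.HIGH_PRIORITY_CLASS)  # same effect as the high-priority mode
    else:
        proc.nice(-5)  # lower nice = higher priority; needs root/CAP_SYS_NICE on Linux

# prioritize(12345)  # replace with the actual koboldcpp PID
```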
Now if you have a GPU, I recommend a Q4_K_S model whose file size fully fits in your VRAM. Keep in mind context also takes up space. Rough limits at ~4K context:
8GB VRAM: up to 11B at Q4_K_S
12GB: up to 13B
16GB: up to 20B, plus Mistral's 24B
24GB: up to 30B
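If you want to sanity-check a specific model, the arithmetic is simple enough to script. A minimal sketch; the ~0.56 bytes/parameter figure for Q4_K_S, the per-token KV-cache cost, and the overhead figure are ballpark assumptions, not exact values:

```python
# Sketch: estimate whether a quantized model + context fits in VRAM.
# Ballpark assumptions: Q4_K_S works out to roughly 0.56 bytes/parameter,
# and the KV cache for a mid-size dense model costs on the order of
# 0.1-0.8 MB per token depending on GQA and KV quantization.
Q4_K_S_BYTES_PER_PARAM = 0.56
KV_MB_PER_TOKEN = 0.5  # pessimistic fp16 estimate; GQA models need far less

def fits(params_b: float, context: int, vram_gb: float) -> bool:
    weights_gb = params_b * 1e9 * Q4_K_S_BYTES_PER_PARAM / 1024**3
    kv_gb = context * KV_MB_PER_TOKEN / 1024
    overhead_gb = 0.8  # compute buffers, driver, display, etc.
    total = weights_gb + kv_gb + overhead_gb
    print(f"{params_b}B @ {context} ctx -> {total:.1f}GB needed of {vram_gb}GB")
    return total <= vram_gb

fits(11, 4096, 8)    # ~8.5GB -> borderline on an 8GB card, as the list above says
fits(13, 4096, 12)   # ~9.6GB -> fits
fits(24, 4096, 16)   # ~15.3GB -> just fits
```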
If you don't have a GPU, it depends a lot on what speeds you can tolerate, but Gemma 3N may be a good starting point.
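For CPU-only speeds, a common rule of thumb is that generation is memory-bandwidth bound: every token streams the whole model through RAM once, so tokens/sec is at best bandwidth divided by model file size. A rough sketch; the dual-channel DDR5-4800 figure is an assumption about a typical 13500H laptop, so check your actual config:

```python
# Sketch: back-of-envelope CPU token rate from memory bandwidth.
# Rule of thumb: tok/s ~= effective bandwidth / model file size.
# Real throughput is often half the theoretical ceiling or less.
DDR5_4800_DUAL_CHANNEL_GBS = 4800e6 * 8 * 2 / 1e9  # = 76.8 GB/s, assumed config

def tokens_per_sec(model_gb: float, bandwidth_gbs: float = DDR5_4800_DUAL_CHANNEL_GBS) -> float:
    return bandwidth_gbs / model_gb

print(f"~{tokens_per_sec(4.0):.0f} tok/s ceiling for a ~4GB Q4 model")
```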