And remember that models take more memory than their weights alone. If a model has 7 GB of weights, it still might not fit on an 8 GB VRAM card, because it also needs room for the context (KV cache) and other runtime state. So for example, an 8 GB model like gemma3:12b actually needs around 10 GB to run.
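A rough back-of-envelope, if it helps: total memory is roughly weights plus KV cache plus some fixed runtime overhead. The helper and every architecture number below are made up for illustration (this is not gemma3's real config), but the shape of the math is the point:

```python
# Rough memory estimate: weights + KV cache + runtime overhead.
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * context * bytes/value.
def estimate_total_gib(weights_gib, n_layers, n_kv_heads, head_dim,
                       ctx_len, bytes_per_val=2, overhead_gib=0.5):
    kv_gib = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_val / 2**30
    return weights_gib + kv_gib + overhead_gib

# Illustrative numbers only (hypothetical architecture, fp16 KV cache, 4K context):
print(estimate_total_gib(weights_gib=8.1, n_layers=48, n_kv_heads=8,
                         head_dim=256, ctx_len=4096))  # ~10.1 GiB
```

Note the context-length term: the same model can fit fine at a 4K context and run out of memory at 32K, because the KV cache grows linearly with context.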
Run `ollama ps` to see whether the model is loaded on your CPU, your GPU, or split across both (the PROCESSOR column shows the split).
Replies (1)
I have a Framework Desktop with 128 GB of VRAM.
Even the gpt-oss:120b model (120B parameters) runs with like half my VRAM still free.
I don't think it's a raw hardware problem, but the tooling around it seems to break more. Like once the model calls a tool I lose all context... it's strange.