Great article, thanks for sharing!
I'm not using Docker for Ollama personally, just running it in a shell tab locally on my Linux box. I have more than enough VRAM but still get bad results... might be me doing something stupid.
Any other articles help you out in your learning journey?
Replies (3)
Running Ollama directly may introduce security vulnerabilities. From my research, it's best to run it through Docker; performance should be the same.
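If it helps, this is roughly the command I mean (a sketch, assuming an NVIDIA card with the container toolkit installed; the localhost bind and volume name are just one way to do it):

```
# Run Ollama in a container, with its API bound to localhost only
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  --name ollama \
  ollama/ollama
```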
I haven’t found many good guides. I wrote mine because none of the guides I followed worked without exposing either app to the host network.
My guide was inspired by this video, which might help. His setup didn't work for me, though:
https://youtu.be/qY1W1iaF0yA
I'll be updating the guide as I learn how to improve my process. I might switch to Docker Compose, or write a startup script that sets everything up and hardens it for security. I might even take this as far as a full app, so people stop potentially exposing their machines to the internet just to run local AI.
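For anyone who wants a head start before I update the guide, here's a rough sketch of what that startup script might look like. Open WebUI is just a stand-in for the front end, and the network name and ports are placeholders, not my final setup:

```
#!/usr/bin/env sh
# Sketch: keep Ollama off the host network entirely; only the web UI
# is published, and only on localhost.

docker network create ollama-net 2>/dev/null || true

# Ollama gets no -p flag, so it is only reachable from containers on ollama-net
docker run -d --name ollama --network ollama-net \
  --gpus=all -v ollama:/root/.ollama \
  ollama/ollama

# The front end is published on localhost only and talks to Ollama
# over the private Docker network by container name
docker run -d --name open-webui --network ollama-net \
  -p 127.0.0.1:3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  ghcr.io/open-webui/open-webui:main
```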
You probably don’t have the GPU configured correctly. I recommend just starting over lol
And remember that models take more VRAM than their weights alone. If a model has 7GB of weights, it still might not fit on an 8GB card, because it needs extra memory for the prompt context and other overhead. For example, an ~8GB model like gemma3:12b actually needs around 10GB.
Run `ollama ps` to see whether the model is loaded onto your CPU, your GPU, or split across both.
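A couple of quick checks (the second assumes an NVIDIA card):

```
ollama ps     # shows whether the model sits on the CPU, the GPU, or is split
nvidia-smi    # confirms how much VRAM is actually in use on the card
```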