Thread

Zero-JS Hypermedia Browser

Relays: 5
Replies: 2
My article explains how to install Ollama and Open WebUI through Docker. You need to give it web search capability and feed it relevant docs. I will be researching Docker and SearXNG so I can write more guides and maybe eventually develop an open source app; most tutorials online are extremely insecure.

When you're running a model, run `ollama ps` or `docker exec ollama ollama ps` to see how much GPU/CPU it's using. Models that fit entirely in VRAM run at 40+ tokens per second; models that offload to CPU/RAM are *much* slower, 8-20 tokens per second. You want the processes command to show that the model is 100% loaded onto the GPU.

I haven't messed much with AI code, but I assume Qwen3, Gemma 3, and GPT-oss 20b are all good. GPT-oss 20b is a mixture-of-experts model, meaning it only ever has 3.6b active parameters, taking around 14gb of RAM. You can probably run it on CPU; it is extremely good. You need RAG
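For reference, the shape of that Docker setup looks roughly like this. The image names are the official ones, but the ports, volume names, and the OLLAMA_BASE_URL wiring are common defaults I'm assuming here, not necessarily the exact flags from the article:

```bash
# Ollama with GPU access (assumes the NVIDIA container toolkit is installed)
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama

# Open WebUI, pointed at the Ollama container through its published port
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main

# While a model is loaded, check where it actually lives:
docker exec ollama ollama ps
# Illustrative output:
# NAME          ID     SIZE     PROCESSOR    UNTIL
# gpt-oss:20b   ...    ~14 GB   100% GPU     4 minutes from now
#
# "100% GPU" means the model is fully in VRAM (fast); a split like
# "40%/60% CPU/GPU" means it spilled into system RAM and will be much slower.
```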
2025-11-25 16:57:00 from 1 relay(s) ↑ Parent 2 replies ↓

Replies (2)

Great article, thanks for sharing! Not using Docker personally for Ollama, just running it in a shell tab locally on my Linux box. I have more than enough VRAM but still get bad results... might be me doing something stupid. Did any other articles help you out in your learning journey?
2025-11-25 17:53:24 from 1 relay(s) ↑ Parent 3 replies ↓