This slowness comes with a very pleasant upside: I'm able to keep up with the LLM's thinking process and its output, which strengthens the feeling of owning the result.
Sebastix
Quite slow, but fully local and zero cost.

Replies (6)

Emmanuel 3 weeks ago
One issue I have with local LLMs (using llama.cpp) is that I run out of VRAM for the context. Once that happens, the conversation ends and I have to start a new conversation with an empty context to continue. I haven't found a way to automatically evict some of the older context to make room, so that I can continue with at least part of it.
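One workaround is to trim the history on the client side before each request, rather than relying on the server. Below is a minimal sketch of a sliding window against llama-server's OpenAI-compatible endpoint; the default port, the token budget, and the 4-characters-per-token estimate are all assumptions for illustration, not exact values.

```python
# Client-side sliding window: before each request, drop the oldest
# non-system messages until a rough token estimate fits the budget.
import requests

API_URL = "http://localhost:8080/v1/chat/completions"  # llama-server's default port (assumed)
CTX_BUDGET = 3000  # tokens reserved for history; leave headroom for the reply (illustrative)

def estimate_tokens(messages):
    # Crude heuristic: ~4 characters per token. Good enough for trimming.
    return sum(len(m["content"]) for m in messages) // 4

def trim_history(messages):
    # Always keep the first message (the system prompt); evict oldest turns first.
    system, rest = messages[:1], messages[1:]
    while rest and estimate_tokens(system + rest) > CTX_BUDGET:
        rest.pop(0)
    return system + rest

def chat(messages, user_input):
    messages.append({"role": "user", "content": user_input})
    messages[:] = trim_history(messages)
    # llama-server serves whatever model it loaded; the "model" field is
    # required by the OpenAI schema but its value is not meaningful here.
    resp = requests.post(API_URL, json={"model": "local", "messages": messages})
    reply = resp.json()["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a helpful assistant."}]
print(chat(history, "Hello! Summarize context shifting in one sentence."))
```

Dropping whole turns from the front (instead of truncating mid-message) keeps the remaining context coherent, at the cost of the model forgetting the earliest exchanges.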
Empka 3 weeks ago
I've been considering a local setup better than my current one (an 8 GB Nvidia RTX 3070). How is the support for various tools/models on a non-CUDA setup? It might be cheaper for me to go Ryzen AI instead of one or two Nvidia GPUs.
Empka 3 weeks ago
Not using Nvidia hardware, that is.