Replies (9)

Liberty Gal 10 months ago
I've only used it a few times, but I've liked perplexity.ai because it doesn't just give you answers; it tells you where it got them (so it's not just making things up) and suggests related questions you might want answered. I have limited experience with various AIs, so I don't know if I'm the best person to ask. I've also stayed away from the big AI players like OpenAI and their ChatGPT because I don't trust what they're doing with my info.
Sibshops 10 months ago
I haven't done anything locally; it still takes something like a $6,000 machine to run DeepSeek R1.
Llama 3.3 70B is the most useful model I have tried. Any smaller and you have to stick to very general knowledge; asking a small model for niche knowledge is a recipe for hallucinations. I use it both for conversational AI to help me narrow search terms and for coding help via the Continue AI VS Code plugin. For autocomplete I use, I think, DeepSeek Coder.

The main drawback is that it is somewhat slow. I get 3.3 tokens per second, which is equivalent to talking to someone who types 150 words per minute. That is actually helpful: it is not so slow as to be intolerable, but not so instant that I don't try to figure things out on my own first.

It does require some decent hardware, though. I've got a 4090, a 13900K, and 64 GB of RAM running at 6400 MT/s. That last number is key. The 4-bit quantization of Llama 3.3 is 42 GB; with 24 GB of VRAM, that leaves 18 GB that have to be processed by the CPU for each token. The result is that the GPU is actually not doing much. You probably don't need a 4090, just as much VRAM as you can get. A 5090 with 32 GB of VRAM should be able to do 6 tokens per second simply for having only 10 GB to process on the CPU.
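For anyone who wants to sanity-check those numbers, here is the back-of-envelope version as a quick sketch. It treats generation as purely memory-bandwidth-bound, which is roughly true for the CPU-side layers; the 102 GB/s figure is my assumption for dual-channel DDR5-6400.

```python
# Rough estimate: when a model spills out of VRAM, each generated token
# streams the CPU-side weights through system RAM once, so
# RAM bandwidth / spilled gigabytes is an upper bound on tokens/sec.
model_gb = 42                    # 4-bit Llama 3.3 70B
vram_gb = 24                     # RTX 4090
spill_gb = model_gb - vram_gb    # 18 GB read from RAM per token
ram_bw = 2 * 8 * 6400 / 1000     # dual-channel DDR5-6400: ~102.4 GB/s

print(f"theoretical ceiling: {ram_bw / spill_gb:.1f} tok/s")  # ~5.7

# The observed 3.3 tok/s is ~58% of that ceiling. Applying the same
# efficiency to a 32 GB card (10 GB spilled) predicts
# 0.58 * 102.4 / 10 ~= 5.9 tok/s, in line with the ~6 tok/s guess.
print(f"32 GB card estimate: {0.58 * ram_bw / 10:.1f} tok/s")
```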
The issue I have with these models is that they're practically useless for doing actual work. They can be fairly useful for asking questions, especially less technical ones, and getting a response. But they're terrible to integrate into things and have them actually perform more than "who is this" or "what does this term mean" when I need a model to understand me, my way of doing things, and to help me accomplish tasks. Public AI models require complex integration, and it's expensive when you need to ingest large amounts of data to help make predictions and decisions. Local models offer a much cheaper alternative and easier integration into workloads, which is my goal. We love to imagine "what these tools could do if..." or "AI will take over when...", but it can't do anything practical yet, and no one is talking about that.
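To make "easier integration" concrete, here is a minimal sketch of the kind of thing I mean, assuming a local Ollama server on its default port; the model tag and prompt are just placeholders for whatever you actually run.

```python
# Minimal local-model call over Ollama's HTTP API (assumes `ollama serve`
# is running and the model below has been pulled). No API keys, no
# per-token billing, and your data never leaves the machine.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.3",                        # placeholder model tag
        "prompt": "Summarize this ticket in one line: ...",
        "stream": False,                            # one JSON blob, no token stream
    },
    timeout=300,
)
print(resp.json()["response"])
```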
There are much smaller models, including ones distilled from R1, that need far less hardware, but they seem to be big stinkers, so I was hoping others had actually been able to do some real work with them without $6,000 worth of hardware. I spend that much on far more hardware for my application, but it doesn't involve GPUs, and it's hard to justify their price given how many CPUs I could buy for the same money.
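For scale, the hardware gap is easy to put numbers on with a rough footprint estimate; the ~4.8 bits per weight below is my assumption for a typical 4-bit quant, picked so it matches the 42 GB file mentioned above.

```python
# Back-of-envelope weight footprint of a quantized model:
# GB ~= billions of parameters * (bits per weight / 8).
def weight_gb(params_b: float, bits_per_weight: float = 4.8) -> float:
    return params_b * bits_per_weight / 8

for name, params in [("Llama 3.3 70B", 70), ("R1 distill 14B", 14), ("R1 distill 7B", 7)]:
    print(f"{name}: ~{weight_gb(params):.0f} GB")   # ~42, ~8, ~4 GB
```

So the distills fit entirely in a mid-range GPU's VRAM and never hit the RAM-bandwidth ceiling; the open question is whether their output is good enough to do real work.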