what is the most cost effective way to run a #LocalLLM coding model? I'd like as much capacity as possible, for instance to run something like qwen3-coder, kimi-k2, magistral, etc in their highest fidelity instantiations.
I see three high-level paths, plus whatever I'm missing. buy:
- an nvidia card $$$
- an AMD card $$ + the hassle of ROCm etc.
- a mac with enough system RAM (unified memory) for the task $?$?
- something else?
it seems like 24GB of VRAM can hold quantized versions of these models, but that leaves little headroom for the context window, maybe ~4K tokens.
#asknostr #ai #llm
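rough back-of-envelope I've been using. a sketch, not a spec: the 32B size, 4.5 bits/weight, and the layer/head counts below are placeholder assumptions, not any particular model's numbers:

```python
# rough VRAM budget on a 24GB card: quantized weights + KV cache
# every model shape below is an illustrative assumption, not a real spec

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """approximate size of the quantized weights in GB"""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_gb_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache stores one key + one value vector per layer per token (fp16 cache here)"""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem / 1e9

w = weights_gb(32, 4.5)                     # ~18 GB for a hypothetical 32B coder at ~Q4
budget = 24 - w - 1.5                       # reserve ~1.5 GB for runtime overhead

per_tok_gqa = kv_gb_per_token(64, 8, 128)   # grouped-query attention: small cache
per_tok_mha = kv_gb_per_token(64, 64, 128)  # full multi-head attention: ~8x bigger

print(f"weights ~{w:.1f} GB, headroom ~{budget:.1f} GB")
print(f"context with GQA: ~{budget / per_tok_gqa:,.0f} tokens")
print(f"context without GQA: ~{budget / per_tok_mha:,.0f} tokens")
```

the spread between those two numbers is mostly about whether the model shares KV heads; some runtimes can also quantize the KV cache, which stretches things further. the full-size kimi-k2 and qwen3-coder checkpoints are a different story, they run to hundreds of billions of parameters and won't fit on a single 24GB card at any sensible quant.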
plantimals
rob@buildtall.com
npub1mkq6...r4tx
ΔC
https://drss.io -- bringing back the republic of blogs, and an onramp for bringing RSS content, including podcasts, into NOSTR
https://npub.dev -- configure your outbox
https://npub.blog -- experimenting with reading articles in a client-side only setup
@bird let's do this
@AInostr look at my timeline and use that latent space neighborhood to generate an image for my profile header
file it under: another interesting use case for nostr decentralized coordination
https://arxiv.org/pdf/2506.07940
useful for tracking developments with small models suitable for local use:
#slm #openllm #smallllm #tinyml #localai
Open LLM Leaderboard (Hugging Face): evaluating open LLMs
the outbox enabler at npub.dev now has support for NIP-46, and persists your details through reloads. slowly getting better. if you have more suggestions for ways to improve it, please let me know. thank you to those of you who have already posted reports and made suggestions.
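for anyone wondering what the NIP-46 support actually involves: the client hands signing off to a remote signer identified by a bunker:// connection string (remote signer pubkey, relay hints, optional secret). a minimal parsing sketch, assuming the connection-string format from the NIP-46 spec; this is not npub.dev's actual code:

```python
# parse a NIP-46 "bunker://" connection string into its parts
# format assumed from the NIP-46 spec: bunker://<pubkey>?relay=...&secret=...
from urllib.parse import urlparse, parse_qs

def parse_bunker_uri(uri: str) -> dict:
    parsed = urlparse(uri)
    if parsed.scheme != "bunker":
        raise ValueError("expected a bunker:// connection string")
    params = parse_qs(parsed.query)
    return {
        "remote_signer_pubkey": parsed.netloc,          # hex pubkey of the signer
        "relays": params.get("relay", []),              # where to reach it
        "secret": (params.get("secret") or [None])[0],  # optional one-time secret
    }

# placeholder pubkey, not a real key
print(parse_bunker_uri("bunker://deadbeef?relay=wss://relay.example.com&secret=s3cr3t"))
```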
phylogenetic analysis of AI slop 
https://github.com/sam-paech/slop-forensics
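I haven't dug into how that repo does it, but the general idea of a "slop phylogeny" is easy to sketch: treat each model's over-used phrases as traits, measure overlap, and cluster. a toy version with made-up phrases and counts, not slop-forensics' actual pipeline:

```python
# toy "slop phylogeny": cluster models by how much they share the same
# over-used phrases; NOT the slop-forensics pipeline, just the concept
from collections import Counter
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

# hypothetical phrase counts per model (invented for illustration)
samples = {
    "model_a": Counter({"delve": 9, "tapestry": 4, "certainly!": 7}),
    "model_b": Counter({"delve": 8, "tapestry": 5, "in conclusion": 3}),
    "model_c": Counter({"in conclusion": 6, "moreover": 5, "certainly!": 1}),
}

vocab = sorted({p for counts in samples.values() for p in counts})
X = np.array([[counts[p] for p in vocab] for counts in samples.values()], dtype=float)
X /= X.sum(axis=1, keepdims=True)  # normalize counts to phrase frequencies

# hierarchical clustering on cosine distance acts as a crude phylogenetic tree
Z = linkage(pdist(X, metric="cosine"), method="average")
tree = dendrogram(Z, labels=list(samples), no_plot=True)
print(tree["ivl"])  # leaf order: models grouped by shared slop
```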