Thread - Nostr Hypermedia

librekitty librekitty@blitzwalletapp.com 1 month ago

Intel is seriously competitive on price-to-VRAM, but I don't know about compatibility. NVIDIA is usually the clear winner for performance, and the 5xxx series (Blackwell) has support for NVFP4-quantized models.


Introducing NVFP4 for Efficient and Accurate Low-Precision Inference | NVIDIA Technical Blog

To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques...

But you could also do something like multiple 3090s. Hope this helps!
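To put rough numbers on why NVFP4 matters for VRAM, here is a minimal sketch that estimates weight memory at different precisions. It assumes (my reading of the NVIDIA post, not something stated in this thread) that NVFP4 stores 4-bit values plus one 8-bit FP8 scale per block of 16 values, i.e. about 4.5 bits per weight. The function names are made up for illustration, and the estimate covers weights only.

```python
# Rough weight-memory estimate for an LLM stored in different number formats.
# Assumption (not from the thread): NVFP4 stores 4-bit E2M1 values plus one
# 8-bit FP8 (E4M3) scale per block of 16 values; the single per-tensor FP32
# scale is ignored as negligible.

def bits_per_weight(fmt: str) -> float:
    """Effective storage bits per weight, including any block-scale overhead."""
    if fmt == "fp16":
        return 16.0
    if fmt == "fp8":
        return 8.0
    if fmt == "nvfp4":
        block_size = 16   # values sharing one scale (assumed)
        scale_bits = 8    # one FP8 scale per block (assumed)
        return 4.0 + scale_bits / block_size  # 4.5 bits per weight
    raise ValueError(f"unknown format: {fmt}")


def weight_gib(num_params: float, fmt: str) -> float:
    """Approximate weight memory in GiB (weights only: no KV cache or activations)."""
    return num_params * bits_per_weight(fmt) / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical 32B-parameter model, the size mentioned later in the thread.
    for fmt in ("fp16", "fp8", "nvfp4"):
        print(f"{fmt:>6}: {weight_gib(32e9, fmt):5.1f} GiB")
```

Under these assumptions, a 32B model goes from roughly 60 GiB at FP16 to under 17 GiB in NVFP4.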


Replies (3)

librekitty librekitty@blitzwalletapp.com 1 month ago

You can also go the CPU route with tons of RAM, but inference speed will be terrible compared to GPU-accelerated inference.


nicodemus 1 month ago

This is true, GPUs are faster for inference. But you'll also be consuming 1500 watts, have to deal with the thermal issues, and still struggle to fit a model larger than 32B with decent quantization. Alternatively, the 395 chips (AMD Ryzen AI Max+ 395) and their NPU are doing pretty well. Combine two of them and you're looking at roughly low-end-GPU-level inference speed, AND you get 256GB for a larger model and plenty of context, and STILL under 1000 watts.
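To make the "fit a model larger than 32B" and "plenty of context" points concrete, here is a rough sizing sketch: weight memory plus the KV cache for a given context length, checked against a memory budget. The model dimensions, the ~4.5 bits/weight, and the 90% headroom factor are all hypothetical illustration values, not taken from any specific model or from the thread.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical transformer dimensions; real values come from a model's config."""
    num_params: float   # total parameter count
    num_layers: int
    num_kv_heads: int   # key/value heads (fewer than query heads with grouped-query attention)
    head_dim: int


def weights_gb(shape: ModelShape, bits_per_weight: float) -> float:
    """Weight memory in GB (1e9 bytes) at a given average bits per weight."""
    return shape.num_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(shape: ModelShape, context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV-cache memory in GB: keys and values (x2) for every layer, KV head, head dim and token."""
    elems = 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * context_tokens
    return elems * bytes_per_elem / 1e9


def fits(shape: ModelShape, bits_per_weight: float, context_tokens: int,
         budget_gb: float, headroom: float = 0.9) -> bool:
    """True if weights plus KV cache fit in `headroom` of the budget (the rest is a heuristic reserve)."""
    need = weights_gb(shape, bits_per_weight) + kv_cache_gb(shape, context_tokens)
    return need <= budget_gb * headroom


if __name__ == "__main__":
    # Hypothetical 70B-class model with grouped-query attention, ~4.5 bits/weight, 32k context.
    shape = ModelShape(num_params=70e9, num_layers=80, num_kv_heads=8, head_dim=128)
    for budget in (24, 48, 128, 256):
        print(f"{budget:>3} GB: fits={fits(shape, 4.5, 32_768, budget)}")
```

With these made-up dimensions the model needs roughly 39 GB for weights plus roughly 11 GB of FP16 KV cache at 32k tokens, so it fits in 128 GB or 256 GB but not in 24 GB or 48 GB.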

zaytun zaytun@zayt.space 1 month ago

I think dual 3090s would be preferable to, e.g., a DGX Spark with regard to inference speed, no? VRAM bandwidth is higher, I believe. The downside is that the model size limit is obviously lower with 48 GB of VRAM than with the DGX Spark's 128 GB of unified memory.
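A common back-of-the-envelope check for the bandwidth question: for single-stream decoding of a dense model, each generated token reads roughly all the weights once, so memory bandwidth divided by weight size gives an upper bound on tokens per second. The same arithmetic also covers the earlier point about the CPU route. This is a sketch with approximate peak-bandwidth figures I'm assuming from published specs (verify them yourself), and it assumes the two 3090s split the model by layers, so they run one after another per token.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes every generated token streams all weights from memory once and
    ignores KV-cache reads, compute limits and software overhead, so real
    throughput will be lower.
    """
    return bandwidth_gb_s / weight_gb


# Approximate peak memory bandwidths in GB/s (assumed from published specs).
BANDWIDTH_GB_S = {
    # With a layer split the cards work in sequence per token, so the
    # effective bandwidth is roughly that of one card.
    "2x RTX 3090 (layer split)": 936.0,
    "DGX Spark": 273.0,
    # 2 channels x 8 bytes x 5600 MT/s = 89.6 GB/s
    "CPU, dual-channel DDR5-5600": 2 * 8 * 5600e6 / 1e9,
}

if __name__ == "__main__":
    weight_gb = 32e9 * 4.5 / 8 / 1e9  # hypothetical 32B model at ~4.5 bits/weight = 18 GB
    for name, bw in BANDWIDTH_GB_S.items():
        print(f"{name:30s} <= {max_decode_tokens_per_s(bw, weight_gb):5.1f} tok/s")
```

Tensor parallelism can in principle draw on both cards' bandwidth at once, at the cost of inter-GPU communication; how much that helps depends on the inference runtime. Either way, the ceiling only applies to models that fit in the 48 GB to begin with.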