Intel is seriously competitive on price-to-VRAM, but I don't know about compatibility. NVIDIA is usually the clear winner for performance, and the 5xxx series (Blackwell) supports NVFP4-quantized models, but you could also run multiple 3090s or something. Hope this helps.

Replies (3)

nicodemus 1 month ago
This is true, GPUs are faster for inference. But you'll also be consuming 1500 watts, have to deal with the thermal issues, and still struggle to fit a model larger than 32B with decent quantization. Alternatively, the 395 chips and their NPU are doing pretty well. Combine two of them and you're looking at low-end GPU-level inference AND you get 256 GB for a larger model and plenty of context, STILL under 1000 watts.
I think dual 3090s would be preferable to, e.g., a DGX Spark with regard to inference speed, no? VRAM bandwidth is higher, I believe. The downside is that the model size limit is obviously lower with 48 GB of VRAM than with the 128 GB of unified memory on the DGX Spark.
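
For anyone who wants to sanity-check the size-versus-speed tradeoff, here is a rough back-of-envelope sketch in Python. It is an illustration and not a benchmark: it counts weights only (no KV cache or runtime overhead), the 0.85 headroom factor is an arbitrary assumption, and the bandwidth figures in the usage lines (936 GB/s for a 3090, 273 GB/s for a DGX Spark) are spec-sheet numbers as I recall them, so check them before relying on the result. The decode figure is only a ceiling, assuming every generated token reads all weights once and that a layer-split multi-GPU setup is limited by a single card's bandwidth.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Weights only: KV cache, activations and runtime overhead are ignored.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.85) -> bool:
    """True if the weights fit in a fraction `headroom` of the memory.

    The headroom value is an assumed margin left for context and overhead.
    """
    return weight_gb(params_billion, bits_per_weight) <= memory_gb * headroom


def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on decode tokens/s if generation is memory-bandwidth bound.

    Assumes each token requires reading all weights once (dense model).
    """
    return bandwidth_gb_s / weight_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical comparison: a 70B dense model at 4 and 8 bits per weight.
    for bits in (4, 8):
        print(f"70B @ {bits}-bit: {weight_gb(70, bits):.0f} GB, "
              f"fits 48 GB: {fits(70, bits, 48)}, "
              f"fits 128 GB: {fits(70, bits, 128)}")
    # Assumed bandwidth figures, see the note above.
    print(f"3090-class ceiling:  {decode_ceiling_tok_s(70, 4, 936):.1f} tok/s")
    print(f"Spark-class ceiling: {decode_ceiling_tok_s(70, 4, 273):.1f} tok/s")
```

Under these assumptions the pattern matches what was said above: the higher-bandwidth cards have the higher speed ceiling for models that fit in 48 GB, while the 128 GB box is the only one of the two that holds the larger or less aggressively quantized models at all.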