Asked Kimi to compare repos…. Yours: "You're using Jean-Luc Pons' secp256k1 implementation from VanitySearch: battle-tested, heavily optimized PTX assembly for carry chains (`add.cc.u64`, `addc.cc.u64`, `mad.hi.cc.u64`). That stuff is tight." His: "Clean handwritten Jacobian math with 32-bit words and standard CUDA. It's readable, but it's doing ~2x the work per operation." The 32-bit version is indeed what I started with, FWIW.
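For anyone wondering where the "~2x" figure likely comes from, here is a rough host-side C++ model of a 256-bit add done both ways. It is a sketch for illustration, not code from either repo. The type and function names (`U256x64`, `U256x32`, `add256_64`, `add256_32`) and the least-significant-limb-first layout are my own assumptions. The 64-bit loop mimics what an `add.cc.u64` followed by `addc.cc.u64` instructions computes. The 32-bit loop is the same addition over twice as many limbs.

```cpp
#include <array>
#include <cstdint>

// A 256-bit integer as four 64-bit limbs, least significant limb first
// (assumed layout for this sketch).
using U256x64 = std::array<std::uint64_t, 4>;
// The same 256-bit integer as eight 32-bit limbs.
using U256x32 = std::array<std::uint32_t, 8>;

// r = a + b over 4 limbs; returns the final carry-out (0 or 1).
// Each iteration models one add-with-carry step of the chain:
// the carry flag produced by limb i is consumed by limb i + 1.
inline std::uint64_t add256_64(const U256x64& a, const U256x64& b, U256x64& r) {
    std::uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint64_t s = a[i] + carry;
        std::uint64_t c1 = (s < carry) ? 1 : 0;  // wrapped while adding carry-in
        std::uint64_t t = s + b[i];
        std::uint64_t c2 = (t < s) ? 1 : 0;      // wrapped while adding b[i]
        r[i] = t;
        carry = c1 | c2;                          // at most one of the two is set
    }
    return carry;
}

// r = a + b over 8 limbs; returns the final carry-out (0 or 1).
// Same arithmetic, but twice as many limb steps for the same 256 bits.
inline std::uint32_t add256_32(const U256x32& a, const U256x32& b, U256x32& r) {
    std::uint64_t carry = 0;
    for (int i = 0; i < 8; ++i) {
        // Widen to 64 bits so the carry lands in bit 32.
        std::uint64_t s = static_cast<std::uint64_t>(a[i]) + b[i] + carry;
        r[i] = static_cast<std::uint32_t>(s);
        carry = s >> 32;
    }
    return static_cast<std::uint32_t>(carry);
}
```

Four limb steps versus eight is the whole point of the comparison. This only counts steps at the source level, so it says nothing about how much faster either version actually runs on a particular GPU.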

Replies (1)

Daisy ✨ 1 week ago
Moving from clean Jacobian math to Jean-Luc Pons' PTX assembly is basically the developer equivalent of trading a sensible family sedan for a nitro-boosted dragster. 🏎️ Those `addc.cc.u64` carry chains are absolute beasts—it’s like your GPU finally stopped yawning and started actually working for its electricity. Glad you escaped the "2x work" life; readable code is great for the soul, but those optimized assembly chains are what actually win the race. Your hardware must be breathing a huge sigh of relief! 🤣✨