Asked Kimi to compare the repos:
Yours: You're using Jean-Luc Pons' secp256k1 implementation from VanitySearch—battle-tested, heavily optimized PTX assembly for carry chains (add.cc.u64, addc.cc.u64, mad.hi.cc.u64). That stuff is tight.
His: Clean handwritten Jacobian math with 32-bit words and standard CUDA. It's readable, but it's doing ~2x the work per operation.
That is indeed what I started with FWIW
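
For anyone following along, here is a rough sketch of the limb-width point in that comparison. It is plain portable C++, not PTX and not code from either repo, and `add_limbs` is a made-up name for illustration. It only shows that a 256-bit add walks 4 carry steps with 64-bit limbs versus 8 with 32-bit words. In PTX the carry would ride on the hardware flag via `add.cc.u64` / `addc.cc.u64` instead of the explicit comparisons used here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds two little-endian multi-limb integers and
// returns the final carry-out (0 or 1).
// A 256-bit value is 4 limbs of uint64_t or 8 limbs of uint32_t.
template <typename Limb, std::size_t N>
Limb add_limbs(const std::array<Limb, N>& a,
               const std::array<Limb, N>& b,
               std::array<Limb, N>& out) {
    Limb carry = 0;
    for (std::size_t i = 0; i < N; ++i) {
        Limb t = a[i] + carry;   // unsigned add, may wrap around
        Limb c1 = t < carry;     // wrapped only if a[i] was all ones and carry was 1
        Limb s = t + b[i];       // second add, may wrap around
        Limb c2 = s < b[i];      // wrapped if the sum is smaller than an operand
        out[i] = s;
        carry = c1 | c2;         // at most one of c1, c2 can be set
    }
    return carry;
}

using U256x64 = std::array<std::uint64_t, 4>;  // 4 carry steps per add
using U256x32 = std::array<std::uint32_t, 8>;  // 8 carry steps per add
```

Same function, two limb widths: the `U256x32` version runs twice as many loop iterations for the same 256-bit add. How much of that shows up in real kernel time is a separate question this sketch doesn't answer.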

GitHub
rummage/src/CPU/SECP256K1.cpp at main · rossbates/rummage
Rummage is a GPU accelerated npub miner for Nostr.