Inception Labs Unveils Mercury 2: A 1,000 TPS Language Model Outperforming Google's Offerings
Inception Labs has launched Mercury 2, a reasoning language model that uses parallel denoising to generate approximately 1,000 tokens per second (TPS). That throughput marks a significant leap in generation speed over competitors such as Anthropic's Claude Haiku 4.5 and OpenAI's GPT-5 Mini.
Unlike traditional autoregressive models, which generate text one token at a time, Mercury 2 employs a diffusion-based architecture, similar to that used by image generators, to produce and refine blocks of text in parallel. This approach, rooted in research by Stanford's Stefano Ermon, boosts speed while preserving strong reasoning performance. On the AIME 2026 mathematics benchmark, Mercury 2 achieved 90% accuracy, surpassing Google's DiffusionGemma (69.1%).
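To make the difference between sequential and block-parallel generation concrete, the following is a minimal back-of-the-envelope sketch that only counts model forward passes. The block size and number of denoising steps are illustrative assumptions, not published Mercury 2 settings, and the sketch ignores the fact that individual passes may differ in cost between the two approaches.

```python
import math


def sequential_passes(num_tokens: int) -> int:
    """Autoregressive decoding: one forward pass per generated token."""
    return num_tokens


def parallel_denoising_passes(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Block-parallel denoising: all tokens in a block are refined together,
    so each block costs a fixed number of denoising passes.
    block_size and denoise_steps are hypothetical values for illustration."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


tokens = 1024
seq = sequential_passes(tokens)
par = parallel_denoising_passes(tokens, block_size=64, denoise_steps=8)
print(f"sequential passes: {seq}")
print(f"parallel denoising passes: {par}")
print(f"ratio: {seq / par:.1f}x fewer passes")
```

Under these assumed numbers the parallel scheme needs 128 passes instead of 1,024. The real ratio depends on the actual block size, step count, and per-pass cost, none of which are given in this article.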
The model's high throughput has immediate implications for Web3 infrastructure, particularly real-time smart contract auditing and the operation of AI agents. A joint case study with Augment Code reported an 82% reduction in latency and a 90% decrease in operational costs after Mercury 2 was integrated.
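As a rough illustration of what these percentages mean in practice, the sketch below applies them to hypothetical baseline figures. The baseline latency, baseline cost, and response length are invented for the example and do not come from the case study. Only the 82%, 90%, and 1,000 TPS figures are taken from the article.

```python
def apply_reduction(baseline: float, reduction: float) -> float:
    """Return the value left after cutting `baseline` by the fraction `reduction`."""
    return baseline * (1.0 - reduction)


def speedup_factor(reduction: float) -> float:
    """Convert a fractional latency reduction into an equivalent speedup multiple."""
    return 1.0 / (1.0 - reduction)


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit `num_tokens` at a steady throughput."""
    return num_tokens / tokens_per_second


# Hypothetical baselines, chosen only to make the arithmetic easy to follow.
baseline_latency_ms = 1000.0
baseline_monthly_cost_usd = 100.0

new_latency_ms = apply_reduction(baseline_latency_ms, 0.82)
new_cost_usd = apply_reduction(baseline_monthly_cost_usd, 0.90)

print(f"latency: {baseline_latency_ms:.0f} ms -> {new_latency_ms:.0f} ms")
print(f"cost: ${baseline_monthly_cost_usd:.0f} -> ${new_cost_usd:.0f}")
print(f"equivalent speedup: {speedup_factor(0.82):.2f}x")
print(f"500 tokens at 1,000 TPS: {generation_seconds(500, 1000.0):.2f} s")
```

An 82% latency reduction corresponds to roughly a 5.6x speedup, and a 90% cost decrease leaves one tenth of the original spend.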
Backed by a $50 million funding round whose participants include Nvidia's venture arm, Inception Labs positions Mercury 2 as the leading choice for latency-sensitive tasks. Although open weights are not yet available, the model's efficiency is expected to drive adoption in decentralized finance and automated digital economies.


Cryptovka