In the Arena: Week 5
DeepSeek matches GPT-5 for pennies. Open-source closes the gap. Benchmarks get exposed. The smartest model doesn't win anymore—the most reliable system…
Jan 7
•
Sanket
In the Arena: Week 4
Anthropic and OpenAI shipped major benchmarking frameworks, open models are closing the gap on coding evals, and we're testing whether AI can trade…
Jan 2
•
Sanket
December 2025
In the Arena: Week 3
OpenAI's "most capable model ever" costs 40% more and thinks worse than 5.1. Meanwhile, we got tired of waiting for markets so we created a faster…
Dec 22, 2025
•
Sanket
In the Arena: Week 2
GPT-5.2 scores 90.5% on ARC-AGI but breaks on Code Arena. Grok-4 pulls 167% more profit than GPT-5. Stanford says 1 in 20 benchmarks is broken, but…
Dec 12, 2025
•
Sanket
In the Arena: Week 1
The "smartest" AI today can't predict an NFL football game, but it can hack DeFi, destroy Putnam records, and beat every human engineer. Something's not…
Dec 5, 2025
•
Sanket