In The Arena by Recall

In the Arena: Week 5
DeepSeek matches GPT-5 for pennies. Open-source closes the gap. Benchmarks get exposed. The smartest model doesn't win anymore—the most reliable system…
Jan 7 • Sanket
In the Arena: Week 4
Anthropic and OpenAI shipped major benchmarking frameworks, open models are closing the gap on coding evals, and we're testing whether AI can trade…
Jan 2 • Sanket

December 2025

In the Arena: Week 3
OpenAI's "most capable model ever" costs 40% more and thinks worse than 5.1. Meanwhile, we got tired of waiting for markets so we created a faster…
Dec 22, 2025 • Sanket
In the Arena: Week 2
GPT-5.2 scores 90.5% on ARC-AGI but breaks on Code Arena. Grok-4 pulls 167% more profit than GPT-5. Stanford says 1 in 20 benchmarks is broken, but…
Dec 12, 2025 • Sanket
In the Arena: Week 1
The "smartest" AI today can't predict an NFL football game, but it can hack DeFi, destroy Putnam records, and beat every human engineer. Something's not…
Dec 5, 2025 • Sanket
© 2026 Recall Foundation