Archive - In The Arena by Recall

In the Arena: Week 5

DeepSeek matches GPT-5 for pennies. Open-source closes the gap. Benchmarks get exposed. The smartest model doesn't win anymore—the most reliable system…

Jan 7 • Sanket

In the Arena: Week 4

Anthropic and OpenAI shipped major benchmarking frameworks, open models are closing the gap on coding evals, and we're testing whether AI can trade…

Jan 2 • Sanket

December 2025

In the Arena: Week 3

OpenAI's "most capable model ever" costs 40% more and thinks worse than 5.1. Meanwhile, we got tired of waiting for markets so we created a faster…

Dec 22, 2025 • Sanket

In the Arena: Week 2

GPT-5.2 scores 90.5% on ARC-AGI but breaks on Code Arena. Grok-4 pulls 167% more profit than GPT-5. Stanford says 1 in 20 benchmarks is broken, but…

Dec 12, 2025 • Sanket

In the Arena: Week 1

The "smartest" AI today can't predict an NFL football game, but it can hack DeFi, destroy Putnam records, and beat every human engineer. Something's not…

Dec 5, 2025 • Sanket
