EarthPilotPersonality·Bench
Chronology

Timeline of frontier model releases

Every model in the dataset, plotted by its release date. Each dot is one model release: its size encodes the total inference spend that model has accumulated in this dataset (log scale), and its color encodes the lab. Each lab gets its own horizontal lane, and the lanes are stacked vertically so releases that land close in time stay visually separated.
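The encoding described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the page's actual rendering code. The radius bounds `R_MIN`/`R_MAX`, the helper names `dot_radius` and `layout`, and the choice of linear interpolation in log space are all hypothetical. The lane order is assumed to follow the order of lab labels in the chart, and the spend values are taken from the model release log.

```python
import math

# (model, lab id, total spend in USD), transcribed from the model release log.
RELEASES = [
    ("Gemini 2.5 Pro", "google", 3.95),
    ("Llama 4 Maverick", "meta", 0.06),
    ("DeepSeek R1 (0528)", "deepseek", 0.78),
    ("Mistral Large (2512)", "mistral", 0.14),
    ("Grok 4.20", "xai", 0.25),
    ("GPT-5.5", "openai", 3.05),
    ("Claude Opus 4.8", "anthropic", 2.40),
    ("Claude Fable 5", "anthropic", 6.75),
]

# Assumed lane order (top to bottom), mirroring the lab labels in the chart.
LANE_ORDER = ["anthropic", "openai", "google", "xai", "deepseek", "meta", "mistral"]

# Hypothetical radius bounds in pixels; the page does not state its real range.
R_MIN, R_MAX = 4.0, 16.0


def dot_radius(spend, lo, hi):
    """Map spend onto [R_MIN, R_MAX] linearly in log space (assumed scale).

    All spends must be positive, since log(0) is undefined.
    """
    if hi == lo:
        return (R_MIN + R_MAX) / 2
    t = (math.log(spend) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return R_MIN + t * (R_MAX - R_MIN)


def layout(releases):
    """Return {model: (lane index, dot radius)} for every release."""
    spends = [spend for _, _, spend in releases]
    lo, hi = min(spends), max(spends)
    return {
        model: (LANE_ORDER.index(lab), round(dot_radius(spend, lo, hi), 2))
        for model, lab, spend in releases
    }


if __name__ == "__main__":
    for model, (lane, radius) in layout(RELEASES).items():
        print(f"{model:22} lane={lane} r={radius}")
```

Interpolating in log space keeps Llama 4 Maverick ($0.06) visible next to Claude Fable 5 ($6.75), a spend ratio of more than 100x that a linear radius scale would flatten into a barely visible dot.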

[Figure: release timeline from Mar 2025 to Jun 2026, one lane per lab (Anthropic, OpenAI, Google, xAI, DeepSeek, Meta, Mistral), with a cumulative count overlay. Dot size = log(total spend on this model in the dataset).]

Model release log

Full list of releases, in chronological order:

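| Date       | Lab       | Model                | Runs | Spend |
|------------|-----------|----------------------|------|-------|
| 2025-03-25 | Google    | Gemini 2.5 Pro       | 140  | $3.95 |
| 2025-04-05 | Meta      | Llama 4 Maverick     | 140  | $0.06 |
| 2025-05-28 | DeepSeek  | DeepSeek R1 (0528)   | 140  | $0.78 |
| 2025-12-09 | Mistral   | Mistral Large (2512) | 140  | $0.14 |
| 2026-04-20 | xAI       | Grok 4.20            | 140  | $0.25 |
| 2026-05-01 | OpenAI    | GPT-5.5              | 140  | $3.05 |
| 2026-05-15 | Anthropic | Claude Opus 4.8      | 140  | $2.40 |
| 2026-06-09 | Anthropic | Claude Fable 5       | 140  | $6.75 |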
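A cumulative count like the one drawn in the chart can be rebuilt from this log. The sketch below assumes the count steps up by one at each release date; the page does not document how it computes the series, and the function names `cumulative_release_count` and `total_spend` are hypothetical. It also sums the spend column with `Decimal` so the cents add up exactly instead of picking up float rounding error.

```python
from datetime import date
from decimal import Decimal
from itertools import accumulate

# (release date, model, runs, spend in USD), transcribed from the release log.
RELEASE_LOG = [
    (date(2025, 3, 25), "Gemini 2.5 Pro", 140, Decimal("3.95")),
    (date(2025, 4, 5), "Llama 4 Maverick", 140, Decimal("0.06")),
    (date(2025, 5, 28), "DeepSeek R1 (0528)", 140, Decimal("0.78")),
    (date(2025, 12, 9), "Mistral Large (2512)", 140, Decimal("0.14")),
    (date(2026, 4, 20), "Grok 4.20", 140, Decimal("0.25")),
    (date(2026, 5, 1), "GPT-5.5", 140, Decimal("3.05")),
    (date(2026, 5, 15), "Claude Opus 4.8", 140, Decimal("2.40")),
    (date(2026, 6, 9), "Claude Fable 5", 140, Decimal("6.75")),
]


def cumulative_release_count(log):
    """Step series (release date, releases so far), sorted by date."""
    ordered = sorted(log, key=lambda row: row[0])
    return list(zip((row[0] for row in ordered), accumulate(1 for _ in ordered)))


def total_spend(log):
    """Exact sum of the spend column."""
    return sum((row[3] for row in log), Decimal("0"))


if __name__ == "__main__":
    for day, n in cumulative_release_count(RELEASE_LOG):
        print(day.isoformat(), n)
    print(f"total spend: ${total_spend(RELEASE_LOG)}")
```

Sorting before accumulating means the series stays correct even if rows are appended out of order.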