An Alephic experiment: 45 AI models try to fill out a March Madness bracket using an agent loop and web research tools. The bracket is the benchmark; what we’re really exploring is what you have to change about the system to support models of different capabilities.

Three difficulty modes: HARD — research tools only, submit a full 63-game bracket in one shot. MID — adds lookup and validation tools. EASY — guided round-by-round with full guardrails.
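
One way to picture the three modes is as a small configuration table. The sketch below is illustrative only, not the experiment's actual code: the tool-group names (`research`, `lookup`, `validation`), the `ModeConfig` type, and the assumption that EASY keeps every MID tool are all hypothetical. It also shows where the 63 comes from: a 64-team single-elimination bracket plays 32 + 16 + 8 + 4 + 2 + 1 games.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HARD = "hard"
    MID = "mid"
    EASY = "easy"


@dataclass(frozen=True)
class ModeConfig:
    tools: frozenset       # hypothetical tool-group names, not the real tool list
    round_by_round: bool   # True if picks are submitted one round at a time


# Assumption: EASY keeps every MID tool and adds round-by-round guidance.
MODE_CONFIGS = {
    Mode.HARD: ModeConfig(frozenset({"research"}), False),
    Mode.MID: ModeConfig(frozenset({"research", "lookup", "validation"}), False),
    Mode.EASY: ModeConfig(frozenset({"research", "lookup", "validation"}), True),
}


def games_per_round(teams: int = 64) -> list[int]:
    """Return the number of games in each round of a single-elimination bracket."""
    rounds = []
    while teams > 1:
        teams //= 2          # each game eliminates one team, halving the field
        rounds.append(teams)
    return rounds


if __name__ == "__main__":
    print(games_per_round())       # games in each round, first to last
    print(sum(games_per_round()))  # total picks in a full bracket
```

Under this framing, HARD asks a model for all of the picks in a single submission, while EASY would ask for one round's worth at a time.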

45 models | 100 entries | 61 games decided | Current round: Ch