Brian Samek

This is a list of AI benchmarks I’m watching. Last updated June 11, 2026.

  • ALE-Bench
  • ARC Prize
  • Artificial Analysis
  • Bullshit Benchmark
  • CAIS AI Dashboard
  • CursorBench
  • DeepSWE
  • Design Arena
  • Epoch AI Models
  • EQ-Bench 3
  • EQ-Bench Creative Writing
  • ExploitBench
  • Frontier Code
  • FutureSearch Benchmarks
  • GBA Eval
  • Gert Labs Rankings
  • Kagi LLM Benchmark
  • lechmazur benchmarks
  • LiveBench
  • LMArena
  • MathArena
  • Mercor Apex
  • METR Time Horizons
  • Pencil Puzzle Bench
  • ProgramBench
  • RuneScape Bench
  • Scale Labs
  • SimpleBench
  • SWE-Marathon
  • SWE-rebench
  • Terminal-Bench
  • Toloka Arena
  • Vals Index
  • Vending-Bench 2
  • Vending-Bench Arena
  • VoxelBench
  • WeirdML
  • Wolfram LLM Benchmarking Project

Meta

Composite indexes built from other benchmarks’ published scores.

  • Boggs Benchmarks
  • Epoch Capabilities Index