Demo leaderboard static snapshot
Re-weight (client-side, same formula as the CLI)
Per-trial telemetry
This is a read-only snapshot of a real n=1 baseline run (see the write-up). The full dashboard (run drill-downs, step-level evidence, judge samples) runs locally: bun run dashboard.