โ† about harness-eval

๐Ÿ† Demo leaderboard static snapshot

Re-weight (client-side, same formula as the CLI)

Per-trial telemetry

This is a read-only snapshot of a real n=1 baseline run (see the write-up). The full dashboard, with run drill-downs, step-level evidence, and judge samples, runs locally: bun run dashboard.