Official Leaderboards
EPOB results, evidence first.
Published paper results and post-publication tracks, organized like a benchmark system rather than a marketing page.
| Rank | Framework | Score | Deliverable | Evidence | Failure label | Model | Artifact |
|---|---|---|---|---|---|---|---|
Matrix View
Framework by family evidence map
Result Viewer
Selected run evidence
Select a leaderboard row to inspect score components, package provenance, and evidence scope.
Dynamic Score Profile
Selected row breakdown
Evidence
Frozen artifacts remain the source of truth.
Public rows are derived from paper-package artifacts under docs/paper/. Future live submissions should enter through the evaluator API and preserve raw runtime, judge, and bundle metadata.
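As a rough illustration of what "preserve raw runtime, judge, and bundle metadata" could mean for a future submission, the sketch below models a record that refuses to serialize when any of the three metadata sections is empty. The class name, field names, and validation rule are all hypothetical; this page does not specify the evaluator API schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List

# Hypothetical section names; the real evaluator API schema is not given here.
REQUIRED_METADATA = ("runtime", "judge", "bundle")


@dataclass
class SubmissionRecord:
    """Illustrative submission record (assumed shape, not the official one)."""
    framework: str
    model: str
    runtime: Dict[str, Any] = field(default_factory=dict)
    judge: Dict[str, Any] = field(default_factory=dict)
    bundle: Dict[str, Any] = field(default_factory=dict)

    def missing_metadata(self) -> List[str]:
        # Report every required metadata section that is empty.
        return [name for name in REQUIRED_METADATA if not getattr(self, name)]

    def to_json(self) -> str:
        # Serialize only when all raw metadata sections are present.
        missing = self.missing_metadata()
        if missing:
            raise ValueError(f"missing raw metadata: {', '.join(missing)}")
        return json.dumps(asdict(self), sort_keys=True)
```

Under these assumptions, a record built with only `runtime` filled in would report `judge` and `bundle` as missing and raise `ValueError` on `to_json()`, so incomplete evidence never reaches the serialized form.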
Submit
Email evaluation inquiries to [email protected].
For now, public comparable submissions, artifact handoff questions, and evaluator access requests start by email; hosted API intake remains scoped under src/evaluator/.