Official Leaderboards
EPOB results, evidence first.
Published paper results and post-publication tracks, organized like a benchmark system rather than a marketing page.
| Rank | Framework | Score | Deliverable | Evidence | Failure label | Model | Artifact |
|---|---|---|---|---|---|---|---|
Matrix View
Framework by family evidence map
Result Viewer
Selected run evidence
Select a leaderboard row to inspect score components, package provenance, and evidence scope.
Dynamic Score Profile
Selected row breakdown
Evidence
Frozen artifacts remain the source of truth.
Public rows are derived from paper-package artifacts under docs/paper/. Future live submissions should enter through the evaluator API and preserve raw runtime, judge, and bundle metadata.
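One way to keep raw runtime, judge, and bundle metadata attached to a row is to store them next to the score and fingerprint the whole record. The sketch below is illustrative only: `EvidenceRecord`, its field names, and the SHA-256 fingerprint are assumptions, not the schema used by the paper-package artifacts or the evaluator.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical leaderboard row that keeps raw metadata beside the score."""

    framework: str
    score: float
    # The three metadata groups named in the text; their keys are left open.
    runtime: dict[str, Any] = field(default_factory=dict)
    judge: dict[str, Any] = field(default_factory=dict)
    bundle: dict[str, Any] = field(default_factory=dict)


def canonical_json(record: EvidenceRecord) -> str:
    """Serialize with sorted keys so equal records give identical text."""
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))


def fingerprint(record: EvidenceRecord) -> str:
    """SHA-256 hex digest of the canonical JSON form of the record."""
    return hashlib.sha256(canonical_json(record).encode("utf-8")).hexdigest()
```

Because the serialization sorts keys, two records with the same content produce the same fingerprint regardless of how they were constructed, which makes it cheap to check a published row against its frozen artifact.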
Submit
The evaluator backend uses FastAPI, Redis, and PostgreSQL with pgvector.
The website is static and deployable today. Submission intake and hosted leaderboard publication are scoped under src/evaluator/.
GET /health
GET /v1/leaderboard
POST /v1/evaluations
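A client for the three endpoints above might look like the following sketch, which uses only the Python standard library. The base URL, the JSON request and response bodies, and the `framework` payload field are assumptions for illustration; the page does not specify the request schema.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: a locally running evaluator; replace with the real host.
BASE_URL = "http://localhost:8000"


def build_request(
    method: str,
    path: str,
    payload: Optional[dict[str, Any]] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for an evaluator endpoint."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        # Assumption: the API accepts a JSON body on POST.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send a request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


health = build_request("GET", "/health")
leaderboard = build_request("GET", "/v1/leaderboard")
# Hypothetical payload: the real submission fields are not documented here.
evaluation = build_request("POST", "/v1/evaluations", {"framework": "example-framework"})
```

Building the request separately from sending it keeps the sketch testable offline; calling `send(health)` would perform the actual network call once an evaluator is running.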