EPOB End-to-End Project Orchestration Benchmark

Official Leaderboards

EPOB results, evidence first.

Published paper results and post-publication tracks, organized like a benchmark system rather than a marketing page.

Rank | Framework | Score | Deliverable | Evidence | Failure label | Model | Artifact

Matrix View

Framework by family evidence map

Result Viewer

Selected run evidence

Select a leaderboard row to inspect score components, package provenance, and evidence scope.

Dynamic Score Profile

Selected row breakdown

Evidence

Frozen artifacts remain the source of truth.

Public rows are derived from paper-package artifacts under docs/paper/. Future live submissions should enter through the evaluator API and preserve raw runtime, judge, and bundle metadata.
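As a rough illustration of what "preserve raw runtime, judge, and bundle metadata" could look like, here is a minimal sketch of a submission record that stores the three metadata blocks as received. The class name, field names, and sample values are hypothetical; the actual evaluator schema is not documented on this page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class SubmissionRecord:
    """Hypothetical container; field names are illustrative, not the EPOB schema."""

    framework: str
    # Each block is kept as an opaque mapping so nothing is dropped or reshaped.
    runtime: Dict[str, Any] = field(default_factory=dict)
    judge: Dict[str, Any] = field(default_factory=dict)
    bundle: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        # Sorted keys give a stable serialization that is easy to diff or hash.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values for illustration only; they are not real EPOB results.
record = SubmissionRecord(
    framework="example-framework",
    runtime={"wall_clock_s": 12.5},
    judge={"name": "example-judge"},
    bundle={"path": "docs/paper/"},
)
print(record.to_json())
```

Marking the dataclass as frozen blocks reassignment of its fields after creation, which loosely mirrors the idea that recorded artifacts stay fixed. The nested dictionaries themselves remain mutable in this sketch.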

Submit

The evaluator backend uses FastAPI, Redis, and PostgreSQL with pgvector.

The website is static and deployable today. Submission intake and hosted leaderboard publication are scoped under src/evaluator/.

GET /health
GET /v1/leaderboard
POST /v1/evaluations
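The three routes above could be called from a client along these lines. This is a sketch under stated assumptions: the base URL, the helper name `build_request`, and the POST payload shape are all hypothetical, since the page lists only the route paths and methods. The code builds request objects with the standard library and does not send them.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: a locally running evaluator; the real host is not given on this page.
BASE_URL = "http://localhost:8000"


def build_request(
    method: str, path: str, payload: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the listed routes."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


health = build_request("GET", "/health")
leaderboard = build_request("GET", "/v1/leaderboard")
# Hypothetical payload; the actual submission schema is not documented here.
evaluation = build_request(
    "POST", "/v1/evaluations", {"framework": "example-framework"}
)

# To send a request against a running backend:
#     with urllib.request.urlopen(health) as resp:
#         print(resp.status)
```

Building the request separately from sending it keeps the example usable without a live backend and makes the method, URL, and body easy to inspect.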