EPOB End-to-End Project Orchestration Benchmark

Official Leaderboards

EPOB results, evidence first.

Published paper results and post-publication tracks, organized like a benchmark system rather than a marketing page.

Rank Framework Score Deliverable Evidence Failure label Model Artifact

Matrix View

Framework by family evidence map

Result Viewer

Selected run evidence

Select a leaderboard row to inspect score components, package provenance, and evidence scope.

Dynamic Score Profile

Selected row breakdown

Evidence

Frozen artifacts remain the source of truth.

Public rows are derived from paper-package artifacts under docs/paper/. Future live submissions should enter through the evaluator API and preserve raw runtime, judge, and bundle metadata.
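As a rough illustration of what "preserve raw runtime, judge, and bundle metadata" could mean for a live submission, the sketch below models a submission record and rejects it if any of the three metadata blocks is empty. Every name here (`SubmissionRecord`, `missing_metadata`, `to_payload`, and all field names) is hypothetical. The page does not describe the evaluator API's schema, so this is a sketch under those assumptions, not the actual interface.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Hypothetical record shape: these field names are assumptions for
# illustration and are not taken from the EPOB evaluator.
@dataclass
class SubmissionRecord:
    framework: str
    model: str
    runtime: Dict[str, Any] = field(default_factory=dict)  # raw runtime metadata
    judge: Dict[str, Any] = field(default_factory=dict)    # raw judge metadata
    bundle: Dict[str, Any] = field(default_factory=dict)   # raw bundle metadata

REQUIRED_BLOCKS = ("runtime", "judge", "bundle")

def missing_metadata(record: SubmissionRecord) -> List[str]:
    """Return the names of metadata blocks that are empty."""
    return [name for name in REQUIRED_BLOCKS if not getattr(record, name)]

def to_payload(record: SubmissionRecord) -> str:
    """Serialize the record to JSON, refusing records with missing metadata."""
    missing = missing_metadata(record)
    if missing:
        raise ValueError("missing metadata blocks: " + ", ".join(missing))
    # The metadata dicts are passed through unmodified so the raw values survive.
    return json.dumps(asdict(record), sort_keys=True)
```

Checking for completeness before serialization keeps partially documented runs out of any comparable table, which matches the page's stance that artifacts are the source of truth.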

Submit

Email evaluation inquiries to [email protected].

Questions about public comparable submissions, artifact handoff, and evaluator access can start by email for now, while hosted API intake remains scoped under src/evaluator/.