Lean static project page for the paper "What Are We Actually Benchmarking in Robot Manipulation?"
Authors: Tianchong Jiang, Xiangshan Tan, Samuel Wheeler, Luzhe Sun, Tewodros W. Ayalew, Matthew Walter.
Affiliations: TTIC, University of Chicago, Argonne National Laboratory.
Public website: https://ripl.github.io/manipulation_benchmark_audit/
Files:
- index.html: dependency-free static project page with inline CSS.
- figures/benchmark-reported-results-stacked-area.png: figure from the paper plotting counts of reported benchmark results, shown on the page.
- figures/diagnostic-*.png: figures of the paper's results, used in the four diagnostic cards.

Links:
- arXiv: https://arxiv.org/abs/2606.04233
- Code & Artifacts: https://github.com/ripl/ManipulationBenchmarkAudit
The page summarizes diagnostics for shortcut solvability, statistical significance, creeping overfitting, and data-source dependence across LIBERO, CALVIN, SimplerEnv, RoboCasa, and RoboTwin 2.0.