What Are We Actually Benchmarking in Robot Manipulation?

A lean, static project page for the paper "What Are We Actually Benchmarking in Robot Manipulation?"

Authors: Tianchong Jiang, Xiangshan Tan, Samuel Wheeler, Luzhe Sun, Tewodros W. Ayalew, Matthew Walter.

Affiliations: TTIC, University of Chicago, Argonne National Laboratory.

Project website: https://ripl.github.io/manipulation_benchmark_audit/

Files

index.html: dependency-free static project page with inline CSS.
figures/benchmark-reported-results-stacked-area.png: figure from the paper that counts reported benchmark results, shown on the page.
figures/diagnostic-*.png: figures of the paper's results, shown in the four diagnostic cards.

Links

arXiv: https://arxiv.org/abs/2606.04233
Code & Artifacts: https://github.com/ripl/ManipulationBenchmarkAudit

Scope

The page summarizes four diagnostics (shortcut solvability, statistical significance, creeping overfitting, and data-source dependence) across LIBERO, CALVIN, SimplerEnv, RoboCasa, and RoboTwin 2.0.
