This is the project page for CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning under Partial Observations.
CubeBench is a novel generative benchmark centered on the Rubik's Cube, designed to evaluate Large Language Model (LLM) agents' capabilities in:
- Spatial Reasoning: Understanding 3D geometry and action consequences
- Long-Horizon State Tracking: Maintaining and updating world models over long sequences
- Active Exploration under Partial Observation: Constructing complete mental models from limited views

Key features and findings:
- Three-Tiered Diagnostic Framework: Progressive evaluation from full symbolic state to partial visual observations
- Comprehensive Evaluation: Tested on leading LLMs including GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and more
- Critical Findings: Uniform 0.00% pass rate on all long-horizon tasks across all models

The website is built using the Academic Project Page Template and includes:
- Abstract: Overview of the research problem and contributions
- Three Core Challenges: Visualization of the cognitive challenges
- CubeBench Framework: Detailed explanation of the three-tiered diagnostic framework
- Key Results: Comprehensive experimental results from three diagnostic experiments
- BibTeX Citation: For easy reference
Simply open index.html in a web browser to view the project page locally, or deploy to a web server for public access.
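
If opening the file directly causes problems (some browsers restrict assets loaded from `file://` paths), a local static server is one alternative. The following is a minimal sketch using only the Python standard library; the helper name `make_server`, the port 8000, and the assumption that it runs from the directory containing index.html are illustrative choices, not part of this project.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build an HTTP server that serves static files from `directory`.

    `make_server` is a hypothetical helper for this sketch.
    Passing port=0 asks the OS to pick any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only so the page is not exposed on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    with make_server() as server:
        host, port = server.server_address
        print(f"Serving on http://{host}:{port}/index.html (Ctrl+C to stop)")
        try:
            server.serve_forever()
        except KeyboardInterrupt:
            pass  # Leaving the `with` block closes the listening socket.
```

Run the script from the repository root and open the printed URL in a browser. For public hosting, any static file host should work, since the page needs no server-side code beyond serving files.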
@article{gao2025cubebench,
title={CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning under Partial Observations},
author={Gao, Huan-ang and Zhang, Zikang and Luo, Tianwei and Yang, Kaisen and Juan, Xinzhe and Qiu, Jiahao and Chen, Tianxing and He, Bingxiang and Zhao, Hao and Zhou, Hao and Liu, Shilong and Wang, Mengdi},
journal={arXiv preprint},
year={2025}
}

- Huan-ang Gao*, Zikang Zhang* (Tsinghua University)
- Shilong Liu†, Mengdi Wang† (Princeton University)
- And collaborators from SJTU, UMich, and HKU
*Equal contribution | †Corresponding authors
This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.