CubeBench Project Page

This is the project page for CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning under Partial Observations.

Overview

CubeBench is a novel generative benchmark centered on the Rubik's Cube, designed to evaluate Large Language Model (LLM) agents' capabilities in:

Spatial Reasoning: Understanding 3D geometry and action consequences
Long-Horizon State Tracking: Maintaining and updating world models over long sequences
Active Exploration under Partial Observation: Constructing complete mental models from limited views

Key Features

Three-Tiered Diagnostic Framework: Progressive evaluation from full symbolic state to partial visual observations
Comprehensive Evaluation: Tested on leading LLMs including GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and more
Critical Findings: Uniform 0.00% pass rate on all long-horizon tasks across all models

Structure

The website is built using the Academic Project Page Template and includes:

Abstract: Overview of the research problem and contributions
Three Core Challenges: Visualization of the cognitive challenges
CubeBench Framework: Detailed explanation of the three-tiered diagnostic framework
Key Results: Comprehensive experimental results from three diagnostic experiments
BibTeX Citation: For easy reference

Usage

Open index.html in a web browser to view the project page locally, or deploy the repository contents to a web server for public access.
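
If opening the file directly does not load every asset (some browsers restrict pages loaded from file:// URLs), serving the directory over HTTP is a common alternative. The following is a minimal sketch using only the Python standard library. The function name `make_server`, the port 8000, and the assumption that the script runs from the directory containing index.html are illustrative choices, not part of the project.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build a local HTTP server for the static files in `directory`.

    `directory` and `port` are illustrative defaults (assumptions),
    not settings defined by the CubeBench repository.
    """
    # Bind the stock static-file handler to the chosen directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Listen on localhost only, so the preview is not exposed to the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    server = make_server()
    host, port = server.server_address[:2]
    print(f"Serving on http://{host}:{port}/ (Ctrl+C to stop)")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Run the script from the repository root and visit the printed address. The one-line equivalent `python3 -m http.server 8000` behaves similarly but listens on all interfaces by default.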

Citation

@article{gao2025cubebench,
  title={CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning under Partial Observations},
  author={Gao, Huan-ang and Zhang, Zikang and Luo, Tianwei and Yang, Kaisen and Juan, Xinzhe and Qiu, Jiahao and Chen, Tianxing and He, Bingxiang and Zhao, Hao and Zhou, Hao and Liu, Shilong and Wang, Mengdi},
  journal={arXiv preprint},
  year={2025}
}

Authors

Huan-ang Gao*, Zikang Zhang* (Tsinghua University)
Shilong Liu†, Mengdi Wang† (Princeton University)
And collaborators from SJTU, UMich, and HKU

*Equal contribution | †Corresponding authors

License

This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
