Princeton-AI2-Lab/CubeBench-Web

CubeBench Project Page

This is the project page for CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning under Partial Observations.

Overview

CubeBench is a novel generative benchmark centered on the Rubik's Cube, designed to evaluate Large Language Model (LLM) agents' capabilities in:

  • Spatial Reasoning: Understanding 3D geometry and action consequences
  • Long-Horizon State Tracking: Maintaining and updating world models over long sequences
  • Active Exploration under Partial Observation: Constructing complete mental models from limited views

Key Features

  • Three-Tiered Diagnostic Framework: Progressive evaluation from full symbolic state to partial visual observations
  • Comprehensive Evaluation: Tested on leading LLMs including GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and more
  • Critical Finding: Every evaluated model scores a uniform 0.00% pass rate on all long-horizon tasks

Structure

The website is built using the Academic Project Page Template and includes:

  • Abstract: Overview of the research problem and contributions
  • Three Core Challenges: Visualization of the cognitive challenges
  • CubeBench Framework: Detailed explanation of the three-tiered diagnostic framework
  • Key Results: Comprehensive experimental results from three diagnostic experiments
  • BibTeX Citation: For easy reference

Usage

Open index.html in a web browser to view the project page locally, or deploy the repository contents to a web server for public access.
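
If the page's assets do not load correctly when opened straight from the file system, serving the folder over HTTP is a common workaround. The sketch below is one way to do that with Python's standard library; it is not part of this repository. The helper name `make_server`, the default port 8000, and the assumption that index.html sits at the directory root are illustrative choices.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build a local-only HTTP server for a static site directory.

    `directory` and `port` are illustrative defaults, not settings
    defined by this repository. Passing port=0 lets the OS pick a free port.
    """
    # Bind the stock static-file handler to the chosen directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Listen on the loopback interface only, so the preview stays private.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server().serve_forever()` from the repository root should make the page available at http://127.0.0.1:8000/index.html until the process is interrupted. Running `python -m http.server 8000` in the same directory serves the files too, although by default it listens on all interfaces rather than on the loopback address alone.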

Citation

@article{gao2025cubebench,
  title={CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning under Partial Observations},
  author={Gao, Huan-ang and Zhang, Zikang and Luo, Tianwei and Yang, Kaisen and Juan, Xinzhe and Qiu, Jiahao and Chen, Tianxing and He, Bingxiang and Zhao, Hao and Zhou, Hao and Liu, Shilong and Wang, Mengdi},
  journal={arXiv preprint},
  year={2025}
}

Authors

  • Huan-ang Gao*, Zikang Zhang* (Tsinghua University)
  • Shilong Liu†, Mengdi Wang† (Princeton University)
  • And collaborators from SJTU, UMich, and HKU

*Equal contribution | †Corresponding authors

License

This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
