
Add wbench eval framework #2200

Merged
NathanHB merged 1 commit into huggingface:main from KainingYing:add-wbench-eval-framework
Jun 1, 2026


KainingYing (Contributor) commented May 29, 2026

Summary

Adds wbench to the supported evaluation frameworks for benchmark dataset eval.yaml files.

WBench is a comprehensive multi-turn benchmark for interactive video world model evaluation, assessing models across 5 dimensions (video quality, setting adherence, interaction adherence, consistency, physics compliance) and 22 metrics over 289 multi-turn interaction cases.

Dataset prepared for the Hub Evaluation Results feature:
https://huggingface.co/datasets/meituan-longcat/WBench

The dataset repo already includes an eval.yaml.


Note

Low Risk
Registry-only metadata change with no runtime logic, auth, or data path changes.

Overview
Registers wbench in EVALUATION_FRAMEWORKS (packages/tasks/src/eval.ts) so benchmark datasets can declare it in eval.yaml and surface correctly on Hub Evaluation Results.

The new entry includes display name, a short description of WBench (multi-turn interactive video world model evaluation), and a link to the upstream repo.
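For readers unfamiliar with the registry, the following is a minimal sketch of what such an entry might look like. It is not the actual diff from this PR: the field names (`name`, `description`, `url`), the display name string, the placeholder URL, and the helper `isSupportedFramework` are all assumptions made for illustration, and the real `EVALUATION_FRAMEWORKS` in `packages/tasks/src/eval.ts` contains other entries and may be shaped differently.

```typescript
// Hypothetical shape of a registry entry. The overview only says an entry has
// a display name, a short description, and a link to the upstream repo; the
// field names below are assumed.
interface EvaluationFrameworkInfo {
	name: string; // display name
	description: string; // short description
	url: string; // link to the upstream repo
}

// Sketch of the registry with only the new entry shown.
const EVALUATION_FRAMEWORKS = {
	wbench: {
		name: "WBench", // assumed display name
		description: "Multi-turn benchmark for interactive video world model evaluation.",
		url: "https://example.com/wbench-upstream", // placeholder, not the real upstream link
	},
} as const;

type EvaluationFrameworkId = keyof typeof EVALUATION_FRAMEWORKS;

// Hypothetical helper: checks whether a framework id declared in an eval.yaml
// is one of the registered keys.
function isSupportedFramework(id: string): id is EvaluationFrameworkId {
	return Object.prototype.hasOwnProperty.call(EVALUATION_FRAMEWORKS, id);
}

// Compile-time check that the entry matches the assumed shape.
const wbenchInfo: EvaluationFrameworkInfo = EVALUATION_FRAMEWORKS.wbench;
console.log(isSupportedFramework("wbench"), wbenchInfo.name);
```

Keying the registry by framework id keeps the change metadata-only, which matches the "registry-only" risk note above: adding a framework means adding one object literal, with no runtime logic touched.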

Reviewed by Cursor Bugbot for commit 30f6c1c. Bugbot is set up for automated code reviews on this repo.

@NathanHB NathanHB merged commit 48b9ba0 into huggingface:main Jun 1, 2026
4 of 6 checks passed
