Add wbench eval framework by KainingYing · Pull Request #2200 · huggingface/huggingface.js

KainingYing · 2026-05-29T09:14:34Z

Summary

Adds wbench to the supported evaluation frameworks for benchmark dataset eval.yaml files.

WBench is a comprehensive multi-turn benchmark for interactive video world model evaluation, assessing models across 5 dimensions (video quality, setting adherence, interaction adherence, consistency, physics compliance) and 22 metrics over 289 multi-turn interaction cases.

Dataset prepared for the Hub Evaluation Results feature:
https://huggingface.co/datasets/meituan-longcat/WBench

The dataset repo already includes an eval.yaml.

Note

Low Risk
Registry-only metadata change with no runtime logic, auth, or data path changes.

Overview
Registers wbench in EVALUATION_FRAMEWORKS (packages/tasks/src/eval.ts) so benchmark datasets can declare it in eval.yaml and surface correctly on Hub Evaluation Results.

The new entry includes display name, a short description of WBench (multi-turn interactive video world model evaluation), and a link to the upstream repo.
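As a rough illustration of what such a registry entry could look like, here is a minimal sketch. The actual shape of `EVALUATION_FRAMEWORKS` in `packages/tasks/src/eval.ts` is not shown in this PR page, so the field names (`name`, `description`, `url`), the `EvaluationFrameworkEntry` type, the `isEvaluationFramework` helper, and the placeholder URL are all assumptions made for the example.

```typescript
// Hypothetical entry shape; the real type in packages/tasks/src/eval.ts may differ.
interface EvaluationFrameworkEntry {
  name: string; // display name (assumed field name)
  description: string; // short description (assumed field name)
  url: string; // link to the upstream repo (assumed field name)
}

// Sketch of a registry keyed by the framework id that eval.yaml files would declare.
const EVALUATION_FRAMEWORKS: Record<string, EvaluationFrameworkEntry> = {
  wbench: {
    name: "WBench",
    description: "Multi-turn benchmark for interactive video world model evaluation.",
    // Placeholder only: the upstream repo URL is not given in this page.
    url: "https://example.com/wbench",
  },
};

// Hypothetical helper: checks whether an id declared in eval.yaml is registered.
// An own-property check avoids false positives on inherited keys like "toString".
function isEvaluationFramework(id: string): boolean {
  return Object.prototype.hasOwnProperty.call(EVALUATION_FRAMEWORKS, id);
}
```

With a registry like this, an id that is missing from the map would be treated as unknown, which is presumably why the framework has to be registered before a dataset's eval.yaml can reference it.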

Reviewed by Cursor Bugbot for commit 30f6c1c. Bugbot is set up for automated code reviews on this repo.

Add wbench eval framework

30f6c1c

KainingYing requested review from NathanHB, gary149, julien-c, krampstudio and pcuenca as code owners May 29, 2026 09:14

NathanHB approved these changes Jun 1, 2026


NathanHB merged commit 48b9ba0 into huggingface:main Jun 1, 2026
4 of 6 checks passed

NathanHB merged 1 commit into huggingface:main from KainingYing:add-wbench-eval-framework
