The official evaluation toolkit for Very Big Video Reasoning (VBVR). Unified inference and evaluation across 37 video generation models.
- 37 Models: Commercial APIs (Luma, Veo, Kling, Sora, Runway) and open-source models (LTX-Video, LTX-2, HunyuanVideo, SVD, WAN, CogVideoX, and more)
- VBVR-Bench: 100+ rule-based evaluators with deterministic 0–1 scores and no API calls
- Coming Soon: Human evaluation (Gradio) and VLM-as-a-Judge (GPT-4o, InternVL, Qwen3-VL)
# Install
git clone https://github.com/Video-Reason/VBVR-EvalKit.git && cd VBVR-EvalKit
python -m venv venv && source venv/bin/activate
pip install -e .
# Set up a model
bash setup/install_model.sh --model svd --validate
# Inference
python examples/generate_videos.py --questions-dir setup/test_assets/ --output-dir ./outputs --model svd
# Evaluation (VBVR-Bench)
python examples/score_videos.py --inference-dir ./outputs

VBVR-Bench matches each task to a rule-based evaluator by the generator name in the directory path. The evaluator needs both the generated video and its reference data side by side:
{model}/{generator_name}/{task_type}/{task_id}/{run_id}/
├── video/output.mp4 # generated video
└── question/ # reference data
    ├── first_frame.png
    ├── final_frame.png
    ├── prompt.txt
    └── ground_truth.mp4 # optional
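
Before scoring, it can help to confirm that each run directory follows the layout above. The following is a minimal sketch, not part of the toolkit: `RunDir`, `parse_run_dir`, and `missing_files` are hypothetical helper names, and the sketch assumes the `{model}/...` tree sits directly under the directory passed as `--inference-dir`.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional


class RunDir(NamedTuple):
    """The five path components of one run directory (hypothetical helper type)."""
    model: str
    generator_name: str
    task_type: str
    task_id: str
    run_id: str


# Files named in the layout above. ground_truth.mp4 is optional, so it is not checked.
REQUIRED_FILES = (
    "video/output.mp4",
    "question/first_frame.png",
    "question/final_frame.png",
    "question/prompt.txt",
)


def parse_run_dir(run_dir: Path, root: Path) -> Optional[RunDir]:
    """Split run_dir (relative to root) into its five components, or return None."""
    try:
        parts = run_dir.relative_to(root).parts
    except ValueError:
        return None  # run_dir is not located under root
    if len(parts) != 5:
        return None
    return RunDir(*parts)


def missing_files(run_dir: Path) -> List[str]:
    """Return the required files that are absent from run_dir."""
    return [rel for rel in REQUIRED_FILES if not (run_dir / rel).is_file()]


if __name__ == "__main__":
    root = Path("./outputs")  # assumed to be the --inference-dir
    if root.is_dir():
        for run_dir in sorted(root.glob("*/*/*/*/*")):
            info = parse_run_dir(run_dir, root)
            if info is None or not run_dir.is_dir():
                continue
            gaps = missing_files(run_dir)
            status = "ok" if not gaps else "missing: " + ", ".join(gaps)
            print(f"{info.generator_name}/{info.task_id}/{info.run_id}: {status}")
```

The check only looks at file presence; it does not verify that a generator name actually has a matching evaluator.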
python examples/score_videos.py --inference-dir ./outputs # task_specific score only
python examples/score_videos.py --inference-dir ./outputs --full-score # all 5 dimensions

See docs/En/SCORING.md for the full end-to-end workflow, scoring dimensions, output format, and CLI reference.
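
The two scoring invocations above can also be launched from Python, for example inside a larger experiment script. This is a sketch under stated assumptions: `build_score_command` and `run_scoring` are hypothetical wrapper names, only the `--inference-dir` and `--full-score` flags shown above are used, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys
from typing import List


def build_score_command(inference_dir: str, full_score: bool = False) -> List[str]:
    """Build the argv for examples/score_videos.py using only the flags shown above."""
    cmd = [sys.executable, "examples/score_videos.py", "--inference-dir", inference_dir]
    if full_score:
        cmd.append("--full-score")  # request all 5 dimensions instead of task_specific only
    return cmd


def run_scoring(inference_dir: str, full_score: bool = False) -> int:
    """Run the scoring script and return its exit code (assumes cwd is the repo root)."""
    completed = subprocess.run(build_score_command(inference_dir, full_score), check=False)
    return completed.returncode
```

Calling `run_scoring("./outputs", full_score=True)` corresponds to the second command above; keeping the argv construction separate makes it easy to log or inspect the command before running it.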
cp env.template .env
# LUMA_API_KEY=... OPENAI_API_KEY=... GEMINI_API_KEY=... KLING_API_KEY=... RUNWAYML_API_SECRET=...

| Topic | Link |
|---|---|
| Scoring (VBVR-Bench) | docs/SCORING.md |
| Inference | docs/INFERENCE.md |
| Supported Models | docs/MODELS.md |
| Adding Models | docs/ADDING_MODELS.md |
| End-to-End Workflow | docs/DATA_GENERATOR.md |
| FAQ | docs/FAQ.md |
@article{vbvr2026,
title={A Very Big Video Reasoning Suite},
author={Wang, Maijunxian and Wang, Ruisi and Lin, Juyi and Ji, Ran and Wiedemer, Thaddäus and Gao, Qingying and Luo, Dezhi and Qian, Yaoyao and Huang, Lianyu and Hong, Zelong and Ge, Jiahui and Ma, Qianli and He, Hang and Zhou, Yifan and Guo, Lingzi and Mei, Lantao and Li, Jiachen and Xing, Hanwen and Zhao, Tianqi and Yu, Fengyuan and Xiao, Weihang and Jiao, Yizheng and Hou, Jianheng and Zhang, Danyang and Xu, Pengcheng and Zhong, Boyang and Zhao, Zehong and Fang, Gaoyun and Kitaoka, John and Xu, Yile and Xu, Hua and Blacutt, Kenton and Nguyen, Tin and Song, Siyuan and Sun, Haoran and Wen, Shaoyue and He, Linyang and Wang, Runming and Wang, Yanzhi and Yang, Mengyue and Ma, Ziqiao and Millière, Raphaël and Shi, Freda and Vasconcelos, Nuno and Khashabi, Daniel and Yuille, Alan and Du, Yilun and Liu, Ziming and Lin, Dahua and Liu, Ziwei and Kumar, Vikash and Li, Yijiang and Yang, Lei and Cai, Zhongang and Deng, Hokin},
year={2026}
}

License: Apache 2.0
