SlotCurri

Reconstruction-Guided Slot Curriculum:
Addressing Object Over-Fragmentation in Video Object-Centric Learning

CVPR 2026

WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

Sungkyunkwan University

Abstract

Video Object-Centric Learning seeks to decompose raw videos into a small set of object slots, but existing slot-attention models often suffer from severe over-fragmentation. This is because the model is implicitly encouraged to occupy all slots to minimize the reconstruction objective, thereby representing a single object with multiple redundant slots. We tackle this limitation with a reconstruction-guided slot curriculum (SlotCurri). Training starts with only a few coarse slots and progressively allocates new slots where reconstruction error remains high, thus expanding capacity only where it is needed and preventing fragmentation from the outset. During slot expansion, however, meaningful sub-parts can emerge only if coarse-level semantics are already well separated, and with a small initial slot budget and an MSE objective, semantic boundaries remain blurry. We therefore augment MSE with a structure-aware loss that preserves local contrast and edge information, encouraging each slot to sharpen its semantic boundaries. Lastly, we propose a cyclic inference scheme that rolls slots forward and then backward through the frame sequence, producing temporally consistent object representations even in the earliest frames. Taken together, SlotCurri addresses object over-fragmentation by allocating representational capacity where reconstruction fails, further enhanced by structural cues and cyclic inference. Notable FG-ARI gains of +6.8 on YouTube-VIS and +8.3 on MOVi-C validate the effectiveness of SlotCurri.
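
To make the curriculum idea concrete, here is a minimal sketch of a single expansion step. It is written under our own assumptions and is not taken from the SlotCurri code: it assumes every pixel is hard-assigned to one slot and that a per-pixel reconstruction error is available. The names `pick_slot_to_expand` and `expand`, and the mean-error criterion, are hypothetical.

```python
from collections import defaultdict


def pick_slot_to_expand(assignment, error):
    """Return the slot id whose pixels have the highest mean reconstruction error.

    assignment: list of slot ids, one per pixel (hypothetical hard assignment).
    error: list of per-pixel reconstruction errors, same length as assignment.
    """
    if len(assignment) != len(error):
        raise ValueError("assignment and error must have the same length")
    total = defaultdict(float)
    count = defaultdict(int)
    for slot, err in zip(assignment, error):
        total[slot] += err
        count[slot] += 1
    # The worst-reconstructed slot marks the region that gets extra capacity.
    return max(total, key=lambda s: total[s] / count[s])


def expand(num_slots, assignment, error, max_slots):
    """Add one slot if under budget; return (new_num_slots, slot_to_split or None)."""
    if num_slots >= max_slots:
        return num_slots, None
    return num_slots + 1, pick_slot_to_expand(assignment, error)


# Toy example: 6 pixels, 2 coarse slots; slot 1 is reconstructed poorly.
assignment = [0, 0, 0, 1, 1, 1]
error = [0.01, 0.02, 0.01, 0.30, 0.25, 0.35]
print(expand(2, assignment, error, max_slots=4))  # (3, 1)
```

In this toy case the new slot would be allocated to the region covered by slot 1, while slot 0, which already reconstructs well, is left alone.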

Key Contributions

Component	Description
Slot Curriculum	Progressively allocates slots where reconstruction error is high
Structure-Aware Loss	SSIM-based loss to sharpen semantic boundaries for precise slot partitioning
Cyclic Inference	Forward + backward temporal rolling for consistent slot representations
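
The structure-aware term can be illustrated with the standard SSIM formula. The sketch below computes one global SSIM value over two flattened grayscale patches in [0, 1] and adds a weighted (1 - SSIM) penalty to MSE. The weight `lam`, the single-window simplification, and the function names are assumptions for illustration only; the repository's actual loss may differ (practical SSIM implementations typically evaluate local windows rather than one global window).

```python
def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def global_ssim(x, y, data_range=1.0):
    """SSIM computed over one window covering the whole patch (a simplification)."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def structure_aware_loss(pred, target, lam=0.5):
    """MSE plus a weighted (1 - SSIM) penalty; lam is a hypothetical weight."""
    return mse(pred, target) + lam * (1.0 - global_ssim(pred, target))


target = [0.0, 0.0, 1.0, 1.0]      # sharp edge
blurry = [0.25, 0.25, 0.75, 0.75]  # same mean, reduced contrast
print(structure_aware_loss(target, target))                        # 0.0
print(structure_aware_loss(blurry, target) > mse(blurry, target))  # True
```

The blurry prediction keeps the same mean intensity as the target, so the extra penalty comes entirely from lost contrast, which is the kind of error the SSIM term is meant to expose.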

Installation

# Base install
poetry install

# Optional extras
poetry install -E tensorflow   # for MOVi dataset processing
poetry install -E coco         # for COCO / YouTube-VIS dataset processing
poetry install -E notebook     # for visualization notebooks

# Additional dependencies
poetry run pip install pytorch_msssim kornia lpips
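
After installation, a quick check like the following can confirm that the additional dependencies are importable. This is a convenience sketch, not part of the repository; it assumes the import names match the package names installed above, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names are assumed to match the pip package names.
extras = ["pytorch_msssim", "kornia", "lpips"]
missing = missing_modules(extras)
print("missing:", missing if missing else "none")
```

Run it inside the Poetry environment (for example with `poetry run python`) so that it inspects the same interpreter used for training.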

Dataset

The datasets should be placed under a common root directory with the following structure:

├── SlotCurri/
└── dataset/
    ├── ytvis2021_resized/
    ├── movi_c/
    └── movi_e/
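
The layout above can be verified before training with a small script. This is a sketch under the assumption that the three folders shown are the only ones required; `missing_datasets` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Folder names taken from the directory tree above.
EXPECTED = ("ytvis2021_resized", "movi_c", "movi_e")


def missing_datasets(root):
    """Return expected dataset folder names that are absent under <root>/dataset."""
    dataset_dir = Path(root) / "dataset"
    return [name for name in EXPECTED if not (dataset_dir / name).is_dir()]


if __name__ == "__main__":
    # "." is a placeholder for the common root holding SlotCurri/ and dataset/.
    print(missing_datasets("."))
```

An empty list means all three dataset folders were found.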

Dataset	Download
YouTube-VIS 2021	Google Drive
MOVi-C	Google Drive
MOVi-E	Google Drive

See Dataset Preparation below for download and preprocessing instructions.

Training

# YouTube-VIS 2021
poetry run python -m slotcurri.train --run-eval-after-training configs/slotcurri/ytvis2021.yaml

# MOVi-E
poetry run python -m slotcurri.train --run-eval-after-training configs/slotcurri/movi_e.yaml

# MOVi-C
poetry run python -m slotcurri.train --run-eval-after-training configs/slotcurri/movi_c.yaml
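
When launching several runs from a script, the commands above can be assembled programmatically. The sketch below only builds the argument lists and prints them; the `CONFIGS` mapping and `train_command` are hypothetical helpers that mirror the three commands shown above.

```python
import shlex

# Config paths copied from the training commands above.
CONFIGS = {
    "ytvis2021": "configs/slotcurri/ytvis2021.yaml",
    "movi_e": "configs/slotcurri/movi_e.yaml",
    "movi_c": "configs/slotcurri/movi_c.yaml",
}


def train_command(dataset):
    """Build the argv list for one training run followed by evaluation."""
    return [
        "poetry", "run", "python", "-m", "slotcurri.train",
        "--run-eval-after-training", CONFIGS[dataset],
    ]


for name in CONFIGS:
    print(shlex.join(train_command(name)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to start the corresponding run.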

Pretrained Checkpoints

Dataset	Download
MOVi-C	Google Drive
MOVi-E	Google Drive
YouTube-VIS 2021	Google Drive

Dataset Preparation

MOVi-C / MOVi-E

poetry install -E tensorflow

# MOVi-C
python data/save_movi.py --level c --split train --maxcount 32 --only-video <root_data_dir>/movi_c
python data/save_movi.py --level c --split validation --maxcount 32 <root_data_dir>/movi_c

# MOVi-E
python data/save_movi.py --level e --split train --maxcount 32 --only-video <root_data_dir>/movi_e
python data/save_movi.py --level e --split validation --maxcount 32 <root_data_dir>/movi_e

COCO

poetry install -E coco

cd data
python save_coco.py --split train --maxcount 128 --only-images --out-path <root_data_dir>/coco
python save_coco.py --split validation --maxcount 128 --out-path <root_data_dir>/coco

Images are resized to 256×256 and saved as sharded .tar files. The script automatically downloads the raw COCO dataset and extracts it to the directory given by --download-dir.
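
To sanity-check a generated shard, its contents can be listed with the standard library. This is a minimal sketch that makes no assumption about the file naming inside the shards; `summarize_shard` is a hypothetical helper, and the shard path you pass in depends on what the script wrote to your output directory.

```python
import tarfile
from collections import Counter
from pathlib import Path


def summarize_shard(path):
    """Count regular files in one .tar shard, grouped by file extension."""
    counts = Counter()
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if member.isfile():
                counts[Path(member.name).suffix or "<none>"] += 1
    return dict(counts)
```

Calling `summarize_shard` on one of the .tar files under the output directory returns a dictionary mapping each extension to its file count, which makes empty or truncated shards easy to spot.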

YouTube-VIS 2021

poetry install -E coco

cd data
python save_ytvis2021.py --split train --maxcount 32 --only-videos --resize --out-path <root_data_dir>/ytvis2021_resized
python save_ytvis2021.py --split validation --maxcount 10 --resize --out-path <root_data_dir>/ytvis2021_resized

Download the raw YouTube-VIS 2021 files to the ytvis2021_raw folder before running these commands.

Qualitative Results

YouTube-VIS 2021

MOVi-C

Citation

Acknowledgement

The codebase is adapted from Videosaur and SlotContrast.

License

This codebase is released under the MIT License.
