On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks

Yannic Neuhaus1, Nicolas Flammarion2, Matthias Hein1, Francesco Croce3

1University of Tübingen 2EPFL 3ELLIS Institute Finland, Aalto University

arXiv

Data

Download the datasets here and unzip the file in ./data.

Input representations

We provide our datasets in four different input representations. The names of the corresponding JSONL files contain the substring "image", "desc", "grid", or "table".

Examples for the different input representations
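To get a feel for one of these files before training, it can help to look at the first record. The snippet below is a minimal sketch that only relies on the JSON Lines convention of one JSON document per line. The helper name `first_records` is hypothetical and not part of this repository, and no field names are assumed.

```python
import json
from itertools import islice


def first_records(lines, n=1):
    """Parse the first n non-empty lines of a JSON Lines stream."""
    # Blank lines are skipped; every remaining line holds one JSON document.
    non_empty = (line for line in lines if line.strip())
    return [json.loads(line) for line in islice(non_empty, n)]


# Usage sketch (the path is taken from the fine-tuning example in this README):
#
#     with open("./data/train/train_grid.jsonl", encoding="utf-8") as f:
#         for record in first_records(f):
#             print(sorted(record) if isinstance(record, dict) else type(record))
```

An open file object is already an iterable of lines, so it can be passed in directly, and only the first `n` records are read from disk.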

Reasoning traces

For the training data, we also provide reasoning traces in the following formats:

  • desc : simple descriptive reasoning
  • table : ASCII visualization of the grid after each step
  • grid : more concise ASCII visualization of the grid after each step
  • table_desc / grid_desc: combination of the descriptive reasoning with the grid visualizations

Examples for the different reasoning traces
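The input representation and the reasoning format can both be read off a dataset file name. The sketch below assumes the naming pattern seen in the example paths of this README (for instance `train_grid.jsonl` and `train_grid_reasoning_grid_desc.jsonl`) holds for all files, which is an assumption rather than something the repository guarantees. The function `describe_dataset_file` is hypothetical.

```python
from pathlib import Path

INPUT_REPRESENTATIONS = ("image", "desc", "grid", "table")
REASONING_FORMATS = ("desc", "table", "grid", "table_desc", "grid_desc")


def describe_dataset_file(path):
    """Return (input representation, reasoning format or None) for a file name."""
    stem = Path(path).stem
    # Assumption: an optional "_reasoning_<format>" suffix follows the input part.
    input_part, separator, reasoning = stem.partition("_reasoning_")
    # Assumption: the input representation is the last "_"-separated token.
    representation = input_part.rsplit("_", 1)[-1]
    if representation not in INPUT_REPRESENTATIONS:
        raise ValueError(f"unknown input representation in {path!r}")
    if separator and reasoning not in REASONING_FORMATS:
        raise ValueError(f"unknown reasoning format in {path!r}")
    return representation, (reasoning if separator else None)


if __name__ == "__main__":
    print(describe_dataset_file("./data/train/train_grid.jsonl"))
    print(describe_dataset_file("./data/train/train_grid_reasoning_grid_desc.jsonl"))
```

Splitting at `_reasoning_` first matters because a name such as `train_grid_reasoning_grid_desc.jsonl` contains both "grid" and "desc", so a plain substring check would be ambiguous.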

Environment

git clone https://github.com/YanNeu/frozen_ood.git
cd frozen_ood
conda env create -f environment.yml

Fine-tuning

Use task=sft_text for text-based inputs and task=sft_image for image inputs, e.g.

python src/main.py epochs=10 task=sft_text data_path="./data/train/train_grid.jsonl" run_name="sft_grid"

to fine-tune the models with grid input and no reasoning, and

python src/main.py epochs=10 task=sft_text data_path="./data/train/train_grid_reasoning_grid_desc.jsonl" run_name="sft_grid_reas_grid_desc"

for the version with description- and grid-based reasoning traces.
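When launching several fine-tuning runs, the command line can be assembled programmatically. The following is a minimal sketch that only uses the options shown in the two commands above (`epochs`, `task`, `data_path`, `run_name`). The helper `build_finetune_command` and its `image_input` parameter are hypothetical and not part of the repository.

```python
import shlex


def build_finetune_command(data_path, run_name, image_input=False, epochs=10):
    """Assemble the argument list for src/main.py as used in the examples above."""
    # task=sft_image is for image inputs, task=sft_text for all text-based inputs.
    task = "sft_image" if image_input else "sft_text"
    return [
        "python",
        "src/main.py",
        f"epochs={epochs}",
        f"task={task}",
        f"data_path={data_path}",
        f"run_name={run_name}",
    ]


if __name__ == "__main__":
    command = build_finetune_command("./data/train/train_grid.jsonl", "sft_grid")
    # Print the command instead of running it; pass the list to
    # subprocess.run(command, check=True) to actually start training.
    print(shlex.join(command))
```

Returning a list rather than a single string avoids shell quoting issues when the command is handed to `subprocess.run`.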

Evaluation

After fine-tuning the model, you can evaluate it on all in-distribution (ID) test sets via

python src/eval.py load_model_path="./checkpoints/sft_grid" data_path="./data/test_id/test_level3_4_5_6_grid.jsonl" save_dir="./results_id"

or on one of the OOD sets

python src/eval.py load_model_path="./checkpoints/sft_grid" data_path="./data/test_ood/test_level7_grid.jsonl" save_dir="./results_ood"
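To evaluate one checkpoint on both the ID and an OOD test set in one go, the two commands above can be generated from a small mapping. This is a sketch under the assumption that `src/eval.py` accepts exactly the options shown above. `EVAL_SETS` and `build_eval_commands` are hypothetical names, and the mapping lists only the two data files mentioned in this README.

```python
import shlex

# Test file -> output directory, copied from the two commands above.
EVAL_SETS = {
    "./data/test_id/test_level3_4_5_6_grid.jsonl": "./results_id",
    "./data/test_ood/test_level7_grid.jsonl": "./results_ood",
}


def build_eval_commands(load_model_path, eval_sets=EVAL_SETS):
    """Return one src/eval.py argument list per (data_path, save_dir) pair."""
    return [
        [
            "python",
            "src/eval.py",
            f"load_model_path={load_model_path}",
            f"data_path={data_path}",
            f"save_dir={save_dir}",
        ]
        for data_path, save_dir in eval_sets.items()
    ]


if __name__ == "__main__":
    for command in build_eval_commands("./checkpoints/sft_grid"):
        # Print only; use subprocess.run(command, check=True) to execute.
        print(shlex.join(command))
```

Keeping ID and OOD results in separate `save_dir` folders, as in the original commands, makes the two sets of results easy to compare afterwards.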

Citation

@article{neuhaus2026oodreasoning,
      title={On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks}, 
      author={Yannic Neuhaus and Nicolas Flammarion and Matthias Hein and Francesco Croce},
      journal={arXiv preprint arXiv:2602.15460},
      year={2026},
}

Acknowledgement

The code in this repository is based on VSP and Mirage.
