This repository implements the fully automatic synthetic data generation and fine‑tuning pipeline introduced in the paper *Improving Physical Object State Representation in Text‑to‑Image Generative Systems*. Starting from a curated set of object nouns, the pipeline:
- Generates template prompts describing objects in empty or absent states.
- Synthesizes images with an off‑the‑shelf text‑to‑image model.
- Filters out incorrect examples using a vision‑language model to verify “empty‑state” accuracy.
- Recaptions prompts via LLMs for linguistic diversity.
- Fine‑tunes generative models on the cleaned synthetic dataset to improve physical state representation.
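The data-generation steps above can be sketched as a simple loop. This is a minimal, hypothetical Python sketch, not the repository's implementation. The prompt templates, the names `generate_prompts`, `run_pipeline` and `Example`, and the callable-based interfaces are all assumptions for illustration. The real model calls (text‑to‑image generator, vision‑language filter, LLM recaptioner) are replaced by plain callables so the data flow runs offline. Fine‑tuning, the last step, is out of scope here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

# Hypothetical prompt templates; the repository's real templates may differ.
TEMPLATES = ["an empty {noun}", "a {noun} with nothing in it"]


@dataclass
class Example:
    prompt: str   # original template prompt
    image: Any    # generated image (e.g. a PIL.Image); any object in this sketch
    caption: str  # recaptioned text paired with the image for fine-tuning


def generate_prompts(nouns: Iterable[str]) -> List[str]:
    """Step 1: expand each object noun into empty-state template prompts."""
    return [template.format(noun=noun) for noun in nouns for template in TEMPLATES]


def run_pipeline(
    nouns: Iterable[str],
    generate_image: Callable[[str], Any],
    is_correct: Callable[[str, Any], bool],
    recaption: Callable[[str, Any], str],
    num_images_per_prompt: int = 1,
) -> List[Example]:
    """Steps 1-4: prompts -> images -> filtering -> recaptioning."""
    dataset: List[Example] = []
    for prompt in generate_prompts(nouns):
        for _ in range(num_images_per_prompt):
            image = generate_image(prompt)     # step 2: text-to-image model
            if not is_correct(prompt, image):  # step 3: drop images failing the empty-state check
                continue
            caption = recaption(prompt, image)  # step 4: rewrite the prompt for diversity
            dataset.append(Example(prompt, image, caption))
    return dataset


if __name__ == "__main__":
    # Stand-ins for the real models so the sketch runs without any API.
    examples = run_pipeline(
        ["cup", "bowl"],
        generate_image=lambda p: f"<image: {p}>",
        is_correct=lambda p, img: "cup" in p,  # pretend only cup images pass the check
        recaption=lambda p, img: p.replace("an empty", "a completely empty"),
        num_images_per_prompt=2,
    )
    for ex in examples:
        print(ex.caption)
```

Keeping each stage behind a callable mirrors how the command-line examples swap components by name (prompt generator, image generator, filter, recaptioner), though how the repository actually wires them together is not shown here.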
Install core dependencies with pip (requires Python 3.8+):
```bash
# 1. (Optional) Create & activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 2. Install required packages
pip install \
  torch torchvision \
  diffusers transformers \
  openai \
  pillow tqdm
```

Clone the repo and run the pipeline script:

```bash
git clone https://github.com/your-org/object-state-pipeline.git
cd object-state-pipeline

# Generate the first 5 prompts and their images, then exit:
python pipeline.py \
  --api_key YOUR_OPENAI_KEY

python pipeline.py \
  --experiment_folder experiments/v1 \
  --api_key YOUR_OPENAI_KEY \
  --prompt_generator ObjectBasedPromptGenerator \
  --image_generator StableDiffusionImageGenerator \
  --lora /path/to/lora_weights.safetensors \
  --image_filter GPT4VImageFilter \
  --image_recaptioner GPT4VImageRecaptioner \
  --num_images_per_prompt 7

python run_pipeline.py \
  --api_key YOUR_OPENAI_KEY \
  --no_processing
```

All benchmark prompt‑lists live in the datasets/ folder as JSON files. Each file contains a flat list of prompt strings.
- object_state_bench.json
  - Size: 200 prompts (100 machine‑generated + 100 human‑curated)
  - Purpose: Evaluates object absence/empty states on common household items
  - Hugging Face: Tianle/Object-State-Bench
- genai_object_state.json
  - Size: 214 prompts (filtered “negation” subset from GenAI‑Bench)
  - Purpose: Tests generation of objects in varied physical states, using prompts drawn from a public negation benchmark
  - Hugging Face: Tianle/Object-State-Bench
You can load them directly via 🤗 Datasets:
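A minimal sketch for reading the prompt lists, assuming each file is a flat JSON array of strings as described above. `load_prompt_list` and `load_from_hub` are hypothetical helpers, not part of the repository. `load_from_hub` calls the real `datasets.load_dataset` function with the Hub ID listed above (written with plain ASCII hyphens). The splits and column names of that Hub repository are not documented here, so inspect the returned `DatasetDict` before relying on them.

```python
import json
from pathlib import Path
from typing import List


def load_prompt_list(path) -> List[str]:
    """Load a local benchmark file holding a flat JSON list of prompt strings."""
    with open(path, encoding="utf-8") as f:
        prompts = json.load(f)
    # Reject anything that is not a plain list of strings (e.g. a dict or nested lists).
    if not isinstance(prompts, list) or not all(isinstance(p, str) for p in prompts):
        raise ValueError(f"{path}: expected a flat JSON list of strings")
    return prompts


def load_from_hub(repo_id: str = "Tianle/Object-State-Bench"):
    """Load the benchmark from the Hugging Face Hub.

    Requires `pip install datasets` and network access. The repo ID is the one
    listed above; its split and column layout is an assumption to verify.
    """
    from datasets import load_dataset  # deferred so the local loader works without it

    return load_dataset(repo_id)


if __name__ == "__main__":
    for name in ("object_state_bench.json", "genai_object_state.json"):
        path = Path("datasets") / name
        if path.exists():
            prompts = load_prompt_list(path)
            print(f"{name}: {len(prompts)} prompts")
```

Deferring the `datasets` import into `load_from_hub` keeps the local loader usable in environments where only the core dependencies are installed.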
This repository implements the pipeline described in our paper *Improving Physical Object State Representation in Text‑to‑Image Generative Systems*.
A preprint will be available on arXiv soon.
