This repository contains the reference implementation for the paper "Generalizable Offline Multi-Objective Reinforcement Learning via Preference-Conditioned Diffuser".
Create conda environment:
cd diffmorl
conda env create -f environment.yml
conda activate diffmorl_env
Install the diffuser for DiffMORL:
cd diffuser
python -m pip install -e .
You can download datasets from the PEDA repo by following their instructions. Due to storage limits, we are unable to open-source all data variants. We recommend downloading the pretrained behavioral policies from the PEDA repo and generating all data by following the examples in data_generation/collect_all.sh and data_generation/collect_custom.sh. Note that custom includes the incomplete dataset used in our paper, which is tagged as custom-large. Other types of incomplete datasets can be collected by modifying the settings in data_generation/custom_pref.py.
You should add the repository path to your PYTHONPATH environment variable by running:
export PYTHONPATH=<path-to-diffmorl>
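If imports fail later, it can help to confirm that the export took effect. The following is a minimal sketch and not part of the repository; the helper name on_pythonpath is hypothetical, and it only compares the entries of PYTHONPATH against a directory you pass in.

```python
import os


def on_pythonpath(repo_dir, env=None):
    """Return True if repo_dir is listed in the PYTHONPATH variable.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try without changing your shell.
    """
    env = os.environ if env is None else env
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.abspath(repo_dir)
    # Skip empty entries, which appear when PYTHONPATH is unset or ends with a separator.
    return any(e and os.path.abspath(e) == target for e in entries)
```

Calling on_pythonpath with the directory you exported should return True in a shell where the export has been run.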
An example command for a single experiment:
python experiment.py --dir experiment_runs/example --env MO-HalfCheetah-v2 --seed 2 --dataset expert_custom --model_type mod --mod_type bc --num_steps_per_iter 400000 --max_iters 1 --use_p_bar True --K 8 --infer_N 7 --n_diffusion_steps 8 --returns_condition True --mixup True --mixup_num 6 --mixup_step 400000
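To repeat the example above over several seeds, one option is to generate the command lines programmatically. The sketch below is an illustration only: every flag is copied from the example command, while the build_command helper, the choice of seeds, and the per-seed output directory naming (a "_seed<N>" suffix) are assumptions and not conventions of this repository. It prints the commands instead of launching them.

```python
import shlex

# Flags copied verbatim from the single-experiment example; only --dir and --seed vary.
BASE_ARGS = [
    "--env", "MO-HalfCheetah-v2",
    "--dataset", "expert_custom",
    "--model_type", "mod",
    "--mod_type", "bc",
    "--num_steps_per_iter", "400000",
    "--max_iters", "1",
    "--use_p_bar", "True",
    "--K", "8",
    "--infer_N", "7",
    "--n_diffusion_steps", "8",
    "--returns_condition", "True",
    "--mixup", "True",
    "--mixup_num", "6",
    "--mixup_step", "400000",
]


def build_command(seed, out_root="experiment_runs/example"):
    """Build the argument list for one run (hypothetical helper)."""
    # The "_seed<N>" suffix is an assumed naming scheme that keeps runs separate.
    out_dir = f"{out_root}_seed{seed}"
    return ["python", "experiment.py", "--dir", out_dir, "--seed", str(seed), *BASE_ARGS]


if __name__ == "__main__":
    for seed in (0, 1, 2):
        # Print a shell-ready line; paste it into a terminal or a script to run it.
        print(shlex.join(build_command(seed)))
```

Printing first makes it easy to review the commands before committing GPU time to them.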
More example commands are included in scripts/examples.sh. To reproduce our results, first collect all datasets to be used, then run:
sh scripts/diffmorl_main.sh
Double-check your CUDA device and data path in the shell scripts.
After training, models are evaluated automatically. The Pareto fronts, all metrics, and the trained models are saved to the directory specified by --dir.
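For readers unfamiliar with the term, a Pareto front is the set of returns that no other return dominates. The sketch below shows one simple way to extract it from a list of return vectors. It is not the evaluation code used by this repository; the pareto_front function is hypothetical and assumes every objective is to be maximized.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is maximized.

    A point p is dominated if some other point q is at least as good in
    every objective and strictly better in at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk >= pk for qk, pk in zip(q, p))
            and any(qk > pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front
```

This pairwise comparison is quadratic in the number of points, which is fine for the handful of preference settings evaluated in a single run.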