
Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

DROL

arXiv | Project Page

Overview

DROL is a one-step offline RL actor trained with top-1 dynamic routing. For each state, the actor samples K candidate actions from a bounded latent prior, routes the dataset action to its nearest candidate, and updates only the routed winner with behavior cloning and critic guidance.

The key idea is to preserve local action support rather than a fixed latent-to-target correspondence. Routing is recomputed every gradient step, so responsibility for a supported action region can transfer between candidates as optimization progresses. This lets the policy follow local Q improvements while retaining cheap single-pass inference at test time.
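To make the update concrete, here is a minimal sketch of the routed actor loss, assuming a bounded latent prior whose dimensionality matches the action space. Every name (drol_actor_loss, actor_fn, critic_fn, and their signatures) is illustrative, not the exact API of agents/drol.py:

import jax
import jax.numpy as jnp

def drol_actor_loss(actor_params, critic_params, actor_fn, critic_fn,
                    states, dataset_actions, rng, num_candidates, bc_coef):
    batch_size, action_dim = dataset_actions.shape
    # Sample K candidate latents per state from a bounded prior (uniform here).
    z = jax.random.uniform(
        rng, (batch_size, num_candidates, action_dim), minval=-1.0, maxval=1.0)
    # Decode all K candidates per state: (batch, K, action_dim).
    states_k = jnp.repeat(states[:, None, :], num_candidates, axis=1)
    candidates = actor_fn(actor_params, states_k, z)
    # Top-1 routing: assign each dataset action to its nearest candidate.
    # Because the argmin is recomputed at every gradient step, responsibility
    # for an action region can transfer between candidates over training.
    dists = jnp.sum((candidates - dataset_actions[:, None, :]) ** 2, axis=-1)
    winners = jnp.argmin(dists, axis=-1)                      # (batch,)
    routed = jnp.take_along_axis(
        candidates, winners[:, None, None], axis=1)[:, 0]     # (batch, action_dim)
    # Update only the routed winner: behavior cloning toward the dataset
    # action plus critic guidance toward higher Q. The other K-1 candidates
    # receive no gradient from this batch.
    bc_loss = jnp.mean(jnp.sum((routed - dataset_actions) ** 2, axis=-1))
    q = critic_fn(critic_params, states, routed)
    return bc_coef * bc_loss - jnp.mean(q)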

This implementation is a minimal DROL-only extraction from the larger FQL research codebase. It keeps the DROL agent, the OGBench/D4RL data path, training, evaluation, logging, and production commands, while removing unrelated agents and analysis scripts.

Installation

The code uses the JAX/Flax training stack inherited from FQL. Install the dependencies with:

pip install -r requirements.txt

For D4RL experiments, use the MuJoCo/D4RL setup expected by the pinned dependencies in requirements.txt.

Usage

The main DROL implementation is in agents/drol.py. The DROL-only training entrypoint is main.py.

Run a default OGBench experiment from this folder:

python main.py --env_name=cube-double-play-singletask-v0

Example OGBench navigation run:

python main.py \
  --env_name=antmaze-large-navigate-singletask-task1-v0 \
  --agent.bc_coef=0.03 \
  --agent.num_candidates=16 \
  --agent.discount=0.995 \
  --agent.q_agg=min

Example D4RL run:

python main.py \
  --env_name=antmaze-medium-play-v2 \
  --offline_steps=500000 \
  --agent.bc_coef=0.1 \
  --agent.num_candidates=16

The default config path is agents/drol.py, so --agent=agents/drol.py is not required for ordinary runs.
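Flag overrides of the form --agent.<field> suggest the FQL-style ml_collections config pattern. A minimal sketch of that pattern, assuming main.py follows it (details may differ):

from absl import app, flags
from ml_collections import config_flags

FLAGS = flags.FLAGS
flags.DEFINE_string('env_name', 'cube-double-play-singletask-v0', 'Environment name.')
# Loads get_config() from the given file; its fields then become
# overridable as --agent.<field> on the command line.
config_flags.DEFINE_config_file('agent', 'agents/drol.py', lock_config=False)

def main(_):
    config = FLAGS.agent  # ConfigDict, e.g. config.bc_coef, config.num_candidates
    ...

if __name__ == '__main__':
    app.run(main)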

Tips For Hyperparameter Tuning

The most important DROL hyperparameters are:

  • --agent.bc_coef: support/behavior-cloning coefficient. Tune this first.
  • --agent.num_candidates: routing budget K. The default paper setting is K=16.
  • --agent.q_agg: critic aggregation. Some AntMaze and Adroit runs use min.
  • --agent.discount: long-horizon OGBench navigation uses 0.995.

In practice, fix K=16 first and tune bc_coef. If additional budget is available, sweep K once a workable bc_coef range is clear; a simple sweep can be scripted as in the sketch below. Larger K increases routing capacity, but performance is not guaranteed to improve monotonically on every task.
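A minimal sweep sketch, assuming the main.py flags shown above; the bc_coef values here are illustrative, not tuned settings from the paper:

import subprocess

env_name = "antmaze-large-navigate-singletask-task1-v0"
for bc_coef in [0.01, 0.03, 0.1, 0.3]:
    subprocess.run(
        [
            "python", "main.py",
            f"--env_name={env_name}",
            f"--agent.bc_coef={bc_coef}",
            "--agent.num_candidates=16",  # hold K=16 while tuning bc_coef
        ],
        check=True,
    )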

Shared paper settings:

  • Optimizer: Adam with learning rate 3e-4.
  • Batch size: 256.
  • Actor and critic: 4-layer MLPs with hidden widths (512, 512, 512, 512).
  • Critic ensemble size 2, target update rate 0.005.
  • Offline updates: 1e6 on OGBench, 5e5 on D4RL.
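Expressed as an ml_collections config, those settings would look roughly like this sketch; the field names are assumptions, not the exact keys in agents/drol.py:

import ml_collections

def get_config():
    return ml_collections.ConfigDict(dict(
        lr=3e-4,                                  # Adam learning rate
        batch_size=256,
        actor_hidden_dims=(512, 512, 512, 512),   # 4-layer actor MLP
        value_hidden_dims=(512, 512, 512, 512),   # 4-layer critic MLP
        num_qs=2,                                 # critic ensemble size
        tau=0.005,                                # target update rate
        num_candidates=16,                        # routing budget K
        bc_coef=0.1,                              # tune per task
        discount=0.99,                            # 0.995 for long-horizon navigation
        q_agg='mean',                             # 'min' on some AntMaze/Adroit tasks
    ))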

Reproducing The Main Results

The production command list is in scripts/production_run_commands.sh:

bash scripts/production_run_commands.sh

It contains the exact command groups used for:

  • OGBench DROL(16), with fixed default K=16.
  • OGBench DROL*, with family-level tuned K.
  • D4RL DROL(16), with fixed default K=16.
  • D4RL tuned supporting reruns.

The script has been adapted for this minimal folder and uses --agent=agents/drol.py.

Repository Layout

  • main.py: DROL-only offline/offline-to-online training entrypoint.
  • agents/drol.py: DROL agent, losses, action sampling, and config.
  • envs/: OGBench and D4RL environment/dataset construction.
  • utils/: dataset, evaluation, Flax, network, encoder, and logging helpers.
  • scripts/production_run_commands.sh: paper production commands.
  • requirements.txt: dependency pins copied from the source repository.

Dataset artifacts are not copied into this folder. OGBench and D4RL loading still follows the package/local-data behavior of the original FQL repository.

Acknowledgments

This codebase is built on top of Flow Q-Learning. DROL reuses the FQL-style JAX training stack and one-step flow actor infrastructure, while replacing the actor-side support regularizer with dynamic routing.
