Update RSL-RL configs to work with new version 4.0 #4379

Open
ClemensSchwarke wants to merge 14 commits into isaac-sim:main from ClemensSchwarke:feature/rsl_rl_3.4_config_updates

@ClemensSchwarke ClemensSchwarke commented Jan 13, 2026

This PR updates the rsl-rl config classes to be compatible with versions 4.0 and 4.1. It also adds a configuration handling function to ensure that old configs work with versions 4.0/4.1 and adapts train.py and play.py accordingly. Lastly, it updates the config classes for ANYmal D locomotion for reference and adds recurrent configs.

The main change for 4.0 is that the policy config is split into separate actor and critic / student and teacher configs. For more details, see the Release Notes.
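The split can be pictured as a translation from one legacy `policy` dictionary into two model dictionaries. The sketch below is illustrative only: the key names (`actor_hidden_dims`, `critic_hidden_dims`, `hidden_dims`, `activation`) and the helper `split_policy_cfg` are assumptions chosen for the example, not the PR's actual schema or implementation.

```python
from copy import deepcopy


def split_policy_cfg(legacy_policy: dict) -> dict:
    """Split a combined legacy policy dict into separate actor/critic dicts.

    All key names are hypothetical stand-ins for the real config fields.
    """
    policy = deepcopy(legacy_policy)  # never mutate the caller's config
    shared = {"activation": policy.get("activation", "elu")}
    return {
        "actor": {"hidden_dims": policy["actor_hidden_dims"], **shared},
        "critic": {"hidden_dims": policy["critic_hidden_dims"], **shared},
    }


legacy = {
    "actor_hidden_dims": [512, 256, 128],
    "critic_hidden_dims": [256, 128],
    "activation": "elu",
}
new_cfg = split_policy_cfg(legacy)
print(new_cfg["actor"])
print(new_cfg["critic"])
```

Keeping the translation in one function means old task configs can stay untouched while the runner receives the new layout.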

The main change for 4.1 is that the noise distribution is now configured via the distribution_cfg to allow for arbitrary distributions.

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file

@github-actions github-actions bot added the enhancement (New feature or request) and isaac-lab (Related to Isaac Lab team) labels Jan 13, 2026
@ClemensSchwarke ClemensSchwarke changed the title from "Update RSL-RL configs to work with new version 3.4" to "Update RSL-RL configs to work with new version 4.0" Jan 14, 2026
@ClemensSchwarke ClemensSchwarke force-pushed the feature/rsl_rl_3.4_config_updates branch from 11521b9 to dea941e January 14, 2026 09:30
@ClemensSchwarke ClemensSchwarke marked this pull request as ready for review February 6, 2026 07:33
greptile-apps bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

  • Updates IsaacLab’s RSL-RL config schema for rsl-rl v4.0.0 by splitting legacy policy into explicit model configs (actor/critic, student/teacher) and adding recurrent variants.
  • Adds a compatibility shim (handle_deprecated_rsl_rl_cfg) to translate old/new config fields depending on installed rsl-rl version, and wires it into train.py/play.py.
  • Updates ANYmal-D locomotion reference configs and gym registry kwargs to expose recurrent PPO/distillation entry points.
  • Pins rsl-rl-lib extra dependency to 4.0.0 and adapts play-time export to use new runner export APIs when available.
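The version-dependent behavior summarized above can be sketched as a small gate on the installed rsl-rl version. This is a minimal illustration under stated assumptions: the PR's scripts compare versions with `version.parse`, whereas the simplified `release_tuple` parser and the `select_model_keys` helper below are hypothetical and use only the standard library.

```python
from __future__ import annotations


def release_tuple(version_str: str) -> tuple[int, ...]:
    """Parse 'X.Y.Z' into a comparable tuple, padding missing parts with zeros.

    Simplified on purpose: pre-release or local tags are not handled.
    """
    parts = [int(p) for p in version_str.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def select_model_keys(installed_version: str, distillation: bool = False) -> tuple[str, ...]:
    """Return the config keys a runner would expect for a given rsl-rl version."""
    if release_tuple(installed_version) >= (4, 0, 0):
        return ("student", "teacher") if distillation else ("actor", "critic")
    return ("policy",)  # legacy combined config


print(select_model_keys("3.1.2"))
print(select_model_keys("4.0.0"))
print(select_model_keys("4.1.0", distillation=True))
```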

Confidence Score: 2/5

  • This PR is not safe to merge until a couple of runtime-breaking issues are fixed.
  • Two changes are likely to cause immediate runtime failures: (1) handle_deprecated_rsl_rl_cfg uses isinstance(x, type(MISSING)) to detect missing config fields, which will evaluate true for almost any value and breaks the compatibility logic; (2) play.py unconditionally calls policy.reset() for rsl-rl >= 4.0.0, which will fail for non-recurrent inference policies that don’t implement reset(). Additionally, RslRlCNNModelCfg lacks the @configclass decorator, which can break config serialization if used.
  • scripts/reinforcement_learning/rsl_rl/cli_args.py, scripts/reinforcement_learning/rsl_rl/play.py, source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/reinforcement_learning/rsl_rl/cli_args.py | Adds handle_deprecated_rsl_rl_cfg() to translate deprecated runner/policy fields across rsl-rl <4 and >=4; likely has incorrect MISSING detection and may mis-handle optimizer removal. |
| scripts/reinforcement_learning/rsl_rl/play.py | Adds rsl-rl version check, runs deprecated-config handler, and uses new runner export APIs for >=4; changes recurrent reset to call policy.reset() which may not exist for non-recurrent policies. |
| scripts/reinforcement_learning/rsl_rl/train.py | Invokes deprecated-config handler before setting env seed; otherwise unchanged; relies on installed_version defined earlier in file. |
| source/isaaclab_rl/isaaclab_rl/rsl_rl/distillation_cfg.py | Introduces explicit student/teacher model cfgs in distillation runner while keeping deprecated policy cfg for backward compatibility; structural reordering only. |
| source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py | Splits old policy cfg into new model cfgs (MLP/RNN/CNN) and adds optimizer to PPO algo; keeps deprecated ActorCritic configs; potential missing @configclass on CNN model. |
| source/isaaclab_rl/setup.py | Bumps rsl-rl extra dependency pin from 3.1.2 to 4.0.0. |
| source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/__init__.py | Registers additional gym kwargs for recurrent RSL-RL PPO/distillation runner entry points. |
| source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/agents/rsl_rl_distillation_cfg.py | Updates ANYmal-D distillation runner to new student/teacher model cfgs and adds recurrent variant; obs_groups keys updated to student/teacher. |
| source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/agents/rsl_rl_ppo_cfg.py | Updates ANYmal-D PPO runner to new actor/critic model cfgs and adds recurrent runner config; adjusts obs_groups accordingly. |

Sequence Diagram

sequenceDiagram
    participant U as User CLI
    participant P as play.py / train.py
    participant CA as cli_args.py
    participant H as Hydra/Registry
    participant RL as isaaclab_rl.rsl_rl cfgs
    participant R as rsl_rl Runner

    U->>P: launch with --task/--agent
    P->>H: load env_cfg + agent_cfg (entry_point)
    P->>CA: update_rsl_rl_cfg(agent_cfg, args)
    P->>CA: handle_deprecated_rsl_rl_cfg(agent_cfg, installed_version)
    CA->>RL: map legacy policy<->model cfgs
    P->>R: construct Runner(env, agent_cfg.to_dict())
    R->>R: load checkpoint (play) / learn (train)
    alt rsl-rl >= 4.0.0 (play)
        P->>R: export_policy_to_jit / export_policy_to_onnx
    else rsl-rl < 4.0.0 (play)
        P->>RL: export_policy_as_jit / export_policy_as_onnx
    end

@greptile-apps greptile-apps bot left a comment

9 files reviewed, 3 comments

Comment on lines 110 to 112
agent_cfg.empirical_normalization, type(MISSING)
):
print(

Incorrect MISSING sentinel check

dataclasses.MISSING is a sentinel object, but the code checks isinstance(x, type(MISSING)), which is effectively isinstance(x, object) and will always be true for any normal value. This makes the deprecated-config logic behave incorrectly (e.g., treating policy as missing even when it's set) and can raise the "policy required" ValueError for valid configs. Use identity checks instead (e.g., agent_cfg.policy is MISSING) consistently throughout this function.
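An identity check is the least ambiguous way to test for a sentinel. The sketch below uses a hypothetical stand-in config object (`agent_cfg` built from `SimpleNamespace`) and also shows that the behavior of `isinstance(x, type(sentinel))` depends on the sentinel's type: for `dataclasses.MISSING` the type is a private sentinel class, while for a string-valued sentinel (a hypothetical `"???"` here) it would match every string. Which sentinel the PR imports is not shown in this thread, so the comparison is illustrative.

```python
from dataclasses import MISSING
from types import SimpleNamespace

# Hypothetical stand-in for an agent config: policy is set, the other field is not.
agent_cfg = SimpleNamespace(
    policy={"hidden_dims": [256, 256]},
    empirical_normalization=MISSING,
)


def is_missing(value) -> bool:
    """Identity check against the sentinel; independent of the sentinel's type."""
    return value is MISSING


policy_missing = is_missing(agent_cfg.policy)
norm_missing = is_missing(agent_cfg.empirical_normalization)

# isinstance(x, type(sentinel)) depends on what class the sentinel belongs to.
dataclass_style = isinstance(agent_cfg.policy, type(MISSING))
STRING_SENTINEL = "???"  # hypothetical string-valued sentinel
string_style = isinstance("adam", type(STRING_SENTINEL))

print(policy_missing, norm_missing, dataclass_style, string_style)
```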

Comment on lines 202 to +205
obs, _, dones, _ = env.step(actions)
# reset recurrent states for episodes that have terminated
policy_nn.reset(dones)
if version.parse(installed_version) >= version.parse("4.0.0"):
    policy.reset(dones)

Unconditional reset call

For rsl-rl >= 4.0.0, this calls policy.reset(dones) unconditionally, but policy = runner.get_inference_policy(...) may return a non-recurrent policy callable that does not implement reset(). That will crash during play for non-recurrent checkpoints. Consider guarding with hasattr(policy, "reset") (or only resetting when using a recurrent runner/config) to avoid AttributeError.
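The suggested guard can be sketched with two toy policy classes. `FeedForwardPolicy`, `RecurrentPolicy`, and `reset_if_supported` are hypothetical names for illustration; they do not mirror rsl-rl's actual policy objects.

```python
class FeedForwardPolicy:
    """Toy stateless policy: callable, but has no reset()."""

    def __call__(self, obs):
        return [0.0 for _ in obs]


class RecurrentPolicy(FeedForwardPolicy):
    """Toy recurrent policy that tracks how often its state was reset."""

    def __init__(self):
        self.reset_calls = 0

    def reset(self, dones):
        self.reset_calls += 1


def reset_if_supported(policy, dones) -> bool:
    """Call policy.reset(dones) only if the policy exposes a callable reset."""
    reset_fn = getattr(policy, "reset", None)
    if callable(reset_fn):
        reset_fn(dones)
        return True
    return False


recurrent = RecurrentPolicy()
print(reset_if_supported(FeedForwardPolicy(), [False, True]))
print(reset_if_supported(recurrent, [False, True]))
```

Using `getattr` with a default keeps the play loop identical for both policy kinds; gating on the runner or config type instead would be an equally valid design.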

greptile-apps bot commented Feb 6, 2026

Additional Comments (1)

source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py
Missing @configclass decorator

RslRlCNNModelCfg is declared without @configclass, unlike the other config dataclasses. If this is intended to be used like the other cfg types (and nested CNNCfg already is), it won't get the same dataclass/configclass behavior and may break .to_dict()/Hydra/config serialization paths when selected. Add @configclass to RslRlCNNModelCfg.
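The effect of a missing decorator can be illustrated with the standard library's `dataclass` as a stand-in for `@configclass` (an assumption: `@configclass` is Isaac Lab specific and adds more behavior than shown here). The class names below are hypothetical.

```python
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class DecoratedCfg:
    """Decorated config: gets dataclass machinery such as fields and asdict()."""

    hidden_dims: tuple = (256, 256)
    activation: str = "elu"


class UndecoratedCfg:
    """Same annotations, but no decorator: just a plain class with attributes."""

    hidden_dims: tuple = (256, 256)
    activation: str = "elu"


decorated_dict = asdict(DecoratedCfg())

try:
    asdict(UndecoratedCfg())
    undecorated_serializable = True
except TypeError:
    # asdict() rejects objects that are not dataclass instances
    undecorated_serializable = False

print(is_dataclass(DecoratedCfg()), is_dataclass(UndecoratedCfg()))
print(decorated_dict, undecorated_serializable)
```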

kevinzakka added a commit to mujocolab/mjlab that referenced this pull request Feb 6, 2026
In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer
internally. The velocity and tracking runners were extracting the
normalizer and passing it to the ONNX exporter, which wraps it and
applies it in forward(). Since the actor also applies its own internal
normalizer, observations were being normalized twice.

This was originally fixed in db076d9 but regressed in cc69cec.
Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl
author uses the runner's built-in export (which doesn't pass a
separate normalizer) for v4.0.0.

Fix: pass normalizer=None so the exporter uses Identity, relying on
the actor's internal normalizer.
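The double-normalization scenario described in this commit message can be reproduced with toy classes. `ToyActor` and `ToyExporter` are hypothetical and only mimic the described structure (an actor that normalizes internally, wrapped by an exporter that optionally normalizes again); they are not mjlab's or rsl-rl's classes.

```python
def normalize(obs, mean, std):
    """Element-wise (x - mean) / std."""
    return [(x - mean) / std for x in obs]


class ToyActor:
    """Toy actor that normalizes observations internally, then acts as identity."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def forward(self, obs):
        return normalize(obs, self.mean, self.std)


class ToyExporter:
    """Toy export wrapper: applies an optional normalizer before the actor."""

    def __init__(self, actor, normalizer=None):
        self.actor = actor
        self.normalizer = normalizer if normalizer is not None else (lambda obs: obs)

    def forward(self, obs):
        return self.actor.forward(self.normalizer(obs))


actor = ToyActor(mean=10.0, std=2.0)
obs = [10.0, 12.0]

reference = actor.forward(obs)
with_extra = ToyExporter(actor, lambda o: normalize(o, 10.0, 2.0)).forward(obs)
without_extra = ToyExporter(actor, normalizer=None).forward(obs)

print(reference, with_extra, without_extra)
```

Under these assumptions only the wrapper without an extra normalizer reproduces the actor's own output; passing a second normalizer shifts and scales the observations twice.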
kevinzakka added a commit to mujocolab/mjlab that referenced this pull request Feb 6, 2026
* upgrades rsl-rl-lib from 3.1.0 to 4.0.0

* fixes breaking changes

* makes legacy checkpoint format work

* adds function annotation to the runner

* removes annoying warning by changing policy to actor

* formats

* fixes ty checks

* updates tests

* updates notebook

* updates lab_api changelog

* reverts back the normalization logic

* fixes normalization

* uses pypi version instead of the github tag now that it has been published

* fix double normalization in ONNX export for rsl-rl 4.0.0

In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer
internally. The velocity and tracking runners were extracting the
normalizer and passing it to the ONNX exporter, which wraps it and
applies it in forward(). Since the actor also applies its own internal
normalizer, observations were being normalized twice.

This was originally fixed in db076d9 but regressed in cc69cec.
Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl
author uses the runner's built-in export (which doesn't pass a
separate normalizer) for v4.0.0.

Fix: pass normalizer=None so the exporter uses Identity, relying on
the actor's internal normalizer.

* Revert "fix double normalization in ONNX export for rsl-rl 4.0.0"

This reverts commit 4c30060.

* add test for ONNX exporter normalization correctness

Verifies that _OnnxPolicyExporter with the normalizer passed in
produces identical output to MLPModel.forward, and that omitting
the normalizer skips normalization entirely.

* fix comment style in rl/runner.py

* restore trailing newline in exporter.py

* move local torch import to module level in test_runner

---------

Co-authored-by: Kevin Zakka <kevinarmandzakka@gmail.com>

Missing share_cnn_encoders in config dataclass

The code uses cfg["algorithm"].pop("share_cnn_encoders", None) to configure shared encoders, but share_cnn_encoders is not defined in the RslRlPpoAlgorithmCfg dataclass.
This causes validation errors with strict config systems (e.g., Hydra/OmegaConf) when users attempt to set this parameter in YAML, as it's treated as an unexpected argument.
Consider adding share_cnn_encoders: bool = False to RslRlPpoAlgorithmCfg to support this feature properly and ensure type safety.
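The suggestion can be sketched with a trimmed, hypothetical stand-in for `RslRlPpoAlgorithmCfg` (a plain `dataclass` with one other illustrative field; the real class has more fields and uses Isaac Lab's own decorator). Declaring the field makes it part of the serialized config, so the existing `pop` call finds a typed value.

```python
from dataclasses import asdict, dataclass


@dataclass
class PpoAlgorithmCfg:
    """Hypothetical, trimmed stand-in for RslRlPpoAlgorithmCfg."""

    learning_rate: float = 1.0e-3
    share_cnn_encoders: bool = False  # the field the comment proposes adding


# Serialize the config the way a runner would receive it.
cfg = {"algorithm": asdict(PpoAlgorithmCfg(share_cnn_encoders=True))}

# Same access pattern as quoted in the comment above.
share = cfg["algorithm"].pop("share_cnn_encoders", None)

print(share, cfg["algorithm"])
```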

DavidDobas pushed a commit to DavidDobas/mjlab-hackathon that referenced this pull request Feb 25, 2026
* upgrades rsl-rl-lib from 3.1.0 to 4.0.0

* fixes breaking changes

* makes legacy checkpoint format work

* adds function annotation to the runner

* removes annoying warning by changing policy to actor

* formats

* fixes ty checks

* updates tests

* updates notebook

* updates lab_api changelog

* reverts back the normalization logic

* fixes normalization

* uses pypi version instead of the github tag now that it has been published

* fix double normalization in ONNX export for rsl-rl 4.0.0

In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer
internally. The velocity and tracking runners were extracting the
normalizer and passing it to the ONNX exporter, which wraps it and
applies it in forward(). Since the actor also applies its own internal
normalizer, observations were being normalized twice.

This was originally fixed in db076d9da but regressed in cc69cec2e.
Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl
author uses the runner's built-in export (which doesn't pass a
separate normalizer) for v4.0.0.

Fix: pass normalizer=None so the exporter uses Identity, relying on
the actor's internal normalizer.

* Revert "fix double normalization in ONNX export for rsl-rl 4.0.0"

This reverts commit 4c30060bf60017703386a7e3d171d0364f21815e.

* add test for ONNX exporter normalization correctness

Verifies that _OnnxPolicyExporter with the normalizer passed in
produces identical output to MLPModel.forward, and that omitting
the normalizer skips normalization entirely.

* fix comment style in rl/runner.py

* restore trailing newline in exporter.py

* move local torch import to module level in test_runner

---------

Co-authored-by: Kevin Zakka <kevinarmandzakka@gmail.com>