Update RSL-RL configs to work with new version 4.0 by ClemensSchwarke · Pull Request #4379 · isaac-sim/IsaacLab

ClemensSchwarke · 2026-01-13T16:51:31Z

This PR updates the rsl-rl config classes to be compatible with version 4.0 and 4.1. It also adds a configuration handling function to ensure that old configs work with version 4.0/4.1 and adapts train.py and play.py accordingly. Lastly, it updates the config classes for ANYmal D locomotion for reference and adds recurrent configs.

The main change for 4.0 is that the policy config is split into separate actor/critic (PPO) and student/teacher (distillation) configs. For more details, see the Release Notes.

The main change for 4.1 is that the noise distribution is now configured via the distribution_cfg to allow for arbitrary distributions.
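A minimal sketch of what a compatibility handler for these two changes has to do, written against plain dicts so it runs without IsaacLab or rsl-rl installed. Assumptions: the legacy keys (`actor_hidden_dims`, `critic_hidden_dims`, `activation`, `init_noise_std`, `noise_std_type`, `*_obs_normalization`) follow the pre-4.0 ActorCritic policy config, while the target `actor`/`critic` layout and the keys inside `distribution_cfg` are hypothetical. This is not `handle_deprecated_rsl_rl_cfg` itself.

```python
import copy


def _version_tuple(version: str) -> tuple[int, ...]:
    # Simplified parser for plain "X.Y.Z" strings; the scripts use packaging.version.
    return tuple(int(part) for part in version.split(".")[:3])


def translate_legacy_policy_cfg(agent_cfg: dict, installed_version: str) -> dict:
    """Hypothetical sketch: map a legacy 'policy' entry to actor/critic entries."""
    cfg = copy.deepcopy(agent_cfg)  # never mutate the caller's config
    new_style = "actor" in cfg or "critic" in cfg

    if _version_tuple(installed_version) < (4, 0, 0):
        # Old rsl-rl cannot consume the split configs, so fail loudly.
        if new_style:
            raise ValueError("actor/critic configs require rsl-rl >= 4.0.0")
        return cfg  # legacy config is passed through unchanged

    if new_style:
        return cfg  # already in the 4.x layout

    policy = cfg.pop("policy", None)
    if policy is None:
        raise ValueError("Either 'policy' (legacy) or 'actor'/'critic' must be set.")

    cfg["actor"] = {
        "hidden_dims": policy["actor_hidden_dims"],
        "activation": policy["activation"],
        "obs_normalization": policy.get("actor_obs_normalization", False),
        # 4.1: action noise is described by a distribution config (key names assumed).
        "distribution_cfg": {
            "init_std": policy.get("init_noise_std", 1.0),
            "std_type": policy.get("noise_std_type", "scalar"),
        },
    }
    cfg["critic"] = {
        "hidden_dims": policy["critic_hidden_dims"],
        "activation": policy["activation"],
        "obs_normalization": policy.get("critic_obs_normalization", False),
    }
    return cfg
```

Deep-copying first keeps the function side-effect free, so the same legacy config can be translated for different installed versions in tests.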

Checklist

I have run the pre-commit checks with ./isaaclab.sh --format
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file


greptile-apps · 2026-02-06T07:38:17Z

Greptile Overview

Greptile Summary

Updates IsaacLab’s RSL-RL config schema for rsl-rl v4.0.0 by splitting legacy policy into explicit model configs (actor/critic, student/teacher) and adding recurrent variants.
Adds a compatibility shim (handle_deprecated_rsl_rl_cfg) to translate old/new config fields depending on installed rsl-rl version, and wires it into train.py/play.py.
Updates ANYmal-D locomotion reference configs and gym registry kwargs to expose recurrent PPO/distillation entry points.
Pins rsl-rl-lib extra dependency to 4.0.0 and adapts play-time export to use new runner export APIs when available.

Confidence Score: 2/5

This PR is not safe to merge until a couple of runtime-breaking issues are fixed.
Two changes are likely to cause immediate runtime failures: (1) handle_deprecated_rsl_rl_cfg uses isinstance(x, type(MISSING)) to detect missing config fields, which will evaluate true for almost any value and breaks the compatibility logic; (2) play.py unconditionally calls policy.reset() for rsl-rl >= 4.0.0, which will fail for non-recurrent inference policies that don’t implement reset(). Additionally, RslRlCNNModelCfg lacks the @configclass decorator, which can break config serialization if used.
scripts/reinforcement_learning/rsl_rl/cli_args.py, scripts/reinforcement_learning/rsl_rl/play.py, source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py

Important Files Changed

Filename	Overview
scripts/reinforcement_learning/rsl_rl/cli_args.py	Adds handle_deprecated_rsl_rl_cfg() to translate deprecated runner/policy fields across rsl-rl <4 and >=4; likely has incorrect MISSING detection and may mis-handle optimizer removal.
scripts/reinforcement_learning/rsl_rl/play.py	Adds rsl-rl version check, runs deprecated-config handler, and uses new runner export APIs for >=4; changes recurrent reset to call policy.reset() which may not exist for non-recurrent policies.
scripts/reinforcement_learning/rsl_rl/train.py	Invokes deprecated-config handler before setting env seed; otherwise unchanged; relies on installed_version defined earlier in the file.
source/isaaclab_rl/isaaclab_rl/rsl_rl/distillation_cfg.py	Introduces explicit student/teacher model cfgs in distillation runner while keeping deprecated policy cfg for backward compatibility; structural reordering only.
source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py	Splits old policy cfg into new model cfgs (MLP/RNN/CNN) and adds optimizer to PPO algo; keeps deprecated ActorCritic configs; potential missing @configclass on CNN model.
source/isaaclab_rl/setup.py	Bumps rsl-rl extra dependency pin from 3.1.2 to 4.0.0.
source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/__init__.py	Registers additional gym kwargs for recurrent RSL-RL PPO/distillation runner entry points.
source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/agents/rsl_rl_distillation_cfg.py	Updates ANYmal-D distillation runner to new student/teacher model cfgs and adds recurrent variant; obs_groups keys updated to student/teacher.
source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/agents/rsl_rl_ppo_cfg.py	Updates ANYmal-D PPO runner to new actor/critic model cfgs and adds recurrent runner config; adjusts obs_groups accordingly.

Sequence Diagram

sequenceDiagram
    participant U as User CLI
    participant P as play.py / train.py
    participant CA as cli_args.py
    participant H as Hydra/Registry
    participant RL as isaaclab_rl.rsl_rl cfgs
    participant R as rsl_rl Runner

    U->>P: launch with --task/--agent
    P->>H: load env_cfg + agent_cfg (entry_point)
    P->>CA: update_rsl_rl_cfg(agent_cfg, args)
    P->>CA: handle_deprecated_rsl_rl_cfg(agent_cfg, installed_version)
    CA->>RL: map legacy policy<->model cfgs
    P->>R: construct Runner(env, agent_cfg.to_dict())
    R->>R: load checkpoint (play) / learn (train)
    alt rsl-rl >= 4.0.0 (play)
        P->>R: export_policy_to_jit / export_policy_to_onnx
    else rsl-rl < 4.0.0 (play)
        P->>RL: export_policy_as_jit / export_policy_as_onnx
    end
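The version branch at the end of the diagram can be sketched as a small dispatch function. The four export names come from the diagram; the stub runner and stub helpers are hypothetical stand-ins, and their real signatures are deliberately not reproduced or called with real arguments.

```python
from typing import Callable


def _version_tuple(version: str) -> tuple[int, ...]:
    # Simplified parser for plain "X.Y.Z" strings; play.py itself uses packaging.version.
    return tuple(int(part) for part in version.split(".")[:3])


def select_exporters(
    installed_version: str,
    runner: object,
    legacy_jit: Callable,
    legacy_onnx: Callable,
) -> tuple[Callable, Callable]:
    """Pick the JIT/ONNX export callables following the version branch."""
    if _version_tuple(installed_version) >= (4, 0, 0):
        # rsl-rl >= 4.0.0: the runner exposes its own export methods.
        return runner.export_policy_to_jit, runner.export_policy_to_onnx
    # rsl-rl < 4.0.0: use the isaaclab_rl helpers supplied by the caller.
    return legacy_jit, legacy_onnx


class RunnerStub:
    """Hypothetical stand-in for an rsl-rl >= 4.0.0 runner."""

    def export_policy_to_jit(self, *args, **kwargs):
        return "runner-jit"

    def export_policy_to_onnx(self, *args, **kwargs):
        return "runner-onnx"


def legacy_jit_stub(*args, **kwargs):
    return "legacy-jit"


def legacy_onnx_stub(*args, **kwargs):
    return "legacy-onnx"
```

Passing the legacy helpers in as arguments keeps the version decision in one place and avoids importing code paths that only exist for one rsl-rl major version.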

greptile-apps

9 files reviewed, 3 comments


greptile-apps · 2026-02-06T07:38:24Z

scripts/reinforcement_learning/rsl_rl/cli_args.py

+            agent_cfg.empirical_normalization, type(MISSING)
+        ):
+            print(


Incorrect MISSING sentinel check

dataclasses.MISSING is a sentinel object, but the code checks isinstance(x, type(MISSING)), which is effectively isinstance(x, object) and will always be true for any normal value. This makes the deprecated-config logic behave incorrectly (e.g., treating policy as missing even when it's set) and can raise the "policy required" ValueError for valid configs. Use identity checks instead (e.g., agent_cfg.policy is MISSING) consistently throughout this function.
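A minimal sketch of the identity check the comment recommends, assuming `MISSING` in cli_args.py is `dataclasses.MISSING` (the import is not shown in the diff). For that sentinel, `type(MISSING)` is a private sentinel class with a single instance, so the last check shows that the `isinstance` form also matches only the sentinel. It is worth confirming which `MISSING` the file imports before treating the existing check as broken. `SimpleNamespace` stands in for a real config object.

```python
from dataclasses import MISSING
from types import SimpleNamespace


def is_missing(value: object) -> bool:
    """Return True only for the dataclasses.MISSING sentinel (identity check)."""
    return value is MISSING


# Hypothetical config: 'policy' left unset, 'empirical_normalization' set.
cfg = SimpleNamespace(policy=MISSING, empirical_normalization=True)

print(is_missing(cfg.policy))                   # True
print(is_missing(cfg.empirical_normalization))  # False

# With dataclasses.MISSING, the isinstance form is also False for ordinary values.
print(isinstance(cfg.empirical_normalization, type(MISSING)))  # False
```

The `is` form stays correct regardless of how the sentinel's class is defined, which is why it is the idiomatic choice.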

greptile-apps · 2026-02-06T07:38:26Z

scripts/reinforcement_learning/rsl_rl/play.py

            obs, _, dones, _ = env.step(actions)
            # reset recurrent states for episodes that have terminated
-            policy_nn.reset(dones)
+            if version.parse(installed_version) >= version.parse("4.0.0"):
+                policy.reset(dones)


Unconditional reset call

For rsl-rl >= 4.0.0, this calls policy.reset(dones) unconditionally, but policy = runner.get_inference_policy(...) may return a non-recurrent policy callable that does not implement reset(). That will crash during play for non-recurrent checkpoints. Consider guarding with hasattr(policy, "reset") (or only resetting when using a recurrent runner/config) to avoid AttributeError.
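The guard suggested above can be sketched with `getattr` so that plain callables without `reset()` are skipped. Both policies below are hypothetical stand-ins; whether the real object returned by `get_inference_policy` exposes `reset` for non-recurrent models in rsl-rl 4.x is not established here.

```python
class RecurrentPolicyStub:
    """Hypothetical recurrent inference policy that exposes reset(dones)."""

    def __init__(self):
        self.reset_calls = []

    def __call__(self, obs):
        return obs

    def reset(self, dones):
        # A real recurrent policy would zero hidden states for done envs.
        self.reset_calls.append(list(dones))


def feedforward_policy(obs):
    """Hypothetical non-recurrent inference policy: a plain callable, no reset()."""
    return obs


def reset_if_supported(policy, dones) -> bool:
    """Call policy.reset(dones) only when it exists; report whether it ran."""
    reset_fn = getattr(policy, "reset", None)
    if callable(reset_fn):
        reset_fn(dones)
        return True
    return False
```

In the play loop this would replace the bare `policy.reset(dones)` call after `env.step(actions)`.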

greptile-apps · 2026-02-06T07:38:37Z

Additional Comments (1)

source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py
Missing @configclass decorator

RslRlCNNModelCfg is declared without @configclass, unlike the other config dataclasses. If this is intended to be used like the other cfg types (and nested CNNCfg already is), it won't get the same dataclass/configclass behavior and may break .to_dict()/Hydra/config serialization paths when selected. Add @configclass to RslRlCNNModelCfg.
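A minimal illustration of why the decorator matters, using the standard-library `@dataclass` as a stand-in for IsaacLab's `@configclass` (both produce dataclasses). The class names and fields are hypothetical; plain dataclasses need `default_factory` for mutable defaults, so the sketch uses it. `asdict` recurses into nested dataclasses but refuses a non-dataclass instance.

```python
from dataclasses import asdict, dataclass, field, is_dataclass


@dataclass
class CNNCfgSketch:
    """Hypothetical nested CNN config (decorated, like CNNCfg)."""

    output_channels: list[int] = field(default_factory=lambda: [32, 64])


@dataclass
class DecoratedModelCfg:
    """Model config with the decorator: serializes recursively."""

    hidden_dims: list[int] = field(default_factory=lambda: [256, 128])
    cnn_cfg: CNNCfgSketch = field(default_factory=CNNCfgSketch)


class UndecoratedModelCfg:
    """Same fields without the decorator: just a plain class."""

    hidden_dims = [256, 128]
    cnn_cfg = CNNCfgSketch()


def to_dict(cfg) -> dict:
    """Serialize a config, failing clearly if it is not a dataclass instance."""
    if not is_dataclass(cfg):
        raise TypeError(f"{type(cfg).__name__} is not a dataclass; missing decorator?")
    return asdict(cfg)
```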


* upgrades rsl-rl-lib from 3.1.0 to 4.0.0
* fixes breaking changes
* makes legacy checkpoint format work
* adds function annotation to the runner
* removes annoying warning by changing policy to actor
* formats
* fixes ty checks
* updates tests
* updates notebook
* updates lab_api changelog
* reverts back the normalization logic
* fixes normalization
* uses pypi version instead of the github tag now that it has been published
* fix double normalization in ONNX export for rsl-rl 4.0.0

  In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer internally. The velocity and tracking runners were extracting the normalizer and passing it to the ONNX exporter, which wraps it and applies it in forward(). Since the actor also applies its own internal normalizer, observations were being normalized twice. This was originally fixed in db076d9 but regressed in cc69cec. Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl author uses the runner's built-in export (which doesn't pass a separate normalizer) for v4.0.0. Fix: pass normalizer=None so the exporter uses Identity, relying on the actor's internal normalizer.
* Revert "fix double normalization in ONNX export for rsl-rl 4.0.0"

  This reverts commit 4c30060.
* add test for ONNX exporter normalization correctness

  Verifies that _OnnxPolicyExporter with the normalizer passed in produces identical output to MLPModel.forward, and that omitting the normalizer skips normalization entirely.
* fix comment style in rl/runner.py
* restore trailing newline in exporter.py
* move local torch import to module level in test_runner

---------

Co-authored-by: Kevin Zakka <kevinarmandzakka@gmail.com>
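The double-normalization failure mode described in the commit message can be reproduced with scalar arithmetic. All three classes are hypothetical stand-ins (an identity "MLP" keeps the numbers visible); the later revert and added test in the same commit list indicate the real exporter's wiring differs, so this only illustrates why applying an actor-owned normalizer a second time changes the output.

```python
class NormalizerSketch:
    """Hypothetical empirical normalizer: (obs - mean) / std, element-wise."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def __call__(self, obs):
        return [(o - m) / s for o, m, s in zip(obs, self.mean, self.std)]


class ActorSketch:
    """Hypothetical actor that owns its normalizer, as described for v4 MLPModel."""

    def __init__(self, normalizer):
        self.normalizer = normalizer

    def forward(self, obs):
        # Identity "network" so only the normalization is visible.
        return self.normalizer(obs)


class ExporterSketch:
    """Hypothetical exporter that optionally normalizes before calling the actor."""

    def __init__(self, actor, normalizer=None):
        self.actor = actor
        # normalizer=None falls back to identity, as in the described fix.
        self.normalizer = normalizer if normalizer is not None else (lambda obs: obs)

    def forward(self, obs):
        return self.actor.forward(self.normalizer(obs))


norm = NormalizerSketch(mean=[1.0], std=[2.0])
actor = ActorSketch(norm)
obs = [5.0]
print(actor.forward(obs))                        # [2.0]
print(ExporterSketch(actor).forward(obs))        # [2.0]  (normalized once)
print(ExporterSketch(actor, norm).forward(obs))  # [0.5]  (normalized twice)
```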

Wenkai-Dong · 2026-02-15T23:04:54Z

source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py

Missing share_cnn_encoders in config dataclass

The code uses cfg["algorithm"].pop("share_cnn_encoders", None) to configure shared encoders, but share_cnn_encoders is not defined in the RslRlPpoAlgorithmCfg dataclass.
This causes validation errors with strict config systems (e.g., Hydra/OmegaConf) when users attempt to set this parameter in YAML, as it's treated as an unexpected argument.
Consider adding share_cnn_encoders: bool = False to RslRlPpoAlgorithmCfg to support this feature properly and ensure type safety.
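A minimal sketch of the suggested fix, using a plain dataclass in place of IsaacLab's `@configclass`. `PpoAlgorithmCfgSketch` and `split_algorithm_kwargs` are hypothetical names; the pop pattern mirrors the `cfg["algorithm"].pop("share_cnn_encoders", None)` call quoted above. Declaring the field lets a strict loader accept the key, while a misspelled key still fails.

```python
from dataclasses import asdict, dataclass


@dataclass
class PpoAlgorithmCfgSketch:
    """Hypothetical, trimmed stand-in for RslRlPpoAlgorithmCfg."""

    learning_rate: float = 1.0e-3
    # Proposed in the review: declare the option so strict loaders accept it.
    share_cnn_encoders: bool = False


def split_algorithm_kwargs(cfg: PpoAlgorithmCfgSketch) -> tuple[dict, bool]:
    """Mimic the pop pattern: remove the flag from the kwargs and return it separately."""
    kwargs = asdict(cfg)
    share = kwargs.pop("share_cnn_encoders", None)
    return kwargs, bool(share)


# Overrides as they might arrive from a YAML/Hydra layer.
cfg = PpoAlgorithmCfgSketch(**{"share_cnn_encoders": True})
kwargs, share = split_algorithm_kwargs(cfg)
```

Because the flag is popped before the remaining kwargs are forwarded, the algorithm constructor never sees an argument it does not expect.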


github-actions bot added the enhancement (New feature or request) and isaac-lab (Related to Isaac Lab team) labels on Jan 13, 2026

update configs

dea941e

ClemensSchwarke changed the title from "Update RSL-RL configs to work with new version 3.4" to "Update RSL-RL configs to work with new version 4.0" on Jan 14, 2026

ClemensSchwarke force-pushed the feature/rsl_rl_3.4_config_updates branch from 11521b9 to dea941e on January 14, 2026 09:30

ClemensSchwarke mentioned this pull request Jan 14, 2026

Non recurrent critic leggedrobotics/rsl_rl#155

Closed

ClemensSchwarke added 2 commits January 14, 2026 13:39

add test configs

ee536db

add obs_groups

3b72783

ClemensSchwarke mentioned this pull request Jan 23, 2026

Separates actor and critic leggedrobotics/rsl_rl#159

Merged

Mayankm96 force-pushed the main branch 2 times, most recently from 2ef7fc8 to f3061a4 on January 28, 2026 16:16

ClemensSchwarke added 4 commits February 4, 2026 17:42

add old configs again for compatibility and add deprecated config resolver

5f5559e

add deprecated header

cdb00d7

throw error for newer configs with old rsl-rl versions

e51e1ab

update play script for compatibility with all rsl-rl versions

95e83e1

ClemensSchwarke mentioned this pull request Feb 5, 2026

Distributed training hangs in v3.2.0 when using wandb leggedrobotics/rsl_rl#150

Closed

update default rsl_rl

5508066

ClemensSchwarke marked this pull request as ready for review February 6, 2026 07:33

ClemensSchwarke requested review from Mayankm96 and ooctipus as code owners February 6, 2026 07:33

greptile-apps bot reviewed Feb 6, 2026


add missing configclass decorator

9b50f08

update version

d40877c

ClemensSchwarke mentioned this pull request Feb 11, 2026

ONNX Export Fails for ActorCriticCNN with Mixed 1D + CNN Observations leggedrobotics/rsl_rl#176

Open

Wenkai-Dong reviewed Feb 15, 2026


ClemensSchwarke added 3 commits February 16, 2026 09:45

add share_cnn_encoders option

39793d0

add distribution config

20a7cc8

remove non-needed attributes

967e846

delete old parameters properly

d1504c2

ClemensSchwarke wants to merge 14 commits into isaac-sim:main from ClemensSchwarke:feature/rsl_rl_3.4_config_updates
