Update RSL-RL configs to work with new version 4.0#4379
Update RSL-RL configs to work with new version 4.0#4379ClemensSchwarke wants to merge 14 commits intoisaac-sim:mainfrom
Conversation
11521b9 to
dea941e
Compare
2ef7fc8 to
f3061a4
Compare
Greptile OverviewGreptile Summary
Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant U as User CLI
participant P as play.py / train.py
participant CA as cli_args.py
participant H as Hydra/Registry
participant RL as isaaclab_rl.rsl_rl cfgs
participant R as rsl_rl Runner
U->>P: launch with --task/--agent
P->>H: load env_cfg + agent_cfg (entry_point)
P->>CA: update_rsl_rl_cfg(agent_cfg, args)
P->>CA: handle_deprecated_rsl_rl_cfg(agent_cfg, installed_version)
CA->>RL: map legacy policy<->model cfgs
P->>R: construct Runner(env, agent_cfg.to_dict())
R->>R: load checkpoint (play) / learn (train)
alt rsl-rl >= 4.0.0 (play)
P->>R: export_policy_to_jit / export_policy_to_onnx
else rsl-rl < 4.0.0 (play)
P->>RL: export_policy_as_jit / export_policy_as_onnx
end
|
| agent_cfg.empirical_normalization, type(MISSING) | ||
| ): | ||
| print( |
There was a problem hiding this comment.
Incorrect MISSING sentinel check
dataclasses.MISSING is a sentinel object, but the code checks isinstance(x, type(MISSING)), which is effectively isinstance(x, object) and will always be true for any normal value. This makes the deprecated-config logic behave incorrectly (e.g., treating policy as missing even when it's set) and can raise the "policy required" ValueError for valid configs. Use identity checks instead (e.g., agent_cfg.policy is MISSING) consistently throughout this function.
| obs, _, dones, _ = env.step(actions) | ||
| # reset recurrent states for episodes that have terminated | ||
| policy_nn.reset(dones) | ||
| if version.parse(installed_version) >= version.parse("4.0.0"): | ||
| policy.reset(dones) |
There was a problem hiding this comment.
Unconditional reset call
For rsl-rl >= 4.0.0, this calls policy.reset(dones) unconditionally, but policy = runner.get_inference_policy(...) may return a non-recurrent policy callable that does not implement reset(). That will crash during play for non-recurrent checkpoints. Consider guarding with hasattr(policy, "reset") (or only resetting when using a recurrent runner/config) to avoid AttributeError.
Additional Comments (1)
|
In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer internally. The velocity and tracking runners were extracting the normalizer and passing it to the ONNX exporter, which wraps it and applies it in forward(). Since the actor also applies its own internal normalizer, observations were being normalized twice. This was originally fixed in db076d9 but regressed in cc69cec. Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl author uses the runner's built-in export (which doesn't pass a separate normalizer) for v4.0.0. Fix: pass normalizer=None so the exporter uses Identity, relying on the actor's internal normalizer.
In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer internally. The velocity and tracking runners were extracting the normalizer and passing it to the ONNX exporter, which wraps it and applies it in forward(). Since the actor also applies its own internal normalizer, observations were being normalized twice. This was originally fixed in db076d9 but regressed in cc69cec. Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl author uses the runner's built-in export (which doesn't pass a separate normalizer) for v4.0.0. Fix: pass normalizer=None so the exporter uses Identity, relying on the actor's internal normalizer.
* upgrades rsl-rl-lib from 3.1.0 to 4.0.0 * fixes breaking changes * makes legacy checkpoint format work * adds function annotation to the runner * removes annoying warning by changing policy to actor * formats * fixes ty checks * updates tests * updates notebook * updates lab_api changelog * reverts back the normalization logic * fixes normalization * uses pypi version instead of the github tag now that it has been published * fix double normalization in ONNX export for rsl-rl 4.0.0 In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer internally. The velocity and tracking runners were extracting the normalizer and passing it to the ONNX exporter, which wraps it and applies it in forward(). Since the actor also applies its own internal normalizer, observations were being normalized twice. This was originally fixed in db076d9 but regressed in cc69cec. Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl author uses the runner's built-in export (which doesn't pass a separate normalizer) for v4.0.0. Fix: pass normalizer=None so the exporter uses Identity, relying on the actor's internal normalizer. * Revert "fix double normalization in ONNX export for rsl-rl 4.0.0" This reverts commit 4c30060. * add test for ONNX exporter normalization correctness Verifies that _OnnxPolicyExporter with the normalizer passed in produces identical output to MLPModel.forward, and that omitting the normalizer skips normalization entirely. * fix comment style in rl/runner.py * restore trailing newline in exporter.py * move local torch import to module level in test_runner --------- Co-authored-by: Kevin Zakka <kevinarmandzakka@gmail.com>
There was a problem hiding this comment.
Missing share_cnn_encoders in config dataclass
The code uses cfg["algorithm"].pop("share_cnn_encoders", None) to configure shared encoders, but share_cnn_encoders is not defined in the RslRlPpoAlgorithmCfg dataclass.
This causes validation errors with strict config systems (e.g., Hydra/OmegaConf) when users attempt to set this parameter in YAML, as it's treated as an unexpected argument.
Consider adding share_cnn_encoders: bool = False to RslRlPpoAlgorithmCfg to support this feature properly and ensure type safety.
* upgrades rsl-rl-lib from 3.1.0 to 4.0.0 * fixes breaking changes * makes legacy checkpoint format work * adds function annotation to the runner * removes annoying warning by changing policy to actor * formats * fixes ty checks * updates tests * updates notebook * updates lab_api changelog * reverts back the normalization logic * fixes normalization * uses pypi version instead of the github tag now that it has been published * fix double normalization in ONNX export for rsl-rl 4.0.0 In rsl-rl 4.0.0, the Actor (MLPModel) contains the obs normalizer internally. The velocity and tracking runners were extracting the normalizer and passing it to the ONNX exporter, which wraps it and applies it in forward(). Since the actor also applies its own internal normalizer, observations were being normalized twice. This was originally fixed in db076d9da but regressed in cc69cec2e. Confirmed by referencing isaac-sim/IsaacLab#4379, where the rsl-rl author uses the runner's built-in export (which doesn't pass a separate normalizer) for v4.0.0. Fix: pass normalizer=None so the exporter uses Identity, relying on the actor's internal normalizer. * Revert "fix double normalization in ONNX export for rsl-rl 4.0.0" This reverts commit 4c30060bf60017703386a7e3d171d0364f21815e. * add test for ONNX exporter normalization correctness Verifies that _OnnxPolicyExporter with the normalizer passed in produces identical output to MLPModel.forward, and that omitting the normalizer skips normalization entirely. * fix comment style in rl/runner.py * restore trailing newline in exporter.py * move local torch import to module level in test_runner --------- Co-authored-by: Kevin Zakka <kevinarmandzakka@gmail.com>
This PR updates the rsl-rl config classes to be compatible with version 4.0 and 4.1. It also adds a configuration handling function to ensure that old configs work with version 4.0/4.1 and adapts
train.pyandplay.pyaccordingly. Lastly, it updates the config classes for ANYmal D locomotion for reference and adds recurrent configs.The main change for 4.0 is that the policy config is split into seperate actor and critic / student and teacher configs. For more details see the Release Notes.
The main change for 4.1 is that the noise distribution is now configured via the
distribution_cfgto allow for arbitrary distributions.Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfile