
[Newton] Migrate more envs and mdps to warp#4690

Draft
hujc7 wants to merge 9 commits into isaac-sim:dev/newton from hujc7:dev-newton-warp-mdp-mig

Conversation

hujc7 commented Feb 23, 2026

Summary

Warp-first manager-based RL environment infrastructure and MDP term migration for Newton.

Infrastructure (commits 1-5, from dependency branch)

  • Warp-first ManagerBasedRLEnvWarp with ManagerCallSwitch for per-manager execution mode control (stable / warp / warp-captured)
  • Warp-first manager implementations: ActionManager, ObservationManager, EventManager, RewardManager, TerminationManager
  • SceneEntityCfg with body_ids_wp, joint_ids_wp, joint_mask for warp kernel dispatch
  • warp_capturable decorator and is_warp_capturable check for automatic CUDA graph capture fallback
  • manager_call_max_mode per-env capture ceiling (min(configured_mode, cap))
  • Configurable Scene_write_data_to_sim capture mode (was hardcoded non-captured)
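As a rough illustration of the capture ceiling in the list above, the effective mode can be read as a clamp of the configured mode. This is a minimal sketch, not the code in this PR: the `CallMode` enum, the value `STABLE = 0`, the helper `effective_mode`, and the dictionary keys are assumptions (the test plan only names mode=1 as warp-only and mode=2 as warp-capture).

```python
from enum import IntEnum


class CallMode(IntEnum):
    # WARP = 1 and WARP_CAPTURED = 2 follow the PR's test plan wording;
    # STABLE = 0 is an assumption made for this sketch.
    STABLE = 0
    WARP = 1
    WARP_CAPTURED = 2


def effective_mode(configured_mode: int, max_mode: int) -> CallMode:
    """Clamp a manager's configured mode to the env's capture ceiling."""
    return CallMode(min(configured_mode, max_mode))


# Hypothetical per-manager configuration for an env whose ceiling is WARP.
configured = {"ActionManager": 2, "RewardManager": 2, "EventManager": 0}
resolved = {
    name: effective_mode(mode, CallMode.WARP) for name, mode in configured.items()
}
print(resolved)
```

With a ceiling of `WARP`, managers configured for capture drop to warp-only, while managers already below the ceiling keep their configured mode.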

MDP terms (commit 6)

Warp-first observation, reward, termination, event, and action terms verified against torch baselines:

  • Observations: base_pos_z, base_lin_vel, base_ang_vel, projected_gravity, joint_pos/vel/rel, last_action, generated_commands
  • Rewards: is_alive, is_terminated, lin_vel_z_l2, ang_vel_xy_l2, flat_orientation_l2, joint_torques_l2, joint_vel_l1/l2, joint_acc_l2, joint_deviation_l1, joint_pos_limits, action_rate_l2, action_l2, undesired_contacts, track_lin_vel_xy_exp, track_ang_vel_z_exp
  • Terminations: time_out, root_height_below_minimum, joint_pos_out_of_manual_limit, illegal_contact
  • Events: randomize_rigid_body_com, apply_external_force_torque, reset_root_state_uniform, reset_joints_by_scale/offset, push_by_setting_velocity
  • Actions: JointPositionAction, JointEffortAction

Terms accessing ArticulationData lazy TimestampedWarpBuffer properties (Tier 2) are marked @warp_capturable(False) to prevent stale data under CUDA graph capture.
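The tagging and fallback described above could look roughly like the following. This is an illustrative re-implementation under stated assumptions, not the PR's code: the attribute name `_warp_capturable`, the default of treating untagged terms as capturable, the helper `resolve_manager_mode`, and the two example terms are all hypothetical. Whether the real fallback is applied per term or per manager is not stated in this thread; the sketch falls back for the whole manager.

```python
WARP = 1           # warp-only (mode=1 in the test plan)
WARP_CAPTURED = 2  # warp-capture (mode=2 in the test plan)


def warp_capturable(capturable=True):
    """Tag an MDP term with a capturability flag."""
    def decorator(func):
        func._warp_capturable = capturable  # hypothetical attribute name
        return func
    return decorator


def is_warp_capturable(func):
    # Assumption: terms without a tag are treated as capturable.
    return getattr(func, "_warp_capturable", True)


def resolve_manager_mode(requested_mode, terms):
    """Fall back from captured to warp-only if any term opts out of capture."""
    if requested_mode == WARP_CAPTURED and not all(
        is_warp_capturable(term) for term in terms
    ):
        return WARP
    return requested_mode


@warp_capturable(False)
def lazy_property_term(env):
    """Hypothetical term that reads a lazily derived (Tier 2) buffer."""
    return None


def plain_term(env):
    """Hypothetical term that only reads eagerly updated data."""
    return None


print(resolve_manager_mode(WARP_CAPTURED, [plain_term, lazy_property_term]))
print(resolve_manager_mode(WARP_CAPTURED, [plain_term]))
```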

Tested env configs (commit 7)

14 envs with training parity verified (warp-only and warp-capture vs torch baseline):

  • Classic: Cartpole, Humanoid, Ant
  • Locomotion velocity (flat): Anymal-B/C/D, G1-v0/v1, H1, Cassie, Unitree A1/Go1/Go2
  • Manipulation: Reach-Franka

Per-robot gym registrations, flat env cfgs, and task-specific MDP terms (humanoid observations/rewards, velocity rewards/terminations/curriculums, reach rewards).

Disabled envs (included but registration commented out)

  • Isaac-Velocity-Rough-Anymal-D-Warp-v0: requires isaaclab_physx (not yet on dev/newton)
  • Isaac-Reach-UR10-Warp-v0: USD asset composition error (broken asset)

Documentation

  • WARP_MIGRATION_GAP_ANALYSIS.md: Full MDP term catalog, per-task usage matrix, migration patterns
  • GRAPH_CAPTURE_MIGRATION.md: ArticulationData Tier 1/2/3 property analysis, capture failure mechanism, proposed materialize_derived() fix
  • MANAGER_TEST_COVERAGE.md: Capturability analysis
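To make the stale-data concern concrete, here is a toy model of one way a lazily refreshed buffer can go stale under graph replay. It rests on one general fact about CUDA graphs: replay re-runs only the recorded device work, not the host-side Python that decided what to launch. Everything else is an assumption for illustration: `ToyGraph`, `ToyArticulationData`, and `observation_term` are hypothetical names, and whether this matches the exact mechanism documented in GRAPH_CAPTURE_MIGRATION.md is not confirmed by this thread.

```python
class ToyGraph:
    """Stand-in for a captured graph: records kernels, replays only those."""

    def __init__(self):
        self.recorded = []

    def launch(self, kernel):
        self.recorded.append(kernel)  # capture records; nothing runs yet

    def replay(self):
        for kernel in self.recorded:
            kernel()


def eager_launch(kernel):
    kernel()  # outside capture, kernels run immediately


class ToyArticulationData:
    """Derived value refreshed lazily, guarded by a host-side timestamp check."""

    def __init__(self):
        self.sim_step = 0
        self.raw = 1.0
        self.derived = 0.0
        self._derived_step = -1

    def ensure_derived(self, launch):
        # This check is plain Python, so it is never part of the graph.
        if self._derived_step < self.sim_step:
            launch(self._update_derived)
            self._derived_step = self.sim_step

    def _update_derived(self):
        self.derived = 2.0 * self.raw


data = ToyArticulationData()
outputs = []


def observation_term(launch):
    data.ensure_derived(launch)
    launch(lambda: outputs.append(data.derived))


observation_term(eager_launch)   # warm-up: derived is refreshed, 2.0 is read
graph = ToyGraph()
observation_term(graph.launch)   # capture: buffer looks fresh, update not recorded
data.raw, data.sim_step = 5.0, 1  # simulation advances
graph.replay()                   # only the read replays, so it sees the old value
print(outputs, "fresh value would be", 2.0 * data.raw)
```

In this toy model the stale read goes away if the update kernel is recorded unconditionally, or if the consumer computes from `raw` directly.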

Test plan

  • Warp parity tests (3 test files: action, MDP, new terms)
  • Training parity: warp-only (mode=1) — all 14 envs within ±5% reward of torch baseline
  • Training parity: warp-capture (mode=2) — all 14 envs within ±5% reward after @warp_capturable(False) fix
  • Rough terrain variants (blocked by isaaclab_physx dependency)
  • Isaac-Reach-UR10-Warp-v0 (blocked by broken USD asset)

Dependencies

  • Commits 1-5 are from dev-newton-warp-mig-manager-based (pending merge into dev/newton)

Introduces ManagerBasedEnvWarp and ManagerBasedRLEnvWarp with
ManagerCallSwitch for per-manager stable/warp/captured mode selection.
Includes Newton articulation data extensions, RL wrapper adaptations,
and train script integration.

Warp-first implementations of all 7 managers (action, observation,
reward, termination, event, command, recorder) with mask-based reset
for CUDA graph compatibility. Includes MDP term library (observations,
rewards, terminations, events, actions), IO descriptors, and utility
modules (noise, modifiers, circular buffers, warp kernels).

Cartpole env config for the warp manager-based RL env, registered as
Isaac-Cartpole-Warp-v0. Includes warp-first custom reward term
(joint_pos_target_l2).

… updates

- Add manager_call_max_mode field for per-env capture ceiling (min(mode, cap))
- Support dict input for manager_call_config (in addition to JSON string)
- Add "Scene" to MANAGER_NAMES for configurable Scene_write_data_to_sim mode
- Remove hardcoded WARP_NOT_CAPTURED override from Scene_write_data_to_sim
- Add warp_capturable decorator and is_warp_capturable check for mode=2 fallback
- Update managers: action, observation, event with warp-first improvements
- Update scene_entity_cfg with body_ids_wp resolution
- Update train.py CLI arg handling

Warp-first observation, reward, termination, event, and action terms
referenced by the 14 verified training-parity envs.

Observations: base_pos_z, base_lin_vel, base_ang_vel, projected_gravity,
  joint_pos, joint_pos_rel, joint_pos_limit_normalized, joint_vel,
  joint_vel_rel, last_action, generated_commands
Rewards: is_alive, is_terminated, lin_vel_z_l2, ang_vel_xy_l2,
  flat_orientation_l2, joint_torques_l2, joint_vel_l1, joint_vel_l2,
  joint_acc_l2, joint_deviation_l1, joint_pos_limits, action_rate_l2,
  action_l2, undesired_contacts, track_lin_vel_xy_exp, track_ang_vel_z_exp
Terminations: time_out, root_height_below_minimum,
  joint_pos_out_of_manual_limit, illegal_contact
Events: randomize_rigid_body_com, apply_external_force_torque,
  reset_root_state_uniform, reset_joints_by_scale, reset_joints_by_offset,
  push_by_setting_velocity
Actions: JointPositionAction, JointEffortAction

Terms accessing lazy TimestampedWarpBuffer properties (Tier 2) are marked
@warp_capturable(False) to prevent stale data under CUDA graph capture.

Env configs and task-local MDP terms for 14 training-parity verified envs:
- Classic: Cartpole, Humanoid, Ant
- Locomotion velocity (flat): Anymal-B/C/D, G1-v0/v1, H1, Cassie,
  Unitree A1/Go1/Go2
- Manipulation: Reach-Franka

Per-robot config registrations (gym IDs) and flat env cfgs for all
tested locomotion and reach variants.

Task-specific MDP terms:
- Humanoid: base_yaw_roll, base_up_proj, base_heading_proj,
  base_angle_to_target, progress_reward, upright_posture_bonus,
  move_to_target_bonus, power_consumption, joint_pos_limits_penalty_ratio
- Velocity: feet_air_time, feet_air_time_positive_biped, feet_slide,
  track_lin_vel_xy_yaw_frame_exp, track_ang_vel_z_world_exp,
  stand_still_joint_deviation_l1, terrain_out_of_bounds, terrain_levels_vel
- Reach: position_command_error, position_command_error_tanh,
  orientation_command_error

Also includes:
- Warp parity tests (3 test files)
- WARP_MIGRATION_GAP_ANALYSIS.md (MDP term catalog and per-task usage)
- MANAGER_TEST_COVERAGE.md (capturability analysis)
- GRAPH_CAPTURE_MIGRATION.md (ArticulationData Tier 1/2/3 property analysis)

greptile-apps bot commented Feb 23, 2026

Too many files changed for review. (114 files found, 100 file limit)

github-actions bot added the isaac-lab (Related to Isaac Lab team) label Feb 23, 2026
hujc7 marked this pull request as draft February 23, 2026 08:43

hujc7 commented Feb 23, 2026

Not sure if there's a way to show only the file changes that are not in PR #4480. It is probably best to merge the dependency first.


hujc7 commented Feb 23, 2026

The migrated envs show similar training results. Convergence speed does not appear to be a meaningful signal here: it is inconsistent from run to run because of the added noise.

Final training stats — Warp-only vs baseline

| Task | Base R | Warp R | R gap | Base L | Warp L | Status |
| --- | --- | --- | --- | --- | --- | --- |
| Isaac-Ant-Warp-v0 (0) | 111.37 | 109.75 | -1.5% | 953.27 | 953.39 | ⚪ ok |
| Isaac-Cartpole-Warp-v0 (1) | 4.94 | 4.93 | -0.2% | 300.00 | 300.00 | ⚪ ok |
| Isaac-Humanoid-Warp-v0 (2) | 114.44 | 114.53 | +0.1% | 911.84 | 899.14 | ⚪ ok |
| Isaac-Reach-Franka-Warp-v0 (3) | -0.09 | 0.26 | n/a | 360.00 | 360.00 | 🟢 better |
| Isaac-Reach-UR10-Warp-v0 (4) | - | - | - | - | - | ⚫ n/a |
| Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5) | 26.28 | 26.64 | +1.4% | 1000.00 | 990.99 | ⚪ ok |
| Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6) | 25.14 | 24.46 | -2.7% | 999.20 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7) | 23.91 | 24.74 | +3.5% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Cassie-Warp-v0 (8) | -6.27 | - | - | 19.25 | - | ⚫ n/a |
| Isaac-Velocity-Flat-G1-Warp-v0 (9) | 20.25 | 19.55 | -3.5% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-G1-Warp-v1 (10) | 1.68 | 1.77 | +5.4% | 919.31 | 921.48 | ⚪ ok |
| Isaac-Velocity-Flat-H1-Warp-v0 (11) | 25.29 | 30.88 | +22.1% | 1000.00 | 1000.00 | 🟢 better |
| Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12) | 38.81 | 39.41 | +1.5% | 990.26 | 981.27 | ⚪ ok |
| Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13) | 39.33 | 40.80 | +3.7% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14) | 39.71 | 39.83 | +0.3% | 1000.00 | 990.82 | ⚪ ok |
| Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15) | - | - | - | - | - | ⚫ n/a |
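For readers who want to reproduce the "R gap" column, the values are consistent with a plain relative difference against the baseline reward. The sketch below is an assumption about how the column was computed, not the script used for this PR. The helper names are hypothetical, and returning `None` for a non-positive baseline simply mirrors the rows shown as "n/a". The thresholds behind the Status column are not stated in the thread, so the sketch does not try to reproduce them.

```python
def reward_gap(base_reward, new_reward):
    """Relative reward difference vs. baseline, or None when it is not meaningful."""
    if base_reward <= 0:
        # A relative gap against a zero or negative baseline is hard to interpret.
        return None
    return (new_reward - base_reward) / base_reward


def format_gap(gap):
    return "n/a" if gap is None else f"{gap * 100:+.1f}%"


# Values taken from the Warp-only table (Ant, H1, Reach-Franka).
print(format_gap(reward_gap(111.37, 109.75)))
print(format_gap(reward_gap(25.29, 30.88)))
print(format_gap(reward_gap(-0.09, 0.26)))
```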

Final training stats — Warp-capture vs baseline

| Task | Base R | Capture R | R gap | Base L | Capture L | Status |
| --- | --- | --- | --- | --- | --- | --- |
| Isaac-Ant-Warp-v0 (0) | 111.37 | 111.92 | +0.5% | 953.27 | 944.00 | ⚪ ok |
| Isaac-Cartpole-Warp-v0 (1) | 4.94 | 4.93 | -0.2% | 300.00 | 300.00 | ⚪ ok |
| Isaac-Humanoid-Warp-v0 (2) | 114.44 | 127.80 | +11.7% | 911.84 | 927.64 | ⚪ ok |
| Isaac-Reach-Franka-Warp-v0 (3) | -0.09 | 0.40 | n/a | 360.00 | 360.00 | 🟢 better |
| Isaac-Reach-UR10-Warp-v0 (4) | - | - | - | - | - | ⚫ n/a |
| Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5) | 26.28 | 27.07 | +3.0% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6) | 25.14 | 25.16 | +0.1% | 999.20 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7) | 23.91 | 24.41 | +2.1% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Cassie-Warp-v0 (8) | -6.27 | -6.33 | n/a | 19.25 | 20.36 | ⚪ ok |
| Isaac-Velocity-Flat-G1-Warp-v0 (9) | 20.25 | 20.92 | +3.3% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-G1-Warp-v1 (10) | 1.68 | 1.49 | -11.3% | 919.31 | 991.24 | ⚪ ok |
| Isaac-Velocity-Flat-H1-Warp-v0 (11) | 25.29 | 30.20 | +19.4% | 1000.00 | 991.82 | ⚪ ok |
| Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12) | 38.81 | 40.32 | +3.9% | 990.26 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13) | 39.33 | 40.89 | +4.0% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14) | 39.71 | 40.14 | +1.1% | 1000.00 | 1000.00 | ⚪ ok |
| Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15) | - | - | - | - | - | ⚫ n/a |

Convergence speed — Warp-only vs baseline

| Task | Target | Base iter | Warp iter | Speed gap | Status |
| --- | --- | --- | --- | --- | --- |
| Isaac-Ant-Warp-v0 (0) | 950 | 32 | 35 | +9.4% | 🔴 slower |
| Isaac-Cartpole-Warp-v0 (1) | 300 | 56 | 53 | -5.4% | 🟢 faster |
| Isaac-Humanoid-Warp-v0 (2) | 900 | 128 | 159 | +24.2% | 🔴 slower |
| Isaac-Reach-Franka-Warp-v0 (3) | 350 | 14 | 14 | +0.0% | ⚪ same |
| Isaac-Reach-UR10-Warp-v0 (4) | - | - | - | - | ⚫ n/a |
| Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5) | 1000 | 110 | 110 | +0.0% | ⚪ same |
| Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6) | 1000 | 124 | 165 | +33.1% | 🔴 slower |
| Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7) | 1000 | 124 | 118 | -4.8% | 🟢 faster |
| Isaac-Velocity-Flat-Cassie-Warp-v0 (8) | - | - | - | - | ⚫ n/a |
| Isaac-Velocity-Flat-G1-Warp-v0 (9) | 1000 | 168 | 117 | -30.4% | 🟢 faster |
| Isaac-Velocity-Flat-G1-Warp-v1 (10) | 1000 | 222 | 194 | -12.6% | 🟢 faster |
| Isaac-Velocity-Flat-H1-Warp-v0 (11) | 1000 | 136 | 112 | -17.6% | 🟢 faster |
| Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12) | 1000 | 148 | 104 | -29.7% | 🟢 faster |
| Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13) | 1000 | 102 | 114 | +11.8% | 🔴 slower |
| Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14) | 1000 | 62 | 89 | +43.5% | 🔴 slower |
| Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15) | - | - | - | - | ⚫ n/a |
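The thread does not say how the "iter" columns were derived. One plausible reading is the first training iteration at which the tracked metric reaches the Target value, with "Speed gap" being the relative difference in that iteration count. The sketch below follows that reading; the function names and the example curve are made up for illustration, and only the 32 vs. 35 pair comes from the table above.

```python
def first_iter_reaching(curve, target):
    """Return the first iteration index whose metric is >= target, else None."""
    for iteration, value in enumerate(curve):
        if value >= target:
            return iteration
    return None


def speed_gap(base_iter, new_iter):
    """Relative change in iterations-to-target; positive means slower."""
    return (new_iter - base_iter) / base_iter


# Made-up metric curve, for illustration only.
example_curve = [100.0, 400.0, 960.0, 940.0, 970.0]
print(first_iter_reaching(example_curve, 950))

# Ant row from the Warp-only table: 32 baseline iterations vs. 35.
print(f"{speed_gap(32, 35):+.1%}")
```

Because the measure depends on a single threshold crossing, one noisy iteration can move it noticeably, which fits the run-to-run inconsistency noted in the comment above.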

Convergence speed — Warp-capture vs baseline

| Task | Target | Base iter | Capture iter | Speed gap | Status |
| --- | --- | --- | --- | --- | --- |
| Isaac-Ant-Warp-v0 (0) | 950 | 32 | 41 | +28.1% | 🔴 slower |
| Isaac-Cartpole-Warp-v0 (1) | 300 | 56 | 53 | -5.4% | 🟢 faster |
| Isaac-Humanoid-Warp-v0 (2) | 950 | 131 | 248 | +89.3% | 🔴 slower |
| Isaac-Reach-Franka-Warp-v0 (3) | 350 | 14 | 14 | +0.0% | ⚪ same |
| Isaac-Reach-UR10-Warp-v0 (4) | - | - | - | - | ⚫ n/a |
| Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5) | 1000 | 110 | 106 | -3.6% | 🟢 faster |
| Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6) | 1000 | 124 | 122 | -1.6% | 🟢 faster |
| Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7) | 1000 | 124 | 129 | +4.0% | 🔴 slower |
| Isaac-Velocity-Flat-Cassie-Warp-v0 (8) | - | - | - | - | ⚫ n/a |
| Isaac-Velocity-Flat-G1-Warp-v0 (9) | 1000 | 168 | 113 | -32.7% | 🟢 faster |
| Isaac-Velocity-Flat-G1-Warp-v1 (10) | 1000 | 222 | 167 | -24.8% | 🟢 faster |
| Isaac-Velocity-Flat-H1-Warp-v0 (11) | 1000 | 136 | 114 | -16.2% | 🟢 faster |
| Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12) | 1000 | 148 | 116 | -21.6% | 🟢 faster |
| Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13) | 1000 | 102 | 100 | -2.0% | 🟢 faster |
| Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14) | 1000 | 62 | 45 | -27.4% | 🟢 faster |
| Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15) | - | - | - | - | ⚫ n/a |


hujc7 commented Feb 24, 2026

Time performance gain

Warp-capture vs baseline (repeat=5 average, timer-only env_step)

| Task | Base env_step (us) | Capture env_step (avg us) | % change |
| --- | --- | --- | --- |
| Isaac-Ant-Warp-v0 (0) | 12450.25 | 5384.53 | -56.8% |
| Isaac-Cartpole-Warp-v0 (1) | 9038.00 | 1357.52 | -85.0% |
| Isaac-Humanoid-Warp-v0 (2) | 20600.74 | 13653.34 | -33.7% |
| Isaac-Reach-Franka-Warp-v0 (3) | 12202.51 | 5863.21 | -52.0% |
| Isaac-Reach-UR10-Warp-v0 (4) | - | - | - |
| Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5) | 38029.59 | 27247.39 | -28.4% |
| Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6) | 37881.29 | 27281.55 | -28.0% |
| Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7) | 39227.52 | 27860.87 | -29.0% |
| Isaac-Velocity-Flat-Cassie-Warp-v0 (8) | 22765.51 | 11213.25 | -50.7% |
| Isaac-Velocity-Flat-G1-Warp-v0 (9) | 39951.73 | 27201.89 | -31.9% |
| Isaac-Velocity-Flat-G1-Warp-v1 (10) | 55177.31 | 42330.15 | -23.3% |
| Isaac-Velocity-Flat-H1-Warp-v0 (11) | 28866.51 | 16818.43 | -41.7% |
| Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12) | 20112.75 | 10467.07 | -48.0% |
| Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13) | 20738.22 | 11996.80 | -42.2% |
| Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14) | 18656.75 | 9831.00 | -47.3% |
| Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15) | - | - | - |
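A measurement in the spirit of "repeat=5 average, timer-only env_step" might be structured as follows. This is a hedged sketch, not the benchmark used for the numbers above: `time_env_step_us`, `percent_change`, and `fake_env_step` are hypothetical, and a real GPU benchmark would also need a device synchronization before each timer read so that asynchronous kernel launches are included in the measured time.

```python
import time
from statistics import mean


def time_env_step_us(step_fn, num_steps, repeats=5):
    """Average per-step wall time in microseconds over several repeats."""
    per_repeat_us = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(num_steps):
            step_fn()
        elapsed_s = time.perf_counter() - start
        per_repeat_us.append(elapsed_s / num_steps * 1e6)
    return mean(per_repeat_us)


def percent_change(base_us, new_us):
    """Relative change vs. baseline in percent; negative means faster."""
    return (new_us - base_us) / base_us * 100.0


def fake_env_step():
    # Placeholder workload standing in for env.step(); purely illustrative.
    sum(range(1000))


measured_us = time_env_step_us(fake_env_step, num_steps=100)
print(f"fake step: {measured_us:.2f} us")

# Cartpole row from the table above.
print(f"{percent_change(9038.00, 1357.52):+.1f}%")
```

Expressed as a ratio rather than a percentage, the Cartpole row corresponds to roughly a 6.7x shorter env_step.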

- Rewrite obs/reward kernels to consume Tier 1 compound types directly, bypassing lazy Tier 2 properties that break CUDA graph capture
- Update GRAPH_CAPTURE_MIGRATION.md and WARP_MIGRATION_GAP_ANALYSIS.md