[Newton] Migrate more envs and mdps to warp by hujc7 · Pull Request #4690 · isaac-sim/IsaacLab

hujc7 · 2026-02-23T08:40:24Z

Summary

Warp-first manager-based RL environment infrastructure and MDP term migration for Newton.

Infrastructure (commits 1-5, from dependency branch)

Warp-first ManagerBasedRLEnvWarp with ManagerCallSwitch for per-manager execution mode control (stable / warp / warp-captured)
Warp-first manager implementations: ActionManager, ObservationManager, EventManager, RewardManager, TerminationManager
SceneEntityCfg with body_ids_wp, joint_ids_wp, joint_mask for warp kernel dispatch
warp_capturable decorator and is_warp_capturable check for automatic CUDA graph capture fallback
manager_call_max_mode per-env capture ceiling (min(configured_mode, cap))
Configurable Scene_write_data_to_sim capture mode (previously hardcoded to non-captured)
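
A minimal sketch of how the capture ceiling and the capturability fallback could combine. The names warp_capturable, is_warp_capturable and the min(configured_mode, cap) rule come from the list above; everything else (the CallMode enum, the value 0 for "stable", the attribute name, the resolve_mode helper, and the rule that untagged terms count as capturable) is an assumption made for illustration and may not match the PR's implementation.

```python
from enum import IntEnum


class CallMode(IntEnum):
    # Assumed numbering: the PR states mode=1 (warp-only) and mode=2 (warp-capture);
    # using 0 for "stable" is a guess.
    STABLE = 0
    WARP = 1
    WARP_CAPTURED = 2


def warp_capturable(flag: bool = True):
    """Hypothetical decorator: tag a term with whether it may be graph-captured."""
    def decorate(fn):
        fn._warp_capturable = flag  # attribute name is illustrative
        return fn
    return decorate


def is_warp_capturable(fn) -> bool:
    # Untagged terms are treated as capturable in this sketch (an assumption).
    return getattr(fn, "_warp_capturable", True)


def resolve_mode(configured: int, cap: int, terms) -> CallMode:
    """Clamp to the per-env ceiling, then fall back if any term cannot be captured."""
    mode = CallMode(min(configured, cap))
    if mode == CallMode.WARP_CAPTURED and not all(is_warp_capturable(t) for t in terms):
        mode = CallMode.WARP
    return mode


@warp_capturable(False)
def lazy_term(env):
    return None


def plain_term(env):
    return None


print(resolve_mode(2, 2, [plain_term]).name)
print(resolve_mode(2, 1, [plain_term]).name)
print(resolve_mode(2, 2, [plain_term, lazy_term]).name)
```

In this sketch the ceiling is applied first, so an env capped at mode 1 never attempts capture regardless of how its terms are tagged.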

MDP terms (commit 6)

Warp-first observation, reward, termination, event, and action terms verified against torch baselines:

Observations: base_pos_z, base_lin_vel, base_ang_vel, projected_gravity, joint_pos, joint_pos_rel, joint_pos_limit_normalized, joint_vel, joint_vel_rel, last_action, generated_commands
Rewards: is_alive, is_terminated, lin_vel_z_l2, ang_vel_xy_l2, flat_orientation_l2, joint_torques_l2, joint_vel_l1, joint_vel_l2, joint_acc_l2, joint_deviation_l1, joint_pos_limits, action_rate_l2, action_l2, undesired_contacts, track_lin_vel_xy_exp, track_ang_vel_z_exp
Terminations: time_out, root_height_below_minimum, joint_pos_out_of_manual_limit, illegal_contact
Events: randomize_rigid_body_com, apply_external_force_torque, reset_root_state_uniform, reset_joints_by_scale, reset_joints_by_offset, push_by_setting_velocity
Actions: JointPositionAction, JointEffortAction

Terms accessing ArticulationData lazy TimestampedWarpBuffer properties (Tier 2) are marked @warp_capturable(False) to prevent stale data under CUDA graph capture.
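
The toy model below illustrates one plausible way a lazily refreshed buffer can go stale under graph replay. It is pure Python and does not use Warp or Isaac Lab: ToyGraph, ToyData and both terms are hypothetical stand-ins, and the actual failure mechanism is the one documented in GRAPH_CAPTURE_MIGRATION.md, which may differ. The assumption here is that the freshness check runs on the host, so a graph records only whichever kernel launches happened during capture.

```python
class ToyGraph:
    """Records launched callables and replays exactly those, nothing else."""

    def __init__(self):
        self.recorded = []

    def launch(self, kernel):
        self.recorded.append(kernel)
        kernel()

    def replay(self):
        for kernel in self.recorded:
            kernel()


class ToyData:
    """Toy stand-in for articulation data; not the Isaac Lab class."""

    def __init__(self):
        self.step = 0
        self.raw = 1.0       # eagerly written state ("Tier 1" in this sketch)
        self.derived = 0.0   # lazily computed from raw ("Tier 2" in this sketch)
        self._stamp = -1

    def refresh_derived(self, launch):
        # The freshness check is host-side Python, so a graph never records it.
        if self._stamp < self.step:
            launch(self._compute_derived)
            self._stamp = self.step

    def _compute_derived(self):
        self.derived = 2.0 * self.raw


data = ToyData()
obs = {}


def run_eagerly(kernel):
    kernel()


def lazy_term():
    obs["lazy"] = data.derived      # reads the lazily refreshed buffer


def direct_term():
    obs["direct"] = 2.0 * data.raw  # recomputes from eagerly written state


data.refresh_derived(run_eagerly)   # an earlier caller already refreshed the buffer

graph = ToyGraph()
data.refresh_derived(graph.launch)  # buffer is fresh, so no update kernel is recorded
graph.launch(lazy_term)
graph.launch(direct_term)

# The simulator advances, then the recorded work is replayed.
data.step += 1
data.raw = 5.0
graph.replay()

print(obs)  # "lazy" keeps the value seen at capture time; "direct" follows the new state
```

Under these assumptions, a term that reads the lazy buffer returns capture-time data on every replay, which is the behavior that marking it as not capturable avoids.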

Tested env configs (commit 7)

14 envs with training parity verified (warp-only and warp-capture vs torch baseline):

Classic: Cartpole, Humanoid, Ant
Locomotion velocity (flat): Anymal-B/C/D, G1-v0/v1, H1, Cassie, Unitree A1/Go1/Go2
Manipulation: Reach-Franka

Per-robot gym registrations, flat env cfgs, and task-specific MDP terms (humanoid observations/rewards, velocity rewards/terminations/curriculums, reach rewards).

Disabled envs (included but registration commented out)

Isaac-Velocity-Rough-Anymal-D-Warp-v0: requires isaaclab_physx (not yet on dev/newton)
Isaac-Reach-UR10-Warp-v0: USD asset composition error (broken asset)

Documentation

WARP_MIGRATION_GAP_ANALYSIS.md: Full MDP term catalog, per-task usage matrix, migration patterns
GRAPH_CAPTURE_MIGRATION.md: ArticulationData Tier 1/2/3 property analysis, capture failure mechanism, proposed materialize_derived() fix
MANAGER_TEST_COVERAGE.md: Capturability analysis

Test plan

Warp parity tests (3 test files: action, MDP, new terms)
Training parity: warp-only (mode=1) — all 14 envs within ±5% reward of torch baseline
Training parity: warp-capture (mode=2) — all 14 envs within ±5% reward after @warp_capturable(False) fix
Rough terrain variants (blocked by isaaclab_physx dependency)
Isaac-Reach-UR10-Warp-v0 (blocked by broken USD asset)

Dependencies

Commits 1-5 are from dev-newton-warp-mig-manager-based (pending merge into dev/newton)

Introduces ManagerBasedEnvWarp and ManagerBasedRLEnvWarp with ManagerCallSwitch for per-manager stable/warp/captured mode selection. Includes Newton articulation data extensions, RL wrapper adaptations, and train script integration.

Warp-first implementations of all 7 managers (action, observation, reward, termination, event, command, recorder) with mask-based reset for CUDA graph compatibility. Includes MDP term library (observations, rewards, terminations, events, actions), IO descriptors, and utility modules (noise, modifiers, circular buffers, warp kernels).
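
To illustrate why a mask-based reset suits graph capture, the sketch below contrasts it with an index-based reset. It is plain Python with hypothetical function names, not the PR's kernels. The idea, stated as an assumption about the motivation, is that a captured graph replays with fixed launch dimensions, so a variable-length list of env ids is awkward while a fixed-size boolean mask keeps every shape constant.

```python
def reset_by_ids(joint_pos, default_pos, env_ids):
    """Index-based reset: the amount of work depends on how many envs are resetting."""
    for i in env_ids:
        joint_pos[i] = default_pos[i]


def reset_by_mask(joint_pos, default_pos, reset_mask):
    """Mask-based reset: always visits every env, so the work size never changes."""
    for i in range(len(joint_pos)):
        if reset_mask[i]:
            joint_pos[i] = default_pos[i]


joint_pos_a = [0.3, -0.1, 0.8, 0.5]
joint_pos_b = list(joint_pos_a)
default_pos = [0.0, 0.0, 0.0, 0.0]

reset_by_ids(joint_pos_a, default_pos, env_ids=[1, 3])
reset_by_mask(joint_pos_b, default_pos, reset_mask=[False, True, False, True])

print(joint_pos_a == joint_pos_b)
```

Both variants produce the same state; only the mask version has an iteration count that is independent of which envs reset.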

Cartpole env config for the warp manager-based RL env, registered as Isaac-Cartpole-Warp-v0. Includes warp-first custom reward term (joint_pos_target_l2).

… updates
- Add manager_call_max_mode field for per-env capture ceiling (min(mode, cap))
- Support dict input for manager_call_config (in addition to JSON string)
- Add "Scene" to MANAGER_NAMES for configurable Scene_write_data_to_sim mode
- Remove hardcoded WARP_NOT_CAPTURED override from Scene_write_data_to_sim
- Add warp_capturable decorator and is_warp_capturable check for mode=2 fallback
- Update managers: action, observation, event with warp-first improvements
- Update scene_entity_cfg with body_ids_wp resolution
- Update train.py CLI arg handling
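
A sketch of what accepting both a dict and a JSON string for manager_call_config might look like. Only the names manager_call_config, MANAGER_NAMES and "Scene" appear in the commit message; the function name, the other manager keys, the key format and the validation rules are hypothetical.

```python
import json

# "Scene" is named in the PR; the other entries and the key format are hypothetical.
MANAGER_NAMES = ("Scene", "ActionManager", "ObservationManager")


def parse_manager_call_config(config):
    """Accept a dict or a JSON string and return a validated {name: mode} dict."""
    if config is None:
        raw = {}
    elif isinstance(config, str):
        raw = json.loads(config)
    elif isinstance(config, dict):
        raw = dict(config)
    else:
        raise TypeError(f"expected dict, JSON string, or None, got {type(config).__name__}")
    if not isinstance(raw, dict):
        raise ValueError("manager call config must decode to a mapping")
    unknown = sorted(set(raw) - set(MANAGER_NAMES))
    if unknown:
        raise KeyError(f"unknown manager names: {unknown}")
    return {name: int(mode) for name, mode in raw.items()}


from_cli = parse_manager_call_config('{"Scene": 1, "ActionManager": 2}')
from_code = parse_manager_call_config({"Scene": 1, "ActionManager": 2})
print(from_cli == from_code)
```

Normalizing both inputs to one dict early means the rest of the code only has to handle a single representation, whether the value came from a CLI flag or from Python.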

Warp-first observation, reward, termination, event, and action terms referenced by the 14 verified training-parity envs.

Observations: base_pos_z, base_lin_vel, base_ang_vel, projected_gravity, joint_pos, joint_pos_rel, joint_pos_limit_normalized, joint_vel, joint_vel_rel, last_action, generated_commands
Rewards: is_alive, is_terminated, lin_vel_z_l2, ang_vel_xy_l2, flat_orientation_l2, joint_torques_l2, joint_vel_l1, joint_vel_l2, joint_acc_l2, joint_deviation_l1, joint_pos_limits, action_rate_l2, action_l2, undesired_contacts, track_lin_vel_xy_exp, track_ang_vel_z_exp
Terminations: time_out, root_height_below_minimum, joint_pos_out_of_manual_limit, illegal_contact
Events: randomize_rigid_body_com, apply_external_force_torque, reset_root_state_uniform, reset_joints_by_scale, reset_joints_by_offset, push_by_setting_velocity
Actions: JointPositionAction, JointEffortAction

Terms accessing lazy TimestampedWarpBuffer properties (Tier 2) are marked @warp_capturable(False) to prevent stale data under CUDA graph capture.

Env configs and task-local MDP terms for 14 training-parity verified envs:
- Classic: Cartpole, Humanoid, Ant
- Locomotion velocity (flat): Anymal-B/C/D, G1-v0/v1, H1, Cassie, Unitree A1/Go1/Go2
- Manipulation: Reach-Franka

Per-robot config registrations (gym IDs) and flat env cfgs for all tested locomotion and reach variants.

Task-specific MDP terms:
- Humanoid: base_yaw_roll, base_up_proj, base_heading_proj, base_angle_to_target, progress_reward, upright_posture_bonus, move_to_target_bonus, power_consumption, joint_pos_limits_penalty_ratio
- Velocity: feet_air_time, feet_air_time_positive_biped, feet_slide, track_lin_vel_xy_yaw_frame_exp, track_ang_vel_z_world_exp, stand_still_joint_deviation_l1, terrain_out_of_bounds, terrain_levels_vel
- Reach: position_command_error, position_command_error_tanh, orientation_command_error

Also includes:
- Warp parity tests (3 test files)
- WARP_MIGRATION_GAP_ANALYSIS.md (MDP term catalog and per-task usage)
- MANAGER_TEST_COVERAGE.md (capturability analysis)
- GRAPH_CAPTURE_MIGRATION.md (ArticulationData Tier 1/2/3 property analysis)

greptile-apps · 2026-02-23T08:40:31Z

Too many files changed for review. (114 files found, 100 file limit)

hujc7 · 2026-02-23T08:44:03Z

Not sure if there's a way to show only the file changes that are not in PR #4480. It's probably best to merge the dependency first.

hujc7 · 2026-02-23T09:51:22Z

The migrated envs show training results similar to the baseline. Convergence speed does not appear to be a meaningful signal here: it is inconsistent between runs because of the added noise.

Final training stats — Warp-only vs baseline

Task	Base reward	Warp reward	Reward gap	Base ep. length	Warp ep. length	Status
Isaac-Ant-Warp-v0 (0)	111.37	109.75	-1.5%	953.27	953.39	⚪ ok
Isaac-Cartpole-Warp-v0 (1)	4.94	4.93	-0.2%	300.00	300.00	⚪ ok
Isaac-Humanoid-Warp-v0 (2)	114.44	114.53	+0.1%	911.84	899.14	⚪ ok
Isaac-Reach-Franka-Warp-v0 (3)	-0.09	0.26	n/a	360.00	360.00	🟢 better
Isaac-Reach-UR10-Warp-v0 (4)	-	-	-	-	-	⚫ n/a
Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5)	26.28	26.64	+1.4%	1000.00	990.99	⚪ ok
Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6)	25.14	24.46	-2.7%	999.20	1000.00	⚪ ok
Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7)	23.91	24.74	+3.5%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Flat-Cassie-Warp-v0 (8)	-6.27	-	-	19.25	-	⚫ n/a
Isaac-Velocity-Flat-G1-Warp-v0 (9)	20.25	19.55	-3.5%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Flat-G1-Warp-v1 (10)	1.68	1.77	+5.4%	919.31	921.48	⚪ ok
Isaac-Velocity-Flat-H1-Warp-v0 (11)	25.29	30.88	+22.1%	1000.00	1000.00	🟢 better
Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12)	38.81	39.41	+1.5%	990.26	981.27	⚪ ok
Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13)	39.33	40.80	+3.7%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14)	39.71	39.83	+0.3%	1000.00	990.82	⚪ ok
Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15)	-	-	-	-	-	⚫ n/a
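
The gap column can be reproduced with a relative difference against the baseline. The sketch below uses three rows from the table above; the function name is hypothetical, and the rule that a non-positive baseline yields "n/a" is inferred from the Reach-Franka row rather than stated by the author.

```python
def reward_gap_percent(base, warp):
    """Relative gap in percent, or None where a ratio is not meaningful."""
    # Assumption: missing values and non-positive baselines are reported as n/a.
    if base is None or warp is None or base <= 0.0:
        return None
    return 100.0 * (warp - base) / base


rows = {
    "Isaac-Ant-Warp-v0": (111.37, 109.75),
    "Isaac-Velocity-Flat-H1-Warp-v0": (25.29, 30.88),
    "Isaac-Reach-Franka-Warp-v0": (-0.09, 0.26),
}

for task, (base, warp) in rows.items():
    gap = reward_gap_percent(base, warp)
    print(task, "n/a" if gap is None else f"{gap:+.1f}%")
```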

Final training stats — Warp-capture vs baseline

Task	Base reward	Capture reward	Reward gap	Base ep. length	Capture ep. length	Status
Isaac-Ant-Warp-v0 (0)	111.37	111.92	+0.5%	953.27	944.00	⚪ ok
Isaac-Cartpole-Warp-v0 (1)	4.94	4.93	-0.2%	300.00	300.00	⚪ ok
Isaac-Humanoid-Warp-v0 (2)	114.44	127.80	+11.7%	911.84	927.64	⚪ ok
Isaac-Reach-Franka-Warp-v0 (3)	-0.09	0.40	n/a	360.00	360.00	🟢 better
Isaac-Reach-UR10-Warp-v0 (4)	-	-	-	-	-	⚫ n/a
Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5)	26.28	27.07	+3.0%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6)	25.14	25.16	+0.1%	999.20	1000.00	⚪ ok
Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7)	23.91	24.41	+2.1%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Flat-Cassie-Warp-v0 (8)	-6.27	-6.33	n/a	19.25	20.36	⚪ ok
Isaac-Velocity-Flat-G1-Warp-v0 (9)	20.25	20.92	+3.3%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Flat-G1-Warp-v1 (10)	1.68	1.49	-11.3%	919.31	991.24	⚪ ok
Isaac-Velocity-Flat-H1-Warp-v0 (11)	25.29	30.20	+19.4%	1000.00	991.82	⚪ ok
Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12)	38.81	40.32	+3.9%	990.26	1000.00	⚪ ok
Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13)	39.33	40.89	+4.0%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14)	39.71	40.14	+1.1%	1000.00	1000.00	⚪ ok
Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15)	-	-	-	-	-	⚫ n/a

Convergence speed — Warp-only vs baseline

Task	Target	Base iter	Warp iter	Speed gap	Status
Isaac-Ant-Warp-v0 (0)	950	32	35	+9.4%	🔴 slower
Isaac-Cartpole-Warp-v0 (1)	300	56	53	-5.4%	🟢 faster
Isaac-Humanoid-Warp-v0 (2)	900	128	159	+24.2%	🔴 slower
Isaac-Reach-Franka-Warp-v0 (3)	350	14	14	+0.0%	⚪ same
Isaac-Reach-UR10-Warp-v0 (4)	-	-	-	-	⚫ n/a
Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5)	1000	110	110	+0.0%	⚪ same
Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6)	1000	124	165	+33.1%	🔴 slower
Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7)	1000	124	118	-4.8%	🟢 faster
Isaac-Velocity-Flat-Cassie-Warp-v0 (8)	-	-	-	-	⚫ n/a
Isaac-Velocity-Flat-G1-Warp-v0 (9)	1000	168	117	-30.4%	🟢 faster
Isaac-Velocity-Flat-G1-Warp-v1 (10)	1000	222	194	-12.6%	🟢 faster
Isaac-Velocity-Flat-H1-Warp-v0 (11)	1000	136	112	-17.6%	🟢 faster
Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12)	1000	148	104	-29.7%	🟢 faster
Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13)	1000	102	114	+11.8%	🔴 slower
Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14)	1000	62	89	+43.5%	🔴 slower
Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15)	-	-	-	-	⚫ n/a

Convergence speed — Warp-capture vs baseline

Task	Target	Base iter	Capture iter	Speed gap	Status
Isaac-Ant-Warp-v0 (0)	950	32	41	+28.1%	🔴 slower
Isaac-Cartpole-Warp-v0 (1)	300	56	53	-5.4%	🟢 faster
Isaac-Humanoid-Warp-v0 (2)	950	131	248	+89.3%	🔴 slower
Isaac-Reach-Franka-Warp-v0 (3)	350	14	14	+0.0%	⚪ same
Isaac-Reach-UR10-Warp-v0 (4)	-	-	-	-	⚫ n/a
Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5)	1000	110	106	-3.6%	🟢 faster
Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6)	1000	124	122	-1.6%	🟢 faster
Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7)	1000	124	129	+4.0%	🔴 slower
Isaac-Velocity-Flat-Cassie-Warp-v0 (8)	-	-	-	-	⚫ n/a
Isaac-Velocity-Flat-G1-Warp-v0 (9)	1000	168	113	-32.7%	🟢 faster
Isaac-Velocity-Flat-G1-Warp-v1 (10)	1000	222	167	-24.8%	🟢 faster
Isaac-Velocity-Flat-H1-Warp-v0 (11)	1000	136	114	-16.2%	🟢 faster
Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12)	1000	148	116	-21.6%	🟢 faster
Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13)	1000	102	100	-2.0%	🟢 faster
Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14)	1000	62	45	-27.4%	🟢 faster
Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15)	-	-	-	-	⚫ n/a

hujc7 · 2026-02-24T04:59:24Z

Time performance gain

Warp-capture vs baseline (repeat=5 average, timer-only `env_step`)

Task	Base env_step (us)	Capture env_step (avg us)	% change
Isaac-Ant-Warp-v0 (0)	12450.25	5384.53	-56.8%
Isaac-Cartpole-Warp-v0 (1)	9038.00	1357.52	-85.0%
Isaac-Humanoid-Warp-v0 (2)	20600.74	13653.34	-33.7%
Isaac-Reach-Franka-Warp-v0 (3)	12202.51	5863.21	-52.0%
Isaac-Reach-UR10-Warp-v0 (4)	-	-	-
Isaac-Velocity-Flat-Anymal-B-Warp-v0 (5)	38029.59	27247.39	-28.4%
Isaac-Velocity-Flat-Anymal-C-Warp-v0 (6)	37881.29	27281.55	-28.0%
Isaac-Velocity-Flat-Anymal-D-Warp-v0 (7)	39227.52	27860.87	-29.0%
Isaac-Velocity-Flat-Cassie-Warp-v0 (8)	22765.51	11213.25	-50.7%
Isaac-Velocity-Flat-G1-Warp-v0 (9)	39951.73	27201.89	-31.9%
Isaac-Velocity-Flat-G1-Warp-v1 (10)	55177.31	42330.15	-23.3%
Isaac-Velocity-Flat-H1-Warp-v0 (11)	28866.51	16818.43	-41.7%
Isaac-Velocity-Flat-Unitree-A1-Warp-v0 (12)	20112.75	10467.07	-48.0%
Isaac-Velocity-Flat-Unitree-Go1-Warp-v0 (13)	20738.22	11996.80	-42.2%
Isaac-Velocity-Flat-Unitree-Go2-Warp-v0 (14)	18656.75	9831.00	-47.3%
Isaac-Velocity-Rough-Anymal-D-Warp-v0 (15)	-	-	-
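
The "% change" column is the change in env_step time relative to the baseline, so a negative value means the captured path is faster. The sketch below recomputes it for three rows of the table and also expresses the same data as a speedup factor (baseline time divided by capture time); the function names are illustrative.

```python
def percent_change(base_us, capture_us):
    """Change in step time relative to the baseline, in percent."""
    return 100.0 * (capture_us - base_us) / base_us


def speedup(base_us, capture_us):
    """How many times faster the captured step is than the baseline step."""
    return base_us / capture_us


# (baseline env_step, capture env_step) in microseconds, copied from the table above.
timings_us = {
    "Isaac-Cartpole-Warp-v0": (9038.00, 1357.52),
    "Isaac-Ant-Warp-v0": (12450.25, 5384.53),
    "Isaac-Velocity-Flat-G1-Warp-v1": (55177.31, 42330.15),
}

for task, (base, capture) in timings_us.items():
    print(f"{task}: {percent_change(base, capture):+.1f}%  ({speedup(base, capture):.2f}x)")
```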


hujc7 added 8 commits February 20, 2026 01:36

Add warp manager-based RL env infrastructure

74675fc


Add warp Cartpole task configuration

f107b29


Address greptile comments

bc08182

Adapted to latest dev/newton

a247e5d

hujc7 requested review from Mayankm96, hhansen-bdai, jtigue-bdai and kellyguo11 as code owners February 23, 2026 08:40

github-actions bot added the isaac-lab label (Related to Isaac Lab team) Feb 23, 2026

hujc7 marked this pull request as draft February 23, 2026 08:43

Make MDP kernels graph-capturable and consolidate test infrastructure

b87037c

- Rewrite obs/reward kernels to consume Tier 1 compound types directly, bypassing lazy Tier 2 properties that break CUDA graph capture
- Update GRAPH_CAPTURE_MIGRATION.md and WARP_MIGRATION_GAP_ANALYSIS.md

hujc7 wants to merge 9 commits into isaac-sim:dev/newton from hujc7:dev-newton-warp-mdp-mig
