diff --git a/docs/source/overview/imitation-learning/humanoids_imitation.rst b/docs/source/overview/imitation-learning/humanoids_imitation.rst index f00f87ba142..b1833c9ee54 100644 --- a/docs/source/overview/imitation-learning/humanoids_imitation.rst +++ b/docs/source/overview/imitation-learning/humanoids_imitation.rst @@ -417,6 +417,8 @@ The robot picks up an object at the initial location (point A) and places it at updated pre-trained policies with separate upper and lower body policies for flexibtility. They have been verified in the real world and can be directly deployed. Users can also train their own locomotion or whole-body control policies using the AGILE framework. +.. _generate-the-manipulation-dataset: + Generate the manipulation dataset ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -565,7 +567,9 @@ To generate the locomanipulation dataset, use the following command: --navigate_step 130 \ --enable_pinocchio \ --output_file ./datasets/generated_dataset_g1_locomanipulation_sdg.hdf5 \ - --enable_cameras + --enable_cameras \ + --randomize_placement \ + --visualizer kit .. note:: @@ -573,8 +577,8 @@ To generate the locomanipulation dataset, use the following command: The key parameters for locomanipulation dataset generation are: -* ``--lift_step 70``: Number of steps for the lifting phase of the manipulation task. This should mark the point immediately after the robot has grasped the object. -* ``--navigate_step 120``: Number of steps for the navigation phase between locations. This should make the point where the robot has lifted the object and is ready to walk. +* ``--lift_step 60``: Number of steps for the lifting phase of the manipulation task. This should mark the point immediately after the robot has grasped the object. +* ``--navigate_step 130``: Number of steps for the navigation phase between locations. This should mark the point where the robot has lifted the object and is ready to walk. 
* ``--output_file``: Name of the output dataset file This process creates a dataset where the robot performs the manipulation task at different locations, requiring it to navigate between points while maintaining the learned manipulation behaviors. The resulting dataset can be used to train policies that combine both locomotion and manipulation capabilities. @@ -601,3 +605,66 @@ in the GR00T N1.5 repository. An example closed-loop policy rollout is shown in The policy shown above uses the camera image, hand poses, hand joint positions, object pose, and base goal pose as inputs. The output of the model is the target base velocity, hand poses, and hand joint positions for the next several timesteps. + +Use NuRec Background in Locomanipulation SDG +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The `NuRec assets `__ +are neural volumes reconstructed from real-world captures. When integrated into the locomanipulation SDG workflow, these +assets allow you to generate synthetic data in photorealistic environments that mirror the real world. + +You can load your own USD or USDZ file. It must include a neural reconstruction for rendering and a collision mesh, set +to invisible, to enable physical interaction. + +Pre-constructed assets are available via the `PhysicalAI Robotics NuRec `__ +dataset. Some of them are captured from a humanoid viewpoint to match the camera view of the humanoid robot. + +For example, when using the asset ``hand_hold-voyager-babyboom``, the relevant files are: + +- ``stage.usdz``: a USDZ archive that bundles the 3D Gaussian splatting volume (``volume.nurec``), a collision mesh (``mesh.usd``), and related files. +- ``occupancy_map.yaml`` and ``occupancy_map.png``: the occupancy map for path planning and navigation. + +Download the files and place them under ````. + +Ensure you have the manipulation dataset from the previous step. 
You can also download a pre-recorded +annotated dataset as in :ref:`Generate the manipulation dataset ` +and place it under ``/dataset_annotated_g1_locomanip.hdf5``. + +Then run the following command: + +.. code:: bash + + ./isaaclab.sh -p scripts/imitation_learning/locomanipulation_sdg/generate_data.py \ + --device cpu \ + --kit_args="--enable isaacsim.replicator.mobility_gen" \ + --task="Isaac-G1-SteeringWheel-Locomanipulation" \ + --dataset /dataset_annotated_g1_locomanip.hdf5 \ + --num_runs 1 \ + --lift_step 60 \ + --navigate_step 130 \ + --enable_pinocchio \ + --output_file /generated_dataset_g1_locomanipulation_sdg_with_background.hdf5 \ + --enable_cameras \ + --visualizer kit \ + --background_usd_path /stage.usdz \ + --background_occupancy_yaml_file /occupancy_map.yaml \ + --init_camera_view \ + --randomize_placement + +The key parameters are: + +- ``--background_usd_path``: Path to the NuRec USD asset. +- ``--background_occupancy_yaml_file``: Path to the occupancy map file. +- ``--high_res_video``: Enable high resolution video recording for the ego-centric camera view. +- ``--init_camera_view``: Set the viewport camera behind the robot at the start of each episode. + +On successful task completion, an HDF5 dataset is generated containing camera observations. You can convert +the ego-centric view to MP4: + +.. 
code:: bash + + ./isaaclab.sh -p scripts/tools/hdf5_to_mp4.py \ + --input_file /generated_dataset_g1_locomanipulation_sdg_with_background.hdf5 \ + --output_dir / \ + --input_keys robot_pov_cam diff --git a/scripts/imitation_learning/locomanipulation_sdg/generate_data.py b/scripts/imitation_learning/locomanipulation_sdg/generate_data.py index 5dc0635da59..872b024000e 100644 --- a/scripts/imitation_learning/locomanipulation_sdg/generate_data.py +++ b/scripts/imitation_learning/locomanipulation_sdg/generate_data.py @@ -30,14 +30,17 @@ "--navigate_step", type=int, help=( - "The step index in the input recording where the robot is ready to navigate. Aka, where it has finished" - " lifting the object" + "The step index in the input recording where the robot is ready to navigate. Aka, where it has finished" + " lifting the object." ), ) -parser.add_argument("--demo", type=str, default=None, help="The demo in the input dataset to use.") +parser.add_argument("--demo", type=str, default=None, help="The demo in the input dataset to use, e.g. 'demo_0'") parser.add_argument("--num_runs", type=int, default=1, help="The number of trajectories to generate.") parser.add_argument( - "--draw_visualization", type=bool, default=False, help="Draw the occupancy map and path planning visualization." 
+ "--draw_visualization", + action="store_true", + default=False, + help="Draw the occupancy map and path planning visualization.", ) parser.add_argument( "--angular_gain", @@ -88,10 +91,47 @@ ) parser.add_argument( "--randomize_placement", - type=bool, - default=True, + action="store_true", + default=False, help="Whether or not to randomize the placement of fixtures in the scene upon environment initialization.", ) +parser.add_argument( + "--enable_pinocchio", + action="store_true", + default=False, + help="Enable Pinocchio.", +) +parser.add_argument( + "--background_usd_path", + type=str, + default=None, + help="Path to the USD file for the background asset", +) +parser.add_argument( + "--background_occupancy_yaml_file", + type=str, + default=None, + help="Path to the occupancy map YAML file for the background asset", +) +parser.add_argument( + "--high_res_video", + action="store_true", + default=False, + help="Whether to use high resolution video for the robot's POV camera.", +) +parser.add_argument( + "--seed", + type=int, + default=None, + help="Random seed for reproducibility.", +) +parser.add_argument( + "--init_camera_view", + action="store_true", + default=False, + help="Set the viewport camera behind the robot at the start of each episode.", +) + AppLauncher.add_app_launcher_args(parser) args_cli = parser.parse_args() @@ -100,19 +140,34 @@ import enum import random +from dataclasses import dataclass import gymnasium as gym +import numpy as np import torch +import warp as wp -import isaaclab.sim as sim_utils +import omni.kit +import omni.usd + +from isaaclab.managers import DatasetExportMode from isaaclab.utils import configclass from isaaclab.utils.datasets import EpisodeData, HDF5DatasetFileHandler +from isaaclab.utils.math import convert_quat +from isaaclab.utils.seed import configure_seed import isaaclab_mimic.locomanipulation_sdg.envs # noqa: F401 -from isaaclab_mimic.locomanipulation_sdg.data_classes import LocomanipulationSDGOutputData +from 
isaaclab_mimic.locomanipulation_sdg.data_classes import ( + LocomanipulationSDGOutputData, +) + +if args_cli.seed is not None: + configure_seed(args_cli.seed) + from isaaclab_mimic.locomanipulation_sdg.envs.locomanipulation_sdg_env import LocomanipulationSDGEnv from isaaclab_mimic.locomanipulation_sdg.occupancy_map_utils import ( OccupancyMap, + OccupancyMapDataValue, merge_occupancy_maps, occupancy_map_add_to_stage, ) @@ -158,7 +213,7 @@ class LocomanipulationSDGControlConfig: linear_max: float = 1.0 """Maximum allowed linear velocity (m/s)""" - distance_threshold: float = 0.1 + distance_threshold: float = 0.2 """Distance threshold for state transitions (m)""" following_offset: float = 0.6 @@ -167,12 +222,28 @@ class LocomanipulationSDGControlConfig: angle_threshold: float = 0.2 """Angular threshold for orientation control (rad)""" - approach_distance: float = 1.0 + approach_distance: float = 0.5 """Buffer distance from final goal (m)""" +@dataclass +class NavigationScene: + """Navigation scene data class.""" + + occupancy_map: OccupancyMap + """The occupancy map of the navigation scene.""" + base_path_helper: ParameterizedPath + """The base path helper of the navigation scene.""" + base_goal: RelativePose + """The base goal of the navigation scene.""" + base_goal_approach: RelativePose + """The approach goal of the navigation scene.""" + + def compute_navigation_velocity( - current_pose: torch.Tensor, target_xy: torch.Tensor, config: LocomanipulationSDGControlConfig + current_pose: torch.Tensor, + target_xy: torch.Tensor, + config: LocomanipulationSDGControlConfig, ) -> tuple[torch.Tensor, torch.Tensor]: """Compute linear and angular velocities for navigation control. 
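The hunks around ``compute_navigation_velocity`` keep the yaw-error normalization ``(delta_yaw + pi) % (2 * pi) - pi``, the standard trick for wrapping an angle into [-π, π] so the controller always turns the short way. A minimal, framework-free sketch of the same computation (plain ``math`` in place of ``torch``; ``wrap_to_pi`` is an illustrative helper name, not part of the script):

```python
import math

def wrap_to_pi(angle: float) -> float:
    # Same normalization as in compute_navigation_velocity:
    # shift by pi, wrap with modulo 2*pi, shift back into [-pi, pi).
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

# A raw heading error can exceed pi in magnitude, which would make the
# robot spin the long way around; wrapping picks the short rotation.
target_yaw, current_yaw = 3.0, -3.0
raw_error = target_yaw - current_yaw  # 6.0 rad, the "long way" rotation
print(wrap_to_pi(raw_error))          # equivalent short rotation, about -0.283 rad
```

Note the half-open interval [-π, π): an input of exactly π maps to -π, which is the same physical heading.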
@@ -192,8 +263,8 @@ def compute_navigation_velocity( delta_distance = torch.sqrt(torch.sum(delta_xy**2)) target_yaw = torch.arctan2(delta_xy[1], delta_xy[0]) - delta_yaw = target_yaw - current_yaw # Normalize angle to [-π, π] + delta_yaw = target_yaw - current_yaw delta_yaw = (delta_yaw + torch.pi) % (2 * torch.pi) - torch.pi # Compute control commands @@ -211,7 +282,7 @@ def load_and_transform_recording_data( recording_step: int, reference_pose: torch.Tensor, target_pose: torch.Tensor, -) -> tuple[torch.Tensor, torch.Tensor]: +) -> tuple[torch.Tensor | None, torch.Tensor | None]: """Load recording data and transform hand targets to current reference frame. Args: @@ -234,12 +305,146 @@ def load_and_transform_recording_data( return left_hand_pose, right_hand_pose +def sync_simulation_state(env: LocomanipulationSDGEnv): + """Push USD pose updates into physics, advance the sim, and sync back + + Args: + env: The locomanipulation SDG environment + """ + env.scene.write_data_to_sim() + env.sim.step(render=False) + env.scene.update(dt=env.physics_dt) + + +def _init_camera_view(env: LocomanipulationSDGEnv, cam_offset: tuple[float, float, float] = (-3.0, 0.0, 1.5)): + """Set the viewport camera behind the robot at episode start. + + The camera offset is expressed in the robot's local frame (x=forward, y=left, z=up) + and is rotated into world frame using the robot's current yaw before being added to + the pelvis position. 
+ + Args: + env: The locomanipulation SDG environment + cam_offset: Camera position offset in robot local frame (x, y, z) + """ + base_pose_2d = env.get_base().get_pose_2d()[0] # [x, y, yaw] + base_pos = env.get_base().get_pose()[0, :3] # [x, y, z] + yaw = base_pose_2d[2] + + cos_yaw = torch.cos(yaw) + sin_yaw = torch.sin(yaw) + dx, dy, dz = cam_offset + + # Rotate local offset into world frame + world_offset = torch.stack( + [ + dx * cos_yaw - dy * sin_yaw, + dx * sin_yaw + dy * cos_yaw, + torch.tensor(dz, device=env.device), + ] + ) + + cam_eye = (base_pos + world_offset).cpu().numpy().tolist() + cam_target = (base_pos + torch.tensor([0.0, 0.0, 0.5], device=env.device)).cpu().numpy().tolist() + env.sim.set_camera_view(eye=cam_eye, target=cam_target) + + +def project_robot_state_into_env(env: LocomanipulationSDGEnv, input_episode_data: EpisodeData) -> torch.Tensor: + """Project the recorded robot pose and joint state into the current environment. + + Args: + env: The locomanipulation SDG environment + input_episode_data: Input episode data + + Returns: + The new robot pose + """ + initial_state = env.load_input_data(input_episode_data, 0) + recording_initial_state = input_episode_data.get_initial_state() + + new_robot_pose = transform_mul( + env.get_start_fixture().get_pose(), + transform_mul( + transform_inv(initial_state.fixture_pose.to(env.device)), + initial_state.base_pose.to(env.device), + ), + ) + + env.scene["robot"].write_root_pose_to_sim_index(root_pose=new_robot_pose, env_ids=[0]) + env.scene["robot"].write_root_velocity_to_sim_index( + root_velocity=torch.zeros((1, 6), device=env.device), env_ids=[0] + ) + # Update default root pose and velocity for correct state on reset + default_pose = wp.to_torch(env.scene["robot"].data.default_root_pose).clone() + default_pose[0] = new_robot_pose[0] + env.scene["robot"].data.default_root_pose.assign( + wp.from_torch(default_pose.to(env.device).contiguous()).view(wp.transformf) + ) + default_vel = 
wp.to_torch(env.scene["robot"].data.default_root_vel).clone() + default_vel[0] = torch.zeros(6, device=env.device) + env.scene["robot"].data.default_root_vel.assign(wp.from_torch(default_vel.to(env.device).contiguous())) + + robot_state = recording_initial_state["articulation"]["robot"] + joint_position = robot_state["joint_position"][0].to(env.device) + joint_velocity = robot_state["joint_velocity"][0].to(env.device) + + # Update default joint positions and velocities for correct state on reset + default_joint_pos = wp.to_torch(env.scene["robot"].data.default_joint_pos).clone() + default_joint_pos[0] = joint_position + env.scene["robot"].data.default_joint_pos.assign(wp.from_torch(default_joint_pos.to(env.device).contiguous())) + default_joint_vel = wp.to_torch(env.scene["robot"].data.default_joint_vel).clone() + default_joint_vel[0] = joint_velocity + env.scene["robot"].data.default_joint_vel.assign(wp.from_torch(default_joint_vel.to(env.device).contiguous())) + env.scene["robot"].write_joint_position_to_sim_index(position=joint_position[None, :], env_ids=[0]) + env.scene["robot"].write_joint_velocity_to_sim_index(velocity=joint_velocity[None, :], env_ids=[0]) + + return new_robot_pose + + +def project_object_state_into_env(env: LocomanipulationSDGEnv, input_episode_data: EpisodeData) -> torch.Tensor: + """Project the recorded object pose into the current environment. 
+ + Args: + env: The locomanipulation SDG environment + input_episode_data: Input episode data + + Returns: + The new object pose + """ + initial_state = env.load_input_data(input_episode_data, 0) + + new_object_pose = transform_mul( + env.get_start_fixture().get_pose(), + transform_mul( + transform_inv(initial_state.fixture_pose.to(env.device)), + initial_state.object_pose.to(env.device), + ), + ) + + env.scene["object"].write_root_pose_to_sim_index(root_pose=new_object_pose, env_ids=[0]) + env.scene["object"].write_root_velocity_to_sim_index( + root_velocity=torch.zeros((1, 6), device=env.device), env_ids=[0] + ) + # Update default root pose and velocity for correct state on reset + default_pose = wp.to_torch(env.scene["object"].data.default_root_pose).clone() + default_pose[0] = new_object_pose[0] + env.scene["object"].data.default_root_pose.assign( + wp.from_torch(default_pose.to(env.device).contiguous()).view(wp.transformf) + ) + default_vel = wp.to_torch(env.scene["object"].data.default_root_vel).clone() + default_vel[0] = torch.zeros(6, device=env.device) + env.scene["object"].data.default_root_vel.assign(wp.from_torch(default_vel.to(env.device).contiguous())) + + return new_object_pose + + def setup_navigation_scene( env: LocomanipulationSDGEnv, input_episode_data: EpisodeData, approach_distance: float, randomize_placement: bool = True, -) -> tuple[OccupancyMap, ParameterizedPath, RelativePose, RelativePose]: + draw_visualization: bool = False, +) -> NavigationScene | None: """Set up the navigation scene with occupancy map and path planning. Args: @@ -249,40 +454,94 @@ randomize_placement: Whether to randomize fixture placement + draw_visualization: Whether to draw the occupancy map and path planning visualization. Returns: - Tuple of (occupancy_map, path_helper, base_goal, base_goal_approach) + NavigationScene or None if the navigation scene setup failed. 
""" - # Create base occupancy map - occupancy_map = merge_occupancy_maps( - [ - OccupancyMap.make_empty(start=(-7, -7), end=(7, 7), resolution=0.05), - env.get_start_fixture().get_occupancy_map(), - ] - ) - # Randomize fixture placement if enabled - if randomize_placement: + background_fixture = env.get_background_fixture() + if background_fixture is not None: + occupancy_map = background_fixture.get_occupancy_map() + fixtures = [env.get_start_fixture(), env.get_end_fixture()] + env.get_obstacle_fixtures() + if not randomize_placement: + raise ValueError("randomize_placement needs to be True when background_usd_path is provided") + else: + occupancy_map = merge_occupancy_maps( + [ + OccupancyMap.make_empty(start=(-7, -7), end=(7, 7), resolution=0.05), + env.get_start_fixture().get_occupancy_map(), + ] + ) + fixtures = [env.get_end_fixture()] + env.get_obstacle_fixtures() - for fixture in fixtures: - place_randomly(fixture, occupancy_map.buffered_meters(1.0)) + + for fixture in fixtures: + if randomize_placement: + if not place_randomly(fixture, occupancy_map.buffered_meters(0.7), num_iter=5000): + print(f"Failed to randomize fixture placement for {fixture.entity_name}", flush=True) + return None + + sync_simulation_state(env) occupancy_map = merge_occupancy_maps([occupancy_map, fixture.get_occupancy_map()]) - # Compute goal poses from initial state initial_state = env.load_input_data(input_episode_data, 0) + base_goal = RelativePose( - relative_pose=transform_mul(transform_inv(initial_state.fixture_pose), initial_state.base_pose), + relative_pose=transform_mul( + transform_inv(initial_state.fixture_pose.to(env.device)), + initial_state.base_pose.to(env.device), + ), parent=env.get_end_fixture(), ) base_goal_approach = RelativePose( - relative_pose=torch.tensor([-approach_distance, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]), parent=base_goal + relative_pose=torch.tensor([-approach_distance, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], device=env.device), + parent=base_goal, ) - # Plan 
navigation path - base_path = plan_path( - start=env.get_base(), end=base_goal_approach, occupancy_map=occupancy_map.buffered_meters(0.15) - ) + project_robot_state_into_env(env, input_episode_data) + project_object_state_into_env(env, input_episode_data) + # Flush physics writes so scene data buffers reflect the projected poses before path planning. + # Without this, env.get_base() would return the stale pre-projection position. + sync_simulation_state(env) + + nav_map = occupancy_map.buffered_meters(0.15) + nav_fs = nav_map.freespace_mask() + start_pos = env.get_base().get_pose()[0, :2].detach().cpu().numpy() + start_px = nav_map.world_to_pixel_numpy(start_pos[None])[0].astype(int) + sx, sy = int(start_px[0]), int(start_px[1]) + + # Clear the buffer zone around the robot's start pixel if it falls inside it. + # The robot stands adjacent to the start table so its position sits within the 0.15m buffer. + if 0 <= sy < nav_fs.shape[0] and 0 <= sx < nav_fs.shape[1] and not nav_fs[sy, sx]: + clear_r = int(0.15 / nav_map.resolution) + yy, xx = np.ogrid[: nav_map.data.shape[0], : nav_map.data.shape[1]] + nav_map.data[(xx - sx) ** 2 + (yy - sy) ** 2 <= clear_r**2] = OccupancyMapDataValue.FREESPACE + + base_path = plan_path(start=env.get_base(), end=base_goal_approach, occupancy_map=nav_map) + + if base_path is None: + return None + base_path_helper = ParameterizedPath(base_path) + if len(base_path_helper.points) <= 2: + print(f"Base path does not have enough points: {len(base_path_helper.points)} points", flush=True) + return None - return occupancy_map, base_path_helper, base_goal, base_goal_approach + sync_simulation_state(env) + + if draw_visualization: + occupancy_map_add_to_stage( + occupancy_map, + stage=omni.usd.get_context().get_stage(), + path="/OccupancyMap", + z_offset=0.01, + draw_path=base_path_helper.points, + ) + + return NavigationScene( + occupancy_map=occupancy_map, + base_path_helper=base_path_helper, + base_goal=base_goal, + 
base_goal_approach=base_goal_approach, + ) def handle_grasp_state( @@ -309,7 +568,7 @@ def handle_grasp_state( # Set control targets - robot stays stationary during grasping output_data.data_generation_state = int(LocomanipulationSDGDataGenerationState.GRASP_OBJECT) output_data.recording_step = recording_step - output_data.base_velocity_target = torch.tensor([0.0, 0.0, 0.0]) + output_data.base_velocity_target = torch.tensor([0.0, 0.0, 0.0], device=env.device) # Transform hand poses relative to object output_data.left_hand_pose_target = transform_relative_pose( @@ -357,7 +616,7 @@ def handle_lift_state( # Set control targets - robot stays stationary during lifting output_data.data_generation_state = int(LocomanipulationSDGDataGenerationState.LIFT_OBJECT) output_data.recording_step = recording_step - output_data.base_velocity_target = torch.tensor([0.0, 0.0, 0.0]) + output_data.base_velocity_target = torch.tensor([0.0, 0.0, 0.0], device=env.device) # Transform hand poses relative to base output_data.left_hand_pose_target = transform_relative_pose( @@ -416,7 +675,7 @@ def handle_navigate_state( # Set control targets output_data.data_generation_state = int(LocomanipulationSDGDataGenerationState.NAVIGATE) output_data.recording_step = recording_step - output_data.base_velocity_target = torch.tensor([linear_velocity, 0.0, angular_velocity]) + output_data.base_velocity_target = torch.tensor([linear_velocity, 0.0, angular_velocity], device=env.device) # Transform hand poses relative to base output_data.left_hand_pose_target = transform_relative_pose( @@ -470,7 +729,7 @@ def handle_approach_state( # Set control targets output_data.data_generation_state = int(LocomanipulationSDGDataGenerationState.APPROACH) output_data.recording_step = recording_step - output_data.base_velocity_target = torch.tensor([linear_velocity, 0.0, angular_velocity]) + output_data.base_velocity_target = torch.tensor([linear_velocity, 0.0, angular_velocity], device=env.device) # Transform hand poses 
relative to base output_data.left_hand_pose_target = transform_relative_pose( @@ -531,7 +790,7 @@ def handle_drop_off_state( # Set control targets output_data.data_generation_state = int(LocomanipulationSDGDataGenerationState.DROP_OFF_OBJECT) output_data.recording_step = recording_step - output_data.base_velocity_target = torch.tensor([linear_velocity, 0.0, angular_velocity]) + output_data.base_velocity_target = torch.tensor([linear_velocity, 0.0, angular_velocity], device=env.device) # Transform hand poses relative to end fixture output_data.left_hand_pose_target = transform_relative_pose( @@ -559,7 +818,7 @@ def populate_output_data( base_goal: RelativePose, base_goal_approach: RelativePose, base_path: torch.Tensor, -) -> None: +): """Populate remaining output data fields. Args: @@ -596,12 +855,13 @@ def replay( angular_gain: float = 2.0, linear_gain: float = 1.0, linear_max: float = 1.0, - distance_threshold: float = 0.1, + distance_threshold: float = 0.2, following_offset: float = 0.6, angle_threshold: float = 0.2, - approach_distance: float = 1.0, + approach_distance: float = 0.5, randomize_placement: bool = True, -) -> None: + init_camera_view: bool = False, +) -> bool: """Replay a locomanipulation SDG episode with state machine control. 
This function implements a state machine for locomanipulation SDG, where the robot: @@ -627,8 +887,14 @@ randomize_placement: Whether to randomize obstacle placement + init_camera_view: Whether to set the viewport camera behind the robot at the start of the episode. + + Returns: + True if the episode terminated with success, False otherwise. """ + # Reset recorder manager to clear any leftover episode data from previous runs + # This prevents duplicate exports when reset_to calls record_pre_reset + env.recorder_manager.reset(env_ids=[0]) + # Initialize environment to starting state - env.reset_to(state=input_episode_data.get_initial_state(), env_ids=torch.tensor([0]), is_relative=True) + env.reset_to( + state=input_episode_data.get_initial_state(), env_ids=torch.tensor([0], device=env.device), is_relative=True + ) # Create navigation control configuration config = LocomanipulationSDGControlConfig( @@ -641,29 +907,27 @@ approach_distance=approach_distance, ) - # Set up navigation scene and path planning - occupancy_map, base_path_helper, base_goal, base_goal_approach = setup_navigation_scene( - env, input_episode_data, approach_distance, randomize_placement + nav_scene = setup_navigation_scene( + env, input_episode_data, approach_distance, randomize_placement, draw_visualization ) + if nav_scene is None: + print("Failed to set up navigation scene", flush=True) + return False - # Visualize occupancy map and path if requested - if draw_visualization: - occupancy_map_add_to_stage( - occupancy_map, - stage=sim_utils.get_current_stage(), - path="/OccupancyMap", - z_offset=0.01, - draw_path=base_path_helper.points, - ) + if init_camera_view: + _init_camera_view(env) # Initialize state machine output_data = LocomanipulationSDGOutputData() current_state = LocomanipulationSDGDataGenerationState.GRASP_OBJECT + previous_state = None recording_step = 0 # Main simulation loop with state machine while simulation_app.is_running() and not simulation_app.is_exiting(): - print(f"Current state: {current_state.name}, Recording step: {recording_step}") + if current_state != previous_state: + print(f"State changed: {current_state.name}, 
Recording step: {recording_step}", flush=True) + previous_state = current_state # Execute state-specific logic using helper functions if current_state == LocomanipulationSDGDataGenerationState.GRASP_OBJECT: @@ -678,24 +942,32 @@ def replay( elif current_state == LocomanipulationSDGDataGenerationState.NAVIGATE: current_state = handle_navigate_state( - env, input_episode_data, recording_step, base_path_helper, base_goal_approach, config, output_data + env, + input_episode_data, + recording_step, + nav_scene.base_path_helper, + nav_scene.base_goal_approach, + config, + output_data, ) elif current_state == LocomanipulationSDGDataGenerationState.APPROACH: current_state = handle_approach_state( - env, input_episode_data, recording_step, base_goal, config, output_data + env, input_episode_data, recording_step, nav_scene.base_goal, config, output_data ) elif current_state == LocomanipulationSDGDataGenerationState.DROP_OFF_OBJECT: recording_step, next_state = handle_drop_off_state( - env, input_episode_data, recording_step, base_goal, config, output_data + env, input_episode_data, recording_step, nav_scene.base_goal, config, output_data ) if next_state is None: # End of episode data break current_state = next_state # Populate additional output data fields - populate_output_data(env, output_data, base_goal, base_goal_approach, base_path_helper.points) + populate_output_data( + env, output_data, nav_scene.base_goal, nav_scene.base_goal_approach, nav_scene.base_path_helper.points + ) # Attach output data to environment for recording env._locomanipulation_sdg_output_data = output_data @@ -708,8 +980,25 @@ def replay( left_hand_pose_target=output_data.left_hand_pose_target, right_hand_pose_target=output_data.right_hand_pose_target, ) + _, _, reset_terminated, reset_time_outs, _ = env.step(action) + + if reset_terminated[0] or reset_time_outs[0]: + print(f"Environment terminated at state {current_state.name}, step {recording_step}", flush=True) + success_terminated = False + + # 
Check if termination was due to success + if hasattr(env, "termination_manager") and "success" in env.termination_manager.active_terms: + success_terminated = bool(env.termination_manager.get_term("success")[0].item()) - env.step(action) + if success_terminated: + print(f"Success termination at state {current_state.name}, step {recording_step}", flush=True) + else: + print(f"Non-success termination at state {current_state.name}, step {recording_step}", flush=True) + + return success_terminated + + print("Replay completed!", flush=True) + return False if __name__ == "__main__": @@ -724,14 +1013,26 @@ def replay( env_cfg.sim.device = "cpu" env_cfg.recorders.dataset_export_dir_path = os.path.dirname(args_cli.output_file) env_cfg.recorders.dataset_filename = os.path.basename(args_cli.output_file) + env_cfg.recorders.dataset_export_mode = DatasetExportMode.EXPORT_SUCCEEDED_ONLY + env_cfg.recorders.export_in_record_pre_reset = True + + if args_cli.background_usd_path is not None and args_cli.background_occupancy_yaml_file is not None: + env_cfg.background_usd_path = args_cli.background_usd_path + env_cfg.background_occupancy_yaml_file = args_cli.background_occupancy_yaml_file + + env_cfg.high_res_video = args_cli.high_res_video env = gym.make(args_cli.task, cfg=env_cfg).unwrapped # Load input data input_dataset_file_handler = HDF5DatasetFileHandler() input_dataset_file_handler.open(args_cli.dataset) + is_legacy_quat_format = input_dataset_file_handler.is_legacy_quaternion_format() + + run_id = 0 + success_runs = 0 - for i in range(args_cli.num_runs): + while success_runs < args_cli.num_runs: if args_cli.demo is None: demo = random.choice(list(input_dataset_file_handler.get_episode_names())) else: @@ -739,7 +1040,17 @@ def replay( input_episode_data = input_dataset_file_handler.load_episode(demo, args_cli.device) - replay( + assert input_episode_data is not None + assert "actions" in input_episode_data.data + + # See G1LocomanipulationSDGEnv.build_action_vector() for the 
action layout; + # left_quat=[3:7], right_quat=[10:14]. + if is_legacy_quat_format: + actions = input_episode_data.data["actions"] + actions[:, 3:7] = convert_quat(actions[:, 3:7], to="xyzw") # left hand quat + actions[:, 10:14] = convert_quat(actions[:, 10:14], to="xyzw") # right hand quat + + success = replay( env=env, input_episode_data=input_episode_data, lift_step=args_cli.lift_step, @@ -753,9 +1064,18 @@ def replay( angle_threshold=args_cli.angle_threshold, approach_distance=args_cli.approach_distance, randomize_placement=args_cli.randomize_placement, + init_camera_view=args_cli.init_camera_view, ) - env.reset() # FIXME: hack to handle missing final recording + run_id += 1 + + if success: + success_runs += 1 + print("successful episodes count", env.recorder_manager.exported_successful_episode_count, flush=True) + else: + print(f"Run {run_id} failed (demo={demo}), retrying...", flush=True) + + env.recorder_manager.close() env.close() simulation_app.close() diff --git a/scripts/tools/hdf5_to_mp4.py b/scripts/tools/hdf5_to_mp4.py index 0cd8a40c78f..b9e7bdb1a31 100644 --- a/scripts/tools/hdf5_to_mp4.py +++ b/scripts/tools/hdf5_to_mp4.py @@ -21,6 +21,7 @@ --video_height Height of the output video in pixels. (default: 704) --video_width Width of the output video in pixels. (default: 1280) --framerate Frames per second for the output video. (default: 30) + --demo_id If provided, only export this specific demo_id. 
(default: None) """ import argparse @@ -88,6 +89,12 @@ def parse_args(): default=DEFAULT_FRAMERATE, help="Frames per second for the output video.", ) + parser.add_argument( + "--demo_id", + type=int, + default=None, + help="If provided, only export this specific demo_id.", + ) args = parser.parse_args() @@ -188,8 +195,14 @@ def main(): num_demos = get_num_demos(args.input_file) print(f"Found {num_demos} demonstrations in {args.input_file}") + if args.demo_id is not None: + demo_ids = [args.demo_id] + print(f"Exporting only demo_id {args.demo_id}") + else: + demo_ids = list(range(num_demos)) + # Convert each demonstration - for i in range(num_demos): + for i in demo_ids: frames_path = f"data/demo_{str(i)}/obs" for input_key in args.input_keys: write_demo_to_mp4( diff --git a/source/isaaclab/isaaclab/scene/interactive_scene.py b/source/isaaclab/isaaclab/scene/interactive_scene.py index 54fbcc068af..7ccd762eaf5 100644 --- a/source/isaaclab/isaaclab/scene/interactive_scene.py +++ b/source/isaaclab/isaaclab/scene/interactive_scene.py @@ -454,15 +454,15 @@ def reset_to( for asset_name, articulation in self._articulations.items(): asset_state = state["articulation"][asset_name] # root state - root_pose = asset_state["root_pose"].clone() + root_pose = asset_state["root_pose"].clone().to(self.device) if is_relative: root_pose[:, :3] += self.env_origins[env_ids] - root_velocity = asset_state["root_velocity"].clone() + root_velocity = asset_state["root_velocity"].clone().to(self.device) articulation.write_root_pose_to_sim(root_pose, env_ids=env_ids) articulation.write_root_velocity_to_sim(root_velocity, env_ids=env_ids) # joint state - joint_position = asset_state["joint_position"].clone() - joint_velocity = asset_state["joint_velocity"].clone() + joint_position = asset_state["joint_position"].clone().to(self.device) + joint_velocity = asset_state["joint_velocity"].clone().to(self.device) articulation.write_joint_state_to_sim(joint_position, joint_velocity, env_ids=env_ids) # 
FIXME: This is not generic as it assumes PD control over the joints. # This assumption does not hold for effort controlled joints. @@ -471,19 +471,19 @@ def reset_to( # deformable objects for asset_name, deformable_object in self._deformable_objects.items(): asset_state = state["deformable_object"][asset_name] - nodal_position = asset_state["nodal_position"].clone() + nodal_position = asset_state["nodal_position"].clone().to(self.device) if is_relative: nodal_position[:, :3] += self.env_origins[env_ids] - nodal_velocity = asset_state["nodal_velocity"].clone() + nodal_velocity = asset_state["nodal_velocity"].clone().to(self.device) deformable_object.write_nodal_pos_to_sim(nodal_position, env_ids=env_ids) deformable_object.write_nodal_velocity_to_sim(nodal_velocity, env_ids=env_ids) # rigid objects for asset_name, rigid_object in self._rigid_objects.items(): asset_state = state["rigid_object"][asset_name] - root_pose = asset_state["root_pose"].clone() + root_pose = asset_state["root_pose"].clone().to(self.device) if is_relative: root_pose[:, :3] += self.env_origins[env_ids] - root_velocity = asset_state["root_velocity"].clone() + root_velocity = asset_state["root_velocity"].clone().to(self.device) rigid_object.write_root_pose_to_sim(root_pose, env_ids=env_ids) rigid_object.write_root_velocity_to_sim(root_velocity, env_ids=env_ids) # surface grippers diff --git a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/g1_locomanipulation_sdg_env.py b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/g1_locomanipulation_sdg_env.py index c2ce6f0dcbe..4f5e8c29d34 100644 --- a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/g1_locomanipulation_sdg_env.py +++ b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/g1_locomanipulation_sdg_env.py @@ -18,6 +18,7 @@ from isaaclab.utils.datasets import EpisodeData from isaaclab_mimic.locomanipulation_sdg.data_classes import LocomanipulationSDGInputData +from 
isaaclab_mimic.locomanipulation_sdg.occupancy_map_utils import OccupancyMap from isaaclab_mimic.locomanipulation_sdg.scene_utils import HasPose, SceneBody, SceneFixture from isaaclab_tasks.manager_based.locomanipulation.pick_place.locomanipulation_g1_env_cfg import ( @@ -49,46 +50,55 @@ class G1LocomanipulationSDGSceneCfg(LocomanipulationG1SceneCfg): ), ) - robot_pov_cam = CameraCfg( - prim_path="/World/envs/env_.*/Robot/torso_link/d435_link/camera", - update_period=0.0, - height=160, - width=256, - data_types=["rgb"], - spawn=sim_utils.PinholeCameraCfg(focal_length=8.0, clipping_range=(0.1, 20.0)), - offset=CameraCfg.OffsetCfg(pos=(0.0, 0.0, 0.0), rot=(0.0, -0.1736482, 0.0, 0.9848078), convention="world"), - ) - - -# Add forklifts -for i in range(NUM_FORKLIFTS): - setattr( - G1LocomanipulationSDGSceneCfg, - f"forklift_{i}", - AssetBaseCfg( - prim_path=f"/World/envs/env_.*/Forklift{i}", - init_state=AssetBaseCfg.InitialStateCfg(pos=[0.0, 0.0, 0.0], rot=[0.0, 0.0, 0.0, 1.0]), - spawn=UsdFileCfg( - usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/Forklift/forklift.usd", - rigid_props=sim_utils.RigidBodyPropertiesCfg(kinematic_enabled=True), + def add_robot_pov_cam(self, height, width): + robot_pov_cam = CameraCfg( + prim_path="/World/envs/env_.*/Robot/torso_link/d435_link/camera", + update_period=0.0, + height=height, + width=width, + data_types=["rgb"], + spawn=sim_utils.PinholeCameraCfg(focal_length=8.0, clipping_range=(0.1, 20.0)), + offset=CameraCfg.OffsetCfg(pos=(0.0, 0.0, 0.0), rot=(0.0, -0.1736482, 0.0, 0.9848078), convention="world"), + ) + setattr(self, "robot_pov_cam", robot_pov_cam) + + def add_background_asset(self, background_usd_path: str): + background = AssetBaseCfg( + prim_path="/World/envs/env_.*/Background", + init_state=AssetBaseCfg.InitialStateCfg( + pos=[0, 0, 0], + rot=[0.0, 0.0, 0.0, 1.0], ), - ), - ) - -# Add boxes -for i in range(NUM_BOXES): - setattr( - G1LocomanipulationSDGSceneCfg, - f"box_{i}", - AssetBaseCfg( - 
prim_path=f"/World/envs/env_.*/Box{i}", - init_state=AssetBaseCfg.InitialStateCfg(pos=[0.0, 0.0, 0.0], rot=[0.0, 0.0, 0.0, 1.0]), spawn=UsdFileCfg( - usd_path=f"{ISAAC_NUCLEUS_DIR}/Environments/Simple_Warehouse/Props/SM_CardBoxB_01_681.usd", + usd_path=background_usd_path, rigid_props=sim_utils.RigidBodyPropertiesCfg(kinematic_enabled=True), ), - ), - ) + ) + + setattr(self, "background", background) + + def add_forklifts(self, num_forklifts: int): + for i in range(num_forklifts): + forklift = AssetBaseCfg( + prim_path=f"/World/envs/env_.*/Forklift{i}", + init_state=AssetBaseCfg.InitialStateCfg(pos=[0.0, 0.0, 0.0], rot=[0.0, 0.0, 0.0, 1.0]), + spawn=UsdFileCfg( + usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/Forklift/forklift.usd", + rigid_props=sim_utils.RigidBodyPropertiesCfg(kinematic_enabled=True), + ), + ) + setattr(self, f"forklift_{i}", forklift) + + def add_boxes(self, num_boxes: int): + for i in range(num_boxes): + box = AssetBaseCfg( + prim_path=f"/World/envs/env_.*/Box{i}", + init_state=AssetBaseCfg.InitialStateCfg(pos=[0.0, 0.0, 0.0], rot=[0.0, 0.0, 0.0, 1.0]), + spawn=UsdFileCfg( + usd_path=f"{ISAAC_NUCLEUS_DIR}/Environments/Simple_Warehouse/Props/SM_CardBoxB_01_681.usd", + ), + ) + setattr(self, f"box_{i}", box) @configclass @@ -122,11 +132,15 @@ class G1LocomanipulationSDGEnvCfg(LocomanipulationG1EnvCfg, LocomanipulationSDGE recorders: LocomanipulationSDGRecorderManagerCfg = LocomanipulationSDGRecorderManagerCfg() observations: G1LocomanipulationSDGObservationsCfg = G1LocomanipulationSDGObservationsCfg() + background_usd_path: str | None = None + background_occupancy_yaml_file: str | None = None + high_res_video: bool = False + def __post_init__(self): """Post initialization.""" # general settings self.decimation = 4 - self.episode_length_s = 100.0 + self.episode_length_s = 50.0 # simulation settings self.sim.dt = 1 / 200 # 200Hz self.sim.render_interval = 6 @@ -140,6 +154,27 @@ def __post_init__(self): class G1LocomanipulationSDGEnv(LocomanipulationSDGEnv): 
def __init__(self, cfg: G1LocomanipulationSDGEnvCfg, **kwargs): + if cfg.background_usd_path is not None: + self._num_forklifts = 0 + self._num_boxes = 0 + remove_virtual_ground = True + cfg.scene.add_background_asset(cfg.background_usd_path) + else: + self._num_forklifts = NUM_FORKLIFTS + self._num_boxes = NUM_BOXES + remove_virtual_ground = False + + if cfg.high_res_video: + cfg.scene.add_robot_pov_cam(540, 960) + else: + cfg.scene.add_robot_pov_cam(160, 256) + + cfg.scene.add_forklifts(self._num_forklifts) + cfg.scene.add_boxes(self._num_boxes) + + if remove_virtual_ground: + delattr(cfg.scene, "ground") + super().__init__(cfg) self.sim.set_camera_view([10.5, 10.5, 10.5], [0.0, 0.0, 0.5]) self._upper_body_dim = self.action_manager.get_term("upper_body_ik").action_dim @@ -160,16 +195,17 @@ def load_input_data(self, episode_data: EpisodeData, step: int) -> Locomanipulat object_pose = dataset_state["rigid_object"]["object"]["root_pose"] + base_pose = episode_data.get_initial_state()["articulation"]["robot"]["root_pose"] data = LocomanipulationSDGInputData( left_hand_pose_target=dataset_action[0:7], right_hand_pose_target=dataset_action[7:14], left_hand_joint_positions_target=dataset_action[14:21], right_hand_joint_positions_target=dataset_action[21:28], - base_pose=episode_data.get_initial_state()["articulation"]["robot"]["root_pose"], + base_pose=base_pose, object_pose=object_pose, fixture_pose=torch.tensor( - [0.0, 0.55, -0.3, 1.0, 0.0, 0.0, 0.0] - ), # Table pose is not recorded for this env. + [0.0, 0.55, -0.3, 0.0, 0.0, 0.0, 1.0], device=base_pose.device + ), # Table pose is not recorded for this env. Quaternion in XYZW format (identity). ) return data @@ -182,6 +218,18 @@ def build_action_vector( right_hand_joint_positions_target: torch.Tensor, base_velocity_target: torch.Tensor, ): + """Build the action vector for the G1 locomanipulation environment. 
+ + Action layout (upper_body_dim = left_pose + right_pose + left_fingers + right_fingers = 28): + + Indices Content + [0:7] left hand pose (pos[0:3] + quat[3:7]) + [7:14] right hand pose (pos[7:10] + quat[10:14]) + [14:21] left hand finger joints (7) + [21:28] right hand finger joints (7) + [upper_body_dim:upper_body_dim+3] base velocity (vx, vy, yaw_rate) + [upper_body_dim+3] base height + """ action = torch.zeros(self.action_space.shape) # Set base height @@ -240,40 +288,50 @@ def get_object(self) -> HasPose: return SceneBody(self.scene, "object", "sm_steeringwheel_a01_01") def get_start_fixture(self) -> SceneFixture: - return SceneFixture( + return SceneFixture.from_boundary( self.scene, "packing_table", - occupancy_map_boundary=np.array([[-1.45, -0.45], [1.45, -0.45], [1.45, 0.45], [-1.45, 0.45]]), - occupancy_map_resolution=0.05, + np.array([[-1.45, -0.45], [1.45, -0.45], [1.45, 0.45], [-1.45, 0.45]]), + 0.05, ) def get_end_fixture(self) -> SceneFixture: - return SceneFixture( + return SceneFixture.from_boundary( self.scene, "packing_table_2", - occupancy_map_boundary=np.array([[-1.45, -0.45], [1.45, -0.45], [1.45, 0.45], [-1.45, 0.45]]), - occupancy_map_resolution=0.05, + np.array([[-1.45, -0.45], [1.45, -0.45], [1.45, 0.45], [-1.45, 0.45]]), + 0.05, ) def get_obstacle_fixtures(self): obstacles = [ - SceneFixture( + SceneFixture.from_boundary( self.scene, f"forklift_{i}", - occupancy_map_boundary=np.array([[-1.0, -1.9], [1.0, -1.9], [1.0, 2.1], [-1.0, 2.1]]), - occupancy_map_resolution=0.05, + np.array([[-1.0, -1.9], [1.0, -1.9], [1.0, 2.1], [-1.0, 2.1]]), + 0.05, ) - for i in range(NUM_FORKLIFTS) + for i in range(self._num_forklifts) ] obstacles += [ - SceneFixture( + SceneFixture.from_boundary( self.scene, f"box_{i}", - occupancy_map_boundary=np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]]), - occupancy_map_resolution=0.05, + np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]]), + 0.05, ) - for i in range(NUM_BOXES) + for i in 
range(self._num_boxes) ] return obstacles + + def get_background_fixture(self) -> SceneFixture | None: + if "background" not in self.scene.keys(): + return None + + background_map = OccupancyMap.from_ros_yaml( + ros_yaml_path=self.cfg.background_occupancy_yaml_file, + ) + background_fixture = SceneFixture(self.scene, "background", background_map) + return background_fixture diff --git a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/locomanipulation_sdg_env.py b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/locomanipulation_sdg_env.py index 83ae8e65684..5163ebc5f73 100644 --- a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/locomanipulation_sdg_env.py +++ b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/envs/locomanipulation_sdg_env.py @@ -89,3 +89,7 @@ def get_end_fixture(self) -> SceneFixture: def get_obstacle_fixtures(self) -> list[SceneFixture]: """Get the set of obstacle fixtures.""" raise NotImplementedError + + def get_background_fixture(self) -> SceneFixture | None: + """Get the background fixture, if one is configured.""" + raise NotImplementedError diff --git a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/occupancy_map_utils.py b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/occupancy_map_utils.py index b2fdbdfb852..87c66042bb0 100644 --- a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/occupancy_map_utils.py +++ b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/occupancy_map_utils.py @@ -220,7 +220,9 @@ def from_ros_image( data = 255 - data freespace_mask = data < free_thresh - occupied_mask = data > occupied_thresh + + # Treat unknown areas as occupied: any pixel that is not free is marked occupied + occupied_mask = ~freespace_mask return OccupancyMap.from_masks( freespace_mask=freespace_mask, occupied_mask=occupied_mask, resolution=resolution, origin=origin @@ -477,10 +479,7 @@ def check_world_point_in_bounds(self, point: Point2d) -> bool: x_px = int(pixel[0, 0]) y_px = int(pixel[0, 1]) - if (x_px < 0) or
(x_px >= self.width_pixels()) or (y_px < 0) or (y_px >= self.height_pixels()): - return False - - return True + return self.check_pixel_in_bounds(x_px, y_px) def check_world_point_in_freespace(self, point: Point2d) -> bool: """Check if a world coordinate is inside the freespace region of the occupancy map @@ -500,6 +499,21 @@ def check_world_point_in_freespace(self, point: Point2d) -> bool: freespace = self.freespace_mask() return bool(freespace[y_px, x_px]) + def check_pixel_in_bounds(self, x_px: int, y_px: int) -> bool: + """Check if a pixel coordinate is inside the bounds of the occupancy map. + + Args: + x_px (int): The x pixel coordinate. + y_px (int): The y pixel coordinate. + + Returns: + bool: True if the coordinate is inside the bounds of the occupancy map. + """ + if (x_px < 0) or (x_px >= self.width_pixels()) or (y_px < 0) or (y_px >= self.height_pixels()): + return False + + return True + def transformed(self, transform: np.ndarray) -> "OccupancyMap": return transform_occupancy_map(self, transform) @@ -680,10 +694,13 @@ def occupancy_map_add_to_stage( draw = ImageDraw.Draw(image) line_coordinates = [] path_pixels = occupancy_map.world_to_pixel_numpy(draw_path) - for i in range(len(path_pixels)): - line_coordinates.append(int(path_pixels[i, 0])) - line_coordinates.append(int(path_pixels[i, 1])) width_pixels = draw_path_line_width_meter / occupancy_map.resolution + circle_radius = int(width_pixels / 2) + for i in range(len(path_pixels)): + x, y = int(path_pixels[i, 0]), int(path_pixels[i, 1]) + line_coordinates.append(x) + line_coordinates.append(y) + draw.ellipse([x - circle_radius, y - circle_radius, x + circle_radius, y + circle_radius], fill="red") draw.line(line_coordinates, fill="green", width=int(width_pixels / 2), joint="curve") # need to flip, ros uses inverted coordinates on y axis @@ -696,6 +713,7 @@ def occupancy_map_add_to_stage( # Add model modelRoot = UsdGeom.Xform.Define(stage, path) Usd.ModelAPI(modelRoot).SetKind(Kind.Tokens.component) +
UsdGeom.Imageable(modelRoot).MakeInvisible() # Add mesh mesh = UsdGeom.Mesh.Define(stage, os.path.join(path, "mesh")) diff --git a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/path_utils.py b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/path_utils.py index f3e203f401e..3baacc3dfb3 100644 --- a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/path_utils.py +++ b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/path_utils.py @@ -171,7 +171,7 @@ def find_nearest(self, point: torch.Tensor) -> tuple[torch.Tensor, float, tuple[ return min_pt, min_pt_dist_along_path, min_pt_seg, min_pt_dist_to_seg -def plan_path(start: HasPose2d, end: HasPose2d, occupancy_map: OccupancyMap) -> torch.Tensor: +def plan_path(start: HasPose2d, end: HasPose2d, occupancy_map: OccupancyMap) -> torch.Tensor | None: """Plan collision-free path between start and end positions. Args: @@ -196,6 +196,25 @@ def plan_path(start: HasPose2d, end: HasPose2d, occupancy_map: OccupancyMap) -> start_yx_pixels = start_xy_pixels[..., 0, ::-1] end_yx_pixels = end_xy_pixels[..., 0, ::-1] + # Check if end_yx_pixels are inside the occupancy map bounds + map_height, map_width = occupancy_map.freespace_mask().shape + start_y, start_x = int(start_yx_pixels[0]), int(start_yx_pixels[1]) + end_y, end_x = int(end_yx_pixels[0]), int(end_yx_pixels[1]) + + if not occupancy_map.check_pixel_in_bounds(start_x, start_y): + print( + f"Warning: start_yx_pixels ({start_y}, {start_x}) is outside occupancy map bounds " + f"(height={map_height}, width={map_width})" + ) + return None + + if not occupancy_map.check_pixel_in_bounds(end_x, end_y): + print( + f"Warning: end_yx_pixels ({end_y}, {end_x}) is outside occupancy map bounds " + f"(height={map_height}, width={map_width})" + ) + return None + # Generate path using the mobility path planner path_planner_output = generate_paths(start=start_yx_pixels, freespace=occupancy_map.freespace_mask()) diff --git 
a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/scene_utils.py b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/scene_utils.py index dc9c969171c..79cf307c9cb 100644 --- a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/scene_utils.py +++ b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/scene_utils.py @@ -7,8 +7,10 @@ import numpy as np import torch +import warp as wp import isaaclab.utils.math as math_utils +from isaaclab.sim.views import XformPrimView from .occupancy_map_utils import OccupancyMap, intersect_occupancy_maps from .transform_utils import transform_mul @@ -83,7 +85,8 @@ def __init__(self, scene, entity_name: str, body_name: str): def get_pose(self): """Get the 3D pose of the entity.""" - pose = self.scene[self.entity_name].data.body_link_state_w[ + body_link_state_w = wp.to_torch(self.scene[self.entity_name].data.body_link_state_w) + pose = body_link_state_w[ :, self.scene[self.entity_name].data.body_names.index(self.body_name), :7, @@ -98,16 +101,25 @@ def __init__(self, scene, entity_name: str): self.scene = scene self.entity_name = entity_name + def _get_xform_view(self) -> XformPrimView: + """Return the XformPrimView for this asset, refreshing it if prims were not yet cloned.""" + xform_prim = self.scene[self.entity_name] + if xform_prim.count == 0: + # The view was created before environment cloning; rebuild it now that prims exist. 
+ xform_prim = XformPrimView(xform_prim._prim_path, device=xform_prim.device) + self.scene.extras[self.entity_name] = xform_prim + return xform_prim + def get_pose(self): """Get the 3D pose of the entity.""" - xform_prim = self.scene[self.entity_name] + xform_prim = self._get_xform_view() position, orientation = xform_prim.get_world_poses() pose = torch.cat([position, orientation], dim=-1) return pose def set_pose(self, pose: torch.Tensor): """Set the 3D pose of the entity.""" - xform_prim = self.scene[self.entity_name] + xform_prim = self._get_xform_view() position = pose[..., :3] orientation = pose[..., 3:] xform_prim.set_world_poses(position, orientation, None) @@ -124,8 +136,9 @@ def get_pose(self): """Get the 3D pose of the entity.""" parent_pose = self.parent.get_pose() + relative_pose = self.relative_pose.to(parent_pose.device) - pose = transform_mul(parent_pose, self.relative_pose) + pose = transform_mul(parent_pose, relative_pose) return pose @@ -133,23 +146,53 @@ def get_pose(self): class SceneFixture(SceneAsset, HasOccupancyMap): """A helper class for working with assets in a scene that have an associated occupancy map.""" - def __init__( - self, scene, entity_name: str, occupancy_map_boundary: np.ndarray, occupancy_map_resolution: float = 0.05 - ): + def __init__(self, scene, entity_name: str, local_occupancy_map: OccupancyMap): + """Initialize a SceneFixture from a local occupancy map + + Args: + scene: The scene + entity_name: The name of the entity + local_occupancy_map: The local occupancy map + """ SceneAsset.__init__(self, scene, entity_name) - self.occupancy_map_boundary = occupancy_map_boundary - self.occupancy_map_resolution = occupancy_map_resolution + self.local_occupancy_map = local_occupancy_map + + @classmethod + def from_boundary( + cls, scene, entity_name: str, occupancy_map_boundary: np.ndarray, occupancy_map_resolution: float = 0.05 + ) -> "SceneFixture": + """Create a SceneFixture from a known boundary/resolution pair + + Args: + 
scene: The scene + entity_name: The name of the entity + occupancy_map_boundary: The boundary of the occupancy map + occupancy_map_resolution: The resolution of the occupancy map + + Returns: + SceneFixture: The SceneFixture + """ + occupancy_map = OccupancyMap.from_occupancy_boundary( + boundary=occupancy_map_boundary, resolution=occupancy_map_resolution + ) + return cls(scene, entity_name, occupancy_map) def get_occupancy_map(self): - local_occupancy_map = OccupancyMap.from_occupancy_boundary( - boundary=self.occupancy_map_boundary, resolution=self.occupancy_map_resolution - ) + """Get the occupancy map of the SceneFixture - transform = self.get_transform_2d().detach().cpu().numpy() + Returns: + OccupancyMap: The occupancy map + """ + if self.local_occupancy_map is None: + raise RuntimeError("SceneFixture requires an occupancy map before querying it.") - occupancy_map = local_occupancy_map.transformed(transform) + transform = self.get_transform_2d().detach().cpu().numpy() + # get_world_poses() may return a batched (num_envs, 3, 3) or empty (0, 3, 3) tensor. + # For a fixed background asset placed at the world origin, fall back to identity when empty. 
+ if transform.ndim == 3: + transform = transform[0] if transform.shape[0] > 0 else np.eye(3) - return occupancy_map + return self.local_occupancy_map.transformed(transform) def place_randomly( diff --git a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/transform_utils.py b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/transform_utils.py index 73406a187ff..8d832656c23 100644 --- a/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/transform_utils.py +++ b/source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/transform_utils.py @@ -44,5 +44,8 @@ def transform_inv(transform: torch.Tensor) -> torch.Tensor: def transform_relative_pose(world_pose: torch.Tensor, src_frame_pose: torch.Tensor, dst_frame_pose: torch.Tensor): """Compute the relative pose with respect to a source frame, and apply this relative pose to a destination frame.""" + device = dst_frame_pose.device + world_pose = world_pose.to(device) + src_frame_pose = src_frame_pose.to(device) pose = transform_mul(dst_frame_pose, transform_mul(transform_inv(src_frame_pose), world_pose)) return pose diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/fixed_base_upper_body_ik_g1_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/fixed_base_upper_body_ik_g1_env_cfg.py index aee922770fa..e225660e8e8 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/fixed_base_upper_body_ik_g1_env_cfg.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/fixed_base_upper_body_ik_g1_env_cfg.py @@ -345,7 +345,10 @@ class TerminationsCfg: func=base_mdp.root_height_below_minimum, params={"minimum_height": 0.5, "asset_cfg": SceneEntityCfg("object")} ) - success = DoneTerm(func=manip_mdp.task_done_pick_place, params={"task_link_name": "right_wrist_yaw_link"}) + success = DoneTerm( + func=locomanip_mdp.task_done_pick_place_table_frame, + 
params={"task_link_name": "right_wrist_yaw_link", "table_cfg": SceneEntityCfg("packing_table")}, + ) ## diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/locomanipulation_g1_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/locomanipulation_g1_env_cfg.py index da49d3ccd4b..552c4d7d942 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/locomanipulation_g1_env_cfg.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/locomanipulation_g1_env_cfg.py @@ -382,7 +382,22 @@ class TerminationsCfg: func=base_mdp.root_height_below_minimum, params={"minimum_height": 0.5, "asset_cfg": SceneEntityCfg("object")} ) - success = DoneTerm(func=manip_mdp.task_done_pick_place, params={"task_link_name": "right_wrist_yaw_link"}) + object_too_far = DoneTerm( + func=locomanip_mdp.object_too_far_from_robot, + params={ + "robot_cfg": SceneEntityCfg("robot"), + "object_cfg": SceneEntityCfg("object"), + "max_distance": 1.0, + }, + ) + + success = DoneTerm( + func=locomanip_mdp.task_done_pick_place_table_frame, + params={ + "task_link_name": "right_wrist_yaw_link", + "table_cfg": SceneEntityCfg("packing_table_2"), + }, + ) ## diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/mdp/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/mdp/__init__.py index 7e559309b5c..d0275ceb7be 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/mdp/__init__.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/mdp/__init__.py @@ -10,3 +10,4 @@ from .actions import * # noqa: F401, F403 from .observations import * # noqa: F401, F403 +from .terminations import * # noqa: F401, F403 diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/mdp/terminations.py 
b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/mdp/terminations.py new file mode 100644 index 00000000000..c2f3fd0f8b0 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomanipulation/pick_place/mdp/terminations.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +"""Termination functions specific to locomanipulation environments.""" + +from __future__ import annotations + +from typing import TYPE_CHECKING + +import torch +import warp as wp + +from isaaclab.assets import RigidObject +from isaaclab.managers import SceneEntityCfg +from isaaclab.sim.views import XformPrimView +from isaaclab.utils.math import quat_apply_inverse + +if TYPE_CHECKING: + from isaaclab.envs import ManagerBasedRLEnv + + +def task_done_pick_place_table_frame( + env: ManagerBasedRLEnv, + task_link_name: str = "", + object_cfg: SceneEntityCfg = SceneEntityCfg("object"), + table_cfg: SceneEntityCfg = SceneEntityCfg("table"), + right_wrist_max_x: float = 0.26, + min_x: float = 0.40, + max_x: float = 0.85, + min_y: float = -0.20, + max_y: float = 0.05, + max_height: float = 1.10, + min_vel: float = 0.20, + debug: bool = False, +) -> torch.Tensor: + """Determine if the object placement task is complete for locomanipulation environments. + + Unlike the base pick-place termination, this version checks object position relative + to the destination table frame rather than the environment origin. This is necessary + for locomanipulation tasks where the destination table can be placed at arbitrary + world positions. + + This function checks whether all success conditions for the task have been met: + 1. Object is within the target x/y range relative to the destination table + 2. Object is below a maximum height relative to the destination table + 3. 
Object velocity is below threshold + 4. Right robot wrist is retracted back towards body (past a given x threshold in table frame) + + Args: + env: The RL environment instance. + task_link_name: Name of the right wrist link on the robot. + object_cfg: Configuration for the object entity. + table_cfg: Configuration for the destination table entity (must be an XformPrimView). + right_wrist_max_x: Maximum x position of the right wrist in table frame for task completion. + min_x: Minimum x position of the object relative to the table for task completion. + max_x: Maximum x position of the object relative to the table for task completion. + min_y: Minimum y position of the object relative to the table for task completion. + max_y: Maximum y position of the object relative to the table for task completion. + max_height: Maximum height (z) of the object relative to the table for task completion. + min_vel: Maximum velocity magnitude of the object for task completion. + debug: If True, print debug info for the first environment each step. + + Returns: + Boolean tensor indicating which environments have completed the task. 
+ """ + if task_link_name == "": + raise ValueError("task_link_name must be provided to task_done_pick_place") + + object: RigidObject = env.scene[object_cfg.name] + table = env.scene[table_cfg.name] + if not isinstance(table, XformPrimView): + raise TypeError(f"Expected table '{table_cfg.name}' to be an XformPrimView, got {type(table)}") + + # Get table world pose + table_pos_w, table_quat_w = table.get_world_poses() + + # Broadcast table pose if a single table is shared across all envs + object_root_pos_w = wp.to_torch(object.data.root_pos_w) # [num_envs, 3] + if table_pos_w.shape[0] == 1 and object_root_pos_w.shape[0] > 1: + table_pos_w = table_pos_w.expand(object_root_pos_w.shape[0], -1) + table_quat_w = table_quat_w.expand(object_root_pos_w.shape[0], -1) + + # Object position in table frame + object_to_table_pos = quat_apply_inverse(table_quat_w, object_root_pos_w - table_pos_w) + object_x = object_to_table_pos[:, 0] + object_y = object_to_table_pos[:, 1] + object_height = object_to_table_pos[:, 2] + object_vel = torch.abs(wp.to_torch(object.data.root_vel_w)) + + # Right wrist position in table frame + robot_body_pos_w = wp.to_torch(env.scene["robot"].data.body_pos_w) + right_eef_idx = env.scene["robot"].data.body_names.index(task_link_name) + right_wrist_pos_w = robot_body_pos_w[:, right_eef_idx, :] - table_pos_w + right_wrist_x = quat_apply_inverse(table_quat_w, right_wrist_pos_w)[:, 0] + + # Check all success conditions + done = object_x < max_x + done = torch.logical_and(done, object_x > min_x) + done = torch.logical_and(done, object_y < max_y) + done = torch.logical_and(done, object_y > min_y) + done = torch.logical_and(done, object_height < max_height) + done = torch.logical_and(done, right_wrist_x < right_wrist_max_x) + done = torch.logical_and(done, object_vel[:, 0] < min_vel) + done = torch.logical_and(done, object_vel[:, 1] < min_vel) + done = torch.logical_and(done, object_vel[:, 2] < min_vel) + + if debug: + env_id = 0 + obj_vel_env = 
object_vel[env_id] + print( + "[task_done_pick_place] " + f"obj_pos_t=({object_x[env_id]:.3f}, {object_y[env_id]:.3f}, {object_height[env_id]:.3f}), " + f"obj_vel=({obj_vel_env[0]:.3f}, {obj_vel_env[1]:.3f}, {obj_vel_env[2]:.3f}), " + f"right_wrist_x={right_wrist_x[env_id]:.3f} | " + f"x[{min_x:.3f},{max_x:.3f}] y[{min_y:.3f},{max_y:.3f}] " + f"z<{max_height:.3f} wrist_x<{right_wrist_max_x:.3f} vel<{min_vel:.3f} " + f"=> done={bool(done[env_id])}", + flush=True, + ) + + return done + + +def object_too_far_from_robot( + env: ManagerBasedRLEnv, + robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"), + object_cfg: SceneEntityCfg = SceneEntityCfg("object"), + max_distance: float = 1.0, +) -> torch.Tensor: + """Terminate when the object is too far from the robot (failed to pick up). + + This checks the distance between the robot's root position and the object's position. + If the distance exceeds max_distance, the episode is terminated as a failure. + + Args: + env: The RL environment instance. + robot_cfg: Configuration for the robot entity. + object_cfg: Configuration for the object entity. + max_distance: Maximum allowed distance between robot and object. + + Returns: + Boolean tensor indicating which environments have exceeded the distance threshold. + """ + robot: RigidObject = env.scene[robot_cfg.name] + object: RigidObject = env.scene[object_cfg.name] + + robot_pos = wp.to_torch(robot.data.root_pos_w)[:, :3] + object_pos = wp.to_torch(object.data.root_pos_w)[:, :3] + + distance = torch.norm(robot_pos - object_pos, dim=1) + + return distance > max_distance
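The new `task_done_pick_place_table_frame` termination hinges on one piece of math: expressing the object's world position in the destination table's frame via `quat_apply_inverse(table_quat_w, object_root_pos_w - table_pos_w)`. The standalone sketch below illustrates that frame change with plain NumPy, outside the simulator. The local `quat_apply_inverse` is a minimal re-implementation assumed to match Isaac Lab's helper (unit quaternions in (w, x, y, z) order); it is included only so the snippet runs without Isaac Lab installed.

```python
import numpy as np


def quat_apply_inverse(q: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Rotate vector v by the inverse of unit quaternion q = (w, x, y, z).

    The inverse of a unit quaternion is its conjugate, so we rotate by
    (w, -x, -y, -z) using the standard quaternion-vector rotation identity.
    """
    w, u = q[0], -q[1:]  # conjugate: negate the vector part
    t = 2.0 * np.cross(u, v)
    return v + w * t + np.cross(u, t)


def object_in_table_frame(object_pos_w, table_pos_w, table_quat_w):
    """Express a world-frame object position in the table's local frame."""
    return quat_apply_inverse(table_quat_w, object_pos_w - table_pos_w)


# Table at (2, 1, 0), rotated 90 degrees about z, so the table's local +x
# axis points along world +y.
half = np.sqrt(0.5)
table_pos = np.array([2.0, 1.0, 0.0])
table_quat = np.array([half, 0.0, 0.0, half])  # (w, x, y, z), 90 deg about z

# Object 0.5 m along world +y from the table, 0.8 m up.
obj_pos = np.array([2.0, 1.5, 0.8])

local = object_in_table_frame(obj_pos, table_pos, table_quat)
# The world +y offset becomes a local +x offset: approximately (0.5, 0.0, 0.8),
# which is what the min_x/max_x, min_y/max_y, max_height window then tests.
```

This is why the success window (`min_x`/`max_x`, `min_y`/`max_y`, `max_height`) keeps working when the destination table is placed at an arbitrary world pose: the thresholds are always applied in the table's own frame.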