[WIP] VLN Benchmark: H1 Navigation in Matterport 3D with NaVILA VLM by kanghui0204 · Pull Request #446 · isaac-sim/IsaacLab-Arena

kanghui0204 · 2026-02-26T12:08:17Z

[WIP] VLN Benchmark: H1 Navigation in Matterport 3D with NaVILA VLM

Summary

Add Vision-Language Navigation (VLN) benchmark support to IsaacLab Arena, enabling H1 humanoid navigation in Matterport 3D indoor scenes using the NaVILA VLM.

This is a draft PR for early review and feedback. The core pipeline is functional and tested, but some areas may need refinement before merging.

Architecture

Two-level hierarchical policy:

- High-level: NaVILA VLM generates velocity commands from the RGB image history
- Low-level: RSL-RL locomotion policy converts velocity commands to joint actions

Communication between Isaac Sim (client) and NaVILA (server) uses Arena's ZeroMQ remote-policy framework (merged in #394).
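To make the split between the two levels concrete, here is a minimal, self-contained sketch of the control loop. `HierarchicalPolicy`, `VelocityCommand`, and the `hl_interval` re-planning period are hypothetical names used only for illustration; they are not the API of `VlnVlmLocomotionPolicy`. The ZeroMQ round trip to the server is replaced by a plain Python callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class VelocityCommand:
    """Body-frame command: forward (m/s), lateral (m/s), yaw rate (rad/s)."""
    vx: float = 0.0
    vy: float = 0.0
    wz: float = 0.0


class HierarchicalPolicy:
    """Hypothetical two-level policy (illustrative, not Arena's API).

    high_level:  image history -> VelocityCommand (stand-in for the remote VLM)
    low_level:   (proprioception, command) -> joint actions (stand-in for RSL-RL)
    hl_interval: low-level steps between high-level queries (assumed value)
    """

    def __init__(
        self,
        high_level: Callable[[List[Any]], VelocityCommand],
        low_level: Callable[[List[float], VelocityCommand], List[float]],
        hl_interval: int = 10,
    ) -> None:
        self.high_level = high_level
        self.low_level = low_level
        self.hl_interval = hl_interval
        self.step_count = 0
        self.command = VelocityCommand()
        self.images: List[Any] = []

    def act(self, image: Any, proprio: List[float]) -> List[float]:
        self.images.append(image)  # keep the full history for the VLM
        # Re-plan only every hl_interval steps; hold the last command in between.
        if self.step_count % self.hl_interval == 0:
            self.command = self.high_level(self.images)
        self.step_count += 1
        # The low-level policy runs on every step with the current command.
        return self.low_level(proprio, self.command)


# Usage with stub policies.
hl_calls: List[int] = []


def fake_vlm(images: List[Any]) -> VelocityCommand:
    hl_calls.append(len(images))  # record how much history the VLM saw
    return VelocityCommand(vx=0.5)


def fake_locomotion(proprio: List[float], cmd: VelocityCommand) -> List[float]:
    return [cmd.vx] * len(proprio)  # dummy "joint actions"


policy = HierarchicalPolicy(fake_vlm, fake_locomotion, hl_interval=10)
for t in range(25):
    actions = policy.act(image=t, proprio=[0.0, 0.0, 0.0])
```

Holding the last command between queries is what lets a slow, remote VLM drive a locomotion policy that has to act on every control step; the actual query rate used in this PR is not specified here.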

What's Included

| Component | Location | Description |
|---|---|---|
| H1 Embodiment | `isaaclab_arena/embodiments/h1/` | Standard H1 + VLN extension (cameras, observations) |
| VLN Task | `isaaclab_arena/tasks/vln_r2r_matterport_task.py` | R2R episode management, scene filtering, termination |
| VLN Metrics | `isaaclab_arena/metrics/vln_metrics.py` | SPL, Success, PathLength, DistanceToGoal |
| Matterport Background | `isaaclab_arena/assets/matterport_background.py` | Scene loading + lighting + ground plane |
| Client Policy | `isaaclab_arena/policy/vln/` | `VlnVlmLocomotionPolicy` (VLM + RSL-RL composite) |
| NaVILA Server | `isaaclab_arena_navila/` | `NaVilaServerPolicy` (LLaVA-based VLM inference) |
| Environment | `isaaclab_arena_environments/vln_environment.py` | `h1_vln_matterport` environment registration |
| Docker | `docker/Dockerfile.vln_server`, `docker/run_vln_server.sh` | VLM server container |
| Pretrained low-level (LL) model | `isaaclab_arena/policy/vln/pretrained/` | H1 locomotion checkpoint (4.7 MB) |
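For reviewers less familiar with VLN metrics, the sketch below shows the standard definitions, with every distance measured in the XY plane. It assumes SPL = S · ℓ / max(p, ℓ), where S is per-episode success, ℓ the reference path length, and p the distance actually traveled. The 3 m success radius is the usual R2R/VLN-CE convention and is an assumption here, as is computing ℓ from the dataset's reference waypoints; `episode_metrics` is a hypothetical helper, not the API of `vln_metrics.py`.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters


def xy_dist(a: Point, b: Point) -> float:
    """Horizontal distance; z is ignored (pelvis height vs. floor-level waypoints)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def path_length(points: Sequence[Point]) -> float:
    """Sum of XY segment lengths along a polyline."""
    return sum(xy_dist(p, q) for p, q in zip(points, points[1:]))


def episode_metrics(
    trajectory: Sequence[Point],
    goal: Point,
    reference_path: Sequence[Point],
    success_radius: float = 3.0,  # assumed R2R/VLN-CE convention
) -> Dict[str, float]:
    """Hypothetical per-episode metrics, evaluated at the final robot position."""
    dtg = xy_dist(trajectory[-1], goal)
    success = 1.0 if dtg <= success_radius else 0.0
    traveled = path_length(trajectory)
    shortest = path_length(reference_path)
    denom = max(traveled, shortest)
    spl = success * shortest / denom if denom > 0 else 0.0
    return {"success": success, "spl": spl, "path_length": traveled, "distance_to_goal": dtg}


# Robot pelvis at z=0.9, goal waypoint at z=0.17: the XY distance to goal is still 0.
m = episode_metrics(
    trajectory=[(0.0, 0.0, 0.9), (3.0, 0.0, 0.9), (3.0, 4.0, 0.9)],
    goal=(3.0, 4.0, 0.17),
    reference_path=[(0.0, 0.0, 0.17), (3.0, 4.0, 0.17)],
)
```

In this example the robot reaches the goal but takes a 7 m detour around a 5 m straight-line reference, so SPL is 5/7 ≈ 0.71 even though Success is 1.0.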

Key Design Decisions

- Full image history + uniform sampling: The VLM receives 8 frames sampled uniformly from the entire episode history rather than only the last 8. Seeing the whole trajectory lets the VLM judge whether the task is complete and output "stop".
- Scene filtering: Episodes are automatically filtered by the loaded USD scene, and `--episode_start/end` index into the filtered set.
- XY metrics: Distance calculations use only the horizontal (XY) plane, because the robot's pelvis height (~0.9 m) differs from the dataset's waypoint height (~0.17 m, near floor level).
- Modular server: The NaVILA server lives in a separate package (`isaaclab_arena_navila/`), following the `isaaclab_arena_gr00t` pattern. Other VLMs can be added without changing client code.
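A sketch of the uniform frame sampling, assuming the sampled indices always span the first and the most recent frame and that histories shorter than 8 frames are padded by repeating frames; the PR does not state either detail. `uniform_frame_indices` and `sample_history` are hypothetical helpers.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def uniform_frame_indices(num_frames: int, k: int = 8) -> List[int]:
    """k indices spread evenly over [0, num_frames - 1], always including both ends.

    Short histories (num_frames < k) repeat indices, so the VLM always gets k frames.
    """
    if num_frames < 1:
        raise ValueError("need at least one frame")
    if k == 1:
        return [num_frames - 1]  # only the current frame
    # Integer arithmetic keeps the result exact: i=0 -> 0, i=k-1 -> num_frames-1.
    return [i * (num_frames - 1) // (k - 1) for i in range(k)]


def sample_history(history: Sequence[T], k: int = 8) -> List[T]:
    """Pick k frames from the full history using uniform_frame_indices."""
    return [history[i] for i in uniform_frame_indices(len(history), k)]


frames = list(range(100))      # stand-ins for 100 RGB frames
print(sample_history(frames))  # [0, 14, 28, 42, 56, 70, 84, 99]
```

With 100 frames of history, a last-8 window would only cover frames 92 to 99, whereas uniform sampling still includes the start of the episode, which is the context the VLM needs to decide whether the instruction has been completed.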

Test Results

| Episodes | Success | SPL | Avg. distance to goal |
|---|---|---|---|
| 10 (zsNo4HB9uLZ) | 0.40 | 0.36 | 6.17 m |
| 3 (zsNo4HB9uLZ) | 0.67 | 0.59 | 5.41 m |
| 1 (best case) | 1.00 | 0.77 | 0.77 m |

The VLM correctly outputs "stop" when the task is complete (e.g., "I think I should stop because I have finished the instruction.").
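Because the VLM answers in natural language, something between the VLM and the locomotion policy has to map its reply to a velocity command. The sketch below assumes NaVILA-style phrases such as "move forward 75 cm", "turn left 30 degrees", and "stop"; the regular expressions, the 0.5 m/s and 0.5 rad/s speeds, and `parse_vlm_action` are illustrative assumptions, not this PR's implementation.

```python
import math
import re
from dataclasses import dataclass

FORWARD_SPEED = 0.5  # m/s (assumed)
TURN_SPEED = 0.5     # rad/s (assumed)

_MOVE = re.compile(r"move forward (\d+(?:\.\d+)?)\s*(cm|m)\b", re.IGNORECASE)
_TURN = re.compile(r"turn (left|right) (\d+(?:\.\d+)?)\s*degrees?", re.IGNORECASE)
_STOP = re.compile(r"\bstop\b", re.IGNORECASE)


@dataclass
class MotionCommand:
    vx: float        # forward speed, m/s
    wz: float        # yaw rate, rad/s (positive = turn left)
    duration: float  # how long to hold the command, s
    stop: bool = False


def parse_vlm_action(text: str) -> MotionCommand:
    """Hypothetical mapping from one VLM reply to a constant-velocity command."""
    m = _MOVE.search(text)
    if m:
        meters = float(m.group(1)) / (100.0 if m.group(2).lower() == "cm" else 1.0)
        return MotionCommand(vx=FORWARD_SPEED, wz=0.0, duration=meters / FORWARD_SPEED)
    m = _TURN.search(text)
    if m:
        sign = 1.0 if m.group(1).lower() == "left" else -1.0
        angle = math.radians(float(m.group(2)))
        return MotionCommand(vx=0.0, wz=sign * TURN_SPEED, duration=angle / TURN_SPEED)
    # No motion phrase: stand still; only an explicit "stop" ends the episode.
    return MotionCommand(vx=0.0, wz=0.0, duration=0.0, stop=bool(_STOP.search(text)))


print(parse_vlm_action("The next action is move forward 75 cm."))
# MotionCommand(vx=0.5, wz=0.0, duration=1.5, stop=False)
```

Treating unrecognized replies as "stand still" rather than "stop" keeps a single malformed answer from ending the episode early.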

Known Limitations

- `num_envs` must be 1 (multi-env VLM instruction tracking is not yet implemented)
- Switching scenes requires restarting the process
- Uses an invisible ground plane instead of Matterport mesh collision (GPU physics limitation)
- The pretrained checkpoint is included for early testing and will be removed before the final merge

How to Test

See `isaaclab_arena_navila/README.md` for full setup instructions (in English and Chinese).

Server

```bash
bash docker/run_vln_server.sh -m /path/to/navila-model --port 5555
```

Client (inside Isaac Sim container)

```bash
/isaac-sim/python.sh -u -m isaaclab_arena.evaluation.policy_runner \
--enable_cameras --num_envs 1 \
--policy_type isaaclab_arena.policy.vln.vln_vlm_locomotion_policy.VlnVlmLocomotionPolicy \
--remote_host localhost --remote_port 5555 \
--ll_checkpoint_path isaaclab_arena/policy/vln/pretrained/h1_navila_locomotion.pt \
--ll_agent_cfg isaaclab_arena/policy/vln/pretrained/h1_navila_agent.yaml \
--num_episodes 5 \
h1_vln_matterport \
--usd_path /datasets/VLN-CE-Isaac/matterport_usd/zsNo4HB9uLZ/zsNo4HB9uLZ.usd \
--r2r_dataset_path /datasets/VLN-CE-Isaac/vln_ce_isaac_v1.json.gz
```
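The scene filtering described under Key Design Decisions can be sketched as follows. The sketch assumes the dataset follows the VLN-CE layout (a top-level `episodes` list whose `scene_id` looks like `mp3d/<scene>/<scene>.glb`), that the scene ID is the file stem of `--usd_path`, and that the end index is exclusive; `load_scene_episodes` and its helpers are hypothetical, not the task's actual code.

```python
import gzip
import json
import os
import tempfile
from pathlib import Path, PurePosixPath
from typing import List, Optional


def scene_id_from_usd(usd_path: str) -> str:
    """Assumed: the USD file is named after the scene, e.g. .../zsNo4HB9uLZ/zsNo4HB9uLZ.usd."""
    return Path(usd_path).stem


def episode_scene(episode: dict) -> str:
    """Assumed VLN-CE style scene_id, e.g. "mp3d/zsNo4HB9uLZ/zsNo4HB9uLZ.glb"."""
    return PurePosixPath(episode["scene_id"]).stem


def load_scene_episodes(dataset_path: str, usd_path: str,
                        episode_start: int = 0, episode_end: Optional[int] = None) -> List[dict]:
    with gzip.open(dataset_path, "rt", encoding="utf-8") as f:
        episodes = json.load(f)["episodes"]
    scene = scene_id_from_usd(usd_path)
    # Filter first, then slice: start/end index into the filtered list.
    filtered = [ep for ep in episodes if episode_scene(ep) == scene]
    return filtered[episode_start:episode_end]


# Demo with a tiny synthetic dataset (the second scene name is made up).
demo = {"episodes": [
    {"episode_id": 1, "scene_id": "mp3d/zsNo4HB9uLZ/zsNo4HB9uLZ.glb"},
    {"episode_id": 2, "scene_id": "mp3d/someOtherScene/someOtherScene.glb"},
    {"episode_id": 3, "scene_id": "mp3d/zsNo4HB9uLZ/zsNo4HB9uLZ.glb"},
]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(demo, f)
    usd = "/datasets/VLN-CE-Isaac/matterport_usd/zsNo4HB9uLZ/zsNo4HB9uLZ.usd"
    eps = load_scene_episodes(path, usd, episode_start=1)

print([ep["episode_id"] for ep in eps])  # [3]
```

Because filtering happens before slicing, `episode_start=1` here selects the second episode of the loaded scene, not the second episode of the whole dataset.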

Checklist

Two-level hierarchical policy for Vision-Language Navigation:
- High-level: NaVILA VLM generates velocity commands from RGB images
- Low-level: RSL-RL locomotion policy converts to joint actions

Code organization follows Arena patterns:
- `isaaclab_arena/embodiments/h1/`: Standard H1 + VLN extension
- `isaaclab_arena/tasks/`: VlnR2rMatterportTask
- `isaaclab_arena/metrics/`: SPL, Success, PathLength, DTG (XY)
- `isaaclab_arena/assets/`: MatterportBackground with lighting
- `isaaclab_arena/policy/vln/`: VlnVlmLocomotionPolicy (client)
- `isaaclab_arena_navila/`: NaVilaServerPolicy (server)
- `isaaclab_arena_environments/`: h1_vln_matterport environment

Key features:
- Auto scene-episode matching via scene_filter
- Full image history + uniform sampling for VLM stop detection
- Configurable head + follow cameras
- Docker VLM server (`docker/Dockerfile.vln_server`)
- VLN-CE R2R dataset support (11 Matterport scenes, 1077 episodes)

Verified: success=1.0, SPL=0.77 on zsNo4HB9uLZ scene

kanghui0204 requested review from alexmillane, viiik-inside and xyao-nv February 26, 2026 12:08

kanghui0204 wants to merge 1 commit into `main` from `feature/vln-benchmark`
