GaTech-RL2 · RyanPCo · Jun 16, 2026
diff --git a/.gitmodules b/.gitmodules
@@ -1,3 +1,6 @@
 [submodule "external/openpi"]
 	path = external/openpi
 	url = https://github.com/GaTech-RL2/openpi.git
+[submodule "external/RoboTwin"]
+	path = external/RoboTwin
+	url = https://github.com/GaTech-RL2/RoboTwin.git
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -36,6 +36,17 @@ all `*.ipynb`. **Never read into context**: venvs (`emimic/`, `.venv/`), caches
   salloc -A gts-dxu345-rl2 -N1 -q inferno -t 1:00:00 --mem=75G --gres=gpu:h200:1
   ```
   Always use the `inferno` queue (`-q inferno`) rather than `ember` — it's faster. Adjust `-t`, `--mem`, and `--gres` to the job. `salloc` is best for interactive / iterative work (smoke tests, debugging) where you hold the node and run into it repeatedly. For large or long-running jobs (real training runs), submit through Hydra's submitit launcher instead (`hydra/launcher/submitit.yaml`) so the job queues and runs unattended. Lightweight read-only work (lint, type checks, small unit tests, file edits, single-file syntax checks) is fine on the login node.
+- **Pick the GPU variant by job — don't always queue for `h200`.** `gpu-h200` is the most contended partition, so jobs sit. Choose by what the job needs to fit in VRAM (pi0.5 is 3.6 B params ≈ **14.5 GB fp32 / ~7 GB bf16** for the weights alone). Each type is its own partition `gpu-<type>` with `--gres=gpu:<type>:N`, same `-A gts-dxu345-rl2 -q inferno`:
+  | partition | `--gres` type | GPU mem | use for |
+  |---|---|---|---|
+  | `gpu-h200` | `h200` | 141 GB | **pi0.5 full fp32 AdamW training** (model+optimizer ≈ 57 GB) |
+  | `gpu-rtxpro-blackwell` | `rtx_pro_6000_blackwell` | 96 GB | training (newer, fewer nodes) |
+  | `gpu-h100` | `h100` | 80 GB | training |
+  | `gpu-l40s` | `l40s` | 48 GB | **pi0.5 eval/inference** (the eval sbatch was originally sized for a 48 GB a40) |
+  | `gpu-a100` | `a100` | 40 GB | **pi0.5 eval/inference**; training only with `--adam8bit` + small batch |
+  | `gpu-rtx6000` | `rtx_6000` | 24 GB | small / bf16-only inference — tight for pi0.5, plentiful nodes |
+  | `gpu-v100` | `v100` | 16 GB | too small to load pi0.5 comfortably — avoid |
+  Rule of thumb: **training needs ≥80 GB** (h200/h100/blackwell); **eval/inference fits 40–48 GB** (l40s/a100, with room for DINOv2 retrieval + SAPIEN) and those queues schedule far faster than h200. **Each partition caps the CPU:GPU ratio differently** — `gpu-h200` is 8:1, `gpu-l40s` is **4:1** — so `--cpus-per-task` must be ≤ ratio × #GPUs or the job is rejected with "Invalid generic resource (gres) specification" (e.g. 8 CPUs / 1 l40s fails; use 4). Validate a header without queuing via `sbatch --test-only <script>`.
 - **Short GPU runs (eval-only, smoke, a few hundred forward passes): export `TORCH_COMPILE_DISABLE=1`.** pi0.5's `sample_actions` triggers a `torch.compile` max-autotune compile on the first call — minutes of warmup that only pays off across a long training run. Disabling it runs eager (slower per call, no warmup), a net win when you're not training for a while. Leave compile ON for real training.
 - Python 3.11. Activate the project venv before any Python tooling: `source emimic/bin/activate`.
 - Package is installed editable as `egomimic` (see `pyproject.toml`). Linting is `ruff` via pre-commit.

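Review note on the GPU table above: the VRAM figures and the CPU:GPU cap can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative and not part of the repo. It assumes the "model+optimizer" figure means fp32 weights, gradients, and two AdamW moment buffers (activations excluded), and it only knows the two CPU:GPU ratios the note states. Because it starts from the rounded 3.6 B parameter count, it lands slightly under the quoted 14.5 GB.

```python
# Illustrative helpers only; none of these names exist in the repo.
PI05_PARAMS = 3.6e9  # rounded parameter count quoted in CLAUDE.md

# CPU:GPU caps stated in CLAUDE.md; other partitions are not listed there.
CPUS_PER_GPU = {"gpu-h200": 8, "gpu-l40s": 4}


def weights_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB."""
    return n_params * bytes_per_param / 1e9


def adamw_fp32_gb(n_params: float) -> float:
    """Assumed breakdown: fp32 weights + gradients + two AdamW moment buffers."""
    return 4 * weights_gb(n_params, 4)


def cpus_allowed(partition: str, cpus_per_task: int, n_gpus: int = 1) -> bool:
    """True if --cpus-per-task respects the partition's CPU:GPU cap."""
    return cpus_per_task <= CPUS_PER_GPU[partition] * n_gpus
```

With these numbers, `weights_gb(PI05_PARAMS, 4)` is about 14.4, the bf16 weights are about 7.2, and `adamw_fp32_gb(PI05_PARAMS)` is about 57.6, in line with the table. `cpus_allowed("gpu-l40s", 8)` is false, which matches the rejected 8 CPUs / 1 l40s example.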
diff --git a/egomimic/hydra_configs/data/robotwin_local.yaml b/egomimic/hydra_configs/data/robotwin_local.yaml
@@ -0,0 +1,36 @@
+# RoboTwin reusable-corpus data recipe: load converted local .zarr stores (no S3/SQL)
+# through the standard MultiDataset pipeline as eva_bimanual cartesian episodes, so
+# RoboTwin can cotrain alongside the eva/human corpus. Produce the stores first with
+# `egomimic/ricl/scripts/robotwin_to_zarr.py` and point `paths.dataset_dir` at the
+# output dir. Mirrors `data/eva.yaml` but uses LocalEpisodeResolver + the PI image
+# keymap (`cartesian_pi` -> base_0_rgb / *_wrist_0_rgb) and a no-op filter.
+_target_: egomimic.pl_utils.pl_data_utils.MultiDataModuleWrapper
+
+train_datasets:
+  eva_bimanual:
+    _target_: egomimic.rldb.zarr.zarr_dataset_multi.MultiDataset._from_resolver
+    resolver:
+      _target_: egomimic.rldb.zarr.zarr_dataset_multi.LocalEpisodeResolver
+      folder_path: ${paths.dataset_dir} # dir of converted RoboTwin <hash>.zarr stores
+      key_map:
+        _target_: egomimic.rldb.embodiment.eva.Eva.get_keymap
+        keymap_mode: cartesian_pi
+      transform_list:
+        _target_: egomimic.rldb.embodiment.eva.Eva.get_transform_list
+        mode: cartesian
+    filters:
+      _target_: egomimic.rldb.filters.DatasetFilter
+      filter_lambdas: [] # load every local .zarr (all are RoboTwin eva_bimanual)
+    mode: total
+
+valid_datasets:
+  eva_bimanual: ${train_datasets.eva_bimanual}
+
+train_dataloader_params:
+  eva_bimanual:
+    batch_size: 8
+    num_workers: 4
+valid_dataloader_params:
+  eva_bimanual:
+    batch_size: 8
+    num_workers: 4
diff --git a/egomimic/ricl/CLAUDE.md b/egomimic/ricl/CLAUDE.md
@@ -23,6 +23,61 @@ resolve `episode_lists/`, `pg_tokenizer/`, `outputs/` relative to the parent dir
   builder over the pre-pooled `top_image_embeddings`; runnable `--build-cache`),
   `droid_eval.py` (`DroidRiclEval` retrieval vs a *true* zero-context floor,
   paired flow-loss; `DroidRiclModelWrapper`).
+- RoboTwin integration (joint-space shim mirroring DROID + a reusable Zarr
+  converter; goal/steps in `robotwin_setup.md`): `robotwin_data.py`
+  (`RoboTwinCorpus` reads RoboTwin HDF5 — `joint_action/vector` + `endpose` +
+  `observation/<cam>/rgb` — query dataset, bank provider, within-task LOO cache;
+  **detects the embodiment's qpos dim + gripper slots** — `aloha-agilex` is 6-DOF
+  -> 14-D, slots 6/13 — so don't hardcode 16/(7,15)), `robotwin_eval.py`
+  (`RoboTwinRiclEval`/`RoboTwinRiclModelWrapper`, thin `DroidRiclEval` subclass for
+  RoboTwin's dim/grippers). Scripts: `scripts/download_robotwin.py`
+  (HF zip slice from `dataset/<task>/<embodiment>_<setting>_<N>.zip` — the bimanual
+  embodiment is `aloha-agilex`, smallest is `clean_50` ~230 MB; plus a `--mode
+  synthetic` fixture for tests), `scripts/robotwin_to_zarr.py` (HDF5 ->
+  `eva_bimanual` cartesian Zarr via `ZarrWriter`; needs `endpose`; cmd==obs pose,
+  chunked at load like aria), `scripts/train_robotwin_ricl.py` (`--stage cpu` = fast
+  data-path/collate smoke with `--embed fake`; `--stage full` = GPU training on a
+  Lightning `Trainer` — NOT submitit; launch via `scripts/train_robotwin_ricl.sbatch`
+  [PACE: gpu-h200, qos inferno] and it saves `quantiles.json` beside the checkpoints
+  for eval), `scripts/build_robotwin_bank_index.py` (consolidated DINOv2 bank index over
+  a `RoboTwinCorpus`, built with the SAME `.embed` as eval's `OnlineRetriever`; output
+  format = `build_embedding_index.build_retrieval_index`). Tests:
+  `tests/{robotwin_data_test.py,robotwin_to_zarr_test.py,robotwin_bank_index_test.py}`
+  build a synthetic 6-DOF fixture. Reusable-corpus path: `hydra_configs/data/robotwin_local.yaml`
+  (`LocalEpisodeResolver` + `Eva` keymap/transform, no SQL/S3) +
+  `tests/robotwin_zarr_multidataset_test.py` (converts fixture -> Zarr -> loads via
+  `MultiDataset._from_resolver` + replicates trainHydra norm-stats wiring). Closed-loop
+  eval: `robotwin_adapter.py` (model-free glue — `obs_to_state`,
+  `quantile_norm`/`unnorm`, `state_to_model_input`, `unnormalize_action`,
+  `OnlineRetriever`; unit-tested in `tests/robotwin_adapter_test.py` with the model/sim
+  mocked) and `robotwin_policy.py` — the deploy contract (`encode_obs`/`get_model`/`eval`/
+  `reset_model` + `PIRiclPolicy` backed by EgoVerse `PIRicl`), the **source of truth**.
+  RoboTwin loads it via a **thin shim** at `policy/pi_ricl_egoverse/`
+  (`__init__.py` = `from .deploy_policy import *`; `deploy_policy.py` = `from
+  egomimic.ricl.robotwin_policy import *`; + `deploy_policy.yml`/`eval.sh`) that lives in
+  the **`GaTech-RL2/RoboTwin` fork** = the `external/RoboTwin` submodule (matches the
+  `GaTech-RL2/openpi` pattern; new-cluster setup = `git submodule update --init
+  external/RoboTwin`). Driven by RoboTwin's SAPIEN `script/eval_policy.py`. The shim's
+  `deploy_policy.yml` needs four artifacts: `egoverse_checkpoint` (trained `.ckpt`),
+  `bank_root` (a RoboTwin task dir), `bank_index_dir` (`build_robotwin_bank_index.py`
+  output), `quantiles_path` (the `quantiles.json` the trainer saves). Sim deps
+  (sapien/mplib/curobo/pytorch3d) install **selectively** — do NOT run RoboTwin's
+  `script/_install.sh` (it pins `torch==2.4.1`, breaking emimic's 2.7.1). Eval task
+  config = `demo_clean` (matches the `clean` data slice).
+  **Cross-embodiment RICL** (retrieve from embodiment A, predict embodiment B —
+  arx-x5 and aloha-agilex share the identical 14-D dual-arm layout): `robotwin_data.
+  build_cross_embodiment_retrieval_cache` (per emb-B query frame, kNN emb-A frames of
+  the SAME task; no leave-one-out — different corpora) + `train_robotwin_ricl.
+  build_data_xemb` (gated by `--bank-root`/`--eval-bank-root`; query/target = emb B
+  with its own quantiles, in-context bank = emb A with ITS own quantiles, so the
+  spliced demo is encoded in emb-A units). Pass `BANK_ROOT`/`EVAL_BANK_ROOT` to
+  `train_robotwin_ricl.sbatch` (also `NO_RANDOM=1` — the xemb path builds no
+  random-control caches). The run saves `bank_quantiles.json` (emb-A) beside
+  `quantiles.json` (emb-B). Closed-loop: `scripts/eval_xemb_compare.sbatch` (sim
+  controls emb B; cross-emb retrieves the emb-A bank, passing `bank_quantiles_path`
+  to `robotwin_policy` so retrieved demos normalize in emb-A units — the one deploy
+  change cross-embodiment needs). Result: arx-x5 demos cut held-out aloha action
+  loss ~40% (helps 100% of frames).
 - Embedding -> index: embed a corpus with
   `egomimic/scripts/embedding_process/zarr_embedding.py` — supports SQL-registry
   filters (`--filter-lambda` + `--sync-root`) and writes to a writable mirror

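Review note on the "don't hardcode 16/(7,15)" warning above: both pairs quoted in the note, 14-D with gripper slots 6/13 and 16-D with slots 7/15, follow from one layout rule. The sketch below assumes each arm contributes its joints followed by a single gripper value, with the two arms concatenated. `dual_arm_layout` is a hypothetical helper; the actual detection logic in `RoboTwinCorpus` is not shown in this diff and may differ.

```python
from __future__ import annotations


def dual_arm_layout(arm_dof: int) -> tuple[int, tuple[int, int]]:
    """Return (qpos_dim, gripper_slots) for an assumed
    [arm joints, gripper, arm joints, gripper] qpos vector."""
    if arm_dof < 1:
        raise ValueError("arm_dof must be positive")
    per_arm = arm_dof + 1  # joints plus one gripper value
    first_gripper = arm_dof  # index right after the first arm's joints
    second_gripper = 2 * per_arm - 1  # last index of the vector
    return 2 * per_arm, (first_gripper, second_gripper)
```

Under that assumption a 6-DOF arm such as `aloha-agilex` gives `(14, (6, 13))` and a 7-DOF arm gives `(16, (7, 15))`, which is why a value hardcoded for one embodiment silently mislabels the gripper on the other.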
diff --git a/egomimic/ricl/robotwin_adapter.py b/egomimic/ricl/robotwin_adapter.py
@@ -0,0 +1,191 @@
+"""Pure-Python glue for the RoboTwin closed-loop RICL eval adapter.
+
+The closed-loop benchmark runs a trained EgoVerse PyTorch ``PIRicl`` checkpoint
+inside RoboTwin's SAPIEN sim. This module holds the embodiment/observation/action
+**glue** that is independent of both SAPIEN and the model, so it is directly
+unit-testable:
+
+- ``obs_to_state``        : RoboTwin obs dict -> ([head, right, left] rgb, qpos vector)
+  (mirrors ``policy/pi05/deploy_policy.encode_obs``).
+- ``quantile_norm`` / ``quantile_unnorm`` : the [-1,1] map used by the shim and its
+  inverse (for turning a model action back into RoboTwin qpos).
+- ``state_to_model_input`` : raw qpos -> normalized, slot-filled 32-D proprio.
+- ``unnormalize_action``  : model action (32-D, normalized) -> RoboTwin qpos chunk.
+- ``OnlineRetriever``     : embed the live head frame -> kNN against a prebuilt demo
+  bank -> assemble ``ricl_retrieved_*`` blocks (the same keys ``build_ricl_collate``
+  produces) for the model prefix.
+
+The RoboTwin-side ``policy/pi_ricl_egoverse/{deploy_policy,pi_model}.py`` is a thin
+wrapper that loads the checkpoint + a DINOv2 embedder and drives these primitives.
+Quantiles come from the training corpus
+(``robotwin_data.RoboTwinCorpus.save_quantiles`` / ``load_quantiles``); the action
+space is RoboTwin's native qpos so the adapter returns a ``(steps, state_dim)`` chunk
+executed with ``take_action(action)`` in the default ``qpos`` control mode.
+"""
+
+from __future__ import annotations
+
+from typing import Callable
+
+import numpy as np
+import torch
+
+from egomimic.ricl.robotwin_data import slot32
+
+# RoboTwin obs camera names, in image order (head exterior is the retrieval/base view).
+_RGB_ORDER = ("head_camera", "right_camera", "left_camera")  # matches pi05 encode_obs
+
+
+def obs_to_state(observation: dict) -> tuple[list[np.ndarray], np.ndarray]:
+    """RoboTwin ``get_obs()`` dict -> ``([head, right, left] rgb, qpos vector)``.
+
+    Mirrors ``external/RoboTwin/policy/pi05/deploy_policy.encode_obs`` exactly so the
+    same eval driver works."""
+    obs = observation["observation"]
+    rgb = [np.asarray(obs[cam]["rgb"]) for cam in _RGB_ORDER]
+    state = np.asarray(observation["joint_action"]["vector"], dtype=np.float32)
+    return rgb, state
+
+
+def quantile_norm(x: np.ndarray, q01: np.ndarray, q99: np.ndarray) -> np.ndarray:
+    """Map raw qpos to [-1, 1] (same formula as ``RoboTwinCorpus.quantile_norm``)."""
+    x = np.asarray(x, dtype=np.float32)
+    return 2.0 * ((x - q01) / (q99 - q01 + 1e-6)) - 1.0
+
+
+def quantile_unnorm(x: np.ndarray, q01: np.ndarray, q99: np.ndarray) -> np.ndarray:
+    """Inverse of :func:`quantile_norm`: [-1, 1] -> raw qpos units."""
+    x = np.asarray(x, dtype=np.float32)
+    return (x + 1.0) / 2.0 * (q99 - q01 + 1e-6) + q01
+
+
+def state_to_model_input(
+    state: np.ndarray, q_state: tuple[np.ndarray, np.ndarray]
+) -> np.ndarray:
+    """Raw qpos -> normalized, slot-filled 32-D proprio (the model's state input)."""
+    q01, q99 = q_state
+    return slot32(quantile_norm(state, q01, q99))
+
+
+def unnormalize_action(
+    pred: np.ndarray,
+    q_actions: tuple[np.ndarray, np.ndarray],
+    state_dim: int,
+) -> np.ndarray:
+    """Model action -> RoboTwin qpos chunk.
+
+    ``pred`` is ``(H, 32)`` or ``(32,)`` normalized (the 32-D shared space, slot-filled);
+    keep the first ``state_dim`` dims and invert the quantile normalization with the
+    training corpus's action quantiles. Returns ``(H, state_dim)`` (or ``(state_dim,)``)."""
+    pred = np.asarray(pred, dtype=np.float32)
+    q01, q99 = q_actions
+    sliced = pred[..., :state_dim]
+    return quantile_unnorm(sliced, q01, q99)
+
+
+def _to_chw01(img_hwc: np.ndarray, hw: tuple[int, int] = (224, 224)) -> torch.Tensor:
+    """HWC uint8 (or float) -> CHW float in [0, 1], resized to ``hw`` (mirrors the
+    collate's image handling so retrieved frames match the trained representation)."""
+    t = torch.as_tensor(np.asarray(img_hwc))
+    if t.ndim == 3 and t.shape[-1] in (1, 3):
+        t = t.permute(2, 0, 1)
+    t = t.float()
+    if t.max() > 1.5:  # uint8 -> [0, 1]
+        t = t / 255.0
+    if tuple(t.shape[-2:]) != tuple(hw):
+        t = torch.nn.functional.interpolate(
+            t[None], size=tuple(hw), mode="bilinear", align_corners=False
+        )[0]
+    return t.contiguous()
+
+
+class OnlineRetriever:
+    """Build ``ricl_retrieved_*`` blocks for a live query frame at eval time.
+
+    Embeds the query head frame, finds the k nearest demos in a prebuilt bank
+    (``egomimic.ricl.retrieval.RetrievalIndex`` over the same 64-patch DINOv2
+    descriptors used in training), and gathers each neighbor's (image, state, action)
+    via ``bank_provider`` into the batch keys the model prefix consumes — the online
+    analog of ``build_ricl_collate``. The embedder and index are injected so this is
+    unit-testable with stubs (no DINOv2 needed).
+
+    Args:
+        index: a ``RetrievalIndex`` (``.query(vecs, k) -> (hashes, frames, dists)``).
+        bank_provider: ``(episode_hash, frame_idx) -> {"image": HWC, "state": (D,),
+            "action": (H, D)}`` (e.g. ``robotwin_data.make_robotwin_bank_provider``).
+        embedder: object with ``.embed(images_thwc) -> (T, EMBED_DIM)`` (e.g.
+            ``retrieval.DinoV2Embedder``); only called on the single query frame.
+        k, action_horizon, state_dim: block dimensions; image_hw the resize target.
+    """
+
+    def __init__(
+        self,
+        index,
+        bank_provider: Callable[[str, int], dict],
+        embedder,
+        *,
+        k: int = 4,
+        action_horizon: int = 15,
+        state_dim: int = 14,
+        image_hw: tuple[int, int] = (224, 224),
+    ):
+        self.index = index
+        self.bank_provider = bank_provider
+        self.embedder = embedder
+        self.k = int(k)
+        self.action_horizon = int(action_horizon)
+        self.state_dim = int(state_dim)
+        self.image_hw = image_hw
+
+    def retrieve(self, query_image_hwc: np.ndarray) -> dict:
+        """Return ``ricl_retrieved_*`` tensors (batch dim 1) for one query frame."""
+        H, W = self.image_hw
+        k, Ha, D = self.k, self.action_horizon, self.state_dim
+        qvec = self.embedder.embed(np.asarray(query_image_hwc)[None])  # (1, EMBED_DIM)
+        hashes, frames, dists = self.index.query(qvec, k=k)  # each (1, k)
+        hashes, frames, dists = hashes[0], frames[0], dists[0]
+
+        imgs = [torch.zeros(3, H, W) for _ in range(k)]
+        states = [torch.zeros(D) for _ in range(k)]
+        acts = [torch.zeros(Ha, D) for _ in range(k)]
+        mask = torch.zeros(k, dtype=torch.bool)
+        dist = torch.full((k,), float("inf"))
+        for i in range(k):
+            h = str(hashes[i])
+            if not h:  # padded slot (bank smaller than k)
+                continue
+            blk = self.bank_provider(h, int(frames[i]))
+            imgs[i] = _to_chw01(blk["image"], self.image_hw)
+            st = torch.as_tensor(np.asarray(blk["state"]), dtype=torch.float32).reshape(
+                -1
+            )
+            states[i] = _fit(st, D)
+            ac = torch.as_tensor(np.asarray(blk["action"]), dtype=torch.float32)
+            acts[i] = _fit_action(ac, Ha, D)
+            mask[i] = True
+            dist[i] = float(dists[i])
+        return {
+            "ricl_retrieved_images": torch.stack(imgs)[None],  # (1, k, 3, H, W)
+            "ricl_retrieved_state": torch.stack(states)[None],  # (1, k, D)
+            "ricl_retrieved_action": torch.stack(acts)[None],  # (1, k, Ha, D)
+            "ricl_retrieved_mask": mask[None],  # (1, k)
+            "ricl_retrieved_dist": dist[None],  # (1, k)
+        }
+
+
+def _fit(v: torch.Tensor, dim: int) -> torch.Tensor:
+    if v.numel() == dim:
+        return v.reshape(-1)
+    out = torch.zeros(dim, dtype=torch.float32)
+    n = min(dim, v.numel())
+    out[:n] = v.reshape(-1)[:n]
+    return out
+
+
+def _fit_action(a: torch.Tensor, horizon: int, dim: int) -> torch.Tensor:
+    if a.ndim == 1:
+        a = a[None, :]
+    h = min(horizon, a.shape[0])
+    out = torch.zeros(horizon, dim, dtype=torch.float32)
+    out[:h, : min(dim, a.shape[1])] = a[:h, : min(dim, a.shape[1])]
+    return out
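
Review note on `quantile_norm` / `quantile_unnorm`: the pair is an affine map with no clipping, so only values inside `[q01, q99]` land in `[-1, 1]`. The sketch below restates the same formulas on plain Python floats to make that visible without NumPy. The function names and the quantile values are made up for illustration, and the real functions cast to `float32`, so their round trip is only accurate to single precision.

```python
EPS = 1e-6  # same epsilon as quantile_norm / quantile_unnorm in the diff


def norm(x: float, q01: float, q99: float) -> float:
    """Scalar restatement of quantile_norm: q01 -> -1, q99 -> just under +1."""
    return 2.0 * ((x - q01) / (q99 - q01 + EPS)) - 1.0


def unnorm(y: float, q01: float, q99: float) -> float:
    """Scalar restatement of quantile_unnorm, the algebraic inverse of norm."""
    return (y + 1.0) / 2.0 * (q99 - q01 + EPS) + q01


if __name__ == "__main__":
    q01, q99 = -0.5, 1.5  # hypothetical quantiles for one joint
    for x in (-0.5, 0.5, 1.5, 2.5):
        y = norm(x, q01, q99)
        print(f"x={x:+.2f} -> y={y:+.6f} -> x'={unnorm(y, q01, q99):+.6f}")
```

The last sample (`x=2.5`) sits above `q99` and normalizes to roughly 2, outside `[-1, 1]`. The same holds in reverse: a model action outside `[-1, 1]` unnormalizes to a qpos beyond the training quantiles, so any clamping has to happen elsewhere if it is wanted.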