3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "external/openpi"]
path = external/openpi
url = https://github.com/GaTech-RL2/openpi.git
[submodule "external/RoboTwin"]
path = external/RoboTwin
url = https://github.com/GaTech-RL2/RoboTwin.git
11 changes: 11 additions & 0 deletions CLAUDE.md
@@ -36,6 +36,17 @@ all `*.ipynb`. **Never read into context**: venvs (`emimic/`, `.venv/`), caches
salloc -A gts-dxu345-rl2 -N1 -q inferno -t 1:00:00 --mem=75G --gres=gpu:h200:1
```
Always use the `inferno` queue (`-q inferno`) rather than `ember` — it's faster. Adjust `-t`, `--mem`, and `--gres` to the job. `salloc` is best for interactive / iterative work (smoke tests, debugging) where you hold the node and run into it repeatedly. For large or long-running jobs (real training runs), submit through Hydra's submitit launcher instead (`hydra/launcher/submitit.yaml`) so the job queues and runs unattended. Lightweight read-only work (lint, type checks, small unit tests, file edits, single-file syntax checks) is fine on the login node.
- **Pick the GPU variant by job — don't always queue for `h200`.** `gpu-h200` is the most contended partition, so jobs sit. Choose by what the job needs to fit in VRAM (pi0.5 is 3.6 B params ≈ **14.5 GB fp32 / ~7 GB bf16** for the weights alone). Each type is its own partition `gpu-<type>` with `--gres=gpu:<type>:N`, same `-A gts-dxu345-rl2 -q inferno`:
| partition | `--gres` type | GPU mem | use for |
|---|---|---|---|
| `gpu-h200` | `h200` | 141 GB | **pi0.5 full fp32 AdamW training** (model+optimizer ≈ 57 GB) |
| `gpu-rtxpro-blackwell` | `rtx_pro_6000_blackwell` | 96 GB | training (newer, fewer nodes) |
| `gpu-h100` | `h100` | 80 GB | training |
| `gpu-l40s` | `l40s` | 48 GB | **pi0.5 eval/inference** (the eval sbatch was originally sized for a 48 GB a40) |
| `gpu-a100` | `a100` | 40 GB | **pi0.5 eval/inference**; training only with `--adam8bit` + small batch |
| `gpu-rtx6000` | `rtx_6000` | 24 GB | small / bf16-only inference — tight for pi0.5, plentiful nodes |
| `gpu-v100` | `v100` | 16 GB | too small to load pi0.5 comfortably — avoid |
Rule of thumb: **training needs ≥80 GB** (h200/h100/blackwell); **eval/inference fits 40–48 GB** (l40s/a100, with room for DINOv2 retrieval + SAPIEN) and those queues schedule far faster than h200. **Each partition caps the CPU:GPU ratio differently** — `gpu-h200` is 8:1, `gpu-l40s` is **4:1** — so `--cpus-per-task` must be ≤ ratio × #GPUs or the job is rejected with "Invalid generic resource (gres) specification" (e.g. 8 CPUs / 1 l40s fails; use 4). Validate a header without queuing via `sbatch --test-only <script>`.
- **Short GPU runs (eval-only, smoke, a few hundred forward passes): export `TORCH_COMPILE_DISABLE=1`.** pi0.5's `sample_actions` triggers a `torch.compile` max-autotune compile on the first call: minutes of warmup that only pays off across a long training run. Disabling it runs eager (slower per call, no warmup), a net win when the run is too short to amortize the warmup. Leave compile ON for real training.
- Python 3.11. Activate the project venv before any Python tooling: `source emimic/bin/activate`.
- Package is installed editable as `egomimic` (see `pyproject.toml`). Linting is `ruff` via pre-commit.
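
The sizing numbers in the GPU table above can be reproduced with a little arithmetic. The sketch below is illustrative only, not project code: the helper names (`weights_gb`, `adamw_fp32_gb`, `smallest_fit`, `max_cpus`) are hypothetical, the "4 fp32 copies" AdamW footprint is an assumption about how the ≈57 GB figure was derived, and only the two CPU:GPU ratios stated above are encoded.

```python
"""Back-of-envelope GPU sizing for the partition table (a sketch, not project code)."""

PARAMS = 3.6e9  # pi0.5 parameter count, as quoted in the notes above

# partition -> (--gres type, GPU memory in GB), copied from the table above.
PARTITIONS = {
    "gpu-v100": ("v100", 16),
    "gpu-rtx6000": ("rtx_6000", 24),
    "gpu-a100": ("a100", 40),
    "gpu-l40s": ("l40s", 48),
    "gpu-h100": ("h100", 80),
    "gpu-rtxpro-blackwell": ("rtx_pro_6000_blackwell", 96),
    "gpu-h200": ("h200", 141),
}

# CPU:GPU caps. Only these two are stated above; other partitions are unknown here.
CPU_PER_GPU = {"gpu-h200": 8, "gpu-l40s": 4}


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9


def adamw_fp32_gb(params: float) -> float:
    """ASSUMPTION: weights + gradients + two AdamW moment buffers, all fp32.

    This is a lower bound: activations and any other buffers come on top.
    """
    return 4 * weights_gb(params, 4)


def smallest_fit(required_gb: float) -> str:
    """Name of the smallest-memory partition whose GPU holds `required_gb`."""
    fits = [(mem, name) for name, (_, mem) in PARTITIONS.items() if mem >= required_gb]
    if not fits:
        raise ValueError(f"no single GPU holds {required_gb:.1f} GB")
    return min(fits)[1]


def max_cpus(partition: str, gpus: int = 1) -> int:
    """Largest --cpus-per-task the partition's CPU:GPU cap allows."""
    return CPU_PER_GPU[partition] * gpus


if __name__ == "__main__":
    print(f"fp32 weights : {weights_gb(PARAMS, 4):.1f} GB")
    print(f"bf16 weights : {weights_gb(PARAMS, 2):.1f} GB")
    print(f"fp32 AdamW   : {adamw_fp32_gb(PARAMS):.1f} GB (lower bound)")
    print(f"smallest fit : {smallest_fit(adamw_fp32_gb(PARAMS))}")
    print(f"l40s CPU cap : {max_cpus('gpu-l40s', 1)} CPUs for 1 GPU")
```

The weights-only figures are a floor. For eval, leave room for DINOv2 retrieval and SAPIEN on top of them, which is why the notes recommend 40–48 GB rather than the smallest card that fits the weights.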
36 changes: 36 additions & 0 deletions egomimic/hydra_configs/data/robotwin_local.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# RoboTwin reusable-corpus data recipe: load converted local .zarr stores (no S3/SQL)
# through the standard MultiDataset pipeline as eva_bimanual cartesian episodes, so
# RoboTwin can cotrain alongside the eva/human corpus. Produce the stores first with
# `egomimic/ricl/scripts/robotwin_to_zarr.py` and point `paths.dataset_dir` at the
# output dir. Mirrors `data/eva.yaml` but uses LocalEpisodeResolver + the PI image
# keymap (`cartesian_pi` -> base_0_rgb / *_wrist_0_rgb) and a no-op filter.
_target_: egomimic.pl_utils.pl_data_utils.MultiDataModuleWrapper

train_datasets:
  eva_bimanual:
    _target_: egomimic.rldb.zarr.zarr_dataset_multi.MultiDataset._from_resolver
    resolver:
      _target_: egomimic.rldb.zarr.zarr_dataset_multi.LocalEpisodeResolver
      folder_path: ${paths.dataset_dir}  # dir of converted RoboTwin <hash>.zarr stores
      key_map:
        _target_: egomimic.rldb.embodiment.eva.Eva.get_keymap
        keymap_mode: cartesian_pi
      transform_list:
        _target_: egomimic.rldb.embodiment.eva.Eva.get_transform_list
        mode: cartesian
    filters:
      _target_: egomimic.rldb.filters.DatasetFilter
      filter_lambdas: []  # load every local .zarr (all are RoboTwin eva_bimanual)
    mode: total

valid_datasets:
  eva_bimanual: ${train_datasets.eva_bimanual}

train_dataloader_params:
  eva_bimanual:
    batch_size: 8
    num_workers: 4
valid_dataloader_params:
  eva_bimanual:
    batch_size: 8
    num_workers: 4
55 changes: 55 additions & 0 deletions egomimic/ricl/CLAUDE.md
@@ -23,6 +23,61 @@ resolve `episode_lists/`, `pg_tokenizer/`, `outputs/` relative to the parent dir
builder over the pre-pooled `top_image_embeddings`; runnable `--build-cache`),
`droid_eval.py` (`DroidRiclEval` retrieval vs a *true* zero-context floor,
paired flow-loss; `DroidRiclModelWrapper`).
- RoboTwin integration (joint-space shim mirroring DROID + a reusable Zarr
converter; goal/steps in `robotwin_setup.md`): `robotwin_data.py`
(`RoboTwinCorpus` reads RoboTwin HDF5 — `joint_action/vector` + `endpose` +
`observation/<cam>/rgb` — query dataset, bank provider, within-task LOO cache;
**detects the embodiment's qpos dim + gripper slots** — `aloha-agilex` is 6-DOF
-> 14-D, slots 6/13 — so don't hardcode 16/(7,15)), `robotwin_eval.py`
(`RoboTwinRiclEval`/`RoboTwinRiclModelWrapper`, thin `DroidRiclEval` subclass for
RoboTwin's dim/grippers). Scripts: `scripts/download_robotwin.py`
(HF zip slice from `dataset/<task>/<embodiment>_<setting>_<N>.zip` — the bimanual
embodiment is `aloha-agilex`, smallest is `clean_50` ~230 MB; plus a `--mode
synthetic` fixture for tests), `scripts/robotwin_to_zarr.py` (HDF5 ->
`eva_bimanual` cartesian Zarr via `ZarrWriter`; needs `endpose`; cmd==obs pose,
chunked at load like aria), `scripts/train_robotwin_ricl.py` (`--stage cpu` = fast
data-path/collate smoke with `--embed fake`; `--stage full` = GPU training on a
Lightning `Trainer` — NOT submitit; launch via `scripts/train_robotwin_ricl.sbatch`
[PACE: gpu-h200, qos inferno] and it saves `quantiles.json` beside the checkpoints
for eval), `scripts/build_robotwin_bank_index.py` (consolidated DINOv2 bank index over
a `RoboTwinCorpus`, built with the SAME `.embed` as eval's `OnlineRetriever`; output
format = `build_embedding_index.build_retrieval_index`). Tests:
`tests/{robotwin_data_test.py,robotwin_to_zarr_test.py,robotwin_bank_index_test.py}`
build a synthetic 6-DOF fixture. Reusable-corpus path: `hydra_configs/data/robotwin_local.yaml`
(`LocalEpisodeResolver` + `Eva` keymap/transform, no SQL/S3) +
`tests/robotwin_zarr_multidataset_test.py` (converts fixture -> Zarr -> loads via
`MultiDataset._from_resolver` + replicates trainHydra norm-stats wiring). Closed-loop
eval: `robotwin_adapter.py` (model-free glue — `obs_to_state`,
`quantile_norm`/`unnorm`, `state_to_model_input`, `unnormalize_action`,
`OnlineRetriever`; unit-tested in `tests/robotwin_adapter_test.py` with the model/sim
mocked) and `robotwin_policy.py` — the deploy contract (`encode_obs`/`get_model`/`eval`/
`reset_model` + `PIRiclPolicy` backed by EgoVerse `PIRicl`), the **source of truth**.
RoboTwin loads it via a **thin shim** at `policy/pi_ricl_egoverse/`
(`__init__.py` = `from .deploy_policy import *`; `deploy_policy.py` = `from
egomimic.ricl.robotwin_policy import *`; + `deploy_policy.yml`/`eval.sh`) that lives in
the **`GaTech-RL2/RoboTwin` fork** = the `external/RoboTwin` submodule (matches the
`GaTech-RL2/openpi` pattern; new-cluster setup = `git submodule update --init
external/RoboTwin`). Driven by RoboTwin's SAPIEN `script/eval_policy.py`. The shim's
`deploy_policy.yml` needs four artifacts: `egoverse_checkpoint` (trained `.ckpt`),
`bank_root` (a RoboTwin task dir), `bank_index_dir` (`build_robotwin_bank_index.py`
output), `quantiles_path` (the `quantiles.json` the trainer saves). Sim deps
(sapien/mplib/curobo/pytorch3d) install **selectively** — do NOT run RoboTwin's
`script/_install.sh` (it pins `torch==2.4.1`, breaking emimic's 2.7.1). Eval task
config = `demo_clean` (matches the `clean` data slice).
**Cross-embodiment RICL** (retrieve from embodiment A, predict embodiment B —
arx-x5 and aloha-agilex share the identical 14-D dual-arm layout):
`robotwin_data.build_cross_embodiment_retrieval_cache` (per emb-B query frame, kNN
emb-A frames of the SAME task; no leave-one-out, since the corpora are different) +
`train_robotwin_ricl.build_data_xemb` (gated by `--bank-root`/`--eval-bank-root`;
query/target = emb B
with its own quantiles, in-context bank = emb A with ITS own quantiles, so the
spliced demo is encoded in emb-A units). Pass `BANK_ROOT`/`EVAL_BANK_ROOT` to
`train_robotwin_ricl.sbatch` (also `NO_RANDOM=1` — the xemb path builds no
random-control caches). The run saves `bank_quantiles.json` (emb-A) beside
`quantiles.json` (emb-B). Closed-loop: `scripts/eval_xemb_compare.sbatch` (sim
controls emb B; cross-emb retrieves the emb-A bank, passing `bank_quantiles_path`
to `robotwin_policy` so retrieved demos normalize in emb-A units — the one deploy
change cross-embodiment needs). Result: arx-x5 demos cut held-out aloha action
loss ~40% (helps 100% of frames).
- Embedding -> index: embed a corpus with
`egomimic/scripts/embedding_process/zarr_embedding.py` — supports SQL-registry
filters (`--filter-lambda` + `--sync-root`) and writes to a writable mirror
191 changes: 191 additions & 0 deletions egomimic/ricl/robotwin_adapter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,191 @@
"""Pure-Python glue for the RoboTwin closed-loop RICL eval adapter.

The closed-loop benchmark runs a trained EgoVerse PyTorch ``PIRicl`` checkpoint
inside RoboTwin's SAPIEN sim. This module holds the embodiment/observation/action
**glue** that is independent of both SAPIEN and the model, so it is directly
unit-testable:

- ``obs_to_state`` : RoboTwin obs dict -> ([head, right, left] rgb, qpos vector)
  (mirrors ``policy/pi05/deploy_policy.encode_obs``).
- ``quantile_norm`` / ``quantile_unnorm`` : the [-1,1] map used by the shim and its
  inverse (for turning a model action back into RoboTwin qpos).
- ``state_to_model_input`` : raw qpos -> normalized, slot-filled 32-D proprio.
- ``unnormalize_action`` : model action (32-D, normalized) -> RoboTwin qpos chunk.
- ``OnlineRetriever`` : embed the live head frame -> kNN against a prebuilt demo
  bank -> assemble ``ricl_retrieved_*`` blocks (the same keys ``build_ricl_collate``
  produces) for the model prefix.

The RoboTwin-side ``policy/pi_ricl_egoverse/{deploy_policy,pi_model}.py`` is a thin
wrapper that loads the checkpoint + a DINOv2 embedder and drives these primitives.
Quantiles come from the training corpus
(``robotwin_data.RoboTwinCorpus.save_quantiles`` / ``load_quantiles``); the action
space is RoboTwin's native qpos so the adapter returns a ``(steps, state_dim)`` chunk
executed with ``take_action(action)`` in the default ``qpos`` control mode.
"""

from __future__ import annotations

from typing import Callable

import numpy as np
import torch

from egomimic.ricl.robotwin_data import slot32

# RoboTwin obs camera -> RICL image key (head exterior is the retrieval/base view).
_RGB_ORDER = ("head_camera", "right_camera", "left_camera") # matches pi05 encode_obs


def obs_to_state(observation: dict) -> tuple[list[np.ndarray], np.ndarray]:
    """RoboTwin ``get_obs()`` dict -> ``([head, right, left] rgb, qpos vector)``.

    Mirrors ``external/RoboTwin/policy/pi05/deploy_policy.encode_obs`` exactly so the
    same eval driver works."""
    obs = observation["observation"]
    rgb = [np.asarray(obs[cam]["rgb"]) for cam in _RGB_ORDER]
    state = np.asarray(observation["joint_action"]["vector"], dtype=np.float32)
    return rgb, state


def quantile_norm(x: np.ndarray, q01: np.ndarray, q99: np.ndarray) -> np.ndarray:
    """Map raw qpos to [-1, 1] (same formula as ``RoboTwinCorpus.quantile_norm``)."""
    x = np.asarray(x, dtype=np.float32)
    return 2.0 * ((x - q01) / (q99 - q01 + 1e-6)) - 1.0


def quantile_unnorm(x: np.ndarray, q01: np.ndarray, q99: np.ndarray) -> np.ndarray:
    """Inverse of :func:`quantile_norm`: [-1, 1] -> raw qpos units."""
    x = np.asarray(x, dtype=np.float32)
    return (x + 1.0) / 2.0 * (q99 - q01 + 1e-6) + q01


def state_to_model_input(
    state: np.ndarray, q_state: tuple[np.ndarray, np.ndarray]
) -> np.ndarray:
    """Raw qpos -> normalized, slot-filled 32-D proprio (the model's state input)."""
    q01, q99 = q_state
    return slot32(quantile_norm(state, q01, q99))


def unnormalize_action(
    pred: np.ndarray,
    q_actions: tuple[np.ndarray, np.ndarray],
    state_dim: int,
) -> np.ndarray:
    """Model action -> RoboTwin qpos chunk.

    ``pred`` is ``(H, 32)`` or ``(32,)`` normalized (the 32-D shared space, slot-filled);
    keep the first ``state_dim`` dims and invert the quantile normalization with the
    training corpus's action quantiles. Returns ``(H, state_dim)`` (or ``(state_dim,)``)."""
    pred = np.asarray(pred, dtype=np.float32)
    q01, q99 = q_actions
    sliced = pred[..., :state_dim]
    return quantile_unnorm(sliced, q01, q99)


def _to_chw01(img_hwc: np.ndarray, hw: tuple[int, int] = (224, 224)) -> torch.Tensor:
    """HWC uint8 (or float) -> CHW float in [0, 1], resized to ``hw`` (mirrors the
    collate's image handling so retrieved frames match the trained representation)."""
    t = torch.as_tensor(np.asarray(img_hwc))
    if t.ndim == 3 and t.shape[-1] in (1, 3):
        t = t.permute(2, 0, 1)
    t = t.float()
    if t.max() > 1.5:  # uint8 -> [0, 1]
        t = t / 255.0
    if tuple(t.shape[-2:]) != tuple(hw):
        t = torch.nn.functional.interpolate(
            t[None], size=tuple(hw), mode="bilinear", align_corners=False
        )[0]
    return t.contiguous()


class OnlineRetriever:
    """Build ``ricl_retrieved_*`` blocks for a live query frame at eval time.

    Embeds the query head frame, finds the k nearest demos in a prebuilt bank
    (``egomimic.ricl.retrieval.RetrievalIndex`` over the same 64-patch DINOv2
    descriptors used in training), and gathers each neighbor's (image, state, action)
    via ``bank_provider`` into the batch keys the model prefix consumes (the online
    analog of ``build_ricl_collate``). The embedder and index are injected so this is
    unit-testable with stubs (no DINOv2 needed).

    Args:
        index: a ``RetrievalIndex`` (``.query(vecs, k) -> (hashes, frames, dists)``).
        bank_provider: ``(episode_hash, frame_idx) -> {"image": HWC, "state": (D,),
            "action": (H, D)}`` (e.g. ``robotwin_data.make_robotwin_bank_provider``).
        embedder: object with ``.embed(images_thwc) -> (T, EMBED_DIM)`` (e.g.
            ``retrieval.DinoV2Embedder``); only called on the single query frame.
        k, action_horizon, state_dim: block dimensions; image_hw the resize target.
    """

    def __init__(
        self,
        index,
        bank_provider: Callable[[str, int], dict],
        embedder,
        *,
        k: int = 4,
        action_horizon: int = 15,
        state_dim: int = 14,
        image_hw: tuple[int, int] = (224, 224),
    ):
        self.index = index
        self.bank_provider = bank_provider
        self.embedder = embedder
        self.k = int(k)
        self.action_horizon = int(action_horizon)
        self.state_dim = int(state_dim)
        self.image_hw = image_hw

    def retrieve(self, query_image_hwc: np.ndarray) -> dict:
        """Return ``ricl_retrieved_*`` tensors (batch dim 1) for one query frame."""
        H, W = self.image_hw
        k, Ha, D = self.k, self.action_horizon, self.state_dim
        qvec = self.embedder.embed(np.asarray(query_image_hwc)[None])  # (1, EMBED_DIM)
        hashes, frames, dists = self.index.query(qvec, k=k)  # each (1, k)
        hashes, frames, dists = hashes[0], frames[0], dists[0]

        # Pre-fill every slot with zeros so padded slots stay masked out.
        imgs = [torch.zeros(3, H, W) for _ in range(k)]
        states = [torch.zeros(D) for _ in range(k)]
        acts = [torch.zeros(Ha, D) for _ in range(k)]
        mask = torch.zeros(k, dtype=torch.bool)
        dist = torch.full((k,), float("inf"))
        for i in range(k):
            h = str(hashes[i])
            if not h:  # padded slot (bank smaller than k)
                continue
            blk = self.bank_provider(h, int(frames[i]))
            imgs[i] = _to_chw01(blk["image"], self.image_hw)
            st = torch.as_tensor(np.asarray(blk["state"]), dtype=torch.float32).reshape(
                -1
            )
            states[i] = _fit(st, D)
            ac = torch.as_tensor(np.asarray(blk["action"]), dtype=torch.float32)
            acts[i] = _fit_action(ac, Ha, D)
            mask[i] = True
            dist[i] = float(dists[i])
        return {
            "ricl_retrieved_images": torch.stack(imgs)[None],  # (1, k, 3, H, W)
            "ricl_retrieved_state": torch.stack(states)[None],  # (1, k, D)
            "ricl_retrieved_action": torch.stack(acts)[None],  # (1, k, Ha, D)
            "ricl_retrieved_mask": mask[None],  # (1, k)
            "ricl_retrieved_dist": dist[None],  # (1, k)
        }


def _fit(v: torch.Tensor, dim: int) -> torch.Tensor:
    """Truncate or zero-pad a flat vector to exactly ``dim`` entries."""
    if v.numel() == dim:
        return v
    out = torch.zeros(dim, dtype=torch.float32)
    n = min(dim, v.numel())
    out[:n] = v.reshape(-1)[:n]
    return out


def _fit_action(a: torch.Tensor, horizon: int, dim: int) -> torch.Tensor:
    """Truncate or zero-pad an action chunk to exactly ``(horizon, dim)``."""
    if a.ndim == 1:
        a = a[None, :]
    h = min(horizon, a.shape[0])
    out = torch.zeros(horizon, dim, dtype=torch.float32)
    out[:h, : min(dim, a.shape[1])] = a[:h, : min(dim, a.shape[1])]
    return out
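
As a sanity check on the quantile map in this module, here is a minimal round-trip sketch. It re-declares `quantile_norm` and `quantile_unnorm` so it runs standalone with NumPy only. The quantile values are made up for illustration (they do not come from any real corpus), and the zero-padded 32-D array merely stands in for `slot32`, whose exact slot layout is not shown in this diff.

```python
import numpy as np


def quantile_norm(x, q01, q99):
    """Raw units -> [-1, 1], same formula as the module above."""
    x = np.asarray(x, dtype=np.float32)
    return 2.0 * ((x - q01) / (q99 - q01 + 1e-6)) - 1.0


def quantile_unnorm(x, q01, q99):
    """[-1, 1] -> raw units, the inverse of quantile_norm."""
    x = np.asarray(x, dtype=np.float32)
    return (x + 1.0) / 2.0 * (q99 - q01 + 1e-6) + q01


rng = np.random.default_rng(0)
state_dim, horizon = 14, 15  # 14-D aloha-agilex layout, default action horizon

# ASSUMPTION: illustrative per-dimension quantiles, not real corpus statistics.
q01 = np.full(state_dim, -1.5, dtype=np.float32)
q99 = np.full(state_dim, 2.0, dtype=np.float32)

raw = rng.uniform(-1.5, 2.0, size=(horizon, state_dim)).astype(np.float32)
normed = quantile_norm(raw, q01, q99)
restored = quantile_unnorm(normed, q01, q99)

# ASSUMPTION: zero padding stands in for slot32 (real slot layout not shown here).
pred32 = np.zeros((horizon, 32), dtype=np.float32)
pred32[:, :state_dim] = normed

# What unnormalize_action does: keep the first state_dim dims, then invert.
chunk = quantile_unnorm(pred32[..., :state_dim], q01, q99)

print("max round-trip error:", float(np.abs(restored - raw).max()))
print("action chunk shape  :", chunk.shape)
```

Values inside `[q01, q99]` land in `[-1, 1]` and come back to within float32 rounding. Values outside that range simply map outside `[-1, 1]`, since neither function clips.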