Motivation
ACT (Action Chunking with Transformers; Zhao et al., 2023) is the most widely used policy for low-cost real robot arms (ALOHA, SO-ARM100, LeKiwi) and has become the standard entry point for the open-source hardware community. Without an ACT implementation, TorchRL has little to offer this growing segment. The existing DecisionTransformer is architecturally different (causal sequence modelling vs. CVAE + chunk prediction) and cannot substitute.
This PR adds the full ACT stack: a model backbone, a loss module, and a SOTA training script that uses the OpenX dataset loader already in the codebase.
Changes
torchrl/modules/models/act.py — new ACTModel
CVAE encoder: [CLS, action_tokens, obs_token] → z_mu, z_log_var
DETR-style Transformer decoder: learned action queries attend to [obs_token, z_token] → action chunk of shape (chunk_size, action_dim)
Training mode: pass action_chunk to activate the encoder and sample z
Inference mode: omit action_chunk; model defaults to z = 0 (prior mean)
torchrl/objectives/act.py — new ACTLoss
loss = L1(action_pred, action_chunk) + kl_weight * KL(q(z|o,a) || N(0,I))
Returns loss_act, loss_reconstruction, loss_kl separately for logging
kl_weight (β) defaults to 10.0 per the original paper
torchrl/modules/models/__init__.py — registers ACTModel
torchrl/objectives/__init__.py — registers ACTLoss
test/test_objectives.py — TestACTLoss covering forward/backward, KL weighting, inference-mode zeros
sota-implementations/act/ — training script + Hydra config using OpenXExperienceReplay
docs/ — reference entries for ACTModel and ACTLoss
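For reviewers, the latent handling and loss described under ACTModel and ACTLoss above can be sketched roughly as follows. This is an illustrative stand-alone sketch, not the code in this PR: the helper names `choose_latent` and `act_loss`, their signatures, and the reduction choices (mean over all elements for the L1 term, sum over latent dimensions then mean over the batch for the KL term) are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def choose_latent(batch_size, latent_dim, z_mu=None, z_log_var=None):
    """Hypothetical helper: pick z for the decoder.

    Inference (no encoder output): z = 0, the mean of the N(0, I) prior.
    Training (encoder output given): reparameterised sample from q(z|o,a).
    """
    if z_mu is None or z_log_var is None:
        return torch.zeros(batch_size, latent_dim)
    std = torch.exp(0.5 * z_log_var)
    return z_mu + std * torch.randn_like(std)


def act_loss(action_pred, action_chunk, z_mu, z_log_var, kl_weight=10.0):
    """Hypothetical helper: L1 reconstruction + kl_weight * KL(q || N(0, I))."""
    # L1 between predicted and target chunks, averaged over all elements
    # (assumed reduction).
    loss_reconstruction = F.l1_loss(action_pred, action_chunk)
    # Closed-form KL between a diagonal Gaussian and N(0, I), summed over
    # latent dimensions and averaged over the batch (assumed reduction).
    kl_per_dim = -0.5 * (1 + z_log_var - z_mu.pow(2) - z_log_var.exp())
    loss_kl = kl_per_dim.sum(dim=-1).mean()
    return {
        "loss_act": loss_reconstruction + kl_weight * loss_kl,
        "loss_reconstruction": loss_reconstruction,
        "loss_kl": loss_kl,
    }


if __name__ == "__main__":
    batch, chunk_size, action_dim, latent_dim = 4, 8, 6, 32
    action_chunk = torch.randn(batch, chunk_size, action_dim)
    action_pred = torch.randn(batch, chunk_size, action_dim, requires_grad=True)
    z_mu = torch.randn(batch, latent_dim, requires_grad=True)
    z_log_var = torch.zeros(batch, latent_dim, requires_grad=True)

    z_train = choose_latent(batch, latent_dim, z_mu, z_log_var)
    z_infer = choose_latent(batch, latent_dim)
    losses = act_loss(action_pred, action_chunk, z_mu, z_log_var)
    losses["loss_act"].backward()  # gradients reach action_pred and z_mu
    print(z_train.shape, z_infer.abs().sum().item(), losses["loss_act"].item())
```

With kl_weight at its default of 10.0, the KL term typically dominates early in training, so logging loss_reconstruction and loss_kl separately makes it easier to tell which term is driving loss_act.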