Vision-Language-Action System for Bimanual Robotic LEGO Assembly
Features • Installation • Quick Start • Documentation • Contributing
VLA-LEGO is a Master's thesis research project developing a Vision-Language-Action (VLA) system for robotic bimanual manipulation. The project replicates and extends the EO-1 model architecture for coordinated two-arm assembly tasks on the IHMC Alex humanoid robot.
This research is conducted as part of an exchange program between Politecnico di Milano and Purdue University, under the supervision of Prof. Eugenio Culurciello and Prof. Marcello Restelli.
- Replicate the EO-1 Vision-Language-Action architecture
- Extend the model for bimanual manipulation tasks
- Evaluate on the LIBERO benchmark (Spatial, Object, Goal, Long subsets)
- Deploy on the IHMC Alex humanoid robot for LEGO assembly
- EO-1 Architecture — Unified decoder-only transformer with Qwen 2.5 VL backbone (3B parameters), combining discrete autoregressive decoding with continuous flow matching
- Bimanual Manipulation — Coordinated two-arm control for assembly tasks on IHMC Alex
- LIBERO Evaluation — Comprehensive benchmark evaluation across spatial, object, goal, and long-horizon tasks
- Hydra Configuration — Hierarchical, composable configs with CLI overrides
- Distributed Training — PyTorch DDP with NCCL backend, optimized for 8x A100 GPUs
- HPC Ready — Pre-configured for Slurm clusters with container support (Docker + Apptainer)
- Experiment Tracking — Weights & Biases integration for logging and visualization
- Reproducible — Deterministic training with seed control and checkpoint management
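The seed control mentioned in the last bullet can be sketched as follows. This is a minimal illustration rather than the project's actual code: `seed_everything` is a hypothetical helper name, and NumPy and PyTorch are treated as optional so the snippet also runs in a bare Python environment.

```python
import os
import random

try:  # NumPy is optional in this sketch
    import numpy as np
except ImportError:
    np = None

try:  # PyTorch is optional in this sketch
    import torch
except ImportError:
    torch = None


def seed_everything(seed: int, deterministic: bool = True) -> None:
    """Seed every RNG found in the current environment (hypothetical helper)."""
    # PYTHONHASHSEED only affects interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        if deterministic:
            # Trade some speed for repeatable cuDNN kernel selection.
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False
```

Calling `seed_everything(123)` before building the model and dataloaders would mirror the `experiment.seed=123` override used elsewhere in this README. In a multi-GPU run, a common variation is to add the process rank to the seed for data shuffling.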
┌─────────────────────────────────────────────────────────────────────────┐
│                        VLA-LEGO Pipeline (EO-1)                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│    ┌─────────────┐    ┌─────────────────┐    ┌─────────────────────┐    │
│    │   Vision    │    │    Language     │    │       Action        │    │
│    │   Encoder   │───▶│    Reasoning    │───▶│     Generation      │    │
│    │             │    │                 │    │                     │    │
│    │  Qwen 2.5   │    │   Interleaved   │    │  Autoregressive +   │    │
│    │   VL (3B)   │    │   Vision-Text   │    │    Flow Matching    │    │
│    └─────────────┘    └─────────────────┘    └─────────────────────┘    │
│                                                                         │
│    Input:  RGB Images + Language Instructions                           │
│    Output: Continuous Action Trajectories (Bimanual)                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
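To make the "Flow Matching" stage of the diagram more concrete, here is a small pure-Python sketch of one common formulation: a straight-line path between a noise sample and the target action, with a velocity predictor trained to match the constant velocity along that path. The linear path, the function names, and the Euler sampler are all assumptions for illustration; EO-1's exact formulation and this project's action head may differ.

```python
from typing import Callable, List

Vector = List[float]
# A velocity predictor maps (noisy action, time in [0, 1]) to a velocity.
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(noise: Vector, action: Vector, t: float) -> Vector:
    """Point on the straight line from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise: Vector, action: Vector) -> Vector:
    """Velocity of the straight-line path, constant in t."""
    return [a - n for n, a in zip(noise, action)]


def flow_matching_loss(predict: VelocityFn, noise: Vector,
                       action: Vector, t: float) -> float:
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(noise, action, t)
    v_pred = predict(x_t, t)
    v_true = target_velocity(noise, action)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict: VelocityFn, noise: Vector,
                 num_steps: int = 10) -> Vector:
    """Integrate the predicted velocity field from t=0 to t=1."""
    x = list(noise)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        v = predict(x, step * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a perfect predictor (one that always returns `target_velocity(noise, action)`), the loss is zero and `euler_sample` recovers the target action up to floating-point error. In a real model the predictor would be the transformer's action head, conditioned on images and the language instruction.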
- Python 3.10+
- CUDA 11.8+ (for GPU training)
- Git
# Clone the repository
git clone https://github.com/PatrizioAcquadro/VLA-LEGO_Project.git
cd VLA-LEGO_Project
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install in development mode
pip install -e ".[dev]"
# Set up pre-commit hooks
pre-commit install
# Quick smoke test (100 steps, small batch)
python -m train.trainer trainer=debug cluster=local
# Train with base model configuration
python -m train.trainer cluster=local
# Train with large model on GPU cluster
python -m train.trainer model=large cluster=gilbreth
All settings can be overridden from the command line:
python -m train.trainer \
model=large \
trainer.optimizer.lr=1e-5 \
trainer.training.batch_size_per_device=16 \
trainer.training.max_steps=50000
VLA-LEGO uses Hydra for configuration management. All configs are in configs/:
| Config Group | Options | Description |
|---|---|---|
| model | base, large | Model architecture settings |
| trainer | default, debug | Training hyperparameters |
| data | default | Dataset and dataloader settings |
| cluster | local, gilbreth | Cluster-specific settings |
| logging | wandb | Experiment tracking |
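The two kinds of command-line override used in this README (selecting an option for a config group, such as `model=large`, and setting a nested value, such as `trainer.optimizer.lr=1e-5`) can be illustrated with a simplified pure-Python sketch. This is not Hydra's implementation: `apply_overrides`, the `selected` key, and the starting config values are hypothetical, and real Hydra loads the chosen YAML file for each group instead of just recording its name.

```python
import copy
from typing import Any

# Config groups taken from the table above.
CONFIG_GROUPS = {"model", "trainer", "data", "cluster", "logging"}


def parse_value(text: str) -> Any:
    """Interpret an override value as int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        if key in CONFIG_GROUPS:
            # Group selection: remember which option was chosen.
            result.setdefault("selected", {})[key] = raw
            continue
        # Dotted key: walk (or create) nested dicts down to the leaf.
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return result


# Placeholder defaults, not the project's real values.
base = {"trainer": {"optimizer": {"lr": 1e-4}}}
cfg = apply_overrides(base, [
    "model=large",
    "trainer.optimizer.lr=1e-5",
    "trainer.training.batch_size_per_device=16",
])
```

After this call, `cfg["selected"]["model"]` holds the string `large`, the learning rate is a float, and the batch size is an integer, while `base` is left unchanged.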
# Combine multiple config overrides
python -m train.trainer \
model=large \
trainer=default \
cluster=local \
experiment.seed=123 \
trainer.optimizer.lr=5e-5
VLA-LEGO_Project/
├── configs/ # Hydra configuration files
│ ├── config.yaml # Main config (composes defaults)
│ ├── model/ # Model architectures (base, large)
│ ├── trainer/ # Training settings (default, debug)
│ ├── data/ # Dataset configuration
│ ├── cluster/ # Cluster settings (local, gilbreth)
│ └── logging/ # W&B integration
├── data/ # Data loading and processing
│ ├── dataset.py # Dataset classes
│ └── loader.py # DataLoader utilities
├── models/ # Model implementations
│ ├── transformer.py # TransformerModel
│ └── utils.py # Model utilities
├── train/ # Training code
│ └── trainer.py # Main Trainer class
├── eval/ # Evaluation code
├── scripts/ # Utility scripts
├── tests/ # Test suite
├── docs/ # Documentation
├── Dockerfile # Container definition
└── apptainer.def # Singularity/Apptainer definition
# Run all tests
pytest
# Run with coverage report
pytest --cov=. --cov-report=html
# Run specific test file
pytest tests/test_models.py -v
# Format code
black .
isort .
# Lint
ruff check .
# Type checking
mypy sim models train eval --ignore-missing-imports
# Run all checks (pre-commit)
pre-commit run --all-files
# Validate all config combinations
python scripts/validate_configs.py
# Submit training job
sbatch scripts/train.sh
# Interactive session
sinteractive -A <account> -n 1 -g 1 -t 4:00:00
# Load container and run
apptainer exec --nv vla-lego.sif python -m train.trainer cluster=gilbreth
See docs/git-workflow.md for detailed cluster instructions.
Container images contain only dependencies (CUDA, Python, PyTorch, etc.). Your code is bind-mounted at runtime from your git checkout.
- No rebuilds for code changes: git pull updates your code instantly
- Reproducibility: run = (git commit) + (image digest/tag)
- Smaller images: no repo code baked in
# Using wrapper script (recommended)
./scripts/docker-run.sh python -m train.trainer --help
./scripts/docker-run.sh python -m train.trainer trainer=debug cluster=local
# Or directly with docker
docker run --rm -it --gpus all \
-v $(pwd):/workspace \
ghcr.io/patrizioacquadro/vla-lego_project:latest \
python -m train.trainer cluster=local
# Download image once (or use release artifact)
apptainer pull vla-lego.sif docker://ghcr.io/patrizioacquadro/vla-lego_project:latest
# Using wrapper script (recommended)
./scripts/apptainer-run.sh python -m train.trainer cluster=gilbreth
Each container run prints git commit, Python version, and GPU info. This output is saved to /tmp/vla_run_info.txt inside the container. Record this for experiment tracking:
=== VLA-LEGO Container Run ===
Timestamp: 2024-01-15T10:30:00+00:00
Python: Python 3.10.12
Git commit: abc1234...
Git branch: main
Git dirty: 0 files
PyTorch: 2.2.0
CUDA: True
GPU: NVIDIA A100-SXM4-80GB
==============================
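For runs launched outside the container wrappers, a similar record can be produced from Python. The sketch below is an assumption about how such a report might be assembled, not the wrapper scripts' actual code: `collect_run_info` and `format_run_info` are hypothetical names, the PyTorch, CUDA, and GPU fields are left out to avoid a hard dependency, and any git query that fails is reported as `unknown`.

```python
import platform
import subprocess
from datetime import datetime, timezone


def _git(*args: str) -> str:
    """Run a git command and return its output, or 'unknown' on failure."""
    try:
        done = subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return done.stdout.strip()


def collect_run_info() -> dict:
    """Gather timestamp, Python version, and git state for the current run."""
    status = _git("status", "--porcelain")
    if status == "unknown":
        dirty = "unknown"
    else:
        # One porcelain line per modified or untracked file.
        dirty = f"{len(status.splitlines())} files"
    return {
        "Timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "Python": f"Python {platform.python_version()}",
        "Git commit": _git("rev-parse", "HEAD"),
        "Git branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
        "Git dirty": dirty,
    }


def format_run_info(info: dict) -> str:
    """Render the info dict in the same layout as the sample output above."""
    header = "=== VLA-LEGO Container Run ==="
    body = [f"{key}: {value}" for key, value in info.items()]
    return "\n".join([header, *body, "=" * len(header)])
```

Printing `format_run_info(collect_run_info())` at the start of training, or attaching the dict to the experiment tracker's run config, keeps the (git commit) + (image digest/tag) pairing described above next to the results.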
| Document | Description |
|---|---|
| docs/git-workflow.md | Git branching and workflow guide |
| CONTRIBUTING.md | Contribution guidelines |
- EO-1 architecture implementation (Qwen 2.5 VL backbone)
- Data pipeline for LIBERO benchmark
- Flow matching action head integration
- Bimanual action space extension
- LIBERO benchmark evaluation
- IHMC Alex deployment
Contributions are welcome! Please read our Contributing Guidelines before submitting a PR.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License — see the LICENSE file for details.
- Politecnico di Milano — Primary institution
- Purdue University — Exchange program host
- Prof. Eugenio Culurciello — Purdue University
- Prof. Marcello Restelli — Politecnico di Milano
- EO-1: A Unified Model for Embodied AI — Base architecture
- Lerobot — Training framework
- LIBERO — Evaluation benchmark
- PyTorch — Deep learning framework
- Hydra — Configuration management
- Weights & Biases — Experiment tracking
Author: Patrizio Acquadro