This repository contains the full lifecycle of an internship research project focused on generalizing Vision-Language-Action (VLA) models and implementing Temporal-Difference Quality Calibration (TDQC) for proactive failure detection in robotic manipulation.
The project is divided into two major tracks:
- Literature & Structured Research: Mapping the state of the art in unseen object exploration, world models, and VLA uncertainty.
- Implementation (Phase 1 & 2): Fine-tuning the SimVLA model on LIBERO datasets and developing a standalone LSTM-based failure calibrator.
- `00_subjects/`: Official internship briefs and requirements.
- `02_search_strategy/` & `03_search_runs/`: Comprehensive literature search history and paper shortlists.
- `04_structured_research/`: Deep-dive analysis and field schemas for "Unseen Object Exploration" and "World Models."
- `06_papers/`: Local repository of key PDF papers and reading manifests.

- `SimVLA/`: The base VLA model repository (SmolVLM backbone).
- `envs/simvla/`: Dedicated Conda environment (Python 3.10, PyTorch, CUDA 12.4).
- `phase2_tdqc_standalone/`: [ACTIVE] Consolidated failure detection project.
- `config/` & `data/`: Simulation settings and LIBERO datasets.
Training of the Phase 2 TDQC LSTM Calibrator is complete.
- Model: 1-layer, 128-unit LSTM.
- Status: Finalized (Stage 5 Polish).
- Checkpoint: `intern_ship_ws/phase2_tdqc_standalone/results/checkpoints/lstm_td0_final_polish_v2/best.pt`
- Primary Metric: Global Brier Score of 0.0823.
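For readers unfamiliar with the metric, the Brier score is the mean squared difference between predicted probabilities and binary outcomes: 0 is perfect, and always predicting 0.5 scores 0.25. The sketch below shows the standard definition only. The function name `brier_score` and the example values are illustrative, and it assumes that "global" means pooling all predictions into one average, which this README does not confirm.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if len(probs) == 0:
        raise ValueError("at least one prediction is required")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predictions and labels, not taken from the project's results.
example_probs = [0.9, 0.2, 0.7, 0.1]
example_outcomes = [1, 0, 1, 0]
example_score = brier_score(example_probs, example_outcomes)  # about 0.0375
```

Lower is better, so the reported 0.0823 is well below the 0.25 that an uninformative constant prediction of 0.5 would score.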
If you are resuming this project in a new session:
- Check `intern_ship_ws/phase2_tdqc_standalone/README.md` for the failure detection metrics.
- Activate the environment: `source intern_ship_ws/activate_simvla.sh`.
- Ensure `PYTHONPATH` includes `intern_ship_ws/phase2_tdqc_standalone/code`.
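The `PYTHONPATH` step can be checked with a short script. This is a minimal sketch, not part of the repository: the helper `pythonpath_includes` is hypothetical, and it assumes the script is run from the directory that contains `intern_ship_ws/`.

```python
import os
from pathlib import Path

# Path taken from the resume steps; assumed relative to the current directory.
CODE_DIR = "intern_ship_ws/phase2_tdqc_standalone/code"


def pythonpath_includes(target: str, pythonpath: str, cwd: str = ".") -> bool:
    """Return True if `target` resolves to one of the entries in `pythonpath`."""
    base = Path(cwd).resolve()
    wanted = (base / target).resolve()
    for entry in pythonpath.split(os.pathsep):
        # Skip empty entries; relative entries are resolved against `cwd`.
        if entry and (base / entry).resolve() == wanted:
            return True
    return False


if __name__ == "__main__":
    ok = pythonpath_includes(CODE_DIR, os.environ.get("PYTHONPATH", ""))
    print("PYTHONPATH OK" if ok else f"PYTHONPATH is missing {CODE_DIR}")
```

Comparing resolved paths rather than raw strings means that relative entries, absolute entries, and entries with a trailing slash all match the same directory.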