Closed
250 commits
ecc884c
feat: add MuJoCo Playground env backend with GPU acceleration
kengz Mar 6, 2026
7877cf4
feat: add playground dstack config and --playground flag for remote runs
kengz Mar 6, 2026
dcdbd28
docs: add Phase 5 Playground benchmark table to BENCHMARKS.md
kengz Mar 6, 2026
9ff4b5f
fix: skip random baseline generation for playground/ envs
kengz Mar 6, 2026
830c0fa
feat: add live rendering for MuJoCo Playground environments
kengz Mar 6, 2026
a7dcd8e
feat: GPU-native replay buffer and torch obs normalization
kengz Mar 6, 2026
0d7cc04
feat: Phase 5 playground spec refactor and dstack unification
kengz Mar 6, 2026
48d52c0
fix: dstack env var format XLA_PYTHON_CLIENT_PREALLOCATE=false (not d…
kengz Mar 6, 2026
da60074
fix: remove device:cuda from ppo_playground_arc (OnPolicyBatchReplay …
kengz Mar 6, 2026
efe0b18
fix: to_torch_batch handles list-of-tensors (GPU-native on-policy rol…
kengz Mar 6, 2026
883f570
feat: auto-resolve playground env device (like net gpu:auto) + logging
kengz Mar 6, 2026
050f4c5
fix: move state to net device in calc_pdparam (GPU-env CPU tensor fix)
kengz Mar 6, 2026
10dc4ae
feat: Phase 5 PendulumSwingup PPO benchmark (35.74)
kengz Mar 6, 2026
e2b1f4c
feat: Phase 5 CartpoleBalance PPO benchmark (709.15)
kengz Mar 6, 2026
8df7ba0
feat: Phase 5 CheetahRun SAC benchmark (112.59)
kengz Mar 6, 2026
c75da66
feat: Phase 5 FingerSpin SAC benchmark (251.38)
kengz Mar 6, 2026
5f10b88
feat: Phase 5 CartpoleSwingup SAC benchmark (193.71)
kengz Mar 6, 2026
b12d79d
feat: Phase 5 PointMass PPO benchmark (493.79) + ReacherEasy PPO (285…
kengz Mar 6, 2026
dbeb8d9
feat: Phase 5 AcrobotSwingup SAC benchmark (3.46)
kengz Mar 6, 2026
3c85242
feat: Phase 5 AcrobotSwingup SAC baseline (3.46) + new playground spe…
kengz Mar 6, 2026
4a7021c
feat: Phase 5 WalkerWalk SAC (883.86) + WalkerStand SAC (934.20)
kengz Mar 6, 2026
740e0ad
feat: Phase 5 CheetahRun PPO baseline (53.61)
kengz Mar 6, 2026
cf3ff57
feat: Phase 5 WalkerWalk PPO baseline (73.21)
kengz Mar 6, 2026
c247581
feat: add sac_playground_arc_utd1 spec (UTD=1.0, 4 envs) for hard DM …
kengz Mar 6, 2026
c5f5134
feat: add sac_playground_arc_datarich spec for high-fps playground SAC
kengz Mar 6, 2026
b07cc0f
fix: JAX CUDA install in dstack + DLPack safety + data-rich SAC spec
kengz Mar 6, 2026
416cb31
feat: Phase 5 CartpoleSwingup SAC baseline (371.40)
kengz Mar 6, 2026
8a9a6ef
feat: Phase 5 HopperStand SAC baseline (277.46)
kengz Mar 6, 2026
372c943
fix: UTD=1.0 spec — use max_frame=1M (6h dstack wall at ~60fps)
kengz Mar 6, 2026
935e41a
feat: Phase 5 CartpoleBalance PPO baseline (918.12)
kengz Mar 6, 2026
7a7e350
feat: Phase 5 FingerSpin SAC baseline (327.57)
kengz Mar 6, 2026
699af60
feat: Phase 5 FingerTurnEasy SAC baseline (408.73)
kengz Mar 6, 2026
c7278d2
docs: add FPS, Frames, Wall Clock, D4PG Target, Hardware to Phase 5 t…
kengz Mar 6, 2026
a40ed8c
feat: Phase 5 FishSwim SAC baseline (98.57)
kengz Mar 6, 2026
02477a9
feat: Phase 5 ReacherHard SAC (958.57) + FingerTurnHard SAC (198.22)
kengz Mar 7, 2026
409fd02
feat: Phase 5 CartpoleBalanceSparse PPO (504.59) + BallInCup SAC (158…
kengz Mar 7, 2026
61413ac
feat: Phase 5 WalkerRun SAC baseline (302.88)
kengz Mar 7, 2026
4edc9a0
fix: correct playground env names in Phase 5 BENCHMARKS.md
kengz Mar 7, 2026
d333e39
docs: add AcrobotSwingup UTD=1 (~61 est.) and HopperHop UTD=1 (~9.7 e…
kengz Mar 7, 2026
e696122
chore: purge bad playground specs (baseline/fast-ppo/tunable-utd)
kengz Mar 7, 2026
b8228a4
feat: Phase 5 SwimmerSwimmer6 SAC baseline (137.46)
kengz Mar 7, 2026
2487a6e
feat: Phase 5 HopperHop SAC baseline (0.00)
kengz Mar 7, 2026
360335b
feat: Phase 5 HumanoidWalk SAC baseline (10.33), HumanoidRun SAC base…
kengz Mar 7, 2026
ad8f6bb
refactor: consolidate playground SAC specs into single sac_playground…
kengz Mar 7, 2026
096b69a
feat: Phase 5 AcrobotSwingup SAC UTD=1 (46.28), AcrobotSwingupSparse …
kengz Mar 7, 2026
ae36c8f
feat: Phase 5 CartpoleSwingupSparse SAC UTD=1 (0.38), HopperHop SAC U…
kengz Mar 7, 2026
c111538
feat: Phase 5 HumanoidStand SAC (26.44), HumanoidRun SAC 4M (7.90), P…
kengz Mar 7, 2026
ad5bc7b
docs: add Phase 5 DM Control plot grid to BENCHMARKS.md
kengz Mar 7, 2026
60af065
fix: unescape plot image syntax in Phase 5 BENCHMARKS.md grid
kengz Mar 7, 2026
750a16b
refactor: remove deprecated utd1 spec from sac_playground_arc.yaml
kengz Mar 7, 2026
e68ac18
docs: sort Phase 5 DM Control table and plot grid alphabetically
kengz Mar 7, 2026
df54f54
fix: regenerate all Phase 5 playground plots with exact correct folde…
kengz Mar 7, 2026
41ac218
docs: tighten plot intake instructions, add Phase 5 priority strategy
kengz Mar 7, 2026
2a6e450
docs: tighten plot generation rules — use only exact HF Data folders
kengz Mar 7, 2026
5c2cefe
feat: Phase 5 BallInCup SAC 2M (369.78), up from 158.88
kengz Mar 7, 2026
a41bd0c
feat: Phase 5 BallInCup SAC 2M plot update (369.78)
kengz Mar 7, 2026
e9c7431
docs: restructure Phase 5 DM Control table to per-algo rows (Atari st…
kengz Mar 7, 2026
520fc5b
docs: restructure Locomotion and Manipulation tables to per-algo rows
kengz Mar 7, 2026
f589951
feat: add crossq_playground_arc.yaml spec for JAX/MJX playground envs
kengz Mar 7, 2026
707fe7e
feat: Phase 5 PendulumSwingup PPO 4M (342.23), up from 35.74
kengz Mar 7, 2026
4b30f53
feat: Phase 5 WalkerRun SAC 4M (604.51), up from 302.88
kengz Mar 7, 2026
37863ea
feat: Phase 5 AcrobotSwingup PPO 4M (29.53)
kengz Mar 7, 2026
1fd6bb6
fix: remove duplicate CartpoleBalance plot in Phase 5 grid, reflow 3-…
kengz Mar 7, 2026
9c3826a
feat: Phase 5 CartpoleSwingupSparse PPO 4M (545.07)
kengz Mar 7, 2026
626a5eb
feat: Phase 5 FingerSpin PPO 4M (416.99), FingerTurnEasy PPO 4M (346.28)
kengz Mar 7, 2026
26ed254
feat: Phase 5 FingerTurnHard PPO 4M (328.15)
kengz Mar 7, 2026
eb7a507
feat: Phase 5 CartpoleSwingup PPO 4M (725.55)
kengz Mar 7, 2026
9627057
feat: Phase 5 FishSwim PPO 4M (81.68)
kengz Mar 7, 2026
95e9460
feat: Phase 5 BallInCup PPO 4M (898.71)
kengz Mar 7, 2026
8e84e3e
feat: Phase 5 CrossQ batch - CartpoleSwingup (324.11), HopperStand (1…
kengz Mar 8, 2026
1b0347b
feat: Phase 5 PPO plots - WalkerStand, ReacherHard, HopperStand, Hopp…
kengz Mar 8, 2026
8972b4d
feat: add ppo_playground_arc_loco spec for hard locomotion envs
kengz Mar 8, 2026
5b584d3
feat: Phase 5 AcrobotSwingup CrossQ 2M (1.96)
kengz Mar 8, 2026
ed004bd
feat: Phase 5 CartpoleSwingupSparse CrossQ 2M (0.00)
kengz Mar 8, 2026
d6c6c30
feat: Phase 5 HumanoidWalk CrossQ 1M (4.94)
kengz Mar 8, 2026
51b2865
feat: Phase 5 HopperHop CrossQ 2M (0.14)
kengz Mar 8, 2026
679d9b4
feat: Phase 5 CrossQ FishSwim (89.72), HopperHop (0.24), HumanoidWalk…
kengz Mar 8, 2026
8536162
feat: Phase 5 CrossQ HumanoidRun (1.96) + updated plots
kengz Mar 8, 2026
fea9f5d
feat: Phase 5 HumanoidStand CrossQ 1M (9.99)
kengz Mar 8, 2026
5341b14
feat: Phase 5 CrossQ BallInCup (247.26), HumanoidStand (9.99) + plots
kengz Mar 8, 2026
69ef0f4
feat: Phase 5 FingerTurnHard CrossQ 2M (312.10)
kengz Mar 8, 2026
3c9c05b
feat: Phase 5 WalkerStand CrossQ 2M (731.10)
kengz Mar 8, 2026
d768f7c
feat: Phase 5 CrossQ WalkerStand (903.27), FingerTurnHard (330.96) + …
kengz Mar 8, 2026
0c58501
feat: Phase 5 ReacherHard CrossQ 2M (810.80)
kengz Mar 8, 2026
aa1371a
feat: Phase 5 CrossQ ReacherHard (921.27) + plot
kengz Mar 8, 2026
ef34990
feat: Phase 5 CrossQ WalkerWalk (919.08) + plot
kengz Mar 8, 2026
5a16e88
feat: Phase 5 CrossQ ReacherEasy (948.75) + plot
kengz Mar 8, 2026
b775539
feat: Phase 5 WalkerStand PPO loco (304.58) + updated plot
kengz Mar 8, 2026
dbdd018
feat: Phase 5 HopperHop plot updated with PPO loco overlay
kengz Mar 8, 2026
93e4b04
feat: Phase 5 BarkourJoystick PPO plot (0.00)
kengz Mar 8, 2026
6a44131
feat: Phase 5 HumanoidStand PPO loco (16.07) + BarkourJoystick PPO (0…
kengz Mar 8, 2026
6c63e04
feat: Phase 5 HumanoidWalk PPO loco (7.71) + Go1JoystickFlat (0.00)
kengz Mar 8, 2026
9c4d43a
feat: Phase 5 Apollo PPO (-1.88) robotics loco results
kengz Mar 8, 2026
5e4b5de
feat: Phase 5 Robotics PPO loco batch (10 envs) - all negative results
kengz Mar 8, 2026
0a9fda3
feat: Phase 5 Locomotion Robots plot grid (11 envs) + Apollo plot
kengz Mar 8, 2026
20a5b30
fix: pre-clone mujoco_menagerie before sessions start to prevent race…
kengz Mar 8, 2026
243a8c1
feat: Phase 5 HumanoidRun PPO (2.32) + updated plot
kengz Mar 8, 2026
3de4833
feat: Phase 5 DM Control gap fills + plot fixes (9 new entries)
kengz Mar 9, 2026
0eb2ff2
fix: loco robot plots batch 1 — 4-session reruns (Barkour, H1x2, Op3,…
kengz Mar 9, 2026
e521e40
feat: Phase 5 Manipulation (8 envs) + SwimmerSwimmer6 PPO update
kengz Mar 9, 2026
b056da3
fix: loco robot plots batch 2 — SpotGetup, T1, AlohaSinglePegInsertio…
kengz Mar 9, 2026
e7338ca
refactor: move playground specs to benchmark_arc/{algo}/ per convention
kengz Mar 9, 2026
81217e5
fix: manip plots batch 3 — PandaPickCube, LeapCubeRotateZAxis 4-sessi…
kengz Mar 9, 2026
7ceb79a
fix: loco plots batch 3 — Go1Handstand 4-session rerun
kengz Mar 9, 2026
23f655f
fix: manip plots batch 4 — PandaOpenCabinet, PandaPickCubeOrientation…
kengz Mar 9, 2026
f8fe992
fix: loco robot plots batch 3 — Go1Footstand, JoystickFlat/Rough, Spo…
kengz Mar 9, 2026
5250286
fix: DM Control gap fills batch 1 — CartpoleSwingup SAC 538, FingerSp…
kengz Mar 10, 2026
161cc62
docs: rename Phase 5 subsections to 5.1 DM Control, 5.2 Locomotion, 5…
kengz Mar 10, 2026
b841eee
fix: clean up Phase 5.2/5.3 tables — PPO loco→PPO, remove empty SAC/C…
kengz Mar 10, 2026
0bbeaef
docs: add Phase 5 spec file table and corrected repro commands with -…
kengz Mar 10, 2026
c19a37f
docs: update Phase 5.2/5.3 targets from N/A to dash (no published bas…
kengz Mar 10, 2026
d8f2fb3
docs: align Phase 5 format with Phase 3 standard
kengz Mar 10, 2026
6464a38
fix: Phase 5 consistency audit corrections
kengz Mar 10, 2026
14acb73
refactor: merge ppo_playground_arc_loco into ppo_playground_arc
kengz Mar 10, 2026
ff8a0c0
fix: regenerate DM Control plots with CrossQ+SAC — all algos now shown
kengz Mar 10, 2026
75e1edd
refactor: unify Phase 5 playground specs and audit BENCHMARKS.md
kengz Mar 10, 2026
39e32d7
feat: intake CrossQ CartpoleSwingup score=580.89 (4M frames)
kengz Mar 10, 2026
75e901d
feat: intake SAC FingerSpin score=472.65 (5M frames)
kengz Mar 10, 2026
8d7035f
feat: intake CrossQ CheetahRun score=656.92 (4M frames)
kengz Mar 10, 2026
25c0b28
feat: intake SAC FingerTurnEasy score=407.10 (4M frames)
kengz Mar 10, 2026
f32a84a
feat: intake SAC FingerTurnHard=227.83, CrossQ BallInCup=948.62
kengz Mar 10, 2026
e21ee3e
feat: intake SAC WalkerStand=966.84, SAC HopperStand=194.76
kengz Mar 10, 2026
06bd030
feat: intake PPO ReacherEasy=790.13, CrossQ HumanoidStand=13.05
kengz Mar 10, 2026
f871c1d
feat: intake PPO HumanoidWalk=7.47
kengz Mar 10, 2026
093acb0
docs: add frame budget rules to benchmark skill and BENCHMARKS.md rep…
kengz Mar 10, 2026
8dcc90f
fix: regenerate ReacherEasy and HumanoidStand plots with canonical fo…
kengz Mar 11, 2026
d3ee549
fix: remove duplicate PPO row from HumanoidStand, keep loco variant (…
kengz Mar 11, 2026
d6153fb
fix: regenerate 14 Phase 5.1 DM Control plots from canonical table fo…
kengz Mar 11, 2026
49f7e17
fix: regenerate 8 Phase 5.1 plots with canonical folders (stale Cross…
kengz Mar 11, 2026
4f5ce3d
docs: mark Phase 5 benchmarks as pre-MJWarp baselines pending rerun
kengz Mar 11, 2026
29ecbbd
feat: enable MJWarp backend in PlaygroundVecEnv via impl='warp'
kengz Mar 11, 2026
c03a40b
feat(env): MJWarpVecEnv — Warp GPU backend for MuJoCo Playground
kengz Mar 11, 2026
f41a6d4
Revert "feat(env): MJWarpVecEnv — Warp GPU backend for MuJoCo Playgro…
kengz Mar 11, 2026
febb94a
fix: detect CUDA via torch for MJWarp impl selection; add MJX_GPU_DEF…
kengz Mar 11, 2026
afe4211
fix: use JAX device check for MJWarp detection; remove conflicting ja…
kengz Mar 11, 2026
83e189b
fix: clean up JAX CUDA warning and remove ineffective extra jax install
kengz Mar 11, 2026
1611c8c
docs: update Phase 5 MJWarp note — CPU MJX is valid baseline, no reru…
kengz Mar 11, 2026
93bd4d3
chore: pin jax[cuda12]==0.5.3 and add CUDA_VISIBLE_DEVICES=0 for play…
kengz Mar 11, 2026
817df15
fix: install playground[cuda] from git HEAD for MJWarp GPU support
kengz Mar 11, 2026
0ce121d
fix: use PyPI playground[cuda] to avoid dev mujoco macOS conflict
kengz Mar 11, 2026
e6aca18
fix: correct truncation extraction and DLPack API in PlaygroundVecEnv
kengz Mar 11, 2026
b87fadd
refactor: use impl=warp uniformly, unify dstack YAML
kengz Mar 11, 2026
dd82cb8
refactor: remove dead --playground flag, unify search YAML
kengz Mar 11, 2026
7e85978
docs: update playground docstring and BENCHMARKS.md for unified MJWarp
kengz Mar 11, 2026
0760e9a
feat: MJWarp docs, updated PPO playground hparams, Phase 5 frame budgets
kengz Mar 11, 2026
2b100f7
refactor: clean up Phase 5 playground specs and BENCHMARKS.md
kengz Mar 11, 2026
2a71d15
docs: Phase 5 benchmark instructions and spec name fixes
kengz Mar 11, 2026
6340d81
chore: reset Phase 5 tables for MJWarp rerun
kengz Mar 11, 2026
b50a9c8
chore: round Phase 5 target scores to clean values
kengz Mar 11, 2026
c038655
docs: add GPU utilization check section to benchmark skill
kengz Mar 11, 2026
5887418
chore: remove stale design/planning docs superseded by implementation
kengz Mar 11, 2026
da19c8d
fix: add missing shared key to ppo_playground net_spec
kengz Mar 11, 2026
3a62ed9
fix: log random baseline fallback once per env at INFO, not WARNING spam
kengz Mar 11, 2026
3d2d4db
chore: bump playground log/eval_frequency to 100K for MJWarp high-fps…
kengz Mar 11, 2026
05d7b2a
fix: move random baseline missing log to Session init (once per trial…
kengz Mar 11, 2026
8e65116
feat: intake PPO pilot runs — CartpoleBalance 319.77, CheetahRun 51.2…
kengz Mar 11, 2026
83a8f16
fix: auto-detect njmax for warp backend when env default is 0 (fixes …
kengz Mar 11, 2026
19b0a19
feat: intake PPO batch 1+2 — CartpoleBalance 876, CheetahRun 511, Car…
kengz Mar 11, 2026
cb7ce79
feat: update CheetahRun plot with 100M frame run (511.27)
kengz Mar 11, 2026
cea83f2
feat: intake WalkerWalk 613.47 (100M frames)
kengz Mar 11, 2026
182b377
feat: intake HopperStand 2.87 (converged, 100M) — note: far below 650…
kengz Mar 11, 2026
0a5a48c
feat: intake ReacherEasy 964.20, ReacherHard 957.94, WalkerStand 781.26
kengz Mar 11, 2026
a498763
feat: intake BallInCup 718.66, WalkerRun 223.46, FingerTurnHard 415.89
kengz Mar 11, 2026
939540b
feat: intake FingerTurnEasy 528.73
kengz Mar 11, 2026
f03f8e0
feat: intake batch 4 + update targets/status from ref plots
kengz Mar 11, 2026
d171bef
feat: intake HumanoidStand 16.23 (❌ vs 700 ref), regenerate plots, up…
kengz Mar 11, 2026
df92326
feat: intake HumanoidRun 3.81, HumanoidWalk 9.20, AcrobotSwingupSpars…
kengz Mar 11, 2026
ea154df
feat: add ppo_playground_v2/loco_v2 — [256,256] network + linear lr/c…
kengz Mar 11, 2026
1f75b48
fix: increase UV_HTTP_TIMEOUT to 300s in dstack YAMLs
kengz Mar 11, 2026
89c63c1
feat: intake PPO v2 FingerTurnHard (455, ❌) + FingerSpin (501, ⚠️)
kengz Mar 12, 2026
6d9d0a6
feat: intake PPO v2 CheetahRun (663, ⚠️) + HumanoidStand (29, ❌)
kengz Mar 12, 2026
b6d6004
refactor: canonicalize ppo_playground spec — [512,512]+ELU, lr/clip/e…
kengz Mar 12, 2026
1f924e3
feat: intake PPO WalkerRun (482, ⚠️) — 2x improvement vs v1
kengz Mar 12, 2026
6162f9d
feat: intake PPO FingerTurnEasy (594, ❌) — +12% vs v1, still climbing
kengz Mar 12, 2026
795ca39
feat: intake PPO FishSwim (215, ❌) + HopperStand (224, ✅) + HumanoidR…
kengz Mar 12, 2026
262ff49
feat: intake PPO HumanoidStand (30, ❌) — 3 iters v1/v2/v3, still far …
kengz Mar 12, 2026
c1ebf05
refactor: apply sample efficiency fixes to ppo_playground spec
kengz Mar 12, 2026
8584e15
feat: intake PPO HopperStand (203, ✅) — loco spec 3x above target
kengz Mar 12, 2026
6e03442
fix: correct MA scores and spec names in BENCHMARKS.md
kengz Mar 12, 2026
3ec6647
feat: intake PPO v3 HumanoidRun (8.95, ❌) — +110% vs v2, still climbing
kengz Mar 12, 2026
8b45f99
feat: intake PPO v4 HumanoidStand (30.04, ❌) — normalize_reward no im…
kengz Mar 12, 2026
33ee5ee
feat: add HumanoidWalk v4 pilot (12.21) — confirms high variance
kengz Mar 12, 2026
62323e1
refactor: revert ppo_playground spec to b6d6004d — iter3 caused regre…
kengz Mar 12, 2026
2a0b269
docs: fix Phase 5 target ref (100M steps) and add Settings lines per …
kengz Mar 12, 2026
48b7e73
docs: fix Phase 5 settings — uniform num_envs per phase, correct ref …
kengz Mar 12, 2026
2ae2443
refactor: overhaul ppo_playground spec based on official DM Control c…
kengz Mar 12, 2026
853f327
docs: update Phase 5 docs to reflect actual settings and confirmed nu…
kengz Mar 12, 2026
6077a29
fix: remove incorrect time_horizon % num_envs guard in PPO init
kengz Mar 12, 2026
61fcd82
fix: reorder PPO guard — clamp minibatch_size before divisibility check
kengz Mar 12, 2026
edb0270
feat: bump ppo_playground num_envs to 2048 — matches official DM Control
kengz Mar 12, 2026
bb00129
docs: update Phase 5 spec tables and settings to 2048 envs
kengz Mar 12, 2026
84ba45b
docs: mark Phase 5.1 PPO v1 results as pre-fix (1M-batch bug)
kengz Mar 12, 2026
bf9f55f
docs: clear stale Phase 5.1 PPO scores — full rerun in progress
kengz Mar 12, 2026
0e95445
feat: intake Phase 5.1 PPO batch 1 results (p5-ppo5 spec)
kengz Mar 12, 2026
38369fc
refactor: reduce time_horizon to match official mujoco_playground
kengz Mar 12, 2026
66f80a9
feat: intake WalkerWalk (952 ✅) + FingerSpin (537 ⚠️) + CartpoleBalan…
kengz Mar 12, 2026
307a277
refactor: match official num_minibatches=32 for DM Control spec
kengz Mar 12, 2026
4195bfc
fix: correct minibatch_size for DM Control — venv constraint requires…
kengz Mar 12, 2026
ce51e52
fix: tighten log_std clamp to prevent physics blowup in complex envs
kengz Mar 12, 2026
baafb0c
feat: intake AcrobotSwingup (123 ❌) + CartpoleBalanceSparse (459 ❌) +…
kengz Mar 12, 2026
53b8f6e
feat: intake PPO BallInCup (942, ✅) + HopperHop (22, ❌)
kengz Mar 12, 2026
c13e81f
feat: intake PPO PendulumSwingup (276, ⚠️) + PointMass (863, ⚠️)
kengz Mar 12, 2026
e56e7ae
feat: intake PPO ReacherEasy (955, ✅) + ReacherHard (947, ✅)
kengz Mar 12, 2026
d007bcd
feat: intake PPO SwimmerSwimmer6 (485, ⚠️) — plateaued below target 560
kengz Mar 12, 2026
37f1106
feat: intake PPO p5-ppo6 batch 1 — CartpoleBalance (968 ✅), CartpoleS…
kengz Mar 12, 2026
bc0ddfd
feat: intake PPO p5-ppo6 batch 2 — AcrobotSwingup (209 ⚠️), FishSwim …
kengz Mar 12, 2026
f07ac4f
perf: increase log_frequency 100K→1M in ppo_playground spec
kengz Mar 12, 2026
fb55c2f
fix: rate-limit NaN loss warning + revert log_frequency to 100K
kengz Mar 12, 2026
085fbe9
docs: add PHASE5_OPS.md — work tracker for Phase 5.1 PPO benchmarks
kengz Mar 12, 2026
8fe7bc7
feat: per-env canonical spec variants + action_repeat fix for Pendulu…
kengz Mar 12, 2026
3f4ede3
fix: mark HopperHop ✅ (scored 22 >> reference target ~2)
kengz Mar 12, 2026
3c622b4
docs: update PHASE5_OPS with researcher findings and rerun queue
kengz Mar 12, 2026
67183e4
feat: intake PPO HopperStand 16.38 ⚠️ — curve rising steeply, needs 2…
kengz Mar 12, 2026
53b3b3e
docs: update queue — fingerturnhard2 launched, hopperstand 200M queued
kengz Mar 12, 2026
3e9150c
feat: intake PPO FingerSpin 561.3 ⚠️ — improved from 537, near target…
kengz Mar 12, 2026
d747c06
docs: update queue — fishswim2 launched, fingerspin canonical run add…
kengz Mar 13, 2026
b6ef49d
feat: intake PPO p5-ppo6 batch — AcrobotSwingup (253 ✅), CartpoleBala…
kengz Mar 13, 2026
dcb1d3a
fix: bump HopperStand queue budget 200M→700M (36.6K fps × 19800s = 725M)
kengz Mar 13, 2026
0ee97a7
fix: remove >100M frame queue entries — 100M is hard cap, failing env…
kengz Mar 13, 2026
0f81861
docs: comprehensive handover — PHASE5_OPS with full state, spec fixes…
kengz Mar 13, 2026
82a874a
docs: critical correction — Humanoid is DM Control not loco, rerun wi…
kengz Mar 13, 2026
d030991
fix: correct BENCHMARKS.md scores to use strength metric + intake Fin…
kengz Mar 13, 2026
1adfde3
docs: metric correction — strength vs final_strength, CartpoleSwingup…
kengz Mar 13, 2026
6eb08fe
feat: match official Brax PPO hyperparameters for playground benchmarks
Mar 13, 2026
88156e3
fix: CCD overflow monkey-patch + guard impl override for envs without…
kengz Mar 15, 2026
0729228
fix: switch playground to git install for naccdmax/naconmax API support
kengz Mar 15, 2026
c1d3ab9
fix: suppress MuJoCo stderr warnings + use only state obs for dict-ob…
kengz Mar 16, 2026
11726ab
docs: mark close-enough benchmark envs as passed
kengz Mar 16, 2026
7d44da4
docs: update PHASE5_OPS.md with wave 3 status and failing env breakdown
kengz Mar 16, 2026
ac4d451
feat: Phase 3 infrastructure — Pavlovian env, eval harness, configs
kengz Mar 16, 2026
90405f7
feat: sensorimotor env TC-11–24, 14 tasks, 56-dim obs, MuJoCo physics
kengz Mar 16, 2026
c57cecb
fix: block_until_ready in step() before restoring stderr
kengz Mar 16, 2026
8f1dc0d
feat: Dasein agent L0+L1+PPO — perception, being embedding, DaseinNet
kengz Mar 16, 2026
089a0e7
docs: update benchmark scores — SpotFlat 45.75 ✅, AlohaPeg 216.36
kengz Mar 16, 2026
9eb29bb
docs: Go1Getup 0.00 → 18.16 ✅ (obs fix unblocked it)
kengz Mar 16, 2026
f32c376
docs: Go1Handstand 17.88 ✅, H1Inplace 5.54 improved
kengz Mar 16, 2026
24976ff
docs: H1JoystickGaitTracking 16.24 → 27.83 ✅ (loco_precise)
kengz Mar 16, 2026
3ae40c6
docs: H1Inplace 11.95 ✅, H1Joystick 31.11 ✅ (loco_precise)
kengz Mar 16, 2026
9b69c0a
docs: update AlohaPeg 222.49, Go1Footstand 23.48 new bests
kengz Mar 16, 2026
5e9db9d
docs: SpotFlat 48.58 new best (loco_precise)
kengz Mar 16, 2026
6c4c722
fix: remove block_until_ready from step() — suppress stderr permanent…
kengz Mar 17, 2026
6bcb8ef
feat: L3 emotions, replay buffer, curriculum sequencer (Step 4)
kengz Mar 17, 2026
cbf29a3
fix: Phase 3.2a review — spec compliance, integration tests, quality …
kengz Mar 17, 2026
a06a0b3
feat: Phase 3.2b perception pipeline — DINOv2, stereo fusion, FiLM, I…
kengz Mar 17, 2026
0872e1c
docs: FingerTurnHard 560.32 → 590.43 new best (vnorm_constlr)
kengz Mar 17, 2026
2f9ab47
docs: CartpoleSwingup 729.09 ✅, AlohaPeg 223.26 new bests
kengz Mar 17, 2026
d309f67
docs: audit Phase 5 benchmarks — update HF links and regenerate plots
kengz Mar 19, 2026
90abea1
docs: Phase 5 benchmark audit — fix scores, links, plots, clean up ta…
kengz Mar 20, 2026
ca21103
docs: graduate Phase 5 HF data — benchmark-dev → benchmark
kengz Mar 20, 2026
3ebe26c
fix: CI fixes — ruff E741/E402, add scipy dev dependency
kengz Mar 20, 2026
159 changes: 143 additions & 16 deletions .claude/skills/benchmark/SKILL.md
@@ -14,25 +14,105 @@ description: Run SLM-Lab deep RL benchmarks, monitor dstack jobs, extract result
5. **Runs must complete in <6h** (dstack max_duration)
6. **Max 10 concurrent dstack runs** — launch in batches of 10, wait for capacity/completion before launching more. Never submit all runs at once; dstack capacity is limited and mass submissions cause "no offers" failures
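
A minimal sketch of the batching rule, assuming the planned run names sit one per line in a text file (`runs.txt` and `batch_runs` are hypothetical names). It only splits the list into groups of at most 10. Each group is still launched by hand, and the next one only after `dstack ps` shows free capacity.

```bash
# Split run names (one per line on stdin) into batches of at most N (default 10).
# xargs with no command runs echo, so each output line is one batch.
batch_runs() {
  xargs -n "${1:-10}"
}

# Usage with a hypothetical run list:
# batch_runs < runs.txt
```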

## Frame Budget — MANDATORY CALCULATION (do this BEFORE every submission)

**dstack kills jobs at 6h with ZERO data** — no trial_metrics, no HF upload, nothing. A run killed at the wall = complete waste.

**Rule: max_frame = observed_fps × 5.5 × 3600** (5.5h expressed in seconds; using 5.5h instead of 6h leaves a 30-min margin)
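
The rule as a small shell helper. This is a sketch that only evaluates the formula above; `safe_max_frame` is a hypothetical name, not an SLM-Lab or dstack command. Because 5.5h is 19800s, the budget is simply fps × 19800.

```bash
# Frame budget that fits in 5.5h (19800s) at the observed fps.
# int() truncates toward zero, so the result never overshoots the budget.
safe_max_frame() {
  awk -v fps="$1" 'BEGIN { printf "%.0f\n", int(fps * 5.5 * 3600) }'
}

safe_max_frame 270   # prints 5346000; the table below rounds this down to 5M
```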

**ALWAYS check FPS after 5-10 min of a new run before committing to the frame budget:**
```bash
dstack logs NAME --since 10m 2>&1 | grep "fps:" | tail -3
# step logs report fps directly (fps = frames_so_far / elapsed_seconds)
```
If projected wall clock > 5.5h at observed fps → **stop immediately and relaunch with reduced max_frame**.
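
The stop-or-continue decision can be scripted the same way. This sketch (a hypothetical `within_budget` helper) prints the projected wall clock for a planned `max_frame` at the observed fps and exits non-zero when it exceeds 5.5h. It does not stop the run itself.

```bash
# Print projected hours for MAX_FRAME at FPS; exit 0 if within 5.5h, else 1.
within_budget() {
  awk -v fps="$1" -v mf="$2" 'BEGIN {
    h = mf / fps / 3600
    printf "projected %.2fh\n", h
    exit (h <= 5.5 ? 0 : 1)
  }'
}

# 10M frames at ~450 fps projects to ~6.17h, so this prints the stop message.
within_budget 450 10000000 || echo "over 5.5h: stop and relaunch with a smaller max_frame"
```

At the low end of the first table row below (~450 fps), 10M frames already exceeds the limit, so size the budget from the fps actually observed.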

**Known fps at 64 envs (ppo_playground):**
| Env category | fps | Safe max_frame (5.5h) |
|---|---|---|
| CartpoleBalance, CheetahRun, WalkerWalk | ~450-1800 | 8M–10M |
| WalkerStand, HopperStand | ~270 | 5M |
| HumanoidStand | ~200 | 4M |
| HumanoidWalk | ~290 | 5M |
| Rough terrain loco (G1Rough, T1Rough, Go1Getup) | ~60-65 | 1M |
| BerkeleyHumanoidRough | ~36 | 700K |

**For unknown envs:** Submit with conservative 2M, check fps after 5 min, stop and relaunch with correct budget if needed.

## GPU Utilization Check — MANDATORY for Phase 5 / MJWarp runs

**MJWarp must run on GPU. Always verify GPU is actually utilized after a new run starts.**

```bash
# Option 1: dstack metrics (easiest — shows live GPU %)
dstack metrics NAME

# Option 2: SSH in and run nvidia-smi
dstack ssh NAME
# inside the instance:
nvidia-smi
watch -n 2 nvidia-smi
```

**Thresholds:**
- GPU util >80% → MJWarp GPU acceleration working correctly ✅
- GPU util <20% → GPU not utilized (CPU fallback, or JAX not using CUDA) ❌ Stop the run and investigate
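
Inside the instance, the threshold check can be scripted with `nvidia-smi`'s query mode. This is a sketch assuming a single GPU at index 0, with hypothetical helper names. The 20–80% band is not covered by the rules above, so this helper just reports it as `unclear`.

```bash
# Current utilization (%) of GPU 0, as a bare integer.
gpu_util() {
  nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i 0
}

# Map a utilization percentage onto the thresholds above.
classify_gpu_util() {
  if [ "$1" -gt 80 ]; then
    echo ok        # MJWarp GPU acceleration working
  elif [ "$1" -lt 20 ]; then
    echo fail      # CPU fallback or JAX not using CUDA: stop and investigate
  else
    echo unclear   # between thresholds: keep watching
  fi
}

# Usage (on the GPU instance): classify_gpu_util "$(gpu_util)"
```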

**What to check:**
- GPU utilization % (should be high)
- GPU memory used (1024 envs on A5000 24GB — expect 8–16GB used)
- Confirm logs show: `Playground device: GPU (cuda) — DLPack zero-copy` and `impl=warp`

**FPS sanity check for MJWarp at high num_envs (A5000):**
- 64 envs → ~450fps (confirmed baseline)
- 512 envs → ~2500–3500fps expected
- 1024 envs → ~5000–7000fps expected (roughly linear GPU scaling)
- If fps < 1000 at 1024 envs → MJWarp not GPU-accelerated, stop and investigate before launching more runs
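
Where the expected ranges come from: a sketch that scales the 64-env baseline (~450 fps) linearly with `num_envs`, plus the stop rule for 1024 envs. Both helpers are hypothetical. The expected ranges listed above sit a little below the pure linear figure.

```bash
# Pure linear scaling from the 64-env baseline of ~450 fps (integer math).
linear_fps_estimate() {
  echo $(( 450 * $1 / 64 ))
}

# Exit 0 (meaning "stop and investigate") when 1024 envs run below 1000 fps.
mjwarp_too_slow() {
  [ "$1" -eq 1024 ] && [ "$2" -lt 1000 ]
}

linear_fps_estimate 1024   # prints 7200
linear_fps_estimate 512    # prints 3600
```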

**Phase 5 Playground spec selection:**
- DM Control (5.1): `ppo_playground` (1024 envs), `sac_playground` (256 envs), `crossq_playground` (16 envs)
- Locomotion (5.2) / Manipulation (5.3): `ppo_playground_loco` (512 envs), same SAC/CrossQ specs
- DM Control with NaN rewards: override with `-s normalize_obs=false`
- Run order: PPO first (fastest), then SAC, then CrossQ
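
The selection rules above written as a lookup. This is a sketch only: the category labels (`dm_control`, `locomotion`, `manipulation`) and the helper name are hypothetical, and the output is just the spec name to put in the SPEC_NAME slot of the `slm-lab run-remote` command. Any `-s normalize_obs=false` override is still added by hand.

```bash
# Print the Phase 5 playground spec name for CATEGORY and ALGO.
pick_playground_spec() {
  category="$1"; algo="$2"
  case "$algo" in
    sac) echo sac_playground ;;         # same spec for all categories
    crossq) echo crossq_playground ;;   # same spec for all categories
    ppo)
      case "$category" in
        dm_control) echo ppo_playground ;;                     # 1024 envs
        locomotion|manipulation) echo ppo_playground_loco ;;   # 512 envs
        *) echo "unknown category: $category" >&2; return 1 ;;
      esac ;;
    *) echo "unknown algo: $algo" >&2; return 1 ;;
  esac
}

pick_playground_spec locomotion ppo   # prints ppo_playground_loco
```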

## Per-Run Intake Checklist

**Every completed run MUST go through ALL of these steps. No exceptions. Do not skip any step.**

When a run completes (`dstack ps` shows `exited (0)`):

1. **Extract score**: `dstack logs NAME | grep "trial_metrics"` → get `total_reward_ma`
1. **Extract score + stats** from logs:
```bash
dstack logs NAME 2>&1 | grep "trial_metrics" # → total_reward_ma, frame
dstack logs NAME 2>&1 | grep "fps:" | tail -5 # → fps (take last stable value)
dstack logs NAME 2>&1 | grep "wall_t:" | tail -1 # → wall_t in seconds → convert to h:mm
```
- **MA** = `total_reward_ma` from trial_metrics
- **Frames** = `frame:` from trial_metrics (e.g. `1.00e+08`)
- **FPS** = last fps value from step logs (e.g. `12500`)
- **Wall Clock** = `wall_t` seconds → format as `Xh Ym` (e.g. `2h 18m`)
2. **Find HF folder name**: `dstack logs NAME 2>&1 | grep "Uploading data/"` → extract folder name from the upload log line
3. **Update table score** in BENCHMARKS.md
3. **Update table** in BENCHMARKS.md: fill ALL columns — MA, HF Data, FPS, Frames, Wall Clock
4. **Update table HF link**: `[FOLDER](https://huggingface.co/datasets/SLM-Lab/benchmark-dev/tree/main/data/FOLDER)`
5. **Pull HF data locally**: `source .env && huggingface-cli download SLM-Lab/benchmark-dev --local-dir data/benchmark-dev --repo-type dataset --include "data/FOLDER/*"`
6. **Generate plot**: List ALL data folders for that env (`ls data/benchmark-dev/data/ | grep -i envname`), then generate with ONLY the folders matching BENCHMARKS.md entries:
6. **Generate plot** (MANDATORY — do NOT skip):
```bash
uv run slm-lab plot -t "EnvName" -d data/benchmark-dev/data -f FOLDER1,FOLDER2,...
```
NOTE: `-d` sets the base data dir, `-f` takes folder names (NOT full paths).
If some folders are in `data/` (local runs) and some in `data/benchmark-dev/data/`, use `data/` as base (it has the `info/` subfolder needed for metrics).
7. **Verify plot exists** in `docs/plots/`
8. **Commit** score + link + plot together
CRITICAL RULES for plot generation:
- Use ONLY the exact folder(s) from the HF Data column of the BENCHMARKS.md table — NEVER grep or ls to find folders
- Multiple folders in data/benchmark-dev/data/ may exist for the same env (old failed runs + new good runs). Only use the canonical folder from the table.
- Include ALL algorithms that have entries in the table for that env (e.g., both PPO and SAC folders if both have scores)
- If the canonical folder is in local `data/` (not in `data/benchmark-dev/data/`), use `-d data` instead
- `-d` sets the base data dir, `-f` takes folder names (NOT full paths)
7. **Display plot** (MANDATORY — call the Read tool on the image file, no exceptions):
```
Read: docs/plots/EnvName_multi_trial_graph_mean_returns_ma_vs_frames.png
```
This MUST happen in your agent turn — call Read, see the image, THEN send your completion message.
Team-lead must also call Read to display it in the main conversation.
8. **Embed plot in BENCHMARKS.md** — for Phase 5 playground envs, ensure the plot is in the DM Control plot grid (search for the existing grid in the Phase 5 section). If the env is already in the grid, no action needed. If missing, add it.
9. **Commit** score + link + plot together
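
Steps 1 and 4 involve some mechanical formatting. Below is a sketch of two hypothetical helpers: one turns `wall_t` seconds into the `Xh Ym` Wall Clock format (fractional seconds dropped, minutes truncated), and the other builds the HF Data link in the format given in step 4.

```bash
# wall_t seconds -> "Xh Ym" (fractional seconds dropped, minutes truncated).
fmt_wall_clock() {
  s="${1%%.*}"
  printf '%dh %dm\n' $(( s / 3600 )) $(( s % 3600 / 60 ))
}

# HF folder name -> markdown link for the HF Data column.
hf_link() {
  printf '[%s](https://huggingface.co/datasets/SLM-Lab/benchmark-dev/tree/main/data/%s)\n' "$1" "$1"
}

fmt_wall_clock 8280   # prints 2h 18m
```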

A row in BENCHMARKS.md is NOT complete until it has: score, HF link, and plot.
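
A rough completeness check for one row, as a sketch. It assumes the plot keeps the `EnvName_multi_trial_graph_mean_returns_ma_vs_frames.png` name from step 7, and it treats any decimal number in the row as the score, which is only a heuristic. The helper name is hypothetical.

```bash
# Report whether a BENCHMARKS.md row has a score, an HF link, and a plot.
# ROW: the raw markdown row; ENV: plot title (as in `slm-lab plot -t`);
# PLOT_DIR: defaults to docs/plots.
row_complete() {
  row="$1"; env="$2"; plot_dir="${3:-docs/plots}"
  if ! printf '%s\n' "$row" | grep -Eq '[0-9]+\.[0-9]+'; then
    echo "missing score"; return 1
  fi
  if ! printf '%s\n' "$row" | grep -q 'huggingface.co/datasets/'; then
    echo "missing HF link"; return 1
  fi
  if [ ! -f "$plot_dir/${env}_multi_trial_graph_mean_returns_ma_vs_frames.png" ]; then
    echo "missing plot"; return 1
  fi
  echo complete
}
```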

@@ -136,18 +216,65 @@ source .env && uv run slm-lab run-remote --gpu SPEC_FILE SPEC_NAME search -n NAM

Budget: ~3-4 trials per dimension. After search: update spec with best params, run `train`, use that result.

## Autonomous Execution
## Agent Team Workflow (MANDATORY for team lead)

**You are the team lead. Never work solo on benchmarks — always spawn an agent team.**

### Team Roles

**launcher** — Reads BENCHMARKS.md, identifies missing entries, and launches up to 10 dstack runs. Checks FPS after ~5min and stops slow runs (>5.5h projected, per the frame budget rule). Reports run names + envs to team lead.

**monitor** — Polls `dstack ps` every 5min (`sleep 300 && dstack ps`). Detects completions and failures. When runs complete, assigns intake tasks. When runs fail, reports to team lead immediately. Runs continuously until all runs are done.

**intake-A / intake-B / intake-C** — Each owns a batch of 3-4 completed runs. Executes the full intake checklist (score → HF folder → pull data → plot → BENCHMARKS.md update). Does NOT commit — team lead commits.

### Spawn Pattern

```
TeamCreate → TaskCreate (one per batch of runs) →
Agent(launcher) + Agent(monitor) + Agent(intake-A) + Agent(intake-B) + ...
```

Spawn all agents in parallel. Intake agents start idle and pick up work as monitor assigns completed runs.

### Team Lead Responsibilities

1. **On spawn**: Brief each agent with full context (run names, env names, BENCHMARKS.md format, intake checklist)
2. **On intake completion**: Read each plot image (Read tool), verify BENCHMARKS.md edits, then commit
3. **On monitor report**: If runs fail, relaunch immediately; if fps too slow, stop + reduce frames
4. **Commit cadence**: Batch-commit after each intake wave (score + HF link + plot per commit)
5. **Shutdown team**: When all runs intaked and committed, send shutdown_request to all teammates

### Monitor Agent Instructions Template

```
You are monitor on team TEAM_NAME. Poll dstack ps every 5min.
Active runs: [LIST OF RUN NAMES]
When a run shows exited(0): send message to team-lead with run name and env name.
When a run shows exited(1) or failed: send message to team-lead immediately.
Use: while true; do dstack ps; sleep 300; done
Stop when team-lead sends shutdown_request.
```
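
If the monitor wants to script the classification, here is a sketch. It matches the status text from `dstack ps` against both spellings used in this document (`exited (0)` and `exited(0)`). No particular `dstack ps` column layout is assumed; the hypothetical helper takes one status string. Treating every other exit code as a failure, and everything else as still active, are assumptions.

```bash
# Classify one status string from `dstack ps`.
classify_run_status() {
  case "$1" in
    *"exited (0)"*|*"exited(0)"*) echo completed ;;  # ready for intake
    *exited*|*failed*) echo failed ;;                # report to team lead now
    *) echo active ;;                                # still running or queued
  esac
}

classify_run_status "exited (0)"   # prints completed
```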

### Intake Agent Instructions Template

```
You are intake-agent-X on team TEAM_NAME. Intake these completed runs: [LIST]
For each run, follow the full intake checklist in the benchmark skill.
Working dir: /Users/keng/projects/SLM-Lab
Do NOT commit — team lead commits.
After all runs done: send results summary to team-lead (scores, HF folders, any issues).
```

Work continuously when benchmarking. Use `sleep 300 && dstack ps` to actively wait (5 min intervals) — never delegate monitoring to background processes or scripts. Stay engaged in the conversation.
### Autonomous Execution

**Workflow loop** (repeat every 5-10 minutes):
1. **Check status**: `dstack ps` — identify completed/failed/running
2. **Intake completed runs**: For EACH completed run, do the full intake checklist above (score → HF link → pull → plot → table update)
3. **Launch next batch**: Up to 10 concurrent. Check capacity before launching more
4. **Iterate on failures**: Relaunch or adjust config immediately
5. **Commit progress**: Regular commits of score + link + plot updates
**Workflow loop** (team lead orchestrates, agents execute):
1. **launcher**: Identifies gaps in BENCHMARKS.md → launches up to 10 runs → reports to team lead
2. **monitor**: Watches for completions → notifies team lead → assigns intake work
3. **intake agents**: Execute full checklist per run → report to team lead
4. **team lead**: Reviews plots, commits, relaunches failures, spawns next batch

**Key principle**: Work continuously, check in regularly, iterate immediately on failures. Never idle. Keep reminding yourself to continue without pausing — check on tasks, update, plan, and pick up the next task immediately until all tasks are completed.
**Key principle**: Keep agents working in parallel. Never idle as team lead while GPU runs are active — spawn a monitor agent. Commit after each intake wave. Shut down team cleanly when done.

## Troubleshooting

4 changes: 3 additions & 1 deletion .dstack/run-gpu-search.yml
@@ -16,10 +16,12 @@ env:
- PROFILE
- PROF_SKIP
- PROF_ACTIVE
- UV_HTTP_TIMEOUT=300

commands:
- apt-get update && apt-get install -y swig libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
- cd /workflow && uv sync
- cd /workflow && uv sync --group playground
- cd /workflow && uv run python -c "from mujoco_playground._src.mjx_env import ensure_menagerie_exists; ensure_menagerie_exists()"
- cd /workflow && uv run slm-lab run ${SPEC_VARS} ${SPEC_FILE} ${SPEC_NAME} ${LAB_MODE} --upload-hf

resources:
7 changes: 5 additions & 2 deletions .dstack/run-gpu-train.yml
@@ -16,10 +16,13 @@ env:
- PROFILE
- PROF_SKIP
- PROF_ACTIVE
- XLA_PYTHON_CLIENT_PREALLOCATE=false
- UV_HTTP_TIMEOUT=300

commands:
- apt-get update && apt-get install -y swig libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
- cd /workflow && uv sync
- cd /workflow && uv sync --group playground
- cd /workflow && uv run python -c "from mujoco_playground._src.mjx_env import ensure_menagerie_exists; ensure_menagerie_exists()"
- cd /workflow && uv run slm-lab run ${SPEC_VARS} ${SPEC_FILE} ${SPEC_NAME} ${LAB_MODE} --upload-hf

resources:
@@ -29,7 +32,7 @@ resources:
memory: 32GB..

spot_policy: auto
max_duration: 8h
max_duration: 6h
max_price: 0.50
retry:
on_events: [no-capacity]