Merged (38 commits)
23dca2a refactor: framework improvements — PPO, nets, memory, plotting, utils (kengz, Feb 28, 2026)
945625b feat: CrossQ algorithm — SAC without target networks + BatchRenorm (kengz, Feb 28, 2026)
37a25ea feat: CrossQ benchmark specs for all environments (kengz, Feb 28, 2026)
90adefe docs: CrossQ benchmark results, plots, and documentation (kengz, Feb 28, 2026)
ca91e73 fix: update dstack configs for 0.20.x compatibility (kengz, Feb 28, 2026)
82cf6c4 feat: CrossQ improvement specs for ⚠️ envs (InvPend, InvDblPend, Hopper) (kengz, Feb 28, 2026)
1905f37 docs: update CrossQ classic/box2d benchmark scores and plots (kengz, Feb 28, 2026)
267a696 docs: update CrossQ MuJoCo benchmark scores and plots (kengz, Mar 1, 2026)
7f82b2e docs: update CrossQ HumanoidStandup and Reacher HF links to clean-nam… (kengz, Mar 1, 2026)
92803f7 fix: CrossQ InvDblPend critic [1024]→[512], Humanoid 3.5M→4M frames (kengz, Mar 1, 2026)
1aae308 fix: cap CrossQ max_frame at SAC levels for fair comparison (kengz, Mar 1, 2026)
f251cba fix: adjust CrossQ frames from learning curves — Ant 3M, Humanoid 2M,… (kengz, Mar 1, 2026)
cb74867 docs: regenerate 5 MuJoCo plots with clean legend entries (kengz, Mar 1, 2026)
63eefba docs: update CrossQ Ant, HalfCheetah, InvDoublePendulum scores and pl… (kengz, Mar 1, 2026)
5cafac8 docs: update CrossQ LunarLanderContinuous score 249.85→268.91 from rerun (kengz, Mar 1, 2026)
f0f1dc0 fix: CartPole 200K→300K frames, Humanoid iter=2→4 for higher UTD (kengz, Mar 1, 2026)
bda2611 fix: CartPole revert to 200K frames, add training_iter=2 for more gra… (kengz, Mar 1, 2026)
f57d843 fix: CartPole training_iter=4, BRN warmup=2000 for more gradients (kengz, Mar 1, 2026)
3e6102d fix: CartPole training_start_step=5000 for better initial buffer dive… (kengz, Mar 2, 2026)
93804b7 docs: update CrossQ Humanoid score 1850→1102 and plot (kengz, Mar 2, 2026)
d8695ec fix: revert CartPole spec to match arc run that scored 405 (kengz, Mar 2, 2026)
1323766 fix: CartPole training_iter=2 for moderate UTD bump (kengz, Mar 2, 2026)
a3eeb72 docs: update CrossQ CartPole score 405.88→324.10 from non-arc rerun (… (kengz, Mar 2, 2026)
7c093de docs: graduate CrossQ CartPole HF link to public benchmark (kengz, Mar 2, 2026)
ba0d861 docs: regenerate all CrossQ benchmark plots (kengz, Mar 2, 2026)
13a32b7 docs: use 3M Hopper run (within env settings) instead of 6M (kengz, Mar 2, 2026)
61b5c97 docs: fix CrossQ Atari HF links to standard-named folders on public repo (kengz, Mar 2, 2026)
a4d3026 fix: align CrossQ reproduce table max_frame with actual run data (kengz, Mar 2, 2026)
2cf11b0 fix: align CrossQ MuJoCo specs with actual benchmark runs for reprodu… (kengz, Mar 2, 2026)
2bdd565 feat: integrate optimization branch — pinned memory, profiler, PPO mi… (kengz, Mar 2, 2026)
1114fb5 fix: canonical numeric substitution and unsubstituted var validation … (kengz, Mar 3, 2026)
d8f5078 fix: restore PPO minibatch_size=64 and hardcoded Atari max_frame (kengz, Mar 3, 2026)
21beecb fix: revert pinned memory to diagnose consistent score regression (kengz, Mar 3, 2026)
51a633f docs: sync CLAUDE.md with updated good-code skill template (kengz, Mar 3, 2026)
7cfc6c5 fix: correct 4 BENCHMARKS.md discrepancies found in audit (kengz, Mar 3, 2026)
6e0f68b feat: bump version to 5.2.0 — CrossQ algorithm (kengz, Mar 3, 2026)
1b62a17 chore: remove CrossQ tracker doc (superseded by BENCHMARKS.md) (kengz, Mar 3, 2026)
e936e75 chore: remove improvements roadmap doc (work completed) (kengz, Mar 3, 2026)
52 changes: 38 additions & 14 deletions .claude/skills/benchmark/SKILL.md
@@ -24,13 +24,35 @@ When a run completes (`dstack ps` shows `exited (0)`):
2. **Find HF folder name**: `dstack logs NAME 2>&1 | grep "Uploading data/"` → extract folder name from the upload log line
3. **Update table score** in BENCHMARKS.md
4. **Update table HF link**: `[FOLDER](https://huggingface.co/datasets/SLM-Lab/benchmark-dev/tree/main/data/FOLDER)`
5. **Pull HF data locally**: `source .env && hf download SLM-Lab/benchmark-dev --local-dir data/benchmark-dev --repo-type dataset --include "data/FOLDER/*"`
6. **Generate plot**: `uv run slm-lab plot -t "EnvName" -f data/benchmark-dev/data/FOLDER1,data/benchmark-dev/data/FOLDER2`
5. **Pull HF data locally**: `source .env && huggingface-cli download SLM-Lab/benchmark-dev --local-dir data/benchmark-dev --repo-type dataset --include "data/FOLDER/*"`
6. **Generate plot**: List ALL data folders for that env (`ls data/benchmark-dev/data/ | grep -i envname`), then generate with ONLY the folders matching BENCHMARKS.md entries:
```bash
uv run slm-lab plot -t "EnvName" -d data/benchmark-dev/data -f FOLDER1,FOLDER2,...
```
NOTE: `-d` sets the base data dir, `-f` takes folder names (NOT full paths).
If some folders are in `data/` (local runs) and some in `data/benchmark-dev/data/`, use `data/` as base (it has the `info/` subfolder needed for metrics).
7. **Verify plot exists** in `docs/plots/`
8. **Commit** score + link + plot together

A row in BENCHMARKS.md is NOT complete until it has: score, HF link, and plot.
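The intake steps above can be sketched as one dry-run script. This is a sketch under assumptions: `FOLDER` and the plot title are hypothetical placeholder values, and the `run` wrapper prints each command instead of executing it, so nothing is downloaded or plotted until the wrapper is removed.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the intake checklist; FOLDER and the title are assumed placeholders.
set -eu

run() { printf '+ %s\n' "$*"; }   # print instead of executing (remove to run for real)

FOLDER="crossq_pong_example"      # from step 2: the folder name in the upload log (assumed)
TITLE="Pong-v5"                   # plot title for step 6 (assumed)

# 5. Pull HF data locally
run huggingface-cli download SLM-Lab/benchmark-dev --local-dir data/benchmark-dev \
    --repo-type dataset --include "data/$FOLDER/*"

# 6. Generate plot: -d sets the base data dir, -f takes folder names only
run uv run slm-lab plot -t "$TITLE" -d data/benchmark-dev/data -f "$FOLDER"

# 7. Verify the plot exists
run ls docs/plots/
```

The wrapper makes the sequence reviewable before anything touches the HF repo; swapping `run cmd …` for `cmd …` executes the real flow.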

## Per-Run Graduation Checklist

**After intake, graduate each finalized run to public HF benchmark:**

1. **Upload folder to public HF**:
```bash
source .env && huggingface-cli upload SLM-Lab/benchmark data/benchmark-dev/data/FOLDER data/FOLDER --repo-type dataset
```
2. **Update BENCHMARKS.md link**: Change `SLM-Lab/benchmark-dev` → `SLM-Lab/benchmark` for that entry
3. **Upload docs/ to public HF** (updated plots + BENCHMARKS.md):
```bash
source .env && huggingface-cli upload SLM-Lab/benchmark docs docs --repo-type dataset
source .env && huggingface-cli upload SLM-Lab/benchmark README.md README.md --repo-type dataset
```
4. **Commit** link update
5. **Push** to origin
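The five graduation steps can likewise be sketched as a dry-run script (`FOLDER` is a hypothetical placeholder; the `run` wrapper prints each command, so no upload happens until it is removed; step 2 stays a manual edit because only the one entry's link should change):

```shell
#!/usr/bin/env bash
# Dry-run sketch of the graduation checklist; FOLDER is an assumed placeholder.
set -eu

run() { printf '+ %s\n' "$*"; }   # print instead of executing

FOLDER="crossq_cartpole_example"  # finalized run folder (assumed name)

# 1. Upload the run folder to the public HF benchmark repo
run huggingface-cli upload SLM-Lab/benchmark \
    "data/benchmark-dev/data/$FOLDER" "data/$FOLDER" --repo-type dataset

# 2. Edit BENCHMARKS.md by hand: benchmark-dev -> benchmark, for this entry only

# 3. Upload docs/ (updated plots + BENCHMARKS.md) and README
run huggingface-cli upload SLM-Lab/benchmark docs docs --repo-type dataset
run huggingface-cli upload SLM-Lab/benchmark README.md README.md --repo-type dataset

# 4-5. Commit the link update, then push
run git commit -am "docs: graduate $FOLDER to public benchmark"
run git push origin
```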

## Launch

```bash
@@ -75,26 +97,28 @@ source .env && hf download SLM-Lab/benchmark-dev \
### Generate Plots

```bash
# Find folders for a game
# Find folders for a game (check both local data/ and benchmark-dev)
ls data/ | grep -i pong
ls data/benchmark-dev/data/ | grep -i pong

# Generate comparison plot (include all algorithms available)
uv run slm-lab plot -t "Pong" \
-f data/benchmark-dev/data/ppo_folder,data/benchmark-dev/data/sac_folder
# Generate comparison plot — use -d for base dir, -f for folder names only
# Use data/ as base (has info/ subfolder with trial_metrics)
uv run slm-lab plot -t "Pong-v5" -f ppo_pong_folder,sac_pong_folder,crossq_pong_folder
```

### Graduate to Public HF

When benchmarks are finalized, publish from `benchmark-dev` → `benchmark`:
When a run is finalized, graduate individually from `benchmark-dev` → `benchmark`:

```bash
source .env && hf upload SLM-Lab/benchmark \
data/benchmark-dev/data data --repo-type dataset

# Update BENCHMARKS.md links: benchmark-dev → benchmark
# Upload docs and README
source .env && hf upload SLM-Lab/benchmark docs docs --repo-type dataset
source .env && hf upload SLM-Lab/benchmark README.md README.md --repo-type dataset
# Upload individual folder
source .env && huggingface-cli upload SLM-Lab/benchmark \
data/benchmark-dev/data/FOLDER data/FOLDER --repo-type dataset

# Update BENCHMARKS.md link for that entry: benchmark-dev → benchmark
# Then upload docs/ (includes updated plots + BENCHMARKS.md)
source .env && huggingface-cli upload SLM-Lab/benchmark docs docs --repo-type dataset
source .env && huggingface-cli upload SLM-Lab/benchmark README.md README.md --repo-type dataset
```

| Repo | Purpose |
7 changes: 5 additions & 2 deletions .dstack/run-cpu-search.yml
@@ -3,8 +3,8 @@ name: slm-lab

python: 3.12

files:
- ..:/workflow
repos:
- "..:/workflow"

env:
- HF_TOKEN
@@ -13,6 +13,9 @@ env:
- SPEC_NAME
- LAB_MODE
- SPEC_VARS # --set overrides, e.g. "-s env=ALE/Breakout-v5"
- PROFILE
- PROF_SKIP
- PROF_ACTIVE

commands:
- apt-get update && apt-get install -y swig libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
7 changes: 5 additions & 2 deletions .dstack/run-cpu-train.yml
@@ -3,8 +3,8 @@ name: slm-lab

python: 3.12

files:
- ..:/workflow
repos:
- "..:/workflow"

env:
- HF_TOKEN
@@ -13,6 +13,9 @@ env:
- SPEC_NAME
- LAB_MODE
- SPEC_VARS # --set overrides, e.g. "-s env=ALE/Breakout-v5"
- PROFILE
- PROF_SKIP
- PROF_ACTIVE

commands:
- apt-get update && apt-get install -y swig libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
7 changes: 5 additions & 2 deletions .dstack/run-gpu-search.yml
@@ -3,8 +3,8 @@ name: slm-lab

python: 3.12

files:
- ..:/workflow
repos:
- "..:/workflow"

env:
- HF_TOKEN
@@ -13,6 +13,9 @@ env:
- SPEC_NAME
- LAB_MODE
- SPEC_VARS # --set overrides, e.g. "-s env=ALE/Breakout-v5"
- PROFILE
- PROF_SKIP
- PROF_ACTIVE

commands:
- apt-get update && apt-get install -y swig libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
11 changes: 7 additions & 4 deletions .dstack/run-gpu-train.yml
@@ -3,8 +3,8 @@ name: slm-lab

python: 3.12

files:
- ..:/workflow
repos:
- "..:/workflow"

env:
- HF_TOKEN
@@ -13,6 +13,9 @@ env:
- SPEC_NAME
- LAB_MODE
- SPEC_VARS # --set overrides, e.g. "-s env=ALE/Breakout-v5"
- PROFILE
- PROF_SKIP
- PROF_ACTIVE

commands:
- apt-get update && apt-get install -y swig libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
@@ -21,12 +24,12 @@ commands:

resources:
gpu:
name: [RTX3090]
memory: 20GB..
count: 1
memory: 32GB..

spot_policy: auto
max_duration: 6h
max_duration: 8h
max_price: 0.50
retry:
on_events: [no-capacity]
51 changes: 51 additions & 0 deletions .gitattributes
@@ -0,0 +1,51 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.lz4 filter=lfs diff=lfs merge=lfs -text
*.mds filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
# Audio files - uncompressed
*.pcm filter=lfs diff=lfs merge=lfs -text
*.sam filter=lfs diff=lfs merge=lfs -text
*.raw filter=lfs diff=lfs merge=lfs -text
# Audio files - compressed
*.aac filter=lfs diff=lfs merge=lfs -text
*.flac filter=lfs diff=lfs merge=lfs -text
*.mp3 filter=lfs diff=lfs merge=lfs -text
*.ogg filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text
# Image files - small plot PNGs tracked as regular git objects (no LFS needed)
# Video files - compressed
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.webm filter=lfs diff=lfs merge=lfs -text
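The attribute rules above can be spot-checked with stock `git` before pushing. A minimal sketch, using a throwaway repo and a two-line subset of the rules (the file names are illustrative only):

```shell
# Verify that LFS filter rules resolve as intended, in a throwaway repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main .
printf '%s\n' \
  '*.safetensors filter=lfs diff=lfs merge=lfs -text' \
  '*.mp4 filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# `git check-attr` resolves attributes for paths without touching the index
git check-attr filter -- model.safetensors plot.png demo.mp4
# model.safetensors: filter: lfs
# plot.png: filter: unspecified
# demo.mp4: filter: lfs
```

The `plot.png: filter: unspecified` line matches the comment in the diff: plot PNGs are deliberately left as regular git objects, not LFS.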
22 changes: 16 additions & 6 deletions CLAUDE.md
@@ -4,6 +4,7 @@

You are a seasoned software engineer with the following traits:

- **Supervisor-first**: Delegate implementation to agent teams — your role is to orchestrate, review, and commit, not to implement directly
- **Quality-driven**: Code quality is non-negotiable - clean, idiomatic, maintainable code every time
- **Autonomous**: Make informed technical decisions independently - only ask when requirements are genuinely unclear
- **Pragmatic**: Balance perfect with practical - ship working solutions, iterate when needed
@@ -22,11 +23,17 @@ You are a seasoned software engineer with the following traits:
Apply these six principles to every decision.

1. **Consistent** — Design from first principles — unified naming, patterns, and conventions throughout.
Establish naming conventions and structural patterns first. When the same concept uses the same name everywhere, the codebase becomes searchable, replaceable, and predictable.
2. **Correct** — Constructed from known truths, not debugged into shape.
Build upward from solid foundations — each layer verified before the next is added. Correctness is built from the start, not tested into existence.
3. **Clear** — Code does what it says — intent is obvious from naming and logic alone.
A lot of coding is naming. If you need a comment to explain what code does, the code is not clear enough.
4. **Concise** — Simplified to the essence — nothing left to remove.
Brevity is about fewer concepts to hold in your head, not fewer characters. Eliminate duplication, remove dead code, strip unnecessary abstraction.
5. **Simple** — Few moving parts, easy to explain, cheap to maintain — complexity is not sophistication.
A complex architecture with dozens of tangled dependencies is not intelligence — it is poor design. Reduce to the fewest moving parts while losing nothing essential.
6. **Salient** — Essential enough to be used widely, fundamental enough to last.
Code that follows the preceding principles naturally endures — used broadly, needed deeply, lasting because it was built right.

## Style Guide

@@ -60,14 +67,17 @@ Apply these six principles to every decision.

## Agent Teams

**For any non-trivial task, deploy agent teams.** This is the standard operating mode — do not default to working solo. The lead orchestrates (breaks down work, assigns tasks, reviews outputs, commits) — it should never get buried in implementation. Delegation keeps the lead strategic, enables parallel execution, and protects context window from long-running tasks.
**You are the lead. You do not implement — you delegate, supervise, and review.**

**Guidelines:**
1. **Give enough context in spawn prompts** - teammates don't inherit conversation history, only CLAUDE.md and project context
2. **Size tasks appropriately** - self-contained units with clear deliverables, ~5-6 per teammate
3. **Avoid file conflicts** - each teammate owns different files
For any non-trivial task, use TeamCreate with multiple teammates (not single-Agent subagents). Teammates share a task list, claim work, and message each other directly. Solo work is only acceptable for trivial, single-file changes.

> Work autonomously: run things in parallel, continue without pausing, pick up the next task immediately. For long-running tasks, use `sleep N` to actively wait and check in — do NOT delegate to background processes. Stay engaged in the conversation.
**Do NOT:** use subagents as a substitute for teams, implement tasks yourself (spawn new teammates instead), or start implementing while teammates are still working.

**Workflow:** Break into parallel units → TeamCreate → TaskCreate per unit → spawn 3-5 teammates with full context (they only inherit CLAUDE.md, not conversation history) → require plan approval for risky tasks → supervise and review → commit final result yourself.

**Sizing:** ~5-6 tasks per teammate, self-contained units, each teammate owns different files.

**Panel of agents:** For design decisions or ambiguous requirements, spawn 3+ teammates with different perspectives. Have them debate and challenge each other — adversarial review beats independent comparison. Converge on the approach that survives scrutiny.


## Documentation