
fix(wave34): OPTIMIZER env alias + strict optimizer dispatch (no silent AdamW fallback) #135

Open

gHashTag wants to merge 1 commit into main from fix/wave34-optimizer-alias-and-strict-dispatch

Conversation

@gHashTag (Owner)

Wave-34 RCA fix — OPTIMIZER env alias + strict optimizer dispatch

Summary

Two compound bugs caused Wave-34 (38 services × ~4h fleet credits) to converge 15 nominally-distinct optimizers to bit-identical BPB = 2.6814258098602295 on seed=123. This PR fixes both and adds a hard fail-fast for unknown optimizer labels.

Bugs

Bug #1: src/bin/entrypoint.rs:26

let optimizer = env_or("TRIOS_OPTIMIZER", "adamw");

This call does not honor the un-prefixed OPTIMIZER alias. PR #130 (the Wave-33 hotfix) added resolve_env_alias for STEPS/LR/HIDDEN/SEED but missed the optimizer knob. Wave-34 set OPTIMIZER=lion (and 14 other labels) across 38 services, and every one silently defaulted to "adamw".
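For reference, env_or presumably reads only the single prefixed variable; this is an assumed body, not the repo's actual helper:

// Assumed shape of the buggy helper: one variable, one default,
// no knowledge of the un-prefixed OPTIMIZER alias.
fn env_or(key: &str, default: &str) -> String {
    std::env::var(key).unwrap_or_else(|_| default.to_string())
}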

Bug #2: src/bin/trios-train.rs:298

let outcome = match cli.optimizer.as_str() {
    "muon" => train_loop::run_single_muon(&args, false)?,
    "muon-cwd" => train_loop::run_single_muon(&args, true)?,
    _ => train_loop::run_single(&args)?,   // ← silent fallback to AdamW
};

The wildcard arm silently routes any unsupported optimizer label (lion / lamb / soap / tiger / sgdm / prodigy / adafactor / shampoo / yogi / ranger / radam / adabelief / adamax) to AdamW. Even if Bug #1 were fixed, this fallback alone would have produced identical BPB across all of them.

Fix

  1. entrypoint.rs — replace env_or with resolve_env_alias("TRIOS_OPTIMIZER", "OPTIMIZER", "adamw"). Precedence matches the PR #130 contract: TRIOS_OPTIMIZER > OPTIMIZER > default "adamw". Also adds the optimizer source to the [entrypoint-trace] line so operators can grep one keyword to verify the override reached the trainer.

  2. trios-train.rs — an explicit "adamw" => … arm plus other => anyhow::bail!("Unsupported optimizer: …"). Unknown labels now fail fast with a clear message listing the supported set (a combined sketch of both changes follows).
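A minimal sketch of both changes, with assumed bodies: resolve_env_alias landed in PR #130 and may differ in detail, and cli / train_loop / args are the trainer's existing items from the snippet above. The error wording here is approximate.

// entrypoint.rs (sketch) — assumed body for the PR #130 helper:
// prefixed variable wins, then the un-prefixed alias, then the default.
fn resolve_env_alias(prefixed: &str, alias: &str, default: &str) -> String {
    std::env::var(prefixed)
        .or_else(|_| std::env::var(alias))
        .unwrap_or_else(|_| default.to_string())
}

let optimizer = resolve_env_alias("TRIOS_OPTIMIZER", "OPTIMIZER", "adamw");

// trios-train.rs (sketch) — strict dispatch: explicit arm per supported
// label, hard bail for everything else instead of the silent AdamW fallback.
let outcome = match cli.optimizer.as_str() {
    "adamw"    => train_loop::run_single(&args)?,
    "muon"     => train_loop::run_single_muon(&args, false)?,
    "muon-cwd" => train_loop::run_single_muon(&args, true)?,
    other => anyhow::bail!(
        "Unsupported optimizer: {:?} (supported: adamw, muon, muon-cwd)",
        other
    ),
};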

Honest probe — reproduction & verification

Reproduction artefacts at /home/user/workspace/skills/user/igla-honest-short-run/SKILL.md (new user skill, sibling of tri-gardener-runbook v2.3).

| Test | Before fix | After fix |
| --- | --- | --- |
| adamw, 100 steps, seed=123 | BPB=6.4700 | BPB=6.4700 ✅ |
| muon, 100 steps, seed=123 | BPB=6.4409 | BPB=6.4409 ✅ (≠ adamw) |
| lion (fake) | silent → 6.4700 (same as adamw) ❌ | bail!("Unsupported optimizer: \"lion\"…") |
| OPTIMIZER=muon alias | silent → adamw default ❌ | trace: opt=(muon, src=alias) |
| TRIOS_OPTIMIZER=muon-cwd + OPTIMIZER=lion | TRIOS_* wins (already worked) | opt=(muon-cwd, src=TRIOS_*) |

Why this took 38 services × 4h to spot

The Wave-34 dashboard showed 15 distinct optimizer labels; the BPB values printed in trace logs looked distinct because rounding hid the byte-identity; and the canonical-name index pivoted on (canon, seed), making it look as if 15 architectures each had their own BPB. Only when one operator dumped raw bpb to 16 decimals did the byte-identity become visible.
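A toy snippet showing the trap; the constant is the Wave-34 value from the summary, everything else is illustrative:

fn main() {
    // Two nominally different optimizers, same final metric.
    let bpb_adamw: f64 = 2.6814258098602295;
    let bpb_lion: f64 = 2.6814258098602295; // silent AdamW fallback

    // 4-decimal trace output: nothing obviously wrong per service.
    println!("adamw bpb={:.4}  lion bpb={:.4}", bpb_adamw, bpb_lion);

    // 16-decimal dump + bit comparison: byte-identity is undeniable.
    println!("adamw bpb={:.16}  lion bpb={:.16}", bpb_adamw, bpb_lion);
    println!("bit-identical: {}", bpb_adamw.to_bits() == bpb_lion.to_bits());
}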

Operational mitigation

The tri-gardener-runbook skill is bumped to v2.3 with a new mandatory section, "Pre-flight gate — igla-honest-short-run". Three gate criteria must pass before any Wave-N Railway deploy (a sketch of the checks follows the list):

  1. Seed variance real: 4 seeds × adamw → BPB spread ≥ 0.005 (catches dead SEED alias)
  2. Optimizer variance real: seed=123 × {adamw, muon, muon-cwd} → at least one pair differs by ≥ 0.001 BPB (catches dead OPTIMIZER alias)
  3. Fake optimizer trap: seed=123 × {adamw, lion, lamb, soap} — if all four byte-identical, the wildcard fallback is still present → HARD BLOCK
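A minimal sketch of the three numeric checks, assuming the short-run BPB values have already been collected; the thresholds are the ones listed above, and the function names are illustrative:

// Gate 1 — seed variance: 4 adamw seeds must spread by >= 0.005 BPB.
fn seed_variance_ok(adamw_bpbs: &[f64]) -> bool {
    let min = adamw_bpbs.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = adamw_bpbs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    max - min >= 0.005
}

// Gate 2 — optimizer variance: some pair among {adamw, muon, muon-cwd}
// at seed=123 must differ by >= 0.001 BPB.
fn optimizer_variance_ok(real_opt_bpbs: &[f64]) -> bool {
    real_opt_bpbs.iter().enumerate().any(|(i, a)| {
        real_opt_bpbs[i + 1..].iter().any(|b| (a - b).abs() >= 0.001)
    })
}

// Gate 3 — fake-optimizer trap: if adamw, lion, lamb, soap are all
// bit-identical, the wildcard fallback is still present: HARD BLOCK.
fn fake_trap_ok(adamw_and_fakes: &[f64]) -> bool {
    !adamw_and_fakes.windows(2).all(|w| w[0].to_bits() == w[1].to_bits())
}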

Total wall-clock cost: < 5 min on an operator's CPU box, versus the Wave-34 cost of 38 × 4h cloud = 152 service-hours.

RCA artefacts

RCA and probe table: trios#143, comments 4427543239 / 4427583321.

Checklist

  • Build: cargo build --release --bin trios-train --bin entrypoint → green
  • Smoke: fake optimizer fails fast with clear error
  • Smoke: real optimizers produce distinct BPB values
  • Smoke: OPTIMIZER alias resolves through entrypoint
  • Smoke: TRIOS_OPTIMIZER precedence preserved
  • No new dependencies (uses existing anyhow + resolve_env_alias)
  • Backward-compatible: existing TRIOS_OPTIMIZER deployments unaffected

🌻 phi² + phi⁻² = 3 · TRINITY · NEVER STOP · DOI 10.5281/zenodo.19227877

Commit message

Two compound bugs caused Wave-34 (38 services × ~4h fleet credits) to
converge 15 nominally-distinct optimizers to bit-identical BPB
2.6814258098602295 on seed=123:

1. src/bin/entrypoint.rs:26 — `let optimizer = env_or("TRIOS_OPTIMIZER",
   "adamw")` did NOT honor the un-prefixed `OPTIMIZER` alias. PR #130
   added alias resolution for STEPS/LR/HIDDEN/SEED but forgot the
   optimizer knob. Wave-34 set `OPTIMIZER=lion` (etc.) on 38 services
   and every one silently defaulted to "adamw".

2. src/bin/trios-train.rs:298 — the dispatch `match cli.optimizer.as_str()`
   had a wildcard `_ => run_single(...)` arm. Result: even if the alias
   had been honored, any unsupported optimizer label (lion / lamb / soap
   / tiger / sgdm / prodigy / adafactor / shampoo / yogi / ranger /
   radam / adabelief / adamax) would have silently routed to AdamW
   anyway.

This PR:

- Replaces `env_or` with `resolve_env_alias("TRIOS_OPTIMIZER", "OPTIMIZER", "adamw")`
  so `OPTIMIZER=muon` (etc.) reaches `trios-train`. Precedence:
  TRIOS_OPTIMIZER > OPTIMIZER > default "adamw" (matches PR #130 contract).
- Adds the optimizer source to the entrypoint-trace line so operators
  can verify with one grep that the override reached the trainer.
- Replaces the wildcard arm with explicit `"adamw" => …` plus
  `other => anyhow::bail!("Unsupported optimizer: …")`. Unsupported
  labels now FAIL FAST with a clear error citing the supported set.

Reproduced honest probe (4 seeds × adamw + 3 real opts × seed=123 +
4 fake opts × seed=123) at /home/user/workspace/skills/user/
igla-honest-short-run/SKILL.md. RCA + table: trios#143
comment 4427543239 / 4427583321. Pre-flight gate now required before
any Wave-N deploy: skill 'tri-gardener-runbook' v2.3 §'Pre-flight gate'.

Verified locally:
- adamw 100 steps seed=123 → BPB=6.4700
- muon  100 steps seed=123 → BPB=6.4409 (≠ adamw, real)
- lion  100 steps seed=123 → bail!("Unsupported optimizer: \"lion\"…")
- OPTIMIZER=muon (alias)        → trace: opt=(muon, src=alias)
- TRIOS_OPTIMIZER=muon-cwd + OPTIMIZER=lion → opt=(muon-cwd, src=TRIOS_*)

Anchor: phi^2 + phi^-2 = 3 · TRINITY · NEVER STOP · DOI 10.5281/zenodo.19227877
