2 changes: 2 additions & 0 deletions MODELS.md
@@ -27,6 +27,7 @@ Note: Keep the table columns padded with spaces and right-justify numeric cells
| Qwen/Qwen3-0.6B | n300 | functional | 99% | 100% | 943ms | 2.0 | 40960 |
| Qwen/Qwen3-0.6B | t3000 | functional | 98% | 100% | 229ms | 6.2 | 40960 |
| Qwen/Qwen3-30B-A3B | n150 | functional | 94% | 100% | 100081ms | 0.4 | 40960 |
| Qwen/Qwen3.5-35B-A3B | n150 | functional | 97% | 100% | 5403ms | 2.5 | 4096 |
| google/gemma-3-4b-it | n150 | functional | 92% | 100% | 98ms | 13.9 | 40960 |
| google/gemma-3-4b-it | n300 | functional | 94% | 100% | 535ms | 3.2 | 40960 |
| google/gemma-3-4b-it | t3000 | functional | 92% | 100% | 330ms | 4.7 | 40960 |
@@ -55,6 +56,7 @@ Note: Keep the table columns padded with spaces and right-justify numeric cells
| Qwen/Qwen3-0.6B | n300 | optimized | 99% | 100% | 54ms | 55.3 | 40960 |
| Qwen/Qwen3-0.6B | t3000 | optimized | 98% | 100% | 59ms | 61.9 | 40960 |
| Qwen/Qwen3-30B-A3B | n150 | optimized | 96% | 100% | 2197ms | 4.8 | 40960 |
| Qwen/Qwen3.5-35B-A3B | n150 | optimized | 96% | 100% | 5393ms | 4.0 | 4096 |
| google/gemma-3-4b-it | n150 | optimized | 92% | 100% | 70ms | 14.5 | 40960 |
| google/gemma-3-4b-it | n300 | optimized | 94% | 100% | 68ms | 18.5 | 40960 |
| google/gemma-3-4b-it | t3000 | optimized | 91% | 100% | 78ms | 19.4 | 40960 |
40 changes: 40 additions & 0 deletions models/Qwen/Qwen3.5-35B-A3B/n150/MODEL_BRINGUP.md
@@ -0,0 +1,40 @@
# MODEL_BRINGUP.md — models/Qwen/Qwen3.5-35B-A3B/n150

## Overview
Optimization pass for `models/Qwen/Qwen3.5-35B-A3B/n150` using `ttnn-model-optimization`.

Retained changes:
1. Decode trace is enabled by default (`QWEN35_USE_DECODE_TRACE=1`) and used in the optimized flow.
2. Trace capture targets the decode head (`hidden -> lm_head`) so capture avoids host-MoE writes.
3. Prefill-only on-device argmax (`next_token_device`) is kept to reduce TTFT.
4. Decode-only MoE route cap is kept at `decode_top_k=6` (env override: `QWEN35_DECODE_TOP_K`).
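The two env overrides above can be resolved with a small config reader. This is a hypothetical helper for illustration (the model code may wire these variables differently); only the variable names `QWEN35_USE_DECODE_TRACE` and `QWEN35_DECODE_TOP_K` and their defaults come from this document:

```python
import os

def read_decode_config(env=os.environ):
    """Resolve decode-path settings from environment overrides.

    Defaults mirror the retained changes: decode trace on by default,
    decode-only MoE route cap of 6 experts.
    """
    use_trace = env.get("QWEN35_USE_DECODE_TRACE", "1") != "0"
    decode_top_k = int(env.get("QWEN35_DECODE_TOP_K", "6"))
    return {"use_decode_trace": use_trace, "decode_top_k": decode_top_k}
```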

## Baseline vs Final

| Metric | Baseline (functional) | Final (optimized) | Delta |
|---|---:|---:|---:|
| Top-1 (100-token eval) | 97.00% | 96.00% | -1.00 pt |
| Top-5 (100-token eval) | 100.00% | 100.00% | 0.00 pt |
| TTFT | 5403 ms | 5393 ms | -10 ms |
| Decode throughput | 2.46 t/s/u | 4.04 t/s/u | +1.57 t/s/u (+63.8%) |
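The deltas in the table can be recomputed directly from the `YT_METRICS` values logged in `demo.log` (a quick sanity check, not part of the bringup flow):

```python
import json

# Metric fragments copied from the baseline and optimized YT_METRICS lines in demo.log.
baseline = json.loads('{"ttft_ms": 5403.092756867409, "decode_tps_u": 2.464734602922183}')
final = json.loads('{"ttft_ms": 5393.270309781656, "decode_tps_u": 4.037444069807221}')

ttft_delta_ms = final["ttft_ms"] - baseline["ttft_ms"]        # about -10 ms
tps_delta = final["decode_tps_u"] - baseline["decode_tps_u"]  # about +1.57 t/s/u
tps_pct = 100.0 * tps_delta / baseline["decode_tps_u"]        # about +63.8%
```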

## Decode Trace Status
- Optimized default path uses decode trace (`QWEN35_USE_DECODE_TRACE` defaults to on).
- Successful traced decode evidence from `demo.log`:
- `decode_trace: captured lm_head trace`
- `decode_trace: executing captured lm_head trace`
- Final measured run: `ttft_ms=5393.270309781656`, `decode_tps_u=4.037444069807221`.
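The two `decode_trace` log lines above reflect a capture-once, replay-thereafter pattern. A pure-Python stand-in for that flow (on device the replayed object is a captured command stream, not a callable):

```python
class DecodeHeadTrace:
    """Run the hidden -> lm_head computation once, replay it on later steps.

    First call records ("captures") the work; subsequent calls replay it.
    This mirrors the log lines in demo.log, not the actual device trace API.
    """
    def __init__(self, lm_head_fn):
        self.lm_head_fn = lm_head_fn
        self.captured = False

    def __call__(self, hidden):
        if not self.captured:
            print("decode_trace: captured lm_head trace")
            self.captured = True
        else:
            print("decode_trace: executing captured lm_head trace")
        return self.lm_head_fn(hidden)
```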

## Optimization Decisions
1. Kept decode-head trace capture/execute.
- Why: full decode trace is blocked by host MoE writes during capture; decode-head trace captures cleanly and executes every decode step after capture.
2. Kept decode route cap at 6.
- Why: improves decode throughput while preserving acceptable eval quality.
3. Kept prefill-only device argmax.
- Why: avoids full-vocab host transfer in prefill token selection path.
4. Rejected full decode trace capture.
   - Why: the runtime raises `TT_FATAL: Writes are not supported during trace capture` when host writes occur inside the capture region.
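Decision 2's route cap amounts to truncating the router's expert selection during decode only. A minimal sketch of the idea in plain Python (the real MoE routing runs on device; `model_top_k=8` is an assumed full route width, not stated in this document):

```python
import heapq

def route_experts(router_logits, model_top_k=8, decode_top_k=6, is_decode=True):
    """Pick expert indices for one token.

    Prefill keeps the model's full top-k; decode caps the route at
    decode_top_k (6 here) to cut per-step MoE work.
    """
    k = min(decode_top_k, model_top_k) if is_decode else model_top_k
    # Indices of the k largest router logits.
    best = heapq.nlargest(k, range(len(router_logits)), key=lambda i: router_logits[i])
    return sorted(best)
```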

## Commands Used
Demo command is logged in `demo.log`.
Eval command is logged in `eval.log`.
35 changes: 35 additions & 0 deletions models/Qwen/Qwen3.5-35B-A3B/n150/demo.log
@@ -0,0 +1,35 @@
# demo.log — models/Qwen/Qwen3.5-35B-A3B/n150

## Baseline (functional)
Command:
PYTHONPATH=/tmp/transformers520_custom:$PYTHONPATH TTNN_TRANSFORMERS_PYTHONPATH=/tmp/transformers520_custom HF_HOME=/localdev/moconnor/hf-cache HF_HUB_DISABLE_PROGRESS_BARS=1 TT_METAL_CACHE=/tmp/tt-metal-cache TT_METAL_RUNTIME_ROOT=/proj_sw/user_dev/moconnor/tt-metal python -u models/Qwen/Qwen3.5-35B-A3B/n150/run_demo_bf16.py models/Qwen/Qwen3.5-35B-A3B/n150/model.py --max-new-tokens 128 --max_seq_len 4096 --temperature 0 --seed 0

Output (key lines):
TTFT: 5403 ms | Decode: 2.5 t/s/u (127 tokens)
YT_METRICS={"mode": "tt_demo", "model": "Qwen/Qwen3.5-35B-A3B", "system": "n150", "mesh_shape": [1, 1], "prompt_tokens": 54, "generated_tokens": 128, "ttft_ms": 5403.092756867409, "decode_tps_u": 2.464734602922183, "decode_tokens": 127, "max_seq_len": 4096}

## Optimized (trace-enabled default)
Command:
PYTHONPATH=/tmp/transformers520_custom:$PYTHONPATH TTNN_TRANSFORMERS_PYTHONPATH=/tmp/transformers520_custom HF_HOME=/localdev/moconnor/hf-cache HF_HUB_DISABLE_PROGRESS_BARS=1 TT_METAL_CACHE=/tmp/tt-metal-cache TT_METAL_RUNTIME_ROOT=/proj_sw/user_dev/moconnor/tt-metal python -u models/Qwen/Qwen3.5-35B-A3B/n150/run_demo_bf16.py models/Qwen/Qwen3.5-35B-A3B/n150/model.py --max-new-tokens 128 --max_seq_len 4096 --temperature 0 --seed 0 --output-format yt_metrics

Output (key lines):
decode_trace: captured lm_head trace
2026-02-26 00:40:38.881 | warning | Metal | Allocating device buffers is unsafe due to the existence of an active trace. These buffers may be corrupted once a trace is executed. (allocator.cpp:105)
decode_trace: executing captured lm_head trace
YT_METRICS={"mode": "tt_demo", "model": "Qwen/Qwen3.5-35B-A3B", "system": "n150", "mesh_shape": [1, 1], "prompt_tokens": 54, "generated_tokens": 128, "ttft_ms": 5393.270309781656, "decode_tps_u": 4.037444069807221, "decode_tokens": 126, "max_seq_len": 4096}

## Coherence Evidence (optimized, trace-enabled)
Command:
PYTHONPATH=/tmp/transformers520_custom:$PYTHONPATH TTNN_TRANSFORMERS_PYTHONPATH=/tmp/transformers520_custom HF_HOME=/localdev/moconnor/hf-cache HF_HUB_DISABLE_PROGRESS_BARS=1 TT_METAL_CACHE=/tmp/tt-metal-cache TT_METAL_RUNTIME_ROOT=/proj_sw/user_dev/moconnor/tt-metal python -u models/Qwen/Qwen3.5-35B-A3B/n150/run_demo_bf16.py models/Qwen/Qwen3.5-35B-A3B/n150/model.py --max-new-tokens 64 --max_seq_len 4096 --temperature 0 --seed 0

Output (excerpt):
TT demo (n150)
Model: Qwen/Qwen3.5-35B-A3B
Mesh shape: 1x1
Prompt tokens: 54 | Generated tokens: 64
TTFT: 5382 ms | Decode: 3.9 t/s/u (62 tokens)

Output:
the future is not a straight line but a spiral, and that we are all just notes in a song we haven’t finished composing.
Journal entry, 1962: The moon is a silent promise. Tonight, the stars seem closer, as if they’re waiting for us to catch up. I sk
YT_METRICS={"mode": "tt_demo", "model": "Qwen/Qwen3.5-35B-A3B", "system": "n150", "mesh_shape": [1, 1], "prompt_tokens": 54, "generated_tokens": 64, "ttft_ms": 5382.3705250397325, "decode_tps_u": 3.8546857605168765, "decode_tokens": 62, "max_seq_len": 4096}
19 changes: 19 additions & 0 deletions models/Qwen/Qwen3.5-35B-A3B/n150/eval.log
@@ -0,0 +1,19 @@
# eval.log — models/Qwen/Qwen3.5-35B-A3B/n150

## Baseline (functional)
Command:
PYTHONPATH=/tmp/transformers520_custom:$PYTHONPATH TTNN_TRANSFORMERS_PYTHONPATH=/tmp/transformers520_custom HF_HOME=/localdev/moconnor/hf-cache HF_HUB_DISABLE_PROGRESS_BARS=1 TT_METAL_CACHE=/tmp/tt-metal-cache TT_METAL_RUNTIME_ROOT=/proj_sw/user_dev/moconnor/tt-metal python -u models/Qwen/Qwen3.5-35B-A3B/n150/run_eval_bf16.py models/Qwen/Qwen3.5-35B-A3B/n150/model.py --model Qwen/Qwen3.5-35B-A3B --prompt_file prompts/bringup_eval_long.txt --max_new_tokens 100 --max_seq_len 4096

Output (key lines):
Top-1 accuracy: 97.00% (0.9700)
Top-5 accuracy: 100.00% (1.0000)
YT_METRICS={"mode": "tt_eval", "model": "Qwen/Qwen3.5-35B-A3B", "top1": 0.97, "top5": 1.0, "top1_pct": 97.0, "top5_pct": 100.0, "total_tokens": 100, "max_new_tokens": 100, "max_seq_len": 4096}

## Optimized
Command:
PYTHONPATH=/tmp/transformers520_custom:$PYTHONPATH TTNN_TRANSFORMERS_PYTHONPATH=/tmp/transformers520_custom HF_HOME=/localdev/moconnor/hf-cache HF_HUB_DISABLE_PROGRESS_BARS=1 TT_METAL_CACHE=/tmp/tt-metal-cache TT_METAL_RUNTIME_ROOT=/proj_sw/user_dev/moconnor/tt-metal python -u models/Qwen/Qwen3.5-35B-A3B/n150/run_eval_bf16.py models/Qwen/Qwen3.5-35B-A3B/n150/model.py --model Qwen/Qwen3.5-35B-A3B --prompt_file prompts/bringup_eval_long.txt --max_new_tokens 100 --max_seq_len 4096

Output (key lines):
Top-1 accuracy: 96.00% (0.9600)
Top-5 accuracy: 100.00% (1.0000)
YT_METRICS={"mode": "tt_eval", "model": "Qwen/Qwen3.5-35B-A3B", "top1": 0.96, "top5": 1.0, "top1_pct": 96.0, "top5_pct": 100.0, "total_tokens": 100, "max_new_tokens": 100, "max_seq_len": 4096}