Fix portability: runtime paths, missing setup scripts, vocab format bug #1
Open
msitarzewski wants to merge 2 commits into danveloper:main from
Conversation
…ripts, fix vocab format

- Replace hardcoded /Users/danielwoods paths with runtime $HOME resolution that auto-discovers the HuggingFace snapshot directory
- Add generate_expert_index.py: scans safetensors headers to produce expert_index.json, which repack_experts.py requires but had no generator
- Add export_vocab.py: generates vocab.bin in the simple decode format that infer.m's load_vocab() expects. The existing export_tokenizer.py produces a BPE format (magic "BPET") which load_vocab() misreads as num_entries=1.1B, causing an OOM kill on startup
- Skip mmap for 120GB expert layer files to avoid OOM kills on systems with memory pressure; pread() fallback works fine

Tested on M5 Max (128GB) — 15.66 tok/s with 2-bit experts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
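For context on what "scans safetensors headers" means here: every .safetensors shard begins with an 8-byte little-endian header length, followed by a JSON header that maps tensor names to dtype, shape, and data offsets. The sketch below is only a rough illustration of that scan, assuming a hypothetical expert_index.json schema and a hypothetical "expert" tensor-name filter; it is not the code from generate_expert_index.py itself.

```python
# Illustrative sketch only; the real script is generate_expert_index.py in this PR.
# The expert_index.json schema and the tensor-name filter below are assumptions.
import glob
import json
import os
import struct
import sys

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file (tensor name -> dtype/shape/offsets)."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # first 8 bytes: header size, LE u64
        return json.loads(f.read(header_len))

def build_index(model_dir):
    index = {}
    for shard in sorted(glob.glob(os.path.join(model_dir, "*.safetensors"))):
        header = read_safetensors_header(shard)
        for name, meta in header.items():
            if name == "__metadata__" or "expert" not in name:
                continue  # keep only expert tensors (assumed naming convention)
            index[name] = {
                "file": os.path.basename(shard),
                "dtype": meta["dtype"],
                "shape": meta["shape"],
                "data_offsets": meta["data_offsets"],
            }
    return index

if __name__ == "__main__":
    idx = build_index(sys.argv[1])
    with open("expert_index.json", "w") as f:
        json.dump(idx, f, indent=2)
    print(f"wrote {len(idx)} entries")
```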
just demo'd using this branch with some minor changes, thanks
metal_infer/export_vocab.py
Outdated
    if b:
        f.write(b)

    import os
Addresses review feedback from @0xClandestine — the import was used (for os.path.getsize) but placed inline at the bottom of main().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
crnzkhjcm reviewed Mar 23, 2026
    #define MODEL_PATH_DEFAULT "/Users/danielwoods/.cache/huggingface/hub/models--mlx-community--Qwen3.5-397B-A17B-4bit/snapshots/39159bd8aa74f5c8446d2b2dc584f62bb51cb0d3"
    // MODEL_PATH_DEFAULT is resolved at runtime via get_default_model_path() below
    #define MODEL_PATH_DEFAULT NULL
We can remove this, since it's not used anywhere.
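For context, the runtime resolution mentioned in the diff above amounts to expanding the user's home directory and picking a snapshot under the HuggingFace cache. Below is a rough Python rendering of that idea; the actual logic is get_default_model_path() in infer.m (written in C/Objective-C), and the HF_HOME fallback and "newest snapshot wins" choice are assumptions, not taken from this PR.

```python
# Rough sketch of the discovery idea, not the actual implementation in infer.m.
# The repo id comes from the path that used to be hardcoded; HF_HOME handling is an assumption.
import glob
import os

def find_default_model_path(repo="models--mlx-community--Qwen3.5-397B-A17B-4bit"):
    hf_home = os.environ.get(
        "HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    )
    snapshots = glob.glob(os.path.join(hf_home, "hub", repo, "snapshots", "*"))
    if not snapshots:
        return None  # caller must pass --model explicitly
    # If several snapshots exist, pick the most recently modified one.
    return max(snapshots, key=os.path.getmtime)
```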
Summary
- Replace hardcoded /Users/danielwoods/... paths with $HOME-based auto-discovery of the HuggingFace snapshot directory. Works for any user on any machine.
- generate_expert_index.py — the missing script that produces expert_index.json from safetensors headers. repack_experts.py requires this file but the repo had no way to generate it.
- export_vocab.py — generates vocab.bin in the simple decode format that infer.m's load_vocab() expects. The existing export_tokenizer.py produces a BPE tokenizer format (magic bytes "BPET") which load_vocab() misreads as num_entries=1,112,888,660, causing an instant OOM kill on startup. (A hedged sketch of this kind of format follows after this list.)
- Skip mmap for the 120GB expert layer files to avoid OOM kills on systems under memory pressure; the loader falls back to pread() when mmap is unavailable, which is the primary I/O path anyway.
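The description only implies that load_vocab() reads an entry count from the start of the file (which is why the "BPET" magic bytes get misread as an enormous num_entries). The sketch below shows a minimal "count, then length-prefixed token bytes" layout; the actual record format written by export_vocab.py is an assumption here, as is the tokenizer.json field layout it reads from.

```python
# Hedged sketch of a "simple decode format" writer: u32 entry count, then for each
# token a u32 byte length followed by the raw UTF-8 bytes. The real layout used by
# export_vocab.py / load_vocab() may differ; only "count comes first" is implied by the PR.
import json
import struct
import sys

def write_simple_vocab(tokenizer_json, out_path="vocab.bin"):
    with open(tokenizer_json, "r", encoding="utf-8") as f:
        vocab = json.load(f)["model"]["vocab"]  # BPE tokenizer.json: token string -> id
    # Order entries by id so the binary file can be indexed by token id directly.
    by_id = sorted(vocab.items(), key=lambda kv: kv[1])
    with open(out_path, "wb") as out:
        out.write(struct.pack("<I", len(by_id)))   # num_entries
        for token, _id in by_id:
            # Byte-level BPE vocabularies typically encode spaces etc. as special
            # characters (e.g. "Ġ"); a real exporter would map those back to raw
            # bytes for faithful decoding. Skipped here for brevity.
            b = token.encode("utf-8")
            out.write(struct.pack("<I", len(b)))   # byte length of this entry
            out.write(b)                           # raw token bytes

if __name__ == "__main__":
    write_simple_vocab(sys.argv[1])
```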
Testing

Tested end-to-end on Apple M5 Max (128GB RAM, 8TB SSD):
- generate_expert_index.py correctly produced 540 entries (60 layers × 9 components)
- export_vocab.py produced 248,077 vocab entries
- ./infer --prompt "hi" --tokens 10 --2bit → 15.66 tok/s (vs 5.55 on the original M3 Max)
- ./infer --serve 6601 --2bit + ./chat --port 6601 → interactive chat working

Test plan
- make builds without errors
- generate_expert_index.py --model <path> produces valid expert_index.json
- export_vocab.py <tokenizer.json> produces working vocab.bin
- ./infer --prompt "test" --tokens 10 --2bit generates coherent output
- Server mode (--serve) + chat client work end-to-end

🤖 Generated with Claude Code