feat: runtime model config from HuggingFace config.json #3
Open
Alexintosh wants to merge 7 commits into danveloper:main from
Conversation
Spec for replacing ~40 hardcoded #define model constants with a runtime ModelConfig struct populated from HuggingFace config.json, enabling model switching via --model flag without recompilation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add missing arrays (g_lz4_index, g_pred_experts, g_pred_count, stack VLAs), full_attn_interval fallback, thread safety invariant, MODEL_PATH_DEFAULT handling, MAX_BATCH_SLOTS coupling note, and clarify chat.m needs zero changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds ModelConfig struct, compute_expert_offsets(), and load_model_config() that parses HuggingFace config.json + tokenizer.json via NSJSONSerialization. Old #defines still present. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove ~54 model-specific #define constants and replace ~960 occurrences with cfg.* runtime struct fields. Convert 13 static/stack arrays to dynamic allocation. Parse config.json + tokenizer.json at startup via NSJSONSerialization. Expert byte offsets computed from model dimensions and quantization params. Switching models now requires only --model flag, no recompilation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
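The expert-offset computation mentioned in this commit can be sketched as follows. This is an illustrative model, not the PR's actual code: it assumes MLX-style 4-bit affine quantization where each quantized matrix stores packed integer weights plus one fp16 scale and one fp16 bias per quantization group, and a standard MoE expert with gate/up/down projections. Function names, exact layout, and parameter order are assumptions.

```python
# Hypothetical sketch of deriving per-expert byte offsets from model
# dimensions + quantization params (bits, group_size), as compute_expert_offsets()
# in infer.m is described as doing. Layout assumptions: packed integer weights
# followed by fp16 scales and fp16 biases, one of each per group.
def quantized_matrix_bytes(rows: int, cols: int, bits: int = 4, group_size: int = 64) -> int:
    packed = rows * cols * bits // 8          # packed low-bit weights
    groups = rows * (cols // group_size)      # one (scale, bias) pair per group
    return packed + 2 * groups * 2            # scales + biases, 2 bytes (fp16) each

def expert_stride_bytes(hidden: int, intermediate: int, bits: int = 4, group_size: int = 64) -> int:
    # gate_proj and up_proj have shape (intermediate, hidden);
    # down_proj has shape (hidden, intermediate).
    return (2 * quantized_matrix_bytes(intermediate, hidden, bits, group_size)
            + quantized_matrix_bytes(hidden, intermediate, bits, group_size))
```

With a stride like this, the offset of expert *k* in a layer's weight blob is simply `k * expert_stride_bytes(...)`, which is why the offsets can be computed at startup instead of baked in as constants.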
Generalize file header comment to describe multi-model support. Update startup banner from hardcoded model name to "Flash-MoE" with dynamic config path display. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e models

Lists local HF-cached models with compatibility check, searches HuggingFace for compatible Qwen3.5 MoE models (35B-A3B, 122B-A10B, 397B-A17B) with MLX quantization, and supports downloading via huggingface-cli or huggingface_hub.

Usage:
python model_manager.py                # list local + remote
python model_manager.py --local        # local only
python model_manager.py --search       # remote only
python model_manager.py --download <repo>
python model_manager.py --check <path>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
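A compatibility check of the kind `--check <path>` performs could look like the sketch below. This is a hypothetical reconstruction, not the script's actual logic: the field names follow standard HuggingFace `config.json` conventions, but which fields and values the real script accepts is an assumption.

```python
import json
import pathlib

# Hypothetical sketch of a model-directory compatibility check: a directory is
# "compatible" if its config.json exists, declares MoE expert counts, and
# carries quantization metadata. Accepted field names are assumptions based on
# common HuggingFace / MLX config.json conventions.
def check_model(path: str) -> bool:
    cfg_file = pathlib.Path(path) / "config.json"
    if not cfg_file.is_file():
        return False
    cfg = json.loads(cfg_file.read_text())
    has_moe = "num_experts" in cfg or "num_local_experts" in cfg
    has_quant = "quantization" in cfg or "quantization_config" in cfg
    return has_moe and has_quant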
Add compatible models table, model manager usage instructions, updated quick start with --model flag and FLASH_MOE_MODEL env var, revised project structure, and generalized architecture description. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Replaces ~40 hardcoded `#define` model constants with a runtime `ModelConfig` struct populated from HuggingFace `config.json` at startup via NSJSONSerialization. Switch between Qwen3.5 models (35B, 122B, 397B) with just `--model <path>`; no recompilation needed.
- Adds a `model_manager.py` utility to list local compatible models, search HuggingFace for MLX-quantized Qwen3.5 MoE models, download them, and validate compatibility.
- Documents `--model` flag usage and the `FLASH_MOE_MODEL` env var.

What changed in `infer.m`

- `ModelConfig` struct + `load_model_config()` parse `config.json` (architecture, quantization, layer types, RoPE, EOS tokens) and `tokenizer.json` (think tokens)
- `compute_expert_offsets()` derives all expert byte offsets from dimensions + quantization params
- `alloc_tracking_arrays()` dynamically allocates all tracking arrays (expert freq, cache state, predictions, layer cache) previously sized by compile-time constants
- `#define` references replaced with `cfg.*` fields via helper macros (`FREQ()`, `CACHE_SEEN()`, `PRED_EXPERT()`, etc.)
- `MetalCtx` buffer arrays converted from fixed-size to dynamically allocated (`__strong` ARC pointers)

Test plan

- `cd metal_infer && make`
- `./infer --model ~/.cache/huggingface/hub/models--mlx-community--Qwen3.5-35B-A3B-4bit --prompt "What is 2+2?" --tokens 20`
- `python model_manager.py --local` to list cached models
- `python model_manager.py --search` to find remote models
- `FLASH_MOE_MODEL` env var as default model path

🤖 Generated with Claude Code
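The model-path precedence implied above (`--model` flag, then the `FLASH_MOE_MODEL` env var, then a built-in default) can be sketched like this. The function name and the `"MODEL_PATH_DEFAULT"` placeholder are illustrative, not the project's actual identifiers.

```python
import os

# Hypothetical sketch of the model-path resolution order described in this PR:
# explicit --model flag wins, then the FLASH_MOE_MODEL environment variable,
# then a compiled-in default. "MODEL_PATH_DEFAULT" is a placeholder value.
def resolve_model_path(cli_model, default="MODEL_PATH_DEFAULT"):
    if cli_model:
        return cli_model
    return os.environ.get("FLASH_MOE_MODEL", default)
```

This keeps the env var a convenience default only; an explicit flag always overrides it, matching the usage shown in the test plan.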