[Bug]: Jetsam SIGKILL (OOM) on M2 Max 32GB when loading 2-bit Qwen3.5-397B (Wired Memory Limit)

### Describe the bug
I am consistently getting a `zsh: killed` (SIGKILL by macOS Jetsam) when trying to run the `Qwen3.5-397B-A17B` model, even after successfully repacking the experts to 2-bit. 

It seems that on the M2 architecture (which lacks hardware Dynamic Caching present in M3/M4), the strict `wired_memory` limits for the Metal GPU buffers are being hit, causing Jetsam to instantly kill the process to prevent a kernel panic.

### Hardware & OS
* **Device:** Mac Studio (Apple M2 Max)
* **Unified Memory (RAM):** 32 GB
* **OS:** macOS

### Steps to reproduce
1. Downloaded `mlx-community/Qwen3.5-397B-A17B-4bit`.
2. Repacked experts to 2-bit using `python repack_experts_2bit.py` (Completed successfully).
3. Ran the inference engine: `./infer --serve 8000 --2bit --k 1`

### Debugging & Logs
I wrote a background script to monitor the exact RSS memory footprint just before the crash:

```bash
./infer --serve 8000 --2bit --k 1 &
PID=$!
while kill -0 $PID 2>/dev/null; do
    ps -o rss= -p $PID | awk '{print "Memory: " $1/1024 " MB"}'
    sleep 0.1
done

Output:

[metal] Device: Apple M2 Max
[metal] Shader compile: 1 ms
[metal] GPU attention buffers: 15 KV caches (16.8 MB each), scores buf 134.2 MB
[metal] Delta-net GPU buffers: 45 layers (195.4 MB state + 0.2 MB scratch)
[metal] Inference pipelines ready (multi-expert[8] + shared buffers allocated)
=== Qwen3.5-397B-A17B Metal Inference Engine ===
...
Quant:    2-bit experts (3932160 bytes each)
...
[weights] mmap'd 5.52 GB from model_weights.bin
[metal] Weight file wrapped as Metal buffer (5.52 GB)
...
Memory: 22834.4 MB
Memory: 23249.4 MB
Memory: 23535.3 MB
Memory: 23746.4 MB
Memory: 0 MB
[1]  + killed     ./infer --serve 8000 --2bit --k 1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Jetsam SIGKILL (OOM) on M2 Max 32GB when loading 2-bit Qwen3.5-397B (Wired Memory Limit) #2

Describe the bug

Hardware & OS

Steps to reproduce

Debugging & Logs

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Bug]: Jetsam SIGKILL (OOM) on M2 Max 32GB when loading 2-bit Qwen3.5-397B (Wired Memory Limit) #2

Description

Describe the bug

Hardware & OS

Steps to reproduce

Debugging & Logs

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions