
[Bug]: Jetsam SIGKILL (OOM) on M2 Max 32GB when loading 2-bit Qwen3.5-397B (Wired Memory Limit) #2

@wojtas999

Describe the bug

I consistently get "zsh: killed" (SIGKILL from macOS Jetsam) when running the Qwen3.5-397B-A17B model, even after successfully repacking the experts to 2-bit.

My guess is that on the M2 architecture, which lacks the hardware Dynamic Caching found in M3/M4 GPUs, the strict wired-memory limit for Metal GPU buffers is being hit, and Jetsam kills the process immediately (presumably to prevent a kernel panic).
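To see how close the process gets to these limits, here is a minimal sketch. It reads the physical RAM size and the GPU wired-memory limit on macOS and expresses an RSS sample as a share of RAM. It assumes the iogpu.wired_limit_mb sysctl, which as far as I know exists on macOS 14 (Sonoma) and later and reports 0 when the system default is in effect. The helper name rss_percent_of_ram is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: express an RSS value (in KB, as printed by `ps -o rss=`)
# as a percentage of physical RAM (in bytes, as printed by `sysctl -n hw.memsize`).
rss_percent_of_ram() {
    awk -v rss_kb="$1" -v mem_b="$2" \
        'BEGIN { printf "%.1f%%\n", rss_kb * 1024 * 100 / mem_b }'
}

# Only query sysctl on macOS; these keys do not exist on other systems.
if [ "$(uname -s)" = "Darwin" ]; then
    mem_bytes=$(sysctl -n hw.memsize)
    echo "Physical RAM: $((mem_bytes / 1024 / 1024)) MB"
    # Assumption: iogpu.wired_limit_mb is available (macOS 14+); 0 = system default.
    limit=$(sysctl -n iogpu.wired_limit_mb 2>/dev/null || echo "unavailable")
    echo "GPU wired limit (MB): $limit"
    # Last non-zero sample in this report's monitoring log: 23746.4 MB (about 24316314 KB).
    echo "Last RSS sample: $(rss_percent_of_ram 24316314 "$mem_bytes") of RAM"
fi
```

On a 32 GB machine, the last sample in the log works out to roughly 72.5% of physical RAM.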

Hardware & OS

  • Device: Mac Studio (Apple M2 Max)
  • Unified Memory (RAM): 32 GB
  • OS: macOS

Steps to reproduce

  1. Downloaded mlx-community/Qwen3.5-397B-A17B-4bit.
  2. Repacked experts to 2-bit with python repack_experts_2bit.py (completed successfully).
  3. Ran the inference engine: ./infer --serve 8000 --2bit --k 1

Debugging & Logs

I used a small background loop to sample the process's RSS every 0.1 s up until the crash:

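# Start the server in the background and remember its PID
./infer --serve 8000 --2bit --k 1 &
PID=$!
# Sample the resident set size (ps reports KB) every 0.1 s while the process is alive
while kill -0 "$PID" 2>/dev/null; do
    ps -o rss= -p "$PID" | awk '{print "Memory: " $1/1024 " MB"}'
    sleep 0.1
done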
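A variant of the loop above could also record the peak RSS and confirm how the process ended, without relying on the zsh message. This is a sketch. The function name monitor_rss is hypothetical. It relies on the standard shell convention that a child killed by a signal reports an exit status of 128 plus the signal number, so SIGKILL (9) shows up as 137.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background, sample its RSS every
# 0.1 s with ps, then report the peak RSS and how the process terminated.
monitor_rss() {
    "$@" &
    pid=$!
    peak_kb=0
    while kill -0 "$pid" 2>/dev/null; do
        # ps may print nothing once the process is gone; treat that as 0.
        rss_kb=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
        rss_kb=${rss_kb:-0}
        if [ "$rss_kb" -gt "$peak_kb" ]; then
            peak_kb=$rss_kb
        fi
        sleep 0.1
    done
    wait "$pid"
    status=$?
    echo "Peak RSS: $((peak_kb / 1024)) MB"
    # Death by signal is reported as 128 + signal number (SIGKILL -> 137).
    if [ "$status" -gt 128 ]; then
        echo "Killed by signal $((status - 128))"
    else
        echo "Exited with status $status"
    fi
    return "$status"
}
```

For this report, the invocation would be monitor_rss ./infer --serve 8000 --2bit --k 1. A final line reading "Killed by signal 9" confirms the SIGKILL.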

Output:

[metal] Device: Apple M2 Max
[metal] Shader compile: 1 ms
[metal] GPU attention buffers: 15 KV caches (16.8 MB each), scores buf 134.2 MB
[metal] Delta-net GPU buffers: 45 layers (195.4 MB state + 0.2 MB scratch)
[metal] Inference pipelines ready (multi-expert[8] + shared buffers allocated)
=== Qwen3.5-397B-A17B Metal Inference Engine ===
...
Quant:    2-bit experts (3932160 bytes each)
...
[weights] mmap'd 5.52 GB from model_weights.bin
[metal] Weight file wrapped as Metal buffer (5.52 GB)
...
Memory: 22834.4 MB
Memory: 23249.4 MB
Memory: 23535.3 MB
Memory: 23746.4 MB
Memory: 0 MB
[1]  + killed     ./infer --serve 8000 --2bit --k 1
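For a sense of scale, this sketch adds up the allocations the engine prints in the log lines shown. It treats 1 GB as 1024 MB, which is an assumption because the engine may use 1000 MB. On that basis the itemized buffers total about 6,234 MB, while the last RSS sample was 23,746.4 MB. The log above elides some lines, so other allocations may be listed there. The function name tally_logged_mb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rough tally of the buffer sizes printed in the
# [metal]/[weights] log lines. Assumption: "GB" in the log means 1024 MB;
# with 1000 MB the weights term shrinks by about 130 MB.
tally_logged_mb() {
    awk 'BEGIN {
        kv      = 15 * 16.8     # 15 KV caches, 16.8 MB each
        scores  = 134.2         # attention scores buffer
        delta   = 195.4 + 0.2   # delta-net state + scratch
        weights = 5.52 * 1024   # model_weights.bin, mmapped and wrapped as a Metal buffer
        total   = kv + scores + delta + weights
        printf "Logged buffers: %.1f MB\n", total
        printf "Not itemized at last sample (23746.4 MB RSS): %.1f MB\n", 23746.4 - total
        printf "One 2-bit expert: %.2f MiB\n", 3932160 / 1048576
    }'
}
tally_logged_mb
```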
