Official iOS static-shape decode path crashes at runtime on the macOS 27 / iOS 27 beta — MPSGraph can't lower the data-indexed KV-cache slice_update #5

@john-rocky

Summary

Following the documented iOS static-shape decode export (export/ios.py static buckets + KVCacheHandler
fixed-capacity KV state + CoreAIStaticShapeEngine, with the per-step in_step write), the exported decode
core converts fine but SIGTRAPs / SIGSEGVs at the first execute on the WWDC26 betas. The crash is in
the Core AI runtime / MPSGraph backend and reproduces on Mac GPU, iPhone GPU, and iPhone ANE.

I isolated it to one thing: the slice_update that writes the new KV column uses a runtime-tensor begin
index (in_step). The same graph with a shape-symint begin index (the KVCache.update_and_fetch
dynamic path) lowers and runs. Reporting here because this breaks the repo's official on-device LLM recipe;
also filed with Apple Feedback (FB23024751) since the root cause looks like an MPSGraph lowering limit.

Environment

  • macOS 27.0 (26A5353q), Apple Silicon (Mac Studio)
  • Xcode 27.0 (27A5194q), iOS SDK 27.0; iPhone 17 Pro on iOS 27 beta
  • coreai-torch 0.4.0, coreai-core 1.0.0b1, coreai-models @ b1cb71b

Steps to reproduce

A single attention block built from coreai_models.primitives (official KVCache write + composite
SDPA), exported three ways that differ in only the KV-write column index. Full runnable script:
https://gist.github.com/john-rocky/1fd6add76b3d5393ebc44fac52ce6b27. The decisive line:

# write one new KV column at decode position p:  cache[:, :, :, p:p+1, :] = k_new   (slice_update)
# (a) begin index from a SHAPE symint   — the update_and_fetch path
p = position_ids.shape[-1] - query_len    # symint
# (b) begin index from a RUNTIME TENSOR  — the static / in_step path
p = in_step                               # int32 scalar input

Export each variant, load on the GPU delegate, run one forward at read-length B = 512 (each in its own
process).
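
A minimal sketch of the per-process harness, assuming each variant is launched as its own command line (the script name and flags you pass in are up to you; nothing here comes from the gist). It also decodes exit statuses such as 133, which is 128 + 5, i.e. SIGTRAP:

```python
import signal
import subprocess
import sys


def signal_name(number):
    """Return the symbolic name for a signal number, or None if it is not one."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return None


def describe_exit(returncode):
    """Translate a process exit status into a readable outcome."""
    if returncode == 0:
        return "ok"
    # subprocess reports "killed by signal N" as -N; shells report it as 128 + N.
    number = -returncode if returncode < 0 else returncode - 128
    name = signal_name(number) if number > 0 else None
    return f"killed by {name}" if name else f"exit {returncode}"


def run_isolated(argv):
    """Run one variant in a fresh process so a crash cannot take down the harness."""
    return describe_exit(subprocess.run(argv).returncode)


if __name__ == "__main__":
    # Stand-in command; replace with the real per-variant invocation.
    print(run_isolated([sys.executable, "-c", "pass"]))
```

Running each variant this way keeps one crashing executable from hiding the result of the others.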

Observed

begin index      shapes    macOS 27 Mac GPU
shape symint     dynamic   runs, finite output (exit 0)
runtime tensor   dynamic   SIGTRAP, exit 133
runtime tensor   static    SIGTRAP, exit 133

  • Mac GPU: EXC_BREAKPOINT (SIGTRAP, code 5); faulting-thread top frames are all CoreAIRuntime
    _coreai_runtime_os.cpython-311-darwin.so (at execute).
  • iPhone GPU: SIGSEGV at the first execute (loads + specializes first).
  • iPhone ANE: MPSGraphExecutable.mm optimizeOriginalModule → "MLIR pass manager failed" (SIGABRT).

Real artifact (not just the minimal block): the stock Gemma-4 E2B iOS static decode core
(set_static_shape_config, in_step write) SIGTRAPs identically on the Mac GPU.

Expected

slice_update with a runtime-tensor begin lowers and executes on MPSGraph (GPU + ANE), exactly as the
shape-symint form does. As-is, the documented fixed-shape / ANE path is unusable on the beta and only the
slower re-specializing dynamic path (recompiles per sequence length) runs.
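
To make the expected equivalence concrete, here is an eager-mode PyTorch sketch (not the coreai_models primitives; all function names are hypothetical). In eager mode both begin-index sources resolve to the same integer, so the written cache is identical. The difference only appears at export time, where the shape-derived index stays symbolic and the tensor-derived one becomes data-dependent:

```python
import torch


def write_column(cache, k_new, p):
    """Return a copy of `cache` with one KV column written at position p."""
    out = cache.clone()
    out[..., p:p + 1, :] = k_new  # cache[:, :, :, p:p+1, :] = k_new
    return out


def begin_from_shape(position_ids, query_len):
    # (a) shape-derived index: a plain int in eager mode, a symint under export
    return position_ids.shape[-1] - query_len


def begin_from_tensor(in_step):
    # (b) index read from tensor data: only known at run time
    return int(in_step)


cache = torch.zeros(1, 1, 2, 8, 4)           # (..., capacity, head_dim); sizes are arbitrary
k_new = torch.ones(1, 1, 2, 1, 4)
position_ids = torch.arange(6).unsqueeze(0)  # 6 positions so far, 1 of them new
in_step = torch.tensor(5, dtype=torch.int32)

via_shape = write_column(cache, k_new, begin_from_shape(position_ids, query_len=1))
via_tensor = write_column(cache, k_new, begin_from_tensor(in_step))
same = torch.equal(via_shape, via_tensor)
```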

Workaround (and it localizes the bug)

Drop the Core AI state + indexed write: keep KV as plain model I/O, append the new column with
torch.cat, and have the host write it back between steps — so there is no in-graph indexed write at
all (only cat + masked SDPA over plain inputs). Numerically identical (8/8 top-1 vs Hugging Face), and
it runs on Mac GPU, iPhone GPU (full model), and iPhone ANE (chunked). That a cat-append works while the
indexed slice_update does not points specifically at the data-indexed slice-update lowering.
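
A minimal sketch of the cat-append shape of this workaround in plain PyTorch, assuming a single attention block with no positional encoding. The module and helper names are hypothetical and this is not the code that was run on device; it only illustrates that the graph contains cat plus masked SDPA while the indexed write happens on the host:

```python
import torch
import torch.nn.functional as F


class CatAppendAttention(torch.nn.Module):
    """Hypothetical decode block: KV arrives as plain inputs, no in-graph indexed write."""

    def forward(self, q, k_new, v_new, k_cache, v_cache, mask):
        # Append the new column at the end instead of writing it at index p.
        k = torch.cat([k_cache, k_new], dim=2)  # (1, H, capacity + 1, D)
        v = torch.cat([v_cache, v_new], dim=2)
        # Boolean mask: True marks key positions that may be attended to.
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)


def decode_with_host_cache(qs, ks, vs, capacity):
    """Run one decode step per (q, k, v) triple; the host owns the KV buffers."""
    block = CatAppendAttention()
    _, heads, _, dim = ks[0].shape
    k_cache = torch.zeros(1, heads, capacity, dim)
    v_cache = torch.zeros(1, heads, capacity, dim)
    outputs = []
    for p, (q, k_new, v_new) in enumerate(zip(qs, ks, vs)):
        mask = torch.zeros(1, 1, 1, capacity + 1, dtype=torch.bool)
        mask[..., :p] = True   # columns the host has already written
        mask[..., -1] = True   # the column appended by torch.cat
        outputs.append(block(q, k_new, v_new, k_cache, v_cache, mask))
        # Host-side write-back between steps, outside the exported graph.
        k_cache[:, :, p:p + 1, :] = k_new
        v_cache[:, :, p:p + 1, :] = v_new
    return outputs


def decode_reference(qs, ks, vs):
    """Unmasked attention over the growing KV sequence, for comparison."""
    outputs = []
    for t, q in enumerate(qs):
        k = torch.cat(ks[: t + 1], dim=2)
        v = torch.cat(vs[: t + 1], dim=2)
        outputs.append(F.scaled_dot_product_attention(q, k, v))
    return outputs
```

The input shapes stay fixed at every step (the cache is always `capacity` columns, the concatenated KV always `capacity + 1`), so only the mask contents and the host-side write position change from one step to the next.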

Notes

  • Decisive pair = symint-dyn (runs) vs tensor-dyn (crashes): identical module, identical dynamic Dim,
    only the begin-index source differs.
  • Model-agnostic — every model shares KVCache.update_and_fetch. Confirmed the official gemma3 and
    qwen3 dynamic (symint) cores run + re-specialize.
  • Happy to attach the crash .ips, the full repro script, and the official-model dynamic-runs counterpart.
