Model request: Gemma 4 E4B (google/gemma-4-E4B-it) on iOS #20

### Source link

https://huggingface.co/google/gemma-4-E4B-it

### Target platform

iOS

### Use case

We ship a production iOS app with an on-device AI agent: tool calling against
wallet/market tools and streaming chat, under a strict privacy requirement.
Portfolio data must never leave the device, so on-device inference is the
product, not an optimization.

Today we run Gemma 4 E4B-it (GGUF, imatrix Q4_K_M / Q5_K_M) via llama.cpp with
Metal on 8–12 GB iPhones (context 4096). It works, but CPU/GPU decoding drains
the battery and heats the device during long agent sessions. A Core AI iOS
preset for the same model would let us migrate as a pure runtime swap (same
weights, same prompts, same tool-call format) and pick up the ANE static-shape
path, framework-managed KV cache, and FoundationModels tool calling / guided
generation.


### Preferred precision / compression

mixed 4/8-bit (like the qwen3-4b iOS preset), max context 4096

### Additional context

- Gemma 4 E4B is the current flagship open on-device model (Apache 2.0,
  agentic/tool-calling focus, native function calling + configurable thinking),
  so iOS support would likely serve many apps beyond ours.
- The catalog currently lists Gemma 3 4B/12B as macOS-only and has no Gemma on
  iOS; the iOS LLM presets are Qwen-only. I assume the blocker for the E-series
  is the per-layer embeddings (PLE) tensor. For reference, llama.cpp exports and
  runs it fine on iPhone hardware (the `per_layer_token_embd` tensor maps as a
  plain weight), and at 4-bit the E4B footprint (~5 GB) fits 8 GB devices.
- E2B would be a nice-to-have for lower-RAM devices, but E4B is the ask here.
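
For anyone sanity-checking the "fits 8 GB devices" claim, here is a rough sketch of the memory arithmetic: quantized weights plus a full-attention KV cache at context 4096. The layer count, KV-head count, and head dimension below are placeholders chosen for illustration, not values read from the Gemma 4 E4B config. The ~5 GB weight figure is the one quoted above, treated loosely as GiB.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Full-attention KV cache size: keys and values (factor 2) stored for
    every layer and every position, at bytes_per_elem each (2 = fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

GIB = 1024 ** 3

# PLACEHOLDER architecture values (assumptions, not the real model config).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 4, 256

weights_gib = 5.0  # approximate 4-bit footprint quoted in this issue
n_ctx = 4096       # max context requested in this issue

kv_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_ctx) / GIB
total_gib = weights_gib + kv_gib

print(f"KV cache: {kv_gib:.2f} GiB, weights + KV: {total_gib:.2f} GiB")
```

With these placeholder numbers the cache adds about half a GiB on top of the weights. Architectures that use sliding-window attention or share KV across layers would need less, and runtime buffers and the app's own memory come on top, so this is only a first-order estimate.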
