Apple Intelligence OpenAI-Compatible API

A lightweight API server that exposes Apple's on-device Foundation Model (Apple Intelligence) through an OpenAI-compatible chat completions endpoint. Any client that speaks the OpenAI protocol — ChatGPT UIs, IDE plugins, CLI tools — can connect and use Apple Intelligence running entirely on your Mac.

Features

OpenAI-compatible /v1/chat/completions (streaming & non-streaming)
/v1/models endpoint for model discovery
100 % on-device — no API keys, no cloud, no per-token cost
CORS enabled — works from browser-based clients
Runs via a single uv run command

Prerequisites

Requirement	Details
Mac	Apple Silicon (M1 or later)
macOS	Tahoe 26.0 or later
Xcode	26.0 or later (agree to the Xcode and Apple SDKs agreement)
RAM	8 GB minimum (16 GB recommended)
Storage	≥ 7 GB free for on-device model download
Python	3.13+

1. Enable Apple Intelligence

Open System Settings → Apple Intelligence & Siri.
Click Turn on Apple Intelligence.
Wait for the on-device model to finish downloading (keep your Mac on Wi-Fi and power).
Ensure Siri language is set to a supported language (e.g. English).

Note: If the option is greyed out, ensure your Mac meets all hardware requirements and is running a supported macOS version. In some regions you may need to set your region to "United States" under System Settings → General → Language & Region.

Environment Setup (from scratch)

2. Install Xcode

Download Xcode 26.0+ from the Mac App Store or the Apple Developer website. After installation, open Xcode once and agree to the license agreement.

Then install the Command Line Tools:

xcode-select --install

3. Install Homebrew

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

After installation, follow the printed instructions to add Homebrew to your PATH (usually involves adding a line to ~/.zprofile).

4. Install Python 3.13+

brew install python@3.13

Verify:

python3 --version   # Should print 3.13.x or later

5. Install `uv` (recommended Python project manager)

brew install uv

Or via the standalone installer:

curl -LsSf https://astral.sh/uv/install.sh | sh

Verify:

uv --version

Installation

We'll create a workspace directory in your home folder and set up both the Apple Foundation Models SDK and this API server.

6. Clone the Apple Foundation Models SDK

The SDK is not on PyPI — it must be built from source. You only need to clone it here; the API project will build and link it automatically in the next step.

mkdir -p ~/apple-intelligence && cd ~/apple-intelligence

git clone https://github.com/apple/python-apple-fm-sdk

7. Clone & Set Up This Project

cd ~/apple-intelligence

git clone https://github.com/ZPVIP/apple-to-openai
cd apple-to-openai

uv sync

uv sync will:

Create a .venv in the project directory
Automatically build & install the SDK from the sibling ../python-apple-fm-sdk directory (configured as a path dependency in pyproject.toml)
Install all other dependencies (fastapi, uvicorn, etc.)

Your directory structure should look like:

~/apple-intelligence/
├── python-apple-fm-sdk/          # Apple's official SDK (cloned in step 6)
└── apple-to-openai/                 # This project (cloned in step 7)

Running the Server

Option A — Recommended (via project script)

cd ~/apple-intelligence/apple-to-openai
uv run apple-to-openai

This starts the server on 0.0.0.0:8000 by default.

Available flags:

Flag	Description	Default
`--host`	Bind address	`0.0.0.0` or `APPLE_AI_HOST`
`--port`	Port number	Auto-detected or `APPLE_AI_PORT`
`--reload`	Auto-reload on code changes (dev mode)	off
`--strip-system-prompt`	Strip out "system" role messages	off or `APPLE_AI_STRIP_SYSTEM_PROMPT`
`--custom-system-prompt`	System prompt to inject when stripping is enabled	`"You are a helpful coding assistant."`
`--debug-payload`	Log request and response JSON to `./logs/`	off or `APPLE_AI_DEBUG_PAYLOAD`

Example:

uv run apple-to-openai --port 9000 --reload

Option B — Via `python -m`

uv run python __main__.py
# or
uv run python __main__.py --port 9000

Option C — Direct uvicorn (manual)

uv run uvicorn server:app --host 0.0.0.0 --port 8000

Configuration

You can configure the server using environment variables. It is highly recommended to create a .env file in the project root to fix your port and other settings.

Environment Variable	Description	Default
`APPLE_AI_HOST`	Bind address	`0.0.0.0`
`APPLE_AI_PORT`	Port number (if empty, auto-detects free port)	`None`
`APPLE_AI_MAX_CONCURRENCY`	Max simultaneous requests to Foundation Model	`4`
`APPLE_AI_REQUEST_TIMEOUT`	Timeout in seconds	`30.0`
`APPLE_AI_API_KEY`	Optional Bearer token for authentication	`None`
`APPLE_AI_STRIP_SYSTEM_PROMPT`	Set to `true` to remove system prompts (useful for Opencode)	`False`
`APPLE_AI_CUSTOM_SYSTEM_PROMPT`	The replacement prompt injected when `STRIP_SYSTEM_PROMPT` is true	`"You are a helpful coding assistant."`
`APPLE_AI_DEBUG_PAYLOAD`	Set to `true` to log requests/responses to `./logs/`	`False`

Security Note on APPLE_AI_HOST: By default, the server binds to 0.0.0.0, allowing other devices on your local network (LAN) to connect to your Mac and use your Apple Intelligence. If you want to restrict access to only your local machine, set APPLE_AI_HOST=127.0.0.1.

Why set a fixed port? By default, the server will scan for an available port starting at 8000. If 8000 is already in use by another app, it might start on 8001, 8002, etc. This means you would need to constantly update the URL in your AI clients (Chatbox, OpenCode, etc.). Setting a fixed port in .env prevents this.

Example .env file:

APPLE_AI_PORT=8002
APPLE_AI_MAX_CONCURRENCY=4
APPLE_AI_REQUEST_TIMEOUT=30.0

API Usage

Health Check

curl http://localhost:8000/health
# {"status":"ok"}

List Models

curl http://localhost:8000/v1/models

Chat Completion (streaming)

curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-intelligence",
    "messages": [{"role": "user", "content": "What is Apple Intelligence?"}],
    "stream": true
  }'

Chat Completion (non-streaming)

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-intelligence",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

Connecting Clients

Point any OpenAI-compatible client at:

Base URL: http://localhost:8000/v1
API Key:  any-string-works  (or your APPLE_AI_API_KEY if set)
Model:    apple-intelligence

Examples of compatible clients:

Open WebUI
ChatBox
BoltAI
IDE plugins (Continue, Cursor, Copilot alternatives)
Any tool using the OpenAI Python/JS SDK

OpenCode Configuration

To connect the Apple Foundation Model to OpenCode, add the following to your opencode.jsonc (in your project or ~/.config/opencode/opencode.jsonc):

Note on limit: The Apple Foundation Model internally enforces a strict 4096-token limit. Providing the limit setting below tells OpenCode to actively chunk context before sending it to the API, preventing "Context Exceeded" errors upstream.

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "apple-fm-local": {
      "name": "Apple FM Local",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8000/v1"
      },
      "models": {
        "apple-intelligence": {
          "name": "Apple Intelligence",
          "tool_call": false,
          "limit": { "context": 4096, "output": 2048 }
        }
      }
    }
  },
  "model": "apple-fm-local/apple-intelligence"
}

Restart OpenCode and select apple-fm-local/apple-intelligence.

Client Rejections & Guardrails (Important)

Apple's Foundation Model has built-in security mechanisms to prevent Jailbreaks or Instruction Overrides. IDE plugins (like Opencode or Copilot) often prepend massive, thousands-of-words-long system prompts filled with behavioral instructions (e.g., "You are an expert programmer... You must reply in JSON...").

When Apple Intelligence receives these massive behavioral overrides, it will often directly reject the prompt, resulting in responses like:

"I apologize, but I cannot comply with that request."

Solution: If your editor constantly receives rejection messages, start the server using the --strip-system-prompt flag (or APPLE_AI_STRIP_SYSTEM_PROMPT=true in your .env file). This completely removes the hidden role="system" instruction block before sending it to the Apple model.

By default, the server will inject a minimal replacement prompt ("You are a helpful coding assistant.") to maintain safety context without triggering guardrails. You can fully customize this using the --custom-system-prompt flag (or APPLE_AI_CUSTOM_SYSTEM_PROMPT environment variable in your .env file).

To further investigate what your IDE is sending, use the --debug-payload flag (or APPLE_AI_DEBUG_PAYLOAD=true in your .env file). This will write every raw request and response to a logs/ directory so you can inspect the hidden prompts.

Streaming Format

The server follows the OpenAI streaming spec with two additional compatibility enhancements:

First chunk sends the assistant role:
```
{"delta": {"role": "assistant"}}
```
Before the finish chunk, an empty-content chunk is sent:
```
{"delta": {"content": ""}}
```
Finish chunk with "finish_reason": "stop"
data: [DONE] sentinel

This ensures compatibility with clients that expect the role announcement and/or an explicit empty-content signal.

Project Structure

apple-to-openai/
├── server.py          # FastAPI application & endpoints
├── __main__.py        # python -m entry point
├── pyproject.toml     # Project metadata & dependencies
├── .gitignore         # Git ignore rules
└── README.md          # This file

Limitations

Context Window

The on-device Apple Foundation Model has a 4,096-token context window (input + output combined). The model estimates the total token usage before generating a response — if the combined input and expected output would exceed the limit, the request is rejected.

Language	~Max Input Length	1 Token ≈
English	~20,000 characters	5 characters
Chinese	~6,400 characters (汉字)	1.6 characters
Mixed	Somewhere in between	—

In practice, this means:

Summarization / short answers: The expected output is small, so the input can be quite long (close to the max above).
Story continuation / long essays: The model anticipates a long output, leaving less room for input. A 15,000-character English prompt asking for a detailed essay may already be rejected, even though the same length would succeed if only a short summary is requested.

Note: If the context window is exceeded, the server returns an OpenAI-compatible context_length_exceeded error (HTTP 400) instead of crashing.

Note: For multi-turn conversations, the server automatically truncates older messages to fit within the character limit — it keeps the system prompt and the most recent messages, discarding the oldest ones first. See truncate_messages() in server.py.

Troubleshooting

Problem	Solution
`Foundation model not available`	Ensure Apple Intelligence is enabled and the model has finished downloading
`apple_fm_sdk` install fails	Ensure you're on macOS 26.0+ with Xcode 26.0+ installed
`No solution found` during `uv sync`	Make sure `python-apple-fm-sdk` is cloned as a sibling directory
Port already in use	Use `--port <other-port>` or kill the existing process
Streaming not working in client	Verify client supports SSE; try non-streaming mode first

License

Apache License 2.0

If you modify or use this project, you must include the original copyright notice and a copy of the Apache 2.0 license. This ensures attribution to this original repository.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__main__.py		__main__.py
config.py		config.py
pyproject.toml		pyproject.toml
server.py		server.py

Folders and files

Latest commit

History

Repository files navigation

Apple Intelligence OpenAI-Compatible API

Features

Prerequisites

1. Enable Apple Intelligence

Environment Setup (from scratch)

2. Install Xcode

3. Install Homebrew

4. Install Python 3.13+

5. Install uv (recommended Python project manager)

Installation

6. Clone the Apple Foundation Models SDK

7. Clone & Set Up This Project

Running the Server

Option A — Recommended (via project script)

Option B — Via python -m

Option C — Direct uvicorn (manual)

Configuration

API Usage

Health Check

List Models

Chat Completion (streaming)

Chat Completion (non-streaming)

Connecting Clients

OpenCode Configuration

Client Rejections & Guardrails (Important)

Streaming Format

Project Structure

Limitations

Context Window

Troubleshooting

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

5. Install `uv` (recommended Python project manager)

Option B — Via `python -m`

Packages