A lightweight API server that exposes Apple's on-device Foundation Model (Apple Intelligence) through an OpenAI-compatible chat completions endpoint. Any client that speaks the OpenAI protocol — ChatGPT UIs, IDE plugins, CLI tools — can connect and use Apple Intelligence running entirely on your Mac.
- OpenAI-compatible `/v1/chat/completions` (streaming & non-streaming)
- `/v1/models` endpoint for model discovery
- 100% on-device — no API keys, no cloud, no per-token cost
- CORS enabled — works from browser-based clients
- Runs via a single `uv run` command
| Requirement | Details |
|---|---|
| Mac | Apple Silicon (M1 or later) |
| macOS | Tahoe 26.0 or later |
| Xcode | 26.0 or later (agree to the Xcode and Apple SDKs agreement) |
| RAM | 8 GB minimum (16 GB recommended) |
| Storage | ≥ 7 GB free for on-device model download |
| Python | 3.13+ |
- Open System Settings → Apple Intelligence & Siri.
- Click Turn on Apple Intelligence.
- Wait for the on-device model to finish downloading (keep your Mac on Wi-Fi and power).
- Ensure Siri language is set to a supported language (e.g. English).
Note: If the option is greyed out, ensure your Mac meets all hardware requirements and is running a supported macOS version. In some regions you may need to set your region to "United States" under System Settings → General → Language & Region.
Download Xcode 26.0+ from the Mac App Store or the Apple Developer website. After installation, open Xcode once and agree to the license agreement.
Then install the Command Line Tools:
```bash
xcode-select --install
```

Install Homebrew:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

After installation, follow the printed instructions to add Homebrew to your PATH (usually by adding a line to `~/.zprofile`).
Install Python:

```bash
brew install python@3.13
```

Verify:

```bash
python3 --version  # Should print 3.13.x or later
```

Install uv:

```bash
brew install uv
```

Or via the standalone installer:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Verify:

```bash
uv --version
```

We'll create a workspace directory in your home folder and set up both the Apple Foundation Models SDK and this API server.
The SDK is not on PyPI — it must be built from source. You only need to clone it here; the API project will build and link it automatically in the next step.
```bash
mkdir -p ~/apple-intelligence && cd ~/apple-intelligence
git clone https://github.com/apple/python-apple-fm-sdk
```

Next, clone this project and install its dependencies:

```bash
cd ~/apple-intelligence
git clone https://github.com/ZPVIP/apple-to-openai
cd apple-to-openai
uv sync
```

`uv sync` will:

- Create a `.venv` in the project directory
- Automatically build & install the SDK from the sibling `../python-apple-fm-sdk` directory (configured as a path dependency in `pyproject.toml`)
- Install all other dependencies (`fastapi`, `uvicorn`, etc.)
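For reference, a uv path source of roughly this shape is what lets the project resolve the SDK from the sibling checkout instead of PyPI. This is an illustrative sketch only, not the project's actual `pyproject.toml`, and the package name `apple-fm-sdk` is an assumption:

```toml
# Illustrative sketch; the real pyproject.toml may differ.
[project]
name = "apple-to-openai"
requires-python = ">=3.13"
dependencies = ["apple-fm-sdk", "fastapi", "uvicorn"]

[tool.uv.sources]
# Resolve the SDK from the sibling directory instead of PyPI
apple-fm-sdk = { path = "../python-apple-fm-sdk" }
```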
Your directory structure should look like:
```
~/apple-intelligence/
├── python-apple-fm-sdk/   # Apple's official SDK (cloned in step 6)
└── apple-to-openai/       # This project (cloned in step 7)
```
```bash
cd ~/apple-intelligence/apple-to-openai
uv run apple-to-openai
```

This starts the server on `0.0.0.0:8000` by default.
Available flags:
| Flag | Description | Default |
|---|---|---|
| `--host` | Bind address | `0.0.0.0` or `APPLE_AI_HOST` |
| `--port` | Port number | Auto-detected or `APPLE_AI_PORT` |
| `--reload` | Auto-reload on code changes (dev mode) | off |
| `--strip-system-prompt` | Strip out "system" role messages | off or `APPLE_AI_STRIP_SYSTEM_PROMPT` |
| `--custom-system-prompt` | System prompt to inject when stripping is enabled | `"You are a helpful coding assistant."` |
| `--debug-payload` | Log request and response JSON to `./logs/` | off or `APPLE_AI_DEBUG_PAYLOAD` |
Example:
```bash
uv run apple-to-openai --port 9000 --reload
```

Alternatively, run the module directly:

```bash
uv run python __main__.py
# or
uv run python __main__.py --port 9000
```

Or launch uvicorn yourself:

```bash
uv run uvicorn server:app --host 0.0.0.0 --port 8000
```

You can configure the server using environment variables. It is highly recommended to create a `.env` file in the project root to fix your port and other settings.
| Environment Variable | Description | Default |
|---|---|---|
| `APPLE_AI_HOST` | Bind address | `0.0.0.0` |
| `APPLE_AI_PORT` | Port number (if empty, auto-detects a free port) | None |
| `APPLE_AI_MAX_CONCURRENCY` | Max simultaneous requests to the Foundation Model | `4` |
| `APPLE_AI_REQUEST_TIMEOUT` | Request timeout in seconds | `30.0` |
| `APPLE_AI_API_KEY` | Optional Bearer token for authentication | None |
| `APPLE_AI_STRIP_SYSTEM_PROMPT` | Set to `true` to remove system prompts (useful for OpenCode) | `False` |
| `APPLE_AI_CUSTOM_SYSTEM_PROMPT` | The replacement prompt injected when `STRIP_SYSTEM_PROMPT` is `true` | `"You are a helpful coding assistant."` |
| `APPLE_AI_DEBUG_PAYLOAD` | Set to `true` to log requests/responses to `./logs/` | `False` |
Security Note on APPLE_AI_HOST:
By default, the server binds to 0.0.0.0, allowing other devices on your local network (LAN) to connect to your Mac and use your Apple Intelligence. If you want to restrict access to only your local machine, set APPLE_AI_HOST=127.0.0.1.
Why set a fixed port?
By default, the server will scan for an available port starting at 8000. If 8000 is already in use by another app, it might start on 8001, 8002, etc. This means you would need to constantly update the URL in your AI clients (Chatbox, OpenCode, etc.). Setting a fixed port in .env prevents this.
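The scan described above can be sketched in a few lines. This is illustrative only; the server's actual port-selection code in `server.py` may differ:

```python
import socket

def find_free_port(start: int = 8000, attempts: int = 100) -> int:
    """Return the first port >= start that can be bound.

    Sketch of the scan-upward behavior described above; the server's
    real logic lives in server.py and may differ.
    """
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("0.0.0.0", port))  # bind succeeds only if the port is free
                return port
            except OSError:
                continue  # port in use, try the next one
    raise RuntimeError(f"No free port in range {start}-{start + attempts - 1}")

print(find_free_port())
```

Pinning `APPLE_AI_PORT` in `.env` bypasses this scan entirely, which is why your client URLs stay stable.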
Example `.env` file:

```bash
APPLE_AI_PORT=8002
APPLE_AI_MAX_CONCURRENCY=4
APPLE_AI_REQUEST_TIMEOUT=30.0
```

Health check:

```bash
curl http://localhost:8000/health
# {"status":"ok"}
```

List available models:

```bash
curl http://localhost:8000/v1/models
```

Streaming chat completion:

```bash
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-intelligence",
    "messages": [{"role": "user", "content": "What is Apple Intelligence?"}],
    "stream": true
  }'
```

Non-streaming:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-intelligence",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```

Point any OpenAI-compatible client at:
- Base URL: `http://localhost:8000/v1`
- API Key: `any-string-works` (or your `APPLE_AI_API_KEY` if set)
- Model: `apple-intelligence`
Examples of compatible clients:
- Open WebUI
- ChatBox
- BoltAI
- IDE plugins (Continue, Cursor, Copilot alternatives)
- Any tool using the OpenAI Python/JS SDK
To connect the Apple Foundation Model to OpenCode, add the configuration shown in the `opencode.jsonc` example at the end of this README (in your project or in `~/.config/opencode/opencode.jsonc`).

Note on `limit`: The Apple Foundation Model internally enforces a strict 4096-token limit. Providing the `limit` setting in that configuration tells OpenCode to actively chunk context before sending it to the API, preventing "Context Exceeded" errors upstream.

Restart OpenCode and select `apple-fm-local/apple-intelligence`.
Apple's Foundation Model has built-in guardrails designed to prevent jailbreaks and instruction overrides. IDE plugins (such as OpenCode or Copilot) often prepend massive system prompts, thousands of words of behavioral instructions (e.g., "You are an expert programmer... You must reply in JSON...").

When Apple Intelligence receives these massive behavioral overrides, it will often reject the prompt outright, resulting in responses like:
"I apologize, but I cannot comply with that request."
Solution: If your editor constantly receives rejection messages, start the server using the --strip-system-prompt flag (or APPLE_AI_STRIP_SYSTEM_PROMPT=true in your .env file). This completely removes the hidden role="system" instruction block before sending it to the Apple model.
By default, the server will inject a minimal replacement prompt ("You are a helpful coding assistant.") to maintain safety context without triggering guardrails. You can fully customize this using the --custom-system-prompt flag (or APPLE_AI_CUSTOM_SYSTEM_PROMPT environment variable in your .env file).
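Conceptually, strip-and-replace is just a filter over the incoming message list. A minimal sketch of the behavior (not the server's actual code):

```python
def strip_system_messages(messages, replacement="You are a helpful coding assistant."):
    """Drop every role="system" message and inject one minimal replacement,
    mirroring the --strip-system-prompt behavior (sketch only)."""
    non_system = [m for m in messages if m.get("role") != "system"]
    return [{"role": "system", "content": replacement}] + non_system

incoming = [
    {"role": "system", "content": "(thousands of words of IDE instructions...)"},
    {"role": "user", "content": "Fix this bug."},
]
cleaned = strip_system_messages(incoming)
print(cleaned[0]["content"])  # You are a helpful coding assistant.
```

The large IDE prompt never reaches the model; only the short, neutral replacement does, which is what keeps the guardrails from firing.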
To further investigate what your IDE is sending, use the --debug-payload flag (or APPLE_AI_DEBUG_PAYLOAD=true in your .env file). This will write every raw request and response to a logs/ directory so you can inspect the hidden prompts.
The server follows the OpenAI streaming spec with two additional compatibility enhancements:
- First chunk sends the assistant role: `{"delta": {"role": "assistant"}}`
- Before the finish chunk, an empty-content chunk is sent: `{"delta": {"content": ""}}`
- Finish chunk with `"finish_reason": "stop"`
- `data: [DONE]` sentinel
This ensures compatibility with clients that expect the role announcement and/or an explicit empty-content signal.
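The sequence above can be sketched as a small generator. This is illustrative; the chunk `id` and other field values are placeholders, not the server's actual output:

```python
import json

def sse_chunks(text_parts, model="apple-intelligence", chunk_id="chatcmpl-0"):
    """Yield OpenAI-style SSE lines in the order described above (sketch)."""
    def chunk(delta, finish_reason=None):
        payload = {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        return f"data: {json.dumps(payload)}\n\n"

    yield chunk({"role": "assistant"})     # role announcement
    for part in text_parts:                # content deltas
        yield chunk({"content": part})
    yield chunk({"content": ""})           # explicit empty-content chunk
    yield chunk({}, finish_reason="stop")  # finish chunk
    yield "data: [DONE]\n\n"               # sentinel
```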
```
apple-to-openai/
├── server.py        # FastAPI application & endpoints
├── __main__.py      # python -m entry point
├── pyproject.toml   # Project metadata & dependencies
├── .gitignore       # Git ignore rules
└── README.md        # This file
```
The on-device Apple Foundation Model has a 4,096-token context window (input + output combined). The model estimates the total token usage before generating a response — if the combined input and expected output would exceed the limit, the request is rejected.
| Language | ~Max Input Length | 1 Token ≈ |
|---|---|---|
| English | ~20,000 characters | 5 characters |
| Chinese | ~6,400 characters (汉字) | 1.6 characters |
| Mixed | Somewhere in between | — |
In practice, this means:
- Summarization / short answers: The expected output is small, so the input can be quite long (close to the max above).
- Story continuation / long essays: The model anticipates a long output, leaving less room for input. A 15,000-character English prompt asking for a detailed essay may already be rejected, even though the same length would succeed if only a short summary is requested.
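The ratios above suggest a crude pre-flight estimate of whether a prompt will fit. A heuristic sketch (this is not Apple's actual tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the character ratios above:
    ~5 chars/token for ASCII text, ~1.6 chars/token otherwise (e.g. CJK).
    Heuristic only; not Apple's actual tokenizer."""
    ascii_count = sum(1 for ch in text if ord(ch) < 128)
    other_count = len(text) - ascii_count
    return round(ascii_count / 5 + other_count / 1.6)

print(estimate_tokens("a" * 20_000))  # 4000, right at the edge of the 4,096-token window
```

Remember to budget for the expected output as well: input tokens plus anticipated output tokens must together stay under 4,096.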
Note: If the context window is exceeded, the server returns an OpenAI-compatible `context_length_exceeded` error (HTTP 400) instead of crashing.

Note: For multi-turn conversations, the server automatically truncates older messages to fit within the character limit — it keeps the system prompt and the most recent messages, discarding the oldest ones first. See `truncate_messages()` in `server.py`.
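The truncation strategy can be sketched as follows. This is a simplification; the real `truncate_messages()` in `server.py` may differ in details such as the exact character budget:

```python
def truncate_messages(messages, max_chars=20_000):
    """Keep all system messages plus as many of the most recent non-system
    messages as fit within max_chars, dropping the oldest first (sketch)."""
    system = [m for m in messages if m["role"] == "system"]
    budget = max_chars - sum(len(m["content"]) for m in system)
    kept = []
    for msg in reversed([m for m in messages if m["role"] != "system"]):
        if len(msg["content"]) > budget:
            break  # everything older than this is discarded too
        kept.append(msg)
        budget -= len(msg["content"])
    kept.reverse()  # restore chronological order
    return system + kept
```

Walking the history newest-first guarantees the most recent turns survive, while the system prompt is always preserved regardless of age.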
| Problem | Solution |
|---|---|
| `Foundation model not available` | Ensure Apple Intelligence is enabled and the model has finished downloading |
| `apple_fm_sdk` install fails | Ensure you're on macOS 26.0+ with Xcode 26.0+ installed |
| `No solution found` during `uv sync` | Make sure `python-apple-fm-sdk` is cloned as a sibling directory |
| Port already in use | Use `--port <other-port>` or kill the existing process |
| Streaming not working in client | Verify the client supports SSE; try non-streaming mode first |
If you redistribute or modify this project, you must include the original copyright notice and a copy of the Apache 2.0 license. This ensures attribution to this original repository.
Example `opencode.jsonc`:

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "apple-fm-local": {
      "name": "Apple FM Local",
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8000/v1" },
      "models": {
        "apple-intelligence": {
          "name": "Apple Intelligence",
          "tool_call": false,
          "limit": { "context": 4096, "output": 2048 }
        }
      }
    }
  },
  "model": "apple-fm-local/apple-intelligence"
}
```