Self-hosted, coding-focused LLM stack for an Apple Silicon Mac (M1/M2/M3, 32 GB RAM) that you access from other laptops over Tailscale.
The goal is a private, local "Claude Code / Cursor" replacement with no chat fluff — just IDE autocomplete, inline edits, and agentic coding.
- Ollama — model runtime + OpenAI-compatible REST API (
:11434) - Open WebUI (optional) — browser chat UI on
:3000 - Tailscale — secure remote access from any laptop
- Coder models (pulled by default):
qwen2.5-coder:14b-instruct-q4_K_M— primary chat/agent (recommended)qwen2.5-coder:7b-base-q4_K_M— Tab autocomplete (FIM)deepseek-coder-v2:16b-lite-instruct-q4_K_M— fast MoE alternativeqwen2.5-coder:32b-instruct-q4_K_M— high quality, slow on 32 GBcodestral:22b— strong FIM autocomplete
- Client tools: VS Code + Continue.dev (chat / inline / autocomplete), Cline (agent), Aider (terminal git-aware agent)
[Other laptop] -> VS Code + Continue/Cline \
[Other laptop] -> Browser + Open WebUI \--> [M1 Mac: Ollama :11434]
[Phone/iPad] -> Browser + Open WebUI / (Metal GPU)
(Tailscale VPN)
git clone https://github.com/gokutheengineer/local-llm-coding-server.git
cd local-llm-coding-server
bash setup-llm-server.shToggles:
INSTALL_WEBUI=1 bash setup-llm-server.sh # install/start Open WebUI (Docker)
INSTALL_TAILSCALE=0 bash setup-llm-server.sh # skip Tailscale install
PULL_MODELS=0 bash setup-llm-server.sh # skip model downloadsDefault behavior: Open WebUI/Docker steps are skipped unless you explicitly set
INSTALL_WEBUI=1.
After it finishes, sign into Tailscale (GUI), then note your Tailscale IP:
tailscale ip -4Install Tailscale first and sign in with the same account.
git clone https://github.com/gokutheengineer/local-llm-coding-server.git
cd local-llm-coding-server
SERVER_IP=100.x.y.z bash setup-llm-client.shThis one command will:
- Install VS Code, Tailscale, pipx, Python (mac)
- Install VS Code/Cursor extensions: Continue, Cline when the CLI is available
- Write
~/.continue/config.jsonpointed at your server - Install Aider and create an
aider-locallauncher - Run a real smoke test: client -> Ollama API -> coder model generation
If you only want to install the coding tools without running the generation test:
SERVER_IP=100.x.y.z bash install-coding-tools.shIf you already installed everything and only want to verify the server connection:
SERVER_IP=100.x.y.z bash check-llm-connection.shThat script checks:
http://<server-ip>:11434/api/tagsreturns models- the selected model can generate a tiny Swift function
- the same base URL is ready for Cline, Continue, and Aider
- Continue (VS Code):
Cmd+Lchat,Cmd+Iinline edit, Tab autocomplete - Cline (VS Code): agentic file edits + terminal commands (closest to Claude Code)
- Aider (terminal):
cd repo && aider-local— git-aware multi-file edits - Open WebUI:
http://<tailscale-ip>:3000for chat / RAG / file uploads
For coding automation, start here:
cd /path/to/your/git-repo
aider-localThen type normal requests:
Inspect this repo, find the likely cause of the failing tests, fix only the needed files, and run the relevant test command.
For Cline, use:
Provider: Ollama
Base URL: http://100.x.y.z:11434
Model: qwen2.5-coder:14b-instruct-q4_K_M
ollama list
ollama pull <model>
ollama rm <model>Then update the model field in ~/.continue/config.json on the client,
or pick from the model dropdown inside Cline / Continue.
| Model | Tokens/sec | Use |
|---|---|---|
| 7B | 40-60 | autocomplete |
| 14B | 15-25 | chat / agent |
| 32B | 5-10 | quality but slow |
curl http://<ip>:11434/api/tagsshould return JSON. If not, Tailscale is down orOLLAMA_HOSTis not bound to0.0.0.0.- Run
SERVER_IP=100.x.y.z bash check-llm-connection.shfrom the client for a full API + generation test. launchctl setenv OLLAMA_HOST "0.0.0.0:11434" && brew services restart ollama- Keep the Mac awake:
sudo pmset -a sleep 0 disksleep 0 - Free a stuck model:
ollama stop <model>
MIT