Your model. Your machine. No cloud. No subscriptions. No limits.
VibeStudio is a native macOS coding-agent desktop app built for Apple Silicon. Drop in your own model, point it at a codebase, and get a full coding agent — with tool use, web search, file read/write, and speculative decoding fast enough to feel instant. Everything runs on-device. Nothing leaves your machine.
Built on Tauri v2 + React, powered by the mlx-mtp inference engine.
| Chat — Codex-style project/thread sidebar | Settings — Local model, 256K context |
|---|---|
![]() |
![]() |
| Decoding — Native MTP · DFlash · Speculative | Serving — OpenAI-compatible local endpoint |
|---|---|
![]() |
![]() |
| VibeStudio | LM Studio | Codex Desktop | Hermes | |
|---|---|---|---|---|
| Fully local, no cloud | ✅ | ✅ | ✗ | ✗ |
| Native MTP speculative decode | ✅ | ✗ | ✗ | ✗ |
| DFlash block-diffusion decode | ✅ | ✗ | ✗ | ✗ |
| Vision (VLM) | ✅ | ✅ | ✗ | ✗ |
| Web search + scrape (local Docker) | ✅ | ✗ | ✅ | ✅ |
| Codex-style project/thread UX | ✅ | ✗ | ✅ | ✗ |
| Serve as OpenAI-compatible API | ✅ | ✅ | ✗ | ✗ |
| 256K context | ✅ | ✅ | ✗ | ✗ |
| Full CLI + MCP control (no GUI needed) | ✅ | partial | ✗ | ✅ |
| Proactive heartbeat + IFTTT triggers | ✅ | ✗ | ✅ | ✅ |
| Editable markdown memory, two-way synced | ✅ | ✗ | ✗ | partial |
VibeStudio's inference engine (mlx-mtp) squeezes maximum throughput out of your Apple Silicon hardware using speculative decoding on top of quantized models. Numbers on Apple M4 Max with osmQwopus-3.6-27B:
| Build | Size | Notes |
|---|---|---|
| MXFP8 (8-bit microscaling) | ~30 GB | MTP head + vision tower fp16; native MTP (accept ~0.77–0.92) — the default, highest fidelity |
| MXFP4 (4-bit microscaling) | ~16 GB | vision tower fp16; ~27.7 tok/s vanilla on M4 Max — fastest absolute tok/s |
osmQwopus-3.6-27B family · Apple M4 Max. MTP acceptance from real-weights validation; a full Coder MXFP8/MXFP4 throughput benchmark is forthcoming.
Apple Silicon is memory-bandwidth-bound, so the smaller MXFP4 build decodes faster in absolute tok/s, while MXFP8 trades some speed for higher fidelity. Both preserve the vision tower and run native MTP speculative decoding.
Quantizers and decoding engines: mlx-mtp.
- Native MTP — drives the model's own embedded multi-token-prediction head as a self-drafter; no external model needed; up to 1.62× faster
- DFlash — block-diffusion external drafter; block size tunable (8 / 16 / 32); drafter path auto-detected from the model config
- Standard + Speculative — vanilla AR and small-draft speculative also available
- Context up to 256K tokens with a live slider (probed from the loaded model)
- Full VLM — attach images to any prompt
- Tool-use loop: web search (SearXNG, self-hosted) + web scrape (Firecrawl, self-hosted) + file read/write + shell
- Per-bubble TTFT, thinking time, tokens/sec shown inline under each response
- Context memory meter — filled / remaining, live
- RAG — any-file knowledge index for the active project
- Codex-style sidebar: projects → chat threads; multiple chats per project
- Hover-copy on every bubble — your messages and the agent's full step transcript copy to the clipboard as plain text
- Each project maps to a local folder; git branch and path shown in the header
- Rename / delete projects from the sidebar (⋯ menu)
- Artifact drawer — file tree slides in from the left, lazy-loads on expand
- Expose any loaded model as a local OpenAI-compatible API endpoint
- Configure port, API key, max concurrent requests, and model identifier
- Decoding strategy (MTP / DFlash / Standard) and block size flow through to the server's env — every client call uses the strategy you selected in Settings
bin/vibedrives every knob: server/model lifecycle, agent runs with live step streaming, steer/queue, memory, RAG, settings, triggers — through the app's control socket when it's running (changes show up live in the GUI), headless when it isn't.vibe status→vibe server start→vibe model load --auto→vibe run "…" -p projvibe mcpexposes the same surface as MCP tools, so Claude Code (or any MCP client) can operate VibeStudio directly — no computer-use needed- runtime.json keeps GUI/CLI/agents agreeing on what's serving; the app adopts a CLI-started server (and vice versa) across restarts
- Settings → Automation holds every knob: heartbeat cadence/active-hours/project, the HEARTBEAT.md checklist editor, the trigger list + add form, recent firings
- Heartbeat: the agent wakes every N minutes (within active hours), follows the
user-editable
~/.vibestudio/HEARTBEAT.mdchecklist, and stays quiet (HEARTBEAT_OK) when nothing needs doing - Triggers: IFTTT-style rules — one-shot / interval / daily at HH:MM / file-glob changed / shell-probe edge — each firing a normal agent run on its project
- Schedule from any chat: "remind me in 20 minutes…", "every morning at 9…",
"when the build log changes…" — the agent's
scheduletool writes the same trigger list (one-shots prune themselves after firing)
- With the Computer use capability on, the agent can see the screen
(screenshots fed to the VLM as images — it's a vision model), control the
mouse & keyboard (JXA/CoreGraphics + System Events, no
cliclick), drive macOS apps & files via AppleScript, and take over Chrome (navigate, run JS in the tab, read/scrape) — all zero-dependency, built on tools that ship with macOS - The agent is OS-aware: macOS version, arch, shell and screen size are injected into its prompt so it uses Mac/BSD idioms
- Gated behind the
computerpermission (off by default — it controls the whole machine), Ask-able (Allow/Deny in chat), and fully audited. Verified live on the local 27B: created & renamed a Desktop file, drove Chrome to a URL and read it, and screenshotted + described the screen
- A separate Browser automation capability (also off by default) drives a real
Chromium via Playwright + browser-use,
pointed at your local model — no cloud LLM.
action: "task"hands it a natural-language goal and it navigates/clicks/reads on its own;action: "run"is a deterministic scripted interaction - Isolated in its own venv so it never perturbs the MLX/mlx-mtp inference stack. Verified live: VibeStudio's agent opened Hacker News and reported the top story, end-to-end on the 27B
- Settings → Privacy & Safety: per-tool permission matrix where Ask actually asks (Allow/Deny buttons in the chat; automations treat Ask as Deny), shell deny-patterns enforced before anything runs, web egress allow/block domain rules, an append-only audit trail of every command/fetch/edit with its decision, and a one-flip automations kill-switch
- All enforced in the agent process itself — the gap every OpenClaw security post-mortem and "sandbox your Hermes" thread keeps pointing at
- Settings → Usage & Budgets: real numbers from the local per-run ledger — runs/steps/tokens/avg tok/s today and this week, automation share, per-project breakdown, recent runs
- Hard budgets the agent enforces every run: memory-injection cap, max loop steps, shell + ask timeouts — context can never silently grow (the 45K-tokens-per-call Hermes failure mode, by design impossible here)
ask_userpauses the run and renders the question in the timeline as choice buttons (+ optional free text); your click flows back into the blocked run and the agent continues with your answer- Works from the terminal too (
vibe runprompts inline;vibe answerfrom anywhere) and over MCP (run_eventsshows the ask, theanswertool replies)
- Facts live in CLAUDE.md / AGENTS.md (global + per-project); the Zvec index is derived, two-way synced by mtime, and rebuildable from the files alone — edit your agent's memory in any editor
- Settings → Memory → Edit files… opens the real md in-app (CLAUDE.md/AGENTS.md tabs, ⌘S to save); saving re-indexes it for recall immediately
vibe memory recallmerges memory entries with the captured run-note archive
See docs/CONTROL.md for the full protocol, knob table, and examples.
┌─────────────────────────────────────────────────────┐
│ macOS native window · Tauri v2 (Rust) │
│ ────────────────────────────────────────────────── │
│ React 18 + Vanilla JS (no bundler) │
│ ────────────────────────────────────────────────── │
│ Python agent backend │
│ ├── mlx-mtp serve (OpenAI-compatible API) │
│ ├── mlx-mtp engine (MTP / DFlash speculative) │
│ ├── SearXNG :8888 (web search, Docker) │
│ └── Firecrawl :3002 (web scrape, Docker) │
└─────────────────────────────────────────────────────┘
| Layer | Tech |
|---|---|
| Shell | Tauri v2 (Rust) — native window, IPC, dialogs |
| Frontend | React 18, Vanilla JS, no bundler |
| Backend | Python 3.12 — agent loop, tools, RAG, memory |
| Inference | mlx-mtp — quantize + native-MTP OpenAI server |
| Web tools | SearXNG + Firecrawl, self-hosted via Docker |
| Storage | JSON on disk — projects, chats, settings, memory |
User prompt
│
▼
Tauri invoke ──► commands.rs ──► python.rs
│
vibestudio_sidecar.py
│
┌───────────────────┴───────────────────┐
│ agent loop │
│ ┌─────────────────────────────────┐ │
│ │ web_search ────────────────────┼──┼──► SearXNG :8888
│ │ web_scrape ────────────────────┼──┼──► Firecrawl :3002
│ │ read_file / write_file │ │
│ └─────────────────────────────────┘ │
│ │ │
│ mlx-mtp serve ──────────────────┼──► MLX model on device
│ (SSE streaming) │ MTP | DFlash | Standard
└───────────────────────────────────────┘
│
SSE agent://step events
│
Tauri event bus
│
vibestudio-bridge.js
│
React state update
(bubble grows live, metrics appended on done)
- macOS · Apple Silicon (M1 or later)
- Python 3.11+ · mlx-mtp (pip; no PyTorch)
- Rust + Cargo (to build from source)
- Docker (optional — SearXNG + Firecrawl web tools)
git clone https://github.com/junainfinity/VibeStudio
cd VibeStudio
# Frontend vendor libs (one-time; React/Babel/xterm via npm)
bash frontend/fetch-vendor.sh
# Python backend — its own venv, no PyTorch
python3 -m venv .venv && source .venv/bin/activate
pip install "git+https://github.com/junainfinity/mlx-mtp"
pip install -r backend/requirements.txt
# macOS app bundle
cargo tauri build
# → src-tauri/target/release/bundle/macos/VibeStudio.app
# Dev mode (hot-reload frontend)
cargo tauri devOpen Settings → Model after first launch and set the Python interpreter to your .venv/bin/python and the models folder to wherever your MLX models live.
cd integrations/searxng && docker compose up -d # web search → :8888
cd integrations/firecrawl && docker compose up -d # web scrape → :3002osmQwopus-3.6-27B-Coder — a 27B Qwen3.6-based uncensored coding VLM, quantized with mlx-mtp's vision- and MTP-preserving quantizer.
- MXFP8 (~30 GB): 8-bit microscaling, MTP head + vision tower in fp16 → native MTP (default)
- MXFP4 (~16 GB): 4-bit microscaling, vision tower in fp16 → half the size, faster absolute tok/s
Quantize your own: mlx-mtp.
- mlx-mtp — the inference engine: pure-MLX MXFP4/MXFP8 quantizers (vision + MTP preserved), native-MTP speculative decoding, and an OpenAI-compatible server
Apache-2.0



