VibeStudio

Your model. Your machine. No cloud. No subscriptions. No limits.

VibeStudio is a native macOS coding-agent desktop app built for Apple Silicon. Drop in your own model, point it at a codebase, and get a full coding agent — with tool use, web search, file read/write, and speculative decoding fast enough to feel instant. Everything runs on-device. Nothing leaves your machine.

Built on Tauri v2 + React, powered by the mlx-mtp inference engine.

Screenshots

Chat — Codex-style project/thread sidebar	Settings — Local model, 256K context

Decoding — Native MTP · DFlash · Speculative	Serving — OpenAI-compatible local endpoint

Why VibeStudio

	VibeStudio	LM Studio	Codex Desktop	Hermes
Fully local, no cloud	✅	✅	✗	✗
Native MTP speculative decode	✅	✗	✗	✗
DFlash block-diffusion decode	✅	✗	✗	✗
Vision (VLM)	✅	✅	✗	✗
Web search + scrape (local Docker)	✅	✗	✅	✅
Codex-style project/thread UX	✅	✗	✅	✗
Serve as OpenAI-compatible API	✅	✅	✗	✗
256K context	✅	✅	✗	✗
Full CLI + MCP control (no GUI needed)	✅	partial	✗	✅
Proactive heartbeat + IFTTT triggers	✅	✗	✅	✅
Editable markdown memory, two-way synced	✅	✗	✗	partial

Inference speed

VibeStudio's inference engine (mlx-mtp) squeezes maximum throughput out of your Apple Silicon hardware using speculative decoding on top of quantized models. Numbers on Apple M4 Max with osmQwopus-3.6-27B:

Build	Size	Notes
MXFP8 (8-bit microscaling)	~30 GB	MTP head + vision tower fp16; native MTP (accept ~0.77–0.92) — the default, highest fidelity
MXFP4 (4-bit microscaling)	~16 GB	vision tower fp16; ~27.7 tok/s vanilla on M4 Max — fastest absolute tok/s

osmQwopus-3.6-27B family · Apple M4 Max. MTP acceptance from real-weights validation; a full Coder MXFP8/MXFP4 throughput benchmark is forthcoming.

Apple Silicon is memory-bandwidth-bound, so the smaller MXFP4 build decodes faster in absolute tok/s, while MXFP8 trades some speed for higher fidelity. Both preserve the vision tower and run native MTP speculative decoding.

Quantizers and decoding engines: mlx-mtp.

Features

Inference

Native MTP — drives the model's own embedded multi-token-prediction head as a self-drafter; no external model needed; up to 1.62× faster
DFlash — block-diffusion external drafter; block size tunable (8 / 16 / 32); drafter path auto-detected from the model config
Standard + Speculative — vanilla AR and small-draft speculative also available
Context up to 256K tokens with a live slider (probed from the loaded model)
Full VLM — attach images to any prompt

Agent

Tool-use loop: web search (SearXNG, self-hosted) + web scrape (Firecrawl, self-hosted) + file read/write + shell
Per-bubble TTFT, thinking time, tokens/sec shown inline under each response
Context memory meter — filled / remaining, live
RAG — any-file knowledge index for the active project

Projects & chats

Codex-style sidebar: projects → chat threads; multiple chats per project
Hover-copy on every bubble — your messages and the agent's full step transcript copy to the clipboard as plain text
Each project maps to a local folder; git branch and path shown in the header
Rename / delete projects from the sidebar (⋯ menu)
Artifact drawer — file tree slides in from the left, lazy-loads on expand

Serving

Expose any loaded model as a local OpenAI-compatible API endpoint
Configure port, API key, max concurrent requests, and model identifier
Decoding strategy (MTP / DFlash / Standard) and block size flow through to the server's env — every client call uses the strategy you selected in Settings

Control without the GUI — `vibe` CLI + MCP

bin/vibe drives every knob: server/model lifecycle, agent runs with live step streaming, steer/queue, memory, RAG, settings, triggers — through the app's control socket when it's running (changes show up live in the GUI), headless when it isn't. vibe status → vibe server start → vibe model load --auto → vibe run "…" -p proj
vibe mcp exposes the same surface as MCP tools, so Claude Code (or any MCP client) can operate VibeStudio directly — no computer-use needed
runtime.json keeps GUI/CLI/agents agreeing on what's serving; the app adopts a CLI-started server (and vice versa) across restarts

Proactive layer — heartbeat & triggers

Settings → Automation holds every knob: heartbeat cadence/active-hours/project, the HEARTBEAT.md checklist editor, the trigger list + add form, recent firings
Heartbeat: the agent wakes every N minutes (within active hours), follows the user-editable ~/.vibestudio/HEARTBEAT.md checklist, and stays quiet (HEARTBEAT_OK) when nothing needs doing
Triggers: IFTTT-style rules — one-shot / interval / daily at HH:MM / file-glob changed / shell-probe edge — each firing a normal agent run on its project
Schedule from any chat: "remind me in 20 minutes…", "every morning at 9…", "when the build log changes…" — the agent's schedule tool writes the same trigger list (one-shots prune themselves after firing)

Computer use — drive the whole Mac (off by default)

With the Computer use capability on, the agent can see the screen (screenshots fed to the VLM as images — it's a vision model), control the mouse & keyboard (JXA/CoreGraphics + System Events, no cliclick), drive macOS apps & files via AppleScript, and take over Chrome (navigate, run JS in the tab, read/scrape) — all zero-dependency, built on tools that ship with macOS
The agent is OS-aware: macOS version, arch, shell and screen size are injected into its prompt so it uses Mac/BSD idioms
Gated behind the computer permission (off by default — it controls the whole machine), Ask-able (Allow/Deny in chat), and fully audited. Verified live on the local 27B: created & renamed a Desktop file, drove Chrome to a URL and read it, and screenshotted + described the screen

Browser automation — autonomous web agent, fully local

A separate Browser automation capability (also off by default) drives a real Chromium via Playwright + browser-use, pointed at your local model — no cloud LLM. action: "task" hands it a natural-language goal and it navigates/clicks/reads on its own; action: "run" is a deterministic scripted interaction
Isolated in its own venv so it never perturbs the MLX/mlx-mtp inference stack. Verified live: VibeStudio's agent opened Hacker News and reported the top story, end-to-end on the 27B

Privacy & Safety — guardrails with teeth

Settings → Privacy & Safety: per-tool permission matrix where Ask actually asks (Allow/Deny buttons in the chat; automations treat Ask as Deny), shell deny-patterns enforced before anything runs, web egress allow/block domain rules, an append-only audit trail of every command/fetch/edit with its decision, and a one-flip automations kill-switch
All enforced in the agent process itself — the gap every OpenClaw security post-mortem and "sandbox your Hermes" thread keeps pointing at

Usage & Budgets — see it, cap it

Settings → Usage & Budgets: real numbers from the local per-run ledger — runs/steps/tokens/avg tok/s today and this week, automation share, per-project breakdown, recent runs
Hard budgets the agent enforces every run: memory-injection cap, max loop steps, shell + ask timeouts — context can never silently grow (the 45K-tokens-per-call Hermes failure mode, by design impossible here)

Generative UI — the agent can ask you

ask_user pauses the run and renders the question in the timeline as choice buttons (+ optional free text); your click flows back into the blocked run and the agent continues with your answer
Works from the terminal too (vibe run prompts inline; vibe answer from anywhere) and over MCP (run_events shows the ask, the answer tool replies)

Memory — markdown is the database

Facts live in CLAUDE.md / AGENTS.md (global + per-project); the Zvec index is derived, two-way synced by mtime, and rebuildable from the files alone — edit your agent's memory in any editor
Settings → Memory → Edit files… opens the real md in-app (CLAUDE.md/AGENTS.md tabs, ⌘S to save); saving re-indexes it for recall immediately
vibe memory recall merges memory entries with the captured run-note archive

See docs/CONTROL.md for the full protocol, knob table, and examples.

Stack

┌─────────────────────────────────────────────────────┐
│  macOS native window  ·  Tauri v2 (Rust)            │
│  ────────────────────────────────────────────────── │
│  React 18 + Vanilla JS  (no bundler)                │
│  ────────────────────────────────────────────────── │
│  Python agent backend                               │
│    ├── mlx-mtp serve  (OpenAI-compatible API)       │
│    ├── mlx-mtp engine  (MTP / DFlash speculative)   │
│    ├── SearXNG  :8888  (web search, Docker)         │
│    └── Firecrawl  :3002  (web scrape, Docker)       │
└─────────────────────────────────────────────────────┘

Layer	Tech
Shell	Tauri v2 (Rust) — native window, IPC, dialogs
Frontend	React 18, Vanilla JS, no bundler
Backend	Python 3.12 — agent loop, tools, RAG, memory
Inference	mlx-mtp — quantize + native-MTP OpenAI server
Web tools	SearXNG + Firecrawl, self-hosted via Docker
Storage	JSON on disk — projects, chats, settings, memory

Architecture

User prompt
     │
     ▼
 Tauri invoke ──► commands.rs ──► python.rs
                                      │
                         vibestudio_sidecar.py
                                      │
                  ┌───────────────────┴───────────────────┐
                  │             agent loop                 │
                  │  ┌─────────────────────────────────┐  │
                  │  │  web_search  ────────────────────┼──┼──► SearXNG :8888
                  │  │  web_scrape  ────────────────────┼──┼──► Firecrawl :3002
                  │  │  read_file / write_file          │  │
                  │  └─────────────────────────────────┘  │
                  │                  │                     │
                  │       mlx-mtp serve  ──────────────────┼──► MLX model on device
                  │       (SSE streaming)                  │    MTP | DFlash | Standard
                  └───────────────────────────────────────┘
                                     │
                      SSE  agent://step  events
                                     │
                          Tauri event bus
                                     │
                       vibestudio-bridge.js
                                     │
                          React state update
                       (bubble grows live, metrics appended on done)

Requirements

macOS · Apple Silicon (M1 or later)
Python 3.11+ · mlx-mtp (pip; no PyTorch)
Rust + Cargo (to build from source)
Docker (optional — SearXNG + Firecrawl web tools)

Building from source

git clone https://github.com/junainfinity/VibeStudio
cd VibeStudio

# Frontend vendor libs (one-time; React/Babel/xterm via npm)
bash frontend/fetch-vendor.sh

# Python backend — its own venv, no PyTorch
python3 -m venv .venv && source .venv/bin/activate
pip install "git+https://github.com/junainfinity/mlx-mtp"
pip install -r backend/requirements.txt

# macOS app bundle
cargo tauri build
# → src-tauri/target/release/bundle/macos/VibeStudio.app

# Dev mode (hot-reload frontend)
cargo tauri dev

Open Settings → Model after first launch and set the Python interpreter to your .venv/bin/python and the models folder to wherever your MLX models live.

Web tools (Docker)

cd integrations/searxng  && docker compose up -d  # web search  → :8888
cd integrations/firecrawl && docker compose up -d  # web scrape  → :3002

Recommended model

osmQwopus-3.6-27B-Coder — a 27B Qwen3.6-based uncensored coding VLM, quantized with mlx-mtp's vision- and MTP-preserving quantizer.

MXFP8 (~30 GB): 8-bit microscaling, MTP head + vision tower in fp16 → native MTP (default)
MXFP4 (~16 GB): 4-bit microscaling, vision tower in fp16 → half the size, faster absolute tok/s

Quantize your own: mlx-mtp.

Related repos

mlx-mtp — the inference engine: pure-MLX MXFP4/MXFP8 quantizers (vision + MTP preserved), native-MTP speculative decoding, and an OpenAI-compatible server

License

Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.claude		.claude
backend		backend
bin		bin
docs		docs
frontend		frontend
integrations/searxng		integrations/searxng
skills		skills
src-tauri		src-tauri
.gitignore		.gitignore
BUILD.md		BUILD.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VibeStudio

Screenshots

Why VibeStudio

Inference speed

Features

Inference

Agent

Projects & chats

Serving

Control without the GUI — `vibe` CLI + MCP

Proactive layer — heartbeat & triggers

Computer use — drive the whole Mac (off by default)

Browser automation — autonomous web agent, fully local

Privacy & Safety — guardrails with teeth

Usage & Budgets — see it, cap it

Generative UI — the agent can ask you

Memory — markdown is the database

Stack

Architecture

Requirements

Building from source

Web tools (Docker)

Recommended model

Related repos

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VibeStudio

Screenshots

Why VibeStudio

Inference speed

Features

Inference

Agent

Projects & chats

Serving

Control without the GUI — vibe CLI + MCP

Proactive layer — heartbeat & triggers

Computer use — drive the whole Mac (off by default)

Browser automation — autonomous web agent, fully local

Privacy & Safety — guardrails with teeth

Usage & Budgets — see it, cap it

Generative UI — the agent can ask you

Memory — markdown is the database

Stack

Architecture

Requirements

Building from source

Web tools (Docker)

Recommended model

Related repos

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Control without the GUI — `vibe` CLI + MCP

Packages