Skip to content

junainfinity/VibeStudio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VibeStudio

Your model. Your machine. No cloud. No subscriptions. No limits.

VibeStudio is a native macOS coding-agent desktop app built for Apple Silicon. Drop in your own model, point it at a codebase, and get a full coding agent — with tool use, web search, file read/write, and speculative decoding fast enough to feel instant. Everything runs on-device. Nothing leaves your machine.

Built on Tauri v2 + React, powered by the mlx-mtp inference engine.


Screenshots

Chat — Codex-style project/thread sidebar Settings — Local model, 256K context
Chat Model
Decoding — Native MTP · DFlash · Speculative Serving — OpenAI-compatible local endpoint
Decoding Serving

Why VibeStudio

VibeStudio LM Studio Codex Desktop Hermes
Fully local, no cloud
Native MTP speculative decode
DFlash block-diffusion decode
Vision (VLM)
Web search + scrape (local Docker)
Codex-style project/thread UX
Serve as OpenAI-compatible API
256K context
Full CLI + MCP control (no GUI needed) partial
Proactive heartbeat + IFTTT triggers
Editable markdown memory, two-way synced partial

Inference speed

VibeStudio's inference engine (mlx-mtp) squeezes maximum throughput out of your Apple Silicon hardware using speculative decoding on top of quantized models. Numbers on Apple M4 Max with osmQwopus-3.6-27B:

Build Size Notes
MXFP8 (8-bit microscaling) ~30 GB MTP head + vision tower fp16; native MTP (accept ~0.77–0.92) — the default, highest fidelity
MXFP4 (4-bit microscaling) ~16 GB vision tower fp16; ~27.7 tok/s vanilla on M4 Max — fastest absolute tok/s

osmQwopus-3.6-27B family · Apple M4 Max. MTP acceptance from real-weights validation; a full Coder MXFP8/MXFP4 throughput benchmark is forthcoming.

Apple Silicon is memory-bandwidth-bound, so the smaller MXFP4 build decodes faster in absolute tok/s, while MXFP8 trades some speed for higher fidelity. Both preserve the vision tower and run native MTP speculative decoding.

Quantizers and decoding engines: mlx-mtp.


Features

Inference

  • Native MTP — drives the model's own embedded multi-token-prediction head as a self-drafter; no external model needed; up to 1.62× faster
  • DFlash — block-diffusion external drafter; block size tunable (8 / 16 / 32); drafter path auto-detected from the model config
  • Standard + Speculative — vanilla AR and small-draft speculative also available
  • Context up to 256K tokens with a live slider (probed from the loaded model)
  • Full VLM — attach images to any prompt

Agent

  • Tool-use loop: web search (SearXNG, self-hosted) + web scrape (Firecrawl, self-hosted) + file read/write + shell
  • Per-bubble TTFT, thinking time, tokens/sec shown inline under each response
  • Context memory meter — filled / remaining, live
  • RAG — any-file knowledge index for the active project

Projects & chats

  • Codex-style sidebar: projects → chat threads; multiple chats per project
  • Hover-copy on every bubble — your messages and the agent's full step transcript copy to the clipboard as plain text
  • Each project maps to a local folder; git branch and path shown in the header
  • Rename / delete projects from the sidebar (⋯ menu)
  • Artifact drawer — file tree slides in from the left, lazy-loads on expand

Serving

  • Expose any loaded model as a local OpenAI-compatible API endpoint
  • Configure port, API key, max concurrent requests, and model identifier
  • Decoding strategy (MTP / DFlash / Standard) and block size flow through to the server's env — every client call uses the strategy you selected in Settings

Control without the GUI — vibe CLI + MCP

  • bin/vibe drives every knob: server/model lifecycle, agent runs with live step streaming, steer/queue, memory, RAG, settings, triggers — through the app's control socket when it's running (changes show up live in the GUI), headless when it isn't. vibe statusvibe server startvibe model load --autovibe run "…" -p proj
  • vibe mcp exposes the same surface as MCP tools, so Claude Code (or any MCP client) can operate VibeStudio directly — no computer-use needed
  • runtime.json keeps GUI/CLI/agents agreeing on what's serving; the app adopts a CLI-started server (and vice versa) across restarts

Proactive layer — heartbeat & triggers

  • Settings → Automation holds every knob: heartbeat cadence/active-hours/project, the HEARTBEAT.md checklist editor, the trigger list + add form, recent firings
  • Heartbeat: the agent wakes every N minutes (within active hours), follows the user-editable ~/.vibestudio/HEARTBEAT.md checklist, and stays quiet (HEARTBEAT_OK) when nothing needs doing
  • Triggers: IFTTT-style rules — one-shot / interval / daily at HH:MM / file-glob changed / shell-probe edge — each firing a normal agent run on its project
  • Schedule from any chat: "remind me in 20 minutes…", "every morning at 9…", "when the build log changes…" — the agent's schedule tool writes the same trigger list (one-shots prune themselves after firing)

Computer use — drive the whole Mac (off by default)

  • With the Computer use capability on, the agent can see the screen (screenshots fed to the VLM as images — it's a vision model), control the mouse & keyboard (JXA/CoreGraphics + System Events, no cliclick), drive macOS apps & files via AppleScript, and take over Chrome (navigate, run JS in the tab, read/scrape) — all zero-dependency, built on tools that ship with macOS
  • The agent is OS-aware: macOS version, arch, shell and screen size are injected into its prompt so it uses Mac/BSD idioms
  • Gated behind the computer permission (off by default — it controls the whole machine), Ask-able (Allow/Deny in chat), and fully audited. Verified live on the local 27B: created & renamed a Desktop file, drove Chrome to a URL and read it, and screenshotted + described the screen

Browser automation — autonomous web agent, fully local

  • A separate Browser automation capability (also off by default) drives a real Chromium via Playwright + browser-use, pointed at your local model — no cloud LLM. action: "task" hands it a natural-language goal and it navigates/clicks/reads on its own; action: "run" is a deterministic scripted interaction
  • Isolated in its own venv so it never perturbs the MLX/mlx-mtp inference stack. Verified live: VibeStudio's agent opened Hacker News and reported the top story, end-to-end on the 27B

Privacy & Safety — guardrails with teeth

  • Settings → Privacy & Safety: per-tool permission matrix where Ask actually asks (Allow/Deny buttons in the chat; automations treat Ask as Deny), shell deny-patterns enforced before anything runs, web egress allow/block domain rules, an append-only audit trail of every command/fetch/edit with its decision, and a one-flip automations kill-switch
  • All enforced in the agent process itself — the gap every OpenClaw security post-mortem and "sandbox your Hermes" thread keeps pointing at

Usage & Budgets — see it, cap it

  • Settings → Usage & Budgets: real numbers from the local per-run ledger — runs/steps/tokens/avg tok/s today and this week, automation share, per-project breakdown, recent runs
  • Hard budgets the agent enforces every run: memory-injection cap, max loop steps, shell + ask timeouts — context can never silently grow (the 45K-tokens-per-call Hermes failure mode, by design impossible here)

Generative UI — the agent can ask you

  • ask_user pauses the run and renders the question in the timeline as choice buttons (+ optional free text); your click flows back into the blocked run and the agent continues with your answer
  • Works from the terminal too (vibe run prompts inline; vibe answer from anywhere) and over MCP (run_events shows the ask, the answer tool replies)

Memory — markdown is the database

  • Facts live in CLAUDE.md / AGENTS.md (global + per-project); the Zvec index is derived, two-way synced by mtime, and rebuildable from the files alone — edit your agent's memory in any editor
  • Settings → Memory → Edit files… opens the real md in-app (CLAUDE.md/AGENTS.md tabs, ⌘S to save); saving re-indexes it for recall immediately
  • vibe memory recall merges memory entries with the captured run-note archive

See docs/CONTROL.md for the full protocol, knob table, and examples.


Stack

┌─────────────────────────────────────────────────────┐
│  macOS native window  ·  Tauri v2 (Rust)            │
│  ────────────────────────────────────────────────── │
│  React 18 + Vanilla JS  (no bundler)                │
│  ────────────────────────────────────────────────── │
│  Python agent backend                               │
│    ├── mlx-mtp serve  (OpenAI-compatible API)       │
│    ├── mlx-mtp engine  (MTP / DFlash speculative)   │
│    ├── SearXNG  :8888  (web search, Docker)         │
│    └── Firecrawl  :3002  (web scrape, Docker)       │
└─────────────────────────────────────────────────────┘
Layer Tech
Shell Tauri v2 (Rust) — native window, IPC, dialogs
Frontend React 18, Vanilla JS, no bundler
Backend Python 3.12 — agent loop, tools, RAG, memory
Inference mlx-mtp — quantize + native-MTP OpenAI server
Web tools SearXNG + Firecrawl, self-hosted via Docker
Storage JSON on disk — projects, chats, settings, memory

Architecture

User prompt
     │
     ▼
 Tauri invoke ──► commands.rs ──► python.rs
                                      │
                         vibestudio_sidecar.py
                                      │
                  ┌───────────────────┴───────────────────┐
                  │             agent loop                 │
                  │  ┌─────────────────────────────────┐  │
                  │  │  web_search  ────────────────────┼──┼──► SearXNG :8888
                  │  │  web_scrape  ────────────────────┼──┼──► Firecrawl :3002
                  │  │  read_file / write_file          │  │
                  │  └─────────────────────────────────┘  │
                  │                  │                     │
                  │       mlx-mtp serve  ──────────────────┼──► MLX model on device
                  │       (SSE streaming)                  │    MTP | DFlash | Standard
                  └───────────────────────────────────────┘
                                     │
                      SSE  agent://step  events
                                     │
                          Tauri event bus
                                     │
                       vibestudio-bridge.js
                                     │
                          React state update
                       (bubble grows live, metrics appended on done)

Requirements

  • macOS · Apple Silicon (M1 or later)
  • Python 3.11+ · mlx-mtp (pip; no PyTorch)
  • Rust + Cargo (to build from source)
  • Docker (optional — SearXNG + Firecrawl web tools)

Building from source

git clone https://github.com/junainfinity/VibeStudio
cd VibeStudio

# Frontend vendor libs (one-time; React/Babel/xterm via npm)
bash frontend/fetch-vendor.sh

# Python backend — its own venv, no PyTorch
python3 -m venv .venv && source .venv/bin/activate
pip install "git+https://github.com/junainfinity/mlx-mtp"
pip install -r backend/requirements.txt

# macOS app bundle
cargo tauri build
# → src-tauri/target/release/bundle/macos/VibeStudio.app

# Dev mode (hot-reload frontend)
cargo tauri dev

Open Settings → Model after first launch and set the Python interpreter to your .venv/bin/python and the models folder to wherever your MLX models live.

Web tools (Docker)

cd integrations/searxng  && docker compose up -d  # web search  → :8888
cd integrations/firecrawl && docker compose up -d  # web scrape  → :3002

Recommended model

osmQwopus-3.6-27B-Coder — a 27B Qwen3.6-based uncensored coding VLM, quantized with mlx-mtp's vision- and MTP-preserving quantizer.

  • MXFP8 (~30 GB): 8-bit microscaling, MTP head + vision tower in fp16 → native MTP (default)
  • MXFP4 (~16 GB): 4-bit microscaling, vision tower in fp16 → half the size, faster absolute tok/s

Quantize your own: mlx-mtp.


Related repos

  • mlx-mtp — the inference engine: pure-MLX MXFP4/MXFP8 quantizers (vision + MTP preserved), native-MTP speculative decoding, and an OpenAI-compatible server

License

Apache-2.0

About

Local-first AI coding agent for Apple Silicon — Tauri + React, mlx-mtp inference (MTP/DFlash speculative decode), fully on-device, no cloud

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors