Vox

Voice input reimagined — speak in any language, type in any language.

A macOS menu-bar voice input app built with Makepad and Rust.

Press Option, speak, release — your words appear wherever the cursor is.

Platform: macOS (Apple Silicon) only. See Roadmap for cross-platform plans.

中文文档

Features

Press-to-talk — Hold Option key to record, release to transcribe and inject text
High-quality ASR — Powered by Qwen3-ASR (CER 5.88 on Chinese, 30+ languages)
LLM Refinement — Optional post-processing to fix recognition errors, translate, or convert text style
Real-time translation — Speak Chinese, output English (or any supported language)
Classical Chinese mode — Speak modern Chinese, output 文言文
Transparent capsule UI — Floating status indicator with pulse animation, fully transparent background
Multi-language — Chinese, English, Japanese, Korean, Traditional Chinese, Classical Chinese
Menu bar app — Lives in your menu bar, no Dock icon (when bundled)
Privacy first — Audio processed locally via OminiX-MLX, LLM refinement configurable (local or cloud)

Architecture

┌─────────────────────────────────────┐
│            Vox (Makepad 2.0)        │
│                                     │
│  Menu Bar ←→ Capsule ←→ Settings   │
│       ↕           ↕         ↕       │
│   CGEvent    Audio I/O    Config    │
│     Tap      (16kHz)    (~/.config) │
└──────┬────────────┬─────────────────┘
       │            │
       ▼            ▼
  macos-sys     ominix-api
  (ObjC FFI)   (HTTP :18080)
                    │
              ┌─────┴─────┐
              │ Qwen3-ASR │  ← local MLX inference
              │ LLM (opt) │  ← local or cloud API
              └───────────┘

Crate	Purpose
`macos-sys`	macOS FFI — CGEvent tap, NSStatusBar, clipboard, input source, key simulation
`app` (vox)	Makepad 2.0 UI — capsule window, settings, audio capture, HTTP client

Quick Start

Prerequisites

macOS 14.0+ on Apple Silicon (M1/M2/M3/M4)
Rust 1.82+
OminiX-API running locally
Accessibility permission for the terminal (System Settings → Privacy → Accessibility)

1. Start the ASR service

cd /path/to/OminiX-API
PORT=18080 ASR_MODEL_DIR=~/.OminiX/models/qwen3-asr-0.6b cargo run --release

2. Run Vox

cd /path/to/vox
cargo run -p vox

3. Use it

Look for MIC in the menu bar
Hold left Option key and speak
Release — text appears at your cursor

With LLM Refinement (translation/correction)

MOONSHOT_API_KEY=sk-your-key cargo run -p vox

Or configure any OpenAI-compatible API in Settings (MIC → LLM Refinement → Settings).

Language Modes

Menu Selection	ASR Language	LLM Action	Example
简体中文	Chinese	Correct typos	配森 → Python
English	Chinese/English	Translate to English	你好 → Hello
繁體中文	Chinese	Convert to Traditional	语音输入 → 語音輸入
日本語	Japanese	Preserve Japanese	そのまま出力
한국어	Korean	Preserve Korean	그대로 출력
文言文	Chinese	Convert to Classical	今天天气好 → 今日天朗气清

Translation and style conversion require LLM Refinement enabled (needs API key).

Configuration

Config file: ~/.config/vox/config.json

Environment Variables

Variable	Description
`MOONSHOT_API_KEY`	Kimi API key (auto-configures URL and model)
`VOICE_INPUT_LLM_API_KEY`	Any OpenAI-compatible API key
`VOICE_INPUT_LLM_API_URL`	LLM API base URL
`VOICE_INPUT_LLM_MODEL`	LLM model name
`VOICE_INPUT_API_URL`	OminiX-API base URL (default: `http://localhost:18080`)

Build

# Development
cargo run -p vox

# Release
cargo build --release -p vox

# macOS .app bundle (with LSUIElement — no Dock icon)
make bundle
# Output: target/Vox.app

Tech Stack

UI Framework: Makepad 2.0 — GPU-accelerated native UI with Splash DSL
ASR Engine: Qwen3-ASR via OminiX-API — pure Rust MLX inference
LLM: Any OpenAI-compatible API (Kimi, DeepSeek, OpenAI, local models)
macOS Integration: Raw ObjC FFI via makepad_objc_sys — CGEvent tap, NSStatusBar, NSPasteboard, TIS input sources

Platform Support

Platform	Status	Notes
macOS (Apple Silicon)	Supported	Full functionality
macOS (Intel)	Untested	Makepad supports it, but OminiX-MLX requires Apple Silicon
Windows	Planned	Requires platform-sys crate for hotkey/tray/clipboard
Linux	Planned	Requires platform-sys crate + alternative ASR backend
Web/WASM	Not planned	Global hotkey and clipboard injection not possible

Roadmap

v0.2 — Polish & Stability

Hide capsule/settings windows on startup (proper window lifecycle)
Waveform animation driven by real-time audio RMS
Entrance/exit animations for capsule (spring + fade)
Menu bar icon using proper SF Symbol or SVG instead of text "MIC"
Clean up debug logging and clippy warnings
.app bundle with code signing

v0.3 — Cross-Platform Foundation

Extract macos-sys into platform-sys trait abstraction
- trait HotkeyMonitor — global hotkey detection
- trait SystemTray — tray icon and menu
- trait Clipboard — read/write clipboard
- trait KeyInjector — simulate keystrokes
- trait InputSourceManager — detect/switch input methods
Windows backend (windows-sys) — RegisterHotKey, Shell_NotifyIcon, SendInput
Linux backend (linux-sys) — XGrabKey, libappindicator, xdotool/wtype

v0.4 — Flexible ASR Backend

ASR backend abstraction (not tied to OminiX-MLX)
Remote API mode — use any OpenAI Whisper-compatible endpoint
Whisper.cpp local backend for non-Apple-Silicon platforms
Configurable ASR endpoint in Settings UI

v0.5 — Enhanced UX

Streaming transcription (show text as you speak)
Custom hotkey configuration UI
Auto-start on login
Update checker
Localized UI (English/Chinese/Japanese)

Future

Plugin system for custom text transformations
Voice commands (not just dictation)
Multi-model ASR (auto-select best model per language)

License

AGPL-3.0 — See LICENSE for details.

Credits

Makepad — The UI framework
OminiX-MLX — ML inference on Apple Silicon
OminiX-API — OpenAI-compatible API server

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
app		app
docs		docs
macos-sys		macos-sys
specs		specs
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
DESIGN.md		DESIGN.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
README_CN.md		README_CN.md
UNSAFE-REVIEW.md		UNSAFE-REVIEW.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vox

Features

Architecture

Quick Start

Prerequisites

1. Start the ASR service

2. Run Vox

3. Use it

With LLM Refinement (translation/correction)

Language Modes

Configuration

Environment Variables

Build

Tech Stack

Platform Support

Roadmap

v0.2 — Polish & Stability

v0.3 — Cross-Platform Foundation

v0.4 — Flexible ASR Backend

v0.5 — Enhanced UX

Future

License

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Vox

Features

Architecture

Quick Start

Prerequisites

1. Start the ASR service

2. Run Vox

3. Use it

With LLM Refinement (translation/correction)

Language Modes

Configuration

Environment Variables

Build

Tech Stack

Platform Support

Roadmap

v0.2 — Polish & Stability

v0.3 — Cross-Platform Foundation

v0.4 — Flexible ASR Backend

v0.5 — Enhanced UX

Future

License

Credits

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages