Zora is a small macOS voice companion to Katra. The first milestone is a reliable Transcribe Mode: hold a global hotkey, speak, release, and have the transcript inserted into the app you were using.
This repo contains the first end-to-end vertical slice for Transcribe Mode:
- Global hold-to-talk hotkey with persisted function-key selection
- Minimal floating overlay near the bottom-center of the screen
- Live speech transcription with Apple Speech
- Release-to-finalize flow
- Automatic text insertion into the previously focused app when Accessibility access is granted
- Clipboard fallback when insertion is unavailable
- Small settings window for hotkey selection, microphone selection, and permission status
- Vocabulary boosting for technical words and project-specific terms
- Experimental Command Mode on F19 backed by a local Ollama model
- Built-in macOS voice playback for spoken command replies
- Native launcher commands for opening apps, folders, URLs, and web searches
- A first command-generation skill that writes shell commands into the focused terminal or copies them to the clipboard
- A repo-aware git commit drafting skill that pastes a conventional git commit -m ... command into the current prompt
- User-defined custom skills that can be created, edited, and invoked by name
- AI-assisted custom skill generation from a plain-language request in Settings or command mode
- Lightweight local user context for command mode, including your macOS name and machine context
- Escape-to-cancel while the overlay is active
This is intentionally small. Command mode now exists as an early local-only slice, while broader provider support, tools, and richer assistant behavior are still out of scope.
For keyboards like the Moonlander, the intended setup is to map one dedicated physical key to F18 in firmware and use that as a true one-key push-to-talk trigger.
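As a rough illustration of the persisted function-key selection, the sketch below stores the chosen key in UserDefaults and falls back to F18 when nothing valid is saved. The FunctionKey cases, the HotkeyStore type, and the defaults key are hypothetical names for this example, not taken from the Zora source.

```swift
import Foundation

// Hypothetical set of one-key choices; the real picker may offer a different range.
enum FunctionKey: String, CaseIterable {
    case f13 = "F13", f14 = "F14", f15 = "F15", f16 = "F16"
    case f17 = "F17", f18 = "F18", f20 = "F20"
}

struct HotkeyStore {
    // Hypothetical defaults key, chosen only for this sketch.
    static let defaultsKey = "transcribeHotkey"
    let defaults: UserDefaults

    // Reads the saved key, falling back to F18 when nothing valid is stored.
    var selection: FunctionKey {
        get {
            defaults.string(forKey: Self.defaultsKey)
                .flatMap { FunctionKey(rawValue: $0) } ?? .f18
        }
        nonmutating set {
            defaults.set(newValue.rawValue, forKey: Self.defaultsKey)
        }
    }
}
```

F19 is left out of the hypothetical enum because the README describes it as the fixed Command Mode hotkey; whether the real picker excludes it is an assumption.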
Zora uses:
- Swift
- SwiftUI for the overlay UI
- AppKit for panel and app lifecycle control
- Carbon for a true global hotkey with press and release events
- AVFoundation for microphone capture
- Speech for transcription
- ApplicationServices for Accessibility-aware text insertion
- URLSession for local Ollama requests
- AVSpeechSynthesizer for built-in macOS voice playback
Why this stack:
- It is fully macOS-native and dependency-free.
- Carbon hotkeys are the simplest reliable way to model hold-to-talk globally without bringing in third-party libraries.
- Apple speech and audio frameworks are enough for the first working slice and keep the architecture local-first where possible.
- SwiftUI gives us a polished overlay quickly, while AppKit covers the windowing details SwiftUI alone does not.
Sources/Zora/App/ App entry point and lifecycle
Sources/Zora/Core/Hotkey/ Global shortcut handling
Sources/Zora/Core/Overlay/ Floating overlay panel controller
Sources/Zora/Core/Permissions/ Microphone and Speech permission flow
Sources/Zora/Core/TextInsertion/ Paste and clipboard fallback
Sources/Zora/Features/Transcribe/ Transcribe mode UI and orchestration
Resources/ App bundle metadata
docs/ Architecture notes
scripts/ Build/run helpers for a local app bundle
- macOS 14+
- Xcode installed for local development and app launching
The current Codex environment only has Command Line Tools active, so this repo is structured as a Swift package that also builds into a local .app bundle with a helper script. You can open Package.swift directly in Xcode, or build from the terminal.
- Open Package.swift in Xcode.
- Run the Zora executable target.
- If Xcode launches the raw executable instead of the bundle, use the terminal option below for the cleanest permission prompts.
./scripts/install_app.sh
./scripts/run_app.sh
install_app.sh builds dist/Zora.app and installs a stable copy to ~/Applications/Zora.app.
run_app.sh simply opens the installed app without reinstalling it.
Why that matters:
- macOS privacy permissions behave more predictably when the app lives at a stable path.
- Reinstalling a dev build can confuse Accessibility trust, so normal relaunches should use run_app.sh or open ~/Applications/Zora.app directly.
- Ad-hoc signing can also destabilize Accessibility trust for a fast-moving dev build, so Zora currently installs unsigned during local development.
- Zora is a background utility app, so after launch it lives in the menu bar as Zora rather than in the Dock.
Zora needs these macOS permissions:
- Microphone
- Speech Recognition
- Accessibility
- Automation for Terminal or iTerm when using repo-aware terminal skills that inspect those apps directly
Behavior by permission:
- Microphone + Speech Recognition are required for transcription.
- Accessibility is only required for automatic insertion into the previous app.
- If Accessibility is not granted, Zora still copies the transcript to the clipboard.
- If you grant Accessibility while Zora is already running, relaunch Zora once so macOS fully applies the trust state.
- Zora only asks for Terminal or iTerm automation access when a command skill needs direct terminal session context. If you're in another terminal or editor, Zora can also make a best-effort repo guess from the frontmost app and window title.
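The permission rules above boil down to a small mapping from granted permissions to available behavior. The sketch below shows one way to express that mapping; PermissionState, TranscriptDelivery, and Capabilities are hypothetical names used only for illustration.

```swift
// Hypothetical snapshot of the three core permissions.
struct PermissionState {
    var microphone: Bool
    var speechRecognition: Bool
    var accessibility: Bool
}

enum TranscriptDelivery: Equatable {
    case insertIntoPreviousApp   // Accessibility granted
    case clipboardOnly           // Accessibility missing
}

struct Capabilities: Equatable {
    let canTranscribe: Bool
    let delivery: TranscriptDelivery
}

// Transcription needs Microphone and Speech Recognition together;
// Accessibility only decides how the transcript is delivered.
func capabilities(for state: PermissionState) -> Capabilities {
    Capabilities(
        canTranscribe: state.microphone && state.speechRecognition,
        delivery: state.accessibility ? .insertIntoPreviousApp : .clipboardOnly
    )
}
```

Keeping this as a pure function makes it easy to drive both the settings status display and the capture flow from the same rules, though the real app may structure this differently.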
On first run:
- Run ./scripts/install_app.sh, then ./scripts/run_app.sh.
- Zora proactively asks for Speech Recognition and Microphone access on launch.
- Zora also prompts for Accessibility on launch so insertion works before first use.
- Look for the Zora menu bar item after launch. The app stays running there.
- After permissions are granted, hold your dedicated F18 key to transcribe.
- Focus any text field in another app.
- Hold your dedicated F18 key.
- Speak while holding the shortcut.
- Release the shortcut to finalize.
- Zora first tries direct Accessibility-based text insertion, then falls back to paste, or leaves the transcript on the clipboard if automation is unavailable.
- Press Escape while the overlay is active to cancel.
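The insertion order in the steps above (direct Accessibility insertion, then paste, then clipboard only) can be sketched as a simple fallback chain. The closures stand in for the real Accessibility, pasteboard, and key-event code, and every name here is hypothetical.

```swift
enum InsertionOutcome: Equatable {
    case insertedDirectly
    case pasted
    case leftOnClipboard
}

// Hypothetical hooks; a real implementation would wrap the
// Accessibility API, the pasteboard, and a simulated paste keystroke.
struct InsertionStrategies {
    var insertViaAccessibility: (String) -> Bool
    var copyToClipboard: (String) -> Void
    var simulatePaste: () -> Bool
}

func deliver(_ transcript: String, using strategies: InsertionStrategies) -> InsertionOutcome {
    // 1. Try direct insertion first; the clipboard stays untouched on success.
    if strategies.insertViaAccessibility(transcript) {
        return .insertedDirectly
    }
    // 2. Only now overwrite the clipboard, then attempt a paste.
    strategies.copyToClipboard(transcript)
    // 3. If the paste cannot be automated, the transcript remains on the clipboard.
    return strategies.simulatePaste() ? .pasted : .leftOnClipboard
}
```

Writing to the clipboard only after direct insertion fails keeps the user's existing clipboard contents intact in the common case.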
Open Settings… from the Zora menu bar item to:
- Choose the current hotkey from one-key function key options
- See the fixed F19 command-mode hotkey
- Choose the preferred microphone input device
- Add custom vocabulary hints for technical words, app names, or project terms
- Configure the local Ollama base URL and command model
- Create and edit custom command skills in a resizable split view
- Ask Zora to draft a new custom skill from a plain-language request
- Check live Speech Recognition, Microphone, and Accessibility status
- Jump directly to the relevant macOS privacy settings panes
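To show how the Ollama base URL and command model from Settings might feed a request, here is a minimal sketch that builds (but does not send) a URLRequest. It assumes Ollama's /api/generate endpoint with a non-streaming response; which endpoint Zora actually calls is not stated here, and OllamaSettings, GenerateBody, and makeGenerateRequest are hypothetical names.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical container for the two values configured in Settings.
struct OllamaSettings {
    var baseURL: URL      // e.g. http://localhost:11434, Ollama's default local address
    var model: String     // whatever model name the user has pulled locally
}

// Request body for Ollama's generate endpoint (assumed, not confirmed by the source).
struct GenerateBody: Codable, Equatable {
    let model: String
    let prompt: String
    let stream: Bool
}

func makeGenerateRequest(prompt: String, settings: OllamaSettings) throws -> URLRequest {
    let url = settings.baseURL
        .appendingPathComponent("api")
        .appendingPathComponent("generate")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // stream: false asks for one complete JSON reply instead of chunks.
    request.httpBody = try JSONEncoder().encode(
        GenerateBody(model: settings.model, prompt: prompt, stream: false)
    )
    return request
}
```

The resulting request could then be handed to URLSession; separating request construction from sending keeps the settings handling easy to check in isolation.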
- Make sure Ollama is running locally.
- In Settings…, confirm the Ollama base URL and model name.
- Hold F19.
- Speak your request.
- Release F19 to send it to the local model.
- Zora first tries simple native launcher actions like opening apps, folders, URLs, or web searches.
- If you ask for a shell command, Zora uses a specialized command-generation skill and inserts the result into the previously focused app.
- If you ask for a git commit message, Zora inspects the current repo, drafts a conventional commit command, and pastes only the command into the prompt without running it.
- If you ask Zora to create a skill, it drafts and saves a new custom skill you can edit in Settings.
- If you invoke one of your custom skills by name, Zora uses that skill before falling back to generic chat.
- If no native or custom skill matches, Zora falls back to the local Ollama model.
- Zora shows a short response in the overlay and reads it aloud with the built-in macOS voice.
- Press Escape at any point to cancel the current command interaction.
Examples:
- Open Notes
- Launch System Settings
- Open Downloads
- Open github.com
- Search web for Swift concurrency
- Write me a command to list files larger than 1 GB
- Give me a command that finds every .mov file under Downloads
- Generate a commit message
- Write me a git commit command
- Draft a conventional commit for these changes
- Create a skill that turns rough notes into crisp meeting summaries
- Meeting Summary summarize these notes
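The routing order described for Command Mode (native launcher actions, then built-in skills, then custom skills by name, then the model) can be sketched as a small dispatcher. The keyword checks below are an assumption made purely for illustration; how Zora actually classifies a request is not described here, and Route and route(_:customSkillNames:) are hypothetical names.

```swift
import Foundation

enum Route: Equatable {
    case launcher
    case createSkill
    case gitCommit
    case shellCommand
    case customSkill(name: String, task: String)
    case modelFallback
}

func route(_ utterance: String, customSkillNames: [String]) -> Route {
    let text = utterance.trimmingCharacters(in: .whitespacesAndNewlines)
    let lower = text.lowercased()

    // 1. Native launcher actions come first (assumed trigger phrases).
    let launcherPrefixes = ["open ", "launch ", "search web for "]
    if launcherPrefixes.contains(where: { lower.hasPrefix($0) }) {
        return .launcher
    }

    // 2. Built-in skills; "commit" is checked before "command" so that
    //    "git commit command" drafts a commit rather than a generic shell command.
    if lower.hasPrefix("create a skill") { return .createSkill }
    if lower.contains("commit") { return .gitCommit }
    if lower.contains("command") { return .shellCommand }

    // 3. Custom skills: the skill name is spoken first, then the task.
    //    Longer names are tried first so the most specific name wins.
    for name in customSkillNames.sorted(by: { $0.count > $1.count }) {
        if let match = text.range(of: name, options: [.caseInsensitive, .anchored]) {
            let task = text[match.upperBound...].trimmingCharacters(in: .whitespaces)
            return .customSkill(name: name, task: task)
        }
    }

    // 4. Nothing matched: hand the request to the local Ollama model.
    return .modelFallback
}
```

With a custom skill named Meeting Summary registered, the last example above would route to that skill with the task "summarize these notes", while an unrelated question would fall through to the model.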
- The default hotkey starts at F18, and you can remap it to another function key in Settings.
- The cleanest one-key setup is to map a dedicated keyboard key to whichever function key Zora is listening for.
- Command mode is currently local-only and Ollama-only.
- Command mode currently supports a small built-in launcher skill set before model fallback.
- The shell command skill is prompt-driven and aims to return safe, non-destructive commands, but you should still review generated commands before running them.
- Repo-aware git commit drafting prefers direct Terminal or iTerm session context, but it can also make a best-effort repo guess from the frontmost app when you're working elsewhere.
- Zora pastes only the draft git commit -m ... command. Any staging warning stays in the spoken or overlay response instead of being inserted into your shell.
- Custom skills are invoked by saying the skill name first, then the task you want that skill to handle.
- Command mode now includes a lightweight local user profile in its prompt, but it does not have long-term memory yet.
- Zora first tries direct Accessibility text replacement, then falls back to simulated paste for broader compatibility.
- Zora only overwrites the clipboard when it needs the paste fallback or clipboard-only fallback.
- The overlay becomes key during capture so Escape can cancel reliably; Zora re-activates the previous app before insertion.
- Speech recognition currently uses the system locale.
- Technical terms depend on Apple Speech recognition quality, but Zora now boosts built-in terms like GitHub and lets you add your own vocabulary hints.
- Zora also runs a small post-processing normalization pass for common vocabulary variants like git hub to GitHub before insertion.
- Command responses are intentionally short and do not use MCP tools or custom skills yet.
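The post-processing normalization pass mentioned in the notes above could look roughly like the sketch below, which rewrites known variants to their canonical spelling with case-insensitive regular expressions. Only the git hub to GitHub pair comes from this README; the VocabularyNormalizer type and the table layout are hypothetical.

```swift
import Foundation

enum VocabularyNormalizer {
    // Hypothetical variant table; user-added vocabulary could extend it.
    static let variants: [(pattern: String, canonical: String)] = [
        // Matches "git hub", "github", "Github" as whole words.
        (pattern: #"\bgit\s?hub\b"#, canonical: "GitHub"),
    ]

    // Applies each rule in turn to the transcript before insertion.
    static func normalize(_ transcript: String) -> String {
        variants.reduce(transcript) { text, rule in
            text.replacingOccurrences(
                of: rule.pattern,
                with: rule.canonical,
                options: [.regularExpression, .caseInsensitive]
            )
        }
    }
}
```

The word-boundary anchors keep the rule from firing inside longer words, so a phrase like "digit hub" is left alone.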
- Improve insertion behavior for more editable controls
- Add app icon and packaging polish
- Capture live audio level for richer overlay feedback
See docs/architecture.md for the current vertical-slice design.