A privacy-first macOS meeting recorder and note-taker. Records system audio and microphone via ScreenCaptureKit, transcribes locally with mlx-whisper on Apple Silicon, identifies speakers with resemblyzer, captures visual context via on-device OCR, generates AI summaries and action items, and exports structured notes — no cloud required.
Built as a local, private alternative to Shadow and Granola.
- Automatic transcription — mlx-whisper runs on Apple Silicon GPU, fully offline
- Dual-stream speaker diarization — your mic is always "You"; remote speakers are clustered by resemblyzer. No HuggingFace token required, no cloud calls
- Attendee-informed speaker count — calendar attendee list tells resemblyzer exactly how many remote speakers to expect, reducing split/merge errors
- Visual context capture — periodic screenshots of the meeting window, OCR'd locally via Apple Vision framework, injected into the LLM summary prompt
- AI summaries + action items — structured notes with decisions, action items table, key sections, and outline
- Flexible export — writes markdown to any folder, or directly into an Obsidian vault with full frontmatter and wikilinks
- Calendar awareness — reads Apple Calendar via EventKit, shows upcoming meetings in the menu bar, floating alert panel 5 minutes before meetings
- Auto-record — starts recording automatically when a calendar meeting begins; auto-stops at event end
- Live transcript — real-time transcript visible in the HUD during recording
- Ask during meeting — query what's been said so far via a live query panel
- People registry — tracks known attendees across meetings, creates Obsidian People pages
- Speaker voice learning — stores resemblyzer embeddings after you assign names; auto-assigns known speakers in future meetings
- Global hotkey — `Cmd+Shift+R` toggles recording from any app
- Menu bar app — no Dock icon, runs quietly in the background
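Speaker voice learning comes down to comparing a new voice embedding with stored ones. The sketch below is a minimal illustration in plain Python, not Attend's actual code: the `match_speaker` helper, the profile dictionary, and the 0.75 threshold are all assumptions, and the toy 3-dimensional vectors stand in for real resemblyzer embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_speaker(embedding, profiles, threshold=0.75):
    """Return the best-matching known name, or None if nothing clears the threshold.

    `profiles` maps a name to a stored embedding (hypothetical structure).
    """
    best_name, best_score = None, threshold
    for name, stored in profiles.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

profiles = {"Alice": [0.9, 0.1, 0.0], "Bob": [0.0, 0.2, 0.9]}
print(match_speaker([0.8, 0.2, 0.1], profiles))  # Alice
print(match_speaker([0.5, 0.5, 0.5], profiles))  # None
```

A threshold keeps an unfamiliar voice from being forced onto the nearest known name; unmatched speakers would simply stay unlabeled until you assign them.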
- macOS 14.4+ (required for ScreenCaptureKit audio capture APIs)
- Apple Silicon Mac (M1 or later — mlx-whisper and Vision OCR use the GPU/ANE)
- Python 3.11+
- Ollama for local LLM summarization (or a Claude / OpenAI API key)
- ffmpeg (`brew install ffmpeg`)
If you downloaded a release DMG:
- Open the DMG and drag Attend to Applications
- Clone this repo (needed for the Python backend):

      git clone https://github.com/nickybmon/Attend.git
      cd Attend

- Run the setup script:

      ./setup.sh

- Launch the app — right-click → Open on first launch (bypasses Gatekeeper since the app isn't code-signed)
- Grant Screen Recording and Microphone permissions when prompted
The app starts the Python backend automatically. Make sure Ollama is running (ollama serve) for AI summaries.
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

    brew install ollama
    ollama serve          # start in a separate terminal, or set it to run at login
    ollama pull llama3.2  # or any model you prefer

    brew install ffmpeg

    ./build.sh

This also compiles the OCR helper CLI (tools/ocr_helper_bin) if not already built.

    ./run_with_backend.sh

Builds the app, starts the Python backend, and launches the menu bar app. The icon appears in your menu bar.
On first run, config.yaml is created automatically. Edit it directly or use the Settings window (menu bar icon → Settings, or ⌘,).
| Setting | Description | Default |
|---|---|---|
| `obsidian_mode` | `obsidian_folder`, `folder_export`, or `local_rest_api` | `obsidian_folder` |
| `obsidian_vault_path` | Path to your Obsidian vault or output folder | (none) |
| `llm_backend` | `ollama`, `claude`, or `openai` | `ollama` |
| `llm_model` | Ollama model name | `llama3.2` |
| `mlx_whisper_model` | Whisper model (HuggingFace repo or local path) | `mlx-community/whisper-large-v3-turbo` |
| `audio_capture_method` | `auto`, `systemdump`, `audiocap`, or `blackhole` | `auto` |
| `diarization_num_speakers` | Expected speaker count for resemblyzer (0 = use attendee count or auto-detect) | `0` |
| `screenshot_capture_enabled` | Capture meeting window screenshots for visual context | `true` |
| `screenshot_interval_seconds` | How often to capture a screenshot | `30` |
| `keep_screenshots` | Keep PNG files after OCR (default: delete after processing) | `false` |
| `create_people_stubs` | Auto-create Obsidian People pages for new attendees | `false` |
| `people_stub_folder` | Vault-relative folder for People pages | `People` |
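
As a sanity check on the settings above, the sketch below validates a few of them against the allowed values the table lists. The `validate_config` helper and the plain-dict representation are hypothetical; how Attend actually loads and validates `config.yaml` is not shown in this README.

```python
# Allowed values and defaults are copied from the settings table;
# the validation rules themselves are illustrative assumptions.
ALLOWED = {
    "obsidian_mode": {"obsidian_folder", "folder_export", "local_rest_api"},
    "llm_backend": {"ollama", "claude", "openai"},
    "audio_capture_method": {"auto", "systemdump", "audiocap", "blackhole"},
}

DEFAULTS = {
    "obsidian_mode": "obsidian_folder",
    "llm_backend": "ollama",
    "llm_model": "llama3.2",
    "audio_capture_method": "auto",
    "diarization_num_speakers": 0,
    "screenshot_capture_enabled": True,
    "screenshot_interval_seconds": 30,
    "keep_screenshots": False,
}

def validate_config(user_config):
    """Merge user settings over the defaults and return (config, list_of_errors)."""
    config = {**DEFAULTS, **user_config}
    errors = []
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            errors.append(f"{key}: {config[key]!r} is not one of {sorted(allowed)}")
    if config["diarization_num_speakers"] < 0:
        errors.append("diarization_num_speakers: must be 0 or greater")
    if config["screenshot_interval_seconds"] <= 0:
        errors.append("screenshot_interval_seconds: must be positive")
    return config, errors

config, errors = validate_config(
    {"llm_backend": "claude", "audio_capture_method": "loopback"}
)
print(config["llm_backend"])  # claude
print(errors)                 # one error, for audio_capture_method
```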
If you want to use Claude or OpenAI for summaries instead of a local Ollama model, enter your API key in Settings → LLM. Keys are stored in the macOS Keychain — they are never written to config.yaml.
You can also set the key as an environment variable before launching:
    export ANTHROPIC_API_KEY="sk-ant-..."  # for Claude
    export OPENAI_API_KEY="sk-..."         # for OpenAI
    export GEMINI_API_KEY="..."            # for Google Gemini

Settings → Export lets you choose how notes are saved:
Obsidian vault (folder) — writes markdown files directly into your Obsidian vault. "Open in Obsidian" links deep-link into the correct file. Recommended if you use Obsidian.
Folder (any location) — saves to any folder. Works without Obsidian.
Obsidian vault (REST API) — writes via the Local REST API community plugin.
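
To illustrate what a note with frontmatter and wikilinks might look like, here is a small sketch. The field names (`title`, `date`, `attendees`) and the `render_note` helper are assumptions made for illustration; the frontmatter Attend actually writes may differ.

```python
def render_note(title, date, attendees, summary):
    """Render a markdown note with YAML frontmatter and [[wikilinks]] to attendees."""
    lines = ["---", f'title: "{title}"', f"date: {date}", "attendees:"]
    # Wikilinks inside YAML are quoted; an unquoted [[...]] parses as a nested list.
    lines += [f'  - "[[{name}]]"' for name in attendees]
    lines += ["---", "", f"# {title}", "", summary, ""]
    if attendees:
        lines.append("Attendees: " + ", ".join(f"[[{name}]]" for name in attendees))
    return "\n".join(lines)

note = render_note(
    "Weekly Sync", "2025-01-15", ["Alice Chen", "Bob Ray"], "Agreed on the Q1 plan."
)
print(note)
```

This sketch does not escape double quotes in the title, which a real writer would need to handle.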
{vault}/
Meetings/
Notes/YYYY-MM-DD/
{Meeting Title}.md ← hub note: summary, decisions, action items
Transcripts/YYYY-MM-DD/
{Meeting Title} — transcript.md
Outlines/YYYY-MM-DD/
{Meeting Title} — outline.md
All paths and filename patterns are configurable in Settings.
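
The default layout above maps to paths like the ones built below. This is a sketch only: the `note_paths` helper and the example vault path are hypothetical, and the separator between title and suffix is copied from the tree above.

```python
from datetime import date
from pathlib import PurePosixPath

# U+2014 is the dash used between the title and the suffix in the tree above.
SEPARATOR = " \u2014 "

def note_paths(vault, title, day):
    """Return the hub note, transcript, and outline paths for one meeting."""
    base = PurePosixPath(vault) / "Meetings"
    stamp = day.isoformat()  # YYYY-MM-DD
    return {
        "note": base / "Notes" / stamp / f"{title}.md",
        "transcript": base / "Transcripts" / stamp / f"{title}{SEPARATOR}transcript.md",
        "outline": base / "Outlines" / stamp / f"{title}{SEPARATOR}outline.md",
    }

paths = note_paths("/Users/me/Vault", "Weekly Sync", date(2025, 1, 15))
print(paths["note"])  # /Users/me/Vault/Meetings/Notes/2025-01-15/Weekly Sync.md
```

A title containing `/` would create an extra directory level here, so a real implementation would sanitize titles first.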
The app is two independent processes communicating over HTTP and WebSocket:
    Swift app (.app)                          Python backend (FastAPI, localhost:8000)
    ─────────────────────────────────         ────────────────────────────────────────
    Menu bar + floating HUD                   mlx-whisper transcription (streaming + batch)
    ScreenCaptureKit → dual named pipes       resemblyzer speaker diarization
    ScreenshotCapture → session dir     ──►   Vision OCR (tools/ocr_helper_bin)
    ProcessManager (spawns Python)            Ollama / Claude / OpenAI summarization
    BackendService (HTTP client)        ◄──►  ObsidianWriter → vault export
    WebSocketService (live updates)           People registry (SQLite)
    EventKit calendar read                    Speaker profiles (SQLite)
    Settings UI

The Swift app handles the three things only a signed .app can do on macOS: Screen Recording permission, menu bar presence, and spawning the Python backend. All AI processing lives in Python.
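
The diagram lists a SQLite-backed People registry on the Python side. A minimal sketch of what an upsert into such a registry could look like is shown below; the table schema, column names, and the `upsert_person` helper are invented for illustration and are not Attend's actual schema.

```python
import sqlite3

def open_registry(path=":memory:"):
    """Create (if needed) a hypothetical people table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS people (
               email TEXT PRIMARY KEY,
               name TEXT NOT NULL,
               meeting_count INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return conn

def upsert_person(conn, email, name):
    """Insert a new attendee, or refresh the name and bump the count if known."""
    conn.execute(
        """INSERT INTO people (email, name) VALUES (?, ?)
           ON CONFLICT(email) DO UPDATE SET
               name = excluded.name,
               meeting_count = meeting_count + 1""",
        (email, name),
    )
    conn.commit()

conn = open_registry()
upsert_person(conn, "alice@example.com", "Alice")
upsert_person(conn, "alice@example.com", "Alice Chen")
rows = conn.execute("SELECT name, meeting_count FROM people").fetchall()
print(rows)  # [('Alice Chen', 2)]
```

Keying on email rather than display name is a design choice for this sketch, since calendar names tend to vary between invitations.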
    ScreenCaptureKit (system audio + mic, two separate named pipes)
      → mic pipe → all segments tagged "You" (no diarization needed)
      → system pipe → streaming transcription (10s chunks, live WebSocket updates)
      → on stop: batch transcription (full audio, higher accuracy)
      → resemblyzer diarization (remote speakers only; count from calendar attendees)
      → screenshot OCR (Vision framework, local, non-fatal)
      → LLM: summary (+ visual context), action items, outline, title
      → People registry upsert + vault export

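
The dual-stream idea in the flow above can be sketched in a few lines: mic segments are labeled "You" directly, system-audio segments take a label from their cluster, and the two lists are merged by start time. The `Segment` type, the `merge_streams` helper, and the "Speaker N" labels are assumptions, not Attend's data model.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    text: str
    speaker: str = ""

def merge_streams(mic_segments, system_segments, cluster_labels):
    """Tag mic segments as "You", label system segments by cluster, sort by time."""
    merged = [Segment(s.start, s.text, "You") for s in mic_segments]
    for seg, cluster in zip(system_segments, cluster_labels):
        merged.append(Segment(seg.start, seg.text, f"Speaker {cluster + 1}"))
    return sorted(merged, key=lambda s: s.start)

mic = [Segment(0.0, "Hi everyone."), Segment(9.5, "Sounds good.")]
system = [Segment(2.1, "Hello!"), Segment(5.0, "Let's start.")]
transcript = merge_streams(mic, system, cluster_labels=[0, 1])
for seg in transcript:
    print(f"[{seg.start:5.1f}] {seg.speaker}: {seg.text}")
```

Because the mic stream never goes through clustering, the local speaker cannot be confused with a remote one, which is the main benefit of keeping the two pipes separate.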
All audio processing, transcription, speaker diarization, and OCR run entirely on your Mac — nothing leaves your machine unless you explicitly configure a cloud LLM backend (Claude, OpenAI, or Gemini). Even then, only the meeting transcript is sent to the LLM API; audio never leaves the device.
API keys are stored in the macOS Keychain, not in config.yaml.
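
The environment-variable route can be sketched as a simple lookup. The variable names come from the export examples earlier in this README; the `api_key_for` helper and the backend-to-variable mapping are assumptions, and the Keychain lookup the app performs is not modeled here.

```python
import os

# Variable names as shown in the export examples; the mapping itself is assumed.
ENV_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

def api_key_for(backend, environ=os.environ):
    """Return the API key for a cloud backend, or None for local/unknown backends."""
    var = ENV_VARS.get(backend)
    if var is None:
        return None  # e.g. "ollama" runs locally and needs no key
    return environ.get(var) or None  # treat an empty string as "not set"

fake_env = {"ANTHROPIC_API_KEY": "sk-ant-example"}
print(api_key_for("claude", fake_env))  # sk-ant-example
print(api_key_for("ollama", fake_env))  # None
```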
The app requests the following macOS permissions on first use:
- Screen Recording — required to capture system audio and meeting window screenshots via ScreenCaptureKit
- Microphone — required to capture your microphone
- Calendar — optional, enables meeting detection and auto-record
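
With Calendar access granted, the alert and auto-record behavior described in the feature list amounts to a small decision per event. The sketch below is one way to express it; the `recording_action` function and its return values are hypothetical, and only the 5-minute lead time comes from this README.

```python
from datetime import datetime, timedelta

ALERT_LEAD = timedelta(minutes=5)  # alert panel appears 5 minutes before the meeting

def recording_action(now, start, end, is_recording):
    """Decide what to do for one calendar event at time `now`."""
    if is_recording:
        return "stop" if now >= end else "keep_recording"
    if start <= now < end:
        return "start"
    if start - ALERT_LEAD <= now < start:
        return "alert"
    return "idle"

start = datetime(2025, 1, 15, 10, 0)
end = datetime(2025, 1, 15, 10, 30)
print(recording_action(datetime(2025, 1, 15, 9, 56), start, end, False))  # alert
print(recording_action(datetime(2025, 1, 15, 10, 0), start, end, False))  # start
print(recording_action(datetime(2025, 1, 15, 10, 30), start, end, True))  # stop
```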
No audio captured
Check Screen Recording permission in System Settings → Privacy & Security → Screen Recording.
Backend won't start
Port 8000 is tried first; the app auto-tries 8001–8009. Check that your venv is activated and pip install -r requirements.txt completed successfully.
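
The port fallback can be pictured as a simple probe loop. This sketch is an assumption about the behavior rather than the app's own code, and `first_free_port` is a hypothetical helper; it can also be handy for checking by hand which ports in the range are already taken.

```python
import socket

def first_free_port(host="127.0.0.1", ports=range(8000, 8010)):
    """Return the first port in `ports` that can be bound, or None if all are taken."""
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    return None

print(first_free_port())
```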
Vault not writing
Check the backend log for [RecordingService] Obsidian: lines. Verify obsidian_vault_path points to an existing directory.
Speaker labels wrong or merged
Set diarization_num_speakers to the exact number of people in the meeting, or ensure the meeting has a calendar event with attendees so the count is derived automatically.
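
The precedence described above (explicit setting first, then calendar attendees, then auto-detection) might look like this. `resolve_num_speakers` is a hypothetical helper, and excluding the local user from the attendee count is an assumption based on the mic stream always being "You".

```python
def resolve_num_speakers(configured, attendees, local_user=None):
    """Return the expected number of remote speakers, or None to auto-detect.

    configured: value of diarization_num_speakers (0 means "not set").
    attendees:  calendar attendee names, possibly empty.
    local_user: your own name, excluded because the mic stream is already "You".
    """
    if configured > 0:
        return configured
    remote = [name for name in attendees if name != local_user]
    return len(remote) if remote else None

print(resolve_num_speakers(3, ["Alice", "Bob"]))              # 3
print(resolve_num_speakers(0, ["Me", "Alice", "Bob"], "Me"))  # 2
print(resolve_num_speakers(0, []))                            # None
```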
No visual context in summary
Check that screenshot_capture_enabled: true in config. During a recording, verify sessions/<id>/screenshots/ is being populated. After processing, the backend log shows Screen OCR: N context entries.
Mic pipe not connecting (all speakers show as "Others")
At recording start, the backend log should show Reading mic-only audio from pipe: /tmp/.... If missing, the dual-pipe path didn't connect — audio capture fell back to single-stream mode.
App freezes
    ./scripts/capture_freeze.sh   # captures main-thread stack trace
    ./scripts/stop_all.sh         # force-quit everything

    # Python syntax check
    source venv/bin/activate
    python -m py_compile src/*.py api/*.py

    # Swift build check
    cd "Swift App/LocalTranscriptionApp"
    xcodebuild -scheme LocalTranscriptionApp -configuration Debug build 2>&1 | grep -E "error:|BUILD"

    # Start backend only (no Swift app)
    source venv/bin/activate
    python backend_server.py --port 8000

    # Stop everything
    ./scripts/stop_all.sh

See CLAUDE.md for full codebase guidance and architecture details, and ROADMAP.md for the feature roadmap.
| Dependency | Install | Purpose |
|---|---|---|
| Ollama | `brew install ollama` | Local LLM for summarization |
| ffmpeg | `brew install ffmpeg` | Audio conversion (PCM → WAV) |
| mlx-whisper | via `requirements.txt` | Apple Silicon transcription |
| resemblyzer | via `requirements.txt` | Local speaker diarization (no token) |
| scikit-learn | via `requirements.txt` | Clustering for resemblyzer |

Python packages are in requirements.txt. Install with pip install -r requirements.txt.
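
For reference, converting raw PCM to WAV with ffmpeg from Python might look like the sketch below. The 16 kHz mono signed 16-bit input format is an assumption (this README does not state the capture format), and `pcm_to_wav_command` and `convert` are hypothetical helpers; the ffmpeg flags themselves are standard.

```python
import subprocess

def pcm_to_wav_command(pcm_path, wav_path, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list for a raw s16le PCM to WAV conversion."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-f", "s16le",             # input is raw signed 16-bit little-endian PCM
        "-ar", str(sample_rate),   # input sample rate in Hz
        "-ac", str(channels),      # input channel count
        "-i", str(pcm_path),
        str(wav_path),
    ]

def convert(pcm_path, wav_path):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(pcm_to_wav_command(pcm_path, wav_path), check=True)

command = pcm_to_wav_command("audio.pcm", "audio.wav")
print(" ".join(command))
```

The input options (`-f`, `-ar`, `-ac`) must come before `-i`, because raw PCM carries no header for ffmpeg to read them from.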