Attend

A privacy-first macOS meeting recorder and note-taker. Records system audio and microphone via ScreenCaptureKit, transcribes locally with mlx-whisper on Apple Silicon, identifies speakers with resemblyzer, captures visual context via on-device OCR, generates AI summaries and action items, and exports structured notes — no cloud required.

Built as a local, private alternative to Shadow and Granola.

Features

Automatic transcription — mlx-whisper runs on Apple Silicon GPU, fully offline
Dual-stream speaker diarization — your mic is always "You"; remote speakers are clustered by resemblyzer. No HuggingFace token required, no cloud calls
Attendee-informed speaker count — calendar attendee list tells resemblyzer exactly how many remote speakers to expect, reducing split/merge errors
Visual context capture — periodic screenshots of the meeting window, OCR'd locally via Apple Vision framework, injected into the LLM summary prompt
AI summaries + action items — structured notes with decisions, action items table, key sections, and outline
Flexible export — writes markdown to any folder, or directly into an Obsidian vault with full frontmatter and wikilinks
Calendar awareness — reads Apple Calendar via EventKit, shows upcoming meetings in the menu bar, floating alert panel 5 minutes before meetings
Auto-record — starts recording automatically when a calendar meeting begins; auto-stops at event end
Live transcript — real-time transcript visible in the HUD during recording
Ask during meeting — query what's been said so far via a live query panel
People registry — tracks known attendees across meetings, creates Obsidian People pages
Speaker voice learning — stores resemblyzer embeddings after you assign names; auto-assigns known speakers in future meetings
Global hotkey — Cmd+Shift+R toggles recording from any app
Menu bar app — no Dock icon, runs quietly in the background
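
The speaker voice learning feature above can be pictured as nearest-profile matching over stored embeddings. The following is a minimal sketch under stated assumptions, not the project's actual code: the function names, the plain-list embedding format, and the `0.75` similarity threshold are all hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_speaker(
    embedding: list[float],
    profiles: dict[str, list[float]],
    threshold: float = 0.75,  # hypothetical cut-off, not taken from the project
) -> Optional[str]:
    """Return the best-matching known speaker, or None if nobody clears the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, stored in profiles.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A cluster whose embedding matches no stored profile returns `None` and would stay unnamed until you assign a name by hand.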

Requirements

macOS 14.4+ (required for ScreenCaptureKit audio capture APIs)
Apple Silicon Mac (M1 or later — mlx-whisper and Vision OCR use the GPU/ANE)
Python 3.11+
Ollama for local LLM summarization (or a Claude / OpenAI API key)
ffmpeg (brew install ffmpeg)
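
Before running setup, the requirements above can be checked with a few lines of Python. This is an illustrative sketch using only the standard library; the function name and result keys are hypothetical, and it only checks that the `ffmpeg` and `ollama` executables are on `PATH`, not that they work. Note that a Python interpreter running under Rosetta may report `x86_64` even on Apple Silicon.

```python
import platform
import shutil
import sys


def macos_at_least(major: int, minor: int) -> bool:
    """True if running on macOS with at least the given version."""
    version = platform.mac_ver()[0]  # empty string on non-macOS systems
    if platform.system() != "Darwin" or not version:
        return False
    parts = version.split(".")
    found = (int(parts[0]), int(parts[1]) if len(parts) > 1 else 0)
    return found >= (major, minor)


def check_prerequisites() -> dict[str, bool]:
    """Report which of the listed requirements appear to be satisfied."""
    return {
        "macos_14_4": macos_at_least(14, 4),
        "apple_silicon": platform.machine() == "arm64",
        "python_3_11": sys.version_info >= (3, 11),
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ollama": shutil.which("ollama") is not None,  # optional with a cloud API key
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'MISSING':8}{name}")
```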

Quick Install (from Release)

If you downloaded a release DMG:

1. Open the DMG and drag Attend to Applications.

2. Clone this repo (needed for the Python backend):

   ```
   git clone https://github.com/nickybmon/Attend.git
   cd Attend
   ```

3. Run the setup script:

   ```
   ./setup.sh
   ```

4. Launch the app. On first launch, right-click → Open (this bypasses Gatekeeper, since the app isn't code-signed).

5. Grant Screen Recording and Microphone permissions when prompted.

The app starts the Python backend automatically. Make sure Ollama is running (`ollama serve`) for AI summaries.

Setup (from source)

1. Python environment

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

2. Ollama (local LLM)

```
brew install ollama
ollama serve          # start in a separate terminal, or set it to run at login
ollama pull llama3.2  # or any model you prefer
```

3. ffmpeg

```
brew install ffmpeg
```

4. Build the Swift app

```
./build.sh
```

This also compiles the OCR helper CLI (tools/ocr_helper_bin) if not already built.

5. Launch everything

```
./run_with_backend.sh
```

This script builds the app, starts the Python backend, and launches the menu bar app. The Attend icon then appears in your menu bar.

Configuration

On first run, config.yaml is created automatically. Edit it directly or use the Settings window (menu bar icon → Settings, or ⌘,).

Key settings

| Setting | Description | Default |
| --- | --- | --- |
| `obsidian_mode` | `obsidian_folder`, `folder_export`, or `local_rest_api` | `obsidian_folder` |
| `obsidian_vault_path` | Path to your Obsidian vault or output folder | — |
| `llm_backend` | `ollama`, `claude`, or `openai` | `ollama` |
| `llm_model` | Ollama model name | `llama3.2` |
| `mlx_whisper_model` | Whisper model (HuggingFace repo or local path) | `mlx-community/whisper-large-v3-turbo` |
| `audio_capture_method` | `auto`, `systemdump`, `audiocap`, or `blackhole` | `auto` |
| `diarization_num_speakers` | Expected speaker count for resemblyzer (0 = use attendee count or auto-detect) | `0` |
| `screenshot_capture_enabled` | Capture meeting window screenshots for visual context | `true` |
| `screenshot_interval_seconds` | How often to capture a screenshot, in seconds | `30` |
| `keep_screenshots` | Keep PNG files after OCR (default: delete after processing) | `false` |
| `create_people_stubs` | Auto-create Obsidian People pages for new attendees | `false` |
| `people_stub_folder` | Vault-relative folder for People pages | `People` |
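
To make the defaults concrete, here is a small sketch of how user settings could be layered over the defaults listed above and checked against the allowed values. It is an illustration only: `resolve_settings` is a hypothetical helper, it takes an already-parsed dictionary rather than reading config.yaml, and the real backend may validate differently.

```python
# Defaults and allowed values copied from the key settings listed above.
# obsidian_vault_path has no default, so it is not included here.
DEFAULTS = {
    "obsidian_mode": "obsidian_folder",
    "llm_backend": "ollama",
    "llm_model": "llama3.2",
    "mlx_whisper_model": "mlx-community/whisper-large-v3-turbo",
    "audio_capture_method": "auto",
    "diarization_num_speakers": 0,
    "screenshot_capture_enabled": True,
    "screenshot_interval_seconds": 30,
    "keep_screenshots": False,
    "create_people_stubs": False,
    "people_stub_folder": "People",
}

ALLOWED = {
    "obsidian_mode": {"obsidian_folder", "folder_export", "local_rest_api"},
    "llm_backend": {"ollama", "claude", "openai"},
    "audio_capture_method": {"auto", "systemdump", "audiocap", "blackhole"},
}


def resolve_settings(user: dict) -> dict:
    """Overlay user settings on the defaults and reject unknown enum values."""
    settings = {**DEFAULTS, **user}
    for key, allowed in ALLOWED.items():
        if settings[key] not in allowed:
            raise ValueError(f"{key}={settings[key]!r} is not one of {sorted(allowed)}")
    return settings
```

For example, `resolve_settings({"llm_backend": "claude"})` keeps every other default, while a misspelled `audio_capture_method` fails immediately instead of surfacing later during a recording.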

Cloud LLM backends (optional)

If you want to use Claude or OpenAI for summaries instead of a local Ollama model, enter your API key in Settings → LLM. Keys are stored in the macOS Keychain — they are never written to config.yaml.

You can also set the key as an environment variable before launching:

```
export ANTHROPIC_API_KEY="sk-ant-..."   # for Claude
export OPENAI_API_KEY="sk-..."           # for OpenAI
export GEMINI_API_KEY="..."              # for Google Gemini
```
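
As a rough illustration of the environment-variable route, the sketch below maps a backend name to its variable and reads the key. The helper name and mapping are hypothetical; the Keychain lookup is not shown, and Gemini is left out because its `llm_backend` value is not given above.

```python
import os
from typing import Mapping, Optional

# Backend names come from the llm_backend setting; variable names from the exports above.
ENV_VAR_FOR_BACKEND = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def api_key_from_env(backend: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API key for a cloud backend, or None if it is unset or not needed."""
    env = os.environ if environ is None else environ
    var = ENV_VAR_FOR_BACKEND.get(backend)
    if var is None:
        return None  # e.g. "ollama" runs locally and needs no key
    value = env.get(var, "").strip()
    return value or None
```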

Export Options

Settings → Export lets you choose how notes are saved:

Obsidian vault (folder) — writes markdown files directly into your Obsidian vault. "Open in Obsidian" links deep-link into the correct file. Recommended if you use Obsidian.

Folder (any location) — saves to any folder. Works without Obsidian.

Obsidian vault (REST API) — writes via the Local REST API community plugin.

Output structure

```
{vault}/
  Meetings/
    Notes/YYYY-MM-DD/
      {Meeting Title}.md              ← hub note: summary, decisions, action items
    Transcripts/YYYY-MM-DD/
      {Meeting Title} — transcript.md
    Outlines/YYYY-MM-DD/
      {Meeting Title} — outline.md
```

All paths and filename patterns are configurable in Settings.
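
The default layout can be reproduced with `pathlib`, which is handy when scripting against exported notes. This is a sketch of the default pattern only: `note_paths` is a hypothetical helper, and replacing `/` in titles is an assumption about sanitization, not documented behavior.

```python
from datetime import date
from pathlib import Path


def note_paths(vault: Path, title: str, day: date) -> dict[str, Path]:
    """Build the three default output paths for one meeting."""
    safe_title = title.replace("/", "-")  # assumption: keep titles from creating subfolders
    stamp = day.isoformat()  # YYYY-MM-DD
    base = vault / "Meetings"
    return {
        "note": base / "Notes" / stamp / f"{safe_title}.md",
        "transcript": base / "Transcripts" / stamp / f"{safe_title} — transcript.md",
        "outline": base / "Outlines" / stamp / f"{safe_title} — outline.md",
    }
```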

How It Works

The app is two independent processes communicating over HTTP and WebSocket:

```
Swift app (.app)                     Python backend (FastAPI, localhost:8000)
─────────────────────────────────    ────────────────────────────────────────
Menu bar + floating HUD              mlx-whisper transcription (streaming + batch)
ScreenCaptureKit → dual named pipes  resemblyzer speaker diarization
ScreenshotCapture → session dir ──►  Vision OCR (tools/ocr_helper_bin)
ProcessManager (spawns Python)       Ollama / Claude / OpenAI summarization
BackendService (HTTP client)  ◄──►   ObsidianWriter → vault export
WebSocketService (live updates)      People registry (SQLite)
EventKit calendar read               Speaker profiles (SQLite)
Settings UI
```

The Swift app handles the three things only a signed .app can do on macOS: Screen Recording permission, menu bar presence, and spawning the Python backend. All AI processing lives in Python.

Audio + processing pipeline

```
ScreenCaptureKit (system audio + mic, two separate named pipes)
  → mic pipe → all segments tagged "You" (no diarization needed)
  → system pipe → streaming transcription (10s chunks, live WebSocket updates)
  → on stop: batch transcription (full audio, higher accuracy)
  → resemblyzer diarization (remote speakers only; count from calendar attendees)
  → screenshot OCR (Vision framework, local, non-fatal)
  → LLM: summary (+ visual context), action items, outline, title
  → People registry upsert + vault export
```
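
The streaming step reads audio from a pipe and hands it to the transcriber in 10-second chunks. The sketch below shows one way to do that chunking. The audio format (16 kHz, mono, 16-bit PCM) and the function name are assumptions made for illustration, not details taken from the project.

```python
import io
from typing import BinaryIO, Iterator

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
BYTES_PER_SAMPLE = 2   # assumed 16-bit mono PCM
CHUNK_SECONDS = 10     # chunk length named in the pipeline above


def iter_chunks(stream: BinaryIO, chunk_seconds: int = CHUNK_SECONDS) -> Iterator[bytes]:
    """Yield fixed-duration PCM chunks from a stream; the last chunk may be shorter."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_seconds
    while True:
        buf = bytearray()
        # A pipe can return short reads, so keep reading until the chunk is full.
        while len(buf) < size:
            data = stream.read(size - len(buf))
            if not data:  # writer closed the pipe
                break
            buf.extend(data)
        if not buf:
            return
        yield bytes(buf)
        if len(buf) < size:
            return


if __name__ == "__main__":
    # 25 seconds of silence stands in for the system-audio pipe.
    fake_pipe = io.BytesIO(bytes(SAMPLE_RATE * BYTES_PER_SAMPLE * 25))
    for i, chunk in enumerate(iter_chunks(fake_pipe)):
        print(f"chunk {i}: {len(chunk)} bytes")
```

Each yielded chunk would go to the streaming transcriber, while the full recording is kept for the more accurate batch pass on stop.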

Privacy

All audio processing, transcription, speaker diarization, and OCR run entirely on your Mac. Nothing leaves your machine unless you explicitly configure a cloud LLM backend (Claude, OpenAI, or Gemini). Even then, only text is sent to the LLM API: the meeting transcript and, when screenshot capture is enabled, the OCR'd screen text that is injected into the summary prompt. Audio never leaves the device.

API keys are stored in the macOS Keychain, not in config.yaml.

Permissions

The app requests the following macOS permissions on first use:

Screen Recording — required to capture system audio and meeting window screenshots via ScreenCaptureKit
Microphone — required to capture your microphone
Calendar — optional, enables meeting detection and auto-record

Troubleshooting

No audio captured

Check Screen Recording permission in System Settings → Privacy & Security → Screen Recording.

Backend won't start

Port 8000 is tried first; if it is unavailable, the app automatically tries 8001–8009. Check that your venv is activated and that `pip install -r requirements.txt` completed successfully.
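
To see which port in that range is currently free, a quick standard-library probe like the one below can help. It mirrors the fallback order described above but is not the app's own code, and `first_free_port` is a hypothetical name.

```python
import socket
from typing import Iterable, Optional


def first_free_port(
    host: str = "127.0.0.1",
    ports: Iterable[int] = range(8000, 8010),  # 8000 first, then 8001-8009
) -> Optional[int]:
    """Return the first port that can be bound on host, or None if all are taken."""
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port in use or not permitted, try the next one
            return port
    return None


if __name__ == "__main__":
    print(first_free_port())
```

If this prints `None`, every port from 8000 to 8009 is occupied, and `./scripts/stop_all.sh` may clear stale backend processes.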

Vault not writing

Check the backend log for `[RecordingService] Obsidian:` lines. Verify that `obsidian_vault_path` points to an existing directory.

Speaker labels wrong or merged

Set `diarization_num_speakers` to the exact number of people in the meeting, or ensure the meeting has a calendar event with attendees so the count is derived automatically.
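
The precedence described here (explicit setting first, then calendar attendees, then auto-detect) can be written out as a small function. This is a sketch of that rule only. The function name is hypothetical, and whether the attendee list includes you is an assumption exposed as a parameter, since the local microphone is always labeled "You" and only remote speakers are clustered.

```python
from typing import Optional


def effective_speaker_count(
    configured: int,
    attendee_count: Optional[int],
    attendees_include_you: bool = True,  # assumption: the calendar invite lists you too
) -> Optional[int]:
    """Pick the speaker count to hand to diarization; None means auto-detect."""
    if configured > 0:
        return configured  # an explicit diarization_num_speakers always wins
    if attendee_count is not None:
        remote = attendee_count - 1 if attendees_include_you else attendee_count
        if remote > 0:
            return remote
    return None  # no usable hint: let clustering estimate the count
```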

No visual context in summary

Check that `screenshot_capture_enabled: true` is set in the config. During a recording, verify that `sessions/<id>/screenshots/` is being populated. After processing, the backend log shows `Screen OCR: N context entries`.

Mic pipe not connecting (all speakers show as "Others")

At recording start, the backend log should show `Reading mic-only audio from pipe: /tmp/...`. If that line is missing, the dual-pipe path didn't connect and audio capture fell back to single-stream mode.

App freezes

```
./scripts/capture_freeze.sh   # captures main-thread stack trace
./scripts/stop_all.sh         # force-quit everything
```

Development

```
# Python syntax check
source venv/bin/activate
python -m py_compile src/*.py api/*.py

# Swift build check
cd "Swift App/LocalTranscriptionApp"
xcodebuild -scheme LocalTranscriptionApp -configuration Debug build 2>&1 | grep -E "error:|BUILD"

# Start backend only (no Swift app)
source venv/bin/activate
python backend_server.py --port 8000

# Stop everything
./scripts/stop_all.sh
```

See CLAUDE.md for full codebase guidance and architecture details, and ROADMAP.md for the feature roadmap.

External Dependencies

| Dependency | Install | Purpose |
| --- | --- | --- |
| Ollama | `brew install ollama` | Local LLM for summarization |
| ffmpeg | `brew install ffmpeg` | Audio conversion (PCM → WAV) |
| mlx-whisper | via `requirements.txt` | Apple Silicon transcription |
| resemblyzer | via `requirements.txt` | Local speaker diarization (no token) |
| scikit-learn | via `requirements.txt` | Clustering for resemblyzer |

Python packages are in requirements.txt. Install with pip install -r requirements.txt.
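
For reference, the PCM → WAV conversion that ffmpeg is listed for can be driven from Python roughly as follows. This is a hedged sketch: the helper names are hypothetical, and the input format (signed 16-bit little-endian, 16 kHz, mono) is an assumption rather than the project's documented capture format.

```python
import subprocess
from pathlib import Path


def pcm_to_wav_command(
    pcm_path: Path,
    wav_path: Path,
    sample_rate: int = 16_000,  # assumed capture rate
    channels: int = 1,          # assumed mono
) -> list[str]:
    """Build the ffmpeg argument list for converting raw PCM to WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-f", "s16le",            # raw input: signed 16-bit little-endian
        "-ar", str(sample_rate),  # input sample rate
        "-ac", str(channels),     # input channel count
        "-i", str(pcm_path),
        str(wav_path),
    ]


def convert(pcm_path: Path, wav_path: Path) -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(pcm_to_wav_command(pcm_path, wav_path), check=True)
```

Raw PCM carries no header, so the format flags must come before `-i`; ffmpeg has no other way to know how to interpret the bytes.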
