Attend

A privacy-first macOS meeting recorder and note-taker. Records system audio and microphone via ScreenCaptureKit, transcribes locally with mlx-whisper on Apple Silicon, identifies speakers with resemblyzer, captures visual context via on-device OCR, generates AI summaries and action items, and exports structured notes — no cloud required.

Built as a local, private alternative to Shadow and Granola.


Features

  • Automatic transcription — mlx-whisper runs on Apple Silicon GPU, fully offline
  • Dual-stream speaker diarization — your mic is always "You"; remote speakers are clustered by resemblyzer. No HuggingFace token required, no cloud calls
  • Attendee-informed speaker count — calendar attendee list tells resemblyzer exactly how many remote speakers to expect, reducing split/merge errors
  • Visual context capture — periodic screenshots of the meeting window, OCR'd locally via Apple Vision framework, injected into the LLM summary prompt
  • AI summaries + action items — structured notes with decisions, action items table, key sections, and outline
  • Flexible export — writes markdown to any folder, or directly into an Obsidian vault with full frontmatter and wikilinks
  • Calendar awareness — reads Apple Calendar via EventKit, shows upcoming meetings in the menu bar, floating alert panel 5 minutes before meetings
  • Auto-record — starts recording automatically when a calendar meeting begins; auto-stops at event end
  • Live transcript — real-time transcript visible in the HUD during recording
  • Ask during meeting — query what's been said so far via a live query panel
  • People registry — tracks known attendees across meetings, creates Obsidian People pages
  • Speaker voice learning — stores resemblyzer embeddings after you assign names; auto-assigns known speakers in future meetings
  • Global hotkey: Cmd+Shift+R toggles recording from any app
  • Menu bar app — no Dock icon, runs quietly in the background
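
The speaker voice learning feature in the list above can be pictured as a nearest-profile lookup. The sketch below is only an illustration and is not taken from Attend's code: the function names, the in-memory profile dictionary, and the 0.75 similarity threshold are all assumptions, and short lists of floats stand in for real resemblyzer embeddings.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_speaker(
    embedding: list[float],
    profiles: dict[str, list[float]],
    threshold: float = 0.75,  # hypothetical cutoff, not a documented Attend value
) -> Optional[str]:
    """Return the best-matching stored name, or None if nothing clears the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, stored in profiles.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning None when no profile is close enough leaves the cluster unnamed, so an unknown voice is not mislabeled as a known person.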

Requirements

  • macOS 14.4+ (required for ScreenCaptureKit audio capture APIs)
  • Apple Silicon Mac (M1 or later — mlx-whisper and Vision OCR use the GPU/ANE)
  • Python 3.11+
  • Ollama for local LLM summarization (or a Claude / OpenAI API key)
  • ffmpeg (brew install ffmpeg)

Quick Install (from Release)

If you downloaded a release DMG:

  1. Open the DMG and drag Attend to Applications
  2. Clone this repo (needed for the Python backend):
    git clone https://github.com/nickybmon/Attend.git
    cd Attend
  3. Run the setup script:
    ./setup.sh
  4. Launch the app — right-click → Open on first launch (bypasses Gatekeeper since the app isn't code-signed)
  5. Grant Screen Recording and Microphone permissions when prompted

The app starts the Python backend automatically. Make sure Ollama is running (ollama serve) for AI summaries.


Setup (from source)

1. Python environment

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Ollama (local LLM)

brew install ollama
ollama serve          # start in a separate terminal, or set it to run at login
ollama pull llama3.2  # or any model you prefer

3. ffmpeg

brew install ffmpeg

4. Build the Swift app

./build.sh

This also compiles the OCR helper CLI (tools/ocr_helper_bin) if not already built.

5. Launch everything

./run_with_backend.sh

This script builds the app, starts the Python backend, and launches the menu bar app. The icon then appears in your menu bar.


Configuration

On first run, config.yaml is created automatically. Edit it directly or use the Settings window (menu bar icon → Settings, or ⌘,).

Key settings

| Setting | Description | Default |
| --- | --- | --- |
| obsidian_mode | obsidian_folder, folder_export, or local_rest_api | obsidian_folder |
| obsidian_vault_path | Path to your Obsidian vault or output folder | |
| llm_backend | ollama, claude, or openai | ollama |
| llm_model | Ollama model name | llama3.2 |
| mlx_whisper_model | Whisper model (HuggingFace repo or local path) | mlx-community/whisper-large-v3-turbo |
| audio_capture_method | auto, systemdump, audiocap, or blackhole | auto |
| diarization_num_speakers | Expected speaker count for resemblyzer (0 = use attendee count or auto-detect) | 0 |
| screenshot_capture_enabled | Capture meeting window screenshots for visual context | true |
| screenshot_interval_seconds | How often to capture a screenshot | 30 |
| keep_screenshots | Keep PNG files after OCR (default: delete after processing) | false |
| create_people_stubs | Auto-create Obsidian People pages for new attendees | false |
| people_stub_folder | Vault-relative folder for People pages | People |
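
The diarization_num_speakers fallback described in the table (explicit value first, then calendar attendees, then auto-detect) could be expressed roughly as follows. This is a sketch under assumptions: the function name and the local_user parameter are hypothetical, and it assumes the attendee list may include you, in which case you are excluded because the mic stream is already labeled "You".

```python
from typing import Optional


def resolve_remote_speaker_count(
    configured: int,
    attendees: list[str],
    local_user: Optional[str] = None,  # hypothetical: your own name in the invite
) -> Optional[int]:
    """Pick the remote speaker count for diarization, or None to auto-detect."""
    # A non-zero config value always wins.
    if configured > 0:
        return configured
    # Otherwise count calendar attendees other than the local user.
    remote = [name for name in attendees if name != local_user]
    if remote:
        return len(remote)
    # No explicit value and no attendees: let the clustering step decide.
    return None
```

For example, with the default setting of 0 and an invite listing you plus two colleagues, the sketch would ask for two remote speakers.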

Cloud LLM backends (optional)

If you want to use Claude or OpenAI for summaries instead of a local Ollama model, enter your API key in Settings → LLM. Keys are stored in the macOS Keychain — they are never written to config.yaml.

You can also set the key as an environment variable before launching:

export ANTHROPIC_API_KEY="sk-ant-..."   # for Claude
export OPENAI_API_KEY="sk-..."           # for OpenAI
export GEMINI_API_KEY="..."              # for Google Gemini
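
On the Python side, reading one of these variables might look like the sketch below. The mapping and function name are assumptions for illustration; only the claude and openai backends are mapped because those are the cloud values listed for llm_backend, and the Keychain lookup that the app performs is not modeled here.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Environment variable per cloud backend, using the names shown above.
ENV_VAR_BY_BACKEND = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def api_key_from_env(
    backend: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the API key for a backend from the environment, or None."""
    env = os.environ if env is None else env
    var_name = ENV_VAR_BY_BACKEND.get(backend)
    if var_name is None:
        return None  # e.g. "ollama" runs locally and needs no key
    value = env.get(var_name, "").strip()
    return value or None
```

Passing env explicitly makes the helper easy to exercise without touching the real process environment.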

Export Options

Settings → Export lets you choose how notes are saved:

Obsidian vault (folder) — writes markdown files directly into your Obsidian vault. "Open in Obsidian" links deep-link into the correct file. Recommended if you use Obsidian.

Folder (any location) — saves to any folder. Works without Obsidian.

Obsidian vault (REST API) — writes via the Local REST API community plugin.

Output structure

{vault}/
  Meetings/
    Notes/YYYY-MM-DD/
      {Meeting Title}.md              ← hub note: summary, decisions, action items
    Transcripts/YYYY-MM-DD/
      {Meeting Title} — transcript.md
    Outlines/YYYY-MM-DD/
      {Meeting Title} — outline.md

All paths and filename patterns are configurable in Settings.
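
As an illustration of the default layout above, the three file paths for one meeting could be derived like this. The function name and dictionary keys are hypothetical, the long dash in the transcript and outline filenames is written as the escape \u2014, and the sketch does not sanitize titles that contain path separators.

```python
from datetime import date
from pathlib import Path

LONG_DASH = "\u2014"  # separator used in the default transcript/outline filenames


def note_paths(vault: Path, title: str, day: date) -> dict[str, Path]:
    """Build the default hub, transcript, and outline paths for one meeting."""
    stamp = day.isoformat()  # YYYY-MM-DD
    root = vault / "Meetings"
    return {
        "hub": root / "Notes" / stamp / f"{title}.md",
        "transcript": root / "Transcripts" / stamp / f"{title} {LONG_DASH} transcript.md",
        "outline": root / "Outlines" / stamp / f"{title} {LONG_DASH} outline.md",
    }
```

Calling note_paths(Path("/vault"), "Weekly Sync", date(2024, 5, 1)) would place the hub note under Meetings/Notes/2024-05-01/.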


How It Works

The app is two independent processes communicating over HTTP and WebSocket:

Swift app (.app)                     Python backend (FastAPI, localhost:8000)
─────────────────────────────────    ────────────────────────────────────────
Menu bar + floating HUD              mlx-whisper transcription (streaming + batch)
ScreenCaptureKit → dual named pipes  resemblyzer speaker diarization
ScreenshotCapture → session dir ──►  Vision OCR (tools/ocr_helper_bin)
ProcessManager (spawns Python)       Ollama / Claude / OpenAI summarization
BackendService (HTTP client)  ◄──►   ObsidianWriter → vault export
WebSocketService (live updates)      People registry (SQLite)
EventKit calendar read               Speaker profiles (SQLite)
Settings UI

The Swift app handles the three things only a signed .app can do on macOS: Screen Recording permission, menu bar presence, and spawning the Python backend. All AI processing lives in Python.

Audio + processing pipeline

ScreenCaptureKit (system audio + mic, two separate named pipes)
  → mic pipe → all segments tagged "You" (no diarization needed)
  → system pipe → streaming transcription (10s chunks, live WebSocket updates)
  → on stop: batch transcription (full audio, higher accuracy)
  → resemblyzer diarization (remote speakers only; count from calendar attendees)
  → screenshot OCR (Vision framework, local, non-fatal)
  → LLM: summary (+ visual context), action items, outline, title
  → People registry upsert + vault export
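
The dual-stream idea in the pipeline above (everything from the mic is "You", while only the system stream needs diarization) can be sketched as a simple merge by timestamp. The Segment type and merge_streams function are hypothetical names, not Attend's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from recording start
    end: float
    text: str
    speaker: str = ""


def merge_streams(mic: list[Segment], system: list[Segment]) -> list[Segment]:
    """Label mic segments "You" and interleave them with diarized system segments."""
    # Mic audio has a single known speaker, so no clustering is needed.
    tagged_mic = [Segment(s.start, s.end, s.text, "You") for s in mic]
    # System segments are assumed to already carry diarized speaker labels.
    return sorted(tagged_mic + list(system), key=lambda s: (s.start, s.end))
```

Keeping the two streams separate until this point is what lets the remote-speaker clustering ignore your own voice entirely.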

Privacy

All audio processing, transcription, speaker diarization, and OCR run entirely on your Mac — nothing leaves your machine unless you explicitly configure a cloud LLM backend (Claude, OpenAI, or Gemini). Even then, only the meeting transcript is sent to the LLM API; audio never leaves the device.

API keys are stored in the macOS Keychain, not in config.yaml.


Permissions

The app requests the following macOS permissions on first use:

  • Screen Recording — required to capture system audio and meeting window screenshots via ScreenCaptureKit
  • Microphone — required to capture your microphone
  • Calendar — optional, enables meeting detection and auto-record

Troubleshooting

No audio captured: Check Screen Recording permission in System Settings → Privacy & Security → Screen Recording.

Backend won't start: Port 8000 is tried first; the app auto-tries 8001–8009. Check that your venv is activated and pip install -r requirements.txt completed successfully.
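
To see which port in that range is actually free on your machine, a probe along these lines may help. It is a sketch of the general technique (try to bind each candidate in order), not the app's own port-selection code; the function and constant names are assumptions.

```python
import socket
from collections.abc import Iterable
from typing import Optional

BACKEND_PORTS = range(8000, 8010)  # 8000 first, then 8001 through 8009


def first_free_port(
    candidates: Iterable[int], host: str = "127.0.0.1"
) -> Optional[int]:
    """Return the first candidate port that can be bound, or None if all are taken."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use or otherwise unavailable; try the next one
            return port
    return None
```

Calling first_free_port(BACKEND_PORTS) returns None when every port in the range is occupied. The probe socket is closed before the function returns, so another process could still claim the port a moment later; treat the result as a hint rather than a reservation.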

Vault not writing: Check the backend log for lines beginning with [RecordingService] Obsidian:. Verify that obsidian_vault_path points to an existing directory.

Speaker labels wrong or merged: Set diarization_num_speakers to the exact number of people in the meeting, or ensure the meeting has a calendar event with attendees so the count is derived automatically.

No visual context in summary: Check that screenshot_capture_enabled is set to true in config.yaml. During a recording, verify that sessions/<id>/screenshots/ is being populated. After processing, the backend log shows Screen OCR: N context entries.

Mic pipe not connecting (all speakers show as "Others"): At recording start, the backend log should show Reading mic-only audio from pipe: /tmp/.... If that line is missing, the dual-pipe path didn't connect and audio capture fell back to single-stream mode.
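
The two log checks mentioned above (the mic pipe line and the Screen OCR count) can be automated with a small helper that scans backend log text you have already captured. This is a convenience sketch: the function names are hypothetical, and it assumes N in the OCR line is a plain integer.

```python
import re
from typing import Optional

MIC_PIPE_MARKER = "Reading mic-only audio from pipe:"
OCR_PATTERN = re.compile(r"Screen OCR: (\d+) context entries")


def mic_pipe_connected(log_text: str) -> bool:
    """True if the log shows the backend reading from the mic-only pipe."""
    return MIC_PIPE_MARKER in log_text


def ocr_entry_count(log_text: str) -> Optional[int]:
    """Number of OCR context entries reported in the log, or None if absent."""
    match = OCR_PATTERN.search(log_text)
    return int(match.group(1)) if match else None
```

A False result from mic_pipe_connected suggests the single-stream fallback described above, and None from ocr_entry_count means no OCR summary line was logged.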

App freezes

./scripts/capture_freeze.sh   # captures main-thread stack trace
./scripts/stop_all.sh         # force-quit everything

Development

# Python syntax check
source venv/bin/activate
python -m py_compile src/*.py api/*.py

# Swift build check
cd "Swift App/LocalTranscriptionApp"
xcodebuild -scheme LocalTranscriptionApp -configuration Debug build 2>&1 | grep -E "error:|BUILD"

# Start backend only (no Swift app)
source venv/bin/activate
python backend_server.py --port 8000

# Stop everything
./scripts/stop_all.sh

See CLAUDE.md for full codebase guidance and architecture details, and ROADMAP.md for the feature roadmap.


External Dependencies

| Dependency | Install | Purpose |
| --- | --- | --- |
| Ollama | brew install ollama | Local LLM for summarization |
| ffmpeg | brew install ffmpeg | Audio conversion (PCM → WAV) |
| mlx-whisper | via requirements.txt | Apple Silicon transcription |
| resemblyzer | via requirements.txt | Local speaker diarization (no token) |
| scikit-learn | via requirements.txt | Clustering for resemblyzer |

Python packages are in requirements.txt. Install with pip install -r requirements.txt.
