Add TOML config, text sanitization, custom endpoints #1

Open
Martin-Atrin wants to merge 3 commits into evoleinik:main from Martin-Atrin:config-custom-endpoints-hotkey

@Martin-Atrin Martin-Atrin commented Feb 10, 2026

Summary

  • TOML configuration (~/.config/fnkey/config.toml) with auto-generated template on first launch, Settings menu item to open it
  • Custom API endpoints — any OpenAI-compatible transcription and chat completions API (Groq, OpenAI, vLLM, local llama.cpp, etc.)
  • Always-on text sanitization — LLM-powered cleanup of filler words, repeated words, grammar, and misheard terms after Whisper transcription. Uses a lightweight model (as small as 0.6B params) for sub-200ms latency
  • Separate API keys: api_key for Whisper, polish_api_key for sanitizer, supporting mixed providers (e.g. Groq STT + local sanitizer)
  • Custom system prompt (polish_prompt) for domain-specific term correction via replacement dictionaries
  • Configurable hotkey — fn, option, control, shift, or command
  • Language hint — ISO-639-1 code passed to Whisper for better accuracy
  • Max tokens cap on sanitizer output to prevent hallucination runaway on small models
  • Debug logging (stderr) for audio duration, WAV size, Whisper response, and sanitizer output
  • CLAUDE.md — agent-facing documentation with architecture, setup guide, and code map
  • App icon and code signing support

Test plan

  • Fresh install: verify config.toml template is auto-created on first launch
  • Groq cloud: set api_key and default endpoints, verify transcription works
  • Local endpoints: run llama.cpp whisper + Qwen3-0.6B sanitizer, verify sub-200ms polish
  • Mixed providers: different api_key and polish_api_key, verify both endpoints authenticate correctly
  • Custom polish_prompt: verify domain-specific term corrections
  • always_polish = true: verify hotkey gives polished text, hotkey+modifier gives raw
  • always_polish = false: verify inverse behavior
  • All hotkey options: fn, option, control, shift, command
  • Language hint: set language = "de" etc., verify Whisper respects it
  • Long recordings (>10s): verify full audio is captured and transcribed via debug logs
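
The two always_polish cases in the test plan reduce to a single rule: the modifier inverts whatever the config says. A minimal sketch of that rule, assuming a hypothetical helper named should_polish (the name and signature are illustrative, not taken from the PR):

```rust
/// Decide whether a recording goes through the sanitizer.
/// `always_polish` is the config flag; `modifier_held` is true when the
/// polish modifier was pressed together with the hotkey.
/// Hypothetical helper for illustration, not the PR's actual function.
fn should_polish(always_polish: bool, modifier_held: bool) -> bool {
    // The modifier flips the configured default, which is a boolean XOR.
    always_polish != modifier_held
}
```

With always_polish = true the plain hotkey polishes and hotkey+modifier returns raw text; with always_polish = false the roles swap.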

🤖 Generated with Claude Code

puma and others added 3 commits February 10, 2026 10:34
- Add Config struct with TOML deserialization and serde defaults
- Replace hardcoded Groq URLs/models with configurable endpoints
- Support any OpenAI-compatible transcription/chat API (vLLM, etc.)
- Handle both plain-text and JSON transcription responses
- Add configurable hotkey (fn/option/control/shift/command)
- Add Settings... menu item to open config.toml from menu bar
- Make api_key optional so app launches without config
- Create default config.toml template on first launch
- Fix event tap lifetime (tap+source must outlive NSApp.run())
- Make permission check non-blocking
- Add app icon (gen-icon.py + AppIcon.icns)
- Comprehensive README with permissions troubleshooting guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Text sanitization now runs by default (always_polish = true), polish
  modifier skips it for raw output
- Separate polish_api_key field for using different providers for
  Whisper vs sanitizer (falls back to api_key when empty)
- Reworked system prompt for small models: short, direct, with /no_think
  for Qwen3 to keep latency under 200ms
- Custom polish_prompt config field for domain-specific replacement
  dictionaries (misheard term → correct term)
- Language hint passed to Whisper when configured
- CLAUDE.md with agent-facing setup guide, architecture, and code map
- README expanded with full text sanitization docs, local inference
  setup (llama.cpp / MLX), and mixed-provider config examples

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
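
The polish_api_key fallback described in this commit could look roughly like the following. This is a sketch only: the function name is hypothetical, and treating a whitespace-only key as empty is an assumption the commit message does not state.

```rust
/// Pick the key to send to the sanitizer endpoint.
/// Falls back to the Whisper `api_key` when `polish_api_key` is empty.
/// Hypothetical helper; treating whitespace-only values as empty is an
/// assumption made for this sketch.
fn effective_polish_key<'a>(api_key: &'a str, polish_api_key: &'a str) -> &'a str {
    if polish_api_key.trim().is_empty() {
        api_key
    } else {
        polish_api_key
    }
}
```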
- Cap sanitizer output tokens proportionally to input length (floor 64,
  ceiling 1024) to prevent small model hallucination runaway
- Add stderr debug logging for audio duration, WAV size, Whisper
  response, and sanitizer output to aid troubleshooting
- Fix UTF-8 safe string truncation in log output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
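
A proportional cap with the stated floor of 64 and ceiling of 1024 might be computed as below. Only the floor and ceiling come from the commit message; the 4-bytes-per-token estimate and the 2x headroom factor are illustrative assumptions, as is the function name.

```rust
const MIN_TOKENS: u32 = 64; // floor from the commit message
const MAX_TOKENS: u32 = 1024; // ceiling from the commit message

/// Estimate a max_tokens value for the sanitizer from the input length.
/// Assumes roughly 4 bytes per token and allows 2x headroom; both
/// numbers are guesses for this sketch, not values from the PR.
fn sanitizer_max_tokens(input: &str) -> u32 {
    let estimated_input_tokens = u32::try_from(input.len() / 4).unwrap_or(u32::MAX);
    estimated_input_tokens
        .saturating_mul(2)
        .clamp(MIN_TOKENS, MAX_TOKENS)
}
```

Short inputs land on the floor and very long ones on the ceiling, so a small model cannot run away on either end.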
@Martin-Atrin Martin-Atrin changed the title from "Add TOML config, custom API endpoints, configurable hotkey" to "Add TOML config, text sanitization, custom endpoints" Feb 17, 2026
Owner

@evoleinik evoleinik left a comment

Code Review

Must Fix

1. API keys stored world-readable: std::fs::write() creates config.toml with default permissions (0644 under the usual umask). The file holds API keys, so it should be chmod 0600 after creation:

use std::os::unix::fs::PermissionsExt;
std::fs::set_permissions(&toml_path, std::fs::Permissions::from_mode(0o600))?;
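
A variant that closes the short window between creation and chmod is to request 0600 when the file is opened. The sketch below assumes a Unix target and a hypothetical helper named write_private_config; it is one possible approach, not the change the PR author made.

```rust
use std::fs::{OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Write `contents` to `path` so that only the owner can read or write it.
/// Hypothetical helper for illustration.
fn write_private_config(path: &Path, contents: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only takes effect when the file is newly created
        .open(path)?;
    // Tighten permissions as well in case the file already existed
    // with a looser mode.
    file.set_permissions(Permissions::from_mode(0o600))?;
    file.write_all(contents.as_bytes())
}
```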

2. Default TOML template duplicated — The exact same template string appears in both load_config() and open_settings(). Extract to a const or function to avoid them drifting apart.

3. /no_think hardcoded in default prompts — This is Qwen3-specific. Users switching to GPT-4, Claude, or Llama will have /no_think literally in their system prompt, which models may echo back or misinterpret. Remove from hardcoded defaults; users who need it can add it to their custom polish_prompt.

Should Fix

4. Fn auto-detection fallback removed — The current code has a 5-second fallback from Fn to Option for keyboards that don't emit Fn events (common with external keyboards). This PR removes it silently. That's a breaking change for existing users. Consider keeping the fallback when hotkey = "fn" (the default).
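
One way to keep the fallback while still honoring the new hotkey setting is to isolate the decision in a pure function. The sketch below assumes the fallback triggers when no Fn event has been seen within 5 seconds of startup; the existing code's exact trigger is not shown in this thread, and all names here are hypothetical.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Hotkey {
    Fn,
    Option,
    Control,
}

/// Detection window; 5 seconds matches the fallback described above.
const FN_DETECT_WINDOW: Duration = Duration::from_secs(5);

/// Return the hotkey that should be active at `now`.
/// Only `Hotkey::Fn` ever falls back, so an explicit choice such as
/// `Hotkey::Control` is always respected.
fn active_hotkey(configured: Hotkey, started: Instant, now: Instant, fn_event_seen: bool) -> Hotkey {
    let window_elapsed = now.duration_since(started) >= FN_DETECT_WINDOW;
    if configured == Hotkey::Fn && !fn_event_seen && window_elapsed {
        Hotkey::Option
    } else {
        configured
    }
}
```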

5. Debug logging may leak PII: eprintln!("[fnkey] whisper text: {}", text) logs full transcriptions to stderr → Console.app/system logs. Could contain sensitive dictated content. Consider truncating or making debug logging opt-in via a debug = true config flag.
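
Both suggestions can be combined: gate the log line on a flag and log only a short preview. In the sketch below the debug flag, the 40-character preview length, and the function names are assumptions for illustration; the truncation counts characters so it never splits a multi-byte UTF-8 sequence.

```rust
/// Return at most `max_chars` characters of `text`, cutting only on a
/// character boundary.
fn truncate_for_log(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // already short enough
    }
}

/// Build the debug line, or `None` when debug logging is off.
/// `debug` stands in for a hypothetical `debug = true` config flag.
fn whisper_debug_line(debug: bool, text: &str) -> Option<String> {
    if !debug {
        return None;
    }
    let preview = truncate_for_log(text, 40);
    Some(format!(
        "[fnkey] whisper text ({} chars): {}",
        text.chars().count(),
        preview
    ))
}
```

A caller would pass the result to eprintln! only when it is Some, so nothing reaches Console.app by default.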

6. Silent truncation — If the sanitizer hits the max_tokens cap, the user gets half a sentence pasted silently. Should check finish_reason == "length" in the API response and fall back to raw text when truncated.
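
The fallback could be a small selection step after the chat completions call. In OpenAI-compatible responses, finish_reason is "length" when the output was cut off by max_tokens. The function below is a hypothetical sketch that takes the already-parsed field; falling back on an empty polished result is an extra assumption, not something requested above.

```rust
/// Choose the text to paste: the polished output, or the raw
/// transcription when the sanitizer response was truncated.
/// Hypothetical helper for illustration.
fn choose_output<'a>(raw: &'a str, polished: &'a str, finish_reason: Option<&str>) -> &'a str {
    match finish_reason {
        // The model hit the max_tokens cap, so the polished text is incomplete.
        Some("length") => raw,
        // Assumption: an empty sanitizer result is also treated as a failure.
        _ if polished.trim().is_empty() => raw,
        _ => polished,
    }
}
```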

Minor

7. Config struct verbosity — The 10 separate default_*() functions could be replaced with #[serde(default)] on the struct + a single Default impl.

8. PR scope — 10+ features, +925/-160 lines in one PR. This would be easier to review and safer to merge as 3-4 smaller PRs (config migration, custom endpoints, sanitizer enhancements, cosmetic).

What's Good

  • TOML config with backward compat (legacy api_key file, env var fallback) is well designed
  • Custom endpoints are a real need for local/self-hosted setups
  • JSON response fallback for non-compliant Whisper servers is practical
  • Settings menu item with ObjC delegate is properly implemented
  • The always_polish inversion logic is correct
  • README expansion is thorough

🤖 Generated with Claude Code

@evoleinik
Owner

Product Feedback

Thanks for the PR — there's some genuinely useful work here. Wanted to share some thoughts on the product direction before we go further.

Love these — clear wins

  • TOML config — needed it, the plain api_key file doesn't scale
  • Custom endpoints — this is the biggest value-add. Opens fnkey to OpenAI, local llama.cpp, vLLM, etc. Huge for adoption
  • Language hint — zero-cost accuracy win for non-English users
  • Settings menu item — good discoverability, users don't need to know the config path

Concerns about scope

fnkey's identity is "hold Fn, speak, paste." One thing, done well. This PR pulls it toward a configurable platform, and I want to be careful about that.

always_polish = true as the default — Today polish is opt-in (Fn+Ctrl). Flipping to always-on changes every user's first experience: added latency, LLM rewrites that may alter meaning or drop content. I'd prefer keeping always_polish = false as default and letting users opt in.

5 hotkey options — The Fn key is the product (it's called "fnkey"). The existing Fn-with-Option-fallback covers the real use case (external keyboards that don't emit Fn). Do we need control/shift/command? Each adds testing surface and edge cases (like the polish modifier collision when hotkey=control).

Separate polish_api_key — Adds config complexity for a niche case. Can add it later when someone asks.

Debug logging always on — Logs full transcriptions to stderr/Console.app. Should be behind a debug = true flag or removed.

Suggested path forward

Would you be open to splitting this up? Something like:

  1. Config migration — TOML config + backward compat + Settings menu item
  2. Custom endpoints — configurable URLs + models + language hint + JSON response handling
  3. Sanitizer enhancements — custom prompt, max_tokens, always_polish (as opt-in)

That way we can land the structural wins quickly and iterate on the behavioral changes.


🤖 Generated with Claude Code
