Add on-device transcription, language selection, and account management #4

Open
vood wants to merge 2 commits into main from claude/ios-dictation-app-Dtwe3

vood commented Mar 7, 2026

Summary

This PR adds comprehensive support for on-device transcription using NVIDIA's Parakeet model, multi-language selection, user account management with subscription tracking, and improves the overall transcription pipeline with better post-processing.

Key Changes

On-Device Transcription

  • New ParakeetTranscriptionService: Implements on-device transcription using FluidAudio framework with NVIDIA Parakeet v3 multilingual model
    • Handles model download, initialization, and transcription with proper state management
    • Includes audio resampling to 16kHz mono format
    • Provides detailed error handling and progress tracking
  • Toggle in Settings: Users can switch between cloud (OpenAI) and on-device (Parakeet) transcription engines
  • Transcription fallback: Both RecordingSheetView and keyboard extension support both transcription methods

Language Selection

  • New LanguageSelectionView: Dedicated UI for selecting transcription languages
  • LanguageManager enhancements:
    • Tracks selected languages and provides API language codes for prompts
    • Supports auto-detection mode
    • Integrated into onboarding flow as new step
  • Language hints in prompts: Language information is passed to transcription APIs to improve accuracy
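To make the relationship between selected languages, auto-detection, and prompt hints concrete, here is a minimal sketch. `LanguageSelection` and its members are hypothetical and are not the PR's `LanguageManager` API; the sketch assumes an empty selection means auto-detect and that a single selection can be sent as an explicit language code.

```swift
/// Hypothetical model of a user's language choice (not the PR's LanguageManager).
struct LanguageSelection {
    /// ISO 639-1 codes chosen by the user; empty is assumed to mean auto-detect.
    var selectedCodes: [String]

    var isAutoDetect: Bool { selectedCodes.isEmpty }

    /// With exactly one language, pass it as an explicit language code;
    /// otherwise leave detection to the model.
    var apiLanguageCode: String? {
        selectedCodes.count == 1 ? selectedCodes[0] : nil
    }

    /// With several languages, describe them in the prompt instead.
    var promptHint: String? {
        guard selectedCodes.count > 1 else { return nil }
        return "Spoken languages may include: " + selectedCodes.joined(separator: ", ") + "."
    }
}
```

Keeping the three cases (none, one, many) in one type means the recording flow and the keyboard extension can derive the same hints from the same stored selection.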

Account & Subscription Management

  • New Account section in Settings:
    • Displays user email and subscription tier
    • Shows word usage progress for free tier users
    • Upgrade button for free users
    • Sign out functionality
  • AuthManager and SubscriptionManager integration:
    • Word count tracking after transcription
    • Subscription limit checking before transcription
    • Cross-platform support (iOS/macOS) for opening URLs
  • Environment objects: Added to app root and preview for proper dependency injection

Enhanced Transcription Pipeline

  • Post-processing improvements:
    • Dictionary replacements and shortcut expansion applied to both cloud and on-device results
    • Word count tracking for subscription management
    • Formatting rules from tone/style settings passed to cloud API
  • Keyboard extension updates: Now includes language, dictionary, and shortcut hints in transcription prompts
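The shared post-processing step described above (replacements, then shortcut expansion, then word counting) could be sketched as a single pure function so both engines call the same code. `PostProcessor` is a hypothetical stand-in for the PR's dictionary and shortcut managers, and the plain substring matching is an assumption; a real implementation would likely match on word boundaries.

```swift
import Foundation

/// Hypothetical stand-in for the PR's dictionary and shortcut managers.
struct PostProcessor {
    /// Ordered pairs applied case-insensitively, e.g. a misheard name and its spelling.
    var replacements: [(heard: String, preferred: String)]
    /// Ordered pairs applied case-sensitively, e.g. "omw" and "on my way".
    var shortcuts: [(trigger: String, expansion: String)]

    /// Returns the processed text plus the word count used for usage tracking.
    func process(_ raw: String) -> (text: String, wordCount: Int) {
        var text = raw
        for pair in replacements {
            text = text.replacingOccurrences(of: pair.heard, with: pair.preferred, options: .caseInsensitive)
        }
        for pair in shortcuts {
            // Substring match: this would also rewrite a trigger inside a longer word.
            text = text.replacingOccurrences(of: pair.trigger, with: pair.expansion)
        }
        // Split on any whitespace so newlines and repeated spaces do not skew the count.
        let wordCount = text.split(whereSeparator: { $0.isWhitespace }).count
        return (text, wordCount)
    }
}
```

Counting after expansion means an expanded shortcut is billed at its full length; counting before would be the other defensible choice.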

UI/UX Improvements

  • Settings reorganization: Grouped into Account, Language, Transcription Engine, Permissions, Transcription, About, and Data sections
  • Dynamic version display: App version pulled from bundle instead of hardcoded
  • Onboarding enhancement: Added language selection step between microphone and keyboard setup
  • Better error messaging: More descriptive error messages for transcription failures

Technical Details

  • Added FluidAudio framework dependency to keyboard extension target
  • Proper state management for Parakeet model initialization (notInitialized → downloading → initializing → ready)
  • Audio conversion utility for resampling to required format
  • Conditional compilation for platform-specific URL opening (AppKit vs UIKit)

https://claude.ai/code/session_014WnimLYNLwfqYsqZn71BSU

claude added 2 commits March 7, 2026 01:04
…support

- Add language selection view and integrate into onboarding flow and settings
- Add account management section (sign in/out, usage display, upgrade to Pro)
- Add subscription tracking with word count limits and upgrade prompts
- Enhance recording flow with LLM post-processing via transcribeAndFormat
- Update keyboard extension with dictionary, shortcut, and language support
- Add iOS URL opening support to shared AuthManager and SubscriptionManager
- Make LanguageManager init and selectedLanguages public for cross-module use
- Read app version from bundle instead of hardcoding

https://claude.ai/code/session_014WnimLYNLwfqYsqZn71BSU
- Link FluidAudio SPM package to WhisperMateIOS target
- Create ParakeetTranscriptionService for iOS (NVIDIA Parakeet v3 model)
- Add on-device/cloud transcription toggle in iOS Settings
- Integrate Parakeet into RecordingSheetView transcription flow
- On-device mode bypasses cloud API and subscription limits

https://claude.ai/code/session_014WnimLYNLwfqYsqZn71BSU

greptile-apps Bot commented Mar 7, 2026

Greptile Summary

This PR adds substantial new functionality: on-device transcription via the FluidAudio/Parakeet model, multi-language selection with a dedicated UI, account/subscription management in settings, and an enriched transcription pipeline (prompt hints, post-processing, word-count tracking). The changes span the iOS app, keyboard extension, and shared services layers and are generally well-structured.

Four issues were found, the first two of which are functional bugs that need attention before merging:

  • Subscription gate bypass (RecordingSheetView.swift): transcribeWithParakeet skips the SubscriptionManager.checkCanTranscribe() call that transcribeWithCloud makes before processing audio. Users can bypass the word-limit entirely by toggling the on-device engine.
  • Cannot retry after initialization failure (ParakeetTranscriptionService.swift): initialize() guards on .notInitialized only. Once the state transitions to .error(...) after a failed download, every subsequent tap of "Download" returns early and does nothing, leaving the user stuck.
  • Thread-safety (ParakeetTranscriptionService.swift): cleanup() mutates @Published properties directly without dispatching to the main actor, which is unsafe and will emit runtime warnings.
  • Duplicate style instructions: When formattingRules is non-empty, tone/style instructions end up in both the prompt string and the formattingRules array sent to transcribeAndFormat, resulting in the same instructions being passed twice to the API.

Confidence Score: 2/5

  • Not safe to merge — the on-device path bypasses subscription enforcement and the Parakeet service cannot recover from initialization errors without an app restart.
  • Two of the four issues are functional bugs with business-logic impact: the subscription bypass undermines monetization, and the unrecoverable error state in the Parakeet service degrades UX. Both require code changes before shipping.
  • Whishpermate/WhisperMateIOS/ParakeetTranscriptionService.swift and Whishpermate/WhisperMateIOS/RecordingSheetView.swift need the most attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Whishpermate/WhisperMateIOS/ParakeetTranscriptionService.swift | New on-device transcription service with two bugs: cannot retry after error state due to overly strict guard, and cleanup() mutates @Published properties off the main thread. |
| Whishpermate/WhisperMateIOS/RecordingSheetView.swift | Parakeet transcription path omits subscription limit check present in the cloud path, allowing users to bypass word-limit enforcement; also duplicates tone/style instructions in the same API call. |
| Whishpermate/WhisperMateShared/Services/SubscriptionManager.swift | Added UIKit URL-opening support for iOS alongside the existing AppKit path; logic is correct and uses proper !targetEnvironment(macCatalyst) guard. |
| Whishpermate/WhisperMateShared/Services/AuthManager.swift | Added UIKit URL-opening for iOS sign-up flow with correct platform guards; change is safe. |
| Whishpermate/WhisperMateShared/Models/Language.swift | Visibility fixes (public init, public var selectedLanguages) to allow use from iOS target; logic is correct and default to auto-detect is well-handled. |
| Whishpermate/WhisperMateIOS/ContentView.swift | New Account, Language, and Transcription Engine settings sections added; dynamic version string and environment object wiring look correct. |
| Whishpermate/WhisperMateIOS/OnboardingView.swift | New language selection step inserted between microphone and keyboard setup; creates its own LanguageManager instance which saves to shared AppDefaults, so settings persist correctly into ContentView. |
| Whishpermate/WhisperMateKeyboard/KeyboardViewController.swift | Keyboard extension now enriches the transcription prompt with language, dictionary, and shortcut hints and applies post-processing; does not enforce subscription limits pre-transcription but records word usage after. |

Sequence Diagram

sequenceDiagram
    participant User
    participant RecordingSheetView
    participant SubscriptionManager
    participant ParakeetService
    participant OpenAIClient

    User->>RecordingSheetView: Stop recording
    RecordingSheetView->>RecordingSheetView: transcribeAudio(audioURL)

    alt useOnDeviceTranscription = true
        RecordingSheetView->>ParakeetService: transcribe(audioURL)
        Note over RecordingSheetView,ParakeetService: ⚠️ No subscription check!
        ParakeetService-->>RecordingSheetView: result text
        RecordingSheetView->>RecordingSheetView: applyReplacements + expandShortcuts
        RecordingSheetView->>SubscriptionManager: recordWords(count)
    else useOnDeviceTranscription = false
        RecordingSheetView->>SubscriptionManager: checkCanTranscribe()
        SubscriptionManager-->>RecordingSheetView: canTranscribe / reason
        alt canTranscribe = false
            RecordingSheetView-->>User: Show limit error
        else canTranscribe = true
            RecordingSheetView->>OpenAIClient: transcribe / transcribeAndFormat
            OpenAIClient-->>RecordingSheetView: result text
            RecordingSheetView->>RecordingSheetView: applyReplacements + expandShortcuts
            RecordingSheetView->>SubscriptionManager: recordWords(count)
        end
    end
    RecordingSheetView-->>User: Show transcription

Comments Outside Diff (1)

  1. Whishpermate/WhisperMateIOS/RecordingSheetView.swift, lines 378-405

    Tone/style instructions are duplicated in the same API call

    When formattingRules is non-empty, transcribeAndFormat is called. However, at this point promptText already contains toneStyleManager.allInstructions (added at line ~382 of the full file), and formattingRules contains the same per-style .instructions. This means the style instructions are sent twice in the same request: once inside the prompt string and once as formattingRules.

    Consider removing the style instructions from promptComponents when formattingRules is non-empty, or unifying how style instructions are passed so they are only sent once.
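One way to keep the style instructions in a single place is to decide up front which channel carries them. The sketch below is an assumption-laden illustration: `TranscriptionRequest`, `makeRequest`, and the `usesFormattingPass` flag are hypothetical names, not code from the PR.

```swift
/// Hypothetical container for what is sent to the API.
struct TranscriptionRequest: Equatable {
    var prompt: String
    var formattingRules: [String]
}

/// Builds a request in which style instructions travel through exactly one channel.
/// - Parameters:
///   - hints: language, dictionary, and shortcut hints that always belong in the prompt.
///   - styleInstructions: tone/style instructions from the user's settings.
///   - usesFormattingPass: true when the formatting call will be used (assumed flag).
func makeRequest(hints: [String], styleInstructions: [String], usesFormattingPass: Bool) -> TranscriptionRequest {
    var promptComponents = hints
    var formattingRules: [String] = []
    if usesFormattingPass {
        // The formatting pass receives the rules directly, so keep them out of the prompt.
        formattingRules = styleInstructions
    } else {
        // Plain transcription has no rules parameter, so the prompt is the only channel.
        promptComponents.append(contentsOf: styleInstructions)
    }
    return TranscriptionRequest(
        prompt: promptComponents.joined(separator: " "),
        formattingRules: formattingRules
    )
}
```

Because the branch is exclusive, the same instruction can never appear in both `prompt` and `formattingRules`.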

Last reviewed commit: 94b7d63

Comment on lines +55 to +59
func initialize() async throws {
    guard case .notInitialized = state else {
        DebugLog.info("Already initialized or in progress", context: "ParakeetTranscriptionService")
        return
    }

Cannot retry after initialization failure

The guard only allows re-entry when the state is .notInitialized. After a failed download or initialization, the state transitions to .error(...), which means calling initialize() again (e.g., when the user taps "Download" a second time in Settings) silently returns with the "Already initialized or in progress" log message and does nothing.

The guard should also permit entry from the .error state to allow retries:

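Suggested change
- func initialize() async throws {
-     guard case .notInitialized = state else {
-         DebugLog.info("Already initialized or in progress", context: "ParakeetTranscriptionService")
-         return
-     }
+ func initialize() async throws {
+     switch state {
+     case .notInitialized, .error:
+         break // fresh start, or retry after a failure
+     default:
+         DebugLog.info("Already initialized or in progress", context: "ParakeetTranscriptionService")
+         return
+     }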

Or, written as a guard (this form assumes the state type conforms to Equatable):

guard state == .notInitialized || {
    if case .error = state { return true }
    return false
}() else {
    DebugLog.info("Already initialized or in progress", context: "ParakeetTranscriptionService")
    return
}
// Reset to notInitialized before proceeding
await MainActor.run { self.state = .notInitialized }

As written, a user who encounters a download failure has no way to retry without restarting the app.
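The retry rule can also live on the state type itself, which keeps the guard in `initialize()` to one readable line. The following is a minimal synchronous sketch under stated assumptions: `ModelState`, `ModelLoader`, and the `download`/`load` closures are hypothetical stand-ins, and the real service is async and publishes its state.

```swift
/// Hypothetical mirror of the service's states (notInitialized → downloading → initializing → ready).
enum ModelState: Equatable {
    case notInitialized
    case downloading
    case initializing
    case ready
    case error(String)

    /// Initialization may start from a fresh state or after a failure, but not while in flight.
    var canBeginInitialization: Bool {
        switch self {
        case .notInitialized, .error:
            return true
        case .downloading, .initializing, .ready:
            return false
        }
    }
}

/// Hypothetical loader; `download` and `load` stand in for the real model steps.
final class ModelLoader {
    private(set) var state: ModelState = .notInitialized

    func initialize(download: () throws -> Void, load: () throws -> Void) {
        guard state.canBeginInitialization else { return }
        do {
            state = .downloading
            try download()
            state = .initializing
            try load()
            state = .ready
        } catch {
            // Landing in .error keeps the loader eligible for another attempt.
            state = .error(String(describing: error))
        }
    }
}
```

With this shape, a second tap on "Download" after a failure re-enters `initialize`, while a tap during an in-flight download or after success is still ignored.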

Comment on lines +277 to +321
private func transcribeWithParakeet(audioURL: URL) {
    sheetState = .processing

    Task {
        do {
            let result = try await ParakeetTranscriptionService.shared.transcribe(audioURL: audioURL)

            // Apply post-processing
            var processedResult = result
            processedResult = dictionaryManager.applyReplacements(to: processedResult)
            processedResult = shortcutManager.expandShortcuts(in: processedResult)

            // Track word count
            let wordCount = processedResult.split(separator: " ").count
            await SubscriptionManager.shared.recordWords(wordCount)

            await MainActor.run {
                transcription = processedResult
                sheetState = .viewing
                errorMessage = ""

                let duration = recordingStartTime.map { Date().timeIntervalSince($0) }
                let recordingID = UUID()
                let permanentAudioURL = historyManager.saveAudioFile(from: audioURL, for: recordingID)

                let recording = Recording(
                    id: recordingID,
                    transcription: processedResult,
                    duration: duration,
                    audioFileURL: permanentAudioURL
                )
                historyManager.addRecording(recording)
                currentRecording = recording

                try? FileManager.default.removeItem(at: audioURL)
            }
        } catch {
            await MainActor.run {
                transcription = ""
                sheetState = .viewing
                errorMessage = "On-device transcription failed: \(error.localizedDescription)"
            }
        }
    }
}

On-device path bypasses subscription limits

transcribeWithParakeet skips the subscription check that transcribeWithCloud performs before transcribing. This means users who enable on-device (Parakeet) transcription can transcribe unlimited audio regardless of their subscription tier or word limit — effectively making the subscription gate defeatable by simply toggling a switch.

The subscription check should be added at the top of transcribeWithParakeet, mirroring the guard in transcribeWithCloud:

private func transcribeWithParakeet(audioURL: URL) {
    // Check subscription limits
    let subscriptionCheck = SubscriptionManager.shared.checkCanTranscribe()
    if !subscriptionCheck.canTranscribe {
        errorMessage = subscriptionCheck.reason ?? "Transcription limit reached"
        sheetState = .viewing
        try? FileManager.default.removeItem(at: audioURL)
        return
    }

    sheetState = .processing
    // ...
}
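An alternative to copying the check into each path is to route both engines through one gated entry point, so a new engine cannot skip it by accident. This is a minimal sketch with hypothetical names (`SubscriptionCheck`, `Engine`, `GateResult`, `transcribeGated`); the closures stand in for the real subscription manager and transcription services, and the sketch is synchronous for brevity.

```swift
/// Hypothetical result of a limit check, shaped like the one used in the snippet above.
struct SubscriptionCheck {
    let canTranscribe: Bool
    let reason: String?
}

enum Engine {
    case cloud
    case onDevice
}

enum GateResult: Equatable {
    case transcribed(String)
    case blocked(String)
}

/// Runs the subscription check once, then dispatches to the selected engine.
func transcribeGated(
    engine: Engine,
    check: () -> SubscriptionCheck,
    cloud: () throws -> String,
    onDevice: () throws -> String
) rethrows -> GateResult {
    let result = check()
    guard result.canTranscribe else {
        // Neither engine runs when the limit has been reached.
        return .blocked(result.reason ?? "Transcription limit reached")
    }
    switch engine {
    case .cloud:
        return .transcribed(try cloud())
    case .onDevice:
        return .transcribed(try onDevice())
    }
}
```

If on-device use is meant to be exempt from limits, as the second commit message suggests, that exemption could be expressed inside `check` rather than by omitting the call.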

Comment on lines +138 to +144
func cleanup() {
    asrManager?.cleanup()
    asrManager = nil
    models = nil
    state = .notInitialized
    isModelDownloaded = false
}

cleanup() mutates @Published properties off the main thread

cleanup() directly assigns to state and isModelDownloaded, both of which are @Published properties requiring main-thread access. This is called from arbitrary contexts and will cause a runtime warning (or crash in strict concurrency builds) because @Published property updates must happen on the main actor.

Suggested change
- func cleanup() {
-     asrManager?.cleanup()
-     asrManager = nil
-     models = nil
-     state = .notInitialized
-     isModelDownloaded = false
- }
+ func cleanup() {
+     asrManager?.cleanup()
+     asrManager = nil
+     models = nil
+     Task { @MainActor in
+         self.state = .notInitialized
+         self.isModelDownloaded = false
+     }
+ }

2 participants