
NotGeorgeMessier/nitro-speech


nitro-speech


If you hit an issue, please open a GitHub issue or reach out to me on Discord / Twitter (X); a response is guaranteed.

React Native Real-Time Speech Recognition Library, powered by Nitro Modules.

Compatibility:

‼️ The newest versions of @gmessier/nitro-speech require react-native-nitro-modules 0.35.0 or higher.

Compatibility                           Supported versions
react-native-nitro-modules <= 0.34.*    @gmessier/nitro-speech <= 0.2.*
react-native-nitro-modules >= 0.35.*    @gmessier/nitro-speech >= 0.3.*

Key Features:

  • Built on Nitro Modules for low-overhead native bridging
  • Uses the new SpeechAnalyzer and SpeechTranscriber APIs on iOS 26+ (with fallback to the legacy SFSpeechRecognizer on older versions)
  • Configurable silence timer (default: 8 sec)
    • Callback onAutoFinishProgress for progress bars, etc.
    • Method addAutoFinishTime for a one-off timer extension
    • Method updateAutoFinishTime for replacing the timer duration
  • Configurable Haptic Feedback on start and finish
  • Flexible onVolumeChange to display input volume in UI with built-in useVoiceInputVolume hook
  • Speech-quality configurations:
    • Results are grouped by speech segment into batches.
    • Param disableRepeatingFilter to turn off consecutive duplicate-word filtering.
    • Param androidDisableBatchHandling to opt out of the default batch handling that removes empty recognition results.
  • Embedded Permission handling
    • Callback onPermissionDenied - if user denied the request
  • Everything else you would expect from Expo or other speech-recognition libraries


Installation

npm install @gmessier/nitro-speech react-native-nitro-modules
# or
yarn add @gmessier/nitro-speech react-native-nitro-modules
# or 
bun add @gmessier/nitro-speech react-native-nitro-modules

Expo

This library works with Expo. You need to run prebuild to generate native code:

npx expo prebuild

Note: Make sure the New Architecture is enabled in your Expo configuration before running prebuild.
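For reference, a minimal app.json fragment with the New Architecture flag (assuming the standard Expo newArchEnabled property; recent Expo SDKs already enable it by default):

```json
{
  "expo": {
    "newArchEnabled": true
  }
}
```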

iOS

cd ios && pod install

Android

No additional setup required.

Permissions

Android

The library declares the required permissions in its AndroidManifest.xml (merged automatically):

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.VIBRATE" />

iOS

Add the following keys to your app's Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access for speech recognition</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app needs speech recognition to convert speech to text</string>

Both permissions are required for speech recognition to work on iOS.

Features

  • Real-time transcription - Get partial results as the user speaks, enabling live UI updates
  • Auto-stop on silence - Automatically stops recognition after a configurable inactivity period (default: 8 s)
  • Auto-finish progress - Progress callbacks showing the countdown until auto-stop (TODO)
  • Haptic feedback - Optional haptics on recording start/stop
  • Background handling - Auto-stop when the app loses focus or goes to the background (Not Safe, TODO)
  • Permission handling - Dedicated onPermissionDenied callback
  • Voice input volume - Normalized voice input level for UI meters (useVoiceInputVolume)
  • Repeating word filter - Removes consecutive duplicate words caused by recognition artifacts
  • Locale support - Configure the speech recognizer for different languages
  • Contextual strings - Domain-specific vocabulary for improved accuracy
  • Automatic punctuation - Adds punctuation to the transcription (iOS 16+; Auto)
  • Language model selection - Choose between web search and free-form models (Auto)
  • Offensive word masking - Control whether offensive words are masked (Auto)
  • Formatting quality - Prefer quality over speed in formatting (Auto)
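The repeating-word filter can be pictured as collapsing consecutive duplicates in the token stream. A hypothetical sketch of that idea (the library does this natively; filterRepeatingWords is not part of its API):

```typescript
// Illustrative sketch of consecutive duplicate-word filtering (hypothetical;
// the library applies this natively when disableRepeatingFilter is false).
function filterRepeatingWords(text: string): string {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const out: string[] = [];
  for (const word of words) {
    // Compare case-insensitively so "Hello hello" collapses to one word.
    if (out.length === 0 || out[out.length - 1].toLowerCase() !== word.toLowerCase()) {
      out.push(word);
    }
  }
  return out.join(' ');
}
```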

Usage

Recommended: useRecognizer Hook

useRecognizer is lifecycle-aware: it calls stopListening() during cleanup (on unmount or when destroyDeps change).
Because of that, treat it as the single session-owner hook: use it once per recognition session/screen, and define your callbacks there.

import { useRecognizer } from '@gmessier/nitro-speech';
import { Text, TouchableOpacity, View } from 'react-native';

function MyComponent() {
  const { 
    startListening, 
    stopListening, 
    addAutoFinishTime, 
    updateAutoFinishTime 
  } = useRecognizer({
    onReadyForSpeech: () => {
      console.log('Listening...');
    },
    onResult: (textBatches) => {
      console.log('Result:', textBatches.join('\n'));
    },
    onRecordingStopped: () => {
      console.log('Stopped');
    },
    onAutoFinishProgress: (timeLeftMs) => {
      console.log('Auto-stop in:', timeLeftMs, 'ms');
    },
    onError: (error) => {
      console.log('Error:', error);
    },
    onPermissionDenied: () => {
      console.log('Permission denied');
    },
  });

  return (
    <View>
      <TouchableOpacity onPress={() => startListening({ 
        locale: 'en-US',
        disableRepeatingFilter: false,
        autoFinishRecognitionMs: 8000,
        
        contextualStrings: ['custom', 'words'],
        // Haptics (both platforms)
        startHapticFeedbackStyle: 'medium',
        stopHapticFeedbackStyle: 'light',
        // iOS specific
        iosAddPunctuation: true,
        // Android specific
        maskOffensiveWords: false,
        androidFormattingPreferQuality: false,
        androidUseWebSearchModel: false,
        androidDisableBatchHandling: false,
      })}>
        <Text>Start Listening</Text>
      </TouchableOpacity>
      <TouchableOpacity onPress={stopListening}>
        <Text>Stop Listening</Text>
      </TouchableOpacity>
      <TouchableOpacity onPress={() => addAutoFinishTime(5000)}>
        <Text>Add 5s to Timer</Text>
      </TouchableOpacity>
      <TouchableOpacity onPress={() => updateAutoFinishTime(10000)}>
        <Text>Update Timer to 10s</Text>
      </TouchableOpacity>
    </View>
  );
}

Use the handlers returned by this single hook instance inside that owner component.
Avoid creating another useRecognizer instance for the same session in other components.

With React Navigation (important)

React Navigation doesn’t unmount screens when you navigate — the screen can stay mounted in the background and come back without remounting. See: Navigation lifecycle (React Navigation).

Because of that, prefer tying recognition cleanup to focus state, not just component unmount. A simple approach is useIsFocused() and passing it into useRecognizer’s destroyDeps so recognition stops when the screen blurs. See: [useIsFocused (React Navigation)](https://reactnavigation.org/docs/8.x/use-is-focused).

const isFocused = useIsFocused();
const { 
  // ...
} = useRecognizer(
  {
    // ...
  },
  [isFocused]
);

Cross-component control: RecognizerRef

If you need to call recognizer methods from other components without prop drilling, use RecognizerRef.

import { RecognizerRef } from '@gmessier/nitro-speech';

RecognizerRef.startListening({ locale: 'en-US' });
RecognizerRef.addAutoFinishTime(5000);
RecognizerRef.updateAutoFinishTime(10000, true);
RecognizerRef.getIsActive();
RecognizerRef.stopListening();

RecognizerRef exposes only method handlers and is safe for cross-component method access.

Voice input volume

useVoiceInputVolume

By default you can use useVoiceInputVolume to read the normalized voice input level (0..1) for UI meters. ⚠️ Technical limitation: this approach re-renders the component on every volume update.

import { useVoiceInputVolume } from '@gmessier/nitro-speech';

function VoiceMeter() {
  const volume = useVoiceInputVolume();
  return <Text>{volume.toFixed(2)}</Text>;
}
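If you want a segmented meter rather than a raw number, the normalized value can be quantized into bars. A small illustrative helper (volumeToBars is not part of the library):

```typescript
// Map a normalized 0..1 volume to a number of filled meter bars
// (illustrative helper; not part of the library's API).
function volumeToBars(volume: number, totalBars: number = 10): number {
  const clamped = Math.min(1, Math.max(0, volume)); // guard out-of-range input
  return Math.round(clamped * totalBars);
}
```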

Reanimated: useSharedValue, worklets, UI thread

As a better alternative, you can store the volume in a Reanimated SharedValue and apply it on the UI thread only. This avoids re-renders, since the volume never touches React state.

import { useSharedValue } from 'react-native-reanimated';
import { useRecognizer } from '@gmessier/nitro-speech';

function VoiceMeter() {
  const sharedVolume = useSharedValue(0);
  const {
    // ...
  } = useRecognizer(
    {
      // ...
      onVolumeChange: (normVolume) => {
        "worklet";
        sharedVolume.value = normVolume;
      },
      // ...
    }
  );
}

Unsafe: RecognizerSession

RecognizerSession is the underlying Nitro Hybrid Object. It gives direct access to callbacks and control methods, but orchestrating a full session directly through it is unsafe.

import { RecognizerSession, unsafe_onVolumeChange } from '@gmessier/nitro-speech';

// Set up callbacks
RecognizerSession.onReadyForSpeech = () => {
  console.log('Listening...');
};

RecognizerSession.onResult = (textBatches) => {
  console.log('Result:', textBatches.join('\n'));
};

RecognizerSession.onRecordingStopped = () => {
  console.log('Stopped');
};

RecognizerSession.onAutoFinishProgress = (timeLeftMs) => {
  console.log('Auto-stop in:', timeLeftMs, 'ms');
};

RecognizerSession.onError = (error) => {
  console.log('Error:', error);
};

RecognizerSession.onPermissionDenied = () => {
  console.log('Permission denied');
};

RecognizerSession.onVolumeChange = (volume) => {
  console.log('new volume: ', volume);
};
// OR use unsafe_onVolumeChange to enable useVoiceInputVolume hook manually
RecognizerSession.onVolumeChange = unsafe_onVolumeChange;


// Start listening
RecognizerSession.startListening({
  locale: 'en-US',
});

// Stop listening
RecognizerSession.stopListening();

// Manually add time to auto finish timer
RecognizerSession.addAutoFinishTime(5000); // Add 5 seconds
RecognizerSession.addAutoFinishTime(); // Reset to original time

// Update auto finish time
RecognizerSession.updateAutoFinishTime(10000); // Set to 10 seconds
RecognizerSession.updateAutoFinishTime(10000, true); // Set to 10 seconds and refresh progress

⚠️ About dispose()

The RecognizerSession.dispose() method is NOT SAFE and should rarely be used. Hybrid Objects in Nitro are typically managed by the JS garbage collector automatically. Only call dispose() in performance-critical scenarios where you need to eagerly destroy objects.

See: Nitro dispose() documentation

API Reference

useRecognizer(callbacks, destroyDeps?)

Usage notes

  • Use useRecognizer once per session/screen as the session setup owner.
  • Cleanup stops recognition, so mounting multiple instances can unexpectedly end an active session.
  • For method access in non-owner components, use RecognizerRef.

Parameters

  • callbacks (object):
    • onReadyForSpeech?: () => void - Called when speech recognition starts
    • onResult?: (textBatches: string[]) => void - Called whenever a partial result is ready (array of text batches)
    • onRecordingStopped?: () => void - Called when recording stops
    • onAutoFinishProgress?: (timeLeftMs: number) => void - Called each second during auto-finish countdown
    • onError?: (message: string) => void - Called when an error occurs
    • onPermissionDenied?: () => void - Called if microphone permission is denied
  • destroyDeps (array, optional) - Additional dependencies for the cleanup effect. When any of these change (or the component unmounts), recognition is stopped.

Returns

  • startListening(params: SpeechToTextParams) - Start speech recognition with the given parameters
  • stopListening() - Stop speech recognition
  • addAutoFinishTime(additionalTimeMs?: number) - Add time to the auto-finish timer (or reset to original if no parameter)
  • updateAutoFinishTime(newTimeMs: number, withRefresh?: boolean) - Update the auto-finish timer
  • getIsActive() - Returns true if the speech recognition is active

RecognizerRef

  • startListening(params: SpeechToTextParams)
  • stopListening()
  • addAutoFinishTime(additionalTimeMs?: number)
  • updateAutoFinishTime(newTimeMs: number, withRefresh?: boolean)
  • getIsActive()

useVoiceInputVolume

  • useVoiceInputVolume(): number

RecognizerSession

  • Exposes callbacks (onReadyForSpeech, onResult, etc.) and control methods.
  • Prefer useRecognizer (single owner) + RecognizerRef for app-level usage.

SpeechToTextParams

Configuration object for speech recognition.

Common Parameters

  • locale?: string - Language locale (default: "en-US")
  • autoFinishRecognitionMs?: number - Auto-stop timeout in milliseconds (default: 8000)
  • contextualStrings?: string[] - Array of domain-specific words for better recognition
  • disableRepeatingFilter?: boolean - Disable filter that removes consecutive duplicate words (default: false)
  • startHapticFeedbackStyle?: 'light' | 'medium' | 'heavy' | 'none' - Haptic feedback style when microphone starts recording (default: "medium")
  • stopHapticFeedbackStyle?: 'light' | 'medium' | 'heavy' | 'none' - Haptic feedback style when microphone stops recording (default: "medium")
  • maskOffensiveWords?: boolean - Mask offensive words with asterisks (Android 13+ and iOS 26+; default: false; always false on iOS < 26)

iOS-Specific Parameters

  • iosAddPunctuation?: boolean - Add punctuation to results (iOS 16+, default: true)

Android-Specific Parameters

  • androidFormattingPreferQuality?: boolean - Prefer quality over latency (Android 13+, default: false)
  • androidUseWebSearchModel?: boolean - Use web search language model instead of free-form (default: false)
  • androidDisableBatchHandling?: boolean - Disable default batch handling (may add many empty batches, default: false)
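Putting the documented fields together, a typed sketch of the parameter object (field names and defaults taken from this README; the actual exported type from @gmessier/nitro-speech may differ):

```typescript
// Typed sketch of SpeechToTextParams assembled from the fields documented
// above (the library's actual exported type may differ).
type HapticStyle = 'light' | 'medium' | 'heavy' | 'none';

interface SpeechToTextParams {
  locale?: string;
  autoFinishRecognitionMs?: number;
  contextualStrings?: string[];
  disableRepeatingFilter?: boolean;
  startHapticFeedbackStyle?: HapticStyle;
  stopHapticFeedbackStyle?: HapticStyle;
  maskOffensiveWords?: boolean;
  iosAddPunctuation?: boolean;
  androidFormattingPreferQuality?: boolean;
  androidUseWebSearchModel?: boolean;
  androidDisableBatchHandling?: boolean;
}

// Documented defaults merged with caller-supplied overrides
// (contextualStrings has no documented default; [] is an assumption).
function withDefaults(params: SpeechToTextParams = {}): Required<SpeechToTextParams> {
  return {
    locale: 'en-US',
    autoFinishRecognitionMs: 8000,
    contextualStrings: [],
    disableRepeatingFilter: false,
    startHapticFeedbackStyle: 'medium',
    stopHapticFeedbackStyle: 'medium',
    maskOffensiveWords: false,
    iosAddPunctuation: true,
    androidFormattingPreferQuality: false,
    androidUseWebSearchModel: false,
    androidDisableBatchHandling: false,
    ...params,
  };
}
```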

Requirements

  • React Native >= 0.76
  • New Architecture only
  • react-native-nitro-modules

Troubleshooting

Android Gradle sync issues

If you're having issues with Android Gradle sync, try running the prebuild for the core Nitro library:

cd android && ./gradlew :react-native-nitro-modules:preBuild

License

MIT
