Voice sdk restructure #210
Merged
Commits (7)
11c28b7 Voice agents: hide Flow, add integrations-first overview, add Voice S…
905a543 Voice agents: refactor and expand Voice SDK docs
bb79643 Update docs/voice-agents/voice-sdk.mdx (ArchieMcM234)
1483f42 Update docs/voice-agents/assets/quickstart.py (ArchieMcM234)
05057e4 Resolved errors and enhanced some sections for clarity (ArchieMcM234)
5045ec9 Fix duplicated 'intelligent segmentation' (ArchieMcM234)
b18b4a4 Merge branch 'main' into voice-sdk-restructure (ArchieMcM234)
@@ -0,0 +1,12 @@
```python
from speechmatics.voice import AdditionalVocabEntry, VoiceAgentConfig

config = VoiceAgentConfig(
    language="en",
    additional_vocab=[
        AdditionalVocabEntry(
            content="Speechmatics",
            sounds_like=["speech matters", "speech matics"]
        ),
        AdditionalVocabEntry(content="API"),
    ]
)
```
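When the term list lives outside the code (for example in a config file), the entries above could be generated rather than hand-written. The following is a minimal sketch, assuming only the two keyword names visible in this diff (`content` and `sounds_like`). The helper `vocab_kwargs` and the sample `TERMS` mapping are hypothetical, and the SDK calls are left as comments so the block runs without the package installed.

```python
from typing import Dict, List, Optional


def vocab_kwargs(terms: Dict[str, Optional[List[str]]]) -> List[dict]:
    """Turn {term: pronunciations-or-None} into keyword dicts.

    Each dict uses the keyword names shown in the diff (`content`,
    `sounds_like`), so it could be unpacked into AdditionalVocabEntry(**kw).
    """
    entries = []
    for content, sounds_like in terms.items():
        content = content.strip()
        if not content:
            continue  # skip blank terms rather than emitting empty entries
        kw = {"content": content}
        if sounds_like:  # omit the key entirely when no hints are given
            kw["sounds_like"] = list(sounds_like)
        entries.append(kw)
    return entries


# Hypothetical term list mirroring the values in the diff.
TERMS = {
    "Speechmatics": ["speech matters", "speech matics"],
    "API": None,
}

entries = vocab_kwargs(TERMS)
print(entries)

# With the SDK installed, this would presumably become:
#   from speechmatics.voice import AdditionalVocabEntry, VoiceAgentConfig
#   config = VoiceAgentConfig(
#       language="en",
#       additional_vocab=[AdditionalVocabEntry(**kw) for kw in entries],
#   )
```

Keeping the terms as plain data makes it easy to review vocabulary changes separately from code changes.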
@@ -0,0 +1,22 @@
```python
from speechmatics.voice import (
    EndOfUtteranceMode,
    SpeakerFocusConfig,
    SpeakerFocusMode,
    SpeakerIdentifier,
    VoiceAgentConfig,
    VoiceAgentConfigPreset,
)

overrides = VoiceAgentConfig(
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        focus_speakers=["S1"],
        focus_mode=SpeakerFocusMode.RETAIN,
    ),
    known_speakers=[
        SpeakerIdentifier(label="Alice", speaker_identifiers=["XX...XX"]),
    ],
)

config = VoiceAgentConfigPreset.ADAPTIVE(overrides)
```
@@ -0,0 +1,36 @@
```python
from speechmatics.voice import (
    AdditionalVocabEntry,
    AudioEncoding,
    OperatingPoint,
    VoiceAgentConfig,
    VoiceAgentConfigPreset,
)

overrides = VoiceAgentConfig(
    # Language and locale
    language="en",  # e.g. "en", "es", "fr"
    output_locale=None,  # e.g. "en-GB", "en-US"

    # Model selection
    operating_point=OperatingPoint.ENHANCED,  # STANDARD or ENHANCED
    domain=None,  # e.g. "finance", "medical"

    # Vocabulary
    additional_vocab=[
        AdditionalVocabEntry(
            content="Speechmatics",
            sounds_like=["speech matters", "speech matics"],
        ),
        AdditionalVocabEntry(content="API"),
    ],
    punctuation_overrides=None,

    # Audio
    sample_rate=16000,
    audio_encoding=AudioEncoding.PCM_S16LE,

    # Diarization
    enable_diarization=True,
)

config = VoiceAgentConfigPreset.ADAPTIVE(overrides)
```
@@ -0,0 +1,9 @@
```python
@client.on(AgentServerMessageType.ADD_SEGMENT)
def on_final_segment(message):
    for segment in message["segments"]:
        print(f"[FINAL] {segment['speaker_id']}: {segment['text']}")

@client.on(AgentServerMessageType.ADD_PARTIAL_SEGMENT)
def on_partial_segment(message):
    for segment in message["segments"]:
        print(f"[PARTIAL] {segment['speaker_id']}: {segment['text']}")
```
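Both handlers above read only `message["segments"]` and, per segment, `speaker_id` and `text`, so the formatting could be factored into one function and exercised without a live connection. A minimal sketch under that assumption: `format_segments` and the sample payload are hypothetical, and real messages may carry additional fields.

```python
from typing import List, Mapping


def format_segments(message: Mapping, tag: str) -> List[str]:
    """Render each segment as "[TAG] speaker: text"."""
    segments = message.get("segments", [])
    return [f"[{tag}] {seg['speaker_id']}: {seg['text']}" for seg in segments]


# Hypothetical payload using only the keys the handlers in the diff read.
sample = {
    "segments": [
        {"speaker_id": "S1", "text": "Hello there."},
        {"speaker_id": "S2", "text": "Hi!"},
    ]
}

for line in format_segments(sample, "FINAL"):
    print(line)
```

Each registered handler would then reduce to a loop that prints `format_segments(message, "FINAL")` or `format_segments(message, "PARTIAL")`.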
@@ -0,0 +1,9 @@
```python
from speechmatics.voice import SpeakerIdentifier, VoiceAgentConfig

config = VoiceAgentConfig(
    enable_diarization=True,
    known_speakers=[
        SpeakerIdentifier(label="Alice", speaker_identifiers=["XX...XX"]),
        SpeakerIdentifier(label="Bob", speaker_identifiers=["YY...YY"])
    ]
)
```
@@ -0,0 +1,50 @@
```python
import asyncio
import os
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType

async def main():
    """Stream microphone audio to Speechmatics Voice Agent using 'scribe' preset"""

    # Audio configuration
    SAMPLE_RATE = 16000  # Hz
    CHUNK_SIZE = 160  # Samples per read
    PRESET = "scribe"  # Configuration preset

    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset=PRESET
    )

    # Print finalised segments of speech with speaker ID
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Setup microphone
    mic = Microphone(SAMPLE_RATE, CHUNK_SIZE)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect to the Voice Agent
    await client.connect()

    # Stream microphone audio (interruptible using keyboard)
    try:
        while True:
            audio_chunk = await mic.read(CHUNK_SIZE)
            if not audio_chunk:
                break  # Microphone stopped producing data
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()

if __name__ == "__main__":
    asyncio.run(main())
```
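For reference, the quickstart's audio constants imply very small reads. A quick worked calculation follows, assuming mono 16-bit PCM. The quickstart itself does not state the sample width, so `BYTES_PER_SAMPLE` is an assumption (it matches the `PCM_S16LE` encoding used in another snippet of this PR).

```python
SAMPLE_RATE = 16000      # Hz, from the quickstart
CHUNK_SIZE = 160         # samples per read, from the quickstart
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM

chunk_ms = 1000 * CHUNK_SIZE / SAMPLE_RATE        # duration of one read
chunk_bytes = CHUNK_SIZE * BYTES_PER_SAMPLE       # payload size of one read
chunks_per_second = SAMPLE_RATE / CHUNK_SIZE      # send_audio calls per second

print(f"{chunk_ms:.0f} ms per chunk, {chunk_bytes} bytes, {chunks_per_second:.0f} sends/s")
```

Under these assumptions each loop iteration forwards about 10 ms of audio, so the loop issues roughly 100 `send_audio` calls per second.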
@@ -0,0 +1,15 @@
```python
from speechmatics.voice import (
    EndOfUtteranceMode,
    SmartTurnConfig,
    VoiceAgentConfig,
    VoiceAgentConfigPreset,
)

# ADAPTIVE mode + ML-enhanced turn detection
config = VoiceAgentConfig(
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
    smart_turn_config=SmartTurnConfig(enabled=True),
)

# Or use the SMART_TURN preset which bundles this configuration
config = VoiceAgentConfigPreset.SMART_TURN()
```
@@ -0,0 +1,7 @@
```python
@client.on(AgentServerMessageType.ADD_SEGMENT)
def on_segment(message):
    for segment in message["segments"]:
        if segment["is_active"]:
            process_focused_speaker(segment["text"])
        else:
            process_passive_speaker(segment["speaker_id"], segment["text"])
```
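The routing in the handler above can be tried offline by separating the split from the side effects. This is a minimal sketch: `split_by_focus` and the sample payload are hypothetical, and unlike the handler in the diff it treats a missing `is_active` key as passive instead of raising `KeyError`.

```python
from typing import List, Mapping, Tuple


def split_by_focus(message: Mapping) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (focused texts, [(speaker_id, text), ...] for passive speakers)."""
    focused: List[str] = []
    passive: List[Tuple[str, str]] = []
    for seg in message.get("segments", []):
        if seg.get("is_active"):  # assumption: absent flag means passive
            focused.append(seg["text"])
        else:
            passive.append((seg["speaker_id"], seg["text"]))
    return focused, passive


# Hypothetical payload using only the keys the handler in the diff reads.
sample = {
    "segments": [
        {"speaker_id": "S1", "text": "Book a table for two.", "is_active": True},
        {"speaker_id": "S3", "text": "Is that the waiter?", "is_active": False},
    ]
}

focused, passive = split_by_focus(sample)
print(focused)
print(passive)
```

The two returned lists map directly onto the `process_focused_speaker` and `process_passive_speaker` calls in the handler.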
@@ -0,0 +1,27 @@
```python
from speechmatics.voice import SpeakerFocusConfig, SpeakerFocusMode, VoiceAgentConfig

# Focus on specific speakers, keep others as passive
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        focus_speakers=["S1", "S2"],
        focus_mode=SpeakerFocusMode.RETAIN
    )
)

# Focus on specific speakers, exclude everyone else
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        focus_speakers=["S1", "S2"],
        focus_mode=SpeakerFocusMode.IGNORE
    )
)

# Blacklist specific speakers (exclude them from all processing)
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        ignore_speakers=["S3"],
    )
)
```
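The three configurations above can be read as a small decision rule per speaker. The sketch below models that rule in plain Python, based only on the comments in this diff. It is not the SDK's implementation: `FocusMode` is a local stand-in for `SpeakerFocusMode`, and three behaviours are assumptions (the ignore list takes precedence over the focus list, an empty focus list makes every remaining speaker active, and the default mode is `RETAIN`).

```python
from enum import Enum
from typing import Optional, Sequence


class FocusMode(Enum):
    """Local stand-in for SpeakerFocusMode (not imported from the SDK)."""
    RETAIN = "retain"
    IGNORE = "ignore"


def classify(
    speaker_id: str,
    focus_speakers: Sequence[str] = (),
    ignore_speakers: Sequence[str] = (),
    mode: FocusMode = FocusMode.RETAIN,  # assumption: default mode
) -> Optional[str]:
    """Return "active", "passive", or None when the speaker is dropped."""
    if speaker_id in ignore_speakers:
        return None  # assumption: the ignore list always wins
    if not focus_speakers or speaker_id in focus_speakers:
        return "active"  # assumption: no focus list means everyone is active
    # Non-focused speakers: kept as passive under RETAIN, dropped under IGNORE.
    return "passive" if mode is FocusMode.RETAIN else None


for sid in ("S1", "S3"):
    print(sid, "RETAIN ->", classify(sid, ["S1", "S2"], mode=FocusMode.RETAIN))
    print(sid, "IGNORE ->", classify(sid, ["S1", "S2"], mode=FocusMode.IGNORE))
    print(sid, "ignore list ->", classify(sid, ignore_speakers=["S3"]))
```

If the SDK resolves any of these cases differently, the rule should be adjusted to match its documented behaviour.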