feat: Add comprehensive Azure MSTTS support with automatic namespace injection #105
Conversation
- Implement automatic detection of MSTTS tags in generated SSML
- Conditionally inject xmlns:mstts namespace only when needed
- Override addSpeakTag() in MicrosoftAzureSsmlFormatter
- Add containsMsttsTag() helper method with regex detection
- Update test expectations for newscaster feature
- All 657 tests passing
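A minimal sketch of how conditional namespace injection along these lines might look. The names containsMsttsTag() and addSpeakTag() come from the commit message, but the bodies below are assumptions written for illustration, not the PR's actual code, and the <speak> tag is simplified (no version, xmlns, or xml:lang attributes).

```typescript
// Illustrative only: function names mirror the commit message, bodies are assumed.
const MSTTS_NAMESPACE = 'xmlns:mstts="https://www.w3.org/2001/mstts"';

// True when the SSML contains any opening or closing <mstts:...> element.
function containsMsttsTag(ssml: string): boolean {
  return /<\/?mstts:[A-Za-z][\w-]*/.test(ssml);
}

// Wraps the body in <speak>, adding the mstts namespace only when it is needed.
function addSpeakTag(body: string): string {
  const attrs = containsMsttsTag(body) ? ` ${MSTTS_NAMESPACE}` : '';
  return `<speak${attrs}>${body}</speak>`;
}

const plainSsml = addSpeakTag('Hello');
const styledSsml = addSpeakTag(
  '<mstts:express-as style="excited">Hello</mstts:express-as>'
);
```

Output without MSTTS elements stays free of the extra namespace declaration, so it remains plain SSML.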
- Implement excited, disappointed, friendly, cheerful, sad, angry, fearful, empathetic, calm, lyrical, hopeful, shouting, whispering, terrified, unfriendly, gentle, serious, depressed, embarrassed, affectionate, envious, chat, customerservice styles
- Add styledegree attribute support (0.01-2.0 range) with validation
- Update test expectations for Azure's behavior with invalid values
- All 669 tests passing
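The styledegree range check could be sketched roughly as follows. The 0.01-2.0 bounds come from the commit message; the function names are hypothetical, and dropping an out-of-range value is an assumption, since the commit does not say how the formatter treats invalid input.

```typescript
// Bounds taken from the commit message; everything else is an assumed sketch.
const MIN_STYLE_DEGREE = 0.01;
const MAX_STYLE_DEGREE = 2.0;

// True when the raw value parses as a number inside the allowed range.
// Non-numeric input yields NaN, which fails both comparisons.
function isValidStyleDegree(raw: string): boolean {
  const value = Number(raw);
  return value >= MIN_STYLE_DEGREE && value <= MAX_STYLE_DEGREE;
}

// Builds an express-as element, omitting styledegree when it is missing or invalid
// (assumption: invalid values are dropped rather than clamped or rejected).
function expressAs(text: string, style: string, degree?: string): string {
  const degreeAttr =
    degree !== undefined && isValidStyleDegree(degree)
      ? ` styledegree="${degree}"`
      : '';
  return `<mstts:express-as style="${style}"${degreeAttr}>${text}</mstts:express-as>`;
}
```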
…erage
- Document all 27 express-as styles (emotional and scenario-specific)
- Add styledegree attribute documentation with examples
- Document automatic namespace injection feature
- Add Azure example to main README showcasing express-as with styledegree
- Note unsupported features (role, mstts:silence, etc.) with workarounds
- Update platform documentation to reflect current implementation
…atforms
- Compare Azure's 27 express-as styles vs Alexa's 2 emotions and Google's 0
- Highlight Azure's numeric intensity control (0.01-2.0) vs Alexa's 3 levels
- Document automatic namespace injection advantage
- Show Azure has most comprehensive emotional/stylistic control
- List advantages and parity for each platform comparison
- Add all Azure styles to textModifierKey and sectionModifierKey in grammar
- Update MicrosoftAzureSsmlFormatter to handle all 27 styles in both text and section modifiers
- Add special handling for newscaster -> newscast style mapping
- Include poetry-reading, narration-professional, newscast-casual styles
- All 669 tests passing including live Azure MSTTS validation
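The newscaster -> newscast special case might be handled with a small alias table, as in this sketch. The mapping itself is from the commit message; STYLE_ALIASES and toAzureStyle are hypothetical names, and passing every other key through unchanged is an assumption.

```typescript
// Speech Markdown modifier keys whose Azure style name differs (only one is known here).
const STYLE_ALIASES = new Map<string, string>([['newscaster', 'newscast']]);

// Returns the Azure style name for a modifier key, defaulting to the key itself.
function toAzureStyle(modifierKey: string): string {
  return STYLE_ALIASES.get(modifierKey) ?? modifierKey;
}
```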
Note: WIP. Let me check this after Claude has done a ton of it.
- Add advertisement_upbeat, documentary-narration, narration-relaxed, newscast-formal, sports_commentary, sports_commentary_excited styles
- Implement lang modifier support for Azure platform (xml:lang attribute)
- Update test expectations for Azure lang support
- Update documentation with all 33 Azure styles
- Document multi-speaker dialog (mstts:dialog/mstts:turn) and role attributes as requiring raw SSML
- Add .env to .gitignore for security
- Total Azure styles now 33 (up from 27)
- All 669 tests passing
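To illustrate the lang modifier output, here is a regex-based sketch of the input-to-output mapping. The `(Paris)[lang:"fr-FR"]` input and the `<lang xml:lang="fr-FR">` output are taken from this PR's description; the function name is hypothetical, and the real library uses a grammar-based parser rather than a regex.

```typescript
// Rewrites (text)[lang:"locale"] spans into Azure <lang> elements.
// Simplification: text containing ")" or nested modifiers is not handled.
function convertLangModifier(markdown: string): string {
  return markdown.replace(
    /\(([^)]+)\)\[lang:"([^"]+)"\]/g,
    (_match: string, text: string, locale: string) =>
      `<lang xml:lang="${locale}">${text}</lang>`
  );
}

const langExample = convertLangModifier('I love (Paris)[lang:"fr-FR"].');
```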
- Add detailed support matrix table showing all Azure SSML elements
- Document which elements are fully supported, partially supported, or not supported
- Reorganize unsupported features section with clear explanations
- Add workarounds for each unsupported feature
- Clarify why certain features are disabled (emphasis, expletive, interjection, unit)
- Document all advanced MSTTS features and their support status
- Improve documentation structure and clarity
- Enable emphasis element with all 4 levels (moderate, strong, reduced, none)
- Add bookmark support (generates <bookmark mark='...'> for Azure SDK)
- Update all tests to expect proper SSML tags
- Update documentation to reflect correct support status
- All 669 tests passing
…ttributes
- Add style and role keywords to grammar
- Implement semicolon-delimited multiple attribute syntax
- Refactor Azure formatter to collect and combine express-as attributes
- Add comprehensive tests for style+role combinations
- Update documentation with role attribute examples
- All 672 tests passing
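A rough sketch of collecting several modifiers into a single express-as element. The exact Speech Markdown syntax is not shown in this thread, so the `style:"...";role:"..."` modifier string below is an assumed form of the semicolon-delimited syntax, and both function names are hypothetical.

```typescript
// Parses key:"value" pairs separated by semicolons (assumed syntax).
// Simplification: values containing ";" are not supported.
function parseModifiers(modifiers: string): Map<string, string> {
  const result = new Map<string, string>();
  for (const part of modifiers.split(';')) {
    const match = /^\s*([\w-]+)\s*:\s*"([^"]*)"\s*$/.exec(part);
    if (match) {
      result.set(match[1], match[2]);
    }
  }
  return result;
}

// Combines the style and role modifiers into one <mstts:express-as> element.
function expressAsFromModifiers(text: string, modifiers: string): string {
  const parsed = parseModifiers(modifiers);
  const attrs = ['style', 'role']
    .filter((name) => parsed.has(name))
    .map((name) => `${name}="${parsed.get(name)}"`)
    .join(' ');
  return attrs ? `<mstts:express-as ${attrs}>${text}</mstts:express-as>` : text;
}
```

Collecting the attributes first and emitting one element avoids nesting one express-as inside another when both style and role are present.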
Ready for review now @arjan - this gives way more feature support to Azure TTS and its various intricacies. I'd say it's more feature-packed now than any of the others.
… metadata
- Update voice data script to include voice metadata for downstream uses
- Add id, displayName/name, and languages/language/locale fields
- Maintain backward compatibility with voice.name for SSML generation
- Filter metadata fields in voiceTagNamed to prevent invalid SSML tags
- Regenerate all voice data files (Azure, Google, Polly, Watson)
- All 672 tests passing
- Updated Azure formatter to use voiceTag() consistently for voice lookups
- Added getVoiceTagFallback() method to Azure formatter for unknown voices
- Voice data now supports lookup by both display name (e.g., 'Jenny') and voice ID (e.g., 'en-US-JennyNeural')
- SSML output always uses the correct voice ID from the catalog
- Added comprehensive tests for display name lookup functionality
- Updated existing tests to reflect new voice ID resolution behavior
- All 677 tests passing
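The display-name or ID lookup could work along these lines. The 'Jenny' and 'en-US-JennyNeural' pair is from the commit message; the catalog shape, the resolveVoiceId name, case-insensitive matching, and passing unknown names through unchanged are all assumptions made for this sketch.

```typescript
// Hypothetical catalog entry shape; the real voice data files may differ.
interface VoiceEntry {
  id: string;
  displayName: string;
}

const VOICE_CATALOG: VoiceEntry[] = [
  { id: 'en-US-JennyNeural', displayName: 'Jenny' },
];

// Resolves a display name or voice ID to the catalog's voice ID.
// Assumption: unknown voices are passed through unchanged as a fallback.
function resolveVoiceId(nameOrId: string): string {
  const needle = nameOrId.toLowerCase();
  const hit = VOICE_CATALOG.find(
    (voice) =>
      voice.id.toLowerCase() === needle ||
      voice.displayName.toLowerCase() === needle
  );
  return hit ? hit.id : nameOrId;
}
```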
- Created azure-comprehensive.spec.ts with 10 test cases covering Azure TTS features
- Tests include: bookmarks, style/degree, role adjustments, language changes, pitch, emphasis, and audio
- 7 tests passing, 3 skipped (voice names with colons and effect attribute not yet supported)
- All 684 existing tests still passing
- Verified Speech Markdown correctly generates Azure SSML for common use cases
- Changed voice names from HD format (en-US-Ava:DragonHDLatestNeural) to standard neural format (en-US-AvaNeural)
- HD voices with colon syntax are a separate Azure feature not currently supported by Speech Markdown parser
- Updated 'Simple azure Voice name' test to use en-US-AvaNeural
- Updated 'Multi Voices' test to use en-US-AvaNeural and en-US-AndrewNeural
- Fixed XML entity escaping expectation (&amp; not & in actual output)
- Fixed whitespace formatting in multi-voice test
- Now 9 tests passing, 1 skipped (audio effects)
- All 686 existing tests still passing
- Added 30 Azure HD voices to voice data with dash syntax (e.g., en-US-Ava-DragonHDLatestNeural)
- HD voices use dash syntax in Speech Markdown, converted to colon syntax in SSML (e.g., en-US-Ava:DragonHDLatestNeural)
- Added isHD metadata field to voice entries and filtered it from SSML output
- Updated comprehensive tests to use HD voices
- All 686 tests passing (1 skipped)

HD voices are premium high-definition voices with enhanced features:
- Human-like speech generation with automatic emotion detection
- Conversational patterns with natural pauses
- Prosody variations for realism
- Higher fidelity audio
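The dash-to-colon conversion for HD voices might be sketched like this. The two voice name forms and the isHD field come from the commit message; the entry shape, the function name, and the rule that the last dash is the one to replace are assumptions.

```typescript
// Hypothetical voice entry; isHD marks voices that need colon syntax in SSML.
interface HdVoiceEntry {
  name: string;
  isHD?: boolean;
}

// Converts the Speech Markdown dash form to Azure's colon form for HD voices.
// Assumption: the final dash separates the base voice from the HD model name.
function toSsmlVoiceName(voice: HdVoiceEntry): string {
  if (!voice.isHD) {
    return voice.name;
  }
  const lastDash = voice.name.lastIndexOf('-');
  if (lastDash < 0) {
    return voice.name;
  }
  return `${voice.name.slice(0, lastDash)}:${voice.name.slice(lastDash + 1)}`;
}
```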
- Created comprehensive test suite for Google Cloud TTS with 17 tests
- Added support for google:style tag
- Maps to google:style SSML tag
- No namespace declaration needed per Google documentation
- 16 tests passing, 1 skipped (voice sections not yet supported)
- All 702 existing tests still passing
So note: we've fixed and tested quite a bit in this. We've added tests directly using SSML snippets from the Azure and Google Cloud docs. We've also done quite a bit to add langs to the voice lists and voice IDs, so a user can use either the ID or the name and we replace it correctly in the SSML with the ID.
This looks pretty "comprehensive" indeed, Will! Nice work.
TY @arjan - I'll just keep an eye out for a release. (NB: I've been working on a PR for the editor. I'll hold off doing much more on that till this is released. I have grand plans for it, but will probably keep the "grand" plans for a completely separate PR that you may not want to release! Second NB: https://github.com/willwade/js-tts-wrapper is a wrapper supporting speechmarkdown across as many TTS systems as possible, so we can give a live preview of the output.)
Release done ✅
Overview
This PR adds comprehensive Azure MSTTS (Microsoft Text-to-Speech) support to Speech Markdown, including automatic namespace injection, 33 express-as styles with intensity control, and language switching support.
Key Features
1. Automatic Azure SSML Namespace Injection
- Detects when MSTTS tags (e.g. <mstts:express-as>) are present in generated SSML
- Injects the xmlns:mstts="https://www.w3.org/2001/mstts" namespace only when needed
2. Complete Express-As Style Support (33 styles)
3. Style Degree (Intensity Control)
- (text)[excited:"1.5"] generates <mstts:express-as style="excited" styledegree="1.5">
4. Language Switching Support
- lang modifier: (Paris)[lang:"fr-FR"] generates <lang xml:lang="fr-FR">Paris</lang>
5. Docs
- docs/platforms/azure.md with all 33 styles
- Multi-speaker dialog (mstts:dialog and mstts:turn) with raw SSML examples
Examples
Basic Express-As Style
Generates:
Style with Intensity
Generates:
Language Switching
Generates:
Section-Level Style
Generates:
Ready for review! This PR brings Azure MSTTS support from 27 to 33 styles, adds language switching, and provides comprehensive documentation for all Azure-specific features.