
feat(ui): add voice selector for Kokoro TTS #602

Open
mh03r932 wants to merge 4 commits into Blaizzy:main from mh03r932:feat/voice-selector-ui

Conversation

@mh03r932

Summary

  • Add voice selection UI for Kokoro TTS model
  • Consolidate voice display name handling with single source of truth
  • Add python-multipart dependency for multipart form support (FastAPI requires it for form parsing but it was not declared as a dependency; the server would not run on my machine without it)

Context

The Text-to-Speech UI had a VoiceSelection component that was imported but never rendered (possibly a placeholder). The original voice list contained placeholder data that didn't correspond to any real TTS model voices. This PR enables voice selection specifically for the Kokoro model, which has 10 preset voices.

Description

  • Render the VoiceSelection component in the TTS settings panel (only for the Kokoro model)
  • Replace placeholder voices with actual Kokoro voice IDs (af_heart, af_bella, am_adam, etc.)
  • Conditionally show the voice selector only for Kokoro model (Marvis and SparkTTS use different voice systems)
  • Export getVoiceDisplayName() helper from voice-library.tsx to avoid duplicate voice name mappings
  • Add python-multipart to dependencies (required by FastAPI for form data handling)

Changes in the codebase

  • mlx_audio/ui/app/text-to-speech/page.tsx - Add VoiceSelection component, import shared helper
  • mlx_audio/ui/components/voice-library.tsx - Replace placeholder voices with Kokoro IDs, export getVoiceDisplayName()
  • mlx_audio/ui/components/voice-selection.tsx - Use shared helper, remove duplicate voice names
  • pyproject.toml - Add python-multipart>=0.0.22 dependency
  • uv.lock - Updated lockfile

Changes outside the codebase

None

Additional information
The voice selector is Kokoro-specific; the other models use different voice systems that I am not familiar with. They could be supported in a future PR with model-specific voice selection UI.

Checklist

  • Tests added/updated
  • Documentation updated
  • Issue referenced (e.g., "Closes #...")

- Add voice selection UI and improve handling of voice display names
- Add `python-multipart` dependency for multipart support
@mh03r932 force-pushed the feat/voice-selector-ui branch from 8513f61 to b505243 on March 25, 2026 at 15:50
Owner

@Blaizzy Blaizzy left a comment


Thanks @mh03r932!

What if we got the voices dynamically from the models? Check Qwen3-TTS and you will see that the voices are attributes we can retrieve.

That way we don't hardcode anything.

@mh03r932
Author

Thank you for the feedback.
I debated implementing a method that reads the voices from the backend through the API, but as I said, I am not familiar with the other models.
Currently there are 5 models to choose from in the GUI (Kokoro, 3 Marvis flavours, and SparkTTS). As far as I could find out, Spark does voice cloning and Marvis seems to have 2 implemented voices. Qwen3-TTS is not (yet?) in the GUI.
I presume you are talking about get_supported_speakers().
If I have time I will try to propose an endpoint that passes voice metadata to the frontend. For most models I will have to stub out the implementation, so in the worst case the hardcoded values will still be there, just moved to the backend. Maybe somebody who knows how to get the supported voices from a model dynamically can step in then.
I was thinking something like this — ideally more sophisticated than just names, so we can convey some more info like gender, accent, etc. like Kokoro currently does:

# New endpoint to add to server.py
@app.get("/v1/audio/voices")
async def get_voices(model: str):
    model_obj = model_provider.load_model(model)
   
    # Call the new method we'll add to each model
    if hasattr(model_obj, 'get_supported_speakers'):
        voices = model_obj.get_supported_speakers()
    else:
        voices = []
    
    return {
        "model": model,
        "voices": voices,
        "count": len(voices)
    }
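For the richer metadata mentioned above (gender, accent, etc.), the endpoint could return structured voice entries instead of bare name strings. A minimal sketch of what that payload shape could look like; the VoiceInfo fields and the Kokoro entries below are illustrative, not read from any actual model:

```python
# Sketch only: structured voice metadata for the proposed
# /v1/audio/voices endpoint. Fields and entries are illustrative.
from dataclasses import dataclass, asdict


@dataclass
class VoiceInfo:
    id: str       # voice identifier passed in the TTS request
    name: str     # human-readable display name for the UI
    gender: str   # e.g. "female" / "male"
    accent: str   # e.g. "American", "British"


# Hypothetical fallback table mirroring the hardcoded frontend list;
# models with get_supported_speakers() would populate this dynamically.
KOKORO_VOICES = [
    VoiceInfo("af_heart", "Heart", "female", "American"),
    VoiceInfo("af_bella", "Bella", "female", "American"),
    VoiceInfo("am_adam", "Adam", "male", "American"),
]


def voices_payload(model: str, voices: list[VoiceInfo]) -> dict:
    """Shape the JSON body the endpoint would return."""
    return {
        "model": model,
        "voices": [asdict(v) for v in voices],
        "count": len(voices),
    }
```

A model whose get_supported_speakers() only returns plain strings could still be wrapped into VoiceInfo entries with the unknown fields left empty, so the frontend handles one response shape for all models.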
