Generate high-quality custom AI voice acting for Skyrim SE modlists locally. SVGS extracts dialogue from Skyrim plugins, resolves VoiceTypes, generates speech audio with local AI text-to-speech, and packages the results as installable voicepack mods.
- Native plugin reading — Reads .esp/.esm files directly via the Mutagen library to extract dialogue and resolve applicable VoiceTypes. No external tools needed for dialogue extraction.
- Three TTS providers — Local Qwen3-TTS in two flavors: Standard (highest clone quality, CUDA/ROCm/CPU) and Fast (5-8x faster via faster-qwen3-tts, NVIDIA CUDA only), plus cloud ElevenLabs (paid, no GPU needed). Pluggable provider system for future expansions.
- Voice cloning & design — Clone voices from reference audio (existing NPC dialogue, DBVO packs, or your own recordings) or design new voices from text descriptions. Clone all existing Skyrim VoiceTypes with one click!
- Full audio pipeline — TTS → LIP mouth animation → XWMA compression → FUZ packaging. Uses Creation Kit tools when available, with automatic fallbacks.
- Two output formats — NPC voicepacks (Sound/Voice/) for NPC dialogue, and DBVO voicepacks (Sound/DBVO/) for player dialogue with Dragonborn Voice Over.
- Batch generation — Generate thousands of lines with progress tracking, cancel/resume support, and permanent asset storage.
- Smart import — 5-tier VoiceType resolution from plugin conditions for accurate VoiceType assignment.
- Line & voice management — Manage lines and VoiceType assignments. Browse voice types, assign voice actors, preview lines, and test-generate audio. Create reusable voice actors and assign them to Skyrim voice types across batches.
- DBVO coverage detection — Scans installed DBVO packs to identify which player lines already have audio and which need generation.
- Whisper transcription — Built-in speech-to-text for transcribing reference audio during voice cloning. Runs on a lightweight standalone server (no PyTorch needed).
- Multi-profile support — Separate databases, configs, and assets per profile for managing multiple modlists.
- Emotion tagging — ElevenLabs provider automatically adds expressive audio tags ([angry], [sad], etc.) based on each line's emotion metadata.
- Database viewer — Read-only SQL console with autocomplete and saved queries for advanced data analysis.
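The emotion tagging feature listed above can be pictured as a small lookup from a line's emotion metadata to a bracketed audio tag. This is a minimal sketch, not SVGS's actual code: the function `tag_line`, the `EMOTION_TAGS` table, and every emotion name other than the `[angry]` and `[sad]` tags are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from emotion metadata to bracketed audio tags.
# Only [angry] and [sad] are named in this README; the keys and the
# remaining entry are assumptions made for illustration.
EMOTION_TAGS = {
    "anger": "[angry]",
    "sad": "[sad]",
    "happy": "[happy]",
}


def tag_line(text: str, emotion: Optional[str]) -> str:
    """Prefix the line with an audio tag when its emotion is recognized."""
    key = (emotion or "").strip().lower()
    tag = EMOTION_TAGS.get(key)
    # Unknown or missing emotions leave the text untouched.
    return f"{tag} {text}" if tag else text


print(tag_line("Get out of my sight!", "Anger"))
print(tag_line("Nice weather today.", None))
```

Lines with no recognized emotion pass through unchanged, so the same text could be sent to a provider that does not understand audio tags.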
| Requirement | Details |
|---|---|
| OS | Windows 10 or 11 (64-bit) |
| Runtime | .NET 10 |
| Disk Space | ~4–8 GB for TTS models + space for generated audio |
| Requirement | Details |
|---|---|
| GPU | 6+ GB VRAM (8+ recommended). See backend notes below. |
| Python | 3.10 or newer (virtual environments managed automatically) |
SVGS offers two local Qwen3-TTS backends:
- Qwen3-TTS (Standard) — Highest voice clone quality. Supports NVIDIA CUDA, AMD ROCm, and CPU.
- Qwen3-TTS (Fast) — 5-8x faster via faster-qwen3-tts CUDA graph optimization. NVIDIA GPU with CUDA required — AMD and CPU-only setups are not supported.
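The support matrix above reduces to a simple rule: the Fast backend is only an option on NVIDIA CUDA, and everything else uses Standard. The sketch below encodes that rule. The function name `pick_backend` and its boolean inputs are hypothetical; how SVGS actually detects hardware is not described here.

```python
from typing import Tuple


def pick_backend(has_cuda: bool, has_rocm: bool, prefer_speed: bool = True) -> Tuple[str, str]:
    """Return (backend, device) following the backend notes above.

    backend is "fast" or "standard"; device is "cuda", "rocm", or "cpu".
    """
    if has_cuda:
        # Fast requires NVIDIA CUDA; Standard also runs on CUDA when
        # clone quality matters more than speed.
        return ("fast" if prefer_speed else "standard", "cuda")
    if has_rocm:
        # AMD ROCm is supported by the Standard backend only.
        return ("standard", "rocm")
    # CPU-only setups are supported by the Standard backend only.
    return ("standard", "cpu")


print(pick_backend(has_cuda=True, has_rocm=False))
print(pick_backend(has_cuda=False, has_rocm=True))
```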
| Requirement | Details |
|---|---|
| Account | ElevenLabs API key (paid, per-character billing) |
| GPU/Python | Not needed |
| Tool | Benefit |
|---|---|
| Creation Kit (free on Steam) | XWMA audio compression (~25 KB vs ~440 KB per line) and LIP mouth animation. Without it: audio still works, files are larger, and NPCs won't move their mouths. |
Python, FFmpeg, and TTS models are all managed automatically by the app.
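To see what the per-line sizes in the Creation Kit row mean for a whole voicepack, here is a rough estimate using the approximate figures quoted above (~25 KB with XWMA, ~440 KB with raw WAV). The function name is hypothetical, and the estimate ignores LIP data and any packaging overhead.

```python
XWMA_KB_PER_LINE = 25   # approximate, with Creation Kit tools
WAV_KB_PER_LINE = 440   # approximate, raw WAV fallback


def estimate_size_mb(line_count: int, has_ck_tools: bool) -> float:
    """Rough voicepack size in MB for a given number of generated lines."""
    per_line_kb = XWMA_KB_PER_LINE if has_ck_tools else WAV_KB_PER_LINE
    return line_count * per_line_kb / 1024


lines = 10_000
print(f"With CK tools:    {estimate_size_mb(lines, True):.0f} MB")
print(f"Without CK tools: {estimate_size_mb(lines, False):.0f} MB")
```

For 10,000 lines this works out to roughly 244 MB with the Creation Kit tools versus about 4.2 GB without them.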
- Download and run SVGS. On first launch, a default profile is created automatically.
- Configure paths on the App Settings page — set your mod manager type and instance/game path.
- Set up TTS on the TTS Settings page:
- Qwen3-TTS or Qwen3-TTS (Fast): Select provider → Choose model variant → Set Up Environment → Start Server
- ElevenLabs: Select provider → Choose model variant → Enter API key
- Import dialogue — Open Manage Lines → Click Import. SVGS reads plugins from your load order automatically.
- Create a voice actor — Clone from reference audio or design from a text description.
- Generate — Create a batch on the NPC Voice Gen or Player Voice Gen page, assign voice actors to voice types, and click Generate.
- Export — Package generated audio into an installable mod on the NPC or Player Voicepack Export page.
For a detailed walkthrough, see the Getting Started guide. For a full feature reference, see the User Guide.
Requires .NET 10 SDK.
# Clone the repository
git clone https://github.com/shtaylor/SkyrimVoiceGenStudio.git
cd SkyrimVoiceGenStudio
# Build the solution
dotnet build SkyrimVoiceGenStudio.slnx
# Run the application
dotnet run --project SkyrimVoiceGenStudio/SkyrimVoiceGenStudio.csproj
# Run tests
dotnet test SVGSTests/SVGSTests.csproj
SkyrimVoiceGenStudio/ WPF desktop application (UI, orchestration)
├── Docs/
│ ├── Getting-Started.md First-time setup guide
│ └── User-Guide.md Comprehensive feature reference
├── Views/ WPF pages
├── ViewModels/ MVVM view models
└── Services/ App-level services (server lifecycle, dialogs)
SVGSLib/ Class library (no UI dependency)
├── Models/ EF Core entities and DTOs
├── Providers/ TTS provider abstraction
│ ├── Qwen3/ Local Qwen3-TTS provider
│ └── ElevenLabs/ Cloud ElevenLabs provider
└── Services/ Business logic (import, audio pipeline, export)
SVGSTests/ xUnit test project
PythonServer/ FastAPI TTS server (port 5100)
├── server.py Endpoints: /health, /generate/*, /model/*, /shutdown
└── tts_engine.py Dual-engine TTS wrapper (standard + fast backends)
WhisperServer/ FastAPI Whisper server (port 5101)
├── server.py Endpoints: /health, /transcribe, /shutdown
└── whisper_engine.py faster-whisper transcription wrapper
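Both Python servers in the layout above expose a `/health` endpoint, so a quick way to check whether they are running is to request it. The following is a sketch under assumptions: it assumes the servers listen on `127.0.0.1`, it treats any successful HTTP response as "up" because the response body format is not documented here, and the helper names are hypothetical.

```python
import urllib.request
from typing import Callable

TTS_PORT = 5100      # PythonServer, per the layout above
WHISPER_PORT = 5101  # WhisperServer, per the layout above


def health_url(port: int, host: str = "127.0.0.1") -> str:
    """Build the /health URL; the loopback host is an assumption."""
    return f"http://{host}:{port}/health"


def _http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read()


def is_up(port: int, fetch: Callable[[str], bytes] = _http_get) -> bool:
    """True if GET /health answers without a connection or HTTP error."""
    try:
        fetch(health_url(port))
        return True
    except OSError:
        # URLError, HTTPError, timeouts, and refused connections
        # all derive from OSError.
        return False


if __name__ == "__main__":
    for name, port in (("TTS", TTS_PORT), ("Whisper", WHISPER_PORT)):
        print(f"{name} server on port {port}: {'up' if is_up(port) else 'down'}")
```

The `fetch` parameter exists only so the check can be exercised without a running server.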
Dialogue Text → TTS Provider → WAV → LIP Generation → XWMA Encoding → FUZ Packaging
| Stage | With CK Tools | Fallback |
|---|---|---|
| Audio encoding | XWMA via xwmaencode.exe (~25 KB/line) | Raw WAV in FUZ (~440 KB/line) |
| Mouth animation | LIP via LipGenerator.exe | No mouth movement (lipSize=0) |
| FUZ packaging | Built-in (no external tools) | — |
Audio plays correctly in Skyrim in both cases. The Creation Kit is free on Steam.
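The final FUZ packaging stage can be illustrated with a short sketch. It assumes the community-documented FUZ layout (a 4-byte `FUZE` magic, a little-endian uint32 version of 1, a uint32 LIP size, the LIP bytes, then the audio bytes); SVGS's own packer is not shown in this README and may differ. The function names are hypothetical.

```python
import struct
from typing import Tuple

FUZ_MAGIC = b"FUZE"
FUZ_VERSION = 1  # assumed, per community documentation of the format


def pack_fuz(audio: bytes, lip: bytes = b"") -> bytes:
    """Wrap audio (XWMA or raw WAV) and optional LIP data in a FUZ container.

    An empty lip argument produces lipSize=0, the "no mouth movement" fallback.
    """
    header = FUZ_MAGIC + struct.pack("<II", FUZ_VERSION, len(lip))
    return header + lip + audio


def unpack_fuz(data: bytes) -> Tuple[bytes, bytes]:
    """Split a FUZ container back into (audio, lip)."""
    if data[:4] != FUZ_MAGIC:
        raise ValueError("not a FUZ file")
    _version, lip_size = struct.unpack_from("<II", data, 4)
    lip = data[12:12 + lip_size]
    audio = data[12 + lip_size:]
    return audio, lip


blob = pack_fuz(b"fake-audio-bytes")
print(len(blob), unpack_fuz(blob))
```

Because the LIP size is stored in the header, the same container shape covers both columns of the table: with Creation Kit tools the LIP section is populated, and without them it is simply empty.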
- App: .NET 10, C#, WPF, EF Core + SQLite, CommunityToolkit.Mvvm
- Plugin reading: Mutagen (Bethesda plugin library)
- Audio: NAudio, FFmpeg (auto-downloaded), Creation Kit CLI tools (optional)
- Local TTS: Python 3, FastAPI, Qwen3-TTS (qwen-tts) + faster-qwen3-tts (CUDA-graph-optimized), PyTorch (CUDA, ROCm, or CPU)
- Cloud TTS: ElevenLabs API
- Transcription: faster-whisper (CTranslate2-based)
- Getting Started — First-time setup and your first voice generation
- User Guide — Comprehensive reference for all features
Both guides are also accessible from within the app on the Documentation page.
This project is licensed under the GNU General Public License v3.0 (GPL-3.0-or-later).
SVGS uses Mutagen (GPL-3.0) for reading Bethesda plugin files.
- Mutagen by Noggog — Bethesda plugin reading
- Qwen3-TTS by Alibaba — Local text-to-speech
- faster-qwen3-tts by Andi Marafioti — CUDA-graph-optimized Qwen3-TTS inference
- ElevenLabs — Cloud text-to-speech
- faster-whisper by SYSTRAN — Speech-to-text transcription
- Dragonborn Voice Over — Player voice framework
