SkyrimVoiceGenStudio (SVGS)

Generate high-quality custom AI voice acting for Skyrim SE modlists locally. SVGS extracts dialogue from Skyrim plugins, resolves VoiceTypes, generates speech audio with local AI text-to-speech (or, optionally, the cloud ElevenLabs provider), and packages the results as installable voicepack mods.

Features

Native plugin reading — Reads .esp/.esm files directly via the Mutagen library to extract dialogue and resolve applicable VoiceTypes. No external tools needed for dialogue extraction.
Three TTS providers — Local Qwen3-TTS in two flavors: Standard (highest clone quality, CUDA/ROCm/CPU) and Fast (5-8x faster via faster-qwen3-tts, NVIDIA CUDA only), plus cloud ElevenLabs (paid, no GPU needed). Pluggable provider system for future expansions.
Voice cloning & design — Clone voices from reference audio (existing NPC dialogue, DBVO packs, or your own recordings) or design new voices from text descriptions. Clone all existing Skyrim VoiceTypes with one click!
Full audio pipeline — TTS → LIP mouth animation → XWMA compression → FUZ packaging. Uses Creation Kit tools when available, with automatic fallbacks.
Two output formats — NPC voicepacks (Sound/Voice/) for NPC dialogue, and DBVO voicepacks (Sound/DBVO/) for player dialogue with Dragonborn Voice Over.
Batch generation — Generate thousands of lines with progress tracking, cancel/resume support, and permanent asset storage.
Smart import — Five-tier VoiceType resolution based on plugin conditions, so each line is assigned to the correct VoiceTypes with high accuracy.
Line & voice management — Manage lines and VoiceType assignments. Browse voice types, assign voice actors, preview lines, and test-generate audio. Create reusable voice actors and assign them to Skyrim voice types across batches.
DBVO coverage detection — Scans installed DBVO packs to identify which player lines already have audio and which need generation.
Whisper transcription — Built-in speech-to-text for transcribing reference audio during voice cloning. Runs on a lightweight standalone server (no PyTorch needed).
Multi-profile support — Separate databases, configs, and assets per profile for managing multiple modlists.
Emotion tagging — ElevenLabs provider automatically adds expressive audio tags ([angry], [sad], etc.) based on each line's emotion metadata.
Database viewer — Read-only SQL console with autocomplete and saved queries for advanced data analysis.
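
The emotion tagging feature above can be pictured as a small lookup that prepends an audio tag to a line before it is sent to the provider. The following is only a sketch, not SVGS's actual code: the function name `tag_line`, the emotion keys, and the choice to prepend the tag are assumptions made for illustration. Only the `[angry]` and `[sad]` tags come from the feature description.

```python
from typing import Optional

# Hypothetical mapping from a line's emotion metadata to an audio tag.
# The keys are illustrative; only the two tag strings appear in the README.
EMOTION_TAGS = {
    "anger": "[angry]",
    "sad": "[sad]",
}


def tag_line(text: str, emotion: Optional[str]) -> str:
    """Prepend an audio tag when the emotion is known, else return text as-is."""
    if not emotion:
        return text
    tag = EMOTION_TAGS.get(emotion.strip().lower())
    return f"{tag} {text}" if tag else text


if __name__ == "__main__":
    print(tag_line("Stop right there!", "Anger"))
    print(tag_line("Need something?", "Neutral"))
```

Lines with an unknown or missing emotion pass through unchanged, so untagged dialogue is never altered.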

System Requirements

Requirement	Details
OS	Windows 10 or 11 (64-bit)
Runtime	.NET 10
Disk Space	~4–8 GB for TTS models + space for generated audio

For local TTS (Qwen3-TTS)

Requirement	Details
GPU	6+ GB VRAM (8+ recommended). See backend notes below.
Python	3.10 or newer (virtual environments managed automatically)

SVGS offers two local Qwen3-TTS backends:

Qwen3-TTS (Standard) — Highest voice clone quality. Supports NVIDIA CUDA, AMD ROCm, and CPU.
Qwen3-TTS (Fast) — 5-8x faster via faster-qwen3-tts CUDA graph optimization. NVIDIA GPU with CUDA required — AMD and CPU-only setups are not supported.

For cloud TTS (ElevenLabs)

Requirement	Details
Account	ElevenLabs API key (paid, per-character billing)
GPU/Python	Not needed

Optional

Tool	Benefit
Creation Kit (free on Steam)	XWMA audio compression (~25 KB vs ~440 KB per line) and LIP mouth animation. Without it: audio still works, files are larger, and NPCs won't move their mouths.
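
The per-line figures in the table make the disk-space trade-off easy to estimate. This is a rough sketch using the approximate sizes quoted above (~25 KB with XWMA, ~440 KB with raw WAV); the helper name `pack_size_mb` and the 10,000-line batch are illustrative assumptions, and real sizes vary with line length.

```python
def pack_size_mb(line_count: int, kb_per_line: float) -> float:
    """Approximate voicepack size in MB (1 MB = 1024 KB)."""
    return line_count * kb_per_line / 1024


if __name__ == "__main__":
    lines = 10_000  # hypothetical batch size
    print(f"XWMA: {pack_size_mb(lines, 25):.0f} MB")
    print(f"WAV:  {pack_size_mb(lines, 440):.0f} MB")
```

For a batch of that size, the estimate is roughly 244 MB with Creation Kit tools versus about 4.2 GB without them.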

Python, FFmpeg, and TTS models are all managed automatically by the app.

Quick Start

1. Download and run SVGS. On first launch, a default profile is created automatically.
2. Configure paths on the App Settings page: set your mod manager type and instance/game path.
3. Set up TTS on the TTS Settings page:
   - Qwen3-TTS or Qwen3-TTS (Fast): Select provider → Choose model variant → Set Up Environment → Start Server
   - ElevenLabs: Select provider → Choose model variant → Enter API key
4. Import dialogue: open Manage Lines and click Import. SVGS reads plugins from your load order automatically.
5. Create a voice actor: clone from reference audio or design from a text description.
6. Generate: create a batch on the NPC Voice Gen or Player Voice Gen page, assign voice actors to voice types, and click Generate.
7. Export: package generated audio into an installable mod on the NPC or Player Voicepack Export page.

For a detailed walkthrough, see the Getting Started guide. For a full feature reference, see the User Guide.

Building from Source

Requires the .NET 10 SDK.

# Clone the repository
git clone https://github.com/shtaylor/SkyrimVoiceGenStudio.git
cd SkyrimVoiceGenStudio

# Build the solution
dotnet build SkyrimVoiceGenStudio.slnx

# Run the application
dotnet run --project SkyrimVoiceGenStudio/SkyrimVoiceGenStudio.csproj

# Run tests
dotnet test SVGSTests/SVGSTests.csproj

Project Structure

SkyrimVoiceGenStudio/          WPF desktop application (UI, orchestration)
├── Docs/
│   ├── Getting-Started.md     First-time setup guide
│   └── User-Guide.md          Comprehensive feature reference
├── Views/                     WPF pages
├── ViewModels/                MVVM view models
└── Services/                  App-level services (server lifecycle, dialogs)

SVGSLib/                       Class library (no UI dependency)
├── Models/                    EF Core entities and DTOs
├── Providers/                 TTS provider abstraction
│   ├── Qwen3/                 Local Qwen3-TTS provider
│   └── ElevenLabs/            Cloud ElevenLabs provider
└── Services/                  Business logic (import, audio pipeline, export)

SVGSTests/                     xUnit test project

PythonServer/                  FastAPI TTS server (port 5100)
├── server.py                  Endpoints: /health, /generate/*, /model/*, /shutdown
└── tts_engine.py              Dual-engine TTS wrapper (standard + fast backends)

WhisperServer/                 FastAPI Whisper server (port 5101)
├── server.py                  Endpoints: /health, /transcribe, /shutdown
└── whisper_engine.py          faster-whisper transcription wrapper
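
Both Python servers expose a `/health` endpoint, so a quick liveness probe can be sketched with the standard library alone. The ports (5100 and 5101) and the `/health` path come from the layout above; the host `127.0.0.1`, the helper names, and the assumption that a healthy server answers with HTTP 200 are illustrative guesses, not documented behavior.

```python
import urllib.error
import urllib.request

TTS_PORT = 5100      # PythonServer, per the project layout
WHISPER_PORT = 5101  # WhisperServer, per the project layout


def health_url(port: int, host: str = "127.0.0.1") -> str:
    """Build the /health URL for a local server (host is an assumption)."""
    return f"http://{host}:{port}/health"


def is_healthy(port: int, timeout: float = 2.0) -> bool:
    """Return True if the server answers /health with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(health_url(port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    for name, port in (("TTS", TTS_PORT), ("Whisper", WHISPER_PORT)):
        print(f"{name} server on port {port}: healthy={is_healthy(port)}")
```

Any connection failure is treated as "not healthy", which is usually what a startup check wants.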

Audio Pipeline

Dialogue Text → TTS Provider → WAV → LIP Generation → XWMA Encoding → FUZ Packaging

Stage	With CK Tools	Fallback
Audio encoding	XWMA via `xwmaencode.exe` (~25 KB/line)	Raw WAV in FUZ (~440 KB/line)
Mouth animation	LIP via `LipGenerator.exe`	No mouth movement (lipSize=0)
FUZ packaging	Built-in (no external tools)	—

Audio plays correctly in Skyrim in both cases. The Creation Kit is free on Steam.
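
The FUZ packaging step and the `lipSize=0` fallback can be illustrated with a short sketch. It assumes the FUZ container layout commonly described by the modding community: a 4-byte `FUZE` magic, a little-endian 32-bit version (1), a little-endian 32-bit LIP size, the LIP bytes, then the audio bytes. That layout and the function names are assumptions for illustration and are not taken from SVGS's source.

```python
import struct

FUZ_MAGIC = b"FUZE"
FUZ_VERSION = 1      # assumed version value
HEADER_SIZE = 12     # magic (4) + version (4) + lip size (4)


def pack_fuz(audio: bytes, lip: bytes = b"") -> bytes:
    """Wrap audio (XWMA or raw WAV) and optional LIP data in a FUZ container.

    With no LIP data the lip size field is 0, matching the fallback above.
    """
    header = FUZ_MAGIC + struct.pack("<II", FUZ_VERSION, len(lip))
    return header + lip + audio


def unpack_fuz(blob: bytes) -> tuple:
    """Split a FUZ container back into (lip, audio)."""
    if blob[:4] != FUZ_MAGIC:
        raise ValueError("not a FUZ file")
    _version, lip_size = struct.unpack_from("<II", blob, 4)
    lip = blob[HEADER_SIZE:HEADER_SIZE + lip_size]
    audio = blob[HEADER_SIZE + lip_size:]
    return lip, audio


if __name__ == "__main__":
    fuz = pack_fuz(b"RIFF-placeholder-audio")  # no LIP: lip size is 0
    print(fuz[:12])
```

Because the audio payload is stored as-is after the header, the same container works whether the encoder produced XWMA or the pipeline fell back to raw WAV.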

Tech Stack

App: .NET 10, C#, WPF, EF Core + SQLite, CommunityToolkit.Mvvm
Plugin reading: Mutagen (Bethesda plugin library)
Audio: NAudio, FFmpeg (auto-downloaded), Creation Kit CLI tools (optional)
Local TTS: Python 3, FastAPI, Qwen3-TTS (qwen-tts) + faster-qwen3-tts (CUDA-graph-optimized), PyTorch (CUDA, ROCm, or CPU)
Cloud TTS: ElevenLabs API
Transcription: faster-whisper (CTranslate2-based)

Documentation

Getting Started — First-time setup and your first voice generation
User Guide — Comprehensive reference for all features

Both guides are also accessible from within the app on the Documentation page.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0-or-later).

SVGS uses Mutagen (GPL-3.0) for reading Bethesda plugin files.

Acknowledgments

Mutagen by Noggog — Bethesda plugin reading
Qwen3-TTS by Alibaba — Local text-to-speech
faster-qwen3-tts by Andi Marafioti — CUDA-graph-optimized Qwen3-TTS inference
ElevenLabs — Cloud text-to-speech
faster-whisper by SYSTRAN — Speech-to-text transcription
Dragonborn Voice Over — Player voice framework
