SkyrimVoiceGenStudio (SVGS)

Generate high-quality custom AI voice acting for Skyrim SE modlists locally. SVGS extracts dialogue from Skyrim plugins, resolves VoiceTypes, generates speech audio with local AI text-to-speech, and packages the results as installable voicepack mods.

Features

  • Native plugin reading — Reads .esp/.esm files directly via the Mutagen library to extract dialogue and resolve applicable VoiceTypes. No external tools needed for dialogue extraction.
  • Three TTS providers — Local Qwen3-TTS in two flavors: Standard (highest clone quality, CUDA/ROCm/CPU) and Fast (5-8x faster via faster-qwen3-tts, NVIDIA CUDA only), plus cloud ElevenLabs (paid, no GPU needed). Pluggable provider system for future expansions.
  • Voice cloning & design — Clone voices from reference audio (existing NPC dialogue, DBVO packs, or your own recordings) or design new voices from text descriptions. Clone all existing Skyrim VoiceTypes with one click!
  • Full audio pipeline — TTS → LIP mouth animation → XWMA compression → FUZ packaging. Uses Creation Kit tools when available, with automatic fallbacks.
  • Two output formats — NPC voicepacks (Sound/Voice/) for NPC dialogue, and DBVO voicepacks (Sound/DBVO/) for player dialogue with Dragonborn Voice Over.
  • Batch generation — Generate thousands of lines with progress tracking, cancel/resume support, and permanent asset storage.
  • Smart import — 5-tier voice type resolution from plugin conditions for high-accuracy VoiceType assignment detection.
  • Line & voice management — Manage lines and VoiceType assignments. Browse voice types, assign voice actors, preview lines, and test-generate audio. Create reusable voice actors and assign them to Skyrim voice types across batches.
  • DBVO coverage detection — Scans installed DBVO packs to identify which player lines already have audio and which need generation.
  • Whisper transcription — Built-in speech-to-text for transcribing reference audio during voice cloning. Runs on a lightweight standalone server (no PyTorch needed).
  • Multi-profile support — Separate databases, configs, and assets per profile for managing multiple modlists.
  • Emotion tagging — ElevenLabs provider automatically adds expressive audio tags ([angry], [sad], etc.) based on each line's emotion metadata.
  • Database viewer — Read-only SQL console with autocomplete and saved queries for advanced data analysis.
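
As an illustration of the emotion tagging feature listed above, the following is a minimal sketch of how a line's emotion metadata might be turned into a bracketed audio tag. The names `EMOTION_TAGS` and `tag_line`, the emotion keys, and the choice to prefix the tag are all assumptions made for this example; only the `[angry]` and `[sad]` tags come from the feature description, and SVGS's actual mapping may differ.

```python
from typing import Optional

# Hypothetical mapping from emotion metadata to audio tags.
# Only [angry] and [sad] are named in the README; the keys are assumed.
EMOTION_TAGS = {
    "anger": "[angry]",
    "sad": "[sad]",
}


def tag_line(text: str, emotion: Optional[str]) -> str:
    """Prefix the line with an audio tag when the emotion is recognized.

    Unknown or missing emotions leave the text unchanged.
    """
    key = (emotion or "").strip().lower()
    tag = EMOTION_TAGS.get(key)
    return f"{tag} {text}" if tag else text
```

For example, `tag_line("You'll pay for that!", "Anger")` yields the text with `[angry]` in front, while a line with no emotion metadata passes through untouched.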

System Requirements

| Requirement | Details |
|---|---|
| OS | Windows 10 or 11 (64-bit) |
| Runtime | .NET 10 |
| Disk Space | ~4–8 GB for TTS models, plus space for generated audio |

For local TTS (Qwen3-TTS)

| Requirement | Details |
|---|---|
| GPU | 6+ GB VRAM (8+ recommended). See backend notes below. |
| Python | 3.10 or newer (virtual environments managed automatically) |

SVGS offers two local Qwen3-TTS backends:

  • Qwen3-TTS (Standard) — Highest voice clone quality. Supports NVIDIA CUDA, AMD ROCm, and CPU.
  • Qwen3-TTS (Fast) — 5-8x faster via faster-qwen3-tts CUDA graph optimization. NVIDIA GPU with CUDA required — AMD and CPU-only setups are not supported.
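
The backend constraints above can be summarized as a small decision rule. This is only a sketch: the function name `pick_backend`, the device labels, and the `prefer_speed` flag are hypothetical, and in the app itself the provider is selected on the TTS Settings page.

```python
def pick_backend(device: str, prefer_speed: bool = True) -> str:
    """Return the local backend that fits the given compute device.

    device: "cuda" (NVIDIA), "rocm" (AMD), or "cpu" (labels assumed here).
    The Fast backend is only valid on NVIDIA CUDA; everything else
    falls back to Standard.
    """
    device = device.strip().lower()
    if device not in {"cuda", "rocm", "cpu"}:
        raise ValueError(f"unknown device: {device!r}")
    if device == "cuda" and prefer_speed:
        return "Qwen3-TTS (Fast)"
    return "Qwen3-TTS (Standard)"
```

A CUDA user who cares more about clone quality than speed would pass `prefer_speed=False` and get the Standard backend.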

For cloud TTS (ElevenLabs)

| Requirement | Details |
|---|---|
| Account | ElevenLabs API key (paid, per-character billing) |
| GPU/Python | Not needed |

Optional

| Tool | Benefit |
|---|---|
| Creation Kit (free on Steam) | XWMA audio compression (~25 KB vs ~440 KB per line) and LIP mouth animation. Without it: audio still works, files are larger, and NPCs won't move their mouths. |
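
To get a feel for what the per-line figures mean for a whole voicepack, here is a rough back-of-the-envelope calculation. It treats the approximate ~25 KB and ~440 KB values as fixed averages, which real lines will not match exactly; the function name `estimate_pack_size_mb` is made up for this sketch.

```python
# Approximate per-line sizes quoted in the README (averages, not guarantees).
XWMA_KB_PER_LINE = 25    # with Creation Kit tools (XWMA compression)
WAV_KB_PER_LINE = 440    # fallback: raw WAV inside the FUZ


def estimate_pack_size_mb(line_count: int, has_creation_kit: bool) -> float:
    """Estimate total voicepack size in MB for a number of generated lines."""
    per_line_kb = XWMA_KB_PER_LINE if has_creation_kit else WAV_KB_PER_LINE
    return line_count * per_line_kb / 1024
```

Under these assumptions, 10,000 lines come to roughly 244 MB with the Creation Kit tools and about 4.2 GB without them.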

Python, FFmpeg, and TTS models are all managed automatically by the app.

Quick Start

  1. Download and run SVGS. On first launch, a default profile is created automatically.
  2. Configure paths on the App Settings page — set your mod manager type and instance/game path.
  3. Set up TTS on the TTS Settings page:
    • Qwen3-TTS or Qwen3-TTS (Fast): Select provider → Choose model variant → Set Up Environment → Start Server
    • ElevenLabs: Select provider → Choose model variant → Enter API key
  4. Import dialogue — Open Manage Lines → Click Import. SVGS reads plugins from your load order automatically.
  5. Create a voice actor — Clone from reference audio or design from a text description.
  6. Generate — Create a batch on the NPC Voice Gen or Player Voice Gen page, assign voice actors to voice types, and click Generate.
  7. Export — Package generated audio into an installable mod on the NPC or Player Voicepack Export page.

For a detailed walkthrough, see the Getting Started guide. For a full feature reference, see the User Guide.

Building from Source

Requires .NET 10 SDK.

# Clone the repository
git clone https://github.com/shtaylor/SkyrimVoiceGenStudio.git
cd SkyrimVoiceGenStudio

# Build the solution
dotnet build SkyrimVoiceGenStudio.slnx

# Run the application
dotnet run --project SkyrimVoiceGenStudio/SkyrimVoiceGenStudio.csproj

# Run tests
dotnet test SVGSTests/SVGSTests.csproj

Project Structure

SkyrimVoiceGenStudio/          WPF desktop application (UI, orchestration)
├── Docs/
│   ├── Getting-Started.md     First-time setup guide
│   └── User-Guide.md          Comprehensive feature reference
├── Views/                     WPF pages
├── ViewModels/                MVVM view models
└── Services/                  App-level services (server lifecycle, dialogs)

SVGSLib/                       Class library (no UI dependency)
├── Models/                    EF Core entities and DTOs
├── Providers/                 TTS provider abstraction
│   ├── Qwen3/                 Local Qwen3-TTS provider
│   └── ElevenLabs/            Cloud ElevenLabs provider
└── Services/                  Business logic (import, audio pipeline, export)

SVGSTests/                     xUnit test project

PythonServer/                  FastAPI TTS server (port 5100)
├── server.py                  Endpoints: /health, /generate/*, /model/*, /shutdown
└── tts_engine.py              Dual-engine TTS wrapper (standard + fast backends)

WhisperServer/                 FastAPI Whisper server (port 5101)
├── server.py                  Endpoints: /health, /transcribe, /shutdown
└── whisper_engine.py          faster-whisper transcription wrapper
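
Both Python servers in the layout above list a `/health` endpoint. The sketch below polls it using only the standard library. It assumes the servers listen on localhost and that `/health` answers a plain GET with HTTP 200; the response body is not inspected because its shape is not documented here. The names `PORTS`, `health_url`, and `is_healthy` are hypothetical.

```python
import urllib.request

# Ports taken from the project structure listing.
PORTS = {"tts": 5100, "whisper": 5101}


def health_url(server: str, host: str = "127.0.0.1") -> str:
    """Build the /health URL for the 'tts' or 'whisper' server."""
    return f"http://{host}:{PORTS[server]}/health"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout.

    Connection failures, timeouts, and HTTP error statuses all count
    as "not healthy" (URLError and HTTPError are OSError subclasses).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Calling `is_healthy(health_url("tts"))` before sending generation requests is one way to confirm that the server has finished starting.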

Audio Pipeline

Dialogue Text → TTS Provider → WAV → LIP Generation → XWMA Encoding → FUZ Packaging

| Stage | With CK Tools | Fallback |
|---|---|---|
| Audio encoding | XWMA via xwmaencode.exe (~25 KB/line) | Raw WAV in FUZ (~440 KB/line) |
| Mouth animation | LIP via LipGenerator.exe | No mouth movement (lipSize=0) |
| FUZ packaging | Built-in (no external tools) | |

Audio plays correctly in Skyrim in both cases. The Creation Kit is free on Steam.
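
To make the final packaging stage more concrete, here is a minimal sketch of a FUZ container writer and reader. It assumes the layout commonly described by the modding community: a 4-byte `FUZE` magic, a little-endian 32-bit version of 1, a 32-bit LIP size, the LIP data, then the audio data. This is not taken from SVGS's source, and the names `pack_fuz` and `unpack_fuz` are hypothetical. An empty LIP payload corresponds to the lipSize=0 fallback in the table above.

```python
import struct

FUZ_MAGIC = b"FUZE"
FUZ_VERSION = 1      # assumed container version
HEADER_SIZE = 12     # magic (4) + version (4) + lip size (4)


def pack_fuz(audio: bytes, lip: bytes = b"") -> bytes:
    """Wrap audio (XWMA or WAV bytes) and optional LIP data in a FUZ blob."""
    header = FUZ_MAGIC + struct.pack("<II", FUZ_VERSION, len(lip))
    return header + lip + audio


def unpack_fuz(blob: bytes) -> tuple:
    """Split a FUZ blob back into (lip, audio)."""
    if blob[:4] != FUZ_MAGIC:
        raise ValueError("not a FUZ file: bad magic")
    _version, lip_size = struct.unpack_from("<II", blob, 4)
    lip = blob[HEADER_SIZE:HEADER_SIZE + lip_size]
    audio = blob[HEADER_SIZE + lip_size:]
    return lip, audio
```

Because the header stores the LIP length explicitly, the same reader handles files with and without mouth animation data.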

Tech Stack

  • App: .NET 10, C#, WPF, EF Core + SQLite, CommunityToolkit.Mvvm
  • Plugin reading: Mutagen (Bethesda plugin library)
  • Audio: NAudio, FFmpeg (auto-downloaded), Creation Kit CLI tools (optional)
  • Local TTS: Python 3, FastAPI, Qwen3-TTS (qwen-tts) + faster-qwen3-tts (CUDA-graph-optimized), PyTorch (CUDA, ROCm, or CPU)
  • Cloud TTS: ElevenLabs API
  • Transcription: faster-whisper (CTranslate2-based)

Documentation

  • Getting Started — First-time setup and your first voice generation
  • User Guide — Comprehensive reference for all features

Both guides are also accessible from within the app on the Documentation page.

License

This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).

SVGS uses Mutagen (GPL-3.0) for reading Bethesda plugin files.

Acknowledgments
