TensorVox

TensorVox is an application designed to enable user-friendly and lightweight neural speech synthesis on the desktop, aimed at making such technology more accessible.

It can load models from TensorFlowTTS, Coqui-TTS, VITS, and VITS EVO. It is written in pure C++/Qt and uses ONNX Runtime, with TensorFlow and LibTorch supported as legacy backends.

Try it out

System requirements: Windows 10 64-bit and a CPU that supports the AVX instruction set (pretty much anything made after 2010). GPU inference requires a GPU that supports DirectX 12 and is available only with ONNX models.
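If you are unsure whether a machine meets the AVX requirement, a small check like the one below can tell you. This is a minimal sketch and not part of TensorVox: the function name `cpuSupportsAvx` is made up for illustration. The MSVC branch uses the `__cpuid` and `_xgetbv` intrinsics, and the GCC/Clang branch uses `__builtin_cpu_supports`.

```cpp
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical helper (not from the TensorVox sources).
// Returns true when AVX is reported as usable on this machine.
bool cpuSupportsAvx() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);  // CPUID leaf 1: feature flags are returned in ECX/EDX
    const bool osxsave = (regs[2] & (1 << 27)) != 0;  // OS has enabled XSAVE/XGETBV
    const bool avx     = (regs[2] & (1 << 28)) != 0;  // CPU advertises AVX
    if (!osxsave || !avx) {
        return false;
    }
    // XCR0 bits 1 and 2 must both be set: the OS preserves XMM and YMM state.
    const unsigned long long xcr0 = _xgetbv(0);
    return (xcr0 & 0x6) == 0x6;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // harmless if the runtime has already initialised it
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;  // not an x86 build, so AVX does not apply
#endif
}
```

Call `cpuSupportsAvx()` once at startup and show a clear error message if it returns false, instead of letting the program crash on an illegal instruction.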

Detailed guide in Google Docs

Grab a copy from the releases, extract the .zip, and check the Google Drive folder for models and installation instructions.

If you're interested in using your own model, you first need to train it and then export it.

Supported architectures

TensorVox supports models from four repos.

VITS Evolution: My fully upgraded version of VITS, with ONNX support.
jaywalnut310/VITS: VITS, a fully end-to-end (E2E) model that uses stressed IPA as phonemes. Export notebook:
TensorFlowTTS: FastSpeech2 and Tacotron2 (both char-based and phoneme-based), plus Multi-Band MelGAN. Here's a Colab notebook demonstrating how to export the LJSpeech pretrained, char-based Tacotron2 model:
Coqui-TTS: Tacotron2 (phoneme-based, IPA) and Multi-Band MelGAN, after converting from PyTorch to TensorFlow. Here's a notebook showing how to export the LJSpeech DDC model:

Support for more modern TTS models is being actively worked on!

These examples should give you enough guidance to understand what is needed. If you're looking to train a model specifically for this purpose, stay tuned. If you'd rather skip the training and export work, you can also get a TensorVox-ready model directly from me (see the contact details at the bottom of this README).

As for languages, out-of-the-box support is provided for English, German and Spanish (only TensorFlowTTS); that is, you won't have to do anything. You can add languages without modifying code, as long as the phoneme set is IPA (stressed or unstressed), ARPA, or GlobalPhone (open an issue and I'll explain it to you).

Backends

TensorVox currently supports multiple inference backends.

The LibTorch (TorchScript) and TensorFlow backends are maintained for compatibility with older models and with projects created before ONNX export was mature enough.

New development and active support are focused on ONNX Runtime, with DirectML used for GPU acceleration on Windows. This backend provides the best portability, long-term stability, and hardware coverage.

Build instructions

Currently, only Windows 10 x64 is supported (although I've heard reports of it running on 8.1).

Requirements:

Qt Creator
MSVC 2017 (v141) compiler

Primed build (with all provided libraries):

Download precompiled binary dependencies and includes
Unzip it so that the deps folder is in the same place as the .pro and main source files.
Open the project with Qt Creator, add your compiler and compile
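Before opening Qt Creator, it can help to confirm that the archive was unzipped into the right place. The sketch below checks the layout described in the steps above. It assumes the project file is named `TensorVox.pro` and that `main.cpp` sits beside it. The function name `missingBuildInputs` is hypothetical and not part of the project.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: lists what is missing from projectDir.
// An empty result means the folder looks ready for a primed build.
std::vector<std::string> missingBuildInputs(const fs::path& projectDir) {
    std::vector<std::string> missing;
    std::error_code ec;  // error_code overloads: report "missing" instead of throwing

    if (!fs::is_regular_file(projectDir / "TensorVox.pro", ec)) {
        missing.push_back("TensorVox.pro");
    }
    if (!fs::is_regular_file(projectDir / "main.cpp", ec)) {
        missing.push_back("main.cpp");
    }
    // The precompiled dependencies must unzip to a "deps" folder next to the .pro file.
    if (!fs::is_directory(projectDir / "deps", ec)) {
        missing.push_back("deps/");
    }
    return missing;
}
```

If `"deps/"` appears in the result, the most likely cause is that the archive was extracted one level too deep (for example into `deps/deps`).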

Note that to try your shiny new executable, you'll need to download a release of the program as described above and replace the executable in that release with your new one, so that you have all the DLLs in place.

TODO: Add instructions for compiling from scratch.

Externals (and thanks)

ONNX Runtime: https://onnxruntime.ai/
TensorFlow C API: https://www.tensorflow.org/install/lang_c
CppFlow (TF C API -> C++ wrapper): https://github.com/serizba/cppflow
AudioFile (for WAV export): https://github.com/adamstark/AudioFile
Frameless Dark Style Window: https://github.com/Jorgen-VikingGod/Qt-Frameless-Window-DarkStyle
JSON for modern C++: https://github.com/nlohmann/json
r8brain-free-src (Resampling): https://github.com/avaneev/r8brain-free-src
rnnoise (CMake version, denoising output): https://github.com/almogh52/rnnoise-cmake
Logitech LED Illumination SDK (Mouse RGB integration): https://www.logitechg.com/en-us/innovation/developer-lab.html
QCustomPlot: https://www.qcustomplot.com/index.php/introduction
libnumbertext: https://github.com/Numbertext/libnumbertext

Contact

You can open an issue here or join the Discord server and discuss or ask anything there.

Custom model training, fine-tuning, and compatible exports are available on request (not free). Email me or DM me on Xitter.

Follow me on X (formerly Twitter): ZD1908 (@ZDi____). For any formal inquiries, send an email to nika109021@gmail.com

Note about licensing

The program itself is MIT licensed, but the models you use are subject to their own license terms. For example, if you're in Vietnam and using TensorFlowTTS models, you'll have to check here for some details.

