Multilingual Voice Understanding Model
Updated Dec 30, 2025 - Python
A state-of-the-art, industrial-grade, all-in-one ASR system with ASR, voice activity detection (VAD), language identification (LID), and punctuation (Punc) modules. FireRedASR2 supports Chinese (Mandarin plus 20+ dialects and accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech, singing, and music in 100+ languages. FireRedLID supports 100+ languages and 20+ Chinese dialects. FireRedPunc supports Chinese and English.
A state-of-the-art, industrial-grade voice activity detection and audio event detection system, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD.
TensorFlow 2 implementation of Data-driven Harmonic Filters for Audio Representation Learning
A few-shot learning technique, Relation Networks, applied to the classification of audio events
Code for the paper "Deep Learning Solutions for Audio Event Detection in a Swine Barn Using Environmental Audio and Weak Labels".
A machine learning task to classify audio events
Coach structured answers in real time during mock interviews with question detection, feedback, filler tracking, and live transcription.
Provide accurate voice activity and audio event detection in 100+ languages with high-performance streaming and non-streaming capabilities.