seuly1203/LanguageAnalysis

Children's Speech Analysis Pipeline

Python PyTorch Whisper LoRA

An end-to-end pipeline for analyzing children's speech from mixed adult-child recordings, using a fine-tuned ASR model and NLP-based linguistic analysis.


📌 Overview

Standard Automatic Speech Recognition (ASR) models struggle with children's atypical and unclear pronunciation patterns. This project addresses that gap by building an audio processing pipeline with speaker assignment and linguistic analysis — and fine-tuning a Swedish Whisper model to better handle children's speech.


🔎 Features

  • ASR — kb-whisper-large for speech-to-text transcription
  • Speaker Assignment — logistic regression model to separate child and adult speech segments
  • Linguistic Analysis — Stanza and spaCy for NLP-based lexical analysis
  • LoRA Fine-tuning — Low-Rank Adaptation to fine-tune kb-whisper-large on limited children's speech data within Colab's memory constraints, specializing the model for children's speech
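The speaker-assignment step above boils down to a binary classifier over per-segment acoustic features. As a minimal, dependency-free sketch (the real model's features and training code are not shown in this repo — the single "normalized mean pitch" feature here is a hypothetical stand-in), logistic regression can be fit with plain gradient descent:

```python
import math

def train_logistic_regression(features, labels, lr=0.1, epochs=500):
    """Fit a single-feature logistic regression with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_speaker(w, b, x):
    """Label a segment 'child' if the predicted probability exceeds 0.5."""
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return "child" if p > 0.5 else "adult"

# Toy data: hypothetical normalized mean-pitch values per segment.
# Children's segments (label 1) tend to have higher pitch than adults' (label 0).
pitch = [0.9, 0.8, 0.85, 0.2, 0.1, 0.25]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(pitch, labels)
print(predict_speaker(w, b, 0.9))  # high pitch -> child
print(predict_speaker(w, b, 0.1))  # low pitch -> adult
```

In practice a library implementation (e.g. scikit-learn's `LogisticRegression`) with richer features would replace this toy loop.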

📁 Project Structure

├── data/
│   ├── train/          # Training data
│   └── test/           # Test data
├── models/             # Saved model weights (not tracked in git)
├── data_loader.py      # Data loading utilities
├── functions.py        # Helper functions
├── lr_train.py         # Logistic regression training
└── main.py             # Full pipeline: data loading, model inference, optional LR training

➿ Procedure

1. Prepare Data

Place data in data/train/ and data/test/.

Note: This data is used for the logistic regression speaker assignment model only — separate from the dataset used to fine-tune the Whisper ASR model.

Each dataset split contains:

  • Multiple .wav audio files (mixed adult-child recordings)
  • A .csv file with transcriptions in the format: [filename], [transcribed text]
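A loader for that CSV format can be sketched with the standard library alone. This is an illustrative helper, not code from the repo; it assumes the file has no header row and joins any extra commas back into the transcription text:

```python
import csv
import io

def load_transcriptions(csv_file):
    """Map each audio filename to its reference transcription.

    Assumes rows of the form: [filename], [transcribed text], no header row.
    """
    transcriptions = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        filename = row[0].strip()
        text = ",".join(row[1:]).strip()  # transcription may itself contain commas
        transcriptions[filename] = text
    return transcriptions

# In-memory CSV standing in for a file under data/train/
sample = io.StringIO("rec_001.wav, hej jag heter Anna\nrec_002.wav, vi leker ute\n")
print(load_transcriptions(sample))
```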

2. Run

python main.py

main.py handles the full pipeline — loading data, loading models, and running inference. Logistic regression training can be enabled and configured via parameters inside main.py.
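The control flow described above can be pictured roughly as follows. This is a hypothetical skeleton, not the actual main.py — the function names, the `TRAIN_LR` flag, and the stub bodies are all invented for illustration:

```python
TRAIN_LR = False  # hypothetical toggle: set True to retrain the speaker model

def load_data(split):
    """Stub: would read .wav files and the transcription .csv from data/<split>/."""
    return [f"{split}_segment_{i}.wav" for i in range(3)]

def run_inference(segments):
    """Stub: would transcribe each segment with the fine-tuned Whisper model."""
    return {seg: "<transcript>" for seg in segments}

def train_speaker_model(segments):
    """Stub: would fit the logistic-regression speaker classifier."""
    return "lr_model"

def main():
    train_segments = load_data("train")
    test_segments = load_data("test")
    if TRAIN_LR:
        train_speaker_model(train_segments)
    return run_inference(test_segments)

print(main())
```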

The LoRA fine-tuned Whisper model was trained separately on Google Colab and is loaded from a private Hugging Face repository.


📃 Results

Reduced WER from 0.23 to 0.157 through data cleaning, text postprocessing with jiwer, and hyperparameter tuning.
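For reference, WER is the word-level edit distance between reference and hypothesis, divided by the reference length (this is what jiwer computes). A minimal stdlib version:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("jag heter anna", "jag heter hanna"))  # 1 error / 3 words
```

In the pipeline itself, `jiwer.wer(reference, hypothesis)` does this (plus configurable text normalization) directly.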

  • Base model — performs better on longer audio files
  • LoRA model — higher accuracy on shorter audio files, but hallucinates more severely

🔒 Note: Data used in this project is proprietary and not publicly available.
