SpeechBridge

An AI-powered assistive communication system that improves the clarity of impaired speech in real time — built for people with stuttering, dysarthria, and speech motor impairments.

Traditional speech-to-text systems perform poorly on distorted or irregular speech patterns. SpeechBridge is a multi-layer pipeline that goes beyond a simple ASR wrapper to handle the unique challenges of impaired speech.


Problem Statement

People with speech disorders often struggle to communicate clearly. Conditions like stuttering, dysarthria, and speech motor impairments produce speech patterns that break standard ASR systems — leading to misrecognition, frustration, and reduced independence.

SpeechBridge addresses this by combining local ASR, phoneme-level correction, and adaptive personal memory into a single assistive pipeline.


System Architecture

Audio Input
    ↓
ASR — Whisper (local inference, beam search, confidence scores)
    ↓
Stutter Detection & Removal
    ↓
Dysarthria Correction (Phoneme + Confidence Based)
    ↓
Personal Adaptive Memory
    ↓
Final Cleaned Output

Core Components

1. ASR Layer — Speech to Text

  • Uses OpenAI Whisper running fully locally (no API calls)
  • Tuned with beam search, best_of sampling, and temperature control
  • Audio is preprocessed via ffmpeg: mono conversion, 16kHz resampling, error-tolerant decoding
  • Per-word confidence scores are extracted from Whisper's output (see the sketch below)
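
A minimal sketch of this layer using the openai-whisper package. The ffmpeg flags and decoding values shown here are illustrative defaults consistent with the description above, not settings confirmed from the repo:

import subprocess
import whisper  # pip install openai-whisper

def preprocess(src: str, dst: str = "clean.wav") -> str:
    # Mono, 16 kHz WAV; -err_detect ignore_err tolerates damaged input.
    subprocess.run(
        ["ffmpeg", "-y", "-err_detect", "ignore_err", "-i", src,
         "-ac", "1", "-ar", "16000", dst],
        check=True,
    )
    return dst

model = whisper.load_model("base")  # downloaded once, then fully local

def transcribe_with_confidence(audio_path: str) -> list[dict]:
    result = model.transcribe(
        preprocess(audio_path),
        beam_size=5,           # beam search decoding
        best_of=5,             # used only by higher-temperature fallback passes
        temperature=0.0,       # deterministic first pass
        word_timestamps=True,  # per-word timing and probability
    )
    # Flatten segments into word tokens; "probability" is the confidence score.
    return [
        {"word": w["word"].strip(), "start": w["start"],
         "end": w["end"], "confidence": w["probability"]}
        for seg in result["segments"]
        for w in seg["words"]
    ]

With the temperature pinned at 0.0, Whisper decodes with beam search; best_of only takes effect if higher-temperature fallback sampling is enabled.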

2. Stutter Removal

Detects and removes (see the sketch after this list):

  • Rapid word repetitions (timestamp-based gap detection)
  • Elongated characters (e.g. "sooo" → "so")
  • Short-gap duplications
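
A minimal sketch of this pass over the word tokens produced by the ASR layer. The 0.3 s repetition gap is an illustrative threshold, not a value taken from the source:

import re

def remove_stutters(tokens, max_gap=0.3):
    # tokens: time-ordered dicts with "word", "start", "end", "confidence".
    cleaned = []
    for tok in tokens:
        # Collapse runs of three or more repeated letters: "sooo" -> "so".
        word = re.sub(r"(.)\1{2,}", r"\1", tok["word"])
        prev = cleaned[-1] if cleaned else None
        # Drop rapid repetitions: the same word restarted within a short gap.
        if (prev is not None
                and word.lower() == prev["word"].lower()
                and tok["start"] - prev["end"] <= max_gap):
            prev["end"] = tok["end"]  # absorb the duplicate's timing
            continue
        cleaned.append({**tok, "word": word})
    return cleaned

Working from timestamps rather than raw text lets deliberate repetitions ("very, very good") survive when they are separated by a normal pause.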

3. Dysarthria Correction Engine

For each low-confidence word (confidence < 0.6), the engine runs these steps (sketched below):

  1. Look up phoneme sequence via the CMU Pronouncing Dictionary
  2. Compute Levenshtein distance between phoneme strings
  3. Find the closest match in the top 50,000 most frequent English words (wordfreq)
  4. Replace only if edit distance ≤ 2 (conservative — avoids over-correction)
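
A minimal sketch of steps 1–4 using the pronouncing and wordfreq packages named in the tech stack. Two liberties for brevity: the edit distance is a small DP over phoneme lists (the project lists python-Levenshtein), and the candidate vocabulary is trimmed to 5,000 words so the precomputation stays fast:

import pronouncing               # CMU Pronouncing Dictionary lookups
from wordfreq import top_n_list  # frequency-ranked English vocabulary

def phones(word):
    # First CMU pronunciation as a list of ARPABET phonemes, or None.
    entries = pronouncing.phones_for_word(word.lower())
    return entries[0].split() if entries else None

def edit_distance(a, b):
    # Standard Levenshtein DP over two phoneme sequences.
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (x != y))  # substitution
    return dp[-1]

# Precompute phonemes for candidates (5k here; the project uses the top 50k).
CANDIDATES = [(w, phones(w)) for w in top_n_list("en", 5000)]
CANDIDATES = [(w, p) for w, p in CANDIDATES if p]

def correct_word(word, threshold=2):
    source = phones(word)
    if source is None:
        return word  # not in the CMU dict: leave untouched
    best, best_dist = word, threshold + 1
    for cand, cand_phones in CANDIDATES:
        d = edit_distance(source, cand_phones)
        if d < best_dist:
            best, best_dist = cand, d
    # Replace only on a close phonetic match; otherwise keep the original.
    return best if best_dist <= threshold else word

For instance, "wader" (W EY1 D ER0) sits two substitutions from "water" (W AO1 T ER0), so correct_word("wader") returns "water", provided "wader" itself is not in the candidate list, where it would match itself at distance 0.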

4. Personal Adaptive Memory

  • Corrections are stored per user session
  • Once a word is corrected (e.g. wader → water), future occurrences are fixed instantly
  • Makes the system more accurate over time for each individual speaker (sketched below)
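
A minimal sketch of such a session memory. The source does not specify the storage mechanism, so a plain in-memory dict stands in here:

class AdaptiveMemory:
    """Per-session store of accepted corrections, e.g. {"wader": "water"}."""

    def __init__(self):
        self._corrections = {}

    def remember(self, heard: str, corrected: str):
        # Record a correction once it has been applied or confirmed.
        if heard.lower() != corrected.lower():
            self._corrections[heard.lower()] = corrected

    def apply(self, word: str) -> str:
        # A known correction short-circuits the phoneme search entirely.
        return self._corrections.get(word.lower(), word)

memory = AdaptiveMemory()
memory.remember("wader", "water")
print(memory.apply("wader"))  # "water", instantly, on every later occurrence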

Observed Challenge

When testing with dysarthric speech:

Stage               Output
Input speech        "I want some water"
Whisper raw output  "I one big bit of water"
After correction    "I want some water"

Key insight: Base ASR misrecognition under dysarthric speech cannot be fixed by an LLM alone — acoustic decoding must be improved first.


Improvements Implemented

Improvement                  Detail
Beam search decoding         Improves transcription accuracy on distorted speech
Audio preprocessing          Mono, 16kHz, loudness normalization, error-tolerant ffmpeg flags
Confidence-aware correction  Only corrects words Whisper is uncertain about
Phoneme distance matching    Uses CMU dict + Levenshtein for phonetically-informed replacement
Adaptive user memory         Personalises corrections over time

Optional LLM Layer

An LLM is not used to blindly fix ASR errors. It is used selectively for:

  • Grammar smoothing
  • Semantic clarification
  • Emergency intent detection

Example:

Stage            Text
ASR output       "I need doctor breathing problem"
LLM enhancement  "I need a doctor. I am having trouble breathing."
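
A sketch of this selective gating; call_llm is a hypothetical stand-in for whichever model the deployment wires in, and the emergency keyword list is purely illustrative:

EMERGENCY_KEYWORDS = {"doctor", "help", "ambulance", "breathing", "pain"}

def enhance(text: str, call_llm) -> dict:
    # call_llm: hypothetical callable (prompt: str) -> str supplied by the app.
    urgent = bool(EMERGENCY_KEYWORDS & set(text.lower().split()))
    polished = call_llm(
        "Rewrite this transcript into clear, grammatical English "
        "without adding new information:\n" + text
    )
    return {"text": polished, "emergency": urgent}

# enhance("I need doctor breathing problem", call_llm=some_model) could yield
# the table's example text, with "emergency" flagged True.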

Tech Stack

Layer             Technology
Frontend          React, Vite, Tailwind CSS
Backend           FastAPI, Python
ASR               OpenAI Whisper (base model, local inference)
Phoneme matching  pronouncing (CMU dict), python-Levenshtein
Vocabulary        wordfreq (top 50k English words)
Audio processing  ffmpeg (via MediaRecorder API → WebM/Opus → WAV)

Project Structure

├── backend/
│   ├── main.py          # FastAPI server, Whisper transcription, stutter cleaning
│   ├── dysarthria.py    # Dysarthria correction — phoneme matching + adaptive memory
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.jsx              # Recording logic, MIME detection, API call
│   │   ├── components/
│   │   │   ├── RecordButton.jsx
│   │   │   ├── WaveformVisualizer.jsx
│   │   │   ├── OutputPanels.jsx
│   │   │   ├── StatsRow.jsx
│   │   │   └── ActionButtons.jsx
│   │   └── pages/
│   │       └── HistoryPage.jsx
│   └── package.json
└── README.md

Setup & Running

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • ffmpeg on PATH

Backend

cd backend
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

The Whisper base model (~139MB) downloads automatically on first run.

Frontend

cd frontend
npm install
npm run dev

Open http://localhost:5173.


API

POST /transcribe

Accepts a multipart/form-data audio file. Returns:

{
  "raw_text": "I I w-want to to go",
  "after_stutter": "I want to go",
  "final_text": "I want to go",
  "corrections_applied": [{ "from": "wader", "to": "water" }],
  "tokens": [{ "word": "I", "start": 0.0, "end": 0.3, "confidence": 0.98 }]
}
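
For example, from Python (the multipart field name "file" is an assumption; check the FastAPI handler in main.py for the actual parameter name):

import requests

with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": ("sample.wav", f, "audio/wav")},  # field name assumed
    )
resp.raise_for_status()
print(resp.json()["final_text"])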

Human Impact

SpeechBridge enables:

  • Clear communication for speech-impaired individuals
  • Reduced frustration in social and professional interactions
  • Assistive technology for caregivers and clinicians
  • A foundation for rehabilitation progress tracking

Planned additions: speech clarity scoring over time, error-type classification, WebSocket-based live streaming transcription, emergency intent detection, and speech therapy care.
