An AI-powered assistive communication system that improves the clarity of impaired speech in real time — built for people who stutter and people with dysarthria or other motor speech impairments.
Traditional speech-to-text systems perform poorly on distorted or irregular speech patterns. SpeechBridge is a multi-layer pipeline that goes beyond a simple ASR wrapper to handle the unique challenges of impaired speech.
People with speech disorders often struggle to communicate clearly. Conditions such as stuttering, dysarthria, and other motor speech impairments produce speech patterns that break standard ASR systems — leading to misrecognition, frustration, and reduced independence.
SpeechBridge addresses this by combining local ASR, phoneme-level correction, and adaptive personal memory into a single assistive pipeline.
Audio Input
↓
ASR — Whisper (local inference, beam search, confidence scores)
↓
Stutter Detection & Removal
↓
Dysarthria Correction (Phoneme + Confidence Based)
↓
Personal Adaptive Memory
↓
Final Cleaned Output
- Uses OpenAI Whisper running fully locally (no API calls)
- Tuned with beam search, best_of sampling, and temperature control
- Audio is preprocessed via ffmpeg: mono conversion, 16kHz resampling, error-tolerant decoding
- Per-word confidence scores are extracted from Whisper's output (see the sketch after this list)
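A minimal sketch of this stage, assuming the `openai-whisper` package and `ffmpeg` on PATH; the exact flags and decoding parameters in the repo may differ:

```python
import subprocess
import whisper  # pip install openai-whisper

def preprocess(src: str, dst: str = "clean.wav") -> str:
    """Re-encode to mono 16 kHz WAV with loudness normalization.

    -err_detect ignore_err keeps ffmpeg going on slightly damaged input.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-err_detect", "ignore_err", "-i", src,
         "-ac", "1", "-ar", "16000", "-af", "loudnorm", dst],
        check=True,
    )
    return dst

model = whisper.load_model("base")  # ~139 MB, fetched on first run

result = model.transcribe(
    preprocess("input.webm"),
    beam_size=5,                  # beam search at temperature 0
    best_of=5,                    # sampling candidates if the fallback raises temperature
    temperature=(0.0, 0.2, 0.4),  # fallback schedule when decoding is unstable
    word_timestamps=True,         # required for per-word timing and probabilities
)

tokens = [
    {"word": w["word"].strip(), "start": w["start"],
     "end": w["end"], "confidence": w["probability"]}
    for seg in result["segments"] for w in seg["words"]
]
```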
The stutter-removal stage detects and removes (sketched after this list):
- Rapid word repetitions (timestamp-based gap detection)
- Elongated characters (e.g. "sooo" → "so")
- Short-gap duplications
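A condensed sketch of these three rules over the word tokens produced above; the 0.35 s gap threshold is illustrative:

```python
import re

def remove_stutters(tokens: list[dict], max_gap: float = 0.35) -> list[dict]:
    """Drop rapid repetitions / short-gap duplications and collapse elongations."""
    cleaned: list[dict] = []
    for tok in tokens:
        # collapse runs of 3+ identical characters: "sooo" -> "so"
        word = re.sub(r"(.)\1{2,}", r"\1", tok["word"])
        if cleaned:
            prev = cleaned[-1]
            gap = tok["start"] - prev["end"]  # timestamp-based gap detection
            if word.lower() == prev["word"].lower() and gap <= max_gap:
                prev["end"] = tok["end"]      # merge the duplicate into the kept word
                continue
        cleaned.append({**tok, "word": word})
    return cleaned

demo = [{"word": "I", "start": 0.0, "end": 0.2},
        {"word": "I", "start": 0.25, "end": 0.45},
        {"word": "want", "start": 0.5, "end": 0.8},
        {"word": "sooo", "start": 0.9, "end": 1.4}]
print(" ".join(t["word"] for t in remove_stutters(demo)))  # "I want so"
```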
For each low-confidence word (confidence < 0.6):
- Look up phoneme sequence via the CMU Pronouncing Dictionary
- Compute Levenshtein distance between phoneme strings
- Find the closest match in the top 50,000 most frequent English words (wordfreq)
- Replace only if edit distance ≤ 2 (conservative — avoids over-correction; see the sketch after this list)
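A sketch of the matching loop using `pronouncing` and `wordfreq`. The repo computes Levenshtein distance over phoneme strings via python-Levenshtein; stripping stress digits and comparing phoneme tokens with a small DP helper, as here, is one reasonable reading of that, not the repo's exact code:

```python
import pronouncing                      # CMU Pronouncing Dictionary lookups
from wordfreq import top_n_list

def phones(word: str) -> list[str] | None:
    """CMU phoneme tokens for a word, stress digits stripped; None if absent."""
    entries = pronouncing.phones_for_word(word.lower())
    if not entries:
        return None
    return [p.rstrip("012") for p in entries[0].split()]

def edit_distance(a, b) -> int:
    """Token-level Levenshtein distance (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

# Frequency-ordered candidates, restricted to words with CMU entries.
VOCAB = [(w, p) for w in top_n_list("en", 50000) if (p := phones(w))]

def correct_word(word: str, confidence: float) -> str:
    """Replace a low-confidence word with its phonetically closest frequent neighbour."""
    if confidence >= 0.6:
        return word                           # trust confident words
    src = phones(word)
    if src is None:
        return word                           # no CMU entry: leave untouched
    best, best_dist = word, 3                 # accept only edit distance <= 2
    for cand, cand_phones in VOCAB:           # linear scan is fine for a sketch
        d = edit_distance(src, cand_phones)
        if d < best_dist:                     # strict '<': ties go to the more
            best, best_dist = cand, d         # frequent (earlier) candidate
    return best

print(correct_word("wader", confidence=0.4))  # prints a close frequent neighbour
```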
- Corrections are stored per user session
- Once a word is corrected (e.g. wader → water), future occurrences are fixed instantly
- Makes the system more accurate over time for each individual speaker (see the sketch below)
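A minimal in-memory version of this idea; the repo's session handling and persistence details may differ:

```python
from collections import defaultdict

# session_id -> {misrecognized word -> confirmed correction}
MEMORY: dict[str, dict[str, str]] = defaultdict(dict)

def remember(session_id: str, wrong: str, right: str) -> None:
    """Store a confirmed correction for this speaker's session."""
    MEMORY[session_id][wrong.lower()] = right

def apply_memory(session_id: str, words: list[str]) -> list[str]:
    """Apply stored corrections before any expensive phoneme search."""
    known = MEMORY[session_id]
    return [known.get(w.lower(), w) for w in words]

remember("demo", "wader", "water")
print(apply_memory("demo", ["I", "want", "some", "wader"]))
# ['I', 'want', 'some', 'water']
```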
When testing with dysarthric speech:
| Stage | Output |
|---|---|
| Input speech | "I want some water" |
| Whisper raw output | "I one big bit of water" |
| After correction | "I want some water" |
Key insight: Base ASR misrecognition under dysarthric speech cannot be fixed by an LLM alone — acoustic decoding must be improved first.
| Improvement | Detail |
|---|---|
| Beam search decoding | Improves transcription accuracy on distorted speech |
| Audio preprocessing | Mono, 16kHz, loudness normalization, error-tolerant ffmpeg flags |
| Confidence-aware correction | Only corrects words Whisper is uncertain about |
| Phoneme distance matching | Uses CMU dict + Levenshtein for phonetically-informed replacement |
| Adaptive user memory | Personalises corrections over time |
An LLM is not used to blindly fix ASR errors. It is used selectively (a sketch follows the example) for:
- Grammar smoothing
- Semantic clarification
- Emergency intent detection
Example:
| Stage | Text |
|---|---|
| ASR output | "I need doctor breathing problem" |
| LLM enhancement | "I need a doctor. I am having trouble breathing." |
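A sketch of such a selective pass. The README does not pin down a model or provider, so this assumes the OpenAI chat API purely for illustration; any chat-capable LLM would slot in the same way:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY in the environment

SYSTEM = (
    "You clean up assistive-speech transcripts. Fix grammar and clarify meaning, "
    "but never add information. If the text signals a medical emergency, "
    "prefix the reply with [EMERGENCY]."
)

def enhance(cleaned_text: str) -> str:
    """Selective LLM pass: grammar smoothing, clarification, emergency flagging."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",              # illustrative model choice
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": cleaned_text}],
        temperature=0.2,                  # keep rewrites conservative
    )
    return resp.choices[0].message.content

print(enhance("I need doctor breathing problem"))
# e.g. "[EMERGENCY] I need a doctor. I am having trouble breathing."
```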
| Layer | Technology |
|---|---|
| Frontend | React, Vite, Tailwind CSS |
| Backend | FastAPI, Python |
| ASR | OpenAI Whisper (base model, local inference) |
| Phoneme matching | pronouncing (CMU dict), python-Levenshtein |
| Vocabulary | wordfreq (top 50k English words) |
| Audio processing | ffmpeg (via MediaRecorder API → WebM/Opus → WAV) |
├── backend/
│   ├── main.py                  # FastAPI server, Whisper transcription, stutter cleaning
│   ├── dysarthria.py            # Dysarthria correction — phoneme matching + adaptive memory
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.jsx              # Recording logic, MIME detection, API call
│   │   ├── components/
│   │   │   ├── RecordButton.jsx
│   │   │   ├── WaveformVisualizer.jsx
│   │   │   ├── OutputPanels.jsx
│   │   │   ├── StatsRow.jsx
│   │   │   └── ActionButtons.jsx
│   │   └── pages/
│   │       └── HistoryPage.jsx
│   └── package.json
└── README.md
- Python 3.10+
- Node.js 18+
- ffmpeg on PATH
cd backend
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

The Whisper base model (~139MB) downloads automatically on first run.
cd frontend
npm install
npm run dev

Open http://localhost:5173.
The transcription endpoint accepts a multipart/form-data audio file and returns:
{
"raw_text": "I I w-want to to go",
"after_stutter": "I want to go",
"final_text": "I want to go",
"corrections_applied": [{ "from": "wader", "to": "water" }],
"tokens": [{ "word": "I", "start": 0.0, "end": 0.3, "confidence": 0.98 }]
}
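A quick way to exercise the endpoint from Python. The route and form-field names below are assumptions for illustration; check backend/main.py for the real ones:

```python
import requests  # pip install requests

# NOTE: "/transcribe" and "file" are illustrative names, not confirmed by the repo.
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": ("sample.wav", f, "audio/wav")},
    )
print(resp.json()["final_text"])
```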
SpeechBridge enables:

- Clear communication for speech-impaired individuals
- Reduced frustration in social and professional interactions
- Assistive technology for caregivers and clinicians
- A foundation for rehabilitation progress tracking
Planned additions: speech clarity scoring over time, error-type classification, WebSocket-based live streaming transcription, emergency intent detection, and speech therapy care.