┌─────────────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE (React)                          │
│                            Hosted on Vercel                             │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ User visits app
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          AUTHENTICATION LAYER                           │
│                              Supabase Auth                              │
│  • Login/Register                                                       │
│  • JWT token management                                                 │
│  • Session handling                                                     │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ Authenticated
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         PRACTICE SESSION START                          │
│                                                                         │
│  Frontend displays:                                                     │
│  • Target sentence: "Think about the weather"                           │
│  • [Optional] Play TTS example (ElevenLabs)                             │
│  • Record button                                                        │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ User clicks "Record"
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        AUDIO CAPTURE (Frontend)                         │
│                                                                         │
│  Web Audio API + MediaRecorder API                                      │
│  • Start recording from microphone                                      │
│  • Real-time waveform visualization (WaveSurfer.js)                     │
│  • User speaks: "Fink about the wedder"                                 │
│  • Stop recording                                                       │
│  • Convert to WAV/MP3 format                                            │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ POST /api/v1/practice/submit-audio
                                  │ Payload: { audio: File, reference_text: "..." }
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          BACKEND API (FastAPI)                          │
│                            Hosted on Railway                            │
│                                                                         │
│  1. Validate request (auth token, file size, format)                    │
│  2. Upload audio to storage                                             │
│  3. Initialize session tracking                                         │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ Audio file ready
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        AUDIO PREPROCESSING LAYER                        │
│                                                                         │
│  Python libraries: pydub, soundfile                                     │
│  • Normalize audio volume                                               │
│  • Remove leading/trailing silence                                      │
│  • Resample to 16 kHz (standard for speech)                             │
│  • Convert to required format                                           │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ Preprocessed audio
                                  ▼
             ┌────────────────────┴────────────────────┐
             │                                         │
             ▼                                         ▼
┌──────────────────────────┐            ┌──────────────────────────────┐
│   WHISPER API (OpenAI)   │            │    AZURE SPEECH SERVICES     │
│                          │            │   Pronunciation Assessment   │
│  • Transcription         │            │                              │
│  • Word-level timestamps │            │  Input:                      │
│  • Confidence scores     │            │  • Audio file                │
│                          │            │  • Reference text:           │
│  Returns:                │            │    "Think about the weather" │
│  "Fink about the wedder" │            │                              │
└────────────┬─────────────┘            │  Returns:                    │
             │                          │  • Overall accuracy: 68/100  │
             │                          │  • Phoneme-level scores:     │
             │                          │    - θ → f (score: 31)       │
             │                          │    - ð → d (score: 38)       │
             │                          │  • Word-level breakdown      │
             │                          │  • Error types               │
             │                          └──────────────┬───────────────┘
             │                                         │
             │                                         │
             └────────────────────┬────────────────────┘
                                  │
                                  │ Both results combined
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     ERROR ANALYSIS ENGINE (Backend)                     │
│                         Python + Business Logic                         │
│                                                                         │
│  1. Compare Whisper transcription vs reference text                     │
│  2. Parse Azure phoneme scores                                          │
│  3. Identify error patterns:                                            │
│     • Phoneme substitutions (θ→f, ð→d)                                  │
│     • Omissions                                                         │
│     • Timing/duration issues                                            │
│     • Prosody problems                                                  │
│                                                                         │
│  4. Query user history from database:                                   │
│     • Past phoneme performance                                          │
│     • Improvement trends                                                │
│     • Recurring error patterns                                          │
│                                                                         │
│  5. Classify impediment type:                                           │
│     • Th-fronting/stopping (θ→f, ð→d pattern)                           │
│     • Rhotacism (r→w)                                                   │
│     • Etc.                                                              │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ Structured error report
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                       DATABASE UPDATE (Supabase)                        │
│                               PostgreSQL                                │
│                                                                         │
│  INSERT INTO phoneme_performance:                                       │
│  • user_id, session_id, timestamp                                       │
│  • phoneme, accuracy_score, error_type                                  │
│  • word, position                                                       │
│                                                                         │
│  UPDATE user_progress:                                                  │
│  • sessions_completed++                                                 │
│  • overall_score_trend                                                  │
│  • current_difficulty_level                                             │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ Data saved
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                   AI FEEDBACK GENERATION (Claude API)                   │
│                        Anthropic Claude Sonnet 4                        │
│                                                                         │
│  Input payload:                                                         │
│  {                                                                      │
│    "target_sentence": "Think about the weather",                        │
│    "user_transcription": "Fink about the wedder",                       │
│    "phoneme_errors": [                                                  │
│      {"phoneme": "θ", "actual": "f", "score": 31, "word": "think"},     │
│      {"phoneme": "ð", "actual": "d", "score": 38, "word": "the"}        │
│    ],                                                                   │
│    "historical_patterns": {                                             │
│      "θ_substitution_rate": 0.87,                                       │
│      "sessions_completed": 12,                                          │
│      "improvement_trend": "slight"                                      │
│    },                                                                   │
│    "user_difficulty_level": 2                                           │
│  }                                                                      │
│                                                                         │
│  Claude analyzes and generates:                                         │
│  • Personalized, encouraging feedback                                   │
│  • 5 adaptive practice sentences (progressive difficulty)               │
│  • Specific articulation tips                                           │
│  • Difficulty adjustment recommendation                                 │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ Claude response received
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                 TEXT-TO-SPEECH GENERATION (ElevenLabs)                  │
│                              Optional Step                              │
│                                                                         │
│  If user needs to hear correct pronunciation:                           │
│  • Generate TTS for problem words: "think", "the", "weather"            │
│  • Generate TTS for next practice sentences                             │
│  • Use appropriate accent (en-US, en-GB, etc.)                          │
│  • Store audio URLs                                                     │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ TTS audio ready
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                       RESPONSE ASSEMBLY (Backend)                       │
│                                                                         │
│  Compile complete response:                                             │
│  {                                                                      │
│    "session_id": "uuid",                                                │
│    "overall_score": 68,                                                 │
│    "accuracy_breakdown": {                                              │
│      "pronunciation": 65,                                               │
│      "fluency": 82,                                                     │
│      "completeness": 100                                                │
│    },                                                                   │
│    "errors": [                                                          │
│      {                                                                  │
│        "phoneme": "θ",                                                  │
│        "word": "think",                                                 │
│        "score": 31,                                                     │
│        "feedback": "Try placing tongue between teeth"                   │
│      }                                                                  │
│    ],                                                                   │
│    "ai_feedback": {                                                     │
│      "text": "Good effort! I noticed you're substituting...",           │
│      "encouragement": "You're making progress on fluency!"              │
│    },                                                                   │
│    "practice_sentences": [                                              │
│      "The cat is here.",                                                │
│      "I think that's right.",                                           │
│      "This thing is smooth.",                                           │
│      "Three brothers thought about it.",                                │
│      "The weather is thoroughly unpredictable."                         │
│    ],                                                                   │
│    "tts_urls": {                                                        │
│      "correct_example": "https://storage.../correct.mp3",               │
│      "practice_1": "https://storage.../p1.mp3"                          │
│    },                                                                   │
│    "visual_data": {                                                     │
│      "phoneme_chart": [...],                                            │
│      "waveform_data": [...],                                            │
│      "progress_history": [...]                                          │
│    }                                                                    │
│  }                                                                      │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ Return response to frontend
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        FRONTEND RESULTS DISPLAY                         │
│                                                                         │
│  User sees:                                                             │
│  ┌─────────────────────────────────────────────────────────────┐        │
│  │  Your Score: 68/100                                         │        │
│  │  ★★★☆☆                                                      │        │
│  │                                                             │        │
│  │  Great effort! I noticed you're substituting 'th'           │        │
│  │  sounds with 'f' and 'd'. This is very common and           │        │
│  │  completely fixable with practice.                          │        │
│  │                                                             │        │
│  │  Problem Areas:                                             │        │
│  │  🔴 "think" - θ sound (31/100)                              │        │
│  │     [Play Correct] [Play Your Recording]                    │        │
│  │                                                             │        │
│  │  💡 Tip: Place your tongue between your teeth and           │        │
│  │     blow air gently...                                      │        │
│  │                                                             │        │
│  │  📊 Progress Chart [Recharts visualization]                 │        │
│  │                                                             │        │
│  │  Next Practice Sentences:                                   │        │
│  │  1. "The cat is here." [▶️ Listen] [🎤 Record]              │        │
│  │  2. "I think that's right." [▶️ Listen] [🎤 Record]         │        │
│  │  ...                                                        │        │
│  └─────────────────────────────────────────────────────────────┘        │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  │ User continues practice
                                  ▼
                         [Loop back to top]
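
Steps 1 and 3 of the Error Analysis Engine box (compare the transcription
against the reference text, then tally phoneme substitutions) could be
sketched roughly as follows. This is a minimal illustration that uses only
the standard library. The names PhonemeResult, mismatched_words,
substitution_counts and the threshold of 60 are assumptions made for the
example; they are not part of the design above and do not mirror the Azure
response schema.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class PhonemeResult:
    """One phoneme-level result (hypothetical shape, not a vendor schema)."""
    expected: str  # phoneme required by the reference text
    actual: str    # phoneme the speaker produced
    score: int     # accuracy score, 0-100
    word: str      # word in which the phoneme occurs


def mismatched_words(reference: str, transcription: str) -> list[tuple[str, str]]:
    """Return (reference, heard) pairs for every span where the texts differ.

    An omitted word shows up with an empty string on the "heard" side.
    """
    ref = reference.lower().split()
    hyp = transcription.lower().split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((" ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
    return pairs


def substitution_counts(results, threshold: int = 60) -> Counter:
    """Count (expected, actual) substitutions scoring below the threshold.

    The threshold is an assumed cut-off chosen only for this sketch.
    """
    counts = Counter()
    for r in results:
        if r.score < threshold and r.expected != r.actual:
            counts[(r.expected, r.actual)] += 1
    return counts
```

With the running example, mismatched_words("Think about the weather",
"Fink about the wedder") pairs "think" with "fink" and "weather" with
"wedder", while the two phoneme results from the diagram yield one count
each for (θ, f) and (ð, d).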

┌──────────────┐
│     USER     │
└──────┬───────┘
       │
       │ 1. Speaks into microphone
       ▼
┌────────────────────┐
│ Web Audio API      │
│ (Browser)          │
│ • Capture audio    │
│ • Real-time visual │
└──────┬─────────────┘
       │
       │ 2. Audio blob (WAV/MP3)
       ▼
┌────────────────────┐
│ React Frontend     │
│ • FormData prep    │
│ • Loading states   │
└──────┬─────────────┘
       │
       │ 3. HTTP POST with audio file + metadata
       ▼
┌────────────────────────────────────────────────────────┐
│               FastAPI Backend (Railway)                │
│                                                        │
│      Rate Limiting → Auth Check → File Validation      │
└──────┬─────────────────────────────────────────────────┘
       │
       │ 4. Save audio to storage
       ▼
┌────────────────────┐
│ Supabase Storage   │
│ • Store audio file │
│ • Generate URL     │
└──────┬─────────────┘
       │
       │ 5. Audio URL
       ▼
┌────────────────────────────────────────┐
│           Parallel API Calls           │
│                                        │
│   ┌───────────┐       ┌───────────┐    │
│   │  Whisper  │       │   Azure   │    │
│   │   (STT)   │       │ (Assess)  │    │
│   └─────┬─────┘       └─────┬─────┘    │
│         │                   │          │
│         │                   │          │
└─────────┼───────────────────┼──────────┘
          │                   │
          │ 6. Transcription  │ 7. Phoneme scores
          │                   │
          └─────────┬─────────┘
                    │
                    ▼
           ┌────────────────┐
           │ Error Analysis │
           │ (Python Logic) │
           └────────┬───────┘
                    │
                    │ 8. Error patterns identified
                    ▼
           ┌────────────────┐
           │ PostgreSQL     │
           │ (Supabase)     │
           │ • Save results │
           │ • Get history  │
           └────────┬───────┘
                    │
                    │ 9. Historical data + current errors
                    ▼
           ┌────────────────┐
           │ Claude API     │
           │ • Generate     │
           │   feedback     │
           │ • Create       │
           │   sentences    │
           └────────┬───────┘
                    │
                    │ 10. Personalized response
                    ▼
           ┌────────────────┐
           │ ElevenLabs     │
           │ (Optional)     │
           │ • TTS for      │
           │   examples     │
           └────────┬───────┘
                    │
                    │ 11. Audio URLs
                    ▼
           ┌────────────────┐
           │ JSON Response  │
           │ Assembly       │
           └────────┬───────┘
                    │
                    │ 12. Complete response object
                    ▼
           ┌────────────────┐
           │ React Frontend │
           │ • Display UI   │
           │ • Update state │
           │ • Show results │
           └────────────────┘
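
The "Parallel API Calls" step (6 and 7 above) might look like the sketch
below, which runs both requests concurrently with asyncio.gather. The two
coroutines are stand-ins that return canned data from the example; real
code would call the speech-to-text and pronunciation-assessment services
instead. The function names transcribe, assess and analyze, and the shape
of the returned dictionaries, are assumptions made for illustration.

```python
import asyncio


async def transcribe(audio_url: str) -> str:
    """Stand-in for the speech-to-text request (returns canned data)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return "Fink about the wedder"


async def assess(audio_url: str, reference_text: str) -> dict:
    """Stand-in for the pronunciation-assessment request (canned data)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {
        "accuracy": 68,
        "phonemes": [
            {"expected": "θ", "actual": "f", "score": 31},
            {"expected": "ð", "actual": "d", "score": 38},
        ],
    }


async def analyze(audio_url: str, reference_text: str) -> dict:
    """Run both requests concurrently and combine their results."""
    # gather() schedules both coroutines at once and returns the results
    # in argument order, so total latency is close to the slower call
    # rather than the sum of both.
    transcription, assessment = await asyncio.gather(
        transcribe(audio_url),
        assess(audio_url, reference_text),
    )
    return {"transcription": transcription, "assessment": assessment}
```

By default gather() raises the first exception it sees; passing
return_exceptions=True is one way to keep a partial result when only one
of the two services fails.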

┌───────────────────────────────────────────────────────┐
│                 SUPABASE (All-in-One)                 │
├───────────────────────────────────────────────────────┤
│                                                       │
│  ┌──────────────────────┐   ┌─────────────────────┐   │
│  │ PostgreSQL DB        │   │ Storage Buckets     │   │
│  │                      │   │                     │   │
│  │ Tables:              │   │ Buckets:            │   │
│  │ • users              │   │ • audio-uploads/    │   │
│  │ • sessions           │   │ • tts-generated/    │   │
│  │ • phoneme_performance│   │                     │   │
│  │ • practice_sentences │   │ Auto-delete after   │   │
│  │ • user_progress      │   │ 30 days (GDPR)      │   │
│  │ • impediment_profiles│   │                     │   │
│  └──────────┬───────────┘   └─────────┬───────────┘   │
│             │                         │               │
│             ▼                         ▼               │
│    ┌────────────────────────────────────────────┐     │
│    │        Supabase Realtime (Optional)        │     │
│    │ • Live progress updates                    │     │
│    │ • Multi-device sync                        │     │
│    └────────────────────────────────────────────┘     │
│                                                       │
│    ┌────────────────────────────────────────────┐     │
│    │               Supabase Auth                │     │
│    │ • JWT tokens                               │     │
│    │ • Row-level security                       │     │
│    │ • OAuth providers                          │     │
│    └────────────────────────────────────────────┘     │
└───────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│                     PRESENTATION LAYER                     │
│         React + TypeScript + Tailwind + shadcn/ui          │
│                     Hosted on: Vercel                      │
└────────────────────────┬───────────────────────────────────┘
                         │
                         │ HTTPS/WebSocket
                         ▼
┌────────────────────────────────────────────────────────────┐
│                     APPLICATION LAYER                      │
│                   FastAPI (Python 3.11+)                   │
│                 Hosted on: Railway/Render                  │
└────────────────────────┬───────────────────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
        ▼                ▼                ▼
 ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
 │ AI Layer     │ │ Data Layer   │ │ Storage Layer│
 │              │ │              │ │              │
 │ • Whisper    │ │ • PostgreSQL │ │ • Supabase   │
 │ • Azure      │ │ • Redis      │ │   Storage    │
 │ • Claude     │ │   (cache)    │ │ • Cloudflare │
 │ • ElevenLabs │ │ • Supabase   │ │   R2         │
 └──────────────┘ └──────────────┘ └──────────────┘
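
The Data Layer lists Redis as a cache but does not say what is cached. One
plausible use, assumed here purely for illustration, is a cache-aside
lookup of per-user phoneme history so that repeated submissions in a
session do not hit PostgreSQL each time. The sketch uses an in-memory
dictionary in place of Redis; the class name TTLCache, its method and the
60-second lifetime are hypothetical.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, or call `loader` and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit that has not expired yet
        value = loader()     # cache miss: fall through to the database
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller would pass a loader that queries the database, for example
cache.get_or_load("history:user-1", load_history), where load_history is
whatever function reads the phoneme_performance rows.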