An AI-powered application designed to convert raw, stuttered audio into fluent and clear speech.
Launch the full backend:

    docker-compose -f backend/deployment/dockercompose.yml up --build

Make sure all the containers are fully up, then run the frontend:

    docker build -t frontend:1.0 frontend
    docker run -d -p 5173:5173 --net deployment_ainet --name frontend frontend:1.0

This project is developed and maintained by two primary contributors:
- Frontend Developer: @jsbloo
  The frontend developer designed and implemented a user interface that integrates with the backend system. Built with React, Node.js, TypeScript, and Vite, the frontend emphasizes simplicity and an intuitive user experience. It is containerized with Docker to streamline deployment.
- Backend Developer: @kaoutaar
  The backend developer implements the core processing logic: integrating AI models for Speech-to-Text (STT) and Text-to-Speech (TTS), managing database operations, and handling asynchronous tasks. The backend is built with FastAPI, Celery, and PostgreSQL, and is containerized with Docker for easy deployment.
For detailed information on the backend and frontend components, please refer to their respective README files:
- Backend README:
  This file contains the technical details of the backend setup, including instructions for running the servers and setting up Docker. It also outlines the AI models used, the task processing flow, and the system requirements.
- Frontend README:
  This file serves as a concise guide to understanding, installing, using, and contributing to the Stutter Enhancer frontend application.
Some potential pitfalls to be aware of:
- The app uses the Outetts model, which is currently the only available option and is not very efficient.
- It can be buggy at times, causing occasional performance issues.
- The model also struggles with longer audio or text inputs.
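
One common way to work around a model that struggles with long text is to split the input into shorter pieces before synthesis. The sketch below is not part of this project and has not been tested against Outetts. The function names `split_into_chunks` and `_split_words` and the `max_chars` limit of 200 are hypothetical, chosen only for illustration. It shows a splitter that breaks at sentence ends where possible and falls back to word boundaries for sentences that are too long.

```python
from __future__ import annotations

import re


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Split one over-long sentence on word boundaries (hypothetical helper)."""
    parts: list[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            # A single word longer than max_chars is kept whole.
            current = candidate
        else:
            parts.append(current)
            current = word
    return parts + ([current] if current else [])


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    The default of 200 is an arbitrary assumption, not a documented model limit.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("Hello there. How are you today? I am fine.", max_chars=20):
        print(chunk)
```

Each chunk could then be passed to the TTS step separately and the resulting audio joined afterwards. Whether this actually improves output quality for this model is an open question.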
Planned improvements:
- Integration of more advanced TTS and STT models for better performance.
- Implementation of user authentication with personalized data storage.
- Extension of API capabilities to support detailed analytics and reporting.
- Support for video processing, enabling the extraction of audio from video files.
- Frontend -> Backend authentication.
- Make the frontend and API public.
- Kubernetes deployment.
- WhatsApp integration.
We welcome contributions to both the frontend and backend. Please fork the repository, create an issue, or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License - see the LICENSE file for details.