Speech Recognition API

This is a simple API for real-time keyword spotting, built with Python, Flask, and TensorFlow. It takes a short voice recording as input and predicts which of ten predefined keywords was spoken. The project is fully containerized with Docker and Docker Compose, which makes it easy to deploy on a Linux server.

🚀 Key Features

Real-time Prediction: Uses a pre-trained model to classify audio inputs.
API-based: Provides a RESTful API endpoint to predict keywords.
Containerized: Utilizes Docker and Docker Compose for a consistent and portable environment.
Efficient: Serves the Flask application through uWSGI, with Nginx in front of it.

📦 Project Structure

classifier/: Contains the code for preparing the dataset and training the classification model.
dataset/: Stores the raw audio files for the ten keywords.
server/: The main directory for the API, including Flask application, model, and Docker files.
local/test/: Includes sample audio files and a simple client for testing the API.

⚙️ Getting Started

Running the API

The project is designed to be easily deployed on a Linux server. Simply clone the repository and run the init.sh script.

Clone the repository:

git clone https://github.com/your-username/Speech-Recognition-API.git
cd Speech-Recognition-API/server

Run the initialization script:
```
bash init.sh
```
This script will:
- Update the system packages.
- Install Docker and Docker Compose (if not already present).
- Build the Docker images for Flask and Nginx.
- Start the containers.

The API will be accessible on port 80.
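
To confirm that the containers are up after init.sh finishes, a quick reachability check on port 80 can help. The snippet below is a minimal sketch using only the Python standard library; the function name `port_is_open` and its default timeout are illustrative assumptions, not part of the repository.

```python
import socket


def port_is_open(host="localhost", port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open()` on the server should return True once Nginx is listening. It only checks that the port accepts connections, not that the model has loaded correctly.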

🧪 How to Use the API

The API exposes a single endpoint for prediction.

Endpoint

URL: http://localhost/predict
Method: POST
Content-Type: audio/wav (or other supported audio formats)
Body: The audio file you want to predict.
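
As an illustration of the request described above, here is a minimal sketch of a client built on the Python standard library. It assumes the audio is sent as the raw request body with Content-Type audio/wav, as listed; the helper names (`make_silent_wav`, `build_predict_request`, `predict`) are hypothetical, and the response format is not documented here, so the response is returned as plain text.

```python
import io
import urllib.request
import wave

API_URL = "http://localhost/predict"  # endpoint from the section above


def make_silent_wav(seconds=1.0, sample_rate=16000):
    """Build a mono 16-bit PCM WAV of silence in memory (stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def build_predict_request(audio_bytes, url=API_URL):
    """Prepare a POST request carrying the audio as the raw body (assumed format)."""
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def predict(audio_bytes, url=API_URL, timeout=10):
    """Send the audio to the API and return the response body as text."""
    req = build_predict_request(audio_bytes, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the containers running, `predict(open("down.wav", "rb").read())` would send a real sample. If the server expects a multipart form upload instead of a raw body, the request construction would need to change accordingly.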

Example

You can use the provided local/client.py script to test the API.

python local/client.py

The script sends a sample audio file (down.wav or left.wav) to the API and prints the predicted keyword.

🧠 Model and Libraries

The project uses the following key technologies and libraries:

Backend: Flask, uWSGI, Nginx
Machine Learning: TensorFlow, Keras, scikit-learn
Audio Processing: Librosa
Containerization: Docker, Docker Compose

The model (model.h5) is a deep learning model trained for keyword spotting on the ten provided classes: down, go, left, no, off, on, right, stop, up, and yes.
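
A ten-class classifier of this kind typically outputs one score per class, and the keyword is the class with the highest score. The sketch below shows that decoding step in plain Python. The label order in `KEYWORDS` (alphabetical, as listed above) and the function name `decode_prediction` are assumptions; the actual mapping used by the repository's model may differ.

```python
# Assumed label order: alphabetical, matching the list in the text above.
KEYWORDS = ["down", "go", "left", "no", "off", "on", "right", "stop", "up", "yes"]


def decode_prediction(scores, labels=KEYWORDS):
    """Map a vector of per-class scores to (keyword, score) via argmax."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per keyword")
    # Index of the highest score; ties resolve to the first maximum.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], float(scores[best])
```

For example, a score vector whose largest entry sits at index 2 decodes to "left" under this assumed ordering.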
