MEghlima8/Simple-Speech-Recognition-API

Speech Recognition API

This is a simple API for real-time keyword spotting, built with Python, Flask, and TensorFlow. It takes a voice recording as input and predicts one of ten predefined keywords. The project is fully containerized with Docker and Docker Compose, making it easy to deploy on any server.

🚀 Key Features

  • Real-time Prediction: Uses a pre-trained model to classify audio inputs.
  • API-based: Provides a RESTful API endpoint to predict keywords.
  • Containerized: Utilizes Docker and Docker Compose for a consistent and portable environment.
  • Efficient: Uses uWSGI and Nginx for high-performance serving.

📦 Project Structure

  • classifier/: Contains the code for preparing the dataset and training the classification model.
  • dataset/: Stores the raw audio files for the ten keywords.
  • server/: The main directory for the API, including Flask application, model, and Docker files.
  • local/test/: Includes sample audio files and a simple client for testing the API.

⚙️ Getting Started

Running the API

The project is designed to be easily deployed on a Linux server. Simply clone the repository and run the init.sh script.

  1. Clone the repository:

    git clone https://github.com/MEghlima8/Simple-Speech-Recognition-API.git
    cd Simple-Speech-Recognition-API/server
  2. Run the initialization script:

    bash init.sh

    This script will:

    • Update the system packages.
    • Install Docker and Docker Compose (if not already present).
    • Build the Docker images for Flask and Nginx.
    • Start the containers.

The API will be accessible on port 80.
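The build-and-start steps that init.sh performs can be sketched as a minimal docker-compose.yml. Note this is an illustrative sketch, not the repository's actual file: the service names, build contexts, and internal port are assumptions; only the published port 80 comes from the README.

```yaml
# Hypothetical sketch of the compose setup: a Flask/uWSGI app container
# behind an Nginx container that publishes port 80.
services:
  flask:
    build: ./flask        # assumed build context for the Flask + uWSGI image
    expose:
      - "5000"            # assumed internal uWSGI port
  nginx:
    build: ./nginx        # assumed build context for the Nginx image
    ports:
      - "80:80"           # the API is served on port 80, as stated above
    depends_on:
      - flask
```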

🧪 How to Use the API

The API exposes a single endpoint for prediction.

Endpoint

  • URL: http://localhost/predict
  • Method: POST
  • Content-Type: audio/wav (or other supported audio formats)
  • Body: The audio file you want to predict.

Example

You can use the provided local/client.py script to test the API.

python local/client.py

The script sends a sample audio file (down.wav or left.wav) to the API and prints the prediction result.

🧠 Model and Libraries

The project uses the following key technologies and libraries:

  • Backend: Flask, uWSGI, Nginx
  • Machine Learning: TensorFlow, Keras, scikit-learn
  • Audio Processing: Librosa
  • Containerization: Docker, Docker Compose

The model (model.h5) is a deep learning model trained for keyword spotting on the ten provided classes: down, go, left, no, off, on, right, stop, up, and yes.
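Because the label set is fixed, decoding a model output reduces to an argmax over the ten class probabilities. A minimal sketch, assuming the alphabetical class ordering listed above (the real ordering is fixed at training time and may differ):

```python
# Map a model's softmax output to one of the ten keywords.
# The class order here is an assumption; the real order comes from training.
KEYWORDS = ["down", "go", "left", "no", "off", "on", "right", "stop", "up", "yes"]

def decode_prediction(probs):
    """Return (keyword, probability) for the highest-scoring class."""
    if len(probs) != len(KEYWORDS):
        raise ValueError("expected one probability per keyword")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return KEYWORDS[best], probs[best]
```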
