A real-time web application designed to recognize hand gestures and translate them into actionable text or speech. This project bridges communication gaps by running a full machine learning pipeline, from custom landmark data collection to live browser-based inference.
This repository contains the complete workflow for a gesture recognition tool. A Python-based pipeline captures hand landmarks (using computer vision tools like MediaPipe) and trains a classifier on them, while a JavaScript/HTML frontend uses WebRTC to stream and process video directly in the browser.
- `collect_data.py`: Python script that captures the webcam feed, extracts hand landmarks, and saves the coordinates for model training (see the sketch below).
- `dataset.csv`: The compiled dataset containing the extracted coordinates and their corresponding gesture labels.
- `gesture_model.pkl`: The serialized, pre-trained machine learning model used to classify gestures.
- `index.html`: The main user interface for the web application.
- `script.js`: Client-side logic that handles webcam access, WebRTC data channels, and rendering the translated speech/text.
- `assets/`: Directory containing static assets (images, stylesheets, etc.).
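For orientation, here is a minimal sketch of the kind of capture loop `collect_data.py` implements, assuming the classic MediaPipe Hands solution API. The label value, CSV layout, and key binding below are illustrative assumptions, not the script's actual interface.

```python
# Hypothetical sketch of a landmark-capture loop (not the actual collect_data.py).
import csv

import cv2
import mediapipe as mp

GESTURE_LABEL = "hello"  # illustrative label; the real script may prompt for one

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)

with open("dataset.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            landmarks = results.multi_hand_landmarks[0].landmark
            # Flatten the 21 (x, y) landmark pairs into one row, label last.
            row = [coord for p in landmarks for coord in (p.x, p.y)]
            writer.writerow(row + [GESTURE_LABEL])
        cv2.imshow("collect", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop recording
            break

cap.release()
cv2.destroyAllWindows()
```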
- Frontend: HTML5, CSS3, JavaScript (WebRTC)
- Computer Vision: MediaPipe (Landmark Extraction)
- Machine Learning: Python, scikit-learn, Pandas
If you wish to retrain the model with your own gestures:
- Install the Python dependencies: `pip install opencv-python mediapipe scikit-learn pandas`
- Run `collect_data.py` to launch the webcam and record new gestures into `dataset.csv`.
- Train the model (ensure your training script exports the updated `gesture_model.pkl`); a minimal sketch follows this list.
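As a reference for that last step, here is a minimal training sketch. It assumes `dataset.csv` holds landmark coordinates in the leading columns and the gesture label in the last column (the real layout depends on how `collect_data.py` writes rows), and the classifier choice is illustrative rather than the repository's own.

```python
# Hypothetical training sketch; the repository's training script may differ.
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: features in all columns except the last, label in the last column.
df = pd.read_csv("dataset.csv", header=None)
X, y = df.iloc[:, :-1], df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

model = RandomForestClassifier(n_estimators=200)
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")

# Export the updated model where the app expects to find it.
with open("gesture_model.pkl", "wb") as f:
    pickle.dump(model, f)
```

When re-exporting, train and load with the same scikit-learn version: pickled models are not guaranteed to be compatible across versions.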
- Clone the repository: `git clone https://github.com/Mayuur15/Code.INIT-.git`
- Navigate to the project directory.
- Serve the directory using any local web server. For example, using Python: `python -m http.server 8000`
- Open your browser and navigate to `http://localhost:8000`. (Note: webcam access requires the site to be served over `localhost` or HTTPS.)