Sign Language to Text and Speech Conversion

This project implements a real-time sign language recognition system using computer vision and deep learning. It detects American Sign Language (ASL) letter gestures from a webcam feed and converts them to text and speech.

🔗 Live demo: https://sign-language-recognitiontool.vercel.app (Note: backend runs on a free-tier host and may take ~30s to wake up on first request)

Features

Real-time hand gesture detection using MediaPipe
CNN-based sign language classification for A-Z letters
Text-to-speech conversion
Tkinter-based GUI
Data collection tools for training

Files Overview

final_pred.py - Main GUI application with full features
prediction_wo_gui.py - Console-based prediction without GUI
data_collection_binary.py - Tool for collecting binary image data
data_collection_final.py - Tool for collecting skeleton hand data
cnn8grps_rad1_model.h5 - Pre-trained CNN model
white.jpg - White background image for skeleton drawing
AtoZ_3.1/ - Directory structure for training data

Dependencies

Install the required packages:

pip install -r requirements.txt

Required packages:

opencv-python==4.8.1.78
cvzone==1.6.1
tensorflow==2.13.0
keras==2.13.1
numpy==1.24.3
pyttsx3==2.90
pillow==10.0.1
pyenchant==3.2.2

Usage

Running the GUI Application

python final_pred.py

Running Console Prediction

python prediction_wo_gui.py

Data Collection

# For binary images
python data_collection_binary.py

# For skeleton data
python data_collection_final.py

Controls

ESC - Exit application
'a' - Start/stop data collection (in data collection scripts)
'n' - Next letter (in data collection scripts)
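
The controls above can be pictured as a small key-to-action lookup. This is only a sketch: the names `KEY_ACTIONS` and `action_for_key` are hypothetical, and the scripts themselves may handle keys differently. It assumes key codes arrive as integers in the style of OpenCV's `cv2.waitKey`, which returns -1 when no key was pressed.

```python
from typing import Optional

ESC_KEY = 27  # ASCII code for the Escape key

# Hypothetical mapping from key codes to action names (not taken from the scripts).
KEY_ACTIONS = {
    ESC_KEY: "exit",
    ord("a"): "toggle_collection",  # start/stop data collection
    ord("n"): "next_letter",        # move on to the next letter
}


def action_for_key(key_code: int) -> Optional[str]:
    """Return the action bound to a key code, or None if the key is unbound."""
    if key_code < 0:  # no key pressed during the wait interval
        return None
    # Keep only the low byte, a common convention when reading key codes.
    return KEY_ACTIONS.get(key_code & 0xFF)
```

In a capture loop, the result of `action_for_key(cv2.waitKey(1))` could then decide whether to exit, toggle collection, or advance to the next letter.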

How It Works

Hand Detection: Uses the cvzone HandTrackingModule (a wrapper around MediaPipe) to detect hand landmarks
Feature Extraction: Converts hand landmarks to skeleton representation
Classification: CNN model predicts the sign language letter
Post-processing: Applies rules to improve accuracy and handle gestures
Output: Displays predicted text and converts to speech
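
The feature-extraction step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes 21 `(x, y)` pixel landmarks per hand in the usual MediaPipe ordering (0 = wrist, 4/8/12/16/20 = fingertips), and the function name `skeleton_segments` and the 400-pixel canvas size are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]

# Bone connections for the 21-point hand layout.
HAND_CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),          # index finger
    (9, 10), (10, 11), (11, 12),             # middle finger
    (13, 14), (14, 15), (15, 16),            # ring finger
    (0, 17), (17, 18), (18, 19), (19, 20),   # little finger
    (5, 9), (9, 13), (13, 17),               # palm
]


def skeleton_segments(
    landmarks: Sequence[Point], canvas_size: int = 400
) -> List[Tuple[Point, Point]]:
    """Centre the hand on a square canvas and return the line segments to draw."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    # Offsets that move the hand's bounding box to the middle of the canvas.
    off_x = (canvas_size - (max(xs) - min(xs))) // 2 - min(xs)
    off_y = (canvas_size - (max(ys) - min(ys))) // 2 - min(ys)
    shifted = [(x + off_x, y + off_y) for x, y in landmarks]
    return [(shifted[a], shifted[b]) for a, b in HAND_CONNECTIONS]
```

Each returned segment could then be drawn with a call such as `cv2.line(canvas, start, end, color, thickness)` onto a copy of the white background image, giving the classifier an input that does not depend on skin tone, lighting, or background.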

Model Architecture

The CNN model (cnn8grps_rad1_model.h5) is trained to classify gestures into 8 groups of similar-looking letters:

Group 0: A, E, M, N, S, T
Group 1: B, D, F, I, K, R, U, V, W
Group 2: C, O
Group 3: G, H
Group 4: L
Group 5: P, Q, Z
Group 6: X
Group 7: Y, J
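
The grouping above can be written as a lookup table. The sketch below assumes the model emits eight scores, one per group, which the source does not confirm; the names `GESTURE_GROUPS`, `LETTER_TO_GROUP`, and `candidates_for` are hypothetical. It shows only how a group prediction narrows the choice of letters, not the rules that pick the final letter.

```python
from typing import List, Sequence, Tuple

# Group index -> letters, copied from the list above.
GESTURE_GROUPS = {
    0: "AEMNST",
    1: "BDFIKRUVW",
    2: "CO",
    3: "GH",
    4: "L",
    5: "PQZ",
    6: "X",
    7: "YJ",
}

# Reverse lookup: letter -> group index.
LETTER_TO_GROUP = {
    letter: group
    for group, letters in GESTURE_GROUPS.items()
    for letter in letters
}


def candidates_for(scores: Sequence[float]) -> Tuple[int, List[str]]:
    """Return the highest-scoring group and the letters it could stand for."""
    if len(scores) != len(GESTURE_GROUPS):
        raise ValueError("expected one score per group")
    group = max(range(len(scores)), key=scores.__getitem__)
    return group, list(GESTURE_GROUPS[group])
```

For example, `candidates_for([0.1, 0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.05])` selects group 2 and leaves C and O as candidates. Groups 4 and 6 contain a single letter each, so a prediction of either group already determines the letter.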

Notes

Ensure good lighting for optimal hand detection
Keep hand within camera frame
The model works best with clear, distinct gestures
Press ESC to exit any running script
