A Python-based computer vision pipeline for analyzing pre-recorded workout videos. The system identifies known individuals using facial recognition, classifies exercises from a fixed set, and counts repetitions using pose estimation.
- Face Recognition: Identifies known users using InsightFace with ArcFace embeddings
- Exercise Classification: Rule-based classifier for 4 exercises (Squats, Push-ups, Lunges, Bicep Curls)
- Rep Counting: Automatic repetition counting using pose-based state machines
- Video Annotation: Outputs annotated videos with person name, exercise type, and rep count overlays
- JSON Results: Structured analysis results with timestamps and confidence scores
```
            ┌─────────────────┐
            │   Video Input   │
            └────────┬────────┘
                     │
             ┌───────┴───────┐
             │               │
             ▼               ▼
        ┌────────┐   ┌──────────────┐
        │  Face  │   │  MediaPipe   │
        │   ID   │   │     Pose     │
        └────────┘   └──────┬───────┘
                       ┌────┴────┐
                       │         │
                       ▼         ▼
                 ┌─────────┐ ┌───────────┐
                 │Exercise │ │    Rep    │
                 │Classify │ │  Counter  │
                 └─────────┘ └───────────┘
                      │            │
                      └─────┬──────┘
                            ▼
                  ┌──────────────────┐
                  │ Output Generator │
                  └──────────────────┘
                            │
                  ┌─────────┴─────────┐
                  ▼                   ▼
             ┌──────────┐       ┌──────────┐
             │Annotated │       │   JSON   │
             │  Video   │       │ Results  │
             └──────────┘       └──────────┘
```
```
face-recognition & exercise classification/
├── README.md                        # This file
├── requirements.txt                 # Python dependencies
├── main.py                          # Main pipeline script
├── enroll_user.py                   # User enrollment script
├── modules/
│   ├── __init__.py
│   ├── face_recognition_module.py   # Face identification logic
│   ├── exercise_classifier.py       # Exercise classification
│   ├── rep_counter.py               # Rep counting state machines
│   └── utils.py                     # Utility functions (angles, etc.)
├── data/
│   ├── enrollment/                  # User enrollment images
│   ├── embeddings.pkl               # Stored face embeddings
│   └── videos/                      # Input workout videos
└── outputs/                         # Annotated videos and JSON results
```
- Python 3.8 or higher
- (Optional) CUDA-capable GPU for faster processing
- Clone or navigate to the repository:
```bash
cd "face-recognition & exercise classification"
```
- Create a virtual environment (recommended):
```bash
# Windows
python -m venv venv
.\venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
Note: If you encounter issues with InsightFace on Windows, you may need to install it from a wheel file. See the InsightFace GitHub repository for platform-specific instructions.
For CPU-only systems, replace `onnxruntime-gpu` with `onnxruntime` in `requirements.txt` before installing:
```bash
pip install onnxruntime==1.17.0
```
Before analyzing videos, enroll known users by providing their photos.
```bash
python enroll_user.py --name "John Doe" --directory "data/enrollment/john"
python enroll_user.py --name "Jane Smith" --images "data/enrollment/jane1.jpg" "data/enrollment/jane2.jpg"
```
Tips:
- Provide 3-5 clear photos per person for best results
- Use photos with good lighting and frontal face views
- Vary poses slightly (different angles, expressions)
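Enrolling several photos works because their embeddings can be combined into a single, more robust template. A minimal sketch of one common approach, averaging the per-photo vectors (the actual logic lives in `enroll_user.py` and may differ; `average_embedding` is an illustrative name):

```python
def average_embedding(embeddings):
    """Average several per-photo face embeddings into one template vector.

    embeddings: list of equal-length numeric vectors (e.g. 512-dim ArcFace
    embeddings, one per enrollment photo).
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    # Component-wise mean across all photos
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]
```

Averaging smooths out per-photo noise (lighting, expression), which is why varied but clear photos improve matching.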
Process a workout video to generate annotated output and JSON results:
```bash
python main.py --video "data/videos/workout1.mp4"
```
This will create:
- `outputs/annotated_workout1.mp4` - Video with overlays
- `outputs/workout1_results.json` - Detailed analysis results
```bash
# Specify custom output paths
python main.py --video "input.mp4" --output "custom_output.mp4" --json "results.json"

# Generate only JSON results (no video output)
python main.py --video "input.mp4" --no-video
```
The output video includes:
- Top-left: Identified person name and confidence score
- Top-right: Detected exercise and confidence score
- Bottom-center: Rep count and last rep duration
- Pose skeleton: Overlay showing body landmarks and connections
Example output structure:
```json
{
  "video_filename": "workout1.mp4",
  "duration_seconds": 45.2,
  "identified_person": "John Doe",
  "person_confidence": 0.87,
  "exercise_detected": "Squats",
  "exercise_confidence": 0.92,
  "total_reps": 12,
  "reps_detail": [
    {
      "rep_num": 1,
      "start_time": 2.1,
      "bottom_time": 2.8,
      "end_time": 3.4,
      "duration": 1.3
    },
    {
      "rep_num": 2,
      "start_time": 3.8,
      "bottom_time": 4.4,
      "end_time": 5.0,
      "duration": 1.2
    }
  ],
  "processing_time_seconds": 8.3
}
```
The system currently supports 4 exercises:
1. Squats
   - Detection: Hip angle < 100°, knee flexion, upright torso
   - Rep counting: STANDING → DOWN (hip < 90°) → STANDING (hip > 140°)
2. Push-ups
   - Detection: Elbow flexion, horizontal body orientation
   - Rep counting: UP → DOWN (elbow < 90°) → UP (elbow > 140°)
3. Lunges
   - Detection: Asymmetric leg position, forward stance
   - Rep counting: STANDING → DOWN (knee < 100°) → STANDING (knee > 150°)
4. Bicep Curls
   - Detection: Elbow flexion with stable shoulder, upright torso
   - Rep counting: EXTENDED → CURLED (elbow < 50°) → EXTENDED (elbow > 140°)
- Technology: InsightFace with ArcFace model
- Embedding: 512-dimensional face vectors
- Matching: Cosine similarity with 0.6 threshold
- Efficiency: Checks every 30 frames (~1 per second)
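The matching step above can be sketched in plain Python. This is an illustrative reimplementation, not the module's actual API; `match_face` and the `enrolled` dict of name → embedding are assumed names:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_face(query, enrolled, threshold=0.6):
    """Return (name, score) for the best enrolled match, or (None, score)
    when no enrolled embedding clears the similarity threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Returning `None` below the threshold is what lets the pipeline report an unknown person instead of forcing a wrong identity.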
- Pose Extraction: MediaPipe extracts 33 body landmarks per frame
- Feature Engineering: Calculates angles (hip, knee, elbow) and positions
- Rule-based Classifier: Each exercise has characteristic angle ranges
- Temporal Smoothing: Majority voting over 30-frame window
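The temporal smoothing step can be sketched with a sliding window and a vote count. A minimal, self-contained version (the class name is illustrative, not from `exercise_classifier.py`):

```python
from collections import Counter, deque

class MajorityVoteSmoother:
    """Smooths noisy per-frame exercise labels by majority vote
    over a sliding window of recent frames."""

    def __init__(self, window_size=30):
        # deque with maxlen automatically drops the oldest label
        self.window = deque(maxlen=window_size)

    def update(self, label):
        self.window.append(label)
        # The most common label in the current window wins
        return Counter(self.window).most_common(1)[0][0]
```

A single misclassified frame ("Lunges" in a run of "Squats") is outvoted by the rest of the window, so the reported exercise stays stable.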
Key angles:
- Hip angle: Shoulder-Hip-Knee (squats)
- Elbow angle: Shoulder-Elbow-Wrist (push-ups, curls)
- Knee angle: Hip-Knee-Ankle (squats, lunges)
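Each of these angles is the angle at the middle joint of a three-landmark triple. A sketch of the computation, assuming 2D (x, y) landmark coordinates (`modules/utils.py` may implement this differently, e.g. in 3D):

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at vertex b formed by points a-b-c,
    e.g. joint_angle(shoulder, hip, knee) for the hip angle."""
    v1 = (a[0] - b[0], a[1] - b[1])   # vector b -> a
    v2 = (c[0] - b[0], c[1] - b[1])   # vector b -> c
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    # Clamp to guard against floating-point drift outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))
```

A fully straightened joint gives ~180°, a right-angle bend gives 90°, which is why the thresholds in the exercise definitions above read naturally as degrees of flexion.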
- State Machine: Tracks exercise phases (up/down transitions)
- Angle Thresholds: Configurable for each exercise type
- Rep Validation: Only counts complete cycles
- Timestamps: Records start, bottom/peak, and end times
Example (Squats):
STANDING (hip > 140°) → DOWN (hip < 90°) → STANDING (hip > 140°) = 1 rep
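That cycle maps directly onto a two-state machine with hysteresis (different down and up thresholds, so noise near one boundary cannot produce phantom reps). A minimal sketch, not the actual `modules/rep_counter.py` implementation:

```python
class SquatRepCounterSketch:
    """Counts one rep per complete STANDING -> DOWN -> STANDING cycle."""

    def __init__(self, down_threshold=90, up_threshold=140):
        self.down_threshold = down_threshold  # hip angle to enter DOWN
        self.up_threshold = up_threshold      # hip angle to return to STANDING
        self.state = "STANDING"
        self.reps = 0

    def update(self, hip_angle):
        if self.state == "STANDING" and hip_angle < self.down_threshold:
            self.state = "DOWN"
        elif self.state == "DOWN" and hip_angle > self.up_threshold:
            self.state = "STANDING"
            self.reps += 1  # only a complete cycle counts
        return self.reps
```

Angles between the two thresholds (90°-140°) change nothing, which is the rep-validation behavior described above: partial squats that never drop below 90° or never rise back above 140° are not counted.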
- Add an exercise signature to the classifier (`modules/exercise_classifier.py`):

```python
def classify_new_exercise(self, features: Dict[str, float]) -> float:
    confidence = 0.0
    # Add angle checks
    if features['some_angle'] < threshold:
        confidence += 0.4
    return confidence
```

- Create a rep counter (`modules/rep_counter.py`):
```python
class NewExerciseRepCounter(RepCounter):
    def process_frame(self, landmarks, timestamp: float) -> None:
        # Implement state machine logic
        pass
```

- Register it in the main pipeline (`main.py`):
```python
self.rep_counters['New Exercise'] = NewExerciseRepCounter()
```

Edit the counter initialization in `main.py`:
```python
# Make squats more/less strict
self.rep_counters['Squats'] = SquatRepCounter(
    down_threshold=80,  # Default: 90
    up_threshold=150    # Default: 140
)
```

- Processing Speed: 20-30 FPS on GPU, 8-12 FPS on CPU
- Accuracy:
- Face Recognition: ~95% for enrolled users in good lighting
- Exercise Classification: ~85-90% for supported exercises
- Rep Counting: ~90% accuracy for clean form
Solution: Install InsightFace with its required dependencies:

```bash
pip install insightface onnxruntime-gpu
```

Solution:
- Use lower resolution videos (720p instead of 1080p)
- Reduce pose model complexity in `main.py`:

```python
self.pose = self.mp_pose.Pose(model_complexity=0)  # 0=lite, 1=full, 2=heavy
```

Solution:
- Ensure face is clearly visible and well-lit in video
- Use high-quality enrollment photos (frontal view, good lighting)
- Lower the similarity threshold in the `FaceRecognitionModule` initialization
Solution:
- Ensure full body is visible in frame
- Check that exercise form matches expected patterns
- Adjust angle thresholds in `exercise_classifier.py`
Solution:
- Verify exercise is correctly classified first
- Adjust rep counter thresholds for your specific form
- Ensure full range of motion (complete up/down cycles)
For faster processing with NVIDIA GPU:
- Install CUDA Toolkit (11.x or 12.x)
- Install cuDNN
- Install GPU-enabled packages:
```bash
pip install onnxruntime-gpu==1.17.0
```

Verify GPU usage:
```python
import onnxruntime as ort
print(ort.get_available_providers())
# Should include 'CUDAExecutionProvider'
```

- Single person: Designed for one primary subject per video
- Pre-recorded only: No live streaming support
- Fixed exercise set: Limited to 4 exercise types
- Form dependent: Requires proper exercise form for accurate counting
- Lighting dependent: Face recognition needs good lighting conditions
Potential improvements for production:
- Multi-person tracking
- Additional exercises (planks, jumping jacks, etc.)
- ML-based exercise classifier (replace rule-based)
- Form quality assessment
- Real-time streaming support
- Mobile deployment
- Minimum: Intel i5/Ryzen 5, 8GB RAM, Python 3.8+
- Recommended: Intel i7/Ryzen 7, 16GB RAM, NVIDIA GPU (GTX 1060+)
- Storage: ~2GB for dependencies + model weights
This is an internal MVP for demonstration purposes.
For issues or questions, refer to:
- MediaPipe: https://google.github.io/mediapipe/
- InsightFace: https://github.com/deepinsight/insightface
- OpenCV: https://docs.opencv.org/