A machine learning-based HR analytics system that predicts employee performance using historical and simulated workforce data.
The project helps organizations:
- Identify high-performing employees
- Detect underperformers early
- Support HR decision-making
- Improve workforce planning
The goal is to build an end-to-end machine learning pipeline that:
- Analyzes employee data
- Trains a predictive model
- Classifies employee performance
- Visualizes key insights through an interactive dashboard
Organizations often struggle to evaluate employee performance objectively.
This system addresses the problem by replacing manual evaluation methods with data-driven predictions.
- Programming Language: Python
- Frontend (UI): Streamlit
- Machine Learning: Scikit-learn
- Data Handling: Pandas, NumPy
- Visualization: Matplotlib
- Model: Random Forest Classifier
- Serialization: Joblib
Employee-Performance-Predictor/
│
├── data/ # Dataset (CSV files)
├── models/ # Trained ML model (.pkl)
├── images/ # Generated visualizations
├── notebooks/ # Jupyter notebooks (EDA & analysis)
│
├── app.py # Streamlit web application
├── main.py # Model training script
├── requirements.txt
└── README.md
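The `models/` directory holds the trained model as a `.pkl` file written with Joblib. A minimal sketch of that save/load round trip follows. The file name `performance_model.pkl` and the toy training data are assumptions for illustration, and a temporary directory stands in for `models/` so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier

# Toy data, used only so there is a fitted model to serialize.
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = ["Low", "Low", "High", "High"]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; in the project this would live under models/.
    model_path = Path(tmp) / "performance_model.pkl"
    joblib.dump(model, model_path)      # write the fitted model to disk
    restored = joblib.load(model_path)  # read it back, e.g. from app.py

# The restored model should give the same predictions as the original.
probe = [[0, 0], [3, 3]]
same_predictions = list(restored.predict(probe)) == list(model.predict(probe))
print("Round trip preserved predictions:", same_predictions)
```

Loading a pickled model is only safe for files you created yourself, since unpickling can execute arbitrary code.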
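Clone the repository:
git clone https://github.com/your-username/employee-performance-predictor.git
cd employee-performance-predictor
Create a virtual environment:
python -m venv venv
Activate it on Windows:
venv\Scripts\activate
Activate it on Mac/Linux:
source venv/bin/activate
Install the dependencies:
pip install -r requirements.txt
Run the Streamlit web application:
streamlit run app.py
Run the model training script:
python main.py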
- Upload employee dataset (CSV)
- Automatic data preprocessing
- Label encoding for categorical variables
- Machine learning model training (Random Forest)
- Performance prediction (High / Medium / Low)
- Model evaluation (Accuracy, Confusion Matrix)
- Feature importance visualization
- Interactive Streamlit dashboard
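The encoding, training, and evaluation steps listed above can be sketched roughly as follows. This is not the project's `main.py`: the column names, the `Performance` target name, and the rule that generates the labels are all assumptions made so the example can run on synthetic data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the employee dataset (column names are assumed).
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "Age": rng.integers(22, 60, n),
    "Experience": rng.integers(0, 30, n),
    "Salary": rng.integers(30_000, 120_000, n),
    "Training Hours": rng.integers(0, 100, n),
    "Department": rng.choice(["HR", "IT", "Sales"], n),
})
# Hypothetical labelling rule, present only so the example has a target.
df["Performance"] = pd.cut(
    df["Training Hours"], bins=[-1, 33, 66, 100], labels=["Low", "Medium", "High"]
).astype(str)

# Label-encode the categorical column into integer codes.
dept_encoder = LabelEncoder()
df["Department"] = dept_encoder.fit_transform(df["Department"])

X = df.drop(columns="Performance")
y = df["Performance"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out split.
y_pred = model.predict(X_test)
labels = ["Low", "Medium", "High"]
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(f"Accuracy: {accuracy:.3f}")
print(pd.DataFrame(cm, index=labels, columns=labels))
```

Keeping the fitted `dept_encoder` alongside the model matters in practice, because new records must be encoded with the same mapping at prediction time.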
Data Collection
↓
Data Preprocessing
↓
Feature Engineering
↓
Model Training (Random Forest)
↓
Evaluation
↓
Prediction + Visualization
- Confusion Matrix Visualization
- Feature Importance Graph
- Prediction Interface (Streamlit UI)
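As a rough illustration of the first two visualizations, the sketch below draws a confusion matrix and a feature importance chart with Matplotlib. The data is random, the feature names and labelling rule are assumptions, and the commented-out output path is hypothetical. For brevity the confusion matrix is computed on the training data, whereas a real evaluation would use a held-out test set.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Assumed feature names; random values stand in for real employee data.
feature_names = ["Age", "Experience", "Salary", "Training Hours", "Department"]
rng = np.random.default_rng(0)
X = rng.random((200, len(feature_names)))
# Hypothetical target driven by the "Training Hours" column.
y = np.where(X[:, 3] > 0.66, "High", np.where(X[:, 3] > 0.33, "Medium", "Low"))

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
labels = ["Low", "Medium", "High"]
cm = confusion_matrix(y, model.predict(X), labels=labels)

fig, (ax_cm, ax_imp) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: confusion matrix with class names on both axes.
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels).plot(
    ax=ax_cm, colorbar=False
)
ax_cm.set_title("Confusion matrix")

# Right panel: feature importances, sorted so the largest bar is on top.
order = np.argsort(model.feature_importances_)
ax_imp.barh(np.array(feature_names)[order], model.feature_importances_[order])
ax_imp.set_title("Feature importance")

fig.tight_layout()
# fig.savefig("images/model_report.png")  # hypothetical output path
```

In a Streamlit app, a figure like this could be shown with `st.pyplot(fig)`.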
- Age
- Experience
- Salary
- Training Hours
- Department
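Since the dataset is uploaded as a CSV, it can help to check that these columns are present before preprocessing. The sketch below is one way to do that with pandas. The exact column spellings and the `load_employee_csv` helper are assumptions, not names taken from the project.

```python
import io

import pandas as pd

# Assumed column spellings, based on the input features listed above.
REQUIRED_COLUMNS = ["Age", "Experience", "Salary", "Training Hours", "Department"]
NUMERIC_COLUMNS = ["Age", "Experience", "Salary", "Training Hours"]


def load_employee_csv(source):
    """Hypothetical helper: read a CSV, validate columns, drop unusable rows."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Non-numeric entries become NaN and are dropped with the rest of the row.
    df[NUMERIC_COLUMNS] = df[NUMERIC_COLUMNS].apply(pd.to_numeric, errors="coerce")
    return df[REQUIRED_COLUMNS].dropna().reset_index(drop=True)


# Two sample rows; the second has an invalid "Training Hours" value.
sample = io.StringIO(
    "Age,Experience,Salary,Training Hours,Department\n"
    "29,4,52000,35,IT\n"
    "41,15,88000,abc,Sales\n"
)
clean = load_employee_csv(sample)
print(clean)
```

The same function accepts a file path or a file-like object, so it would also work with the object returned by a file upload widget.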
- Achieved high classification accuracy on a synthetic dataset
- Identified key factors influencing employee performance
- Built an interactive HR analytics dashboard
- Integration with real HR datasets
- Employee attrition prediction
- Cloud deployment (AWS / Render / Streamlit Cloud)
- Authentication system for HR users
- Improved UI dashboards
Developed as a Data Science + Machine Learning portfolio project, focused on HR analytics and business decision support systems.