A comprehensive machine learning project comparing hyperparameter tuning strategies for a RandomForest classifier on a customer churn prediction task.
This project demonstrates and compares multiple hyperparameter tuning techniques:
- Baseline Model: RandomForest with default parameters
- Grid Search: Exhaustive parameter search
- Random Search: Randomized parameter sampling
- Bayesian Optimization: Intelligent sample-efficient search
- Manual Tuning: Parameter-by-parameter exploration
- XGBoost with Early Stopping: Alternative model with efficiency techniques
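Of these, Bayesian optimization is probably the least familiar. As a rough sketch (not the project's `bayesian_tuning.py`; it assumes scikit-optimize's `BayesSearchCV` with placeholder data and ranges), it works like this:

```python
# Hedged sketch: sample-efficient Bayesian search over RandomForest
# hyperparameters with scikit-optimize. Data and ranges are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from skopt import BayesSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

search = BayesSearchCV(
    RandomForestClassifier(random_state=42),
    {
        "n_estimators": (50, 300),      # integer range, sampled adaptively
        "max_depth": (3, 20),
        "min_samples_split": (2, 20),
    },
    n_iter=25,                          # far fewer fits than an exhaustive grid
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```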
✅ Modular code structure with clear separation of concerns
✅ Multiple tuning strategies for comparison
✅ Comprehensive performance metrics and analysis
✅ Automated result saving and comparison reporting
✅ Full reproducibility with fixed random seeds
```
ml-evaluate/
├── src/
│   ├── models/                  # Model definitions
│   │   ├── __init__.py
│   │   └── baseline_model.py
│   ├── tuning/                  # Hyperparameter tuning implementations
│   │   ├── __init__.py
│   │   ├── grid_search_tuning.py
│   │   ├── random_search_tuning.py
│   │   ├── bayesian_tuning.py
│   │   ├── manual_tuning.py
│   │   └── xgboost_tuning.py
│   └── utils/                   # Utility modules
│       ├── __init__.py
│       ├── data_loader.py
│       └── evaluate_models.py
├── results/                     # Output directory for saved results
├── notebooks/                   # Jupyter notebooks for analysis
├── docs/                        # Documentation
├── main.py                      # Main entry point
├── requirements.txt             # Project dependencies
├── .gitignore                   # Git ignore file
└── README.md                    # This file
```
- Python 3.8+
- pip or conda
- Clone or download the repository

  ```bash
  cd ml-evaluate
  ```

- Create a virtual environment (recommended)

  ```bash
  # Using venv
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate

  # Or using conda
  conda create -n ml-eval python=3.9
  conda activate ml-eval
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
The project supports multiple data sources:
- Synthetic Data (Default)
  - 1,000 samples, 20 features
  - Binary classification (synthetic churn prediction)
  - 60/40 class imbalance
  - Generated on-the-fly
- sklearn Datasets
  - `breast_cancer`: Binary classification (569 samples, 30 features)
  - `iris`: Multi-class classification (150 samples, 4 features)
  - `wine`: Multi-class classification (178 samples, 13 features)
- CSV Files
  - Load your own data from CSV
  - Specify target column or use last column as target
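For reference, CSV loading along these lines might look roughly like the sketch below, assuming pandas; the actual `data_loader.py` may differ:

```python
# Hedged sketch of CSV loading: use a named target column if given,
# otherwise fall back to the last column. Not the project's exact code.
import pandas as pd

def load_csv(csv_path, target=None):
    df = pd.read_csv(csv_path)
    target_col = target if target is not None else df.columns[-1]
    y = df[target_col]
    X = df.drop(columns=[target_col])
    return X, y

# Example: X, y = load_csv("data/sample_data.csv", target="churn")
```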
Run the pipeline against any of these sources:

```bash
# Synthetic data (default)
python main.py

# sklearn datasets
python main.py --source breast_cancer
python main.py --source iris
python main.py --source wine

# Last column as target
python main.py --source csv --csv-path data/sample_data.csv

# Specify target column
python main.py --source csv --csv-path data/sample_data.csv --target churn
```

Execute specific tuning methods:
```bash
# Baseline model
python src/models/baseline_model.py

# Grid Search
python src/tuning/grid_search_tuning.py

# Random Search
python src/tuning/random_search_tuning.py

# Bayesian Optimization
python src/tuning/bayesian_tuning.py

# Manual tuning
python src/tuning/manual_tuning.py

# XGBoost with early stopping
python src/tuning/xgboost_tuning.py
```

Compare all results:

```bash
python src/utils/evaluate_models.py
```

The project uses a synthetic customer churn dataset with:
- 1,000 samples
- 20 features (15 informative, 3 redundant, 2 repeated)
- Binary classification (Churn vs. No Churn)
- Class imbalance: 60% No Churn, 40% Churn
- 80/20 train-test split
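A dataset with these properties can be generated with scikit-learn's `make_classification`; the snippet below is a hedged sketch of how such data might be produced, with the seed and split call as assumptions:

```python
# Hedged sketch: synthetic churn-like dataset matching the description above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_redundant=3,
    n_repeated=2,
    weights=[0.6, 0.4],   # ~60% "No Churn", ~40% "Churn"
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42   # 80/20 split
)
```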
Results are saved as numpy files in the results/ directory:
- `baseline_results.npy` - Baseline model performance
- `grid_search_results.npy` - Grid Search results
- `random_search_results.npy` - Random Search results
- `bayesian_results.npy` - Bayesian Optimization results
- `manual_tuning_results.npy` - Manual tuning exploration
- `xgboost_results.npy` - XGBoost results
- `comparison_results.npy` - Aggregated comparison data
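Assuming each file stores a result dictionary written with `np.save`, a saved result can be inspected like this (a sketch; the exact keys depend on the implementation):

```python
# Hedged sketch: load a saved result dictionary from the results/ directory.
import numpy as np

results = np.load("results/grid_search_results.npy", allow_pickle=True).item()
print(results.keys())   # e.g. accuracy, search time, best parameters
```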
For each tuning method, we track:
- Accuracy: Model performance on test set
- Search Time: Total tuning computation time
- Best Parameters: Optimal hyperparameter configuration
- CV Score: Cross-validation performance during search
The evaluation compares:
- Accuracy Improvement: Gain over baseline model
- Tuning Time: Computational cost of each method
- Efficiency: Accuracy gain per second of tuning
- Trade-offs: Best accuracy vs. fastest tuning
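As a minimal illustration of how the efficiency figure could be derived (all numbers and variable names below are hypothetical):

```python
# Hedged sketch: accuracy gain over the baseline per second of tuning time.
baseline_accuracy = 0.85      # hypothetical baseline test accuracy
tuned_accuracy = 0.88         # hypothetical tuned test accuracy
search_time_seconds = 120.0   # hypothetical total tuning time

improvement = tuned_accuracy - baseline_accuracy
efficiency = improvement / search_time_seconds
print(f"Improvement: {improvement:.3f}, efficiency: {efficiency:.6f} per second")
```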
Typical findings from comparison:
- Grid Search achieves high accuracy but is computationally expensive
- Random Search balances quality and speed
- Bayesian Optimization provides sample-efficient exploration
- Early stopping can reduce training time without sacrificing accuracy
- Python 3.8+
- scikit-learn: Machine learning framework
- scikit-optimize: Bayesian optimization
- XGBoost: Gradient boosting library
- NumPy: Numerical computing
- pandas: Data manipulation (optional)
Key hyperparameter ranges explored:
- `n_estimators`: [50, 100, 200, 300]
- `max_depth`: [3, 5, 10, 15, 20]
- `min_samples_split`: [2, 5, 10, 20]
- `min_samples_leaf`: [1, 2, 4, 10]
- Cross-validation folds: 5
- Scoring metric: Accuracy
- Parallel jobs: -1 (all cores)
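Put together, a grid search over these ranges with the settings above might be configured roughly like this (a sketch; the project's `grid_search_tuning.py` may differ in detail):

```python
# Hedged sketch: GridSearchCV over the ranges listed above,
# with 5-fold CV, accuracy scoring, and all cores.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200, 300],
    "max_depth": [3, 5, 10, 15, 20],
    "min_samples_split": [2, 5, 10, 20],
    "min_samples_leaf": [1, 2, 4, 10],
}
# 4 * 5 * 4 * 4 = 320 combinations -> 1,600 fits with 5-fold CV

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
# search.fit(X_train, y_train); search.best_params_, search.best_score_
```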
- Create a new file in `src/tuning/`
- Implement a class with:
  - `run_*_search()` method for tuning
  - `evaluate()` method for testing
  - `save_results()` method for persistence
- Update `main.py` to include the new method
- Update `src/utils/evaluate_models.py` to handle results
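A new tuning module following that structure might look roughly like the skeleton below; the class name, result path, and the use of `RandomizedSearchCV` are hypothetical placeholders, not part of the project:

```python
# Hedged sketch: skeleton for a new module in src/tuning/.
# Names and bodies are placeholders; adapt to the real interfaces.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

class MyNewTuner:
    def __init__(self, X_train, X_test, y_train, y_test, random_state=42):
        self.X_train, self.X_test = X_train, X_test
        self.y_train, self.y_test = y_train, y_test
        self.random_state = random_state
        self.search = None
        self.search_time = None

    def run_my_new_search(self, param_distributions, n_iter=20):
        """Tune the model and record how long the search took."""
        start = time.time()
        self.search = RandomizedSearchCV(
            RandomForestClassifier(random_state=self.random_state),
            param_distributions, n_iter=n_iter, cv=5,
            scoring="accuracy", n_jobs=-1, random_state=self.random_state)
        self.search.fit(self.X_train, self.y_train)
        self.search_time = time.time() - start

    def evaluate(self):
        """Score the best estimator on the held-out test set."""
        return self.search.best_estimator_.score(self.X_test, self.y_test)

    def save_results(self, path="results/my_new_results.npy"):
        """Persist results as a dictionary in a .npy file."""
        np.save(path, {
            "accuracy": self.evaluate(),
            "search_time": self.search_time,
            "best_params": self.search.best_params_,
            "cv_score": self.search.best_score_,
        })
```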
```bash
# Syntax check
python -m py_compile src/**/*.py

# Quick validation
python -c "from src.models import BaselineModel; print('Import successful')"
```

See docs/DATA_SOURCES.md for a detailed guide on:
- Using synthetic data
- Loading sklearn datasets (breast_cancer, iris, wine)
- Loading custom CSV files
- Data preprocessing and requirements
- Troubleshooting data issues
- Parallel Computing: Set `n_jobs=-1` for fast grid/random search
- Early Stopping: Use with gradient boosting methods for speed (see the sketch after this list)
- Reduced CV Folds: Start with 3 folds for faster experimentation
- Subset Data: Use smaller samples during development
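The early-stopping tip can be illustrated with XGBoost roughly as follows (a sketch with placeholder data; in XGBoost releases before 1.6 the `early_stopping_rounds` argument moves to `fit()` instead of the constructor):

```python
# Hedged sketch: XGBoost with early stopping on a held-out validation split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(
    n_estimators=500,
    learning_rate=0.1,
    early_stopping_rounds=10,   # stop once the validation metric stalls
    eval_metric="logloss",
    random_state=42,
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
print(model.best_iteration)     # boosting rounds actually used
```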
Ensure you're running from the project root directory:

```bash
cd path/to/ml-evaluate
python main.py
```

If dependencies are missing or out of date:

```bash
pip install --upgrade -r requirements.txt
```

If tuning runs are slow or run out of memory:

- Reduce dataset size in `DataLoader`
- Use fewer CV folds
- Reduce `n_iter` in Bayesian/Random search
Suggestions for improvements:
- Add cross-validation visualizations
- Implement hyperparameter importance analysis
- Add more base models (SVM, Neural Networks)
- Create interactive comparison dashboard
This project is provided as-is for educational and research purposes.
Created for machine learning education and hyperparameter tuning evaluation.
- scikit-learn documentation
- scikit-optimize documentation
- XGBoost documentation
- Hyperparameter Optimization Best Practices
- v1.0.0 (2026-03-31): Initial public release
  - Baseline, Grid Search, Random Search, Bayesian Optimization
  - Manual tuning exploration
  - XGBoost with early stopping
  - Comprehensive comparison analysis
Last Updated: March 31, 2026