| title | Unified ML Pipelines |
|---|---|
| emoji | 🚀 |
| colorFrom | yellow |
| colorTo | orange |
| sdk | streamlit |
| sdk_version | 1.31.0 |
| app_file | hf_app.py |
| pinned | false |
| license | mit |
💡 Personal Suggestions:
- If your machine has a faster CPU or a GPU, consider forking this repository and running it locally for faster results.
- Preprocess your data first (handle missing values, data types, outliers, etc.) to get better accuracy from the models.
Mathematics-Driven Parallel Machine Learning Pipelines for Regression & Classification
This application trains multiple families of machine learning models on your tabular data in a single run, with mathematically correct preprocessing tailored to each model family.
- ✅ Dual Learning Types: Support for both Regression and Classification tasks
- ✅ Upload CSV: Bring your own dataset
- ✅ Automated Preprocessing: Mathematically aware preprocessing for different model types
- ✅ 5 Model Families: Weight-based, Tree-based, Neural Network, Instance-based, and Kernel-based (classification)
- ✅ 14+ ML Models: Comprehensive model coverage across all families
- ✅ Hyperparameter Tuning: GridSearchCV and RandomizedSearchCV with optimized grids
- ✅ MLFlow Integration: Full experiment tracking, metrics logging, and model versioning
- ✅ Interactive Visualizations: Plotly charts for model comparison
- ✅ Top 3 Models Display: Quick identification of best performers
- ✅ Detailed Metrics Tables: Complete breakdown of all model results
- ✅ Error & Outlier Detection: Automated issue detection with actionable suggestions
- ✅ Hyperparameters Display: View tuned hyperparameters for each model
- ✅ Streamlit Cloud: One-click deployment
- ✅ Hugging Face Spaces: Unified app deployment
- ✅ Local Development: FastAPI + Streamlit separation
- ✅ MLFlow UI: Experiment tracking dashboard
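The idea of family-specific preprocessing can be illustrated with a minimal sketch. The project's actual rules are not shown in this README, so the mapping below is an assumption based on general practice: weight-based, neural network, instance-based, and kernel-based models are sensitive to feature scale, while tree-based models are not. The names `NEEDS_SCALING`, `standardize`, and `preprocess_for` are hypothetical.

```python
from statistics import mean, pstdev

# Assumption: which model families receive feature scaling.
# Distance-, gradient-, and kernel-based models are scale-sensitive;
# tree splits depend only on feature ordering, so trees are left unscaled.
NEEDS_SCALING = {
    "weight": True,
    "neural_network": True,
    "instance": True,
    "kernel": True,
    "tree": False,
}


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:  # constant column: every value maps to 0
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def preprocess_for(family, columns):
    """Apply family-specific preprocessing to a dict of numeric columns."""
    if NEEDS_SCALING[family]:
        return {name: standardize(values) for name, values in columns.items()}
    # Tree-based models: pass the features through unchanged.
    return {name: list(values) for name, values in columns.items()}


data = {"age": [20, 30, 40], "income": [20_000, 50_000, 80_000]}
scaled = preprocess_for("instance", data)  # standardized copy for KNN-style models
raw = preprocess_for("tree", data)         # untouched copy for tree models
```

Keeping one preprocessed copy of the data per family lets each model see inputs suited to its mathematics without affecting the others.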
Simply visit the deployed Space and upload your CSV file!
https://unified-ml-pipelines.streamlit.app/
Run locally (unified app):

    # Clone the repository
    git clone https://github.com/okyashgajjar/Unified-ML-Pipelines.git
    cd Unified-ML-Pipelines

    # Install dependencies
    pip install -r requirements.txt

    # Run the unified app
    streamlit run hf_app.py

Run locally (separate backend and frontend):

    # Terminal 1: Start FastAPI backend
    uvicorn app:app --reload

    # Terminal 2: Start Streamlit frontend
    streamlit run streamlit_app.py

MLFlow UI:

    mlflow ui
    # Running on http://localhost:5000

Project structure:

    Unified-ML-Pipelines/
    ├── hf_app.py                    # Unified app for HF Spaces (Frontend + Backend)
    ├── app.py                       # FastAPI backend (REST API)
    ├── streamlit_app.py             # Streamlit frontend (Web UI)
    ├── requirements.txt             # Dependencies
    │
    ├── Superwised_Regression/
    │   ├── preprocessing.py         # Data cleaning & validation
    │   └── tabular_data/
    │       ├── weight_reg.py        # Linear, Ridge, Lasso
    │       ├── tree_reg.py          # DT, RF, XGBoost, GBM
    │       ├── nn_reg.py            # MLP Regressor
    │       ├── instance_reg.py      # KNN, Radius Neighbors
    │       ├── parallel_executor.py # Sequential execution
    │       └── mlflow_tracker.py    # MLFlow integration
    │
    ├── Superwised_Classification/
    │   └── tabular_data/
    │       ├── weight_class.py      # Logistic Regression, Ridge Classifier
    │       ├── tree_class.py        # DT, RF, GBM, AdaBoost, LightGBM, XGBoost
    │       ├── nn_class.py          # MLP Classifier
    │       ├── kernel_class.py      # SVC (RBF, Linear, Poly)
    │       └── instance_class.py    # KNN Classifier
    │
    ├── dataset/                     # Sample datasets
    ├── mlruns/                      # MLFlow experiment data
    ├── MASTER_DOCUMENTATION.md      # Full technical documentation
    └── PROJECT_SUMMARY.md           # Project philosophy & approach

Regression models:

| Family | Models |
|---|---|
| Weight-Based | Linear, Ridge, Lasso |
| Tree-Based | DT, RF, XGBoost, GBM |
| Neural Network | MLP Regressor |
| Instance-Based | KNN, Radius Neighbors |

Classification models:

| Family | Models |
|---|---|
| Weight-Based | Logistic Regression, Ridge Classifier |
| Tree-Based | DT, RF, GBM, AdaBoost, LightGBM, XGBoost |
| Neural Network | MLP Classifier |
| Kernel-Based | SVC (RBF, Linear, Poly) |
| Instance-Based | KNN Classifier |

Regression metrics:

| Metric | Interpretation |
|---|---|
| MAE | Mean absolute error: average error magnitude (lower = better) |
| RMSE | Root mean squared error: penalizes large errors (lower = better) |
| R² | Proportion of variance explained (higher = better) |
| MAPE | Mean absolute percentage error (lower = better) |
| MSE | Mean squared error (lower = better) |
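
The regression metrics above follow their standard textbook definitions, which can be sketched in plain Python. This is an illustrative implementation, not the project's own code; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R² and MAPE from paired targets and predictions.

    Assumes y_true contains no zeros (MAPE divides by each target) and is not
    constant (R² divides by the total sum of squares).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # percent

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}


metrics = regression_metrics(
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 190.0, 320.0, 380.0],
)
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE, and the gap widens when a few errors are much larger than the rest.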

Classification metrics:

| Metric | Interpretation |
|---|---|
| Accuracy | Overall correct predictions (higher = better) |
| Precision | True positives / Predicted positives (higher = better) |
| Recall | True positives / Actual positives (higher = better) |
| F1 Score | Harmonic mean of precision & recall (higher = better) |
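
For the binary case, the classification metrics in this table reduce to counts of true positives, false positives, and false negatives. The sketch below is illustrative only; the function name `binary_classification_metrics`, the `positive` parameter, and the sample labels are hypothetical, and multi-class averaging is not covered.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = binary_classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 1, 0],
    y_pred=[1, 1, 0, 0, 0, 1, 1, 0],
)
```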

Automated error analysis (regression):
- RMSE/MAE ratio: detects sensitivity to outliers (a ratio above 1.2 suggests outliers)
- Negative R²: identifies models that perform worse than always predicting the mean
- High MAPE: flags issues with small target values (MAPE above 50%)
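
These three checks can be expressed as a small rule function. The thresholds (1.2 and 50%) come from the list above; the function name `diagnose` and the message wording are hypothetical, and the project's own implementation may differ.

```python
def diagnose(mae, rmse, r2, mape):
    """Return a list of human-readable warnings for one model's regression metrics."""
    issues = []

    # RMSE grows faster than MAE when a few errors are very large.
    if mae > 0 and rmse / mae > 1.2:
        issues.append("RMSE/MAE ratio above 1.2: predictions may be affected by outliers")

    # R² below zero means the model is worse than predicting the mean target.
    if r2 < 0:
        issues.append("Negative R²: model performs worse than the mean baseline")

    # MAPE inflates when targets are close to zero.
    if mape > 50:
        issues.append("MAPE above 50%: check for small target values")

    return issues


healthy = diagnose(mae=10.0, rmse=11.0, r2=0.9, mape=5.0)
problematic = diagnose(mae=10.0, rmse=15.0, r2=-0.2, mape=80.0)
```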

| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check with version info |
| `/api/train` | POST | Submit training job with CSV file |
| `/api/train-classification` | POST | Submit classification training job |
| `/api/results/{job_id}` | GET | Get training results |
| `/api/jobs` | GET | List all training jobs |
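
A client for the GET endpoints could start from a helper like the one below. It is a sketch under stated assumptions: the base URL uses uvicorn's default port 8000, responses are assumed to be JSON, and the names `BASE_URL`, `endpoint`, `results_url`, and `get_json` are hypothetical. The POST endpoints are left out because their form field names are not documented here.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Assumption: the FastAPI backend was started with `uvicorn app:app`,
# which listens on port 8000 by default.
BASE_URL = "http://localhost:8000"


def endpoint(path, base_url=BASE_URL):
    """Build an absolute URL for an API path such as /api/health."""
    return urljoin(base_url, path)


def results_url(job_id, base_url=BASE_URL):
    """Build the URL for /api/results/{job_id}, escaping the job id."""
    return endpoint(f"/api/results/{quote(str(job_id), safe='')}", base_url)


def get_json(url, timeout=10):
    """Send a GET request and decode the body, assuming it is JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


health_url = endpoint("/api/health")
jobs_url = endpoint("/api/jobs")
```

With the backend running, `get_json(health_url)` would perform the health check and `get_json(results_url(job_id))` would fetch the results of a submitted job.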
- Optimized Hyperparameter Grids: ~80% reduction in search space
- 3-Fold Cross-Validation: Faster than 5-fold with minimal accuracy loss
- Sequential Execution: Stable and reliable on all hardware
- Expected Training Time: ~5-8 minutes for 10K rows
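
To see why a smaller grid and fewer folds shorten training, it helps to count the model fits an exhaustive search performs: the number of parameter combinations multiplied by the number of folds. The grids below are hypothetical examples chosen for illustration, not the project's actual grids.

```python
import math


def grid_size(grid):
    """Number of parameter combinations an exhaustive grid search evaluates."""
    return math.prod(len(values) for values in grid.values())


def total_fits(grid, folds):
    """Cross-validation fits: one per combination per fold."""
    return grid_size(grid) * folds


# Hypothetical wide grid for a random forest.
full_grid = {
    "n_estimators": [50, 100, 200, 300, 500],
    "max_depth": [3, 5, 7, 10, None],
    "min_samples_split": [2, 5, 10, 20],
}

# Hypothetical trimmed grid keeping only a few values per parameter.
reduced_grid = {
    "n_estimators": [100, 200],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
}

full_fits = total_fits(full_grid, folds=5)        # 100 combinations x 5 folds
reduced_fits = total_fits(reduced_grid, folds=3)  # 18 combinations x 3 folds
space_reduction = 1 - grid_size(reduced_grid) / grid_size(full_grid)
```

In this example the search space shrinks by 82%, and combining the smaller grid with 3-fold cross-validation cuts the fit count from 500 to 54.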
- Project overview and quick start guide
- Upload CSV file
- Preview data with row/column counts
- Select Learning Type: Regression or Classification
- Choose target column
- Enable/disable MLFlow tracking
- Start training with real-time progress
- Top 3 Models: Side-by-side comparison cards
- Interactive Charts: Bar charts, heatmaps, scatter plots
- Detailed Results Table: Sortable with all metrics
- Error Analysis: Automated issue detection (Regression)
- View all past training jobs
- Quick access to results
- Job status tracking
MIT License - feel free to use and modify.
- GitHub: https://github.com/devaldaki3/Unified-ML-Pipelines
- Documentation: See `MASTER_DOCUMENTATION.md` for full technical details
Built with ❤️, focusing on mathematical correctness and educational value.
Version 2.0 | Last Updated: January 2026