NutriClass is a GUVI Mini Project that I developed to demonstrate an end-to-end Machine Learning workflow using food nutritional data.
The project focuses on how nutritional attributes such as calories, protein, carbohydrates, fat, and sugar can be used to classify food items accurately.
This project is designed to be exam-ready, portfolio-ready, and live-evaluation-ready, and it follows common industry ML practices: separate raw and processed data, reusable pipelines, and cross-validated model selection.
In real-world diet planning and nutrition monitoring, users often know their nutritional targets but not the exact food that satisfies those targets.
Manual identification:
- Is error-prone
- Lacks consistency
- Does not scale
This project automates the process by learning a one-to-one mapping between nutritional values and food names using machine learning.
- Each food item represents a distinct, allowed meal
- Nutritional values act as a fingerprint
- In strict diet scenarios, only one food is permitted, not a list
| Aspect | Recommendation System | NutriClass |
|---|---|---|
| Output | Top-N foods | Single exact food |
| Control | Flexible | Strict |
| Use Case | Casual diet | Medical / fitness diet |
| Error Tolerance | High | Very low |
This design ensures diet compliance, automation, and precision.
- Smart Dietary Applications: Auto-select food based on nutritional targets
- Health Monitoring Tools: Assist dieticians and nutritionists
- Food Logging Systems: Automatically classify user-entered nutrition
- Educational Platforms: Explain food-nutrition relationships using ML
- Meal & Grocery Planning Apps: Suggest exact replacements within constraints
- Dataset Type: Tabular
- Raw Data: Synthetic and imbalanced (realistic scenario)
- Calories
- Protein
- Carbohydrates
- Fat
- Sugar
- Target Variable: `Food_Name`
- Raw data for realism and imbalance handling
- Processed data for modeling and deployment
- Studied class distribution and imbalance
- Inspected nutrition ranges per food
- Identified noisy and duplicate entries
Notebook: `01_data_understanding.ipynb`
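
The checks listed above can be sketched with pandas. This is a minimal illustration rather than the notebook's actual code: the rows are made-up toy values, and the feature column names are assumed to match the nutrient names listed earlier (only `Food_Name` is confirmed as a column name).

```python
import pandas as pd

# Assumed feature column names; only Food_Name is confirmed by this README.
FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]

# Hypothetical toy rows standing in for the raw CSV.
df = pd.DataFrame({
    "Calories": [250, 250, 95, 300, 95, 410],
    "Protein": [12.0, 12.0, 0.5, 25.0, 0.5, 8.0],
    "Carbohydrates": [30.0, 30.0, 25.0, 5.0, 25.0, 60.0],
    "Fat": [9.0, 9.0, 0.3, 20.0, 0.3, 15.0],
    "Sugar": [4.0, 4.0, 19.0, 1.0, 19.0, 30.0],
    "Food_Name": ["Pizza", "Pizza", "Apple", "Steak", "Apple", "Donut"],
})

# Class distribution: number of rows per food.
class_counts = df["Food_Name"].value_counts()

# Simple imbalance indicator: largest class size divided by smallest.
imbalance_ratio = class_counts.max() / class_counts.min()

# Nutrition range (min and max) of every feature for each food.
ranges = df.groupby("Food_Name")[FEATURES].agg(["min", "max"])

# Number of rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())
```

In the real notebook the frame would presumably come from `pd.read_csv` on the raw dataset instead of an inline dictionary.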
Handled:
- Missing values (imputation / removal)
- Duplicate food records
- Outliers using statistical thresholds
- Feature scaling using `StandardScaler`
Clean data stored separately to ensure reproducibility.
Notebook: `02_data_cleaning.ipynb`
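
One way the cleaning steps could fit together is sketched below. The specific choices are assumptions, since the README does not state them: median imputation for missing values, and the 1.5 × IQR rule as the "statistical threshold" for outliers. The `clean` helper, the column names, and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]  # assumed names

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, impute missing values, and remove IQR outliers."""
    out = df.drop_duplicates().copy()
    # Assumption: fill missing feature values with the column median.
    out[FEATURES] = out[FEATURES].fillna(out[FEATURES].median())
    # Assumption: keep rows within 1.5 * IQR of the quartiles on every feature.
    q1 = out[FEATURES].quantile(0.25)
    q3 = out[FEATURES].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    in_range = ((out[FEATURES] >= lower) & (out[FEATURES] <= upper)).all(axis=1)
    return out[in_range].reset_index(drop=True)

# Hypothetical raw rows: one duplicate, one missing value, one extreme value.
raw = pd.DataFrame({
    "Calories": [100, 100, 110, 120, 130, 5000, np.nan],
    "Protein": [5, 5, 6, 7, 8, 7, 6],
    "Carbohydrates": [20, 20, 22, 24, 26, 24, 22],
    "Fat": [2, 2, 3, 4, 5, 4, 3],
    "Sugar": [3, 3, 4, 5, 6, 5, 4],
    "Food_Name": ["Salad", "Salad", "Soup", "Wrap", "Pasta", "Burger", "Soup"],
})

cleaned = clean(raw)

# Standardize features to zero mean and unit variance.
scaled = StandardScaler().fit_transform(cleaned[FEATURES])
```

Fitting the scaler here is only for illustration. When a model is trained, the scaler is better fitted inside a pipeline so that it only ever sees training folds.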
- Distribution plots for nutritional features
- Inter-class variation analysis
- Feature correlation analysis
Notebook: `03_eda.ipynb`
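
The correlation and inter-class analyses can be approximated numerically before plotting anything. The sketch below uses hypothetical toy values and assumed column names; it is not taken from the notebook.

```python
import pandas as pd

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]  # assumed names

# Hypothetical toy rows: two samples for each of three foods.
df = pd.DataFrame({
    "Calories": [95, 105, 285, 300, 270, 260],
    "Protein": [0.5, 0.6, 12.0, 13.0, 26.0, 25.0],
    "Carbohydrates": [25.0, 27.0, 36.0, 38.0, 0.0, 1.0],
    "Fat": [0.3, 0.4, 10.0, 11.0, 18.0, 17.0],
    "Sugar": [19.0, 21.0, 4.0, 5.0, 0.0, 0.5],
    "Food_Name": ["Apple", "Apple", "Pizza", "Pizza", "Steak", "Steak"],
})

# Pearson correlation between every pair of nutritional features.
corr = df[FEATURES].corr()

# Mean nutritional profile of each food.
class_means = df.groupby("Food_Name")[FEATURES].mean()

# Spread of the class means per feature: larger values suggest the
# feature separates foods more strongly.
between_class_std = class_means.std()
```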
- Label encoding for food names
- PCA to understand dimensional contribution
- Feature importance analysis for interpretability
Notebook: `04_feature_engineering.ipynb`
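
A minimal sketch of label encoding and PCA with scikit-learn follows. The food names and nutrient values are hypothetical, and keeping two principal components is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical labels; LabelEncoder assigns integers in sorted order.
food_names = ["Apple", "Pizza", "Steak", "Apple", "Donut", "Pizza"]
encoder = LabelEncoder()
y = encoder.fit_transform(food_names)

# Hypothetical feature rows: calories, protein, carbohydrates, fat, sugar.
X = np.array([
    [95, 0.5, 25.0, 0.3, 19.0],
    [285, 12.0, 36.0, 10.0, 4.0],
    [270, 26.0, 0.0, 18.0, 0.0],
    [100, 0.6, 26.0, 0.4, 20.0],
    [410, 8.0, 60.0, 15.0, 30.0],
    [290, 13.0, 37.0, 11.0, 5.0],
])

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)

# Share of total variance captured by each component.
explained = pca.explained_variance_ratio_

# Absolute loadings: how strongly each original feature contributes
# to each component (rows are components, columns are features).
loadings = np.abs(pca.components_)
```

`encoder.inverse_transform` converts predicted integers back to food names, which is what a user-facing page would display.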
Although the final output is supervised classification, unsupervised learning was used to:
- Understand natural food groupings
- Validate nutritional similarity patterns
- Support business explanation during evaluation
Techniques:
- K-Means Clustering
- Distance-based similarity analysis
Notebook: `05_unsupervised_learning.ipynb`
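
As an illustration of both techniques, the sketch below clusters two synthetic food groups with K-Means and then measures each item's distance to the cluster centroids. The group profiles, the sample sizes, and the choice of two clusters are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical groups (calories, protein, carbohydrates, fat, sugar):
# a low-calorie fruit-like group and a high-calorie dessert-like group.
fruit = rng.normal(loc=[60, 1, 15, 0.3, 12], scale=[5, 0.2, 2, 0.1, 2], size=(20, 5))
dessert = rng.normal(loc=[450, 5, 55, 22, 35], scale=[20, 1, 5, 2, 4], size=(20, 5))

# Standardize so that calories do not dominate the Euclidean distance.
X = StandardScaler().fit_transform(np.vstack([fruit, dessert]))

# Assumption: two clusters, matching the two synthetic groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Distance-based similarity: Euclidean distance from every item
# to each cluster centroid (shape: n_samples x n_clusters).
distances = kmeans.transform(X)
```

If the clusters line up with known food categories, that supports the claim that nutritional values carry enough signal for the supervised classifier.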
Trained and compared multiple classifiers:
- Logistic Regression
- Decision Tree
- Random Forest
- K-Nearest Neighbors
- Support Vector Machine
- Gradient Boosting
- XGBoost
Used cross-validation and GridSearchCV for tuning.
Notebook: `06_supervised_learning.ipynb`
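
A comparison of this kind can be sketched with `cross_val_score`. The data below are synthetic blobs, not the project's dataset, and the hyperparameters are arbitrary. XGBoost is left out only to keep the example dependent on scikit-learn alone.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: four well-separated "foods" with five features each.
centers = np.array([
    [0, 0, 0, 0, 0],
    [8, 8, 0, 0, 0],
    [0, 8, 8, 0, 0],
    [8, 0, 0, 8, 0],
], dtype=float)
X, y = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=0)

# Scale-sensitive models get a StandardScaler; tree-based models do not need one.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=30, random_state=0),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
```

On imbalanced data, a scoring option such as `"f1_macro"` may be more informative than plain accuracy.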
To ensure production readiness, pipelines were created:
- Scaling
- Encoding
- Feature transformation
File: `pipelines/preprocessing_pipeline.py`
- Unified preprocessing + model flow
- Ensures same logic for training & inference
File: `pipelines/model_pipelines.py`
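
The idea of a unified flow can be sketched with a scikit-learn `Pipeline` that is fitted once, serialized, and reused for inference. This is an assumption-laden illustration: the classifier (1-nearest-neighbour), the column names, and the training rows are hypothetical, and the project's saved model may be built differently.

```python
import pickle

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]  # assumed names

# Hypothetical training rows: two samples per food.
train = pd.DataFrame(
    [
        [95, 0.5, 25.0, 0.3, 19.0, "Apple"],
        [100, 0.6, 26.0, 0.4, 20.0, "Apple"],
        [285, 12.0, 36.0, 10.0, 4.0, "Pizza"],
        [290, 13.0, 37.0, 11.0, 5.0, "Pizza"],
        [270, 26.0, 0.0, 18.0, 0.0, "Steak"],
        [275, 27.0, 1.0, 19.0, 0.5, "Steak"],
    ],
    columns=FEATURES + ["Food_Name"],
)

# Scaling and the model live in one object, so inference applies
# exactly the transformation that was learned during training.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", KNeighborsClassifier(n_neighbors=1)),  # illustrative model choice
])
pipeline.fit(train[FEATURES], train["Food_Name"])

# Round-trip through pickle, as one would when saving to a .pkl file.
restored = pickle.loads(pickle.dumps(pipeline))

# A single nutritional target maps to a single food name.
query = pd.DataFrame([[96, 0.5, 25.0, 0.3, 19.0]], columns=FEATURES)
prediction = restored.predict(query)[0]
```

Because the scaler is stored inside the pipeline, an application page only needs to load one file and call `predict`.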
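- Hyperparameters tuned with GridSearchCV
- Selecting hyperparameters on cross-validated folds reduces the risk of overfitting to a single train/test split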
File: `pipelines/grid_search.py`
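
A minimal sketch of tuning a whole pipeline with `GridSearchCV` is shown below. The synthetic data, the SVM model, and the parameter grid are assumptions for illustration and are not taken from `grid_search.py`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: four well-separated classes, five features.
centers = np.array([
    [0, 0, 0, 0, 0],
    [8, 8, 0, 0, 0],
    [0, 8, 8, 0, 0],
    [8, 0, 0, 8, 0],
], dtype=float)
X, y = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=0)

# Hold out a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([("scaler", StandardScaler()), ("model", SVC())])

# Step-name prefixes ("model__") route parameters to the right pipeline step.
param_grid = {"model__C": [0.1, 1, 10], "model__kernel": ["linear", "rbf"]}

# The scaler is refitted inside every fold, so validation folds
# do not leak into preprocessing.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)

best_params = search.best_params_
# With scoring set, score() reports macro F1 of the refitted best model.
test_f1_macro = search.score(X_test, y_test)
```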
Each model was evaluated using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
These metrics help analyze:
- Overall correctness
- Class-wise misclassification
- Model stability
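
The metrics above can be computed with `sklearn.metrics`. The labels below are a small hypothetical example, and macro averaging is an assumption: it weights every food equally, which suits imbalanced classes, but the README does not say which averaging the project uses.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

labels = ["Apple", "Pizza", "Steak"]

# Hypothetical ground truth and predictions for six samples.
y_true = ["Apple", "Apple", "Pizza", "Pizza", "Steak", "Steak"]
y_pred = ["Apple", "Pizza", "Pizza", "Pizza", "Steak", "Apple"]

accuracy = accuracy_score(y_true, y_pred)                    # 4 of 6 correct
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")

# Rows are true foods, columns are predicted foods, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
```

Off-diagonal cells of `cm` show which foods are confused with each other, which supports the class-wise misclassification analysis.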
The project includes a multi-page Streamlit application designed for clear separation of functionality and usability.
| Page | Purpose |
|---|---|
| Food Classifier | Predict exact food name |
| Diet Recommendation | Nutrition-based guidance |
| Pipeline Overview | Explain ML workflow |
| Raw Data Explorer | Inspect original dataset |
Follow these steps to run the project locally:
1️⃣ Clone the Repository

    git clone <repository-url>
    cd NutriClass

2️⃣ Create a Virtual Environment (Optional but Recommended)

    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate

3️⃣ Install Dependencies

    pip install -r requirements.txt

4️⃣ Run the Streamlit App

    streamlit run app.py

The application will open in your browser at:
http://localhost:8501
⚠️ Note: GitHub Actions CI may fail due to Streamlit UI execution in a headless environment.
The application runs successfully in a local setup.
Project Structure

    NutriClass/
    │
    ├── app.py
    ├── README.md
    ├── requirements.txt
    │
    ├── data/
    │   ├── raw/
    │   │   └── synthetic_food_dataset_imbalanced.csv
    │   └── processed/
    │       └── clean_food_data.csv
    │
    ├── notebooks/
    │   ├── 01_data_understanding.ipynb
    │   ├── 02_data_cleaning.ipynb
    │   ├── 03_eda.ipynb
    │   ├── 04_feature_engineering.ipynb
    │   ├── 05_unsupervised_learning.ipynb
    │   └── 06_supervised_learning.ipynb
    │
    ├── pipelines/
    │   ├── preprocessing_pipeline.py
    │   ├── model_pipelines.py
    │   └── grid_search.py
    │
    ├── models/
    │   └── nutriclass_pipeline.pkl
    │
    └── pages/
        ├── 1_Food_Classifier.py
        ├── 2_Diet_Recommendation.py
        ├── 3_Pipeline_Overview.py
        └── 4_Raw_Data_Explorer.py

Project Presentation & Author
Project Developed By:
Bhuvaneswari G
Web Developer & Data Science Learner

Conclusion
NutriClass showcases how a structured machine learning approach can be applied to the Food & Nutrition domain, delivering accurate classification, clear insights, and real-world applicability through a deployable application.