bhuvana87ps/NutriClass

🥗 NutriClass: Food Classification Using Nutritional Data

📌 Project Overview

NutriClass is a GUVI Mini Project that I developed to demonstrate an end-to-end machine learning workflow on food nutritional data.
It shows how nutritional attributes such as calories, protein, carbohydrates, fat, and sugar can be used to classify food items.

The project is designed to be exam-ready, portfolio-ready, and live-evaluation-ready, and it follows industry-style ML practices.


❓ Problem Statement

In real-world diet planning and nutrition monitoring, users often know their nutritional targets but not the exact food that satisfies those targets.

Manual identification:

  • Is error-prone
  • Lacks consistency
  • Does not scale

This project automates the process by learning a one-to-one mapping between nutritional values and food names using machine learning.


🧠 Business Assumptions & Design Logic

Why Food Name as Target Column?

  • Each food item represents a distinct, allowed meal
  • Nutritional values act as a fingerprint
  • In strict diet scenarios, only one food is permitted, not a list

Why Classification Instead of Recommendation?

| Aspect | Recommendation System | NutriClass |
|---|---|---|
| Output | Top-N foods | Single exact food |
| Control | Flexible | Strict |
| Use Case | Casual diet | Medical / fitness diet |
| Error Tolerance | High | Very low |

This design ensures diet compliance, automation, and precision.
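
The difference in the "Output" row can be illustrated with a minimal sketch. It assumes a scikit-learn classifier trained on a few invented rows (not the project dataset); the model choice and the `query` values are illustrative only. `predict` returns the single exact food, while ranking `predict_proba` produces the Top-N list a recommender would show.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented toy rows: [Calories, Protein, Carbohydrates, Fat, Sugar]
X = np.array([
    [52, 0.3, 14.0, 0.2, 10.0], [55, 0.4, 15.0, 0.2, 11.0],   # Apple
    [165, 31.0, 0.0, 3.6, 0.0], [170, 30.0, 0.5, 3.8, 0.0],   # Chicken
    [130, 2.7, 28.0, 0.3, 0.1], [128, 2.6, 27.5, 0.4, 0.2],   # Rice
])
y = np.array(["Apple", "Apple", "Chicken", "Chicken", "Rice", "Rice"])

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
query = np.array([[160, 29.0, 1.0, 3.5, 0.0]])  # hypothetical nutritional target

# NutriClass style: one exact food
single_food = model.predict(query)[0]

# Recommendation style: foods ranked by predicted probability
proba = model.predict_proba(query)[0]
top_n = model.classes_[np.argsort(proba)[::-1][:2]].tolist()
```

Here `single_food` is always the first entry of `top_n`; the classifier simply commits to it instead of offering alternatives.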


💼 Business Use Cases

  • Smart Dietary Applications
    Auto-select food based on nutritional targets

  • Health Monitoring Tools
    Assist dieticians and nutritionists

  • Food Logging Systems
    Automatically classify user-entered nutrition

  • Educational Platforms
    Explain food–nutrition relationships using ML

  • Meal & Grocery Planning Apps
    Suggest exact replacements within constraints


📊 Dataset Description

  • Dataset Type: Tabular
  • Raw Data: Synthetic and imbalanced (realistic scenario)

Features:

  • Calories
  • Protein
  • Carbohydrates
  • Fat
  • Sugar

Target Variable: Food_Name

Dataset Stages:

  • Raw data for realism and imbalance handling
  • Processed data for modeling and deployment

🔬 Project Methodology & Workflow

1️⃣ Data Understanding

  • Studied class distribution and imbalance
  • Inspected nutrition ranges per food
  • Identified noisy and duplicate entries

📌 Notebook:
01_data_understanding.ipynb
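
The checks listed for this step could look roughly like the following pandas sketch. The column names follow the feature list in this README; the rows are invented for illustration and are not taken from the project dataset.

```python
import pandas as pd

# Invented stand-in for the raw data (the real file lives under data/raw/)
raw = pd.DataFrame({
    "Calories": [52, 52, 89, 165, 165, 165, 130],
    "Protein": [0.3, 0.3, 1.1, 31.0, 31.0, 30.5, 2.7],
    "Carbohydrates": [14.0, 14.0, 23.0, 0.0, 0.0, 0.0, 28.0],
    "Fat": [0.2, 0.2, 0.3, 3.6, 3.6, 3.7, 0.3],
    "Sugar": [10.0, 10.0, 12.0, 0.0, 0.0, 0.0, 0.1],
    "Food_Name": ["Apple", "Apple", "Banana", "Chicken", "Chicken", "Chicken", "Rice"],
})

# Class distribution and a simple imbalance ratio (largest class / smallest class)
class_counts = raw["Food_Name"].value_counts()
imbalance_ratio = class_counts.max() / class_counts.min()

# Exact duplicate rows (every column identical to an earlier row)
n_duplicates = int(raw.duplicated().sum())

# Calorie range per food
ranges = raw.groupby("Food_Name")["Calories"].agg(["min", "max"])
```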


2️⃣ Data Cleaning & Preprocessing

Handled:

  • Missing values (imputation / removal)
  • Duplicate food records
  • Outliers using statistical thresholds
  • Feature scaling using StandardScaler

Clean data stored separately to ensure reproducibility.

πŸ“Œ Notebook:
02_data_cleaning.ipynb
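
A minimal sketch of these cleaning steps is shown here. It assumes median imputation and a z-score threshold of 3 for outliers; the README does not state which imputation strategy or threshold the notebook uses, and `clean` is a hypothetical helper name. The data is generated purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]

def clean(df, z_thresh=3.0):
    """Impute missing values, drop duplicates, then drop z-score outliers."""
    out = df.copy()
    out[FEATURES] = out[FEATURES].fillna(out[FEATURES].median())  # assumed strategy
    out = out.drop_duplicates()
    std = out[FEATURES].std(ddof=0).replace(0, 1.0)  # guard against constant columns
    z = (out[FEATURES] - out[FEATURES].mean()) / std
    return out[(z.abs() <= z_thresh).all(axis=1)].reset_index(drop=True)

# Illustrative data: 20 regular rows, one exact duplicate, one extreme row, one NaN
base = pd.DataFrame({
    "Calories": [100.0 + i for i in range(20)],
    "Protein": [10.0 + 0.1 * i for i in range(20)],
    "Carbohydrates": [20.0 + 0.5 * i for i in range(20)],
    "Fat": [5.0 + 0.2 * i for i in range(20)],
    "Sugar": [3.0 + 0.1 * i for i in range(20)],
    "Food_Name": ["Apple" if i % 2 == 0 else "Rice" for i in range(20)],
})
outlier = pd.DataFrame([{"Calories": 5000.0, "Protein": 11.0, "Carbohydrates": 25.0,
                         "Fat": 7.0, "Sugar": 4.0, "Food_Name": "Rice"}])
dirty = pd.concat([base, base.iloc[[0]], outlier], ignore_index=True)
dirty.loc[3, "Protein"] = np.nan

cleaned = clean(dirty)
# Scaling happens last, on the cleaned features only
scaled = StandardScaler().fit_transform(cleaned[FEATURES])
```

In a real train/test setup the scaler should be fitted on the training split only, which is what wrapping it in a pipeline takes care of.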


3️⃣ Exploratory Data Analysis (EDA)

  • Distribution plots for nutritional features
  • Inter-class variation analysis
  • Feature correlation analysis

πŸ“Œ Notebook:
03_eda.ipynb
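
The numeric side of this analysis (without the plots) can be sketched with pandas. The rows are invented; only the column names come from this README.

```python
import pandas as pd

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]

# Invented rows, two per food
df = pd.DataFrame({
    "Calories": [52, 55, 130, 128, 165, 170],
    "Protein": [0.3, 0.4, 2.7, 2.6, 31.0, 30.0],
    "Carbohydrates": [14.0, 15.0, 28.0, 27.5, 0.0, 0.5],
    "Fat": [0.2, 0.2, 0.3, 0.4, 3.6, 3.8],
    "Sugar": [10.0, 11.0, 0.1, 0.2, 0.0, 0.0],
    "Food_Name": ["Apple", "Apple", "Rice", "Rice", "Chicken", "Chicken"],
})

# Feature correlation: Pearson correlation between every pair of features
corr = df[FEATURES].corr()

# Inter-class variation: how far apart the per-food feature means are
class_means = df.groupby("Food_Name")[FEATURES].mean()
```

Strongly correlated pairs (protein and fat in this toy data) carry overlapping information, which is one reason to look at PCA in the next step.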


4️⃣ Feature Engineering

  • Label encoding for food names
  • PCA to understand dimensional contribution
  • Feature importance analysis for interpretability

πŸ“Œ Notebook:
04_feature_engineering.ipynb
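
The three steps can be sketched as follows. The data is synthetic (three foods with Gaussian noise around invented centers), and using a Random Forest for feature importance is an assumption; the README does not say which estimator the notebook uses.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]

# Synthetic data: 30 noisy samples around each invented food profile
rng = np.random.default_rng(0)
centers = {
    "Apple": [52, 0.3, 14.0, 0.2, 10.0],
    "Chicken": [165, 31.0, 0.0, 3.6, 0.0],
    "Rice": [130, 2.7, 28.0, 0.3, 0.1],
}
X = np.vstack([rng.normal(c, 0.5, size=(30, 5)) for c in centers.values()])
names = np.repeat(list(centers), 30)

# Label encoding: food names -> integer class ids (alphabetical order)
encoder = LabelEncoder()
y = encoder.fit_transform(names)

# PCA on standardized features: share of variance captured per component
X_scaled = StandardScaler().fit_transform(X)
explained = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

# Impurity-based feature importance from a tree ensemble
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_scaled, y)
importances = dict(zip(FEATURES, forest.feature_importances_))
```

`encoder.inverse_transform` maps predicted class ids back to food names when results are shown to a user.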


5️⃣ Unsupervised Learning (Analysis Support)

Although the final output is supervised classification, unsupervised learning was used to:

  • Understand natural food groupings
  • Validate nutritional similarity patterns
  • Support business explanation during evaluation

πŸ“Œ Techniques:

  • K-Means Clustering
  • Distance-based similarity analysis

πŸ“Œ Notebook:
05_unsupervised_learning.ipynb
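
A small sketch of both techniques on synthetic data is given here. The number of clusters and the use of Euclidean distance between per-food centroids are assumptions, not details taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import StandardScaler

# Synthetic data: 30 noisy samples around each invented food profile
rng = np.random.default_rng(0)
centers = {
    "Apple": [52, 0.3, 14.0, 0.2, 10.0],
    "Chicken": [165, 31.0, 0.0, 3.6, 0.0],
    "Rice": [130, 2.7, 28.0, 0.3, 0.1],
}
X = np.vstack([rng.normal(c, 0.5, size=(30, 5)) for c in centers.values()])
X_scaled = StandardScaler().fit_transform(X)

# K-Means: do the natural groupings line up with the food labels?
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_

# Distance-based similarity: Euclidean distance between per-food centroids
centroids = np.vstack([X_scaled[i * 30:(i + 1) * 30].mean(axis=0) for i in range(3)])
centroid_dist = pairwise_distances(centroids)
```

If every food falls into exactly one cluster, as it does for this well-separated toy data, the nutritional "fingerprint" assumption holds for those foods.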


6️⃣ Supervised Learning & Model Training

Trained and compared multiple classifiers:

  • Logistic Regression
  • Decision Tree
  • Random Forest
  • K-Nearest Neighbors
  • Support Vector Machine
  • Gradient Boosting
  • XGBoost

Used cross-validation and GridSearchCV for tuning.

πŸ“Œ Notebook:
06_supervised_learning.ipynb
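
A comparison of this kind can be sketched with `cross_val_score`. The data is synthetic, the hyperparameters are scikit-learn defaults rather than the tuned values, and XGBoost is left out only to keep the example free of extra dependencies.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 30 noisy samples around each invented food profile
rng = np.random.default_rng(0)
centers = {
    "Apple": [52, 0.3, 14.0, 0.2, 10.0],
    "Chicken": [165, 31.0, 0.0, 3.6, 0.0],
    "Rice": [130, 2.7, 28.0, 0.3, 0.1],
}
X = np.vstack([rng.normal(c, 0.5, size=(30, 5)) for c in centers.values()])
y = np.repeat(list(centers), 30)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the class proportions stable, which matters for imbalanced data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=cv).mean()
    for name, model in models.items()
}
best_model = max(scores, key=scores.get)
```

Placing the scaler inside each pipeline means it is refitted on every training fold, so no information from the validation fold leaks into preprocessing.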


⚙️ ML Pipelines & Engineering Design

To ensure production readiness, pipelines were created:

🔹 Preprocessing Pipeline

  • Scaling
  • Encoding
  • Feature transformation

📄 pipelines/preprocessing_pipeline.py
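
The actual contents of `preprocessing_pipeline.py` are not shown in this README, so the following is only a sketch of what such a pipeline might contain. `build_preprocessor` is a hypothetical name, and the median imputation step is an assumption. Label encoding of `Food_Name` is omitted because scikit-learn classifiers accept string targets directly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]

def build_preprocessor():
    """Hypothetical factory: impute then scale the five numeric features."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # remainder="drop" discards any column not listed, such as Food_Name
    return ColumnTransformer([("numeric", numeric, FEATURES)], remainder="drop")

# Invented rows, including one missing value
sample = pd.DataFrame({
    "Calories": [52.0, 165.0, 130.0, np.nan],
    "Protein": [0.3, 31.0, 2.7, 1.1],
    "Carbohydrates": [14.0, 0.0, 28.0, 23.0],
    "Fat": [0.2, 3.6, 0.3, 0.3],
    "Sugar": [10.0, 0.0, 0.1, 12.0],
    "Food_Name": ["Apple", "Chicken", "Rice", "Banana"],
})
transformed = build_preprocessor().fit_transform(sample)
```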


🔹 Model Pipelines

  • Unified preprocessing + model flow
  • Ensures same logic for training & inference

📄 pipelines/model_pipelines.py
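
The "same logic for training & inference" point can be illustrated with a sketch in which one fitted `Pipeline` object is serialized and reloaded. The model choice and the data are assumptions; the example pickles to memory rather than to `models/nutriclass_pipeline.pkl` so that it has no side effects.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["Calories", "Protein", "Carbohydrates", "Fat", "Sugar"]

# Synthetic training data: 30 noisy samples around each invented food profile
rng = np.random.default_rng(0)
centers = {
    "Apple": [52, 0.3, 14.0, 0.2, 10.0],
    "Chicken": [165, 31.0, 0.0, 3.6, 0.0],
    "Rice": [130, 2.7, 28.0, 0.3, 0.1],
}
X = pd.DataFrame(
    np.vstack([rng.normal(c, 0.5, size=(30, 5)) for c in centers.values()]),
    columns=FEATURES,
)
y = np.repeat(list(centers), 30)

# Training: scaler and model are fitted together as one object
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
]).fit(X, y)

# Inference: the reloaded object applies the identical scaling before predicting
restored = pickle.loads(pickle.dumps(pipeline))
query = pd.DataFrame([[164.0, 30.5, 0.2, 3.5, 0.1]], columns=FEATURES)
prediction = restored.predict(query)[0]
```

Because the scaler travels inside the pickled pipeline, the application never has to re-implement preprocessing by hand.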


🔹 Hyperparameter Optimization

  • GridSearchCV used
  • Cross-validated scoring reduces the risk of selecting hyperparameters that overfit

📄 pipelines/grid_search.py
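
A minimal grid search over a pipeline might look like this. The parameter grid, the `f1_macro` scoring choice, and the synthetic data are all assumptions made for illustration; the README does not list the grids used in `grid_search.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 30 noisy samples around each invented food profile
rng = np.random.default_rng(0)
centers = {
    "Apple": [52, 0.3, 14.0, 0.2, 10.0],
    "Chicken": [165, 31.0, 0.0, 3.6, 0.0],
    "Rice": [130, 2.7, 28.0, 0.3, 0.1],
}
X = np.vstack([rng.normal(c, 0.5, size=(30, 5)) for c in centers.values()])
y = np.repeat(list(centers), 30)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])

# "<step name>__<parameter>" addresses a parameter of a pipeline step
param_grid = {
    "model__n_estimators": [25, 50],
    "model__max_depth": [3, None],
}

# Each candidate is scored on held-out folds; macro F1 weights all foods equally
grid = GridSearchCV(pipeline, param_grid=param_grid, cv=3, scoring="f1_macro")
grid.fit(X, y)

best_params = grid.best_params_
best_score = grid.best_score_
```

After fitting, `grid.best_estimator_` is the pipeline refitted on all of the data with the winning parameters.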


📈 Evaluation Metrics

Each model was evaluated using:

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • Confusion Matrix

These metrics help analyze:

  • Overall correctness
  • Class-wise misclassification
  • Model stability
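
The metrics above can be computed with `sklearn.metrics`. The labels below are invented, and macro averaging is an assumption about how the per-class scores are combined.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Invented ground truth and predictions: one Apple is misclassified as Rice
labels = ["Apple", "Chicken", "Rice"]
y_true = ["Apple", "Apple", "Rice", "Rice", "Chicken", "Chicken"]
y_pred = ["Apple", "Rice", "Rice", "Rice", "Chicken", "Chicken"]

# Overall correctness: 5 of 6 predictions are right
accuracy = accuracy_score(y_true, y_pred)

# Macro average: unweighted mean over classes, so rare foods count as much as common ones
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro", zero_division=0
)

# Class-wise misclassification: rows are true foods, columns are predicted foods
cm = confusion_matrix(y_true, y_pred, labels=labels)
```

In the confusion matrix, the off-diagonal entry in the Apple row and Rice column is the single misclassified sample.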

🖥️ Streamlit Application Design

The project includes a multi-page Streamlit application designed for clear separation of functionality and usability.

🔹 Pages Overview

| Page | Purpose |
|---|---|
| Food Classifier | Predict exact food name |
| Diet Recommendation | Nutrition-based guidance |
| Pipeline Overview | Explain ML workflow |
| Raw Data Explorer | Inspect original dataset |

▢️ How to Run the Project

Follow these steps to run the project locally:

1️⃣ Clone the Repository

git clone <repository-url>
cd NutriClass

2️⃣ Create a Virtual Environment (Optional but Recommended)

python -m venv venv
source venv/bin/activate    # On Windows: venv\Scripts\activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the Streamlit App

streamlit run app.py

The application will open in your browser at:

http://localhost:8501

⚠️ Note: GitHub Actions CI may fail due to Streamlit UI execution in a headless environment.
The application runs successfully in a local setup.

🗂️ Project Structure

NutriClass/
│
├── app.py
├── README.md
├── requirements.txt
│
├── data/
│   ├── raw/
│   │   └── synthetic_food_dataset_imbalanced.csv
│   └── processed/
│       └── clean_food_data.csv
│
├── notebooks/
│   ├── 01_data_understanding.ipynb
│   ├── 02_data_cleaning.ipynb
│   ├── 03_eda.ipynb
│   ├── 04_feature_engineering.ipynb
│   ├── 05_unsupervised_learning.ipynb
│   └── 06_supervised_learning.ipynb
│
├── pipelines/
│   ├── preprocessing_pipeline.py
│   ├── model_pipelines.py
│   └── grid_search.py
│
├── models/
│   └── nutriclass_pipeline.pkl
│
└── pages/
    ├── 1_Food_Classifier.py
    ├── 2_Diet_Recommendation.py
    ├── 3_Pipeline_Overview.py
    └── 4_Raw_Data_Explorer.py

👤 Project Presentation & Author

Project Developed By:
Bhuvaneswari G
Web Developer & Data Science Learner

🔚 Conclusion

NutriClass showcases how a structured machine learning approach can be applied to the Food & Nutrition domain, delivering accurate classification, clear insights, and real-world applicability through a deployable application.