Isacapps/Predictive-Modeling-Student-Depression-Risk-Factors.ipynb

Predictive Analysis of Mental Health Determinants in Student Populations

Abstract
This research project implements a supervised machine learning framework to identify and classify depression risk factors among students. By analyzing a dataset of over 27,000 observations, the study explores the intersection of academic pressure, socio-economic status, and lifestyle habits to provide a data-driven perspective on student well-being.

Research Objectives

  • Risk Classification: Develop classification models that categorize individuals based on depressive symptom indicators.
  • Feature Criticality: Quantify the impact of variables such as 'Financial Stress', 'Academic Pressure', and 'Study Satisfaction'.
  • Lifestyle Correlation: Evaluate the statistical significance of sleep patterns and dietary habits in predicting mental health outcomes.
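
The README does not say how feature impact was quantified. One common approach, shown here only as a minimal sketch, is to compare the coefficients of a logistic regression fitted on standardized inputs. The data below is synthetic, and the effect sizes (2.0, 1.0, -0.5) are assumptions chosen for illustration, not results from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the real dataset and its effect sizes are not used here.
rng = np.random.default_rng(0)
n = 2000
features = ["Financial Stress", "Academic Pressure", "Study Satisfaction"]
X = rng.normal(size=(n, 3))

# Assumed ground truth: two risk factors and one protective factor.
logits = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 0.5 * X[:, 2]
y = (logits + rng.logistic(size=n) > 0).astype(int)

# Standardizing first makes coefficient magnitudes comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_scaled, y)

# Rank features by absolute coefficient; the sign gives the direction of the effect.
ranking = sorted(zip(features, model.coef_[0]), key=lambda p: abs(p[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:+.2f}")
```

Coefficients describe association within the fitted model, not causation, and correlated features can share or mask each other's weight.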

Dataset Taxonomy
The study uses a multi-dimensional dataset comprising 18 features across four primary domains:

  • Demographics: Age, Gender, City, and Profession.
  • Academic Metrics: Degree type, CGPA, Academic Pressure, and Study Satisfaction.
  • Lifestyle Factors: Sleep Duration, Dietary Habits, and Daily Study/Work Hours.
  • Clinical Indicators: Family History of Mental Illness, Suicidal Thoughts, and Financial Stress levels.

Methodology: Data Preprocessing
To ensure the integrity of the statistical inference, a comprehensive preprocessing pipeline was executed:

  • Missing Value Management: Systematic imputation or removal of incomplete records to preserve dataset completeness.
  • Categorical Encoding: Implementation of One-Hot Encoding and Ordinal Mapping to translate qualitative descriptors (e.g., 'Sleep Duration') into computationally viable formats.
  • Outlier Mitigation: Statistical filtering of implausible values in continuous features (e.g., Age and Study Hours) to reduce model variance.
  • Feature Scaling: Standardizing numerical inputs to a uniform scale, preventing magnitude bias during algorithmic training.
  • Target Encoding: Harmonizing the 'Depression' label for binary classification tasks.
  • Model Validation & Diagnostics: Confusion matrices were computed for every classifier to break predictions down into true and false positives and negatives, with particular attention to False Negatives, in order to assess how reliably each model flags at-risk students.
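
The preprocessing steps above can be sketched with pandas and scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's actual code: the toy rows, the age bounds (15 to 60), the sleep category labels and their ordering, the "Yes"/"No" label values, and the median imputation strategy are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Toy rows standing in for the real dataset; every value here is invented.
df = pd.DataFrame({
    "Age": [19, 22, 250, 24, 21, 23],  # 250 is an implausible entry
    "CGPA": [7.1, 8.4, 6.9, np.nan, 9.0, 5.5],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Sleep Duration": ["5-6 hours", "7-8 hours", "Less than 5 hours",
                       "More than 8 hours", "7-8 hours", "5-6 hours"],
    "Depression": ["Yes", "No", "Yes", "No", "No", "Yes"],
})

# Outlier mitigation: keep ages inside an assumed plausible range.
df = df[df["Age"].between(15, 60)].reset_index(drop=True)

# Target encoding: map the label to 0/1 for binary classification.
y = df["Depression"].map({"No": 0, "Yes": 1})
X = df.drop(columns="Depression")

# Assumed ordering of the sleep categories, shortest to longest.
sleep_order = ["Less than 5 hours", "5-6 hours", "7-8 hours", "More than 8 hours"]

preprocess = ColumnTransformer(
    [
        # Numeric features: fill gaps with the median, then standardize.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), ["Age", "CGPA"]),
        # Ordered categories become integers 0..3 following sleep_order.
        ("ordinal", OrdinalEncoder(categories=[sleep_order]), ["Sleep Duration"]),
        # Unordered categories become one indicator column per value.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_ready = preprocess.fit_transform(X)
print(X_ready.shape)
```

In a real workflow the transformer would be fitted on the training split only and then applied to the test split, so that imputation and scaling statistics do not leak information from held-out data.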

Technical Framework

  • Language: Python
  • Key Libraries: Pandas (Data Manipulation), Scikit-Learn (Machine Learning), Seaborn & Matplotlib (Exploratory Data Analysis).
  • Evaluation Metrics: While accuracy was monitored, optimization focused primarily on Recall and F1-Score.
    • Recall Optimization: Critical for clinical screening to minimize Type II errors (False Negatives), ensuring that students at risk of depression are correctly identified.
    • Confusion Matrix Analysis: Used as the primary diagnostic tool to evaluate the trade-off between sensitivity and specificity for each implemented model.
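
To make the metrics above concrete, the sketch below derives recall (sensitivity), specificity, and F1 from a confusion matrix using scikit-learn. The label vectors are invented for illustration and are not results from the study.

```python
from sklearn.metrics import confusion_matrix, f1_score, recall_score

# Hypothetical labels: 1 = depression indicated, 0 = not indicated.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

# With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

recall = recall_score(y_true, y_pred)   # tp / (tp + fn), also called sensitivity
specificity = tn / (tn + fp)            # share of true negatives correctly cleared
miss_rate = fn / (fn + tp)              # Type II error rate, equal to 1 - recall
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"recall={recall:.2f} specificity={specificity:.2f} F1={f1:.2f}")
```

In a screening setting a false negative is a missed at-risk student, which is why recall is weighted above raw accuracy; raising recall usually costs some specificity, and the confusion matrix shows that trade-off directly.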

Project Deliverables

  • Analysis Notebook: Python implementation of the end-to-end ML pipeline.
  • Research Presentation: Detailed slide deck summarizing findings and clinical insights.
  • Visual Assets: High-resolution infographics and correlation matrices.

Authors

  • Isabella Cappiello
  • Nathan Dubourg
  • Romain Sartori

About

Predictive modeling of student depression determinants. Features an end-to-end pipeline: rigorous data preprocessing, statistical visualization (EDA), and classification via supervised Machine Learning (Scikit-Learn).
