This is the repository of Stat_Collective for its work in Datastorm 2.0.

Approach

Load the dataset directly from CSV into a pandas DataFrame. The training, validation, and test sets are loaded separately.

1. Data Cleaning and EDA
   - Duplicate the dataset for undersampling (skip this for the first fit)
   - Check for missing data
   - Check for ordinal data masked as numerical data
   - Plots and charts for the data

2. Data Preprocessing
   - Dummy variables
   - Impute variables?
   - Feature selection
   - Scaling

3. Model Building
   3.1 Logistic Regression
   3.2 Random Forest Classification
   3.3 K-Nearest Neighbours
   3.4 Gradient Boosting
   3.5 XGBoost
   3.6 Support Vector Machine
   3.7 Neural Network

4. Model Evaluation (Hyperparameter Tuning)
   - F1 score
   - Confusion matrix

5. Model Testing
   - F1 score
   - Confusion matrix
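
The F1 score and confusion matrix used in the evaluation and testing steps can be illustrated with a small dependency-free sketch. It assumes a binary target with the positive class labelled 1, which this README does not state, and the function names `confusion_counts` and `f1_score` are hypothetical helpers written for illustration, not code from this repository. In practice a library implementation would probably be used instead.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a binary problem (assumed, not from the README)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def f1_score(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels purely for illustration; not data from the competition.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("confusion matrix [[tn, fp], [fn, tp]]:", [[tn, fp], [fn, tp]])
print("F1:", f1_score(y_true, y_pred))
```

Because F1 ignores true negatives, it stays informative when the classes are imbalanced, which fits the undersampling step listed under Data Cleaning and EDA.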