GitHub

Model Selection using K-Fold Cross-Validation and Bootstrap

Project Overview

This project evaluates machine learning models using K-Fold Cross-Validation and Bootstrap Resampling. It aims to select the best model based on predictive performance and robustness. The evaluation is performed on the Digits dataset using hyperparameter tuning for optimal results.

The following classifiers are included:

Random Forest
Logistic Regression
Support Vector Machines (SVM)
K-Neighbors Classifier
Decision Tree

Logs, reports, and visualizations provide detailed insights into model performance.

How to Run the Code

Prerequisites

Python Version: Ensure Python 3.10 or higher is installed.
Dependencies: Install required libraries with:
```
pip install -r requirements.txt
```

Execution Steps

Clone the repository or download the Python script.
Navigate to the script directory.
Run the script using:
```
python Project2_KCross_Bootstrap.py
```

Outputs

Logs: Saved in logs/debug.log.
Reports: Stored in results/.
Visualizations: Saved as .png files in figures/.

Answers to Key Questions

1. Do your cross-validation and bootstrapping model selectors agree with a simpler model selector like AIC in simple cases (like linear regression)?

Yes, in simpler cases such as linear regression, both K-Fold Cross-Validation and Bootstrap Resampling align well with simpler model selectors like AIC. These methods focus on evaluating model performance, balancing complexity and predictive accuracy:

K-Fold Cross-Validation evaluates models across multiple splits, providing stable estimates of performance variability.
Bootstrap Resampling assesses generalization by evaluating predictions on resampled data.

While AIC relies on assumptions like model linearity, K-Fold and Bootstrap are more versatile, making them suitable for evaluating a wider range of models.

2. In what cases might the methods you've written fail or give incorrect or undesirable results?

The methods may face challenges in the following cases:

Imbalanced Datasets: K-Fold might fail to preserve class distributions across splits, leading to misleading results.
Small Datasets: Both methods can struggle with small datasets. K-Fold may lose critical data in splits, while Bootstrap may produce overly optimistic estimates.
Overfitting During Tuning: Excessive hyperparameter tuning in K-Fold can lead to overfitting, causing poor performance on unseen data.

Such issues can result in models being incorrectly evaluated, favoring those that perform well on the validation data but fail on unseen data.

3. What could you implement given more time to mitigate these cases or help users of your methods?

With more time, the following improvements could be implemented:

Stratified K-Fold: Automatically preserve class distributions across folds for imbalanced datasets.
Nested Cross-Validation: Separate hyperparameter tuning and evaluation to avoid overfitting and provide a true estimate of model performance.
Advanced Bootstrap Adjustments:
- .632+ Bootstrap for more realistic error estimation.
- Custom sampling ratios for flexible resampling.
Additional Metrics: Provide F1-score, ROC-AUC, or precision-recall curves for a more nuanced evaluation.
Automated Error Analysis: Generate reports highlighting misclassification patterns and critical features.

4. What parameters have you exposed to your users in order to use your model selectors?

The program provides flexibility by exposing the following parameters:

K-Fold Parameters:
- Number of splits (n_splits_values): Users can specify split sizes, such as 5 or 10.
Bootstrap Parameters:
- Number of iterations (n_iterations): Control the resampling count (default: 100).
Model Hyperparameters:
- Random Forest: n_estimators (number of trees).
- Logistic Regression: Regularization parameter C.
- SVM: Regularization parameter C and kernel coefficient gamma.
- Decision Tree: Maximum depth (max_depth).
- K-Neighbors: Number of neighbors (n_neighbors).
Output Controls:
- Logs: Debug information and runtime statistics saved in logs/.
- Reports: Summaries of metrics and hyperparameters stored in results/.

Expected Outputs

Logs:
- Debug and runtime logs are saved to logs/debug.log.
Reports:
- K-Fold results with metrics and hyperparameters: results/kfold_results.txt.
- Bootstrap results with mean and standard deviation: results/bootstrap_results.txt.
Visualizations:
- K-Fold Visualizations:
  - Mean ± Standard Deviation of scores across models.
  - Scores across different fold sizes.
- Bootstrap Visualizations:
  - Trends of accuracy over iterations.
  - Distribution of bootstrap scores.

All figures are saved as .png files in the figures/ directory.

Summary

This program provides robust model evaluation using K-Fold Cross-Validation and Bootstrap Resampling. It ensures flexibility for various datasets and offers detailed logs, reports, and visualizations, helping users make informed model selection decisions.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
logs		logs
results		results
.gitignore		.gitignore
Project2_Kcross_Bootstrap.py		Project2_Kcross_Bootstrap.py
README.md		README.md
requirement.txt		requirement.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Model Selection using K-Fold Cross-Validation and Bootstrap

Project Overview

How to Run the Code

Prerequisites

Execution Steps

Outputs

Answers to Key Questions

1. Do your cross-validation and bootstrapping model selectors agree with a simpler model selector like AIC in simple cases (like linear regression)?

2. In what cases might the methods you've written fail or give incorrect or undesirable results?

3. What could you implement given more time to mitigate these cases or help users of your methods?

4. What parameters have you exposed to your users in order to use your model selectors?

Expected Outputs

Summary

Team members

Krishna Manideep Malladi (A20550891)

Manvitha Byrineni (A20550783)

Udaya sree Vankdavath (A20552992)

About

Uh oh!

Releases

Packages

Languages

Udaya7893/Project2

Folders and files

Latest commit

History

Repository files navigation

Model Selection using K-Fold Cross-Validation and Bootstrap

Project Overview

How to Run the Code

Prerequisites

Execution Steps

Outputs

Answers to Key Questions

1. Do your cross-validation and bootstrapping model selectors agree with a simpler model selector like AIC in simple cases (like linear regression)?

2. In what cases might the methods you've written fail or give incorrect or undesirable results?

3. What could you implement given more time to mitigate these cases or help users of your methods?

4. What parameters have you exposed to your users in order to use your model selectors?

Expected Outputs

Summary

Team members

Krishna Manideep Malladi (A20550891)

Manvitha Byrineni (A20550783)

Udaya sree Vankdavath (A20552992)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages