- Keep the 10 continuous features with the highest explained variance and drop all other continuous features (variances obtained from PCA testing)
- One-hot encode relevant categorical data
- Handle NaNs:
  - For continuous data, impute the value (replace it with the mean of the column)
  - For categorical data, replace it with 0
- Reject outliers that lie more than 1.5 times the interquartile range below the 1st quartile or above the 3rd quartile
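
The encoding, NaN-handling, and outlier steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the repository's actual code: the `preprocess` function, the `income` and `region` columns, and the sample values are all hypothetical. It also assumes that "replace with 0" for categorical data means the one-hot indicator columns are all zero for a missing category, which is what `pd.get_dummies` produces by default. The PCA-based feature selection is left out because the source does not say how the variances map back to columns.

```python
import numpy as np
import pandas as pd


def preprocess(df, continuous_cols, categorical_cols):
    """Hypothetical sketch of the one-hot, NaN, and outlier steps."""
    out = df.copy()

    # One-hot encode categorical columns. With the default dummy_na=False,
    # a missing category yields 0 in every indicator column (assumption:
    # this is what "replace with 0" refers to).
    out = pd.get_dummies(out, columns=categorical_cols, dtype=int)

    # Impute missing continuous values with the column mean.
    means = out[continuous_cols].mean()
    out[continuous_cols] = out[continuous_cols].fillna(means)

    # Keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for every continuous column.
    q1 = out[continuous_cols].quantile(0.25)
    q3 = out[continuous_cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    within = (out[continuous_cols] >= lower) & (out[continuous_cols] <= upper)
    return out[within.all(axis=1)].reset_index(drop=True)


# Made-up sample data: one missing row and one extreme income value.
raw = pd.DataFrame({
    "income": [40.0, 42.0, np.nan, 41.0, 43.0, 500.0],
    "region": ["north", "south", None, "north", "south", "north"],
})
clean = preprocess(raw, continuous_cols=["income"], categorical_cols=["region"])
print(clean)
```

Because the mean is imputed before outliers are rejected, following the order of the list, extreme values still influence the imputed value; swapping the two steps would be a different design choice.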
tylerhedge/Datathon23