Course 8 in the Data Science Specialization by Johns Hopkins University
- The Data Scientist's Toolbox
- R Programming
- Getting and Cleaning Data
- Exploratory Data Analysis
- Reproducible Research
- Statistical Inference
- Regression Models
- Practical Machine Learning
- Developing Data Products
- Data Science Capstone
How I feel about building "black box" machine learning models:
- Useful for learning about concepts & models
- The course project at the end provides a good opportunity for hands-on practice.
- Slides were insufficiently prepared; some of the models used in the quiz questions are not taught in the slides (e.g. SVM, bats() forecast).
- No hyperlinks in the PDF slides! Students cannot access the additional information the lecturer provides.
- In and out of sample errors
- Cross Validation
- Caret package
- Principal Component Analysis (PCA)
- Regression with multiple features
- Plotting
- Decision trees and random forest
- Bagging
- Boosting
- Regular Expression
- Unsupervised Prediction
- Combining predictors
- Forecasting
- Models used for the coursework project:
- Gradient Boosted Machine (gbm)
- Support Vector Machine (svmRadial)
- Random Forest
Among these three models, random forest had the best performance (accuracy > 99%), so we used rf for the final predictions.
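For reference, here is a rough sketch of how three such candidate models can be fit and compared with caret. This is not the exact code from my project; it assumes a training data frame whose outcome column is named `classe` (as in the course project data):

```{r}
library(caret)
set.seed(123)                                   # hypothetical seed
ctrl <- trainControl(method = "cv", number = 5)

# Fit the three candidate models with the same resampling scheme
fitGBM <- train(classe ~ ., data = training, method = "gbm", trControl = ctrl, verbose = FALSE)
fitSVM <- train(classe ~ ., data = training, method = "svmRadial", trControl = ctrl)
fitRF  <- train(classe ~ ., data = training, method = "rf", trControl = ctrl)

# Compare cross-validated accuracy across the three models
summary(resamples(list(GBM = fitGBM, SVM = fitSVM, RF = fitRF)))
```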
- Performance issue with random forest model
One of the biggest challenges I struggled with during the coursework was the amount of processing time the random forest model takes. At first, each run of the rf model took 1-2 hours, which was not helpful since I just wanted to see whether using different features would result in different accuracies!
Afterwards I looked at the course's discussion forum and found this helpful post: Improve random forest performance. It suggests a way to speed up the rf model through a combination of cross-validation and parallel processing. Applying this resulted in a ~70% decrease in computing time (it took my Mac about 25 minutes to knit the HTML file from the Rmd file).
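A minimal sketch of that approach (k-fold cross-validation plus a parallel backend); the exact settings in the forum post may differ, and the outcome column `classe` is assumed from the project data:

```{r}
library(caret)
library(parallel)
library(doParallel)

# Set up a parallel backend using all but one core
cluster <- makeCluster(detectCores() - 1)
registerDoParallel(cluster)

# 5-fold cross-validation instead of the default bootstrap, with parallel resampling
fitControl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
modRF <- train(classe ~ ., data = training, method = "rf", trControl = fitControl)

# Shut down the cluster and return to sequential processing
stopCluster(cluster)
registerDoSEQ()
```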
library(caret)
library(kernlab)
data(spam)
inTrain<-createDataPartition(y=spam$type,p=0.75,list=FALSE)
training<-spam[inTrain,]
testing<-spam[-inTrain,]
hist(training$capitalAve,main="",xlab="avg. capital run length")
The histogram shows that the data are heavily skewed (most values are small, with a long right tail), so standardizing is a good idea.
trainCapAve<-training$capitalAve
trainCapAveS<-(trainCapAve-mean(trainCapAve))/sd(trainCapAve)
mean(trainCapAveS)
sd(trainCapAveS)
Standardize the test set using the mean and sd of the training set. This means the standardized test values will not have mean exactly 0 and sd exactly 1, but they should be close.
testCapAve<-testing$capitalAve
testCapAveS<-(testCapAve-mean(trainCapAve))/sd(trainCapAve)
mean(testCapAveS)
Use the preProcess() function to do the standardization on the training set. The result is the same as the manual standardization above.
preObj<-preProcess(training[,-58],method=c("center","scale"))
trainCapAveS<-predict(preObj,training[,-58])$capitalAve
mean(trainCapAveS)
sd(trainCapAveS)
Use preProcess() to do the same on the testing dataset. Note that preObj (which was created from the training set) is also used to transform the testing set.
As a result, on the testing set the mean is not exactly 0 and the sd is not exactly 1.
testCapAveS<-predict(preObj,testing[,-58])$capitalAve
mean(testCapAveS)
sd(testCapAveS)
set.seed(1)
model<-train(type ~.,data=training,preProcess=c("center","scale"),method="glm")
model
The Box-Cox transformation reshapes the data towards a normal (bell-shaped) distribution.
preObj<-preProcess(training[,-58],method=c("BoxCox"))
trainCapAveS<-predict(preObj,training[,-58])$capitalAve
par(mfrow=c(1,2))
hist(trainCapAveS)
qqnorm(trainCapAveS)
knnImpute uses the average of the k-nearest neighbours to impute the data where it's not available.
set.seed(1)
# Make some value NAs
training$capAve<-training$capitalAve
selectNA<-rbinom(dim(training)[1],size=1,prob=0.05)==1
training$capAve[selectNA]<-NA
# Impute data when it's NA, and standardize
preObj<-preProcess(training[,-58],method="knnImpute")
capAve<-predict(preObj,training[,-58])$capAve
# Standardize true values
capAveTruth<-training$capitalAve
capAveTruth<-(capAveTruth-mean(capAveTruth))/sd(capAveTruth)
Look at the difference between the imputed values (capAve) and the true values (capAveTruth), using the quantile() function.
If the values are all relatively small, then it shows that imputing data works (i.e. doesn't change the dataset too much).
quantile(capAve-capAveTruth)
- Training and testing sets must be processed in the same way (i.e. use the same `preObj` in the `predict()` function).
- Step 1: raw data -> features (e.g. free text -> data frame). Google "feature extraction for [data type]". Examples:
- Text files: frequency of words, frequency of phrases, frequency of capital letters
- Images: Edges, corners, ridges
- Webpages: # and type of images, position of elements, colors, videos (e.g. A/B testing)
- People: Height, weight, hair color, gender etc.
- Step 2: features -> new, useful features
- more useful for some models (e.g. regression, SVM) than others (e.g. decision trees)
- should be done only on the training set
- new features should be added to data frames
- An example of feature creation
```{r}
library(ISLR)
library(caret)
data(Wage)
inTrain<-createDataPartition(y=Wage$wage,p=0.7,list=FALSE)
training<-Wage[inTrain,]
testing<-Wage[-inTrain,]
```
- Convert factor variables to dummy variables
The `jobclass` column is a factor variable, so we can convert it to dummy variables with the `dummyVars` function:
```{r}
dummies<-dummyVars(wage ~ jobclass,data=training)
head(predict(dummies,newdata=training))
```
- Remove features that are (almost) constant throughout the data frame, using `nearZeroVar`
If `nsv` (the `nearZeroVar` metrics) shows TRUE for a feature, that feature has near-zero variance, carries little information, and can be removed.
```{r}
nsv<-nearZeroVar(training,saveMetrics = TRUE)
nsv
```
* Spline basis
`df=3` says that we want a 3rd-degree polynomial on this variable `training$age`.
- First column: `age`
- Second column: `age^2`
- Third column: `age^3`
```{r}
library(splines)
bsBasis<-bs(training$age,df=3)
bsBasis
```
#### Fitting curves with splines
```{r}
lm1<-lm(wage~bsBasis,data=training)
plot(training$age,training$wage,pch=19,cex=0.5)
points(training$age,predict(lm1,newdata=training),col="red",pch=19,cex=0.5)
```
#### Splines on the test set
Note that we use the same `bsBasis` that was created on the training dataset.
```{r}
predict(bsBasis,age=testing$age)
```
- Find features that are correlated
which() returns the pairs of features with absolute correlation > 0.8
library(caret)
library(kernlab)
data(spam)
set.seed(1)
inTrain<-createDataPartition(y=spam$type,p=0.75,list=FALSE)
training<-spam[inTrain,]
testing<-spam[-inTrain,]
M<-abs(cor(training[,-58]))
diag(M)<-0
which(M>0.8,arr.ind=T)
Take a look at the correlated features:
names(spam)[c(34,32,40)]
plot(spam[,34],spam[,32])
Apply PCA in R: prcomp()
smallSpam<-spam[,c(34,32)]
prComp<-prcomp(smallSpam)
plot(prComp$x[,1],prComp$x[,2])
prComp$rotation
typeColor<-((spam$type=="spam")*1+1)
prComp<-prcomp(log10(spam[,-58]+1))
plot(prComp$x[,1],prComp$x[,2],col=typeColor,xlab="PC1",ylab="PC2")
preProc<-preProcess(log10(spam[,-58]+1),method="pca",pcaComp = 2)
spamPC<-predict(preProc,log10(spam[,-58]+1))
plot(spamPC[,1],spamPC[,2],col=typeColor)
preProc<-preProcess(log10(training[,-58]+1),method="pca",pcaComp=2)
trainPC<-predict(preProc,log10(training[,-58]+1))
modelFit <- train(x = trainPC, y = training$type,method="glm")
Note that we should use the same PCA procedure (preProc) when using predict()
on the testing set
testPC<-predict(preProc,log10(testing[,-58]+1))
confusionMatrix(testing$type,predict(modelFit,testPC))
Accuracy is > 0.9!
Alternative: preProcess with PCA during the training process (instead of doing PCA first, then do the training)
modelFit<-train(type~.,data=training,method="glm",preProcess="pca")
confusionMatrix(testing$type,predict(modelFit,testing))
Use the faithful (Old Faithful geyser) eruption data
library(caret)
data(faithful)
set.seed(333)
inTrain<-createDataPartition(y=faithful$waiting,p=0.5,list=FALSE)
trainFaith<-faithful[inTrain,]
testFaith<-faithful[-inTrain,]
head(trainFaith)
You can see that there's a roughly linear relationship between the two variables.
plot(trainFaith$waiting,trainFaith$eruptions,pch=19,col="blue",xlab="waiting",ylab="eruption duration")
lm1<-lm(eruptions~waiting,data=trainFaith)
summary(lm1)
plot(trainFaith$waiting,trainFaith$eruptions,pch=19,col="blue",xlab="waiting",ylab="eruption duration")
lines(trainFaith$waiting,lm1$fitted,lwd=3)
Predict the eruption duration when waiting time = 80
newdata<-data.frame(waiting=80)
predict(lm1,newdata)
par(mfrow=c(1,2))
# training
plot(trainFaith$waiting,trainFaith$eruptions,pch=19,col="blue",main="training",xlab="waiting",ylab="eruption duration")
lines(trainFaith$waiting,predict(lm1),lwd=3)
# testing
plot(testFaith$waiting,testFaith$eruptions,pch=19,col="blue",main="testing",xlab="waiting",ylab="eruption duration")
lines(testFaith$waiting,predict(lm1,newdata=testFaith),lwd=3)
# RMSE on training
sqrt(sum((lm1$fitted-trainFaith$eruptions)^2))
# RMSE on testing
sqrt(sum((predict(lm1,newdata=testFaith)-testFaith$eruptions)^2))
pred1<-predict(lm1,newdata=testFaith,interval="prediction")
ord<-order(testFaith$waiting)
plot(testFaith$waiting,testFaith$eruptions,pch=19,col="blue")
matlines(testFaith$waiting[ord],pred1[ord,],type="l",col=c(1,2,2),lty=c(1,1,1),lwd=3)
modFit<-train(eruptions~waiting,data=trainFaith,method="lm")
summary(modFit$finalModel)
Use the Wage dataset in the ISLR package
library(ISLR)
library(ggplot2)
library(caret)
data(Wage)
Wage<-subset(Wage,select=-c(logwage))
summary(Wage)
inTrain<-createDataPartition(y=Wage$wage,p=0.7,list=FALSE)
training<-Wage[inTrain,]
testing<-Wage[-inTrain,]
dim(training)
dim(testing)
featurePlot(x=training[,c("age","education","jobclass")],y=training$wage,plot="pairs")
qplot(age,wage,data=training)
We can see that the outliers are mostly people in the Information jobclass
qplot(age,wage,color=jobclass,data=training)
You can see that the outliers mostly have advanced-degree education
qplot(age,wage,color=education,data=training)
modFit<-train(wage~age+jobclass+education,method="lm",data=training)
finMod<-modFit$finalModel
print(modFit)
plot(finMod,1,pch=19,cex=0.5,col="#00000010")
qplot(finMod$fitted,finMod$residuals,color=race,data=training)
plot(finMod$residuals,pch=19)
pred<-predict(modFit,testing)
qplot(wage,pred,color=year,data=testing)
modFitAll<-train(wage~.,data=training,method="lm")
pred<-predict(modFitAll,newdata=testing)
qplot(wage,pred,data=testing)
Pros: better interpretability and better performance in non-linear settings
Stop splitting when the leaves are pure
- Misclassification Error:
- 0 = perfect purity
- 0.5 = no purity
- Gini index:
- 0 = perfect purity
- 0.5 = no purity
- Deviance/information gain:
- 0 = perfect purity
- 1 = no purity
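As a quick worked example (a hypothetical node with 15 observations of one class and 1 of another), these measures can be computed directly:

```{r}
# Hypothetical node: 15 observations of class A, 1 of class B
p <- c(15, 1) / 16

1 - max(p)           # misclassification error: 1/16 = 0.0625
1 - sum(p^2)         # Gini index: ~0.12
-sum(p * log2(p))    # information (deviance with log base 2): ~0.34
```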
Example: Iris Data
data(iris)
library(ggplot2)
library(caret)
names(iris)
table(iris$Species)
inTrain<-createDataPartition(y=iris$Species,p=0.7,list=FALSE)
training<-iris[inTrain,]
testing<-iris[-inTrain,]
# plot the Iris petal widths/species
qplot(Petal.Width,Sepal.Width,col=Species, data=training)
Train the model
# rpart is R's package for recursive partitioning (classification and regression trees)
modFit<-train(Species~.,method="rpart",data=training)
print(modFit$finalModel)
# plot tree
plot(modFit$finalModel,uniform=TRUE,main="Classification Tree")
text(modFit$finalModel,use.n=TRUE,all=TRUE,cex=0.8)
Use the rattle package to make the trees look better
library(rattle)
fancyRpartPlot(modFit$finalModel)
Predict new values
predict(modFit,newdata=testing)
Notes: Classification trees are non-linear models
- They use interactions between variables
- Trees can also be used for regression problems (i.e. continuous outcomes)
What is bagging?
- Resample cases and recalculate predictions
- Average or majority vote
- It produces similar bias, but reduces variance.
- Bagging is more useful for non-linear functions
Example with the Ozone data from ElemStatLearn package
library(ElemStatLearn)
data(ozone,package="ElemStatLearn")
ozone<-ozone[order(ozone$ozone),]
We'll predict temperature based on ozone
ll<-matrix(NA,nrow=10,ncol=155)
#we'll resample the data 10 times (loop 10 times)
for(i in 1:10){
# each time we'll resample with replacement
ss<-sample(1:dim(ozone)[1],replace=T)
# ozone0 is the resampled subset. We'll also reorder the resampled subset with ozone
ozone0<-ozone[ss,];ozone0<-ozone0[order(ozone0$ozone),]
# we'll fit a loess curve through the resampled subset; span determines how smooth the curve is
loess0<-loess(temperature~ozone,data=ozone0,span=0.2)
# for each of the loess curve, we'll predict the outcome for the 155 rows in the original dataset
ll[i,]<-predict(loess0,newdata=data.frame(ozone=1:155))
}
The red line is the bagged (average) line across the 10 resamples
plot(ozone$ozone,ozone$temperature,pch=19,cex=0.5)
for(i in 1:10){lines(1:155,ll[i,],col="grey",lwd=2)}
lines(1:155,apply(ll,2,mean),col="red",lwd=2)
Notes:
- Bagging is most useful for non-linear models
- Often used with trees & random forests
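As a minimal sketch (assuming the same ozone data frame is loaded), caret can bag trees directly via the "treebag" method, which wraps ipred::bagging:

```{r}
library(caret)
# Bagged regression trees on the ozone data ("treebag" requires the ipred package)
treebagFit <- train(temperature ~ ozone, data = ozone, method = "treebag")

# Overlay the bagged predictions on the raw data
plot(ozone$ozone, ozone$temperature, pch = 19, cex = 0.5)
points(1:155, predict(treebagFit, newdata = data.frame(ozone = 1:155)),
       col = "blue", pch = 19, cex = 0.5)
```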
What are random forests?
- Bootstrap samples
- At each split, bootstrap variables
- Grow multiple trees and vote
Pros:
- Accuracy
Cons:
- Speed
- Interpretability
- Overfitting
Random Forest on Iris data
data(iris)
library(ggplot2)
library(caret)
inTrain<-createDataPartition(y=iris$Species,p=0.7,list=FALSE)
training<-iris[inTrain,]
testing<-iris[-inTrain,]
# build random forest model using caret
modFit<-train(Species~.,method="rf",prox=TRUE,data=training)
library(randomForest)
getTree(modFit$finalModel,k=2)
irisP<-classCenter(training[,c(3,4)],training$Species,modFit$finalModel$prox)
irisP<-as.data.frame(irisP)
irisP$Species<-rownames(irisP)
p<-qplot(Petal.Width,Petal.Length,col=Species,data=training)
# This line plots the three centers
p+geom_point(aes(x=Petal.Width,y=Petal.Length,col=Species),size=5,shape=4,data=irisP)
pred<-predict(modFit,testing)
testing$preRight<-pred==testing$Species
table(pred,testing$Species)
qplot(Petal.Width,Petal.Length,col=preRight,data=testing,main="newdata Predictions")
Boosting and random forests are two of the most accurate out-of-the-box classifiers for prediction.
- Take lots of (possibly) weak predictors
- Weight them and add them up
- Get a strong predictor
library(ISLR)
data(Wage)
library(ggplot2)
library(caret)
Wage<-subset(Wage,select=-c(logwage))
set.seed(1)
inTrain<-createDataPartition(y=Wage$wage,p=0.7,list=FALSE)
training<-Wage[inTrain,]
testing<-Wage[-inTrain,]
gbm is boosting for tree models.
modFit<-train(wage~.,data=training,method="gbm",verbose=FALSE)
qplot(predict(modFit,testing),wage,data=testing)
Approach:
- Assume the data follow a probabilistic model
- Use Bayes' theorem to identify optimal classifiers
Pros:
- Take advantage of the structure of the data
- Computationally convenient
- Reasonably accurate
Cons:
- Make additional assumptions about the data
- When the model is incorrect, accuracy may suffer
Naive Bayes assumes that all features are independent of each other - useful for binary or categorical data, e.g. text classification
Model based prediction with Iris data
data(iris)
library(ggplot2)
library(caret)
set.seed(2)
inTrain<-createDataPartition(y=iris$Species,p=0.7,list=FALSE)
training<-iris[inTrain,]
testing<-iris[-inTrain,]
`lda` = linear discriminant analysis; `nb` = Naive Bayes
modlda<-train(Species~.,data=training,method="lda")
modnb<-train(Species~.,data=training,method="nb")
plda<-predict(modlda,testing)
pnb<-predict(modnb,testing)
table(plda,pnb)
equalPredictions =(plda==pnb)
qplot(Petal.Width,Sepal.Width,col=equalPredictions,data=testing)
- Fit a regression model
- Penalize (or shrink) large coefficients
Pros
- Help with bias/variance tradeoff
- Help with model selection
Cons
- Computationally demanding
- Lower performance than random forests and boosting
Prediction error = irreducible error + bias^2 + variance
Tuning parameter lambda:
- lambda controls the size of the coefficients and the amount of regularization
- As lambda approaches 0, we obtain the least squares solution (i.e. what we get from the standard linear model)
- As lambda approaches infinity, the coefficients shrink towards 0
In caret, methods for penalized regression models include:
- ridge
- lasso
- relaxo
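Beyond those, caret's "glmnet" method covers both ridge (alpha = 0) and lasso (alpha = 1) and tunes lambda by cross-validation. A minimal sketch, assuming the Wage training set from earlier:

```{r}
library(caret)
# Elastic net via glmnet; caret tunes alpha and lambda over a small grid
penFit <- train(wage ~ ., data = training, method = "glmnet",
                trControl = trainControl(method = "cv", number = 5),
                tuneLength = 5)
penFit$bestTune    # chosen alpha and lambda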
- Combining classifiers generally improves accuracy but reduces interpretability.
- How? For example, majority vote.
- similar classifiers: bagging, boosting, random forest
- different classifiers: model stacking, model ensembling
- example with wage data
library(ISLR)
data(Wage)
library(ggplot2)
library(caret)
Wage<-subset(Wage,select=-c(logwage))
# Create a building data set (which is split into training and testing data) and validation set
inBuild<-createDataPartition(y=Wage$wage,p=0.7,list=FALSE)
validation<-Wage[-inBuild,]
buildData<-Wage[inBuild,]
inTrain<-createDataPartition(y=buildData$wage,p=0.7,list=FALSE)
training<-buildData[inTrain,]
testing<-buildData[-inTrain,]
dim(training)
Build two different models: linear regression + random forest
mod1<-train(wage~.,method="glm",data=training)
mod2<-train(wage~.,method="rf",data=training,trControl=trainControl(method="cv",number=3))
Plot the two different models on the same chart
pred1<-predict(mod1,testing)
pred2<-predict(mod2,testing)
qplot(pred1,pred2,colour=wage,data= testing)
First, create a new data frame containing the predictions from the two original models. Then train a combined model on that data frame.
predDF<-data.frame(pred1,pred2,wage=testing$wage)
combModFit<-train(wage~.,method="gam",data=predDF)
combPred<-predict(combModFit,predDF)
# linear regression
sqrt(sum((pred1-testing$wage)^2))
# random forest
sqrt(sum((pred2-testing$wage)^2))
# combined (linear regression + random forest)
sqrt(sum((combPred-testing$wage)^2))
We can see that the combined predictor has the lowest error on the test set.
pred1V<-predict(mod1,validation)
pred2V<-predict(mod2,validation)
predVDF<-data.frame(pred1=pred1V,pred2=pred2V)
combPredV<-predict(combModFit,predVDF)
# linear regression
sqrt(sum((pred1V-validation$wage)^2))
# random forest
sqrt(sum((pred2V-validation$wage)^2))
# combined
sqrt(sum((combPredV-validation$wage)^2))
Typical model for binary/multiclass data
- Build an odd number of models
- Predict with each model
- Predict the class by majority vote
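A small hypothetical sketch of the majority-vote step, assuming p1, p2, p3 are factor vectors of class predictions from three different models:

```{r}
# Majority vote across an odd number of class predictions (hypothetical helper)
majorityVote <- function(p1, p2, p3) {
  votes <- data.frame(p1 = as.character(p1),
                      p2 = as.character(p2),
                      p3 = as.character(p3),
                      stringsAsFactors = FALSE)
  # For each observation, pick the class that receives the most votes
  factor(apply(votes, 1, function(row) names(which.max(table(row)))))
}
```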
- example: predict the price of Google stock
library(quantmod)
from.dat<-as.Date("01/01/08",format="%m/%d/%y")
to.dat<-as.Date("12/31/13",format="%m/%d/%y")
getSymbols("GOOG",src="yahoo",from=from.dat,to=to.dat)
mGoog<-to.monthly(GOOG)
googOpen<-Op(mGoog)
ts1<-ts(googOpen,frequency = 12)
plot(ts1,xlab="Year+1",ylab="GOOG")
Decompose the time series into trend, seasonal, and random components:
plot(decompose(ts1),xlab="Years+1")
ts1Train<-window(ts1,start=1,end=5)
ts1Test<-window(ts1,start=5,end=(7-0.01))
ts1Train
plot(ts1Train)
lines(ma(ts1Train,order=3),col="red")
We can get a range of possible predicted values (prediction intervals) from the forecast:
library(forecast)
ets1<-ets(ts1Train,model="MMM")
fcast<-forecast(ets1)
plot(fcast)
lines(ts1Test,col="red")
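To quantify how well the forecast does on the held-out window, the forecast package's accuracy() function can be used:

```{r}
# Compare the forecast against the held-out test window
accuracy(fcast, ts1Test)
```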
