caret is an R package that aids in data processing needed for machine learning problems. It stands for classification and regression training. When building models for a real dataset, there are some tasks other than the actual learning algorithm that need to be performed, such as cleaning the data, dealing with incomplete observations, validating our model on a test set, and compare different models.
caret helps in these scenarios, independent of the actual learning algorithms used.
Pre-processing in caret is done through the
preProcess() function. Given a matrix or data frame type object
preProcess() applies transformations on the training data which can then be applied to testing data.
The heart of the
preProcess() function is the
method argument. Method operations are applied in this order:
- Zero-variance filter
- Near-zero variance filter
- Box-Cox/Yeo-Johnson/exponential transformation
- Spatial Sign
Below, we take the mtcars data set and perform centering, scaling, and a spatial sign transform.
auto_index <- createDataPartition(mtcars$mpg, p = .8, list = FALSE, times = 1) mt_train <- mtcars[auto_index,] mt_test <- mtcars[-auto_index,] process_mtcars <- preProcess(mt_train, method = c("center","scale","spatialSign")) mtcars_train_transf <- predict(process_mtcars, mt_train) mtcars_test_tranf <- predict(process_mtcars,mt_test)