Adaptive Resampling

Models can benefit significantly from tuning, but the optimal tuning parameter values are rarely known beforehand. train can be used to define a grid of candidate values, and resampling can be used to generate good estimates of performance for each tuning parameter combination. However, in the nominal resampling process, every tuning parameter combination is evaluated on every resample before any decision is made about which combinations are good and which are poor.

caret contains the ability to adaptively resample the tuning parameter grid in a way that concentrates on values that are in the neighborhood of the optimal settings. See this paper for the details.
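The general idea can be sketched with a toy simulation in base R. This is a simplified illustration and not caret's actual algorithm: the candidate names, their "true" performance values, the noise level and the paired t-test used to discard candidates are all assumptions made for the example. Every candidate is evaluated on a minimum number of resamples, after which candidates that look significantly worse than the current best are dropped and no longer evaluated.

```r
## Toy illustration only: NOT caret's algorithm.
## All values below are hypothetical and chosen for the example.
set.seed(1)
true_perf  <- c(a = 0.70, b = 0.80, c = 0.90, d = 0.91)  # made-up "true" performance
n_resample <- 50    # total number of resamples available
min_resamp <- 5     # every candidate is evaluated on at least this many
alpha      <- 0.05  # level used to discard poor candidates

## Simulated performance: one row per resample, one column per candidate
perf <- sapply(true_perf, function(mu) rnorm(n_resample, mean = mu, sd = 0.03))

active <- names(true_perf)  # candidates still being evaluated
n_fits <- 0                 # number of model fits actually "spent"

for (i in seq_len(n_resample)) {
  n_fits <- n_fits + length(active)  # only active candidates are fit
  if (i < min_resamp || length(active) == 1) next

  seen <- perf[seq_len(i), active, drop = FALSE]
  best <- active[which.max(colMeans(seen))]

  keep <- vapply(active, function(cand) {
    if (cand == best) return(TRUE)
    ## One-sided paired test: is this candidate worse than the current best?
    p <- t.test(seen[, cand], seen[, best],
                paired = TRUE, alternative = "less")$p.value
    p > alpha
  }, logical(1))

  active <- active[keep]
}

cat("Remaining candidates:", active, "\n")
cat("Fits used:", n_fits, "of", n_resample * length(true_perf), "\n")
```

In this sketch, clearly inferior candidates stop consuming model fits early, which is where the savings come from. Raising min_resamp delays the first filtering step and so reduces those savings.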

To illustrate, we will use the chemical mutagenicity data from Kazius et al (2005):

library(caret)     ## provides createDataPartition, nearZeroVar, train, etc.
library(QSARdata)
data(Mutagen)
set.seed(4567)
inTraining <- createDataPartition(Mutagen_Outcome, p = 0.75, list = FALSE)
training_x <- Mutagen_Dragon[inTraining, ]
training_y <- Mutagen_Outcome[inTraining]
testing_x <- Mutagen_Dragon[-inTraining, ]
testing_y <- Mutagen_Outcome[-inTraining]
## Get rid of predictors that are very sparse
nzv <- nearZeroVar(training_x)
training_x <- training_x[, -nzv]
testing_x <- testing_x[, -nzv]

Previously, we used this code to tune the model:

fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 5,
                           ## Estimate class probabilities
                           classProbs = TRUE,
                           ## Evaluate performance using 
                           ## the following function
                           summaryFunction = twoClassSummary)
set.seed(825)
svmFit <- train(x = training_x,
                y = training_y,
                method = "svmRadial",
                trControl = fitControl,
                preProc = c("center", "scale"),
                tuneLength = 8,
                metric = "ROC")

Using this method, the optimal tuning parameters were an RBF kernel parameter of 8 × 10^-4 and a cost value of 8. To use the adaptive procedure, trainControl needs some additional options, which are passed as a list to its adaptive argument:

min is the minimum number of resamples that will be used for each tuning parameter. The default value is 5 and increasing it will decrease the speed-up generated by adaptive resampling but should also increase the likelihood of finding a good model.
alpha is a confidence level that is used to remove parameter settings. To date, this value has not shown much of an effect.
method is either "gls" for a linear model or "BT" for a Bradley-Terry model. The latter may be more useful when you expect the model to do very well (e.g. an area under the ROC curve near 1) or when there are a large number of tuning parameter settings.
complete is a logical value that specifies whether train should generate the full resampling set if it finds an optimal solution before the end of resampling. If you want to know the optimal parameter settings and don't care much for the estimated performance value, a value of FALSE would be appropriate here.

The new code is:

fitControl2 <- trainControl(method = "adaptive_cv",
                            number = 10,
                            repeats = 5,
                            ## Estimate class probabilities
                            classProbs = TRUE,
                            ## Evaluate performance using 
                            ## the following function
                            summaryFunction = twoClassSummary,
                            ## Adaptive resampling information:
                            adaptive = list(min = 10,
                                            alpha = 0.05,
                                            method = "gls",
                                            complete = TRUE))
set.seed(825)
svmFit2 <- train(x = training_x,
                 y = training_y,
                 method = "svmRadial",
                 trControl = fitControl2,
                 preProc = c("center", "scale"),
                 tuneLength = 8,
                 metric = "ROC")

These computations were 2.8-fold faster than the original analysis. Here, the optimal tuning parameters were an RBF kernel parameter of 8 × 10^-4 and a cost value of 8. These match the previous settings.
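A comparison like this can be checked directly from the fitted objects. The sketch below defines a small helper, compare_fits, which is a hypothetical name and not part of caret. It assumes only that each object returned by train carries a times element (with the overall timing in times$everything) and a bestTune data frame.

```r
## Hypothetical helper (not part of caret): compare two fitted train objects.
## Assumes each fit has `times$everything` (a timing vector with an "elapsed"
## entry) and a one-row `bestTune` data frame.
compare_fits <- function(full_fit, adaptive_fit) {
  elapsed <- function(fit) unclass(fit$times$everything)[["elapsed"]]
  list(
    ## Ratio of wall-clock times: values above 1 mean the adaptive fit was faster
    speed_up  = elapsed(full_fit) / elapsed(adaptive_fit),
    ## TRUE when both fits selected the same tuning parameter values
    same_tune = isTRUE(all.equal(unlist(full_fit$bestTune),
                                 unlist(adaptive_fit$bestTune)))
  )
}

## Usage, assuming both models above have been fit:
## compare_fits(svmFit, svmFit2)
```

Elapsed times depend on the machine and on any parallel backend in use, so the speed-up observed elsewhere may differ from the figure quoted above.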

Remember that this methodology is experimental, so please send any questions or bug reports to the package maintainer.
