The caret Package

This page shows a network diagram of all the models that can be accessed by caret's train function. See the Revolutions blog for details about how this visualization was made. In summary, the package annotates each model by a set of tags (e.g. "Bagging", "L1 Regularization" etc.). Using this information we can cluster models that are similar to each other.

Green circles are models only used for regression, blue is classification only and orange is “dual use”. Hover over a circle to get the model name and the model code used by the caret package and refreshing the screen will re-configure the layout.

The data used to create this graph can be found here. You can also use it along with maximum dissimilarity sampling to pick out a diverse set of models. Suppose you would like to use a SVM model with a radial basis function on some regression data. Based on these tags, what other four models would constitute the most diverse set?

tag <- read.csv("tag_data.csv", row.names = 1)
tag <- as.matrix(tag)
## Select only models for regression
regModels <- tag[tag[,"Regression"] == 1,]
all <- 1:nrow(regModels)
## Seed the analysis with the SVM model
start <- grep("(svmRadial)", rownames(regModels), fixed = TRUE)
pool <- all[all != start]
## Select 4 model models by maximizing the Jaccard
## dissimilarity between sets of models
nextMods <- maxDissim(regModels[start,,drop = FALSE],
                      regModels[pool, ],
                      method = "Jaccard",
                      n = 4)
rownames(regModels)[c(start, nextMods)]

[1] "Support Vector Machines with Radial Basis Function Kernel (svmRadial)"
[2] "Cubist (cubist)"                                                      
[3] "Bayesian Regularized Neural Networks (brnn)"                          
[4] "Ridge Regression with Variable Selection (foba)"                      
[5] "Projection Pursuit Regression (ppr)"

Models Clustered by Tag Similarity

Links

Topics