The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for:
• data splitting
• pre-processing
• model tuning using resampling
• variable importance estimation
as well as other functionality.
There are many different modeling functions in R, and some use different syntax for model training and/or prediction. The package started off as a way to provide a uniform interface to the functions themselves, as well as a way to standardize common tasks (such as parameter tuning and variable importance).
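As a rough sketch of what this looks like in practice, the example below runs the four tasks listed above through caret's `createDataPartition`, `trainControl`, `train` and `varImp` functions. The built-in `iris` data, the k-nearest-neighbours model, the 80/20 split and the 5-fold cross-validation are illustrative assumptions, not choices made by the text above, and the code assumes caret is already installed.

```r
# Illustrative sketch only; assumes install.packages("caret") has been run.
library(caret)

set.seed(1)  # make the split and the resampling reproducible

# Data splitting: stratified 80/20 split on the outcome (assumed proportions)
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Resampling scheme used for tuning: 5-fold cross-validation (assumed)
ctrl <- trainControl(method = "cv", number = 5)

# Pre-processing and model tuning in a single call.
# "knn" is an arbitrary example model; tuneLength = 5 tries five values of k.
fit <- train(
  Species ~ .,
  data       = training,
  method     = "knn",
  preProcess = c("center", "scale"),
  tuneLength = 5,
  trControl  = ctrl
)

# Prediction uses the same predict() call regardless of the underlying model
pred <- predict(fit, newdata = testing)
print(confusionMatrix(pred, testing$Species))

# Variable importance estimation
print(varImp(fit))
```

Changing the `method` string to another supported model should leave the rest of the script unchanged, which is the point of the uniform interface.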
The package has three vignettes that provide the details.
There is also a paper on caret in the Journal of Statistical Software. The example data can be obtained here (the predictors) and here (the classes).
The current release version can be found on CRAN.
You can always email me with questions, comments or suggestions.
My current list of things to add to caret is (in no particular order):
• adding new models as they are released
• bootstrap 632 (and 632+) estimators
• resampled ROC AUC estimates (which I do not think are very helpful, but we won’t know until we see them)
• variable importance metrics for some other models