The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for:


    1. data splitting

    2. pre-processing

    3. model tuning using resampling

    4. variable importance estimation


as well as other functionality.
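
As a rough illustration of the first two items, the sketch below splits a data set and then centers and scales the predictors. The built-in iris data, the 80/20 split and the seed are assumptions chosen for the example; createDataPartition and preProcess are the caret functions for these tasks.

```r
library(caret)

set.seed(1)
data(iris)

# Data splitting: keep 80% of the rows for training, stratified by class
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Pre-processing: estimate centering and scaling from the training set only
pp <- preProcess(training[, 1:4], method = c("center", "scale"))

# Apply the same transformation to both sets
train_x <- predict(pp, training[, 1:4])
test_x  <- predict(pp, testing[, 1:4])

# Training predictors now have mean 0 and standard deviation 1
print(round(colMeans(train_x), 3))
```

Estimating the transformation on the training rows alone keeps information from the held-out rows out of the model-building step.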


There are many different modeling functions in R. Some have different syntax for model training and/or prediction. The package started off as a way to provide a uniform interface to the functions themselves, as well as a way to standardize common tasks (such as parameter tuning and variable importance).
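
A minimal sketch of what the uniform interface looks like in practice: two unrelated model types are trained and used for prediction with the same calls, and only the method argument changes. The iris data, the choice of "knn" and "rpart", and the 5-fold cross-validation are assumptions made for illustration; "rpart" also assumes the rpart package is installed.

```r
library(caret)

set.seed(1)
data(iris)

in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Shared resampling settings for both models
ctrl <- trainControl(method = "cv", number = 5)

# Same call shape for two very different models; only `method` differs
fit_knn   <- train(Species ~ ., data = training, method = "knn",   trControl = ctrl)
fit_rpart <- train(Species ~ ., data = training, method = "rpart", trControl = ctrl)

# Prediction uses one syntax regardless of the underlying model
pred_knn   <- predict(fit_knn,   newdata = testing)
pred_rpart <- predict(fit_rpart, newdata = testing)

# Variable importance is requested the same way as well
print(varImp(fit_rpart))
```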


The package has four vignettes that provide the details:


    1. example data, pre-processing functions, visualizations and other functions

    2. model tuning, prediction and performance functions

    3. variable importance functions

    4. feature selection


There is also a paper on caret in the Journal of Statistical Software. The example data can be obtained here (the predictors) and here (the classes).


The current release version can be found on CRAN.


You can always email me with questions, comments or suggestions.


My current list of things to add to caret is (in no particular order):


    1. adding new models as they are released

    2. bootstrap .632 (and .632+) estimators

    3. resampled ROC AUC estimates (which I do not think are very helpful, but we won’t know until we see them)

    4. variable importance metrics for some other models

The caret Package

Parallel Execution of caret functions


Parameter tuning in caret is done via resampling; every candidate model is evaluated many times using the bootstrap or cross-validation. Previously, there were companion packages to caret (caretNWS and caretLSF) that worked for specific technologies. As of version 4.02 of caret, these packages are no longer needed since caret can use any parallel processing technology. See the examples in the help page for train (type ?train), which demonstrate this using MPI and NWS.
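
To make the cost of resampling-based tuning concrete, here is a small sketch that tunes one parameter over a grid with the bootstrap. The data set, the k-nearest-neighbour model, the grid values and the 25 resamples are all assumptions picked for the example; it does not set up a parallel backend.

```r
library(caret)

set.seed(2)
data(iris)

# Candidate values of the tuning parameter k
grid <- expand.grid(k = c(3, 5, 7, 9))

# 25 bootstrap resamples; every candidate is refit on each resample
ctrl <- trainControl(method = "boot", number = 25)

fit <- train(Species ~ ., data = iris,
             method = "knn",
             tuneGrid = grid,
             trControl = ctrl)

# One row per candidate, with performance averaged over the resamples
print(fit$results[, c("k", "Accuracy", "Kappa")])

# The candidate selected for the final model
print(fit$bestTune)

# Model fits needed during tuning: 4 candidates x 25 resamples
n_fits <- nrow(grid) * ctrl$number
```

Each of those fits is independent of the others, which is why the tuning loop is a natural candidate for parallel execution.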



What's New in the Latest Version


There is a simple text file in the package directory that shows the changes on a version-by-version basis. Recently, most of the work has gone into adding routines for feature selection, such as recursive feature elimination.
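
For readers who have not used recursive feature elimination, the following is a hedged sketch using caret's rfe function with the linear-model helper functions (lmFuncs) that ship with the package. The mtcars data, the subset sizes and the 5-fold cross-validation are assumptions for illustration only.

```r
library(caret)

set.seed(3)
data(mtcars)

x <- mtcars[, -1]   # ten candidate predictors
y <- mtcars$mpg     # numeric outcome

# lmFuncs: fit lm(), rank the predictors, keep the top-ranked ones
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5)

# Evaluate subsets of 2, 4 and 6 predictors (the full set is always included)
rfe_fit <- rfe(x, y, sizes = c(2, 4, 6), rfeControl = ctrl)

print(rfe_fit)              # resampled performance for each subset size
print(predictors(rfe_fit))  # variables retained in the selected subset
```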