Fast, weighted ROC curves
tests | |
coverage |
Receiver Operating Characteristic (ROC) curve analysis is one way to evaluate an algorithm for binary classification. R packages ROCR/pROC/AUC/PerfMeas/PRROC implement ROC curve computation. However, if the observations have weights (non-uniform loss, see Definition vignette) then these packages can not be used. The WeightedROC package implements ROC and Area Under the Curve (AUC) computation for weighted binary classification problems.
From CRAN:
install.packages("WeightedROC")
From GitHub:
if(!require(devtools))install.packages("devtools")
devtools::install_github("tdhock/WeightedROC")
library(WeightedROC)
example(WeightedROC)
example(WeightedAUC)
Package | version | date | lines of R code | weights | tests | cumsum |
---|---|---|---|---|---|---|
pROC | 1.7.9 | 2014-06-12 | 5666 | no | no | yes |
ROCR | 1.0-5 | 2013-05-16 | 1650 | no | no | yes |
PerfMeas | 1.2.1 | 2014-09-07 | 684 | no | no | no |
PRROC | 1.3 | 2017-04-21 | 610 | yes | yes | yes |
AUC | 0.3.0 | 2013-09-30 | 354 | no | no | no |
WeightedROC | 2017.08.12 | 2017-08-12 | 288 | yes | yes | yes |
glmnet::auc | 1.9-5 | 2013-08-01 | 22 | yes | no | yes |
DescTools::AUC | TODO | |||||
bayestestR::area_under_curve | TODO |
- weights shows which packages allow weights (non-uniform loss for each observation).
- tests shows which R packages implement unit tests to check that the ROC/AUC is computed correctly.
- lines of R code shows how many lines of code were used in the pkg/R/* files. Note that WeightedROC has the simplest implementation other than glmnet::auc.
- cumsum shows whether or not the cumsum function is used to compute the ROC curve. Using the cumsum function is simple to code and fast – see the Speed vignette.
For “soft” real-valued labels (not “hard” labels
To compute other evaluation metrics (e.g. lift) use the ROCR package. WeightedROC does not implement evaluation metrics other than ROC/AUC.
To compute the partial AUC and compare curves using statistical tests use the pROC package. WeightedROC does not implement these features.
The glmnet package includes an auc
function for computing AUC, but
does not include a function for computing the ROC curve. So it
actually can compute the AUC faster than WeightedROC, for both equal
or unequal weights. WARNINGS:
- make sure the class labels are either 0 or 1 (not factors, not -1 or 1 – these will give the incorrect result, with no warning/error).
- if the data set has tied scores AND weights, glmnet::auc computes
something different, see
example(WeightedAUC)
.