Title: | Performance Assessment of Binary Classifier with Visualization |
---|---|
Description: | Sensitivity (also called recall or true positive rate), false positive rate, specificity, precision (or positive predictive value), negative predictive value, misclassification rate, accuracy, and F-score are popular metrics for assessing the performance of a binary classifier; each is calculated at a specific threshold. The receiver operating characteristic (ROC) curve is a common tool for assessing the overall diagnostic ability of a binary classifier. Rather than depending on a single threshold, the area under the ROC curve (AUC) is a summary statistic of how well the classifier performs overall. The ROCit package makes it easy to evaluate threshold-bound metrics. The ROC curve, along with the AUC, can be obtained using different methods, such as empirical, binormal and non-parametric. ROCit covers a wide variety of methods for constructing confidence intervals of the ROC curve and the AUC. It can also construct an empirical gains table, which is a handy tool for direct marketing. The package offers commonly used visualizations such as the ROC curve, KS plot and lift plot. Along with the built-in default graphics settings, plots can be tweaked manually by supplying the necessary values as function arguments. ROCit offers this range of functionality while remaining easy to use. |
Authors: | Md Riaz Ahmed Khan [aut, cre], Thomas Brandenburger [aut] |
Maintainer: | Md Riaz Ahmed Khan <[email protected]> |
License: | GPL-3 |
Version: | 2.1.1 |
Built: | 2025-03-06 03:17:48 UTC |
Source: | https://github.com/riazakhan94/rocit |
Function cartesian_2D takes two vectors as input and returns the two-dimensional Cartesian product.
cartesian_2D(array_x, array_y)
array_x | A vector, indicating the first set. |
array_y | A vector, indicating the second set. |
A matrix of length(array_x) * length(array_y) rows and two columns. Each row indicates an ordered pair.
cartesian_2D is used internally in other function(s) of ROCit.
Works if matrices or data frames are passed as arguments. However, the returned value might not be valid if the arguments are not one-dimensional.
x <- seq(3)
y <- c(10,20,30)
cartesian_2D(x,y)
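The shape of the result described above (length(array_x) * length(array_y) rows, two columns) can be illustrated with base R alone. This is only a sketch: `cartesian_sketch` is a hypothetical helper built on `expand.grid`, and the row order it produces is not claimed to match that of `cartesian_2D`.

```r
# Sketch: two-dimensional Cartesian product in base R.
# cartesian_sketch is a hypothetical helper, not part of ROCit.
cartesian_sketch <- function(array_x, array_y) {
  # expand.grid enumerates every (x, y) combination, one per row
  pairs <- expand.grid(x = array_x, y = array_y)
  as.matrix(pairs)
}

x <- seq(3)
y <- c(10, 20, 30)
prod_xy <- cartesian_sketch(x, y)
dim(prod_xy)  # 9 rows, 2 columns
```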
See ciAUC.rocit.
ciAUC(object, ...)
object | An object of class |
... | Arguments to be passed to methods. See |
ciAUC constructs a confidence interval for the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. This is an S3 method defined for objects of class "rocit".
## S3 method for class 'rocit'
ciAUC(
  object,
  level = 0.95,
  delong = FALSE,
  logit = FALSE,
  nboot = NULL,
  step = FALSE,
  ... = NULL
)
object | An object of class |
level | Level of confidence, must be within the range (0, 1). Default is 0.95. |
delong | Logical; indicates whether the DeLong formula should be used to estimate the variance of AUC. Default is FALSE. |
logit | Logical; indicates whether the confidence interval of the logit-transformed AUC should be evaluated first. Default is FALSE. |
nboot | Number of bootstrap samples, if the bootstrap method is desired. Default is NULL. If a numeric value is specified, overrides |
step | Logical; default is FALSE. |
... | |
An object of class "rocitaucci".
data("Diabetes")
logistic.model <- glm(as.factor(dtest)~chol+age+bmi, data = Diabetes,family = "binomial")
score <- logistic.model$fitted.values
class <- logistic.model$y
# Make the rocit objects
rocit_bin <- rocit(score = score, class = class, method = "bin")
# Confidence interval of AUC
ciAUC(rocit_bin, level = 0.9)
ciAUC(rocit_bin, delong = TRUE, logit = TRUE)
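To make the idea of a bootstrap interval for the AUC concrete, here is a minimal base R sketch. It is not the package's implementation: `auc_emp` and `boot_ci_auc` are hypothetical helpers, the AUC is the empirical (Mann-Whitney) estimate, and the interval is a plain percentile bootstrap, which may differ from what `ciAUC` computes when `nboot` is supplied.

```r
# Sketch: empirical AUC and a percentile bootstrap interval in base R.
# auc_emp and boot_ci_auc are hypothetical helpers, not ROCit functions.
auc_emp <- function(score, class) {
  pos <- score[class == 1]
  neg <- score[class == 0]
  # Proportion of (positive, negative) pairs ranked correctly; ties count 1/2
  cmp <- outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n))
  mean(cmp)
}

boot_ci_auc <- function(score, class, level = 0.95, nboot = 200) {
  pos_idx <- which(class == 1)
  neg_idx <- which(class == 0)
  aucs <- replicate(nboot, {
    # Resample within each class so both classes stay represented
    idx <- c(sample(pos_idx, replace = TRUE), sample(neg_idx, replace = TRUE))
    auc_emp(score[idx], class[idx])
  })
  alpha <- 1 - level
  quantile(aucs, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
score <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))
class <- c(rep(1, 50), rep(0, 50))
auc <- auc_emp(score, class)
ci <- boot_ci_auc(score, class, level = 0.9)
```

Resampling within each class is a design choice made here so that no bootstrap sample ends up with a single class; it is an assumption of the sketch, not a documented property of the package.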
See ciROC.rocit.
ciROC(object, ...)
object | An object of class |
... | Arguments to be passed to methods. See |
ciROC constructs a confidence interval for the receiver operating characteristic (ROC) curve. This is an S3 method defined for objects of class "rocit".
## S3 method for class 'rocit'
ciROC(object, level = 0.95, nboot = 500, ... = NULL)
object | An object of class |
level | Level of confidence, must be within the range (0, 1). Default is 0.95. |
nboot | Number of bootstrap samples, used to estimate |
... | |
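For large values of $n_Y$ and $n_{\bar{Y}}$ (the numbers of positive and negative observations), the distribution of $TPR(c)$ at $FPR(c)$ can be approximated as a normal distribution with the following mean and variance:
$$\mu_{TPR(c)} = \frac{1}{n_Y} \sum_{i=1}^{n_Y} I(D_{Y_i} \geq c)$$
$$V\big(TPR(c)\big) = \frac{TPR(c)\big(1 - TPR(c)\big)}{n_Y} + \left(\frac{g(c^*)}{f(c^*)}\right)^2 K$$
where $K = \frac{FPR(c)\big(1 - FPR(c)\big)}{n_{\bar{Y}}}$, $g$ and $f$ are the probability density functions of the diagnostic variable in positive and negative groups (with corresponding cumulative distribution functions $G$ and $F$), $c^* = S^{-1}_{D_{\bar{Y}}}\big(FPR(c)\big)$, and $S$ is the survival function given by: $S(t) = P(T > t) = 1 - F(t)$.
density and approxfun were used to approximate the PDF and CDF of the diagnostic score in the two groups and the inverse survival of the diagnostic in the negative responses.
For the "binormal" type, the variance of $A + BZ_x$ is given by $V(A) + Z_x^2 V(B) + 2 Z_x Cov(A, B)$. The bootstrap method was used to estimate $V(A)$, $V(B)$ and $Cov(A, B)$. The lower and upper limits of $A + BZ_x$ are inverse probit transformed to obtain the confidence interval of the ROC curve.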
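The inverse probit step for the binormal case can be sketched in a few lines of base R. The sketch assumes the binormal curve is written as TPR = pnorm(A + B * Z) with Z = qnorm(FPR), and that the variance of the linear term is V(A) + Z^2 V(B) + 2 Z Cov(A, B). Every number below is made up for illustration; in practice the variances and covariance would come from bootstrap samples.

```r
# Sketch: confidence band for a binormal ROC curve on the probit scale.
# All numbers below are made-up illustrations, not estimates from real data.
A <- 1.2; B <- 0.9            # hypothetical binormal parameters
var_A <- 0.02; var_B <- 0.01  # hypothetical bootstrap variances
cov_AB <- 0.005               # hypothetical bootstrap covariance
level <- 0.9

fpr <- seq(0.01, 0.99, by = 0.01)
z_x <- qnorm(fpr)                       # probit of FPR
lin <- A + B * z_x                      # probit of TPR
var_lin <- var_A + z_x^2 * var_B + 2 * z_x * cov_AB
z_crit <- qnorm(1 - (1 - level) / 2)

tpr   <- pnorm(lin)                     # inverse probit gives the curve
lower <- pnorm(lin - z_crit * sqrt(var_lin))
upper <- pnorm(lin + z_crit * sqrt(var_lin))
```

Because the limits are formed on the probit scale and then mapped through `pnorm`, they always stay inside [0, 1], which is the practical reason for transforming before building the interval.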
A list of class "rocci", having the following elements:
`ROC estimation method` | The method applied to estimate ROC curve in the |
`Confidence level` | Level of confidence as supplied as argument. |
FPR | An array containing all the FPR values, for which TPR and confidence interval of TPR were estimated. |
TPR | Array containing the TPR values associated with the FPR values. |
LowerTPR | Lower limits of the TPR values. Forced to zero for |
UpperTPR | Upper limits of the TPR values. Forced to one for |
Pepe, Margaret Sullivan. The statistical evaluation of medical tests for classification and prediction. Medicine, 2003.
plot.rocci, rocit, ciAUC.rocit
data("Loan")
score <- Loan$Score
class <- ifelse(Loan$Status == "CO", 1, 0)
rocit_emp <- rocit(score = score, class = class, method = "emp")
# ------------------------------------------------
ciROC_emp90 <- ciROC(rocit_emp, level = 0.9)
plot(ciROC_emp90, legend = TRUE)
Function ciROCbin estimates the confidence interval of a binormally estimated ROC curve.
ciROCbin(rocit_bin, level, nboot)
rocit_bin | An object of class |
level | Desired level of confidence to be estimated. |
nboot | Number of bootstrap samples, used to estimate |
A list object containing TPR, upper and lower bound of TPR at certain FPR values.
ciROCbin is used internally in ciROC.rocit of ROCit.
data("Loan")
score <- Loan$Score
class <- ifelse(Loan$Status == "CO", 1, 0)
rocit_bin <- rocit(score = score, class = class, method = "bin")
ciROC_bin90 <- ciROCbin(rocit_bin, level = 0.9, nboot = 300)
TPR <- ciROC_bin90$TPR
FPR <- ciROC_bin90$FPR
Upper90 <- ciROC_bin90$UpperTPR
Lower90 <- ciROC_bin90$LowerTPR
plot(TPR~FPR, type = "l")
lines(Upper90~FPR, lty = 2)
lines(Lower90~FPR, lty = 2)
grid()
legend("bottomright", c("Binormal ROC curve", "90% CI"), lty = c(1,2))
Function ciROCemp estimates the confidence interval of an empirically estimated ROC curve.
ciROCemp(rocit_emp, level)
rocit_emp | An object of class |
level | Desired level of confidence to be estimated. |
A list object containing TPR, upper and lower bound of TPR at certain FPR values.
ciROCemp is used internally in ciROC.rocit of ROCit.
set.seed(100)
score <- c(runif(20, 15, 35), runif(15, 25, 45))
class <- c(rep(1, 20), rep(0, 15))
rocit_object <- rocit(score, class)
ciROC <- ciROCemp(rocit_object, level = 0.9)
names(ciROC)
convertclass converts a binary variable with any response into 1/0 response. It is used internally in other functions of package ROCit.
convertclass(x, reference = NULL)
x | A vector of exactly two unique values. |
reference | The reference value. Depending on the class of |
A numeric vector of 1 and 0. Gives a warning if NA(s) exist in x.
convertclass is used internally in other function(s) of ROCit.
x <- c("cat", "cat", "dog", "cat")
convertclass(x) # by default, "cat" is converted to 0
convertclass(x, reference = "dog")
# ----------------------------
set.seed(10)
x <- round(runif(10, 2, 3))
convertclass(x, reference = 3)
# numeric reference can be supplied as character
convertclass(x, reference = "3") # same result
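The recoding idea can be sketched in base R as follows. `to_binary` is a hypothetical helper, not the package's `convertclass`; it assumes the reference value is coded 0 and that, without a reference, the first value in sorted order is used (consistent with "cat" being converted to 0 in the example above).

```r
# Sketch: map a two-valued vector to 1/0, with the reference value coded 0.
# to_binary is a hypothetical helper, not the package's convertclass.
to_binary <- function(x, reference = NULL) {
  vals <- sort(unique(x[!is.na(x)]))
  stopifnot(length(vals) == 2)
  # Assumption: without a reference, the first value in sorted order is used
  if (is.null(reference)) reference <- vals[1]
  # Compare as character so that reference = "3" also matches numeric 3
  ifelse(as.character(x) == as.character(reference), 0, 1)
}

animals <- to_binary(c("cat", "cat", "dog", "cat"))
numbers <- to_binary(c(2, 3, 3, 2), reference = "3")
```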
These data are courtesy of Dr John Schorling, Department of Medicine, University of Virginia School of Medicine.
The data contains information on 403 subjects from 1046 subjects who were interviewed in a study to understand the prevalence of obesity, diabetes, and other cardiovascular risk factors in central Virginia for African Americans. According to Dr John Hong, Diabetes Mellitus Type II (adult onset diabetes) is associated most strongly with obesity. The waist/hip ratio may be a predictor in diabetes and heart disease. DM II is also associated with hypertension - they may both be part of "Syndrome X". The 403 subjects were the ones who were actually screened for diabetes. Glycosylated hemoglobin > 7.0 is usually taken as a positive diagnosis of diabetes.
Diabetes
A data frame with 403 rows and 22 variables (See "Note"):
Subject id
Total cholesterol
Stabilized glucose
High density lipoprotein
Cholesterol/hdl ratio
Glycosylated hemoglobin
A factor with levels Buckingham and Louisa
Age (years)
Gender, male or female
Height (inches)
Weight (pounds)
A factor with levels small, medium and large
First systolic blood pressure
First diastolic blood pressure
Second systolic blood pressure
Second diastolic blood pressure
Waist (inches)
Hip (inches)
Postprandial time when labs were drawn in minutes
Body mass index
An indicator whether glyhb is greater than 7 or not
Waist to hip ratio
The last three variables (bmi, dtest, whr) were created. For bmi, the following formula was used:
$$bmi = 703 \times \frac{weight\ (lbs)}{height\ (inches)^2}$$
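As a worked illustration of how three such derived variables could be computed, the sketch below uses invented measurements (they are not rows of Diabetes). It assumes the usual imperial-unit BMI formula, 703 * weight (lb) / height (in)^2, and codes the indicator as 1/0 purely for illustration; the coding used in the actual data set may differ.

```r
# Sketch: deriving bmi, dtest and whr from raw measurements.
# The three subjects below are invented; they are not rows of Diabetes.
subjects <- data.frame(
  height = c(62, 70, 66),    # inches
  weight = c(121, 218, 180), # pounds
  waist  = c(29, 46, 40),    # inches
  hip    = c(38, 48, 44),    # inches
  glyhb  = c(4.3, 7.4, 5.0)  # glycosylated hemoglobin
)
# Imperial-unit BMI: 703 * weight (lb) / height (in)^2
subjects$bmi   <- 703 * subjects$weight / subjects$height^2
# Indicator of glyhb greater than 7 (1/0 coding is an assumption)
subjects$dtest <- ifelse(subjects$glyhb > 7, 1, 0)
# Waist-to-hip ratio
subjects$whr   <- subjects$waist / subjects$hip
```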
staff.pubhealth.ku.dk/~tag/Teaching/share/data/Diabetes.html#sec-2
Willems, James P., J. Terry Saunders, Dawn E. Hunt, and John B. Schorling. "Prevalence of coronary heart disease risk factors among rural blacks: a community-based study." Southern medical journal 90, no. 8 (1997): 814-820.
Schorling, John B., Julienne Roach, Marjorie Siegel, Natalie Baturka, Dawn E. Hunt, Thomas M. Guterbock, and Herbert L. Stewart. "A trial of church-based smoking cessation interventions for rural African Americans." Preventive Medicine 26, no. 1 (1997): 92-101.
data("Diabetes")
plot(Diabetes$hdl~Diabetes$weight, pch = 16, col = ifelse(Diabetes$gender=="male",1,2))
#------------------------------------------
## density plot
femaleBMI <- density(subset(Diabetes, gender == "female")$bmi, na.rm = TRUE)
maleBMI <- density(subset(Diabetes, gender == "male")$bmi, na.rm = TRUE)
## -------
plot(NULL, ylim = c(0,0.08), xlim = c(10,60), xlab = "BMI", ylab = "Density", main = "")
grid(col = 1)
polygon(maleBMI, col = rgb(0,0,1,0.2), border = 4)
polygon(femaleBMI, col = rgb(1,0,0,0.2), border = 2)
abline(h = 0)
legend("topright", c("Male", "Female"), pch = 15, col = c(rgb(0,0,1,0.2), rgb(1,0,0,0.2)), bty = "n")
#------------------------------------------
logistic.model <- glm(as.factor(dtest)~chol+age+bmi, data = Diabetes,family = "binomial")
summary(logistic.model)
#------------------------------------------
class <- logistic.model$y
score <- logistic.model$fitted.values
rocit_object <- rocit(score = score, class = class)
summary(rocit_object)
plot(rocit_object)
See gainstable.default, gainstable.rocit.
gainstable(...)
... | Arguments to be passed to methods. See |
Default S3 method to create a gains table from a vector of diagnostic scores and the class of observations.
## Default S3 method:
gainstable(score, class, negref = NULL, ngroup = 10, breaks = NULL, ... = NULL)
score | A numeric array of diagnostic score. Same as in |
class | An array of equal length of score, containing the class of the observations. Same as in |
negref | The reference value, same as the |
ngroup | Number of desired groups in gains table. Ignored if |
breaks | Percentiles (in percentage) at which observations should be separated to form groups. If specified, |
... | |
The gainstable function creates a gains table containing ngroup groups or buckets. The algorithm first orders the observations by the score variable. In case of ties, class becomes the ordering variable, keeping the positive responses first. The algorithm then calculates the ending index of each bucket. Each bucket should have at least 5 observations.
If the buckets are to end at desired levels of the population, then breaks should be specified. If specified, it overrides ngroup, and ngroup is ignored. breaks by default always includes 100. If no whole-number index exists at a specified population level, the nearest integer is used.
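The bucketing idea can be sketched in base R. This is an illustration under stated assumptions, not the package's algorithm: `gains_sketch` is a hypothetical helper, observations are ordered by decreasing score, bucket end indices are taken as `round(n / ngroup * 1:ngroup)`, and lift is taken as the group response rate divided by the overall response rate.

```r
# Sketch: a small gains table built with base R.
# gains_sketch is a hypothetical helper; ROCit's gainstable may differ in detail.
gains_sketch <- function(score, class, ngroup = 10) {
  # Order by decreasing score; on ties, positives (class == 1) come first
  ord <- order(-score, -class)
  class <- class[ord]
  n <- length(class)
  # Assumed bucket rule: roughly equal-sized groups, ending at these indices
  end <- round(n / ngroup * seq_len(ngroup))
  start <- c(1, head(end, -1) + 1)
  obs <- end - start + 1
  resp <- mapply(function(s, e) sum(class[s:e]), start, end)
  overall_rate <- sum(class) / n
  data.frame(
    Bucket = seq_len(ngroup),
    Obs = obs,
    CObs = cumsum(obs),
    Depth = cumsum(obs) / n,
    Resp = resp,
    CResp = cumsum(resp),
    RespRate = resp / obs,
    CRespRate = cumsum(resp) / cumsum(obs),
    CCapRate = cumsum(resp) / sum(class),
    Lift = (resp / obs) / overall_rate,
    CLift = (cumsum(resp) / cumsum(obs)) / overall_rate
  )
}

set.seed(42)
score <- runif(200)
class <- rbinom(200, 1, prob = score)  # higher scores respond more often
gt <- gains_sketch(score, class, ngroup = 5)
```

By construction, the last row always has a cumulative capture rate and a cumulative lift of 1, which is a quick sanity check on any gains table.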
A list of class "gainstable". It has the following components:
Bucket | The serial number of buckets or groups. |
Obs | Number of observations in the group. |
CObs | Cumulative number of observations up to the group. |
Depth | Cumulative population depth up to the group. |
Resp | Number of (positive) responses in the group. |
CResp | Cumulative number of (positive) responses up to the group. |
RespRate | (Positive) response rate in the group. |
CRespRate | Cumulative (positive) response rate up to the group. |
CCapRate | Cumulative overall capture rate of (positive) responses up to the group. |
Lift | Lift index in the group. Calculated as |
CLift | Cumulative lift index up to the group. |
The algorithm is designed for complete cases. If NA(s) are found in either score or class, they are removed.
gainstable.rocit, plot.gainstable, rocit
data("Loan")
class <- Loan$Status
score <- Loan$Score
# ----------------------------------------------------------------
gtable15 <- gainstable(score = score, class = class, negref = "FP", ngroup = 15)
gtable_custom <- gainstable(score = score, class = class, negref = "FP", breaks = seq(1,100,15))
# ----------------------------------------------------------------
print(gtable15)
print(gtable_custom)
# ----------------------------------------------------------------
plot(gtable15)
plot(gtable_custom)
plot(gtable_custom, type = 2)
plot(gtable_custom, type = 3)
S3 method to create a gains table from an object of class "rocit".
## S3 method for class 'rocit'
gainstable(x, ngroup = 10, breaks = NULL, ... = NULL)
x | A |
ngroup | Number of desired groups in gains table. See |
breaks | Percentiles (in percentage) at which observations should be separated to form groups. See |
... | |
gainstable.rocit calls gainstable.default. It creates the score and class variables from the supplied "rocit" object internally. See gainstable.default for details.
A list of class "gainstable", same as returned by gainstable.default.
gainstable.default, plot.gainstable, rocit
data("Loan")
class <- Loan$Status
score <- Loan$Score
rocit_emp <- rocit(score = score, class = class, negref = "FP")
# ----------------------------------------------------------------
gtable15 <- gainstable(rocit_emp, ngroup = 15)
gtable_custom <- gainstable(rocit_emp, breaks = seq(1,100,15))
print(gtable15)
print(gtable_custom)
# ----------------------------------------------------------------
plot(gtable15)
plot(gtable_custom)
plot(gtable_custom, type = 2)
plot(gtable_custom, type = 3)
Function getsurvival calculates the survival probability from an object of class "density" at a specified value.
getsurvival(x, cutoff)
x | An object of class "density". |
cutoff | Value at which survival probability will be calculated. |
The survival function $S$ of a random variable $X$ is defined by $S(x) = P(X > x) = 1 - F(x)$, where $F$ is the cumulative distribution function (CDF) of $X$.
Survival probability.
getsurvival is used internally in other function(s) of ROCit.
data("Loan")
k <- density(Loan$Income)
# What portion have income over 100,000
getsurvival(k,100000)
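One way to obtain a survival probability from a "density" object is to integrate the estimated density to the right of the cutoff. The sketch below does this with the trapezoidal rule; `survival_sketch` is a hypothetical helper and is not claimed to reproduce how `getsurvival` works internally.

```r
# Sketch: survival probability from a "density" object by numerical integration.
# survival_sketch is a hypothetical helper, not the package's getsurvival.
survival_sketch <- function(dens, cutoff) {
  keep <- dens$x >= cutoff
  x <- dens$x[keep]
  y <- dens$y[keep]
  if (length(x) < 2) return(0)
  # Trapezoidal rule over the grid points at or above the cutoff
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

set.seed(7)
k <- density(rnorm(5000))
s0 <- survival_sketch(k, 0)  # close to 0.5 for a standard normal sample
```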
See ksplot.rocit.
ksplot(object, ...)
object | An object of class |
... | Arguments to be passed to methods. See |
Plots the cumulative distribution of the diagnostic variable in positive and negative responses.
## S3 method for class 'rocit'
ksplot(
  object,
  col = c("#26484F", "#BEBEBE", "#FFA54F"),
  lty = c(1, 1, 1),
  legend = T,
  legendpos = "bottomright",
  values = T,
  ... = NULL
)
object | An object of class |
col | Colors to be used for the plot. At least three colors need to be supplied, for F(c), G(c) and the KS Stat mark. |
lty | Line types of the plots. |
legend | A logical value indicating whether legends should appear in the plot. |
legendpos | Position of the legend. A single keyword from |
values | A logical value, indicating whether values should be returned. |
... | |
This function plots the cumulative distribution functions $F(c)$ and $G(c)$ of the diagnostic variable in the negative and positive populations. If the positive population tends to have higher values, then the negative curve ($F(c)$) ramps up quickly. The KS statistic is the maximum difference between $F(c)$ and $G(c)$.
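The KS statistic described here can be computed directly from the two empirical CDFs. The following base R sketch uses `ecdf`; `ks_sketch` is a hypothetical helper, and its cutoff convention (evaluating both CDFs at each observed score) is an assumption that may differ from the package's.

```r
# Sketch: KS statistic from the two empirical CDFs in base R.
# ks_sketch is a hypothetical helper, not the package's ksplot.
ks_sketch <- function(score, class) {
  F_neg <- ecdf(score[class == 0])  # F(c): negatives
  G_pos <- ecdf(score[class == 1])  # G(c): positives
  cutoffs <- sort(unique(score))
  gap <- F_neg(cutoffs) - G_pos(cutoffs)
  best <- which.max(abs(gap))
  list(`KS stat` = abs(gap[best]), `KS Cutoff` = cutoffs[best])
}

score <- c(1, 2, 3, 4, 5, 6)
class <- c(0, 0, 0, 1, 1, 1)
ks <- ks_sketch(score, class)
ks$`KS stat`  # 1, because the two groups are perfectly separated
```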
If values = TRUE, then Cutoff, F(c), G(c), KS stat, and KS Cutoff are returned silently.
Customized plots can be made by using the returned values of the function.
data("Diabetes")
logistic.model <- glm(as.factor(dtest)~chol+age+bmi, data = Diabetes,family = "binomial")
class <- logistic.model$y
score <- qlogis(logistic.model$fitted.values)
# -------------------------------------------------------------
roc_emp <- rocit(score = score, class = class) # default method empirical
# -------------------------------------------------------------
kplot1 <- ksplot(roc_emp)
message("KS Stat (empirical) : ", kplot1$`KS stat`)
message("KS Stat (empirical) cutoff : ", kplot1$`KS Cutoff`)
A dataset containing information about 900 borrowers. It is a modified version of publicly available real data.
Loan
A data frame with 900 rows and 9 variables:
Amount of loan, shown as percentage of a certain amount.
The number of payments on the loan. Values are in months.
Interest rate.
Ratio of installment amount and total loan amount.
Employment length, categorized. A: 0-2 years; B: 3-5 years; C: 7-8 years; D: 8+ years; U: Unknown
Status of home ownership.
Annual income.
A factor indicating whether the loan was fully paid (FP) or charged off (CO) after full term.
A risk score calculated from loan amount, interest rate and annual income. The log-odds of logistic regression were transformed into scores. See "References".
http://www.lendingclub.com/info/download-data.action
Siddiqi, Naeem. Credit risk scorecards: developing and implementing intelligent credit scoring. Vol. 3. John Wiley & Sons, 2012.
data("Loan")
boxplot(Income~Home, data = Loan, col = c(2:4), pch = 16,
        ylim = c(0,200000), ylab = "Income",
        xlab = "Home Ownership Status",
        main = "Annual Income Boxplot")
grid()
See measureit.default, measureit.rocit.
measureit(...)
... | Arguments to be passed to methods. See |
This function computes various performance metrics at different cutoff values.
## Default S3 method:
measureit(
  score,
  class,
  negref = NULL,
  measure = c("ACC", "SENS"),
  step = FALSE,
  ... = NULL
)
score | A numeric array of diagnostic score. |
class | An array of equal length of score, containing the class of the observations. |
negref | The reference value, same as the |
measure | The performance metrics to be evaluated. See "Details" for available options. |
step | Logical; default is FALSE. |
... | |
Various cutoff-specific performance metrics for a binary classifier are available. For a given cutoff value, all observations with a score equal to or greater than the cutoff are predicted as positive. The following metrics can be requested via the measure argument:
ACC: Overall accuracy of classification = (TP + TN) / (TP + FP + TN + FN)
MIS: Misclassification rate = (FP + FN) / (TP + FP + TN + FN)
SENS: Sensitivity = TP / (TP + FN)
SPEC: Specificity = TN / (TN + FP)
PREC: Precision = TP / (TP + FP)
REC: Recall. Same as sensitivity.
PPV: Positive predictive value. Same as precision.
NPV: Negative predictive value = TN / (TN + FN)
TPR: True positive rate. Same as sensitivity.
FPR: False positive rate. Same as 1 - specificity.
TNR: True negative rate. Same as specificity.
FNR: False negative rate = FN / (FN + TP)
pDLR: Positive diagnostic likelihood ratio = TPR / FPR
nDLR: Negative diagnostic likelihood ratio = FNR / TNR
FSCR: F-score, defined as 2 * (PPV * TPR) / (PPV + TPR)
Exact match is required. Values passed in the measure argument that do not match the available options are ignored.
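The cutoff-specific metrics listed above all derive from the four confusion counts. As a small base R sketch (with `metrics_at` as a hypothetical helper, not the package's `measureit`), they can be computed at a single cutoff using the standard textbook definitions:

```r
# Sketch: cutoff-specific metrics from confusion counts in base R.
# metrics_at is a hypothetical helper, not the package's measureit.
metrics_at <- function(score, class, cutoff) {
  pred <- as.integer(score >= cutoff)  # score >= cutoff is predicted positive
  TP <- sum(pred == 1 & class == 1)
  FP <- sum(pred == 1 & class == 0)
  TN <- sum(pred == 0 & class == 0)
  FN <- sum(pred == 0 & class == 1)
  sens <- TP / (TP + FN)
  spec <- TN / (TN + FP)
  prec <- TP / (TP + FP)
  c(ACC = (TP + TN) / (TP + FP + TN + FN),
    MIS = (FP + FN) / (TP + FP + TN + FN),
    SENS = sens,
    SPEC = spec,
    PREC = prec,
    NPV = TN / (TN + FN),
    FPR = 1 - spec,
    FNR = 1 - sens,
    FSCR = 2 * prec * sens / (prec + sens))
}

score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
class <- c(1,   1,   0,   1,   0,   1,   0,   0)
m <- metrics_at(score, class, cutoff = 0.5)
```

With this toy data and a cutoff of 0.5 there are 3 true positives, 1 false positive, 3 true negatives and 1 false negative, so accuracy, sensitivity, specificity, precision and F-score all equal 0.75.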
An object of class "measureit". By default it contains the following:
Cutoff | Cutoff at which metrics are evaluated. |
Depth | What portion of the observations fall on or above the cutoff. |
TP | Number of true positives, when the observations having score equal or greater than cutoff are predicted positive. |
FP | Number of false positives, when the observations having score equal or greater than cutoff are predicted positive. |
TN | Number of true negatives, when the observations having score equal or greater than cutoff are predicted positive. |
FN | Number of false negatives, when the observations having score equal or greater than cutoff are predicted positive. |
When other metrics are called via measure, those also appear in the return in the order they are listed above.
The algorithm is designed for complete cases. If NA(s) are found in either score or class, they are removed.
Internally, sorting is performed with respect to the score. In case of ties, sorting is done with respect to class.
Riaz Khan, [email protected]
measureit.rocit, print.measureit
data("Diabetes")
logistic.model <- glm(factor(dtest)~chol+age+bmi, data = Diabetes,family = "binomial")
class <- logistic.model$y
score <- logistic.model$fitted.values
# -------------------------------------------------------------
measure <- measureit(score = score, class = class, measure = c("ACC", "SENS", "FSCR"))
names(measure)
plot(measure$ACC~measure$Cutoff, type = "l")
plot(measure$TP~measure$FP, type = "l")
This is an S3 method for objects of class "rocit". It computes various performance metrics at different cutoff values.
## S3 method for class 'rocit'
measureit(x, measure = c("ACC", "SENS"), ... = NULL)
x | An object of class |
measure | The performance metrics to be evaluated. See "Details" for available options. |
... | |
This function calls measureit.default. From the components of the "rocit" object, it calculates the score and class variables internally. See measureit.default for other details and available options for the measure argument.
An object of class "measureit", same as returned by measureit.default.
See measureit.default.
measureit.default, print.measureit
data("Diabetes")
logistic.model <- glm(as.factor(dtest)~chol+age+bmi, data = Diabetes,family = "binomial")
class <- logistic.model$y
score <- logistic.model$fitted.values
rocit_object <- rocit(score = score, class = class)
# -------------------------------------------------------------
measure <- measureit(rocit_object, measure = c("ACC", "SENS", "FSCR"))
names(measure)
plot(measure$ACC~measure$Cutoff, type = "l")
plot(measure$TP~measure$FP, type = "l")
The function calculates the maximum likelihood (ML) estimates of the two parameters $\mu$ and $\sigma$, when a set of numbers is assumed to be normally distributed.
MLestimates(x)
x | A numeric vector. |
If a set of observations is assumed to be normally distributed, two parameters, the mean $\mu$ and the variance (the square of $\sigma$), are to be estimated. In theory, the ML estimate of $\mu$ is the mean of the observations, and the ML estimate of the square of $\sigma$ is the mean squared deviation of the observations from the estimated $\mu$.
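The two estimates described above take only a few lines of base R. `ml_sketch` is a hypothetical helper written for illustration; it follows the textbook ML formulas for a normal sample (mean of the observations, and the root of the mean squared deviation from that mean) and is not taken from the package source.

```r
# Sketch: ML estimates of a normal distribution's parameters in base R.
# ml_sketch is a hypothetical helper, not the package's MLestimates.
ml_sketch <- function(x) {
  mu <- mean(x)
  # ML variance divides by n, unlike var(), which divides by n - 1
  sigma <- sqrt(mean((x - mu)^2))
  list(mu = mu, sigma = sigma)
}

est <- ml_sketch(c(2, 4, 4, 4, 5, 5, 7, 9))
est$mu     # 5
est$sigma  # 2
```

Note that the ML estimate of the standard deviation is slightly smaller than `sd(x)`, because `sd` uses the n - 1 denominator.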
A "list" object of two numeric components, $\mu$ and $\sigma$.
MLestimates is used internally in other function(s) of ROCit.
# Find the two parameters
set.seed(10)
points <- rnorm(200, 10, 5)
ML <- MLestimates(points)
message("The ML estimates are: mean = ", round(ML$mu, 3), " , SD = ", round(ML$sigma, 3))
#-----------------------------------------
# Superimpose smooth curve over histogram
set.seed(100)
x <- rnorm(400)
hist(x, probability = TRUE, col = "gray90")
ML <- MLestimates(x)
x <- seq(-3, 3, 0.01)
density <- dnorm(x, mean = ML$mu, sd = ML$sigma)
lines(density~x, lwd = 2)
"gainstable"
An S3 method to make different plots using entries of a gains table.
## S3 method for class 'gainstable'
plot(
  x,
  y = NULL,
  type = 1,
  col = c("#BEBEBE", "#26484F", "#8B4500"),
  legend = TRUE,
  ... = NULL
)
x | An object of class |
y | |
type | Plot type. See "Details". |
col | Colors to be used for plot. |
legend | A logical value indicating whether the legend should appear. See "Details". |
... | |
Currently three types are available. type = 1 shows lift and cumulative lift against population depth. type = 2 shows response rate and cumulative response rate against population depth. type = 3 shows cumulative capture rate of positive responses against population depth.
For type 1 and 2, three colors are required; for type 3, two colors. If more colors than required are specified, the first 3 (for type 1, 2) or 2 (for type 3) colors are used. If fewer than required are specified, the specified colors are repeated.
If legend is TRUE, then the legend appears in the plot. For type 1 and 2, the legend position is "topright"; for type 3, "bottomright".
data("Loan")
class <- Loan$Status
score <- Loan$Score
rocit_emp <- rocit(score = score, class = class, negref = "FP")
# ----------------------------------------------------------------
gtable <- gainstable(rocit_emp)
# ----------------------------------------------------------------
plot(gtable)
plot(gtable, legend = FALSE)
plot(gtable, col = 2:4)
plot(gtable, type = 2, col = 2:4)
plot(gtable, type = 3, col = 2:3)
This function plots the receiver operating characteristic (ROC) curve with confidence limits. This is an S3 method for objects of class "rocci", returned by the ciROC.rocit function.
## S3 method for class 'rocci'
plot(
  x,
  col = c("#2F4F4F", "#404040"),
  lty = c(1, 2),
  lwd = c(2, 1),
  grid = TRUE,
  legend = TRUE,
  legendpos = "bottomright",
  ... = NULL
)
x |
An object of class "rocci", returned by ciROC.rocit. |
col |
Color(s) to be used for the plot. If multiple colors are supplied, the first two are used for the ROC curve and the confidence limits, respectively. If a single color is supplied, it is used for both. |
lty |
The line type. Same as in par. |
lwd |
The line width. Same as in par. |
grid |
Logical, indicating whether to add a rectangular grid. Calls grid. |
legend |
Logical, indicating whether to add legends to the plot. |
legendpos |
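Position of the legend. A single keyword from "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center" (the keywords accepted by legend). |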
... |
|
score <- c(rnorm(300,30,15), rnorm(300,50,15))
class <- c(rep(0,300), rep(1,300))
rocit_object <- rocit(score = score, class = class, method = "bi")
rocci_object <- ciROC(rocit_object)
# ---------------------------
plot(rocci_object)
plot(rocci_object, col = c(2,4))
plot(rocci_object, col = c(2,4), legendpos = "bottom", lty = c(1,3))
This function generates the receiver operating characteristic (ROC) curve. This is an S3 method for objects of class "rocit", returned by the rocit function.
## S3 method for class 'rocit'
plot(
  x,
  col = c("#2F4F4F", "#BEBEBE"),
  legend = TRUE,
  legendpos = "bottomright",
  YIndex = TRUE,
  values = TRUE,
  ... = NULL
)
x |
An object of class "rocit", returned by rocit. |
col |
Colors to be used in the plot. If multiple colors are specified, the first color is used for the ROC curve, and the second color is used for the chance line (the y = x line). |
legend |
A logical value indicating whether legends should appear in the plot. |
legendpos |
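Position of the legend. A single keyword from "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center" (the keywords accepted by legend). |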
YIndex |
A logical value indicating whether the optimal Youden Index (i.e. the point where TPR - FPR is maximum) is to be marked in the plot. |
values |
A logical value, indicating whether values are to be returned. |
... |
|
If values = TRUE, then AUC, Cutoff, TPR, FPR, and the optimal Youden Index with its associated TPR, FPR and Cutoff are returned silently. Customized plots can be made by using the returned values of the function.
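To illustrate what the optimal Youden Index refers to, the sketch below locates the cutoff at which TPR - FPR is largest. It uses made-up ROC coordinates in plain vectors rather than the object returned by plot, so the vector names and numbers are assumptions for illustration only:

```r
# Toy ROC coordinates (made-up numbers), ordered from strictest to loosest cutoff.
cutoff <- c(Inf, 0.9, 0.7, 0.5, 0.3, -Inf)
tpr    <- c(0.0, 0.4, 0.7, 0.9, 1.0, 1.0)
fpr    <- c(0.0, 0.1, 0.2, 0.5, 0.8, 1.0)

# Youden Index at each cutoff, and the position where it is largest.
youden <- tpr - fpr
best   <- which.max(youden)

c(YIndex = youden[best], Cutoff = cutoff[best], TPR = tpr[best], FPR = fpr[best])
```

Vectors like these can then be passed to base graphics, for example points(fpr[best], tpr[best]), to mark the optimal point on a custom plot.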
data("Loan")
score <- Loan$Score
class <- ifelse(Loan$Status == "FP", 0, 1)
rocit_emp <- rocit(score = score, class = class)
# -----------------------
plot(rocit_emp)
plot(rocit_emp, col = c(2,4), legendpos = "bottom",
     YIndex = FALSE, values = FALSE)
# -----------------------
rocit_bin <- rocit(score = score, class = class, method = "bin")
# -----------------------
plot(rocit_emp, col = c(1,"gray50"), legend = FALSE, YIndex = FALSE)
lines(rocit_bin$TPR~rocit_bin$FPR, col = 2, lwd = 2)
legend("bottomright", col = c(1,2),
       c("Empirical ROC", "Binormal ROC"), lwd = 2)
Print 'gainstable' Object
S3 print method to print "gainstable" object.
## S3 method for class 'gainstable'
print(x, maxdigit = 3, ... = NULL)
x |
An object of class "gainstable", as returned by the gainstable function. |
maxdigit |
Number of digits to be printed after the decimal point. |
... |
|
data("Loan")
class <- Loan$Status
score <- Loan$Score
rocit_emp <- rocit(score = score, class = class, negref = "FP")
# ----------------------------------------------------------------
gtable8 <- gainstable(rocit_emp, ngroup = 8)
print(gtable8)
print(gtable8, maxdigit = 4)
Print 'measureit' Object
S3 method to print an object of class "measureit" in an organized way.
## S3 method for class 'measureit'
print(x, n = NULL, ... = NULL)
x |
An object of class "measureit", as returned by the measureit function. |
n |
Number of rows to print. If NULL, all rows are printed.
If specified, the first n rows are printed. If the specified
n is bigger than the number of possible rows, n is adjusted. If
n is not an integer or is negative, the default (10 or the number of possible rows,
whichever is smaller) is set. If |
... |
|
data("Diabetes")
logistic.model <- glm(as.factor(dtest)~chol+age+bmi,
                      data = Diabetes, family = "binomial")
class <- logistic.model$y
score <- logistic.model$fitted.values
# -------------------------------------------------------------
measure <- measureit(score = score, class = class,
                     measure = c("ACC", "SENS", "FSCR"))
print(measure, n = 5)
print(measure, n = 10)
Print rocci Object
Print rocci Object
## S3 method for class 'rocci'
print(x, ... = NULL)
x |
An object of class "rocci", returned by ciROC. |
... |
|
data("Diabetes")
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-") # default method empirical
roc_binormal <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                      negref = "-", method = "bin")
# ---------------------
print(ciROC(roc_empirical))
print(ciROC(roc_binormal))
Print rocit Object
Print rocit Object
## S3 method for class 'rocit'
print(x, ... = NULL)
x |
An object of class "rocit", returned by rocit. |
... |
|
data("Diabetes")
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-") # default method empirical
roc_binormal <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                      negref = "-", method = "bin")
# ---------------------
print(roc_empirical)
print(roc_binormal)
Print Confidence Interval of AUC
## S3 method for class 'rocitaucci'
print(x, ... = NULL)
x |
An object of class "rocitaucci", as returned by ciAUC. |
... |
|
data("Diabetes")
logistic.model <- glm(as.factor(dtest)~chol+age+bmi,
                      data = Diabetes, family = "binomial")
score <- logistic.model$fitted.values
class <- logistic.model$y
# Make the rocit objects
rocit_bin <- rocit(score = score, class = class, method = "bin")
obj_1 <- ciAUC(rocit_bin, level = 0.9)
obj_2 <- ciAUC(rocit_bin, delong = TRUE)
obj_3 <- ciAUC(rocit_bin, delong = TRUE, logit = TRUE)
# Print
print(obj_1)
print(obj_2)
print(obj_3)
Function rankorderdata rank-orders the data with respect to some variable (diagnostic variable).
rankorderdata(score, class, dec = TRUE)
score |
A vector containing (diagnostic) scores. |
class |
A vector containing the class. |
dec |
Logical. |
A dataframe, rank-ordered with respect to the score.
rankorderdata is used internally in other function(s) of ROCit.
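The idea of rank-ordering can be sketched in base R with order(). This is only an illustration, not the package's implementation: rank_order is a hypothetical helper, the output column names are assumptions, and dec is assumed to mean "sort in decreasing order of score".

```r
# Hypothetical helper (not the ROCit implementation): sort score/class pairs by score.
rank_order <- function(score, class, dec = TRUE) {
  idx <- order(score, decreasing = dec)            # positions that sort the scores
  data.frame(score = score[idx], class = class[idx])
}

set.seed(1)
score <- c(0.4 * runif(5) + 0.2, 0.4 * runif(5))
class <- c(rep("A", 5), rep("B", 5))

ranked <- rank_order(score, class, dec = FALSE)    # ascending order of score
ranked
```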
Riaz Khan, [email protected]
score <- c(0.4 * runif(20) + 0.2, 0.4*runif(20))
class <- c(rep("A",20), rep("B",20))
returndata <- rankorderdata(score, class, dec = FALSE)
returndata
rocit is the main function of the ROCit package. With the diagnostic score and the class of each observation, it calculates the true positive rate (sensitivity) and the false positive rate (1 - specificity) at convenient cutoff values to construct the ROC curve. The function returns a "rocit" object, which can be passed as an argument to other S3 methods.
rocit(score, class, negref = NULL, method = "empirical", step = FALSE)
score |
A numeric array of diagnostic scores. |
class |
An array of the same length as score, containing the class of the observations. |
negref |
The reference value, same as the
|
method |
The method of estimating the ROC curve. Currently supports "empirical", "binormal" and "nonparametric". |
step |
Logical, default is FALSE. |
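ROC curve is defined as the set of ordered pairs $(FPR(c), TPR(c))$, where $-\infty < c < \infty$, $TPR(c) = P(D \ge c \mid Y = 1)$ and $FPR(c) = P(D \ge c \mid Y = 0)$ at cutoff $c$. Here $D$ is the diagnostic score and $Y$ is the class (1 for positive, 0 for negative).
Alternately, it can be defined as:
$$y(x) = 1 - G\left[F^{-1}(1 - x)\right], \quad 0 \le x \le 1,$$
where $F$ and $G$ are the cumulative distribution functions of the diagnostic score in negative and positive responses respectively.
rocit evaluates TPR and FPR values at convenient cutoffs.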
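As the name implies, empirical TPR and FPR values are evaluated for method = "empirical". For "binormal", the distribution of the diagnostic score is assumed to be normal in each group, and maximum likelihood parameters are estimated. If method = "nonparametric", then kernel density estimates (using density) are applied with the following bandwidths:
$$h_Y = 0.9 \min\left(\sigma_Y, \frac{IQR(D_Y)}{1.34}\right) n_Y^{-1/5}, \qquad h_{\bar{Y}} = 0.9 \min\left(\sigma_{\bar{Y}}, \frac{IQR(D_{\bar{Y}})}{1.34}\right) n_{\bar{Y}}^{-1/5},$$
where the subscripts $Y$ and $\bar{Y}$ denote the positive and negative responses,
as described in Zou et al. From the kernel estimates of PDFs, CDFs are estimated using trapezoidal rule.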
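A rough base R sketch of a rule-of-thumb bandwidth in the style of Zou et al.: 0.9 times the smaller of the standard deviation and IQR/1.34, scaled by n^(-1/5). The helper name zou_bandwidth and the simulated scores are assumptions for illustration, and sd() here is the usual sample standard deviation, which may differ from what the package uses internally.

```r
# Hypothetical helper: rule-of-thumb kernel bandwidth for one group of scores.
zou_bandwidth <- function(d) {
  0.9 * min(sd(d), IQR(d) / 1.34) * length(d)^(-1 / 5)
}

set.seed(42)
d_neg <- rnorm(300, 30, 15)  # simulated diagnostic scores, negative responses
d_pos <- rnorm(300, 50, 15)  # simulated diagnostic scores, positive responses

h_neg <- zou_bandwidth(d_neg)
h_pos <- zou_bandwidth(d_pos)

# Kernel density estimate for the positive group with the bandwidth above.
dens_pos <- density(d_pos, bw = h_pos)
c(h_neg = h_neg, h_pos = h_pos)
```

For non-degenerate data this is the same rule that stats::bw.nrd0 implements, so passing bw = "nrd0" to density() gives the same bandwidth.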
For "empirical" ROC, the algorithm first rank-orders the data and calculates TPR and FPR by treating all observations predicted up to a certain level as positive. If step is TRUE, then the ROC curve is generated based on all the calculated {FPR, TPR} pairs regardless of ties in the data. If step is FALSE, then the ROC curve follows a diagonal path for the ties.
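The empirical calculation can be sketched in base R by evaluating TPR and FPR once per distinct score, which yields a diagonal segment wherever a positive and a negative observation share the same score. This is an illustrative sketch, not the package's code; empirical_roc is a hypothetical helper and class is assumed to be coded 1 (positive) and 0 (negative).

```r
# Hypothetical helper: empirical TPR/FPR, one point per distinct cutoff.
empirical_roc <- function(score, class) {
  cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
  # Share of positives / negatives scoring at or above each cutoff.
  tpr <- vapply(cutoffs, function(k) mean(score[class == 1] >= k), numeric(1))
  fpr <- vapply(cutoffs, function(k) mean(score[class == 0] >= k), numeric(1))
  data.frame(Cutoff = cutoffs, TPR = tpr, FPR = fpr)
}

score <- c(0.9, 0.8, 0.8, 0.6, 0.4, 0.3)  # note the tie at 0.8
class <- c(1,   1,   0,   1,   0,   0)

roc <- empirical_roc(score, class)

# Trapezoidal rule over the (FPR, TPR) points gives the empirical AUC.
auc <- sum(diff(roc$FPR) * (head(roc$TPR, -1) + tail(roc$TPR, -1)) / 2)
roc
auc
```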
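For "empirical" ROC, trapezoidal rule is applied to estimate area under curve (AUC). For "binormal", AUC is estimated by $\Phi\left(A / \sqrt{1 + B^2}\right)$, where $A$ and $B$ are functions of mean and variance of the diagnostic in two groups and $\Phi$ is the standard normal CDF. For "nonparametric", AUC is estimated by
$$\frac{1}{n_Y n_{\bar{Y}}} \sum_{i=1}^{n_{\bar{Y}}} \sum_{j=1}^{n_Y} \Phi\left(\frac{D_{Y_j} - D_{\bar{Y}_i}}{\sqrt{h_Y^2 + h_{\bar{Y}}^2}}\right),$$
where $h_Y$ and $h_{\bar{Y}}$ are the kernel bandwidths for the positive and negative responses.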
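The two model-based AUC estimates can be sketched in base R as follows. The choice A = (mean of positives - mean of negatives) / sd of positives and B = sd of negatives / sd of positives is the common binormal parametrisation and is an assumption here, as are the simulated scores, the helper bw and the use of sd() (sample standard deviation) in place of maximum likelihood estimates.

```r
set.seed(7)
d_neg <- rnorm(300, 30, 15)  # simulated diagnostic scores, negative responses
d_pos <- rnorm(300, 50, 15)  # simulated diagnostic scores, positive responses

# Binormal AUC: Phi(A / sqrt(1 + B^2)) under the assumed parametrisation.
A <- (mean(d_pos) - mean(d_neg)) / sd(d_pos)
B <- sd(d_neg) / sd(d_pos)
auc_binormal <- pnorm(A / sqrt(1 + B^2))

# Kernel-smoothed AUC: average of Phi over all positive/negative pairs.
bw <- function(d) 0.9 * min(sd(d), IQR(d) / 1.34) * length(d)^(-1 / 5)
z <- outer(d_pos, d_neg, "-") / sqrt(bw(d_pos)^2 + bw(d_neg)^2)
auc_kernel <- mean(pnorm(z))

c(binormal = auc_binormal, kernel = auc_kernel)
```

With these definitions the binormal estimate simplifies to pnorm of the mean difference divided by the square root of the summed variances, which is a convenient check on the algebra.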
A list of class "rocit", having the following elements:
method |
The method applied to estimate ROC curve. |
pos_count |
Number of positive responses. |
neg_count |
Number of negative responses. |
pos_D |
Array of diagnostic scores in positive responses. |
neg_D |
Array of diagnostic scores in negative responses. |
AUC |
Area under curve. See "Details". |
Cutoff |
Array of cutoff values at which the
true positive rates and false positive rates
are evaluated. Applicable for |
param |
Maximum likelihood estimates of |
TPR |
Array of true positive rates (or sensitivities or recalls), evaluated at the cutoff values. |
FPR |
Array of false positive rates (or 1-specificity), evaluated at the cutoff values. |
The algorithm is designed for complete cases. If NA(s) are found in either score or class, the corresponding observations are removed.
Pepe, Margaret Sullivan. The statistical evaluation of medical tests for classification and prediction. Medicine, 2003.
Zou, Kelly H., W. J. Hall, and David E. Shapiro. "Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests." Statistics in medicine 16, no. 19 (1997): 2143-2156.
ciROC, ciAUC, plot.rocit, gainstable, ksplot
# ---------------------
data("Diabetes")
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-") # default method empirical
roc_binormal <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                      negref = "-", method = "bin")
# ---------------------
summary(roc_empirical)
summary(roc_binormal)
# ---------------------
plot(roc_empirical)
plot(roc_binormal, col = c("#00BA37", "#F8766D"),
     legend = FALSE, YIndex = FALSE)
Prints the summary of rocit object.
## S3 method for class 'rocit'
summary(object, ... = NULL)
object |
An object of class "rocit", returned by rocit. |
... |
|
data("Diabetes")
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-")
# ---------------------
summary(roc_empirical)
trapezoidarea calculates the approximate area under a curve, using the trapezoidal rule.
trapezoidarea(x, y)
x, y |
Numeric vectors of same length, representing the x and y coordinates. |
The function approximates the area bounded by the following 4 curves:
$$x = a, \quad x = b, \quad y = 0, \quad y = f(x)$$
$a$ and $b$ are set at the min and max value of given x coordinates. $(x, y)$ are some points on the $y = f(x)$ curve.
Numeric value of the area under curve approximated with trapezoid rule.
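The trapezoidal rule itself can be written in one line of base R. The sketch below is not the package's implementation; trapezoid_sketch is a hypothetical helper that assumes x is already sorted in increasing order.

```r
# Hypothetical helper: trapezoidal rule for points (x, y) with increasing x.
trapezoid_sketch <- function(x, y) {
  # Each strip has width diff(x) and height equal to the mean of its two end values.
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

rect_area <- trapezoid_sketch(seq(0, 10), rep(1, 11))   # rectangle of width 10, height 1
tri_area  <- trapezoid_sketch(seq(0, 10), seq(0, 10))   # right triangle with legs of 10
c(rectangle = rect_area, triangle = tri_area)
```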
trapezoidarea is used internally in other function(s) of ROCit.
# Area under rectangle -----------------
trapezoidarea(seq(0, 10), rep(1, 11))
# Area under triangle ------------------
trapezoidarea(seq(0, 10), seq(0, 10))
# Area under normal pdf ----------------
x_vals <- seq(-3, 3, 0.01)
y_vals <- dnorm(x_vals)
trapezoidarea(x = x_vals, y = y_vals) # about 0.997 on [-3, 3]; 1 over the whole real line