# ROC

# ROC curve

- Binary case:
- Y = true **positive** rate = sensitivity
- X = false **positive** rate = 1 - specificity
- Area under the curve (AUC), from Wikipedia: the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').

- [math]\displaystyle{ A = \int_{\infty}^{-\infty} \mbox{TPR}(T) \mbox{FPR}'(T) \, dT = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(T'\gt T)f_1(T') f_0(T) \, dT' \, dT = P(X_1 \gt X_0) }[/math]

where [math]\displaystyle{ X_1 }[/math] is the score for a positive instance and [math]\displaystyle{ X_0 }[/math] is the score for a negative instance, and [math]\displaystyle{ f_1 }[/math] and [math]\displaystyle{ f_0 }[/math] are the probability densities of the scores for positive and negative instances, respectively.

- Interpretation of the AUC. A small toy example (n=12=4+8) was used to calculate the exact probability [math]\displaystyle{ P(X_1 \gt X_0) }[/math] over all 4*8=32 case-control pairs.
- It is a discrimination measure which tells us how well we can classify patients in two groups: those with and those without the outcome of interest.
- Since the measure is based on ranks, it is not sensitive to systematic errors in the calibration of the quantitative tests.
- The AUC can be defined as **the probability that a randomly selected case will have a higher test result than a randomly selected control**.
- Plot of sensitivity/specificity (y-axis) vs cutoff points of the biomarker
- The Mann-Whitney U test statistic (or Wilcoxon or Kruskal-Wallis test statistic) is equivalent to the AUC (Mason, 2002)
- The p-value of the Mann-Whitney U test can thus safely be used to test whether the AUC differs significantly from 0.5 (AUC of an uninformative test).
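
The pairwise interpretation above can be checked directly in base R. The sketch below uses made-up scores for 4 cases and 8 controls (the sizes of the toy example, not its actual data), counts the pairs in which the case scores higher, and compares the result with the Mann-Whitney U statistic divided by the number of pairs.

```r
# Made-up scores for illustration only: 4 cases and 8 controls (n = 12)
cases    <- c(0.9, 0.8, 0.55, 0.4)
controls <- c(0.7, 0.5, 0.45, 0.35, 0.3, 0.25, 0.2, 0.1)

# Compare every case with every control: 4 * 8 = 32 pairs.
# A pair scores 1 if the case is higher, 0.5 for a tie, 0 otherwise.
pair_score <- outer(cases, controls,
                    function(x1, x0) (x1 > x0) + 0.5 * (x1 == x0))
auc_pairs <- mean(pair_score)

# Mann-Whitney U (reported as W by wilcox.test), rescaled by the number of pairs
u     <- unname(wilcox.test(cases, controls, exact = FALSE)$statistic)
auc_u <- u / (length(cases) * length(controls))

c(pairs = auc_pairs, mann_whitney = auc_u)   # both are 28/32 = 0.875
```

Here 28 of the 32 pairs are ordered correctly, so both routes give 0.875.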

- Calculate AUC by hand. AUC is equal to the **probability that a true positive is scored greater than a true negative.**
- How to calculate Area Under the Curve (AUC), or the c-statistic, by hand or by R
- Introduction to the ROCR package. Add threshold labels
- http://freakonometrics.hypotheses.org/9066, http://freakonometrics.hypotheses.org/20002
- Illustrated Guide to ROC and AUC
- ROC Curves in Two Lines of R Code
- Learning Data Science: Understanding ROC Curves
- Gini and AUC. Gini = 2*AUC-1.
- Generally, an AUC value over 0.7 is indicative of a model that can distinguish between the two outcomes well. An AUC of 0.5 tells us that the model is a random classifier, and it cannot distinguish between the two outcomes.
- ROC Day at BARUG
- ROC and AUC, Clearly Explained! StatQuest
- Optimal threshold
- Precision/PPV (proportion of positive results that were correctly classified) replacing the False Positive Rate. Useful for unbalanced data.

# Survival data

'Survival Model Predictive Accuracy and ROC Curves' by Heagerty & Zheng 2005

- Recall **Sensitivity=**[math]\displaystyle{ P(\hat{p}_i \gt c | Y_i=1) }[/math], **Specificity=**[math]\displaystyle{ P(\hat{p}_i \le c | Y_i=0) }[/math], where [math]\displaystyle{ Y_i }[/math] is a binary outcome, [math]\displaystyle{ \hat{p}_i }[/math] is a prediction, and [math]\displaystyle{ c }[/math] is a criterion for classifying the prediction as positive ([math]\displaystyle{ \hat{p}_i \gt c }[/math]) or negative ([math]\displaystyle{ \hat{p}_i \le c }[/math]).
- For survival data, we need to use a fixed time/horizon (*t*) to classify the data as either a case or a control. Following Heagerty and Zheng's definition (Incident/dynamic), **Sensitivity(c, t)=**[math]\displaystyle{ P(M_i \gt c | T_i = t) }[/math], **Specificity(c, t)=**[math]\displaystyle{ P(M_i \le c | T_i \gt t) }[/math], where *M* is the marker value. Sensitivity measures the fraction of subjects with a marker greater than *c* among the subpopulation of individuals who die at time *t*, while specificity measures the fraction of subjects with a marker less than or equal to *c* among those who survive beyond time *t*.
- The AUC measures the **probability that the marker value for a randomly selected case exceeds the marker value for a randomly selected control**
- ROC curves are useful for comparing the discriminatory capacity of different potential biomarkers.

# Confusion matrix, Sensitivity/Specificity/Accuracy

|  |  | Predict 1 | Predict 0 |  |
|---|---|---|---|---|
| True | 1 | TP | FN | Sens=TP/(TP+FN)=Recall; FNR=FN/(TP+FN) |
| True | 0 | FP | TN | Spec=TN/(FP+TN); 1-Spec=FPR |
|  |  | PPV=TP/(TP+FP); FDR=FP/(TP+FP) | NPV=TN/(FN+TN) | N = TP + FP + FN + TN |

- Sensitivity 敏感度 = TP / (TP + FN) = Recall
- Specificity 特異度 = TN / (TN + FP)
- Accuracy = (TP + TN) / N
- False discovery rate FDR = FP / (TP + FP)
- False negative rate FNR = FN / (TP + FN)
- False positive rate FPR = FP / (FP + TN)
- True positive rate = TP / (TP + FN)
- Positive predictive value (PPV) = TP / # positive calls = TP / (TP + FP) = 1 - FDR
- Negative predictive value (NPV) = TN / # negative calls = TN / (FN + TN)
- Prevalence 盛行率 = (TP + FN) / N.
- Note that PPV & NPV can also be computed from sensitivity, specificity, and prevalence:
- PPV increases with the prevalence of the disease or condition (the relationship is monotone rather than strictly proportional).
- For example, in the extreme case where the prevalence = 1, PPV is always 1.

- [math]\displaystyle{ \text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence}+(1-\text{specificity}) \times (1-\text{prevalence})} }[/math]
- [math]\displaystyle{ \text{NPV} = \frac{\text{specificity} \times (1-\text{prevalence})}{(1-\text{sensitivity}) \times \text{prevalence}+\text{specificity} \times (1-\text{prevalence})} }[/math]
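
As a quick check of the two formulas, the following base R sketch computes PPV and NPV twice: once from the cells of a confusion matrix and once from sensitivity, specificity, and prevalence. The counts are hypothetical and the function names `ppv()` and `npv()` are introduced here only for illustration.

```r
# Hypothetical confusion-matrix counts, chosen only for illustration
TP <- 80; FN <- 20; FP <- 90; TN <- 810
N  <- TP + FN + FP + TN

sens <- TP / (TP + FN)
spec <- TN / (TN + FP)
prev <- (TP + FN) / N

# Directly from the confusion matrix
ppv_counts <- TP / (TP + FP)
npv_counts <- TN / (TN + FN)

# From sensitivity, specificity and prevalence
ppv <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}
npv <- function(sens, spec, prev) {
  spec * (1 - prev) / ((1 - sens) * prev + spec * (1 - prev))
}

c(ppv_counts = ppv_counts, ppv_formula = ppv(sens, spec, prev))
c(npv_counts = npv_counts, npv_formula = npv(sens, spec, prev))

# PPV rises with prevalence and reaches 1 when prevalence = 1
sapply(c(0.01, 0.1, 0.5, 1), function(p) ppv(sens, spec, p))
```

With sensitivity and specificity held fixed, the last line shows how strongly PPV depends on how common the condition is.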

- Prediction of heart disease and classifiers’ sensitivity analysis Almustafa, 2020
- Positive percent agreement (PPA) and negative percent agreement (NPA)
- ConfusionTableR has made it to CRAN 21/07/2021
- Precision, Recall, Specificity, Prevalence, Kappa, F1-score check with R by using the caret::confusionMatrix() function.

# Precision recall curve

- Precision and recall from wikipedia
- Y-axis: Precision = tp/(tp + fp) = PPV, large is better
- X-axis: Recall = tp/(tp + fn) = Sensitivity, large is better

- The Relationship Between Precision-Recall and ROC Curves. Remember ROC is defined as
- Y-axis: Sensitivity = tp/(tp + fn) = Recall
- X-axis: 1-Specificity = fp/(fp + tn)

- The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets
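
To see how the two plots share the recall axis but differ in the other one, here is a minimal base R sketch that computes both sets of points from the same scores. The data are simulated (50 positives, 950 negatives) purely as an assumption for illustration, and none of the packages mentioned in these notes is used.

```r
set.seed(1)
# Simulated imbalanced data, for illustration only
y     <- c(rep(1, 50), rep(0, 950))
score <- c(rnorm(50, mean = 1.5), rnorm(950, mean = 0))

# Use each observed score as a cutoff; call an observation positive
# when its score is at least the cutoff
cutoffs <- sort(unique(score), decreasing = TRUE)
curve_points <- t(sapply(cutoffs, function(cut) {
  pred <- score >= cut
  tp <- sum(pred & y == 1);  fp <- sum(pred & y == 0)
  fn <- sum(!pred & y == 1); tn <- sum(!pred & y == 0)
  c(recall    = tp / (tp + fn),   # = sensitivity (ROC y-axis, PR x-axis)
    fpr       = fp / (fp + tn),   # = 1 - specificity (ROC x-axis)
    precision = tp / (tp + fp))   # = PPV (PR y-axis)
}))

op <- par(mfrow = c(1, 2))
plot(curve_points[, "fpr"], curve_points[, "recall"], type = "s",
     xlab = "1 - Specificity", ylab = "Sensitivity", main = "ROC")
plot(curve_points[, "recall"], curve_points[, "precision"], type = "l",
     xlab = "Recall", ylab = "Precision", ylim = c(0, 1),
     main = "Precision-recall")
par(op)
```

At the lowest cutoff everything is called positive, so recall is 1 and precision falls to the prevalence (50/1000 here), which is the baseline of the precision-recall plot.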

# Incidence, Prevalence

https://www.health.ny.gov/diseases/chronic/basicstat.htm

# Calculate area under curve by hand (using trapezoid), relation to concordance measure and the Wilcoxon–Mann–Whitney test

- https://stats.stackexchange.com/a/146174
- The meaning and use of the area under a receiver operating characteristic (ROC) curve J A Hanley, B J McNeil 1982
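
A minimal base R sketch of the three routes named in the heading, using ten made-up scores without ties (with tied scores the ROC points would have to be grouped by distinct score first). It is an illustration under those assumptions, not the code from the linked answer.

```r
# Made-up labels and scores for illustration; no tied scores
y     <- c(1, 1, 0, 1, 0, 1, 0, 0, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1)
n1 <- sum(y == 1); n0 <- sum(y == 0)

# ROC points: lower the cutoff one observation at a time, starting at (0, 0)
ord <- order(score, decreasing = TRUE)
tpr <- c(0, cumsum(y[ord] == 1) / n1)
fpr <- c(0, cumsum(y[ord] == 0) / n0)

# 1. Trapezoidal rule: sum of (width) * (average height)
auc_trapezoid <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# 2. Concordance: share of (positive, negative) pairs ranked correctly
pos <- score[y == 1]; neg <- score[y == 0]
auc_concordance <- mean(outer(pos, neg, ">"))

# 3. Wilcoxon rank-sum form: sum of the positives' ranks, shifted and rescaled
auc_ranks <- (sum(rank(score)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(trapezoid = auc_trapezoid, concordance = auc_concordance, ranks = auc_ranks)
```

All three give 21/24 = 0.875 for these scores.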

# genefilter package and rowpAUCs function

- rowpAUCs function in genefilter package. The aim is to find potential biomarkers whose expression level is able to distinguish between two groups.

    # if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    # BiocManager::install("genefilter")   # replaces the retired biocLite() installer
    library(Biobase)       # provides the sample.ExpressionSet data
    data(sample.ExpressionSet)
    library(genefilter)

    r2 = rowpAUCs(sample.ExpressionSet, "sex", p=0.1)
    plot(r2[1])  # first gene, asking specificity = .9

    r2 = rowpAUCs(sample.ExpressionSet, "sex", p=1.0)
    plot(r2[1])  # it won't show pAUC

    r2 = rowpAUCs(sample.ExpressionSet, "sex", p=.999)
    plot(r2[1])  # pAUC is very close to AUC now

# Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction

http://circ.ahajournals.org/content/115/7/928

# Performance evaluation

- Testing for improvement in prediction model performance by Pepe et al 2013.

# Some R packages

- Some R Packages for ROC Curves
- ROCR 2005
- pROC 2010. get AUC and plot multiple ROC curves together at the same time
- PRROC 2014
- plotROC 2014
- precrec 2015
- ROCit 2019
- ROC from Bioconductor

- ROC animation

# Cross-validation ROC

- ROC cross-validation caret::train(, metric = "ROC")
- Feature selection + cross-validation, but how to make ROC-curves in R. Small samples within the cross-validation may lead to **underestimated** AUC as the ROC curve with all data will tend to be smoother and less underestimated by the trapezoidal rule.
- How to easily make a ROC curve in R
- cvAUC package as linked from Some R Packages for ROC Curves
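
One way to look at per-fold versus pooled AUC is sketched below in base R. It uses simulated data, a plain `glm()` logistic model, and a hand-written `auc()` helper; all of these are assumptions for illustration and it does not reproduce what caret or cvAUC do internally.

```r
set.seed(42)
# Simulated data, for illustration only
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
dat <- data.frame(x = x, y = y)

# AUC as the probability that a positive outscores a negative (ties count 1/2)
auc <- function(score, label) {
  pos <- score[label == 1]; neg <- score[label == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

k    <- 5
fold <- sample(rep(seq_len(k), length.out = n))  # random fold assignment
oof  <- numeric(n)                               # out-of-fold predictions

for (i in seq_len(k)) {
  fit <- glm(y ~ x, family = binomial, data = dat[fold != i, ])
  oof[fold == i] <- predict(fit, newdata = dat[fold == i, ], type = "response")
}

# AUC within each held-out fold (small samples, so these vary a lot)
auc_per_fold <- sapply(seq_len(k), function(i) auc(oof[fold == i], y[fold == i]))
# AUC on all out-of-fold predictions pooled together
auc_pooled <- auc(oof, y)

round(c(mean_of_folds = mean(auc_per_fold), pooled = auc_pooled), 3)
```

Pooling mixes predictions from different fitted models, so the pooled value and the mean of the per-fold values need not agree; which one to report is a judgment call.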

# mean ROC curve

ROC with cross-validation for linear regression in R

# Comparison of two AUCs

- Statistical Assessments of AUC. This is using the **pROC::roc.test** function.
- prioritylasso. It is using roc(), auc(), roc.test(), plot.roc() from the **pROC** package. The calculation based on the training data is biased so we need to report the one based on test data.

# Confidence interval of AUC

How to get an AUC confidence interval. pROC package was used.
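
For intuition, a percentile bootstrap interval can be sketched in base R without any package. The data below are simulated and the `auc()` helper is defined here for illustration; pROC's `ci.auc()` is the packaged route, and its default method may differ from this sketch.

```r
set.seed(123)
# Simulated scores, for illustration only: 40 cases and 60 controls
y     <- c(rep(1, 40), rep(0, 60))
score <- c(rnorm(40, mean = 1), rnorm(60, mean = 0))

# AUC as the probability that a case outscores a control (ties count 1/2)
auc <- function(score, label) {
  pos <- score[label == 1]; neg <- score[label == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}
auc_hat <- auc(score, y)

B <- 2000
boot_auc <- replicate(B, {
  # Stratified bootstrap: resample cases and controls separately
  idx <- c(sample(which(y == 1), replace = TRUE),
           sample(which(y == 0), replace = TRUE))
  auc(score[idx], y[idx])
})

ci <- quantile(boot_auc, c(0.025, 0.975))   # 95% percentile interval
round(c(auc = auc_hat, ci), 3)
```

Resampling within each class keeps the numbers of cases and controls fixed, so every bootstrap replicate has a defined AUC.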

# DeLong test for comparing two ROC curves

- Comparing AUCs of Machine Learning Models with DeLong’s Test
- Misuse of DeLong test to compare AUCs for nested models
- What is the DeLong test for comparing AUCs?
- R language, ROC curve, DeLong test. pROC::roc.test() was used.
- Daim::deLong.test()
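
The idea behind the test can be sketched in base R for two markers measured on the same subjects. The function name `delong_test()` and the simulated markers are assumptions made for this illustration; in practice the packaged implementations listed above are the safer choice.

```r
# Sketch of the DeLong test for two correlated AUCs (same subjects, two markers)
delong_test <- function(score1, score2, label) {
  pos <- label == 1; neg <- label == 0
  m <- sum(pos); n <- sum(neg)
  # Pairwise kernel: 1 if positive > negative, 0.5 if tied, 0 otherwise
  psi <- function(a, b) outer(a, b, ">") + 0.5 * outer(a, b, "==")
  components <- function(s) {
    k <- psi(s[pos], s[neg])               # m x n matrix of comparisons
    list(auc = mean(k),
         v10 = rowMeans(k),                # one value per positive
         v01 = colMeans(k))                # one value per negative
  }
  c1 <- components(score1); c2 <- components(score2)
  s10 <- cov(cbind(c1$v10, c2$v10))        # 2 x 2 covariance over positives
  s01 <- cov(cbind(c1$v01, c2$v01))        # 2 x 2 covariance over negatives
  contrast <- c(1, -1)
  var_diff <- drop(t(contrast) %*% (s10 / m + s01 / n) %*% contrast)
  z <- (c1$auc - c2$auc) / sqrt(var_diff)
  list(auc = c(c1$auc, c2$auc), z = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(7)
# Simulated data, for illustration only: 60 positives, 90 negatives
label   <- rep(c(1, 0), times = c(60, 90))
marker1 <- rnorm(150, mean = 1.2 * label)
marker2 <- 0.5 * marker1 + rnorm(150, mean = 0.2 * label)

res <- delong_test(marker1, marker2, label)
res
```

The covariance terms account for both markers being measured on the same subjects, which is why this differs from comparing two independent AUCs.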

# AUC can be a misleading measure of performance

AUC is high but precision is low (i.e. FDR is high). https://twitter.com/michaelhoffman/status/1398380674206285830?s=09.
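
A small arithmetic sketch in base R shows how this can happen. The numbers are hypothetical: a rare condition (1% prevalence) and a single cutoff with sensitivity and specificity both equal to 0.9.

```r
# Hypothetical setting, for illustration only
N <- 10000; prev <- 0.01; sens <- 0.9; spec <- 0.9

TP <- sens * prev * N;        FN <- (1 - sens) * prev * N
TN <- spec * (1 - prev) * N;  FP <- (1 - spec) * (1 - prev) * N

# AUC of the one-cutoff ROC curve through (0,0), (FPR, TPR), (1,1), by trapezoids
fpr <- c(0, FP / (FP + TN), 1)
tpr <- c(0, TP / (TP + FN), 1)
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

precision <- TP / (TP + FP)   # PPV
fdr       <- 1 - precision

round(c(auc = auc, precision = precision, fdr = fdr), 3)
```

The AUC is 0.9, yet only 90 of the 1080 positive calls are true positives, so precision is about 0.083 and the FDR about 0.917.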

# Caveats and pitfalls of ROC analysis in clinical microarray research

Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them) Berrar 2011

# Picking a threshold based on model performance/utility

Squeezing the Most Utility from Your Models

# Unbalanced classes

- ROC is especially useful for unbalanced data where the 0.5 threshold may not be appropriate.
- Use Precision/PPV to replace the false positive rate (FPR)
- Practical Guide to deal with Imbalanced Classification Problems in R
- The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets
- ROC animation
- Undersampling By Groups In R. See the ROSE package & its paper in 2014.
- imbalance package
- Chapter 11 Subsampling For Class Imbalances from the **caret** package documentation
- SMOTE