ROC
Revision as of 10:05, 12 September 2021
ROC curve
- Binary case:
- Y-axis = true positive rate (TPR) = sensitivity
- X-axis = false positive rate (FPR) = 1 - specificity
- Area under the curve (AUC), from Wikipedia: the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').
- [math]\displaystyle{ A = \int_{\infty}^{-\infty} \mbox{TPR}(T) \mbox{FPR}'(T) \, dT = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(T'\gt T)f_1(T') f_0(T) \, dT' \, dT = P(X_1 \gt X_0) }[/math]
where [math]\displaystyle{ X_1 }[/math] is the score for a positive instance, [math]\displaystyle{ X_0 }[/math] is the score for a negative instance, [math]\displaystyle{ f_1 }[/math] and [math]\displaystyle{ f_0 }[/math] are the probability densities of the score for positive and negative instances, respectively, and [math]\displaystyle{ T }[/math] is the classification threshold.
- Interpretation of the AUC. A small toy example (n = 12 = 4 + 8) was used to calculate the exact probability [math]\displaystyle{ P(X_1 \gt X_0) }[/math] by enumerating all 4 * 8 = 32 positive/negative pairs.
- It is a discrimination measure which tells us how well we can classify patients in two groups: those with and those without the outcome of interest.
- Since the measure is based on ranks, it is not sensitive to systematic errors in the calibration of the quantitative tests.
- The AUC can be defined as the probability that a randomly selected case will have a higher test result than a randomly selected control.
- Plot of sensitivity/specificity (y-axis) vs cutoff points of the biomarker
- The Mann-Whitney U test statistic (or the Wilcoxon rank-sum or Kruskal-Wallis test statistic) is equivalent to the AUC: AUC = U / (n1 * n0), where n1 and n0 are the numbers of positives and negatives (Mason, 2002).
- The p-value of the Mann-Whitney U test can thus safely be used to test whether the AUC differs significantly from 0.5 (AUC of an uninformative test).
- Calculate AUC by hand. The AUC equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative.
- How to calculate Area Under the Curve (AUC), or the c-statistic, by hand or by R
- Introduction to the ROCR package. Add threshold labels
- http://freakonometrics.hypotheses.org/9066, http://freakonometrics.hypotheses.org/20002
- Illustrated Guide to ROC and AUC
- ROC Curves in Two Lines of R Code
- Learning Data Science: Understanding ROC Curves
- Gini and AUC. Gini = 2*AUC-1.
- As a rule of thumb, an AUC above 0.7 indicates a model that distinguishes reasonably well between the two outcomes. An AUC of 0.5 means the model does no better than a random classifier: it cannot distinguish between the two outcomes.
- ROC Day at BARUG
- ROC and AUC, Clearly Explained! StatQuest
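The pairwise interpretation P(X1 > X0), the Mann-Whitney equivalence and Gini = 2*AUC-1 can be checked numerically. Below is a minimal base-R sketch; the helper names `auc_pairs()` and `auc_mw()` and the toy scores are hypothetical (4 positives and 8 negatives, mirroring the size of the n = 12 example but not its data). Ties are counted as 1/2, the usual convention for the empirical AUC.

```r
# Empirical AUC as P(X1 > X0): compare every positive score with every
# negative score; a tie counts as 1/2.
auc_pairs <- function(score, label) {
  d <- outer(score[label == 1], score[label == 0], "-")  # n1 x n0 differences
  mean((d > 0) + 0.5 * (d == 0))
}

# The same quantity from the Mann-Whitney U statistic: AUC = U / (n1 * n0).
# wilcox.test(x, y) reports U for the first sample as its statistic "W".
auc_mw <- function(score, label) {
  x1 <- score[label == 1]
  x0 <- score[label == 0]
  u <- suppressWarnings(wilcox.test(x1, x0)$statistic)  # ties would trigger a warning
  unname(u) / (length(x1) * length(x0))
}

# Hypothetical toy data: 4 positives and 8 negatives, i.e. 4 * 8 = 32 pairs
label <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
score <- c(0.90, 0.80, 0.55, 0.40, 0.70, 0.50, 0.45, 0.35, 0.30, 0.20, 0.15, 0.10)

auc  <- auc_pairs(score, label)  # 28 of the 32 pairs are ranked correctly
gini <- 2 * auc - 1
c(auc = auc, auc_mw = auc_mw(score, label), gini = gini)
```

For these data both routes give AUC = 28/32 = 0.875, so Gini = 0.75; the p-value of the same `wilcox.test()` call is the test of AUC = 0.5 mentioned above.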
Survival data
'Survival Model Predictive Accuracy and ROC Curves' by Heagerty & Zheng 2005
- Recall that sensitivity = [math]\displaystyle{ P(\hat{p}_i \gt c | Y_i=1) }[/math] and specificity = [math]\displaystyle{ P(\hat{p}_i \le c | Y_i=0) }[/math], where [math]\displaystyle{ Y_i }[/math] is the binary outcome, [math]\displaystyle{ \hat{p}_i }[/math] is the prediction, and [math]\displaystyle{ c }[/math] is the cutoff for classifying the prediction as positive ([math]\displaystyle{ \hat{p}_i \gt c }[/math]) or negative ([math]\displaystyle{ \hat{p}_i \le c }[/math]).
- For survival data, we need a fixed time/horizon (t) to classify subjects as either cases or controls. Following Heagerty and Zheng's incident/dynamic definition, Sensitivity(c, t) = [math]\displaystyle{ P(M_i \gt c | T_i = t) }[/math] and Specificity(c, t) = [math]\displaystyle{ P(M_i \le c | T_i \gt t) }[/math], where M is a marker value or [math]\displaystyle{ Z^T \beta }[/math]. Here sensitivity measures the expected fraction of subjects with a marker greater than c among the subpopulation of individuals who die at time t, while specificity measures the fraction of subjects with a marker less than or equal to c among those who survive beyond time t.
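- The time-dependent AUC, [math]\displaystyle{ \text{AUC}(t) = P(M_i \gt M_j | T_i = t, T_j \gt t) }[/math], measures the probability that the marker value for a randomly selected case (dying at time t) exceeds the marker value for a randomly selected control (surviving beyond t).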
- ROC curves are useful for comparing the discriminatory capacity of different potential biomarkers.
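To make the incident/dynamic case-control split concrete, here is a naive empirical estimate at a single time t. This is a minimal base-R sketch, not the estimator proposed by Heagerty & Zheng; it assumes exact, untied event times and simply treats subjects with an event at t as cases and subjects still under observation beyond t as controls. The function names and data are hypothetical.

```r
# Naive empirical incident sensitivity / dynamic specificity at time t.
# Cases: event observed at time t (T_i = t). Controls: still under observation
# beyond t (T_i > t), whether or not they have an event later.
id_sens_spec <- function(time, status, marker, cutoff, t) {
  cases    <- time == t & status == 1
  controls <- time > t
  c(sensitivity = mean(marker[cases] > cutoff),      # P(M > c | T = t)
    specificity = mean(marker[controls] <= cutoff))  # P(M <= c | T > t)
}

# Naive empirical AUC(t) = P(M_i > M_j | T_i = t, T_j > t), ties counted as 1/2
id_auc <- function(time, status, marker, t) {
  d <- outer(marker[time == t & status == 1], marker[time > t], "-")
  mean((d > 0) + 0.5 * (d == 0))
}

# Hypothetical data: follow-up time, event indicator (1 = died), marker value
time   <- c(2, 3, 4, 5, 6, 8, 9, 10)
status <- c(1, 1, 0, 1, 1, 0, 1, 0)
marker <- c(2.5, 1.8, 0.9, 2.2, 1.0, 0.7, 1.2, 0.5)

id_sens_spec(time, status, marker, cutoff = 1.5, t = 3)
id_auc(time, status, marker, t = 3)
```

At t = 3 there is one case (marker 1.8) and six controls, so sensitivity is 1, specificity is 5/6 and AUC(3) is 5/6. Sweeping `cutoff` over the marker values traces the incident/dynamic ROC curve at that t.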
Confusion matrix, Sensitivity/Specificity/Accuracy
| | Predicted 1 | Predicted 0 | |
| True 1 | TP | FN | Sens = TP/(TP+FN) = Recall, FNR = FN/(TP+FN) |
| True 0 | FP | TN | Spec = TN/(FP+TN), 1 - Spec = FPR |
| | PPV = TP/(TP+FP), FDR = FP/(TP+FP) | NPV = TN/(FN+TN) | N = TP + FP + FN + TN |
- Sensitivity = TP / (TP + FN) = Recall
- Specificity = TN / (TN + FP)
- Accuracy = (TP + TN) / N
- False discovery rate FDR = FP / (TP + FP)
- False negative rate FNR = FN / (TP + FN)
- False positive rate FPR = FP / (FP + TN)
- True positive rate = TP / (TP + FN)
- Positive predictive value (PPV) = TP / # positive calls = TP / (TP + FP) = 1 - FDR
- Negative predictive value (NPV) = TN / # negative calls = TN / (FN + TN)
- Prevalence = (TP + FN) / N.
- Note that PPV & NPV can also be computed from sensitivity, specificity, and prevalence:
- PPV increases with the prevalence of the disease or condition when sensitivity and specificity are held fixed.
- For example, in the extreme case where the prevalence = 1, PPV is always 1.
- [math]\displaystyle{ \text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence}+(1-\text{specificity}) \times (1-\text{prevalence})} }[/math]
- [math]\displaystyle{ \text{NPV} = \frac{\text{specificity} \times (1-\text{prevalence})}{(1-\text{sensitivity}) \times \text{prevalence}+\text{specificity} \times (1-\text{prevalence})} }[/math]
- Prediction of heart disease and classifiers’ sensitivity analysis Almustafa, 2020
- Positive percent agreement (PPA) and negative percent agreement (NPA)
- ConfusionTableR has made it to CRAN 21/07/2021
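The definitions above, and the Bayes-theorem formulas for PPV and NPV, can be cross-checked with a few lines of base R. The counts are hypothetical and the function names `confusion_metrics()`, `ppv_from()` and `npv_from()` are made up for this sketch.

```r
# Metrics from the four cells of the 2 x 2 confusion matrix
confusion_metrics <- function(TP, FN, FP, TN) {
  N <- TP + FN + FP + TN
  c(sensitivity = TP / (TP + FN),   # = recall = TPR
    specificity = TN / (TN + FP),
    accuracy    = (TP + TN) / N,
    FDR         = FP / (TP + FP),
    FNR         = FN / (TP + FN),
    FPR         = FP / (FP + TN),   # = 1 - specificity
    PPV         = TP / (TP + FP),   # = precision = 1 - FDR
    NPV         = TN / (FN + TN),
    prevalence  = (TP + FN) / N)
}

# PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)
ppv_from <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}
npv_from <- function(sens, spec, prev) {
  spec * (1 - prev) / ((1 - sens) * prev + spec * (1 - prev))
}

# Hypothetical counts: 50 diseased and 150 healthy subjects
m <- confusion_metrics(TP = 40, FN = 10, FP = 20, TN = 130)
round(m, 3)

# Both routes give the same PPV and NPV
ppv_from(m[["sensitivity"]], m[["specificity"]], m[["prevalence"]])  # 40/60
npv_from(m[["sensitivity"]], m[["specificity"]], m[["prevalence"]])  # 130/140

# With sensitivity and specificity fixed, PPV rises with prevalence (to 1 at prevalence 1)
ppv_from(0.8, 0.9, prev = c(0.01, 0.1, 0.5, 1))
```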
Precision recall curve
- Precision and recall
- Y-axis: Precision = tp/(tp + fp) = PPV, large is better
- X-axis: Recall = tp/(tp + fn) = Sensitivity, large is better
- The Relationship Between Precision-Recall and ROC Curves. Compare with the ROC curve, which is defined by
- Y-axis: Sensitivity = tp/(tp + fn) = Recall
- X-axis: 1-Specificity = fp/(fp + tn)
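Both curves can be traced from the same per-threshold counts; only the second coordinate differs. The sketch below is a hypothetical base-R helper (`roc_pr_points()` is not from any package) and assumes a subject is called positive when its score is at or above the threshold.

```r
# ROC and precision-recall coordinates from the same scores.
# Convention (assumption): a subject is called positive when score >= threshold.
roc_pr_points <- function(score, label) {
  thr <- sort(unique(score), decreasing = TRUE)
  tp <- sapply(thr, function(t) sum(score >= t & label == 1))
  fp <- sapply(thr, function(t) sum(score >= t & label == 0))
  data.frame(threshold = thr,
             TPR       = tp / sum(label == 1),  # recall = sensitivity (PR x-axis, ROC y-axis)
             FPR       = fp / sum(label == 0),  # 1 - specificity (ROC x-axis)
             precision = tp / (tp + fp))        # PPV (PR y-axis)
}

# Hypothetical scores and true classes
score <- c(0.9, 0.8, 0.7, 0.1)
label <- c(1, 0, 1, 0)
pts <- roc_pr_points(score, label)
pts
```

Recall and TPR are the same column. Precision, unlike FPR, depends on how many positives there are, which is why PR curves react to class imbalance while ROC curves do not.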
Incidence, Prevalence
https://www.health.ny.gov/diseases/chronic/basicstat.htm
Calculate area under curve by hand (using trapezoid), relation to concordance measure and the Wilcoxon–Mann–Whitney test
- https://stats.stackexchange.com/a/146174
- The meaning and use of the area under a receiver operating characteristic (ROC) curve J A Hanley, B J McNeil 1982
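A minimal base-R sketch of the "by hand" route: build the empirical ROC points, integrate them with the trapezoid rule, and compare with the concordance probability and the Wilcoxon-Mann-Whitney statistic. Function names and data are hypothetical; positives are assumed to score higher.

```r
# Empirical ROC points over all distinct thresholds (positive if score >= t),
# starting from (0, 0); the lowest threshold gives (1, 1).
roc_points <- function(score, label) {
  thr <- sort(unique(score), decreasing = TRUE)
  list(fpr = c(0, sapply(thr, function(t) mean(score[label == 0] >= t))),
       tpr = c(0, sapply(thr, function(t) mean(score[label == 1] >= t))))
}

# Trapezoid rule: sum over segments of width * average height
trapezoid_auc <- function(x, y) {
  k <- seq_along(x)[-1]
  sum((x[k] - x[k - 1]) * (y[k] + y[k - 1]) / 2)
}

# Concordance: share of (positive, negative) pairs ordered correctly, ties = 1/2
concordance <- function(score, label) {
  d <- outer(score[label == 1], score[label == 0], "-")
  mean((d > 0) + 0.5 * (d == 0))
}

# Hypothetical data with one tie between a positive and a negative (0.7)
label <- c(1, 1, 0, 1, 0, 0)
score <- c(0.9, 0.7, 0.7, 0.6, 0.4, 0.2)

r <- roc_points(score, label)
u <- suppressWarnings(wilcox.test(score[label == 1], score[label == 0])$statistic)
c(trapezoid   = trapezoid_auc(r$fpr, r$tpr),
  concordance = concordance(score, label),
  wmw         = unname(u) / (sum(label == 1) * sum(label == 0)))
```

All three come out as 7.5/9 = 5/6. The tied pair produces a diagonal ROC segment, and the trapezoid rule credits it with exactly 1/2, matching the tie convention of the concordance.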
genefilter package and rowpAUCs function
- rowpAUCs function in genefilter package. The aim is to find potential biomarkers whose expression level is able to distinguish between two groups.
 # Install once; biocLite() is deprecated, BiocManager is its replacement
 # install.packages("BiocManager")
 # BiocManager::install("genefilter")
 library(Biobase)
 data(sample.ExpressionSet)  # example ExpressionSet shipped with Biobase
 library(genefilter)
 r2 = rowpAUCs(sample.ExpressionSet, "sex", p=0.1)
 plot(r2[1])  # first gene; p = 0.1 limits FPR to 0.1, i.e. specificity >= .9
 r2 = rowpAUCs(sample.ExpressionSet, "sex", p=1.0)
 plot(r2[1])  # it won't show pAUC
 r2 = rowpAUCs(sample.ExpressionSet, "sex", p=.999)
 plot(r2[1])  # pAUC is very close to AUC now
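To see what the partial AUC measures, here is a hypothetical base-R sketch that integrates the empirical ROC curve only over FPR in [0, p] with the trapezoid rule, clipping the segment that crosses FPR = p. It returns the unnormalized area and is not how genefilter computes or scales `rowpAUCs()` results; the function names and data are made up.

```r
# Empirical ROC points (positive if score >= threshold), starting at (0, 0)
roc_points <- function(score, label) {
  thr <- sort(unique(score), decreasing = TRUE)
  list(fpr = c(0, sapply(thr, function(t) mean(score[label == 0] >= t))),
       tpr = c(0, sapply(thr, function(t) mean(score[label == 1] >= t))))
}

# Unnormalized partial AUC over FPR in [0, p]: trapezoid rule, with the
# segment that crosses FPR = p clipped there by linear interpolation
partial_auc <- function(fpr, tpr, p) {
  area <- 0
  for (k in seq_along(fpr)[-1]) {
    x0 <- fpr[k - 1]; x1 <- fpr[k]
    y0 <- tpr[k - 1]; y1 <- tpr[k]
    if (x0 >= p || x1 == x0) next   # right of p, or a vertical step (zero width)
    if (x1 > p) {                   # segment crosses FPR = p: clip it
      y1 <- y0 + (y1 - y0) * (p - x0) / (x1 - x0)
      x1 <- p
    }
    area <- area + (x1 - x0) * (y0 + y1) / 2
  }
  area
}

# Hypothetical expression values of one gene in two groups
label <- c(1, 1, 0, 1, 0, 0)
score <- c(0.9, 0.7, 0.7, 0.6, 0.4, 0.2)
r <- roc_points(score, label)
sapply(c(0.1, 0.5, 1), function(p) partial_auc(r$fpr, r$tpr, p))
```

With p = 1 the partial area equals the full AUC (5/6 here), which is why p close to 1 gives a pAUC very close to the AUC.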
Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction
http://circ.ahajournals.org/content/115/7/928
Performance evaluation
- Testing for improvement in prediction model performance by Pepe et al 2013.
Some R packages
- Some R Packages for ROC Curves
- ROCR 2005
- pROC 2010
- PRROC 2014
- plotROC 2014
- precrec 2015
- ROCit 2019
- ROC animation
Comparison of two AUCs
- Statistical Assessments of AUC. It uses the pROC::roc.test function.
- prioritylasso. It uses roc(), auc(), roc.test() and plot.roc() from the pROC package. The AUC calculated on the training data is optimistically biased, so the AUC on the test data should be reported.
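For two markers measured on the same subjects, the AUCs are correlated and the comparison has to account for that. Below is a minimal base-R sketch of DeLong's method, which is what pROC::roc.test uses by default for full, unsmoothed AUCs; `psi()` and `delong_paired()` are hypothetical names and the data are simulated.

```r
# Pairwise kernel: 1 if the positive outscores the negative, 1/2 for a tie
psi <- function(x1, x0) {
  d <- outer(x1, x0, "-")
  (d > 0) + 0.5 * (d == 0)                       # n1 x n0 matrix
}

# DeLong's test for two correlated AUCs (same subjects, two markers)
delong_paired <- function(score1, score2, label) {
  pos <- label == 1
  neg <- label == 0
  K1 <- psi(score1[pos], score1[neg])
  K2 <- psi(score2[pos], score2[neg])
  auc <- c(mean(K1), mean(K2))
  V10 <- cbind(rowMeans(K1), rowMeans(K2))       # one row per positive subject
  V01 <- cbind(colMeans(K1), colMeans(K2))       # one row per negative subject
  S <- cov(V10) / sum(pos) + cov(V01) / sum(neg) # 2 x 2 covariance of the AUCs
  se <- sqrt(S[1, 1] + S[2, 2] - 2 * S[1, 2])    # SE of AUC1 - AUC2
  z <- (auc[1] - auc[2]) / se
  list(auc = auc, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Hypothetical paired data: marker 1 is simulated with better separation
set.seed(1)
label  <- rep(c(1, 0), each = 30)
score1 <- rnorm(60, mean = 1.5 * label)
score2 <- rnorm(60, mean = 0.5 * label)
delong_paired(score1, score2, label)
```

If pROC is installed, `roc.test(roc(label, score1), roc(label, score2), method = "delong")` should give a comparable z statistic and p-value for these data.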
Confidence interval of AUC
How to get an AUC confidence interval. The pROC package is used.
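A quick closed-form alternative is the standard-error approximation of Hanley & McNeil (1982), cited above. The sketch below is a hypothetical base-R helper implementing that formula with a normal (Wald-type) interval; pROC's `ci.auc()` uses DeLong's method by default, so its intervals will differ somewhat.

```r
# Approximate SE and Wald-type CI of an AUC (Hanley & McNeil, 1982).
# n1 = number of positives (cases), n0 = number of negatives (controls).
hanley_mcneil_ci <- function(auc, n1, n0, level = 0.95) {
  q1 <- auc / (2 - auc)          # P(two random positives both outrank a negative), approx.
  q2 <- 2 * auc^2 / (1 + auc)    # P(a positive outranks two random negatives), approx.
  se <- sqrt((auc * (1 - auc) + (n1 - 1) * (q1 - auc^2) +
              (n0 - 1) * (q2 - auc^2)) / (n1 * n0))
  z <- qnorm(1 - (1 - level) / 2)
  c(auc = auc, se = se,
    lower = max(0, auc - z * se),  # truncation to [0, 1] is a choice made here,
    upper = min(1, auc + z * se))  # not part of the original formula
}

hanley_mcneil_ci(auc = 0.875, n1 = 4, n0 = 8)    # tiny sample: very wide interval
hanley_mcneil_ci(auc = 0.875, n1 = 40, n0 = 80)  # 10x the data: narrower interval
```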
AUC can be a misleading measure of performance
AUC is high but precision is low (i.e. FDR is high). https://twitter.com/michaelhoffman/status/1398380674206285830?s=09.
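The effect can be reproduced analytically under an assumed binormal model (negatives N(0, 1), positives N(delta, 1)). The AUC depends only on how well the two score distributions separate, while precision at a fixed cutoff also depends on prevalence. The values of `delta`, the cutoff and the prevalences are arbitrary choices for illustration.

```r
# Binormal model (assumption): negatives ~ N(0, 1), positives ~ N(delta, 1).
# Then X1 - X0 ~ N(delta, 2), so AUC = P(X1 > X0) = pnorm(delta / sqrt(2)).
delta <- 2
auc   <- pnorm(delta / sqrt(2))          # about 0.92, whatever the prevalence

cutoff <- 1                              # call "positive" when score > cutoff
sens <- 1 - pnorm(cutoff, mean = delta)  # P(X1 > cutoff)
spec <- pnorm(cutoff)                    # P(X0 <= cutoff)

# PPV (precision) at this fixed cutoff for several hypothetical prevalences
prev <- c(0.5, 0.1, 0.01, 0.001)
ppv  <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
data.frame(prevalence = prev, PPV = round(ppv, 3), FDR = round(1 - ppv, 3))
```

At 1% prevalence the PPV is only about 0.05 (FDR about 0.95), although the AUC stays at about 0.92.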
Picking a threshold based on model performance/utility
Squeezing the Most Utility from Your Models
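One way to act on this is to assign a utility (benefit or cost) to each cell of the confusion matrix and pick the cutoff with the highest average utility. This is a minimal base-R sketch with hypothetical utilities and data; `best_threshold()` is a made-up helper, not a function from the linked post.

```r
# Average utility per subject for each candidate cutoff; returns the best one.
# The four utilities are hypothetical inputs chosen by the analyst.
best_threshold <- function(score, label, u_tp, u_fn, u_fp, u_tn) {
  thr <- sort(unique(score))
  avg_util <- sapply(thr, function(t) {
    pred <- score >= t                       # call positive when score >= t
    (u_tp * sum(pred & label == 1) + u_fn * sum(!pred & label == 1) +
     u_fp * sum(pred & label == 0) + u_tn * sum(!pred & label == 0)) / length(score)
  })
  # which.max() keeps the first (lowest) of tied cutoffs
  list(threshold = thr[which.max(avg_util)], utility = max(avg_util))
}

label <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
score <- c(0.90, 0.80, 0.55, 0.40, 0.70, 0.50, 0.45, 0.35, 0.30, 0.20, 0.15, 0.10)

# Missing a case is assumed to be 10x worse than a false alarm
best_threshold(score, label, u_tp = 10, u_fn = -10, u_fp = -1, u_tn = 0)
# Utility 1 for every correct call recovers an accuracy-maximizing cutoff
best_threshold(score, label, u_tp = 1, u_fn = 0, u_fp = 0, u_tn = 1)
```

The costly-miss utilities pull the cutoff down to 0.40 (all four cases caught), while the accuracy-style utilities give 0.55.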
Unbalanced classes
Statistics -> Imbalanced/unbalanced Classification. ROC is especially useful for unbalanced data where the 0.5 threshold may not be appropriate.
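A small hypothetical example of why 0.5 can be a poor default when cases are rare: the predicted probabilities below rank the cases well, yet few of them exceed 0.5. The sketch then picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1), which is one common ROC-based choice, swapped in here for illustration rather than taken from the linked page.

```r
# Hypothetical predicted probabilities, imbalanced data (4 cases, 16 controls)
label <- c(rep(1, 4), rep(0, 16))
prob  <- c(0.30, 0.40, 0.45, 0.60,
           0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
           0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.25, 0.35)

# Sensitivity and specificity when predicting positive for prob >= cutoff
sens_spec <- function(cutoff) {
  pred <- prob >= cutoff
  c(sensitivity = mean(pred[label == 1]), specificity = mean(!pred[label == 0]))
}

sens_spec(0.5)   # the default 0.5 cutoff catches only 1 of the 4 cases

# Youden's J over all observed probabilities as candidate cutoffs
cand <- sort(unique(prob))
J <- sapply(cand, function(t) sum(sens_spec(t)) - 1)
best <- cand[which.max(J)]
c(cutoff = best, sens_spec(best))
```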