= ROC curve =
* Binary case:
** Y = true '''positive''' rate = sensitivity
** X = false '''positive''' rate (假陽性率) = 1 - specificity
<ul>
<li>Area under the curve AUC, from [https://en.wikipedia.org/wiki/Receiver_operating_characteristic wikipedia]: the probability that a classifier will <span style="color: red">rank</span> a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').

:<math> A = \int_{\infty}^{-\infty} \mbox{TPR}(T) \mbox{FPR}'(T) \, dT = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(T'>T)f_1(T') f_0(T) \, dT' \, dT = P(X_1 > X_0) </math>

where <math> X_1 </math> is the score for a positive instance and <math> X_0 </math> is the score for a negative instance, and <math>f_1</math> and <math>f_0</math> are the probability densities of the scores in the positive and negative groups, respectively.
</li>
</ul>
 
* [https://datascienceplus.com/interpretation-of-the-auc/ Interpretation of the AUC]. A small toy example (n=12=4+8) was used to calculate the exact probability <math>P(X_1 > X_0) </math> (4*8=32 case/control pairs in all).
** It is a discrimination measure which tells us how well we can classify patients in two groups: those with and those without the outcome of interest.
** Since the measure is based on ranks, it is not sensitive to systematic errors in the calibration of the quantitative tests.
** The AUC can be defined as the probability that a randomly selected case will have a higher test result than a randomly selected control.
** Plot of sensitivity/specificity (y-axis) vs cutoff points of the biomarker.
** The Mann-Whitney U test statistic (or Wilcoxon or Kruskal-Wallis test statistic) is equivalent to the AUC (Mason, 2002).
** The p-value of the Mann-Whitney U test can thus safely be used to test whether the AUC differs significantly from 0.5 (AUC of an uninformative test).
 
* [https://stackoverflow.com/questions/4903092/calculate-auc-in-r Calculate AUC by hand]. AUC is equal to the '''probability that a true positive is scored greater than a true negative.'''
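A minimal R sketch (simulated scores, not taken from the linked posts) that computes the AUC by brute force as <math>P(X_1 > X_0)</math> over all case/control pairs and checks it against the Mann-Whitney U statistic mentioned above:
<syntaxhighlight lang='splus'>
set.seed(1)
x1 <- rnorm(4, mean = 1)   # scores of the 4 cases
x0 <- rnorm(8, mean = 0)   # scores of the 8 controls

## brute force: proportion of the 4*8 = 32 (case, control) pairs with X1 > X0
mean(outer(x1, x0, ">"))

## same value from the Mann-Whitney U statistic: AUC = U / (n1 * n0)
wilcox.test(x1, x0)$statistic / (length(x1) * length(x0))
</syntaxhighlight>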
* See the uROC() function in <functions.R> from the supplementary of the paper (need access right) [https://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2012.01783.x Bivariate Marker Measurements and ROC Analysis] Wang 2012. Let <math>n_1</math> be the number of obs from X1 and <math>n_0</math> be the number of obs from X0. X1 and X0 are the predicted values for data from group 1 and 0. <math> TP_i=Prob(X_1>X_{0i})=\sum_j (X_{1j} > X_{0i})/n_1, ~ FP_i=Prob(X_0>X_{0i}) = \sum_j (X_{0j} > X_{0i}) / n_0 </math>. We can draw a scatter plot or smooth.spline() of TP (y-axis) vs FP (x-axis) for the ROC curve.
<syntaxhighlight lang='splus'>
uROC <- function(marker, status)  ### ROC function for univariate marker ###
{
    x <- marker
    bad <- is.na(status) | is.na(x)
    status <- status[!bad]
    x <- x[!bad]
    if (sum(bad) > 0)
        cat(paste("\n", sum(bad), "records with missing values dropped. \n"))
    no_case <- sum(status == 1)
    no_control <- sum(status == 0)
    TP <- rep(0, no_control)
    FP <- rep(0, no_control)
    for (i in 1:no_control) {
        # empirical P(case marker > marker of the i-th control)
        TP[i] <- sum(x[status == 1] > x[status == 0][i]) / no_case
        # empirical P(control marker > marker of the i-th control)
        FP[i] <- sum(x[status == 0] > x[status == 0][i]) / no_control
    }
    list(TP = TP, FP = FP)
}
</syntaxhighlight>
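A hypothetical usage sketch for the uROC() function above; the simulated marker and status values are illustrative assumptions, not data from the paper:
<syntaxhighlight lang='splus'>
set.seed(1)
status <- rep(c(0, 1), c(30, 20))    # 30 controls, 20 cases
marker <- rnorm(50, mean = status)   # cases tend to have larger marker values

r <- uROC(marker, status)
o <- order(r$FP)
plot(r$FP[o], r$TP[o], type = "s", xlab = "FP", ylab = "TP", main = "Empirical ROC")
abline(0, 1, lty = 2)                # reference line for an uninformative marker
</syntaxhighlight>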
 
* [https://stats.stackexchange.com/questions/145566/how-to-calculate-area-under-the-curve-auc-or-the-c-statistic-by-hand How to calculate Area Under the Curve (AUC), or the c-statistic, by hand or by R]
* Introduction to the [https://hopstat.wordpress.com/2014/12/19/a-small-introduction-to-the-rocr-package/ ROCR] package. [https://datascienceplus.com/machine-learning-logistic-regression-for-credit-modelling-in-r/ Add threshold labels]
** [https://youtu.be/4jRBRDbJemM?t=801 Optimal threshold]
** [https://youtu.be/4jRBRDbJemM?t=879 Precision/PPV] (proportion of positive results that were correctly classified) replacing the False Positive Rate. Useful for unbalanced data.
== partial AUC ==
* https://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2012.01783.x
* [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3068975/ pROC: an open-source package for R and S+ to analyze and compare ROC curves]
** [https://www.rdocumentation.org/packages/pROC/versions/1.18.0/topics/auc pROC::auc()]
* [https://onlinelibrary.wiley.com/doi/pdf/10.1111/1541-0420.00071 Partial AUC Estimation and Regression] Dodd 2003. <math>AUC(t_0,t_1) = \int_{t_0}^{t_1} ROC(t) dt  </math> where the interval <math>(t_0, t_1)</math> denotes the false-positive rates of interest.
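A short sketch of a partial AUC with pROC's auc() on the aSAH example data; restricting specificity to (0.8, 1) corresponds to false-positive rates in (0, 0.2) in Dodd's notation:
<syntaxhighlight lang='splus'>
library(pROC)
data(aSAH)
r <- roc(aSAH$outcome, aSAH$s100b)
auc(r)                                      # full AUC
auc(r, partial.auc = c(1, 0.8),             # specificity from 1 down to 0.8
    partial.auc.focus = "specificity",
    partial.auc.correct = TRUE)             # optional standardization of the partial AUC
</syntaxhighlight>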
== summary ROC ==
* [https://onlinelibrary.wiley.com/doi/10.1002/sim.2103 The partial area under the summary ROC curve] Walter 2005
* [https://www.tandfonline.com/doi/abs/10.1080/02664763.2022.2041565 On summary ROC curve for dichotomous diagnostic studies: an application to meta-analysis of COVID-19] 2022

== Weighted ROC ==
* [https://stats.stackexchange.com/a/158927 What is the difference between area under roc and weighted area under roc?] ''Weighted ROC curves are used when you're interested in performance in a certain region of ROC space (e.g. high recall) and was proposed as an improvement over partial AUC (which does exactly this but has some issues)''
== Adjusted AUC ==
* [http://cainarchaeology.weebly.com/r-function-for-optimism-adjusted-auc.html 'auc.adjust': R function for optimism-adjusted AUC (internal validation)]
* [https://rdrr.io/cran/GmAMisc/man/aucadj.html GmAMisc::aucadj(data, fit, B = 200)]

== Difficult to compute for some models ==
* [https://stackoverflow.com/a/47572573 Plot ROC curve for Nearest Centroid]. For NearestCentroid it is not possible to compute a score. This is simply a limitation of the model.
* [https://stackoverflow.com/a/11777036 k-NN model]. [https://www.rdocumentation.org/packages/class/versions/7.3-19/topics/knn class::knn()] can output prediction probability.
* [https://www.rdocumentation.org/packages/randomForest/versions/4.6-14/topics/predict.randomForest predict.randomForest()] can output class probabilities. See [https://stackoverflow.com/a/12476854 ROC curve for classification from randomForest]
** [https://juliasilge.com/blog/sf-trees-random-tuning/ Tuning random forest hyperparameters with #TidyTuesday trees data]
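A sketch of drawing an ROC curve from random forest class probabilities rather than hard labels, using a hypothetical two-class subset of iris and the pROC package (an assumption, not the code from the linked answers):
<syntaxhighlight lang='splus'>
library(randomForest)
library(pROC)

d <- droplevels(iris[iris$Species != "setosa", ])   # two-class toy problem
set.seed(1)
idx <- sample(nrow(d), 70)                          # simple train/test split
fit <- randomForest(Species ~ ., data = d[idx, ])
p <- predict(fit, d[-idx, ], type = "prob")[, "virginica"]  # class probabilities, not labels
r <- roc(d$Species[-idx], p)   # pROC takes the second factor level as the 'case' by default
auc(r)
plot(r)
</syntaxhighlight>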
== Optimal threshold ==
* [https://homepage.stat.uiowa.edu/~rdecook/stat6220/Class_notes/ROC_introduction.pdf#page=25 Max of “sensitivity + specificity”]. See [https://cran.r-project.org/web/packages/Epi/index.html Epi::ROC()] function.
* [https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9432 On optimal biomarker cutoffs accounting for misclassification costs in diagnostic trilemmas with applications to pancreatic cancer] Bantis, 2022
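A small sketch of the "max of sensitivity + specificity" (Youden) threshold using pROC's coords() and the aSAH example data; this is an assumed alternative to the Epi::ROC() route mentioned above:
<syntaxhighlight lang='splus'>
library(pROC)
data(aSAH)
r <- roc(aSAH$outcome, aSAH$s100b)
# "best" = threshold maximizing sensitivity + specificity - 1 (Youden's J)
coords(r, "best", best.method = "youden", transpose = FALSE)
</syntaxhighlight>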
== ROC Curve AUC for Hypothesis Testing ==
* [https://towardsdatascience.com/interpreting-auroc-in-hypothesis-testing-a45f6f757a62 Interpreting AUROC in Hypothesis Testing]
* [https://www.sciencedirect.com/science/article/pii/S1556086415306043 Receiver Operating Characteristic Curve in Diagnostic Test Assessment] 2010

== Challenges, issues ==
[https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01577-x Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review] 2022. '''class imbalance, data pre-processing, and hyperparameter tuning'''. [https://twitter.com/pauladhiman/status/1512429344316182529?s=20&t=D8wSCw3ET5Sx2C1ZKJxq-g twitter].
  
 
= Survival data =
'Survival Model Predictive Accuracy and ROC Curves' by Heagerty & Zheng 2005
* Recall '''Sensitivity=''' <math>P(\hat{p}_i > c | Y_i=1)</math>, '''Specificity=''' <math>P(\hat{p}_i \le c | Y_i=0)</math>, where <math>Y_i</math> is a binary outcome, <math>\hat{p}_i</math> is a prediction, and <math>c</math> is a criterion for classifying the prediction as positive (<math>\hat{p}_i > c</math>) or negative (<math>\hat{p}_i \le c </math>).
* For survival data, we need to use a fixed time/horizon (''t'') to classify the data as either a case or a control. Following Heagerty and Zheng's definition in ''Survival Model Predictive Accuracy and ROC Curves'' (Incident/dynamic) 2005, '''Sensitivity(c, t)=''' <math>P(M_i > c | T_i = t)</math>, '''Specificity=''' <math>P(M_i \le c | T_i > t)</math> where ''' ''M'' ''' is a marker value or <math>Z^T \beta</math>. Here sensitivity measures the expected fraction of subjects with a marker greater than ''c'' among the subpopulation of individuals who die at time ''t'', while specificity measures the fraction of subjects with a marker less than or equal to ''c'' among those who survive beyond time ''t''.
* The AUC measures the '''probability that the marker value for a randomly selected case exceeds the marker value for a randomly selected control'''.
* ROC curves are useful for comparing the discriminatory capacity of different potential biomarkers.
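The incident/dynamic definition above is implemented in, e.g., Heagerty's risksetROC package; the sketch below instead uses the related cumulative/dynamic estimator from the survivalROC package at a fixed one-year horizon, with the lung data from the survival package as a stand-in example:
<syntaxhighlight lang='splus'>
library(survivalROC)              # time-dependent (cumulative/dynamic) ROC
lung <- survival::lung            # example survival data; age as a crude marker
sroc <- survivalROC(Stime = lung$time,
                    status = as.integer(lung$status == 2),  # 2 = death in this coding
                    marker = lung$age,
                    predict.time = 365, method = "KM")
sroc$AUC                          # AUC at the one-year horizon
plot(sroc$FP, sroc$TP, type = "l", xlab = "FP", ylab = "TP")
abline(0, 1, lty = 2)
</syntaxhighlight>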
  
 
= Confusion matrix, Sensitivity/Specificity/Accuracy =
[https://en.wikipedia.org/wiki/Confusion_matrix Wikipedia]

{| border="1" style="border-collapse:collapse; text-align:center;"
|                    ||    || colspan="2" | Predict
|-
|                    ||    ||  1      ||    0      ||
|-
| rowspan="2" | True || 1 ||  TP    ||    FN    || style="color: white;background: red;"|Sens=TP/(TP+FN)=Recall=TPR <br/> FNR=FN/(TP+FN)
|-
|    0              ||  FP    ||    TN    || style="color: white;background: blue;"|Spec=TN/(FP+TN), 1-Spec=FPR
|-
|                    ||    ||  style="color: white;background: green;"|PPV=TP/(TP+FP) <br/> FDR=FP/(TP+FP) <br/>=1-PPV||  NPV=TN/(FN+TN) ||  N = TP + FP + FN + TN
|}
  
* <span style="color: red">Sensitivity 敏感度</span> = TP / (TP + FN) = Recall
* <span style="color: blue">Specificity 特異度</span> = TN / (TN + FP)
* Accuracy = (TP + TN) / N
* <span style="color: green">False discovery rate FDR = FP / (TP + FP)</span>
* False negative rate FNR = FN / (TP + FN)
* <span style="color: blue">False positive rate FPR = FP / (FP + TN) = 1 - Spec</span>
* <span style="color: red">True positive rate = TP / (TP + FN) = Sensitivity</span>
* [https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values <span style="color: green">Positive predictive value (PPV)</span>] = TP / # positive calls = TP / (TP + FP) = 1 - FDR = <span style="color: green">Precision</span>
* Negative predictive value (NPV) = TN / # negative calls = TN / (FN + TN)
* Prevalence 盛行率 = (TP + FN) / N.
* Note that PPV & NPV can also be computed from sensitivity, specificity, and prevalence:
:<math> \text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence}+(1-\text{specificity}) \times (1-\text{prevalence})} </math>
:<math> \text{NPV} = \frac{\text{specificity} \times (1-\text{prevalence})}{(1-\text{sensitivity}) \times \text{prevalence}+\text{specificity} \times (1-\text{prevalence})} </math>
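A quick numerical check of the two formulas above with hypothetical sensitivity, specificity and prevalence values:
<syntaxhighlight lang='splus'>
sens <- 0.90; spec <- 0.80; prev <- 0.10     # hypothetical values
PPV <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
NPV <- spec * (1 - prev) / ((1 - sens) * prev + spec * (1 - prev))
c(PPV = PPV, NPV = NPV)   # PPV = 0.09 / 0.27 = 1/3, NPV = 0.72 / 0.73 ~ 0.986
</syntaxhighlight>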
 
* [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03626-y Prediction of heart disease and classifiers’ sensitivity analysis] Almustafa, 2020
* [https://www.fda.gov/regulatory-information/search-fda-guidance-documents/statistical-guidance-reporting-results-studies-evaluating-diagnostic-tests-guidance-industry-and-fda Positive percent agreement (PPA) and negative percent agreement (NPA)]
* [https://hutsons-hacks.info/confusiontabler-has-made-it-to-cran ConfusionTableR has made it to CRAN] 21/07/2021. [https://mran.microsoft.com/web/packages/ConfusionTableR/index.html ConfusionTableR]
* [https://www.datatechnotes.com/2019/02/accuracy-metrics-in-classification.html Precision, Recall, Specificity, Prevalence, Kappa, F1-score check with R] by using the [https://www.rdocumentation.org/packages/caret/versions/6.0-86/topics/confusionMatrix caret::confusionMatrix()] function. ''If there are only two factor levels, the first level will be used as the "positive" result.''
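A toy sketch of caret::confusionMatrix() with hand-made factors (hypothetical predictions); as quoted above, the first factor level is treated as the "positive" class unless positive= says otherwise:
<syntaxhighlight lang='splus'>
library(caret)
truth <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0), levels = c("1", "0"))
pred  <- factor(c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0), levels = c("1", "0"))
confusionMatrix(data = pred, reference = truth, positive = "1")
# reports accuracy, sensitivity, specificity, PPV, NPV, kappa, ...
</syntaxhighlight>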
== False positive rate vs false discovery rate ==
[https://stats.stackexchange.com/a/340079 FPR (false positive rate) vs FDR (false discovery rate)]
  
 
= Precision recall curve =
* [https://en.wikipedia.org/wiki/Precision_and_recall Precision and recall] from wikipedia
** Y-axis: Precision = tp/(tp + fp) = PPV. How accurately the model predicted the positive classes; larger is better.
** X-axis: Recall = tp/(tp + fn) = Sensitivity; larger is better.
* [http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf The Relationship Between Precision-Recall and ROC Curves]. Remember that the ROC curve plots TPR (y-axis) against FPR (x-axis).
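A sketch of a precision-recall curve with the PRROC package (one of the packages listed in the R-packages section below), using simulated scores:
<syntaxhighlight lang='splus'>
library(PRROC)
set.seed(1)
fg <- rnorm(50, mean = 1)   # scores of the positive class
bg <- rnorm(200)            # scores of the negative class
pr <- pr.curve(scores.class0 = fg, scores.class1 = bg, curve = TRUE)
pr                          # prints the area under the PR curve
plot(pr)
</syntaxhighlight>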
  
 
= Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction =
* http://circ.ahajournals.org/content/115/7/928
* [https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-019-1466-7 Calibration: the Achilles heel of predictive analytics] 2019
  
 
= Performance evaluation =
* [https://onlinelibrary.wiley.com/doi/epdf/10.1002/sim.5727 Testing for improvement in prediction model performance] by Pepe et al 2013.

== Youden's index/Youden's J statistic ==
* [https://en.wikipedia.org/wiki/Youden%27s_J_statistic Youden's index]
  
 
= Some R packages =
* [https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/ Some R Packages for ROC Curves]
** ROCR 2005
** [https://cran.r-project.org/web/packages/pROC/ pROC] 2010. [https://stackoverflow.com/a/37248211 get AUC and plot multiple ROC curves together at the same time]
** PRROC 2014
** plotROC 2014
** ROCit 2019
** [http://bioconductor.org/packages/release/bioc/html/ROC.html ROC] from Bioconductor
** [https://cran.r-project.org/web/packages/caret/ caret]
* [https://github.com/dariyasydykova/open_projects/tree/master/ROC_animation ROC animation]
== pROC ==
<ul>
<li>https://cran.r-project.org/web/packages/pROC/index.html
<li>[https://web.expasy.org/pROC/ pROC: display and analyze ROC curves in R and S+]
<li>[https://medium.com/swlh/roc-curve-and-auc-detailed-understanding-and-r-proc-package-86d1430a3191 ROC Curve and AUC in Machine learning and R pROC Package]
<pre>
library(pROC)
data(aSAH)
roc.s100b <- roc(aSAH$outcome, aSAH$s100b)
roc(aSAH$outcome, aSAH$s100b,
    plot = TRUE,
    print.auc = TRUE,
    col = "green",
    lwd = 4,
    legacy.axes = TRUE,
    main = "ROC Curves")
</pre>
</li>
</ul>
  
 
= Cross-validation ROC =

== mean ROC curve ==

== ROC with cross-validation for linear regression in R ==

= Comparison of two AUCs =
* [https://statcompute.wordpress.com/2018/12/25/statistical-assessments-of-auc/ Statistical Assessments of AUC]. This is using the '''pROC::roc.test''' function.
* [https://cran.r-project.org/web/packages/prioritylasso/vignettes/prioritylasso_vignette.html prioritylasso]. It is using roc(), auc(), roc.test(), plot.roc() from the '''pROC''' package. The calculation based on the training data is biased so we need to report the one based on test data.
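A sketch of comparing two AUCs computed on the same subjects with pROC::roc.test(), using the aSAH example data shipped with pROC:
<syntaxhighlight lang='splus'>
library(pROC)
data(aSAH)
r1 <- roc(aSAH$outcome, aSAH$s100b)
r2 <- roc(aSAH$outcome, aSAH$ndka)
roc.test(r1, r2, method = "delong")   # paired DeLong test of AUC(r1) vs AUC(r2)
</syntaxhighlight>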
= Assess risk of bias =
[https://www.acpjournals.org/doi/10.7326/M18-1377 PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration] 2019. http://www.probast.org/
  
 
= Confidence interval of AUC =
How to get an AUC confidence interval. The pROC package was used.
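A short sketch with pROC's ci.auc(), again on the aSAH example data:
<syntaxhighlight lang='splus'>
library(pROC)
data(aSAH)
r <- roc(aSAH$outcome, aSAH$s100b)
ci.auc(r)                          # DeLong-based confidence interval (default)
ci.auc(r, method = "bootstrap")    # bootstrap alternative
</syntaxhighlight>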
 
= DeLong test for comparing two ROC curves =

= AUC can be a misleading measure of performance =
AUC is high but precision is low (i.e. FDR is high). https://twitter.com/michaelhoffman/status/1398380674206285830?s=09

= Caveats and pitfalls of ROC analysis in clinical microarray research =
[https://academic.oup.com/bib/article/13/1/83/218392 Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them)] Berrar 2011
= Limitation in clinical data =
* [https://www.sciencedirect.com/science/article/abs/pii/S0895435618310047 ROC curves for clinical prediction models part 1. ROC plots showed no added value above the AUC when evaluating the performance of clinical prediction models] Verbakel 2020
  
 
= Picking a threshold based on model performance/utility =
Squeezing the Most Utility from Your Models

= Unbalanced classes =
* [https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/ 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset]
** [https://machinelearningmastery.com/cross-validation-for-imbalanced-classification/ How to Fix k-Fold Cross-Validation for Imbalanced Classification]. It teaches you how to split samples in CV by using '''stratified k-fold cross-validation'''.
* ROC is especially useful for unbalanced data where the 0.5 threshold may not be appropriate.
* [[ROC#Confusion_matrix.2C_Sensitivity.2FSpecificity.2FAccuracy|Use Precision/PPV to replace FPR]]
* [https://topepo.github.io/caret/subsampling-for-class-imbalances.html Chapter 11 Subsampling For Class Imbalances] from the '''caret''' package documentation
* [https://twitter.com/joshuastarmer/status/1432753482948300801 SMOTE]
* [https://www.tandfonline.com/doi/abs/10.1080/01621459.2021.2005609?journalCode=uasa20 Classification Trees for Imbalanced Data: Surface-to-Volume Regularization] Zhu, JASA 2021
* [https://arxiv.org/abs/2202.09101 The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression] 2022

== Metric ==
* [https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/ Tour of Evaluation Metrics for Imbalanced Classification]. More strategies are available.
* [https://en.wikipedia.org/wiki/F-score F-score]
** [https://yardstick.tidymodels.org/reference/f_meas.html tidymodels::f_meas()], [https://towardsdatascience.com/modelling-with-tidymodels-and-parsnip-bae2c01c131c Modelling with Tidymodels and Parsnip], [https://medium.com/the-researchers-guide/modelling-binary-logistic-regression-using-tidymodels-library-in-r-part-1-c1bdce0ac055 Modelling Binary Logistic Regression using Tidymodels Library in R]
** [https://towardsdatascience.com/caret-vs-tidymodels-create-complete-reusable-machine-learning-workflows-5c50a7befd2d caret::train(,metric)] from ''Caret vs. tidymodels — create reusable machine learning workflows''
* [https://cran.r-project.org/web/packages/MLmetrics/ MLmetrics]: Machine Learning Evaluation Metrics
* [https://stats.stackexchange.com/a/367911 Classification/evaluation metrics for highly imbalanced data]
* [https://towardsdatascience.com/what-metrics-should-we-use-on-imbalanced-data-set-precision-recall-roc-e2e79252aeba What metrics should be used for evaluating a model on an imbalanced data set? (precision + recall or ROC=TPR+FPR)]
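For reference, a tiny base-R computation of the F1 score from hypothetical confusion-matrix counts:
<syntaxhighlight lang='splus'>
TP <- 30; FP <- 10; FN <- 20                 # hypothetical counts
precision <- TP / (TP + FP)                  # 0.75
recall    <- TP / (TP + FN)                  # 0.60
F1 <- 2 * precision * recall / (precision + recall)
F1                                           # 0.667
</syntaxhighlight>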
  
 
= Class comparison problem =
* [https://bioconductor.org/packages/release/bioc/html/compcodeR.html compcodeR]: RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods
* [https://academic.oup.com/bioinformatics/article/31/17/2778/183245 Polyester]: simulating RNA-seq datasets with differential transcript expression, [https://github.com/leekgroup/polyester_code github], [https://htmlpreview.github.io/?https://github.com/leekgroup/polyester_code/blob/master/polyester_manuscript.html HTML]
= Reporting =
* [https://www.acpjournals.org/doi/10.7326/M14-0697 Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement] 2015
* [https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocac002/6511611 Trends in the conduct and reporting of clinical prediction model development and validation: a systematic review] Yang 2022

= Applications =
* [https://www.nature.com/articles/s41598-022-12199-0 Development and validation of an RNA-seq-based transcriptomic risk score for asthma] 2022

= Lessons =
* Unbalanced data: '''kNN''' or '''nearest centroid''' is better than the traditional methods
* Small sample size and large number of predictors: '''t-test''' can select predictors while '''lasso''' cannot

= Incidence, Prevalence =
https://www.health.ny.gov/diseases/chronic/basicstat.htm

= Calculate area under curve by hand (using trapezoid), relation to concordance measure and the Wilcoxon–Mann–Whitney test =
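A minimal sketch of the trapezoid rule named in this heading, applied to a few hypothetical (FPR, TPR) points:
<syntaxhighlight lang='splus'>
fpr <- c(0, 0.1, 0.3, 0.6, 1)    # hypothetical ROC points, sorted by FPR
tpr <- c(0, 0.5, 0.7, 0.9, 1)
## trapezoid rule: sum of segment width times average height
sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
</syntaxhighlight>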

= genefilter package and rowpAUCs function =
* rowpAUCs function in genefilter package. The aim is to find potential biomarkers whose expression level is able to distinguish between two groups.
<syntaxhighlight lang='splus'>
# source("http://www.bioconductor.org/biocLite.R")
# biocLite("genefilter")
library(Biobase) # sample.ExpressionSet data
data(sample.ExpressionSet)

library(genefilter)
r2 = rowpAUCs(sample.ExpressionSet, "sex", p=0.1)
plot(r2[1]) # first gene, asking specificity = .9

r2 = rowpAUCs(sample.ExpressionSet, "sex", p=1.0)
plot(r2[1]) # it won't show pAUC

r2 = rowpAUCs(sample.ExpressionSet, "sex", p=.999)
plot(r2[1]) # pAUC is very close to AUC now
</syntaxhighlight>
