# ROC curve

• Binary case:
• Y = true positive rate = sensitivity,
• X = false positive rate = 1-specificity = 假陽性率
• Area under the curve AUC from the wikipedia: the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').
$\displaystyle{ A = \int_{\infty}^{-\infty} \mbox{TPR}(T) \mbox{FPR}'(T) \, dT = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(T'\gt T)f_1(T') f_0(T) \, dT' \, dT = P(X_1 \gt X_0) }$
where $\displaystyle{ X_1 }$ is the score for a positive instance and $\displaystyle{ X_0 }$ is the score for a negative instance, and $\displaystyle{ f_0 }$ and $\displaystyle{ f_1 }$ are probability densities as defined in previous section.
• Interpretation of the AUC. A small toy example (n=12=4+8) was used to calculate the exact probability $\displaystyle{ P(X_1 \gt X_0) }$ (4*8=32 all combinations).
• It is a discrimination measure which tells us how well we can classify patients in two groups: those with and those without the outcome of interest.
• Since the measure is based on ranks, it is not sensitive to systematic errors in the calibration of the quantitative tests.
• The AUC can be defined as The probability that a randomly selected case will have a higher test result than a randomly selected control.
• Plot of sensitivity/specificity (y-axis) vs cutoff points of the biomarker
• The Mann-Whitney U test statistic (or Wilcoxon or Kruskall-Wallis test statistic) is equivalent to the AUC (Mason, 2002)
• The p-value of the Mann-Whitney U test can thus safely be used to test whether the AUC differs significantly from 0.5 (AUC of an uninformative test).
• Calculate AUC by hand. AUC is equal to the probability that a true positive is scored greater than a true negative.
• See the uROC() function in <functions.R> from the supplementary of the paper (need access right) Bivariate Marker Measurements and ROC Analysis Wang 2012. Let $\displaystyle{ n_1 }$ be the number of obs from X1 and $\displaystyle{ n_0 }$ be the number of obs from X0. X1 and X0 are the predict values for data from group 1 and 0. $\displaystyle{ TP_i=Prob(X_1\gt X_{0i})=\sum_j (X_{1j} \gt X_{0i})/n_1, ~ FP_i=Prob(X_0\gt X_{0i}) = \sum_j (X_{0j} \gt X_{0i}) / n_0 }$. We can draw a scatter plot or smooth.spline() of TP(y-axis) vs FP(x-axis) for the ROC curve.
uROC <- function(marker, status)   ### ROC function for univariate marker ###
{
x <- marker
cat(paste("\n", sum(bad), "records with missing values dropped. \n"))
no_case <- sum(status==1)
no_control <- sum(status==0)
TP <- rep(0, no_control)
FP <- rep(0, no_control)
for (i in 1: no_control){
TP[i] <- sum(x[status==1]>x[status==0][i])/no_case
FP[i] <- sum(x[status==0]>x[status==0][i])/no_control
}
list(TP = TP, FP = FP)
}


## Challenges, issues

Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review 2022. class imbalance, data pre-processing, and hyperparameter tuning. twitter.

# Survival data

'Survival Model Predictive Accuracy and ROC Curves' by Heagerty & Zheng 2005

• Recall Sensitivity= $\displaystyle{ P(\hat{p_i} \gt c | Y_i=1) }$, Specificity= $\displaystyle{ P(\hat{p}_i \le c | Y_i=0 }$), $\displaystyle{ Y_i }$ is binary outcomes, $\displaystyle{ \hat{p}_i }$ is a prediction, $\displaystyle{ c }$ is a criterion for classifying the prediction as positive ($\displaystyle{ \hat{p}_i \gt c }$) or negative ($\displaystyle{ \hat{p}_i \le c }$).
• For survival data, we need to use a fixed time/horizon (t) to classify the data as either a case or a control. Following Heagerty and Zheng's definition in Survival Model Predictive Accuracy and ROC Curves (Incident/dynamic) 2005, Sensitivity(c, t)= $\displaystyle{ P(M_i \gt c | T_i = t) }$, Specificity= $\displaystyle{ P(M_i \le c | T_i \gt t }$) where M is a marker value or $\displaystyle{ Z^T \beta }$. Here sensitivity measures the expected fraction of subjects with a marker greater than c among the subpopulation of individuals who die at time t, while specificity measures the fraction of subjects with a marker less than or equal to c among those who survive beyond time t.
• The AUC measures the probability that the marker value for a randomly selected case exceeds the marker value for a randomly selected control
• ROC curves are useful for comparing the discriminatory capacity of different potential biomarkers.

# Confusion matrix, Sensitivity/Specificity/Accuracy

 Predict 1 0 True 1 TP FN Sens=TP/(TP+FN)=Recall=TPR FNR=FN/(TP+FN) 0 FP TN Spec=TN/(FP+TN), 1-Spec=FPR PPV=TP/(TP+FP) FDR=FP/(TP+FP) =1-PPV NPV=TN/(FN+TN) N = TP + FP + FN + TN
• Sensitivity 敏感度 = TP / (TP + FN) = Recall
• Specificity 特異度 = TN / (TN + FP)
• Accuracy = (TP + TN) / N
• False discovery rate FDR = FP / (TP + FP)
• False negative rate FNR = FN / (TP + FN)
• False positive rate FPR = FP / (FP + TN) = 1 - Spec
• True positive rate = TP / (TP + FN) = Sensitivity
• Positive predictive value (PPV) = TP / # positive calls = TP / (TP + FP) = 1 - FDR = Precision
• Negative predictive value (NPV) = TN / # negative calls = TN / (FN + TN)
• Prevalence 盛行率 = (TP + FN) / N.
• Note that PPV & NPV can also be computed from sensitivity, specificity, and prevalence:
$\displaystyle{ \text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence}+(1-\text{specificity}) \times (1-\text{prevalence})} }$
$\displaystyle{ \text{NPV} = \frac{\text{specificity} \times (1-\text{prevalence})}{(1-\text{sensitivity}) \times \text{prevalence}+\text{specificity} \times (1-\text{prevalence})} }$

# genefilter package and rowpAUCs function

• rowpAUCs function in genefilter package. The aim is to find potential biomarkers whose expression level is able to distinguish between two groups.
# source("http://www.bioconductor.org/biocLite.R")
# biocLite("genefilter")
library(Biobase) # sample.ExpressionSet data
data(sample.ExpressionSet)

library(genefilter)
r2 = rowpAUCs(sample.ExpressionSet, "sex", p=0.1)
plot(r2[1]) # first gene, asking specificity = .9

r2 = rowpAUCs(sample.ExpressionSet, "sex", p=1.0)
plot(r2[1]) # it won't show pAUC

r2 = rowpAUCs(sample.ExpressionSet, "sex", p=.999)
plot(r2[1]) # pAUC is very close to AUC now


# Comparison of two AUCs

• Statistical Assessments of AUC. This is using the pROC::roc.test function.
• prioritylasso. It is using roc(), auc(), roc.test(), plot.roc() from the pROC package. The calculation based on the training data is biased so we need to report the one based on test data.

# Confidence interval of AUC

How to get an AUC confidence interval. pROC package was used.

# AUC can be a misleading measure of performance

AUC is high but precision is low (i.e. FDR is high). https://twitter.com/michaelhoffman/status/1398380674206285830?s=09.

# Class comparison problem

• compcodeR: RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods
• Polyester: simulating RNA-seq datasets with differential transcript expression, github, HTML

# Lessons

• Unbalanced data: kNN or nearest centroid is better than the traditional methods
• Small sample size and large number of predictors: t-test can select predictors while lasso cannot