Latest revision as of 10:25, 14 November 2023

Power analysis/Sample Size determination

https://en.m.wikipedia.org/wiki/Power_(statistics)
Sample size determination from Wikipedia
Power and Sample Size Determination http://www.stat.wisc.edu/~st571-1/10-power-2.pdf#page=12
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/AnesShortCourse/HypothesisTestingPart1.pdf#page=40
Power analysis and sample size calculation for Agriculture (pwr, lmSupport, simr packages are used)
Why Within-Subject Designs Require Fewer Participants than Between-Subject Designs

Binomial distribution

Binomial test. Calculating a p-value for a two-tailed test is slightly more complicated, since a binomial distribution isn't symmetric if [math]\displaystyle{ \pi _{0}\neq 0.5 }[/math].
How To Get The Power Of Test In Hypothesis Testing With Binomial Distribution
How to Perform a Binomial Test in R
?binom.test

Power analysis for default Bayesian t-tests

http://daniellakens.blogspot.com/2016/01/power-analysis-for-default-bayesian-t.html

Using simulation for power analysis

Pearson correlation

set.seed(0)

simulate_p_value <- function(sample_size) {
  # Simulate data
  x <- rnorm(sample_size)
  y <- x + rnorm(sample_size)
  
  # Calculate correlation and p-value
  cor.test(x, y)$p.value
}

simulate_p_value_ttest <- function(sample_size) {
  # Simulate data
  x <- rnorm(sample_size, mean = 0)
  y <- rnorm(sample_size, mean = 0.5)
  
  # Calculate p-value
  t.test(x, y)$p.value
}

sample_sizes <- c(10, 30, 50, 100, 300, 500, 1000)
p_values <- sapply(sample_sizes, simulate_p_value)

plot(sample_sizes, p_values, log = "y", type = "o",
     xlab = "Sample Size", ylab = "p-value",
     main = "Effect of Sample Size on p-value")

T-test case

T-test variance could be large if the sample size is small. Below is a case of 30 vs 5 and effect size .5.

set.seed(13)
hist(replicate(100, t.test(rnorm(30), rnorm(5)+.5)$p.value), 
     main='', xlab='pvalue')
set.seed(13); summary(replicate(100, t.test(rnorm(30), rnorm(5)+.5)$p.value))
#      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
# 0.0000157 0.1730494 0.3178627 0.3970372 0.6592276 0.9999164

Power analysis and sample size calculation for Agriculture

http://r-video-tutorial.blogspot.com/2017/07/power-analysis-and-sample-size.html

Power calculation for proportions (shiny app)

https://juliasilge.shinyapps.io/power-app/

Derive the formula/manual calculation

One-sample 1-sided test, One sample 2-sided test
Two-sample 2-sided T test ([math]\displaystyle{ n }[/math] is the sample size in each group)

[math]\displaystyle{ \begin{align} Power & = P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2/n + \sigma^2/n}} \gt Z_{\alpha /2}) + P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2/n + \sigma^2/n}} \lt -Z_{\alpha /2}) \\ & \approx P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2/n + \sigma^2/n}} \gt Z_{\alpha /2}) \\ & = P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2 - \Delta}{\sqrt{2 * \sigma^2/n}} \gt Z_{\alpha /2} - \frac{\Delta}{\sqrt{2 * \sigma^2/n}}) \\ & = \Phi(-(Z_{\alpha /2} - \frac{\Delta}{\sqrt{2 * \sigma^2/n}})) \\ & = 1 - \beta =\Phi(Z_\beta) \end{align} }[/math]

Therefore

[math]\displaystyle{ \begin{align} Z_{\beta} &= - Z_{\alpha/2} + \frac{\Delta}{\sqrt{2 * \sigma^2/n}} \\ Z_{\beta} + Z_{\alpha/2} & = \frac{\Delta}{\sqrt{2 * \sigma^2/n}} \\ 2 * (Z_{\beta} + Z_{\alpha/2})^2 * \sigma^2/\Delta^2 & = n \\ n & = 2 * (Z_{\beta} + Z_{\alpha/2})^2 * \sigma^2/\Delta^2 \end{align} }[/math]

# alpha = .05, delta = 200, n = 79.5, sigma=450
1 - pnorm(1.96 - 200*sqrt(79.5)/(sqrt(2)*450)) + pnorm(-1.96 - 200*sqrt(79.5)/(sqrt(2)*450))
# [1] 0.8
pnorm(-1.96 - 200*sqrt(79.5)/(sqrt(2)*450))
# [1] 9.58e-07
1 - pnorm(1.96 - 200*sqrt(79.5)/(sqrt(2)*450)) 
# [1] 0.8

Calculating required sample size in R and SAS

pwr package is used. For two-sided test, the formula for sample size is

[math]\displaystyle{ n_{\mbox{each group}} = \frac{2 * (Z_{\alpha/2} + Z_\beta)^2 * \sigma^2}{\Delta^2} = \frac{2 * (Z_{\alpha/2} + Z_\beta)^2}{d^2} }[/math]

where [math]\displaystyle{ Z_\alpha }[/math] is value of the Normal distribution which cuts off an upper tail probability of [math]\displaystyle{ \alpha }[/math], [math]\displaystyle{ \Delta }[/math] is the difference sought, [math]\displaystyle{ \sigma }[/math] is the presumed standard deviation of the outcome, [math]\displaystyle{ \alpha }[/math] is the type 1 error, [math]\displaystyle{ \beta }[/math] is the type II error and (Cohen's) d is the effect size - difference between the means divided by the pooled standard deviation.

# An example from http://www.stat.columbia.edu/~gelman/stuff_for_blog/c13.pdf#page=3
# Method 1.
require(pwr)
pwr.t.test(d=200/450, power=.8, sig.level=.05,
           type="two.sample", alternative="two.sided")
#
#     Two-sample t test power calculation 
#
#              n = 80.4
#              d = 0.444
#      sig.level = 0.05
#          power = 0.8
#    alternative = two.sided
#
# NOTE: n is number in *each* group

# Method 2.
2*(qnorm(.975) + qnorm(.8))^2*450^2/(200^2)
# [1] 79.5
2*(1.96 + .84)^2*450^2 / (200^2)
# [1] 79.4

And stats::power.t.test() function.

power.t.test(n = 79.5, delta = 200, sd = 450, sig.level = .05,
             type ="two.sample", alternative = "two.sided")
#
#     Two-sample t test power calculation 
#
#              n = 79.5
#          delta = 200
#             sd = 450
#      sig.level = 0.05
#          power = 0.795
#    alternative = two.sided
#
# NOTE: n is number in *each* group

R package related to power analysis

CRAN Task View: Design of Experiments

powerAnalysis w/o vignette
powerbydesign w/o vignette
easypower w/ vignette
pwr w/ vignette, https://www.statmethods.net/stats/power.html. The reference is Cohen's book.
- Power analysis in Statistics with R
powerlmm Power Analysis for Longitudinal Multilevel/Linear Mixed-Effects Models.
ssize.fdr w/o vignette
samplesize w/o vignette
ssizeRNA w/ vignette
power.t.test(), power.anova.test(), power.prop.test() from stats package

RNA-seq

Feasibility of sample size calculation for RNA-seq studies Poplawski 2018.
- The ‘Scotty’ (MATLAB) tool performed best
- ‘Scotty’, ssizeRNA 2016 and PROPER 2015 generated comparable results.
- Bi et al. showed that ssizeRNA provided a more accurate estimate of power/sample size than RnaSeqSampleSize; ssizeRNA and RnaSeqSampleSize provided results much faster than PROPER. See Power and sample size calculations for high-throughput sequencing-based experiments 2018
Empirical assessment of the impact of sample number and read depth on RNA-Seq analysis workflow performance
Number of replicates needed is dependent on the following that varies from gene to gene
- Within group variance
- Read coverage
- Desired detectable effect size
powsimR: power analysis for bulk and single cell RNA-seq experiments Vieth 2017
RnaSeqSampleSize
RNASeqPower. It was used in GREIN: An Interactive Web Platform for Re-analyzing GEO RNA-seq Data 2019

ScRNA-seq

scRNA-seq

Russ Lenth Java applets

https://homepage.divms.uiowa.edu/~rlenth/Power/index.html

Bootstrap method

The upstrap Crainiceanu & Crainiceanu, Biostatistics 2018

Multiple Testing Case

Optimal Sample Size for Multiple Testing The Case of Gene Expression Microarrays

Unbalanced randomization

Can unbalanced randomization improve power?

Yes, unbalanced randomization can improve power, in some situations

Prediction

Minimum sample size for external validation of a clinical prediction model with a binary outcome Riley 2021

High dimensional data

Simulation To Understand Needed Sample Sizes from the online book Biostatistics for Biomedical Research

Survival data

Sample size and predictive performance of machine learning methods with survival data: A simulation study 2023. R code is in the supplementary.

@@ Line 1: / Line 1: @@
 = Power analysis/Sample Size determination =
+* https://en.m.wikipedia.org/wiki/Power_(statistics)
 * [https://en.wikipedia.org/wiki/Sample_size_determination Sample size determination] from Wikipedia
 * Power and Sample Size Determination http://www.stat.wisc.edu/~st571-1/10-power-2.pdf#page=12
@@ Line 5: / Line 6: @@
 * [http://r-video-tutorial.blogspot.com/2017/07/power-analysis-and-sample-size.html Power analysis and sample size calculation for Agriculture] ('''pwr, lmSupport, simr''' packages are used)
 * [http://daniellakens.blogspot.com/2016/11/why-within-subject-designs-require-less.html Why Within-Subject Designs Require Fewer Participants than Between-Subject Designs]
+== Binomial distribution ==
+* [https://en.wikipedia.org/wiki/Binomial_test Binomial test]. Calculating a '''p-value''' for a two-tailed test is slightly more complicated, since a binomial distribution isn't symmetric if <math>\pi _{0}\neq 0.5</math>.
+* [https://predictivehacks.com/how-to-get-the-power-of-test-in-hypothesis-testing-with-binomial-distribution/ How To Get The Power Of Test In Hypothesis Testing With Binomial Distribution]
+* [https://www.statology.org/binomial-test-r/ How to Perform a Binomial Test in R]
+* [https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/binom.test ?binom.test]
 = Power analysis for default Bayesian t-tests =
 http://daniellakens.blogspot.com/2016/01/power-analysis-for-default-bayesian-t.html
-= Using simulation for power analysis: an example based on a stepped wedge study design =
+= Using simulation for power analysis =
-https://www.rdatagen.net/post/using-simulation-for-power-analysis-an-example/
+* [https://www.rdatagen.net/post/using-simulation-for-power-analysis-an-example/ An example based on a stepped wedge study design]
+* [https://www.rdatagen.net/post/2021-03-16-framework-for-power-analysis-using-simulation/ Framework for power analysis using simulation]
+== Pearson correlation ==
+<pre>
+set.seed(0)
+simulate_p_value <- function(sample_size) {
+  # Simulate data
+  x <- rnorm(sample_size)
+  y <- x + rnorm(sample_size)
+  # Calculate correlation and p-value
+  cor.test(x, y)$p.value
+}
+simulate_p_value_ttest <- function(sample_size) {
+  # Simulate data
+  x <- rnorm(sample_size, mean = 0)
+  y <- rnorm(sample_size, mean = 0.5)
+  # Calculate p-value
+  t.test(x, y)$p.value
+}
+sample_sizes <- c(10, 30, 50, 100, 300, 500, 1000)
+p_values <- sapply(sample_sizes, simulate_p_value)
+plot(sample_sizes, p_values, log = "y", type = "o",
+     xlab = "Sample Size", ylab = "p-value",
+     main = "Effect of Sample Size on p-value")
+</pre>
+= T-test case =
+T-test variance could be large if the sample size is small. Below is a case of 30 vs 5 and effect size .5.
+<pre>
+set.seed(13)
+hist(replicate(100, t.test(rnorm(30), rnorm(5)+.5)$p.value),
+     main='', xlab='pvalue')
+set.seed(13); summary(replicate(100, t.test(rnorm(30), rnorm(5)+.5)$p.value))
+#      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
+# 0.0000157 0.1730494 0.3178627 0.3970372 0.6592276 0.9999164
+</pre>
 = Power analysis and sample size calculation for Agriculture =
@@ Line 104: / Line 153: @@
 * [https://cran.r-project.org/web/packages/easypower/index.html easypower] w/ vignette
 * [https://cran.r-project.org/web/packages/pwr/index.html pwr] w/ vignette, https://www.statmethods.net/stats/power.html. The reference is Cohen's book.
+** [https://finnstats.com/index.php/2021/05/08/power-analysis-in-statistics/ Power analysis in Statistics with R]
 * [https://github.com/rpsychologist/powerlmm powerlmm] Power Analysis for Longitudinal Multilevel/Linear Mixed-Effects Models.
 * [https://cran.r-project.org/web/packages/ssize.fdr/index.html ssize.fdr] w/o vignette
@@ Line 109: / Line 159: @@
 * [https://cran.r-project.org/web/packages/ssizeRNA/index.html ssizeRNA] w/ vignette
 * power.t.test(), power.anova.test(), power.prop.test() from [https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html stats] package
+== RNA-seq ==
+* [https://academic.oup.com/bib/article/19/4/713/2920205#118987067 Feasibility of sample size calculation for RNA-seq studies] Poplawski 2018.
+** The ‘Scotty’ (MATLAB) tool performed best
+** ‘Scotty’, [https://cran.r-project.org/web/packages/ssizeRNA/index.html ssizeRNA] 2016 and [https://www.bioconductor.org/packages/release/bioc/html/PROPER.html PROPER] 2015 generated comparable results.
+** Bi et al. showed that ssizeRNA provided a more accurate estimate of power/sample size than RnaSeqSampleSize; ssizeRNA and RnaSeqSampleSize provided results much faster than PROPER. See [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291796/ Power and sample size calculations for high-throughput sequencing-based experiments] 2018
+* [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2445-2 Empirical assessment of the impact of sample number and read depth on RNA-Seq analysis workflow performance]
+* [https://youtu.be/WW94W-DBf2U?t=1260 Number of replicates needed] is dependent on the following that varies from gene to gene
+** Within group variance
+** Read coverage
+** Desired detectable effect size
+* [https://academic.oup.com/bioinformatics/article/33/21/3486/3952669?login=true powsimR: power analysis for bulk and single cell RNA-seq experiments] Vieth 2017
+* [https://www.bioconductor.org/packages/release/bioc/html/RnaSeqSampleSize.html RnaSeqSampleSize]
+* [https://bioconductor.org/packages/release/bioc/html/RNASeqPower.html RNASeqPower]. It was used in [https://www.nature.com/articles/s41598-019-43935-8 GREIN: An Interactive Web Platform for Re-analyzing GEO RNA-seq Data] 2019
+== ScRNA-seq ==
+[[ScRNA#Power_analysis|scRNA-seq]]
 = Russ Lenth Java applets =
@@ Line 118: / Line 185: @@
 = Multiple Testing Case =
 [https://www.tandfonline.com/doi/abs/10.1198/016214504000001646 Optimal Sample Size for Multiple Testing The Case of Gene Expression Microarrays]
+= Unbalanced randomization =
+[https://www.rdatagen.net/post/can-unbalanced-randomization-improve-power/ Can unbalanced randomization improve power?]
+[https://www.rdatagen.net/post/unbalanced-randomization-can-improve-power-in-some-situations/ Yes, unbalanced randomization can improve power, in some situations]
+= Prediction =
+[https://onlinelibrary.wiley.com/doi/10.1002/sim.9025 Minimum sample size for external validation of a clinical prediction model with a binary outcome] Riley 2021
+= High dimensional data =
+[https://hbiostat.org/bbr/hdata.html#sec-hdata-simor Simulation To Understand Needed Sample Sizes] from the online book '''Biostatistics for Biomedical Research'''
+= Survival data =
+[https://onlinelibrary.wiley.com/doi/10.1002/sim.9931 Sample size and predictive performance of machine learning methods with survival data: A simulation study] 2023. R code is in the supplementary.

Power: Difference between revisions

Latest revision as of 10:25, 14 November 2023

Contents

Power analysis/Sample Size determination

Binomial distribution

Power analysis for default Bayesian t-tests

Using simulation for power analysis

Pearson correlation

T-test case

Power analysis and sample size calculation for Agriculture

Power calculation for proportions (shiny app)

Derive the formula/manual calculation

Calculating required sample size in R and SAS

R package related to power analysis

RNA-seq

ScRNA-seq

Russ Lenth Java applets

Bootstrap method

Multiple Testing Case

Unbalanced randomization

Prediction

High dimensional data

Survival data

Navigation menu

Power: Difference between revisions

Latest revision as of 10:25, 14 November 2023

Power analysis/Sample Size determination

Binomial distribution

Power analysis for default Bayesian t-tests

Using simulation for power analysis

Pearson correlation

T-test case

Power analysis and sample size calculation for Agriculture

Power calculation for proportions (shiny app)

Derive the formula/manual calculation

Calculating required sample size in R and SAS

R package related to power analysis

RNA-seq

ScRNA-seq

Russ Lenth Java applets

Bootstrap method

Multiple Testing Case

Unbalanced randomization

Prediction

High dimensional data

Survival data

Navigation menu

Search