Power
Revision as of 08:29, 9 April 2022
Power analysis/Sample Size determination
- https://en.m.wikipedia.org/wiki/Power_(statistics)
- Sample size determination from Wikipedia
- Power and Sample Size Determination http://www.stat.wisc.edu/~st571-1/10-power-2.pdf#page=12
- http://biostat.mc.vanderbilt.edu/wiki/pub/Main/AnesShortCourse/HypothesisTestingPart1.pdf#page=40
- Power analysis and sample size calculation for Agriculture (uses the pwr, lmSupport and simr packages)
- Why Within-Subject Designs Require Fewer Participants than Between-Subject Designs
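The within-subject advantage can be illustrated with stats::power.t.test(). This is a minimal sketch, assuming the mean difference and SD from the two-sample example on this page (delta = 200, sigma = 450) and a hypothetical correlation rho = 0.5 between a subject's two measurements. Under those assumptions a paired difference has SD sigma * sqrt(2 * (1 - rho)), so the paired test needs far fewer people.

```r
# Between- vs within-subject designs for the same mean difference.
# Hypothetical inputs: delta and sigma echo the two-sample example on this
# page; rho (correlation between a subject's two measurements) is assumed.
delta <- 200
sigma <- 450
rho   <- 0.5

# Between-subject: n is the number of subjects in *each* of two groups
between <- power.t.test(delta = delta, sd = sigma, power = 0.8,
                        type = "two.sample")

# Within-subject: each subject is measured under both conditions and the
# test is on paired differences, whose SD is sigma * sqrt(2 * (1 - rho))
sd_diff <- sigma * sqrt(2 * (1 - rho))
within  <- power.t.test(delta = delta, sd = sd_diff, power = 0.8,
                        type = "paired")

# Total number of participants each design needs
c(between = 2 * ceiling(between$n), within = ceiling(within$n))
```

A larger rho shrinks sd_diff and the required n further. Even at rho = 0 the within-subject design needs roughly the same number of measurements but only about half as many people.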
Binomial distribution
- Binomial test. Calculating a p-value for a two-tailed test is slightly more complicated, since a binomial distribution isn't symmetric if [math]\displaystyle{ \pi _{0}\neq 0.5 }[/math].
- How To Get The Power Of Test In Hypothesis Testing With Binomial Distribution
- How to Perform a Binomial Test in R
- ?binom.test
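As a sketch of computing power for an exact binomial test in R: the rejection region is the set of success counts whose binom.test() p-value is at most alpha, and power is the probability of that set under the alternative. The helper binom_power() and all sample sizes and proportions are hypothetical.

```r
# Exact power of a two-sided binomial test, found by enumerating outcomes.
# binom_power() is a hypothetical helper, not a function from any package.
binom_power <- function(n, p0, p1, alpha = 0.05) {
  x <- 0:n
  # exact two-sided p-value from binom.test() for every possible success count
  pvals <- vapply(x, function(k) binom.test(k, n, p = p0)$p.value, numeric(1))
  reject <- pvals <= alpha
  # power = probability under the alternative p1 of landing in the rejection region
  sum(dbinom(x[reject], n, p1))
}

# Hypothetical example: H0 p = 0.5 vs true p = 0.7
binom_power(n = 50, p0 = 0.5, p1 = 0.7)
binom_power(n = 100, p0 = 0.5, p1 = 0.7)
# With p0 != 0.5 the null distribution is skewed, so the two tails of the
# rejection region are not mirror images of each other
binom_power(n = 50, p0 = 0.3, p1 = 0.5)
```

Because the test is discrete, power can dip slightly as n increases (a sawtooth pattern), so it is worth checking neighboring values of n rather than a single one.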
Power analysis for default Bayesian t-tests
http://daniellakens.blogspot.com/2016/01/power-analysis-for-default-bayesian-t.html
Using simulation for power analysis
Power analysis and sample size calculation for Agriculture
http://r-video-tutorial.blogspot.com/2017/07/power-analysis-and-sample-size.html
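The simulation approach can be sketched in a few lines of base R: generate many data sets under the assumed alternative, run the planned test on each, and take the proportion of rejections as the power estimate. This assumes normal outcomes and reuses the numbers from the two-sample example on this page; sim_power() is a hypothetical helper.

```r
# Simulation-based power for a two-sample t test.
# sim_power() is a hypothetical helper; values mirror the example on this page.
sim_power <- function(n, delta, sigma, alpha = 0.05, nsim = 2000) {
  rejected <- replicate(nsim, {
    x <- rnorm(n, mean = 0,     sd = sigma)   # control group
    y <- rnorm(n, mean = delta, sd = sigma)   # treatment group
    t.test(x, y, var.equal = TRUE)$p.value <= alpha
  })
  mean(rejected)   # proportion of simulated studies that reject H0
}

set.seed(2022)
est <- sim_power(n = 80, delta = 200, sigma = 450)
est
# compare with the analytic answer
power.t.test(n = 80, delta = 200, sd = 450)$power
```

The Monte Carlo standard error is about sqrt(p * (1 - p) / nsim), roughly 0.009 here. The payoff of simulation is that the data-generating step can be replaced by any model (skewed outcomes, clustering, mixed models) for which no closed-form formula exists.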
Power calculation for proportions (shiny app)
https://juliasilge.shinyapps.io/power-app/
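The same kind of calculation can be done in base R with stats::power.prop.test(). A minimal sketch with hypothetical proportions (0.50 vs 0.75, 90% power), checked against the usual normal-approximation formula for two proportions, which should agree with power.prop.test() under its default strict = FALSE up to root-finding tolerance:

```r
# Sample size per group for comparing two proportions (hypothetical p1, p2).
p1 <- 0.50
p2 <- 0.75
res <- power.prop.test(p1 = p1, p2 = p2, power = 0.90, sig.level = 0.05)
res$n

# The same n from the normal-approximation formula
pbar <- (p1 + p2) / 2
z_a  <- qnorm(1 - 0.05 / 2)
z_b  <- qnorm(0.90)
n_formula <- (z_a * sqrt(2 * pbar * (1 - pbar)) +
              z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / (p1 - p2)^2
n_formula

# Round up to whole subjects and confirm the target power is reached
power.prop.test(n = ceiling(res$n), p1 = p1, p2 = p2)$power
```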
Derive the formula/manual calculation
- One-sample one-sided test, one-sample two-sided test
- Two-sample two-sided test, normal approximation ([math]\displaystyle{ n }[/math] is the sample size in each group)
- [math]\displaystyle{ \begin{align} Power & = P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2/n + \sigma^2/n}} \gt Z_{\alpha /2}) + P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2/n + \sigma^2/n}} \lt -Z_{\alpha /2}) \\ & \approx P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2/n + \sigma^2/n}} \gt Z_{\alpha /2}) \\ & = P_{\mu_1-\mu_2 = \Delta}(\frac{\bar{X}_1 - \bar{X}_2 - \Delta}{\sqrt{2 * \sigma^2/n}} \gt Z_{\alpha /2} - \frac{\Delta}{\sqrt{2 * \sigma^2/n}}) \\ & = \Phi(-(Z_{\alpha /2} - \frac{\Delta}{\sqrt{2 * \sigma^2/n}})) \\ & = 1 - \beta =\Phi(Z_\beta) \end{align} }[/math]
Therefore
- [math]\displaystyle{ \begin{align} Z_{\beta} &= - Z_{\alpha/2} + \frac{\Delta}{\sqrt{2 * \sigma^2/n}} \\ Z_{\beta} + Z_{\alpha/2} & = \frac{\Delta}{\sqrt{2 * \sigma^2/n}} \\ 2 * (Z_{\beta} + Z_{\alpha/2})^2 * \sigma^2/\Delta^2 & = n \\ n & = 2 * (Z_{\beta} + Z_{\alpha/2})^2 * \sigma^2/\Delta^2 \end{align} }[/math]
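# Check the derivation numerically: alpha = .05, delta = 200, n = 79.5 per group, sigma = 450
# Power including both rejection tails
1 - pnorm(1.96 - 200*sqrt(79.5)/(sqrt(2)*450)) + pnorm(-1.96 - 200*sqrt(79.5)/(sqrt(2)*450))
# [1] 0.8
# Contribution of the far (lower) tail, which the approximation drops
pnorm(-1.96 - 200*sqrt(79.5)/(sqrt(2)*450))
# [1] 9.58e-07
# Upper tail alone
1 - pnorm(1.96 - 200*sqrt(79.5)/(sqrt(2)*450))
# [1] 0.8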
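The derivation can be wrapped in two small functions so the power and the sample size can be computed for any inputs. The names z_power() and z_n() are hypothetical; sides = 1 gives the one-sided version, in which [math]\displaystyle{ Z_{\alpha/2} }[/math] becomes [math]\displaystyle{ Z_\alpha }[/math]. The sketch assumes a known common [math]\displaystyle{ \sigma }[/math] and [math]\displaystyle{ \Delta > 0 }[/math].

```r
# Normal-approximation power of the two-sample z test (n per group),
# keeping both tails as in the first line of the derivation; assumes delta > 0.
z_power <- function(n, delta, sigma, alpha = 0.05, sides = 2) {
  z_crit <- qnorm(1 - alpha / sides)
  shift  <- delta / sqrt(2 * sigma^2 / n)
  upper  <- 1 - pnorm(z_crit - shift)
  lower  <- if (sides == 2) pnorm(-z_crit - shift) else 0  # far tail, tiny
  upper + lower
}

# Closed-form n per group from the last line of the derivation
# (obtained after dropping the far tail).
z_n <- function(delta, sigma, alpha = 0.05, power = 0.8, sides = 2) {
  2 * (qnorm(1 - alpha / sides) + qnorm(power))^2 * sigma^2 / delta^2
}

n2 <- z_n(delta = 200, sigma = 450)             # two-sided
n1 <- z_n(delta = 200, sigma = 450, sides = 1)  # one-sided
c(two_sided = n2, one_sided = n1)
z_power(n2, delta = 200, sigma = 450)           # back to (about) 0.8
```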
Calculating required sample size in R and SAS
The pwr package is used. For a two-sided test, the formula for the sample size per group is
- [math]\displaystyle{ n_{\mbox{each group}} = \frac{2 * (Z_{\alpha/2} + Z_\beta)^2 * \sigma^2}{\Delta^2} = \frac{2 * (Z_{\alpha/2} + Z_\beta)^2}{d^2} }[/math]
where [math]\displaystyle{ Z_\alpha }[/math] is the value of the standard normal distribution that cuts off an upper-tail probability of [math]\displaystyle{ \alpha }[/math], [math]\displaystyle{ \Delta }[/math] is the difference sought, [math]\displaystyle{ \sigma }[/math] is the presumed standard deviation of the outcome, [math]\displaystyle{ \alpha }[/math] is the type I error rate, [math]\displaystyle{ \beta }[/math] is the type II error rate (so the power is [math]\displaystyle{ 1-\beta }[/math]), and (Cohen's) [math]\displaystyle{ d = \Delta/\sigma }[/math] is the effect size: the difference between the means divided by the pooled standard deviation.
# An example from http://www.stat.columbia.edu/~gelman/stuff_for_blog/c13.pdf#page=3
# Method 1: pwr package, with effect size d = delta / sigma
library(pwr)
pwr.t.test(d = 200/450, power = .8, sig.level = .05, type = "two.sample", alternative = "two.sided")
#
#      Two-sample t test power calculation
#
#               n = 80.4
#               d = 0.444
#       sig.level = 0.05
#           power = 0.8
#     alternative = two.sided
#
# NOTE: n is number in *each* group

# Method 2: the closed-form normal-approximation formula
2*(qnorm(.975) + qnorm(.8))^2*450^2/(200^2)
# [1] 79.5
2*(1.96 + .84)^2*450^2 / (200^2)
# [1] 79.4
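Alternatively, use the stats::power.t.test() function. Like pwr.t.test(), it uses the t distribution rather than the normal approximation, so at n = 79.5 per group the power comes out slightly below 0.8.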
power.t.test(n = 79.5, delta = 200, sd = 450, sig.level = .05, type = "two.sample", alternative = "two.sided")
#
#      Two-sample t test power calculation
#
#               n = 79.5
#           delta = 200
#              sd = 450
#       sig.level = 0.05
#           power = 0.795
#     alternative = two.sided
#
# NOTE: n is number in *each* group
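To see where the 0.795 comes from, the power.t.test() result can be reproduced with qt() and pt() and the noncentrality parameter [math]\displaystyle{ \Delta/(\sigma\sqrt{2/n}) }[/math] on [math]\displaystyle{ 2(n-1) }[/math] degrees of freedom. This is a sketch for the two-sample, equal-n case that, like power.t.test() by default, ignores the far rejection tail. Leaving n unspecified makes power.t.test() solve for it instead.

```r
# Reproduce the power reported by power.t.test() with the noncentral t
# distribution (two-sample, equal n per group, far tail ignored).
n      <- 79.5
delta  <- 200
sigma  <- 450
nu     <- 2 * (n - 1)                      # degrees of freedom
ncp    <- delta / (sigma * sqrt(2 / n))    # noncentrality parameter
t_crit <- qt(0.975, nu)                    # two-sided 5% critical value
power_manual <- pt(t_crit, nu, ncp = ncp, lower.tail = FALSE)
power_manual
power.t.test(n = n, delta = delta, sd = sigma)$power

# Solving for n instead (leave n unspecified)
n_t <- power.t.test(delta = delta, sd = sigma, power = 0.8)$n
n_t
```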
CRAN Task View: Design of Experiments
- powerAnalysis w/o vignette
- powerbydesign w/o vignette
- easypower w/ vignette
- pwr w/ vignette, https://www.statmethods.net/stats/power.html. The reference is Cohen's book.
- powerlmm Power Analysis for Longitudinal Multilevel/Linear Mixed-Effects Models.
- ssize.fdr w/o vignette
- samplesize w/o vignette
- ssizeRNA w/ vignette
- power.t.test(), power.anova.test(), power.prop.test() from stats package
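Of the stats functions listed, power.anova.test() covers one-way ANOVA. A minimal sketch with hypothetical inputs: between.var is the variance of the group means and within.var is the error variance. Leaving n unspecified solves for the per-group sample size; giving n returns the power.

```r
# One-way ANOVA: per-group n for 4 groups (hypothetical variances)
res <- power.anova.test(groups = 4, between.var = 1, within.var = 3,
                        power = 0.80)
res$n            # subjects per group (fractional)
ceiling(res$n)   # round up to whole subjects

# Power achieved with a fixed n per group
power.anova.test(groups = 4, n = 5, between.var = 1, within.var = 3)$power
```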
RNA-seq
- Feasibility of sample size calculation for RNA-seq studies (Poplawski 2018)
  - The ‘Scotty’ (MATLAB) tool performed best.
  - ‘Scotty’, ssizeRNA (2016) and PROPER (2015) generated comparable results.
  - Bi et al. showed that ssizeRNA provided a more accurate estimate of power/sample size than RnaSeqSampleSize, and that ssizeRNA and RnaSeqSampleSize returned results much faster than PROPER. See Power and sample size calculations for high-throughput sequencing-based experiments (2018).
- Empirical assessment of the impact of sample number and read depth on RNA-Seq analysis workflow performance
- The number of replicates needed depends on the following factors, each of which varies from gene to gene:
  - Within-group variance
  - Read coverage
  - Desired detectable effect size
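How these three factors interact for a single gene can be sketched with a delta-method approximation, assuming negative binomial counts with mean depth and biological coefficient of variation cv, so that [math]\displaystyle{ \operatorname{Var}(\log Y) \approx 1/\mu + cv^2 }[/math], and a normal test on log counts. The helper rnaseq_n() and every input value are hypothetical.

```r
# Per-gene sample size under a negative binomial approximation.
# rnaseq_n() is a hypothetical helper; depth = mean count for the gene,
# cv = biological coefficient of variation, fold = fold change to detect.
rnaseq_n <- function(depth, cv, fold, alpha = 0.05, power = 0.8) {
  # delta method: Var(log count) ~ 1/depth (Poisson part) + cv^2 (biological)
  var_log <- 1 / depth + cv^2
  2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 * var_log / log(fold)^2
}

# How each factor from the list changes the replicates needed per group
base <- rnaseq_n(depth = 20, cv = 0.4, fold = 2)   # baseline
base
rnaseq_n(depth = 200, cv = 0.4, fold = 2)          # more read coverage
rnaseq_n(depth = 20,  cv = 0.8, fold = 2)          # larger within-group variance
rnaseq_n(depth = 20,  cv = 0.4, fold = 1.5)        # smaller effect size
```

Once depth is high, the cv^2 term dominates, so extra replicates help more than extra reads. This per-gene calculation also uses a single alpha and ignores testing thousands of genes at once, which dedicated tools such as ssizeRNA and PROPER take into account.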
ScRNA-seq
Russ Lenth's Java applets
https://homepage.divms.uiowa.edu/~rlenth/Power/index.html
Bootstrap method
The upstrap (Crainiceanu & Crainiceanu, Biostatistics 2018)
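The upstrap resamples the observed data with replacement at a sample size different from the original one (larger, when asking about power at a bigger study). A minimal sketch of that idea for a one-sample t test, assuming simulated stand-in data in place of a real pilot sample; upstrap_power() is a hypothetical helper, not the authors' code.

```r
# Upstrap-style power estimate: resample the observed data with replacement
# at a target size m and record how often the test rejects.
# upstrap_power() is a hypothetical helper, not code from the paper.
upstrap_power <- function(x, m, alpha = 0.05, nboot = 1000) {
  rejected <- replicate(nboot, {
    xs <- sample(x, size = m, replace = TRUE)
    t.test(xs)$p.value <= alpha      # one-sample t test of mean = 0
  })
  mean(rejected)
}

set.seed(74)
x <- rnorm(30, mean = 0.5, sd = 1)   # stand-in for an observed pilot sample
sapply(c(20, 30, 60, 120), function(m) upstrap_power(x, m))
```

Because every resample comes from the observed data, the estimate treats the observed effect as the true one, so it shares the optimism of "observed power" calculations.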
Multiple Testing Case
Optimal Sample Size for Multiple Testing: The Case of Gene Expression Microarrays
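A much simpler approach than the one in the paper above is to apply a Bonferroni correction to the per-test significance level and solve power.t.test() for n. This sketch only shows how the required sample size grows with the number of tests m; the effect size (delta = 1 SD) and the values of m are hypothetical.

```r
# Sample size per group when m tests are run, using a Bonferroni-adjusted
# significance level (a simple stand-in, not the paper's method).
m_tests <- c(1, 100, 1000, 10000)
n_per_group <- sapply(m_tests, function(m) {
  power.t.test(delta = 1, sd = 1, power = 0.8, sig.level = 0.05 / m)$n
})
data.frame(m_tests, n_per_group = ceiling(n_per_group))
```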
Unbalanced randomization
Can unbalanced randomization improve power?
Yes, in some situations unbalanced randomization can improve power.
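One such situation is unequal outcome variances. For a fixed total N, the variance of the difference in means is [math]\displaystyle{ \sigma_1^2/n_1 + \sigma_2^2/n_2 }[/math], which is minimized when [math]\displaystyle{ n_1/n_2 = \sigma_1/\sigma_2 }[/math]; with equal SDs this reduces to a 1:1 split. A sketch with a normal-approximation test, where alloc_power() and all numbers are hypothetical:

```r
# Power of a two-sample z test when a fraction frac of N subjects goes to group 1.
# alloc_power() is a hypothetical helper; all numbers are illustrative.
alloc_power <- function(frac, N, delta, sd1, sd2, alpha = 0.05) {
  se <- sqrt(sd1^2 / (frac * N) + sd2^2 / ((1 - frac) * N))
  pnorm(delta / se - qnorm(1 - alpha / 2))   # far tail ignored
}

N <- 100
delta <- 0.5
# Equal SDs: a 1:1 split maximizes power
best_equal <- optimize(alloc_power, c(0.05, 0.95), N = N, delta = delta,
                       sd1 = 1, sd2 = 1, maximum = TRUE)$maximum
# Unequal SDs (sd1 = 2 * sd2): the best split is sd1 / (sd1 + sd2) = 2/3
best_unequal <- optimize(alloc_power, c(0.05, 0.95), N = N, delta = delta,
                         sd1 = 2, sd2 = 1, maximum = TRUE)$maximum
c(best_equal = best_equal, best_unequal = best_unequal)
c(balanced   = alloc_power(0.5, N, delta, 2, 1),
  two_to_one = alloc_power(2/3, N, delta, 2, 1))
```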