Revision as of 15:38, 10 July 2023

General

Bootstrap from Wikipedia.
- The article gives an overview of the different methods for computing bootstrap confidence intervals.
- The documentation of boot.ci() from the 'boot' package gives a short explanation of the different methods for computing bootstrap confidence intervals.
Bootstrapping made easy and tidy with slipper
bootstrap package. Functions and data for the book "An Introduction to the Bootstrap" by B. Efron and R. Tibshirani, 1993

boot package. Functions and datasets for bootstrapping from the book Bootstrap Methods and Their Application by A. C. Davison and D. V. Hinkley (1997, CUP). Short course material can be found here. The main functions are boot() and boot.ci().

https://www.rdocumentation.org/packages/boot/versions/1.3-20

R in Action Nonparametric bootstrapping

library(boot)

# Compute the bootstrapped 95% confidence interval for R-squared in the linear regression
rsq <- function(data, indices, formula) {
  d <- data[indices, ] # boot() supplies the resampled row indices
  fit <- lm(formula, data = d)
  return(summary(fit)$r.squared)
} # the 'formula' argument is optional; whether it is needed depends on the problem

# bootstrapping with 1000 replications
set.seed(1234)
bootobject <- boot(data=mtcars, statistic=rsq, R=1000, 
                   formula=mpg~wt+disp)
plot(bootobject) # or plot(bootobject, index = 1) if we have multiple statistics
ci <- boot.ci(bootobject, conf = .95, type=c("perc", "bca") ) 
    # default type is "all" which contains c("norm","basic", "stud", "perc", "bca"). 
    # 'bca' (Bias Corrected and Accelerated) by Efron 1987 uses 
    # percentiles but adjusted to account for bias and skewness.
# Level     Percentile            BCa          
# 95%   ( 0.6838,  0.8833 )   ( 0.6344,  0.8549 )
# Calculations and Intervals on Original Scale
# Some BCa intervals may be unstable
ci$bca[4:5]  
# [1] 0.6343589 0.8549305
# the midpoints of the two intervals differ, and neither has to equal the observed R-squared
mean(c(0.6838,  0.8833 ))
# [1] 0.78355
mean(c(0.6344,  0.8549 ))
# [1] 0.74465
summary(lm(mpg~wt+disp, data = mtcars))$r.squared
# [1] 0.7809306

Resampling Methods in R: The boot Package by Canty
An introduction to bootstrap with applications with R by Davison and Kuonen.
http://people.tamu.edu/~alawing/materials/ESSM689/Btutorial.pdf
http://statweb.stanford.edu/~tibs/sta305files/FoxOnBootingRegInR.pdf
http://www.stat.wisc.edu/~larget/stat302/chap3.pdf
https://www.stat.cmu.edu/~cshalizi/402/lectures/08-bootstrap/lecture-08.pdf. Variance, se, bias, confidence interval (basic, percentile), hypothesis testing, parametric & non-parametric bootstrap, bootstrapping regression models.
Understanding Bootstrap Confidence Interval Output from the R boot Package which covers the nonparametric and parametric bootstrap.

http://www.math.ntu.edu.tw/~hchen/teaching/LargeSample/references/R-bootstrap.pdf No package is used
http://web.as.uky.edu/statistics/users/pbreheny/621/F10/notes/9-21.pdf Bootstrap confidence interval
http://www-stat.wharton.upenn.edu/~stine/research/spida_2005.pdf
Optimism corrected bootstrapping (Harrell et al 1996)
- Adjusting for optimism/overfitting in measures of predictive ability using bootstrapping
- Part 1: Optimism corrected bootstrapping: a problematic method
- Part 2: Optimism corrected bootstrapping is definitely bias, further evidence
- Part 3: Two more implementations of optimism corrected bootstrapping show shocking bias
- Part 4: Why does bias occur in optimism corrected bootstrapping?
- Part 5: Code corrections to optimism corrected bootstrapping series
Bootstrapping Part 2: Calculating p-values!!! from StatQuest
Using bootstrapped sampling to assess variability in score predictions. The rsample (General Resampling Infrastructure) package was used.
Chapter 8 Bootstrapping and Confidence Intervals from the ebook "Statistical Inference via Data Science"

Nonparametric bootstrap

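This is the most common bootstrap method: the observed data are resampled with replacement, with no assumption about the underlying distribution.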
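
The nonparametric bootstrap can be sketched in base R without any package. This is only an illustration under stated assumptions: the data are simulated from an exponential distribution, the statistic is the median, and the names B, boot_medians, se_median and ci_perc are chosen here for the example.

```r
# Nonparametric bootstrap of the sample median, base R only.
set.seed(42)                      # arbitrary seed, for reproducibility
x <- rexp(50, rate = 1)           # simulated data (illustrative assumption)

B <- 2000                         # number of bootstrap replicates
# Each replicate: resample x with replacement, then recompute the statistic
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

se_median <- sd(boot_medians)                       # bootstrap standard error
ci_perc <- quantile(boot_medians, c(0.025, 0.975))  # 95% percentile interval

se_median
ci_perc
```

The percentile interval is the simplest of the interval types listed for boot.ci(); the other types need more than the raw replicates.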

The upstrap Crainiceanu & Crainiceanu, Biostatistics 2018

Parametric bootstrap

A parametric bootstrap draws new samples from an assumed distribution family whose parameters are estimated from the observed sample.
http://www.math.ntu.edu.tw/~hchen/teaching/LargeSample/notes/notebootstrap.pdf#page=3 No package is used
A parametric or non-parametric bootstrap?
https://www.stat.cmu.edu/~cshalizi/402/lectures/08-bootstrap/lecture-08.pdf#page=11
simulatorZ Bioc package
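
The parametric bootstrap idea in this section can be sketched as follows. This is a minimal example, assuming the data follow a Poisson model; the names lambda_hat, boot_means, se_param and se_theory are illustrative only. For the Poisson mean, the standard error has the closed form sqrt(lambda/n), which gives a convenient check.

```r
# Parametric bootstrap for the standard error of a Poisson mean.
set.seed(1234)
x <- rpois(300, 2)               # observed sample (simulated here)
n <- length(x)
lambda_hat <- mean(x)            # maximum likelihood estimate of the rate

B <- 5000                        # number of bootstrap replicates
set.seed(1)
# Each replicate: simulate a new sample from the FITTED model, not from x
boot_means <- replicate(B, mean(rpois(n, lambda_hat)))

se_param  <- sd(boot_means)      # parametric bootstrap standard error
se_theory <- sqrt(lambda_hat / n) # plug-in formula under the Poisson model

c(bootstrap = se_param, formula = se_theory)
```

The only difference from the nonparametric version is the sampling step: rpois(n, lambda_hat) replaces sample(x, replace = TRUE).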

Examples

Standard error

Standard error from a mean

foo <- function() mean(sample(x, replace = TRUE)) # resamples the global 'x' defined below
set.seed(1234)
x <- rnorm(300)
set.seed(1)
sd(replicate(10000, foo()))
# [1] 0.05717679
sd(x)/sqrt(length(x)) # The se of mean is s/sqrt(n)
# [1] 0.05798401

set.seed(1234)
x <- rpois(300, 2)
set.seed(1)
sd(replicate(10000, foo()))
# [1] 0.08038607
sd(x)/sqrt(length(x)) # The se of mean is s/sqrt(n)
# [1] 0.08183151
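
For comparison, the same standard error can be computed with boot() from the boot package. This is a sketch assuming the boot package is installed; mean_stat, se_boot and se_formula are names introduced here for illustration.

```r
library(boot)

set.seed(1234)
x <- rnorm(300)

# boot() calls the statistic with the data and a vector of resampled indices
mean_stat <- function(data, indices) mean(data[indices])

set.seed(1)
b <- boot(data = x, statistic = mean_stat, R = 10000)

se_boot    <- sd(b$t[, 1])            # sd of the bootstrap replicates
se_formula <- sd(x) / sqrt(length(x)) # s / sqrt(n)

c(bootstrap = se_boot, formula = se_formula)
```

Printing b also reports this value in its "std. error" column.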

Difference of means from two samples (cf. 8.3 The two-sample problem from the book "An Introduction to the Bootstrap" by Efron & Tibshirani)

# Define the two samples
sample1 <- 1:10
sample2 <- 11:20

# Define the number of bootstrap replicates
nboot <- 100000

# Initialize a vector to store the bootstrap estimates
boot_estimates <- numeric(nboot)

# Run the bootstrap
set.seed(123)
for (i in seq_len(nboot)) {
  # Resample the data with replacement
  resample1 <- sample(sample1, replace = TRUE)
  resample2 <- sample(sample2, replace = TRUE)
  
  # Compute the difference of means
  boot_estimates[i] <- mean(resample1) - mean(resample2)
}

# Compute the standard error of the bootstrap estimates
se_boot <- sd(boot_estimates)

# Print the result
cat("Bootstrap SE estimate of difference of means:", se_boot, "\n")
# 1.283541

# Calculate the sample standard deviations
sd1 <- sd(sample1)
sd2 <- sd(sample2)

# Calculate the sample sizes
n1 <- length(sample1)
n2 <- length(sample2)

# Calculate the formula-based standard error of the difference of means
# (an estimate from the sample variances, not the true population value)
se_formula <- sqrt((sd1^2/n1) + (sd2^2/n2))

# Print the result
cat("Formula-based SE of difference of means:", se_formula, "\n")
# 1.354006
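
The gap between the bootstrap value (about 1.28) and the formula value (about 1.35) in this example appears to come mostly from the variance divisor. Resampling from the observed data targets the plug-in variance, which divides by n, while sd() divides by n - 1. A small check, where plugin_var is a helper defined here for illustration:

```r
sample1 <- 1:10
sample2 <- 11:20
n1 <- length(sample1)
n2 <- length(sample2)

# Plug-in variance: divisor n instead of n - 1
plugin_var <- function(x) mean((x - mean(x))^2)

se_plugin   <- sqrt(plugin_var(sample1) / n1 + plugin_var(sample2) / n2)
se_unbiased <- sqrt(var(sample1) / n1 + var(sample2) / n2)

se_plugin    # about 1.2845, close to the bootstrap estimate
se_unbiased  # about 1.3540, the formula-based value
```

With only 10 observations per sample the factor sqrt((n - 1)/n) is noticeable; it shrinks toward 1 as n grows.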

Bootstrapping Extreme Value Estimators

Bootstrapping Extreme Value Estimators de Haan, 2022
