Batch effect: Difference between revisions

From 太極


* [https://www.coursera.org/lecture/statistical-genomics/module-2-overview-1-12-cbqYZ Statistics for Genomic Data Science] (Coursera) and https://github.com/jtleek/genstats
* Some possible batch variables: operators, runs, machines, library kits, laboratories.
* [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2263-6 Alternative empirical Bayes models for adjusting for batch effects in genomic studies] Zhang et al. BMC Bioinformatics 2018. The R package is [http://www.bioconductor.org/packages/release/bioc/html/BatchQC.html BatchQC] from Bioconductor.
* [https://www.rdocumentation.org/packages/sva/versions/3.20.0/topics/ComBat sva::ComBat()] function in [http://www.bioconductor.org/packages/release/bioc/html/sva.html sva] package from Bioconductor.

Revision as of 20:52, 16 May 2022

Merging two gene expression studies, ComBat

[math]\displaystyle{ \begin{align} Y_{ijg} = \alpha_g + X \beta_g + \gamma_{ig} + \delta_{ig} \epsilon_{ijg} \end{align} }[/math] where [math]\displaystyle{ Y_{ijg} }[/math] is the expression of gene g in sample j of batch i, X consists of covariates of scientific interest, and [math]\displaystyle{ \gamma_{ig} }[/math] and [math]\displaystyle{ \delta_{ig} }[/math] characterize the additive and multiplicative batch effects of batch i on gene g.

The batch-corrected data are [math]\displaystyle{ \begin{align} \frac{Y_{ijg} - \hat{\alpha}_g - X \hat{\beta}_g - \hat{\gamma}_{ig}}{\hat{\delta}_{ig}} + \hat{\alpha}_g + X \hat{\beta}_g \end{align} }[/math]
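To make the formula concrete, here is a minimal base-R sketch of the location and scale adjustment on simulated data. It is not ComBat itself: the estimates are plain per-batch means and standard deviations with no empirical Bayes shrinkage, there are no covariates (so the X β term drops out), and the multiplicative effect is taken relative to a pooled within-batch standard deviation. The names `adjust_location_scale`, `Y` and `batch` are made up for illustration.

```r
set.seed(1)

# Simulated data: 5 genes x 12 samples, two batches of 6 samples each.
n_genes <- 5
batch <- factor(rep(c("A", "B"), each = 6))
Y <- matrix(rnorm(n_genes * length(batch), mean = 8), nrow = n_genes)
# Give batch B an additive shift and a multiplicative scale change.
Y[, batch == "B"] <- 2 + 1.5 * Y[, batch == "B"]

# Hypothetical helper: gene-wise location/scale adjustment without shrinkage.
# Assumes every batch has at least two samples.
adjust_location_scale <- function(Y, batch) {
  batch <- as.factor(batch)
  alpha_hat <- rowMeans(Y)  # gene-wise overall mean

  # Pooled within-batch standard deviation per gene.
  centered <- Y
  for (b in levels(batch)) {
    idx <- batch == b
    centered[, idx] <- Y[, idx, drop = FALSE] - rowMeans(Y[, idx, drop = FALSE])
  }
  sigma_hat <- sqrt(rowSums(centered^2) / (ncol(Y) - nlevels(batch)))

  out <- Y
  for (b in levels(batch)) {
    idx <- batch == b
    Yb <- Y[, idx, drop = FALSE]
    gamma_hat <- rowMeans(Yb) - alpha_hat      # additive batch effect
    delta_hat <- apply(Yb, 1, sd) / sigma_hat  # multiplicative batch effect
    # (Y - alpha - gamma) / delta + alpha, applied gene by gene
    out[, idx] <- (Yb - alpha_hat - gamma_hat) / delta_hat + alpha_hat
  }
  out
}

Y_adj <- adjust_location_scale(Y, batch)

# Per-gene batch means before and after the adjustment.
round(sapply(levels(batch), function(b) rowMeans(Y[, batch == b])), 2)
round(sapply(levels(batch), function(b) rowMeans(Y_adj[, batch == b])), 2)
```

After the adjustment each gene has the same mean and standard deviation in every batch. ComBat additionally shrinks the per-gene batch estimates toward values pooled across genes, which matters most when batches are small.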

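# Install once from Bioconductor (bladderbatch provides the example data)
BiocManager::install(c("sva", "bladderbatch"))
library(sva)
library(Biobase)        # pData() and exprs()
library(bladderbatch)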
data(bladderdata)
pheno = pData(bladderEset)
edata = exprs(bladderEset)
batch = pheno$batch
table(pheno$cancer)
# Biopsy Cancer Normal 
#      9     40      8 
table(batch)
# batch
#  1  2  3  4  5 
# 11 18  4  5 19 

modcombat = model.matrix(~1, data=pheno)
combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat, 
                      prior.plots=FALSE)
# This returns an expression matrix, with the same dimensions 
# as your original dataset (genes x samples).
# mod: Model matrix for outcome of interest and other covariates besides batch
# By default, it performs parametric empirical Bayesian adjustments. 
# If you would like to use nonparametric empirical Bayesian adjustments, 
# use the par.prior=FALSE option (this will take longer). 

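# Alternatively, adjust the other batches toward a reference batch;
# samples in the reference batch (here batch 1) are left unchanged.
combat_edata = ComBat(dat=edata, batch=batch, ref.batch=1)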
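
The first call above passes an intercept-only `mod`, so the adjustment is not told about any biological grouping. When a covariate of interest should be preserved, the design matrix given to `mod` would include it. The sketch below only builds such a matrix in base R from a hypothetical phenotype table (`toy_pheno` and its values are invented for illustration) and checks that the covariate is not fully confounded with batch, since a confounded covariate cannot be separated from the batch effect.

```r
# Hypothetical phenotype table: 8 samples, 3 batches, 2 biological groups.
toy_pheno <- data.frame(
  batch  = factor(c(1, 1, 1, 2, 2, 2, 3, 3)),
  cancer = factor(c("Normal", "Cancer", "Cancer", "Normal",
                    "Cancer", "Normal", "Cancer", "Normal"))
)

# Design matrix for the covariate of interest (batch itself is NOT included).
mod_toy <- model.matrix(~ cancer, data = toy_pheno)

# Cross-tabulate batch against the covariate to see how they overlap.
confounding <- table(toy_pheno$batch, toy_pheno$cancer)
print(confounding)

# If batch and covariate are confounded, the joint design loses rank.
design_full <- model.matrix(~ batch + cancer, data = toy_pheno)
full_rank <- qr(design_full)$rank == ncol(design_full)
print(full_rank)
```

A matrix built the same way from the real `pheno` table would be passed as the `mod` argument of `ComBat()`.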

MultiBaC: Multiomic Batch effect Correction

MultiBaC

Combat or limma?

Batch effects : ComBat or removebatcheffects (limma package) ?