Batch effect: Difference between revisions

From 太極
= Merging two gene expression studies, ComBat =
<math>
\begin{align}
Y_{ijg} = \alpha_g + X \beta_g + \gamma_{ig} + \delta_{ig} \epsilon_{ijg}
\end{align}
</math>
where <math>Y_{ijg}</math> is the expression of gene g for sample j in batch i, X consists of covariates of scientific interest, and <span style="color: red"><math>\gamma_{ig}</math></span> and <span style="color: red"><math>\delta_{ig}</math></span> characterize the ''additive'' and ''multiplicative'' <span style="color: red">batch effects</span> of batch i for gene g.
The batch-corrected data are
<math>
\begin{align}
\frac{Y_{ijg} - \hat{\alpha}_g - X \hat{\beta}_g - \hat{\gamma}_{ig}}{\hat{\delta}_{ig}} + \hat{\alpha}_g + X \hat{\beta}_g
\end{align}
</math>
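As a rough illustration of the adjustment formula above, the following R sketch applies a naive per-gene location/scale correction to simulated data. It assumes there are no covariates (the X term is dropped) and uses plain per-batch means and scales as the estimates, without the empirical Bayes shrinkage that ComBat applies. The function name <code>adjust_gene</code> and the simulated data are hypothetical and only serve the illustration.

```r
# Naive location/scale batch adjustment for one gene at a time.
# Assumptions (not ComBat itself): no covariates X, no empirical Bayes shrinkage.
set.seed(1)
n_genes <- 5
batch <- factor(rep(c("b1", "b2"), times = c(6, 8)))

# Simulated expression matrix (genes x samples);
# batch b2 gets an additive shift of 2 and a 3-fold larger noise scale.
y <- matrix(rnorm(n_genes * length(batch)), nrow = n_genes)
is_b2 <- batch == "b2"
y[, is_b2] <- 2 + 3 * y[, is_b2]

adjust_gene <- function(v, batch) {
  alpha <- mean(v)                        # estimate of alpha_g (overall mean)
  batch_mean <- ave(v, batch)             # alpha_g + gamma_ig for each sample
  resid <- v - batch_mean                 # Y - alpha_g - gamma_ig
  sigma <- sqrt(mean(resid^2))            # pooled residual scale of the gene
  batch_sd <- sqrt(ave(resid^2, batch))   # residual scale within each batch
  delta <- batch_sd / sigma               # estimate of delta_ig (relative scale)
  resid / delta + alpha                   # (Y - alpha - gamma) / delta + alpha
}

# apply() returns samples x genes, so transpose back to genes x samples
y_adj <- t(apply(y, 1, adjust_gene, batch = batch))
```

After this adjustment every batch has the same per-gene mean and the same per-gene residual scale. ComBat differs in that it shrinks the batch estimates across genes before applying the correction.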
* [https://www.coursera.org/lecture/statistical-genomics/module-2-overview-1-12-cbqYZ Statistics for Genomic Data Science] (Coursera) and https://github.com/jtleek/genstats
* [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2263-6 Alternative empirical Bayes models for adjusting for batch effects in genomic studies] Zhang et al. BMC Bioinformatics 2018. The R package is [http://www.bioconductor.org/packages/release/bioc/html/BatchQC.html BatchQC] from Bioconductor.
* [https://biodatascience.github.io/compbio/dist/batch.html Batch effects and GC content] of NGS by Michael Love
* [https://www.jianshu.com/p/99b3411ad6ad The troublesome batch effect] (in Chinese)
* [https://mdozmorov.github.io/BIOS567/assets/presentation_diffexpression/batch.pdf Some notes] by Mikhail Dozmorov


= MultiBaC: Multiomic Batch effect Correction =
[https://www.bioconductor.org/packages/release/bioc/html/MultiBaC.html MultiBaC]
= ComBat or limma? =
[https://www.biostars.org/p/266507/ Batch effects : ComBat or removebatcheffects (limma package) ?]

Revision as of 20:41, 16 May 2022

Merging two gene expression studies, ComBat


# Install sva and the bladderbatch example data from Bioconductor
BiocManager::install(c("sva", "bladderbatch"))
library(sva)
library(bladderbatch)
data(bladderdata)
pheno = pData(bladderEset)   # sample annotation
edata = exprs(bladderEset)   # expression matrix (genes x samples)
batch = pheno$batch
table(pheno$cancer)
# Biopsy Cancer Normal 
#      9     40      8 
table(batch)
# batch
#  1  2  3  4  5 
# 11 18  4  5 19 

# mod: model matrix for the outcome of interest and other covariates besides batch.
# Here it is intercept-only (no covariates).
modcombat = model.matrix(~1, data=pheno)
combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat,
                      prior.plots=FALSE)
# This returns an expression matrix with the same dimensions
# as the original dataset (genes x samples).
# By default, ComBat performs parametric empirical Bayes adjustments.
# For nonparametric empirical Bayes adjustments,
# use the par.prior=FALSE option (this will take longer).

# Alternatively, use batch 1 as the reference for the adjustment.
# A separate name keeps the first result from being overwritten.
combat_edata_ref = ComBat(dat=edata, batch=batch, ref.batch=1)
