Batch effect: Difference between revisions

From 太極
(Created page with "= Merging two gene expression studies, ComBat = * [https://www.coursera.org/lecture/statistical-genomics/module-2-overview-1-12-cbqYZ Statistics for Genomic Data Science] (Cou...")
 
Line 6: Line 6:
** It can remove both known batch effects and other potential latent sources of variation.
** It can remove both known batch effects and other potential latent sources of variation.
** The tutorial includes information on (1) how to estimate the number of latent sources of variation, (2) how to apply the sva package to estimate latent variables such as batch effects, (3) how to directly remove known batch effects using the ComBat function, (4) how to perform differential expression analysis using surrogate variables either directly or with the limma package, and (5) how to apply “frozen” sva to improve prediction and clustering.
** The tutorial includes information on (1) how to estimate the number of latent sources of variation, (2) how to apply the sva package to estimate latent variables such as batch effects, (3) how to directly remove known batch effects using the ComBat function, (4) how to perform differential expression analysis using surrogate variables either directly or with the limma package, and (5) how to apply “frozen” sva to improve prediction and clustering.
** [https://bmccancer.biomedcentral.com/track/pdf/10.1186/s12885-018-4546-8#page=4 Figure S1 shows the principal component analysis (PCA) before and after batch effect correction for training and validation datasets]
** Figure 1 shows 3 heatmaps. Each contains column annotation including Time, Treatment and Batch variables. a) No adjustment, '''b) standardize each gene within each batch''' (implemented in dChip software), c) EB batch adjustment. Note that there is no strong evidence of batch effects after adjustment in heat maps (b)–(c).
** [https://bmccancer.biomedcentral.com/track/pdf/10.1186/s12885-018-4546-8#page=4 Figure S1 shows the principal component analysis (PCA) before and after batch effect correction for training and validation datasets] from another paper
** [https://www.bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf#page=7 Tutorial example] to remove the batch effect  
** [https://www.bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf#page=7 Tutorial example] to remove the batch effect  
:<syntaxhighlight lang='bash'>
:<syntaxhighlight lang='bash'>
Line 16: Line 17:
edata = exprs(bladderEset)
edata = exprs(bladderEset)
batch = pheno$batch
batch = pheno$batch
table(pheno$cancer)
# Biopsy Cancer Normal
#      9    40      8
table(batch)
# batch
#  1  2  3  4  5
# 11 18  4  5 19
modcombat = model.matrix(~1, data=pheno)
modcombat = model.matrix(~1, data=pheno)
combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat,  
combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat,  
                       par.prior=TRUE, prior.plots=FALSE)
                       prior.plots=FALSE)
# This returns an expression matrix, with the same dimensions  
# This returns an expression matrix, with the same dimensions  
# as your original dataset.
# as your original dataset (genes x samples).
# mod: Model matrix for outcome of interest and other covariates besides batch
# By default, it performs parametric empirical Bayesian adjustments.  
# By default, it performs parametric empirical Bayesian adjustments.  
# If you would like to use nonparametric empirical Bayesian adjustments,  
# If you would like to use nonparametric empirical Bayesian adjustments,  
# use the par.prior=FALSE option (this will take longer).  
# use the par.prior=FALSE option (this will take longer).  
combat_edata = ComBat(dat=edata, batch=batch, ref.batch=1)
</syntaxhighlight>
</syntaxhighlight>
* '''ref.batch''' for reference-based batch adjustment. Use the '''mean.only''' option if there is no need to adjust the variance. See the paper [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2263-6 Alternative empirical Bayes models for adjusting for batch effects in genomic studies] (Zhang 2018).
* '''ref.batch''' for reference-based batch adjustment. Use the '''mean.only''' option if there is no need to adjust the variance. See the paper [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2263-6 Alternative empirical Bayes models for adjusting for batch effects in genomic studies] (Zhang 2018). In the simulated study, Figure 4 shows that reference-based ComBat clearly reveals the pathway-activated samples in Batch 1 and the true data pattern in Batch 2, whereas the original ComBat approach failed in both cases. In Figure 5, when genes are clustered using K-means, reference-based ComBat better identifies the role of DE or control genes than the original ComBat method.
* [https://academic.oup.com/bioinformatics/article/24/9/1154/206630 Merging two gene-expression studies via cross-platform normalization] by Shabalin et al, Bioinformatics 2008. This method (called '''Cross-Platform Normalization/XPN''') was used by Ternès, Biometrical Journal 2017.
* [https://academic.oup.com/bioinformatics/article/24/9/1154/206630 Merging two gene-expression studies via cross-platform normalization] by Shabalin et al, Bioinformatics 2008. This method (called '''Cross-Platform Normalization/XPN''') was used by Ternès, Biometrical Journal 2017.
* [https://academic.oup.com/bib/article/14/4/469/191565 Batch effect removal methods for microarray gene expression data integration: a survey] by Lazar et al, Briefings in Bioinformatics 2012. The R package is '''[http://bioconductor.org/packages/3.3/bioc/html/inSilicoMerging.html inSilicoMerging]''', which has been removed from Bioconductor 3.4.
* [https://academic.oup.com/bib/article/14/4/469/191565 Batch effect removal methods for microarray gene expression data integration: a survey] by Lazar et al, Briefings in Bioinformatics 2012. The R package is '''[http://bioconductor.org/packages/3.3/bioc/html/inSilicoMerging.html inSilicoMerging]''', which has been removed from Bioconductor 3.4.
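
The simpler adjustment mentioned above for panel (b) of Figure 1, standardizing each gene within each batch, can be sketched in base R. This is only an illustration on simulated data, not the dChip implementation: the helper name <code>standardize_within_batch</code>, the simulated matrix and the PC1 "gap" summary are all assumptions made for the example. Note that this kind of standardization also removes any real biological difference that happens to be confounded with batch.

```r
set.seed(42)

# Simulated data (an assumption for illustration): 100 genes x 16 samples,
# where batch "B" has a shifted mean and an inflated variance.
n_genes <- 100
batch <- factor(rep(c("A", "B"), each = 8))
expr <- matrix(rnorm(n_genes * length(batch)), nrow = n_genes)
expr[, batch == "B"] <- expr[, batch == "B"] * 1.5 + 3

# Hypothetical helper: center and scale every gene (row) separately
# within each batch.
standardize_within_batch <- function(x, batch) {
  out <- x
  for (b in levels(batch)) {
    idx <- batch == b
    # scale() works on columns, so transpose to standardize genes
    out[, idx] <- t(scale(t(x[, idx, drop = FALSE])))
  }
  out
}

expr_std <- standardize_within_batch(expr, batch)

# PCA on samples: compare how far apart the batches sit on PC1
pc_before <- prcomp(t(expr))$x[, 1]
pc_after <- prcomp(t(expr_std))$x[, 1]
gap_before <- abs(mean(pc_before[batch == "A"]) - mean(pc_before[batch == "B"]))
gap_after <- abs(mean(pc_after[batch == "A"]) - mean(pc_after[batch == "B"]))
```

Comparing <code>gap_before</code> with <code>gap_after</code> gives a rough numeric counterpart to inspecting a PCA plot before and after correction.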

Revision as of 16:05, 16 May 2022

Merging two gene expression studies, ComBat

# bladderbatch provides the example data set used below
BiocManager::install(c("sva", "bladderbatch"))
library(sva)
library(bladderbatch)
data(bladderdata)
pheno = pData(bladderEset)
edata = exprs(bladderEset)
batch = pheno$batch
table(pheno$cancer)
# Biopsy Cancer Normal 
#      9     40      8 
table(batch)
# batch
#  1  2  3  4  5 
# 11 18  4  5 19 

modcombat = model.matrix(~1, data=pheno)
combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat, 
                      prior.plots=FALSE)
# This returns an expression matrix, with the same dimensions 
# as your original dataset (genes x samples).
# mod: Model matrix for outcome of interest and other covariates besides batch
# By default, it performs parametric empirical Bayesian adjustments. 
# If you would like to use nonparametric empirical Bayesian adjustments, 
# use the par.prior=FALSE option (this will take longer). 

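# Reference-based adjustment: batch 1 is left unchanged and the
# other batches are adjusted toward it.
combat_edata = ComBat(dat=edata, batch=batch, ref.batch=1)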
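
To see what the ref.batch and mean.only options do in practice, here is a small sketch on simulated data. The simulated matrix, the gene and sample names, and the two-batch layout are assumptions made for illustration; only the ComBat arguments themselves come from the sva package. It checks that the reference batch comes back unchanged and that a pure mean shift is largely removed by the mean-only adjustment.

```r
library(sva)

set.seed(1)

# Simulated log-scale expression (illustrative only): 200 genes,
# 10 samples per batch, with an additive shift of 2 in batch 2.
n_genes <- 200
batch <- rep(c(1, 2), each = 10)
edata_sim <- matrix(rnorm(n_genes * length(batch)), nrow = n_genes)
edata_sim[, batch == 2] <- edata_sim[, batch == 2] + 2
rownames(edata_sim) <- paste0("gene", seq_len(n_genes))
colnames(edata_sim) <- paste0("s", seq_along(batch))

# Reference-based adjustment: batch 1 is the reference
adj_ref <- ComBat(dat = edata_sim, batch = batch, ref.batch = 1)

# Mean-only adjustment: batch means are adjusted, variances are not
adj_mean <- ComBat(dat = edata_sim, batch = batch, mean.only = TRUE)

# Largest change in the reference batch (expected to be essentially zero)
ref_change <- max(abs(adj_ref[, batch == 1] - edata_sim[, batch == 1]))

# Difference in overall batch means after the mean-only adjustment
mean_gap <- abs(mean(adj_mean[, batch == 2]) - mean(adj_mean[, batch == 1]))
```

A reference batch is useful when one data set (for example a training set) must stay fixed while later batches are adjusted toward it.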

MultiBaC: Multiomic Batch effect Correction

MultiBaC