Batch effect
Merging two gene expression studies, ComBat
- Statistics for Genomic Data Science (Coursera) and https://github.com/jtleek/genstats
- Alternative empirical Bayes models for adjusting for batch effects in genomic studies Zhang et al. BMC Bioinformatics 2018. The R package is BatchQC from Bioconductor.
- sva::ComBat() function in sva package from Bioconductor.
- The original paper (Johnson et al., 2007) is number 2 among highly cited articles. Its Figure 1 shows that the EB adjustment is robust to outliers when batch sizes are small.
- It can remove both known batch effects and other potential latent sources of variation.
- The tutorial includes information on (1) how to estimate the number of latent sources of variation, (2) how to apply the sva package to estimate latent variables such as batch effects, (3) how to directly remove known batch effects using the ComBat function, (4) how to perform differential expression analysis using surrogate variables either directly or with the limma package, and (5) how to apply “frozen” sva to improve prediction and clustering.
- Figure S1 shows the principal component analysis (PCA) before and after batch effect correction for training and validation datasets
- Tutorial example to remove the batch effect
  # The example data come from the bladderbatch package, so install it as well
  BiocManager::install(c("sva", "bladderbatch"))
  library(sva)
  library(bladderbatch)
  data(bladderdata)
  pheno = pData(bladderEset)   # sample annotation, including the batch variable
  edata = exprs(bladderEset)   # expression matrix (genes x samples)
  batch = pheno$batch
  # Intercept-only model: no covariates other than batch are adjusted for
  modcombat = model.matrix(~1, data=pheno)
  combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat,
                        par.prior=TRUE, prior.plots=FALSE)
  # This returns an expression matrix with the same dimensions
  # as your original dataset.
  # By default, it performs parametric empirical Bayesian adjustments.
  # If you would like to use nonparametric empirical Bayesian adjustments,
  # use the par.prior=FALSE option (this will take longer).
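The latent-variable steps from the tutorial summary (estimating the number of surrogate variables, estimating them with sva, and using them directly in a differential expression test) can be sketched roughly as follows. This is a minimal sketch that assumes the same bladderbatch data and treats `cancer` as the variable of interest; the Benjamini-Hochberg adjustment at the end is an illustrative choice, not something the notes above prescribe.

```r
library(sva)
library(bladderbatch)

data(bladderdata)
pheno <- pData(bladderEset)
edata <- exprs(bladderEset)

# Full model: includes the variable of interest (assumed here: cancer status).
# Null model: intercept only, since no other adjustment variables are used.
mod  <- model.matrix(~ as.factor(cancer), data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)

# (1) Estimate the number of latent sources of variation.
n.sv <- num.sv(edata, mod, method = "leek")

# (2) Estimate the surrogate variables themselves.
svobj <- sva(edata, mod, mod0, n.sv = n.sv)

# (4) Differential expression "directly": add the surrogate variables to
# both models and compute an F-test p-value for every gene.
modSv  <- cbind(mod, svobj$sv)
mod0Sv <- cbind(mod0, svobj$sv)
pValuesSv <- f.pvalue(edata, modSv, mod0Sv)
qValuesSv <- p.adjust(pValuesSv, method = "BH")
```

The same `modSv` matrix could instead be passed as the design matrix to a limma fit, which is the other route the tutorial mentions.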
- ref.batch for reference-based batch adjustment. mean.only option if there is no need to adjust the variance. See the paper Alternative empirical Bayes models for adjusting for batch effects in genomic studies, Zhang et al. 2018.
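As a hedged illustration of these two options, the sketch below reruns ComBat on the bladderbatch data once with a reference batch and once with a mean-only adjustment. Using batch 1 as the reference is an arbitrary assumption made purely for the example; in practice the reference would be chosen for a substantive reason, such as a training set.

```r
library(sva)
library(bladderbatch)

data(bladderdata)
pheno <- pData(bladderEset)
edata <- exprs(bladderEset)
batch <- pheno$batch
mod   <- model.matrix(~ 1, data = pheno)

# Reference-based adjustment: the other batches are adjusted toward the
# reference batch. Batch 1 is an arbitrary choice for illustration.
combat_ref <- ComBat(dat = edata, batch = batch, mod = mod, ref.batch = 1)

# Mean-only adjustment: shift batch means but leave the variances alone.
combat_mean <- ComBat(dat = edata, batch = batch, mod = mod, mean.only = TRUE)
```

Both calls return matrices with the same dimensions as `edata`, so they can be dropped into the same downstream code as `combat_edata`.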
- Merging two gene-expression studies via cross-platform normalization by Shabalin et al, Bioinformatics 2008. This method (called Cross-Platform Normalization, or XPN) was used by Ternès, Biometrical Journal 2017.
- Batch effect removal methods for microarray gene expression data integration: a survey by Lazar et al, Bioinformatics 2012. The R package is inSilicoMerging, which has been removed from Bioconductor 3.4.
- Question: Combine hgu133a&b and hgu133plus2. Adjusting batch effects in microarray expression data using empirical Bayes methods
- removeBatchEffect() from limma package
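A minimal sketch of how removeBatchEffect() might be called is shown below. The data are simulated (a hypothetical log-expression matrix with an additive shift in one batch), and the `group` factor and its design matrix are assumptions introduced only so the example has a biological effect to preserve.

```r
library(limma)

set.seed(1)
n_genes <- 100
batch <- factor(rep(c("A", "B"), each = 6))
group <- factor(rep(c("ctrl", "trt"), times = 6))   # balanced within each batch

# Simulated log-expression values with a +2 shift in batch B (hypothetical data).
y <- matrix(rnorm(n_genes * 12), nrow = n_genes)
y[, batch == "B"] <- y[, batch == "B"] + 2

# The design matrix describes the effect to preserve (group); the batch term
# is estimated alongside it and then subtracted from the data.
design <- model.matrix(~ group)
y_adj <- removeBatchEffect(y, batch = batch, design = design)

# Difference between the batch means before and after adjustment.
diff_before <- mean(y[, batch == "B"]) - mean(y[, batch == "A"])
diff_after  <- mean(y_adj[, batch == "B"]) - mean(y_adj[, batch == "A"])
```

The limma documentation describes this function as intended for visualization such as PCA or clustering plots. For differential expression, it recommends including the batch factor in the linear model rather than analysing the adjusted matrix.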
- Batch effects and GC content of NGS by Michael Love
- The troublesome batch effect (in Chinese)