Heatmap: Difference between revisions
Jump to navigation
Jump to search
No edit summary |
|||
Line 1: | Line 1: | ||
= Evaluate the effect of centering | = Evaluate the effect of centering & scaling = | ||
== 1-correlation distance == | == 1-correlation distance == | ||
Effect of centering and scaling on clustering of genes and samples in terms of distance. | Effect of centering and scaling on clustering of genes and samples in terms of distance. | ||
Line 33: | Line 33: | ||
| Yes | | Yes | ||
|} | |} | ||
== Supporting R code == | |||
1. 1-Corr distance | |||
<pre> | |||
dist.no <- 1-cor(t(as.matrix(esetSel2))) | |||
dist.mean <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2))))) | |||
dist.median <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median)))) | |||
range(dist.no - dist.mean) # [1] -1.110223e-16 0.000000e+00 | |||
range(dist.no - dist.median) # [1] -1.110223e-16 0.000000e+00 | |||
range(dist.mean - dist.median) # [1] 0 0 | |||
</pre> | |||
So centering (no matter which measure: mean, median, ...), it won't affect 1-corr distance. | |||
<pre> | |||
dist.mean <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/"))) | |||
dist.median <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/"))) | |||
range(dist.no - dist.mean) # [1] -8.881784e-16 6.661338e-16 | |||
range(dist.no - dist.median) # [1] -6.661338e-16 6.661338e-16 | |||
range(dist.mean - dist.median) # [1] -1.110223e-15 1.554312e-15 | |||
</pre> | |||
So scaling (no matter what measures: mean, median,...), it won't affect 1-corr distance. | |||
How about centering / scaling genes on array clustering? | |||
<pre> | |||
dist.no <- 1-cor(as.matrix(esetSel2)) | |||
dist.mean <- 1-cor(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)))) | |||
dist.median <- 1-cor(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median))) | |||
range(dist.no - dist.mean) # [1] -1.547086 0.000000 | |||
range(dist.no - dist.median) # [1] -1.483427 0.000000 | |||
range(dist.mean - dist.median) # [1] -0.5283601 0.6164602 | |||
dist.mean <- 1-cor(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/")) | |||
dist.median <- 1-cor(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/")) | |||
range(dist.no - dist.mean) # [1] -1.477407 0.000000 | |||
range(dist.no - dist.median) # [1] -1.349419 0.000000 | |||
range(dist.mean - dist.median) # [1] -0.5419534 0.6269875 | |||
</pre> | |||
2. Euclidean distance | |||
<pre> | |||
dist.no <- dist(as.matrix(esetSel2)) | |||
dist.mean <- dist(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)))) | |||
dist.median <- dist(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median))) | |||
range(dist.no - dist.mean) # [1] 7.198864e-05 2.193487e+01 | |||
range(dist.no - dist.median) # [1] -0.3715231 21.9320846 | |||
range(dist.mean - dist.median) # [1] -0.923717629 -0.000088385 | |||
</pre> | |||
Centering does affect the Euclidean distance. | |||
<pre> | |||
dist.mean <- dist(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/")) | |||
dist.median <- dist(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/")) | |||
range(dist.no - dist.mean) # [1] 0.7005071 24.0698991 | |||
range(dist.no - dist.median) # [1] 0.636749 24.068920 | |||
range(dist.mean - dist.median) # [1] -0.22122869 0.02906131 | |||
</pre> | |||
And scaling affects Euclidean distance too. | |||
How about centering / scaling genes on array clustering? | |||
<pre> | |||
dist.no <- dist(t(as.matrix(esetSel2))) | |||
dist.mean <- dist(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2))))) | |||
dist.median <- dist(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median)))) | |||
range(dist.no - dist.mean) # 0 0 | |||
range(dist.no - dist.median) # 0 0 | |||
range(dist.mean - dist.median) # 0 0 | |||
dist.mean <- dist(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/"))) | |||
dist.median <- dist(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/"))) | |||
range(dist.no - dist.mean) # [1] 1.698960 9.383789 | |||
range(dist.no - dist.median) # [1] 1.683028 9.311603 | |||
range(dist.mean - dist.median) # [1] -0.09139173 0.02546394 | |||
</pre> | |||
= [http://cran.r-project.org/web/packages/gplots/index.html gplots] package = | = [http://cran.r-project.org/web/packages/gplots/index.html gplots] package = |
Revision as of 10:00, 13 November 2014
Evaluate the effect of centering & scaling
1-correlation distance
Effect of centering and scaling on clustering of genes and samples in terms of distance. 'Yes' means the distance was changed compared to the baseline where no centering or scaling was applied.
clustering genes | clustering samples | |
---|---|---|
centering on each genes | No | Yes |
scaling on each genes | No |
Euclidean distance
clustering genes | clustering samples | |
---|---|---|
centering on each genes | Yes | Yes |
scaling on each genes | Yes | Yes |
Supporting R code
1. 1-Corr distance
dist.no <- 1-cor(t(as.matrix(esetSel2))) dist.mean <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2))))) dist.median <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median)))) range(dist.no - dist.mean) # [1] -1.110223e-16 0.000000e+00 range(dist.no - dist.median) # [1] -1.110223e-16 0.000000e+00 range(dist.mean - dist.median) # [1] 0 0
So centering (no matter which measure: mean, median, ...), it won't affect 1-corr distance.
dist.mean <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/"))) dist.median <- 1-cor(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/"))) range(dist.no - dist.mean) # [1] -8.881784e-16 6.661338e-16 range(dist.no - dist.median) # [1] -6.661338e-16 6.661338e-16 range(dist.mean - dist.median) # [1] -1.110223e-15 1.554312e-15
So scaling (no matter what measures: mean, median,...), it won't affect 1-corr distance.
How about centering / scaling genes on array clustering?
dist.no <- 1-cor(as.matrix(esetSel2)) dist.mean <- 1-cor(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)))) dist.median <- 1-cor(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median))) range(dist.no - dist.mean) # [1] -1.547086 0.000000 range(dist.no - dist.median) # [1] -1.483427 0.000000 range(dist.mean - dist.median) # [1] -0.5283601 0.6164602 dist.mean <- 1-cor(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/")) dist.median <- 1-cor(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/")) range(dist.no - dist.mean) # [1] -1.477407 0.000000 range(dist.no - dist.median) # [1] -1.349419 0.000000 range(dist.mean - dist.median) # [1] -0.5419534 0.6269875
2. Euclidean distance
dist.no <- dist(as.matrix(esetSel2)) dist.mean <- dist(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)))) dist.median <- dist(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median))) range(dist.no - dist.mean) # [1] 7.198864e-05 2.193487e+01 range(dist.no - dist.median) # [1] -0.3715231 21.9320846 range(dist.mean - dist.median) # [1] -0.923717629 -0.000088385
Centering does affect the Euclidean distance.
dist.mean <- dist(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/")) dist.median <- dist(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/")) range(dist.no - dist.mean) # [1] 0.7005071 24.0698991 range(dist.no - dist.median) # [1] 0.636749 24.068920 range(dist.mean - dist.median) # [1] -0.22122869 0.02906131
And scaling affects Euclidean distance too.
How about centering / scaling genes on array clustering?
dist.no <- dist(t(as.matrix(esetSel2))) dist.mean <- dist(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2))))) dist.median <- dist(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median)))) range(dist.no - dist.mean) # 0 0 range(dist.no - dist.median) # 0 0 range(dist.mean - dist.median) # 0 0 dist.mean <- dist(t(sweep(as.matrix(esetSel2), 1L, rowMeans(as.matrix(esetSel2)), "/"))) dist.median <- dist(t(sweep(as.matrix(esetSel2), 1L, apply(esetSel2, 1, median), "/"))) range(dist.no - dist.mean) # [1] 1.698960 9.383789 range(dist.no - dist.median) # [1] 1.683028 9.311603 range(dist.mean - dist.median) # [1] -0.09139173 0.02546394
gplots package
The following example is extracted from DESeq2 package.
## ----loadDESeq2, echo=FALSE---------------------------------------------- # in order to print version number below library("DESeq2") ## ----loadExonsByGene, echo=FALSE----------------------------------------- library("parathyroidSE") library("GenomicFeatures") data(exonsByGene) ## ----locateFiles, echo=FALSE--------------------------------------------- bamDir <- system.file("extdata",package="parathyroidSE",mustWork=TRUE) fls <- list.files(bamDir, pattern="bam$",full=TRUE) ## ----bamfilepaired------------------------------------------------------- library( "Rsamtools" ) bamLst <- BamFileList( fls, yieldSize=100000 ) ## ----sumOver------------------------------------------------------------- library( "GenomicAlignments" ) se <- summarizeOverlaps( exonsByGene, bamLst, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE ) ## ----libraries----------------------------------------------------------- library( "DESeq2" ) library( "parathyroidSE" ) ## ----loadEcs------------------------------------------------------------- data( "parathyroidGenesSE" ) se <- parathyroidGenesSE colnames(se) <- se$run ## ----fromSE-------------------------------------------------------------- ddsFull <- DESeqDataSet( se, design = ~ patient + treatment ) ## ----collapse------------------------------------------------------------ ddsCollapsed <- collapseReplicates( ddsFull, groupby = ddsFull$sample, run = ddsFull$run ) ## ----subsetCols---------------------------------------------------------- dds <- ddsCollapsed[ , ddsCollapsed$time == "48h" ] ## ----subsetRows, echo=FALSE---------------------------------------------- idx <- which(rowSums(counts(dds)) > 0)[1:4000] dds <- dds[idx,] ## ----runDESeq, cache=TRUE------------------------------------------------ dds <- DESeq(dds) rld <- rlog( dds) library( "genefilter" ) topVarGenes <- head( order( rowVars( assay(rld) ), decreasing=TRUE ), 35 ) ## ----beginner_geneHeatmap, fig.width=9, fig.height=9--------------------- library(RColorBrewer) library(gplots) heatmap.2( assay(rld)[ topVarGenes, ], scale="row", trace="none", dendrogram="column", col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255))