Microarray: Difference between revisions

From 太極
Jump to navigation Jump to search
 
(12 intermediate revisions by the same user not shown)
Line 1: Line 1:
= Overview =
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435252/ Analysis of microarray experiments of gene expression profiling]
= Pathway =
* [http://guangchuangyu.github.io/2016/04/kegg-module-enrichment-analysis/ clusterProfile and KEGG]
= Time Course =
= Time Course =
* http://www.biostat.jhsph.edu/~ririzarr/688/
* http://www.biostat.jhsph.edu/~ririzarr/688/


== Limma package ==
== [http://www.bioconductor.org/packages/release/bioc/html/limma.html Limma] package with few time points ==


* http://master.bioconductor.org/help/course-materials/2005/BioC2005/labs/lab01/drosEmbryo/  or http://bioinf.wehi.edu.au/marray/jsm2005/lab5/lab5.html. Note the package drosEmbryo is not available on Bioc although it can be downloaded from [http://bioinf.wehi.edu.au/marray/jsm2005/Required/Software/drosEmbryo.html bioinf.wehi.edu.au]. It still cannot be used.
* http://master.bioconductor.org/help/course-materials/2005/BioC2005/labs/lab01/drosEmbryo/  or http://bioinf.wehi.edu.au/marray/jsm2005/lab5/lab5.html. Note the package drosEmbryo is not available on Bioc although it can be downloaded from [http://bioinf.wehi.edu.au/marray/jsm2005/Required/Software/drosEmbryo.html bioinf.wehi.edu.au]. It still cannot be used.
Line 11: Line 17:
</pre>
</pre>


* Limma Guide (no .Rd file nor R code!) Section 9.6.
* Limma Guide Section 9.6 (no .Rd file nor R code!).
<pre>
<pre>
# Step 1 - Read the design and create eset.
# Step 1 - Read the design and create eset.
targets = readTargets("targets.txt")
targets <- read.table(file='stdin', header=T)
eset = rma(ReadAffy())
FileName Target
File1 wt.0hr
File2 wt.0hr
File3 wt.6hr
File4 wt.24hr
File5 mu.0hr
File6 mu.0hr
File7 mu.6hr
File8 mu.24hr
# Hit Ctrl+D twice.
# eset = rma(ReadAffy())
exprs <- matrix(rnorm(1000*8), nr=1000)
eset <- ExpressionSet(assayData = exprs)
colnames(exprs) <- targets[,1]
 
lev <- c("wt.0hr","wt.6hr","wt.24hr","mu.0hr","mu.6hr","mu.24hr")
lev <- c("wt.0hr","wt.6hr","wt.24hr","mu.0hr","mu.6hr","mu.24hr")
f <- factor(targets$Target, levels=lev)
f <- factor(targets$Target, levels=lev)
Line 21: Line 41:
colnames(design) <- lev
colnames(design) <- lev
fit <- lmFit(eset, design)
fit <- lmFit(eset, design)
# Step 2 - Which genes respond at either the 6 hour or 24 hour times in the wild-type?
# Step 2 - Which genes respond at either the 6 hour or 24 hour times in the wild-type?
cont.wt <- makeContrasts(
cont.wt <- makeContrasts(
Line 29: Line 50:
fit2 <- eBayes(fit2)
fit2 <- eBayes(fit2)
topTableF(fit2, adjust="BH")
topTableF(fit2, adjust="BH")
# Output:
    wt.6hr.wt.0hr wt.24hr.wt.6hr    AveExpr        F    P.Value adj.P.Val
112    3.2749023    -4.5401508  0.23644620 7.192590 0.006274150 0.9079656
142    0.2028617      3.5727319  0.31292924 6.425034 0.009419387 0.9079656
151    -2.4795481      4.5473025  0.28953869 6.138177 0.011026934 0.9079656
830    0.9768946      2.7571045  0.55529723 5.982599 0.012027262 0.9079656
371    -2.5848778      4.4752189 -0.68087827 5.813166 0.013235184 0.9079656
861    -3.6229467      1.2348225 -0.19658086 5.810336 0.013256488 0.9079656
645    2.6729536    -4.3124373  0.09475226 5.789848 0.013411914 0.9079656
926    3.5387477    -0.9448989  0.15060787 5.484057 0.015994787 0.9079656
113    2.5057968    -4.0472781  0.12254199 5.372589 0.017072749 0.9079656
802    0.5210958      2.9178136 -0.54632441 5.140624 0.019589928 0.9079656
> cont.wt
        Contrasts
Levels    wt.6hr-wt.0hr wt.24hr-wt.6hr
  wt.0hr            -1              0
  wt.6hr              1            -1
  wt.24hr            0              1
  mu.0hr              0              0
  mu.6hr              0              0
  mu.24hr            0              0
# Step 3- Which genes respond (i.e., change over time) in the mutant?
# Step 3- Which genes respond (i.e., change over time) in the mutant?
cont.mu <- makeContrasts(
cont.mu <- makeContrasts(
Line 37: Line 80:
fit2 <- eBayes(fit2)
fit2 <- eBayes(fit2)
topTableF(fit2, adjust="BH")
topTableF(fit2, adjust="BH")
# Output:
    mu.6hr.mu.0hr mu.24hr.mu.6hr    AveExpr        F    P.Value adj.P.Val
203  -4.70500261      4.3795233 -0.52285214 8.753341 0.002919147 0.8967851
797  -3.39270257    -0.5862948 -0.02746123 8.328550 0.003568277 0.8967851
828  -4.25071499      1.8429580  0.34257837 7.590078 0.005125338 0.8967851
764    0.02937019      3.7623514  0.32860517 6.991332 0.006965102 0.8967851
570  -2.14961101    -2.1415450  0.52207117 6.558165 0.008764670 0.8967851
457    2.28175860    -4.5011533 -0.25060779 6.522692 0.008933944 0.8967851
593    2.57911585      1.3339224  0.18262853 6.460369 0.009240381 0.8967851
693  -0.43909594      3.8805006 -0.30368945 6.117644 0.011153404 0.8967851
414    1.76240994    -4.3652464  0.12864332 5.887795 0.012686997 0.8967851
309  -3.57816645      3.4458003 -0.09221823 5.847096 0.012982697 0.8967851
# Step 4- Which genes respond differently over time in the mutant relative to the wild-type?
# Step 4- Which genes respond differently over time in the mutant relative to the wild-type?
cont.dif <- makeContrasts(
cont.dif <- makeContrasts(
Line 45: Line 101:
fit2 <- eBayes(fit2)
fit2 <- eBayes(fit2)
topTableF(fit2, adjust="BH")
topTableF(fit2, adjust="BH")
# Output:
        Dif6hr  Dif24hr    AveExpr        F    P.Value adj.P.Val
797 -6.0630607  1.468086 -0.02746123 8.922239 0.002699040 0.8685775
18  -1.7460726  7.130525  0.12631684 8.707761 0.002981974 0.8685775
764 -0.3370846  5.759971  0.32860517 7.512472 0.005329540 0.8685775
564  5.9827916 -4.071109  0.11052980 7.237032 0.006132233 0.8685775
24  -5.3124510  2.462141 -0.07153017 6.291352 0.010133010 0.8685775
113 -4.1366307  5.990927  0.12254199 6.138815 0.011023028 0.8685775
313 -2.0419712  6.157636 -0.12618538 5.979547 0.012047886 0.8685775
913 -4.4262959  6.195062  0.46998972 5.562351 0.015283627 0.8685775
926 -4.9695447  1.222076  0.15060787 5.496894 0.015875656 0.8685775
674 -0.9636603 -3.808384  0.13392844 5.020330 0.021058895 0.8685775
</pre>
</pre>
== Patient + Trial + Time-point + Response ==
[https://support.bioconductor.org/p/9152299/ LIMMA: design with paired samples within and across treatments]


== A case study using Limma package ==
== A case study using Limma package ==
http://www.biomedcentral.com/1756-0500/3/81
http://www.biomedcentral.com/1756-0500/3/81


== timecourse package ==
== Limma package with many time points ==
The Limma user guide mentioned that when there are many time points for each group, it is reasonable to assume the expression changes smoothly over time rather than making discrete jumps from one time point to another. This type of time course can be analyzed by fitting a temporal trend using a regression spline or a polynomial.
 
Consider the case with 2 groups. Each group has 16 time points. We use a cubic spline curve with a modest number of knots. Choosing effective degrees of freedom to be in range 3-5 is reasonable.
<pre>
library(splines)
X <- ns(targets$Time, df =5)
Group <- factor(targets$Group)
design <- model.matrix(~Group*X)
fit <- lmFit(y, design)
fit <- eBayes(fit)
# This creates a model with 12 parameters, with the last 5 corresponding
# to interaction, i.e., to differences in the curves between groups.
# To detect genes with different time trends for treatment vs control:
topTable(fit, coef=8:12)
</pre>
This conducts a moderated F-test for each gene on 5 df, which can detect very general differences between the treatment and control curves.
 
Note that for this analysis, it is not necessary to have replicates, nor is it necessary for the two treatment groups to be observed at identical time points.
 
== [http://www.bioconductor.org/packages/release/bioc/html/maSigPro.html maSigPro] package ==
 
== [http://www.bioconductor.org/packages/release/bioc/html/timecourse.html timecourse] package ==
 
= meta analysis =
* [https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbaa019/5753843?rss=1 A survey of gene expression meta-analysis: methods and applications] Daniel Toro-Domínguez 2020
* [https://cran.r-project.org/web/packages/rmeta/index.html rmeta package] which was used in the [https://bioconductor.org/packages/release/bioc/html/genefu.html genefu] package. See also [[R#R_Graphs_Gallery|R Graphs Gallery]].
* [https://youtu.be/IkduL5iRdqo Meta-Analysis in R with {metafor}] (video)
* [https://cran.r-project.org/web/packages/mada/index.html mada]: Meta-Analysis of Diagnostic Accuracy
* [https://cran.r-project.org/web/packages/metap/index.html metapp]: Meta-Analysis of Significance Values

Latest revision as of 10:34, 14 May 2023

Overview

Analysis of microarray experiments of gene expression profiling

Pathway

Time Course

Limma package with few time points

> data(drosEmbryoRMA)
Warning message:
'drosEmbryoRMA' looks like a pre-2.4.0 S4 object: please recreate it 
  • Limma Guide Section 9.6 (no .Rd file nor R code!).
# Step 1 - Read the design and create eset.
targets <- read.table(file='stdin', header=T)
FileName	Target
File1	wt.0hr
File2	wt.0hr
File3	wt.6hr
File4	wt.24hr
File5	mu.0hr
File6	mu.0hr
File7	mu.6hr
File8	mu.24hr
# Hit Ctrl+D twice.
# eset = rma(ReadAffy())
exprs <- matrix(rnorm(1000*8), nr=1000)
eset <- ExpressionSet(assayData = exprs)
colnames(exprs) <- targets[,1]

lev <- c("wt.0hr","wt.6hr","wt.24hr","mu.0hr","mu.6hr","mu.24hr")
f <- factor(targets$Target, levels=lev)
design <- model.matrix(~0+f)
colnames(design) <- lev
fit <- lmFit(eset, design)

# Step 2 - Which genes respond at either the 6 hour or 24 hour times in the wild-type?
cont.wt <- makeContrasts(
"wt.6hr-wt.0hr",
"wt.24hr-wt.6hr",
levels=design)
fit2 <- contrasts.fit(fit, cont.wt)
fit2 <- eBayes(fit2)
topTableF(fit2, adjust="BH")
# Output:
    wt.6hr.wt.0hr wt.24hr.wt.6hr     AveExpr        F     P.Value adj.P.Val
112     3.2749023     -4.5401508  0.23644620 7.192590 0.006274150 0.9079656
142     0.2028617      3.5727319  0.31292924 6.425034 0.009419387 0.9079656
151    -2.4795481      4.5473025  0.28953869 6.138177 0.011026934 0.9079656
830     0.9768946      2.7571045  0.55529723 5.982599 0.012027262 0.9079656
371    -2.5848778      4.4752189 -0.68087827 5.813166 0.013235184 0.9079656
861    -3.6229467      1.2348225 -0.19658086 5.810336 0.013256488 0.9079656
645     2.6729536     -4.3124373  0.09475226 5.789848 0.013411914 0.9079656
926     3.5387477     -0.9448989  0.15060787 5.484057 0.015994787 0.9079656
113     2.5057968     -4.0472781  0.12254199 5.372589 0.017072749 0.9079656
802     0.5210958      2.9178136 -0.54632441 5.140624 0.019589928 0.9079656
> cont.wt
         Contrasts
Levels    wt.6hr-wt.0hr wt.24hr-wt.6hr
  wt.0hr             -1              0
  wt.6hr              1             -1
  wt.24hr             0              1
  mu.0hr              0              0
  mu.6hr              0              0
  mu.24hr             0              0

# Step 3- Which genes respond (i.e., change over time) in the mutant?
cont.mu <- makeContrasts(
"mu.6hr-mu.0hr",
"mu.24hr-mu.6hr",
levels=design)
fit2 <- contrasts.fit(fit, cont.mu)
fit2 <- eBayes(fit2)
topTableF(fit2, adjust="BH")
# Output:
    mu.6hr.mu.0hr mu.24hr.mu.6hr     AveExpr        F     P.Value adj.P.Val
203   -4.70500261      4.3795233 -0.52285214 8.753341 0.002919147 0.8967851
797   -3.39270257     -0.5862948 -0.02746123 8.328550 0.003568277 0.8967851
828   -4.25071499      1.8429580  0.34257837 7.590078 0.005125338 0.8967851
764    0.02937019      3.7623514  0.32860517 6.991332 0.006965102 0.8967851
570   -2.14961101     -2.1415450  0.52207117 6.558165 0.008764670 0.8967851
457    2.28175860     -4.5011533 -0.25060779 6.522692 0.008933944 0.8967851
593    2.57911585      1.3339224  0.18262853 6.460369 0.009240381 0.8967851
693   -0.43909594      3.8805006 -0.30368945 6.117644 0.011153404 0.8967851
414    1.76240994     -4.3652464  0.12864332 5.887795 0.012686997 0.8967851
309   -3.57816645      3.4458003 -0.09221823 5.847096 0.012982697 0.8967851

# Step 4- Which genes respond differently over time in the mutant relative to the wild-type?
cont.dif <- makeContrasts(
Dif6hr =(mu.6hr-mu.0hr)-(wt.6hr-wt.0hr),
Dif24hr=(mu.24hr-mu.6hr)-(wt.24hr-wt.6hr),
levels=design)
fit2 <- contrasts.fit(fit, cont.dif)
fit2 <- eBayes(fit2)
topTableF(fit2, adjust="BH")
# Output:
        Dif6hr   Dif24hr     AveExpr        F     P.Value adj.P.Val
797 -6.0630607  1.468086 -0.02746123 8.922239 0.002699040 0.8685775
18  -1.7460726  7.130525  0.12631684 8.707761 0.002981974 0.8685775
764 -0.3370846  5.759971  0.32860517 7.512472 0.005329540 0.8685775
564  5.9827916 -4.071109  0.11052980 7.237032 0.006132233 0.8685775
24  -5.3124510  2.462141 -0.07153017 6.291352 0.010133010 0.8685775
113 -4.1366307  5.990927  0.12254199 6.138815 0.011023028 0.8685775
313 -2.0419712  6.157636 -0.12618538 5.979547 0.012047886 0.8685775
913 -4.4262959  6.195062  0.46998972 5.562351 0.015283627 0.8685775
926 -4.9695447  1.222076  0.15060787 5.496894 0.015875656 0.8685775
674 -0.9636603 -3.808384  0.13392844 5.020330 0.021058895 0.8685775

Patient + Trial + Time-point + Response

LIMMA: design with paired samples within and across treatments

A case study using Limma package

http://www.biomedcentral.com/1756-0500/3/81

Limma package with many time points

The Limma user guide mentioned that when there are many time points for each group, it is reasonable to assume the expression changes smoothly over time rather than making discrete jumps from one time point to another. This type of time course can be analyzed by fitting a temporal trend using a regression spline or a polynomial.

Consider the case with 2 groups. Each group has 16 time points. We use a cubic spline curve with a modest number of knots. Choosing effective degrees of freedom to be in range 3-5 is reasonable.

library(splines)
X <- ns(targets$Time, df =5)
Group <- factor(targets$Group)
design <- model.matrix(~Group*X)
fit <- lmFit(y, design)
fit <- eBayes(fit) 
# This creates a model with 12 parameters, with the last 5 corresponding 
# to interaction, i.e., to differences in the curves between groups. 
# To detect genes with different time trends for treatment vs control:
topTable(fit, coef=8:12)

This conducts a moderated F-test for each gene on 5 df, which can detect very general differences between the treatment and control curves.

Note that for this analysis, it is not necessary to have replicates, nor is it necessary for the two treatment groups to be observed at identical time points.

maSigPro package

timecourse package

meta analysis