GSEA: Difference between revisions
Revision as of 14:32, 20 March 2021
GSEA
https://en.wikipedia.org/wiki/Gene_set_enrichment_analysis
GSEA determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states.
- https://www.bioconductor.org/help/course-materials/2015/SeattleApr2015/E_GeneSetEnrichment.html, https://www.bioconductor.org/help/course-materials/2009/SeattleApr09/gsea/HyperG_Lecture.pdf. Typical uses of the hypergeometric test:
- Finding Biocarta or KEGG pathways significantly enriched in the user's gene list.
- Are DE genes in the set more common than DE genes not in the set?
- Are selected genes more often in the GO category than expected by chance? (one-tailed)
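A 2x2 table or a Venn diagram shows the overlap between the DE genes and the gene set.
             In gene set: Yes     In gene set: No
DE: Yes      DE, in set           DE, not in set
DE: No       not DE, in set       not DE, not in set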
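The one-tailed question above ("more often than expected by chance?") can be sketched with the hypergeometric upper tail. This is a minimal illustration using only the Python standard library. The counts are hypothetical and chosen only for the example. In R the same quantity should correspond to `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`, or to `fisher.test()` with `alternative = "greater"` on the 2x2 table.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): probability of seeing at least k gene-set members
    among n DE genes drawn from N genes, K of which are in the set."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts, for illustration only.
N = 20  # genes measured
K = 5   # genes in the gene set
n = 6   # DE genes
k = 4   # DE genes that are also in the gene set

p_value = hypergeom_upper_tail(k, K, n, N)
print(f"one-tailed p = {p_value:.4f}")
```

Only the top-left cell (k) and the table margins are needed, because the other three cells follow from them.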
- Gene List Enrichment Analysis: dhyper(), binom.test(), fisher.test().
- Pathway Commons
- Gene set analysis methods: statistical models and methodological differences 2014
- http://software.broadinstitute.org/gsea/index.jsp, Subramanian, et al 2005 paper
- HOW TO PERFORM GSEA - A tutorial on gene set enrichment analysis for RNA-seq (video)
- Algorithm. GSEA walks down the ranked list of genes, increasing a running-sum statistic when a gene belongs to the set and decreasing it when the gene does not.
- Interpretation of 3 enrichment plots. The 1st plot (ES on the y-axis) shows the running enrichment score, i.e. how strongly the gene set is over- or under-represented as you move down the ranked list. The 2nd plot (barcode-like) shows where the members of the gene set appear in the ranked list of genes. The 3rd plot (y = ranked list metric, x = rank) shows how the ranking metric is distributed along the list.
- slides with formulas.
- What does a negative enrichment score mean? A negative NES indicates that the genes in the set S are mostly at the bottom of the ranked list L.
- GSEA R Implementation from GSEA-MSigDB.
- README.
- Using GSEA.1.0.R
- Statistical power of gene-set enrichment analysis is a function of gene set correlation structure by Swanson 2017
- Towards a gold standard for benchmarking gene set enrichment analysis, GSEABenchmarkeR package
- piano package
- clusterProfiler package and the online book.
- Gene-set Enrichment with Regularized Regression Fang 2019
- msigdbr package. MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format.
- fgsea and its download statistics
- Best method/package for Gene Set Enrichment Analysis in R? and the gage package
- ES could be negative; see Genome 559: Introduction to Statistical and Computational Genomics
- multiGSEA: a GSEA-based pathway enrichment analysis for multi-omics data
- How to use DAVID for functional annotation of genes, Using DAVID for Functional Enrichment Analysis in a Set of Genes (Part 1), (Part 2) (video)
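The running-sum walk described in the Algorithm item above, and the meaning of a negative ES, can be sketched as follows. This is a simplified illustration in the spirit of the weighted statistic from Subramanian et al. 2005, not the Broad implementation. The function name, the toy gene names and the toy scores are assumptions made for the example.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Return (ES, running_sum) for one gene set.

    ranked_genes: genes sorted by the ranking metric (descending)
    scores:       ranking metric for each gene, in the same order
    p:            weight exponent (p=0 gives an unweighted walk)
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    norm = sum(hit_weights)
    if norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list, but not cover all of it")

    running, total = [], 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up on a gene-set member, step down otherwise.
        total += weight / norm if hit else -1.0 / n_miss
        running.append(total)

    # ES is the largest deviation from zero, keeping its sign.
    return max(running, key=abs), running


# Toy ranked list (hypothetical genes and scores).
genes = [f"g{i}" for i in range(1, 11)]
metric = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]

es_top, curve_top = enrichment_score(genes, metric, {"g1", "g2", "g3"})
es_bottom, curve_bottom = enrichment_score(genes, metric, {"g8", "g9", "g10"})
print(round(es_top, 3), round(es_bottom, 3))
```

A set concentrated at the top of the list gives a positive ES, and a set concentrated at the bottom gives a negative one. The running sum returns to zero at the end of the list in both cases.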
Two categories of GSEA procedures:
- Competitive: compares the genes in the test set with all other genes.
- Self-contained: tests whether the gene set is more DE than one would expect under the null of no association between the two phenotype conditions (without reference to other genes in the genome). For example, the method by Jiang & Gentleman, Bioinformatics 2007.
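One way to see the difference between the two categories is through what gets permuted to build the null distribution. The sketch below is an illustration under stated assumptions, not any specific published method. The set statistic (mean absolute difference in group means), the toy expression values and the function names are all hypothetical. The self-contained test shuffles sample labels, while the competitive test draws random gene sets of the same size.

```python
import random
from statistics import mean


def gene_stat(values, labels):
    """Absolute difference in mean expression between classes A and B."""
    group_a = [v for v, lab in zip(values, labels) if lab == "A"]
    group_b = [v for v, lab in zip(values, labels) if lab == "B"]
    return abs(mean(group_a) - mean(group_b))


def set_stat(expr, labels, gene_set):
    return mean(gene_stat(expr[g], labels) for g in gene_set)


def self_contained_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Null: no association with phenotype (permute sample labels)."""
    rng = random.Random(seed)
    observed = set_stat(expr, labels, gene_set)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        count += set_stat(expr, shuffled, gene_set) >= observed
    return (count + 1) / (n_perm + 1)


def competitive_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Null: the set is no more DE than other genes (sample random sets)."""
    rng = random.Random(seed)
    stats = {g: gene_stat(v, labels) for g, v in expr.items()}
    observed = mean(stats[g] for g in gene_set)
    all_genes = sorted(expr)
    count = 0
    for _ in range(n_perm):
        random_set = rng.sample(all_genes, len(gene_set))
        count += mean(stats[g] for g in random_set) >= observed
    return (count + 1) / (n_perm + 1)


# Toy data (hypothetical): g1 and g2 are shifted upward in class B.
labels = ["A", "A", "A", "B", "B", "B"]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],
    "g2": [2.0, 2.1, 1.8, 4.2, 3.9, 4.0],
    "g3": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8],
    "g4": [5.0, 4.9, 5.2, 5.1, 5.0, 4.8],
    "g5": [3.0, 3.2, 2.9, 3.1, 3.0, 2.8],
    "g6": [0.5, 0.4, 0.6, 0.5, 0.7, 0.4],
    "g7": [2.5, 2.4, 2.6, 2.4, 2.5, 2.7],
    "g8": [4.0, 4.1, 3.9, 4.0, 3.8, 4.2],
}
gene_set = ["g1", "g2"]

p_self = self_contained_p(expr, labels, gene_set)
p_comp = competitive_p(expr, labels, gene_set)
print(p_self, p_comp)
```

With only three samples per class there are few distinct label arrangements, so the self-contained p-value cannot become very small in this toy example. The competitive p-value depends instead on how many genes are available to sample from.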
See also BRB-ArrayTools -> GSEA.
ssGSEA
- https://github.com/broadinstitute/ssGSEA2.0
- Use "ssgsea-gui.R". The first prompt asks for a folder containing the input GCT files. The second prompt asks for the gene set database in GMT format. This choice is very restrictive. For example, "ptm.sig.db.all.uniprot.human.v1.9.0.gmt" and "ptm.sig.db.all.sitegrpid.human.v1.9.0.gmt" provided in the GitHub repository won't work with the example GCT file.
setwd("~/github/ssGSEA2.0/")
source("ssgsea-gui.R")
# select a folder containing gct files; e.g. PI3K_pert_logP_n2x23936.gct
# select a gene set file; e.g. <ptm.sig.db.all.flanking.human.v1.8.1.gmt>
A new folder (e.g. 2021-03-01) will be created under the same parent folder as the gct file folder.
tree -L 1 ~/github/ssGSEA2.0/example/gct/2021-03-20/
├── PI3K_pert_logP_n2x23936_ssGSEA-combined.gct
├── PI3K_pert_logP_n2x23936_ssGSEA-fdr-pvalues.gct
├── PI3K_pert_logP_n2x23936_ssGSEA-pvalues.gct
├── PI3K_pert_logP_n2x23936_ssGSEA-scores.gct
├── PI3K_pert_logP_n2x23936_ssGSEA.RData
├── parameters.txt
├── rank-plots
├── run.log
└── signature_gct

tree ~/github/ssGSEA2.0/example/gct/2021-03-20/rank-plots | head -3 # 102 files. One file per matched gene set
├── DISEASE.PSP_Alzheime_2.pdf
├── DISEASE.PSP_breast_c_2.pdf

tree ~/github/ssGSEA2.0/example/gct/2021-03-20/signature_gct | head -3 # 102 files. One file per matched gene set
├── DISEASE.PSP_Alzheimer.s_disease_n2x23.gct
├── DISEASE.PSP_breast_cancer_n2x14.gct
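Because the gene set file has to match the identifiers used in the GCT file, it can help to count the overlap before running the script. The sketch below assumes the usual GMT layout (one set per line, tab-separated: set name, description, then members). The identifiers and function names are made up for illustration, and the matching rules used by ssgsea-gui.R itself are not reproduced here.

```python
def parse_gmt(text):
    """Parse GMT text into {set_name: set_of_member_ids}.

    Assumes every non-empty line has at least a name and a description.
    """
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _description, *members = line.split("\t")
        gene_sets[name] = {m for m in members if m}
    return gene_sets


def overlap_counts(gene_sets, row_ids):
    """Number of members of each gene set that appear among the row ids."""
    ids = set(row_ids)
    return {name: len(members & ids) for name, members in gene_sets.items()}


# Hypothetical GMT content and GCT row identifiers, for illustration only.
gmt_text = "SET_A\tdemo\tid1\tid2\tid3\nSET_B\tdemo\tid9\tid10\n"
gct_row_ids = ["id1", "id2", "id4"]

report = overlap_counts(parse_gmt(gmt_text), gct_row_ids)
print(report)  # {'SET_A': 2, 'SET_B': 0}
```

A gene set database whose sets all show zero overlap uses a different identifier type from the GCT file.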
- Some discussions from biostars.org. Find -> "ssgsea"
- Some papers. Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma 2020
- 【生信分析 3】教你看懂GSEA和ssGSEA分析结果 ("Bioinformatics Analysis 3: How to read GSEA and ssGSEA results", video). No groups/classes are needed in the data (6:33). The output is a heatmap in which each value is computed sample by sample. Rows = gene sets. Columns = samples (sorted by the 1st gene set).
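The "sample by sample" idea can be sketched as follows: each sample is ranked on its own expression values, so no groups or classes are required, and the result is a gene set by sample matrix that can be drawn as a heatmap. This is a deliberately simplified, unweighted KS-like score, not the ssGSEA2.0 implementation. The data and function names are hypothetical.

```python
def ks_like_score(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running sum."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # score is undefined; 0.0 is an arbitrary placeholder
    total, best = 0.0, 0.0
    for hit in hits:
        total += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(total) > abs(best):
            best = total
    return best


def score_samples(expr, n_samples, gene_sets):
    """Return {set_name: [score for each sample]} (rows = gene sets)."""
    matrix = {name: [] for name in gene_sets}
    for j in range(n_samples):
        # Rank genes within this sample only, highest expression first.
        ranked = sorted(expr, key=lambda g: expr[g][j], reverse=True)
        for name, members in gene_sets.items():
            matrix[name].append(ks_like_score(ranked, members))
    return matrix


# Hypothetical expression values: gene -> one value per sample.
samples = ["s1", "s2", "s3"]
expr = {"g1": [1, 9, 5], "g2": [2, 8, 1], "g3": [8, 2, 4], "g4": [9, 1, 2]}
gene_sets = {"SET_UP": {"g1", "g2"}, "SET_DOWN": {"g3", "g4"}}

matrix = score_samples(expr, len(samples), gene_sets)

# Order the columns by the first gene set, as in the heatmap described above.
first = next(iter(matrix))
order = sorted(range(len(samples)), key=lambda j: matrix[first][j], reverse=True)
print([samples[j] for j in order])
for name, row in matrix.items():
    print(name, [row[j] for j in order])
```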