GSEA: Difference between revisions

From 太極
Revision as of 15:54, 20 March 2021

GSEA

https://en.wikipedia.org/wiki/Gene_set_enrichment_analysis

GSEA determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states.

Two categories of GSEA procedures:

  • Competitive: compares the genes in the test set with all other genes.
  • Self-contained: tests whether the gene set is more differentially expressed (DE) than would be expected under the null hypothesis of no association between the two phenotype conditions, without reference to other genes in the genome. An example is the method of Jiang & Gentleman, Bioinformatics 2007.
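
The difference between the two categories can be illustrated with two permutation schemes. The following is only a minimal sketch on made-up toy data, not the method of Jiang & Gentleman or of any particular package: the function names, the set-level statistic (mean absolute difference in group means) and all numbers are assumptions chosen for illustration.

```python
import random
from statistics import mean

def mean_diff(row, labels):
    """Difference in mean expression (group 1 minus group 0) for one gene."""
    g1 = [x for x, lab in zip(row, labels) if lab == 1]
    g0 = [x for x, lab in zip(row, labels) if lab == 0]
    return mean(g1) - mean(g0)

def set_score(expr, labels, genes):
    """Set-level statistic: mean absolute group difference over the genes."""
    return mean(abs(mean_diff(expr[g], labels)) for g in genes)

def competitive_p(expr, labels, gene_set, n_perm=500, seed=0):
    """Competitive: compare the set with random gene sets of the same size."""
    rng = random.Random(seed)
    per_gene = {g: abs(mean_diff(row, labels)) for g, row in expr.items()}
    observed = mean(per_gene[g] for g in gene_set)
    all_genes = sorted(expr)
    hits = sum(
        mean(per_gene[g] for g in rng.sample(all_genes, len(gene_set))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

def self_contained_p(expr, labels, gene_set, n_perm=500, seed=0):
    """Self-contained: permute sample labels; genes outside the set are never used."""
    rng = random.Random(seed)
    observed = set_score(expr, labels, gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += set_score(expr, shuffled, gene_set) >= observed
    return (hits + 1) / (n_perm + 1)

# Made-up toy data: 20 genes x 10 samples; g0..g2 are shifted up in group 1.
data_rng = random.Random(1)
labels = [1] * 5 + [0] * 5
expr = {}
for i in range(20):
    shift = 5.0 if i < 3 else 0.0
    expr[f"g{i}"] = [data_rng.gauss(shift * lab, 0.5) for lab in labels]
gene_set = ["g0", "g1", "g2"]

p_comp = competitive_p(expr, labels, gene_set)
p_self = self_contained_p(expr, labels, gene_set)
print("competitive p:", p_comp, "self-contained p:", p_self)
```

The competitive version shuffles which genes belong to the set, so its null depends on the rest of the genome. The self-contained version shuffles the phenotype labels and only ever looks at the genes in the set.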

See also BRB-ArrayTools -> GSEA.

Subramanian algorithm

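In the enrichment plot, genes are ordered along the x-axis by a ranking metric that reflects their differential expression between the two groups (not by their overall expression across all samples). The y-axis shows the running enrichment score. See HOW TO PERFORM GSEA - A tutorial on gene set enrichment analysis for RNA-seq (https://youtu.be/KY6SS4vRchY?t=412). The vertical bars mark the genes that belong to the gene set (https://youtu.be/KY6SS4vRchY?t=429). Genes on the left-hand/right-hand side are more highly expressed in the experimental/control group. A small p-value means the gene set is significantly enriched at one end of the ranked list; when its genes cluster on the left-hand side, the set is enriched in the experimental group.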
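
The enrichment score on the y-axis is a running sum over the ranked gene list: it steps up at each gene in the set (weighted by the gene's ranking metric) and steps down at each gene outside it, as described by Subramanian et al. (2005). Below is a minimal sketch of that running sum, assuming the genes are already sorted by ranking metric in descending order. The function name and the toy scores are hypothetical, and the permutation-based p-value is not included.

```python
def running_enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return (ES, curve) for genes pre-sorted by ranking metric, descending.

    ranked_genes: gene names in ranked order
    scores:       ranking metric for each gene, same order
    gene_set:     set of gene names to test
    weight:       exponent applied to |score| for hits (1.0 is the usual default)
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    hit_weights = [abs(s) ** weight if hit else 0.0 for s, hit in zip(scores, in_set)]
    total = sum(hit_weights)
    if total == 0:
        raise ValueError("all genes in the set have a ranking metric of zero")

    running, curve = 0.0, []
    for w, hit in zip(hit_weights, in_set):
        # Step up (weighted) at a hit, step down by a constant at a miss.
        running += w / total if hit else -1.0 / n_miss
        curve.append(running)

    # ES is the running-sum value with the largest absolute deviation from zero.
    es = max(curve, key=abs)
    return es, curve

# Hypothetical toy ranking: six genes, the gene set sits at the top of the list.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es, curve = running_enrichment_score(genes, metric, {"A", "B"})
print(es, curve)
```

In this toy case the curve peaks right after the two set members and then decays back to zero, which is the shape seen when a gene set is concentrated on the left-hand side of the plot.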

ssGSEA

  • https://github.com/broadinstitute/ssGSEA2.0
  • Use "ssgsea-gui.R". The first prompt asks for a folder containing the input GCT files. The second prompt asks for a gene set database in GMT format. The choice of database is very restrictive: for example, "ptm.sig.db.all.uniprot.human.v1.9.0.gmt" and "ptm.sig.db.all.sitegrpid.human.v1.9.0.gmt" provided on GitHub won't work with the example GCT file.
    setwd("~/github/ssGSEA2.0/")
    source("ssgsea-gui.R")
    # select a folder containing gct files; e.g. PI3K_pert_logP_n2x23936.gct 
    # select a gene set file; e.g. <ptm.sig.db.all.flanking.human.v1.8.1.gmt>
    

    A new folder (e.g. 2021-03-20) will be created under the same parent folder as the gct file folder.

    tree -L 1 ~/github/ssGSEA2.0/example/gct/2021-03-20/                         
    
    ├── PI3K_pert_logP_n2x23936_ssGSEA-combined.gct
    ├── PI3K_pert_logP_n2x23936_ssGSEA-fdr-pvalues.gct
    ├── PI3K_pert_logP_n2x23936_ssGSEA-pvalues.gct
    ├── PI3K_pert_logP_n2x23936_ssGSEA-scores.gct
    ├── PI3K_pert_logP_n2x23936_ssGSEA.RData
    ├── parameters.txt
    ├── rank-plots
    ├── run.log
    └── signature_gct
    
    tree ~/github/ssGSEA2.0/example/gct/2021-03-20/rank-plots | head -3 
    # 102 files. One file per matched gene set
    ├── DISEASE.PSP_Alzheime_2.pdf
    ├── DISEASE.PSP_breast_c_2.pdf
    
    tree ~/github/ssGSEA2.0/example/gct/2021-03-20/signature_gct | head -3                    
    # 102 files. One file per matched gene set
    ├── DISEASE.PSP_Alzheimer.s_disease_n2x23.gct
    ├── DISEASE.PSP_breast_cancer_n2x14.gct
    
  • Some discussions on biostars.org (search for "ssgsea").
  • Some papers: Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma, 2020.
  • [Bioinformatics Analysis 3] How to read GSEA and ssGSEA results (【生信分析 3】教你看懂GSEA和ssGSEA分析结果). ssGSEA does not need groups/classes in the data (6:33). The output is a heatmap in which each value is computed sample by sample: rows are gene sets and columns are samples, sorted by the first gene set.
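
The heatmap layout described in the last item (rows = gene sets, columns = samples sorted by the first gene set) can be sketched as below. This is an illustration only: the function name, the set and sample names and the score values are made up, and sorting in descending order is an assumption, since the sort direction is not stated above.

```python
def order_samples_by_first_set(scores, samples):
    """Reorder sample columns by the scores of the first gene set.

    scores:  dict mapping gene set name -> list of per-sample scores
             (dict insertion order is taken as the row order)
    samples: sample names, in the same order as the score lists
    """
    first_set = next(iter(scores))
    # Assumption: highest score of the first gene set goes to the left.
    order = sorted(range(len(samples)), key=lambda j: scores[first_set][j], reverse=True)
    ordered_samples = [samples[j] for j in order]
    ordered_scores = {name: [row[j] for j in order] for name, row in scores.items()}
    return ordered_samples, ordered_scores

# Hypothetical per-sample enrichment scores (one value per gene set and sample).
samples = ["S1", "S2", "S3"]
scores = {
    "SET_A": [0.1, 0.9, -0.3],
    "SET_B": [1.0, 2.0, 3.0],
}
ordered_samples, ordered_scores = order_samples_by_first_set(scores, samples)
print(ordered_samples)
for name, row in ordered_scores.items():
    print(name, row)
```

Because each score is computed for a single sample, no group labels are needed; the ordering only rearranges columns so that the trend of the first gene set is easy to see across the heatmap.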