Genome

From 太極
Revision as of 15:03, 5 March 2013

RNA-seq

   BWA/Bowtie     samtools
fa ---------> sam ------> sam/bam (sorted, indexed, short reads), vcf
   or tophat

Rsamtools    GenomicFeatures                 edgeR (normalization)
--------->   --------------> table of counts --------->
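The last step above normalizes a table of counts. edgeR's own default is TMM normalization (`calcNormFactors`), which this sketch does not reproduce. As a simpler illustration of library-size scaling, the shell function below converts raw counts to counts per million (CPM). The input layout (tab-delimited, one header row, gene ID in column 1, one count column per sample) and the function name `cpm_table` are assumptions.

```shell
# Sketch: convert a raw count table to counts per million (CPM).
# Assumed layout (hypothetical): tab-delimited, one header row,
# column 1 = gene ID, columns 2..N = raw counts, one column per sample.
# NOTE: plain CPM only; this is NOT edgeR's TMM normalization.
cpm_table() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { header = $0; next }          # remember the header row
    {
      n++
      id[n] = $1
      for (j = 2; j <= NF; j++) {
        count[n, j] = $j
        libsize[j] += $j                   # per-sample library size
      }
      ncol = NF
    }
    END {
      print header
      for (i = 1; i <= n; i++) {
        line = id[i]
        for (j = 2; j <= ncol; j++) {
          cpm = (libsize[j] > 0) ? count[i, j] / libsize[j] * 1e6 : 0
          line = line OFS sprintf("%.2f", cpm)
        }
        print line
      }
    }
  ' "$1"
}
```

For example, `cpm_table counts.txt > cpm.txt`, where counts.txt is a hypothetical count table in the layout above.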

Bowtie

Extremely fast, general-purpose short-read aligner.

Tophat (tutorial: http://tophat.cbcb.umd.edu/tutorial.html)

Aligns RNA-Seq reads to the genome using Bowtie and discovers splice sites.


Linux part.

$ type -a tophat # Find out which command the shell executes:
tophat is /home/mli/binary/tophat
$ ls -l ~/binary
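`type -a` checks one command at a time. To confirm that every tool a pipeline needs is on PATH, a small function built on the POSIX `command -v` builtin can help; the name `check_tools` and the tool list in the usage line are illustrative assumptions.

```shell
# Sketch: report where each requested tool is found on PATH (hypothetical helper).
check_tools() {
  missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool" 2>/dev/null); then
      printf '%-10s %s\n' "$tool" "$path"
    else
      printf '%-10s NOT FOUND\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]                     # status 0 only if every tool was found
}
```

For example, `check_tools tophat bowtie bowtie2 samtools` prints one line per tool and returns a non-zero status if any of them is missing.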

Quick test of the Tophat program

$ mkdir -p ~/tophat_test_data && cd ~/tophat_test_data
$ wget http://tophat.cbcb.umd.edu/downloads/test_data.tar.gz
$ tar xzvf test_data.tar.gz
$ cd ~/tophat_test_data/test_data
$ PATH=$PATH:/home/mli/bowtie-0.12.8
$ export PATH
$ ls
reads_1.fq      test_ref.1.ebwt  test_ref.3.bt2  test_ref.rev.1.bt2   test_ref.rev.2.ebwt
reads_2.fq      test_ref.2.bt2   test_ref.4.bt2  test_ref.rev.1.ebwt
test_ref.1.bt2  test_ref.2.ebwt  test_ref.fa     test_ref.rev.2.bt2
$ tophat -r 20 test_ref reads_1.fq reads_2.fq
$ # This will generate a new folder <tophat_out>
$ ls tophat_out
accepted_hits.bam  deletions.bed  insertions.bed  junctions.bed  logs  prep_reads.info  unmapped.bam
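A quick way to sanity-check the run is to summarize junctions.bed. Assuming TopHat's documented layout (a `track` header line, then one BED record per junction whose score column counts the alignments spanning it), the awk sketch below reports the number of junctions, the total spanning alignments, and the best-supported junction by name. The function name is hypothetical.

```shell
# Sketch: summarize TopHat's junctions.bed (hypothetical helper name).
# Assumed layout: a "track" header line, then one BED line per junction;
# column 4 = junction name, column 5 (score) = alignments spanning it.
summarize_junctions() {
  awk -F '\t' '
    BEGIN { max = -1 }
    /^track/ { next }                      # skip the BED track header
    NF >= 5 {
      n++
      reads += $5                          # total spanning alignments
      if ($5 + 0 > max) { max = $5 + 0; best = $4 }
    }
    END {
      printf "junctions: %d\n", n
      printf "spanning alignments: %d\n", reads
      if (n > 0) printf "best supported: %s (%d alignments)\n", best, max
    }
  ' "$1"
}
```

For the test run above: `summarize_junctions tophat_out/junctions.bed`.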

Cufflinks package

Cufflinks

Assembles transcripts and estimates their abundances
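Cufflinks writes its assembly to transcripts.gtf, where each transcript record carries attributes such as `transcript_id` and `FPKM`. A minimal sketch, assuming that attribute format (`key "value";` pairs in column 9), that lists each transcript's ID and FPKM; the function name is hypothetical.

```shell
# Sketch: print transcript_id and FPKM for every "transcript" record in a
# Cufflinks transcripts.gtf (hypothetical helper name). Assumes attributes like
#   gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "12.3";
transcript_fpkm() {
  awk -F '\t' '
    $3 == "transcript" {
      tid = ""; fpkm = ""
      n = split($9, attrs, ";")            # one element per key "value" pair
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, " ")           # kv[1] = key, kv[2] = quoted value
        val = kv[2]
        gsub(/"/, "", val)                 # strip the quotes
        if (kv[1] == "transcript_id") tid = val
        else if (kv[1] == "FPKM") fpkm = val
      }
      if (tid != "") printf "%s\t%s\n", tid, fpkm
    }
  ' "$1"
}
```

For example, `transcript_fpkm transcripts.gtf | sort -k2,2gr | head` lists the most abundant transcripts first (`-g` is a GNU sort option).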

Cuffcompare

Compares transcript assemblies to a reference annotation
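Cuffcompare's .tmap output gives each assembled transcript a class code describing how it relates to the reference annotation. A sketch that tallies those codes, locating the `class_code` column by its header name rather than by a fixed position (the function name is hypothetical; check the header of your own .tmap file):

```shell
# Sketch: tally Cuffcompare class codes in a .tmap file (hypothetical helper).
# The class_code column is located by header name; prints nothing if absent.
count_class_codes() {
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "class_code") col = i
      next
    }
    col { count[$col]++ }                  # one tally per transcript row
    END { for (c in count) printf "%s\t%d\n", c, count[c] }
  ' "$1" | LC_ALL=C sort                   # locale-independent output order
}
```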

Cuffmerge

Merges two or more transcript assemblies
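cuffmerge reads a text file that lists one assembly GTF per line. Assuming each sample was assembled with `cufflinks -o <dir>`, so that its assembly is `<dir>/transcripts.gtf`, this sketch writes that list and skips directories without an assembly. The function and directory names are hypothetical.

```shell
# Sketch: write the assembly list for cuffmerge, one transcripts.gtf per line
# (hypothetical helper). Usage: make_assembly_list LIST_FILE DIR...
make_assembly_list() {
  out=$1; shift
  : > "$out"                               # start with an empty list
  for dir in "$@"; do
    if [ -f "$dir/transcripts.gtf" ]; then
      printf '%s\n' "$dir/transcripts.gtf" >> "$out"
    else
      printf 'skipping %s: no transcripts.gtf\n' "$dir" >&2
    fi
  done
}
```

For example, `make_assembly_list assemblies.txt sample1_clout sample2_clout`, followed by a typical invocation such as `cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt`, where genes.gtf and genome.fa stand for your reference annotation and genome (check `cuffmerge --help` for your version).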

Cuffdiff

Finds differentially expressed genes and transcripts, and detects differential splicing and promoter use.
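Cuffdiff reports gene-level results in gene_exp.diff, whose `significant` column holds yes or no. A sketch that keeps the header plus the significant rows, finding the column by header name instead of assuming its position (the function name is hypothetical):

```shell
# Sketch: keep the header and the rows Cuffdiff marked significant in
# gene_exp.diff (hypothetical helper name). The "significant" column is
# located by header name; its values are "yes" or "no".
significant_genes() {
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "significant") col = i
      print                                # keep the header line
      next
    }
    col && $col == "yes"                   # print significant rows only
  ' "$1"
}
```

For example, `significant_genes gene_exp.diff > significant_genes.txt`.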

CummeRbund

Plots abundance and differential expression results from Cuffdiff.

Other software

dChip

IPA from Ingenuity

Login: IPA comes in a Java Web Start version (https://analysis.ingenuity.com/pa) and a Java applet version (https://analysis.ingenuity.com/pa/login/choice.jsp). On my machine, double-clicking the file <IpaApplication.jnlp> in the Downloads folder launches the Web Start version.

Features:

  • Easily search the scientific literature and integrate diverse biological information.
  • Build dynamic pathway models.
  • Quickly analyze experimental data; functional discovery (assign functions to genes).
  • Share research and collaborate.

On the other hand, IPA is web-based, so analyses take time to run. When a submitted analysis finishes, IPA sends the user an email.

Start Here

Expression data -> New core analysis -> Functions/Diseases -> Network analysis
                                        Canonical pathways        |
                                              |                   |
Simple or advanced search --------------------+                   |
                                              |                   |
                                              v                   |
                                        My pathways, Lists <------+
                                              ^
                                              |
Creating a custom pathway --------------------+

Resource:

Notes:

  • The input data file can be an Excel file containing at least a gene ID column and an expression-value column (the same layout the BRB-ArrayTools general-format importer requires).
  • Because IPA is web-based, projects and analyses are not saved locally, and the data to be uploaded can take several forms (see http://ingenuity.force.com/ipa/articles/Feature_Description/Data-Upload-definitions). IPA uses the terms Single and Multiple Observation. An observation is a list of molecule identifiers and their corresponding expression values for a given experimental treatment; a dataset file may contain one or more observations. A Single Observation dataset contains only one experimental condition (e.g. wild-type). A Multiple Observation dataset contains more than one experimental condition (e.g. a time course or a dose-response experiment) and can be uploaded into IPA as a single file (e.g. an Excel file). At most 20 observations can be uploaded in a single file.
  • The page http://ingenuity.force.com/ipa/articles/Feature_Description/Data-Upload-definitions also lists the gene identifier types IPA accepts.
  • In the prostate example-data tutorial, the column labeled 'fold change' holds log2 gene expression values, and 1.5 is used as the fold-change cutoff.
  • The gene table in the analysis output has the columns 'Fold change', 'ID', 'Notes', 'Symbol' (with tooltip), 'Entrez Gene Name', 'Location', 'Types', and 'Drugs'. See the screenshot below.
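A 1.5 fold-change cutoff on log2 ratios means keeping genes with |log2 ratio| >= log2(1.5), about 0.585. A sketch that applies this filter before upload, assuming a tab-delimited file with a header row, gene ID in column 1 and log2 ratio in column 2. It also adds a signed fold change in which down-regulation is written as -1/ratio; that convention is an assumption here, so check it against IPA's upload documentation. The function name is hypothetical.

```shell
# Sketch: filter genes by a fold-change cutoff given log2 ratios
# (hypothetical helper). Assumed input: tab-delimited, header row,
# column 1 = gene ID, column 2 = log2 ratio. Optional 2nd argument = cutoff.
# Adds a signed fold change: 2^x when up, -(2^-x) (i.e. -1/ratio) when down.
filter_fold_change() {
  awk -F '\t' -v OFS='\t' -v cutoff="${2:-1.5}" '
    BEGIN { thr = log(cutoff) / log(2) }   # threshold on |log2 ratio|
    NR == 1 { print $1, $2, "fold_change"; next }
    {
      x = $2 + 0
      if (x >= thr || x <= -thr) {
        fc = (x >= 0) ? 2 ^ x : -(2 ^ (-x))
        print $1, $2, sprintf("%.2f", fc)
      }
    }
  ' "$1"
}
```

For example, `filter_fold_change expression.txt 1.5 > upload.txt`, where expression.txt is a hypothetical input file.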

Screenshots:

IngenuityAnalysisOutput.png

DAVID Bioinformatics Resource