Genome: Difference between revisions

Revision as of 15:58, 5 March 2013

RNA-seq

   BWA/Bowtie     samtools
fa ----------> sam -------> sam/bam (sorted, indexed; short reads), vcf
   or TopHat

Rsamtools     GenomicFeatures                   edgeR (normalization)
---------->   ---------------> table of counts ---------->
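
The alignment half of the diagram (reads to SAM to sorted, indexed BAM) can be sketched as a small shell function. This is only an illustration: `run`, `align_and_sort`, and the three arguments are hypothetical names, the reads are assumed to be single-end, and the `samtools sort -o` form assumes samtools 1.3 or later (older releases use a different syntax). Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# align_and_sort INDEX READS PREFIX   (all three are placeholder names)
align_and_sort() {
    index=$1
    reads=$2
    prefix=$3
    # Bowtie 1 with -S writes SAM output.
    run bowtie -S "$index" "$reads" "$prefix.sam" &&
    # Sort by coordinate and write BAM (assumes samtools >= 1.3).
    run samtools sort -o "$prefix.sorted.bam" "$prefix.sam" &&
    # Index the sorted BAM; this writes PREFIX.sorted.bam.bai.
    run samtools index "$prefix.sorted.bam"
}
```

For example, `DRY_RUN=1; align_and_sort test_ref reads_1.fq demo` lists the three commands without needing Bowtie or samtools installed.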


TopHat (tutorial: http://tophat.cbcb.umd.edu/tutorial.html)

Linux part.

$ type -a tophat   # show which tophat executable the shell will run
tophat is /home/mli/binary/tophat
$ ls -l ~/binary

Quick test of Tophat program

$ wget http://tophat.cbcb.umd.edu/downloads/test_data.tar.gz
$ tar xzvf test_data.tar.gz
$ cd ~/tophat_test_data/test_data
$ PATH=$PATH:/home/mli/bowtie-0.12.8
$ export PATH
$ ls
reads_1.fq      test_ref.1.ebwt  test_ref.3.bt2  test_ref.rev.1.bt2   test_ref.rev.2.ebwt
reads_2.fq      test_ref.2.bt2   test_ref.4.bt2  test_ref.rev.1.ebwt
test_ref.1.bt2  test_ref.2.ebwt  test_ref.fa     test_ref.rev.2.bt2
$ tophat -r 20 test_ref reads_1.fq reads_2.fq
$ # This creates a new directory, tophat_out
$ ls tophat_out
accepted_hits.bam  deletions.bed  insertions.bed  junctions.bed  logs  prep_reads.info  unmapped.bam
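
Since the test only works when both tophat and bowtie are on PATH, it can help to check for them before running it. A minimal sketch, assuming a POSIX shell; `check_tools` is a hypothetical helper name, not part of TopHat or Bowtie:

```shell
#!/bin/sh
# Report which of the named programs can be found on PATH.
# Returns 0 when every program is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_tools tophat bowtie || echo "add the missing directories to PATH first"`.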

Other software

dChip

IPA from Ingenuity

Login: There are two versions, a web-started version (https://analysis.ingenuity.com/pa) and a Java applet version (https://analysis.ingenuity.com/pa/login/choice.jsp). IPA can also be launched by double-clicking the file IpaApplication.jnlp in my machine's download folder.

Features:

  • Easily search the scientific literature and integrate diverse biological information.
  • Build dynamic pathway models.
  • Quickly analyze experimental data and assign functions to genes (functional discovery).
  • Share research and collaborate.

One drawback: because IPA is web-based, analyses take time to run. When a submitted analysis finishes, an email is sent to the user.

Start Here

Expression data -> New core analysis -> Functions/Diseases -> Network analysis
                                        Canonical pathways        |
                                              |                   |
Simple or advanced search --------------------+                   |
                                              |                   |
                                              v                   |
                                        My pathways, Lists <------+
                                              ^
                                              |
Creating a custom pathway --------------------+

Resource:

Notes:

  • The input data file can be an Excel file with at least one gene ID column, with the expression values in the last columns (the same layout that the BRB-ArrayTools general format importer requires).
  • Because IPA is web-based, data must be uploaded, and projects and analyses are not saved locally. The uploaded data can take different forms; see http://ingenuity.force.com/ipa/articles/Feature_Description/Data-Upload-definitions. IPA uses the terms Single Observation and Multiple Observation. An observation is a list of molecule identifiers and their corresponding expression values for a given experimental treatment. A dataset file may contain one observation or several. A Single Observation dataset contains only one experimental condition (e.g. wild-type). A Multiple Observation dataset contains more than one experimental condition (e.g. a time course or a dose-response experiment) and can be uploaded into IPA in a single file (e.g. an Excel file). At most 20 observations can be uploaded in a single file.
  • The page http://ingenuity.force.com/ipa/articles/Feature_Description/Data-Upload-definitions also lists the gene identifier types that IPA accepts.
  • In the prostate example data tutorial, the term 'fold change' is used in place of log2 gene expression. The tutorial also uses 1.5 as the fold change cutoff.
  • The gene table given on the analysis output contains columns 'Fold change', 'ID', 'Notes', 'Symbol' (with tooltip), 'Entrez Gene Name', 'Location', 'Types', 'Drugs'. See a screenshot below.
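
The fold change cutoff mentioned in the notes can be illustrated with a small shell filter. This is a sketch under stated assumptions, not IPA's own procedure: the input is assumed to be tab-separated lines of gene ID and log2 ratio, down-regulation is assumed to be reported as a negative fold change (-1/ratio), and `filter_fold_change` is a hypothetical name.

```shell
#!/bin/sh
# Convert log2 ratios to signed fold changes and keep the rows whose
# absolute fold change meets the cutoff (default 1.5).
# Input on stdin: gene_id<TAB>log2_ratio   (hypothetical layout)
filter_fold_change() {
    cutoff="${1:-1.5}"
    awk -F '\t' -v cutoff="$cutoff" '
        {
            fc = 2 ^ $2                    # ratio on the linear scale
            if (fc < 1) fc = -1 / fc       # assumed convention: down-regulation is negative
            mag = (fc < 0) ? -fc : fc      # absolute fold change
            if (mag >= cutoff) printf "%s\t%.2f\n", $1, fc
        }'
}
```

Usage: `filter_fold_change 1.5 < log2_ratios.tsv`, where `log2_ratios.tsv` is a placeholder file name. A gene with a log2 ratio of 1 passes with a fold change of 2.00, while a gene with a log2 ratio of 0.2 (fold change of about 1.15) is dropped.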

Screenshots:

IngenuityAnalysisOutput.png

DAVID Bioinformatics Resource