Seqtools: Difference between revisions

From 太極

Revision as of 13:12, 25 April 2017

Wiki for BRB-SeqTools

Windows 10 Bash shell

Gene counting and variant calling (with both Samtools and GATK) work fine. For variant annotation, see item 2 below.

  1. Install Xming. Before calling ./SeqTools from the Bash shell, run export DISPLAY=:0 first.
    • To start Xming automatically when Windows boots, follow the instructions in How to Make a Program Run at Startup on Any Computer.
    • Press Windows + R (or type 'run' in the search box and press Enter). Type “shell:startup” into the Run dialog and press Enter. Then drag and drop the Xming shortcut from the “All Apps” list in the Start menu into this folder. Reboot Windows for the change to take effect. It also helps to download the BRB-SeqTools icon (see item 3 below).
  2. Automatic setup:
    • Install the unzip utility: sudo apt-get install unzip.
    • Install the fontconfig library: sudo apt-get install libfontconfig1-dev.
    • New gnome-terminal windows (gnome-terminal is installed with apt-get install) cannot be opened from the Bash shell. This affects the Automatic setup tools in Tools Manager and Profile Manager.
    • The Java JDK from ppa:webupd8team does not work. Download and install <jdk-8u112-linux-x64.tar.gz> from the Oracle website instead. To make the installation silent, add two lines to the installation script. See here for the apt-get approach and here for the tarball approach.
    • An issue in pandoc: the timer_create function is not implemented. As a result, running variant annotation produces a bug report message. The main output files (one VCF file and two text files) are generated, but the pdf/html files cannot be created.
  3. It is useful to create a shortcut icon on the Windows desktop for quick access to the BRB-SeqTools program. The BRB-SeqTools icon <BRB-SeqTools.lnk> can be found on GitHub. A modified automatic setup script <install_rnaseq.sh> can also be found there.
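
The preparation steps above can be collected into a small pre-launch check. This is only a sketch: the function names set_display and missing_tools are made up for illustration, and it assumes Xming is already running on display :0.

```shell
#!/usr/bin/env bash
# Sketch of a pre-launch check for BRB-SeqTools under the Windows 10 Bash shell.
# set_display and missing_tools are hypothetical helper names, not part of BRB-SeqTools.

# Point X clients at the local Xming server unless DISPLAY is already set.
set_display() {
    export DISPLAY="${DISPLAY:-:0}"
}

# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

set_display
echo "DISPLAY is $DISPLAY"

missing="$(missing_tools unzip)"
if [ -n "$missing" ]; then
    echo "Missing: $missing (install with: sudo apt-get install $missing)"
fi

# Once the checks pass, start the program from its installation directory:
# ./SeqTools
```

The check only reports what is missing; installation is left to the user because apt-get needs sudo.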

[Screenshots: Windows10 seqtools OK.png, Win10 seqtools icon.png]

Performance

A subset of GSE48215 (about 1/10 of the original FASTQ files) was created to run the benchmark.

mkdir GSE48215_22000000
head -n 22000000 GSE48215/SRR925751_1.fastq  > GSE48215_22000000/SRR925751_1.fastq
head -n 22000000 GSE48215/SRR925751_2.fastq  > GSE48215_22000000/SRR925751_2.fastq
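
A FASTQ record normally occupies four lines, so 22000000 lines correspond to 5500000 reads, and any line count passed to head should be a multiple of 4 to avoid cutting a record in half. A minimal sketch of a helper that takes the number of reads instead of lines (subset_fastq is a hypothetical name, and the files are assumed to be uncompressed, unwrapped FASTQ):

```shell
#!/usr/bin/env bash
# subset_fastq N_READS IN OUT
# Keep the first N_READS records of an uncompressed FASTQ file.
# Assumes the usual 4 lines per record (header, sequence, '+', qualities).
subset_fastq() {
    local n_reads="$1" in="$2" out="$3"
    head -n "$((n_reads * 4))" "$in" > "$out"
}

# Equivalent to the commands above (5500000 reads = 22000000 lines):
# subset_fastq 5500000 GSE48215/SRR925751_1.fastq GSE48215_22000000/SRR925751_1.fastq
# subset_fastq 5500000 GSE48215/SRR925751_2.fastq GSE48215_22000000/SRR925751_2.fastq
```

Taking the same number of reads from both files keeps the paired-end mates in step.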

The reference genome file is based on UCSC_hg19_chr1 as part of the DNA-Seq sample data.

Platform            time (min)
Ubuntu 14.04 host   11
Ubuntu 16.04 vm     26
Windows 10 vm       32

Both virtual machines have a 6-core CPU and 16GB of memory. For this dataset, about 8GB of memory is enough. VirtualBox 5.0.30 was used.

Software List

See Tools Manager -> Automatic setup. A developer version of the shell script is available on GitHub. Note: GATK and annovar (marked with * below) are not installed automatically because of license restrictions.

Program     Major language  Version            Linux OS              Mac OS
bowtie2     C++             2.2.6              src                   src
tophat      C++             2.1.0              Linux binary tar.gz   Mac binary tar.gz
bwa         C               0.7.12             src                   src
star        C++             2.5.1b             one binary tar.gz     one binary tar.gz
picard      Java            1.141
samtools    C               1.3                src                   src
GATK*       Java            3.6
bcftools    C               1.3                src                   src
htslib      C               1.3                src                   src
annovar*    Perl            2016Feb01
sratoolkit  Shell           2.7.0              Linux binary tar.gz   Mac binary tar.gz
fastqc      Java            0.11.5                                   dmg
fastx       C               0.0.13             Linux binary tar.bz2  Mac binary tar.bz2
snpeff      Java            4_2
htseq       Python          0.6.1p1            src                   src
R           R               3.3.x              apt-get               app
pandoc      Haskell         1.16.0.2           deb                   pkg
latex                       Ubuntu repository  apt-get               pkg
lftp                        Ubuntu repository  apt-get               homebrew
avfs                        Ubuntu repository  apt-get               NA
Java (jdk)                  8u112              tar.gz                dmg

Download failure

Several software repositories (e.g. GitHub, but not SourceForge) are hosted on Amazon S3, so be aware of possible Amazon AWS outages.
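
When a download fails because of a temporary outage, retrying after a pause is often enough. The following is a generic sketch, not part of the automatic setup script; retry is a hypothetical helper and the URL in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY_SECONDS
# between tries. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $i of $attempts failed: $*" >&2
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (placeholder URL; wget -c resumes a partial download):
# retry 3 60 wget -c "https://example.org/some_tool.tar.gz"
```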

Hard Disk Space

  • Tools Manager: Automatic setup downloads 1.2GB of data and takes about 3GB of disk space. It also downloads the tools required by the automatic method in Profile Manager and by the COSMIC data download in Variant Annotation.
  • Profile Manager: Most genomes download 20GB of data; hg19 downloads 40GB.
  • Variant Annotation: snpEff and ANNOVAR each download a 15GB database for dbNSFP.
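
Before starting a large download it may help to confirm that the target filesystem has enough room. A minimal sketch using POSIX df; free_gb, has_free_gb and the TARGET_DIR variable are hypothetical names, and the thresholds are the sizes quoted in the list above.

```shell
#!/usr/bin/env bash
# free_gb DIR: print the whole gigabytes available on the filesystem holding DIR.
# df -Pk prints one header line, then 1024-byte block counts; column 4 is "Available".
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_free_gb DIR GB: succeed if at least GB gigabytes are available under DIR.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# TARGET_DIR is an assumed variable; it defaults to the current directory.
target="${TARGET_DIR:-.}"
for need in 3 15 20 40; do
    if has_free_gb "$target" "$need"; then
        echo "$target: enough space for a ${need}GB step"
    else
        echo "$target: NOT enough space for a ${need}GB step"
    fi
done
```

Note that the downloads are compressed archives, so the space needed after extraction can be larger than the download size.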

Virtual machine

For a VM with 100GB of dynamically allocated disk space:

space in GB                                       total  used  avail  vdi  ova
After Ubuntu installation                         89     3.7   76     4.5  1.7
After running Automatic setup (Tools Manager)     89     6.9   72     8.4  4.2
After running the RNA-Seq sample data             89
After running the DNA-Seq sample data             89
After running snpEff on the DNA-Seq sample data   89

Predefined Locations

  • Demo data: testdata (parent directory is determined by the user)
  • Reference genome from the automatic method: BRB_SeqTools_autosetup_reference_genome_files (parent directory is determined by the user)
  • Database files from somatic mutation annotator tool: ~/variantAnnoDatabase

Reference sequencing and annotation files from iGenomes

Genome          File                                size (bytes)  md5sum
Ensembl grch37  Homo_sapiens_Ensembl_GRCh37.tar.gz  19971514224
ncbi grch38     Homo_sapiens_NCBI_GRCh38.tar.gz     15848139211
ucsc hg19       Homo_sapiens_UCSC_hg19.tar.gz
ucsc hg38       Homo_sapiens_UCSC_hg38.tar.gz       16006984068
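
The byte counts listed here can be used to detect a truncated download. A minimal sketch (check_size is a hypothetical helper); since no md5sum values are recorded on this page, only the size is compared.

```shell
#!/usr/bin/env bash
# check_size FILE EXPECTED_BYTES
# Compare the size of FILE with the expected byte count.
# Returns 0 and prints "OK" on a match, returns 1 on a mismatch.
check_size() {
    local file="$1" expected="$2" actual
    actual="$(wc -c < "$file" | tr -d '[:space:]')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file ($actual bytes)"
    else
        echo "MISMATCH: $file is $actual bytes, expected $expected" >&2
        return 1
    fi
}

# Usage with the sizes listed on this page:
# check_size Homo_sapiens_Ensembl_GRCh37.tar.gz 19971514224
# check_size Homo_sapiens_NCBI_GRCh38.tar.gz 15848139211
# check_size Homo_sapiens_UCSC_hg38.tar.gz 16006984068
```

A matching size does not prove the content is intact; a checksum comparison would be stronger once reference values are available.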

Tips

Tutorial videos

https://www.youtube.com/playlist?list=PL6A4OqNJzh1l1CnCRdO_Q7o_-0K5CjrP4