ATdeveloper: Difference between revisions

Revision as of 16:56, 1 March 2016

Release timeline for BRB-ArrayTools

  • 2016/3/?: v4.5.0 stable (R 3.2.4 and Bioc 3.1). Add new buttons in Dynamic heatmap viewer, update SOURCE annotation, Drugbank, MIT/MSigdb...
  • Windows 10 released (7/29/2015)
  • 2015/6/8: v4.5.0 beta 1 (R 3.2.0 and Bioc 3.1). Add local fdr. Import and analysis of RNA-seq data by using edgeR and DESeq2 packages. GSE importer. ST arrays importer using 'oligo' package instead of 'aroma.affymetrix'.
  • 2015/6/8: v4.4.1 stable
  • 2015: BDGE v0.1
  • 2014/11/20: v4.4.0 stable (R 3.1.2 and Bioc 3.0). Fix Excel 2013. Add gene dendrogram to DHV. DGIdb. Jaspar2014.
  • 2014/6/20: v4.4.0 beta 2 (R 3.1.0 and Bioc 2.14). maintenance release and new microRNA gene set.
  • 2014/2/26: v4.4.0 beta 1 (R 3.0.2 and Bioc 2.13). DHV was added.
  • 2013/9/12: v4.3.2 stable (R 3.0.2 and Bioc 2.13). (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v4_3_2_Stable.exe)
  • 2013/6/12: v4.3.1 stable (R 3.0.1 and Bioc 2.12). Because of changes in R 3.0.1, some Bioc packages compiled with R 3.0.1 may not work with R 3.0.0. Download link is here
  • 2013/5/24: v4.3.0 stable (R 3.0.0 and Bioc 2.12)
  • 2013/3/7: v4.3.0 beta3 (SOURCE website change, sorting of experiment worksheet). Downloaded from http://pub.emmes.com.
  • 2012/11/28: v4.3.0 beta2
  • 2012/8/15: v4.3.0 beta1
  • 2012/1/24: v4.2.1 stable
  • v4.2.0 stable
  • v4.1.0 stable
  • v3.8.0 stable
  • 2009/3/25: v3.7.1 stable (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v3_7_1.Full.exe)
  • 2008/11/19: v3.7.0 stable
  • v3.6.0 stable
  • v3.5.0 stable
  • v3.4.0 stable
  • v3.3.0 stable
  • v3.2.3
  • v3.2.2
  • v3.2.1

Release timeline for R/Bioconductor

  • 2016/4/15: Bioconductor 3.3
  • 2016/3/10: R-3.2.4
  • 2015/12/10: R-3.2.3. Fix ftp download error in Windows's "wininet".
  • 2015/10/14: Bioconductor 3.2
  • 2015/8/14: R-3.2.2. setInternet2(TRUE) is now the default for Windows. The default method for accessing URLs via download.file() and url() has been changed to "wininet", which uses Windows API calls. This changes how proxies and security settings need to be configured.
  • 2015/6/18: R-3.2.1. Some Bioc packages compiled with R 3.2.1 fail under R 3.2.0 with a "could not find symbol keepNA" error from nchar(). See this and this posts.
  • 2015/4/17: R-3.2.0, Bioconductor 3.1
  • 2015/3/9: R-3.1.3
  • 2014/10/31: R-3.1.2
  • 2014/10/14: Bioconductor 3.0
  • 2014/4/14: Bioconductor 2.14
  • 2014/4/10: R-3.1.0
  • 2013/10/8: Bioconductor 2.13
  • 2013/9/25: R-3.0.2
  • 2013/5/16: R-3.0.1
  • 2013/4/3: R-3.0.0 and Bioconductor 2.12

See also the developer's page about the release schedule.

Release timeline for BRB DGE

  • 2015/1/14: version 0.1

Release Procedures (including check list for installer)

  1. Misc Folder:
    • CGHTools
    • R installer
    • Rserve
    • Java installer
    • Microsoft C++ 2010 SP1 Redistributable packages, vcredist_x64.exe & vcredist_x86.exe
  2. License file:
    • C:\Installer-2012\ArrayTools-Scripts\V4_3_And_CGH_1_3_RServe_Full\Setup Files\Compressed Files\Language Independent\OS Independent\License.doc. It is a text file.
    • C:\Installer-2012\CGHTools-Scripts\CGH_1_3_Rserve\Setup Files\Compressed Files\Language Independent\OS Independent\License.doc. It is a text file.
  3. License agreement menu, About ArrayTools and About CGHTools menus.
  4. Sample datasets.
  5. ArrayTools: License.doc, Readme.doc.
  6. ArrayTools\Updates: ArrayTools_CurrentVersion.txt.
  7. ArrayTools\Doc: Manual.doc, ManualFrame.doc, Overview of Analysis Tools.doc, Plug-in Manual.doc, StatMethods.doc, Table of Contents.doc, FAQs.pdf.
  8. CGHTools: LICENSE AGREEMENT.doc, Readme.doc
  9. CGHTools\Updates: CGHTools_CurrentVersion.txt.
  10. CGHTools\Doc: CGHManual.doc, CGHManualFrame.doc, Table of Contents.doc.
  11. ArrayTools_UpdateVersion.txt, which is kept on the ArrayTools server.
  12. http://linus.nci.nih.gov/~brb/news.html
  13. Different ArrayTools installers.
    • There are 3 installers: Full, Individual and Update. A main release has no Update installer.
    • The Individual installer has no R or Java installers in the Misc folder. It checks for the required R version or newer; if R is missing or older than the required version, the installation aborts.
    • The Update installer contains only the files that changed between the current and previous freeze folders.
  14. A copy of each installer is kept in K:\BRB\ARRAYTOOLS\archive\installers\. Alpha versions are not kept; only Beta and release versions are kept on K:.
  15. Put the new release online:
    • Create a testing page: http://linus.nci.nih.gov/BRB-ArrayTools_test.html
    • Whats-new-v43b1.doc, Readme.doc, License.doc, Manual.doc
    • Java, R download links
    • Installers: Full, Individual, Update
    • ArrayTools_LatestVersion.txt (no more than 5 items)
    • Sending emails to users about a new release takes about 3 hours. Only Beta 1 and Stable versions are announced by email; other Beta versions are not, to reduce the number of emails users receive.

Troubleshooting

'This workbook is currently referenced by another workbook and cannot be closed' when I use VBAObjectManager

  1. Uninstall AT & CGHTools.
  2. Use privilege manager to run the installer.
  3. Use privilege manager to open Excel and run VBAObjectManager.

Another canonical method is

  1. Press Alt + F11 to open the VBE (Visual Basic Editor).
  2. Click Project > ArrayTools, then Tools > References. Make sure CGHTools is not checked.
  3. Click Project > CGHTools, then Tools > References. Make sure ArrayTools is checked.

Registration

http://linus.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com
http://linus-stage.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com

Check update error

BRB-ArrayTools was not able to check for software updates. The server may be down or too slow right now or you may not connect to the internet.

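Solution: Check the LatestVersion.txt file on the server. The file may have collapsed into a single line, which points to a line-ending problem. Either convert the line endings when the file is uploaded from a PC, or copy the text manually into the file in a Linux environment.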
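
If the single-line symptom does come from mismatched line endings (an assumption; the note above only says the file "becomes one line"), the file can be rewritten with Unix line endings before it is uploaded. A minimal R sketch follows; the helper name NormalizeLineEndings and the sample file content are hypothetical.

```r
# Hypothetical helper: rewrite a text file with LF ("\n") line endings.
NormalizeLineEndings <- function(infile, outfile = infile) {
  # readLines() accepts LF, CRLF or CR as the end-of-line marker
  lines <- readLines(infile, warn = FALSE)
  # Binary mode prevents end-of-line translation on Windows
  con <- file(outfile, open = "wb")
  on.exit(close(con))
  writeLines(lines, con, sep = "\n")
  invisible(length(lines))
}

# Example: a temporary file that uses CR-only line endings
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw("v4.5.0 Beta 1\rItem one\rItem two\r"), tmp)
n <- NormalizeLineEndings(tmp)
```

After the call, the file has one LF-terminated line per entry and no carriage returns.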

Related to Project Path

BRB-ArrayTools is running the batch job for your analysis.
The working path is E:\DASL data related\Direct Hyb samples\BRB analysis with Direct Hyb samples -Project\Fortran\ClassComparison between Normal and Tumors (Fold change 2 x)
Please wait...
Fortan program is running...'E:\DASL' is not recognized as an internal or external command,
operable program or batch file.
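
The message suggests that the command line was split at the first space in the working path. One common remedy, sketched here in R, is to quote the path before it is handed to the shell. How BRB-ArrayTools actually launches the Fortran program is not shown here, and the program name fortran_job.exe is hypothetical.

```r
# Part of the working path from the message above (it contains spaces)
work.path <- "E:\\DASL data related\\Direct Hyb samples"

# Unquoted: the shell treats 'E:\DASL' as the command and the rest as arguments
bad.cmd <- paste0(work.path, "\\fortran_job.exe")

# Quoted for the Windows command interpreter: the whole path is one token
good.cmd <- shQuote(bad.cmd, type = "cmd")
cat(good.cmd, "\n")
```

The quoted string could then be passed to system() or shell() in place of the bare path.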

How to Test

  • Test when internet is not available. For example, GO analysis requires downloading GO.db package. BugReport.tar.gz will be generated if there is a problem with GO.db installation.
  • Test special characters (# sign) in the experiment descriptor worksheet. A wrong <ExpDescWkSht> file used by the DHV (dynamic heatmap viewer) was generated. The cause was that the read.table() call did not set the comment.char argument. See bugreportDessen project. PS: the error was caught in RserveVBA.
  • Test column headers that contain double quotes (e.g. my "GWAS" score). Some analyses, such as quantitative trait analysis, will fail.
  • Test the case where gene annotation was not done, so gene symbols are not available (miRNA project). This causes a bug in the class comparison -> IPA output: the GeneID file may miss some gene identifiers. See PrepareGeneTableHTML.R.
  • Test the case where Symbol is used in the first column of the gene identifier file. This results in two 'Symbol' columns in the Gene annotations worksheet and breaks the code in CreateIPAOutput().
  • Extreme numbers of arrays (e.g. 3, 10). This is used to test chunking, ...
  • Extreme number of genes.
  • Use non-default options.
  • Analyze one project after another project to see if there is any memory issue.
  • Use unreasonable entry like 1.5 if the value should be less than or equal to 1.
  • Use a long path name for the project; this causes an error when the Fortran program is launched. According to the Wikipedia page, there is a 126-character limit in interactive mode.
  • Test on fresh machine (no R libraries installed).
  • Data with lots of missing values.
  • Test on a subset of arrays only.
  • Run an analysis on a data type it is not designed for (e.g. RNA-Seq analysis on regular microarray data).
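
The '#' test case above can be reproduced in isolation. The sketch below uses a made-up two-row descriptor file (file content and column names are assumptions, not the actual <ExpDescWkSht> layout) to show how the default comment.char of read.table() truncates a field.

```r
# Stand-in for an experiment descriptor file with a '#' inside a field
f <- tempfile(fileext = ".txt")
writeLines(c("Array\tDescription",
             "A1\ttumor #1 replicate",
             "A2\tnormal"), f)

# Default comment.char = "#": everything after '#' on the line is dropped
truncated <- read.table(f, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# comment.char = "" turns comment handling off; quote = "" disables quoting
intact <- read.table(f, header = TRUE, sep = "\t", stringsAsFactors = FALSE,
                     comment.char = "", quote = "")

print(truncated$Description[1])
print(intact$Description[1])
```

Only the second call keeps the full description, which is why the option has to be set explicitly wherever user-entered text is read.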

Misc

Conversion of an old project

  • When an old project needs to be converted, the old project folder is backed up under a new name, 'XXXX_ArrayToolsOldProject'.

After running the class comparison

Filtered log ratio worksheet

Arrays in the 'filtered log ratio worksheet' are sorted by the class variable (arrays with an empty class label are kept). For example, if we select the 'BRCA1 V BRCA2' column, arrays are sorted as BRCA1, BRCA2 and empty. If we select 'BRCA1 v Sporadic', arrays are sorted as BRCA1, empty and Sporadic. The gene expression value HTML page is also sorted by the class variable (excluding arrays with an empty label).

Clustered heatmap of significantly expressed genes

Note that the color legend says 'Centered log ratio' (dual channel) or 'Scaled and centered value' (single channel). 'Centered' means that the expression values of each gene are centered across arrays.
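
As an illustration of per-gene centering, the sketch below subtracts each gene's mean across arrays from a toy matrix. Whether the heatmap code centers by the mean or the median is not stated here, so mean centering is an assumption, and the matrix values are made up.

```r
# Toy log-expression matrix: rows are genes, columns are arrays
x <- matrix(c(1, 2, 3,
              4, 6, 8), nrow = 2, byrow = TRUE,
            dimnames = list(c("gene1", "gene2"), c("A1", "A2", "A3")))

# Subtract each gene's mean across arrays (assumption: mean centering)
centered <- sweep(x, 1, rowMeans(x, na.rm = TRUE))
print(centered)
```

After centering, every row has mean zero, so the colors show each gene's deviation from its own average rather than its absolute level.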

R folder

FileName Functions Notes
Annotations.R
  • CreateAnnotations
  • ChromDistBarPlot
  • GOanalysis
  • GetGOIDsFreq
  • GOOEHTMLTable
  • UpdateGeneIdsAfterAnnotation
  • AnnotateWithGeneIdFile
BioUtil.R
  • SetRPackageDir
  • LoadRPackage
  • InstallAllRPackages
  • OutputAnnotationTextFile
  • MatchAffyAnnotation
  • GetDetectionCall
  • MatchlumiAnnotation
  • MatchlumiMethyAnnotation
  • MatchBiocAnnotation
CreateGeneExpression.R No functions
FilterAndNormalize.R
  • ApplyFilteringAndNormalization
  • AppendWarningMessage
  • DetermineReferenceArray
  • CreateRefArray
  • ApplyNormalization
  • LowessNormalization
  • ApplyTruncation
  • InitializeParameters
  • ReadFilterParamFromFile, GetParam, WriteParam
FilterData.R
  • ComputeGeneFilters
  • CompareGeneFilters
GeneLists.R
  • ParseMSigDBXMLFile
  • CreateGODataFile
  • CreateGOSets
  • CreateGeneListDataFile
  • CheckAllGenelistsInFolder
  • GetPathwaysForOneGene
  • CreatePathwayDataFile
  • MatchGenelists, CheckGenelists
  • GetLongPathwayNames
  • CompareTwoGenelists
Illumina.R
  • lumifunc
  • methyfunc
Input_to_Output_function.R
  • GetParam
  • print.filter.parameters.to.output
  • print.affy.filter.parameters.to.output
  • print.SingleChannel.filter.parameters.to.output
Misc.R
  • SubsetFilteredAndNormalizedLog
  • CheckPhenoAverageDuplic
  • AverageDuplicLG
  • ComputNumberOfPermutations
PrepareGeneTableHTML.R
  • GetGenesHTML
  • outputClassname
  • CreateIPAOutput
PreProcessLog.R
  • GetDataRMatrix
  • LogTransformation
  • PreProcessMain
WriteDataToFiles.R
  • WriteDataToFiles

Plugins/BRBfun folder

FileName Functions Notes
CreateGeneTable.R
  • CreateGeneTable
CreateHtmlTable.R
  • CreateHtmlTable
CreateMasterAnnotationsTable.R
  • CreateMasterAnnotationsTable
CreatePath.R
  • CreatePath
OutputGeneList.R
  • RemoveInvalidChar
  • OutputGeneList

BinaryData folder

text file

  • <ExperimentTable>

R binary file

  • <ApplyFilter.rda> (FilterData.R)
  • <GeneIds> (FilterData.R)
  • <FilteredAndNormalizedLog1.rda> (Misc.R)

Use load() with the file path to load them into R.
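
Because load() restores objects under their saved names, it can silently overwrite objects in the workspace. A small sketch of loading into a separate environment instead; the object name GeneFilter and its content are hypothetical stand-ins, not the actual content of the BinaryData files.

```r
# Stand-in for a BinaryData file: save one object to a temporary .rda
rda <- tempfile(fileext = ".rda")
GeneFilter <- c(TRUE, FALSE, TRUE)   # hypothetical object name and content
save(GeneFilter, file = rda)
rm(GeneFilter)

# Load into a separate environment so the workspace is not overwritten
e <- new.env()
loaded <- load(rda, envir = e)   # load() returns the names of the restored objects
print(loaded)
str(e$GeneFilter)
```

The returned character vector is a quick way to find out which objects a given file contains.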

General binary files

  • See GetDataRMatrix() and WriteDataRMatrix() functions from <PreProcessLog.R> file.
    • <GreenRaw1>, <GreenRaw2>, ...
    • <RedRaw1>, <RedRaw2>, ...
    • <Flag1>, <Flag2>, ...
    • <RawData1>, <RawData2>, ...
    • <RedBkg1>, <RedBkg2>, ...
    • <GreenBkg1>, <GreenBkg2>, ...
    • <RedAdj1>, <RedAdj2>, ...
    • <GreenAdj1>, <GreenAdj2>, ...
  • See the ApplyNormalization() function in <FilterAndNormalize.R>.
    • <FilteredAndNormalizedLog1>, <FilteredAndNormalizedLog2>, ...
    • <PreProcessLog1>, <PreProcessLog2>, ...
    • <PrintTipGroups1>
    • <PrintTip1>, <PrintTip2>, ...
    • <RedAdj1>, <RedAdj2>, ...
    • <RedRaw1>, <RedRaw2>, ...

We can read the data using readBin() function in R.

# Read one chunk of 4-byte numeric values and reshape it to genes x arrays (column-major)
DataRMatrix <- matrix(readBin(con="C:/ArrayTools/Sample datasets/Perou/Perou -Project/BinaryData/GreenRaw1",
                              what="numeric", n=2998*3, size=4),
                      nrow=2998, byrow=FALSE)
# check <Import.txt> to find number of genes and chunks
str(DataRMatrix) # 2998 x 3
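
The same call can be wrapped in a small helper that checks the number of values read. This is a sketch under the assumptions visible in the call above (4-byte numeric values, column-major layout); the helper name ReadChunk is hypothetical and is not one of the functions in <PreProcessLog.R>.

```r
# Hypothetical helper: read one chunk file into a genes x arrays matrix.
ReadChunk <- function(path, n.genes, n.arrays) {
  values <- readBin(con = path, what = "numeric",
                    n = n.genes * n.arrays, size = 4)
  # readBin() returns fewer values if the file is shorter than requested
  if (length(values) != n.genes * n.arrays)
    stop("File holds only ", length(values), " values, expected ",
         n.genes * n.arrays)
  matrix(values, nrow = n.genes, byrow = FALSE)
}

# Round trip with a temporary file instead of a real project
tmp <- tempfile()
m <- matrix(c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5), nrow = 3)
writeBin(as.vector(m), tmp, size = 4)
m2 <- ReadChunk(tmp, n.genes = 3, n.arrays = 2)
str(m2)
```

For a real project, the gene and array counts would come from the entries in <DataParam/Import.txt>.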

Total number of genes

See the d_NumUniqueGenes entry in <DataParam/Import.txt> file.

Total number of arrays

See the d_NumberOfArrays entry in <DataParam/Import.txt> file.

Data importer method

See the s_DataImporter entry in <DataParam/Import.txt> file. Possible values: Unknown (Bhatt)

Original data type

See the s_OriginalDataType entry in <DataParam/Import.txt> file. Possible values:

  • UnloggedDualRedGreenWithoutBKGData (Perou)
  • UnloggedDualRedGreenAndBKGData
  • LoggedDualRatio (GSE22631)
  • LoggedSingleIntensity
  • UnloggedDualRatio (BRCA)
  • UnloggedSingleIntensity (Pomeroy, Bhatt)

These data types appear in the PreProcessMain() function in <PreProcessLog.R> file.

First column label

See the s_FirstColumnLabel entry in <DataParam/Import.txt> file.

Chip type for Affy data

See the s_AffyGeneChip entry in <DataParam/Import.txt> file. For example, hgu95av2.

Good R writing habit

  • Use indentation.
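  • Use braces where needed. For example, the following is bad: at top level R finishes parsing the if statement at the end of the first line, so the else on the next line is a syntax error.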
if (something) DoSomething
else DoAnother
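  • read.delim() is usually a better starting point than read.table() for tab-delimited files, because its defaults (header=TRUE, sep="\t", fill=TRUE, comment.char="") already suit them. When we use read.table() or write.table(), set their options explicitly to avoid trouble with special characters. Note: read.table() takes a character string for quote, so quoting is disabled with quote=""; write.table() takes a logical (or column indices) for quote, so use quote=FALSE there.
# x is the data frame to write
read.table(FILENAME, stringsAsFactors=FALSE, header=TRUE, sep="\t",
           na.strings=c('NA',''), fill=TRUE, comment.char='', quote="")
write.table(x, file=FILENAME, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)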
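
The two habits above can be tried out in a short runnable sketch: an if/else written with braces, and a write/read round trip on a made-up data frame whose fields contain double quotes and a '#'. The data frame and variable names are illustrative only; write.table() is called with quote = FALSE, the documented logical form of that argument.

```r
# if/else with braces parses the same at top level and inside functions
something <- TRUE
if (something) {
  result <- "DoSomething"
} else {
  result <- "DoAnother"
}

# Round trip of fields that contain double quotes and '#'
df <- data.frame(ID = c("g1", "g2"),
                 Note = c("my \"GWAS\" score", "sample #1"),
                 stringsAsFactors = FALSE)
f <- tempfile(fileext = ".txt")
write.table(df, file = f, row.names = FALSE, col.names = TRUE,
            sep = "\t", quote = FALSE)
back <- read.table(f, stringsAsFactors = FALSE, header = TRUE, sep = "\t",
                   na.strings = c("NA", ""), fill = TRUE,
                   comment.char = "", quote = "")
print(back)
```

With quoting and comment handling switched off on the reading side, the special characters come back unchanged.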

RSS

ChangeDetection

  • CRAN
  • Bioconductor
  • Drug Bank
  • GSEA
  • BRB-ArrayTools