ATdeveloper: Difference between revisions
Revision as of 09:49, 18 February 2021
Release timeline for BRB-ArrayTools
- 2021/2/x: v4.6.2 beta1 (R 3.5.1 and Bioc 3.7). Upgrade Rserve to 1.8-7. Fix MSigDB v6.2 URL link.
- 2020/6/22: v4.6.1 stable (R 3.5.1 and Bioc 3.7). BiocInstaller::biocVersion(). Update web links for GEO GDS importer and DrugBank utility.
- 2018/1/4: v4.6.0 beta2 (R 3.4.3 and Bioc 3.6). Fix a bug in DESeq when the gene filter was applied.
- 2017/7/12: v4.6.0 beta (R 3.4.1 and Bioc 3.5). Find over-represented pathways in a gene list. Enhance DESeq and edgeR.
- 2016/8/9: v4.5.1 stable (R 3.2.5 and Bioc 3.3). Fix a bug in dynamic heatmap viewer gene labels and heatmap (reversed order).
- 2016/4/8: v4.5.0 stable (R 3.2.4 and Bioc 3.2). Add new buttons in Dynamic heatmap viewer, update SOURCE annotation, Drugbank, MIT/MSigdb...
- Windows 10 released (7/29/2015)
- 2015/6/8: v4.5.0 beta 1 (R 3.2.0 and Bioc 3.1). Add local fdr. Import and analysis of RNA-seq data by using edgeR and DESeq2 packages. GSE importer. ST arrays importer using 'oligo' package instead of 'aroma.affymetrix'.
- 2015/6/8: v4.4.1 stable
- 2015: BDGE v0.1
- 2014/11/20: v4.4.0 stable (R 3.1.2 and Bioc 3.0). Fix Excel 2013. Add gene dendrogram to DHV. DGIdb. Jaspar2014.
- 2014/6/20: v4.4.0 beta 2 (R 3.1.0 and Bioc 2.14). maintenance release and new microRNA gene set.
- 2014/2/26: v4.4.0 beta 1 (R 3.0.2 and Bioc 2.13). DHV was added.
- 2013/9/12: v4.3.2 stable (R 3.0.2 and Bioc 2.13). (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v4_3_2_Stable.exe)
- 2013/6/12: v4.3.1 stable (R 3.0.1 and Bioc 2.12). Because of changes in R 3.0.1, some Bioc packages compiled by R 3.0.1 may not work with R 3.0.0. Download link is here
- 2013/5/24: v4.3.0 stable (R 3.0.0 and Bioc 2.12)
- 2013/3/7: v4.3.0 beta3 (SOURCE website change, sorting of experiment worksheet). Downloaded from http://pub.emmes.com.
- 2012/11/28: v4.3.0 beta2
- 2012/8/15: v4.3.0 beta1
- 2012/1/24: v4.2.1 stable
- v4.2.0 stable
- v4.1.0 stable
- v3.8.0 stable
- 2009/3/25: v3.7.1 stable (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v3_7_1.Full.exe)
- 2008/11/19: v3.7.0 stable
- v3.6.0 stable
- v3.5.0 stable
- v3.4.0 stable
- v3.3.0 stable
- v3.2.3
- v3.2.2
- v3.2.1
Release timeline for R/Bioconductor
Each Bioconductor release is tied to a specific R version.
> R.version$version.string
[1] "R version 3.2.5 (2016-04-14)"
> source("http://bioconductor.org/biocLite.R")
Bioconductor version 3.2 (BiocInstaller 1.20.3), ?biocLite for help
A new version of Bioconductor is available after installing the most recent version of R; see http://bioconductor.org/install
- 2021/2/17:
- 2020/10/28: Bioconductor 3.12
- 2020/4/28: Bioconductor 3.11
- 2020/4/24: R 4.0.0
- 2019/10/30: Bioconductor 3.10
- 2019/5/3: Bioconductor 3.9
- 2019/4/26: R-3.6.0 What's new in R 3.6.0
- 2019/3/11: R-3.5.3
- 2018/12/20: R-3.5.2
- 2018/10/31: Bioconductor 3.8 (starting to use BiocManager, see this too)
- 2018/7/2: R-3.5.1
- 2018/5/1: Bioconductor 3.7
- 2018/4/23: R-3.5.0
- 2018/3/15: R-3.4.4
- 2017/11/30: R-3.4.3
- 2017/10/31: Bioconductor 3.6
- 2017/9/28: R-3.4.2. blog
- 2017/6/30: R-3.4.1
- 2017/4/25: Bioconductor 3.5
- 2017/4/21: R-3.4.0
- 2017/3/6: R-3.3.3
- 2016/10/31: R-3.3.2
- 2016/10/18: Bioconductor 3.4
- 2016/6/21: R-3.3.1
- 2016/5/4: Bioconductor 3.3
- 2016/5/3: R-3.3.0 (gcc in Rtools will be upgraded to 4.9.3 from 4.6.3)
- 2016/4/14: R-3.2.5 (fixes: 1. printing and formatting of POSIXlt objects was wrong around Daylight Saving Time; 2. a Makefile problem affecting systems that use R's bundled lzma library). Daily R News had announced an R 3.2.4 patch for 3/15/2016, which led to the creation of R 3.2.5.
- 2016/3/16: R-3.2.4-revised
- 2016/3/10: R-3.2.4
- 2015/12/10: R-3.2.3. Fix ftp download error in Windows's "wininet".
- 2015/10/14: Bioconductor 3.2
- 2015/8/14: R-3.2.2. setInternet2(TRUE) is now the default for windows. The default method for accessing URLs via download.file() and url() has been changed to be "wininet" using Windows API calls. This changes the way proxies need to be set and security settings made.
- 2015/6/18: R-3.2.1. (Causes a "could not find symbol keepNA in nchar()" error if R 3.2.0 is used with some Bioc packages compiled using R 3.2.1.) See this and this posts.
- 2015/4/17: R-3.2.0, Bioconductor 3.1
- 2015/3/9: R-3.1.3
- 2014/10/31: R-3.1.2
- 2014/10/14: Bioconductor 3.0
- 2014/4/14: Bioconductor 2.14
- 2014/4/10: R-3.1.0
- 2013/10/8: Bioconductor 2.13
- 2013/9/25: R-3.0.2
- 2013/5/16: R-3.0.1
- 2013/4/3: R-3.0.0 and Bioconductor 2.12
See also the developer's page about the release schedule.
- rversions package
- R: Use version (no parentheses) to check the R version.
- Bioconductor: Use the command BiocInstaller::biocVersion() to check the installed Bioconductor version.
- Use sessionInfo() to check the attached packages in the current session.
- Bioconductor releases This gives a table-like list of Bioconductor releases and R versions.
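The checks listed above can be combined into one short script. This is only a sketch: it assumes that either BiocManager (used from Bioconductor 3.8 onward, where the equivalent call is BiocManager::version()) or the older BiocInstaller may be installed, and it falls back to NA when neither is available or the query fails.

```r
# R version as a plain string such as "3.5.1"
r_version <- as.character(getRversion())

# Bioconductor version: try BiocManager first (Bioc >= 3.8), then BiocInstaller.
# Both calls are wrapped in tryCatch() because either package may be missing
# or unable to answer (for example, without network access).
get_bioc_version <- function() {
  out <- NA_character_
  if (requireNamespace("BiocManager", quietly = TRUE)) {
    out <- tryCatch(as.character(BiocManager::version()),
                    error = function(e) NA_character_)
  } else if (requireNamespace("BiocInstaller", quietly = TRUE)) {
    out <- tryCatch(as.character(BiocInstaller::biocVersion()),
                    error = function(e) NA_character_)
  }
  out
}

bioc_version <- get_bioc_version()
message("R ", r_version, ", Bioconductor ", bioc_version)
```

The two values can then be compared against the timeline above to confirm that the R and Bioconductor versions belong together.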
Release timeline for BRB-SeqTools
- 2017/8/3: version 1.2 (macOS, Windows 10, subread aligner, featureCounts, xenograft)
- 2016/10/19: version 1.0 (BRB-SeqTools)
- 2015/1/14: version 0.1 (BRB-DGE)
Release Procedures (including check list for installer)
- Misc Folder:
- CGHTools
- R installer
- Rserve
- Java installer
- Microsoft C++ 2010 SP1 Redistributable packages, vcredist_x64.exe & vcredist_x86.exe
- License file,
- C:\Installer-2012\ArrayTools-Scripts\V4_3_And_CGH_1_3_RServe_Full\Setup Files\Compressed Files\Language Independent\OS Independent\License.doc. It is a text file.
- C:\Installer-2012\CGHTools-Scripts\CGH_1_3_Rserve\Setup Files\Compressed Files\Language Independent\OS Independent\License.doc. It is a text file.
- License agreement menu, About ArrayTools and About CGHTools menus.
- Sample datasets.
- ArrayTools: License.doc, Readme.doc.
- ArrayTools\Updates: ArrayTools_CurrentVersion.txt.
- ArrayTools\Doc: Manual.doc, ManualFrame.doc, Overview of Analysis Tools.doc, Plug-in Manual.doc, StatMethods.doc, Table of Contents.doc, FAQs.pdf.
- CGHTools: LICENSE AGREEMENT.doc, Readme.doc
- CGHTools\Updates: CGHTools_CurrentVersion.txt.
- CGHTools\Doc: CGHManual.doc, CGHManualFrame.doc, Table of Contents.doc.
- ArrayTools_UpdateVersion.txt, which is kept on the ArrayTools server.
- http://linus.nci.nih.gov/~brb/news.html (deprecated!)
- Different ArrayTools installers (deprecated).
- There are 3 installers: Full, Individual and Update. Main release has no Update installers.
- The Individual installer has no R and no Java installers in the Misc folder. It checks for the required R version (or a newer one). If R is missing or older than the required version, the installation aborts.
- The Update installer contains the files that changed between the current and previous freeze folders.
- A copy of each installer is kept in K:\BRB\ARRAYTOOLS\archive\installers\. Alpha versions are not kept; only beta and release versions are kept on K:.
- Put the new release online:
- Update index.html and download.html on http://brb-stage.nci.nih.gov/BRB-ArrayTools/
- Upload related files including Whats-new-v43b1.doc, Readme (convert to pdf from doc), License.doc, Manual.doc and ArrayTools_LatestVersion.txt (no more than 5 items) to new_updates directory
- Java, R download links
- Sending emails to users about a new release takes about 3 hours. Only Beta 1 and Stable versions are announced by email; other beta versions are not announced, to reduce the number of user emails.
Troubleshooting
'This workbook is currently referenced by another workbook and cannot be closed' when I use VBAObjectManager
- uninstall AT & CGHTools
- use privilege manager to run installer
- use privilege manager to open excel and run VBAObjectManager
Another canonical method is
- Alt + F11 to open VBE
- Click Project > ArrayTools. Click Tools > References. Make sure CGHTools is not checked.
- Click Project > CGHTools. Click Tools > References. Make sure ArrayTools is checked.
Registration
http://linus.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com
http://linus-stage.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com
Check update error
BRB-ArrayTools was not able to check for software updates. The server may be down or too slow right now or you may not connect to the internet.
Solution: Check the LatestVersion.txt file on the server. When the file is downloaded with wget on Windows, its content shows up as a single line. It is better to convert the format when the file is uploaded from a PC, or to copy the text manually in a Linux environment.
The source code is in Utilities:CheckForPath().
msiShellAndWait Chr(34) & RscriptExe & Chr(34) & " -e " & Chr(34) & "download.file('http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt', '" & GetArrayToolsDir("/") & "/updates/ArrayTools_LatestVersion.txt', mode='wb')" & Chr(34), False
That is
"C:\\PROGRA~1\\R\\R-32~1.4\\bin\\Rscript.exe" -e "download.file('http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt', 'C:/Program Files (x86)/ArrayTools/updates/ArrayTools_LatestVersion.txt', mode='wb')"
To see whether a file is in DOS format (CRLF line terminators, \r\n) or UNIX format (\n), use the file command; its output differs between the two.
C:\Program Files\R>file C:\ArrayTools\Updates\ArrayTools_CurrentVersion.txt
C:\ArrayTools\Updates\ArrayTools_CurrentVersion.txt: Non-ISO extended-ASCII English text, with CRLF line terminators

C:\Program Files\R>wget http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt
--2016-04-18 09:37:09-- http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt
Resolving linus.nci.nih.gov... 129.43.254.99
Connecting to linus.nci.nih.gov|129.43.254.99|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 411 [text/plain]
Saving to: `ArrayTools_LatestVersion.txt'

100%[==================================================================================================>] 411 --.-K/s in 0s

2016-04-18 09:37:09 (29.9 MB/s) - `ArrayTools_LatestVersion.txt' saved [411/411]

C:\Program Files\R>file ArrayTools_LatestVersion.txt
ArrayTools_LatestVersion.txt: ASCII English text
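Where the file command is not available, the same check can be done from R. The sketch below is not part of ArrayTools; detect_eol() and to_crlf() are hypothetical helper names, and the demo writes temporary files instead of touching the real version files.

```r
# Report the line terminator style of a file by inspecting its raw bytes.
detect_eol <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  lf <- which(bytes == as.raw(0x0a))            # positions of "\n"
  if (length(lf) == 0L) return("none")
  prev <- lf[lf > 1L] - 1L                      # bytes just before each "\n"
  n_crlf <- sum(bytes[prev] == as.raw(0x0d))    # how many are "\r"
  if (n_crlf == length(lf)) "CRLF" else if (n_crlf == 0L) "LF" else "mixed"
}

# Rewrite a text file with DOS (CRLF) line terminators.
# The output connection is opened in binary mode so that no extra
# end-of-line translation happens on Windows.
to_crlf <- function(path_in, path_out) {
  lines <- readLines(path_in, warn = FALSE)
  con <- file(path_out, open = "wb")
  on.exit(close(con))
  writeLines(lines, con, sep = "\r\n")
  invisible(path_out)
}

# Demo on temporary files
f_unix <- tempfile(fileext = ".txt")
f_dos  <- tempfile(fileext = ".txt")
writeBin(charToRaw("line one\nline two\n"), f_unix)
to_crlf(f_unix, f_dos)
detect_eol(f_unix)
detect_eol(f_dos)
```

Running to_crlf() on the downloaded file before the update check reads it would be one way to apply the solution described above, assuming the check expects CRLF terminators.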
Related to Project Path
BRB-ArrayTools is running the batch job for your analysis.
The working path is E:\DASL data related\Direct Hyb samples\BRB analysis with Direct Hyb samples -Project\Fortran\ClassComparison between Normal and Tumors (Fold change 2 x)
Please wait... Fortan program is running...
'E:\DASL' is not recognized as an internal or external command, operable program or batch file.
The command is cut off at the first space in the project path ('E:\DASL'), which suggests that the path reached the command interpreter without quotes.
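The effect of an unquoted path can be illustrated in R. This is a sketch only: the executable path and the input file name are hypothetical, and the real command may well be assembled in VBA, not in R.

```r
# Hypothetical executable located under a project path that contains spaces
exe <- "E:\\DASL data related\\Fortran\\Program.exe"

# Without quotes, a shell treats everything up to the first space as the program
unquoted <- paste(exe, "input.txt")
first_token <- strsplit(unquoted, " ", fixed = TRUE)[[1]][1]
first_token   # the truncated "program name" seen in the error message

# shQuote(type = "cmd") wraps the path in double quotes for the Windows shell
quoted <- paste(shQuote(exe, type = "cmd"), "input.txt")
cat(quoted, "\n")
```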
How to Test
- Test when internet is not available. For example, GO analysis requires downloading GO.db package. BugReport.tar.gz will be generated if there is a problem with GO.db installation.
- Test special characters (# sign) in the experiment descriptor worksheet. A wrong <ExpDescWkSht> file used by the DHV (dynamic heatmap viewer) was generated. The reason is that read.table() was called without the comment.char argument. See the bugreportDessen project. PS: the error was caught in RserveVBA.
- Test a column header that contains double quotes (e.g. my "GWAS" score). Some analyses, such as quantitative trait analysis, will fail.
- Test the case where gene annotation was not done, so gene symbols are not available (miRNA project). This causes a bug in the class comparison -> IPA output. The GeneID file may miss some gene identifiers. See PrepareGeneTableHTML.R.
- Test the case where Symbol is used in the first column of the gene identifier file. This results in two 'Symbol' columns in the Gene annotations worksheet and breaks the code in CreateIPAOutput().
- Extreme number of arrays (e.g. 3, 10). This is used to test chunking, ...
- Extreme number of genes.
- Use non-default options.
- Analyze one project after another project to see if there is any memory issue.
- Use unreasonable entry like 1.5 if the value should be less than or equal to 1.
- Use a long path name for the project, which causes an error when the Fortran program is launched. The Wikipedia page says there is a 126-character limit in interactive mode.
- Test on fresh machine (no R libraries installed).
- Data with lots of missing values.
- Test on a subset of arrays only.
- Run an analysis on a data type it is not designed for (e.g. RNA-Seq analysis on regular microarray data).
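The # sign case from the list above can be reproduced with a few lines of R. The sketch below uses a made-up three-column descriptor file (the column names and values are illustrative only) and compares read.table() with its default comment.char against comment.char="".

```r
# A tiny tab-delimited experiment descriptor with a '#' inside a field
desc_lines <- c("Array\tNote\tClass",
                "a1\tbatch #2\tTumor",
                "a2\tok\tNormal")
path <- tempfile(fileext = ".txt")
writeLines(desc_lines, path)

# Default comment.char = "#": everything after '#' on a line is dropped,
# so the first data row loses part of Note and all of Class
bad <- read.table(path, header = TRUE, sep = "\t", fill = TRUE,
                  quote = "", stringsAsFactors = FALSE)

# comment.char = "" turns comment handling off and keeps the row intact
good <- read.table(path, header = TRUE, sep = "\t", fill = TRUE,
                   quote = "", comment.char = "", stringsAsFactors = FALSE)

print(bad)
print(good)
```

A test data set with a # in the experiment descriptor worksheet exercises the same code path inside ArrayTools.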
Windows Virtual Machine
- Download Windows 10 iso from Microsoft (no log in is required)
- Use MS Office 2007
Misc
Conversion of an old project
- When an old project needs to be converted, the old project folder is backed up under a new name 'XXXX_ArrayToolsOldProject'.
After running the class comparison
Filtered log ratio worksheet
Arrays in 'filtered log ratio worksheet' will be sorted based on the class variable (still keeping empty arrays). For example, if we select 'BRCA1 V BRCA2' column, arrays will be sorted by BRCA1, BRCA2 and empty. If we select 'BRCA1 v Sporadic', arrays will be sorted by BRCA1, empty and Sporadic. The gene expression value Html page is also sorted by the class variable (excluding empty label arrays).
Clustered heatmap of significantly expressed genes
Note that the color legend says 'Centered log ratio' (dual channel) or 'Scaled and centered value' (single channel). 'Centered' means that the expression values of each gene are centered across arrays.
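As an illustration of per-gene centering, here is a minimal R sketch on a toy matrix with genes in rows and arrays in columns. It assumes centering by the row mean and scaling by the row standard deviation; the exact statistics ArrayTools uses for the heatmap are not stated here.

```r
# Toy expression matrix: 2 genes x 3 arrays
expr <- matrix(c(1, 2, 3,
                 4, 6, 8),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"),
                               c("array1", "array2", "array3")))

# Center each gene (row) across arrays by subtracting its row mean
centered <- sweep(expr, 1, rowMeans(expr), "-")

# Optionally scale each gene by its standard deviation across arrays.
# Dividing by a vector of length nrow(expr) recycles down the columns,
# so every row is divided by its own standard deviation.
scaled <- centered / apply(expr, 1, sd)

centered
scaled
```

After centering, every row of centered has mean zero, so the heatmap colors show each gene's deviation from its own average instead of its absolute level.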
R folder
FileName | Functions | Notes |
---|---|---|
Annotations.R | | |
BioUtil.R | | |
CreateGeneExpression.R | No functions | |
FilterAndNormalize.R | | |
FilterData.R | | |
GeneLists.R | | |
Illumina.R | | |
Input_to_Output_function.R | | |
Misc.R | | |
PrepareGeneTableHTML.R | | |
PreProcessLog.R | | |
WriteDataToFiles.R | | |
Plugins/BRBfun folder
FileName | Functions | Notes |
---|---|---|
CreateGeneTable.R | | |
CreateHtmlTable.R | | |
CreateMasterAnnotationsTable.R | | |
CreatePath.R | | |
OutputGeneList.R | | |
VBA source code
- AddCustomMenu() in CustomMenu.bas - the main menu starts here.
- You can step through the code by using keyboard shortcuts
- F5: continue
- F8: step in
- Shift + F8: step over
- Ctrl + F8: go to the cursor
Annotate the data
- Consider the Affymetrix first.
- Search for "Annotate the data" in <CustomMenu.bas>. It shows that when the user clicks "Import Affymetrix annotations", the function "ShowAffymetrixDownload", located in the "AffyDialog" module, is triggered.
- Set a break at some line in ShowAffymetrixDownload function
- At the end you will see it calls the form fmDownloadAffyFile.Show() to display the download affymetrix dialog. Once the command button is clicked, the process goes to the subroutine CommandButtonImport_Click() in the form.
- Eventually it will execute InsertGeneAnnotationSheet AnnotateFromMenu:=True. Press F8 to go into this function. We can see the function InsertGeneAnnotationSheet is defined in Annotation.bas. Note that there are 4 similar functions defined in Annotation.bas.
- InsertGeneAnnotationSheet
- InsertGeneAnnotationSheet_lumi
- InsertGeneAnnotationSheet_lumiMethy
- InsertGeneAnnotationSheet_Bioc
- In Annotation.bas, it executes MatchProbeSetsOK = MatchProbeSets(ChipType).
- The VBA MatchProbeSets function executes a series of R commands. The most important one is the following
RRunSuccess "ReturnMsg <- MatchAffyAnnotation(ChipName, ProjectPath, UniqueIdHeader, ShowProgress=T)", , True
- In fact, we can see what exact R function (located in <BiocUtil.R>) is called in each VBA function (located in <Annotations.bas>).
- InsertGeneAnnotationSheet -> MatchProbeSets(ChipType) -> MatchAffyAnnotation(ChipName, QueryIdHeader)
- InsertGeneAnnotationSheet_lumi -> MatchNuID(annopkg) -> MatchlumiAnnotation(anno, UniqueIdHeader)
- InsertGeneAnnotationSheet_lumiMethy -> MatchTargetID(annopkg) -> MatchlumiMethyAnnotation(anno, UniqueIdHeader) or MatchMethy450kAnnotation(anno, UniqueIdHeader)
- InsertGeneAnnotationSheet_Bioc -> MatchQueryID(Species, QueryIdHeader) -> MatchBiocAnnotation(annoPkg, QueryIdHeader)
BinaryData folder
text file
- <ExperimentTable>
R binary file
- <ApplyFilter.rda> (FilterData.R)
- <GeneIds>. (FilterData.R)
- <FilteredAndNormalizedLog1.rda> (Misc.R)
Use load("") to load them to R.
General binary files
- See GetDataRMatrix() and WriteDataRMatrix() functions from <PreProcessLog.R> file.
- <GreenRaw1>, <GreenRaw2>, ...
- <RedRaw1>, <RedRaw2>, ...
- <Flag1>, <Flag2>, ...
- <RawData1>, <RawData2>, ...
- <RedBkg1>, <RedBkg2> ,...
- <GreenBkg1>, <GreenBkg2>, ...
- <RedAdj1>, <RedAdj2>, ...
- <GreenAdj1>, <GreenAdj2>, ...
- See the ApplyNormalization() function in <FilterAndNormalize.R>.
- <FilteredAndNormalizedLog1>, <FilteredAndNormalizedLog2>, ...
- <PreProcessLog1>, <PreProcessLog2>, ...
- <PrintTipGroups1>
- <PrintTip1>, <PrintTip2>, ...
- <RedAdj1>, <RedAdj2>, ...
- <RedRaw1>, <RedRaw2>, ...
We can read the data using the readBin() function in R.
# check <Import.txt> to find the number of genes and chunks
DataRMatrix <- matrix(readBin(con="C:/ArrayTools/Sample datasets/Perou/Perou -Project/BinaryData/GreenRaw1", what="numeric", n=2998*3, size=4), nrow=2998, byrow=FALSE)
str(DataRMatrix) # 2998 x 3
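The layout implied by that call (4-byte floating point values stored column by column) can be checked with a small round trip. The sketch below writes a toy matrix to a temporary file with writeBin() and reads it back the same way; it does not use GetDataRMatrix() or WriteDataRMatrix(), whose internals are not shown here, and the dimensions are made up.

```r
n_genes  <- 4L
n_arrays <- 3L

# Toy data; multiples of 0.5 are exactly representable as 4-byte floats
m <- matrix(seq_len(n_genes * n_arrays) / 2, nrow = n_genes)

path <- tempfile()
# as.vector() flattens column by column; size = 4 stores single precision
writeBin(as.vector(m), path, size = 4)

# Read back with the same call pattern as above
back <- matrix(readBin(path, what = "numeric",
                       n = n_genes * n_arrays, size = 4),
               nrow = n_genes, byrow = FALSE)

file.info(path)$size   # 4 bytes per value
all.equal(back, m)
```

If the file size of a real binary file is not 4 x genes x arrays in that chunk, the dimensions taken from <Import.txt> do not match the file being read.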
Total number of genes
See the d_NumUniqueGenes entry in <DataParam/Import.txt> file.
Total number of arrays
See the d_NumberOfArrays entry in <DataParam/Import.txt> file.
Data importer method
See the s_DataImporter entry in <DataParam/Import.txt> file. Possible values: Unknown (Bhatt)
Original data type
See the s_OriginalDataType entry in <DataParam/Import.txt> file. Possible values:
- UnloggedDualRedGreenWithoutBKGData (Perou) ,
- UnloggedDualRedGreenAndBKGData
- LoggedDualRatio (GSE22631),
- LoggedSingleIntensity
- UnloggedDualRatio (BRCA),
- UnloggedSingleIntensity (Pomeroy, Bhatt).
These data types appear in the PreProcessMain() function in <PreProcessLog.R> file.
First column label
See the s_FirstColumnLabel entry in <DataParam/Import.txt> file.
Chip type for Affy data
See the s_AffyGeneChip entry in <DataParam/Import.txt> file. For example, hgu95av2.
Run ArrayTools' Fortran Programs in a Linux environment
This is possible once Wine is installed on Linux.
brb@T3600 ~/.wine/drive_c $ wine GeneSetComparison.exe PathwayClassComparison/
Internal random seed = 123456789 will be used
Randomized variance model will be used
Linear approximation will be used
Regularization: a= 3.10437924053453 b= 1.27713340089596 KSstat= 1.116416920309377E-002
writing p-values into file
Startimg permutations to calibrate p-values...
8 percents completed
16 percents completed
25 percents completed
33 percents completed
41 percents completed
50 percents completed
58 percents completed
66 percents completed
75 percents completed
83 percents completed
91 percents completed
100 percents completed
alpha = 5.000000000000000E-003
nDeg= 165
Writing DEGs.txt file...
Max Number of significant GeneSet categories: 165
Finished writing output files

brb@T3600 ~/.wine/drive_c $ head PathwayClassComparison/DEGs.txt
Max Number of significant GeneSet categories: 165
Index nGenesInGeneSet statLS pLS statKS pKS
144 24 3.19595500 0.00010026 0.48423390 0.00389526
150 19 3.33429028 0.00019531 0.48850684 0.01059998
154 19 3.30687478 0.00022752 0.43818127 0.03596438
60 40 2.66205140 0.00039213 0.40681660 0.00504481
5 6 4.33475413 0.00213406 0.71447198 0.01376502
3 8 3.65201851 0.00349892 0.58742271 0.02999936
85 28 2.58118709 0.00467529 0.32909420 0.14689366
4 8 3.36955271 0.00871936 0.53517206 0.06790110

brb@T3600 ~/.wine/drive_c $ head PathwayClassComparison/Pvalues.txt
index p
1 0.0163586197
2 0.4349814775
3 0.9409634273
4 0.1873022496
5 0.3977063364
6 0.3830773134
7 0.3279943699
8 0.9102655386
9 0.0014973998
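The DEGs.txt output can be loaded into R for a quick check. This sketch assumes the layout seen in the head output above (one title line, then a whitespace-delimited header and rows); it copies the first three rows into a temporary file so that it runs without the Fortran program.

```r
# First lines of DEGs.txt as shown in the console output above
sample_lines <- c(
  "Max Number of significant GeneSet categories: 165",
  "Index nGenesInGeneSet statLS pLS statKS pKS",
  "144 24 3.19595500 0.00010026 0.48423390 0.00389526",
  "150 19 3.33429028 0.00019531 0.48850684 0.01059998",
  "154 19 3.30687478 0.00022752 0.43818127 0.03596438"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# skip = 1 drops the title line; the next line is used as the header
degs <- read.table(path, header = TRUE, skip = 1, stringsAsFactors = FALSE)
str(degs)

# Example filter: gene sets whose KS permutation p-value is below 0.005
sig <- degs[degs$pKS < 0.005, ]
sig
```

For the real file, replace path with the location of PathwayClassComparison/DEGs.txt under the Wine drive.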
Good R writing habit
- Use indentation.
- Use braces where needed. For example, the following is bad
if (something) DoSomething else DoAnother
- read.delim() is better than read.table(). When we use read.table() or write.table(), use their options to avoid trouble with special characters. Note: in read.table() the quote argument is a set of quoting characters, so use quote="" to turn quote handling off; in write.table() the quote argument is logical or numeric, so use quote=FALSE.
read.table(FILENAME, stringsAsFactors=FALSE, header=TRUE, sep="\t", na.strings=c('NA',''), fill=TRUE, comment.char='', quote="")
write.table(DATA, file=FILENAME, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
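A round trip through a temporary file shows why these options matter. The sketch below uses a made-up data frame whose fields contain an apostrophe, a # sign and double quotes, writes it without quoting, and reads it back with quote and comment handling switched off; the write call uses quote=FALSE, the documented way to disable quoting in write.table().

```r
# Made-up annotation table with characters that commonly break parsing
genes <- data.frame(
  Symbol      = c("BRCA1", "TP53"),
  Description = c("5' UTR variant #1", "my \"GWAS\" score"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".txt")

# Write tab-delimited text without surrounding quotes
write.table(genes, file = path, row.names = FALSE, col.names = TRUE,
            sep = "\t", quote = FALSE)

# Read it back: quote = "" and comment.char = "" keep ', " and # as plain text
back <- read.table(path, stringsAsFactors = FALSE, header = TRUE, sep = "\t",
                   na.strings = c("NA", ""), fill = TRUE,
                   comment.char = "", quote = "")

all.equal(back, genes)
```

Dropping quote="" or comment.char="" from the read.table() call is a quick way to see how the same file gets misread.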
VBA
- To launch the VBA editor in Microsoft Office without using the keyboard shortcut (Alt + F11), go to File > Options > Customize Ribbon > Check Developer.
- Immediate window. Use ? "Your vba statement".
RSS
- Bioconductor http://www.bioconductor.org/developers/svnlog/
ChangeDetection
- CRAN
- Bioconductor
- Drug Bank
- GSEA
- BRB-ArrayTools