ATdeveloper: Difference between revisions

From 太極
 
(116 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Release timeline for BRB-ArrayTools ==
* 2022/6/1: v4.6.2 stable (R 3.5.1 and Bioc 3.7). No registration is needed for non-commercial users.
* 2021/2/x: v4.6.2 beta1 (R 3.5.1 and Bioc 3.7). Upgrade Rserve to 1.8-7. Fix MSigDB v6.2 URL link.
* 2020/6/22: v4.6.1 stable (R 3.5.1 and Bioc 3.7). BiocInstaller::biocVersion(). Update web links for GEO GDS importer and DrugBank utility.
* 2018/1/4: v4.6.0 beta2 (R 3.4.3 and Bioc 3.6). Fix a bug in DESeq when the gene filter was applied.
* 2017/7/12: v4.6.0 beta (R 3.4.1 and Bioc 3.5). find over-represented pathways in a gene list. Enhance DESeq and edgeR.
* 2016/8/9: v4.5.1 stable (R 3.2.5 and Bioc 3.3). Fix a bug in dynamic heatmap viewer gene labels and heatmap (reversed order).
* 2016/4/8: v4.5.0 stable (R 3.2.4 and Bioc 3.2). Add new buttons in Dynamic heatmap viewer, update SOURCE annotation, Drugbank, MIT/MSigdb...
* Windows 10 released (7/29/2015)
* 2015/6/8: v4.5.0 beta 1 (R 3.2.0 and Bioc 3.1). Add local fdr. Import and analysis of RNA-seq data by using edgeR and DESeq2 packages. GSE importer. ST arrays importer using 'oligo' package instead of 'aroma.affymetrix'.
* 2015/6/8: v4.4.1 stable
* 2015: BDGE v0.1
* 2014/11/20: v4.4.0 stable (R 3.1.2 and Bioc 3.0). Fix Excel 2013. Add gene dendrogram to DHV. DGIdb. Jaspar2014.
* 2014/6/20: v4.4.0 beta 2 (R 3.1.0 and Bioc 2.14). maintenance release and new microRNA gene set.
* 2014/2/26: v4.4.0 beta 1 (R 3.0.2 and Bioc 2.13). DHV was added.
* 2013/9/12: v4.3.2 stable (R 3.0.2 and Bioc 2.13). (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v4_3_2_Stable.exe)
* 2013/6/12: v4.3.1 stable (R 3.0.1 and Bioc 2.12). Because of changes in R 3.0.1, some Bioc packages compiled by R 3.0.1 may not work with R 3.0.0. Download link is [http://linus.nci.nih.gov/new_updates/ArrayTools_v4_3_1_Stable.exe here]
* 2013/5/24: v4.3.0 stable (R 3.0.0 and Bioc 2.12)
Line 8: Line 21:
* 2012/8/15: v4.3.0 beta1
* 2012/1/24: v4.2.1 stable
* v4.2.0  stable
* v4.1.0 stable
* v3.8.0 stable
* 2009/3/25: v3.7.1 stable (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v3_7_1.Full.exe)
* 2008/11/19: v3.7.0 stable
* v3.6.0 stable
* v3.5.0 stable
* v3.4.0 stable
* v3.3.0 stable
* v3.2.3
* v3.2.2
* v3.2.1


== Release timeline for R/Bioconductor ==
[https://bioconductor.org/about/release-announcements/ Bioconductor releases announcements]
<pre>
get_bioc_r_versions <- function() {
  # Load required libraries
  if (!requireNamespace("rvest", quietly = TRUE)) {
    install.packages("rvest")
  }
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
  }
  library(rvest)
  library(dplyr)
  # URL of the release announcements page
  url <- "https://bioconductor.org/about/release-announcements/"
  # Read the HTML content
  page <- read_html(url)
  # Extract the table
  table <- page %>%
    html_node("table") %>%
    html_table()
  # Clean and format the data
  clean_table <- table %>%
    select(Bioconductor = Release, Date, R) %>%
    mutate(Date = as.Date(Date, format = "%B %d, %Y"),
          # caution: as.numeric() collapses version "3.10" to 3.1 (see row 10 of the output)
          Bioconductor = as.numeric(Bioconductor))
  # Sort by date descending
  clean_table <- clean_table %>%
    arrange(desc(Date))
  return(clean_table)
}
# Usage
versions <- get_bioc_r_versions()
print(versions)
  Bioconductor Date          R
          <dbl> <date>    <dbl>
1        3.19 2024-05-01  4.4
2        3.18 2023-10-25  4.3
3        3.17 2023-04-26  4.3
4        3.16 2022-11-02  4.2
5        3.15 2022-04-27  4.2
6        3.14 2021-10-27  4.1
7        3.13 2021-05-20  4.1
8        3.12 2020-10-28  4 
9        3.11 2020-04-28  4 
10        3.1  2019-10-30  3.6
...
</pre>
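Row 10 of the output above prints Bioconductor 3.10 as 3.1, because the version string was converted to a number. A minimal sketch of a safer comparison, assuming the versions are kept as character strings and compared with base R's package_version() (the three version strings are picked from the table for illustration only):

<syntaxhighlight lang='rsplus'>
# Numeric conversion loses the trailing zero: "3.10" becomes 3.1, which sorts below 3.9
as.numeric("3.10") < as.numeric("3.9")    # TRUE, which is the wrong release order

# package_version() compares component by component (3.10 means minor version 10)
bioc <- package_version(c("3.9", "3.10", "3.19"))
bioc[2] > bioc[1]                          # TRUE, 3.10 is newer than 3.9
as.character(max(bioc))                    # "3.19"
</syntaxhighlight>

Keeping the column as character (or as a package_version object) instead of calling as.numeric() avoids the problem at the source.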


* 2024/5/1: Bioconductor 3.19
* 2024/4/24: R 4.4.0
* 2024/2/29: R 4.3.3
* 2023/10/25: Bioconductor 3.18
* 2023/6/16: R 4.3.1
* 2023/4/26: [https://bioconductor.org/news/bioc_3_17_release/ Bioconductor 3.17]
* 2023/4/21: R 4.3.0
* 2023/3/15: R 4.2.3
* 2022/11/2: [https://bioconductor.org/news/bioc_3_16_release/ Bioconductor 3.16]
* 2022/4/27: [https://www.bioconductor.org/news/bioc_3_15_release/ Bioconductor 3.15]
* 2022/4/22: R 4.2.0
* 2022/3/10: R 4.1.3
* 2021/11/1: R 4.1.2
* 2021/10/27: Bioconductor 3.14
* 2021/8/10: R 4.1.1
* 2021/5/20: [http://bioconductor.org/news/bioc_3_13_release/ Bioconductor 3.13]
* 2021/5/18: R 4.1.0
* 2021/2/15: R 4.0.4
* 2020/10/28: Bioconductor 3.12
* 2020/4/28: Bioconductor 3.11
* 2020/4/24: R 4.0.0
* 2019/10/30: Bioconductor 3.10
* 2019/5/3: Bioconductor 3.9
* 2019/4/26: R-3.6.0 [https://blog.revolutionanalytics.com/2019/05/whats-new-in-r-360.html What's new in R 3.6.0]
* 2019/3/11: R-3.5.3
* 2018/12/20: R-3.5.2
* 2018/10/31: Bioconductor 3.8 (starting to use [https://cran.r-project.org/src/contrib/Archive/BiocManager/ BiocManager], see [https://campus.datacamp.com/courses/introduction-to-bioconductor-in-r/what-is-bioconductor?ex=3 this too])
* 2018/7/2: R-3.5.1
* 2018/5/1: Bioconductor 3.7
* 2018/4/23: R-3.5.0
* 2018/3/15: R-3.4.4
* 2017/11/30: R-3.4.3
* 2017/10/31: Bioconductor 3.6
* 2017/9/28: R-3.4.2. [http://blog.revolutionanalytics.com/2017/09/r-342-is-released.html blog]
* 2017/6/30: R-3.4.1
* 2017/4/25: Bioconductor 3.5
* 2017/4/21: R-3.4.0
* 2017/3/6: R-3.3.3
* 2016/10/31: R-3.3.2
* 2016/10/18: Bioconductor 3.4
* 2016/6/21: R-3.3.1
* 2016/5/4: Bioconductor 3.3
* 2016/5/3: R-3.3.0 (gcc in Rtools will be upgraded to 4.9.3 from 4.6.3)
* 2016/4/14: R-3.2.5 (fix 1. printing and formatting of POSIXlt objects/Daylight Savings Time wrong. 2. Makefile affecting system using R's bundled lzma library). In fact, Daily R News shows there will be an R 3.2.4 patch on 3/15/2016. That implies the creation of R 3.2.5.
* 2016/3/16: R-3.2.4-revised
* 2016/3/10: R-3.2.4
* 2015/12/10: R-3.2.3. Fix ftp download error in Windows's "wininet".
* 2015/10/14: Bioconductor 3.2
* 2015/8/14: R-3.2.2. setInternet2(TRUE) is now the default for windows. The default method for accessing URLs via download.file() and url() has been changed to be "wininet" using Windows API calls. This changes the way proxies need to be set and security settings made.
* 2015/6/18: R-3.2.1. Using R 3.2.0 with some Bioc packages compiled under R 3.2.1 causes a "could not find symbol keepNA" error in the nchar() function. See [http://r.789695.n4.nabble.com/Development-version-of-R-Improved-nchar-nzchar-but-changed-API-td4706369.html this post] and [https://support.bioconductor.org/p/69407/ this one].
* 2015/4/17: R-3.2.0, Bioconductor 3.1
* 2015/3/9:  R-3.1.3
* 2014/10/31: R-3.1.2
* 2014/10/14: Bioconductor 3.0
* 2014/4/14: Bioconductor 2.14
* 2014/4/10: R-3.1.0
Line 19: Line 153:


See also the developer's page about the release schedule.
* [https://github.com/r-hub/rversions rversions] package
* [http://developer.r-project.org/ R] Use '''version''' (no parentheses) to check the R version.
* [http://www.bioconductor.org/developers/release-schedule/ Bioconductor] Use the command '''BiocInstaller::biocVersion()''' to check the installed Bioconductor version (from Bioconductor 3.8 on, BiocInstaller is replaced by BiocManager, so use '''BiocManager::version()''').
* Use '''sessionInfo()''' to check the attached packages in the current session.
* [http://bioconductor.org/about/release-announcements/ Bioconductor releases] This gives a table-like list of Bioconductor releases and R versions.
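The version checks listed above can be combined in one short script. This is only a sketch: the minimum version "3.5.1" is taken from the v4.6.2 entry in the release timeline, and the Bioconductor query is wrapped in try() because BiocManager may want network access.

<syntaxhighlight lang='rsplus'>
# R version: getRversion() returns an object that compares correctly with a string
print(R.version.string)
r_ok <- getRversion() >= "3.5.1"     # TRUE if this R is at least 3.5.1
print(r_ok)

# Bioconductor version: BiocManager (Bioc >= 3.8) or BiocInstaller (older releases)
bioc <- NULL
if (requireNamespace("BiocManager", quietly = TRUE)) {
  bioc <- try(BiocManager::version(), silent = TRUE)
} else if (requireNamespace("BiocInstaller", quietly = TRUE)) {
  bioc <- BiocInstaller::biocVersion()
}
print(bioc)                          # NULL if neither package is installed
</syntaxhighlight>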
 
== Release timeline for BRB-SeqTools ==
* 2017/8/3: version 1.2 (macOS, Windows 10, subread aligner, featureCount, xenograft)
* 2016/10/19: version 1.0 (BRB-SeqTools)
* 2015/1/14: version 0.1 (BRB-DGE)


== Release Procedures (including check list for installer) ==
Line 42: Line 184:
# CGHTools\Doc: CGHManual.doc, CGHManualFrame.doc, Table of Contents.doc.
# ArrayTools_UpdateVersion.txt, which is kept on the ArrayTools server.
# http://linus.nci.nih.gov/~brb/news.html
# http://linus.nci.nih.gov/~brb/news.html (deprecated!)
# Different ArrayTools installers.
# Different ArrayTools installers (deprecated).
#* There are 3 installers: Full, Individual and Update. A main release has no Update installer.
#* The Individual installer has no R and no Java installers in the Misc folder. It detects whether the required R version (or a newer one) is installed. If R is missing or older than the required version, the installation aborts.
#* The Update installer contains the files that changed between the current and previous freeze folders.
# A copy of each installer is kept in ''K:\BRB\ARRAYTOOLS\archive\installers\''. Alpha versions are not kept; only Beta and release versions are kept on K:.
# YD puts the new release online:
# Puts the new release online:
#* Create a testing page: http://linus.nci.nih.gov/BRB-ArrayTools_test.html
#* Update index.html and download.html on http://brb-stage.nci.nih.gov/BRB-ArrayTools/
#* Whats-new-v43b1.doc, Readme.doc, License.doc, Manual.doc
#* Upload related files including Whats-new-v43b1.doc, Readme (convert to pdf from doc), License.doc, Manual.doc and ArrayTools_LatestVersion.txt (no more than 5 items) to ''new_updates'' directory
#* Java, R download links
#* Java, R download links
#* Installers: Full, Individual, Update
#* ArrayTools_LatestVersion.txt (no more than 5 items)
#* Sending emails to users about a new release takes about 3 hours. Emails are sent only for Beta 1 and Stable versions; other Beta versions are skipped to reduce the number of user emails.


Line 61: Line 201:
# use privilege manager to run installer
# use privilege manager to open excel and run VBAObjectManager
Another canonical method is
# Alt + F11 to open VBE
# Click Project > ArrayTools. Click Tools > References. Make sure CGHTools is not checked.
# Click Project > CGHTools. Click Tools > References. Make sure ArrayTools is checked.


=== Registration ===
Line 67: Line 212:
http://linus-stage.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com
http://linus-stage.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com
</pre>
</pre>
For BRB-ArrayTools registration you will receive an email from [email protected] with the title 'Thank you for registering in BRB ArrayTools guestbook'. For BRB-SeqTools registration, you won't receive any email.


=== Check update error ===
BRB-ArrayTools was not able to check for software updates. The server may be down or too slow right now or you may not connect to the internet.

Solution: Check the LatestVersion.txt file on the server. When we download the file using ''wget'' on Windows, the content becomes one line. It is better to convert the line endings when the file is uploaded from a PC, or to copy the text manually in a Linux environment.
 
The source code is in Utilities:CheckForPath().
<pre style="white-space: pre-wrap; /* CSS 3 */ white-space: -moz-pre-wrap; /* Mozilla, since 1999 */ white-space: -pre-wrap; /* Opera 4-6 */ white-space: -o-pre-wrap; /* Opera 7 */ word-wrap: break-word; /* IE 5.5+ */ " >
msiShellAndWait Chr(34) & RscriptExe & Chr(34) & " -e " & Chr(34) & "download.file('http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt', '" & GetArrayToolsDir("/") & "/updates/ArrayTools_LatestVersion.txt', mode='wb')" & Chr(34), False
</pre>
That is
<pre style="white-space: pre-wrap; /* CSS 3 */ white-space: -moz-pre-wrap; /* Mozilla, since 1999 */ white-space: -pre-wrap; /* Opera 4-6 */ white-space: -o-pre-wrap; /* Opera 7 */ word-wrap: break-word; /* IE 5.5+ */ " >
"C:\\PROGRA~1\\R\\R-32~1.4\\bin\\Rscript.exe" -e "download.file('http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt', 'C:/Program Files (x86)/ArrayTools/updates/ArrayTools_LatestVersion.txt', mode='wb')"
</pre>
 
To see whether the file is in DOS format (CRLF line terminators, \r\n) or UNIX format (\n), we can use the '''file''' command; its output differs between the two.
<pre>
C:\Program Files\R>file C:\ArrayTools\Updates\ArrayTools_CurrentVersion.txt
C:\ArrayTools\Updates\ArrayTools_CurrentVersion.txt: Non-ISO extended-ASCII English text, with CRLF line terminators
 
C:\Program Files\R>wget http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt
--2016-04-18 09:37:09--  http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt
Resolving linus.nci.nih.gov... 129.43.254.99
Connecting to linus.nci.nih.gov|129.43.254.99|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 411 [text/plain]
Saving to: `ArrayTools_LatestVersion.txt'
 
100%[==================================================================================================>] 411        --.-K/s  in 0s
 
2016-04-18 09:37:09 (29.9 MB/s) - `ArrayTools_LatestVersion.txt' saved [411/411]
 
C:\Program Files\R>file ArrayTools_LatestVersion.txt
ArrayTools_LatestVersion.txt: ASCII English text
</pre>
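When the '''file''' command is not available, the same check can be done from R. The sketch below is an assumption-laden illustration, not part of ArrayTools: has_crlf() is a hypothetical helper, and the two temporary files stand in for LatestVersion.txt with made-up content.

<syntaxhighlight lang='rsplus'>
# Hypothetical helper: TRUE if the file contains at least one CR LF byte pair
has_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  if (n < 2) return(FALSE)
  any(bytes[-n] == as.raw(0x0d) & bytes[-1] == as.raw(0x0a))
}

# Two small example files written byte for byte (made-up content)
unix_file <- tempfile()
dos_file  <- tempfile()
writeBin(charToRaw("v4.6.2\nstable\n"), unix_file)
writeBin(charToRaw("v4.6.2\r\nstable\r\n"), dos_file)
has_crlf(unix_file)   # FALSE
has_crlf(dos_file)    # TRUE

# Convert the UNIX file to DOS line endings; binary mode stops R from
# translating the line ends a second time on Windows
converted <- tempfile()
con <- file(converted, "wb")
writeLines(readLines(unix_file), con, sep = "\r\n")
close(con)
has_crlf(converted)   # TRUE
</syntaxhighlight>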


=== Related to Project Path ===
Line 84: Line 260:
== How to Test ==


* Test when internet is not available. For example, GO analysis requires downloading GO.db package. BugReport.tar.gz will be generated if there is a problem with GO.db installation.
* Test special characters (# sign) in the experiment descriptor worksheet. A wrong <ExpDescWkSht> file used by the DHV (dynamic heatmap viewer) was generated. The reason is that the read.table() call did not set the comment.char argument. See the bugreportDessen project. PS: the error was caught in RserveVBA.
* Test column headers that contain double quotes (eg: my "GWAS" score). Some analyses like quantitative trait analysis will fail.
* Test the case where gene annotation was not done, so gene symbols were not available (miRNA project). This causes a bug in the class comparison -> IPA output. The GeneID file may miss some gene identifiers. See PreparegGeneHTMLtable.R.
* Test the case where Symbol was used in the first column of the gene identifier file. This results in two 'Symbol' columns in the Gene annotations worksheet and breaks the code in CreateIPAOutput().
* Extreme number of arrays (eg 3, 10). This is used to test chunking, ...
* Extreme number of genes.
Line 95: Line 274:
* Data with lots of missing values.
* Test on a subset of arrays only.
* Run an analysis on a data type it is not designed for (eg RNA-Seq analysis on regular microarray data).
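Several of the cases above (few arrays, many missing values, a # sign in a column header) can be produced with a small synthetic data generator. The following is a sketch only: make_test_data() and its arguments are hypothetical and not part of ArrayTools, and the file layout (tab-delimited, one gene identifier column) is an assumption.

<syntaxhighlight lang='rsplus'>
# Hypothetical generator for a tab-delimited expression table
make_test_data <- function(n_genes, n_arrays, na_frac = 0, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n_genes * n_arrays, mean = 8), nrow = n_genes)
  # blank out a fraction of the values to mimic missing data
  x[sample(length(x), floor(na_frac * length(x)))] <- NA
  colnames(x) <- paste0("Array #", seq_len(n_arrays))   # header with a # sign
  data.frame(GeneId = paste0("g", seq_len(n_genes)), x,
             check.names = FALSE, stringsAsFactors = FALSE)
}

small <- make_test_data(n_genes = 100, n_arrays = 3, na_frac = 0.4)
f <- tempfile(fileext = ".txt")
write.table(small, file = f, sep = "\t", quote = FALSE, row.names = FALSE)

# comment.char = "" keeps the text after '#'; quote = "" disables quote handling
back <- read.table(f, header = TRUE, sep = "\t", quote = "", comment.char = "",
                   check.names = FALSE, stringsAsFactors = FALSE)
dim(back)             # 100 rows, 4 columns
sum(is.na(back))      # 120 missing values (40% of 300)
</syntaxhighlight>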
=== Windows Virtual Machine ===
* Download [https://www.microsoft.com/en-us/software-download/windows10ISO Windows 10 iso] from Microsoft (no log in is required)
* Use MS Office 2007


== Misc ==
=== Conversion of an old project ===
* When an old project needs to be converted, the old project folder is backed up under a new name 'XXXX_ArrayToolsOldProject'.
=== After running the class comparison ===
==== Filtered log ratio worksheet ====
Arrays in 'filtered log ratio worksheet' will be sorted based on the class variable (still keeping empty arrays). For example, if we select 'BRCA1 V BRCA2' column, arrays will be sorted by BRCA1, BRCA2 and empty. If we select 'BRCA1 v Sporadic', arrays will be sorted by BRCA1, empty and Sporadic. The gene expression value Html page is also sorted by the class variable (excluding empty label arrays).
==== Clustered heatmap of significantly expressed genes ====
Note that the color legend says 'Centered log ratio' (dual channel) or 'Scaled and centered value' (single channel). Here 'centered' means that the expression values of each gene are centered across arrays.
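As an illustration of per-gene centering and scaling, here is a minimal sketch on a made-up matrix with genes in rows and arrays in columns. It assumes mean-centering and division by the standard deviation; the exact statistic used by the heatmap code is not stated on this page.

<syntaxhighlight lang='rsplus'>
set.seed(1)
expr <- matrix(rnorm(12, mean = 8), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("array", 1:4)))

# Centered: subtract each gene's mean across arrays
centered <- sweep(expr, 1, rowMeans(expr))

# Scaled and centered: additionally divide each gene by its standard deviation
scaled <- t(scale(t(expr)))

round(rowMeans(centered), 10)   # all zero
round(apply(scaled, 1, sd), 10) # all one
</syntaxhighlight>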
=== R folder ===
{| class="wikitable"
! FileName
! Functions
! Notes
|-
| Annotations.R
|
* CreateAnnotations
* ChromDistBarPlot
* GOanalysis
* GetGOIDsFreq
* GOOEHTMLTable
* UpdateGeneIdsAfterAnnotation
* AnnotateWithGeneIdFile
|
|-
| BioUtil.R
|
* SetRPackageDir
* LoadRPackage
* InstallAllRPackages
* OutputAnnotationTextFile
* MatchAffyAnnotation
* GetDetectionCall
* MatchlumiAnnotation
* MatchlumiMethyAnnotation
* MatchBiocAnnotation
|
|-
| CreateGeneExpression.R
|
| No functions
|-
| FilterAndNormalize.R
|
* ApplyFilteringAndNormalization
* AppendWarningMessage
* DetermineReferenceArray
* CreateRefArray
* ApplyNormalization
* LowessNormalization
* ApplyTruncation
* InitializeParameters
* ReadFilterParamFromFile, GetParam, WriteParam
|
|-
| FilterData.R
|
* ComputeGeneFilters
* CompareGeneFilters
|
|-
| GeneLists.R
|
* ParseMSigDBXMLFile
* CreateGODataFile
* CreateGOSets
* CreateGeneListDataFile
* CheckAllGenelistsInFolder
* GetPathwaysForOneGene
* CreatePathwayDataFile
* MatchGenelists, CheckGenelists
* GetLongPathwayNames
* CompareTwoGenelists
|
|-
| Illumina.R
|
* lumifunc
* methyfunc
|
|-
| Input_to_Output_function.R
|
* GetParam
* print.filter.parameters.to.output
* print.affy.filter.parameters.to.output
* print.SingleChannel.filter.parameters.to.output
|
|-
| Misc.R
|
* SubsetFilteredAndNormalizedLog
* CheckPhenoAverageDuplic
* AverageDuplicLG
* ComputNumberOfPermutations
|
|-
| PrepareGeneTableHTML.R
|
* GetGenesHTML
* outputClassname
* CreateIPAOutput
|
|-
| PreProcessLog.R
|
* GetDataRMatrix
* LogTransformation
* PreProcessMain
|
|-
| WriteDataToFiles.R
|
* WriteDataToFiles
|
|}
=== Plugins/BRBfun folder ===
{| class="wikitable"
! FileName
! Functions
! Notes
|-
| CreateGeneTable.R
|
* CreateGeneTable
|
|-
| CreateHtmlTable.R
|
* CreateHtmlTable
|
|-
| CreateMasterAnnotationsTable.R
|
* CreateMasterAnnotationsTable
|
|-
| CreatePath.R
|
* CreatePath
|
|-
| OutputGeneList.R
|
* RemoveInvalidChar
* OutputGeneList
|
|}
=== VBA source code ===
* AddCustomMenu() in CustomMenu.bas - the main menu starts here.
* You can step the code by using [https://ss64.com/access/syntax-keyboard.html keyboard shortcuts]
** F5: continue
** F8: step in
** Shift + F8: step over
** Ctrl + F8: go to the cursor
==== Annotate the data  ====
* Consider the Affymetrix first.
* Search "Annotate the data" in <CustomMenu.bas>. It shows that when users click "Import Affymetrix annotations", the function "ShowAffymetrixDownload" in the "AffyDialog" module is triggered.
* Set a breakpoint at some line in the ShowAffymetrixDownload function.
* At the end you will see that it calls fmDownloadAffyFile.Show() to display the download Affymetrix dialog. Once the command button is clicked, the process goes to the subroutine CommandButtonImport_Click() in the form.
* Eventually it will execute '''InsertGeneAnnotationSheet AnnotateFromMenu:=True'''. Press F8 to go into this function. We can see the function '''InsertGeneAnnotationSheet''' is defined in '''Annotation.bas'''. Note that there are 4 similar functions defined in Annotation.bas.
** InsertGeneAnnotationSheet
** InsertGeneAnnotationSheet_lumi
** InsertGeneAnnotationSheet_lumiMethy
** InsertGeneAnnotationSheet_Bioc
* In '''Annotation.bas''', it executes '''MatchProbeSetsOK = MatchProbeSets(ChipType)'''.
* The VBA '''MatchProbeSets''' function executes a series of R commands. The most important one is the following
<pre>
RRunSuccess "ReturnMsg <- MatchAffyAnnotation(ChipName, ProjectPath, UniqueIdHeader, ShowProgress=T)", , True
</pre>
* In fact, we can see what exact R function (located in <BiocUtil.R>) is called in each VBA function (located in <Annotations.bas>).
** InsertGeneAnnotationSheet -> MatchProbeSets(ChipType) -> MatchAffyAnnotation(ChipName, QueryIdHeader)
** InsertGeneAnnotationSheet_lumi -> MatchNuID(annopkg) -> MatchlumiAnnotation(anno, UniqueIdHeader)
** InsertGeneAnnotationSheet_lumiMethy -> MatchTargetID(annopkg) -> MatchlumiMethyAnnotation(anno, UniqueIdHeader) or MatchMethy450kAnnotation(anno, UniqueIdHeader)
** InsertGeneAnnotationSheet_Bioc -> MatchQueryID(Species, QueryIdHeader) -> MatchBiocAnnotation(annoPkg, QueryIdHeader)
=== BinaryData folder ===
==== text file ====
* <ExperimentTable>
==== R binary file ====
* <ApplyFilter.rda> ('''FilterData.R''')
* <GeneIds>.  ('''FilterData.R''')
* <FilteredAndNormalizedLog1.rda> ('''Misc.R''')
Use load("") to load them to R.
==== General binary files ====
* See GetDataRMatrix() and WriteDataRMatrix() functions from '''<PreProcessLog.R>''' file.
** <GreenRaw1>, <GreenRaw2>, ...
** <RedRaw1>, <RedRaw2>, ...
** <Flag1>, <Flag2>, ...
** <RawData1>, <RawData2>, ...
** <RedBkg1>, <RedBkg2> ,...
** <GreenBkg1>, <GreenBkg2>, ...
** <RedAdj1>, <RedAdj2>, ...
** <GreenAdj1>, <GreenAdj2>, ...
* See the ApplyNormalization() function in '''<FilterAndNormalize.R>'''.
** <FilteredAndNormalizedLog1>, <FilteredAndNormalizedLog2>, ...
** <PreProcessLog1>, <PreProcessLog2>, ...
** <PrintTipGroups1>
** <PrintTip1>, <PrintTip2>, ...
** <RedAdj1>, <RedAdj2>, ...
** <RedRaw1>, <RedRaw2>, ...
We can read the data using readBin() function in R.
<pre>
# check <Import.txt> to find the number of genes and chunks
DataRMatrix <- matrix(readBin(con="C:/ArrayTools/Sample datasets/Perou/Perou -Project/BinaryData/GreenRaw1",
                    what="numeric", n=2998*3, size=4), nrow=2998, byrow=FALSE)
str(DataRMatrix) # 2998 x 3
</pre>
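The readBin() call above can be tried without a real project by writing a small matrix in the same layout first. This is a sketch under assumptions taken from that call (4-byte floats, column-major order); the matrix and the temporary file are made up.

<syntaxhighlight lang='rsplus'>
# 4 "genes" x 3 "arrays"; whole numbers survive single precision exactly
m <- matrix(as.numeric(1:12), nrow = 4)

f <- tempfile()
writeBin(as.vector(m), f, size = 4)      # column-major, 4 bytes per value
file.size(f)                             # 48 bytes = 12 values * 4

back <- matrix(readBin(f, what = "numeric", n = length(m), size = 4),
               nrow = nrow(m), byrow = FALSE)
identical(back, m)                       # TRUE
</syntaxhighlight>

If n is smaller than the number of values in the file, readBin() silently returns only the first n values, so the gene and array counts from <Import.txt> matter.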
==== Total number of genes ====
See the ''d_NumUniqueGenes'' entry in <DataParam/Import.txt> file.
==== Total number of arrays ====
See the ''d_NumberOfArrays'' entry in <DataParam/Import.txt> file.
==== Data importer method ====
See the ''s_DataImporter'' entry in <DataParam/Import.txt> file. Possible values: Unknown (Bhatt)
==== Original data type ====
See the ''s_OriginalDataType'' entry in <DataParam/Import.txt> file. Possible values: 
* UnloggedDualRedGreenWithoutBKGData (Perou) ,
* UnloggedDualRedGreenAndBKGData
* LoggedDualRatio (GSE22631),
* LoggedSingleIntensity
* UnloggedDualRatio (BRCA),
* UnloggedSingleIntensity (Pomeroy, Bhatt).
These data types appear at PreProcessMain() function in <PreProcessLog.R> file.
==== First column label ====
See the  ''s_FirstColumnLabel'' entry in <DataParam/Import.txt> file.
==== Chip type for Affy data ====
See the ''s_AffyGeneChip''  entry in <DataParam/Import.txt> file. For example, hgu95av2.
=== Run ArrayTools' Fortran Programs in a Linux environment ===
Yes, this works after WINE has been installed on Linux:
<pre>
brb@T3600 ~/.wine/drive_c $ wine GeneSetComparison.exe PathwayClassComparison/
Internal random seed = 123456789 will be used
Randomized variance model will be used
Linear approximation will be used
Regularization: a=  3.10437924053453    b=  1.27713340089596    KSstat=
  1.116416920309377E-002
writing p-values into file
Startimg permutations to calibrate p-values...
          8 percents completed
          16 percents completed
          25 percents completed
          33 percents completed
          41 percents completed
          50 percents completed
          58 percents completed
          66 percents completed
          75 percents completed
          83 percents completed
          91 percents completed
        100 percents completed
alpha =  5.000000000000000E-003    nDeg=        165
Writing DEGs.txt file...
Max Number of significant GeneSet categories:  165
Finished writing output files
brb@T3600 ~/.wine/drive_c $ head PathwayClassComparison/DEGs.txt
Max Number of significant GeneSet categories:  165
Index    nGenesInGeneSet    statLS          pLS              statKS          pKS
        144        24        3.19595500      0.00010026      0.48423390      0.00389526
        150        19        3.33429028      0.00019531      0.48850684      0.01059998
        154        19        3.30687478      0.00022752      0.43818127      0.03596438
        60        40        2.66205140      0.00039213      0.40681660      0.00504481
          5          6        4.33475413      0.00213406      0.71447198      0.01376502
          3          8        3.65201851      0.00349892      0.58742271      0.02999936
        85        28        2.58118709      0.00467529      0.32909420      0.14689366
          4          8        3.36955271      0.00871936      0.53517206      0.06790110
brb@T3600 ~/.wine/drive_c $ head PathwayClassComparison/Pvalues.txt
index  p
          1        0.0163586197
          2        0.4349814775
          3        0.9409634273
          4        0.1873022496
          5        0.3977063364
          6        0.3830773134
          7        0.3279943699
          8        0.9102655386
          9        0.0014973998
</pre>


== Good R writing habit ==
* Use indentation.
* Use braces { } where needed. For example, the following is bad: at the top level of a script R ends the if statement at the end of the first line and then reports an unexpected 'else'.
<syntaxhighlight lang='rsplus'>
if (something) DoSomething
else DoAnother
</syntaxhighlight>
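A braced version of the bad example above can be sketched as follows. The function choose_label() and its threshold are made up for illustration; the parse() call only demonstrates the parse error without running any code.

<syntaxhighlight lang='rsplus'>
# With braces, 'else' sits on the same line as the closing brace and always parses
choose_label <- function(p, alpha = 0.05) {
  if (p < alpha) {
    "significant"
  } else {
    "not significant"
  }
}
choose_label(0.01)    # "significant"

# The unbraced two-line form is rejected when parsed at top level
bad <- try(parse(text = "if (TRUE) 1\nelse 2"), silent = TRUE)
inherits(bad, "try-error")   # TRUE
</syntaxhighlight>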
* read.delim() is better than read.table(). When we use read.table() or write.table(), set their options explicitly to avoid trouble with special characters. Note: quoting is switched off with quote="" in read.table() (the argument is the set of quote characters) but with quote=FALSE in write.table() (the argument is a logical value or a numeric vector of column indices).
<syntaxhighlight lang='rsplus'>
x <- read.table(FILENAME, stringsAsFactors=FALSE, header=TRUE, sep="\t",
          na.strings=c('NA',''), fill=TRUE, comment.char='', quote="")
write.table(x, file=FILENAME, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
</syntaxhighlight>
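A quick round trip shows that these options keep apostrophes, double quotes and # signs intact. This is a sketch with a made-up data frame and a temporary file; it uses quote=FALSE on the writing side and quote="" on the reading side, as the documentation of the two functions describes.

<syntaxhighlight lang='rsplus'>
df <- data.frame(Gene = c("BRCA1", "TP53"),
                 Note = c("5' UTR #2", "my \"GWAS\" score"),
                 stringsAsFactors = FALSE)

f <- tempfile(fileext = ".txt")
write.table(df, file = f, row.names = FALSE, col.names = TRUE,
            sep = "\t", quote = FALSE)

back <- read.table(f, stringsAsFactors = FALSE, header = TRUE, sep = "\t",
                   na.strings = c("NA", ""), fill = TRUE,
                   comment.char = "", quote = "")
identical(back$Note, df$Note)   # TRUE
</syntaxhighlight>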
 
== VBA ==
* To launch VBA in Microsoft Office without using the keyboard shortcut (Alt + F11), go to File > Options > Customize Ribbon > Check Developer.
* [https://msdn.microsoft.com/en-us/library/aa716276(v=vs.60).aspx Immediate window]. Use ? followed by a VBA expression to print its value.
 
== RSS ==
* Bioconductor http://www.bioconductor.org/developers/svnlog/
 
== ChangeDetection ==
* CRAN
* Bioconductor
* Drug Bank
* GSEA
* BRB-ArrayTools
 
== NCI Office of Technology Transfer ==
[https://nciphub.org/resources/899/download/Guidelines_for_Releasing_Research_Software_04062015.pdf Guidelines for Releasing Research Software]

Latest revision as of 15:11, 29 September 2024

Release timeline for BRB-ArrayTools

  • 2022/6/1: v4.6.2 stable (R 3.5.1 and Bioc 3.7). No registration is needed for non-commercial users.
  • 2021/2/x: v4.6.2 beta1 (R 3.5.1 and Bioc 3.7). Upgrade Rserve to 1.8-7. Fix MSigDB v6.2 URL link.
  • 2020/6/22: v4.6.1 stable (R 3.5.1 and Bioc 3.7). BiocInstaller::biocVersion(). Update web links for GEO GDS importer and DrugBank utility.
  • 2018/1/4: v4.6.0 beta2 (R 3.4.3 and Bioc 3.6). Fix a bug in DESeq when the gene filter was applied.
  • 2017/7/12: v4.6.0 beta (R 3.4.1 and Bioc 3.5). find over-represented pathways in a gene list. Enhance DESeq and edgeR.
  • 2016/8/9: v4.5.1 stable (R 3.2.5 and Bioc 3.3). Fix a bug in dynamic heatmap viewer gene labels and heatmap (reversed order).
  • 2016/4/8: v4.5.0 stable (R 3.2.4 and Bioc 3.2). Add new buttons in Dynamic heatmap viewer, update SOURCE annotation, Drugbank, MIT/MSigdb...
  • Windows 10 released (7/29/2015)
  • 2015/6/8: v4.5.0 beta 1 (R 3.2.0 and Bioc 3.1). Add local fdr. Import and analysis of RNA-seq data by using edgeR and DESeq2 packages. GSE importer. ST arrays importer using 'oligo' package instead of 'aroma.affymetrix'.
  • 2015/6/8: v4.4.1 stable
  • 2015: BDGE v0.1
  • 2014/11/20: v4.4.0 stable (R 3.1.2 and Bioc 3.0). Fix Excel 2013. Add gene dendrogram to DHV. DGIdb. Jaspar2014.
  • 2014/6/20: v4.4.0 beta 2 (R 3.1.0 and Bioc 2.14). maintenance release and new microRNA gene set.
  • 2014/2/26: v4.4.0 beta 1 (R 3.0.2 and Bioc 2.13). DHV was added.
  • 2013/9/12: v4.3.2 stable (R 3.0.2 and Bioc 2.13). (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v4_3_2_Stable.exe)
  • 2013/6/12: v4.3.1 stable (R 3.0.1 and Bioc 2.12). Because of changes in R 3.0.1, some Bioc packages compiled by R 3.0.1 may not work with R 3.0.0. Download link is here
  • 2013/5/24: v4.3.0 stable (R 3.0.0 and Bioc 2.12)
  • 2013/3/7: v4.3.0 beta3 (SOURCE website change, sorting of experiment worksheet). Downloaded from http://pub.emmes.com.
  • 2012/11/28: v4.3.0 beta2
  • 2012/8/15: v4.3.0 beta1
  • 2012/1/24: v4.2.1 stable
  • v4.2.0 stable
  • v4.1.0 stable
  • v3.8.0 stable
  • 2009/3/25: v3.7.1 stable (http://hawk.emmes.com/study/brbuploaddir/bugfix/ArrayTools_v3_7_1.Full.exe)
  • 2008/11/19: v3.7.0 stable
  • v3.6.0 stable
  • v3.5.0 stable
  • v3.4.0 stable
  • v3.3.0 stable
  • v3.2.3
  • v3.2.2
  • v3.2.1

Release timeline for R/Bioconductor

Bioconductor releases announcements

get_bioc_r_versions <- function() {
  # Load required libraries
  if (!requireNamespace("rvest", quietly = TRUE)) {
    install.packages("rvest")
  }
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
  }
  library(rvest)
  library(dplyr)

  # URL of the release announcements page
  url <- "https://bioconductor.org/about/release-announcements/"

  # Read the HTML content
  page <- read_html(url)

  # Extract the table
  table <- page %>%
    html_node("table") %>%
    html_table()

  # Clean and format the data
  clean_table <- table %>%
    select(Bioconductor = Release, Date, R) %>%
    mutate(Date = as.Date(Date, format = "%B %d, %Y"),
           Bioconductor = as.numeric(Bioconductor))

  # Sort by date descending
  clean_table <- clean_table %>%
    arrange(desc(Date))

  return(clean_table)
}

# Usage
versions <- get_bioc_r_versions()
print(versions)

   Bioconductor Date           R
          <dbl> <date>     <dbl>
 1         3.19 2024-05-01   4.4
 2         3.18 2023-10-25   4.3
 3         3.17 2023-04-26   4.3
 4         3.16 2022-11-02   4.2
 5         3.15 2022-04-27   4.2
 6         3.14 2021-10-27   4.1
 7         3.13 2021-05-20   4.1
 8         3.12 2020-10-28   4  
 9         3.11 2020-04-28   4  
10         3.1  2019-10-30   3.6
...
  • 2024/5/1: Bioconductor 3.19
  • 2024/4/24: R 4.4.0
  • 2024/2/29: R 4.3.3
  • 2023/10/25: Bioconductor 3.18
  • 2023/6/16: R 4.3.1
  • 2023/4/26: Bioconductor 3.17
  • 2023/4/21: R 4.3.0
  • 2023/3/15: R 4.2.3
  • 2022/11/2: Bioconductor 3.16
  • 2022/4/27: Bioconductor 3.15
  • 2022/4/22: R 4.2.0
  • 2022/3/10: R 4.1.3
  • 2021/11/1: R 4.1.2
  • 2021/10/27: Bioconductor 3.14
  • 2021/8/10: R 4.1.1
  • 2021/5/20: Bioconductor 3.13
  • 2021/5/18: R 4.1.0
  • 2021/2/15: R 4.0.4
  • 2020/10/28: Bioconductor 3.12
  • 2020/4/28: Bioconductor 3.11
  • 2020/4/24: R 4.0.0
  • 2019/10/30: Bioconductor 3.10
  • 2019/5/3: Bioconductor 3.9
  • 2019/4/26: R-3.6.0 What's new in R 3.6.0
  • 2019/3/11: R-3.5.3
  • 2018/12/20: R-3.5.2
  • 2018/10/31: Bioconductor 3.8 (starting to use BiocManager, see this too)
  • 2018/7/2: R-3.5.1
  • 2018/5/1: Bioconductor 3.7
  • 2018/4/23: R-3.5.0
  • 2018/3/15: R-3.4.4
  • 2017/11/30: R-3.4.3
  • 2017/10/31: Bioconductor 3.6
  • 2017/9/28: R-3.4.2. blog
  • 2017/6/30: R-3.4.1
  • 2017/4/25: Bioconductor 3.5
  • 2017/4/21: R-3.4.0
  • 2017/3/6: R-3.3.3
  • 2016/10/31: R-3.3.2
  • 2016/10/18: Bioconductor 3.4
  • 2016/6/21: R-3.3.1
  • 2016/5/4: Bioconductor 3.3
  • 2016/5/3: R-3.3.0 (gcc in Rtools will be upgraded to 4.9.3 from 4.6.3)
  • 2016/4/14: R-3.2.5 (fixes: 1. wrong Daylight Saving Time in printing and formatting of POSIXlt objects; 2. a Makefile problem affecting systems using R's bundled lzma library). In fact, Daily R News showed there would be an R 3.2.4 patch on 3/15/2016, which implied the creation of R 3.2.5.
  • 2016/3/16: R-3.2.4-revised
  • 2016/3/10: R-3.2.4
  • 2015/12/10: R-3.2.3. Fix ftp download error in Windows's "wininet".
  • 2015/10/14: Bioconductor 3.2
  • 2015/8/14: R-3.2.2. setInternet2(TRUE) is now the default for windows. The default method for accessing URLs via download.file() and url() has been changed to be "wininet" using Windows API calls. This changes the way proxies need to be set and security settings made.
  • 2015/6/18: R-3.2.1. (Causes a "could not find symbol keepNA in nchar()" error if R 3.2.0 is used with some Bioc packages compiled using R 3.2.1.) See this and this posts.
  • 2015/4/17: R-3.2.0, Bioconductor 3.1
  • 2015/3/9: R-3.1.3
  • 2014/10/31: R-3.1.2
  • 2014/10/14: Bioconductor 3.0
  • 2014/4/14: Bioconductor 2.14
  • 2014/4/10: R-3.1.0
  • 2013/10/8: Bioconductor 2.13
  • 2013/9/25: R-3.0.2
  • 2013/5/16: R-3.0.1
  • 2013/4/3: R-3.0.0 and Bioconductor 2.12

See also the developer's page about the release schedule.

  • rversions package
  • R: use version (no parentheses) to check the R version.
  • Bioconductor: use BiocInstaller::biocVersion() (Bioconductor 3.7 and earlier) or BiocManager::version() (Bioconductor 3.8 and later) to check the installed Bioconductor version.
  • Use sessionInfo() to check the attached packages in the current session.
  • Bioconductor releases: a table-like list of Bioconductor releases and R versions.
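
The checks in the list above can be combined into one helper. This is a minimal sketch, not part of ArrayTools: the function name report_versions() is hypothetical, and it assumes that either BiocManager or BiocInstaller may be missing on the machine being inspected.

```r
# Collect the R and Bioconductor versions of the current installation.
# report_versions() is a hypothetical helper name used for illustration.
report_versions <- function() {
  bioc <- if (requireNamespace("BiocManager", quietly = TRUE)) {
    # Bioconductor 3.8 and later; guard against failures (e.g. no network)
    tryCatch(as.character(BiocManager::version()),
             error = function(e) NA_character_)
  } else if (requireNamespace("BiocInstaller", quietly = TRUE)) {
    # Bioconductor 3.7 and earlier
    as.character(BiocInstaller::biocVersion())
  } else {
    NA_character_  # Bioconductor is not installed
  }
  list(R = as.character(getRversion()), Bioconductor = bioc)
}

versions <- report_versions()
print(versions)
```

An NA in the Bioconductor slot only means that neither helper package could report a version; it says nothing about individual Bioconductor packages.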

Release timeline for BRB-SeqTools

  • 2017/8/3: version 1.2 (macOS, Windows 10, subread aligner, featureCount, xenograft)
  • 2016/10/19: version 1.0 (BRB-SeqTools)
  • 2015/1/14: version 0.1 (BRB-DGE)

Release Procedures (including check list for installer)

  1. Misc Folder:
    • CGHTools
    • R installer
    • Rserve
    • Java installer
    • Microsoft C++ 2010 SP1 Redistributable packages, vcredist_x64.exe & vcredist_x86.exe
  2. License file,
    • C:\Installer-2012\ArrayTools-Scripts\V4_3_And_CGH_1_3_RServe_Full\Setup Files\Compressed Files\Language Independent\OS Independent\License.doc. It is a text file.
    • C:\Installer-2012\CGHTools-Scripts\CGH_1_3_Rserve\Setup Files\Compressed Files\Language Independent\OS Independent\License.doc. It is a text file.
  3. License agreement menu, About ArrayTools and About CGHTools menus.
  4. Sample datasets.
  5. ArrayTools: License.doc, Readme.doc.
  6. ArrayTools\Updates: ArrayTools_CurrentVersion.txt.
  7. ArrayTools\Doc: Manual.doc, ManualFrame.doc, Overview of Analysis Tools.doc, Plug-in Manual.doc, StatMethods.doc, Table of Contents.doc, FAQs.pdf.
  8. CGHTools: LICENSE AGREEMENT.doc, Readme.doc
  9. CGHTools\Updates: CGHTools_CurrentVersion.txt.
  10. CGHTools\Doc: CGHManual.doc, CGHManualFrame.doc, Table of Contents.doc.
  11. ArrayTools_UpdateVersion.txt, which is kept on the ArrayTools server.
  12. http://linus.nci.nih.gov/~brb/news.html (deprecated!)
  13. Different ArrayTools installers (deprecated).
    • There are 3 installers: Full, Individual and Update. A main release has no Update installer.
    • The Individual installer has no R and no Java installers in the Misc folder. It checks for the required R version or newer; if R is missing or older than required, the installation aborts.
    • The Update installer contains the files that changed between the current and previous freeze folders.
  14. The installers are archived in K:\BRB\ARRAYTOOLS\archive\installers\. Alpha versions are not archived; only Beta and release versions are kept on K:.
  15. Put the new release online:
    • Update index.html and download.html on http://brb-stage.nci.nih.gov/BRB-ArrayTools/
    • Upload related files including Whats-new-v43b1.doc, Readme (convert to pdf from doc), License.doc, Manual.doc and ArrayTools_LatestVersion.txt (no more than 5 items) to new_updates directory
    • Java, R download links
    • Sending emails to users about a new release takes about 3 hours. Only Beta 1 and Stable versions are announced by email; other Beta versions are not, to reduce the number of user emails.

Troubleshooting

'This workbook is currently referenced by another workbook and cannot be closed' when I use VBAObjectManager

  1. uninstall AT & CGHTools
  2. use privilege manager to run installer
  3. use privilege manager to open excel and run VBAObjectManager

Another canonical method is

  1. Press Alt + F11 to open the VBE.
  2. Click Project > ArrayTools. Click Tools > References. Make sure CGHTools is not checked.
  3. Click Project > CGHTools. Click Tools > References. Make sure ArrayTools is checked.

Registration

http://linus.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com
http://linus-stage.nci.nih.gov/cgi-bin/brb/matchregistration.cgi?email=xxx.gmail.com

For BRB-ArrayTools registration, you will receive an email with the title 'Thank you for registering in BRB ArrayTools guestbook'. For BRB-SeqTools registration, you won't receive any email.

Check update error

BRB-ArrayTools was not able to check for software updates. The server may be down or too slow right now or you may not connect to the internet.

Solution: check the ArrayTools_LatestVersion.txt file on the server. When the file is downloaded with wget on Windows, its content appears as a single line, which points to UNIX (LF) rather than DOS (CRLF) line terminators. It is better to convert the format when the file is uploaded from a PC, or to copy the text manually in a Linux environment.

The source code is in Utilities:CheckForPath().

msiShellAndWait Chr(34) & RscriptExe & Chr(34) & " -e " & Chr(34) & "download.file('http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt', '" & GetArrayToolsDir("/") & "/updates/ArrayTools_LatestVersion.txt', mode='wb')" & Chr(34), False

That is

"C:\\PROGRA~1\\R\\R-32~1.4\\bin\\Rscript.exe" -e "download.file('http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt', 'C:/Program Files (x86)/ArrayTools/updates/ArrayTools_LatestVersion.txt', mode='wb')"

To see whether the file is in DOS format (CRLF line terminators, \r\n) or UNIX format (\n), we can use the file command; the output differs between the two.

C:\Program Files\R>file C:\ArrayTools\Updates\ArrayTools_CurrentVersion.txt
C:\ArrayTools\Updates\ArrayTools_CurrentVersion.txt: Non-ISO extended-ASCII English text, with CRLF line terminators

C:\Program Files\R>wget http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt
--2016-04-18 09:37:09--  http://linus.nci.nih.gov/new_updates/ArrayTools_LatestVersion.txt
Resolving linus.nci.nih.gov... 129.43.254.99
Connecting to linus.nci.nih.gov|129.43.254.99|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 411 [text/plain]
Saving to: `ArrayTools_LatestVersion.txt'

100%[==================================================================================================>] 411         --.-K/s   in 0s

2016-04-18 09:37:09 (29.9 MB/s) - `ArrayTools_LatestVersion.txt' saved [411/411]

C:\Program Files\R>file ArrayTools_LatestVersion.txt
ArrayTools_LatestVersion.txt: ASCII English text
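
The same check can be made from R when the file command is not available. The following is a minimal sketch using base R only; the function names detect_line_endings() and to_crlf() are hypothetical and are not part of ArrayTools.

```r
# Report whether a text file uses CRLF (DOS) or LF (UNIX) line terminators.
detect_line_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  lf <- which(bytes == as.raw(0x0a))          # positions of every "\n"
  if (length(lf) == 0L) return("none")
  prev <- lf - 1L                             # byte just before each "\n"
  crlf <- sum(prev >= 1L & bytes[pmax(prev, 1L)] == as.raw(0x0d))
  if (crlf == length(lf)) "CRLF" else if (crlf == 0L) "LF" else "mixed"
}

# Rewrite a text file with CRLF line terminators.
to_crlf <- function(infile, outfile) {
  lines <- readLines(infile, warn = FALSE)    # accepts LF and CRLF input
  con <- file(outfile, open = "wb")           # binary mode: write "\r\n" as is
  on.exit(close(con))
  writeLines(lines, con, sep = "\r\n")
  invisible(outfile)
}
```

For example, detect_line_endings("ArrayTools_LatestVersion.txt") could be run on the downloaded file before it is compared with the current version file.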

Related to Project Path

BRB-ArrayTools is running the batch job for your analysis.
The working path is E:\DASL data related\Direct Hyb samples\BRB analysis with Direct Hyb samples -Project\Fortran\ClassComparison between Normal and Tumors (Fold change 2 x)
Please wait...
Fortan program is running...'E:\DASL' is not recognized as an internal or external comand,
operable program or batch file.
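
The message suggests that the working path, which contains spaces, reached the command line unquoted, so everything after 'E:\DASL' was read as separate arguments. Below is a minimal R sketch of quoting such a path with shQuote() before building a command string. It only illustrates the idea; the helper names and the executable name Program.exe are hypothetical, and the actual launch code in ArrayTools may differ.

```r
# Quote a path for the Windows command interpreter (wraps it in double quotes).
quote_for_cmd <- function(path) {
  shQuote(path, type = "cmd")
}

# Build a command line from an executable and a project folder.
# Both arguments are quoted so that embedded spaces are preserved.
build_command <- function(exe, project_dir) {
  paste(quote_for_cmd(exe), quote_for_cmd(project_dir))
}

# Hypothetical example values
cmd <- build_command("C:\\ArrayTools\\Program.exe",
                     "E:\\DASL data related\\Direct Hyb samples")
cat(cmd, "\n")
```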

How to Test

  • Test when internet is not available. For example, GO analysis requires downloading GO.db package. BugReport.tar.gz will be generated if there is a problem with GO.db installation.
  • Test special characters (# sign) in the experiment descriptor worksheet. A wrong <ExpDescWkSht> file used by the DHV (dynamic heatmap viewer) was generated. The reason is that the read.table() call did not set the comment.char argument. See the bugreportDessen project. PS: the error was caught in RserveVBA.
  • Test a column header that contains double quotes (eg: my "GWAS" score). Some analyses like quantitative trait analysis will fail.
  • Test the case where gene annotation was not done, so gene symbols were not available (miRNA project). This causes a bug in the class comparison -> IPA output. The GeneID file may miss some gene identifiers. See PrepareGeneTableHTML.R.
  • Test the case where Symbol was used in the first column of the gene identifier file. This results in two 'Symbol' columns in the Gene annotations worksheet and breaks the code in CreateIPAOutput().
  • Extreme number of arrays (eg 3, 10) This is used to test chunking, ...
  • Extreme number of genes.
  • Use non-default options.
  • Analyze one project after another project to see if there is any memory issue.
  • Use unreasonable entry like 1.5 if the value should be less than or equal to 1.
  • Use a long path name for the project, which causes an error when the Fortran program is launched. The Wikipedia page says there is a 126-character limit in interactive mode.
  • Test on fresh machine (no R libraries installed).
  • Data with lots of missing values.
  • Test on a subset of arrays only.
  • Run an analysis when we don't expect the data type fits in (eg RNA-Seq analysis on regular microarray data).
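
The # sign case in the list above can be reproduced outside ArrayTools. This is a minimal sketch with made-up descriptor data (the column names and values are invented for illustration); it compares the default read.table() settings with a call that turns off comment and quote handling.

```r
# Write a small tab-delimited descriptor file that contains a # sign.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Array\tDescription",
             "A1\tsample #1 tumor",
             "A2\tnormal"), tmp)

# Default settings: comment.char = "#" cuts the field at the # sign.
bad <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Comment and quote handling switched off: the field is read intact.
good <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE,
                   comment.char = "", quote = "")

print(bad$Description)
print(good$Description)
```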

Windows Virtual Machine

  • Download Windows 10 iso from Microsoft (no log in is required)
  • Use MS Office 2007

Misc

Conversion of an old project

  • When an old project needs to be converted, the old project folder is backed up under a new name 'XXXX_ArrayToolsOldProject'.

After running the class comparison

Filtered log ratio worksheet

Arrays in 'filtered log ratio worksheet' will be sorted based on the class variable (still keeping empty arrays). For example, if we select 'BRCA1 V BRCA2' column, arrays will be sorted by BRCA1, BRCA2 and empty. If we select 'BRCA1 v Sporadic', arrays will be sorted by BRCA1, empty and Sporadic. The gene expression value Html page is also sorted by the class variable (excluding empty label arrays).

Clustered heatmap of significantly expressed genes

Note that the color legend says 'Centered log ratio' (dual channel) or 'Scaled and centered value' (single channel). 'Centered' means that the expression values are centered across arrays for each gene.

R folder

FileName Functions Notes
Annotations.R
  • CreateAnnotations
  • ChromDistBarPlot
  • GOanalysis
  • GetGOIDsFreq
  • GOOEHTMLTable
  • UpdateGeneIdsAfterAnnotation
  • AnnotateWithGeneIdFile
BioUtil.R
  • SetRPackageDir
  • LoadRPackage
  • InstallAllRPackages
  • OutputAnnotationTextFile
  • MatchAffyAnnotation
  • GetDetectionCall
  • MatchlumiAnnotation
  • MatchlumiMethyAnnotation
  • MatchBiocAnnotation
CreateGeneExpression.R No functions
FilterAndNormalize.R
  • ApplyFilteringAndNormalization
  • AppendWarningMessage
  • DetermineReferenceArray
  • CreateRefArray
  • ApplyNormalization
  • LowessNormalization
  • ApplyTruncation
  • InitializeParameters
  • ReadFilterParamFromFile, GetParam, WriteParam
FilterData.R
  • ComputeGeneFilters
  • CompareGeneFilters
GeneLists.R
  • ParseMSigDBXMLFile
  • CreateGODataFile
  • CreateGOSets
  • CreateGeneListDataFile
  • CheckAllGenelistsInFolder
  • GetPathwaysForOneGene
  • CreatePathwayDataFile
  • MatchGenelists, CheckGenelists
  • GetLongPathwayNames
  • CompareTwoGenelists
Illumina.R
  • lumifunc
  • methyfunc
Input_to_Output_function.R
  • GetParam
  • print.filter.parameters.to.output
  • print.affy.filter.parameters.to.output
  • print.SingleChannel.filter.parameters.to.output
Misc.R
  • SubsetFilteredAndNormalizedLog
  • CheckPhenoAverageDuplic
  • AverageDuplicLG
  • ComputNumberOfPermutations
PrepareGeneTableHTML.R
  • GetGenesHTML
  • outputClassname
  • CreateIPAOutput
PreProcessLog.R
  • GetDataRMatrix
  • LogTransformation
  • PreProcessMain
WriteDataToFiles.R
  • WriteDataToFiles

Plugins/BRBfun folder

FileName Functions Notes
CreateGeneTable.R
  • CreateGeneTable
CreateHtmlTable.R
  • CreateHtmlTable
CreateMasterAnnotationsTable.R
  • CreateMasterAnnotationsTable
CreatePath.R
  • CreatePath
OutputGeneList.R
  • RemoveInvalidChar
  • OutputGeneList

VBA source code

  • AddCustomMenu() in CustomMenu.bas - the main menu starts here.
  • You can step the code by using keyboard shortcuts
    • F5: continue
    • F8: step in
    • Shift + F8: step over
    • Ctrl + F8: go to the cursor

Annotate the data

  • Consider the Affymetrix first.
  • Search for "Annotate the data" in <CustomMenu.bas>. It shows that when the user clicks "Import Affymetrix annotations", the function "ShowAffymetrixDownload", located in the "AffyDialog" module, is triggered.
  • Set a breakpoint at some line in the ShowAffymetrixDownload function.
  • At the end you will see it calls the form fmDownloadAffyFile.Show() to display the Affymetrix download dialog. Once the command button is clicked, the process goes to the subroutine CommandButtonImport_Click() in the form.
  • Eventually it will execute InsertGeneAnnotationSheet AnnotateFromMenu:=True. Press F8 to step into this function. We can see the function InsertGeneAnnotationSheet is defined in Annotation.bas. Note that there are 4 similar functions defined in Annotation.bas.
    • InsertGeneAnnotationSheet
    • InsertGeneAnnotationSheet_lumi
    • InsertGeneAnnotationSheet_lumiMethy
    • InsertGeneAnnotationSheet_Bioc
  • In Annotation.bas, it executes MatchProbeSetsOK = MatchProbeSets(ChipType).
  • The VBA MatchProbeSets function executes a series of R commands. The most important one is the following:
RRunSuccess "ReturnMsg <- MatchAffyAnnotation(ChipName, ProjectPath, UniqueIdHeader, ShowProgress=T)", , True
  • In fact, we can see exactly which R function (located in <BioUtil.R>) is called by each VBA function (located in <Annotation.bas>).
    • InsertGeneAnnotationSheet -> MatchProbeSets(ChipType) -> MatchAffyAnnotation(ChipName, QueryIdHeader)
    • InsertGeneAnnotationSheet_lumi -> MatchNuID(annopkg) -> MatchlumiAnnotation(anno, UniqueIdHeader)
    • InsertGeneAnnotationSheet_lumiMethy -> MatchTargetID(annopkg) -> MatchlumiMethyAnnotation(anno, UniqueIdHeader) or MatchMethy450kAnnotation(anno, UniqueIdHeader)
    • InsertGeneAnnotationSheet_Bioc -> MatchQueryID(Species, QueryIdHeader) -> MatchBiocAnnotation(annoPkg, QueryIdHeader)

BinaryData folder

text file

  • <ExperimentTable>

R binary file

  • <ApplyFilter.rda> (FilterData.R)
  • <GeneIds> (FilterData.R)
  • <FilteredAndNormalizedLog1.rda> (Misc.R)

Use load("") to load them to R.
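
Since load() restores objects under their saved names, it can overwrite variables in the workspace. A minimal sketch that loads a file into its own environment and returns the objects as a named list; load_rda() is a hypothetical helper name, and the usage path is only an example.

```r
# Load an R binary file into a private environment and return its objects.
load_rda <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  mget(object_names, envir = env)
}

# Usage (example path; adjust to the project folder):
# objs <- load_rda("BinaryData/ApplyFilter.rda")
# str(objs)
```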

General binary files

  • See GetDataRMatrix() and WriteDataRMatrix() functions from <PreProcessLog.R> file.
    • <GreenRaw1>, <GreenRaw2>, ...
    • <RedRaw1>, <RedRaw2>, ...
    • <Flag1>, <Flag2>, ...
    • <RawData1>, <RawData2>, ...
    • <RedBkg1>, <RedBkg2> ,...
    • <GreenBkg1>, <GreenBkg2>, ...
    • <RedAdj1>, <RedAdj2>, ...
    • <GreenAdj1>, <GreenAdj2>, ...
  • See the ApplyNormalization() function in <FilterAndNormalize.R>.
    • <FilteredAndNormalizedLog1>, <FilteredAndNormalizedLog2>, ...
    • <PreProcessLog1>, <PreProcessLog2>, ...
    • <PrintTipGroups1>
    • <PrintTip1>, <PrintTip2>, ...
    • <RedAdj1>, <RedAdj2>, ...
    • <RedRaw1>, <RedRaw2>, ...

We can read the data using readBin() function in R.

# Check <Import.txt> to find the number of genes and chunks
DataRMatrix <- matrix(readBin(con="C:/ArrayTools/Sample datasets/Perou/Perou -Project/BinaryData/GreenRaw1",
                              what="numeric", n=2998*3, size=4), nrow=2998, byrow=FALSE)
str(DataRMatrix) # 2998 x 3
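
The call above can be wrapped in a small helper that also checks the number of values read. This is a sketch under the same assumptions as the example (4-byte numeric values stored column by column); read_chunk() and its arguments are hypothetical names, and the numbers of genes and arrays still have to be taken from <Import.txt>.

```r
# Read one binary chunk into a genes x arrays matrix.
# Assumes 4-byte numeric values stored column by column, as in the example.
read_chunk <- function(path, n_genes, n_arrays, size = 4) {
  expected <- n_genes * n_arrays
  values <- readBin(con = path, what = "numeric", n = expected, size = size)
  if (length(values) != expected) {
    stop("expected ", expected, " values but read ", length(values))
  }
  matrix(values, nrow = n_genes, byrow = FALSE)
}

# Usage (example values from the Perou sample project):
# m <- read_chunk(".../BinaryData/GreenRaw1", n_genes = 2998, n_arrays = 3)
```

The length check makes a wrong gene or array count fail early instead of silently returning a short or recycled matrix.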

Total number of genes

See the d_NumUniqueGenes entry in <DataParam/Import.txt> file.

Total number of arrays

See the d_NumberOfArrays entry in <DataParam/Import.txt> file.

Data importer method

See the s_DataImporter entry in <DataParam/Import.txt> file. Possible values: Unknown (Bhatt)

Original data type

See the s_OriginalDataType entry in <DataParam/Import.txt> file. Possible values:

  • UnloggedDualRedGreenWithoutBKGData (Perou)
  • UnloggedDualRedGreenAndBKGData
  • LoggedDualRatio (GSE22631)
  • LoggedSingleIntensity
  • UnloggedDualRatio (BRCA)
  • UnloggedSingleIntensity (Pomeroy, Bhatt)

These data types appear in the PreProcessMain() function in <PreProcessLog.R> file.

First column label

See the s_FirstColumnLabel entry in <DataParam/Import.txt> file.

Chip type for Affy data

See the s_AffyGeneChip entry in <DataParam/Import.txt> file. For example, hgu95av2.

Run ArrayTools' Fortran Programs in a Linux environment

Yes, this works once WINE has been installed on Linux:

brb@T3600 ~/.wine/drive_c $ wine GeneSetComparison.exe PathwayClassComparison/
Internal random seed = 123456789 will be used
Randomized variance model will be used
Linear approximation will be used
Regularization: a=   3.10437924053453     b=   1.27713340089596     KSstat=
  1.116416920309377E-002
writing p-values into file
Startimg permutations to calibrate p-values...
           8 percents completed
          16 percents completed
          25 percents completed
          33 percents completed
          41 percents completed
          50 percents completed
          58 percents completed
          66 percents completed
          75 percents completed
          83 percents completed
          91 percents completed
         100 percents completed
alpha =   5.000000000000000E-003     nDeg=         165
Writing DEGs.txt file...
Max Number of significant GeneSet categories:   165
Finished writing output files
brb@T3600 ~/.wine/drive_c $ head PathwayClassComparison/DEGs.txt
Max Number of significant GeneSet categories:   165
Index    nGenesInGeneSet    statLS           pLS              statKS           pKS
        144         24        3.19595500       0.00010026       0.48423390       0.00389526
        150         19        3.33429028       0.00019531       0.48850684       0.01059998
        154         19        3.30687478       0.00022752       0.43818127       0.03596438
         60         40        2.66205140       0.00039213       0.40681660       0.00504481
          5          6        4.33475413       0.00213406       0.71447198       0.01376502
          3          8        3.65201851       0.00349892       0.58742271       0.02999936
         85         28        2.58118709       0.00467529       0.32909420       0.14689366
          4          8        3.36955271       0.00871936       0.53517206       0.06790110
brb@T3600 ~/.wine/drive_c $ head PathwayClassComparison/Pvalues.txt
index  p
          1        0.0163586197
          2        0.4349814775
          3        0.9409634273
          4        0.1873022496
          5        0.3977063364
          6        0.3830773134
          7        0.3279943699
          8        0.9102655386
          9        0.0014973998
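
The output files shown above can be read back into R for inspection. A minimal sketch, assuming the layout seen in the head output (one summary line, then a header line, then whitespace-separated columns); read_degs() is a hypothetical helper name.

```r
# Read DEGs.txt: skip the summary line, then use the header line as names.
read_degs <- function(path) {
  read.table(path, skip = 1, header = TRUE)
}

# Usage (path as in the session above):
# degs <- read_degs("PathwayClassComparison/DEGs.txt")
# degs[degs$pLS < 0.005, ]   # gene sets below the alpha shown in the log
```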

Good R writing habit

  • Use indentation.
  • Use braces where needed. For example, the following is bad: at top level R treats the if line as a complete statement, so the else on the next line is a syntax error.
if (something) DoSomething
else DoAnother
  • read.delim() is better than read.table(). When we use read.table() or write.table(), use their options to avoid special-character woes. Note: quote=FALSE in read.table() does not give the result we expect, so use quote="" explicitly; in write.table(), quote takes a logical value, so quote=FALSE is correct there.
read.table(FILENAME, stringsAsFactors=FALSE, header=TRUE, sep="\t",
           na.strings=c('NA',''), fill=TRUE, comment.char='', quote="")
write.table(x, file=FILENAME, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
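
A minimal sketch of both habits, using made-up data. The function classify() and the data frame are invented for illustration. Note that write.table() is called with quote = FALSE here, because its quote argument takes a logical value or a numeric vector, while read.table() takes the set of quote characters as a string.

```r
# Braces keep if and else together, so the code also parses at top level.
classify <- function(p, alpha = 0.05) {
  if (p < alpha) {
    "significant"
  } else {
    "not significant"
  }
}

# Round trip of a table whose fields contain #, ' and " characters.
df <- data.frame(id = c("g1", "g2"),
                 desc = c("5' UTR #1", "my \"GWAS\" score"),
                 stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".txt")
write.table(df, file = tmp, row.names = FALSE, col.names = TRUE,
            sep = "\t", quote = FALSE)
back <- read.table(tmp, stringsAsFactors = FALSE, header = TRUE, sep = "\t",
                   na.strings = c("NA", ""), fill = TRUE,
                   comment.char = "", quote = "")
print(back)
```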

VBA

  • To open the VBA editor in Microsoft Office without using the keyboard shortcut (Alt + F11), go to File > Options > Customize Ribbon and check Developer.
  • Immediate window. Use ? "Your vba statement".

RSS

ChangeDetection

  • CRAN
  • Bioconductor
  • Drug Bank
  • GSEA
  • BRB-ArrayTools

NCI Office of Technology Transfer

Guidelines for Releasing Research Software