Data types: expression data, Illumina methylation data, copy number data (CGHTools), and RNA-Seq count data processed through the Galaxy web tool.

Analysis tools: class comparison for differential expression, class prediction, interactive 2D and 3D plots, gene set analysis, and more.

Gene sets: Gene Ontology, pathways, protein domains, Broad MSigDB, lymphoid signatures, experimentally verified transcription factor targets, and computationally predicted microRNA targets.

### Interactive 3D scatterplot of genes on the Pomeroy dataset.

The x-axis is array 'Brain_MD_1', the y-axis is 'Brain_MD_2' and the z-axis is 'Brain_MD_3'.

### Interactive 2D scatterplot of samples with gene annotation from a selected gene using right click menu.

The right-click menu has options to highlight up/down-regulated genes, export the gene list, copy the plot to the clipboard, highlight genes in a gene set, link genes among plots, and change plot properties such as the title, point size, point color, and the fold-change threshold for up/down-regulated genes.

### Interactive volcano plot from the output of running a class comparison tool.

When you move the mouse over a gene (point), its unique ID and/or symbol pops up.

### Dynamic Heatmap Viewer

Note that the centering and scaling options in the "Analysis Options" block of the dialog affect only the heatmap, not the clustering. That is, the clustering is run on the original data (log intensities/ratios) without further transformation. You can verify this by running the analysis with each of the options 'center and scaling', 'center' and 'None': the clustering dendrograms should be the same.

Zooming in to a subset of genes (select a range of genes with the mouse) shows the same heatmap restricted to that range.
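A minimal R sketch of the point above, assuming the display options center/scale each gene (row) across the arrays. `heatmap_display()` and `cluster_genes()` are hypothetical names, not BRB-ArrayTools functions, and average linkage on Euclidean distance is an arbitrary choice. Because the dendrogram is computed from `x` alone, no display option can change it.

```r
# Hypothetical helpers (not BRB-ArrayTools code). Assumes the display
# options center/scale each gene (row) across the arrays.
heatmap_display <- function(x, option = c("center and scaling", "center", "None")) {
  option <- match.arg(option)
  switch(option,
         "center and scaling" = t(scale(t(x))),                # row mean 0, SD 1
         "center"             = t(scale(t(x), scale = FALSE)), # row mean 0
         "None"               = x)                             # original values
}

# The dendrogram is built from the original log values only, so it is
# identical whichever display option is chosen above.
cluster_genes <- function(x) hclust(dist(x), method = "average")
```

In use, `hc <- cluster_genes(x)` is computed once and `heatmap_display(x, opt)[hc$order, ]` is drawn for whichever option is selected.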

# FAQs

## General

### Error in installation

If you see the error "The installed version of the application could not be determined. The setup will now terminate", try moving the installer to your local drive.

### NIH users

Please install the Privilege Manager software and then install BRB-ArrayTools using elevated permissions with your own account (right-click the installer and select the elevated privilege option). Do not let IT install BRB-ArrayTools for you with their administrator account!

### How to install BRB-ArrayTools if you have 64-bit MS-Office?

There is no difference: the installation is the same.

### After installation, I did not find the BRB-ArrayTools in Windows > Start > All Programs.

Check Excel: ArrayTools and CGHTools are listed under Excel's Add-Ins menu.

If you only see 'CGHTools' under Add-Ins, you have probably skipped or ignored the instruction screen for users. See the next item.

### After installation, I did not find the BRB-ArrayTools in Excel (Office 2016/365) Add-ins/ribbon

1. Click “File -> Options”. Select “Trust Center” in the left pane, click “Trust Center Settings…”, and a new dialog pops up. In the new dialog, select “Macro Settings” and make sure “Trust access to the VBA project object model” is checked.
2. Go to “File -> Options”. Click “Add-ins” on the left and click the “Go…” button. Click the “Browse” button, browse to Excel\ArrayTools.xla in your ArrayTools installation folder (default C:\Program Files (x86)\ArrayTools\), and click OK. Click “OK” in the “Add-ins” dialog.
3. Close Excel and then restart Excel.

If the "Add-ins" tab appears, click it to see whether ArrayTools is there. If the "Add-ins" tab still cannot be found, go to the folder containing the "ArrayTools.xla" file (C:\Program Files (x86)\ArrayTools\Excel) and double-click the file to open it; this loads the add-in.

### After opening Excel, which options do I need to set before using BRB-ArrayTools?

Do the following whether or not the BRB-ArrayTools instructions to users are shown on screen.

(Office 2007) Excel -> Home -> Options -> Trust Center -> Trust Center Settings -> Macro Settings -> Check 'Trust access to the VBA project object model' -> OK. Restart Excel.

(Office 2010 & 2013) Excel -> File -> Options -> Trust Center -> Trust Center Settings -> Macro Settings -> Check 'Trust access to the VBA project object model' -> OK. Restart Excel.

Once Excel is restarted, it asks you to enter the email address you registered with BRB-ArrayTools. Then click the 'Activate' button. An Rserve app opens and sits on the Windows taskbar. This is expected: BRB-ArrayTools uses it, and it is closed when Excel is closed.

If the add-in still does not show up, log in as a standard user and click the Go button for Manage Excel add-ins. In one reported case the BRB-ArrayTools add-in was still missing from the dialog box and had to be added manually with the Browse button.

### Foreign language users, decimal mark, thousands separator

Below are the recommendations we typically make to users with non-English settings:

1. Set the regional language settings on your machine to English: go to Control Panel -> Regional and Language Options and choose English.
2. Go to Windows Start > Programs > Microsoft Office > Microsoft Office Tools > Microsoft Office 20XX Language Settings and make sure the "Primary editing language" is English. If it is not, change it to English and then reboot your machine.

If you do not want to change the setting to English, one workaround that has worked: in Excel Add-Ins, browse the folders (Durchsuchen in the German version) to the 64-bit programs folder. ArrayTools is found there; open its Excel subfolder, which contains the add-in. Otherwise the add-in stays invisible.

Also, in Excel Options -> Advanced, change the decimal separator to a dot (.) and the thousands separator to a comma (,). See Change the character used to separate thousands or decimals.

### Java installation

Note that the current version of Java does not work well with the clusterLG.exe program on Windows 7. When you run Gene Cluster 3.0 and click one of the linkage method buttons (which triggers the execution), an error screen shows "Error starting Java. Please make sure that javaw.exe is in your path." Read on for details.

When the Java runtime is installed, it adds C:\ProgramData\Oracle\Java\javapath to the PATH environment variable. This directory contains 3 symbolic links, java, javaw and javaws, which point to the corresponding executables of the installed Java runtime.

We can check the current version of Java by running

```
java -version
```


The Java I am using is version 8 update 60 (build 1.8.0_60-b27, 8/21/2015), available from http://www.java.com/en/download/win10.jsp. The manually downloaded installer is named JavaSetup8u60.exe.

One possible solution is to let VBA get the Java path and add it to the <EisenCluster/RunEisenCluster.bat> file before calling clusterLG.exe.
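The idea can be sketched in R instead of VBA, as an illustration only: prepend a `set "PATH=..."` line to the batch file so the Java folder is searched first. `prepend_java_path()` is a hypothetical helper, and the caller is assumed to supply the real Java bin folder (under the JRE installation), not the javapath link folder that causes the problem.

```r
# Hypothetical sketch (R instead of VBA): prepend a PATH setting to a
# batch file so Windows finds the Java folder before clusterLG.exe runs.
# 'java_dir' must be the real Java bin folder, not the javapath links.
prepend_java_path <- function(bat_file, java_dir) {
  java_dir <- chartr("/", "\\", java_dir)          # Windows-style separators
  line <- sprintf('set "PATH=%s;%%PATH%%"', java_dir)
  old <- readLines(bat_file, warn = FALSE)
  if (length(old) == 0 || !identical(old[1], line)) {
    writeLines(c(line, old), bat_file)             # add the line only once
  }
  invisible(line)
}
```

Calling it twice on the same file leaves a single PATH line, so it is safe to run before every clustering call.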

### R installation directory

The default installation location for R software is OK. But if you use some other R packages like Rcpp, it is recommended to install R to C:\R folder.

Caution: Do not open another instance of R while BRB-ArrayTools is running. This may prevent R packages from being installed or updated.

### R: Unable to install packages

If you see the following message

```
Error in install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  unable to install packages
```


check whether you have full control of the R or R-x.x.x folder:

1. Open Windows Explorer (Win + E) and go to C:\Program Files.
2. Right-click the 'R' folder (it is the parent of the 'R-x.x.x' folder, so selecting it is better than selecting 'R-x.x.x') and choose 'Properties'.
3. Open the 'Security' tab and click 'Advanced'.
4. Click 'Owner' and select the current user from the list to make sure the current user is the owner.
5. Click OK multiple times to finish the change.
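Write access to each R library folder can also be checked from an R console, before or after changing ownership. A minimal sketch: `can_write_dir()` is a hypothetical helper that tries to create and then delete a scratch file in each folder returned by `.libPaths()`.

```r
# Hypothetical helper: returns TRUE for each folder where the current
# user can create (and then remove) a file, FALSE otherwise.
can_write_dir <- function(dirs = .libPaths()) {
  vapply(dirs, function(d) {
    probe <- tempfile(pattern = "write_test_", tmpdir = d)
    ok <- suppressWarnings(file.create(probe))   # FALSE if access is denied
    if (ok) unlink(probe)                          # clean up the scratch file
    ok
  }, logical(1))
}
```

Running `can_write_dir()` lists every library path; a FALSE entry marks a folder where the permission change above is still needed.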

### Error: package 'X' required by 'Y' could not be found

The 'X' and 'Y' can be any package from the CRAN or Bioconductor repository. A direct way to tackle the error is to open an R GUI and install the missing packages manually. For example, if 'X' is 'preprocessCore' (a Bioconductor package), run:

```r
source("http://bioconductor.org/biocLite.R")
biocLite("preprocessCore")
```


If the missing package is from CRAN, we can use install.packages() function directly.
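For R 3.5.0 and later, Bioconductor recommends the BiocManager package instead of `biocLite()`. A minimal sketch under that assumption: `missing_pkgs()` and `install_missing()` are hypothetical helpers, not part of BRB-ArrayTools. `BiocManager::install()` resolves both Bioconductor and CRAN package names, so one call covers either repository.

```r
# Return the subset of 'pkgs' that cannot be loaded in this R session
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install whatever is missing, from Bioconductor or CRAN
install_missing <- function(pkgs) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) == 0) return(invisible(character(0)))
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(todo, update = FALSE, ask = FALSE)
  invisible(todo)
}
```

For the example above, `install_missing("preprocessCore")` installs the package only if it is not already available.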

### Can I upgrade R or install multiple versions of R?

Installing multiple versions of R is OK provided you know some details described below.

Each version of BRB-ArrayTools has been tested with a certain version of R, so some functions used in the code may have compatibility problems if you decide to use a non-default version of R.

• BRB-ArrayTools (before v4.3.0) requires StatconnDCOM, which means the following conditions have to be satisfied:
  • R needs to be registered in the Windows registry (this is done if you accept all default options when installing R).
  • The R package 'rscproxy' has to be installed under the library folder of the registered R. Unlike other R packages, it cannot be installed under the user's Documents folder.
• BRB-ArrayTools (from v4.3.0) requires the Rserve package. That means:
  • Run install.packages("Rserve", repos="https://cran.rstudio.com") in an R console.
  • Open Windows' file manager and check whether the Rserve package is installed under the C:\Program Files\R\R-X.Y.Z\library folder or the Documents\R\win-library\X.Y folder, where X is the major, Y the minor and Z the patch number of the R version.
  • (If Rserve is located under Documents\R\win-library\X.Y) Rserve.exe from the Documents\R\win-library\X.Y\Rserve\libs\i386 and Documents\R\win-library\X.Y\Rserve\libs\x64 subfolders has to be copied to the C:\Program Files\R\R-X.Y.Z\bin\i386 and C:\Program Files\R\R-X.Y.Z\bin\x64 folders, respectively.
  • (If Rserve is located under C:\Program Files\R\R-X.Y.Z\library) Rserve.exe from the C:\Program Files\R\R-X.Y.Z\library\Rserve\libs\i386 and C:\Program Files\R\R-X.Y.Z\library\Rserve\libs\x64 subfolders has to be copied to the C:\Program Files\R\R-X.Y.Z\bin\i386 and C:\Program Files\R\R-X.Y.Z\bin\x64 folders, respectively.
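The copy steps above can be sketched in R, assuming the folder layout described in the bullets. `copy_rserve_exe()` is a hypothetical helper. `find.package("Rserve")` locates the package in whichever of the two library folders it was installed into. Writing into C:\Program Files\R\... still requires write permission on that folder (see "R: Unable to install packages").

```r
# Hypothetical helper: copy Rserve.exe from the installed Rserve package
# (libs/<arch>) into R's own bin/<arch> folder for each architecture.
copy_rserve_exe <- function(pkg_dir = find.package("Rserve"),
                            r_home = R.home(),
                            archs = c("i386", "x64")) {
  vapply(archs, function(arch) {
    src <- file.path(pkg_dir, "libs", arch, "Rserve.exe")
    dst <- file.path(r_home, "bin", arch)
    # Skip architectures that are not present (e.g. no 32-bit build)
    file.exists(src) && dir.exists(dst) &&
      file.copy(src, dst, overwrite = TRUE)
  }, logical(1))
}
```

The result is a named logical vector (one entry per architecture) showing which copies succeeded.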

Understanding the above allows you to use a non-default version of R with BRB-ArrayTools.

If you want to use the latest version of R for your own analysis but keep the default version of R in BRB-ArrayTools, first install the latest version of R as usual. Then reinstall the full version of BRB-ArrayTools, including the R bundled with its installer. This may install another version of R and register it in the Windows registry, and you can then use both versions of R. The reason is that installing R registers it by default, which can break the R setting used by BRB-ArrayTools. Reinstalling BRB-ArrayTools installs the version of R it has been tested with and does not remove any other versions of R you already have.

For example, my Bioconductor version is 2.12, which was first installed when I used R 3.0.1, but Bioconductor 2.13 is the current version for R 3.0.2. When I install a new package from Bioconductor, it may require a newer version of the Bioconductor packages.

```r
source("http://bioconductor.org/biocLite.R")
```


This command will upgrade all currently installed bioc related packages. But it seems it will install lots of other bioc packages I don't need.

### My institute uses a proxy server. How do I use BRB-ArrayTools?

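See the message on the BRB-ArrayTools message board. Essentially, we need to create a Windows environment variable http_proxy with a value like

```
http://myproxy:myport
```

or, if the proxy requires authentication, a value that also includes the user name and password (for example http://username:password@myproxy:myport).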
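For testing from an R console, the same variables can be set for the current R session only. This is a sketch: unlike the Windows environment variable, it does not affect other processes, and the URL is the placeholder from above. `set_session_proxy()` is a hypothetical helper, and also setting `https_proxy` is an assumption for repositories reached over https.

```r
# Hypothetical helper: set the proxy for this R session only.
# Download functions that honor these variables will use the proxy.
set_session_proxy <- function(url) {
  Sys.setenv(http_proxy = url, https_proxy = url)
  invisible(Sys.getenv(c("http_proxy", "https_proxy")))
}

set_session_proxy("http://myproxy:myport")   # placeholder host and port
```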


### Rserve

Since version 4.3.0, BRB-ArrayTools has used Rserve for communication between R and Excel. When Rserve is needed, an R window pops up; it has a blue icon on the Windows taskbar. If you accidentally close it, it is reopened automatically when needed.

See my Rserve wiki page.

### Biological replicates vs technical replicates

Biological replicates are samples of the same type of organism grown or treated under the same conditions. Repeating the experiment while keeping everything else the same also gives a biological replicate.

When the exact same sample (after all preparatory steps) is analyzed multiple times, the measurements are technical replicates.

### Average the replicate spots within an array

In the 'Refilter, normalize and subset the data' dialog, the checkbox 'Average the replicate spots within an array' computes the average on the intensity (unlogged) scale. This differs from how other analyses average values.

For example, if two spots (same gene ID) have log2 expression values a and b, then the average log2 expression value will be log2((2^a + 2^b)/2).
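The arithmetic of the example above, as an R sketch rather than the ArrayTools implementation: `avg_replicate_spots()` is a hypothetical helper that unlogs the values, averages them per gene ID, and takes log2 again. Missing values are not handled.

```r
# Hypothetical helper: average replicate spots on the intensity scale.
# x: log2 values (vector, or matrix with one row per spot); ids: gene ID per spot
avg_replicate_spots <- function(x, ids) {
  sums <- rowsum(2^x, ids)                          # per-ID sum of intensities
  counts <- as.vector(table(ids)[rownames(sums)])   # spots per ID
  log2(sums / counts)                               # back to the log2 scale
}
```

For a = 8 and b = 10 this gives log2(640), about 9.32, rather than the log-scale mean of 9, because the brighter spot dominates on the intensity scale.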

### Windows Security Warning

This message pops up when an R script runs in parallel on Windows. See here. Note that parallel computing still works even if you choose 'Cancel'.

## Importing

### Summary

#### Data import wizard

• Affy Cel file
• Affy Gene ST arrays, including Affymetrix human/mouse/rat Clariom S/D arrays (HT). Alternatively, choose the General Importer and pick Human Clariom D Array from the dropdown list. It is OK to ignore the message that the ID column does not have “_” at the end of the probeset name. Also, Clariom S and D arrays have different annotations; the Affymetrix Gene ST Array Importer in ArrayTools takes care of annotations automatically.
• Affy probe-set summary
• RNA-Seq data from Galaxy, more specifically the FPKM values (see http://linus.nci.nih.gov/~brb/GalaxyDoc.pdf and an example of the <genes.fpkm_tracking> file output by the cuffdiff program (GSE32038))
• Agilent dual channel data
• Agilent single channel data
• Genepix dual channel data
• Genepix single channel data
• Illumina single channel data
• Illumina methylation data

#### General format importer

When generating the txt files of expression or gene identifiers or experiment descriptor from R, remember to choose the option row.names = FALSE.

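```r
# Utility to create a BRB-ArrayTools expression matrix file.
# It adds a column 'Id' in front of the matrix 'x', whose row names
# contain the gene/feature IDs.
foo_at <- function(x) {
  # row.names = NULL: use plain row numbers, since gene IDs may repeat
  data.frame(Id = rownames(x), x, row.names = NULL,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# 'someMatrix' is your expression matrix (rows = genes, columns = arrays)
write.table(foo_at(someMatrix),
            file = "expression.txt",
            quote = FALSE, row.names = FALSE, sep = "\t")
```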


#### NCBI GEO GDS

ArrayTools will download 3 files (GPLXXX.annot.txt, GDSXXX.txt and Readme_GDSXXX.txt). The <Experiment Descriptor file.txt> file is generated from the soft file (ParseGEOProjectFile() function).

The download command for the soft file (including the expression data and experiment descriptor) is

```
wget -N -nd ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/GDS/GDSGDSNumber.soft.gz
```

The gz file (e.g. GDS1344.soft.gz) contains a soft file (e.g. GDS1344.soft) whose .soft extension is renamed to .txt.
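The decompress-and-rename step can be sketched in R, as an illustration and not the importer's own code. `soft_gz_to_txt()` is a hypothetical helper that reads the compressed soft file and writes its lines to a .txt file with the same GDS name.

```r
# Hypothetical helper: GDS1344.soft.gz -> GDS1344.txt (same folder by default)
soft_gz_to_txt <- function(gz_path, out_dir = dirname(gz_path)) {
  if (!grepl("\\.soft\\.gz$", gz_path))
    stop("expected a file ending in .soft.gz: ", gz_path)
  out <- file.path(out_dir, sub("\\.soft\\.gz$", ".txt", basename(gz_path)))
  con <- gzfile(gz_path, open = "rt")   # gzfile() decompresses while reading
  on.exit(close(con))
  writeLines(readLines(con), out)
  out
}
```

The check on the file name prevents the output from overwriting the input when the extension is not .soft.gz.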

The Readme file is only for the record and seems not to be used anymore.

#### NCBI GEO GSE

The importer asks whether we want to log-transform the data, and it is not always clear what to answer. GEO2R uses a heuristic to guess this. For example, for GSE22093 it uses:

```r
library(GEOquery)   # getGEO(); also attaches Biobase for exprs()

gset <- getGEO("GSE22093", GSEMatrix = TRUE, getGPL = FALSE)
if (length(gset) > 1) idx <- grep("GPL96", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

ex <- exprs(gset)

# log2 transform
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm = TRUE))
# On already log2-transformed data such as GSE22093, qx is
# [1] -2.994992  6.531386  8.156300  9.475783 13.930978 18.619087
# On intensity-based data such as GSE7810, qx is
# [1]     0.00    36.20   122.80   527.80 10338.15 34114.50

LogC <- (qx[5] > 100) ||
        (qx[6] - qx[1] > 50 && qx[2] > 0)
if (LogC) {
  ex[which(ex <= 0)] <- NaN
  ex <- log2(ex)
}
```

```r
# Older R versions; newer R uses BiocManager::install("GEOquery")
source("https://bioconductor.org/biocLite.R")
biocLite("GEOquery")

library(GEOquery)
# As run from the ArrayTools VBA code
GSE <- getGEO("GSE22631", GSEMatrix = FALSE)  # by default GSEMatrix = TRUE
DataType <- Meta(GSE)$type
NumPlatform <- length(Meta(GSE)$platform_id)
```
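The same rule can be wrapped as functions so it applies to any expression matrix without downloading from GEO. This is a sketch: `needs_log2()` and `log2_if_needed()` are hypothetical names, and the thresholds are exactly those in the snippet above.

```r
# TRUE when the values look unlogged, using the quantile rule above
needs_log2 <- function(ex) {
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

# Apply log2 only when needed; non-positive values become NaN first
# because log2() is undefined for them
log2_if_needed <- function(ex) {
  if (needs_log2(ex)) {
    ex[which(ex <= 0)] <- NaN
    ex <- log2(ex)
  }
  ex
}
```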

```r
table(knn = p1, nc = p2predictedTest)
#    nc
# knn  c  s  v
#   c 12  1 14
#   s  0 24  1
#   v  7  1 15
```

#### Top scoring pairs

### Quantitative trait analysis

#### Find genes correlated with quantitative trait

#### Predict quantitative trait

Options: algorithm (lars/lasso), include 2-way interactions (T/F), training/prediction sample indicators, % error threshold.

### Survival analysis

#### Find genes correlated with survival

#### Survival gene set analysis

#### Survival risk prediction

From the BRB-ArrayTools manual, "Survival analysis" section:

• It is important to note that the risk group for each case was determined based on a predictor that did not use that case in any way in its construction. Hence, the (cross-validated) Kaplan-Meier curves are essentially unbiased and the separation between the curves gives a fair representation of the value of the expression profiles for predicting survival risk.
• (Model 1, gene expression only) The Survival Risk Group Prediction tool also provides an assessment of whether the association of expression data to survival data is statistically significant. A log-rank statistic is computed for the cross-validated Kaplan-Meier curves described above. Unfortunately, this log-rank statistic does not have the usual chi-squared distribution under the null hypothesis. This is because the data was used to create the risk groups. We can, however, obtain a valid statistical significance test by randomly shuffling the survival data among the cases and repeating the entire cross-validation process. ... The tail area of this null distribution beyond the value log-rank statistic LRd obtained for the real data is the permutation significance level for testing the null hypothesis that there is no relation between the expression data and survival.
• (Model 3, gene expression + clinical covariates) The Survival Risk Group Prediction tool also lets the user evaluate whether the expression data provides more accurate predictions than that provided by standard clinical or pathological covariates or a staging system. ... The cross-validated Kaplan-Meier curves and log-rank statistics are generated for those permutations and finally a p value is determined which measures whether the expression data adds significantly to risk prediction compared to the covariates.

A permutation method is used to test the following two hypotheses:

• the genomic data are independent of the survival data (Model 1)
• the genomic data do not add predictive accuracy to a survival risk model developed using a smaller number of standard covariates (Model 3)

## R packages

### Affy related packages

• GCRMA: install the cdf, probe and .db packages
• MAS5: install the cdf and .db packages

### Common download issue

Try to download/install the package manually in an R console.

• lumi
```r
source("http://bioconductor.org/biocLite.R")
biocLite("lumi", ask = FALSE)
library(lumi)
```
• GO.db
```r
source("http://bioconductor.org/biocLite.R")
biocLite("GO.db", ask = FALSE)
```
• rtiff
```r
install.packages('rtiff', repos = 'http://cran.r-project.org')
library('rtiff')
```

Some integrity checks should be done. In my experience, R package installation often fails in a VirtualBox environment, where a library folder name can end up as a temporary name like file8b8182625c. In my case that folder contained a sub-folder named "RcppArmadillo", which is required by the DESeq2 package. That is, even if DESeq2 can be found under the library folder, some of its dependencies may be broken. Recall that the R package installation paths can be found with the .libPaths() function.

### Writable R package directory cannot be found

If you get the above message, or see the message "C:/Program Files/R/R-X.Y.Z/lib is not writable", the possible causes/solutions are:

1. Another R GUI/terminal, not started by BRB-ArrayTools, is running at the same time as BRB-ArrayTools.
2. The current user does not have full control of C:\Program Files\R\ or of a specific R version directory (even if R was installed by the current user). Open Windows Explorer, go to C:\Program Files\, right-click the R folder and choose Properties. In the 'Security' tab, click the 'Edit' button. Check the box next to Full control under User. Click OK twice to apply the change. Done!

After the change, R's main packages are still installed under the C:\Program Files\R\R-X.Y.Z\library folder. Other packages go to the Documents\R\win-library\X.Y\ directory if that folder already exists; otherwise, new packages go to the C:\Program Files\R\R-X.Y.Z\library folder.

### Download required R/Bioconductor (software) packages

Utilities -> Download/install required R/Bioconductor software packages. It runs the following R snippet:

```r
ArrayToolsPath <- 'C:/Program Files (x86)/ArrayTools'
source('C:/Program Files (x86)/ArrayTools/R/BiocUtil.r')
InstallAllRPackages()
```

Note: an error may interrupt the installation. It is caused by updating the default R packages (e.g. cluster, codetools, foreign, lattice, Matrix, mgcv, nlme, survival); R pops up the message "Would you like to use a personal library instead?". It happens when the R packages were installed into C:\Users\USERNAME\Documents\R\win-library\x.y instead of the C:\Program Files\R\R-x.y.z\library directory. The same error can be reproduced by running the update.packages() function. A simple fix is to ensure the user has full control of the folder C:\Program Files\R\R-X.Y.Z\library. See Writable R package directory cannot be found.

### R

We have a report that R packages cannot be installed.
If we answer 'yes' to the following question:

```
-------------- Question --------------
Would you like to create a personal library 'C:\Users\<U+C591><U+C0C1><U+D654>\Documents/R/win-library/3.1' to install packages into?
```

we get an error message:

```
unable to create 'C:\Users\<U+C591><U+C0C1><U+D654>\Documents/R/win-library/3.1'.
In addition: Warning messages:
In dir.create(userdir, recursive = TRUE) :
  cannot create dir 'C:\Users\<U+C591><U+C0C1><U+D654>', reason 'Invalid argument'
```

A similar report can be found on the R-help mailing list. See also this and Wikipedia for Korean characters.

## Websites

List of websites BRB-ArrayTools uses and their purposes (no files are sent to a cloud service for processing):

• http://linus.nci.nih.gov -- Check updates & registration
• http://cran.r-project.org -- R packages
• http://www.bioconductor.org -- Bioconductor packages & annotations
• http://www.rforge.net -- R packages
• http://r-forge.r-project.org -- R packages
• http://software.broadinstitute.org/gsea -- BROAD
• http://www.broad.mit.edu -- BROAD
• http://source-search.princeton.edu -- Annotation
• http://smd.stanford.edu -- Annotation
• http://amigo.geneontology.org -- GO
• http://cgap.nci.nih.gov -- Biocarta & KEGG
• http://www.ncbi.nlm.nih.gov -- Gene symbols
• http://nciarray.nci.nih.gov -- Gene symbols
• http://www.godatabase.org -- GO
• http://lymphochip.nih.gov -- Lymphoid malignancies
• http://www.drugbank.ca -- Drug bank
• http://brainarray.mbni.med.umich.edu -- Annotation
• http://www.gene-regulation.com -- Transcription factor
• http://microrna.sanger.ac.uk -- microRNA
• http://www.mirbase.org -- microRNA
• http://pfam.sanger.ac.uk -- Protein domain
• http://smart.embl.de -- Protein domain
• http://cran.fhcrc.org -- R packages
• http://java.com -- Java
• http://dgidb.genome.wustl.edu -- Drug Gene Interaction database
• http://www.genome.jp/kegg/pathway.html -- Human diseases pathways
• http://www.genecards.org -- Gene information

## Quirks

• (This bug is fixed in v4.5.0 stable) Do not place the project in a very deep path, or you may get the error "Error in 'exportToR' function" followed by "Data was not successfully exported to R. Plug-in script is now aborting."
• Do not include special characters (single/double quotes, percent sign, etc.) in the project name, output name, or column headers in the experiment descriptor worksheet. The special characters include * ? < > | = + ~ @ # % ^ &. They are defined in <PublicFunctionProcedure.bas/CheckSpecialCharactersReturnBoolean>.
• Do not sort the experiment descriptor worksheet.
• R's impute package tends to crash R when the number of genes is small.
• R's pamr package fails when the number of genes is only one. The error message is
```
Error in rep(1, p) : invalid 'times' argument
```
  It is a bug in pamr.train() -> nsc().
• Write the R file used in plugins in a conventional format.
• The Bioconductor package 'affxparser' does not work on Windows XP. The alternative is to use Affymetrix Expression Console to pre-process your ST array data and then import the .txt files output by Affymetrix Expression Console into ArrayTools using the general format importer.
• Sometimes we need to delete the parameter files (under $ProjectFolder\BinaryData\DataParam) to solve a problem, for example when two projects were opened at the same time or an analysis was interrupted during execution.
• Running in a VirtualBox environment can give unexpected results. For example, the Import RNA-Seq count data wizard can hang Excel ("Not Responding" shows on the title bar of the VB dialog, and the CPU stays busy). For testing purposes, subsetting the data helps.
• Excel automatically changes some gene symbols into dates; some cases are shown below. To see a list of automatically converted symbols, sort the column. In addition to March, I see Sep and Dec. To open a file without auto-conversion, change the column data format from 'General' to 'Text' in the import wizard. Also check the HGNChelper R package.

| Excel | Real |
|---|---|
| 1-Mar | MARC1 |
| 2-Mar | MARC2 |
| 1-Mar | MARCH1 |
| 10-Mar | MARCH10 |
| 2-Mar | MARCH2 |
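A quick R check for symbols that have already been converted, as a sketch: `looks_like_excel_date()` is a hypothetical helper, and its pattern only covers the "day-Mon" form shown in the table. As the table shows, the conversion cannot be reversed reliably (1-Mar may come from MARC1 or MARCH1), so flagged entries should be fixed from the original source rather than guessed.

```r
# Hypothetical helper: flag values such as "1-Mar" or "10-Sep"
looks_like_excel_date <- function(symbols) {
  months <- paste(month.abb, collapse = "|")   # "Jan|Feb|...|Dec"
  grepl(paste0("^[0-9]{1,2}-(", months, ")$"), symbols, ignore.case = TRUE)
}
```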

## Run time errors

This is a collection of run-time errors from users' report or testing.

• error 1004: make sure the specified folder exists or the file can be accessed.
• error 91: 'block variable not set'. rscproxy is not installed correctly. This should not happen anymore since rcom is no longer used.
• error 76: write permission, administrative privileges.
• error 75: write permission.
• error 52: bad file name or number.
• error 53: file not found (RServeVBA.dll). Or the data is saved on a network drive (the path starts with a double backslash \\). Or there is a special symbol in the array file name.
• error 14: out of string space.
• error 13: type mismatch. Special characters in files. Delete e.g. BinaryData\DataParam\ClassComparison.txt and run the class comparison again.
• error 9: subscript out of range. A variable name used in a dialog was changed.
• error 7: out of memory.
• error 6: overflow. For example, when the input file is not tab-delimited and we use the RNA-Seq count data importer.
• error 5: invalid procedure call or argument.
• error '-2147319779' or '-2147221500': the rscproxy package is not installed. This should not happen anymore since rcom is no longer used.

## Large data

• If the number of arrays is small but the number of probesets is large (9 arrays and 947425 probesets), the random model estimation will have a problem (700,000 probesets limit).

# Plugin Developers

• The experiment descriptor is a data frame with numeric or character columns (no factors), so take extra care with cells that are NA (numeric columns) or blank (character columns).
• The gene identifier is a data frame with factor columns.
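A sketch of how plugin R code might guard against both kinds of empty cells and the factor columns. `is_missing_cell()` and `ids_as_character()` are hypothetical helpers, not part of the plugin interface.

```r
# Treat NA (numeric columns) and blank strings (character columns)
# in the experiment descriptor as missing
is_missing_cell <- function(x) {
  if (is.character(x)) is.na(x) | trimws(x) == "" else is.na(x)
}

# Convert factor columns of the gene identifier data frame to character;
# as.numeric() on a factor returns level codes, not the labels
ids_as_character <- function(gene_ids) {
  gene_ids[] <- lapply(gene_ids, function(col)
    if (is.factor(col)) as.character(col) else col)
  gene_ids
}
```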

# Support

## Self trouble-shooting

• If you got a message asking to send a bug report to [email protected], you can take a look at the <OutputReport.txt> file first (part of BugReport.tar.gz file).
• For example, if the file shows "Error : package 'pixmap' required by 'rtiff' could not be found", it indicates that the pixmap package, which the rtiff R package requires, cannot be found. While BRB-ArrayTools tries to install missing R packages automatically when it runs an analysis, the installation process can be interrupted for several reasons. In this case, close MS Excel and all R processes, then open an R console and install the pixmap package manually.
• Sometimes the R packages installation problem (e.g. Error in read.dcf(): cannot open the connection) can be solved by choosing a different mirror.
• Delete the ArrayTools folder (or only the Pathway folder if the problem comes from running the BioCarta pathway analysis). Leftover old files can cause errors even after BRB-ArrayTools has been reinstalled.
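Switching the CRAN mirror for the current R session can be done by setting the `repos` option before calling install.packages(). The URL below is one CRAN mirror chosen for illustration; any CRAN mirror works, and `chooseCRANmirror()` offers an interactive list instead.

```r
# Use a different CRAN mirror for the rest of this R session
options(repos = c(CRAN = "https://cloud.r-project.org"))
```

After that, `install.packages("pixmap")` retries the download from the new mirror.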

## Send an email to [email protected]

Please provide enough information to us so we can understand the problem.

• If a bug report file was generated, be sure to send it to us.
• If the question is like 'what method or parameters should be chosen to run my analysis', please consult experienced colleagues near you.
• Since the software depends on several components, such as the Windows operating system, MS Office and R, please provide detailed information about your software environment, including BRB-ArrayTools.
• When sending screenshots, please include all error screenshots. Sending only some of them can mislead us.

# MISC

## GEO

• Illumina. For example, the txt file <GSE13040_nonorm_nobkgd.txt> from GSE13040 (Illumina MouseRef-8 v2.0 expression BeadChip) is close to, but not exactly, the format BRB-ArrayTools requires. We can modify the header to satisfy ArrayTools' requirements.

## Affymetrix SNP arrays and cnt file for copy number analysis

To use the ‘Platform special importer’ for Affymetrix SNP arrays, the user should run the Copy Number Analysis Tool (CNAT) software, and output the *.CNT files in a batch process, where one *.CNT file is produced for each SNP array that was performed. The *.CNT output files should all be placed in one data folder, to be read by CGHTools. The “ProbeSet”, “Chromosome”, “Position” and “Log2Ratio” data columns will be automatically extracted from each *.CNT file.
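For checking CNAT output before importing, a hypothetical R reader is sketched below. It assumes each *.CNT file is tab-delimited with a header row, optional "#" comment lines, and the four column names listed above; `read_cnt()` and `read_cnt_folder()` are illustrative names, not CGHTools functions.

```r
# Columns CGHTools extracts from each *.CNT file
cnt_cols <- c("ProbeSet", "Chromosome", "Position", "Log2Ratio")

# Hypothetical reader: keep only the four required columns
read_cnt <- function(path) {
  d <- read.delim(path, comment.char = "#", check.names = FALSE,
                  stringsAsFactors = FALSE)
  absent <- setdiff(cnt_cols, names(d))
  if (length(absent) > 0)
    stop(basename(path), ": missing column(s) ", paste(absent, collapse = ", "))
  d[, cnt_cols]
}

# Read every *.CNT file in one data folder (one file per SNP array)
read_cnt_folder <- function(dir) {
  files <- list.files(dir, pattern = "\\.CNT$", full.names = TRUE,
                      ignore.case = TRUE)
  setNames(lapply(files, read_cnt), basename(files))
}
```

A file that lacks one of the four columns stops with its name, which makes a bad file in a large batch easy to find.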

The Affymetrix website contains some information about the Chromosome Copy Number Analysis Tool (CNAT) software. This site provides links to command line tools to process 10K, 100K, and 500K (no SNP 5 or SNP 6) data from CHP files to CNT files. In fact, the tool download page here even provides sample output for download.

We may obtain raw copy number data files from the GEO website by searching the 'Platforms' tab with the keywords 'affymetrix' and '10k'.

Another approach is to use the Copy Number Analyzer for Affymetrix GeneChip (CNAG) software and then process the data file to obtain data files in tab-delimited .txt format, which can be imported into BRB-CGHTools through the General Importer.

## CGHTools

CGHTools has several tools (segmentation, gain/loss, Gistic, pathway).

Once segmentation has been run, it is OK to jump to gain/loss, Gistic or pathway analysis. These 3 analyses have no mutual dependencies.

According to the CGHTools manual, when inferred integer copy numbers are imported at the importing step, the pathway enrichment analysis can be run without performing segmentation.

We can find some Copy number data from GEO. For example, the GSE46452 has the Experiment type Genome variation profiling by genome tiling array while GSE26689 has an experiment type Genome variation profiling by array. GSE5013 from the paper has an experiment type Genome variation profiling by SNP array; SNP genotyping by SNP array. And GSE11960 has an experiment type Genome variation profiling by SNP array. GSE35873 has a type Genome variation profiling by SNP array; Genome binding/occupancy profiling by SNP array.

The relation between copy number variation and gene expression is studied in "Identification of genes with a correlation between copy number and expression in gastric cancer", where the CGH data are in GSE33428 and the gene expression data are in GSE33335. Another study is in GSE10744.

Some tutorials created by Sean Davis.