R
Tricks
Query about an R package
packageDescription("MASS")   # all fields of the package's DESCRIPTION file
packageVersion("MASS")       # the installed version of the package
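packageDescription() reads an installed package's DESCRIPTION file. When R is not at hand, for example on a mirror server, the same information can be pulled out in the shell. The following is a minimal sketch. It assumes the usual DCF layout (`Field: value`, with continuation lines indented), and the function name dcf_field is made up for illustration.

```shell
#!/bin/sh
# dcf_field FILE FIELD  (hypothetical helper)
# Print the value of FIELD from a DCF file such as an R package's DESCRIPTION.
# Continuation lines (starting with a space or tab) are joined with one space.
dcf_field() {
  awk -v field="$2" '
    # A line starting with "Name:" begins a new field.
    /^[^ \t][^:]*:/ {
      if (found) exit              # the wanted field has ended
      name = substr($0, 1, index($0, ":") - 1)
      if (name == field) {
        found = 1
        value = substr($0, index($0, ":") + 1)
        sub(/^[ \t]+/, "", value)  # drop spaces after the colon
      }
      next
    }
    # Indented continuation line belonging to the wanted field.
    /^[ \t]/ && found {
      line = $0
      sub(/^[ \t]+/, "", line)
      value = (value == "") ? line : value " " line
    }
    END { if (found) print value }
  ' "$1"
}
```

For example, `dcf_field /usr/lib/R/library/MASS/DESCRIPTION Version`. That library path is only an assumption (a common Debian location). `Rscript -e 'cat(.libPaths())'` lists the real ones.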
Editor
Revision as of 14:23, 7 August 2013
Install Rtools for Windows users
See http://goo.gl/gYh6C for step-by-step instructions with screenshots.
I prefer not to check the option that sets the PATH environment variable during installation. Instead, I manually add the following to PATH (based on Rtools v3.0):
c:\Rtools\bin; c:\Rtools\gcc-4.6.3\bin; C:\Program Files\R\R-2.15.2\bin\i386;
We can make life easier by creating a file <Rcommand.bat> with the content below. (This is also useful if you have C:\cygwin\bin in your PATH, although the Cygwin setup does not add it automatically.)
PS. I put <Rcommand.bat> under the C:\Program Files\R folder and create a desktop shortcut called 'Rcmd'. In the shortcut, I enter C:\Windows\System32\cmd.exe /K "Rcommand.bat" in the Target entry and "C:\Program Files\R" in the Start in entry.
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\Program Files\R\R-2.15.2\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
echo Setting environment for using R cmd
Now we can open a Command Prompt anywhere and run <Rcommand.bat> to set up all the environment variables. On Windows Vista, 7 and 8, we need to run it as administrator. Alternatively, we can change the file's security properties so the current user has execute permission.
Compile and install an R package
cd C:\Documents and Settings\brb
wget http://www.bioconductor.org/packages/2.11/bioc/src/contrib/affxparser_1.30.2.tar.gz
C:\progra~1\r\r-2.15.2\bin\R CMD INSTALL --build affxparser_1.30.2.tar.gz
Helpful - check Chapter 6 of R Installation and Administration
Check/Upload to CRAN
http://win-builder.r-project.org/
Install R using binary package
Ubuntu/Debian
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
gksudo gedit /etc/apt/sources.list
# add the line:  deb http://cran.fhcrc.org/bin/linux/ubuntu precise/
sudo apt-get update
sudo apt-get install r-base
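We can also append the deb line from the shell instead of editing /etc/apt/sources.list by hand in gedit. The following is a minimal sketch. The helper name add_apt_source is hypothetical, and the helper only avoids adding the same line twice. It does not validate the line.

```shell
#!/bin/sh
# add_apt_source FILE LINE  (hypothetical helper)
# Append LINE to an APT sources file unless an identical line is already there.
# On a real system FILE is /etc/apt/sources.list, which needs root to modify.
add_apt_source() {
  file=$1
  line=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "already present: $line"
  else
    printf '%s\n' "$line" >> "$file"
    echo "added: $line"
  fi
}
```

For example, run `add_apt_source /etc/apt/sources.list "deb http://cran.fhcrc.org/bin/linux/ubuntu precise/"` as root, followed by `apt-get update`. A separate file under /etc/apt/sources.list.d/ works the same way with a different FILE.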
Redhat el6
It should be pretty easy to install via the EPEL: http://fedoraproject.org/wiki/EPEL
Just follow the instructions to enable the EPEL and then from the CLI as root:
yum install R
or via sudo:
sudo yum install R
Install R from source (ix86, x86_64 and arm platforms, Linux system)
Debian system (focus on arm architecture with notes from x86 system)
Simplest configuration
On my Debian system on a Pogoplug (armv5) or a Raspberry Pi (armv6), I can compile R (see R's admin manual). If I don't need X11, I only need to install 2 required packages:
- install gfortran: apt-get install gfortran (gfortran is not part of build-essential)
- install readline library: apt-get install libreadline5-dev (pogoplug), apt-get install libreadline6-dev (raspberry pi, ubuntu)
Note: if I need X11, I should also install
- libX11 and libX11-devel, libXt, libXt-devel (for Fedora)
- libx11-dev (for Debian) or xorg-dev (for Pogoplug/Raspberry Pi)
and optionally
- texinfo (to fix 'WARNING: you cannot build info or HTML versions of the R manuals')
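Checking that the prerequisites listed above are present before running ./configure can save a failed build. The following is a rough sketch. The helper names are hypothetical. The sketch looks for readline/readline.h only in /usr/include and /usr/local/include; that is an assumption, and configure's own checks remain the authority.

```shell
#!/bin/sh
# missing_tools CMD...  (hypothetical helper)
# Print each command that is not on PATH; return 1 if any is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return $status
}

# has_header NAME  (hypothetical helper)
# Return 0 if a header such as readline/readline.h exists in a usual include dir.
has_header() {
  for dir in /usr/include /usr/local/include; do
    [ -f "$dir/$1" ] && return 0
  done
  return 1
}
```

For example, `missing_tools gcc g++ gfortran make` reports anything still to install. `has_header readline/readline.h || echo "install libreadline6-dev"` covers the readline library.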
Note that it is also safe to install the required tools (in Ubuntu) via
sudo apt-get install r-base-dev
See #Install r-base and r-base-dev. Or, even better, use
sudo apt-get build-dep r-base
See #Install all dependencies for building R. With the first approach, ./configure still complains that the X11 headers/libraries are missing. The second approach pulls in dependencies such as JDK, Tcl, TeX and more. For some reason, apt-get build-dep gives a more complete list than apt-get install r-base-dev.
[ARM architecture] I also ran apt-get install readline-common; I don't know whether this is necessary. Since I don't need X11, I pass --with-x=no to configure. After running
wget http://cran.r-project.org/src/base/R-2/R-2.15.2.tar.gz
tar xzvf R-2.15.2.tar.gz
cd R-2.15.2
./configure --with-x=no --enable-R-shlib
I got
R is now configured for armv5tel-unknown-linux-gnueabi
  Source directory:          .
  Installation directory:    /usr/local
  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O2
  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:
  Interfaces supported:
  External libraries:        readline
  Additional capabilities:   NLS
  Options enabled:           shared R library, shared BLAS, R profiling
  Recommended packages:      yes
configure: WARNING: you cannot build info or HTML versions of the R manuals
configure: WARNING: you cannot build PDF versions of the R manuals
configure: WARNING: you cannot build PDF versions of vignettes and help pages
configure: WARNING: I could not determine a browser
configure: WARNING: I could not determine a PDF viewer
PS 1. On my raspberry pi machine, it shows R is now configured for armv6l-unknown-linux-gnueabihf.
PS 2. On my x86 system, it shows
R is now configured for x86_64-unknown-linux-gnu
  Source directory:          .
  Installation directory:    /usr/local
  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O2
  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:
  Interfaces supported:      X11, tcltk
  External libraries:        readline, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java
  Recommended packages:      yes
[arm] However, make gave errors for recommended packages such as KernSmooth, MASS, boot, class, cluster, codetools, foreign, lattice, mgcv, nlme, nnet, rpart, spatial, and survival. The error stems from gcc: SHLIB_LIBADD: No such file or directory. Note that I get this error message even when I try install.packages("MASS", type="source"). A suggested fix is here: add perl = TRUE to the sub() call in two lines of the src/library/tools/R/install.R file. However, I then got another error: shared object 'MASS.so' not found. See also http://ftp.debian.org/debian/pool/main/r/r-base/.
make[1]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[2]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
begin installing recommended package MASS
* installing *source* package 'MASS' ...
** libs
make[3]: Entering directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c MASS.c -o MASS.o
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c lqs.c -o lqs.o
gcc -std=gnu99 -shared -L/usr/local/lib -o MASSSHLIB_EXT MASS.o lqs.o SHLIB_LIBADD -L/mnt/usb/R-2.15.2/lib -lR
gcc: SHLIB_LIBADD: No such file or directory
make[3]: *** [MASSSHLIB_EXT] Error 1
make[3]: Leaving directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
ERROR: compilation failed for package 'MASS'
* removing '/mnt/usb/R-2.15.2/library/MASS'
make[2]: *** [MASS.ts] Error 1
make[2]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make: *** [stamp-recommended] Error 2
root@debian:/mnt/usb/R-2.15.2#
root@debian:/mnt/usb/R-2.15.2# bin/R
R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: armv5tel-unknown-linux-gnueabi (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(MASS)
Error in library(MASS) : there is no package called 'MASS'
> library()
Packages in library '/mnt/usb/R-2.15.2/library':
base          The R Base Package
compiler      The R Compiler Package
datasets      The R Datasets Package
grDevices     The R Graphics Devices and Support for Colours and Fonts
graphics      The R Graphics Package
grid          The Grid Graphics Package
methods       Formal Methods and Classes
parallel      Support for Parallel computation in R
splines       Regression Spline Functions and Classes
stats         The R Stats Package
stats4        Statistical Functions using S4 Classes
tcltk         Tcl/Tk Interface
tools         Tools for Package Development
utils         The R Utils Package
> Sys.info()["machine"]
   machine
"armv5tel"
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170369  4.6     350000  9.4   350000  9.4
Vcells 163228  1.3     905753  7.0   784148  6.0
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679180
PS 3. The complete log of building R from source is in File:Build R log.txt.
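The download, unpack and configure steps used above can be wrapped in a small function, so the same commands work for another R version. This is a sketch. It assumes the CRAN URL layout seen in the wget line (src/base/R-<major>/R-<version>.tar.gz). The function names are hypothetical, and WGET can be overridden (for example WGET=echo) to see what would run.

```shell
#!/bin/sh
# r_source_url VERSION  (hypothetical helper)
# Build the CRAN URL of an R source tarball, e.g. 2.15.2 -> .../R-2/R-2.15.2.tar.gz
r_source_url() {
  version=$1
  major=${version%%.*}   # keep everything before the first dot
  echo "http://cran.r-project.org/src/base/R-$major/R-$version.tar.gz"
}

# build_r VERSION [configure options...]  (hypothetical helper)
# Download, unpack and configure R in the current directory.
build_r() {
  version=$1
  shift
  ${WGET:-wget} "$(r_source_url "$version")" &&
    tar xzf "R-$version.tar.gz" &&
    (cd "R-$version" && ./configure "$@")
}
```

For example, run `build_r 2.15.2 --with-x=no --enable-R-shlib` and then `cd R-2.15.2 && make`.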
Full configuration
  Interfaces supported:      X11, tcltk
  External libraries:        readline
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java
Update: R 3.0.1 on Beaglebone Black (armv7a) + Ubuntu 13.04
See the page here.
Install all dependencies for building R
This is a comprehensive list, even larger than the one pulled in by r-base-dev.
root@debian:/mnt/usb/R-2.15.2# apt-get build-dep r-base
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  libreadline5-dev
The following NEW packages will be installed:
  bison ca-certificates ca-certificates-java debhelper defoma ed file fontconfig gettext
  gettext-base html2text intltool-debian java-common libaccess-bridge-java
  libaccess-bridge-java-jni libasound2 libasyncns0 libatk1.0-0 libaudit0 libavahi-client3
  libavahi-common-data libavahi-common3 libblas-dev libblas3gf libbz2-dev libcairo2
  libcairo2-dev libcroco3 libcups2 libdatrie1 libdbus-1-3 libexpat1-dev libflac8
  libfontconfig1-dev libfontenc1 libfreetype6-dev libgif4 libglib2.0-dev libgtk2.0-0
  libgtk2.0-common libice-dev libjpeg62-dev libkpathsea5 liblapack-dev liblapack3gf
  libnewt0.52 libnspr4-0d libnss3-1d libogg0 libopenjpeg2 libpango1.0-0 libpango1.0-common
  libpango1.0-dev libpcre3-dev libpcrecpp0 libpixman-1-0 libpixman-1-dev libpng12-dev
  libpoppler5 libpulse0 libreadline-dev libreadline6-dev libsm-dev libsndfile1 libthai-data
  libthai0 libtiff4-dev libtiffxx0c2 libunistring0 libvorbis0a libvorbisenc2 libxaw7
  libxcb-render-util0 libxcb-render-util0-dev libxcb-render0 libxcb-render0-dev
  libxcomposite1 libxcursor1 libxdamage1 libxext-dev libxfixes3 libxfont1 libxft-dev libxi6
  libxinerama1 libxkbfile1 libxmu6 libxmuu1 libxpm4 libxrandr2 libxrender-dev libxss-dev
  libxt-dev libxtst6 luatex m4 openjdk-6-jdk openjdk-6-jre openjdk-6-jre-headless
  openjdk-6-jre-lib openssl pkg-config po-debconf preview-latex-style shared-mime-info
  tcl8.5-dev tex-common texi2html texinfo texlive-base texlive-binaries texlive-common
  texlive-doc-base texlive-extra-utils texlive-fonts-recommended texlive-generic-recommended
  texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-pictures tk8.5-dev
  tzdata-java whiptail x11-xkb-utils x11proto-render-dev x11proto-scrnsaver-dev
  x11proto-xext-dev xauth xdg-utils xfonts-base xfonts-encodings xfonts-utils xkb-data
  xserver-common xvfb zlib1g-dev
0 upgraded, 136 newly installed, 1 to remove and 0 not upgraded.
Need to get 139 MB of archives.
After this operation, 410 MB of additional disk space will be used.
Do you want to continue [Y/n]?
Web Applications
Create HTML5 web and slides
http://www.gastonsanchez.com/depot/knitr-slides. The HTML5 slides work on my IE 8 too.
HTML5 slides examples
- http://yihui.name/slides/knitr-slides.html
- http://yihui.name/slides/2012-knitr-RStudio.html
- http://yihui.name/slides/2011-r-dev-lessons.html#slide1
- http://inundata.org/R_talks/BARUG/#intro
Software requirement
- RStudio
- knitr, XML, RCurl (see omegahat for installation on Ubuntu)
- pandoc. This is a command-line tool; I am testing it on Windows 7.
Slide #22 explains how to create
- regular html file by using RStudio -> Knit HTML button
- HTML5 slides by using pandoc from command line.
Files:
- R Markdown source: 009-slides.Rmd. Note that IE 8 is not supported by GitHub. For IE 9, be sure to turn off "Compatibility View".
- markdown output: 009-slides.md
- HTML output: 009-slides.html
We can create the R Markdown source in RStudio via File -> New -> R Markdown.
pandoc can produce slides in 4 formats:
- S5
- DZSlides
- Slidy
- Slideous
Use the markdown file (md) and convert it with pandoc
pandoc -s -S -i -t dzslides --mathjax html5_slides.md -o html5_slides.html
If we are comfortable with HTML and CSS code, open the html file (generated by pandoc) and modify the CSS style at will.
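Producing all four slide formats from one markdown file is a loop over pandoc's -t writer names. The following sketch reuses the flags from the command above. Note that -S (smart quotes) belongs to pandoc 1.x and was removed in pandoc 2.0, so drop it on newer versions. The function name and the PANDOC override are hypothetical.

```shell
#!/bin/sh
# make_slides FILE.md  (hypothetical helper)
# Convert one markdown file into S5, DZSlides, Slidy and Slideous HTML slides,
# writing FILE-<format>.html next to the input. Set PANDOC=echo for a dry run.
make_slides() {
  input=$1
  base=${input%.md}
  for fmt in s5 dzslides slidy slideous; do
    ${PANDOC:-pandoc} -s -S -i -t "$fmt" --mathjax "$input" -o "$base-$fmt.html" || return 1
  done
}
```

For example, `make_slides html5_slides.md` writes html5_slides-dzslides.html and the three other variants.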
Markdown language
According to wikipedia:
Markdown is a lightweight markup language, originally created by John Gruber with substantial contributions from Aaron Swartz, allowing people “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”.
- Markup is a general term for content formatting, such as HTML. Markdown is a specific lightweight markup language, and Markdown tools convert it into HTML markup.
- Nice summary from stackoverflow.com and more complete list from github.
- An example https://gist.github.com/jeromyanglim/2716336
- Convert mediawiki to markdown using online conversion tool from pandoc.
- R Markdown files can be created and used in RStudio. Chunk options are described under Customizing Chunk Options on the knitr page and on rpubs.com.
HTTP protocol
- http://en.wikipedia.org/wiki/File:Http_request_telnet_ubuntu.png
- Query string
- How to capture http header? Use curl -i en.wikipedia.org.
- Web Inspector, built into Chrome. Right-click on any page and choose 'Inspect Element'.
- Web server
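The output of `curl -i` has a simple layout. The first line is the status line, followed by header lines and then a blank line before the body, with every line ending in CRLF. The following small sketch picks these parts apart; the helper names are hypothetical.

```shell
#!/bin/sh
# http_status  (hypothetical helper)
# Read a raw HTTP response on stdin and print the numeric status code,
# i.e. the second word of the status line "HTTP/1.1 200 OK".
http_status() {
  head -n 1 | awk '{ print $2 }'
}

# http_headers  (hypothetical helper)
# Print the header lines (CR removed) up to the blank line before the body.
http_headers() {
  awk 'NR > 1 { sub(/\r$/, ""); if ($0 == "") exit; print }'
}
```

For example, `curl -si http://en.wikipedia.org | http_status` prints just the status code.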
shiny
Running an example from the shiny package opens it in the browser; see http://rstudio.github.com/shiny/tutorial/#hello-shiny. Note that the R session stays busy while the app runs. The R command prompt does not return until we press Ctrl+C or ESC.
shiny depends on the websockets, caTools, bitops and digest packages.
Q & A:
- Q: If we run runExample('01_hello') in Rserve from an R client, we can continue working in the R client without losing the functionality of the shiny GUI. How do we kill the job?
- Q: If I run the example "01_hello" in Firefox, the browser only shows the controls but not the graph. A: Use Chrome or Opera as the default browser.
- Q: If I run the example "01_hello" on RHEL, it works fine the first time. But if I press Ctrl + C to stop it and run it again, I get the message
Warning in .SOCK_SERVE(port) : R-Websockets(tcpserv): bind() failed.
Error in createContext(port, webpage, is.binary = is.binary) :
  Unable to bind socket on port 8100; is it already in use?
A: A simple solution is to close R and open it again.
- Q: Deployment on web. A: Not ready yet. Shiny server platform is still under beta testing. Shiny apps are hosted using the R websockets package which acts more like a tcp server than a web server, and that architecture just doesn't fit with rApache, or even apache for that matter.
- Q: How difficult is it to put the code on GitHub Gist? A: Just create an account; you do not even need to create a repository. Go to http://gist.github.com and create a new gist, which can be secret or public. A secret gist cannot be edited again after it is created, although it works fine with the runGist() function.
shiny server
See https://github.com/rstudio/shiny-server
It works on my Ubuntu server. To test it, I run
sudo shiny-server   # maybe add an ampersand '&' at the end to run it in the background
Heatmap Example
http://taichi.selfip.net:3838/hello/
RApache
gWidgetsWWW
- http://www.jstatsoft.org/v49/i10/paper
- gWidgetsWWW2: gWidgetsWWW based on Rook
- Compare shiny with gWidgetsWWW2.rapache
Rook
Since R 2.13, the internal web server has been exposed.
Tutorials from useR2012 and Jeffrey Horner
Here is another one from http://www.rinfinance.com.
Rook is also supported by rApache. See http://rapache.net/manual.html.
Google group. https://groups.google.com/forum/?fromgroups#!forum/rrook
Advantages
- The web applications are created on the desktop, whether it is Windows, Mac or Linux.
- No Apache is needed.
- Multiple applications can be created at the same time. This complements a limitation of rApache.
4 lines of code example.
library(Rook)
s <- Rhttpd$new()
s$start(quiet=TRUE)
s$print()
s$browse(1)  # OR s$browse("RookTest")
Notice that after the s$browse() command, the cursor returns to R because the command is just a shortcut to open the web page http://127.0.0.1:10215/custom/RookTest.
We can add Rook application to the server; see ?Rhttpd.
s$add(app=system.file('exampleApps/helloworld.R', package='Rook'), name='hello')
s$add(app=system.file('exampleApps/helloworldref.R', package='Rook'), name='helloref')
s$add(app=system.file('exampleApps/summary.R', package='Rook'), name='summary')
s$print()
#Server started on 127.0.0.1:10221
#[1] RookTest   http://127.0.0.1:10221/custom/RookTest
#[2] helloref   http://127.0.0.1:10221/custom/helloref
#[3] summary    http://127.0.0.1:10221/custom/summary
#[4] hello      http://127.0.0.1:10221/custom/hello
# Stops the server but doesn't uninstall the app
## Not run:
s$stop()
## End(Not run)
s$remove(all=TRUE)
rm(s)
For example, the source code of the summary app is given below.
app <- function(env) {
  req <- Rook::Request$new(env)
  res <- Rook::Response$new()
  res$write('Choose a CSV file:\n')
  res$write('<form method="POST" enctype="multipart/form-data">\n')
  res$write('<input type="file" name="data">\n')
  res$write('<input type="submit" name="Upload">\n</form>\n<br>')
  if (!is.null(req$POST())) {
    data <- req$POST()[['data']]
    res$write("<h3>Summary of Data</h3>")
    res$write("<pre>")
    res$write(paste(capture.output(summary(read.csv(data$tempfile, stringsAsFactors=FALSE)), file=NULL), collapse='\n'))
    res$write("</pre>")
    res$write("<h3>First few lines (head())</h3>")
    res$write("<pre>")
    res$write(paste(capture.output(head(read.csv(data$tempfile, stringsAsFactors=FALSE)), file=NULL), collapse='\n'))
    res$write("</pre>")
  }
  res$finish()
}
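The summary app reads the upload from the form field named data, sent as multipart/form-data. We can make the same request from the command line with curl -F, which is handy for testing the app without a browser. This is a sketch. The function name is hypothetical, and port 10221 is only the one s$print() reported in the session above; use whatever port your own session prints.

```shell
#!/bin/sh
# upload_csv URL FILE  (hypothetical helper)
# POST FILE the way the summary app's HTML form does: multipart/form-data
# with the file in a field named "data". Set CURL=echo to print the arguments.
upload_csv() {
  url=$1
  file=$2
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  ${CURL:-curl} -s -F "data=@$file" "$url"
}
```

For example, `upload_csv http://127.0.0.1:10221/custom/summary mydata.csv` should return the same HTML summary that the browser shows.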
More examples:
- http://lamages.blogspot.com/2012/08/rook-rocks-example-with-googlevis.html
- Self-organizing map
- Deploy Rook apps with rApache: part one and part two.
Stockplot
FastRWeb
Rwui
CGHWithR (removed from CRAN)
It still works with older versions of R.
Creating local repository for CRAN and Bioconductor (focus on Windows binary packages only)
How to set up a local repository
- CRAN specific: http://cran.r-project.org/mirror-howto.html
- Bioconductor specific: http://www.bioconductor.org/about/mirrors/mirror-how-to/
General guide: http://cran.r-project.org/doc/manuals/R-admin.html#Setting-up-a-package-repository
Utilities such as install.packages can be pointed at any CRAN-style repository, and R users may want to set up their own. The ‘base’ of a repository is a URL such as http://www.omegahat.org/R/: this must be an URL scheme that download.packages supports (which also includes ‘ftp://’ and ‘file://’, but not on most systems ‘https://’). Under that base URL there should be directory trees for one or more of the following types of package distributions:
- "source": located at src/contrib and containing .tar.gz files. Other forms of compression can be used, e.g. .tar.bz2 or .tar.xz files.
- "win.binary": located at bin/windows/contrib/x.y for R versions x.y.z and containing .zip files for Windows.
- "mac.binary.leopard": located at bin/macosx/leopard/contrib/x.y for R versions x.y.z and containing .tgz files.
Each terminal directory must also contain a PACKAGES file. This can be a concatenation of the DESCRIPTION files of the packages separated by blank lines, but only a few of the fields are needed. The simplest way to set up such a file is to use function write_PACKAGES in the tools package, and its help explains which fields are needed. Optionally there can also be a PACKAGES.gz file, a gzip-compressed version of PACKAGES—as this will be downloaded in preference to PACKAGES it should be included for large repositories. (If you have a mis-configured server that does not report correctly non-existent files you will need PACKAGES.gz.)
To add your repository to the list offered by setRepositories(), see the help file for that function.
A repository can contain subdirectories, when the descriptions in the PACKAGES file of packages in subdirectories must include a line of the form
Path: path/to/subdirectory
—once again write_PACKAGES is the simplest way to set this up.
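To see what a minimal PACKAGES file looks like, here is a rough shell sketch. It builds one for a directory of source tarballs by pulling selected fields out of each package's DESCRIPTION. It is an illustration only. tools::write_PACKAGES is still the right tool: it handles binary types and adds fields this sketch skips. The field list below is an assumption, and the function name is hypothetical.

```shell
#!/bin/sh
# build_packages_index DIR  (hypothetical helper, illustration only)
# Write DIR/PACKAGES and DIR/PACKAGES.gz for the source tarballs in DIR.
# Each record holds a few DESCRIPTION fields; records are separated by blank lines.
build_packages_index() {
  dir=$1
  : > "$dir/PACKAGES"                       # start with an empty index
  for tgz in "$dir"/*.tar.gz; do
    [ -e "$tgz" ] || continue               # no tarballs: glob did not match
    pkg=$(basename "$tgz" | sed 's/_.*//')  # foo_1.0.tar.gz -> foo
    # Read pkg/DESCRIPTION from the tarball without unpacking it to disk,
    # keeping the chosen fields and their indented continuation lines.
    tar -xzOf "$tgz" "$pkg/DESCRIPTION" |
      awk '/^(Package|Version|Depends|Imports|LinkingTo|Suggests|License):/ { keep = 1; print; next }
           /^[ \t]/ && keep { print; next }
           { keep = 0 }' >> "$dir/PACKAGES"
    echo >> "$dir/PACKAGES"                 # blank line ends the record
  done
  gzip -c "$dir/PACKAGES" > "$dir/PACKAGES.gz"
}
```

For example, `build_packages_index ~/Rmirror/CRAN/src/contrib` followed by `zcat ~/Rmirror/CRAN/src/contrib/PACKAGES.gz | head` shows the first record.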
Space requirement if we want to mirror WHOLE repository
- Whole CRAN takes about 92GB (rsync -avn cran.r-project.org::CRAN > ~/Downloads/cran).
- Bioconductor is big (> 64G for BioC 2.11). Please check the size of what will be transferred with e.g. (rsync -avn bioconductor.org::2.11 > ~/Downloads/bioc) and make sure you have enough room on your local disk before you start.
On the other hand, if we only care about the Windows binary part, the space requirement is much smaller.
- CRAN: 2.7GB
- Bioconductor: 28GB.
Misc notes
- If a binary package was built on R 2.15.1, it cannot be installed on R 2.15.2, but the reverse is OK.
- Remember to use the --delete option in rsync. Otherwise, old versions of packages stay in the mirror and may be installed.
- The repository still needs the src directory. If it is missing, we get an error
Warning: unable to access index for repository http://arraytools.no-ip.org/CRAN/src/contrib
Warning message:
package ‘glmnet’ is not available (for R version 2.15.2)
The error was given by available.packages() function.
To bypass the requirement of src directory, I can use
install.packages("glmnet", contriburl = contrib.url(getOption('repos'), "win.binary"))
but there may be a problem when we use biocLite() command.
I find a workaround. Since the error comes from missing CRAN/src directory, we just need to make sure the directory CRAN/src/contrib exists AND either CRAN/src/contrib/PACKAGES or CRAN/src/contrib/PACKAGES.gz exists.
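We can check the workaround above from the shell before pointing R at the mirror. The following is a sketch. The function name is hypothetical, and it only tests that the index files exist, not that they are valid.

```shell
#!/bin/sh
# check_repo_index ROOT  (hypothetical helper)
# Succeed if ROOT/src/contrib exists and holds PACKAGES or PACKAGES.gz,
# which available.packages() needs even when only Windows binaries are mirrored.
check_repo_index() {
  contrib="$1/src/contrib"
  if [ ! -d "$contrib" ]; then
    echo "missing directory: $contrib"
    return 1
  fi
  if [ -f "$contrib/PACKAGES" ] || [ -f "$contrib/PACKAGES.gz" ]; then
    echo "ok: $contrib"
    return 0
  fi
  echo "no PACKAGES or PACKAGES.gz in $contrib"
  return 1
}
```

For example, `check_repo_index ~/Rmirror/CRAN` should print an ok line after the rsync steps below.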
To create CRAN repository
Before creating a local repository, do a dry run first. You don't want to be surprised by how long it takes to mirror a directory.
Dry run (-n option). Pipe out the process to a text file for an examination.
rsync -avn cran.r-project.org::CRAN > crandryrun.txt
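The dry-run listing can be long, but the summary line at the end gives the total size. The following sketch pulls that figure out of the saved file. It assumes rsync prints a line starting with "total size is N", as rsync 3.x does as far as I know. Newer versions group digits with commas, which the sketch strips. The function names are hypothetical.

```shell
#!/bin/sh
# dryrun_total_bytes FILE  (hypothetical helper)
# Print the "total size is N" byte count from saved `rsync -avn` output.
dryrun_total_bytes() {
  sed -n 's/^total size is \([0-9,]*\).*/\1/p' "$1" | tr -d ','
}

# human_gb BYTES  (hypothetical helper)
# Convert a byte count to decimal gigabytes with one decimal place.
human_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f GB\n", b / 1e9 }'
}
```

For example, `human_gb "$(dryrun_total_bytes crandryrun.txt)"` gives a quick estimate of how much space the mirror needs.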
To mirror only part of the repository, it is necessary to create the directories before running the rsync command.
cd
mkdir -p ~/Rmirror/CRAN/bin/windows/contrib/2.15
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/windows/contrib/2.15/ ~/Rmirror/CRAN/bin/windows/contrib/2.15
# src directory is very large (~27GB) since it contains source code for each R version.
# We just need the files PACKAGES and PACKAGES.gz in CRAN/src/contrib. So I comment out the following line.
# rsync -rtlzv --delete cran.r-project.org::CRAN/src/ ~/Rmirror/CRAN/src/
mkdir -p ~/Rmirror/CRAN/src/contrib
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES.gz ~/Rmirror/CRAN/src/contrib/
And optionally
library(tools)
write_PACKAGES("~/Rmirror/CRAN/bin/windows/contrib/2.15", type="win.binary")
and if we want to get src directory
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/*.tar.gz ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/2.15.3 ~/Rmirror/CRAN/src/contrib/
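The partial CRAN mirror commands above can be wrapped in one function that takes the R version, so switching from 2.15 to another series only changes an argument. The following is a sketch. The function name is hypothetical, and setting RSYNC to echo prints the commands without transferring anything.

```shell
#!/bin/sh
# mirror_cran_windows RVERSION DEST  (hypothetical helper)
# Mirror the Windows binaries for one R x.y series plus the src/contrib
# index files, following the rsync commands above.
mirror_cran_windows() {
  xy=$(echo "$1" | cut -d. -f1,2)   # 2.15.2 -> 2.15
  dest=$2
  src=cran.r-project.org::CRAN
  bin="bin/windows/contrib/$xy"
  mkdir -p "$dest/$bin" "$dest/src/contrib" || return 1
  ${RSYNC:-rsync} -rtlzv --delete "$src/$bin/" "$dest/$bin" || return 1
  # Only the index files from src/contrib, not the ~27GB of sources.
  for f in PACKAGES PACKAGES.gz; do
    ${RSYNC:-rsync} -rtlzv --delete "$src/src/contrib/$f" "$dest/src/contrib/" || return 1
  done
}
```

For example, `mirror_cran_windows 2.15.2 ~/Rmirror/CRAN` reproduces the manual steps, and `RSYNC=echo mirror_cran_windows 2.15.2 ~/Rmirror/CRAN` only lists them.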
We can use du -h to check the folder size.
For example (as of 1/7/2013),
$ du -k ~/Rmirror --max-depth=1 --exclude ".*" | sort -nr | cut -f2 | xargs -d '\n' du -sh
30G     /home/brb/Rmirror
28G     /home/brb/Rmirror/Bioc
2.7G    /home/brb/Rmirror/CRAN
To create Bioconductor repository
Dry run
rsync -avn bioconductor.org::2.11 > biocdryrun.txt
Then create the directories before running rsync.
cd
mkdir -p ~/Rmirror/Bioc
wget -N http://www.bioconductor.org/biocLite.R -P ~/Rmirror/Bioc
where -N (timestamping) makes wget download the file again only if the remote copy is newer or its size differs, and -P specifies an output directory, not a file name.
Optionally, we can add the following in order to see the Bioconductor front page.
rsync -zrtlv --delete bioconductor.org::2.11/BiocViews.html ~/Rmirror/Bioc/packages/2.11/
rsync -zrtlv --delete bioconductor.org::2.11/index.html ~/Rmirror/Bioc/packages/2.11/
The software part (aka bioc directory) installation:
cd
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/bin/windows/ ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
# Either rsync whole src directory or just essential files
# rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/ ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
# Optionally the html part
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/html
rsync -zrtlv --delete bioconductor.org::2.11/bioc/html/ ~/Rmirror/Bioc/packages/2.11/bioc/html
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/vignettes/ ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/news
rsync -zrtlv --delete bioconductor.org::2.11/bioc/news/ ~/Rmirror/Bioc/packages/2.11/bioc/news
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/licenses
rsync -zrtlv --delete bioconductor.org::2.11/bioc/licenses/ ~/Rmirror/Bioc/packages/2.11/bioc/licenses
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/manuals
rsync -zrtlv --delete bioconductor.org::2.11/bioc/manuals/ ~/Rmirror/Bioc/packages/2.11/bioc/manuals
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/readmes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/readmes/ ~/Rmirror/Bioc/packages/2.11/bioc/readmes
and annotation (aka data directory) part:
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/bin/windows/ ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
and experiment directory:
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
and extra directory:
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/src/contrib
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
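The repeated Bioconductor rsync commands above differ only in the directory name, so they can be written as loops. The following sketch covers the optional bioc web directories and the index-only copies for data/experiment and extra. The function name and argument order are hypothetical, and RSYNC=echo prints the commands instead of running them.

```shell
#!/bin/sh
# mirror_bioc_extras BIOC_VERSION R_XY DEST  (hypothetical helper)
# 1) optional bioc/ web directories (html, vignettes, news, ...)
# 2) PACKAGES and PACKAGES.gz only, for data/experiment and extra
mirror_bioc_extras() {
  bioc=$1
  xy=$2
  dest=$3
  src="bioconductor.org::$bioc"
  root="$dest/packages/$bioc"
  for d in html vignettes news licenses manuals readmes; do
    mkdir -p "$root/bioc/$d" || return 1
    ${RSYNC:-rsync} -zrtlv --delete "$src/bioc/$d/" "$root/bioc/$d" || return 1
  done
  for repo in data/experiment extra; do
    for sub in "bin/windows/contrib/$xy" src/contrib; do
      mkdir -p "$root/$repo/$sub" || return 1
      for f in PACKAGES PACKAGES.gz; do
        ${RSYNC:-rsync} -zrtlv --delete "$src/$repo/$sub/$f" "$root/$repo/$sub/" || return 1
      done
    done
  done
}
```

For example, `mirror_bioc_extras 2.11 2.15 ~/Rmirror/Bioc` covers the optional html part and the experiment and extra steps above. The bioc binaries and annotation packages still come from the explicit commands.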
To test local repository
Create soft links in Apache server
su
ln -s /home/brb/Rmirror/CRAN /var/www/html/CRAN
ln -s /home/brb/Rmirror/Bioc /var/www/html/Bioc
ls -l /var/www/html
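The soft links themselves always show mode 777 (lrwxrwxrwx), and that mode is ignored. What matters is that the Apache user can read the target directories.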
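Below is a sketch of the symlink step that can be re-run safely. ln -sfn replaces an existing link instead of creating a new link inside the target directory, and chmod a+rX makes the tree readable (and directories traversable) for the web server. The function name is hypothetical. Apache must also be allowed to follow symlinks (Options FollowSymLinks) and to traverse the parent directories such as /home/brb.

```shell
#!/bin/sh
# publish_dir TARGET LINK  (hypothetical helper)
# Point LINK (e.g. /var/www/html/CRAN) at TARGET (e.g. /home/brb/Rmirror/CRAN).
# -s symbolic, -f replace an existing link, -n treat an existing link to a
# directory as a file, so re-running does not create TARGET/CRAN.
publish_dir() {
  target=$1
  link=$2
  ln -sfn "$target" "$link" || return 1
  # The link's own mode is irrelevant; the web server must read the target.
  chmod -R a+rX "$target"
}
```

For example, run `publish_dir /home/brb/Rmirror/CRAN /var/www/html/CRAN` as root, and likewise for Bioc.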
To test CRAN
Replace the host name arraytools.no-ip.org by IP address 10.133.2.111 if necessary.
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages("glmnet")
We can test whether the backup server is working by installing a package that was removed from CRAN. For example, 'ForImp' was removed from CRAN on 11/8/2012, but I still have a local copy built on R 2.15.2 (rsync run on 11/6/2012).
r <- getOption("repos"); r["CRAN"] <- "http://cran.r-project.org"
r <- c(r, BRB='http://arraytools.no-ip.org/CRAN')
#                        CRAN                            CRANextra                                BRB
# "http://cran.r-project.org" "http://www.stats.ox.ac.uk/pub/RWin" "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages('ForImp')
Note that by default, the CRAN mirror is selected interactively.
> getOption("repos")
    CRAN                            CRANextra
"@CRAN@" "http://www.stats.ox.ac.uk/pub/RWin"
To test Bioconductor
# CRAN part:
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
# Bioconductor part:
options("BioC_mirror" = "http://arraytools.no-ip.org/Bioc")
source("http://bioconductor.org/biocLite.R")
# This source biocLite.R line can be placed either before or after the previous 2 lines
biocLite("aCGH")
If there is a connection problem, check folder attributes.
chmod -R 755 ~/CRAN/bin
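When install.packages() cannot reach the mirror, it helps to check the exact index URLs it asks for. The following sketch prints them for a given repository base and R version; the function name is hypothetical. Per the manual text quoted above, PACKAGES.gz is downloaded in preference to PACKAGES. Newer R versions may also try PACKAGES.rds first.

```shell
#!/bin/sh
# repo_index_urls BASE R_VERSION  (hypothetical helper)
# Print the Windows binary index and the source index of a CRAN-style
# repository, the two places install.packages() looks on Windows.
repo_index_urls() {
  base=${1%/}                       # drop a trailing slash
  xy=$(echo "$2" | cut -d. -f1,2)   # 2.15.2 -> 2.15
  echo "$base/bin/windows/contrib/$xy/PACKAGES.gz"
  echo "$base/src/contrib/PACKAGES.gz"
}
```

For example, `for u in $(repo_index_urls http://arraytools.no-ip.org/CRAN 2.15.2); do curl -sfI "$u" >/dev/null && echo "ok $u" || echo "FAIL $u"; done` shows which index Apache fails to serve.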
- Note that if a binary package was created for R 2.15.1, then it can be installed under R 2.15.1 but not R 2.15.2. The R console will show package xxx is not available (for R version 2.15.2).
- For binary installs, the function also checks for the availability of a source package on the same repository, and reports if the source package has a later version, or is available but no binary version is.
So for example, if the mirror does not have contents under src directory, we need to run the following line in order to successfully run install.packages() function.
options(install.packages.check.source = "no")
- If we only mirror the essential directories, we can run biocLite() successfully. However, the R console gives some warnings:
> biocLite("aCGH")
BioC_mirror: http://arraytools.no-ip.org/Bioc
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'aCGH'
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
trying URL 'http://arraytools.no-ip.org/Bioc/packages/2.11/bioc/bin/windows/contrib/2.15/aCGH_1.36.0.zip'
Content type 'application/zip' length 2431158 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb
package ‘aCGH’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
        C:\Users\limingc\AppData\Local\Temp\Rtmp8IGGyG\downloaded_packages
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
> library()
CRAN repository directory structure
The information below is specific to R 2.15.2. There are linux and macosx subdirectories wherever there is a windows subdirectory.
bin/windows/contrib/2.15
src/contrib
   /contrib/2.15.2
   /contrib/Archive
web/checks
   /dcmeta
   /packages
   /views
A clickable map [1]
Bioconductor repository directory structure
The information below is specific to Bioc 2.11. There are linux and macosx subdirectories wherever there is a windows subdirectory.
bioc/bin/windows/contrib/2.15
    /html
    /install
    /license
    /manuals
    /news
    /src
    /vignettes
data/annotation/bin/windows/contrib/2.15
               /html
               /licenses
               /manuals
               /src
               /vignettes
    /experiment/bin/windows/contrib/2.15
               /html
               /manuals
               /src/contrib
               /vignettes
extra/bin/windows/contrib
     /html
     /src
     /vignettes
List all R packages from CRAN/Bioconductor
Check my daily result based on R 2.15 and Bioc 2.11 in [2]
Parallel Computing
Example code for the book Parallel R by McCallum and Weston.
snowfall package
Cloud Computing
Install R on Amazon EC2
http://randyzwitch.com/r-amazon-ec2/
Big Data Analysis
http://blog.comsysto.com/2013/02/14/my-favorite-community-links/
Useful R packages
RInside
- http://dirk.eddelbuettel.com/code/rinside.html
- http://dirk.eddelbuettel.com/papers/rfinance2010_rcpp_rinside_tutorial_handout.pdf
See my YouTube demo of RInside + Qt.
With RInside + Wt (a C++ web toolkit), we can also create a web application. To run the example in the examples/wt directory:
cd ~/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/wt
make
sudo ./wtdensity --docroot . --http-address localhost --http-port 8080
Then open http://localhost:8080 in a browser to see how it works (a screenshot is here).
Ubuntu
Straightforward. To run the example in the examples/qt directory, we simply need to install Qt with apt-get. I have tested the qtdensity example successfully with Qt 4.
Windows 7
- Install R under C:\ instead of C:\Program Files; otherwise the space in the path breaks the build with an error like g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory.
- Install Rtools.
- Install the RInside package from source (the binary version gives an error).
- Create a DOS batch file that adds the necessary paths to the PATH environment variable:
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\R\R-3.0.1\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
set R_HOME=C:\R\R-3.0.1
echo Setting environment for using R cmd
- In the Windows command prompt:
cd C:\R\R-3.0.1\library\RInside\examples\standard
make -f Makefile.win
Now we can test by running any of the executables that make generates, for example rinside_sample0:
rinside_sample0
As for the qtdensity program, we need to make sure the same version of MinGW was used to build RInside/Rcpp and Qt. See the discussion in:
- http://stackoverflow.com/questions/12280707/using-rinside-with-qt-in-windows
- http://www.mail-archive.com/[email protected]/msg04377.html
Large data (~100 terabytes)
See also HighPerformanceComputing
- RHadoop
- Hive
- MapReduce. Introduction by Linux Journal.
XML
On Ubuntu, we need to install libxml2-dev before we can install the XML package.
sudo apt-get update
sudo apt-get install libxml2-dev
GenOrd: Generate ordinal and discrete variables with given correlation matrix and marginal distributions
rjson
http://heuristically.wordpress.com/2013/05/20/geolocate-ip-addresses-in-r/
RJSONIO
Rcpp
caret
xlsx package
ggplot2
stringr and plyr
http://martinsbioblogg.wordpress.com/2013/03/24/using-r-reading-tables-that-need-a-little-cleaning/
A data.frame is pretty much a list of vectors, so we use plyr to apply over the list and stringr to search and replace in the vectors.
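A base-R sketch of the same idea, swapping plyr/stringr for lapply() and gsub(). The messy table below (footnote markers '*' and stray blanks) is hypothetical and only stands in for the kind of table the post cleans:

```r
# Hypothetical messy table: footnote markers ('*') and stray blanks in the cells.
df <- data.frame(
  gene  = c(" TP53*", "BRCA1 ", "EGFR*"),
  score = c("1.5*", " 2.0", "3.25 "),
  stringsAsFactors = FALSE
)

# Clean one character vector: drop every '*' and trim leading/trailing blanks.
clean_vec <- function(v) {
  v <- gsub("*", "", v, fixed = TRUE)
  gsub("^\\s+|\\s+$", "", v)
}

# A data.frame is a list of vectors, so apply the cleaner column by column;
# assigning to df[] keeps the data.frame structure while replacing its columns.
df[] <- lapply(df, function(col) if (is.character(col)) clean_vec(col) else col)
df$score <- as.numeric(df$score)
str(df)
```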
jpeg
To create the image in this wiki's left-hand panel, we can use the jpeg package to read an existing plot, then edit and save it.
cairoDevice
On Ubuntu, we need to install two libraries:
sudo apt-get install libgtk2.0-dev libcairo2-dev
Different ways of using R
Create HTML 5 web and slides
See here
Create academic report
reports package and github repository. The youtube video gives an overview of the package.
Create Word report
knitr + pandoc
- http://www.r-statistics.com/2013/03/write-ms-word-document-using-r-with-as-little-overhead-as-possible/
- http://www.carlboettiger.info/2012/04/07/writing-reproducibly-in-the-open-with-knitr.html
It is easiest to create the .Rmd file in RStudio, which provides a template for .Rmd files and a quick reference to the R Markdown language.
# Idea:
#        knitr        pandoc
#   rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd")  # Create md and html files
and then
FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))
Note: for some reason, if I run the above two commands several times, knit2html() stops working well. Clicking the 'Knit HTML' button in RStudio makes it work again.
Another way is
library(knitr)
library(pander)
name <- "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
Pandoc.brew(file = paste0(name, ".md"), output = name, convert = "docx")
Once knitr has created an .md file, we can use the pandoc shell command to convert it to other formats:
- A pdf file: pandoc -s report.md -t latex -o report.pdf
- An html file: pandoc -s report.md -o report.html (with the -c flag, a CSS file can be added easily)
- Openoffice: pandoc report.md -o report.odt
- Word docx: pandoc report.md -o report.docx
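The four conversions above can also be driven from R. A small sketch, assuming a markdown file named report.md; pandoc_cmds() is a hypothetical helper (not part of knitr or pander), and the commands are only run when pandoc is found on the PATH via Sys.which():

```r
# Build the pandoc command lines listed above for a given markdown file.
# pandoc_cmds() is a hypothetical helper, not part of knitr or pander.
pandoc_cmds <- function(md = "report.md") {
  base <- sub("\\.md$", "", md)
  c(
    pdf  = paste("pandoc -s", md, "-t latex -o", paste0(base, ".pdf")),
    html = paste("pandoc -s", md, "-o", paste0(base, ".html")),
    odt  = paste("pandoc", md, "-o", paste0(base, ".odt")),
    docx = paste("pandoc", md, "-o", paste0(base, ".docx"))
  )
}

cmds <- pandoc_cmds("report.md")
print(cmds)

# Only run them if pandoc is installed and the input file exists.
if (nzchar(Sys.which("pandoc")) && file.exists("report.md")) {
  for (cmd in cmds) system(cmd)
}
```

Keeping the command construction separate from system() makes it easy to print and check the commands before running them.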
pander
Try pandoc [1] with a minimal reproducible example; you might also give my "pander" package [2] a try:
library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')
The content of the "minimal.brew" file is something you might be used to from Sweave, although it uses "brew" syntax instead. See the pander examples [3] for more details. Note that pandoc must be installed first, which is pretty easy on Windows.
- http://johnmacfarlane.net/pandoc/
- http://rapporter.github.com/pander/
- http://rapporter.github.com/pander/#examples
R2wd
Use the R2wd package. However, it only works with 32-bit R, and sometimes it cannot produce all tables. On 64-bit R we get:
> library(R2wd)
> wdGet()
Loading required package: rcom
Loading required package: rscproxy
rcom requires a current version of statconnDCOM installed.
To install statconnDCOM type
     installstatconnDCOM()

This will download and install the current version of statconnDCOM

You will need a working Internet connection
because installation needs to download a file.
Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() :
  argument is of length zero
The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.
Convert from pdf to word
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert
rtf
Use the rtf package for Rich Text Format (RTF) output.
xtable
Package xtable will produce html output. If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.
R and C/C++ communicate
Call R from C/C++
- Use eval() function. See R-Ext 8.1 and 8.2.
- http://stackoverflow.com/questions/7457635/calling-r-function-from-c
R calls C/C++
- http://faculty.washington.edu/kenrice/sisg-adv/sisg-07.pdf
- http://www.stat.berkeley.edu/scf/paciorek-cppWorkshop.pdf
- http://www.stat.harvard.edu/ccr2005/
- http://www.sfu.ca/~sblay/R-C-interface.txt
SEXP
Use R under proxy
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
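The proxy can also be set from inside R through environment variables. A minimal sketch, assuming a hypothetical proxy at proxy.example.com:8080; per ?download.file these variables should be set before the first download of the session, since later changes may be ignored by the internal method:

```r
# Hypothetical proxy address: replace with your own host and port.
proxy <- "http://proxy.example.com:8080"

# download.file() and url() can take proxy settings from these variables
# (see ?download.file); set them before the first download in the session.
Sys.setenv(http_proxy = proxy, https_proxy = proxy)

Sys.getenv(c("http_proxy", "https_proxy"))
```

To make the setting permanent, the same variables can go into a startup file instead of being set in every session.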
What is the best place to save Rconsole on Windows platform
Put it in the C:/Users/USERNAME/Documents folder so that R always finds my preferences, no matter how R is upgraded or downgraded.
Web scraping
http://www.slideshare.net/schamber/web-data-from-r#btnNext
Launch Rstudio
If multiple versions of R are detected, RStudio may fail to launch: a Java-like clock keeps spinning without stopping. The trick is to hold down the Ctrl key while clicking the RStudio icon. RStudio then shows a list of R versions to choose from.
List files using regular expression
- Extension
list.files(pattern = "\\.txt$")
- Start with
list.files(pattern = "^Something")
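A self-contained illustration of the two patterns above plus two useful arguments, ignore.case and full.names. The scratch directory and file names are made up for the example:

```r
# Create a scratch directory with a few (made-up) files to search.
demo_dir <- file.path(tempdir(), "listdemo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
  c("a.txt", "b.TXT", "notes.txt.bak", "Something1.csv", "xSomething.csv"))))

# Files ending in .txt (case-sensitive; "\\." matches a literal dot).
list.files(demo_dir, pattern = "\\.txt$")
# Same, ignoring case, so b.TXT matches too.
list.files(demo_dir, pattern = "\\.txt$", ignore.case = TRUE)
# Files whose names start with "Something" (xSomething.csv is not matched).
list.files(demo_dir, pattern = "^Something")
# full.names = TRUE returns paths that can be passed straight to read.csv() etc.
list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
```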
Hidden tool: rsync in Rtools
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
a.exe

sent 323142 bytes  received 31 bytes  646346.00 bytes/sec
total size is 1198416  speedup is 3.71

c:\Rtools\bin>
rsync works best when we need to sync a folder.
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/Eula.txt
binary/cherrytree.lnk
binary/depends64.chm
binary/depends64.dll
binary/depends64.exe
binary/mtputty.exe
binary/procexp.chm
binary/procexp.exe
binary/pscp.exe
binary/putty.exe
binary/sqlite3.exe
binary/wget.exe

sent 4115294 bytes  received 244 bytes  1175868.00 bytes/sec
total size is 8036311  speedup is 1.95

c:\Rtools\bin>rm c:\users\limingc\Documents\binary\procexp.exe
cygwin warning:
  MS-DOS style path detected: c:\users\limingc\Documents\binary\procexp.exe
  Preferred POSIX equivalent is: /cygdrive/c/users/limingc/Documents/binary/procexp.exe
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/procexp.exe

sent 1767277 bytes  received 35 bytes  3534624.00 bytes/sec
total size is 8036311  speedup is 4.55

c:\Rtools\bin>
Unfortunately, if the destination is a network drive, I may get a "permission denied (13)" error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows
Install rgdal package on ubuntu
sudo apt-get install libgdal1-dev libproj-dev
R
> install.packages("rgdal")
Embedding R
- See Chapter 8 of the Writing R Extensions manual.
- Talk by Simon Urbanek in UseR 2004.
- https://stat.ethz.ch/pipermail/r-help/attachments/20110729/b7d86ed7/attachment.pl
An Example from Bioconductor Workshop
First make sure that, before running 'make' to build R, R is configured with
./configure --enable-R-shlib
Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ export R_HOME=/home/mli/Downloads/R-2.15.2
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/mli/Downloads/R-2.15.2/lib
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ g++ embed.c -I/home/mli/Downloads/R-2.15.2/include -L/home/mli/Downloads/R-2.15.2/lib -lR
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ R CMD ./a.out
WARNING: ignoring environment value of R_HOME

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

ns> require(stats); require(graphics)

ns> ns(women$height, df = 5)
                 1            2           3          4             5
 [1,] 0.000000e+00 0.000000e+00  0.00000000 0.00000000  0.0000000000
 [2,] 7.592323e-03 0.000000e+00 -0.08670223 0.26010669 -0.1734044626
 [3,] 6.073858e-02 0.000000e+00 -0.15030440 0.45091320 -0.3006088020
 [4,] 2.047498e-01 6.073858e-05 -0.16778345 0.50335034 -0.3355668952
 [5,] 4.334305e-01 1.311953e-02 -0.13244035 0.39732106 -0.2648807067
 [6,] 6.256681e-01 8.084305e-02 -0.07399720 0.22199159 -0.1479943948
 [7,] 6.477162e-01 2.468416e-01 -0.02616007 0.07993794 -0.0532919575
 [8,] 4.791667e-01 4.791667e-01  0.01406302 0.02031093 -0.0135406187
 [9,] 2.468416e-01 6.477162e-01  0.09733619 0.02286023 -0.0152401533
[10,] 8.084305e-02 6.256681e-01  0.27076826 0.06324188 -0.0405213106
[11,] 1.311953e-02 4.334305e-01  0.48059836 0.12526031 -0.0524087186
[12,] 6.073858e-05 2.047498e-01  0.59541597 0.19899261  0.0007809246
[13,] 0.000000e+00 6.073858e-02  0.50097182 0.27551020  0.1627793975
[14,] 0.000000e+00 7.592323e-03  0.22461127 0.35204082  0.4157555879
[15,] 0.000000e+00 0.000000e+00 -0.14285714 0.42857143  0.7142857143
attr(,"degree")
[1] 3
attr(,"knots")
 20%  40%  60%  80%
60.8 63.6 66.4 69.2
attr(,"Boundary.knots")
[1] 58 72
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "ns"     "basis"  "matrix"

ns> summary(fm1 <- lm(weight ~ ns(height, df = 5), data = women))

Call:
lm(formula = weight ~ ns(height, df = 5), data = women)

Residuals:
     Min       1Q   Median       3Q      Max
-0.38333 -0.12585  0.07083  0.15401  0.30426

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         114.7447     0.2338  490.88  < 2e-16 ***
ns(height, df = 5)1  15.9474     0.3699   43.12 9.69e-12 ***
ns(height, df = 5)2  25.1695     0.4323   58.23 6.55e-13 ***
ns(height, df = 5)3  33.2582     0.3541   93.93 8.91e-15 ***
ns(height, df = 5)4  50.7894     0.6062   83.78 2.49e-14 ***
ns(height, df = 5)5  45.0363     0.2784  161.75  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2645 on 9 degrees of freedom
Multiple R-squared: 0.9998,  Adjusted R-squared: 0.9997
F-statistic:  9609 on 5 and 9 DF,  p-value: < 2.2e-16

ns> ## example of safe prediction
ns> plot(women, xlab = "Height (in)", ylab = "Weight (lb)")
ns> ht <- seq(57, 73, length.out = 200)
ns> lines(ht, predict(fm1, data.frame(height=ht)))
ns> ## Don't show:
ns> ## Consistency:
ns> x <- c(1:3,5:6)
ns> stopifnot(identical(ns(x), ns(x, df = 1)),
ns+           identical(ns(x, df=2), ns(x, df=2, knots=NULL)),# not true till 2.15.2
ns+           !is.null(kk <- attr(ns(x), "knots")),# not true till 1.5.1
ns+           length(kk) == 0)
ns> ## End Don't show
ns>
ns>
ns>
The above result can be compared with running
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ R
WARNING: ignoring environment value of R_HOME
R version 2.15.2 (2012-10-26) -- "Trick or Treat" Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
> library(splines)
> example("ns")
ns> require(stats); require(graphics)
ns> ns(women$height, df = 5)
                 1            2           3          4             5
 [1,] 0.000000e+00 0.000000e+00  0.00000000 0.00000000  0.0000000000
 [2,] 7.592323e-03 0.000000e+00 -0.08670223 0.26010669 -0.1734044626
 [3,] 6.073858e-02 0.000000e+00 -0.15030440 0.45091320 -0.3006088020
 [4,] 2.047498e-01 6.073858e-05 -0.16778345 0.50335034 -0.3355668952
 [5,] 4.334305e-01 1.311953e-02 -0.13244035 0.39732106 -0.2648807067
 [6,] 6.256681e-01 8.084305e-02 -0.07399720 0.22199159 -0.1479943948
 [7,] 6.477162e-01 2.468416e-01 -0.02616007 0.07993794 -0.0532919575
 [8,] 4.791667e-01 4.791667e-01  0.01406302 0.02031093 -0.0135406187
 [9,] 2.468416e-01 6.477162e-01  0.09733619 0.02286023 -0.0152401533
[10,] 8.084305e-02 6.256681e-01  0.27076826 0.06324188 -0.0405213106
[11,] 1.311953e-02 4.334305e-01  0.48059836 0.12526031 -0.0524087186
[12,] 6.073858e-05 2.047498e-01  0.59541597 0.19899261  0.0007809246
[13,] 0.000000e+00 6.073858e-02  0.50097182 0.27551020  0.1627793975
[14,] 0.000000e+00 7.592323e-03  0.22461127 0.35204082  0.4157555879
[15,] 0.000000e+00 0.000000e+00 -0.14285714 0.42857143  0.7142857143
attr(,"degree")
[1] 3
attr(,"knots")
 20%  40%  60%  80%
60.8 63.6 66.4 69.2
attr(,"Boundary.knots")
[1] 58 72
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "ns"     "basis"  "matrix"
ns> summary(fm1 <- lm(weight ~ ns(height, df = 5), data = women))
Call: lm(formula = weight ~ ns(height, df = 5), data = women)
Residuals:
Min 1Q Median 3Q Max
-0.38333 -0.12585 0.07083 0.15401 0.30426
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)         114.7447     0.2338  490.88  < 2e-16 ***
ns(height, df = 5)1  15.9474     0.3699   43.12 9.69e-12 ***
ns(height, df = 5)2  25.1695     0.4323   58.23 6.55e-13 ***
ns(height, df = 5)3  33.2582     0.3541   93.93 8.91e-15 ***
ns(height, df = 5)4  50.7894     0.6062   83.78 2.49e-14 ***
ns(height, df = 5)5  45.0363     0.2784  161.75  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2645 on 9 degrees of freedom
Multiple R-squared: 0.9998,  Adjusted R-squared: 0.9997
F-statistic:  9609 on 5 and 9 DF,  p-value: < 2.2e-16
ns> ## example of safe prediction
ns> plot(women, xlab = "Height (in)", ylab = "Weight (lb)")
ns> ht <- seq(57, 73, length.out = 200)
ns> lines(ht, predict(fm1, data.frame(height=ht)))
ns> ## Don't show:
ns> ## Consistency:
ns> x <- c(1:3,5:6)
ns> stopifnot(identical(ns(x), ns(x, df = 1)),
ns+           identical(ns(x, df=2), ns(x, df=2, knots=NULL)),# not true till 2.15.2
ns+           !is.null(kk <- attr(ns(x), "knots")),# not true till 1.5.1
ns+           length(kk) == 0)
ns> ## End Don't show
ns>
ns>
ns>
Note that if I follow the instructions and put embed.c at the end of the g++ command, I get an error. The GNU linker processes its arguments in order, so -lR has to come after the file that uses it:
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ g++ -I/home/mli/Downloads/R-2.15.2/include -L/home/mli/Downloads/R-2.15.2/lib -lR embed.c
/tmp/cc7Vum5j.o: In function `main':
embed.c:(.text+0x1c): undefined reference to `Rf_initEmbeddedR'
embed.c:(.text+0x2b): undefined reference to `Rf_endEmbeddedR'
/tmp/cc7Vum5j.o: In function `doSplinesExample()':
embed.c:(.text+0x45): undefined reference to `Rf_mkString'
embed.c:(.text+0x52): undefined reference to `Rf_install'
embed.c:(.text+0x5d): undefined reference to `Rf_lang2'
embed.c:(.text+0x6d): undefined reference to `Rf_protect'
embed.c:(.text+0x74): undefined reference to `R_GlobalEnv'
embed.c:(.text+0x87): undefined reference to `R_tryEval'
embed.c:(.text+0x91): undefined reference to `Rf_unprotect'
embed.c:(.text+0x9b): undefined reference to `Rf_ScalarLogical'
embed.c:(.text+0xa8): undefined reference to `Rf_install'
embed.c:(.text+0xb3): undefined reference to `Rf_lang2'
embed.c:(.text+0xc3): undefined reference to `Rf_protect'
embed.c:(.text+0xcd): undefined reference to `Rf_install'
embed.c:(.text+0xdc): undefined reference to `CDR'
embed.c:(.text+0xe7): undefined reference to `SET_TAG'
embed.c:(.text+0xee): undefined reference to `R_GlobalEnv'
embed.c:(.text+0x102): undefined reference to `R_tryEval'
embed.c:(.text+0x10c): undefined reference to `Rf_unprotect'
embed.c:(.text+0x116): undefined reference to `Rf_mkString'
embed.c:(.text+0x123): undefined reference to `Rf_install'
embed.c:(.text+0x12e): undefined reference to `Rf_lang2'
embed.c:(.text+0x13e): undefined reference to `Rf_protect'
embed.c:(.text+0x145): undefined reference to `R_GlobalEnv'
embed.c:(.text+0x158): undefined reference to `R_tryEval'
embed.c:(.text+0x162): undefined reference to `Rf_unprotect'
collect2: ld returned 1 exit status
RApache
Rserve
(Commercial) StatconnDcom
R.NET
RJava
RCaller
Set up Emacs on Windows
Edit the file C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el with something like
(setq-default inferior-R-program-name "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")
Database
RMySQL
RSQLite
Not suitable for client/server architecture. The limit is quite large; see here.
Github
R source
- https://github.com/wch/r-source/ Updated daily and worth visiting often; click the 1000+ commits link to see the daily changes.
- https://github.com/SurajGupta/r-source (update for each R release)
github
https://github.com/languages/R
My collection
- https://github.com/arraytools
- https://gist.github.com/4383351 heatmap using leukemia data
- https://gist.github.com/4382774 heatmap using sequential data
- https://gist.github.com/4484270 biocLite
How to download
Clone ~ Download.
- Command line
git clone https://gist.github.com/4484270.git
This will create a subdirectory called '4484270' with all cloned files there.
- Within R
library(devtools)
source_gist("4484270")
Or first download the JSON file from
https://api.github.com/users/MYUSERLOGIN/gists
and then
library(RJSONIO)
x <- fromJSON("~/Downloads/gists.json")
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
  cmd <- paste0("git clone https://gist.github.com/", x, ".git")
  system(cmd)
})
Tricks
Query about an R package
packageDescription()
packageVersion()
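Both functions take a package name. A short illustration using the base stats package, chosen only because it is always installed:

```r
# Version as a package_version object, which compares correctly with >=, <, ...
v <- packageVersion("stats")
v
v >= "2.15.0"

# The whole DESCRIPTION file as a list-like object; fields are accessed by name.
d <- packageDescription("stats")
d$Package
d$Version

# A single field can also be requested directly.
packageDescription("stats", fields = "Version")
```

Comparing package_version objects (rather than the version strings) avoids surprises such as "2.9" sorting after "2.15" as text.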
Editor
http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs
- Rstudio - editor/R terminal/R graphics/file browser/package manager.
- geany - I like the feature that it shows defined functions on the side panel even for R code.
- Rgedit which includes a feature of splitting screen into two panes and run R in the bottom panel. See here.
- Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html
Create a new R package, namespace, documentation
Create R package from R code with roxyPackage
http://lamages.blogspot.com/2013/03/create-r-package-from-single-r-file.html
llply() from plyr package
llply() is equivalent to lapply() except that it preserves labels and can display a progress bar. This is handy for long-running jobs such as:
LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])
where rLLID is a list of Entrez Gene IDs. For example,
get("org.Hs.egGO")[["6772"]]
returns a list of 49 GOs.
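For comparison, a rough base-R sketch of llply()'s progress bar, built on utils::txtProgressBar(). The helper name lapply_pb() is hypothetical; like lapply() and llply(), it keeps the names of the input list:

```r
# lapply() with a text progress bar; lapply_pb() is a hypothetical helper.
lapply_pb <- function(X, FUN, ...) {
  if (length(X) == 0) return(lapply(X, FUN, ...))   # txtProgressBar needs max > min
  pb <- utils::txtProgressBar(min = 0, max = length(X), style = 3)
  on.exit(close(pb))
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    out[i] <- list(FUN(X[[i]], ...))   # list() wrapper so a NULL result is kept
    utils::setTxtProgressBar(pb, i)
  }
  names(out) <- names(X)               # keep the labels of X
  out
}

res <- lapply_pb(list(a = 1:3, b = 4:6), sum)
res
```

With the GO example above, the call would look like lapply_pb(rLLID, function(x) get("org.Hs.egGO")[[x]]).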
mclapply() from the parallel package is a multi-core version of lapply().
Note that Windows cannot take advantage of it: mclapply() relies on forking, which is not available on Windows.
On Windows, an alternative is the parLapply() function in the parallel package.
library(parallel)
ncores <- as.integer( Sys.getenv('NUMBER_OF_PROCESSORS') )
cl <- makeCluster(getOption("cl.cores", ncores))
LLID2GOIDs2 <- parLapply(cl, rLLID, function(x) {
  library(org.Hs.eg.db); get("org.Hs.egGO")[[x]]
})
stopCluster(cl)
It works: it cut the computing time from 100 sec to 29 sec on 4 cores.
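A sketch of a portable wrapper that picks mclapply() on Unix-alikes and parLapply() on Windows. The name par_lapply() is hypothetical, and detectCores() is used in place of the NUMBER_OF_PROCESSORS variable, which only exists on Windows:

```r
library(parallel)

# Hypothetical helper: fork-based mclapply() where available,
# a socket cluster with parLapply() on Windows.
par_lapply <- function(X, FUN, cores = max(1L, detectCores() - 1L, na.rm = TRUE)) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))   # always release the worker processes
    parLapply(cl, X, FUN)
  } else {
    mclapply(X, FUN, mc.cores = cores)
  }
}

sq <- par_lapply(1:8, function(i) i^2)
unlist(sq)
```

On Windows, remember that each worker is a fresh R session, so packages used inside FUN must be loaded there (as the library(org.Hs.eg.db) call above does).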
regular expression
- ?regexpr in R
- http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf
- http://www.johndcook.com/r_language_regex.html
- http://en.wikibooks.org/wiki/R_Programming/Text_Processing#Regular_Expressions
- http://www.endmemo.com/program/R/grep.php
- http://ucfagls.wordpress.com/2012/08/15/processing-sample-labels-using-regular-expressions-in-r/
- http://www.dummies.com/how-to/content/how-to-use-regular-expressions-in-r.html
- http://www.r-bloggers.com/example-8-27-using-regular-expressions-to-read-data-with-variable-number-of-words-in-a-field/
- http://www.r-bloggers.com/using-regular-expressions-in-r-case-study-in-cleaning-a-bibtex-database/
- http://cbio.ensmp.fr/~thocking/papers/2011-08-16-directlabels-and-regular-expressions-for-useR-2011/2011-useR-named-capture-regexp.pdf
- http://stackoverflow.com/questions/5214677/r-find-the-last-dot-in-a-string
- http://stackoverflow.com/questions/10294284/remove-all-special-characters-from-a-string-in-r
Not specific to R
- http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
- http://opencompany.org/download/regex-cheatsheet.pdf
Example
- grep("\\.zip$", pkgs) or grep("\\.tar\\.gz$", pkgs)
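A few of the tasks from the links above, run on a made-up vector of file names (pkgs here is hypothetical). The last two lines show why every literal dot should be escaped:

```r
pkgs <- c("aCGH_1.36.0.zip", "affxparser_1.30.2.tar.gz", "README", "foo.tar.gzip")

# Package archives; '$' anchors the match at the end of the string.
grep("\\.zip$", pkgs, value = TRUE)
grep("\\.tar\\.gz$", pkgs, value = TRUE)

# Strip the version and extension to get the package name.
sub("_.*$", "", pkgs[1:2])

# Position of the last dot in a string.
regexpr("\\.[^.]*$", "affxparser_1.30.2.tar.gz")

# Remove all special (non-alphanumeric) characters.
gsub("[^[:alnum:]]", "", "a-b_c.d!e")

# An unescaped '.' matches any character, so this also matches "tar_gz".
grepl("\\.tar.gz$", "odd.tar_gz")
grepl("\\.tar\\.gz$", "odd.tar_gz")
```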
Clipboard
source("clipboard")
read.table("clipboard")
read/manipulate binary data
- x <- readBin(fn, raw(), file.info(fn)$size)
- rawToChar(x[1:16])
- See Biostrings C API
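A self-contained round trip for the two bullets above. The file layout (a 16-byte ASCII header followed by two 4-byte little-endian integers) is made up for the example:

```r
# Write a small binary file: a 16-byte ASCII header, then two 4-byte integers.
fn <- tempfile(fileext = ".bin")
con <- file(fn, "wb")
writeBin(charToRaw("MYHEADER-v1.0..."), con)          # exactly 16 bytes
writeBin(c(42L, 7L), con, size = 4, endian = "little")
close(con)

# Read the whole file as raw bytes.
x <- readBin(fn, raw(), file.info(fn)$size)
length(x)             # 16 header bytes + 2 * 4 integer bytes = 24
rawToChar(x[1:16])    # decode the header

# readBin() also accepts a raw vector, so the integers can be decoded in place.
readBin(x[17:24], integer(), n = 2, size = 4, endian = "little")
```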
read/download/source a file from internet
- Simple text file
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
- gzip-compressed file
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
- Google drive file based on https
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))
- Github files https
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
Create publication tables using tables package
See p. 13 of http://www.ianwatson.com.au/stata/tabout_tutorial.pdf for an example.
R's tables package is the best solution. For example,
> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
+          (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )

                Sepal.Length      Sepal.Width
 Species    n   mean         sd   mean        sd
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
and
# This example shows some of the less common options
> Sex <- factor(sample(c("Male", "Female"), 100, rep=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, rep=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
+   s <- format(x, digits=2)
+   even <- ((1:length(s)) %% 2) == 0
+   s[even] <- sprintf("(%s)", s[even])
+   s
+ }
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )

                  Status
 Sex    Statistic high   low    medium
 Female mean      4.88   4.96   5.17
        sd        (1.20) (0.82) (1.35)
 Male   mean      4.45   4.31   5.05
        sd        (1.01) (0.93) (0.75)
See also a collection of R packages related to reproducible research in http://cran.r-project.org/web/views/ReproducibleResearch.html
Create flat tables in R console using ftable()
> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8
        am   0  1  0  1  0  1
vs gear
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
>
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
+        dnn = c("Cylinders", "V/S", "Transmission", "Gears"))
          Cylinders     4     6     8
          Transmission  0  1  0  1  0  1
V/S Gears
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0
Handling length 2^31 and more in R 3.0.0
From the R NEWS for the 3.0.0 release:
There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.
In R 2.15.2, trying to create a vector of length 2^31 gives an error:
> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector
However, it works in R 3.0.0 (tested on my 64-bit Ubuntu machine with 16 GB RAM, using an R I compiled myself):
> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
>
Note:
- A vector of length 2^31 has about 2 billion elements. Stored as doubles (8 bytes each), it takes about 16 GB (2^31*8/2^20 = 16384 MB) of memory.
- On Windows, it is almost impossible to work with data of length 2^31 if the machine has less than 16 GB of memory, because virtual memory (paging to disk) on Windows does not work well. For example, when I tested on my 12 GB Windows 7 machine, the whole system froze for several minutes until I forced a power-off.
- My slides at http://goo.gl/g7sGX show screenshots of running the above command on my Ubuntu and RHEL machines. As you can see, Linux handles data larger than system RAM pretty well. That said, as long as your Linux system is 64-bit, you can likely work with large data without too much pain.
- For large datasets, it makes sense to use a database or specially crafted packages like bigmemory or ff.
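The numbers in the notes above can be checked without allocating anything large. A small sketch: the old length limit is the largest R integer, and the memory estimate is plain arithmetic on 8-byte doubles:

```r
# Largest R integer (32-bit), i.e. the old limit on vector length.
.Machine$integer.max                  # 2147483647
.Machine$integer.max == 2^31 - 1      # TRUE

# 2^31 does not fit in an integer, so such indices have to be doubles.
suppressWarnings(as.integer(2^31))    # NA
typeof(2^31)                          # "double"

# Memory for a double vector of length 2^31 (8 bytes per element).
n <- 2^31
n * 8 / 2^20                          # 16384 (MB)
n * 8 / 2^30                          # 16 (GB)
```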
NA in index
- Question: what is seq(1, 3)[c(1, 2, NA)]?
Answer: 1 2 NA. The NA index keeps its position, and the result has NA for that element.
- Question: What is TRUE & NA?
Answer: NA
- Question: What is FALSE & NA?
Answer: FALSE
- Question: c("A", "B", NA) != "" ?
Answer: TRUE TRUE NA
- Question: which(c("A", "B", NA) != "") ?
Answer: 1 2
- Question: c(1, 2, NA) != "" & !is.na(c(1, 2, NA)) ?
Answer: TRUE TRUE FALSE
- Question: c("A", "B", NA) != "" & !is.na(c("A", "B", NA)) ?
Answer: TRUE TRUE FALSE
Conclusion: To exclude empty strings or NAs from numeric or character data, we can use which() or a convenience function keep.complete <- function(x) x != "" & !is.na(x). The latter is guaranteed to return logical values without any NAs.
Don't use only one of x != "" or !is.na(x).
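The answers above can be verified directly. A short check of keep.complete() against the pitfalls it avoids; the vectors x_chr and x_num are made up for the illustration:

```r
keep.complete <- function(x) x != "" & !is.na(x)

x_chr <- c("A", "B", NA, "")
x_num <- c(1, 2, NA)

keep.complete(x_chr)          # TRUE TRUE FALSE FALSE
x_chr[keep.complete(x_chr)]   # "A" "B"
keep.complete(x_num)          # TRUE TRUE FALSE

# The pitfalls from the questions above:
x_chr != ""                   # TRUE TRUE NA FALSE: an NA sneaks in
x_chr[x_chr != ""]            # "A" "B" NA: the NA index returns an NA element
which(x_chr != "")            # 1 2: which() drops NAs, so it is also safe
```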
Creating publication quality graphs in R
with() and within() functions
within() is similar to with() except that it is used to create new columns and merge them with the original data set. See the YouTube video.
closePr <- with(mariokart, totalPr - shipPr)
head(closePr, 20)

mk <- within(mariokart, {
  closePr <- totalPr - shipPr
})
head(mk) # new column closePr

mk <- mariokart
aggregate(. ~ wheels + cond, mk, mean) # create mean according to each level of (wheels, cond)
aggregate(totalPr ~ wheels + cond, mk, mean)
tapply(mk$totalPr, mk[, c("wheels", "cond")], mean)
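Since mariokart is not one of R's built-in data sets, here is the same with()/within()/aggregate() pattern sketched on the built-in airquality data. The new column TempC (temperature in Celsius) is only an illustration:

```r
# with(): evaluate an expression using the columns; returns a plain vector.
tempC <- with(airquality, (Temp - 32) * 5 / 9)
head(tempC)

# within(): same expression, but the result is added as a new column.
aq <- within(airquality, {
  TempC <- (Temp - 32) * 5 / 9
})
head(aq)

# Mean temperature per month, two equivalent ways.
aggregate(Temp ~ Month, data = aq, FUN = mean)
tapply(aq$Temp, aq$Month, mean)
```

aggregate() returns a data.frame, which is convenient for further processing, while tapply() returns a (possibly multi-way) array like the one in the mariokart example.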
Draw Color Palette
Read excel files
In my experience, it is best to save the Excel file as a CSV file first (if possible) and then read it into R.
Serialization
If we want to pass an R object to a C/C++ process (which reads it with the recv() function), we can use writeBin() to send the stream size first and then use serialize() to write the stream itself to the connection. See the post on the R mailing list.
> a <- list(1,2,3)
> a_serial <- serialize(a, NULL)
> a_length <- length(a_serial)
> a_length
[1] 70
> writeBin(as.integer(a_length), connection, endian="big")
> serialize(a, connection)
In the C++ process, I first receive one int to get the length, and then read <length> bytes from the connection.
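The length-prefixed framing can be tested in a single R session by writing to a plain binary file instead of a socket (an assumption made only so the sketch is self-contained). The reader side mimics the C++ process: read a 4-byte big-endian length, then exactly that many bytes. Here the already-serialized raw vector is written with writeBin(); the exact length (70 in the transcript above) depends on the serialization format of the R version:

```r
a <- list(1, 2, 3)
fn <- tempfile()

# Writer: 4-byte big-endian length, then the serialized bytes.
a_serial <- serialize(a, NULL)
con <- file(fn, "wb")
writeBin(as.integer(length(a_serial)), con, size = 4, endian = "big")
writeBin(a_serial, con)
close(con)

# Reader: get the length first, then exactly that many bytes.
con <- file(fn, "rb")
n <- readBin(con, integer(), n = 1, size = 4, endian = "big")
bytes <- readBin(con, raw(), n = n)
close(con)

b <- unserialize(bytes)
identical(a, b)
```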
socketConnection
See ?socketConnection.
Simple example
from the socketConnection's manual.
Open one R session
con1 <- socketConnection(port = 22131, server = TRUE) # wait until a connection from some client
writeLines(LETTERS, con1)
close(con1)
Open another R session (client)
con2 <- socketConnection(Sys.info()["nodename"], port = 22131)
# as non-blocking, may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
   Sys.sleep(1)
   z <- readLines(con2)
   if(length(z)) print(z)
}
close(con2)
Use nc in client
The client does not have to be R; we can use telnet, nc, etc. See the post here. For example, on the client machine, we can issue
nc localhost 22131 [ENTER]
Then the client will wait and show anything written from the server machine. The connection from nc will be terminated once close(con1) is given.
If I use the command
nc -v -w 2 localhost -z 22130-22135
then the connection is established only briefly, which means the cursor on the server machine is returned. If we issue the above nc command again on the client machine, it shows that the connection to port 22131 is refused. PS. The "-w" switch sets the timeout, in seconds, for connects and final net reads.
A post I haven't had a chance to read yet: http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html
Use curl command in client
On the server,
con1 <- socketConnection(port = 8080, server = TRUE)
On the client,
curl --trace-ascii debugdump.txt http://localhost:8080/
Then go to the server,
while(nchar(x <- readLines(con1, 1)) > 0) cat(x, "\n")
close(con1) # return cursor in the client machine
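The loop above stops at the first empty line, which is where the headers of an HTTP request end. If the client disconnects first, readLines() returns character(0), nchar() gives a zero-length result, and while() then fails with "argument is of length zero". A slightly more defensive sketch (read_http_header() is a hypothetical helper), tested on a textConnection() instead of a socket. Note that with a non-blocking socket (the socketConnection() default), character(0) can also just mean that no data has arrived yet, so opening the server with blocking = TRUE may be preferable:

```r
# Read lines until the blank line that ends an HTTP header block, or until EOF.
# readLines() accepts LF, CRLF or CR line endings, so "\r\n" lines are handled.
read_http_header <- function(con) {
  header <- character()
  repeat {
    x <- readLines(con, n = 1)
    if (length(x) == 0 || nchar(x) == 0) break   # EOF or end of headers
    header <- c(header, x)
  }
  header
}

req <- textConnection(c("GET / HTTP/1.1", "Host: localhost:8080",
                        "User-Agent: curl", "", "body-not-read"))
h <- read_http_header(req)
close(req)
h
```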
Use telnet command in client
On the server,
con1 <- socketConnection(port = 8080, server = TRUE)
On the client,
sudo apt-get install telnet
telnet localhost 8080
abcdefg
hijklmn
qestst
Go to the server,
readLines(con1, 1)
readLines(con1, 1)
readLines(con1, 1)
close(con1) # return cursor in the client machine
Here is a tutorial on using telnet for HTTP requests, and here is a summary of telnet usage.