R
Install Rtools for Windows users
See http://goo.gl/gYh6C for step-by-step instructions with screenshots.
My preferred way is to leave the option for setting the PATH environment variable unchecked and to add the following to PATH manually (based on Rtools v3.0):
c:\Rtools\bin; c:\Rtools\gcc-4.6.3\bin; C:\Program Files\R\R-2.15.2\bin\i386;
We can make our life easier by creating a file <Rcommand.bat> with the content below (also useful if you have C:\cygwin\bin in your PATH, although the cygwin setup will not add it automatically for you).
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\Program Files\R\R-2.15.2\bin\i386;%PATH%
rem cmd.exe has no backtick substitution, so capture the Rscript output with for /f
for /f "delims=" %%i in ('Rscript -e "Rcpp:::LdFlags()"') do set PKG_LIBS=%%i
for /f "delims=" %%i in ('Rscript -e "Rcpp:::CxxFlags()"') do set PKG_CPPFLAGS=%%i
echo Setting environment for using R cmd
PS. I put <Rcommand.bat> under the C:\Program Files\R folder and create a desktop shortcut called 'Rcmd'. In the shortcut, I enter C:\Windows\System32\cmd.exe /K "Rcommand.bat" in the Target field and "C:\Program Files\R" in the Start in field.
So we can open the Command Prompt anywhere and run <Rcommand.bat> to get all environment variables ready. On Windows Vista, 7 and 8, we need to run it as administrator, or we can change the file's security properties so that the current user has execute permission.
Windows Toolset
Note that R on Windows supports Mingw-w64 (not Mingw which is a separate project). See here for the issue of developing a Qt application that links against R using Rcpp. And http://qt-project.org/wiki/MinGW is the wiki for compiling Qt using MinGW and MinGW-w64.
Build R from its source
Reference: http://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-Windows
Download tcl file from http://www.stats.ox.ac.uk/pub/Rtools/R_Tcl_8-5-8.zip Unzip and put 'Tcl' into R_HOME folder.
"Open a command prompt as Administrator"
The steps below assume we have created a directory C:\R and placed the source tarball in it.
set PATH=c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
cd C:\R
tar xzvf R-3.0.2.tar.gz
cd R-3.0.2\src\gnuwin32
set TMPDIR=C:/tmp
make all recommended
This builds R (3.0.2) and all recommended packages.
See also Create_a_standalone_Rmath_library below for how to create and use a standalone Rmath library in your own C/C++/Fortran program. For example, if you want the 95th percentile of a t distribution or need to generate a batch of random variates, you don't need to search the internet for a library; you can just use the Rmath library.
Compile and install an R package
cd C:\Documents and Settings\brb
wget http://www.bioconductor.org/packages/2.11/bioc/src/contrib/affxparser_1.30.2.tar.gz
C:\progra~1\r\r-2.15.2\bin\R CMD INSTALL --build affxparser_1.30.2.tar.gz
Helpful - check Chapter 6 of R Installation and Administration
Check/Upload to CRAN
http://win-builder.r-project.org/
64 bit toolchain
See January 2010 email https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html and R-Admin manual.
Since R 2.11.0 there has been a 64-bit Windows binary for R.
Install R using binary package on Linux OS
Ubuntu/Debian
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo nano /etc/apt/sources.list
# For Ubuntu 14.04 (codename is trusty)
# deb http://cran.rstudio.com/bin/linux/ubuntu trusty/
sudo apt-get update
sudo apt-get install r-base
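The repository line added to sources.list depends only on the mirror and the release codename, so it can be generated instead of typed by hand. The sketch below is illustrative: the function name cran_deb_line is hypothetical, and the mirror URL and the "trusty" codename are simply the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: print the apt source line for a CRAN mirror.
# $1 = Ubuntu codename (e.g. trusty), $2 = mirror base URL (optional).
cran_deb_line() {
    codename="$1"
    mirror="${2:-http://cran.rstudio.com}"
    printf 'deb %s/bin/linux/ubuntu %s/\n' "$mirror" "$codename"
}

# Show the line for Ubuntu 14.04.
cran_deb_line trusty

# To apply it instead of editing sources.list by hand (not run here; needs root):
#   cran_deb_line trusty | sudo tee /etc/apt/sources.list.d/cran.list
```

Writing the line to its own file under /etc/apt/sources.list.d/ keeps the main sources.list untouched, which makes the change easy to undo.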
Redhat el6
It should be pretty easy to install via the EPEL: http://fedoraproject.org/wiki/EPEL
Just follow the instructions to enable the EPEL and then from the CLI as root:
yum install R
or via sudo:
sudo yum install R
Install R from source (ix86, x86_64 and arm platforms, Linux system)
Debian system (focus on arm architecture with notes from x86 system)
Simplest configuration
On my Debian systems on a Pogoplug (armv5), Raspberry Pi (armv6), and Beaglebone Black & Udoo (armv7), I can compile R. See R's admin manual. If X11 is not needed, I just need to install 2 required packages.
- install gfortran: apt-get install build-essential gfortran (gfortran is not part of build-essential)
- install readline library: apt-get install libreadline5-dev (pogoplug), apt-get install libreadline6-dev (raspberry pi/BBB), apt-get install libreadline-dev (Ubuntu)
Note: if I need X11, I should install
- libX11 and libX11-devel, libXt, libXt-devel (for fedora)
- libx11-dev (for debian) or xorg-dev (for pogoplug/raspberry pi/BBB)
and optional
- texinfo (to fix 'WARNING: you cannot build info or HTML versions of the R manuals')
Alternatively, it is safe to install all required tools with the command below (first run nano /etc/apt/sources.list to add the repository of your favorite R mirror, then run sudo apt-get update).
sudo apt-get build-dep r-base
The above command installs R's build dependencies such as a JDK, Tcl, TeX, etc. For some reason, apt-get build-dep gives a more complete list than apt-get install r-base-dev.
[Arm architecture] I also run apt-get install readline-common. I don't know if this is necessary. If x11 is not needed or not available (eg Pogoplug), I can add --with-x=no option in ./configure command. If R will be called from other applications such as Rserve, I can add --enable-R-shlib option in ./configure command. Check out ./configure --help to get a complete list of all options.
After running
wget http://cran.r-project.org/src/base/R-2/R-2.15.2.tar.gz
tar xzvf R-2.15.2.tar.gz
cd R-2.15.2
./configure --enable-R-shlib
(--enable-R-shlib option will create a shared R library libR.so in $RHOME/lib subdirectory. This allows R to be embedded in other applications. See Embedding R.) I got
R is now configured for armv5tel-unknown-linux-gnueabi

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O2

  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:

  Interfaces supported:
  External libraries:        readline
  Additional capabilities:   NLS
  Options enabled:           shared R library, shared BLAS, R profiling

  Recommended packages:      yes

configure: WARNING: you cannot build info or HTML versions of the R manuals
configure: WARNING: you cannot build PDF versions of the R manuals
configure: WARNING: you cannot build PDF versions of vignettes and help pages
configure: WARNING: I could not determine a browser
configure: WARNING: I could not determine a PDF viewer
After that, we can run make to build the R binary. If the computer has multiple cores, we can run make in parallel with the -j flag (for example, '-j 4' runs 4 jobs simultaneously). We can also put the time command in front of make to report the build time (useful for benchmarking).
make
# make -j 4
# time make
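Rather than hard-coding the job count, it can be derived from the number of processors on the build machine. This is a minimal sketch, assuming either nproc or getconf is available; it only prints the command, and the variable names JOBS and BUILD_CMD are illustrative.

```shell
#!/bin/sh
# Pick a job count from the number of online processors, falling back to 1.
if command -v nproc >/dev/null 2>&1; then
    JOBS=$(nproc)
else
    JOBS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
fi

BUILD_CMD="make -j $JOBS"
echo "Would run: time $BUILD_CMD"

# In the unpacked and configured R source directory, run instead:
#   time make -j "$JOBS"
```

On a small ARM board with little memory, a lower job count than the core count may be the safer choice.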
PS 1. On my raspberry pi machine, it shows R is now configured for armv6l-unknown-linux-gnueabihf and on Beaglebone black it shows R is now configured for armv7l-unknown-linux-gnueabihf.
PS 2. On my Beaglebone Black, 'make' took 2 hours, whereas 'make -j 12' took only 5 minutes on my Xeon W3690 @ 3.47GHz (6 cores with hyper-threading). Both timings are for R 3.1.2 and were obtained with the 'time' command as described above.
PS 3. On my x86 system, it shows
R is now configured for x86_64-unknown-linux-gnu

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O2

  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:

  Interfaces supported:      X11, tcltk
  External libraries:        readline, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java

  Recommended packages:      yes
[arm] However, make gave errors for recommended packages such as KernSmooth, MASS, boot, class, cluster, codetools, foreign, lattice, mgcv, nlme, nnet, rpart, spatial, and survival. The error stems from
gcc: SHLIB_LIBADD: No such file or directory
I get the same error message even when I try install.packages("MASS", type="source"). A suggested fix is here: add perl = TRUE to the sub() call on two lines of the src/library/tools/R/install.R file. However, I then got another error, shared object 'MASS.so' not found. See also http://ftp.debian.org/debian/pool/main/r/r-base/. To build R without the recommended packages, run ./configure --without-recommended-packages.
make[1]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[2]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
begin installing recommended package MASS
* installing *source* package 'MASS' ...
** libs
make[3]: Entering directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c MASS.c -o MASS.o
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c lqs.c -o lqs.o
gcc -std=gnu99 -shared -L/usr/local/lib -o MASSSHLIB_EXT MASS.o lqs.o SHLIB_LIBADD -L/mnt/usb/R-2.15.2/lib -lR
gcc: SHLIB_LIBADD: No such file or directory
make[3]: *** [MASSSHLIB_EXT] Error 1
make[3]: Leaving directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
ERROR: compilation failed for package 'MASS'
* removing '/mnt/usb/R-2.15.2/library/MASS'
make[2]: *** [MASS.ts] Error 1
make[2]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make: *** [stamp-recommended] Error 2
root@debian:/mnt/usb/R-2.15.2#
root@debian:/mnt/usb/R-2.15.2# bin/R

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: armv5tel-unknown-linux-gnueabi (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(MASS)
Error in library(MASS) : there is no package called 'MASS'
> library()
Packages in library '/mnt/usb/R-2.15.2/library':

base        The R Base Package
compiler    The R Compiler Package
datasets    The R Datasets Package
grDevices   The R Graphics Devices and Support for Colours and Fonts
graphics    The R Graphics Package
grid        The Grid Graphics Package
methods     Formal Methods and Classes
parallel    Support for Parallel computation in R
splines     Regression Spline Functions and Classes
stats       The R Stats Package
stats4      Statistical Functions using S4 Classes
tcltk       Tcl/Tk Interface
tools       Tools for Package Development
utils       The R Utils Package
> Sys.info()["machine"]
   machine
"armv5tel"
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170369  4.6     350000  9.4   350000  9.4
Vcells 163228  1.3     905753  7.0   784148  6.0
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679180
PS 4. The complete log of building R from source is in here File:Build R log.txt
Full configuration
  Interfaces supported:      X11, tcltk
  External libraries:        readline
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java
Update: R 3.0.1 on Beaglebone Black (armv7a) + Ubuntu 13.04
See the page here.
Install all dependencies for building R
This is a comprehensive list. This list is even larger than r-base-dev.
root@debian:/mnt/usb/R-2.15.2# apt-get build-dep r-base
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  libreadline5-dev
The following NEW packages will be installed:
  bison ca-certificates ca-certificates-java debhelper defoma ed file
  fontconfig gettext gettext-base html2text intltool-debian java-common
  libaccess-bridge-java libaccess-bridge-java-jni libasound2 libasyncns0
  libatk1.0-0 libaudit0 libavahi-client3 libavahi-common-data
  libavahi-common3 libblas-dev libblas3gf libbz2-dev libcairo2 libcairo2-dev
  libcroco3 libcups2 libdatrie1 libdbus-1-3 libexpat1-dev libflac8
  libfontconfig1-dev libfontenc1 libfreetype6-dev libgif4 libglib2.0-dev
  libgtk2.0-0 libgtk2.0-common libice-dev libjpeg62-dev libkpathsea5
  liblapack-dev liblapack3gf libnewt0.52 libnspr4-0d libnss3-1d libogg0
  libopenjpeg2 libpango1.0-0 libpango1.0-common libpango1.0-dev libpcre3-dev
  libpcrecpp0 libpixman-1-0 libpixman-1-dev libpng12-dev libpoppler5
  libpulse0 libreadline-dev libreadline6-dev libsm-dev libsndfile1
  libthai-data libthai0 libtiff4-dev libtiffxx0c2 libunistring0 libvorbis0a
  libvorbisenc2 libxaw7 libxcb-render-util0 libxcb-render-util0-dev
  libxcb-render0 libxcb-render0-dev libxcomposite1 libxcursor1 libxdamage1
  libxext-dev libxfixes3 libxfont1 libxft-dev libxi6 libxinerama1
  libxkbfile1 libxmu6 libxmuu1 libxpm4 libxrandr2 libxrender-dev libxss-dev
  libxt-dev libxtst6 luatex m4 openjdk-6-jdk openjdk-6-jre
  openjdk-6-jre-headless openjdk-6-jre-lib openssl pkg-config po-debconf
  preview-latex-style shared-mime-info tcl8.5-dev tex-common texi2html
  texinfo texlive-base texlive-binaries texlive-common texlive-doc-base
  texlive-extra-utils texlive-fonts-recommended texlive-generic-recommended
  texlive-latex-base texlive-latex-extra texlive-latex-recommended
  texlive-pictures tk8.5-dev tzdata-java whiptail x11-xkb-utils
  x11proto-render-dev x11proto-scrnsaver-dev x11proto-xext-dev xauth
  xdg-utils xfonts-base xfonts-encodings xfonts-utils xkb-data
  xserver-common xvfb zlib1g-dev
0 upgraded, 136 newly installed, 1 to remove and 0 not upgraded.
Need to get 139 MB of archives.
After this operation, 410 MB of additional disk space will be used.
Do you want to continue [Y/n]?
Instructions for installing a development version of R under Ubuntu
https://github.com/wch/r-source/wiki (works on Ubuntu 12.04)
or
https://github.com/wch/r-source/wiki/Ubuntu-build-instructions
Note that texi2dvi has to be installed first to avoid the following error. It is better to follow the Ubuntu instruction when we work on Ubuntu OS.
$ (cd doc/manual && make front-matter html-non-svn)
creating RESOURCES
/bin/bash: number-sections: command not found
make: [../../doc/RESOURCES] Error 127 (ignored)
If we DO NOT use the --depth option in the git clone command, the full history is available and we can use git checkout with a SHA1 (40 characters) to get a specific version of the code.
git checkout 8f4af29153671b87a48c7e46c09acbeade441b8b
git checkout trunk # switch back to trunk
Minimal installation of R from source
Assume we have installed g++ (or build-essential) and gfortran (Ubuntu has only gcc pre-installed, but not g++),
sudo apt-get install build-essential gfortran
we can go ahead to build a minimal R.
wget http://cran.rstudio.com/src/base/R-3/R-3.1.1.tar.gz
tar -xzvf R-3.1.1.tar.gz; cd R-3.1.1
./configure --with-x=no --with-recommended-packages=no --with-readline=no
See ./configure --help. This still builds the essential packages like base, compiler, datasets, graphics, grDevices, grid, methods, parallel, splines, stats, stats4, tcltk, tools, and utils.
Note that at the end of 'make', it shows an error of 'cannot find any java interpreter. Please make sure java is on your PATH or set JAVA_HOME correspondingly'. Even with the error message, we can use R by typing bin/R.
To check whether we have Java installed, type 'java -version'.
$ java -version
java version "1.6.0_32"
OpenJDK Runtime Environment (IcedTea6 1.13.4) (6b32-1.13.4-4ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
R CMD
- R CMD build someDirectory - create a package
- R CMD check somePackage_1.2-3.tar.gz - check a package
- R CMD INSTALL somePackage_1.2-3.tar.gz - install a package from its source
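R CMD build names its output <Package>_<Version>.tar.gz, taking both values from the package's DESCRIPTION file, and that tarball is what check and INSTALL then operate on. The sketch below derives the name with sed and prints the three commands without running R; the helper name pkg_tarball_name and the demo package are hypothetical.

```shell
#!/bin/sh
# Print "<Package>_<Version>.tar.gz" for a package source directory,
# read from the Package: and Version: fields of its DESCRIPTION file.
pkg_tarball_name() {
    desc="$1/DESCRIPTION"
    pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$desc" | head -n 1)
    ver=$(sed -n 's/^Version:[[:space:]]*//p' "$desc" | head -n 1)
    printf '%s_%s.tar.gz\n' "$pkg" "$ver"
}

# Demo with a throwaway DESCRIPTION (hypothetical package).
demo=$(mktemp -d)
mkdir "$demo/somePackage"
printf 'Package: somePackage\nVersion: 1.2-3\n' > "$demo/somePackage/DESCRIPTION"

tarball=$(pkg_tarball_name "$demo/somePackage")

# The usual build / check / install sequence, printed rather than executed.
echo "R CMD build $demo/somePackage"
echo "R CMD check $tarball"
echo "R CMD INSTALL $tarball"

rm -rf "$demo"
```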
bin/R (shell script) and bin/exec/R (binary executable) on Linux OS
bin/R is just a shell script to launch bin/exec/R program. So if we try to run the following program
# test.R
cat("-- reading arguments\n", sep = "");
cmd_args = commandArgs();
for (arg in cmd_args) cat(" ", arg, "\n", sep="");
from command line like
brb@brb-P45T-A:~/Downloads$ ~/R-3.0.1/bin/R --slave --no-save --no-restore --no-environ --silent --args arg1=abc < test.R
-- reading arguments
 /home/brb/R-3.0.1/bin/exec/R
 --slave
 --no-save
 --no-restore
 --no-environ
 --silent
 --args
 arg1=abc
we can see that R actually calls the bin/exec/R program.
CentOS 6.x
Install build-essential (make, gcc, gdb, ...).
su
yum groupinstall "Development Tools"
yum install kernel-devel kernel-headers
Install readline and X11 (probably not necessary if we use ./configure --with-x=no)
yum install readline-devel
yum install libX11 libX11-devel libXt libXt-devel
Install libpng (already there) and libpng-devel library. This is for web application purpose because png (and possibly svg) is a standard and preferred graphics format. If we want to output different graphics formats, we have to follow the guide in R-admin manual to install extra graphics libraries in Linux.
yum install libpng-devel
rpm -qa | grep "libpng"
# make sure both libpng and libpng-devel exist.
Install Java. One possibility is to download from Oracle. We want to download jdk-7u45-linux-x64.rpm and jre-7u45-linux-x64.rpm (assume 64-bit OS).
rpm -Uvh jdk-7u45-linux-x64.rpm
rpm -Uvh jre-7u45-linux-x64.rpm
# Check
java -version
Now we are ready to build R by using "./configure" and then "make" commands.
We can make R accessible from any directory by either running "make install" or creating an R_HOME environment variable and adding its bin directory to the PATH environment variable, such as
export R_HOME="path to R"
export PATH=$PATH:$R_HOME/bin
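If these export lines live in a shell startup file, PATH grows by one duplicate entry every time the file is sourced. A small guard avoids that. This is a sketch only: the function name path_append is hypothetical, and $HOME/R-3.0.1 is an assumed location for the built source tree.

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

R_HOME="$HOME/R-3.0.1"             # assumed location of the built R tree
export R_HOME

path_append "$R_HOME/bin"
path_append "$R_HOME/bin"          # second call is a no-op
export PATH
```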
Web Applications
See also CRAN Task View: Web Technologies and Services
Create HTML5 web and slides
http://www.gastonsanchez.com/depot/knitr-slides. The HTML5 slides work on my IE 8 too.
HTML5 slides examples
- http://yihui.name/slides/knitr-slides.html
- http://yihui.name/slides/2012-knitr-RStudio.html
- http://yihui.name/slides/2011-r-dev-lessons.html#slide1
- http://inundata.org/R_talks/BARUG/#intro
Software requirement
- Rstudio
- knitr, XML, RCurl (See omegahat for installation on Ubuntu)
- pandoc. This is a command line tool. I am testing it on Windows 7.
Slide #22 gives an instruction to create
- regular html file by using RStudio -> Knit HTML button
- HTML5 slides by using pandoc from command line.
Files:
- Rmd source: 009-slides.Rmd. Note that IE 8 was not supported by github. For IE 9, be sure to turn off "Compatibility View".
- markdown output: 009-slides.md
- HTML output: 009-slides.html
We can create the Rmd source in RStudio by File -> New -> R Markdown.
There are 4 ways to produce slides with pandoc
- S5
- DZSlides
- Slidy
- Slideous
Use the markdown file (md) and convert it with pandoc
pandoc -s -S -i -t dzslides --mathjax html5_slides.md -o html5_slides.html
If we are comfortable with HTML and CSS code, open the html file (generated by pandoc) and modify the CSS style at will.
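The same markdown file can be rendered with each of the four slide writers by changing only the -t argument. The sketch below prints one command per writer instead of running pandoc; the function name slide_commands and the output file naming are illustrative. The -S flag from the command above is left out because, to my knowledge, pandoc 2.0 and later no longer accept it.

```shell
#!/bin/sh
# Print one pandoc command per slide writer (s5, dzslides, slidy, slideous).
# $1 = input markdown file; output names are derived from it.
slide_commands() {
    input="$1"
    for writer in s5 dzslides slidy slideous; do
        printf 'pandoc -s -i -t %s --mathjax %s -o %s_%s.html\n' \
            "$writer" "$input" "${input%.md}" "$writer"
    done
}

slide_commands html5_slides.md

# Pipe the output to sh to actually run the commands (requires pandoc):
#   slide_commands html5_slides.md | sh
```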
Markdown language
According to wikipedia:
Markdown is a lightweight markup language, originally created by John Gruber with substantial contributions from Aaron Swartz, allowing people “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”.
- Markup is a general term for content formatting - such as HTML - but markdown is a library that generates HTML markup.
- Nice summary from stackoverflow.com and more complete list from github.
- An example https://gist.github.com/jeromyanglim/2716336
- Convert mediawiki to markdown using online conversion tool from pandoc.
- R markdown file and use it in RStudio. Customizing Chunk Options can be found in knitr page and rpubs.com.
HTTP protocol
- http://en.wikipedia.org/wiki/File:Http_request_telnet_ubuntu.png
- Query string
- How to capture http header? Use curl -i en.wikipedia.org.
- Web Inspector. Built into Chrome. Right click on any page and choose 'Inspect Element'.
- Web server
- Simple TCP/IP web server
- HTTP Made Really Easy
- Illustrated Guide to HTTP
- nweb: a tiny, safe Web server with 200 lines
- Tiny HTTPd
An HTTP server is conceptually simple:
- Open port 80 for listening
- When contact is made, gather a little information (mainly the GET line; you can ignore the rest for now)
- Translate the request into a file request
- Open the file and spit it back at the client
It gets more difficult depending on how much of HTTP you want to support - POST is a little more complicated, scripts, handling multiple requests, etc.
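The "translate the request into a file request" step can be sketched on its own, without any networking. The function below takes a document root and a request line and prints the file a simple server would open. It is a hypothetical illustration (the name request_to_file and the rules for "/" and ".." are my assumptions), not the logic of any of the servers linked here.

```shell
#!/bin/sh
# Translate an HTTP request line into a file path under a document root.
# Prints the path on success; returns 1 for requests it does not handle.
request_to_file() {
    root="$1"
    line=$(printf '%s' "$2" | tr -d '\r')   # drop the trailing CR, if any
    method=${line%% *}                      # first word: the method
    rest=${line#* }
    target=${rest%% *}                      # second word: the request target
    [ "$method" = "GET" ] || return 1       # only GET is supported here
    target=${target%%\?*}                   # strip any query string
    case "$target" in
        *..*) return 1 ;;                   # refuse path traversal
        /)    target="/index.html" ;;       # default document
    esac
    printf '%s%s\n' "$root" "$target"
}

request_to_file /home/brb/Downloads "GET /index.html HTTP/1.1"
```

A real server would then open that file and write it back after the response headers; POST handling and concurrent requests are where the extra complexity mentioned above comes in.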
Example in R
> co <- socketConnection(port=8080, server=TRUE, blocking=TRUE)
> # Now open a web browser and type http://localhost:8080/index.html
> readLines(co,1)
[1] "GET /index.html HTTP/1.1"
> readLines(co,1)
[1] "Host: localhost:8080"
> readLines(co,1)
[1] "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0"
> readLines(co,1)
[1] "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
> readLines(co,1)
[1] "Accept-Language: en-US,en;q=0.5"
> readLines(co,1)
[1] "Accept-Encoding: gzip, deflate"
> readLines(co,1)
[1] "Connection: keep-alive"
> readLines(co,1)
[1] ""
Example in C (Very simple http server written in C, 187 lines)
Create a simple hello world html page and save it as <index.html> in the current directory (/home/brb/Downloads/)
Launch the server program (assume we have done gcc http_server.c -o http_server)
$ ./http_server -p 50002
Server started at port no. 50002 with root directory as /home/brb/Downloads
Secondly open a browser and type http://localhost:50002/index.html. The server will respond
GET /index.html HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
file: /home/brb/Downloads/index.html
GET /favicon.ico HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
file: /home/brb/Downloads/favicon.ico
GET /favicon.ico HTTP/1.1
Host: localhost:50003
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
file: /home/brb/Downloads/favicon.ico
The browser will show the page from <index.html> in server.
The only bad thing is that the code does not close the port. For example, if I use Ctrl+C to stop the program and then try to re-launch it with the same port, it will complain socket() or bind(): Address already in use.
Another Example in C (55 lines)
http://mwaidyanatha.blogspot.com/2011/05/writing-simple-web-server-in-c.html
The response is embedded in the C code.
If we test the server program by opening a browser and typing "http://localhost:15000/", the server receives the following 7 lines
GET / HTTP/1.1
Host: localhost:15000
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If we include a non-executable file's name in the url, we will be able to download that file. Try "http://localhost:15000/client.c".
If we use the telnet program to test, we need to type something (anything will do) after connecting
$ telnet localhost 15000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
ThisCanBeAnything                      <=== This is what I typed in the client and it is also shown on server
HTTP/1.1 200 OK                        <=== From here is what I got from server
Content-length: 37Content-Type: text/html
HTML_DATA_HERE_AS_YOU_MENTIONED_ABOVE  <=== The html tags are not passed from server, interesting!
Connection closed by foreign host.
$
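In the telnet output above, the two headers run together as "Content-length: 37Content-Type: text/html". One possible reading is that the response string in the C code lacks a line terminator between them; that is an assumption, since the C source is not reproduced here. For comparison, the sketch below emits a response in which every header line ends with CRLF and an empty line separates the headers from the body. The function name http_response is hypothetical.

```shell
#!/bin/sh
# Emit a minimal HTTP/1.1 response: CRLF-terminated status and header lines,
# an empty line, then the body. Content-Length is the byte count of the body.
http_response() {
    body="$1"
    length=$(printf '%s' "$body" | wc -c | tr -d ' ')
    printf 'HTTP/1.1 200 OK\r\n'
    printf 'Content-Length: %s\r\n' "$length"
    printf 'Content-Type: text/html\r\n'
    printf '\r\n'
    printf '%s' "$body"
}

# Dump the bytes so the \r \n pairs are visible.
http_response '<html><body>hello</body></html>' | od -c | head -n 5
```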
See also more examples under C page.
Others
- http://rosettacode.org/wiki/Hello_world/ (Different languages)
- http://kperisetla.blogspot.com/2012/07/simple-http-web-server-in-c.html (Windows web server)
- http://css.dzone.com/articles/web-server-c (handling HTTP GET request, handling content types(txt, html, jpg, zip. rar, pdf, php etc.), sending proper HTTP error codes, serving the files from a web root, change in web root in a config file, zero copy optimization using sendfile method and php file handling.)
- https://github.com/gtungatkar/Simple-HTTP-server
- https://github.com/davidmoreno/onion
shiny
The following is what we see on a browser after we run an example from shiny package. See http://rstudio.github.com/shiny/tutorial/#hello-shiny. Note that the R session needs to be on; i.e. R command prompt will not be returned unless we press Ctrl+C or ESC.
shiny depends on websockets, caTools, bitops, digest packages.
Q & A:
- Q: If we run runExample('01_hello') in Rserve from an R client, we can continue our work in R client without losing the functionality of the GUI from shiny. Question: how do we kill the job?
- Q: If I run the example "01_hello", the browser shows only the controls but not the graph in Firefox. A: Use Chrome or Opera as the default browser.
- If I run the example "01_hello" on RHEL the first time, it works fine. But if I click 'Ctrl + C' to stop it and run it again, I got a message
Warning in .SOCK_SERVE(port) : R-Websockets(tcpserv): bind() failed.
Error in createContext(port, webpage, is.binary = is.binary) : Unable to bind socket on port 8100; is it realsy in use?
A simple solution is to close R and open it again.
- Q: Deployment on web. A: Not ready yet. Shiny server platform is still under beta testing. Shiny apps are hosted using the R websockets package which acts more like a tcp server than a web server, and that architecture just doesn't fit with rApache, or even apache for that matter.
- Q: How difficult to put the code in Gist:github? A: Just create an account. Do not even need to create a repository. Just go to http://gist.github.com and create a new gist. The new gist can be secret or public. A secret gist can not be edited again after it is created although it works fine when it was used in runGist() function.
Deploy to run locally
Following the instructions here, we can do the following (tested on Windows)
- Create a desktop shortcut with target "C:\Program Files\R\R-3.0.2\bin\R.exe" -e "shiny::runExample('01_hello')" . We can name the shortcut as we like, e.g. R+shiny
- Double click the shortcut. Windows Firewall will pop up and say it has blocked some features of the program. It does not matter whether we choose Allow access or Cancel.
- Look at the command prompt window (the console window with a black background); its last line will say something like
Listening on port 7510
- Open your browser (Chrome or Firefox works) and type the address http://localhost:7510. You will see something magic happen.
- When we are done playing with it, we can close the browser and close the command console (hit 'x') too.
Deploy to run remotely - shiny server
If we want to deploy our shiny apps to WWW, we need to install shiny server.
Following the guide here, shiny-server came up smoothly on my Ubuntu machine. After I ran the command sudo gdebi shiny-server-0.4.0.8-amd64.deb, shiny-server was started. Thanks to upstart in Ubuntu, shiny-server is started automatically whenever the machine boots.
Each app directory needs to be copied to /srv/shiny-server/ directory using sudo.
The default port is 3838. That is, the remote computer can access the website using http://xxx.xxx.x.xx:3838/AppName.
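The deployment step is just a directory copy, and the resulting URL follows from the directory name. The sketch below wraps both in a function and exercises it against temporary directories so that nothing under /srv is touched. The function name deploy_app and the SHINY_ROOT variable are hypothetical; on a real server the copy would need sudo.

```shell
#!/bin/sh
# Copy an app directory into the Shiny Server root and print its URL.
# SHINY_ROOT defaults to /srv/shiny-server; $2 is the server's host name or IP.
deploy_app() {
    app_dir="$1"
    host="$2"
    root="${SHINY_ROOT:-/srv/shiny-server}"
    name=$(basename "$app_dir")
    cp -R "$app_dir" "$root/" || return 1
    printf 'http://%s:3838/%s\n' "$host" "$name"
}

# Dry run with stand-in directories (a shiny app is a folder with ui.R and server.R).
SHINY_ROOT=$(mktemp -d)
src=$(mktemp -d)
mkdir "$src/AppName"
: > "$src/AppName/ui.R"
: > "$src/AppName/server.R"

url=$(deploy_app "$src/AppName" localhost)
echo "$url"
```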
Last but not least, according to its web page, shiny-server is "Experimental quality. Use at your own risk!"
shinydashboard
websocket
http://illposed.net/jsm2012.pdf
CentOS
http://blog.supstat.com/2014/05/install-rstudio-server-on-centos6-5/
Gallery
- http://www.showmeshiny.com/
- Example of using googleVis: http://shinyeoda.cloudapp.net/
- Integrate with Javascript: https://github.com/wch/shiny-jsdemo and https://github.com/trestletech/ShinyDash-Sample
- interactiveDisplay (Bioconductor package, there is a STOP Application button too): http://www.bioconductor.org/packages/release/bioc/html/interactiveDisplay.html
httpuv
http and WebSocket library.
RApache
gWidgetsWWW
- http://www.jstatsoft.org/v49/i10/paper
- gWidgetsWWW2 gWidgetsWWW based on Rook
- Compare shiny with gWidgetsWWW2.rapache
Rook
Since R 2.13, the internal web server was exposed.
Tutorial from useR2012 and Jeffrey Horner
Here is another one from http://www.rinfinance.com.
Rook is also supported by rApache. See http://rapache.net/manual.html.
Google group. https://groups.google.com/forum/?fromgroups#!forum/rrook
Advantage
- the web applications are created on desktop, whether it is Windows, Mac or Linux.
- No Apache is needed.
- We can create multiple applications at the same time. This works around a limitation of rApache.
4 lines of code example.
library(Rook)
s <- Rhttpd$new()
s$start(quiet=TRUE)
s$print()
s$browse(1) # OR s$browse("RookTest")
Notice that after the s$browse() command, the cursor returns to R because the command is just a shortcut to open the web page http://127.0.0.1:10215/custom/RookTest.
We can add Rook application to the server; see ?Rhttpd.
s$add( app=system.file('exampleApps/helloworld.R',package='Rook'),name='hello' )
s$add( app=system.file('exampleApps/helloworldref.R',package='Rook'),name='helloref' )
s$add( app=system.file('exampleApps/summary.R',package='Rook'),name='summary' )
s$print()
#Server started on 127.0.0.1:10221
#[1] RookTest http://127.0.0.1:10221/custom/RookTest
#[2] helloref http://127.0.0.1:10221/custom/helloref
#[3] summary http://127.0.0.1:10221/custom/summary
#[4] hello http://127.0.0.1:10221/custom/hello
# Stops the server but doesn't uninstall the app
## Not run:
s$stop()
## End(Not run)
s$remove(all=TRUE)
rm(s)
For example, the interface and the source code of summary app are given below
app <- function(env) {
    req <- Rook::Request$new(env)
    res <- Rook::Response$new()
    res$write('Choose a CSV file:\n')
    res$write('<form method="POST" enctype="multipart/form-data">\n')
    res$write('<input type="file" name="data">\n')
    res$write('<input type="submit" name="Upload">\n</form>\n<br>')
    if (!is.null(req$POST())){
        data <- req$POST()[['data']]
        res$write("<h3>Summary of Data</h3>");
        res$write("<pre>")
        res$write(paste(capture.output(summary(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
        res$write("</pre>")
        res$write("<h3>First few lines (head())</h3>");
        res$write("<pre>")
        res$write(paste(capture.output(head(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
        res$write("</pre>")
    }
    res$finish()
}
More examples:
- http://lamages.blogspot.com/2012/08/rook-rocks-example-with-googlevis.html
- Self-organizing map
- Deploy Rook apps with rApache. First one and two.
sumo
Sumo is a fully-functional web application template that exposes an authenticated user's R session within java server pages. See the paper http://journal.r-project.org/archive/2012-1/RJournal_2012-1_Bergsma+Smith.pdf.
Stockplot
FastRWeb
Rwui
CGHWithR and WebDevelopR
CGHwithR still works with old versions of R although it has been removed from CRAN. Its successor is WebDevelopR. The vignette (year 2013) provides a review of several available methods.
manipulate from RStudio
This is not a web application, but the manipulate package can be used to create interactive plots within the R(Studio) environment easily. Its source is available here.
Mathematica also has manipulate function for plotting; see here.
RCloud
RCloud is an environment for collaboratively creating and sharing data analysis scripts. RCloud lets you mix analysis code in R, HTML5, Markdown, Python, and others. Much like Sage, iPython notebooks and Mathematica, RCloud provides a notebook interface that lets you easily record a session and annotate it with text, equations, and supporting images.
See also the Talk in UseR 2014.
Web page scraping
rvest package.
Send email
- mailR package
d3Network
library(d3Network)
Source <- c("A", "A", "A", "A", "B", "B", "C", "C", "D")
Target <- c("B", "C", "D", "J", "E", "F", "G", "H", "I")
NetworkData <- data.frame(Source, Target)
d3SimpleNetwork(NetworkData, height = 800, width = 1024, file="tmp.html")
Creating local repository for CRAN and Bioconductor (focus on Windows binary packages only)
How to set up a local repository
- CRAN specific: http://cran.r-project.org/mirror-howto.html
- Bioconductor specific: http://www.bioconductor.org/about/mirrors/mirror-how-to/
General guide: http://cran.r-project.org/doc/manuals/R-admin.html#Setting-up-a-package-repository
Utilities such as install.packages can be pointed at any CRAN-style repository, and R users may want to set up their own. The ‘base’ of a repository is a URL such as http://www.omegahat.org/R/: this must be an URL scheme that download.packages supports (which also includes ‘ftp://’ and ‘file://’, but not on most systems ‘https://’). Under that base URL there should be directory trees for one or more of the following types of package distributions:
- "source": located at src/contrib and containing .tar.gz files. Other forms of compression can be used, e.g. .tar.bz2 or .tar.xz files.
- "win.binary": located at bin/windows/contrib/x.y for R versions x.y.z and containing .zip files for Windows.
- "mac.binary.leopard": located at bin/macosx/leopard/contrib/x.y for R versions x.y.z and containing .tgz files.
Each terminal directory must also contain a PACKAGES file. This can be a concatenation of the DESCRIPTION files of the packages separated by blank lines, but only a few of the fields are needed. The simplest way to set up such a file is to use function write_PACKAGES in the tools package, and its help explains which fields are needed. Optionally there can also be a PACKAGES.gz file, a gzip-compressed version of PACKAGES—as this will be downloaded in preference to PACKAGES it should be included for large repositories. (If you have a mis-configured server that does not report correctly non-existent files you will need PACKAGES.gz.)
To add your repository to the list offered by setRepositories(), see the help file for that function.
A repository can contain subdirectories, when the descriptions in the PACKAGES file of packages in subdirectories must include a line of the form
Path: path/to/subdirectory
—once again write_PACKAGES is the simplest way to set this up.
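The layout described above maps directly onto the URLs that R builds when it looks for packages. As a minimal sketch (the repository base URL below is a made-up placeholder, not a real mirror), contrib.url() from the utils package shows where install.packages() would look for each package type:

```r
# Hypothetical repository base URL, used only for illustration
repo <- "http://myserver.example/CRAN"

# Source packages live under src/contrib
src_url <- utils::contrib.url(repo, type = "source")

# Windows binaries live under bin/windows/contrib/x.y, where x.y is taken
# from the version of R that runs this code
win_url <- utils::contrib.url(repo, type = "win.binary")

# The same x.y string, computed by hand from R.version
xy <- paste(R.version$major,
            strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1],
            sep = ".")

print(src_url)
print(win_url)
print(xy)
```

This also explains why a binary mirror has to carry one contrib/x.y directory per R minor series that it is meant to serve.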
Space requirement if we want to mirror WHOLE repository
- Whole CRAN takes about 92GB (rsync -avn cran.r-project.org::CRAN > ~/Downloads/cran).
- Bioconductor is big (> 64G for BioC 2.11). Please check the size of what will be transferred with e.g. (rsync -avn bioconductor.org::2.11 > ~/Downloads/bioc) and make sure you have enough room on your local disk before you start.
On the other hand, if we only care about the Windows binary part, the space requirement is much smaller.
- CRAN: 2.7GB
- Bioconductor: 28GB.
Misc notes
- If a binary package was built on R 2.15.1, it cannot be installed on R 2.15.2. But the reverse is OK.
- Remember to pass the "--delete" option to rsync; otherwise old versions of packages will be installed.
- The repository still needs the src directory. If it is missing, we will get an error
Warning: unable to access index for repository http://arraytools.no-ip.org/CRAN/src/contrib
Warning message:
package ‘glmnet’ is not available (for R version 2.15.2)
The error was given by available.packages() function.
To bypass the requirement of src directory, I can use
install.packages("glmnet", contriburl = contrib.url(getOption('repos'), "win.binary"))
but there may be a problem when we use biocLite() command.
I find a workaround. Since the error comes from missing CRAN/src directory, we just need to make sure the directory CRAN/src/contrib exists AND either CRAN/src/contrib/PACKAGES or CRAN/src/contrib/PACKAGES.gz exists.
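To see why the PACKAGES file alone is enough, here is a minimal sketch that builds a throw-away repository skeleton in a temporary directory and queries it through a file:// URL. The package name 'mypkg' and its version are hypothetical, and the index is written by hand instead of being copied from CRAN:

```r
# Build a throw-away repository skeleton in a temporary directory
repo_dir <- file.path(tempdir(), "CRAN")
contrib  <- file.path(repo_dir, "src", "contrib")
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# A PACKAGES file is in DCF format; 'mypkg' is a hypothetical package name
index <- data.frame(Package = "mypkg", Version = "0.1.0",
                    stringsAsFactors = FALSE)
write.dcf(index, file = file.path(contrib, "PACKAGES"))

# Point available.packages() at the local index through a file:// URL
prefix   <- if (.Platform$OS.type == "windows") "file:///" else "file://"
repo_url <- paste0(prefix, normalizePath(contrib, winslash = "/"))
ap <- available.packages(contriburl = repo_url)
print(ap[, c("Package", "Version"), drop = FALSE])
```

No package file exists in this skeleton, so an actual install from it would fail; the point is only that the index lookup succeeds once src/contrib/PACKAGES is present.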
To create CRAN repository
Before creating a local repository, please do a dry run first. You don't want to be surprised by how long it takes to mirror a directory.
Dry run (-n option). Redirect the output to a text file for examination.
rsync -avn cran.r-project.org::CRAN > crandryrun.txt
To mirror only part of the repository, it is necessary to create the directories before running the rsync command.
cd
mkdir -p ~/Rmirror/CRAN/bin/windows/contrib/2.15
# The next rsync command is one line, with a space before ~/Rmirror
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/windows/contrib/2.15/ ~/Rmirror/CRAN/bin/windows/contrib/2.15
# src directory is very large (~27GB) since it contains source code for each R version.
# We just need the files PACKAGES and PACKAGES.gz in CRAN/src/contrib. So I comment out the following line.
# rsync -rtlzv --delete cran.r-project.org::CRAN/src/ ~/Rmirror/CRAN/src/
mkdir -p ~/Rmirror/CRAN/src/contrib
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES.gz ~/Rmirror/CRAN/src/contrib/
And optionally
library(tools)
write_PACKAGES("~/Rmirror/CRAN/bin/windows/contrib/2.15", type="win.binary")
and if we want to get src directory
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/*.tar.gz ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/2.15.3 ~/Rmirror/CRAN/src/contrib/
We can use du -h to check the folder size.
For example (as of 1/7/2013),
$ du -k ~/Rmirror --max-depth=1 --exclude ".*" | sort -nr | cut -f2 | xargs -d '\n' du -sh
30G     /home/brb/Rmirror
28G     /home/brb/Rmirror/Bioc
2.7G    /home/brb/Rmirror/CRAN
To create Bioconductor repository
Dry run
rsync -avn bioconductor.org::2.11 > biocdryrun.txt
Then create the directories before running rsync.
cd
mkdir -p ~/Rmirror/Bioc
wget -N http://www.bioconductor.org/biocLite.R -P ~/Rmirror/Bioc
where -N overwrites the local file only if the remote size or timestamp has changed, and -P names an output directory, not a file.
Optionally, we can add the following in order to see the Bioconductor front page.
rsync -zrtlv --delete bioconductor.org::2.11/BiocViews.html ~/Rmirror/Bioc/packages/2.11/
rsync -zrtlv --delete bioconductor.org::2.11/index.html ~/Rmirror/Bioc/packages/2.11/
The software part (aka bioc directory) installation:
cd
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/bin/windows/ ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
# Either rsync whole src directory or just essential files
# rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/ ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
# Optionally the html part
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/html
rsync -zrtlv --delete bioconductor.org::2.11/bioc/html/ ~/Rmirror/Bioc/packages/2.11/bioc/html
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/vignettes/ ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/news
rsync -zrtlv --delete bioconductor.org::2.11/bioc/news/ ~/Rmirror/Bioc/packages/2.11/bioc/news
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/licenses
rsync -zrtlv --delete bioconductor.org::2.11/bioc/licenses/ ~/Rmirror/Bioc/packages/2.11/bioc/licenses
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/manuals
rsync -zrtlv --delete bioconductor.org::2.11/bioc/manuals/ ~/Rmirror/Bioc/packages/2.11/bioc/manuals
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/readmes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/readmes/ ~/Rmirror/Bioc/packages/2.11/bioc/readmes
and annotation (aka data directory) part:
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib
# one line for each of the following
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/bin/windows/ ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
and experiment directory:
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
and extra directory:
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
To test local repository
Create soft links in Apache server
su
ln -s /home/brb/Rmirror/CRAN /var/www/html/CRAN
ln -s /home/brb/Rmirror/Bioc /var/www/html/Bioc
ls -l /var/www/html
The soft link mode should be 777.
To test CRAN
Replace the host name arraytools.no-ip.org by IP address 10.133.2.111 if necessary.
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages("glmnet")
We can test whether the backup server is working by installing a package that has been removed from CRAN. For example, 'ForImp' was removed from CRAN on 11/8/2012, but I still have a local copy built on R 2.15.2 (rsync was run on 11/6/2012).
r <- getOption("repos"); r["CRAN"] <- "http://cran.r-project.org"
r <- c(r, BRB='http://arraytools.no-ip.org/CRAN')
#                        CRAN                            CRANextra                                BRB
# "http://cran.r-project.org" "http://www.stats.ox.ac.uk/pub/RWin" "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages('ForImp')
Note that by default the CRAN mirror is selected interactively.
> getOption("repos")
                                CRAN                            CRANextra
                            "@CRAN@" "http://www.stats.ox.ac.uk/pub/RWin"
To test Bioconductor
# CRAN part:
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
# Bioconductor part:
options("BioC_mirror" = "http://arraytools.no-ip.org/Bioc")
source("http://bioconductor.org/biocLite.R")
# This source biocLite.R line can be placed either before or after the previous 2 lines
biocLite("aCGH")
If there is a connection problem, check folder attributes.
chmod -R 755 ~/CRAN/bin
- Note that if a binary package was created for R 2.15.1, then it can be installed under R 2.15.1 but not R 2.15.2. The R console will show package xxx is not available (for R version 2.15.2).
- For binary installs, the function also checks for the availability of a source package on the same repository, and reports if the source package has a later version, or is available but no binary version is.
So for example, if the mirror does not have contents under src directory, we need to run the following line in order to successfully run install.packages() function.
options(install.packages.check.source = "no")
- If we only mirror the essential directories, we can run biocLite() successfully. However, the R console will give some warnings
> biocLite("aCGH")
BioC_mirror: http://arraytools.no-ip.org/Bioc
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'aCGH'
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
trying URL 'http://arraytools.no-ip.org/Bioc/packages/2.11/bioc/bin/windows/contrib/2.15/aCGH_1.36.0.zip'
Content type 'application/zip' length 2431158 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb
package ‘aCGH’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
        C:\Users\limingc\AppData\Local\Temp\Rtmp8IGGyG\downloaded_packages
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
> library()
CRAN repository directory structure
The information below is specific to R 2.15.2. There are linux and macosx subdirectories wherever there is a windows subdirectory.
bin/windows/contrib/2.15
src/contrib
   /contrib/2.15.2
   /contrib/Archive
web/checks
   /dcmeta
   /packages
   /views
A clickable map [1]
Bioconductor repository directory structure
The information below is specific to Bioc 2.11. There are linux and macosx subdirectories wherever there is a windows subdirectory.
bioc/bin/windows/contrib/2.15
    /html
    /install
    /license
    /manuals
    /news
    /src
    /vignettes
data/annotation/bin/windows/contrib/2.15
               /html
               /licenses
               /manuals
               /src
               /vignettes
    /experiment/bin/windows/contrib/2.15
               /html
               /manuals
               /src/contrib
               /vignettes
extra/bin/windows/contrib
     /html
     /src
     /vignettes
List all R packages from CRAN/Bioconductor
Check my daily result based on R 2.15 and Bioc 2.11 in [2]
Parallel Computing
- Example code for the book Parallel R by McCallum and Weston.
- An introduction to distributed memory parallelism in R and C
- Processing: When does it worth?
Windows Security Warning
It seems safe to choose 'Cancel' when Windows Firewall tries to block the R program after we call makeCluster() to create a socket cluster.
library(parallel)
cl <- makeCluster(2)
clusterApply(cl, 1:2, get("+"), 3)
stopCluster(cl)
To see the current firewall settings, click the Windows Start button, search for 'Firewall' and choose 'Windows Firewall with Advanced Security'. Under 'Inbound Rules', we can see which programs (such as R for Windows GUI front-end, or Rserve) are among the rules. These rules are marked 'private' in the 'Profile' column. Note that each of them may appear twice, once for the TCP protocol and once for UDP.
parallel package
The parallel package has been included in R since version 2.14.0. It is derived from the snow and multicore packages and provides many of the same functions as those packages.
The parallel package provides several *apply functions for R users to quickly modify their code using parallel computing.
- makeCluster(makePSOCKcluster, makeForkCluster), stopCluster. Other cluster types are passed to package snow.
- clusterCall, clusterEvalQ, clusterSplit
- clusterApply, clusterApplyLB
- clusterExport
- clusterMap
- parLapply, parSapply, parApply, parRapply, parCapply
- parLapplyLB, parSapplyLB (load balance version)
- clusterSetRNGStream, nextRNGStream, nextRNGSubStream
Examples (See ?clusterApply)
library(parallel)
cl <- makeCluster(2, type = "PSOCK")   # type "SOCK" would be passed on to the snow package
clusterApply(cl, 1:2, function(x) x*3) # OR clusterApply(cl, 1:2, get("*"), 3)
# [[1]]
# [1] 3
#
# [[2]]
# [1] 6
parSapply(cl, 1:20, get("+"), 3)
# [1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
stopCluster(cl)
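Some of the functions listed above are not covered by that example. The following is a minimal sketch of clusterExport() and clusterSetRNGStream(); the variable name offset and the seed value are arbitrary choices made for illustration:

```r
library(parallel)

cl <- makeCluster(2)                       # default cluster type is "PSOCK"

# 'offset' exists only in the master session until it is exported
offset <- 10
clusterExport(cl, "offset", envir = environment())
res <- parLapply(cl, 1:4, function(x) x + offset)
print(unlist(res))

# Seeding the workers makes parallel random numbers reproducible
clusterSetRNGStream(cl, iseed = 123)
r1 <- parSapply(cl, 1:4, function(i) runif(1))
clusterSetRNGStream(cl, iseed = 123)
r2 <- parSapply(cl, 1:4, function(i) runif(1))
print(identical(r1, r2))

stopCluster(cl)
```

Without the clusterExport() call the workers would not find offset, because each worker is a separate R process with its own global environment.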
snow package
Supported cluster types are "SOCK", "PVM", "MPI", and "NWS".
multicore package
This package is removed from CRAN.
Consider using package ‘parallel’ instead.
foreach package
This package depends on one of the following
- doParallel - Foreach parallel adaptor for the parallel package
- doSNOW - Foreach parallel adaptor for the snow package
- doMC - Foreach parallel adaptor for the multicore package
- doMPI - Foreach parallel adaptor for the Rmpi package
- doRedis - Foreach parallel adapter for the rredis package
as a backend.
library(foreach)
library(doParallel)
m <- matrix(rnorm(9), 3, 3)
cl <- makeCluster(2, type = "PSOCK")   # type "SOCK" would be passed on to the snow package
registerDoParallel(cl)
foreach(i=1:nrow(m), .combine=rbind) %dopar% (m[i,] / mean(m[i,]))
stopCluster(cl)
snowfall package
Rmpi package
Some examples/tutorials
- http://trac.nchc.org.tw/grid/wiki/R-MPI_Install
- http://www.arc.vt.edu/resources/software/r/index.php
- https://www.sharcnet.ca/help/index.php/Using_R_and_MPI
- http://math.acadiau.ca/ACMMaC/Rmpi/examples.html
- http://www.umbc.edu/hpcf/resources-tara/how-to-run-R.html
- Ryan Rosario
- http://pj.freefaculty.org/guides/Rcourse/parallel-1/parallel-1.pdf
- http://biowulf.nih.gov/apps/R.html
Cloud Computing
Install R on Amazon EC2
http://randyzwitch.com/r-amazon-ec2/
Bioconductor on Amazon EC2
http://www.bioconductor.org/help/bioconductor-cloud-ami/
Big Data Analysis
http://blog.comsysto.com/2013/02/14/my-favorite-community-links/
Useful R packages
RInside
- http://dirk.eddelbuettel.com/code/rinside.html
- http://dirk.eddelbuettel.com/papers/rfinance2010_rcpp_rinside_tutorial_handout.pdf
Ubuntu
With RInside, R can be embedded in a graphical application. For example, $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt directory includes source code of a Qt application to show a kernel density plot with various options like kernel functions, bandwidth and an R command text box to generate the random data. See my demo on Youtube. I have tested this qtdensity example successfully using Qt 4.8.5.
- Follow the instruction cairoDevice to install required libraries for cairoDevice package and then cairoDevice itself.
- Install Qt. Check 'qmake' command becomes available by typing 'whereis qmake' or 'which qmake' in terminal.
- Open Qt Creator from Ubuntu start menu/Launcher. Open the project file $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt/qtdensity.pro in Qt Creator.
- Under Qt Creator, hit 'Ctrl + R' or the big green triangle button in the lower-left corner to build/run the project. If everything works well, you should see the interactive program qtdensity appear on your desktop.
With RInside + Wt web toolkit installed, we can also create a web application. To demonstrate the example in examples/wt directory, we can do
cd ~/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/wt
make
sudo ./wtdensity --docroot . --http-address localhost --http-port 8080
Then we can go to the browser's address bar and type http://localhost:8080 to see how it works (a screenshot is in here).
Windows 7
To make RInside work on Windows, try the following:
- Make sure R is installed under C:\ instead of C:\Program Files; otherwise we get an error like g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory.
- Install Rtools
- Install the RInside package from source (the binary version will give an error)
- Create a DOS batch file that adds the necessary paths to the PATH environment variable
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\R\R-3.0.1\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
set R_HOME=C:\R\R-3.0.1
echo Setting environment for using R cmd
In the Windows command prompt, run
cd C:\R\R-3.0.1\library\RInside\examples\standard
make -f Makefile.win
Now we can test by running any of executable files that make generates. For example, rinside_sample0.
rinside_sample0
As for the Qt application qtdensity, we need to make sure the same version of MinGW was used to build RInside/Rcpp and Qt. See some discussions in
- http://stackoverflow.com/questions/12280707/using-rinside-with-qt-in-windows
- http://www.mail-archive.com/[email protected]/msg04377.html
So the Qt and Wt web tool applications on Windows may or may not be possible.
Qt and R
Hadoop (eg ~100 terabytes)
See also HighPerformanceComputing
- RHadoop
- Hive
- MapReduce. Introduction by Linux Journal.
- http://www.techspritz.com/category/tutorials/hadoopmapredcue/ Single node or multinode cluster setup using Ubuntu with VirtualBox (Excellent)
- Running Hadoop on Ubuntu Linux (Single-Node Cluster)
- Ubuntu 12.04 http://www.youtube.com/watch?v=WN2tJk_oL6E and instruction
- Linux Mint http://blog.hackedexistence.com/installing-hadoop-single-node-on-linux-mint
- http://www.r-bloggers.com/search/hadoop
RHadoop
- RDataMining.com based on Mac.
- Ubuntu 12.04 - Crishantha.com, nikhilshah123sh.blogspot.com. Bighadoop.wordpress contains an example.
- MapReduce in R by RevolutionAnalytics with a few examples.
- https://twitter.com/hashtag/rhadoop
- Bigd8ta.com based on Ubuntu 14.04.
Snowdoop: an alternative to MapReduce algorithm
- http://matloff.wordpress.com/2014/11/26/how-about-a-snowdoop-package/
- http://matloff.wordpress.com/2014/12/26/snowdooppartools-update/comment-page-1/#comment-665
XML
On Ubuntu, we need to install libxml2-dev before we can install XML package.
sudo apt-get update
sudo apt-get install libxml2-dev
On CentOS,
yum -y install libxml2 libxml2-devel
RCurl
On Ubuntu, we need to install one package
sudo apt-get install libcurl4-openssl-dev
DirichletMultinomial
On Ubuntu, we do
sudo apt-get install libgsl0-dev
Create GUI
gWidgets
GenOrd: Generate ordinal and discrete variables with given correlation matrix and marginal distributions
rjson
http://heuristically.wordpress.com/2013/05/20/geolocate-ip-addresses-in-r/
RJSONIO
Plot IP on google map
- http://thebiobucket.blogspot.com/2011/12/some-fun-with-googlevis-plotting-blog.html#more (RCurl, RJSONIO, plyr, googleVis)
- http://devblog.icans-gmbh.com/using-the-maxmind-geoip-api-with-r/ (RCurl, RJSONIO, maps)
- http://cran.r-project.org/web/packages/geoPlot/index.html (geoPlot package (deprecated as of 8/12/2013))
- http://archive09.linux.com/feature/135384 (Not R) ApacheMap
- http://batchgeo.com/features/geolocation-ip-lookup/ (Not R) (Enter a spreadsheet of address, city, zip or a column of IPs and it will show the location on google map)
- http://code.google.com/p/apachegeomap/
The following example is modified from the first item in the list above.
require(RJSONIO) # fromJSON
require(RCurl)   # getURL

temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt",
              ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)
names(ip) <- "IP"
nr = nrow(ip)
Lon <- as.numeric(rep(NA, nr))
Lat <- Lon
Coords <- data.frame(Lon, Lat)

ip2coordinates <- function(ip) {
  api <- "http://freegeoip.net/json/"
  get.ips <- getURL(paste(api, URLencode(ip), sep=""))
  # result <- ldply(fromJSON(get.ips), data.frame)
  result <- data.frame(fromJSON(get.ips))
  names(result)[1] <- "ip.address"
  return(result)
}

for (i in 1:nr){
  cat(i, "\n")
  try(
    Coords[i, 1:2] <- ip2coordinates(ip$IP[i])[c("longitude", "latitude")]
  )
}

# append to log-file:
logfile <- data.frame(ip, Lat = Coords$Lat, Long = Coords$Lon,
                      LatLong = paste(round(Coords$Lat, 1), round(Coords$Lon, 1), sep = ":"))
log_gmap <- logfile[!is.na(logfile$Lat), ]

require(googleVis) # gvisMap
gmap <- gvisMap(log_gmap, "LatLong",
                options = list(showTip = TRUE, enableScrollWheel = TRUE,
                               mapType = 'hybrid', useMapTypeControl = TRUE,
                               width = 1024, height = 800))
plot(gmap)
The plot.gvis() method in the googleVis package also illustrates the startDynamicHelp() function in the tools package, which is used to launch an HTTP server. See Jeffrey Horner's note about deploying Rook App.
googleVis
See an example from RJSONIO above.
Rcpp
Use Rcpp in RStudio
RStudio makes it easy to use the Rcpp package.
Open RStudio and click New File -> C++ File. It will create a C++ template in the RStudio editor:
#include <Rcpp.h>
using namespace Rcpp;

// Below is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar)

// For more on using Rcpp click the Help button on the editor toolbar

// [[Rcpp::export]]
int timesTwo(int x) {
   return x * 2;
}
Now in R console, type
library(Rcpp)
sourceCpp("~/Downloads/timesTwo.cpp")
timesTwo(9)
# [1] 18
See more examples on http://adv-r.had.co.nz/Rcpp.html.
If we want to test the Boost library, we can try it in RStudio. Consider the following example from stackoverflow.com.
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/foreach.hpp>
#include <boost/math/special_functions/gamma.hpp>

#define foreach BOOST_FOREACH

using namespace boost::math;

//[[Rcpp::export]]
Rcpp::NumericVector boost_gamma( Rcpp::NumericVector x ) {
  foreach( double& elem, x ) {
    elem = boost::math::tgamma(elem);
  };
  return x;
}
Then in the R console:
boost_gamma(0:10 + 1)
#  [1]       1       1       2       6      24     120     720    5040   40320
# [10]  362880 3628800
identical( boost_gamma(0:10 + 1), factorial(0:10) )
# [1] TRUE
Example 1. convolution example
First, the Rcpp package should be installed (I am working on a Linux system). Next we try one of the examples shipped with the Rcpp package.
PS. If R is not on the search path (for example because we built it ourselves), we need to modify the 'Makefile' by replacing the 'R' command with its complete path (4 places).
cd ~/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/examples/ConvolveBenchmarks/
make
R
Then type the following in an R session to see how it works. Note that we don't need to issue library(Rcpp) in R.
dyn.load("convolve3_cpp.so")
x <- .Call("convolve3cpp", 1:3, 4:6)
x # 4 13 28 27 18
If we have our own cpp file, we need to build the dynamically loaded library as follows. Note that the character ` (grave accent) is not ' (single quote). If you mistakenly use ', it won't work.
export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
R CMD SHLIB xxxx.cpp
Example 2. Use together with inline package
library(inline)
src <-'
 Rcpp::NumericVector xa(a);
 Rcpp::NumericVector xb(b);
 int n_xa = xa.size(), n_xb = xb.size();

 Rcpp::NumericVector xab(n_xa + n_xb - 1);
 for (int i = 0; i < n_xa; i++)
   for (int j = 0; j < n_xb; j++)
     xab[i + j] += xa[i] * xb[j];
 return xab;
'
fun <- cxxfunction(signature(a = "numeric", b = "numeric"), src, plugin = "Rcpp")
fun(1:3, 1:4)
# [1]  1  4 10 16 17 12
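For checking the compiled versions, it can help to have the same double loop in plain R. This is only a reference sketch (the name convolve_r is made up here and is not part of Rcpp or inline); it is far slower than the C++ code but needs no compiler:

```r
# Plain-R version of the convolution loop used in the C++ examples
convolve_r <- function(a, b) {
  ab <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      # R indices start at 1, so the C++ slot i + j becomes i + j - 1
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
    }
  }
  ab
}

print(convolve_r(1:3, 1:4))
print(convolve_r(1:3, 4:6))
```

The two calls use the same inputs as the inline example and the convolve3cpp example, so the printed vectors can be compared with the results shown for those.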
Example 3. Calling an R function
RcppParallel
caret
Read/Write Excel files package
- xlsx: depends on Java
- openxlsx: does not depend on Java, but depends on a zip application. On Windows, it seems to work without installing Rtools. It cannot read xls files; it works only on xlsx files.
ggplot2
Some examples:
Data Manipulation
stringr and plyr
http://martinsbioblogg.wordpress.com/2013/03/24/using-r-reading-tables-that-need-a-little-cleaning/
A data.frame is pretty much a list of vectors, so we use plyr to apply over the list and stringr to search and replace in the vectors.
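The same idea can be sketched in base R, swapping in lapply() for plyr and gsub()/trimws() for stringr. The small data frame below is invented for illustration and is not taken from the linked post:

```r
# Made-up messy table: stray spaces and decimal commas stored as text
df <- data.frame(id    = c(" a1", "b2 "),
                 value = c("1,5", "2,7"),
                 stringsAsFactors = FALSE)

# A data.frame is a list of columns, so lapply() visits each column in turn
print(is.list(df))

cleaned <- df
cleaned[] <- lapply(df, function(col) gsub(",", ".", trimws(col), fixed = TRUE))
cleaned$value <- as.numeric(cleaned$value)
str(cleaned)
```

Assigning into cleaned[] keeps the data.frame structure, whereas a plain cleaned <- lapply(...) would return an ordinary list.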
reshape2
Use the acast() function in the reshape2 package. It converts a long-format data.frame used for analysis into a table-like matrix or array that is good for display (dcast() does the same but returns a data.frame).
magrittr
Instead of nested function calls, it uses the pipe operator %>%, so the code is easier to read. Impressive!
jpeg
If we want to create the image on this wiki left hand side panel, we can use jpeg package to read an existing plot and then edit and save it.
cairoDevice
PS. I am not sure what advantage the functions in this package have over R's own functions (e.g. Cairo_svg() vs svg()) or even over the Cairo package.
For Ubuntu, we need to install 2 libraries and 1 R package, RGtk2.
sudo apt-get install libgtk2.0-dev libcairo2-dev
On Windows, we may get the error: unable to load shared object 'C:/Program Files/R/R-3.0.2/library/cairoDevice/libs/x64/cairoDevice.dll'. We need to follow the instructions here.
iterators
An iterator is more useful than a for-loop if the data is already a collection. It can be used to iterate over a vector, data frame, matrix, or file.
Iterators can be combined with the foreach package; http://www.exegetic.biz/blog/2013/11/iterators-in-r/ has more details.
colortools
Tools that allow users to generate color schemes and palettes
tkrplot
On Ubuntu, we need to install tk packages, such as by
sudo apt-get install tk-dev
rex
Friendly Regular Expressions
Different ways of using R
R call C/C++
Mainly talks about .C() and .Call().
- The Writing R Extensions manual, of course.
- http://faculty.washington.edu/kenrice/sisg-adv/sisg-07.pdf
- http://www.stat.berkeley.edu/scf/paciorek-cppWorkshop.pdf (Very useful)
- http://www.stat.harvard.edu/ccr2005/
- http://mazamascience.com/WorkingWithData/?p=1099
Embedding R
- See Chapter 8 of the Writing R Extensions manual.
- Talk by Simon Urbanek in UseR 2004.
- Technical report by Friedrich Leisch in 2007.
- https://stat.ethz.ch/pipermail/r-help/attachments/20110729/b7d86ed7/attachment.pl
A very simple example (do not return from shell) from the Writing R Extensions manual
The command-line R front-end, R_HOME/bin/exec/R, is one such example. Its source code is in file <src/main/Rmain.c>.
This example can be run by
R_HOME/bin/R CMD R_HOME/bin/exec/R
Note:
- R_HOME/bin/exec/R is the R binary. However, it cannot be launched directly unless R_HOME and LD_LIBRARY_PATH are set up. Again, this is explained in the Writing R Extensions manual.
- R_HOME/bin/R is a shell-script front-end where users can invoke it. It sets up the environment for the executable. It can be copied to /usr/local/bin/R. When we run R_HOME/bin/R, it actually runs R_HOME/bin/R CMD R_HOME/bin/exec/R (see line 259 of R_HOME/bin/R as in R 3.0.2) so we know the important role of R_HOME/bin/exec/R.
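The locations mentioned in these notes can be looked up from inside a running R session. This is a small sketch using R.home(); the bin/exec/R path is the Unix-alike layout described above, so the last check may print FALSE on other platforms or on unusual installations:

```r
r_home <- R.home()                               # the directory R_HOME refers to
r_bin  <- R.home("bin")                          # directory holding the R front-end
r_exec <- file.path(r_home, "bin", "exec", "R")  # the R binary on Unix-alikes

cat("R_HOME:       ", r_home, "\n")
cat("front-end dir:", r_bin, "\n")
cat("binary exists:", file.exists(r_exec), "\n")
```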
More examples of embedding can be found in tests/Embedding directory. Read <index.html> for more information about these test examples.
An example from Bioconductor workshop
- What is covered in this section is different from Create and use a standalone Rmath library.
- Use eval() function. See R-Ext 8.1 and 8.2 and 5.11.
- http://stackoverflow.com/questions/2463437/r-from-c-simplest-possible-helloworld (obtained from searching R_tryEval on google)
- http://stackoverflow.com/questions/7457635/calling-r-function-from-c
Example: Create <embed.c> file
#include <Rembedded.h>
#include <Rdefines.h>

static void doSplinesExample();

int main(int argc, char *argv[])
{
    Rf_initEmbeddedR(argc, argv);
    doSplinesExample();
    Rf_endEmbeddedR(0);
    return 0;
}

static void doSplinesExample()
{
    SEXP e, result;
    int errorOccurred;

    // create and evaluate 'library(splines)'
    PROTECT(e = lang2(install("library"), mkString("splines")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    if (errorOccurred) {
        // handle error
    }
    UNPROTECT(1);

    // 'options(FALSE)' ...
    PROTECT(e = lang2(install("options"), ScalarLogical(0)));
    // ... modified to 'options(example.ask=FALSE)' (this is obscure)
    SET_TAG(CDR(e), install("example.ask"));
    R_tryEval(e, R_GlobalEnv, NULL);
    UNPROTECT(1);

    // 'example("ns")'
    PROTECT(e = lang2(install("example"), mkString("ns")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    UNPROTECT(1);
}
Then build the executable. Note that I don't need to create R_HOME variable.
cd
tar xzvf
cd R-3.0.1
./configure --enable-R-shlib
make
cd tests/Embedding
make
~/R-3.0.1/bin/R CMD ./Rtest

nano embed.c
# Using a single line will give an error and cannot show the real problem.
# ../../bin/R CMD gcc -I../../include -L../../lib -lR embed.c
# A better way is to run compile and link separately
gcc -I../../include -c embed.c
gcc -o embed embed.o -L../../lib -lR -lRblas
../../bin/R CMD ./embed
Note that if we want to call the executable file ./embed directly, we need to set up the R environment by specifying the R_HOME variable and adding the directories used in linking R to LD_LIBRARY_PATH. This is based on the information provided by Writing R Extensions.
export R_HOME=/home/brb/Downloads/R-3.0.2
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib
./embed # No need to include R CMD in front.
Question: Create a data frame in C? Answer: Use data.frame() via an eval() call from C. Or see the code in stats/src/model.c, as part of model.frame.default. Or use Rcpp as here.
Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf
Create a Simple Socket Server in R
This example comes from this paper.
Create an R function
simpleServer <- function(port = 6543) {
  sock <- socketConnection(port = port, server = TRUE)
  on.exit(close(sock))
  cat("\nWelcome to R!\nR>", file = sock)
  while ((line <- readLines(sock, n = 1)) != "quit") {
    cat(paste("socket >", line, "\n"))
    out <- capture.output(try(eval(parse(text = line))))
    writeLines(out, con = sock)
    cat("\nR> ", file = sock)
  }
}
Then run simpleServer(). Open another terminal and try to communicate with the server
$ telnet localhost 6543 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. Welcome to R! R> summary(iris[, 3:5]) Petal.Length Petal.Width Species Min. :1.000 Min. :0.100 setosa :50 1st Qu.:1.600 1st Qu.:0.300 versicolor:50 Median :4.350 Median :1.300 virginica :50 Mean :3.758 Mean :1.199 3rd Qu.:5.100 3rd Qu.:1.800 Max. :6.900 Max. :2.500 R> quit Connection closed by foreign host.
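The heart of the server loop is the capture.output(try(eval(parse(text = line)))) step. As a minimal sketch, that step can be tried on its own without opening a socket. The helper name eval_line is made up for this illustration and is not part of the paper's code. Because the server evaluates whatever text arrives, it should only be run on a trusted machine or network.

```r
# Hypothetical helper: evaluate one line of R code the way simpleServer() does
# and return the printed output as a character vector.
eval_line <- function(line) {
  capture.output(try(eval(parse(text = line))))
}

# The lines that would be written back to the client.
out <- eval_line("summary(c(1, 2, 3, 4))")
print(out)

# An error does not stop the loop: try() reports it on stderr
# and eval_line() still returns a character vector.
err <- eval_line("stop('oops')")
```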
Rserve
Note that the way of launching Rserve is similar to the way we launch a C program that embeds R. See Call R from C/C++ or Example from Bioconductor workshop.
See my Rserve page.
(Commercial) StatconnDcom
R.NET
RJava
RCaller
RApache
littler
http://dirk.eddelbuettel.com/code/littler.html
Difference between Rscript and littler
RInside: Embed R in C++
See RInside
(From RInside documentation) The RInside package makes it easier to embed R in your C++ applications. There is no code you would execute directly from the R environment. Rather, you write C++ programs that embed R, which is illustrated by some of the included examples.
The included examples are armadillo, eigen, mpi, qt, standard, threads and wt.
To run 'make' when we don't have a global R, we should modify the file <Makefile>. Also if we just want to create one executable file, we can do, for example, 'make rinside_sample1'.
To run any executable program, we need to specify LD_LIBRARY_PATH variable, something like
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib
The real build process looks like (check <Makefile> for completeness)
g++ -I/home/brb/Downloads/R-3.0.2/include \
    -I/home/brb/Downloads/R-3.0.2/library/Rcpp/include \
    -I/home/brb/Downloads/R-3.0.2/library/RInside/include -g -O2 -Wall \
    -I/usr/local/include \
    rinside_sample0.cpp \
    -L/home/brb/Downloads/R-3.0.2/lib -lR -lRblas -lRlapack \
    -L/home/brb/Downloads/R-3.0.2/library/Rcpp/lib -lRcpp \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/Rcpp/lib \
    -L/home/brb/Downloads/R-3.0.2/library/RInside/lib -lRInside \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/RInside/lib \
    -o rinside_sample0
Hello World example of embedding R in C++.
#include <RInside.h>                    // for the embedded R via RInside

int main(int argc, char *argv[]) {
    RInside R(argc, argv);              // create an embedded R instance
    R["txt"] = "Hello, world!\n";       // assign a char* (string) to 'txt'
    R.parseEvalQ("cat(txt)");           // eval the init string, ignoring any returns
    exit(0);
}
The above can be compared to the Hello world example in Qt.
#include <QApplication.h>
#include <QPushButton.h>

int main( int argc, char **argv )
{
    QApplication app( argc, argv );
    QPushButton hello( "Hello world!", 0 );
    hello.resize( 100, 30 );
    app.setMainWidget( &hello );
    hello.show();
    return app.exec();
}
RFortran
RFortran is an open source project with the following aim:
To provide an easy to use Fortran software library that enables Fortran programs to transfer data and commands to and from R.
It works only on Windows platform with Microsoft Visual Studio installed:(
Call R from other languages
JRI
rpy2
http://rpy.sourceforge.net/rpy2.html
Create a standalone Rmath library
R has many math and statistical functions. We can easily use these functions in our C/C++/Fortran code. The definitive guide to doing this is Chapter 9 "The standalone Rmath library" of the R-admin manual.
Here is my experience based on R 3.0.2 on Windows OS.
Create a static library <libRmath.a> and a dynamic library <Rmath.dll>
Suppose we have downloaded the R source code and built R from its source. See Build_R_from_its_source. Then the following 2 lines will generate the files <libRmath.a> and <Rmath.dll> under the C:\R\R-3.0.2\src\nmath\standalone directory.
cd C:\R\R-3.0.2\src\nmath\standalone
make -f Makefile.win
Use Rmath library in our code
set CPLUS_INCLUDE_PATH=C:\R\R-3.0.2\src\include
set LIBRARY_PATH=C:\R\R-3.0.2\src\nmath\standalone
# It is not LD_LIBRARY_PATH in above.
# Created <RmathEx1.cpp> from the book "Statistical Computing in C++ and R" web site
# http://math.la.asu.edu/~eubank/CandR/ch4Code.cpp
# It is OK to save the cpp file under any directory.
# Force to link against the static library <libRmath.a>
g++ RmathEx1.cpp -lRmath -lm -o RmathEx1.exe
# OR
g++ RmathEx1.cpp -Wl,-Bstatic -lRmath -lm -o RmathEx1.exe
# Force to link against dynamic library <Rmath.dll>
g++ RmathEx1.cpp Rmath.dll -lm -o RmathEx1Dll.exe
Test the executable program. Note that the executable program RmathEx1.exe can be transferred to and run in another computer without R installed. Isn't it cool!
c:\R>RmathEx1
Enter a argument for the normal cdf:
1
Enter a argument for the chi-squared cdf:
1
Prob(Z <= 1) = 0.841345
Prob(Chi^2 <= 1)= 0.682689
Below is the cpp program <RmathEx1.cpp>.
//RmathEx1.cpp
#define MATHLIB_STANDALONE
#include <iostream>
#include "Rmath.h"

using std::cout; using std::cin; using std::endl;

int main()
{
    double x1, x2;
    cout << "Enter a argument for the normal cdf:" << endl;
    cin >> x1;
    cout << "Enter a argument for the chi-squared cdf:" << endl;
    cin >> x2;

    cout << "Prob(Z <= " << x1 << ") = " <<
        pnorm(x1, 0, 1, 1, 0) << endl;
    cout << "Prob(Chi^2 <= " << x2 << ")= " <<
        pchisq(x2, 1, 1, 0) << endl;
    return 0;
}
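As a quick sanity check, the same two probabilities can be computed inside R itself. This is only a sketch for comparison with the test run above; it assumes that the trailing 1, 0 arguments in the C++ calls correspond to lower.tail = TRUE and log.p = FALSE, which is how the Rmath functions are documented.

```r
# Same calls as in RmathEx1.cpp, using R's own pnorm() and pchisq().
p_norm  <- pnorm(1, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
p_chisq <- pchisq(1, df = 1, lower.tail = TRUE, log.p = FALSE)

cat("Prob(Z <= 1) =", format(p_norm, digits = 6), "\n")
cat("Prob(Chi^2 <= 1) =", format(p_chisq, digits = 6), "\n")
```

The printed values should agree with the output of RmathEx1.exe for the input 1.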
Calling R.dll directly
See Chapter 8.2.2 of R Extensions. This is related to embedding R under Windows. The file <R.dll> on Windows is like <libR.so> on Linux.
Create HTML 5 web and slides
See here
Create presentation file (beamer)
- http://rmarkdown.rstudio.com/beamer_presentation_format.html
- http://www.theresearchkitchen.com/archives/1017 (markdown and presentation files)
- http://rmarkdown.rstudio.com/
- Create Rmd file first in Rstudio by File -> R markdown. Select Presentation > choose pdf (beamer) as output format.
- Edit the template created by RStudio.
- Click 'Knit pdf' button (Ctrl+Shift+k) to create/display the pdf file.
An example of Rmd is
---
title: "My Example"
author: You Know Me
date: Dec 32, 2014
output: beamer_presentation
---

## R Markdown

This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

## Slide with Bullets

- Bullet 1
- Bullet 2
- Bullet 3. Mean is $\frac{1}{n} \sum_{i=1}^n x_i$.

$$
\mu = \frac{1}{n} \sum_{i=1}^n x_i
$$

## New slide

![picture of BDGE](/home/brb/Pictures/BDGEFinished.png)

## Slide with R Code and Output

```{r}
summary(cars)
```

## Slide with Plot

```{r, echo=FALSE}
plot(cars)
```
Create HTML report
ReportingTools (Jason Hackney) from Bioconductor.
htmlTable package
http://gforge.se/2014/01/fast-track-publishing-using-knitr-part-iv/
htmltab package
This package is not used to create an HTML report but to extract HTML tables.
ztable package
Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.
Create academic report
reports package in CRAN and in github repository. The youtube video gives an overview of the package.
Create pdf file
# Idea:
#        knitr        pdflatex
#   rnw -------> tex ----------> pdf
library(knitr)
knit("example.rnw") # create example.tex file
- A very simple example <002-minimal.Rnw> from yihui.name works fine on linux.
- <knitr-minimal.Rnw>. I had no problem creating the pdf file on Windows, but at first I could not generate the pdf on Linux from the tex file. Some people suggested running sudo apt-get install texlive-fonts-recommended to install the missing fonts. It works!
To see some real examples, check out DESeq2 package (vignettes subdirectory).
Or start with a markdown file. Download the example <001-minimal.Rmd> and remove the last line, which gets a png file from the internet.
# Idea:
#        knitr       pandoc
#   rmd -------> md ----------> pdf
R -e "library(knitr); knit('001-minimal.Rmd')"
pandoc 001-minimal.md -o 001-minimal.pdf
Create Word report
knitr + pandoc
- http://www.r-statistics.com/2013/03/write-ms-word-document-using-r-with-as-little-overhead-as-possible/
- http://www.carlboettiger.info/2012/04/07/writing-reproducibly-in-the-open-with-knitr.html
It is better to create rmd file in RStudio. Rstudio provides a template for rmd file and it also provides a quick reference to R markdown language.
# Idea:
#        knitr       pandoc
#   rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd") #Create md and html files
and then
FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))
Note. For some reason, if I play around with the above 2 commands several times, knit2html() stops working well. However, if I click the 'Knit HTML' button in RStudio, it works again.
Another way is
library(knitr)    # provides knit()
library(pander)
name = "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
Pandoc.brew(file = paste0(name, ".md"), output = paste0(name, "docx"), convert = "docx")
Note that once we have used knitr command to create a md file, we can use pandoc shell command to convert it to different formats:
- A pdf file: pandoc -s report.md -t latex -o report.pdf
- A html file: pandoc -s report.md -o report.html (with the -c flag html files can be added easily)
- Openoffice: pandoc report.md -o report.odt
- Word docx: pandoc report.md -o report.docx
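The conversions listed above all follow the same pattern, so they can be generated from R. The following is a minimal sketch: the helper pandoc_cmd is a made-up name, it only builds the command strings, and it assumes pandoc picks the output format from the file extension (the extra -s and -t flags shown above are left out).

```r
# Hypothetical helper: build the pandoc command line for one output extension.
# It only assembles the string; pass the result to system() to run it.
pandoc_cmd <- function(md_file, ext) {
  out_file <- paste0(tools::file_path_sans_ext(md_file), ".", ext)
  paste("pandoc", shQuote(md_file), "-o", shQuote(out_file))
}

cmds <- vapply(c("pdf", "html", "odt", "docx"),
               function(ext) pandoc_cmd("report.md", ext),
               character(1))
print(unname(cmds))

# To actually convert (requires pandoc on the PATH):
# for (cmd in cmds) system(cmd)
```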
We can also create the epub file for reading on Kobo ereader. For example, download this file and save it as example.Rmd. I need to remove the line containing the link to http://i.imgur.com/RVNmr.jpg since it creates an error when I run pandoc (not sure if it is the pandoc version I have is too old). Now we just run these 2 lines to get the epub file. Amazing!
knit("example.Rmd")
pandoc("example.md", format="epub")
pander
Try pandoc[1] with a minimal reproducible example, you might give a try to my "pander" package [2] too:
library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')
Where the content of the "minimal.brew" file is something you might have got used to with Sweave - although it's using "brew" syntax instead. See the examples of pander [3] for more details. Please note that pandoc should be installed first, which is pretty easy on Windows.
- http://johnmacfarlane.net/pandoc/
- http://rapporter.github.com/pander/
- http://rapporter.github.com/pander/#examples
R2wd
Use R2wd package. However, only 32-bit R is allowed and sometimes it can not produce all 'table's.
> library(R2wd) > wdGet() Loading required package: rcom Loading required package: rscproxy rcom requires a current version of statconnDCOM installed. To install statconnDCOM type installstatconnDCOM() This will download and install the current version of statconnDCOM You will need a working Internet connection because installation needs to download a file. Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() : argument is of length zero
The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.
Convert from pdf to word
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert
rtf
Use rtf package for Rich Text Format (RTF) Output.
xtable
Package xtable will produce html output. If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.
ReporteRs
Microsoft Word, Microsoft Powerpoint and HTML documents generation from R. The source code is hosted on https://github.com/davidgohel/ReporteRs
COM client or server
Client
RDCOMClient, which the excel.link package depends on.
Server
Use R under proxy
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
What is the best place to save Rconsole on Windows platform
Put it in the C:/Users/USERNAME/Documents folder so that no matter how R is upgraded or downgraded, it always finds my preferences.
Web scraping
http://www.slideshare.net/schamber/web-data-from-r#btnNext
pubmed.mineR
Text mining of PubMed Abstracts (http://www.ncbi.nlm.nih.gov/pubmed). The algorithms are designed for two formats (text and XML) from PubMed.
Launch Rstudio
If multiple versions of R are detected, RStudio may fail to launch: a spinning clock (like the Java one) is shown and never stops. The trick is to hold down the Ctrl key while clicking the RStudio icon. RStudio will then show a list of R installations to choose from.
List files using regular expression
- Extension
list.files(pattern = "\\.txt$")
where the dot (.) is a metacharacter that matches any character, so it is escaped as \\. to match a literal period.
- Start with
list.files(pattern = "^Something")
Alternatively, use Sys.glob():
> Sys.glob("~/Downloads/*.txt")
[1] "/home/brb/Downloads/ip.txt"       "/home/brb/Downloads/valgrind.txt"
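To see the difference between these patterns without depending on what happens to be in a folder, here is a small sketch that works in a scratch directory. The file names are invented for the illustration.

```r
# Work in a scratch directory so the example does not depend on existing files.
d <- file.path(tempdir(), "listfiles_demo")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("notes.txt", "data.csv",
                                     "Something.R", "mytxt.csv"))))

by_ext    <- list.files(d, pattern = "\\.txt$")    # names ending in .txt
by_start  <- list.files(d, pattern = "^Something") # names starting with "Something"
unescaped <- list.files(d, pattern = ".txt")       # any character followed by "txt"
by_glob   <- basename(Sys.glob(file.path(d, "*.txt")))  # shell-style wildcard

print(list(by_ext = by_ext, by_start = by_start,
           unescaped = unescaped, by_glob = by_glob))
```

The unescaped pattern also picks up mytxt.csv, which is why the dot is escaped and the pattern is anchored with $ in the extension example.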
Hidden tool: rsync in Rtools
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/" sending incremental file list a.exe sent 323142 bytes received 31 bytes 646346.00 bytes/sec total size is 1198416 speedup is 3.71 c:\Rtools\bin>
And rsync works best when we need to sync folder.
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/" sending incremental file list binary/ binary/Eula.txt binary/cherrytree.lnk binary/depends64.chm binary/depends64.dll binary/depends64.exe binary/mtputty.exe binary/procexp.chm binary/procexp.exe binary/pscp.exe binary/putty.exe binary/sqlite3.exe binary/wget.exe sent 4115294 bytes received 244 bytes 1175868.00 bytes/sec total size is 8036311 speedup is 1.95 c:\Rtools\bin>rm c:\users\limingc\Documents\binary\procexp.exe cygwin warning: MS-DOS style path detected: c:\users\limingc\Documents\binary\procexp.exe Preferred POSIX equivalent is: /cygdrive/c/users/limingc/Documents/binary/procexp.exe CYGWIN environment variable option "nodosfilewarning" turns off this warning. Consult the user's guide for more details about POSIX paths: http://cygwin.com/cygwin-ug-net/using.html#using-pathnames c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/" sending incremental file list binary/ binary/procexp.exe sent 1767277 bytes received 35 bytes 3534624.00 bytes/sec total size is 8036311 speedup is 4.55 c:\Rtools\bin>
Unfortunately, if the destination is a network drive, I could get a permission denied (13) error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows
Install rgdal package on ubuntu
sudo apt-get install libgdal1-dev libproj-dev
R
> install.packages("rgdal")
Set up Emacs on Windows
Edit the file C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el with something like
(setq-default inferior-R-program-name "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")
Database
RMySQL
RSQLite
MongoDB
- http://www.r-bloggers.com/r-and-mongodb/
- http://watson.nci.nih.gov/~sdavis/blog/rmongodb-using-R-with-mongo/
Github
R source
- https://github.com/wch/r-source/ Daily update, interesting, should be visited every day. Clicking 1000+ commits to look at daily changes.
- https://github.com/SurajGupta/r-source (update for each R release)
github
https://github.com/languages/R
Send local repository to Github in R by using reports package
http://www.youtube.com/watch?v=WdOI_-aZV0Y
My collection
- https://github.com/arraytools
- https://gist.github.com/4383351 heatmap using leukemia data
- https://gist.github.com/4382774 heatmap using sequential data
- https://gist.github.com/4484270 biocLite
How to download
Clone ~ Download.
- Command line
git clone https://gist.github.com/4484270.git
This will create a subdirectory called '4484270' with all cloned files there.
- Within R
library(devtools)
source_gist("4484270")
Or first download the json file from
https://api.github.com/users/MYUSERLOGIN/gists
and then
library(RJSONIO)
x <- fromJSON("~/Downloads/gists.json")
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
  cmd <- paste0("git clone https://gist.github.com/", x, ".git")
  system(cmd)
})
Connect R with Arduino
- http://lamages.blogspot.com/2012/10/connecting-real-world-to-r-with-arduino.html
- http://jean-robert.github.io/2012/11/11/thermometer-R-using-Arduino-Java.html
- http://bio7.org/?p=2049
- http://www.rforge.net/Arduino/svn.html
Android App
- R Instructor $4.84
- Statistical Distribution (Not R related app)
Amazing plots
3D plot
Using persp function to create the following plot.
### Random pattern
# Create matrix with random values with dimension of final grid
rand <- rnorm(441, mean=0.3, sd=0.1)
mat.rand <- matrix(rand, nrow=21)

# Create another matrix for the colors. Start by making all cells green
fill <- matrix("green3", nr = 21, nc = 21)

# Change colors in each cell based on corresponding mat.rand value
fcol <- fill
fcol[] <- terrain.colors(40)[cut(mat.rand,
    stats::quantile(mat.rand, seq(0,1, len = 41), na.rm=T),
    include.lowest = TRUE)]

# Create concave surface using exponential function
x <- -10:10
y <- x^2
y <- as.matrix(y)
y1 <- y
for(i in 1:20){tmp <- cbind(y,y1); y1 <- tmp[,1]; y <- tmp;}
mat <- tmp[1:21, 1:21]

# Plot it up!
persp(1:21, 1:21, t(mat)/10, theta = 90, phi = 35,col=fcol,
      scale = FALSE, axes = FALSE, box = FALSE)

### Organized pattern
# Same as before
rand <- rnorm(441, mean=0.3, sd=0.1)

# Create concave surface using exponential function
x <- -10:10
y <- x^2
y <- as.matrix(y)
for(i in 1:20){tmp <- cbind(y,y); y1 <- tmp[,1]; y <- tmp;}
mat <- tmp[1:21, 1:21]

###Organize rand by y and put into matrix form
o <- order(rand,as.vector(mat))
o.tmp <- cbind(rand[o], rev(sort(as.vector(mat))))
mat.org <- matrix(o.tmp[,1], nrow=21)
half.1 <- mat.org[,seq(1,21,2)]
half.2 <- mat.org[,rev(seq(2,20,2))]
full <- cbind(half.1, half.2)
full <- t(full)

# Again, create color matrix and populate using rand values
zi <- full[-1, -1] + full[-1, -21] + full[-21,-1] + full[-21, -21]
fill <- matrix("green3", nr = 20, nc = 20)
fcol <- fill
fcol[] <- terrain.colors(40)[cut(zi,
    stats::quantile(zi, seq(0,1, len = 41), na.rm=T),
    include.lowest = TRUE)]

# Plot it up!
persp(1:21, 1:21, t(mat)/10, theta = 90, phi = 35,col=t(fcol),
      scale = FALSE, axes = FALSE, box = FALSE)
Christmas tree
http://wiekvoet.blogspot.com/2014/12/merry-christmas.html
# http://blogs.sas.com/content/iml/2012/12/14/a-fractal-christmas-tree/
# Each row is a 2x2 linear transformation
# Christmas tree
L <- matrix(
    c(0.03, 0, 0 , 0.1,
      0.85, 0.00, 0.00, 0.85,
      0.8, 0.00, 0.00, 0.8,
      0.2, -0.08, 0.15, 0.22,
      -0.2, 0.08, 0.15, 0.22,
      0.25, -0.1, 0.12, 0.25,
      -0.2, 0.1, 0.12, 0.2),
    nrow=4)
# ... and each row is a translation vector
B <- matrix(
    c(0, 0,
      0, 1.5,
      0, 1.5,
      0, 0.85,
      0, 0.85,
      0, 0.3,
      0, 0.4),
    nrow=2)

prob = c(0.02, 0.6,.08, 0.07, 0.07, 0.07, 0.07)

# Iterate the discrete stochastic map
N = 1e5 #5 # number of iterations
x = matrix(NA,nrow=2,ncol=N)
x[,1] = c(0,2) # initial point
k <- sample(1:7,N,prob,replace=TRUE) # values 1-7

for (i in 2:N)
  x[,i] = crossprod(matrix(L[,k[i]],nrow=2),x[,i-1]) + B[,k[i]] # iterate

# Plot the iteration history
png('card.png')
par(bg='darkblue',mar=rep(0,4))
plot(x=x[1,],y=x[2,],
     col=grep('green',colors(),value=TRUE),
     axes=FALSE,
     cex=.1,
     xlab='',
     ylab='' )#,pch='.')

bals <- sample(N,20)
points(x=x[1,bals],y=x[2,bals]-.1,
       col=c('red','blue','yellow','orange'),
       cex=2,
       pch=19
)
text(x=-.7,y=8,
     labels='Merry',
     adj=c(.5,.5),
     srt=45,
     vfont=c('script','plain'),
     cex=3,
     col='gold'
)
text(x=0.7,y=8,
     labels='Christmas',
     adj=c(.5,.5),
     srt=-45,
     vfont=c('script','plain'),
     cex=3,
     col='gold'
)
dev.off() # close the png device so that card.png is written
Tricks
Change default R repository
Edit global Rprofile file. On *NIX platforms, it's located in /usr/lib/R/library/base/R/Rprofile although local .Rprofile settings take precedence.
For example, I can specify the R mirror I like by creating a single line <.Rprofile> file under my home directory.
options(repos = "http://cran.rstudio.com/")
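The same option can be set and inspected from a running session, which is a convenient way to check what a <.Rprofile> would do. A minimal sketch, using the mirror URL from above and the conventional element name CRAN:

```r
# Set the CRAN mirror for the current session only and keep the old setting.
old <- options(repos = c(CRAN = "http://cran.rstudio.com/"))

current <- getOption("repos")[["CRAN"]]
print(current)

# Restore whatever was configured before.
options(old)
```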
Query about an R package
packageDescription("MASS")
packageVersion("MASS")
packageStatus()       # Summarize information about installed packages
installed.packages()
available.packages()
The 'available.packages()' command is useful for understanding package dependency. Use setRepositories() to select repositories and options()$repos to check or change the repositories.
> packageStatus() Number of installed packages: ok upgrade unavailable C:/Program Files/R/R-3.0.1/library 110 0 1 Number of available packages (each package counted only once): installed not installed http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0 76 4563 http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.0 0 5 http://www.bioconductor.org/packages/2.12/bioc/bin/windows/contrib/3.0 16 625 http://www.bioconductor.org/packages/2.12/data/annotation/bin/windows/contrib/3.0 4 686 > tmp <- available.packages() > str(tmp) chr [1:5975, 1:17] "A3" "ABCExtremes" "ABCp2" "ACCLMA" "ACD" "ACNE" "ADGofTest" "ADM3" "AER" ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:5975] "A3" "ABCExtremes" "ABCp2" "ACCLMA" ... ..$ : chr [1:17] "Package" "Version" "Priority" "Depends" ... > tmp[1:3,] Package Version Priority Depends Imports LinkingTo Suggests A3 "A3" "0.9.2" NA "xtable, pbapply" NA NA "randomForest, e1071" ABCExtremes "ABCExtremes" "1.0" NA "SpatialExtremes, combinat" NA NA NA ABCp2 "ABCp2" "1.1" NA "MASS" NA NA NA Enhances License License_is_FOSS License_restricts_use OS_type Archs MD5sum NeedsCompilation File A3 NA "GPL (>= 2)" NA NA NA NA NA NA NA ABCExtremes NA "GPL-2" NA NA NA NA NA NA NA ABCp2 NA "GPL-2" NA NA NA NA NA NA NA Repository A3 "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0" ABCExtremes "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0" ABCp2 "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0"
And the following commands find which package depends on Rcpp and also which are from bioconductor repository.
> pkgName <- "Rcpp"
> rownames(tmp)[grep(pkgName, tmp[,"Depends"])]
> tmp[grep("Rcpp", tmp[,"Depends"]), "Depends"]
> ind <- intersect(grep(pkgName, tmp[,"Depends"]), grep("bioconductor", tmp[, "Repository"]))
> rownames(grep)[ind]
NULL
> rownames(tmp)[ind]
 [1] "ddgraph"            "DESeq2"             "GeneNetworkBuilder" "GOSemSim"           "GRENITS"
 [6] "mosaics"            "mzR"                "pcaMethods"         "Rdisop"             "Risa"
[11] "rTANDEM"
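One caveat with grep(pkgName, tmp[,"Depends"]) is that it matches substrings, so searching for "Rcpp" also finds packages that depend only on, say, RcppArmadillo. The sketch below shows one way to match whole package names. It uses a tiny invented matrix in place of available.packages() so it runs offline, and depends_on is a hypothetical helper name.

```r
# Toy stand-in for available.packages(): the rows are invented for illustration.
db <- matrix(c("pkgA", "Rcpp (>= 0.10.0), methods",
               "pkgB", "RcppArmadillo",
               "pkgC", "MASS",
               "pkgD", NA),
             ncol = 2, byrow = TRUE,
             dimnames = list(c("pkgA", "pkgB", "pkgC", "pkgD"),
                             c("Package", "Depends")))

# Hypothetical helper: names of packages whose Depends field lists 'pkg' itself.
depends_on <- function(db, pkg) {
  fields <- strsplit(db[, "Depends"], ",")
  hit <- vapply(fields, function(f) {
    # drop version requirements such as "(>= 0.10.0)" and surrounding spaces
    nm <- trimws(sub("\\(.*\\)", "", f))
    pkg %in% nm
  }, logical(1))
  rownames(db)[hit]
}

substring_hits <- rownames(db)[grep("Rcpp", db[, "Depends"])]  # substring match
exact_hits     <- depends_on(db, "Rcpp")                        # whole-name match
print(substring_hits)
print(exact_hits)
```

With the real matrix, depends_on(tmp, "Rcpp") could be used in place of the grep() call on the Depends column.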
Editor
http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs
- Emacs + ESS.
- Rstudio - editor/R terminal/R graphics/file browser/package manager. The new version (0.98) also provides a new feature for debugging step-by-step.
- geany - I like the feature that it shows defined functions on the side panel even for R code. RStudio can also do this (see the bottom of the code panel).
- Rgedit which includes a feature of splitting screen into two panes and run R in the bottom panel. See here.
- Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html
GUI for Data Analysis
Rcmdr
http://cran.r-project.org/web/packages/Rcmdr/index.html
Deducer
http://cran.r-project.org/web/packages/Deducer/index.html
Create a new R package, namespace, documentation
- http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf (highly recommend)
- https://stat.ethz.ch/pipermail/r-devel/2013-July/066975.html
- Benefit of import in a namespace
- This youtube video from Tyler Rinker teaches how to use RStudio to develop an R package and also use Git to do version control. Very useful!
Create R package with devtools and roxygen2
A useful post by Jacob Montgomery. Do watch the youtube video there.
The process requires 3 components: RStudio software, devtools and roxygen2 (creating documentation from R code) packages.
Minimal R package for submission
https://stat.ethz.ch/pipermail/r-devel/2013-August/067257.html and CRAN Repository Policy.
Vectorization
http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html
Mean of duplicated rows
> attach(mtcars) > head(mtcars) mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 > aggdata <-aggregate(mtcars, by=list(cyl,vs), + FUN=mean, na.rm=TRUE) > print(aggdata) Group.1 Group.2 mpg cyl disp hp drat wt qsec vs 1 4 0 26.00000 4 120.30 91.0000 4.430000 2.140000 16.70000 0 2 6 0 20.56667 6 155.00 131.6667 3.806667 2.755000 16.32667 0 3 8 0 15.10000 8 353.10 209.2143 3.229286 3.999214 16.77214 0 4 4 1 26.73000 4 103.62 81.8000 4.035000 2.300300 19.38100 1 5 6 1 19.12500 6 204.55 115.2500 3.420000 3.388750 19.21500 1 am gear carb 1 1.0000000 5.000000 2.000000 2 1.0000000 4.333333 4.666667 3 0.1428571 3.285714 3.500000 4 0.7000000 4.000000 1.500000 5 0.0000000 3.500000 2.500000 > detach(mtcars)
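The same group means can be obtained without attach()/detach() by using the formula interface of aggregate(). This is a sketch of an alternative, not the code that produced the transcript above; the grouping columns keep their original names (cyl, vs) instead of Group.1 and Group.2.

```r
# Group means by cyl and vs; '.' stands for all remaining columns.
aggdata <- aggregate(. ~ cyl + vs, data = mtcars, FUN = mean)
print(aggdata)

# Only selected columns, same grouping.
agg_small <- aggregate(cbind(mpg, hp) ~ cyl + vs, data = mtcars, FUN = mean)
print(agg_small)
```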
Apply family
Vectorize, aggregate, apply, by, eapply, lapply, mapply, replicate, scale, sapply, split, tapply, and vapply. Check out this.
Examples of using lapply + split on a data frame. See rollingyours.wordpress.com.
However, apply is just a wrapper around a loop, so its performance is not better than that of a for loop. See
- http://tolstoy.newcastle.edu.au/R/help/06/05/27255.html (answered by Brian Ripley)
- https://stat.ethz.ch/pipermail/r-help/2014-October/422455.html (has one example)
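To make the point concrete, the sketch below computes the same result with a for loop, with sapply()/vapply(), and with a truly vectorized expression. The toy input is an assumption chosen for the illustration; timings are left out on purpose because they depend on the machine.

```r
x <- as.list(1:10)

# for loop with a preallocated result
res_loop <- numeric(length(x))
for (i in seq_along(x)) res_loop[i] <- x[[i]]^2

# the same computation with sapply() and vapply()
res_sapply <- sapply(x, function(v) v^2)
res_vapply <- vapply(x, function(v) v^2, numeric(1))

# a vectorized version: no R-level loop, explicit or hidden
res_vec <- unlist(x)^2

stopifnot(isTRUE(all.equal(res_loop, res_sapply)),
          isTRUE(all.equal(res_loop, res_vapply)),
          isTRUE(all.equal(res_loop, res_vec)))
```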
The package 'pbapply' creates a text-mode progress bar that works on any platform. On the Windows platform, check out this post. It uses the winProgressBar() and setWinProgressBar() functions.
plyr and dplyr packages
Paper in J. Stat Software.
A quick introduction to plyr with a summary of apply functions in R and compare them with functions in plyr package.
- plyr has a common syntax -- easier to remember
- plyr requires less code since it takes care of the input and output format
- plyr can easily be run in parallel -- faster
A video of dplyr package can be found on vimeo.
A hands-on tutorial from dataschool.io.
llply()
llply is equivalent to lapply except that it will preserve labels and can display a progress bar. This is handy if we want to do a crazy thing.
LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])
where rLLID is a list of entrez ID. For example,
get("org.Hs.egGO")[["6772"]]
returns a list of 49 GOs.
ddply()
http://lamages.blogspot.com/2012/06/transforming-subsets-of-data-in-r-with.html
ldply()
An R Script to Automatically download PubMed Citation Counts By Year of Publication
mclapply() from parallel package is a multi-core version of lapply()
Note that Windows cannot take advantage of it.
Another choice for Windows OS is to use parLapply() function in parallel package.
library(parallel)
ncores <- as.integer( Sys.getenv('NUMBER_OF_PROCESSORS') )
cl <- makeCluster(getOption("cl.cores", ncores))
LLID2GOIDs2 <- parLapply(cl, rLLID, function(x) {
    library(org.Hs.eg.db); get("org.Hs.egGO")[[x]]} )
stopCluster(cl)
It does work. Cut the computing time from 100 sec to 29 sec on 4 cores.
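For readers who want to try the pattern without the annotation package, here is a self-contained sketch. The workload (squaring numbers) and the fixed cluster size of 2 are assumptions made for the illustration. On any platform, detectCores() from the parallel package can be used instead of the Windows-only NUMBER_OF_PROCESSORS variable.

```r
library(parallel)

# Use a small fixed number of workers for the illustration.
cl <- makeCluster(2)

# Stand-in workload: square each element on the workers.
ids <- as.list(1:8)
res_par <- parLapply(cl, ids, function(x) x^2)
stopCluster(cl)

# The sequential version gives the same answer.
res_seq <- lapply(ids, function(x) x^2)
print(identical(res_par, res_seq))
```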
regular expression
- ?grep, ?regexpr in R
- http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf
- http://www.johndcook.com/r_language_regex.html
- http://en.wikibooks.org/wiki/R_Programming/Text_Processing#Regular_Expressions
- http://www.endmemo.com/program/R/grep.php
- http://rpubs.com/Lionel/19068
- http://ucfagls.wordpress.com/2012/08/15/processing-sample-labels-using-regular-expressions-in-r/
- http://www.dummies.com/how-to/content/how-to-use-regular-expressions-in-r.html
- http://www.r-bloggers.com/example-8-27-using-regular-expressions-to-read-data-with-variable-number-of-words-in-a-field/
- http://www.r-bloggers.com/using-regular-expressions-in-r-case-study-in-cleaning-a-bibtex-database/
- http://cbio.ensmp.fr/~thocking/papers/2011-08-16-directlabels-and-regular-expressions-for-useR-2011/2011-useR-named-capture-regexp.pdf
- http://stackoverflow.com/questions/5214677/r-find-the-last-dot-in-a-string
- http://stackoverflow.com/questions/10294284/remove-all-special-characters-from-a-string-in-r
Not specific to R
- http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
- http://opencompany.org/download/regex-cheatsheet.pdf
Examples
- grep("\\.zip$", pkgs) or grep("\\.tar\\.gz$", pkgs) will search for strings ending with .zip or .tar.gz
- grep("9.11", string) will search for strings containing '9', any single character (separating 9 and 11) and '11'.
- pipe metacharacter (|); it is translated to 'or'. flood|fire will match strings containing flood or fire.
- [^?.]$ will match any string not ending with a question mark or period.
- ^[Gg]ood|[Bb]ad will match strings starting with Good/good and anywhere containing Bad/bad.
- ^([Gg]ood|[Bb]ad) will look for strings beginning with Good/good/Bad/bad.
- ? character; it means optional. [Gg]eorge( [Ww]\.)? [Bb]ush will match strings like 'george bush', 'George W. Bush' or 'george bushes'. Note that we escape the metacharacter dot by '\.' so it becomes a literal period.
- star and plus sign. Star means any number of repetitions, including none, and plus means at least one. For example, \(.*\) (with the parentheses escaped) matches 'abc(222 )' and '()'.
- [0-9]+ (.*) [0-9]+ will match a number followed by any number of characters and another number, separated by spaces; e.g. '4 by 5 size'.
- {} are called interval quantifiers; they specify the minimum and maximum number of matches of an expression.
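Several of the patterns above can be tried directly with grepl(). The test strings below are invented for the illustration. Note that inside an R string a regex backslash has to be doubled, so \. is written "\\.".

```r
s <- c("flood warning", "fire alarm", "quiet day",
       "Good morning", "not so bad", "Bad start",
       "George W. Bush", "george bush")

alt     <- grepl("flood|fire", s)           # either word, anywhere
start1  <- grepl("^[Gg]ood|[Bb]ad", s)      # Good/good at start, or Bad/bad anywhere
start2  <- grepl("^([Gg]ood|[Bb]ad)", s)    # Good/good/Bad/bad only at the start
opt     <- grepl("[Gg]eorge( [Ww]\\.)? [Bb]ush", s)  # optional middle initial

# Interval quantifier: strings made of two to three digits
interval <- grepl("^[0-9]{2,3}$", c("7", "42", "123", "1234"))

print(data.frame(s, alt, start1, start2, opt))
print(interval)
```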
Clipboard
source("clipboard")
read.table("clipboard")
read/manipulate binary data
- x <- readBin(fn, raw(), file.info(fn)$size)
- rawToChar(x[1:16])
- See Biostrings C API
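A small round trip shows how the two calls above fit together. This is only a sketch: the file is a temporary one created for the purpose, and its content is made up.

```r
# Write a few bytes to a temporary file.
fn <- tempfile(fileext = ".bin")
writeBin(charToRaw("Hello, binary world!"), fn)

# Read the whole file back as raw bytes, then decode the first five bytes.
x <- readBin(fn, what = raw(), n = file.info(fn)$size)
header <- rawToChar(x[1:5])

print(length(x))   # number of bytes read
print(header)

unlink(fn)
```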
string data manipulation
ebook by Gaston Sanchez.
read/download/source a file from internet
Simple text file http
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
Zip file and url() function
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
Here url() function is like file(), gzfile(), bzfile(), xzfile(), unz(), pipe(), fifo(), socketConnection(). They are used to create connections. By default, the connection is not opened (except for ‘socketConnection’), but may be opened by setting a non-empty value of argument ‘open’. See ?url.
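The same idea can be tried offline by swapping url() for file(). The sketch below writes a one-line script to a gzip-compressed temporary file and reads it back through gzcon(). It uses readLines() plus eval(parse()) to keep the two steps visible; the script content and variable name are invented.

```r
# Write a tiny R script into a gzip-compressed temporary file.
gz <- tempfile(fileext = ".gz")
con <- gzfile(gz, open = "w")
writeLines("answer <- 6 * 7", con)
close(con)

# file() creates the connection; gzcon() wraps it to decompress on the fly.
con <- gzcon(file(gz, open = "rb"))
code <- readLines(con)
close(con)

# Evaluate the script text in the current environment.
eval(parse(text = code))
print(answer)

unlink(gz)
```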
Another example of using url() is
load(url("http://www.example.com/example.RData"))
downloader package
This package provides a wrapper for the download.file function, making it possible to download files over https on Windows, Mac OS X, and other Unix-like platforms. The RCurl package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.
Google drive file based on https using RCurl package
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))
Github files https using RCurl package
- http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
- http://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
x = getURL("https://gist.github.com/arraytools/6671098/raw/c4cb0ca6fe78054da8dbe253a05f7046270d5693/GeneIDs.txt",
           ssl.verifypeer = FALSE)
read.table(text=x)
Create publication tables using tables package
See p13 for example in http://www.ianwatson.com.au/stata/tabout_tutorial.pdf
R's tables package is the best solution. For example,
> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*(Sepal.Length + Sepal.Width)*(mean + sd), data=iris )

                Sepal.Length      Sepal.Width
 Species    n   mean         sd   mean        sd
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
and
# This example shows some of the less common options
> Sex <- factor(sample(c("Male", "Female"), 100, rep=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, rep=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
    s <- format(x, digits=2)
    even <- ((1:length(s)) %% 2) == 0
    s[even] <- sprintf("(%s)", s[even])
    s
  }
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )

                  Status
 Sex    Statistic high   low    medium
 Female mean       4.88   4.96   5.17
        sd        (1.20) (0.82) (1.35)
 Male   mean       4.45   4.31   5.05
        sd        (1.01) (0.93) (0.75)
See also a collection of R packages related to reproducible research in http://cran.r-project.org/web/views/ReproducibleResearch.html
Create flat tables in R console using ftable()
> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8
        am   0  1  0  1  0  1
vs gear
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
>
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
         dnn = c("Cylinders", "V/S", "Transmission", "Gears"))
          Cylinders     4     6     8
          Transmission  0  1  0  1  0  1
V/S Gears
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0
tracemem, data type, copy
How to avoid copying a long vector
Tell if the current R is running in 32-bit or 64-bit mode
8 * .Machine$sizeof.pointer
where sizeof.pointer returns the number of *bytes* in a C SEXP type and '8' means number of bits per byte.
32- and 64-bit
See R-admin.html.
- For speed you may want to use a 32-bit build, but to handle large datasets a 64-bit build.
- Even on 64-bit builds of R there are limits on the size of R objects, some of which stem from the use of 32-bit integers (especially in FORTRAN code). For example, the dimensions of an array are limited to 2^31 - 1.
- Since R 2.15.0, it is possible to select '64-bit Files' from the standard installer even on a 32-bit version of Windows (2012/3/30).
Handling length 2^31 and more in R 3.0.0
From R News for 3.0.0 release:
There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.
In R 2.15.2, if I try to create a vector of length 2^31, I get an error
> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector
However, in R 3.0.0 (tested on my 64-bit Ubuntu machine with 16 GB RAM; R was compiled by myself):
> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
Note:
- A vector of length 2^31 has about 2 billion elements. Stored as doubles, it takes about 16 GB (2^31*8/2^20 = 16384 MB) of memory.
- On Windows, it is almost impossible to work with data of length 2^31 if the memory is less than 16 GB, because virtual memory on Windows does not work well. For example, when I tested on my 12 GB Windows 7 machine, the whole system froze for several minutes before I had to force a power-off.
- My slides at http://goo.gl/g7sGX show screenshots of running the above command on my Ubuntu and RHEL machines. As you can see, Linux is pretty good at handling data larger than the system RAM. That said, as long as your Linux system is 64-bit, you can work on large data without too much pain.
- For large datasets, it makes sense to use a database or specially crafted packages like bigmemory or ff.
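The memory estimate in the notes above is simple arithmetic, sketched here with a hypothetical helper vector_mb(). It assumes 8 bytes per element (doubles) and ignores the small per-object header.

```r
# Approximate size in MB of a vector with n elements
# (8 bytes per element for doubles; the object header is ignored)
vector_mb <- function(n, bytes_per_element = 8) n * bytes_per_element / 2^20

mb_for_2_31 <- vector_mb(2^31)      # 16384 MB
gb_for_2_31 <- mb_for_2_31 / 1024   # 16 GB

# Compare with what R reports for a small, cheap-to-allocate vector
small <- numeric(1e6)
reported_bytes <- as.numeric(object.size(small))  # a little over 8e6 bytes
```

The same helper with bytes_per_element = 4 gives a rough figure for integer vectors.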
NA in index
- Question: what is seq(1, 3)[c(1, 2, NA)]?
Answer: The NA index is kept, and NA is returned for that position; the result is 1 2 NA.
- Question: What is TRUE & NA?
Answer: NA
- Question: What is FALSE & NA?
Answer: FALSE
- Question: c("A", "B", NA) != "" ?
Answer: TRUE TRUE NA
- Question: which(c("A", "B", NA) != "") ?
Answer: 1 2
- Question: c(1, 2, NA) != "" & !is.na(c(1, 2, NA)) ?
Answer: TRUE TRUE FALSE
- Question: c("A", "B", NA) != "" & !is.na(c("A", "B", NA)) ?
Answer: TRUE TRUE FALSE
Conclusion: To exclude empty or NA values from numerical or character data, we can use which() or a convenience function keep.complete <- function(x) x != "" & !is.na(x). This is guaranteed to return logical values that contain no NAs.
Don't use x != "" or !is.na(x) alone.
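The questions above can be replayed in a few lines. This is a small sketch with a made-up character vector; keep.complete() is the convenience function described in the conclusion.

```r
x_chr <- c("A", "B", NA, "")   # made-up data with one NA and one empty string

keep.complete <- function(x) x != "" & !is.na(x)

flags <- keep.complete(x_chr)         # TRUE TRUE FALSE FALSE
kept  <- x_chr[flags]                 # "A" "B"

# Using x != "" alone lets the NA slip through the subsetting
naive <- x_chr[x_chr != ""]           # "A" "B" NA

# which() drops NA positions, so it is safe as well
via_which <- x_chr[which(x_chr != "")]  # "A" "B"

# An NA in a numeric index returns NA for that position
idx_na <- seq(1, 3)[c(1, 2, NA)]      # 1 2 NA
```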
Creating publication quality graphs in R
Write unix format files on Windows and vice versa
https://stat.ethz.ch/pipermail/r-devel/2012-April/063931.html
with() and within() functions
within() is similar to with() except it is used to create new columns and merge them with the original data sets. See youtube video.
closePr <- with(mariokart, totalPr - shipPr)
head(closePr, 20)
mk <- within(mariokart, {
  closePr <- totalPr - shipPr
})
head(mk) # new column closePr
mk <- mariokart
aggregate(. ~ wheels + cond, mk, mean) # create mean according to each level of (wheels, cond)
aggregate(totalPr ~ wheels + cond, mk, mean)
tapply(mk$totalPr, mk[, c("wheels", "cond")], mean)
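Since mariokart is not part of base R, the same contrast between with() and within() is sketched below on the built-in mtcars data. The column name ratio is made up for illustration.

```r
# with(): evaluate an expression using the columns, return the bare result
ratio <- with(mtcars, mpg / wt)

# within(): evaluate the same expression, return the data frame plus the new column
mt2 <- within(mtcars, {
  ratio <- mpg / wt
})

class(ratio)                    # "numeric"
setdiff(names(mt2), names(mtcars))  # "ratio"
```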
Scatterplot with rugs
require(stats) # both 'density' and its default method
with(faithful, {
  plot(density(eruptions, bw = 0.15))
  rug(eruptions)
  rug(jitter(eruptions, amount = 0.01), side = 3, col = "light blue")
})
Draw Color Palette
Read excel files
In my experience, the simplest approach is to save the Excel file as a CSV file before reading it into R, if that is possible.
Serialization
If we want to pass an R object to C (using the recv() function), we can use writeBin() to output the stream size and then use the serialize() function to output the stream to a file. See the post on R mailing list.
> a <- list(1,2,3)
> a_serial <- serialize(a, NULL)
> a_length <- length(a_serial)
> a_length
[1] 70
> writeBin(as.integer(a_length), connection, endian="big")
> serialize(a, connection)
In the C++ process, I receive one int variable first to get the length, and then read <length> bytes from the connection.
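The length-prefixed framing described above can be tried entirely in R before involving C++. The sketch below is an assumption-laden stand-in: it builds the byte stream in a raw vector instead of a real connection and plays both the sender and the receiver.

```r
a <- list(1, 2, 3)

# Sender side: 4-byte big-endian length header followed by the payload
payload <- serialize(a, NULL)
header  <- writeBin(as.integer(length(payload)), raw(), endian = "big")
stream  <- c(header, payload)

# Receiver side: read the length first, then exactly that many bytes
n    <- readBin(stream[1:4], integer(), n = 1, endian = "big")
body <- stream[5:(4 + n)]
b    <- unserialize(body)

identical(a, b)   # the object survives the round trip
```

A C or C++ receiver would follow the same two steps: read one 4-byte big-endian integer, then read that many bytes.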
socketConnection
See ?socketConnection.
Simple example
from the socketConnection's manual.
Open one R session
con1 <- socketConnection(port = 22131, server = TRUE) # wait until a connection from some client
writeLines(LETTERS, con1)
close(con1)
Open another R session (client)
con2 <- socketConnection(Sys.info()["nodename"], port = 22131)
# as non-blocking, may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
  Sys.sleep(1)
  z <- readLines(con2)
  if(length(z)) print(z)
}
close(con2)
Use nc in client
The client does not have to be R. We can use telnet, nc, etc. See the post here. For example, on the client machine, we can issue
nc localhost 22131 [ENTER]
Then the client will wait and show anything written from the server machine. The connection from nc will be terminated once close(con1) is given.
If I use the command
nc -v -w 2 localhost -z 22130-22135
then the connection will be established only briefly, so the prompt on the server machine is returned. If we issue the above nc command again on the client machine, it will show that the connection to port 22131 is refused. PS. The "-w" switch sets the timeout, in seconds, for connects and final net reads.
A post I have not had a chance to read yet: http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html
Use curl command in client
On the server,
con1 <- socketConnection(port = 8080, server = TRUE)
On the client,
curl --trace-ascii debugdump.txt http://localhost:8080/
Then go to the server,
while(nchar(x <- readLines(con1, 1)) > 0) cat(x, "\n")
close(con1) # return cursor in the client machine
Use telnet command in client
On the server,
con1 <- socketConnection(port = 8080, server = TRUE)
On the client,
sudo apt-get install telnet
telnet localhost 8080
abcdefg
hijklmn
qestst
Go to the server,
readLines(con1, 1)
readLines(con1, 1)
readLines(con1, 1)
close(con1) # return cursor in the client machine
Some tutorial about using telnet on http request. And this is a summary of using telnet.
Warning: cannot remove prior installation of package
For example,
# Install the latest hgu133plus2cdf package
# Remove/Uninstall hgu133plus2.db package
# Put/Install an old version of IRanges (eg version 1.18.2 while currently it is version 1.18.3)
# Test on R 3.0.1
library(hgu133plus2cdf) # hgu133plus2cdf does not depend on or import IRanges
source("http://bioconductor.org/biocLite.R")
biocLite("hgu133plus2.db", ask=FALSE) # hgu133plus2.db imports IRanges
# Warning:cannot remove prior installation of package 'IRanges'
# Open Windows Explorer and check IRanges folder. Only see libs subfolder.
Note:
- In the above example, all packages were installed under C:\Program Files\R\R-3.0.1\library\.
- In another instance, where I could not reproduce the problem, new R packages were installed under C:\Users\xxx\Documents\R\win-library\3.0\. The difference is that the IRanges package CAN be updated, but packageVersion("IRanges") in R still shows the old version.
- The above were tested on a desktop.
- When working on a VirtualBox VM, I sometimes (fairly frequently) get the error Warning: unable to move temporary installation `C:\Users\brb\Documents\R\win-library\3.0\fileed8270978f5\quadprog` to `C:\Users\brb\Documents\R\win-library\3.0\quadprog` when I try to run install.packages("forecast").
install.packages()
By default, install.packages() will check versions and install uninstalled packages shown in 'Depends', 'Imports', and 'LinkingTo' fields. See R-exts manual.
If we want to install packages listed in 'Suggests' field, we should specify it explicitly by using dependencies argument:
install.packages(XXXX, dependencies = c("Depends", "Imports", "Suggests", "LinkingTo"))
# OR
install.packages(XXXX, dependencies = TRUE)
For example, if I use a plain install.packages() command to install downloader package
install.packages("downloader")
it will only install 'digest' and 'downloader' packages. If I use
install.packages("downloader", dependencies=TRUE)
it will also install the 'testthat' package.
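To see which DESCRIPTION fields feed the default installation and which one is only used with dependencies = TRUE, the sketch below parses a made-up DESCRIPTION for a hypothetical package 'mypkg' with read.dcf(). The helper field_pkgs() is also an assumption for illustration, not part of any package.

```r
# A made-up DESCRIPTION file for a hypothetical package 'mypkg'
desc_lines <- c(
  "Package: mypkg",
  "Version: 0.1",
  "Imports: digest, utils",
  "LinkingTo: Rcpp",
  "Suggests: testthat"
)
con <- textConnection(desc_lines)
desc <- read.dcf(con)   # character matrix, one column per field
close(con)

# Split a comma-separated field into package names (empty if the field is absent)
field_pkgs <- function(field) {
  if (!field %in% colnames(desc)) return(character(0))
  trimws(strsplit(desc[1, field], ",")[[1]])
}

default_fields <- c("Depends", "Imports", "LinkingTo")
installed_by_default <- unlist(lapply(default_fields, field_pkgs))
only_with_dependencies_true <- field_pkgs("Suggests")

installed_by_default          # "digest" "utils" "Rcpp"
only_with_dependencies_true   # "testthat"
```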
R package depends vs imports
- http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
- http://stackoverflow.com/questions/9893791/imports-and-depends
- https://stat.ethz.ch/pipermail/r-devel/2013-August/067082.html
In the namespace era Depends is never really needed. All modern packages have no technical need for Depends anymore. Loosely speaking the only purpose of Depends today is to expose other package's functions to the user without re-exporting them.
load = functions exported in myPkg are available to interested parties as myPkg::foo or via direct imports - essentially this means the package can now be used
attach = the namespace (and thus all exported functions) is attached to the search path - the only effect is that you have now added the exported functions to the global pool of functions - sort of like dumping them in the workspace (for all practical purposes, not technically)
import a function into a package = make sure that this function works in my package regardless of the search path (so I can write fn1 instead of pkg1::fn1 and still know it will come from pkg1 and not someone's workspace or other package that chose the same name)
The distinction is between "loading" and "attaching" a package. Loading it (which would be done if you had MASS::loglm, or imported it) guarantees that the package is initialized and in memory, but doesn't make it visible to the user without the explicit MASS:: prefix. Attaching it first loads it, then modifies the user's search list so the user can see it.
Loading is less intrusive, so it's preferred over attaching. Both library() and require() would attach it.
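The difference between loading and attaching can be observed with the base package tools, which ships with every R installation. A minimal sketch:

```r
# Load the namespace only: functions are reachable via tools::, not by bare name
requireNamespace("tools", quietly = TRUE)
loaded <- "tools" %in% loadedNamespaces()   # TRUE
ext <- tools::file_ext("report.csv")        # "csv"

# Attach: the package is put on the search path as well
library(tools)
attached <- "package:tools" %in% search()   # TRUE
```

Inside a package, importing a function in NAMESPACE or calling it with the pkg:: prefix corresponds to the first half; library() and require() correspond to the second.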
R package dependencies
Bioconductor's pkgDepTools package
This is an example of querying the dependencies of the notorious 'lumi' package, which often broke the installation script. I am using R 3.1.2 and Bioconductor 3.0.
source("http://bioconductor.org/biocLite.R")
biocLite("pkgDepTools")
biocLite("Rgraphviz")
library(pkgDepTools) # provides makeDepGraph() and getInstallOrder()
library(Rgraphviz)   # provides makeNodeAttrs() and the plot method for graphs
library(BiocInstaller)
library(Biobase)
biocUrl <- biocinstallRepos()["BioCsoft"]
biocDeps <- makeDepGraph(biocUrl, type="source", dosize=FALSE)
categoryNodes <- c("lumi", names(acc(biocDeps, "lumi")[[1]]))
categoryGraph <- subGraph(categoryNodes, biocDeps)
nn <- makeNodeAttrs(categoryGraph, shape="ellipse")
plot(categoryGraph, nodeAttrs=nn) # Complete but too complicated.
allDeps <- makeDepGraph(biocinstallRepos(), type="source", keep.builtin=TRUE, dosize=FALSE)
sort(getInstallOrder("lumi", allDeps, needed.only=FALSE)$packages)
 [1] "affy"              "affyio"            "annotate"          "AnnotationDbi"     "base64"
 [6] "base64enc"         "BatchJobs"         "BBmisc"            "beanplot"          "Biobase"
[11] "BiocGenerics"      "BiocInstaller"     "BiocParallel"      "biomaRt"           "Biostrings"
[16] "bitops"            "brew"              "bumphunter"        "checkmate"         "codetools"
[21] "colorspace"        "DBI"               "dichromat"         "digest"            "doRNG"
[26] "fail"              "foreach"           "genefilter"        "GenomeInfoDb"      "GenomicAlignments"
[31] "GenomicFeatures"   "GenomicRanges"     "ggplot2"           "graphics"          "grDevices"
[36] "grid"              "gtable"            "illuminaio"        "IRanges"           "iterators"
[41] "KernSmooth"        "labeling"          "lattice"           "limma"             "locfit"
[46] "lumi"              "MASS"              "Matrix"            "matrixStats"       "mclust"
[51] "methods"           "methylumi"         "mgcv"              "minfi"             "multtest"
[56] "munsell"           "nleqslv"           "nlme"              "nor1mix"           "parallel"
[61] "pkgmaker"          "plyr"              "preprocessCore"    "proto"             "quadprog"
[66] "R.methodsS3"       "RColorBrewer"      "Rcpp"              "RCurl"             "registry"
[71] "reshape"           "reshape2"          "rngtools"          "Rsamtools"         "RSQLite"
[76] "rtracklayer"       "S4Vectors"         "scales"            "sendmailR"         "siggenes"
[81] "splines"           "stats"             "stats4"            "stringr"           "survival"
[86] "tools"             "utils"             "XML"               "xtable"            "XVector"
[91] "zlibbioc"
Comparing with the packages already present under the R\library folder, 73-1 new packages were installed along with the lumi package.
 [1] "affy"              "affyio"            "annotate"
 [4] "AnnotationDbi"     "base64"            "base64enc"
 [7] "BatchJobs"         "BBmisc"            "beanplot"
[10] "Biobase"           "BiocGenerics"      "BiocInstaller"
[13] "BiocParallel"      "biomaRt"           "Biostrings"
[16] "bitops"            "brew"              "bumphunter"
[19] "checkmate"         "colorspace"        "DBI"
[22] "dichromat"         "digest"            "doRNG"
[25] "fail"              "foreach"           "genefilter"
[28] "GenomeInfoDb"      "GenomicAlignments" "GenomicFeatures"
[31] "GenomicRanges"     "ggplot2"           "gtable"
[34] "illuminaio"        "IRanges"           "iterators"
[37] "labeling"          "limma"             "locfit"
[40] "lumi"              "matrixStats"       "mclust"
[43] "methylumi"         "minfi"             "multtest"
[46] "munsell"           "nleqslv"           "nor1mix"
[49] "pkgmaker"          "plyr"              "preprocessCore"
[52] "proto"             "quadprog"          "R.methodsS3"
[55] "RColorBrewer"      "Rcpp"              "RCurl"
[58] "registry"          "reshape"           "reshape2"
[61] "rngtools"          "Rsamtools"         "RSQLite"
[64] "rtracklayer"       "S4Vectors"         "scales"
[67] "sendmailR"         "siggenes"          "stringr"
[70] "XML"               "xtable"            "XVector"
[73] "zlibbioc"
miniCRAN package
- http://blog.revolutionanalytics.com/2014/07/dependencies-of-popular-r-packages.html
- http://www.magesblog.com/2014/09/managing-r-package-dependencies.html
MRAN
Reverse dependence
Subsetting
Subset assignment of R Language Definition and Manipulation of functions.
The result of the command x[3:5] <- 13:15 is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
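The equivalence can be checked directly by calling the replacement function `[<-` by hand. A minimal sketch with a made-up vector:

```r
x <- 1:10
y <- x

x[3:5] <- 13:15                     # ordinary subset assignment
y <- `[<-`(y, 3:5, value = 13:15)   # explicit call to the replacement function

same_result <- identical(x, y)      # TRUE
```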
S3 and S4
- Software for Data Analysis: Programming with R by John Chambers
- Programming with Data: A Guide to the S Language by John Chambers
- https://www.rmetrics.org/files/Meielisalp2009/Presentations/Chalabi1.pdf
- https://www.stat.auckland.ac.nz/S-Workshop/Gentleman/S4Objects.pdf
To get the source code of S4 methods, we can use showMethods(), getMethod() and selectMethod(). For example
library(qrqc)
showMethods("gcPlot")
getMethod("gcPlot", "FASTQSummary") # get an error
selectMethod("gcPlot", "FASTQSummary") # good.
- getClassDef() in S4 (Bioconductor course).
library(IRanges)
ir <- IRanges(start=c(10, 20, 30), width=5)
ir
class(ir)
## [1] "IRanges"
## attr(,"package")
## [1] "IRanges"
getClassDef(class(ir))
## Class "IRanges" [package "IRanges"]
##
## Slots:
##
## Name:            start           width           NAMES     elementType
## Class:         integer         integer characterORNULL       character
##
## Name:  elementMetadata        metadata
## Class: DataTableORNULL            list
##
## Extends:
## Class "Ranges", directly
## Class "IntegerList", by class "Ranges", distance 2
## Class "RangesORmissing", by class "Ranges", distance 2
## Class "AtomicList", by class "Ranges", distance 3
## Class "List", by class "Ranges", distance 4
## Class "Vector", by class "Ranges", distance 5
## Class "Annotated", by class "Ranges", distance 6
##
## Known Subclasses: "NormalIRanges"
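One common reason for getMethod() to fail while a method is clearly usable is inheritance: getMethod() looks only for a method defined for exactly the given signature, whereas selectMethod() also follows the class hierarchy. The sketch below uses made-up classes Base and Derived and a made-up generic describe(); it does not involve qrqc or IRanges.

```r
library(methods)

# Hypothetical classes: Derived inherits from Base
setClass("Base", representation(x = "numeric"))
setClass("Derived", contains = "Base")

setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Base", function(obj) "method defined for Base")

# No method is defined directly for Derived ...
direct <- existsMethod("describe", "Derived")   # FALSE
# ... but one is available through inheritance
inherited <- hasMethod("describe", "Derived")   # TRUE

# getMethod() wants an exact signature match and fails here
get_result <- tryCatch(getMethod("describe", "Derived"),
                       error = function(e) "getMethod failed")

# selectMethod() follows inheritance and returns the Base method
m <- selectMethod("describe", "Derived")
out <- describe(new("Derived", x = 1))          # "method defined for Base"
```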
findInterval()
Related functions are cut() and split(). See also
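A small sketch of how the three functions relate, using made-up values and break points:

```r
x <- c(1.5, 3.2, 4.8, 0.2, 5.0)   # made-up values
breaks <- c(0, 2, 4)               # left endpoints of the intervals

# findInterval() returns the index i with breaks[i] <= x < breaks[i + 1]
idx <- findInterval(x, breaks)     # 1 2 3 1 3

# cut() gives the same grouping as a labelled factor
grp <- cut(x, breaks = c(breaks, Inf), right = FALSE)

# split() then collects the values that fall in each interval
by_interval <- split(x, idx)
```

findInterval() returns plain integers and is the lighter choice when labels are not needed; cut() is more convenient when the result goes into a table or a plot.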
do.call, rbind, lapply
Lots of examples. See for example this one for creating a data frame from a vector.
x <- readLines(textConnection("---CLUSTER 1 ---
3
4
5
6
---CLUSTER 2 ---
9
10
8
11"))
# create a list of where the 'clusters' are
clust <- c(grep("CLUSTER", x), length(x) + 1L)
# get size of each cluster
clustSize <- diff(clust) - 1L
# get cluster number
clustNum <- gsub("[^0-9]+", "", x[grep("CLUSTER", x)])
result <- do.call(rbind, lapply(seq(length(clustNum)), function(.cl){
  cbind(Object = x[seq(clust[.cl] + 1L, length = clustSize[.cl])]
        , Cluster = .cl
  )
}))
result
     Object Cluster
[1,] "3"    "1"
[2,] "4"    "1"
[3,] "5"    "1"
[4,] "6"    "1"
[5,] "9"    "2"
[6,] "10"   "2"
[7,] "8"    "2"
[8,] "11"   "2"
How to get examples from help file
See this post. Method 1:
example(acf, give.lines=TRUE)
Method 2:
Rd <- utils:::.getHelpFile(?acf) tools::Rd2ex(Rd)
"[" and "[[" with the sapply() function
Suppose we want to extract string from the id like "ABC-123-XYZ" before the first hyphen.
sapply(strsplit("ABC-123-XYZ", "-"), "[", 1)
is the same as
sapply(strsplit("ABC-123-XYZ", "-"), function(x) x[1])
Dealing with date
d1 = date()
class(d1) # "character"
d2 = Sys.Date()
class(d2) # "Date"
format(d2, "%a %b %d")
library(lubridate); ymd("20140108") # "2014-01-08 UTC"
mdy("08/04/2013") # "2013-08-04 UTC"
dmy("03-04-2013") # "2013-04-03 UTC"
ymd_hms("2011-08-03 10:15:03") # "2011-08-03 10:15:03 UTC"
ymd_hms("2011-08-03 10:15:03", tz="Pacific/Auckland") # "2011-08-03 10:15:03 NZST"
?Sys.timezone
x = dmy(c("1jan2013", "2jan2013", "31mar2013", "30jul2013"))
wday(x[1]) # 3
wday(x[1], label=TRUE) # Tues
- http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
- http://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
- We want our dates and times as class "Date" or the class "POSIXct", "POSIXlt". For more information type ?POSIXlt.
Nonstandard evaluation
- substitute() - capture expression
- quote() - similar to substitute(), but it performs no substitution
- eval() - evaluate an expression, optionally in a specified environment
- deparse(substitute()) - convert expression to char string
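The four functions listed above can be compared side by side. The helper names capture_sub(), capture_quote() and arg_label() are made up for this sketch.

```r
# substitute() captures the expression supplied by the caller
capture_sub <- function(x) substitute(x)
# quote() returns its argument as written, so here it is just the symbol x
capture_quote <- function(x) quote(x)

e1 <- capture_sub(a + b)     # the call a + b
e2 <- capture_quote(a + b)   # the symbol x

# eval() evaluates a captured expression, here with values taken from a list
total <- eval(e1, list(a = 1, b = 2))   # 3

# deparse(substitute()) turns the caller's expression into a string
arg_label <- function(x) deparse(substitute(x))
label <- arg_label(a + b)    # "a + b"
```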
Lazy evaluation in R functions arguments
R function arguments are lazy — they’re only evaluated if they’re actually used.
- Example 1. By default, R function arguments are lazy.
f <- function(x) {
  999
}
f(stop("This is an error!"))
#> [1] 999
- Example 2. If you want to ensure that an argument is evaluated you can use force().
add <- function(x) {
  force(x)
  function(y) x + y
}
adders2 <- lapply(1:10, add)
adders2[[1]](10)
#> [1] 11
adders2[[10]](10)
#> [1] 20
- Example 3. Default arguments are evaluated inside the function.
f <- function(x = ls()) {
  a <- 1
  x
}
# ls() evaluated inside f:
f()
# [1] "a" "x"
# ls() evaluated in global environment:
f(ls())
# [1] "add" "adders" "f"
- Example 4. Laziness is useful in if statements — the second statement below will be evaluated only if the first is true.
x <- NULL
if (!is.null(x) && x > 0) {
}
Backtick sign, infix/prefix/postfix operators
The backtick sign ` (not the single quote) refers to functions or variables that have otherwise reserved or illegal names; e.g. '&&', '+', '(', 'for', 'if', etc. See some examples in this note.
infix operator.
1 + 2 # infix
+ 1 2 # prefix
1 2 + # postfix
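In R only the infix form is written directly, but backticks give access to the underlying function so that an operator can be called in prefix style. A short sketch; the variable name `my var` and the operator %+% are made up for illustration.

```r
# Operators are ordinary functions; backticks let us call them by name
three <- `+`(1, 2)                          # same as 1 + 2

# Backticks also allow otherwise illegal variable names
`my var` <- 5
doubled <- `my var` * 2                     # 10

# User-defined infix operators are written between percent signs
`%+%` <- function(a, b) paste(a, b)         # hypothetical operator
joined <- "R" %+% "rocks"                   # "R rocks"

# Passing `[` as a function, as in the sapply() example earlier
seconds <- sapply(list(1:3, 4:6), `[`, 2)   # 2 5
```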
List data type
Calling a function given a list of arguments
> args <- list(c(1:10, NA, NA), na.rm = TRUE)
> do.call(mean, args)
[1] 5.5
> mean(c(1:10, NA, NA), na.rm = TRUE)
[1] 5.5
Error handling and exceptions
- http://adv-r.had.co.nz/Exceptions-Debugging.html
- try() allows execution to continue even after an error has occurred. You can suppress the message with try(..., silent = TRUE).
out <- try({
  a <- 1
  b <- "x"
  a + b
})
elements <- list(1:10, c(-1, 10), c(T, F), letters)
# wrap each call in try() so that one failure does not stop lapply()
results <- lapply(elements, function(x) try(log(x)))
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
- tryCatch(): With tryCatch() you map conditions to handlers (like switch()), named functions that are called with the condition as an input. Note that try() is a simplified version of tryCatch().
# Usage: tryCatch(expr, ..., finally)
show_condition <- function(code) {
  tryCatch(code,
    error = function(c) "error",
    warning = function(c) "warning",
    message = function(c) "message"
  )
}
show_condition(stop("!"))
#> [1] "error"
show_condition(warning("?!"))
#> [1] "warning"
show_condition(message("?"))
#> [1] "message"
show_condition(10)
#> [1] 10
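The finally argument is not exercised above. The sketch below adds it to a hypothetical wrapper safe_log(), which returns NA instead of warning or failing; the counter cleanup_runs only exists to show that the finally expression runs on every call, whether or not a condition was raised.

```r
cleanup_runs <- 0   # counts how often the 'finally' expression ran

safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,   # e.g. log(-1) warns "NaNs produced"
    error   = function(e) NA_real_,   # e.g. log("a") is an error
    finally = {
      cleanup_runs <<- cleanup_runs + 1
    }
  )
}

r1 <- safe_log(10)    # log(10)
r2 <- safe_log(-1)    # NA, from the warning handler
r3 <- safe_log("a")   # NA, from the error handler
cleanup_runs          # 3
```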
Using list type
Avoid if-else or switch
?plot.stepfun.
y0 <- c(1,2,4,3)
sfun0 <- stepfun(1:3, y0, f = 0)
sfun.2 <- stepfun(1:3, y0, f = .2)
sfun1 <- stepfun(1:3, y0, right = TRUE)
plot(sfun0, verticals = FALSE) # draw a base plot first; lines() only adds to an existing plot
for(i in 1:3)
  lines(list(sfun0, sfun.2, stepfun(1:3, y0, f = 1))[[i]], col = i)
legend(2.5, 1.9, paste("f =", c(0, 0.2, 1)), col = 1:3, lty = 1, y.intersp = 1)
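The loop above picks an object out of a list by index instead of branching with if-else. The same idea works for functions: store them in a named list and look them up by name. The names summaries and summarise_by() below are made up for this sketch.

```r
# Hypothetical lookup table of summary functions
summaries <- list(
  mean        = mean,
  median      = median,
  range_width = function(v) diff(range(v))
)

# Pick the function by name instead of using if/else or switch()
summarise_by <- function(x, how) {
  stopifnot(how %in% names(summaries))
  summaries[[how]](x)
}

values <- c(1, 2, 4, 3)
res_mean  <- summarise_by(values, "mean")          # 2.5
res_width <- summarise_by(values, "range_width")   # 3

# Looping over the list replaces a chain of if/else branches
all_results <- sapply(summaries, function(f) f(values))
```

Adding a new case then means adding one entry to the list rather than editing the control flow.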
Resource
Books
- Modern Applied Statistics with S by William N. Venables and Brian D. Ripley
- Seamless R and C++ Integration with Rcpp by Dirk Eddelbuettel
- Advanced R by Hadley Wickham
- R Cookbook by Paul Teetor
- Machine Learning with R by Brett Lantz
- R for Everyone by Jared P. Lander
- The Art of R Programming by Norman Matloff
- Applied Predictive Modeling by Max Kuhn
- R in Action by Robert Kabacoff
- The R Book by Michael J. Crawley
- Regression Modeling Strategies, with Applications to Linear Models, Survival Analysis and Logistic Regression by Frank E. Harrell
- Data Manipulation with R by Phil Spector