R: Difference between revisions
Revision as of 17:24, 28 September 2013
Install Rtools for Windows users
See http://goo.gl/gYh6C for step-by-step instructions with screenshots.
My preferred way is to leave the installer's option for setting the PATH environment variable unchecked and to add the following entries to PATH manually (based on Rtools v3.0):
c:\Rtools\bin; c:\Rtools\gcc-4.6.3\bin; C:\Program Files\R\R-2.15.2\bin\i386;
We can make life easier by creating a file <Rcommand.bat> with the following content (this is also useful if you have C:\cygwin\bin in your PATH, which the cygwin setup will not add automatically):
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\Program Files\R\R-2.15.2\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
echo Setting environment for using R cmd
PS. I put <Rcommand.bat> under the C:\Program Files\R folder and create a shortcut called 'Rcmd' on the desktop. In the shortcut I enter C:\Windows\System32\cmd.exe /K "Rcommand.bat" in the "Target" entry and "C:\Program Files\R" in the "Start in" entry.
So we can open the Command Prompt anywhere and run <Rcommand.bat> to get all environment variables ready! On Windows Vista, 7 and 8, we need to run it as administrator, or change the security settings in the file's properties so that the current user has execute permission.
Windows Toolset
Note that R on Windows supports Mingw-w64 (not Mingw which is a separate project). See here for the issue of developing a Qt application that links against R using Rcpp. And http://qt-project.org/wiki/MinGW is the wiki for compiling Qt using MinGW and MinGW-w64.
Compile and install an R package
cd C:\Documents and Settings\brb
wget http://www.bioconductor.org/packages/2.11/bioc/src/contrib/affxparser_1.30.2.tar.gz
C:\progra~1\r\r-2.15.2\bin\R CMD INSTALL --build affxparser_1.30.2.tar.gz
Helpful - check Chapter 6 of R Installation and Administration
Check/Upload to CRAN
http://win-builder.r-project.org/
64 bit toolchain
See January 2010 email https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html and R-Admin manual.
From R 2.11.0 there is a 64-bit Windows binary for R.
Install R using binary package on Linux OS
Ubuntu/Debian
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo nano /etc/apt/sources.list
# deb http://cran.fhcrc.org/bin/linux/ubuntu precise/
sudo apt-get update
sudo apt-get install r-base
Redhat el6
It should be pretty easy to install via the EPEL: http://fedoraproject.org/wiki/EPEL
Just follow the instructions to enable the EPEL and then from the CLI as root:
yum install R
or via sudo:
sudo yum install R
Install R from source (ix86, x86_64 and arm platforms, Linux system)
Debian system (focus on arm architecture with notes from x86 system)
Simplest configuration
On my debian system on Pogoplug (armv5), Raspberry Pi (armv6) or Beaglebone Black (armv7), I can compile R. See R's admin manual. If x11 is not needed, I just need to install 2 required packages.
- install gfortran: apt-get install gfortran (gfortran is not part of build-essential)
- install readline library: apt-get install libreadline5-dev (pogoplug), apt-get install libreadline6-dev (raspberry pi, ubuntu)
Note: if I need x11, I should install
- libx11 and libx11-devel, libXt, libXt-devel (for fedora)
- libx11-dev (for debian) or xorg-dev (for pogoplug/raspberry pi)
and optional
- texinfo (to fix 'WARNING: you cannot build info or HTML versions of the R manuals')
Note that it is also safe to install required tools via (in ubuntu)
sudo apt-get install r-base-dev
Or even better with
sudo apt-get build-dep r-base
See Install all dependencies for building R. With the first approach, running ./configure still complains that the x11 headers/libs are missing. The second approach pulls in dependencies such as jdk, tcl, tex and more; for some reason apt-get build-dep gives a more complete list than apt-get install r-base-dev.
[Arm architecture] I also ran apt-get install readline-common; I don't know if this is necessary. Since I don't need x11, I pass --with-x=no to the configure command. After running
wget http://cran.r-project.org/src/base/R-2/R-2.15.2.tar.gz
tar xzvf R-2.15.2.tar.gz
cd R-2.15.2
./configure --with-x=no --enable-R-shlib
I got
R is now configured for armv5tel-unknown-linux-gnueabi
  Source directory:          .
  Installation directory:    /usr/local
  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O2
  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:
  Interfaces supported:
  External libraries:        readline
  Additional capabilities:   NLS
  Options enabled:           shared R library, shared BLAS, R profiling
  Recommended packages:      yes
configure: WARNING: you cannot build info or HTML versions of the R manuals
configure: WARNING: you cannot build PDF versions of the R manuals
configure: WARNING: you cannot build PDF versions of vignettes and help pages
configure: WARNING: I could not determine a browser
configure: WARNING: I could not determine a PDF viewer
PS 1. On my raspberry pi machine, it shows R is now configured for armv6l-unknown-linux-gnueabihf.
PS 2. On my x86 system, it shows
R is now configured for x86_64-unknown-linux-gnu
  Source directory:          .
  Installation directory:    /usr/local
  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O2
  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:
  Interfaces supported:      X11, tcltk
  External libraries:        readline, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java
  Recommended packages:      yes
[arm] However, make gave errors for recommended packages like KernSmooth, MASS, boot, class, cluster, codetools, foreign, lattice, mgcv, nlme, nnet, rpart, spatial, and survival. The error stems from gcc: SHLIB_LIBADD: No such file or directory. Note that I get this error message even when I try install.packages("MASS", type="source"). A suggested fix is here: add perl = TRUE to the sub() call on two lines of the src/library/tools/R/install.R file. However, I then got another error: shared object 'MASS.so' not found. See also http://ftp.debian.org/debian/pool/main/r/r-base/.
make[1]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[2]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
begin installing recommended package MASS
* installing *source* package 'MASS' ...
** libs
make[3]: Entering directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c MASS.c -o MASS.o
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c lqs.c -o lqs.o
gcc -std=gnu99 -shared -L/usr/local/lib -o MASSSHLIB_EXT MASS.o lqs.o SHLIB_LIBADD -L/mnt/usb/R-2.15.2/lib -lR
gcc: SHLIB_LIBADD: No such file or directory
make[3]: *** [MASSSHLIB_EXT] Error 1
make[3]: Leaving directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
ERROR: compilation failed for package 'MASS'
* removing '/mnt/usb/R-2.15.2/library/MASS'
make[2]: *** [MASS.ts] Error 1
make[2]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make: *** [stamp-recommended] Error 2
root@debian:/mnt/usb/R-2.15.2#
root@debian:/mnt/usb/R-2.15.2# bin/R
R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: armv5tel-unknown-linux-gnueabi (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(MASS)
Error in library(MASS) : there is no package called 'MASS'
> library()
Packages in library '/mnt/usb/R-2.15.2/library':
base        The R Base Package
compiler    The R Compiler Package
datasets    The R Datasets Package
grDevices   The R Graphics Devices and Support for Colours and Fonts
graphics    The R Graphics Package
grid        The Grid Graphics Package
methods     Formal Methods and Classes
parallel    Support for Parallel computation in R
splines     Regression Spline Functions and Classes
stats       The R Stats Package
stats4      Statistical Functions using S4 Classes
tcltk       Tcl/Tk Interface
tools       Tools for Package Development
utils       The R Utils Package
> Sys.info()["machine"]
   machine
"armv5tel"
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170369  4.6     350000  9.4   350000  9.4
Vcells 163228  1.3     905753  7.0   784148  6.0
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679180
PS 3. The complete log of building R from source is in here File:Build R log.txt
Full configuration
Interfaces supported: X11, tcltk External libraries: readline Additional capabilities: PNG, JPEG, TIFF, NLS, cairo Options enabled: shared R library, shared BLAS, R profiling, Java
Update: R 3.0.1 on Beaglebone Black (armv7a) + Ubuntu 13.04
See the page here.
Install all dependencies for building R
This is a comprehensive list. This list is even larger than r-base-dev.
root@debian:/mnt/usb/R-2.15.2# apt-get build-dep r-base
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  libreadline5-dev
The following NEW packages will be installed:
  bison ca-certificates ca-certificates-java debhelper defoma ed file fontconfig gettext gettext-base
  html2text intltool-debian java-common libaccess-bridge-java libaccess-bridge-java-jni libasound2
  libasyncns0 libatk1.0-0 libaudit0 libavahi-client3 libavahi-common-data libavahi-common3 libblas-dev
  libblas3gf libbz2-dev libcairo2 libcairo2-dev libcroco3 libcups2 libdatrie1 libdbus-1-3 libexpat1-dev
  libflac8 libfontconfig1-dev libfontenc1 libfreetype6-dev libgif4 libglib2.0-dev libgtk2.0-0
  libgtk2.0-common libice-dev libjpeg62-dev libkpathsea5 liblapack-dev liblapack3gf libnewt0.52
  libnspr4-0d libnss3-1d libogg0 libopenjpeg2 libpango1.0-0 libpango1.0-common libpango1.0-dev
  libpcre3-dev libpcrecpp0 libpixman-1-0 libpixman-1-dev libpng12-dev libpoppler5 libpulse0
  libreadline-dev libreadline6-dev libsm-dev libsndfile1 libthai-data libthai0 libtiff4-dev libtiffxx0c2
  libunistring0 libvorbis0a libvorbisenc2 libxaw7 libxcb-render-util0 libxcb-render-util0-dev
  libxcb-render0 libxcb-render0-dev libxcomposite1 libxcursor1 libxdamage1 libxext-dev libxfixes3
  libxfont1 libxft-dev libxi6 libxinerama1 libxkbfile1 libxmu6 libxmuu1 libxpm4 libxrandr2
  libxrender-dev libxss-dev libxt-dev libxtst6 luatex m4 openjdk-6-jdk openjdk-6-jre
  openjdk-6-jre-headless openjdk-6-jre-lib openssl pkg-config po-debconf preview-latex-style
  shared-mime-info tcl8.5-dev tex-common texi2html texinfo texlive-base texlive-binaries texlive-common
  texlive-doc-base texlive-extra-utils texlive-fonts-recommended texlive-generic-recommended
  texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-pictures tk8.5-dev
  tzdata-java whiptail x11-xkb-utils x11proto-render-dev x11proto-scrnsaver-dev x11proto-xext-dev
  xauth xdg-utils xfonts-base xfonts-encodings xfonts-utils xkb-data xserver-common xvfb zlib1g-dev
0 upgraded, 136 newly installed, 1 to remove and 0 not upgraded.
Need to get 139 MB of archives.
After this operation, 410 MB of additional disk space will be used.
Do you want to continue [Y/n]?
bin/R (shell script) and bin/exec/R (binary executable) on Linux OS
bin/R is just a shell script to launch bin/exec/R program. So if we try to run the following program
# test.R
cat("-- reading arguments\n", sep = "");
cmd_args = commandArgs();
for (arg in cmd_args) cat(" ", arg, "\n", sep="");
from command line like
brb@brb-P45T-A:~/Downloads$ ~/R-3.0.1/bin/R --slave --no-save --no-restore --no-environ --silent --args arg1=abc < test.R
-- reading arguments
 /home/brb/R-3.0.1/bin/exec/R
 --slave
 --no-save
 --no-restore
 --no-environ
 --silent
 --args
 arg1=abc
we can see that R actually calls the bin/exec/R program.
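The launcher relationship can be imitated with a toy pair of scripts. This is only a sketch under assumptions: the directory layout and the name prog are made up for illustration, and the real bin/R script does much more (for example, setting up environment variables) before handing over to bin/exec/R.

```shell
# Toy imitation of the bin/R -> bin/exec/R relationship (hypothetical layout).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/bin/exec"

# The "real" program: reports its own path and the arguments it received.
cat > "$demo_root/bin/exec/prog" <<'EOF'
#!/bin/sh
echo "running: $0 $*"
EOF

# The launcher: a shell script that hands over to bin/exec/prog via exec.
cat > "$demo_root/bin/prog" <<'EOF'
#!/bin/sh
exec "$(dirname "$0")/exec/prog" "$@"
EOF

chmod +x "$demo_root/bin/prog" "$demo_root/bin/exec/prog"
launcher_output=$("$demo_root/bin/prog" --slave --args arg1=abc)
echo "$launcher_output"
```

Because the launcher uses exec, the program that finally runs sees the bin/exec path as its own name, which matches what commandArgs() reports above.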
Web Applications
Create HTML5 web and slides
http://www.gastonsanchez.com/depot/knitr-slides. The HTML5 slides work on my IE 8 too.
HTML5 slides examples
- http://yihui.name/slides/knitr-slides.html
- http://yihui.name/slides/2012-knitr-RStudio.html
- http://yihui.name/slides/2011-r-dev-lessons.html#slide1
- http://inundata.org/R_talks/BARUG/#intro
Software requirement
- Rstudio
- knitr, XML, RCurl (See omegahat for installation on Ubuntu)
- pandoc. This is a command line tool; I am testing it on Windows 7.
Slide #22 gives an instruction to create
- regular html file by using RStudio -> Knit HTML button
- HTML5 slides by using pandoc from command line.
Files:
- Rmd source: 009-slides.Rmd. Note that IE 8 was not supported by github. For IE 9, be sure to turn off "Compatibility View".
- markdown output: 009-slides.md
- HTML output: 009-slides.html
We can create the Rmd source in RStudio via File -> New -> R Markdown.
There are 4 ways to produce slides with pandoc
- S5
- DZSlides
- Slidy
- Slideous
Use the markdown file (md) and convert it with pandoc
pandoc -s -S -i -t dzslides --mathjax html5_slides.md -o html5_slides.html
If we are comfortable with HTML and CSS code, open the html file (generated by pandoc) and modify the CSS style at will.
Markdown language
According to wikipedia:
Markdown is a lightweight markup language, originally created by John Gruber with substantial contributions from Aaron Swartz, allowing people “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”.
- Markup is a general term for content formatting - such as HTML - but markdown is a library that generates HTML markup.
- Nice summary from stackoverflow.com and more complete list from github.
- An example https://gist.github.com/jeromyanglim/2716336
- Convert mediawiki to markdown using online conversion tool from pandoc.
- R markdown file and use it in RStudio. Customizing Chunk Options can be found in knitr page and rpubs.com.
HTTP protocol
- http://en.wikipedia.org/wiki/File:Http_request_telnet_ubuntu.png
- Query string
- How to capture http header? Use curl -i en.wikipedia.org.
- Web Inspector. Built into Chrome. Right click on any page and choose 'Inspect Element'.
- Web server
- Simple TCP/IP web server
- HTTP Made Really Easy
- Illustrated Guide to HTTP
- nweb: a tiny, safe Web server with 200 lines
- Tiny HTTPd
An HTTP server is conceptually simple:
- Open port 80 for listening
- When contact is made, gather a little information (mainly the GET line; you can ignore the rest for now)
- Translate the request into a file request
- Open the file and spit it back at the client
It gets more difficult depending on how much of HTTP you want to support - POST is a little more complicated, scripts, handling multiple requests, etc.
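The "translate the request into a file request" step can be sketched in shell. This is an illustration only, not code from any of the servers linked here: the function name request_to_path and the doc_root value are assumptions, and only GET is handled.

```shell
# Map an HTTP request line to a file under a document root (sketch only).
# doc_root and request_to_path are hypothetical names for illustration.
doc_root="/home/brb/Downloads"

request_to_path() {
    # Split "GET /index.html HTTP/1.1" into method, target and version.
    read -r method target version <<EOF
$1
EOF
    [ "$method" = "GET" ] || return 1      # only GET is handled here
    target=${target%%\?*}                  # drop any query string
    case "$target" in
        *..*) return 1 ;;                  # refuse paths that climb out of the root
    esac
    [ "$target" = "/" ] && target="/index.html"
    printf '%s%s\n' "$doc_root" "$target"
}

request_to_path "GET /index.html HTTP/1.1"
```

A real server would then open the resulting file and write it back after a status line and headers; anything other than GET is simply rejected in this sketch.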
Example in R
> co <- socketConnection(port=8080, server=TRUE, blocking=TRUE)
> # Now open a web browser and type http://localhost:8080/index.html
> readLines(co,1)
[1] "GET /index.html HTTP/1.1"
> readLines(co,1)
[1] "Host: localhost:8080"
> readLines(co,1)
[1] "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0"
> readLines(co,1)
[1] "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
> readLines(co,1)
[1] "Accept-Language: en-US,en;q=0.5"
> readLines(co,1)
[1] "Accept-Encoding: gzip, deflate"
> readLines(co,1)
[1] "Connection: keep-alive"
> readLines(co,1)
[1] ""
Example in C (Very simple http server written in C, 187 lines)
Create a simple hello world html page and save it as <index.html> in the current directory (/home/brb/Downloads/)
Launch the server program (assume we have done gcc http_server.c -o http_server)
$ ./http_server -p 50002 Server started at port no. 50002 with root directory as /home/brb/Downloads
Secondly, open a browser and type http://localhost:50002/index.html. The server will print the requests it receives:
GET /index.html HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
file: /home/brb/Downloads/index.html
GET /favicon.ico HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
file: /home/brb/Downloads/favicon.ico
GET /favicon.ico HTTP/1.1
Host: localhost:50003
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
file: /home/brb/Downloads/favicon.ico
The browser will show the page from <index.html> in server.
The only bad thing is that the code does not close the port. For example, if I use Ctrl+C to close the program and try to re-launch it with the same port, it will complain socket() or bind(): Address already in use.
Another Example in C (55 lines)
http://mwaidyanatha.blogspot.com/2011/05/writing-simple-web-server-in-c.html
The response is embedded in the C code.
If we test the server program by opening a browser and typing "http://localhost:15000/", the server receives the following 7 lines
GET / HTTP/1.1
Host: localhost:15000
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If we include a non-executable file's name in the url, we will be able to download that file. Try "http://localhost:15000/client.c".
If we use the telnet program to test, we can type anything we want
$ telnet localhost 15000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
ThisCanBeAnything <=== This is what I typed in the client and it is also shown on server
HTTP/1.1 200 OK <=== From here is what I got from server
Content-length: 37Content-Type: text/html
HTML_DATA_HERE_AS_YOU_MENTIONED_ABOVE <=== The html tags are not passed from server, interesting!
Connection closed by foreign host.
$
Others
- http://rosettacode.org/wiki/Hello_world/ (Different languages)
- http://kperisetla.blogspot.com/2012/07/simple-http-web-server-in-c.html (Windows web server)
- http://css.dzone.com/articles/web-server-c (handling HTTP GET request, handling content types(txt, html, jpg, zip. rar, pdf, php etc.), sending proper HTTP error codes, serving the files from a web root, change in web root in a config file, zero copy optimization using sendfile method and php file handling.)
- https://github.com/gtungatkar/Simple-HTTP-server
- https://github.com/davidmoreno/onion
shiny
The following is what we see on a browser after we run an example from shiny package. See http://rstudio.github.com/shiny/tutorial/#hello-shiny. Note that the R session needs to be on; i.e. R command prompt will not be returned unless we press Ctrl+C or ESC.
shiny depends on websockets, caTools, bitops, digest packages.
Q & A:
- Q: If we run runExample('01_hello') in Rserve from an R client, we can continue our work in R client without losing the functionality of the GUI from shiny. Question: how do we kill the job?
- Q: If I run the example "01_hello" on Firefox, the browser shows only the controls but not the graph. A: Use Chrome or Opera as the default browser.
- If I run the example "01_hello" on RHEL the first time, it works fine. But if I click 'Ctrl + C' to stop it and run it again, I got a message
Warning in .SOCK_SERVE(port) : R-Websockets(tcpserv): bind() failed.
Error in createContext(port, webpage, is.binary = is.binary) :
  Unable to bind socket on port 8100; is it realsy in use?
A simple solution is to close R and open it again.
- Q: Deployment on web. A: Not ready yet. Shiny server platform is still under beta testing. Shiny apps are hosted using the R websockets package which acts more like a tcp server than a web server, and that architecture just doesn't fit with rApache, or even apache for that matter.
- Q: How difficult is it to put the code in a GitHub Gist? A: Just create an account. There is no need to create a repository. Just go to http://gist.github.com and create a new gist. The new gist can be secret or public. A secret gist cannot be edited again after it is created, although it works fine when used in the runGist() function.
shiny server
See https://github.com/rstudio/shiny-server
It works on my ubuntu server. To test, I need to run
sudo shiny-server
# maybe I need to add ampersand '&' sign to the end of the above command
websocket
http://illposed.net/jsm2012.pdf
Heatmap Example
http://taichi.selfip.net:3838/hello/
RApache
gWidgetsWWW
- http://www.jstatsoft.org/v49/i10/paper
- gWidgetsWWW2 gWidgetsWWW based on Rook
- Compare shiny with gWidgetsWWW2.rapache
Rook
Since R 2.13, the internal web server was exposed.
Tutorial from useR2012 and Jeffrey Horner
Here is another one from http://www.rinfinance.com.
Rook is also supported by rApache. See http://rapache.net/manual.html.
Google group. https://groups.google.com/forum/?fromgroups#!forum/rrook
Advantage
- the web applications are created on desktop, whether it is Windows, Mac or Linux.
- No Apache is needed.
- create multiple applications at the same time. This complements the limit of rApache.
4 lines of code example.
library(Rook)
s <- Rhttpd$new()
s$start(quiet=TRUE)
s$print()
s$browse(1) # OR s$browse("RookTest")
Notice that after the s$browse() command, the cursor returns to R because the command is just a shortcut to open the web page http://127.0.0.1:10215/custom/RookTest.
We can add Rook application to the server; see ?Rhttpd.
s$add( app=system.file('exampleApps/helloworld.R',package='Rook'),name='hello' )
s$add( app=system.file('exampleApps/helloworldref.R',package='Rook'),name='helloref' )
s$add( app=system.file('exampleApps/summary.R',package='Rook'),name='summary' )
s$print()
#Server started on 127.0.0.1:10221
#[1] RookTest http://127.0.0.1:10221/custom/RookTest
#[2] helloref http://127.0.0.1:10221/custom/helloref
#[3] summary http://127.0.0.1:10221/custom/summary
#[4] hello http://127.0.0.1:10221/custom/hello
# Stops the server but doesn't uninstall the app
## Not run:
s$stop()
## End(Not run)
s$remove(all=TRUE)
rm(s)
For example, the interface and the source code of summary app are given below
app <- function(env) {
    req <- Rook::Request$new(env)
    res <- Rook::Response$new()
    res$write('Choose a CSV file:\n')
    res$write('<form method="POST" enctype="multipart/form-data">\n')
    res$write('<input type="file" name="data">\n')
    res$write('<input type="submit" name="Upload">\n</form>\n<br>')
    if (!is.null(req$POST())){
        data <- req$POST()[['data']]
        res$write("<h3>Summary of Data</h3>");
        res$write("<pre>")
        res$write(paste(capture.output(summary(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
        res$write("</pre>")
        res$write("<h3>First few lines (head())</h3>");
        res$write("<pre>")
        res$write(paste(capture.output(head(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
        res$write("</pre>")
    }
    res$finish()
}
More examples:
- http://lamages.blogspot.com/2012/08/rook-rocks-example-with-googlevis.html
- Self-organizing map
- Deploy Rook apps with rApache. First one and two.
Stockplot
FastRWeb
Rwui
CGHWithR (removed from CRAN)
But it is still working with old version of R.
Creating local repository for CRAN and Bioconductor (focus on Windows binary packages only)
How to set up a local repository
- CRAN specific: http://cran.r-project.org/mirror-howto.html
- Bioconductor specific: http://www.bioconductor.org/about/mirrors/mirror-how-to/
General guide: http://cran.r-project.org/doc/manuals/R-admin.html#Setting-up-a-package-repository
Utilities such as install.packages can be pointed at any CRAN-style repository, and R users may want to set up their own. The ‘base’ of a repository is a URL such as http://www.omegahat.org/R/: this must be an URL scheme that download.packages supports (which also includes ‘ftp://’ and ‘file://’, but not on most systems ‘https://’). Under that base URL there should be directory trees for one or more of the following types of package distributions:
- "source": located at src/contrib and containing .tar.gz files. Other forms of compression can be used, e.g. .tar.bz2 or .tar.xz files.
- "win.binary": located at bin/windows/contrib/x.y for R versions x.y.z and containing .zip files for Windows.
- "mac.binary.leopard": located at bin/macosx/leopard/contrib/x.y for R versions x.y.z and containing .tgz files.
Each terminal directory must also contain a PACKAGES file. This can be a concatenation of the DESCRIPTION files of the packages separated by blank lines, but only a few of the fields are needed. The simplest way to set up such a file is to use function write_PACKAGES in the tools package, and its help explains which fields are needed. Optionally there can also be a PACKAGES.gz file, a gzip-compressed version of PACKAGES—as this will be downloaded in preference to PACKAGES it should be included for large repositories. (If you have a mis-configured server that does not report correctly non-existent files you will need PACKAGES.gz.)
To add your repository to the list offered by setRepositories(), see the help file for that function.
A repository can contain subdirectories, when the descriptions in the PACKAGES file of packages in subdirectories must include a line of the form
Path: path/to/subdirectory
—once again write_PACKAGES is the simplest way to set this up.
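The "concatenation of DESCRIPTION files separated by blank lines" can be illustrated with a few shell commands. This is a sketch under assumptions: pkgA and pkgB are made-up placeholder packages, and a hand-built index like this is only meant to show the file format; write_PACKAGES remains the simplest way to produce a correct index for real packages.

```shell
# Build a PACKAGES index by concatenating DESCRIPTION files (sketch only).
# The two packages below are made-up placeholders, not real CRAN packages.
work_dir=$(mktemp -d)
mkdir -p "$work_dir/pkgA" "$work_dir/pkgB" "$work_dir/src/contrib"
printf 'Package: pkgA\nVersion: 0.1.0\nDepends: R (>= 2.15.0)\n' > "$work_dir/pkgA/DESCRIPTION"
printf 'Package: pkgB\nVersion: 1.2.0\nImports: pkgA\n' > "$work_dir/pkgB/DESCRIPTION"

packages_file="$work_dir/src/contrib/PACKAGES"
: > "$packages_file"
for description in "$work_dir"/pkg*/DESCRIPTION; do
    cat "$description" >> "$packages_file"
    echo >> "$packages_file"          # blank line separates the records
done

# Optional compressed copy, which R downloads in preference to PACKAGES.
gzip -c "$packages_file" > "$packages_file.gz"
cat "$packages_file"
```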
Space requirement if we want to mirror WHOLE repository
- Whole CRAN takes about 92GB (rsync -avn cran.r-project.org::CRAN > ~/Downloads/cran).
- Bioconductor is big (> 64G for BioC 2.11). Please check the size of what will be transferred with e.g. (rsync -avn bioconductor.org::2.11 > ~/Downloads/bioc) and make sure you have enough room on your local disk before you start.
On the other hand, if we only care about the Windows binary part, the space requirement is greatly reduced.
- CRAN: 2.7GB
- Bioconductor: 28GB.
Misc notes
- If the binary package was built on R 2.15.1, then it cannot be installed on R 2.15.2. But the reverse is OK.
- Remember to use the "--delete" option in rsync, otherwise old versions of packages will be installed.
- The repository still needs the src directory. If it is missing, we will get an error
Warning: unable to access index for repository http://arraytools.no-ip.org/CRAN/src/contrib
Warning message:
package ‘glmnet’ is not available (for R version 2.15.2)
The error was given by available.packages() function.
To bypass the requirement of src directory, I can use
install.packages("glmnet", contriburl = contrib.url(getOption('repos'), "win.binary"))
but there may be a problem when we use biocLite() command.
I found a workaround. Since the error comes from the missing CRAN/src directory, we just need to make sure the directory CRAN/src/contrib exists AND either CRAN/src/contrib/PACKAGES or CRAN/src/contrib/PACKAGES.gz exists.
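The check behind this workaround can be written as a short script. It is a sketch only: mirror_root is a hypothetical variable (defaulting to a temporary directory here), and whether an empty placeholder PACKAGES file is enough for every client is an assumption; copying the real PACKAGES and PACKAGES.gz files from CRAN is the safer route.

```shell
# Create the minimal src/contrib skeleton for a binary-only mirror.
# mirror_root is a hypothetical location; these notes use ~/Rmirror/CRAN.
mirror_root="${mirror_root:-$(mktemp -d)/CRAN}"
mkdir -p "$mirror_root/src/contrib"

# An index file must exist; an empty placeholder is created only if
# neither PACKAGES nor PACKAGES.gz is already present.
if [ ! -e "$mirror_root/src/contrib/PACKAGES" ] && [ ! -e "$mirror_root/src/contrib/PACKAGES.gz" ]; then
    : > "$mirror_root/src/contrib/PACKAGES"
fi
ls "$mirror_root/src/contrib"
```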
To create CRAN repository
Before creating a local repository, do a dry run first. You don't want to be surprised by how long it takes to mirror a directory.
Dry run (-n option). Redirect the output to a text file for examination.
rsync -avn cran.r-project.org::CRAN > crandryrun.txt
To mirror only partial repository, it is necessary to create directories before running rsync command.
cd
mkdir -p ~/Rmirror/CRAN/bin/windows/contrib/2.15
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/windows/contrib/2.15/ ~/Rmirror/CRAN/bin/windows/contrib/2.15
# (the rsync command above is one line with a space before ~/Rmirror)
# src directory is very large (~27GB) since it contains source code for each R version.
# We just need the files PACKAGES and PACKAGES.gz in CRAN/src/contrib. So I comment out the following line.
# rsync -rtlzv --delete cran.r-project.org::CRAN/src/ ~/Rmirror/CRAN/src/
mkdir -p ~/Rmirror/CRAN/src/contrib
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES.gz ~/Rmirror/CRAN/src/contrib/
And optionally
library(tools)
write_PACKAGES("~/Rmirror/CRAN/bin/windows/contrib/2.15", type="win.binary")
and if we want to get src directory
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/*.tar.gz ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/2.15.3 ~/Rmirror/CRAN/src/contrib/
We can use du -h to check the folder size.
For example (as of 1/7/2013),
$ du -k ~/Rmirror --max-depth=1 --exclude ".*" | sort -nr | cut -f2 | xargs -d '\n' du -sh
30G /home/brb/Rmirror
28G /home/brb/Rmirror/Bioc
2.7G /home/brb/Rmirror/CRAN
To create Bioconductor repository
Dry run
rsync -avn bioconductor.org::2.11 > biocdryrun.txt
Then create the directories before running rsync.
cd
mkdir -p ~/Rmirror/Bioc
wget -N http://www.bioconductor.org/biocLite.R -P ~/Rmirror/Bioc
where -N makes wget overwrite the local file only if the remote size or timestamp has changed, and -P specifies an output directory, not a file name.
Optionally, we can add the following in order to see the Bioconductor front page.
rsync -zrtlv --delete bioconductor.org::2.11/BiocViews.html ~/Rmirror/Bioc/packages/2.11/
rsync -zrtlv --delete bioconductor.org::2.11/index.html ~/Rmirror/Bioc/packages/2.11/
The software part (aka bioc directory) installation:
cd
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/bin/windows/ ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
# Either rsync whole src directory or just essential files
# rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/ ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
# Optionally the html part
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/html
rsync -zrtlv --delete bioconductor.org::2.11/bioc/html/ ~/Rmirror/Bioc/packages/2.11/bioc/html
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/vignettes/ ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/news
rsync -zrtlv --delete bioconductor.org::2.11/bioc/news/ ~/Rmirror/Bioc/packages/2.11/bioc/news
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/licenses
rsync -zrtlv --delete bioconductor.org::2.11/bioc/licenses/ ~/Rmirror/Bioc/packages/2.11/bioc/licenses
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/manuals
rsync -zrtlv --delete bioconductor.org::2.11/bioc/manuals/ ~/Rmirror/Bioc/packages/2.11/bioc/manuals
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/readmes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/readmes/ ~/Rmirror/Bioc/packages/2.11/bioc/readmes
and annotation (aka data directory) part:
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib
# one line for each of the following
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/bin/windows/ ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
and experiment directory:
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
and extra directory:
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
To test local repository
Create soft links in Apache server
su
ln -s /home/brb/Rmirror/CRAN /var/www/html/CRAN
ln -s /home/brb/Rmirror/Bioc /var/www/html/Bioc
ls -l /var/www/html
The soft link mode should be 777.
To test CRAN
Replace the host name arraytools.no-ip.org by IP address 10.133.2.111 if necessary.
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages("glmnet")
We can test whether the backup server is working by installing a package that has been removed from CRAN. For example, 'ForImp' was removed from CRAN on 11/8/2012, but I still have a local copy built on R 2.15.2 (rsync was run on 11/6/2012).
r <- getOption("repos"); r["CRAN"] <- "http://cran.r-project.org"
r <- c(r, BRB='http://arraytools.no-ip.org/CRAN')
#                        CRAN                            CRANextra                                BRB
# "http://cran.r-project.org" "http://www.stats.ox.ac.uk/pub/RWin" "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages('ForImp')
Note by default, CRAN mirror is selected interactively.
> getOption("repos") CRAN CRANextra "@CRAN@" "http://www.stats.ox.ac.uk/pub/RWin"
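To avoid the interactive mirror prompt in a script, the repos vector can be built up front. The following is only a minimal sketch: set_cran() is a hypothetical helper (it is not part of R), and the URLs are the ones already used in this section.

```r
# Hypothetical helper: return a copy of a repos vector with its CRAN entry replaced.
set_cran <- function(repos, url) {
  repos["CRAN"] <- url
  repos
}

# Start from the default shown above, where "@CRAN@" means "ask interactively".
default_repos <- c(CRAN = "@CRAN@",
                   CRANextra = "http://www.stats.ox.ac.uk/pub/RWin")
local_repos <- set_cran(default_repos, "http://arraytools.no-ip.org/CRAN")
print(local_repos)

# To make it effective for the current session:
# options(repos = local_repos)
```

Keeping the helper free of side effects makes it easy to inspect the result before calling options().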
To test Bioconductor
# CRAN part:
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
# Bioconductor part:
options("BioC_mirror" = "http://arraytools.no-ip.org/Bioc")
source("http://bioconductor.org/biocLite.R")
# This source biocLite.R line can be placed either before or after the previous 2 lines
biocLite("aCGH")
If there is a connection problem, check folder attributes.
chmod -R 755 ~/CRAN/bin
- Note that if a binary package was created for R 2.15.1, then it can be installed under R 2.15.1 but not R 2.15.2. The R console will show package xxx is not available (for R version 2.15.2).
- For binary installs, the function also checks for the availability of a source package on the same repository, and reports if the source package has a later version, or is available but no binary version is.
So for example, if the mirror does not have contents under src directory, we need to run the following line in order to successfully run install.packages() function.
options(install.packages.check.source = "no")
- If we only mirror the essential directories, we can run biocLite() successfully. However, the R console will show some warnings:
> biocLite("aCGH")
BioC_mirror: http://arraytools.no-ip.org/Bioc
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'aCGH'
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
trying URL 'http://arraytools.no-ip.org/Bioc/packages/2.11/bioc/bin/windows/contrib/2.15/aCGH_1.36.0.zip'
Content type 'application/zip' length 2431158 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb
package ‘aCGH’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
        C:\Users\limingc\AppData\Local\Temp\Rtmp8IGGyG\downloaded_packages
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
> library()
CRAN repository directory structure
The information below is specific to R 2.15.2. There are linux and macosx subdirectories wherever there is a windows subdirectory.
bin/windows/contrib/2.15
src/contrib
   /contrib/2.15.2
   /contrib/Archive
web/checks
   /dcmeta
   /packages
   /views
A clickable map [1]
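The mapping from a repository root to these subdirectories can be checked from R with contrib.url() in the utils package. This is a small sketch using the mirror URL from this page; the version component of the binary path depends on the R version running the code, so it will not necessarily be 2.15.

```r
repo <- "http://arraytools.no-ip.org/CRAN"

# Source packages always live under <repo>/src/contrib
src_url <- contrib.url(repo, type = "source")
print(src_url)

# Windows binaries live under <repo>/bin/windows/contrib/<major.minor>,
# where <major.minor> is taken from the running R version.
win_url <- contrib.url(repo, type = "win.binary")
print(win_url)
```

install.packages() looks for the PACKAGES index at these locations, which is why a mirror needs at least those files in each directory.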
Bioconductor repository directory structure
The information below is specific to Bioc 2.11. There are linux and macosx subdirectories wherever there is a windows subdirectory.
bioc/bin/windows/contrib/2.15
    /html
    /install
    /license
    /manuals
    /news
    /src
    /vignettes
data/annotation/bin/windows/contrib/2.15
               /html
               /licenses
               /manuals
               /src
               /vignettes
    /experiment/bin/windows/contrib/2.15
               /html
               /manuals
               /src/contrib
               /vignettes
extra/bin/windows/contrib
     /html
     /src
     /vignettes
List all R packages from CRAN/Bioconductor
Check my daily result based on R 2.15 and Bioc 2.11 in [2]
Parallel Computing
Example code for the book Parallel R by McCallum and Weston.
Windows Security Warning
It seems safe to choose 'Cancel' when Windows Firewall tries to block R while makeCluster() is creating a socket cluster.
library(parallel)
cl <- makeCluster(2)
clusterApply(cl, 1:2, get("+"), 3)
stopCluster(cl)
If we like to see current firewall settings, just click Windows Start button, search 'Firewall' and choose 'Windows Firewall with Advanced Security'. In the 'Inbound Rules', we can see what programs (like, R for Windows GUI front-end, or Rserve) are among the rules. These rules are called 'private' in the 'Profile' column. Note that each of them may appear twice because one is 'TCP' protocol and the other one has a 'UDP' protocol.
parallel package
The parallel package has been included with R since version 2.14.0. It is derived from the snow and multicore packages and provides many of the same functions as those packages.
The parallel package provides several *apply functions for R users to quickly modify their code using parallel computing.
- makeCluster(makePSOCKcluster, makeForkCluster), stopCluster. Other cluster types are passed to package snow.
- clusterCall, clusterEvalQ, clusterSplit
- clusterApply, clusterApplyLB
- clusterExport
- clusterMap
- parLapply, parSapply, parApply, parRapply, parCapply
- parLapplyLB, parSapplyLB (load balance version)
- clusterSetRNGStream, nextRNGStream, nextRNGSubStream
Examples (See ?clusterApply)
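As a small illustration of how several of the functions listed above fit together, here is a sketch that exports a variable to the workers and then runs parSapply(). The variable name offset and the toy computation are made up for the example.

```r
library(parallel)

# Start a local socket cluster with two workers
cl <- makeCluster(2)

# Workers start with an empty workspace, so variables used inside the
# worker function have to be exported explicitly.
offset <- 3
clusterExport(cl, "offset", envir = environment())

# Parallel counterpart of sapply(1:4, function(i) i + offset)
res <- parSapply(cl, 1:4, function(i) i + offset)
print(res)

# Always release the workers when done
stopCluster(cl)
```

The load-balanced variants (parLapplyLB, parSapplyLB) take the same arguments and may be preferable when the individual tasks have very different run times.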
snow package
Supported cluster types are "SOCK", "PVM", "MPI", and "NWS".
multicore package
foreach package
This package depends on one of the following
- doParallel - Foreach parallel adaptor for the parallel package
- doSNOW - Foreach parallel adaptor for the snow package
- doMC - Foreach parallel adaptor for the multicore package
- doMPI - Foreach parallel adaptor for the Rmpi package
- doRedis - Foreach parallel adapter for the rredis package
as a backend.
snowfall package
Cloud Computing
Install R on Amazon EC2
http://randyzwitch.com/r-amazon-ec2/
Bioconductor on Amazon EC2
http://www.bioconductor.org/help/bioconductor-cloud-ami/
Big Data Analysis
http://blog.comsysto.com/2013/02/14/my-favorite-community-links/
Useful R packages
RInside
- http://dirk.eddelbuettel.com/code/rinside.html
- http://dirk.eddelbuettel.com/papers/rfinance2010_rcpp_rinside_tutorial_handout.pdf
See my demo on Youtube of RInside + Qt.
With RInside + web toolkit, we can also create a web application. To demonstrate the example in examples/wt directory, we can do
cd ~/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/wt
make
sudo ./wtdensity --docroot . --http-address localhost --http-port 8080
Then we can go to the browser's address bar and type http://localhost:8080 to see how it works (a screenshot is in here).
Ubuntu
Straightforward. If we want to run the example from examples/qt directory, we simply need to install Qt from apt-get. I have tested the qtdensity example successfully with Qt 4.
Windows 7
- Make sure R is installed under C:\ instead of C:\Program Files if we don't want to get an error like g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory.
- Install RTools
- Install the RInside package from source (the binary version will give an error)
- Create a DOS batch file containing necessary paths in PATH environment variable
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\R\R-3.0.1\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
set R_HOME=C:\R\R-3.0.1
echo Setting environment for using R
cmd
- In the Windows command prompt
cd C:\R\R-3.0.1\library\RInside\examples\standard
make -f Makefile.win
Now we can test by running any of executable files that make generates. For example, rinside_sample0.
rinside_sample0
As for the qtdensity program, we need to make sure the same version of MinGW was used to build RInside/Rcpp and Qt. See some discussion in
- http://stackoverflow.com/questions/12280707/using-rinside-with-qt-in-windows
- http://www.mail-archive.com/[email protected]/msg04377.html
Hadoop (eg ~100 terabytes)
See also HighPerformanceComputing
- RHadoop
- Hive
- MapReduce. Introduction by Linux Journal.
- http://www.techspritz.com/category/tutorials/hadoopmapredcue/ Single node or multinode cluster setup using Ubuntu with VirtualBox (Excellent)
- Running Hadoop on Ubuntu Linux (Single-Node Cluster)
- Ubuntu 12.04 http://www.youtube.com/watch?v=WN2tJk_oL6E and instruction
- Linux Mint http://blog.hackedexistence.com/installing-hadoop-single-node-on-linux-mint
- http://www.r-bloggers.com/search/hadoop
XML
On Ubuntu, we need to install libxml2-dev before we can install XML package.
sudo apt-get update
sudo apt-get install libxml2-dev
GenOrd: Generate ordinal and discrete variables with given correlation matrix and marginal distributions
rjson
http://heuristically.wordpress.com/2013/05/20/geolocate-ip-addresses-in-r/
RJSONIO
Plot IP on google map
- http://thebiobucket.blogspot.com/2011/12/some-fun-with-googlevis-plotting-blog.html#more (RCurl, RJSONIO, plyr, googleVis)
- http://devblog.icans-gmbh.com/using-the-maxmind-geoip-api-with-r/ (RCurl, RJSONIO, maps)
- http://cran.r-project.org/web/packages/geoPlot/index.html (geoPlot package, deprecated as of 8/12/2013)
- http://archive09.linux.com/feature/135384 (Not R) ApacheMap
- http://batchgeo.com/features/geolocation-ip-lookup/ (Not R) (Enter a spreadsheet of addresses, cities, zip codes, or a column of IPs and it will show the locations on a Google map)
- http://code.google.com/p/apachegeomap/
The following example is modified from the first link in the list above.
# Load the packages first: getURL() comes from RCurl, fromJSON() from RJSONIO
require(RJSONIO)
require(RCurl)
require(googleVis)

temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt",
              ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)
names(ip) <- "IP"
nr = nrow(ip)
Lon <- as.numeric(rep(NA, nr))
Lat <- Lon
Coords <- data.frame(Lon, Lat)

# Look up one IP address and return the geolocation record as a data frame
ip2coordinates <- function(ip) {
  api <- "http://freegeoip.net/json/"
  get.ips <- getURL(paste(api, URLencode(ip), sep=""))
  # result <- ldply(fromJSON(get.ips), data.frame)
  result <- data.frame(fromJSON(get.ips))
  names(result)[1] <- "ip.address"
  return(result)
}

# try() keeps the loop going when a lookup fails; failed rows stay NA
for (i in 1:nr){
  cat(i, "\n")
  try(
    Coords[i, 1:2] <- ip2coordinates(ip$IP[i])[c("longitude", "latitude")]
  )
}

# append to log-file:
logfile <- data.frame(ip, Lat = Coords$Lat, Long = Coords$Lon,
                      LatLong = paste(round(Coords$Lat, 1), round(Coords$Lon, 1), sep = ":"))
log_gmap <- logfile[!is.na(logfile$Lat), ]

gmap <- gvisMap(log_gmap, "LatLong",
                options = list(showTip = TRUE, enableScrollWheel = TRUE,
                               mapType = 'hybrid', useMapTypeControl = TRUE,
                               width = 1024, height = 800))
plot(gmap)
Rcpp
Example 1. convolution
First, Rcpp package should be installed (I am working on Linux system). Next we try one example shipped in Rcpp package.
cd ~/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/examples/ConvolveBenchmarks/
make
R
Then type the following in an R session to see how it works. Note that we don't need to issue library(Rcpp) in R.
dyn.load("convolve3_cpp.so")
x <- .Call("convolve3cpp", 1:3, 4:6)
x # 4 13 28 27 18
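To check the result of the compiled routine, the same open convolution can be written in plain R. This is only a reference sketch for comparison: convolve_r() is a name made up here and is not part of Rcpp or its examples.

```r
# Plain-R reference: ab[i + j - 1] accumulates a[i] * b[j]
convolve_r <- function(a, b) {
  ab <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
    }
  }
  ab
}

y <- convolve_r(1:3, 4:6)
print(y)  # 4 13 28 27 18, the same values as the .Call() result above
```

The double loop is exactly the kind of code that is slow in R and fast in C++, which is why this example is used for benchmarking.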
If we have our own cpp file, we need to build the dynamically loaded library as follows. Note that the character ` (grave accent) is not ' (single quote); the commands will not work with '.
export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
R CMD SHLIB xxxx.cpp
Example 2. Use together with inline package
Example 3. Calling an R function
caret
xlsx package
ggplot2
stringr and plyr
http://martinsbioblogg.wordpress.com/2013/03/24/using-r-reading-tables-that-need-a-little-cleaning/
A data.frame is pretty much a list of vectors, so we use plyr to apply over the list and stringr to search and replace in the vectors.
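The same idea can be sketched in base R, with lapply() standing in for plyr and gsub() for stringr. The toy data frame and the characters being stripped are made up for illustration; the linked post uses its own data.

```r
# A small table whose cells carry stray spaces and footnote asterisks
messy <- data.frame(id    = c(" 1", "2 ", "3"),
                    value = c("10*", "20", "30*"),
                    stringsAsFactors = FALSE)

# A data frame is a list of column vectors, so lapply() visits each column;
# assigning to messy[] keeps the data.frame structure.
clean <- messy
clean[] <- lapply(messy, function(col) gsub("[* ]", "", col))

# Convert the cleaned columns to numbers
clean[] <- lapply(clean, as.numeric)
print(clean)
```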
jpeg
If we want to create the image on this wiki left hand side panel, we can use jpeg package to read an existing plot and then edit and save it.
cairoDevice
For ubuntu OS, we need to install 2 libraries
sudo apt-get install libgtk2.0-dev libcairo2-dev
Different ways of using R
R and C/C++ communicate
Call R from C/C++
- Use eval() function. See R-Ext 8.1 and 8.2 and 5.11.
- http://stackoverflow.com/questions/2463437/r-from-c-simplest-possible-helloworld (obtained from searching R_tryEval on google)
- http://stackoverflow.com/questions/7457635/calling-r-function-from-c
Example: Create <embed.c> file
#include <Rembedded.h>
#include <Rdefines.h>

static void doSplinesExample();

int main(int argc, char *argv[])
{
    Rf_initEmbeddedR(argc, argv);
    doSplinesExample();
    Rf_endEmbeddedR(0);
    return 0;
}

static void doSplinesExample()
{
    SEXP e, result;
    int errorOccurred;

    // create and evaluate 'library(splines)'
    PROTECT(e = lang2(install("library"), mkString("splines")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    if (errorOccurred) {
        // handle error
    }
    UNPROTECT(1);

    // 'options(FALSE)' ...
    PROTECT(e = lang2(install("options"), ScalarLogical(0)));
    // ... modified to 'options(example.ask=FALSE)' (this is obscure)
    SET_TAG(CDR(e), install("example.ask"));
    R_tryEval(e, R_GlobalEnv, NULL);
    UNPROTECT(1);

    // 'example("ns")'
    PROTECT(e = lang2(install("example"), mkString("ns")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    UNPROTECT(1);
}
Then build the executable. Note that I don't need to set the R_HOME variable.
cd
tar xzvf
cd R-3.0.1
./configure --enable-R-shlib
make
cd tests/Embedding
make
~/R-3.0.1/bin/R CMD ./Rtest

nano embed.c
# Compiling and linking in a single command gives an error that hides the real problem.
# ../../bin/R CMD gcc -I../../include -L../../lib -lR embed.c
# A better way is to compile and link separately
gcc -I../../include -c embed.c
gcc -o embed embed.o -L../../lib -lR -lRblas
../../bin/R CMD ./embed
Question: How do we create a data frame in C? Answer: Call data.frame() via an eval() call from C. Or see the code in stats/src/model.c, which is part of model.frame.default. Or use Rcpp as here.
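It can help to prototype the call object in R before writing the C version. The sketch below builds the same kind of unevaluated calls that lang2(install(...), ...) produces in C and then evaluates them; the column names x and y are made up for the example.

```r
# R-level analogue of lang2(install("data.frame"), ...) followed by R_tryEval()
e <- as.call(list(as.name("data.frame"), x = 1:3, y = c("a", "b", "c")))
print(e)                      # the unevaluated call
df <- eval(e, globalenv())    # evaluate it in the global environment
print(df)

# The tagged argument in options(example.ask = FALSE) corresponds to
# SET_TAG(CDR(e), install("example.ask")) in the C code above.
e2 <- as.call(list(as.name("options"), example.ask = FALSE))
print(e2)
```

In C, the result of evaluating the data.frame() call would be an ordinary SEXP that must be protected while it is in use.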
R calls C/C++
- http://faculty.washington.edu/kenrice/sisg-adv/sisg-07.pdf
- http://www.stat.berkeley.edu/scf/paciorek-cppWorkshop.pdf
- http://www.stat.harvard.edu/ccr2005/
- http://www.sfu.ca/~sblay/R-C-interface.txt
SEXP
Embedding R
- See the Writing R Extensions manual, Chapter 8.
- Talk by Simon Urbanek in UseR 2004.
- Technical report by Friedrich Leisch in 2007.
- https://stat.ethz.ch/pipermail/r-help/attachments/20110729/b7d86ed7/attachment.pl
An Example from Bioconductor Workshop
This example is the same as the one in Call R from C/C++ above.
First make sure that, before running 'make' to build R, R was configured with
./configure --enable-R-shlib
Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ export R_HOME=/home/mli/Downloads/R-2.15.2 mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/mli/Downloads/R-2.15.2/lib mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ g++ embed.c -I/home/mli/Downloads/R-2.15.2/include -L/home/mli/Downloads/R-2.15.2/lib -lR mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ R CMD ./a.out WARNING: ignoring environment value of R_HOME R version 2.15.2 (2012-10-26) -- "Trick or Treat" Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-pc-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. 
ns> require(stats); require(graphics) ns> ns(women$height, df = 5) 1 2 3 4 5 [1,] 0.000000e+00 0.000000e+00 0.00000000 0.00000000 0.0000000000 [2,] 7.592323e-03 0.000000e+00 -0.08670223 0.26010669 -0.1734044626 [3,] 6.073858e-02 0.000000e+00 -0.15030440 0.45091320 -0.3006088020 [4,] 2.047498e-01 6.073858e-05 -0.16778345 0.50335034 -0.3355668952 [5,] 4.334305e-01 1.311953e-02 -0.13244035 0.39732106 -0.2648807067 [6,] 6.256681e-01 8.084305e-02 -0.07399720 0.22199159 -0.1479943948 [7,] 6.477162e-01 2.468416e-01 -0.02616007 0.07993794 -0.0532919575 [8,] 4.791667e-01 4.791667e-01 0.01406302 0.02031093 -0.0135406187 [9,] 2.468416e-01 6.477162e-01 0.09733619 0.02286023 -0.0152401533 [10,] 8.084305e-02 6.256681e-01 0.27076826 0.06324188 -0.0405213106 [11,] 1.311953e-02 4.334305e-01 0.48059836 0.12526031 -0.0524087186 [12,] 6.073858e-05 2.047498e-01 0.59541597 0.19899261 0.0007809246 [13,] 0.000000e+00 6.073858e-02 0.50097182 0.27551020 0.1627793975 [14,] 0.000000e+00 7.592323e-03 0.22461127 0.35204082 0.4157555879 [15,] 0.000000e+00 0.000000e+00 -0.14285714 0.42857143 0.7142857143 attr(,"degree") [1] 3 attr(,"knots") 20% 40% 60% 80% 60.8 63.6 66.4 69.2 attr(,"Boundary.knots") [1] 58 72 attr(,"intercept") [1] FALSE attr(,"class") [1] "ns" "basis" "matrix" ns> summary(fm1 <- lm(weight ~ ns(height, df = 5), data = women)) Call: lm(formula = weight ~ ns(height, df = 5), data = women) Residuals: Min 1Q Median 3Q Max -0.38333 -0.12585 0.07083 0.15401 0.30426 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 114.7447 0.2338 490.88 < 2e-16 *** ns(height, df = 5)1 15.9474 0.3699 43.12 9.69e-12 *** ns(height, df = 5)2 25.1695 0.4323 58.23 6.55e-13 *** ns(height, df = 5)3 33.2582 0.3541 93.93 8.91e-15 *** ns(height, df = 5)4 50.7894 0.6062 83.78 2.49e-14 *** ns(height, df = 5)5 45.0363 0.2784 161.75 < 2e-16 *** --- Signif. 
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.2645 on 9 degrees of freedom Multiple R-squared: 0.9998, Adjusted R-squared: 0.9997 F-statistic: 9609 on 5 and 9 DF, p-value: < 2.2e-16 ns> ## example of safe prediction ns> plot(women, xlab = "Height (in)", ylab = "Weight (lb)") ns> ht <- seq(57, 73, length.out = 200) ns> lines(ht, predict(fm1, data.frame(height=ht))) ns> ## Don't show: ns> ## Consistency: ns> x <- c(1:3,5:6) ns> stopifnot(identical(ns(x), ns(x, df = 1)), ns+ identical(ns(x, df=2), ns(x, df=2, knots=NULL)),# not true till 2.15.2 ns+ !is.null(kk <- attr(ns(x), "knots")),# not true till 1.5.1 ns+ length(kk) == 0) ns> ## End Don't show ns> ns> ns>
The above result can be compared with running
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ R WARNING: ignoring environment value of R_HOME
R version 2.15.2 (2012-10-26) -- "Trick or Treat" Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
> library(splines) > example("ns")
ns> require(stats); require(graphics)
ns> ns(women$height, df = 5)
1 2 3 4 5 [1,] 0.000000e+00 0.000000e+00 0.00000000 0.00000000 0.0000000000 [2,] 7.592323e-03 0.000000e+00 -0.08670223 0.26010669 -0.1734044626 [3,] 6.073858e-02 0.000000e+00 -0.15030440 0.45091320 -0.3006088020 [4,] 2.047498e-01 6.073858e-05 -0.16778345 0.50335034 -0.3355668952 [5,] 4.334305e-01 1.311953e-02 -0.13244035 0.39732106 -0.2648807067 [6,] 6.256681e-01 8.084305e-02 -0.07399720 0.22199159 -0.1479943948 [7,] 6.477162e-01 2.468416e-01 -0.02616007 0.07993794 -0.0532919575 [8,] 4.791667e-01 4.791667e-01 0.01406302 0.02031093 -0.0135406187 [9,] 2.468416e-01 6.477162e-01 0.09733619 0.02286023 -0.0152401533
[10,] 8.084305e-02 6.256681e-01 0.27076826 0.06324188 -0.0405213106 [11,] 1.311953e-02 4.334305e-01 0.48059836 0.12526031 -0.0524087186 [12,] 6.073858e-05 2.047498e-01 0.59541597 0.19899261 0.0007809246 [13,] 0.000000e+00 6.073858e-02 0.50097182 0.27551020 0.1627793975 [14,] 0.000000e+00 7.592323e-03 0.22461127 0.35204082 0.4157555879 [15,] 0.000000e+00 0.000000e+00 -0.14285714 0.42857143 0.7142857143 attr(,"degree") [1] 3 attr(,"knots")
20% 40% 60% 80%
60.8 63.6 66.4 69.2 attr(,"Boundary.knots") [1] 58 72 attr(,"intercept") [1] FALSE attr(,"class") [1] "ns" "basis" "matrix"
ns> summary(fm1 <- lm(weight ~ ns(height, df = 5), data = women))
Call: lm(formula = weight ~ ns(height, df = 5), data = women)
Residuals:
Min 1Q Median 3Q Max
-0.38333 -0.12585 0.07083 0.15401 0.30426
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 114.7447 0.2338 490.88 < 2e-16 *** ns(height, df = 5)1 15.9474 0.3699 43.12 9.69e-12 *** ns(height, df = 5)2 25.1695 0.4323 58.23 6.55e-13 *** ns(height, df = 5)3 33.2582 0.3541 93.93 8.91e-15 *** ns(height, df = 5)4 50.7894 0.6062 83.78 2.49e-14 *** ns(height, df = 5)5 45.0363 0.2784 161.75 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2645 on 9 degrees of freedom Multiple R-squared: 0.9998, Adjusted R-squared: 0.9997 F-statistic: 9609 on 5 and 9 DF, p-value: < 2.2e-16
ns> ## example of safe prediction
ns> plot(women, xlab = "Height (in)", ylab = "Weight (lb)")
ns> ht <- seq(57, 73, length.out = 200)
ns> lines(ht, predict(fm1, data.frame(height=ht)))
ns> ## Don't show: ns> ## Consistency: ns> x <- c(1:3,5:6)
ns> stopifnot(identical(ns(x), ns(x, df = 1)), ns+ identical(ns(x, df=2), ns(x, df=2, knots=NULL)),# not true till 2.15.2 ns+ !is.null(kk <- attr(ns(x), "knots")),# not true till 1.5.1 ns+ length(kk) == 0)
ns> ## End Don't show ns> ns> ns>
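Note that if I follow the instruction to put embed.c at the end of the g++ command, I get an error. This is most likely because the linker processes its inputs from left to right (and Ubuntu's toolchain links with --as-needed by default), so -lR has to come after the file that uses its symbols.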
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ g++ -I/home/mli/Downloads/R-2.15.2/include -L/home/mli/Downloads/R-2.15.2/lib -lR embed.c /tmp/cc7Vum5j.o: In function `main': embed.c:(.text+0x1c): undefined reference to `Rf_initEmbeddedR' embed.c:(.text+0x2b): undefined reference to `Rf_endEmbeddedR' /tmp/cc7Vum5j.o: In function `doSplinesExample()': embed.c:(.text+0x45): undefined reference to `Rf_mkString' embed.c:(.text+0x52): undefined reference to `Rf_install' embed.c:(.text+0x5d): undefined reference to `Rf_lang2' embed.c:(.text+0x6d): undefined reference to `Rf_protect' embed.c:(.text+0x74): undefined reference to `R_GlobalEnv' embed.c:(.text+0x87): undefined reference to `R_tryEval' embed.c:(.text+0x91): undefined reference to `Rf_unprotect' embed.c:(.text+0x9b): undefined reference to `Rf_ScalarLogical' embed.c:(.text+0xa8): undefined reference to `Rf_install' embed.c:(.text+0xb3): undefined reference to `Rf_lang2' embed.c:(.text+0xc3): undefined reference to `Rf_protect' embed.c:(.text+0xcd): undefined reference to `Rf_install' embed.c:(.text+0xdc): undefined reference to `CDR' embed.c:(.text+0xe7): undefined reference to `SET_TAG' embed.c:(.text+0xee): undefined reference to `R_GlobalEnv' embed.c:(.text+0x102): undefined reference to `R_tryEval' embed.c:(.text+0x10c): undefined reference to `Rf_unprotect' embed.c:(.text+0x116): undefined reference to `Rf_mkString' embed.c:(.text+0x123): undefined reference to `Rf_install' embed.c:(.text+0x12e): undefined reference to `Rf_lang2' embed.c:(.text+0x13e): undefined reference to `Rf_protect' embed.c:(.text+0x145): undefined reference to `R_GlobalEnv' embed.c:(.text+0x158): undefined reference to `R_tryEval' embed.c:(.text+0x162): undefined reference to `Rf_unprotect' collect2: ld returned 1 exit status
Graphical Example
See RInside
Create a Simple Socket Server in R
This example comes from this paper.
Create an R function
simpleServer <- function(port=6543)
{
  sock <- socketConnection(port=port, server=TRUE)
  on.exit(close(sock))
  cat("\nWelcome to R!\nR>", file=sock)
  while ((line <- readLines(sock, n=1)) != "quit")
  {
    cat(paste("socket >", line, "\n"))
    out <- capture.output(try(eval(parse(text=line))))
    writeLines(out, con=sock)
    cat("\nR> ", file=sock)
  }
}
Then run simpleServer(). Open another terminal and try to communicate with the server
$ telnet localhost 6543 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. Welcome to R! R> summary(iris[, 3:5]) Petal.Length Petal.Width Species Min. :1.000 Min. :0.100 setosa :50 1st Qu.:1.600 1st Qu.:0.300 versicolor:50 Median :4.350 Median :1.300 virginica :50 Mean :3.758 Mean :1.199 3rd Qu.:5.100 3rd Qu.:1.800 Max. :6.900 Max. :2.500 R> quit Connection closed by foreign host.
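The heart of the server loop is the expression that turns one line of text into printed output. It can be tried without any socket; eval_line() below is a hypothetical wrapper introduced only for this illustration.

```r
# Parse a line of text as R code, evaluate it, and capture what would be printed.
# try() keeps a bad command from terminating the caller.
eval_line <- function(line) {
  capture.output(try(eval(parse(text = line))))
}

out <- eval_line("1 + 2")
print(out)   # "[1] 3"

# Multi-line output comes back as one string per printed line
out2 <- eval_line("summary(c(1, 2, 3, 4))")
print(out2)
```

Because the server evaluates whatever text it receives, it should only be run on a trusted machine or network.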
Rserve
Note the way of launching Rserve is like the way we launch C program when C needs to call R. See Call R from C/C++ or Example from Bioconductor workshop.
See Rserve page.
(Commercial) StatconnDcom
R.NET
RJava
RCaller
RApache
Create HTML 5 web and slides
See here
Create academic report
reports package and github repository. The youtube video gives an overview of the package.
Create Word report
knitr + pandoc
- http://www.r-statistics.com/2013/03/write-ms-word-document-using-r-with-as-little-overhead-as-possible/
- http://www.carlboettiger.info/2012/04/07/writing-reproducibly-in-the-open-with-knitr.html
It is better to create the rmd file in RStudio. RStudio provides a template for rmd files as well as a quick reference for the R Markdown language.
# Idea:
#        knitr        pandoc
#   rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd")  # Create md and html files
and then
FILE <- "example" system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))
Note. For some reason, if I run the above 2 commands several times, knit2html() stops working properly. However, if I click the 'Knit HTML' button in RStudio, it works again.
Another way is
library(knitr)   # provides knit()
library(pander)
name = "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
Pandoc.brew(file = paste0(name, ".md"), output = paste0(name, "docx"), convert = "docx")
Note that once we have used knitr command to create a md file, we can use pandoc shell command to convert it to different formats:
- A pdf file: pandoc -s report.md -t latex -o report.pdf
- An html file: pandoc -s report.md -o report.html (with the -c flag, CSS files can be added easily)
- Openoffice: pandoc report.md -o report.odt
- Word docx: pandoc report.md -o report.docx
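The conversions listed above can also be driven from R. The sketch below only builds the command strings; pandoc_cmd() is a hypothetical helper, and the system() call is left commented out because it assumes pandoc is on the PATH and report.md exists.

```r
# Hypothetical helper: build a pandoc command line for <base>.md -> <base>.<ext>
pandoc_cmd <- function(base, ext, standalone = FALSE) {
  paste0("pandoc ", if (standalone) "-s " else "",
         base, ".md -o ", base, ".", ext)
}

cmds <- c(pandoc_cmd("report", "html", standalone = TRUE),
          pandoc_cmd("report", "odt"),
          pandoc_cmd("report", "docx"))
print(cmds)

# Run them once report.md has been produced by knit():
# for (cmd in cmds) system(cmd)
```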
pander
Try pandoc [1] with a minimal reproducible example; you might also give my "pander" package [2] a try:
library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')
Where the content of the "minimal.brew" file is something you might have got used to with Sweave - although it's using "brew" syntax instead. See the examples of pander [3] for more details. Please note that pandoc should be installed first, which is pretty easy on Windows.
- http://johnmacfarlane.net/pandoc/
- http://rapporter.github.com/pander/
- http://rapporter.github.com/pander/#examples
R2wd
Use R2wd package. However, only 32-bit R is allowed and sometimes it can not produce all 'table's.
> library(R2wd) > wdGet() Loading required package: rcom Loading required package: rscproxy rcom requires a current version of statconnDCOM installed. To install statconnDCOM type installstatconnDCOM() This will download and install the current version of statconnDCOM You will need a working Internet connection because installation needs to download a file. Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() : argument is of length zero
The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.
Convert from pdf to word
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert
rtf
Use rtf package for Rich Text Format (RTF) Output.
xtable
Package xtable will produce html output. If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.
COM client or server
Client
RDCOMClient; the excel.link package depends on it.
Server
Use R under proxy
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
What is the best place to save Rconsole on Windows platform
Put it in the C:/Users/USERNAME/Documents folder so that, no matter how R is upgraded or downgraded, it always finds my preferences.
Web scraping
http://www.slideshare.net/schamber/web-data-from-r#btnNext
Launch Rstudio
If multiple versions of R are detected, RStudio may fail to launch: a Java-like clock keeps spinning without stopping. The trick is to hold down the Ctrl key while clicking the RStudio icon. RStudio will then show a list of R versions to choose from.
List files using regular expression
- Extension
list.files(pattern = "\\.txt$")
- Start with
list.files(pattern = "^Something")
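The two patterns can be combined into one regular expression. The following sketch creates a few empty files in a temporary directory so the effect of each pattern is visible; the file names are made up for the example.

```r
# Set up a scratch directory with a few empty files
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
  c("Something1.txt", "Something2.csv", "other.txt", "notes.txt.bak"))))

# Extension: "\\." is a literal dot and "$" anchors the match at the end,
# so notes.txt.bak is not matched
txt_files <- list.files(demo_dir, pattern = "\\.txt$")

# Start with: "^" anchors the match at the beginning of the file name
start_files <- list.files(demo_dir, pattern = "^Something")

# Both conditions at once
both <- list.files(demo_dir, pattern = "^Something.*\\.txt$")

print(txt_files)
print(start_files)
print(both)
```

Note that pattern is a regular expression, not a shell glob; glob2rx("*.txt") converts a glob into the equivalent regular expression.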
Hidden tool: rsync in Rtools
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
a.exe
sent 323142 bytes  received 31 bytes  646346.00 bytes/sec
total size is 1198416  speedup is 3.71
c:\Rtools\bin>
rsync works best when we need to sync a folder.
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/Eula.txt
binary/cherrytree.lnk
binary/depends64.chm
binary/depends64.dll
binary/depends64.exe
binary/mtputty.exe
binary/procexp.chm
binary/procexp.exe
binary/pscp.exe
binary/putty.exe
binary/sqlite3.exe
binary/wget.exe
sent 4115294 bytes  received 244 bytes  1175868.00 bytes/sec
total size is 8036311  speedup is 1.95
c:\Rtools\bin>rm c:\users\limingc\Documents\binary\procexp.exe
cygwin warning:
  MS-DOS style path detected: c:\users\limingc\Documents\binary\procexp.exe
  Preferred POSIX equivalent is: /cygdrive/c/users/limingc/Documents/binary/procexp.exe
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/procexp.exe
sent 1767277 bytes  received 35 bytes  3534624.00 bytes/sec
total size is 8036311  speedup is 4.55
c:\Rtools\bin>
Unfortunately, if the destination is a network drive, I may get a permission denied (13) error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows
Install rgdal package on ubuntu
sudo apt-get install libgdal1-dev libproj-dev
R
> install.packages("rgdal")
Set up Emacs on Windows
Edit the file C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el with something like
(setq-default inferior-R-program-name "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")
Database
RMySQL
RSQLite
Not suitable for client/server architecture. The limit is quite large; see here.
For BRB-ArrayTools, in the MatchAffyAnnotation() function for GO match, it breaks genes into 10k as a block. Otherwise it will get an error like here.
Github
R source
- https://github.com/wch/r-source/ Updated daily; interesting and worth visiting every day. Click the "1000+ commits" link to look at the daily changes.
- https://github.com/SurajGupta/r-source (update for each R release)
github
https://github.com/languages/R
My collection
- https://github.com/arraytools
- https://gist.github.com/4383351 heatmap using leukemia data
- https://gist.github.com/4382774 heatmap using sequential data
- https://gist.github.com/4484270 biocLite
How to download
Cloning is roughly the same as downloading.
- Command line
git clone https://gist.github.com/4484270.git
This will create a subdirectory called '4484270' with all cloned files there.
- Within R
library(devtools)
source_gist("4484270")
or first download the JSON file from
https://api.github.com/users/MYUSERLOGIN/gists
and then
library(RJSONIO)
x <- fromJSON("~/Downloads/gists.json")
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
  cmd <- paste0("git clone https://gist.github.com/", x, ".git")
  system(cmd)
})
Tricks
Query about an R package
packageDescription("MASS")
packageVersion("MASS")
packageStatus() # Summarize information about installed packages
installed.packages()
available.packages()
The 'available.packages()' command is useful for understanding package dependency. Use setRepositories() to select repositories and options()$repos to check or change the repositories.
> packageStatus()
Number of installed packages:
                                     ok upgrade unavailable
  C:/Program Files/R/R-3.0.1/library 110       0           1
Number of available packages (each package counted only once):
                                                                                    installed not installed
  http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0                            76          4563
  http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.0                                0             5
  http://www.bioconductor.org/packages/2.12/bioc/bin/windows/contrib/3.0                   16           625
  http://www.bioconductor.org/packages/2.12/data/annotation/bin/windows/contrib/3.0         4           686
> tmp <- available.packages()
> str(tmp)
 chr [1:5975, 1:17] "A3" "ABCExtremes" "ABCp2" "ACCLMA" "ACD" "ACNE" "ADGofTest" "ADM3" "AER" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:5975] "A3" "ABCExtremes" "ABCp2" "ACCLMA" ...
  ..$ : chr [1:17] "Package" "Version" "Priority" "Depends" ...
> tmp[1:3,]
            Package       Version Priority Depends                     Imports LinkingTo Suggests
A3          "A3"          "0.9.2" NA       "xtable, pbapply"           NA      NA        "randomForest, e1071"
ABCExtremes "ABCExtremes" "1.0"   NA       "SpatialExtremes, combinat" NA      NA        NA
ABCp2       "ABCp2"       "1.1"   NA       "MASS"                      NA      NA        NA
            Enhances License      License_is_FOSS License_restricts_use OS_type Archs MD5sum NeedsCompilation File
A3          NA       "GPL (>= 2)" NA              NA                    NA      NA    NA     NA               NA
ABCExtremes NA       "GPL-2"      NA              NA                    NA      NA    NA     NA               NA
ABCp2       NA       "GPL-2"      NA              NA                    NA      NA    NA     NA               NA
            Repository
A3          "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0"
ABCExtremes "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0"
ABCp2       "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0"
The following commands find which packages depend on Rcpp, and which of those come from the Bioconductor repository.
> pkgName <- "Rcpp"
> rownames(tmp)[grep(pkgName, tmp[,"Depends"])]
> tmp[grep(pkgName, tmp[,"Depends"]), "Depends"]
> ind <- intersect(grep(pkgName, tmp[,"Depends"]), grep("bioconductor", tmp[, "Repository"]))
> rownames(tmp)[ind]
 [1] "ddgraph"            "DESeq2"             "GeneNetworkBuilder" "GOSemSim"           "GRENITS"
 [6] "mosaics"            "mzR"                "pcaMethods"         "Rdisop"             "Risa"
[11] "rTANDEM"
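One caveat with a plain grep() on the Depends column is that it matches substrings, so a search for "Rcpp" also picks up packages that depend only on, say, RcppArmadillo. The sketch below shows a stricter match on a toy matrix that mimics the shape of available.packages(). The package names pkgA to pkgD and the helpers dep_names() and depends_on() are made up for illustration.

```r
# Toy stand-in for the matrix returned by available.packages();
# the rows below are invented, only the column layout is borrowed.
db <- matrix(
  c("pkgA", "R (>= 2.15.0), Rcpp (>= 0.10.0)",
    "pkgB", "RcppArmadillo, methods",
    "pkgC", "MASS",
    "pkgD", NA),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("pkgA", "pkgB", "pkgC", "pkgD"), c("Package", "Depends"))
)

# Split one Depends field into bare package names (version requirements dropped)
dep_names <- function(field) {
  if (is.na(field)) return(character(0))
  parts <- strsplit(field, ",")[[1]]
  trimws(sub("\\(.*\\)", "", parts))
}

# Names of the packages whose Depends field lists 'pkg' exactly
depends_on <- function(db, pkg) {
  hit <- vapply(db[, "Depends"], function(f) pkg %in% dep_names(f), logical(1))
  rownames(db)[hit]
}

loose  <- rownames(db)[grep("Rcpp", db[, "Depends"])]  # substring match
strict <- depends_on(db, "Rcpp")                       # exact name match

print(loose)
print(strict)
```

The same helper should work on the real available.packages() matrix, since it only uses the row names and the Depends column.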
Editor
http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs
- Rstudio - editor/R terminal/R graphics/file browser/package manager.
- geany - I like the feature that it shows defined functions on the side panel even for R code.
- Rgedit which includes a feature of splitting screen into two panes and run R in the bottom panel. See here.
- Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html
Create a new R package, namespace, documentation
Create R package from R code with roxyPackage
http://lamages.blogspot.com/2013/03/create-r-package-from-single-r-file.html
Minimal R package for submission
https://stat.ethz.ch/pipermail/r-devel/2013-August/067257.html and CRAN Repository Policy.
llply() from plyr package
llply is equivalent to lapply except that it will preserve labels and can display a progress bar. The progress bar is handy for a long-running loop such as the following.
LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])
where rLLID is a list of Entrez IDs. For example,
get("org.Hs.egGO")[["6772"]]
returns a list of 49 GOs.
mclapply() from the parallel package is a multi-core version of lapply()
Note that Windows cannot take advantage of it, because mclapply() relies on forking.
Another choice for Windows is to use the parLapply() function in the parallel package.
library(parallel)
ncores <- as.integer( Sys.getenv('NUMBER_OF_PROCESSORS') )
cl <- makeCluster(getOption("cl.cores", ncores))
LLID2GOIDs2 <- parLapply(cl, rLLID, function(x) {
  library(org.Hs.eg.db)   # each worker needs the annotation package loaded
  get("org.Hs.egGO")[[x]]
})
stopCluster(cl)
It does work: it cut the computing time from 100 sec to 29 sec on 4 cores.
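The NUMBER_OF_PROCESSORS environment variable is specific to Windows. A more portable variant, sketched below, uses detectCores() from the parallel package. The function slow_square() is a made-up stand-in for the expensive per-element lookup, and the cluster is capped at two workers only to keep the demo light.

```r
library(parallel)

# Hypothetical stand-in for an expensive per-element computation
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

ids <- 1:20

# detectCores() may return NA on some platforms; fall back to a single worker
ncores <- detectCores()
if (is.na(ncores)) ncores <- 1L
cl <- makeCluster(min(2L, ncores))

# Same call shape as lapply(), with the cluster as the first argument
res_par <- parLapply(cl, ids, slow_square)
stopCluster(cl)

# The sequential version gives the same result
res_seq <- lapply(ids, slow_square)
print(identical(res_par, res_seq))
```

Because the workers are separate R processes, any package the function needs has to be loaded inside the function (or via clusterEvalQ()), as in the org.Hs.eg.db example above.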
regular expression
- ?regexpr in R
- http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf
- http://www.johndcook.com/r_language_regex.html
- http://en.wikibooks.org/wiki/R_Programming/Text_Processing#Regular_Expressions
- http://www.endmemo.com/program/R/grep.php
- http://ucfagls.wordpress.com/2012/08/15/processing-sample-labels-using-regular-expressions-in-r/
- http://www.dummies.com/how-to/content/how-to-use-regular-expressions-in-r.html
- http://www.r-bloggers.com/example-8-27-using-regular-expressions-to-read-data-with-variable-number-of-words-in-a-field/
- http://www.r-bloggers.com/using-regular-expressions-in-r-case-study-in-cleaning-a-bibtex-database/
- http://cbio.ensmp.fr/~thocking/papers/2011-08-16-directlabels-and-regular-expressions-for-useR-2011/2011-useR-named-capture-regexp.pdf
- http://stackoverflow.com/questions/5214677/r-find-the-last-dot-in-a-string
- http://stackoverflow.com/questions/10294284/remove-all-special-characters-from-a-string-in-r
Not specific to R
- http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
- http://opencompany.org/download/regex-cheatsheet.pdf
Example
- grep("\\.zip$", pkgs) or grep("\\.tar\\.gz$", pkgs) (every literal dot must be escaped)
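A small sketch of why each literal dot in such a pattern needs escaping; the file names in pkgs are invented for the demonstration.

```r
# Hypothetical file names
pkgs <- c("abc_1.0.zip", "abc_1.0.tar.gz", "readme.zip.txt", "xyz.tarzgz")

# Ends with ".zip"
zip_hits <- grepl("\\.zip$", pkgs)

# Ends with ".tar.gz": both dots escaped
tgz_strict <- grepl("\\.tar\\.gz$", pkgs)

# Second dot left unescaped: "." matches any character, so ".tarzgz" also matches
tgz_loose <- grepl("\\.tar.gz$", pkgs)

print(pkgs[zip_hits])
print(pkgs[tgz_strict])
print(pkgs[tgz_loose])
```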
Clipboard
source("clipboard")
read.table("clipboard")
read/manipulate binary data
- x <- readBin(fn, raw(), file.info(fn)$size)
- rawToChar(x[1:16])
- See Biostrings C API
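The readBin()/rawToChar() steps above can be tried on a throwaway file. This is a minimal sketch; the file content is invented, and a real binary file would need rawToChar() applied only to bytes known to be text.

```r
# Write a small file whose first 16 bytes are a text header (made-up content)
fn <- tempfile(fileext = ".bin")
writeBin(charToRaw("HEADER0123456789 rest of payload"), fn)

# Read the whole file as raw bytes; file.info()$size gives the byte count
x <- readBin(fn, raw(), file.info(fn)$size)

# Interpret the first 16 bytes as characters
header <- rawToChar(x[1:16])
print(header)

unlink(fn)
```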
read/download/source a file from internet
Simple text file http
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
Zip file
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
Google drive file based on https using RCurl package
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))
Github files https using RCurl package
- http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
- http://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
require(RCurl)
x = getURL("https://gist.github.com/arraytools/6671098/raw/c4cb0ca6fe78054da8dbe253a05f7046270d5693/GeneIDs.txt", ssl.verifypeer = FALSE)
read.table(text=x)
Create publication tables using tables package
See p13 for example in http://www.ianwatson.com.au/stata/tabout_tutorial.pdf
R's tables package is the best solution. For example,
> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
           (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )
                Sepal.Length      Sepal.Width
 Species    n   mean         sd   mean        sd
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
and
# This example shows some of the less common options
> Sex <- factor(sample(c("Male", "Female"), 100, rep=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, rep=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
    s <- format(x, digits=2)
    even <- ((1:length(s)) %% 2) == 0
    s[even] <- sprintf("(%s)", s[even])
    s
  }
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )
                  Status
 Sex    Statistic high   low    medium
 Female mean       4.88   4.96   5.17
        sd        (1.20) (0.82) (1.35)
 Male   mean       4.45   4.31   5.05
        sd        (1.01) (0.93) (0.75)
See also a collection of R packages related to reproducible research in http://cran.r-project.org/web/views/ReproducibleResearch.html
Create flat tables in R console using ftable()
> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8
        am   0  1  0  1  0  1
vs gear
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
>
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
         dnn = c("Cylinders", "V/S", "Transmission", "Gears"))
          Cylinders     4     6     8
          Transmission  0  1  0  1  0  1
V/S Gears
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0
tracemem, data type, copy
How to avoid copying a long vector
Handling length 2^31 and more in R 3.0.0
From R News for 3.0.0 release:
There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.
In R 2.15.2, if I try to create a vector of length 2^31, I will get an error
> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector
However, for R 3.0.0 (tested on my 64-bit Ubuntu machine with 16GB RAM; R was compiled by myself):
> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
>
Note:
- A vector of length 2^31 has about 2 billion elements. It takes about 16 GB (2^31*8/2^20 MB) of memory.
- On Windows, it is almost impossible to work with data of length 2^31 if the memory is less than 16 GB because virtual memory on Windows does not work well. For example, when I tested on my 12 GB Windows 7 machine, the whole system froze for several minutes before I had to force a power-off.
- My slide in http://goo.gl/g7sGX shows the screenshots of running the above command on my Ubuntu and RHEL machines. As you can see, Linux is pretty good at handling large (> system RAM) data. That said, as long as your Linux system is 64-bit, you can possibly work on large data without too much pain.
- For large datasets, it makes sense to use a database or specially crafted packages like bigmemory or ff.
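The 16 GB figure in the note can be checked with a little arithmetic, assuming 8 bytes per element (a double vector, which is what the Vcells line of the gc() output above suggests). The sketch below also measures the per-element cost on a small vector rather than allocating the full 2^31 elements.

```r
n <- 2^31                    # number of elements
bytes_per_double <- 8        # assumption: elements are stored as doubles

total_mib <- n * bytes_per_double / 2^20   # size in MB, as in the note
total_gib <- n * bytes_per_double / 2^30   # same size in GB

# Measure the per-element cost on a small vector (object header included)
small <- numeric(1e6)
per_elem <- as.numeric(object.size(small)) / length(small)

print(total_mib)
print(total_gib)
print(per_elem)
```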
NA in index
- Question: what is seq(1, 3)[c(1, 2, NA)]?
Answer: It will preserve the position indexed by NA and return the value NA for it, i.e. 1 2 NA.
- Question: What is TRUE & NA?
Answer: NA
- Question: What is FALSE & NA?
Answer: FALSE
- Question: c("A", "B", NA) != "" ?
Answer: TRUE TRUE NA
- Question: which(c("A", "B", NA) != "") ?
Answer: 1 2
- Question: c(1, 2, NA) != "" & !is.na(c(1, 2, NA)) ?
Answer: TRUE TRUE FALSE
- Question: c("A", "B", NA) != "" & !is.na(c("A", "B", NA)) ?
Answer: TRUE TRUE FALSE
Conclusion: In order to exclude empty or NA values for numerical or character data, we can use which() or a convenience function keep.complete <- function(x) x != "" & !is.na(x). This is guaranteed to return logical values without any NAs.
Don't use only x != "" or only !is.na(x).
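A short sketch of the convenience function from the conclusion, applied to a character and a numeric vector. The test vectors are made up.

```r
# TRUE only for elements that are neither NA nor the empty string
keep.complete <- function(x) x != "" & !is.na(x)

chr <- c("A", "", NA, "B")
num <- c(1, 2, NA)

ok_chr <- keep.complete(chr)
ok_num <- keep.complete(num)

print(ok_chr)
print(chr[ok_chr])

# Using only x != "" leaves an NA in the result
print(chr[chr != ""])
```

The NA disappears from the logical vector because NA & FALSE evaluates to FALSE, as shown in the questions above.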
Creating publication quality graphs in R
with() and within() functions
within() is similar to with() except it is used to create new columns and merge them with the original data sets. See youtube video.
closePr <- with(mariokart, totalPr - shipPr)
head(closePr, 20)
mk <- within(mariokart, {
  closePr <- totalPr - shipPr
})
head(mk) # new column closePr
mk <- mariokart
aggregate(. ~ wheels + cond, mk, mean) # create mean according to each level of (wheels, cond)
aggregate(totalPr ~ wheels + cond, mk, mean)
tapply(mk$totalPr, mk[, c("wheels", "cond")], mean)
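The mariokart data set does not ship with base R, so here is a sketch of the same with()/within() pattern on the built-in mtcars data. The derived column name hp_per_wt is made up for the example.

```r
# with(): evaluate an expression using the columns; returns a plain vector
ratio <- with(mtcars, hp / wt)

# within(): evaluate the same expression, but return the data frame
# with the new column added
mt <- within(mtcars, {
  hp_per_wt <- hp / wt
})

head(ratio)
head(mt[, c("hp", "wt", "hp_per_wt")])

# Mean mpg for each combination of (cyl, am)
agg <- aggregate(mpg ~ cyl + am, mt, mean)
print(agg)
```

The original mtcars is left untouched; within() works on a copy.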
Draw Color Palette
Read excel files
In my experience, it is best to save the Excel file as a csv file (if possible) before reading it into R.
Serialization
If we want to pass an R object to C (use recv() function), we can use writeBin() to output the stream size and then use serialize() function to output the stream to a file. See the post on R mailing list.
> a <- list(1,2,3)
> a_serial <- serialize(a, NULL)
> a_length <- length(a_serial)
> a_length
[1] 70
> writeBin(as.integer(a_length), connection, endian="big")
> serialize(a, connection)
In C++ process, I receive one int variable first to get the length, and then read <length> bytes from the connection.
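The length-prefixed framing described above can be checked entirely within R before involving a C/C++ receiver. The sketch below writes the 4-byte big-endian length followed by the serialized bytes into an in-memory raw connection, then reads them back the way the receiver would. It is an illustration only: the raw connection stands in for the socket, and the payload is written with writeBin() on the already serialized bytes rather than by calling serialize() on the connection.

```r
obj <- list(1, 2, 3)
payload <- serialize(obj, NULL)           # raw vector holding the serialized object

# Sender side: 4-byte big-endian length, then the payload bytes
out <- rawConnection(raw(0), "wb")
writeBin(length(payload), out, endian = "big")
writeBin(payload, out)
frame <- rawConnectionValue(out)
close(out)

# Receiver side: read the length first, then exactly that many bytes
inp <- rawConnection(frame, "rb")
n <- readBin(inp, integer(), n = 1, endian = "big")
received <- readBin(inp, raw(), n = n)
close(inp)

restored <- unserialize(received)
print(identical(restored, obj))
```

The exact payload length depends on the R version and serialization format, so the receiver should always trust the transmitted length rather than a hard-coded number.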
socketConnection
See ?socketConnection.
Simple example
from the socketConnection's manual.
Open one R session
con1 <- socketConnection(port = 22131, server = TRUE) # wait until a connection from some client
writeLines(LETTERS, con1)
close(con1)
Open another R session (client)
con2 <- socketConnection(Sys.info()["nodename"], port = 22131)
# as non-blocking, may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
  Sys.sleep(1)
  z <- readLines(con2)
  if(length(z)) print(z)
}
close(con2)
Use nc in client
The client does not have to be R. We can use telnet, nc, etc. See the post here. For example, on the client machine, we can issue
nc localhost 22131 [ENTER]
Then the client will wait and show anything written from the server machine. The connection from nc will be terminated once close(con1) is given.
If I use the command
nc -v -w 2 localhost -z 22130-22135
then the connection is established only for a short time, which means the prompt on the server machine is returned. If we issue the above nc command again on the client machine, it shows that the connection to port 22131 is refused. PS. The "-w" switch sets the timeout, in seconds, for connects and final net reads.
A post I have not had a chance to read: http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html
Use curl command in client
On the server,
con1 <- socketConnection(port = 8080, server = TRUE)
On the client,
curl --trace-ascii debugdump.txt http://localhost:8080/
Then go to the server,
while(nchar(x <- readLines(con1, 1)) > 0) cat(x, "\n")
close(con1) # return cursor in the client machine
Use telnet command in client
On the server,
con1 <- socketConnection(port = 8080, server = TRUE)
On the client,
sudo apt-get install telnet
telnet localhost 8080
abcdefg
hijklmn
qestst
Go to the server,
readLines(con1, 1)
readLines(con1, 1)
readLines(con1, 1)
close(con1) # return cursor in the client machine
Some tutorial about using telnet on http request. And this is a summary of using telnet.
Warning: cannot remove prior installation of package
For example,
# Install the latest hgu133plus2cdf package
# Remove/Uninstall hgu133plus2.db package
# Put/Install an old version of IRanges (eg version 1.18.2 while currently it is version 1.18.3)
# Test on R 3.0.1
library(hgu133plus2cdf) # hgu133plus2cdf does not depend on or import IRanges
source("http://bioconductor.org/biocLite.R")
biocLite("hgu133plus2.db", ask=FALSE) # hgu133plus2.db imports IRanges
# Warning: cannot remove prior installation of package 'IRanges'
# Open Windows Explorer and check IRanges folder. Only see libs subfolder.
Note:
- In the above example, all packages were installed under C:\Program Files\R\R-3.0.1\library\.
- In another instance, where I cannot reproduce the problem, new R packages were installed under C:\Users\xxx\Documents\R\win-library\3.0\. The difference is that the IRanges package CAN be updated, but the packageVersion("IRanges") command in R still shows the old version.
- The above were tested on a desktop.
- When working on a VirtualBox VM, I sometimes (fairly frequently) get the error Warning: unable to move temporary installation `C:\Users\brb\Documents\R\win-library\3.0\fileed8270978f5\quadprog` to `C:\Users\brb\Documents\R\win-library\3.0\quadprog` when I try to run install.packages("forecast").
R package depends vs imports
- http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
- http://stackoverflow.com/questions/9893791/imports-and-depends
- https://stat.ethz.ch/pipermail/r-devel/2013-August/067082.html
In the namespace era Depends is never really needed. All modern packages have no technical need for Depends anymore. Loosely speaking the only purpose of Depends today is to expose other package's functions to the user without re-exporting them.
load = functions exported in myPkg are available to interested parties as myPkg::foo or via direct imports - essentially this means the package can now be used
attach = the namespace (and thus all exported functions) is attached to the search path - the only effect is that you have now added the exported functions to the global pool of functions - sort of like dumping them in the workspace (for all practical purposes, not technically)
import a function into a package = make sure that this function works in my package regardless of the search path (so I can write fn1 instead of pkg1::fn1 and still know it will come from pkg1 and not someone's workspace or other package that chose the same name)
The distinction is between "loading" and "attaching" a package. Loading it (which would be done if you had MASS::loglm, or imported it) guarantees that the package is initialized and in memory, but doesn't make it visible to the user without the explicit MASS:: prefix. Attaching it first loads it, then modifies the user's search list so the user can see it.
Loading is less intrusive, so it's preferred over attaching. Both library() and require() would attach it.
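The difference between loading and attaching can be observed from the search path. The sketch below uses the tools package only because it ships with R and is normally not attached in a fresh session. That choice, and the variable names, are just for illustration.

```r
pkg <- "tools"
on_path <- function(p) paste0("package:", p) %in% search()

attached_before <- on_path(pkg)

# Loading: the namespace is initialized, but the search path is unchanged
loadNamespace(pkg)
loaded <- pkg %in% loadedNamespaces()
attached_after_load <- on_path(pkg)

# A loaded (but not attached) package is reachable through the :: prefix
ext <- tools::file_ext("report.txt")

# Attaching: library() puts the package on the search path
library(tools)
attached_after_library <- on_path(pkg)

print(c(loaded = loaded,
        attached_after_load = attached_after_load,
        attached_after_library = attached_after_library))
```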
Subsetting
See the Subset assignment and Manipulation of functions sections of the R Language Definition.
The result of the command x[3:5] <- 13:15 is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
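The equivalence can be verified by calling the replacement function `[<-` directly. A minimal sketch, with made-up vectors x and y:

```r
x <- 1:10
y <- x

# Usual subset assignment
x[3:5] <- 13:15

# The same operation written as an explicit call to the replacement function
y <- `[<-`(y, 3:5, value = 13:15)

print(x)
print(identical(x, y))
```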