R

Install Rtools for Windows users

See http://goo.gl/gYh6C for a step-by-step instruction with screenshot.

I prefer not to check the installer option that sets the PATH environment variable. Instead, I manually add the following to PATH (based on Rtools v3.0):

c:\Rtools\bin;
c:\Rtools\gcc-4.6.3\bin;
C:\Program Files\R\R-2.15.2\bin\i386;

We can make life easier by creating a file <Rcommand.bat> with the content shown below. (This is also useful if you have C:\cygwin\bin in your PATH; note that the Cygwin setup does not add it automatically.)

PS. I put <Rcommand.bat> under C:\Program Files\R folder. I create a shortcut called 'Rcmd' on desktop. I enter C:\Windows\System32\cmd.exe /K "Rcommand.bat" in the Target entry and "C:\Program Files\R" in Start in entry.

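@echo off
rem Put Rtools and R ahead of everything else on PATH
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\Program Files\R\R-2.15.2\bin\i386;%PATH%
rem cmd.exe has no backtick substitution; capture the Rscript output with for /f
for /f "delims=" %%i in ('Rscript -e "Rcpp:::LdFlags()"') do set PKG_LIBS=%%i
for /f "delims=" %%i in ('Rscript -e "Rcpp:::CxxFlags()"') do set PKG_CPPFLAGS=%%i
echo Setting environment for using R
cmd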

Now we can open the Command Prompt anywhere and run <Rcommand.bat> to get all environment variables ready. On Windows Vista, 7 and 8, we need to run it as administrator, or change the file's security properties so that the current user has execute permission.

Compile and install an R package

cd C:\Documents and Settings\brb
wget http://www.bioconductor.org/packages/2.11/bioc/src/contrib/affxparser_1.30.2.tar.gz
C:\progra~1\r\r-2.15.2\bin\R CMD INSTALL --build affxparser_1.30.2.tar.gz

Helpful - check Chapter 6 of R Installation and Administration

Check/Upload to CRAN

http://win-builder.r-project.org/

Install R using binary package

Ubuntu/Debian

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
gksudo gedit /etc/apt/sources.list
# deb http://cran.fhcrc.org/bin/linux/ubuntu precise/
sudo apt-get update
sudo apt-get install r-base
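The sources.list entry can also be generated instead of typed in an editor. This is only a minimal sketch: the function name cran_deb_line is made up for illustration, and the mirror and release codename are simply the ones used above.

```shell
# Print the apt source line for a CRAN Ubuntu mirror.
# $1 = mirror base URL, $2 = Ubuntu release codename (both taken from the
# example above; substitute your own mirror and release).
cran_deb_line() {
    printf 'deb %s/bin/linux/ubuntu %s/\n' "${1%/}" "$2"
}

cran_deb_line http://cran.fhcrc.org precise
# To append the line instead of printing it, something like this could be used:
#   cran_deb_line http://cran.fhcrc.org precise | sudo tee -a /etc/apt/sources.list
```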

Redhat el6

It should be pretty easy to install via the EPEL: http://fedoraproject.org/wiki/EPEL

Just follow the instructions to enable the EPEL and then from the CLI as root:

yum install R

or via sudo:

sudo yum install R

Install R from source (ix86, x86_64 and arm platforms, Linux system)

Debian system (focus on arm architecture with notes from x86 system)

Simplest configuration

On my debian system in Pogoplug (armv5) OR Raspberry Pi (armv6), I can compile R. See R's admin manual. If I don't need x11, I just need to install 2 required packages.

  • install gfortran: apt-get install gfortran
  • install readline library: apt-get install libreadline5-dev (pogoplug), apt-get install libreadline6-dev (raspberry pi)

Note: if I need x11, I should install

  • libx11 and libx11-devel, libXt, libXt-devel (for fedora)
  • libx11-dev (for debian) or xorg-dev (for pogoplug/raspberry pi)

and optional

  • texinfo (to fix 'WARNING: you cannot build info or HTML versions of the R manuals')
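Before running ./configure it may help to check which build tools are already on the PATH. A minimal sketch follows; the helper name missing_tools is hypothetical, and the tool list is an assumption based on the compilers reported in the configure summary (makeinfo is the command provided by texinfo).

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Compilers and make are required; makeinfo (from texinfo) is optional.
missing_tools gcc g++ gfortran make makeinfo
```

An empty result means all listed commands were found. Header packages such as the readline development files are not covered by this check.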

Note that it is also safe to install required tools via (in ubuntu)

sudo apt-get install r-base-dev 

See #Install r-base and r-base-dev. Or, even better, with

sudo apt-get build-dep r-base

See #Install all dependencies for building R. With the first approach, running ./configure still complains that the X11 headers/libs are missing. The second approach pulls in dependencies such as jdk, tcl, tex and more; for some reason apt-get build-dep gives a more complete list than apt-get install r-base-dev.

[Arm architecture] I also ran apt-get install readline-common, although I don't know if this is necessary. Since I don't need x11, I disable it with an option to the configure command. After running

wget http://cran.r-project.org/src/base/R-2/R-2.15.2.tar.gz
tar xzvf R-2.15.2.tar.gz
cd R-2.15.2
./configure --with-x=no --enable-R-shlib

I got

R is now configured for armv5tel-unknown-linux-gnueabi

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99  -g -O2
  Fortran 77 compiler:       gfortran  -g -O2

  C++ compiler:              g++  -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:

  Interfaces supported:
  External libraries:        readline
  Additional capabilities:   NLS
  Options enabled:           shared R library, shared BLAS, R profiling

  Recommended packages:      yes

configure: WARNING: you cannot build info or HTML versions of the R manuals
configure: WARNING: you cannot build PDF versions of the R manuals
configure: WARNING: you cannot build PDF versions of vignettes and help pages
configure: WARNING: I could not determine a browser
configure: WARNING: I could not determine a PDF viewer

PS 1. On my raspberry pi machine, it shows R is now configured for armv6l-unknown-linux-gnueabihf.

PS 2. On my x86 system, it shows

R is now configured for x86_64-unknown-linux-gnu

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99  -g -O2
  Fortran 77 compiler:       gfortran  -g -O2

  C++ compiler:              g++  -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:

  Interfaces supported:      X11, tcltk
  External libraries:        readline, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java

  Recommended packages:      yes

[arm] However, make gave errors for recommended packages such as KernSmooth, MASS, boot, class, cluster, codetools, foreign, lattice, mgcv, nlme, nnet, rpart, spatial, and survival. The error stems from gcc: SHLIB_LIBADD: No such file or directory. Note that I get this error message even if I try install.packages("MASS", type="source"). A suggested fix is to add perl = TRUE to the sub() call on two lines of the src/library/tools/R/install.R file. However, I then got another error: shared object 'MASS.so' not found. See also http://ftp.debian.org/debian/pool/main/r/r-base/.


make[1]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[2]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
begin installing recommended package MASS
* installing *source* package 'MASS' ...
** libs
make[3]: Entering directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG  -I/usr/local/include    -fpic  -g -O2  -c MASS.c -o MASS.o
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG  -I/usr/local/include    -fpic  -g -O2  -c lqs.c -o lqs.o
gcc -std=gnu99 -shared -L/usr/local/lib -o MASSSHLIB_EXT MASS.o lqs.o SHLIB_LIBADD -L/mnt/usb/R-2.15.2/lib -lR
gcc: SHLIB_LIBADD: No such file or directory
make[3]: *** [MASSSHLIB_EXT] Error 1
make[3]: Leaving directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
ERROR: compilation failed for package 'MASS'
* removing '/mnt/usb/R-2.15.2/library/MASS'
make[2]: *** [MASS.ts] Error 1
make[2]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make: *** [stamp-recommended] Error 2
root@debian:/mnt/usb/R-2.15.2#
root@debian:/mnt/usb/R-2.15.2# bin/R

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: armv5tel-unknown-linux-gnueabi (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(MASS)
Error in library(MASS) : there is no package called 'MASS'
> library()
Packages in library '/mnt/usb/R-2.15.2/library':

base                    The R Base Package
compiler                The R Compiler Package
datasets                The R Datasets Package
grDevices               The R Graphics Devices and Support for Colours
                        and Fonts
graphics                The R Graphics Package
grid                    The Grid Graphics Package
methods                 Formal Methods and Classes
parallel                Support for Parallel computation in R
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package
> Sys.info()["machine"]
   machine
"armv5tel"
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170369  4.6     350000  9.4   350000  9.4
Vcells 163228  1.3     905753  7.0   784148  6.0

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679180
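One quick way to spot this class of failure in a saved build log is to search for the placeholder names that should have been substituted. This is a minimal sketch: the function name is made up, and it assumes the log contains the expanded link commands as in the transcript above.

```shell
# Succeed if the build log on stdin contains a literal SHLIB_EXT or
# SHLIB_LIBADD token, i.e. a placeholder that was never substituted.
has_unexpanded_shlib_vars() {
    grep -Eq 'SHLIB_(EXT|LIBADD)'
}

# Example with the failing link line from the log above.
printf '%s\n' 'gcc -std=gnu99 -shared -L/usr/local/lib -o MASSSHLIB_EXT MASS.o lqs.o SHLIB_LIBADD -L/mnt/usb/R-2.15.2/lib -lR' |
    has_unexpanded_shlib_vars && echo "unsubstituted SHLIB placeholder found"
```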

PS 3. The complete log of building R from source is in here File:Build R log.txt

Full configuration

  Interfaces supported:      X11, tcltk
  External libraries:        readline
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java

Install r-base and r-base-dev

In fact, if we want to take a shortcut, it seems OK to run the following command to install r-base and all required components for building r-base and extra packages. Notice that for the readline package, it installs 'libreadline6-dev' instead of the 'libreadline5-dev' that I installed above.

Note that I did not touch /etc/apt/sources.list file so I don't know what version of R will be installed by this method.

root@debian:/mnt/usb/R-2.15.2# apt-get install r-base-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libreadline5
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
  defoma dpatch file fontconfig libblas-dev libblas3gf libbz2-dev libcairo2 libdatrie1
  libjpeg62-dev liblapack-dev liblapack3gf libnewt0.52 libpango1.0-0 libpango1.0-common
  libpaper-utils libpaper1 libpcre3-dev libpcrecpp0 libpixman-1-0 libpng12-dev libreadline-dev
  libreadline6-dev libthai-data libthai0 libxcb-render-util0 libxcb-render0 r-base-core unzip
  whiptail xdg-utils zip zlib1g-dev
Suggested packages:
  defoma-doc psfontmgr x-ttcidfont-conf dfontmgr curl ttf-japanese-gothic ttf-japanese-mincho
  ttf-thryomanes ttf-baekmuk ttf-arphic-gbsn00lp ttf-arphic-bsmi00lp ttf-arphic-gkai00mp
  ttf-arphic-bkai00mp ess r-doc-info r-doc-pdf r-mathlib r-base-html cdbs debhelper gvfs-bin
Recommended packages:
  libfont-freetype-perl fakeroot patchutils libfribidi0 r-recommended r-doc-html iceweasel
  www-browser x11-utils x11-xserver-utils shared-mime-info
The following packages will be REMOVED:
  libreadline5-dev
The following NEW packages will be installed:
  defoma dpatch file fontconfig libblas-dev libblas3gf libbz2-dev libcairo2 libdatrie1
  libjpeg62-dev liblapack-dev liblapack3gf libnewt0.52 libpango1.0-0 libpango1.0-common
  libpaper-utils libpaper1 libpcre3-dev libpcrecpp0 libpixman-1-0 libpng12-dev libreadline-dev
  libreadline6-dev libthai-data libthai0 libxcb-render-util0 libxcb-render0 r-base-core
  r-base-dev unzip whiptail xdg-utils zip zlib1g-dev
0 upgraded, 34 newly installed, 1 to remove and 0 not upgraded.
Need to get 22.9 MB of archives.
After this operation, 58.2 MB of additional disk space will be used.
Do you want to continue [Y/n]? n

Install all dependencies for building R

This is a comprehensive list. This list is even larger than r-base-dev.

root@debian:/mnt/usb/R-2.15.2# apt-get build-dep r-base
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  libreadline5-dev
The following NEW packages will be installed:
  bison ca-certificates ca-certificates-java debhelper defoma ed file fontconfig gettext
  gettext-base html2text intltool-debian java-common libaccess-bridge-java
  libaccess-bridge-java-jni libasound2 libasyncns0 libatk1.0-0 libaudit0 libavahi-client3
  libavahi-common-data libavahi-common3 libblas-dev libblas3gf libbz2-dev libcairo2
  libcairo2-dev libcroco3 libcups2 libdatrie1 libdbus-1-3 libexpat1-dev libflac8
  libfontconfig1-dev libfontenc1 libfreetype6-dev libgif4 libglib2.0-dev libgtk2.0-0
  libgtk2.0-common libice-dev libjpeg62-dev libkpathsea5 liblapack-dev liblapack3gf libnewt0.52
  libnspr4-0d libnss3-1d libogg0 libopenjpeg2 libpango1.0-0 libpango1.0-common libpango1.0-dev
  libpcre3-dev libpcrecpp0 libpixman-1-0 libpixman-1-dev libpng12-dev libpoppler5 libpulse0
  libreadline-dev libreadline6-dev libsm-dev libsndfile1 libthai-data libthai0 libtiff4-dev
  libtiffxx0c2 libunistring0 libvorbis0a libvorbisenc2 libxaw7 libxcb-render-util0
  libxcb-render-util0-dev libxcb-render0 libxcb-render0-dev libxcomposite1 libxcursor1
  libxdamage1 libxext-dev libxfixes3 libxfont1 libxft-dev libxi6 libxinerama1 libxkbfile1
  libxmu6 libxmuu1 libxpm4 libxrandr2 libxrender-dev libxss-dev libxt-dev libxtst6 luatex m4
  openjdk-6-jdk openjdk-6-jre openjdk-6-jre-headless openjdk-6-jre-lib openssl pkg-config
  po-debconf preview-latex-style shared-mime-info tcl8.5-dev tex-common texi2html texinfo
  texlive-base texlive-binaries texlive-common texlive-doc-base texlive-extra-utils
  texlive-fonts-recommended texlive-generic-recommended texlive-latex-base texlive-latex-extra
  texlive-latex-recommended texlive-pictures tk8.5-dev tzdata-java whiptail x11-xkb-utils
  x11proto-render-dev x11proto-scrnsaver-dev x11proto-xext-dev xauth xdg-utils xfonts-base
  xfonts-encodings xfonts-utils xkb-data xserver-common xvfb zlib1g-dev
0 upgraded, 136 newly installed, 1 to remove and 0 not upgraded.
Need to get 139 MB of archives.
After this operation, 410 MB of additional disk space will be used.
Do you want to continue [Y/n]?

Web Applications

Create HTML5 web and slides

http://www.gastonsanchez.com/depot/knitr-slides. The HTML5 slides work on my IE 8 too.

HTML5 slides examples

Software requirement

  • Rstudio
  • knitr, XML, RCurl
  • pandoc. This is a command line tool. I am testing it on Windows 7.

Slide #22 gives an instruction to create

  • regular html file by using RStudio -> Knit HTML button
  • HTML5 slides by using pandoc from command line.

Files:

  • Rmd source: 009-slides.Rmd. Note that IE 8 is not supported by GitHub. For IE 9, be sure to turn off "Compatibility View".
  • markdown output: 009-slides.md
  • HTML output: 009-slides.html

We can create an Rmd source file in RStudio via File -> New -> R Markdown.

There are 4 ways to produce slides with pandoc

  • S5
  • DZSlides
  • Slidy
  • Slideous

Use the markdown file (md) and convert it with pandoc

pandoc -s -S -i -t dzslides --mathjax html5_slides.md -o html5_slides.html

If we are comfortable with HTML and CSS code, open the html file (generated by pandoc) and modify the CSS style at will.
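To compare the four slide formats, the same markdown file can be converted once per format. The sketch below only prints the commands so they can be inspected first. The function name slide_commands and the output file naming are assumptions, and the -S flag is left out on purpose because pandoc 2.0 and later replaced it with the smart extension.

```shell
# Print (not run) one pandoc command per slide format listed above.
# $1 = markdown input file; output names are derived from it.
slide_commands() {
    base=${1%.md}
    for fmt in s5 dzslides slidy slideous; do
        printf 'pandoc -s -i -t %s --mathjax %s -o %s_%s.html\n' \
            "$fmt" "$1" "$base" "$fmt"
    done
}

slide_commands html5_slides.md
```

Piping the output to sh would run the four conversions.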

Markdown language

According to wikipedia:

Markdown is a lightweight markup language, originally created by John Gruber with substantial contributions from Aaron Swartz, allowing people “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”.

  • Markup is a general term for content formatting - such as HTML - but markdown is a library that generates HTML markup.
  • Convert mediawiki to markdown using online conversion tool from pandoc.

HTTP protocol

shiny

The following is what we see on a browser after we run an example from shiny package. See http://rstudio.github.com/shiny/tutorial/#hello-shiny. Note that the R session needs to be on; i.e. R command prompt will not be returned unless we press Ctrl+C or ESC.

[Screenshots: ShinyHello.png, Shinympg.png, ShinyReactivity.png, ShinyTabsets.png, ShinyUpload.png]

shiny depends on websockets, caTools, bitops, digest packages.

Q & A:

  • Q: If we run runExample('01_hello') in Rserve from an R client, we can continue working in the R client without losing the shiny GUI. How do we kill the job?
  • Q: If I run the example "01_hello", the browser shows only the controls but not the graph on Firefox. A: Use Chrome or Opera as the default browser.
  • Q: If I run the example "01_hello" on RHEL the first time, it works fine. But if I press Ctrl+C to stop it and run it again, I get the message
Warning in .SOCK_SERVE(port) : R-Websockets(tcpserv): bind() failed. 
Error in createContext(port, webpage, is.binary = is.binary) : 
  Unable to bind socket on port 8100; is it realsy in use?

A simple solution is to close R and open it again.

  • Q: Deployment on web. A: Not ready yet. Shiny server platform is still under beta testing. Shiny apps are hosted using the R websockets package which acts more like a tcp server than a web server, and that architecture just doesn't fit with rApache, or even apache for that matter.
  • Q: How difficult is it to put the code on GitHub Gist? A: Just create an account; there is no need to create a repository. Go to http://gist.github.com and create a new gist, which can be secret or public. A secret gist cannot be edited after it is created, although it works fine when used in the runGist() function.


shiny server

See https://github.com/rstudio/shiny-server

It works on my ubuntu server. To test, I need to run

sudo shiny-server
# maybe I need to add ampersand '&' sign to the end of the above command

Then I can test it by opening a browser and pointing it to http://taichi.selfip.net:3838/hello/

RApache

gWidgetsWWW

Rook

Since R 2.13, the internal web server was exposed.

Tutorial from useR2012 and Jeffrey Horner

Here is another one from http://www.rinfinance.com.

Rook is also supported by rApache. See http://rapache.net/manual.html.

Google group. https://groups.google.com/forum/?fromgroups#!forum/rrook

Advantage

  • the web applications are created on desktop, whether it is Windows, Mac or Linux.
  • No Apache is needed.
  • create multiple applications at the same time. This complements the limit of rApache.

A minimal code example.

library(Rook)
s <- Rhttpd$new()
s$start(quiet=TRUE)
s$print()
s$browse(1)  # OR s$browse("RookTest")

Notice that after the s$browse() command, the cursor returns to R because the command is just a shortcut for opening the web page http://127.0.0.1:10215/custom/RookTest.

[Screenshots: Rook.png, Rook2.png, Rookapprnorm.png]

We can add Rook application to the server; see ?Rhttpd.

s$add(
    app=system.file('exampleApps/helloworld.R',package='Rook'),name='hello'
)
s$add(
    app=system.file('exampleApps/helloworldref.R',package='Rook'),name='helloref'
)
s$add(
    app=system.file('exampleApps/summary.R',package='Rook'),name='summary'
)

s$print()

#Server started on 127.0.0.1:10221
#[1] RookTest http://127.0.0.1:10221/custom/RookTest
#[2] helloref http://127.0.0.1:10221/custom/helloref
#[3] summary  http://127.0.0.1:10221/custom/summary
#[4] hello    http://127.0.0.1:10221/custom/hello

#  Stops the server but doesn't uninstall the app
## Not run: 
s$stop()

## End(Not run)
s$remove(all=TRUE)
rm(s)

For example, the interface and the source code of summary app are given below

[Screenshot: Rookappsummary.png]

app <- function(env) {
    req <- Rook::Request$new(env)
    res <- Rook::Response$new()
    res$write('Choose a CSV file:\n')
    res$write('<form method="POST" enctype="multipart/form-data">\n')
    res$write('<input type="file" name="data">\n')
    res$write('<input type="submit" name="Upload">\n</form>\n<br>')

    if (!is.null(req$POST())){
	data <- req$POST()[['data']]
	res$write("<h3>Summary of Data</h3>");
	res$write("<pre>")
	res$write(paste(capture.output(summary(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
	res$write("</pre>")
	res$write("<h3>First few lines (head())</h3>");
	res$write("<pre>")
	res$write(paste(capture.output(head(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
	res$write("</pre>") 
    }
    res$finish()
}

More examples:

Stockplot

FastRWeb

Rwui

CGHWithR (removed from CRAN)

But it is still working with old version of R.

Creating local repository for CRAN and Bioconductor (focus on Windows binary packages only)

How to set up a local repository

General guide: http://cran.r-project.org/doc/manuals/R-admin.html#Setting-up-a-package-repository

Utilities such as install.packages can be pointed at any CRAN-style repository, and R users may want to set up their own. The ‘base’ of a repository is a URL such as http://www.omegahat.org/R/: this must be an URL scheme that download.packages supports (which also includes ‘ftp://’ and ‘file://’, but not on most systems ‘https://’). Under that base URL there should be directory trees for one or more of the following types of package distributions:

  • "source": located at src/contrib and containing .tar.gz files. Other forms of compression can be used, e.g. .tar.bz2 or .tar.xz files.
  • "win.binary": located at bin/windows/contrib/x.y for R versions x.y.z and containing .zip files for Windows.
  • "mac.binary.leopard": located at bin/macosx/leopard/contrib/x.y for R versions x.y.z and containing .tgz files.
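The mapping from package type and R version to a repository subdirectory can be written down as a small helper. This is a sketch only: contrib_path is a hypothetical name, and it covers just the three types listed above.

```shell
# Map a package type and an R version x.y.z to the repository subdirectory.
contrib_path() {
    pkgtype=$1
    minor=$(printf '%s\n' "$2" | cut -d. -f1,2)   # x.y.z -> x.y
    case $pkgtype in
        source)             echo "src/contrib" ;;
        win.binary)         echo "bin/windows/contrib/$minor" ;;
        mac.binary.leopard) echo "bin/macosx/leopard/contrib/$minor" ;;
        *) echo "unknown type: $pkgtype" >&2; return 1 ;;
    esac
}

contrib_path win.binary 2.15.2   # prints bin/windows/contrib/2.15
```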

Each terminal directory must also contain a PACKAGES file. This can be a concatenation of the DESCRIPTION files of the packages separated by blank lines, but only a few of the fields are needed. The simplest way to set up such a file is to use function write_PACKAGES in the tools package, and its help explains which fields are needed. Optionally there can also be a PACKAGES.gz file, a gzip-compressed version of PACKAGES—as this will be downloaded in preference to PACKAGES it should be included for large repositories. (If you have a mis-configured server that does not report correctly non-existent files you will need PACKAGES.gz.)
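To illustrate the "concatenation of DESCRIPTION files" form of PACKAGES, here is a rough shell sketch for a directory of source tarballs. It assumes each tarball is named name_version.tar.gz and contains name/DESCRIPTION, as tarballs made by R CMD build do; make_packages_index is a made-up name. write_PACKAGES remains the simpler and more complete route, since it also selects the needed fields.

```shell
# Write a PACKAGES index for the source tarballs in directory $1 by
# concatenating each package's DESCRIPTION, separated by blank lines.
make_packages_index() {
    dir=$1
    : > "$dir/PACKAGES"                         # start with an empty index
    for tarball in "$dir"/*.tar.gz; do
        [ -e "$tarball" ] || continue           # no tarballs: index stays empty
        name=$(basename "$tarball")
        name=${name%%_*}                        # affxparser_1.30.2.tar.gz -> affxparser
        # awk 'NF' drops empty lines so each record stays contiguous
        tar -xzOf "$tarball" "$name/DESCRIPTION" | awk 'NF' >> "$dir/PACKAGES"
        echo >> "$dir/PACKAGES"                 # blank line between records
    done
}

# Usage (path is an example): make_packages_index ~/Rmirror/CRAN/src/contrib
```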

To add your repository to the list offered by setRepositories(), see the help file for that function.

A repository can contain subdirectories, when the descriptions in the PACKAGES file of packages in subdirectories must include a line of the form

Path: path/to/subdirectory

—once again write_PACKAGES is the simplest way to set this up.

Space requirement if we want to mirror WHOLE repository

  • Whole CRAN takes about 92GB (rsync -avn cran.r-project.org::CRAN > ~/Downloads/cran).
  • Bioconductor is big (> 64G for BioC 2.11). Please check the size of what will be transferred with e.g. (rsync -avn bioconductor.org::2.11 > ~/Downloads/bioc) and make sure you have enough room on your local disk before you start.

On the other hand, if we only care about the Windows binary part, the space requirement is greatly reduced.

  • CRAN: 2.7GB
  • Bioconductor: 28GB.

Misc notes

  • If a binary package was built on R 2.15.1, it cannot be installed on R 2.15.2. But the reverse is OK.
  • Remember to pass the --delete option to rsync; otherwise an old version of a package may be installed.
  • The repository still needs the src directory. If it is missing, we get an error
Warning: unable to access index for repository http://arraytools.no-ip.org/CRAN/src/contrib
Warning message:
package ‘glmnet’ is not available (for R version 2.15.2) 

The error was given by available.packages() function.

To bypass the requirement of src directory, I can use

install.packages("glmnet", contriburl = contrib.url(getOption('repos'), "win.binary"))

but there may be a problem when we use biocLite() command.

I find a workaround. Since the error comes from missing CRAN/src directory, we just need to make sure the directory CRAN/src/contrib exists AND either CRAN/src/contrib/PACKAGES or CRAN/src/contrib/PACKAGES.gz exists.
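The condition in this workaround can be checked on the mirror itself. A minimal sketch; the function name has_src_index is hypothetical and the repository path in the usage line is the one used elsewhere on this page.

```shell
# Succeed if repository root $1 has src/contrib containing
# a PACKAGES or PACKAGES.gz file.
has_src_index() {
    [ -d "$1/src/contrib" ] &&
        { [ -f "$1/src/contrib/PACKAGES" ] || [ -f "$1/src/contrib/PACKAGES.gz" ]; }
}

has_src_index "$HOME/Rmirror/CRAN" || echo "src/contrib index is missing"
```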

To create CRAN repository

Before creating a local repository, please do a dry run first. You don't want to be surprised by how long it takes to mirror a directory.

Dry run (-n option). Redirect the output to a text file for examination.

rsync -avn cran.r-project.org::CRAN > crandryrun.txt
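The saved dry-run log can then be summarized. The sketch below assumes the log ends with rsync's usual "total size is N" summary line (N may contain thousands separators depending on the rsync version); dryrun_size_gb is a made-up helper name.

```shell
# Read rsync dry-run output on stdin and print the total size in GiB.
dryrun_size_gb() {
    awk '/^total size is/ {
        gsub(",", "", $4)                         # strip thousands separators
        printf "%.1f\n", $4 / (1024 * 1024 * 1024)
    }'
}

# Usage: dryrun_size_gb < crandryrun.txt
```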

To mirror only part of the repository, it is necessary to create the directories before running the rsync command.

cd
mkdir -p ~/Rmirror/CRAN/bin/windows/contrib/2.15
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/windows/contrib/2.15/ ~/Rmirror/CRAN/bin/windows/contrib/2.15
# (the rsync command above is one line, with a space before ~/Rmirror)

# src directory is very large (~27GB) since it contains source code for each R version. 
# We just need the files PACKAGES and PACKAGES.gz in CRAN/src/contrib. So I comment out the following line.
# rsync -rtlzv --delete cran.r-project.org::CRAN/src/ ~/Rmirror/CRAN/src/
mkdir -p ~/Rmirror/CRAN/src/contrib
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES.gz ~/Rmirror/CRAN/src/contrib/

And optionally

library(tools)
write_PACKAGES("~/Rmirror/CRAN/bin/windows/contrib/2.15", type="win.binary") 

and if we want to get src directory

rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/*.tar.gz ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/2.15.3 ~/Rmirror/CRAN/src/contrib/

We can use du -h to check the folder size.

For example (as of 1/7/2013),

$ du -k ~/Rmirror --max-depth=1 --exclude ".*" | sort -nr | cut -f2 | xargs -d '\n' du -sh
30G	/home/brb/Rmirror
28G	/home/brb/Rmirror/Bioc
2.7G	/home/brb/Rmirror/CRAN

To create Bioconductor repository

Dry run

rsync -avn bioconductor.org::2.11 > biocdryrun.txt

Then create the directories before running rsync.

cd
mkdir -p ~/Rmirror/Bioc
wget -N http://www.bioconductor.org/biocLite.R -P ~/Rmirror/Bioc

where -N overwrites the local file only if the size or timestamp has changed, and -P specifies an output directory, not a file name.

Optionally, we can add the following in order to see the Bioconductor front page.

rsync -zrtlv  --delete bioconductor.org::2.11/BiocViews.html ~/Rmirror/Bioc/packages/2.11/
rsync -zrtlv  --delete bioconductor.org::2.11/index.html ~/Rmirror/Bioc/packages/2.11/

The software part (aka bioc directory) installation:

cd
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/bin/windows/ ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
# Either rsync whole src directory or just essential files
# rsync -zrtlv  --delete bioconductor.org::2.11/bioc/src/ ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
# Optionally the html part
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/html
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/html/ ~/Rmirror/Bioc/packages/2.11/bioc/html
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/vignettes/ ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/news
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/news/ ~/Rmirror/Bioc/packages/2.11/bioc/news
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/licenses
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/licenses/ ~/Rmirror/Bioc/packages/2.11/bioc/licenses
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/manuals
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/manuals/ ~/Rmirror/Bioc/packages/2.11/bioc/manuals
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/readmes
rsync -zrtlv  --delete bioconductor.org::2.11/bioc/readmes/ ~/Rmirror/Bioc/packages/2.11/bioc/readmes
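The optional subdirectories all follow the same mkdir/rsync pattern, so the commands can be generated with a loop. This sketch prints the commands instead of running them; bioc_optional_cmds is a hypothetical name, and the release and destination arguments are the values used above.

```shell
# Print the mkdir/rsync pair for each optional bioc subdirectory.
# $1 = Bioconductor release (e.g. 2.11), $2 = local mirror root.
bioc_optional_cmds() {
    release=$1
    dest=$2
    for sub in html vignettes news licenses manuals readmes; do
        printf 'mkdir -p %s/packages/%s/bioc/%s\n' "$dest" "$release" "$sub"
        printf 'rsync -zrtlv --delete bioconductor.org::%s/bioc/%s/ %s/packages/%s/bioc/%s\n' \
            "$release" "$sub" "$dest" "$release" "$sub"
    done
}

# The tilde is quoted so it is printed literally; it expands when the
# generated commands are later run by a shell.
bioc_optional_cmds 2.11 '~/Rmirror/Bioc'
```

After reviewing the output, it can be piped to sh to perform the mirroring.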

and annotation (aka data directory) part:

mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib
# one line for each of the following
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/bin/windows/ ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/

and experiment directory:

mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/

and extra directory:

mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/

To test local repository

Create soft links in Apache server

su
ln -s /home/brb/Rmirror/CRAN /var/www/html/CRAN
ln -s /home/brb/Rmirror/Bioc /var/www/html/Bioc
ls -l /var/www/html

The soft link mode should be 777.
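A quick check that the links exist and point at real directories can save some debugging later. A minimal sketch, with link_ok as a made-up helper name and the two link paths taken from the commands above.

```shell
# Succeed if $1 is a symbolic link whose target exists.
link_ok() {
    [ -L "$1" ] && [ -e "$1" ]
}

for l in /var/www/html/CRAN /var/www/html/Bioc; do
    if link_ok "$l"; then
        echo "ok: $l"
    else
        echo "broken or missing: $l"
    fi
done
```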

To test CRAN

Replace the host name arraytools.no-ip.org by IP address 10.133.2.111 if necessary.

r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages("glmnet")

We can test whether the backup server is working by installing a package that has been removed from CRAN. For example, 'ForImp' was removed from CRAN on 11/8/2012, but I still have a local copy built on R 2.15.2 (rsync was run on 11/6/2012).

r <- getOption("repos"); r["CRAN"] <- "http://cran.r-project.org"
r <- c(r, BRB='http://arraytools.no-ip.org/CRAN')
#                        CRAN                            CRANextra                                  BRB 
# "http://cran.r-project.org" "http://www.stats.ox.ac.uk/pub/RWin"   "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages('ForImp')

Note that by default, the CRAN mirror is selected interactively.

> getOption("repos")
                                CRAN                            CRANextra 
                            "@CRAN@" "http://www.stats.ox.ac.uk/pub/RWin" 

To test Bioconductor

# CRAN part:
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
# Bioconductor part:
options("BioC_mirror" = "http://arraytools.no-ip.org/Bioc")
source("http://bioconductor.org/biocLite.R")
# This source biocLite.R line can be placed either before or after the previous 2 lines
biocLite("aCGH")

If there is a connection problem, check folder attributes.

chmod -R 755 ~/Rmirror/CRAN/bin
  • Note that if a binary package was created for R 2.15.1, then it can be installed under R 2.15.1 but not R 2.15.2. The R console will show package xxx is not available (for R version 2.15.2).
  • For binary installs, the function also checks for the availability of a source package on the same repository, and reports if the source package has a later version, or is available but no binary version is.

So, for example, if the mirror has no contents under the src directory, we need to run the following line before install.packages() can succeed.

options(install.packages.check.source = "no")
  • If we only mirror the essential directories, we can run biocLite() successfully. However, the R console will give some warnings
> biocLite("aCGH")
BioC_mirror: http://arraytools.no-ip.org/Bioc
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'aCGH'
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
trying URL 'http://arraytools.no-ip.org/Bioc/packages/2.11/bioc/bin/windows/contrib/2.15/aCGH_1.36.0.zip'
Content type 'application/zip' length 2431158 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb

package ‘aCGH’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\limingc\AppData\Local\Temp\Rtmp8IGGyG\downloaded_packages
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
> library()

CRAN repository directory structure

The information below is specific to R 2.15.2. There are linux and macosx subdirectories wherever there is a windows subdirectory.

bin/windows/contrib/2.15
src/contrib
   /contrib/2.15.2
   /contrib/Archive
web/checks
   /dcmeta
   /packages
   /views
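The directories that install.packages() actually reads can be derived with contrib.url() from the utils package. The sketch below uses the mirror URL from this page as an example; the version component of the Windows path depends on the running R, so it will not be 2.15 on a newer installation.

```r
# contrib.url() is in the 'utils' package, which is attached by default.
mirror <- "http://arraytools.no-ip.org/CRAN"

src_url <- contrib.url(mirror, type = "source")
win_url <- contrib.url(mirror, type = "win.binary")

print(src_url)  # the src/contrib directory of the mirror
print(win_url)  # bin/windows/contrib/<major.minor of the running R>
```

This only builds the URLs and does not contact the server, so it can be used to decide which directories a partial mirror must contain.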

A clickable map [1]

Bioconductor repository directory structure

The information below is specific to Bioc 2.11. There are linux and macosx subdirectories wherever there is a windows subdirectory.

bioc/bin/windows/contrib/2.15
    /html
    /install
    /license
    /manuals 
    /news
    /src
    /vignettes
data/annotation/bin/windows/contrib/2.15
               /html
               /licenses
               /manuals
               /src
               /vignettes
     /experiment/bin/windows/contrib/2.15
                /html
                /manuals
                /src/contrib
                /vignettes
extra/bin/windows/contrib
     /html
     /src
     /vignettes

List all R packages from CRAN/Bioconductor

Check my daily result based on R 2.15 and Bioc 2.11 in [2]

  1. CRAN
  2. Bioc software
  3. Bioc annotation
  4. Bioc experiment

Parallel Computing

snowfall package

http://www.imbi.uni-freiburg.de/parallel/docs/Reisensburg2009_TutParallelComputing_Knaus_Porzelius.pdf

Cloud Computing

Install R on Amazon EC2

http://randyzwitch.com/r-amazon-ec2/

Big Data Analysis

http://blog.comsysto.com/2013/02/14/my-favorite-community-links/

Useful R packages

GenOrd: Generate ordinal and discrete variables with given correlation matrix and marginal distributions

here

RJSONIO

Rcpp

caret

xlsx package

ggplot2

stringr and plyr

http://martinsbioblogg.wordpress.com/2013/03/24/using-r-reading-tables-that-need-a-little-cleaning/

A data.frame is pretty much a list of vectors, so we use plyr to apply over the list and stringr to search and replace in the vectors.
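The idea can be sketched without extra packages. The example below swaps in base R's lapply() and gsub() for the plyr and stringr functions, so it is not the code from the linked post; the data are made up for illustration.

```r
# Made-up example data: numbers stored as text with stray characters.
dirty <- data.frame(id     = c("a1", "a2", "a3"),
                    weight = c("12.5 kg", "n/a", "9.8kg"),
                    height = c("1,70", "1,82", "1,65"),
                    stringsAsFactors = FALSE)

# A data.frame is a list of columns, so lapply() visits each column in turn.
clean <- dirty
clean[c("weight", "height")] <- lapply(dirty[c("weight", "height")], function(col) {
  col <- gsub(",", ".", col, fixed = TRUE)   # decimal comma -> decimal point
  col <- gsub("[^0-9.]", "", col)            # drop units and other non-numeric text
  as.numeric(col)                            # empty strings become NA
})

str(clean)
```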

jpeg

To create an image like the one in the left-hand panel of this wiki, we can use the jpeg package to read an existing plot, edit it, and save it.

Different ways of using R

Create HTML 5 web and slides

See here

Create academic report

reports package and github repository. The youtube video gives an overview of the package.

Create Word report

knitr + pandoc

It is better to create the rmd file in RStudio. RStudio provides a template for rmd files and a quick reference to the R markdown language.

# Idea:
#        knitr       pandoc
#   rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd") #Create md and html files

and then

FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))

Note. For some reason, if I run the above 2 commands several times, knit2html() stops working well. However, if I click the 'Knit HTML' button in RStudio, it works again.

Another way is

library(knitr)
library(pander)
name <- "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")   # writes demo.md
Pandoc.brew(file = paste0(name, ".md"), output = paste0(name, "docx"), convert = "docx")

Note that once we have used knitr command to create a md file, we can use pandoc shell command to convert it to different formats:

  • A pdf file: pandoc -s report.md -t latex -o report.pdf
  • A html file: pandoc -s report.md -o report.html (with the -c flag html files can be added easily)
  • Openoffice: pandoc report.md -o report.odt
  • Word docx: pandoc report.md -o report.docx
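The shell commands above can also be issued from R. The following is a minimal sketch: pandoc_cmd is a hypothetical helper name, "report" is the example file name from the list, and the -s flag is applied to every format for simplicity. The command is only run if pandoc is found on the PATH and the markdown file exists.

```r
# Hypothetical helper: build the pandoc command line for converting <base>.md.
pandoc_cmd <- function(base, ext = c("docx", "odt", "html", "pdf")) {
  ext <- match.arg(ext)
  paste0("pandoc -s ", shQuote(paste0(base, ".md")),
         " -o ", shQuote(paste0(base, ".", ext)))
}

cmd <- pandoc_cmd("report", "docx")
print(cmd)

# Run it only when pandoc is on the PATH and the markdown file exists.
if (nzchar(Sys.which("pandoc")) && file.exists("report.md")) {
  system(cmd)
}
```

shQuote() protects file names that contain spaces; match.arg() rejects any extension not listed in the helper.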

pander

Try pandoc[1] with a minimal reproducible example, you might give a try to my "pander" package [2] too:

library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')

Where the content of the "minimal.brew" file is something you might have got used to with Sweave - although it's using "brew" syntax instead. See the examples of pander [3] for more details. Please note that pandoc should be installed first, which is pretty easy on Windows.

  1. http://johnmacfarlane.net/pandoc/
  2. http://rapporter.github.com/pander/
  3. http://rapporter.github.com/pander/#examples

R2wd

Use the R2wd package. However, it works only with 32-bit R, and sometimes it cannot produce all tables.

> library(R2wd)
> wdGet()
Loading required package: rcom
Loading required package: rscproxy
rcom requires a current version of statconnDCOM installed.
To install statconnDCOM type
     installstatconnDCOM()

This will download and install the current version of statconnDCOM

You will need a working Internet connection
because installation needs to download a file.
Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() : 
  argument is of length zero 

The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.

Convert from pdf to word

The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert

rtf

Use rtf package for Rich Text Format (RTF) Output.

xtable

Package xtable will produce html output. If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.

Use R under proxy

http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy

What is the best place to save Rconsole on Windows platform

Put it in the C:/Users/USERNAME/Documents folder so that, no matter how R is upgraded or downgraded, it always finds my preferences.

Web scraping

http://www.slideshare.net/schamber/web-data-from-r#btnNext

Launch Rstudio

If multiple versions of R are detected, RStudio may fail to launch: a Java-like clock keeps spinning without stopping. The trick is to hold down the Ctrl key while clicking the RStudio icon. RStudio then shows a list of R versions to choose from.


List files using regular expression

  • Extension
list.files(pattern = "\\.txt$")
  • Start with
list.files(pattern = "^Something")
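The two patterns can be tried safely in a scratch directory. The file names below are invented for the demonstration.

```r
# Work in a scratch directory so no real files are touched.
d <- file.path(tempdir(), "listfiles-demo")
dir.create(d, showWarnings = FALSE)
file.create(file.path(d, c("Something1.txt", "notes.txt", "Something2.csv", "txt.R")))

by_ext    <- list.files(d, pattern = "\\.txt$")      # names ending in ".txt"
by_prefix <- list.files(d, pattern = "^Something")   # names starting with "Something"

print(by_ext)
print(by_prefix)

unlink(d, recursive = TRUE)  # clean up
```

Note that "txt.R" is not matched by the first pattern, because the escaped dot and the $ anchor require ".txt" at the very end of the name.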

Hidden tool: rsync in Rtools

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
a.exe

sent 323142 bytes  received 31 bytes  646346.00 bytes/sec
total size is 1198416  speedup is 3.71

c:\Rtools\bin>

rsync works best when we need to sync a folder.

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/Eula.txt
binary/cherrytree.lnk
binary/depends64.chm
binary/depends64.dll
binary/depends64.exe
binary/mtputty.exe
binary/procexp.chm
binary/procexp.exe
binary/pscp.exe
binary/putty.exe
binary/sqlite3.exe
binary/wget.exe

sent 4115294 bytes  received 244 bytes  1175868.00 bytes/sec
total size is 8036311  speedup is 1.95

c:\Rtools\bin>rm c:\users\limingc\Documents\binary\procexp.exe
cygwin warning:
  MS-DOS style path detected: c:\users\limingc\Documents\binary\procexp.exe
  Preferred POSIX equivalent is: /cygdrive/c/users/limingc/Documents/binary/procexp.exe
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/procexp.exe

sent 1767277 bytes  received 35 bytes  3534624.00 bytes/sec
total size is 8036311  speedup is 4.55

c:\Rtools\bin>

Unfortunately, if the destination is a network drive, I may get a permission denied (13) error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows

Install rgdal package on ubuntu

sudo apt-get install libgdal1-dev libproj-dev
R
> install.packages("rgdal")

Embedding R

Before building R with 'make', make sure R is configured with

./configure --enable-R-shlib

Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf

mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ export R_HOME=/home/mli/Downloads/R-2.15.2
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/mli/Downloads/R-2.15.2/lib
mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ g++ embed.c -I/home/mli/Downloads/R-2.15.2/include -L/home/mli/Downloads/R-2.15.2/lib -lR

mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ R CMD ./a.out
WARNING: ignoring environment value of R_HOME

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


ns> require(stats); require(graphics)

ns> ns(women$height, df = 5)
                 1            2           3          4             5
 [1,] 0.000000e+00 0.000000e+00  0.00000000 0.00000000  0.0000000000
 [2,] 7.592323e-03 0.000000e+00 -0.08670223 0.26010669 -0.1734044626
 [3,] 6.073858e-02 0.000000e+00 -0.15030440 0.45091320 -0.3006088020
 [4,] 2.047498e-01 6.073858e-05 -0.16778345 0.50335034 -0.3355668952
 [5,] 4.334305e-01 1.311953e-02 -0.13244035 0.39732106 -0.2648807067
 [6,] 6.256681e-01 8.084305e-02 -0.07399720 0.22199159 -0.1479943948
 [7,] 6.477162e-01 2.468416e-01 -0.02616007 0.07993794 -0.0532919575
 [8,] 4.791667e-01 4.791667e-01  0.01406302 0.02031093 -0.0135406187
 [9,] 2.468416e-01 6.477162e-01  0.09733619 0.02286023 -0.0152401533
[10,] 8.084305e-02 6.256681e-01  0.27076826 0.06324188 -0.0405213106
[11,] 1.311953e-02 4.334305e-01  0.48059836 0.12526031 -0.0524087186
[12,] 6.073858e-05 2.047498e-01  0.59541597 0.19899261  0.0007809246
[13,] 0.000000e+00 6.073858e-02  0.50097182 0.27551020  0.1627793975
[14,] 0.000000e+00 7.592323e-03  0.22461127 0.35204082  0.4157555879
[15,] 0.000000e+00 0.000000e+00 -0.14285714 0.42857143  0.7142857143
attr(,"degree")
[1] 3
attr(,"knots")
 20%  40%  60%  80%
60.8 63.6 66.4 69.2
attr(,"Boundary.knots")
[1] 58 72
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "ns"     "basis"  "matrix"

ns> summary(fm1 <- lm(weight ~ ns(height, df = 5), data = women))

Call:
lm(formula = weight ~ ns(height, df = 5), data = women)

Residuals:
     Min       1Q   Median       3Q      Max
-0.38333 -0.12585  0.07083  0.15401  0.30426

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         114.7447     0.2338  490.88  < 2e-16 ***
ns(height, df = 5)1  15.9474     0.3699   43.12 9.69e-12 ***
ns(height, df = 5)2  25.1695     0.4323   58.23 6.55e-13 ***
ns(height, df = 5)3  33.2582     0.3541   93.93 8.91e-15 ***
ns(height, df = 5)4  50.7894     0.6062   83.78 2.49e-14 ***
ns(height, df = 5)5  45.0363     0.2784  161.75  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2645 on 9 degrees of freedom
Multiple R-squared: 0.9998,     Adjusted R-squared: 0.9997
F-statistic:  9609 on 5 and 9 DF,  p-value: < 2.2e-16


ns> ## example of safe prediction
ns> plot(women, xlab = "Height (in)", ylab = "Weight (lb)")

ns> ht <- seq(57, 73, length.out = 200)

ns> lines(ht, predict(fm1, data.frame(height=ht)))

ns> ## Don't show:
ns> ## Consistency:
ns> x <- c(1:3,5:6)

ns> stopifnot(identical(ns(x), ns(x, df = 1)),
ns+           identical(ns(x, df=2), ns(x, df=2, knots=NULL)),# not true till 2.15.2
ns+           !is.null(kk <- attr(ns(x), "knots")),# not true till 1.5.1
ns+           length(kk) == 0)

ns> ## End Don't show
ns>
ns>
ns>

The above result can be compared with running

mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ R
WARNING: ignoring environment value of R_HOME

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(splines)
> example("ns")

ns> require(stats); require(graphics)

ns> ns(women$height, df = 5)
                 1            2           3          4             5
 [1,] 0.000000e+00 0.000000e+00  0.00000000 0.00000000  0.0000000000
 [2,] 7.592323e-03 0.000000e+00 -0.08670223 0.26010669 -0.1734044626
 [3,] 6.073858e-02 0.000000e+00 -0.15030440 0.45091320 -0.3006088020
 [4,] 2.047498e-01 6.073858e-05 -0.16778345 0.50335034 -0.3355668952
 [5,] 4.334305e-01 1.311953e-02 -0.13244035 0.39732106 -0.2648807067
 [6,] 6.256681e-01 8.084305e-02 -0.07399720 0.22199159 -0.1479943948
 [7,] 6.477162e-01 2.468416e-01 -0.02616007 0.07993794 -0.0532919575
 [8,] 4.791667e-01 4.791667e-01  0.01406302 0.02031093 -0.0135406187
 [9,] 2.468416e-01 6.477162e-01  0.09733619 0.02286023 -0.0152401533
[10,] 8.084305e-02 6.256681e-01  0.27076826 0.06324188 -0.0405213106
[11,] 1.311953e-02 4.334305e-01  0.48059836 0.12526031 -0.0524087186
[12,] 6.073858e-05 2.047498e-01  0.59541597 0.19899261  0.0007809246
[13,] 0.000000e+00 6.073858e-02  0.50097182 0.27551020  0.1627793975
[14,] 0.000000e+00 7.592323e-03  0.22461127 0.35204082  0.4157555879
[15,] 0.000000e+00 0.000000e+00 -0.14285714 0.42857143  0.7142857143
attr(,"degree")
[1] 3
attr(,"knots")
 20%  40%  60%  80%
60.8 63.6 66.4 69.2
attr(,"Boundary.knots")
[1] 58 72
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "ns"     "basis"  "matrix"

ns> summary(fm1 <- lm(weight ~ ns(height, df = 5), data = women))

Call:
lm(formula = weight ~ ns(height, df = 5), data = women)

Residuals:
     Min       1Q   Median       3Q      Max
-0.38333 -0.12585  0.07083  0.15401  0.30426

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         114.7447     0.2338  490.88  < 2e-16 ***
ns(height, df = 5)1  15.9474     0.3699   43.12 9.69e-12 ***
ns(height, df = 5)2  25.1695     0.4323   58.23 6.55e-13 ***
ns(height, df = 5)3  33.2582     0.3541   93.93 8.91e-15 ***
ns(height, df = 5)4  50.7894     0.6062   83.78 2.49e-14 ***
ns(height, df = 5)5  45.0363     0.2784  161.75  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2645 on 9 degrees of freedom
Multiple R-squared: 0.9998,     Adjusted R-squared: 0.9997
F-statistic:  9609 on 5 and 9 DF,  p-value: < 2.2e-16


ns> ## example of safe prediction
ns> plot(women, xlab = "Height (in)", ylab = "Weight (lb)")

ns> ht <- seq(57, 73, length.out = 200)

ns> lines(ht, predict(fm1, data.frame(height=ht)))

ns> ## Don't show:
ns> ## Consistency:
ns> x <- c(1:3,5:6)

ns> stopifnot(identical(ns(x), ns(x, df = 1)),
ns+           identical(ns(x, df=2), ns(x, df=2, knots=NULL)),# not true till 2.15.2
ns+           !is.null(kk <- attr(ns(x), "knots")),# not true till 1.5.1
ns+           length(kk) == 0)

ns> ## End Don't show
ns>
ns>
ns>

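Note that if I follow the instructions and put embed.c at the end of the g++ command, I get an error. A likely reason is that the linker resolves symbols from left to right, so on this system -lR has to come after the source file that references it.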

mli@PhenomIIx6:~/Downloads/R-2.15.2/library/AdvancedR/embedding$ g++ -I/home/mli/Downloads/R-2.15.2/include -L/home/mli/Downloads/R-2.15.2/lib -lR embed.c
/tmp/cc7Vum5j.o: In function `main':
embed.c:(.text+0x1c): undefined reference to `Rf_initEmbeddedR'
embed.c:(.text+0x2b): undefined reference to `Rf_endEmbeddedR'
/tmp/cc7Vum5j.o: In function `doSplinesExample()':
embed.c:(.text+0x45): undefined reference to `Rf_mkString'
embed.c:(.text+0x52): undefined reference to `Rf_install'
embed.c:(.text+0x5d): undefined reference to `Rf_lang2'
embed.c:(.text+0x6d): undefined reference to `Rf_protect'
embed.c:(.text+0x74): undefined reference to `R_GlobalEnv'
embed.c:(.text+0x87): undefined reference to `R_tryEval'
embed.c:(.text+0x91): undefined reference to `Rf_unprotect'
embed.c:(.text+0x9b): undefined reference to `Rf_ScalarLogical'
embed.c:(.text+0xa8): undefined reference to `Rf_install'
embed.c:(.text+0xb3): undefined reference to `Rf_lang2'
embed.c:(.text+0xc3): undefined reference to `Rf_protect'
embed.c:(.text+0xcd): undefined reference to `Rf_install'
embed.c:(.text+0xdc): undefined reference to `CDR'
embed.c:(.text+0xe7): undefined reference to `SET_TAG'
embed.c:(.text+0xee): undefined reference to `R_GlobalEnv'
embed.c:(.text+0x102): undefined reference to `R_tryEval'
embed.c:(.text+0x10c): undefined reference to `Rf_unprotect'
embed.c:(.text+0x116): undefined reference to `Rf_mkString'
embed.c:(.text+0x123): undefined reference to `Rf_install'
embed.c:(.text+0x12e): undefined reference to `Rf_lang2'
embed.c:(.text+0x13e): undefined reference to `Rf_protect'
embed.c:(.text+0x145): undefined reference to `R_GlobalEnv'
embed.c:(.text+0x158): undefined reference to `R_tryEval'
embed.c:(.text+0x162): undefined reference to `Rf_unprotect'
collect2: ld returned 1 exit status

Set up Emacs on Windows

Edit the file C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el with something like

(setq-default inferior-R-program-name
              "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")

Database

RMySQL

RSQLite

Not suitable for client/server architecture. The limit is quite large; see here.

Github

R source (read only)

https://github.com/wch/r-source/

github

https://github.com/languages/R

My collection

How to download

Clone ~ Download.

  • Command line
git clone https://gist.github.com/4484270.git

This will create a subdirectory called '4484270' with all cloned files there.

  • Within R
library(devtools)
source_gist("4484270")

Or first download the JSON file from

https://api.github.com/users/MYUSERLOGIN/gists

and then

library(RJSONIO)
x <- fromJSON("~/Downloads/gists.json")
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
  cmd <- paste0("git clone https://gist.github.com/", x, ".git")
  system(cmd)
})

Tricks

Editor

http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs

Create R package from R code with roxyPackage

http://lamages.blogspot.com/2013/03/create-r-package-from-single-r-file.html

llply() from plyr package

llply is equivalent to lapply except that it will preserve labels and can display a progress bar. This is handy if we want to do a crazy thing.

LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])

where rLLID is a list of Entrez IDs. For example,

get("org.Hs.egGO")[["6772"]]

returns a list of 49 GOs.

mclapply() from parallel package is a multi-core version of lapply()

Note that Windows cannot take advantage of it, because mclapply() relies on forking.

Another choice for Windows OS is to use parLapply() function in parallel package.

library(parallel)
ncores <- as.integer( Sys.getenv('NUMBER_OF_PROCESSORS') )  # environment variable set by Windows
cl <- makeCluster(getOption("cl.cores", ncores))
LLID2GOIDs2 <- parLapply(cl, rLLID, function(x) {
                                    library(org.Hs.eg.db); get("org.Hs.egGO")[[x]]} 
                        )
stopCluster(cl)

It does work. Cut the computing time from 100 sec to 29 sec on 4 cores.
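The same pattern can be tried on any platform with a toy workload. This sketch uses detectCores() from the parallel package instead of the Windows-only NUMBER_OF_PROCESSORS variable, caps the cluster at 2 workers for the demonstration, and squares numbers in place of the GO lookup.

```r
library(parallel)

# detectCores() may return NA on some systems, so fall back to 1; cap at 2 for the demo.
ncores <- min(2L, max(1L, detectCores(), na.rm = TRUE))

cl <- makeCluster(ncores)
# Toy workload standing in for the GO lookup: square each input.
res <- parLapply(cl, 1:8, function(i) i^2)
stopCluster(cl)

print(unlist(res))
```

Each worker is a fresh R process, which is why the GO example loads org.Hs.eg.db inside the function rather than in the master session.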

regular expression

Not specific to R

Example

  • grep("\\.zip$", pkgs) or grep("\\.tar\\.gz$", pkgs)
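A short sketch of these patterns on invented file names follows. Escaping every dot matters: an unescaped "." matches any character, so a pattern like "\\.tar.gz$" would also accept a name ending in ".tarXgz".

```r
# Made-up file names for illustration.
pkgs <- c("glmnet_1.9-3.zip", "aCGH_1.36.0.tar.gz", "README", "ziptools.txt", "xtar.gz.bak")

is_zip <- grepl("\\.zip$", pkgs)        # literal ".zip" at the end of the name
is_tgz <- grepl("\\.tar\\.gz$", pkgs)   # both dots escaped so they match only "."

print(pkgs[is_zip])
print(pkgs[is_tgz])

# Strip everything from the first underscore on to recover the package name.
pkg_names <- sub("_.*$", "", pkgs[is_zip | is_tgz])
print(pkg_names)
```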

Clipboard

source("clipboard")
read.table("clipboard")

read/download/source a file from internet

  • Simple text file
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
  • Zip file
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
  • Google drive file based on https
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))
  • Github files https

http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy


Create publication tables using tables package

See p13 for example in http://www.ianwatson.com.au/stata/tabout_tutorial.pdf

R's tables package is the best solution. For example,

> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
+          (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )
                                                  
                Sepal.Length      Sepal.Width     
 Species    n   mean         sd   mean        sd  
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

and

# This example shows some of the less common options         
> Sex <- factor(sample(c("Male", "Female"), 100, rep=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, rep=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
  s <- format(x, digits=2)
  even <- ((1:length(s)) %% 2) == 0
  s[even] <- sprintf("(%s)", s[even])
  s
}
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )
                  Status              
 Sex    Statistic high   low    medium
 Female mean       4.88   4.96   5.17 
        sd        (1.20) (0.82) (1.35)
 Male   mean       4.45   4.31   5.05 
        sd        (1.01) (0.93) (0.75)

See also a collection of R packages related to reproducible research in http://cran.r-project.org/web/views/ReproducibleResearch.html

Create flat tables in R console using ftable()

> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age                   
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex                    
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class                 
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am              
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8   
        am   0  1  0  1  0  1
vs gear                      
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
> 
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
         dnn = c("Cylinders", "V/S", "Transmission", "Gears"))

          Cylinders     4     6     8   
          Transmission  0  1  0  1  0  1
V/S Gears                               
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0

Handling length 2^31 and more in R 3.0.0

From R News for 3.0.0 release:

There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.

In R 2.15.2, trying to create a vector of length 2^31 gives an error:

> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector

However, it works in R 3.0.0 (tested on my 64-bit Ubuntu machine with 16 GB RAM; I compiled R myself):

> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
>

Note:

  1. A vector of length 2^31 has about 2 billion elements. Stored as doubles, it takes about 16 GB (2^31*8/2^20 = 16384 MB) of memory.
  2. On Windows, it is almost impossible to work with data of length 2^31 if the memory is less than 16 GB, because paging to disk (virtual memory) on Windows does not work well. For example, when I tested on my 12 GB Windows 7 machine, the whole system froze for several minutes until I had to power off the machine.
  3. My slides at http://goo.gl/g7sGX show screenshots of running the above command on my Ubuntu and RHEL machines. As you can see, Linux is pretty good at handling data larger than the system RAM. In other words, as long as your Linux system is 64-bit, you can possibly work on large data without too much pain.
  4. For large datasets, it makes sense to use a database or specially crafted packages like bigmemory or ff.
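The memory estimate in the first note can be checked with a little arithmetic. In this sketch, vector_mb is a hypothetical helper name; it assumes 8 bytes per element (a double) and ignores the small fixed header of an R vector.

```r
# Approximate size in MB of a vector of n elements (8 bytes per double, 4 per integer).
# The small fixed header of an R vector is ignored.
vector_mb <- function(n, bytes_per_element = 8) {
  n * bytes_per_element / 2^20
}

mb <- vector_mb(2^31)
print(mb)          # size in MB
print(mb / 1024)   # size in GB

# Sanity check against a vector small enough to allocate anywhere.
small <- numeric(1e6)
print(as.numeric(object.size(small)) / 2^20)
print(vector_mb(1e6))
```

The estimate of 16384 MB is close to the 16386.2 MB reported for Vcells in the gc() output above.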

NA in index

  • Question: what is seq(1, 3)[c(1, 2, NA)]?

Answer: R keeps a position for the NA index and returns NA there, so the result is 1 2 NA.

  • Question: What is TRUE & NA?

Answer: NA

  • Question: What is FALSE & NA?

Answer: FALSE
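The answers above can be verified directly in the console. The last part of this sketch goes slightly beyond the questions: it shows which() as one common way to drop NA positions when subsetting by a condition.

```r
x <- seq(1, 3)
idx <- x[c(1, 2, NA)]
print(idx)               # third element is NA

# Three-valued logic: the result is NA only when the unknown value could change it.
print(TRUE & NA)
print(FALSE & NA)
print(TRUE | NA)
print(FALSE | NA)

# which() drops NA, which is one way to avoid NA entries when subsetting by a condition.
cond <- c(TRUE, NA, FALSE)
print(x[cond])           # keeps a slot for the NA
print(x[which(cond)])    # only the TRUE position
```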