R: Difference between revisions

= Install and upgrade R =
[[Install_R|Here]]

== Install [https://cran.r-project.org/bin/windows/Rtools/ Rtools] for Windows users ==
See http://goo.gl/gYh6C for step-by-step instructions (based on Rtools30.exe) with screenshots. Note that in the 'Select Components' step, the default is 'Package authoring installation', but we want 'Full installation to build 32 or 64 bit R'; that is, check all available components (including tcl/tk). The "extra" files will be stored in subdirectories of the R source home directory. These files are not needed to build packages, only to build R itself. By default, the 32-bit R source home is C:\R and the 64-bit source home is C:\R64. After the installation, each of these two directories will contain a new directory 'Tcl'.

I prefer not to check the option that sets the PATH environment variable. Instead, I manually add the following to PATH (based on Rtools v3.2.2):
<pre>
c:\Rtools\bin;
c:\Rtools\gcc-4.6.3\bin;
C:\Program Files\R\R-3.2.2\bin\i386;
</pre>

== New release ==
* R 4.4.0
** [https://www.r-bloggers.com/2024/04/whats-new-in-r-4-4-0/ What’s new in R 4.4.0?]
** [https://www.r-bloggers.com/2024/05/cve-2024-27322-should-never-have-been-assigned-and-r-data-files-are-still-super-risky-even-in-r-4-4-0/ CVE-2024-27322 Should Never Have Been Assigned And R Data Files Are Still Super Risky Even In R 4.4.0], [https://www.ithome.com.tw/news/162626 程式開發語言R爆有程式碼執行漏洞,可用於供應鏈攻擊], [https://www.bleepingcomputer.com/news/security/r-language-flaw-allows-code-execution-via-rds-rdx-files/ R language flaw allows code execution via RDS/RDX files], [https://www.r-bloggers.com/2024/05/a-security-issue-with-r-serialization/ A security issue with R serialization] and the [https://cran.r-project.org/web/packages/RAppArmor/index.html RAppArmor] Package.
* R 4.3.0
** [https://www.jumpingrivers.com/blog/whats-new-r43/ What's new in R 4.3.0?]
** Extracting from a pipe. The placeholder _ can now head an extraction call on the right-hand side of a pipe, so we can extract from the value produced by the previous step: <code style="display:inline-block;">mtcars |> lm(mpg ~ disp, data = _) |> _$coef</code>. Previously we needed to do it [https://stackoverflow.com/a/56038303 this way] or [https://stackoverflow.com/a/60873298 this way]. If we want to apply some (anonymous) function to each element of a list, use '''map()''' or '''map_dbl()''' from the [https://purrr.tidyverse.org/ purrr] package.
* R 4.2.0
** Calling if() or while() with a condition of length greater than one gives an error rather than a warning.
** [https://twitter.com/henrikbengtsson/status/1501306369319735300 use underscore (_) as a placeholder on the right-hand side (RHS) of a forward pipe]. For example, '''mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = _) '''
** [https://developer.r-project.org/Blog/public/2022/04/08/enhancements-to-html-documentation/ Enhancements to HTML Documentation]
** [https://www.jumpingrivers.com/blog/new-features-r420/ New features in R 4.2.0]
* R 4.1.0
** [https://developer.r-project.org/blosxom.cgi/R-devel/2021/01/13#n2021-01-13 pipe and shorthand for creating a function]
** [https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/ New features in R 4.1.0]: '''anonymous functions''' (lambda functions), e.g. \(x) x + 1
* R 4.0.0
** [https://blog.revolutionanalytics.com/2020/04/r-400-is-released.html R 4.0.0 now available, and a look back at R's history]
** [https://www.infoworld.com/article/3540989/major-r-language-update-brings-big-changes.html R 4.0.0 brings numerous and significant changes to syntax, strings, reference counting, grid units, and more], [https://www.infoworld.com/article/3541368/how-to-run-r-40-in-docker-and-3-cool-new-r-40-features.html R 4.0: 3 new features]
**# Character vectors are no longer converted to factors by default in data frames (stringsAsFactors = FALSE)
**# palette() function has a new default set of colours, and [[R#New_palette_in_R_4.0.0|palette.colors() & palette.pals()]] are new
**# r"(YourString)" for ''raw'' character constants. See ?Quotes
* R 3.6.0
** [https://blog.revolutionanalytics.com/2019/05/whats-new-in-r-360.html What's new in R 3.6.0]
*** Changes to random number generation
*** More functions now support vectors with more than 2 billion elements
* R 3.5.0
** [https://community.rstudio.com/t/error-listing-packages-error-in-readrds-pfile-cannot-read-workspace-version-3-written-by-r-3-6-0/40570/2 The default serialization format for R changed in May 2018, such that new default format (version 3) for workspaces saved can no longer be read by versions of R older than 3.5]
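The following small R sketch exercises several of the language changes listed above. It assumes R >= 4.3.0, which is needed for the `_$coef` extraction form, and uses only the built-in `mtcars` data. On older versions the code will not even parse.

```r
# Requires R >= 4.3.0: `_` may head an extraction call on the RHS of |> (R 4.3.0)
fit_coef <- mtcars |> lm(mpg ~ disp, data = _) |> _$coef
print(fit_coef)

# R 4.2.0: `_` as a named-argument placeholder on the RHS of |>
fit4 <- mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = _)

# R 4.1.0: native pipe plus the \(x) shorthand for function(x)
squares <- (1:5) |> sapply(\(x) x^2)
print(squares)

# R 4.0.0: raw strings, so backslashes need no escaping
win_path <- r"(C:\Program Files\R)"
print(win_path)

# R 4.0.0: character columns stay character (stringsAsFactors = FALSE)
df <- data.frame(x = c("a", "b"))
print(class(df$x))

# R 4.2.0: a condition of length > 1 in if() is now an error, not a warning
res <- tryCatch(if (c(TRUE, FALSE)) "ran" else "skipped",
                error = function(e) "error")
print(res)
```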


= Online Editor =
We can run R in a web browser without installing it on the local machine (similar to [https://ideone.com Ideone.com] for C++). It does not require an account either (cf. RStudio).

== [https://rdrr.io/snippets/ rdrr.io] ==
It can produce graphics too. The package I am testing ([https://www.rdocumentation.org/packages/cobs/versions/1.3-3/topics/cobs cobs]) is available too.

We can make our life easy by creating a file <Rcommand.bat> with the following content. Because the batch file replaces PATH, this is also useful if you have C:\cygwin\bin in your PATH (the cygwin setup does not add it automatically).
<pre>
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin
set PATH=C:\Program Files\R\R-3.2.2\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
echo Setting environment for using R
cmd
</pre>
So we can open the Command Prompt anywhere and run <Rcommand.bat> to get all environment variables ready! On Windows Vista, 7 and 8, we need to run it as administrator, or we can change the file's security properties so that the current user has the execute right.

PS. I put <Rcommand.bat> under the C:\Program Files\R folder and created a desktop shortcut called 'Rcmd'. I entered '''C:\Windows\System32\cmd.exe /K "Rcommand.bat"''' in the ''Target'' entry and '''"C:\Program Files\R"''' in the ''Start in'' entry.
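The batch file puts the Rtools and R directories on PATH. A rough equivalent inside an R session is sketched below. It assumes the same version-specific Rtools paths used in the batch file. Unlike the batch file, it prepends to the existing PATH instead of replacing it, and it only affects the current R process and any processes it starts.

```r
# Directories taken from the batch file; they are version-specific
# assumptions, so adjust them to your own Rtools installation.
rtools_dirs <- c("C:/Rtools/bin", "C:/Rtools/gcc-4.6.3/bin")

# Prepend them to PATH for this R session (and its child processes).
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(c(rtools_dirs, old_path), collapse = .Platform$path.sep))

# Check which build tools are now found; "" means not found on PATH.
print(Sys.which(c("make", "gcc", "g++")))
```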


=== [http://cran.r-project.org/doc/manuals/r-release/R-admin.html#The-Windows-toolset Windows Toolset] ===
Note that R on Windows supports [http://sourceforge.net/projects/mingw-w64/ Mingw-w64] (not Mingw, which is a separate project). See [https://stat.ethz.ch/pipermail/r-devel/2013-September/067410.html here] for the issue of developing a Qt application that links against R using Rcpp. http://qt-project.org/wiki/MinGW is the wiki for compiling Qt using MinGW and MinGW-w64.

== rstudio.cloud ==

== [https://www.rdocumentation.org/ RDocumentation] ==
The interactive engine is based on [https://github.com/datacamp/datacamp-light DataCamp Light]


For example, see the [https://www.rdocumentation.org/packages/dplyr/versions/0.5.0/topics/tbl_df tbl_df] function from the dplyr package.

The [https://cdn.datacamp.com/dcl/standalone-example.html DataCamp] page allows us to run ''library()'' in the Script window. After that, we can use the packages in the ''R Console''.

[http://documents.datacamp.com/default_r_packages.txt Here] is a list of (common) R packages that users can use on the web.

The packages on RDocumentation may be outdated. For example, the current stringr on CRAN is v1.2.0 (2/18/2017) but RDocumentation has v1.1.0 (8/19/2016).

=== Build R from its source on Windows OS (not cross compile on Linux) ===
Reference: https://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-Windows

First we try to build 32-bit R (tested on R 3.2.2 using Rtools33). At the end I will show how to build a 64-bit R.

Download https://www.stats.ox.ac.uk/pub/Rtools/goodies/multilib/local320.zip (read https://www.stats.ox.ac.uk/pub/Rtools/libs.html), create an empty directory, say c:/R/extsoft, and unpack the zip file into that directory, e.g. by
<pre>
unzip local320.zip -d c:/R/extsoft
</pre>

Tcl: two methods
# Download the Tcl file from http://www.stats.ox.ac.uk/pub/Rtools/R_Tcl_8-5-8.zip. Unzip it and put 'Tcl' into the R_HOME folder.
# If you chose a full installation when running Rtools, copy C:/R/Tcl or C:/R64/Tcl (they are not the same) to the R_HOME folder.
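The two manual steps above (unpacking ''local320.zip'' and putting 'Tcl' under R_HOME) can be done or double-checked from R. This is a rough sketch. The zip name and target directory are the example paths from the text, not requirements. The Tcl check is meant to be run from the R you have just built, since `R.home()` refers to the running R.

```r
# Example paths from the text; change them to match your setup.
zipfile <- "local320.zip"
extsoft <- "c:/R/extsoft"

# R equivalent of `unzip local320.zip -d c:/R/extsoft` (skipped if the file is absent)
if (file.exists(zipfile)) {
  utils::unzip(zipfile, exdir = extsoft)
}

# After copying 'Tcl' into R_HOME, confirm it is there and that tcltk is usable
tcl_dir <- file.path(R.home(), "Tcl")
cat("Tcl directory present:", dir.exists(tcl_dir), "\n")
cat("tcltk capability:", capabilities("tcltk"), "\n")
```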


<strike>Open a command prompt as Administrator</strike>

<pre>
set PATH=c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin
set PATH=%PATH%;C:\Users\brb\Downloads\R-3.2.2\bin\i386;c:\windows;c:\windows\system32
set TMPDIR=C:/tmp

tar --no-same-owner -xf R-3.2.2.tar.gz
cp -R c:\R64\Tcl c:\Users\brb\Downloads\R-3.2.2

cd R-3.2.2\src\gnuwin32
cp MkRules.dist MkRules.local
# Edit MkRules.local: uncomment and change the following 2 flags.
# LOCAL_SOFT = c:/R/extsoft
# EXT_LIBS = $(LOCAL_SOFT)

make
</pre>
If we see a texi2dvi() error complaining that pdflatex is not available, it means a vanilla R has been built successfully.

= Web Applications =
[[R_web|R web applications]]

= Creating local repository for CRAN and Bioconductor =
[[R_repository|R repository]]

= Parallel Computing =
See [[R_parallel|R parallel]].

= Cloud Computing =

== Install R on Amazon EC2 ==
http://randyzwitch.com/r-amazon-ec2/


== Bioconductor on Amazon EC2 ==
http://www.bioconductor.org/help/bioconductor-cloud-ami/

= Big Data Analysis =
* [https://cran.r-project.org/web/views/HighPerformanceComputing.html CRAN Task View: High-Performance and Parallel Computing with R]
* [http://www.xmind.net/m/LKF2/ R for big data] in one picture
* [https://rstudio-pubs-static.s3.amazonaws.com/72295_692737b667614d369bd87cb0f51c9a4b.html Handling large data sets in R]
* [https://www.oreilly.com/library/view/big-data-analytics/9781786466457/#toc-start Big Data Analytics with R] by Simon Walkowiak
* [https://pbdr.org/publications.html pbdR]
** https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R
** [https://olcf.ornl.gov/wp-content/uploads/2016/01/pbdr.pdf Programming with Big Data in R - pbdR] George Ostrouchov and Mike Matheson, Oak Ridge National Laboratory

If we want to build the recommended packages (MASS, lattice, Matrix, ...) as well, run the following (see <R_HOME\src\gnuwin32\Makefile> for all '''make''' targets):
<pre>
make recommended
</pre>

If we need to rebuild R for whatever reason, run
<pre>
make clean
</pre>


== bigmemory, biganalytics, bigtabulate ==

== ff, ffbase ==
* tapply does not work. [https://stackoverflow.com/questions/16470677/using-tapply-ave-functions-for-ff-vectors-in-r Using tapply, ave functions for ff vectors in R]
* [http://www.bnosac.be/index.php/blog/12-popularity-bigdata-large-data-packages-in-r-and-ffbase-user-presentation Popularity bigdata / large data packages in R and ffbase useR presentation]
* [http://www.bnosac.be/images/bnosac/blog/user2013_presentation_ffbase.pdf ffbase: statistical functions for large datasets] in useR 2013
* [https://www.rdocumentation.org/packages/ffbase/versions/0.12.7/topics/ffbase-package ffbase] package

If we want to [http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/ build R with debug information], run
<pre>
make DEBUG=T
</pre>

'''NB''': 1. The above works for building 32-bit R from its source. If we want to build 64-bit R from its source, we need to modify the MkRules.local file to turn on the '''MULTI''' flag
<pre>
MULTI = 64
</pre>
and reset the PATH variable
<pre>
set PATH=c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin
set PATH=%PATH%;C:\Users\brb\Downloads\R-3.2.2\bin\x64;c:\windows;c:\windows\system32
</pre>
I don't need to touch other flags like BINPREF64, M_ARCH, AS_ARCH, RC_ARCH, DT_ARCH or even WIN. The note http://www.stat.yale.edu/~jay/Admin3.3.pdf is somewhat old and not needed. 2. If we have already built 32-bit R and want to go on to build 64-bit R, running 'make clean' before running 'make' again is not enough. It gives the error message ''[http://r.789695.n4.nabble.com/compiling-R-for-Windows-64-bit-td4651400.html incompatible ./libR.dll.a when searching for -lR]'' when building Rgraphapp.dll. libR.dll.a can be cleaned up by running 'make distclean', but that also wipes out the /bin/i386 folder :(
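To confirm whether a freshly built (or installed) R is the 32-bit or the 64-bit flavour, one option is to ask R itself. Here is a minimal sketch using only base R.

```r
# Pointer size is 8 bytes in a 64-bit build and 4 bytes in a 32-bit build.
bits <- 8L * .Machine$sizeof.pointer
cat(sprintf("%s, %d-bit\n", R.version.string, bits))

# On Windows multi-arch installs this is "i386" or "x64"; it is "" on many other platforms.
cat(sprintf("Sub-architecture: '%s'\n", .Platform$r_arch))
```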


See also [[R#Create_a_standalone_Rmath_library|Create_a_standalone_Rmath_library]] below about how to create and use a standalone Rmath library in your own C/C++/Fortran program. For example, if you want to know the 95th percentile of a t distribution or generate a bunch of random variables, you don't need to search the internet for a library; you can just use the Rmath library.
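Before wiring Rmath into a C/C++/Fortran program, it can help to compute the expected values at the R prompt. The standalone library provides C counterparts of R's distribution functions, such as `qt()` for the t quantile. The sketch below only covers deterministic functions. Random draws from the standalone library are seeded through its own interface, so they are not expected to match `set.seed()` output from R.

```r
# 95th percentile of a t distribution with 10 degrees of freedom
q95 <- qt(0.95, df = 10)
print(q95)

# Round-trip check: the upper-tail probability at that quantile should be 0.05
p_upper <- pt(q95, df = 10, lower.tail = FALSE)
print(p_upper)

# A handful of normal random variates (reproducible within R via set.seed)
set.seed(1)
x <- rnorm(5, mean = 0, sd = 1)
print(x)
```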
=== Build R from its source on Linux (cross compile) ===

== biglm ==

== data.table ==
See [[Tidyverse#data.table|data.table]].


== disk.frame ==
[https://www.brodrigues.co/blog/2019-10-05-parallel_maxlik/ Split-apply-combine for Maximum Likelihood Estimation of a linear model]

=== Compile and install an R package ===
'''Command line'''
<pre>
cd C:\Documents and Settings\brb
wget http://www.bioconductor.org/packages/2.11/bioc/src/contrib/affxparser_1.30.2.tar.gz
C:\progra~1\r\r-2.15.2\bin\R CMD INSTALL --build affxparser_1.30.2.tar.gz
</pre>
'''N.B.''' The ''--build'' flag is used to create a binary package (i.e. affxparser_1.30.2.zip). In the above example, the command both installs the package and creates a binary version of it. If we don't want the binary package, we can omit the flag.
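The same command-line build can be driven from inside R with `system2()`. This sketch assumes the tarball from the example above is in the working directory. The call is guarded, so nothing runs if the file is missing.

```r
tarball <- "affxparser_1.30.2.tar.gz"     # file from the example above
r_exe <- file.path(R.home("bin"), "R")    # R front end of the running installation

# Equivalent of: R CMD INSTALL --build affxparser_1.30.2.tar.gz
cmd_args <- c("CMD", "INSTALL", "--build", shQuote(tarball))

if (file.exists(tarball)) {
  status <- system2(r_exe, cmd_args)
  if (status != 0) warning("R CMD INSTALL failed with status ", status)
} else {
  message("Tarball not found: ", tarball)
}
```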


'''R console'''
<pre>
install.packages("C:/Users/USERNAME/Downloads/DESeq2paper_1.3.tar.gz", repos=NULL, type="source")
</pre>

See Chapter 6 of [http://cran.r-project.org/doc/manuals/r-release/R-admin.html R Installation and Administration]

== Apache arrow ==
* https://arrow.apache.org/docs/r/
* [https://www.infoworld.com/article/3637038/the-best-open-source-software-of-2021.html#slide17 The best open source software of 2021]
= Reproducible Research =
* http://cran.r-project.org/web/views/ReproducibleResearch.html
* [[Reproducible|Reproducible]]


== Reproducible Environments ==
https://rviews.rstudio.com/2019/04/22/reproducible-environments/

== checkpoint package ==
* https://cran.r-project.org/web/packages/checkpoint/index.html
* [https://timogrossenbacher.ch/2017/07/a-truly-reproducible-r-workflow/ A (truly) reproducible R workflow]

=== Check/Upload to CRAN ===
http://win-builder.r-project.org/


=== 64 bit toolchain ===
See the January 2010 email https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html and the [http://cran.r-project.org/doc/manuals/r-patched/R-admin.html#g_t64_002dbit-Windows-builds R-Admin manual].

From R 2.11.0 there is a 64-bit Windows binary for R.

== Some lessons in R coding ==
# Don't use rand() and srand() in C. The result is platform dependent. In my experience Ubuntu/Debian/CentOS give the same result, but it differs from macOS and Windows. Use the [[Rcpp|Rcpp]] package and R's random number generator instead.
# Don't use [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/list.files list.files()] directly. The order of its result is platform dependent, even across different Linux systems. An extra [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/sort sorting] step helps!
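The two lessons in ''Some lessons in R coding'' can be illustrated in R itself. Draw random numbers from R's generator under `set.seed()`, and sort file names explicitly. `method = "radix"` sorts character vectors in C-locale order regardless of the system locale. The file names below are made up for the demo.

```r
# Lesson 1: use R's RNG with an explicit seed instead of C's rand()/srand()
set.seed(2024)
u1 <- runif(3)
set.seed(2024)
u2 <- runif(3)
stopifnot(identical(u1, u2))   # same seed, same numbers

# Lesson 2: don't rely on the order returned by list.files()
demo_dir <- tempfile("sort-demo-")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("b.txt", "C.txt", "a.txt"))))

files <- sort(list.files(demo_dir), method = "radix")  # C-locale (byte) order
print(files)
```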
= Useful R packages =
* [https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages Quick list of useful R packages]
* [https://github.com/qinwf/awesome-R awesome-R]
* [https://stevenmortimer.com/one-r-package-a-day/ One R package a day]


== Install R using binary package on Linux OS ==
=== Ubuntu/Debian ===
* https://cran.rstudio.com/bin/linux/ubuntu/. For more information about the GPG key, see [[Linux#GPG.2FAuthentication_key|GPG Authentication_key]].
* [http://dirk.eddelbuettel.com/blog/2018/06/11/#r_3_5_0_deb_update R 3.5.0 on Debian and Ubuntu: An Update]

<syntaxhighlight lang='bash'>
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
# Some people have reported difficulties with this approach, usually because a firewall blocks port 11371.
# Alternatively (no sudo is needed in front of the first gpg command):
# gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
# gpg -a --export E084DAB9 | sudo apt-key add -
sudo nano /etc/apt/sources.list
# Add these lines for Ubuntu 14.04 (codename trusty; https://wiki.ubuntu.com/Releases)
# deb https://cran.rstudio.com/bin/linux/ubuntu trusty/
# deb-src https://cran.rstudio.com/bin/linux/ubuntu trusty/
sudo apt-get update
sudo apt-get install r-base
</syntaxhighlight>

[http://askubuntu.com/questions/36507/how-do-i-import-a-public-key Manually create the public key file] if the ''gpg'' command fails.

== Rcpp ==
http://cran.r-project.org/web/packages/Rcpp/index.html. See more [[Rcpp|here]].

== RInside : embed R in C++ code ==
* http://dirk.eddelbuettel.com/code/rinside.html
* http://dirk.eddelbuettel.com/papers/rfinance2010_rcpp_rinside_tutorial_handout.pdf
=== Ubuntu ===
With RInside, R can be embedded in a graphical application. For example, $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt directory includes source code of a Qt application to show a kernel density plot with various options like kernel functions, bandwidth and an R command text box to generate the random data. See my demo on [http://www.youtube.com/watch?v=UQ8yKQcPTg0 Youtube]. I have tested this '''qtdensity''' example successfully using Qt 4.8.5.
# Follow the [[#cairoDevice|cairoDevice]] instructions to install the libraries required by the cairoDevice package, and then install cairoDevice itself.
# Install [[Qt|Qt]]. Check that the 'qmake' command is available by typing 'whereis qmake' or 'which qmake' in a terminal.
# Open Qt Creator from the Ubuntu start menu/Launcher. Open the project file $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt/qtdensity.pro in Qt Creator.
# In Qt Creator, hit 'Ctrl + R' or the big green triangle button in the lower-left corner to build and run the project. If everything works, you should see the ''interactive'' program qtdensity appear on your desktop.


[[:File:qtdensity.png]]

=== Ubuntu/Debian goodies ===
Since the R packages '''XML''', '''RCurl''' and '''httr''' are frequently used by other packages (e.g. miniCRAN), it is useful to run the following so that ''install.packages(c("RCurl", "XML", "httr"))'' works without hiccups.
<syntaxhighlight lang='bash'>
sudo apt-get update
sudo apt-get install libxml2-dev
sudo apt-get install curl libcurl4-openssl-dev
sudo apt-get install libssl-dev
</syntaxhighlight>
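Once the system libraries are in place, the following R sketch installs only those of the three packages that are still missing. The install step is skipped in non-interactive sessions, so sourcing the snippet has no side effects.

```r
pkgs <- c("RCurl", "XML", "httr")

# Packages from the list that are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```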


See also [https://msperlin.github.io/2017-06-01-Instaling-R-in-Linux/ Simple bash script for a fresh install of R and its dependencies in Linux].
With RInside + [http://www.webtoolkit.eu/wt Wt web toolkit] installed, we can also create a web application; RInside ships an example in its ''examples/wt'' directory.
 
To find the exact package names (in case the version number in a package name changes, which is unlikely for xml and curl), consider the following approach:
<syntaxhighlight lang='bash'>
# Search 'curl' and keep only the lines containing both 'lib' and 'dev'
> apt-cache search curl | awk '/lib/ && /dev/'
libcurl4-gnutls-dev - development files and documentation for libcurl (GnuTLS flavour)
libcurl4-nss-dev - development files and documentation for libcurl (NSS flavour)
libcurl4-openssl-dev - development files and documentation for libcurl (OpenSSL flavour)
libcurl-ocaml-dev - OCaml libcurl bindings (Development package)
libcurlpp-dev - c++ wrapper for libcurl (development files)
libflickcurl-dev - C library for accessing the Flickr API - development files
libghc-curl-dev - GHC libraries for the libcurl Haskell bindings
libghc-hxt-curl-dev - LibCurl interface for HXT
libghc-hxt-http-dev - Interface to native Haskell HTTP package HTTP
libresource-retriever-dev - Robot OS resource_retriever library - development files
libstd-rust-dev - Rust standard libraries - development files
lua-curl-dev - libcURL development files for the Lua language
</syntaxhighlight>
 
If we need to install 'rgl' and related packages,
<syntaxhighlight lang='bash'>
sudo apt install libcgal-dev libglu1-mesa-dev
sudo apt install libfreetype6-dev
</syntaxhighlight>
 
=== Windows Subsystem for Linux ===
http://blog.revolutionanalytics.com/2017/12/r-in-the-windows-subsystem-for-linux.html
 
=== Redhat el6 ===
It should be pretty easy to install via the EPEL:  http://fedoraproject.org/wiki/EPEL
 
Follow the instructions to enable EPEL, or use the command line:
<syntaxhighlight lang='bash'>
sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum update # not sure if this is necessary
</syntaxhighlight>
and then from the CLI:
<syntaxhighlight lang='bash'>
sudo yum install R
</syntaxhighlight>
 
== Install R from source (ix86, x86_64 and arm platforms, Linux system) ==
 
=== Debian system (focus on arm architecture with notes from x86 system) ===
==== Simplest configuration ====
<Method 1 of installing requirements>
 
On my Debian systems on [[NAS|Pogoplug]] (armv5), [[raspberry|Raspberry Pi]] (armv6) and [[beaglebone|Beaglebone Black]] & [[Udoo|Udoo]] (armv7), I can compile R. See R's [http://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-Unix_002dalikes admin manual]. If X11 is not needed, I only need to install 2 required packages.
 
* install gfortran: '''apt-get install build-essential gfortran''' (gfortran is not part of build-essential)
* install readline library: '''apt-get install libreadline5-dev''' (pogoplug), '''apt-get install libreadline6-dev''' (raspberry pi/BBB), '''apt-get install libreadline-dev''' (Ubuntu)
 
Note: if I need X11, I should install
* libX11 and libX11-devel, libXt, libXt-devel (for fedora)
* '''libx11-dev''' (for debian) or '''xorg-dev''' (for pogoplug/raspberry pi/BBB/Odroid debian). See [http://unix.stackexchange.com/questions/14085/x-xorg-and-d-bus-what-is-the-difference here] for the difference of x11 and Xorg.
and optionally
* '''texinfo''' (to fix 'WARNING: you cannot build info or HTML versions of the R manuals')
 
<Method 2 of installing requirements (recommended)>
 
Note that it is also safe to install required tools via (please run '''sudo nano /etc/apt/sources.list''' to include the ''source'' repository of your favorite R mirror, such as '''deb-src https://cran.rstudio.com/bin/linux/ubuntu xenial/''' and also run sudo apt-get update first)
<syntaxhighlight lang='bash'>
sudo apt-get build-dep r-base
</syntaxhighlight>
The above command installs R's build dependencies, such as the JDK, Tcl, TeX and X11 libraries. For some reason, ''apt-get build-dep'' gives a more complete list than ''apt-get install r-base-dev''.
 
[Arm architecture] I also run '''apt-get install readline-common'''. I don't know if this is necessary.
If x11 is not needed or not available (eg Pogoplug), I can add '''--with-x=no''' option in ./configure command. If R will be called from other applications such as [[Rserve|Rserve]], I can add '''--enable-R-shlib''' option in ./configure command. Check out ''./configure --help'' to get a complete list of all options.
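After building, we can check from the new R which options were enabled. This is a sketch. `capabilities()` reports features such as X11, tcltk and cairo, and a build configured with '''--enable-R-shlib''' should have a shared libR under R_HOME/lib. The `libR.so` file name assumes Linux; other platforms name or place the library differently.

```r
# Features compiled into this R build (TRUE/FALSE for each)
caps <- capabilities()
wanted <- c("X11", "tcltk", "cairo", "png", "jpeg", "tiff", "NLS")
print(caps[intersect(wanted, names(caps))])

# With --enable-R-shlib, Linux builds install libR.so under R_HOME/lib
libR <- file.path(R.home("lib"), "libR.so")
cat("Shared libR present:", file.exists(libR), "\n")
```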
 
After running
<syntaxhighlight lang='bash'>
wget https://cran.rstudio.com/src/base/R-3/R-3.2.3.tar.gz
tar xzvf R-3.2.3.tar.gz
cd R-3.2.3
./configure --enable-R-shlib
</syntaxhighlight>
('''--enable-R-shlib''' option will create a shared R library '''libR.so''' in $RHOME/lib subdirectory. This allows R to be embedded in other applications. See [[#Embedding_R|Embedding R]].) I got
<pre>
R is now configured for armv5tel-unknown-linux-gnueabi

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99  -g -O2
  Fortran 77 compiler:       gfortran  -g -O2

  C++ compiler:              g++  -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:

  Interfaces supported:
  External libraries:        readline
  Additional capabilities:   NLS
  Options enabled:           shared R library, shared BLAS, R profiling

  Recommended packages:      yes

configure: WARNING: you cannot build info or HTML versions of the R manuals
configure: WARNING: you cannot build PDF versions of the R manuals
configure: WARNING: you cannot build PDF versions of vignettes and help pages
configure: WARNING: I could not determine a browser
configure: WARNING: I could not determine a PDF viewer
</pre>
After that, we can run '''make''' to build the R binary. If the computer has multiple cores, we can run ''make'' in parallel using the '''-j''' flag (for example, '-j4' runs 4 jobs simultaneously). We can also put the '''time''' command in front of ''make'' to report how long ''make'' takes (useful for benchmarking).
<syntaxhighlight lang='bash'>
make
# make -j4
# time make
</syntaxhighlight>

PS 1. On my Raspberry Pi, it shows '''R is now configured for armv6l-unknown-linux-gnueabihf''' and on the Beaglebone Black it shows '''R is now configured for armv7l-unknown-linux-gnueabihf'''.

PS 2. Based on R 3.1.2, 'make' took 2 hours on my Beaglebone Black, 1 hour on a Raspberry Pi 2 and 23 minutes on an Odroid XU4, while 'make -j 12' took only 5 minutes on my Xeon W3690 @ 3.47 GHz (6 cores with hyper-threading). The timings were obtained with the 'time' command as described above.

PS 3. On my x86 system, it shows
<pre>
R is now configured for x86_64-unknown-linux-gnu

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99  -g -O2
  Fortran 77 compiler:       gfortran  -g -O2

  C++ compiler:              g++  -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:

  Interfaces supported:      X11, tcltk
  External libraries:        readline, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java

  Recommended packages:      yes
</pre>

[arm] <strike>However, '''make''' gave errors for recommended packages like KernSmooth, MASS, boot, class, cluster, codetools, foreign, lattice, mgcv, nlme, nnet, rpart, spatial, and survival. The error stems from
'''gcc: SHLIB_LIBADD: No such file or directory'''. Note that I get this error message even when I try '''install.packages("MASS", type="source")'''. A suggested fix is [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679180 here]: add '''perl = TRUE''' to the sub() calls on two lines of the '''src/library/tools/R/install.R''' file. However, I then got another error, '''shared object 'MASS.so' not found'''. See also http://ftp.debian.org/debian/pool/main/r/r-base/. </strike>To build R without the recommended packages, use '''./configure --without-recommended-packages'''.

<pre>
make[1]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[2]: Entering directory `/mnt/usb/R-2.15.2/src/library/Recommended'
begin installing recommended package MASS
* installing *source* package 'MASS' ...
** libs
make[3]: Entering directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG  -I/usr/local/include    -fpic  -g -O2  -c MASS.c -o MASS.o
gcc -std=gnu99 -I/mnt/usb/R-2.15.2/include -DNDEBUG  -I/usr/local/include    -fpic  -g -O2  -c lqs.c -o lqs.o
gcc -std=gnu99 -shared -L/usr/local/lib -o MASSSHLIB_EXT MASS.o lqs.o SHLIB_LIBADD -L/mnt/usb/R-2.15.2/lib -lR
gcc: SHLIB_LIBADD: No such file or directory
make[3]: *** [MASSSHLIB_EXT] Error 1
make[3]: Leaving directory `/tmp/Rtmp4caBfg/R.INSTALL1d1244924c77/MASS/src'
ERROR: compilation failed for package 'MASS'
* removing '/mnt/usb/R-2.15.2/library/MASS'
make[2]: *** [MASS.ts] Error 1
make[2]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/mnt/usb/R-2.15.2/src/library/Recommended'
make: *** [stamp-recommended] Error 2
root@debian:/mnt/usb/R-2.15.2#
root@debian:/mnt/usb/R-2.15.2# bin/R

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: armv5tel-unknown-linux-gnueabi (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(MASS)
Error in library(MASS) : there is no package called 'MASS'
> library()
Packages in library '/mnt/usb/R-2.15.2/library':

base                    The R Base Package
compiler                The R Compiler Package
datasets                The R Datasets Package
grDevices               The R Graphics Devices and Support for Colours
                        and Fonts
graphics                The R Graphics Package
grid                    The Grid Graphics Package
methods                 Formal Methods and Classes
parallel                Support for Parallel computation in R
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package
> Sys.info()["machine"]
  machine
"armv5tel"
> gc()
        used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170369  4.6    350000  9.4  350000  9.4
Vcells 163228  1.3    905753  7.0  784148  6.0
</pre>
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679180

PS 4. The complete log of building R from source is here: [[File:Build_R_log.txt]]

For the RInside + Wt web application, build and start the example in the ''examples/wt'' directory:
<pre>
cd ~/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/wt
make
sudo ./wtdensity --docroot . --http-address localhost --http-port 8080
</pre>
Then we can type ''http://localhost:8080'' in the browser's address bar to see how it works (a screenshot is [http://dirk.eddelbuettel.com/blog/2011/11/30/ here]).

=== Windows 7 ===
To make RInside work on Windows, try the following:

# Make sure R is installed under '''C:\''' instead of '''C:\Program Files''' to avoid an error like ''g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory''.
# Install Rtools.
# Install the RInside package from source (the binary version will give an [http://stackoverflow.com/questions/13137770/fatal-error-unable-to-open-the-base-package error]).
# Create a DOS batch file that puts the necessary paths in the PATH environment variable:
<pre>
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\R\R-3.0.1\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
set R_HOME=C:\R\R-3.0.1
echo Setting environment for using R
cmd
</pre>
In the Windows command prompt, run
<pre>
cd C:\R\R-3.0.1\library\RInside\examples\standard
make -f Makefile.win
</pre>

Now we can test by running any of the executable files that '''make''' generates, for example ''rinside_sample0'':
<pre>
rinside_sample0
</pre>

For the Qt application (the qtdensity program), we need to make sure the same version of MinGW was used to build RInside/Rcpp and Qt. See some discussions in
* http://stackoverflow.com/questions/12280707/using-rinside-with-qt-in-windows
* http://www.mail-archive.com/rcpp-devel@lists.r-forge.r-project.org/msg04377.html
So the Qt and Wt applications may or may not work on Windows.

== GUI ==
=== Qt and R ===
* http://cran.r-project.org/web/packages/qtbase/index.html [https://stat.ethz.ch/pipermail/r-devel/2015-July/071495.html QtDesigner is such a tool, and its output is compatible with the qtbase R package]
* http://qtinterfaces.r-forge.r-project.org

== tkrplot ==
On Ubuntu, we need to install the tk development packages, for example
<pre>
sudo apt-get install tk-dev
</pre>

== reticulate - Interface to 'Python' ==
[[Python#R_and_Python:_reticulate_package|Python -> reticulate]]

== Hadoop (eg ~100 terabytes) ==
See also [http://cran.r-project.org/web/views/HighPerformanceComputing.html HighPerformanceComputing]

* RHadoop
* Hive
* [http://cran.r-project.org/web/packages/mapReduce/ MapReduce]. Introduction by [http://www.linuxjournal.com/content/introduction-mapreduce-hadoop-linux Linux Journal].
* http://www.techspritz.com/category/tutorials/hadoopmapredcue/ Single node or multinode cluster setup using Ubuntu with VirtualBox (Excellent)
* [http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ Running Hadoop on Ubuntu Linux (Single-Node Cluster)]
* Ubuntu 12.04 http://www.youtube.com/watch?v=WN2tJk_oL6E and [https://www.dropbox.com/s/05aurcp42asuktp/Chiu%20Hadoop%20Pig%20Install%20Instructions.docx instruction]
* Linux Mint http://blog.hackedexistence.com/installing-hadoop-single-node-on-linux-mint
* http://www.r-bloggers.com/search/hadoop

=== [https://github.com/RevolutionAnalytics/RHadoop/wiki RHadoop] ===
* [http://www.rdatamining.com/tutorials/r-hadoop-setup-guide RDataMining.com] based on Mac.
* Ubuntu 12.04 - [http://crishantha.com/wp/?p=1414 Crishantha.com], [http://nikhilshah123sh.blogspot.com/2014/03/setting-up-rhadoop-in-ubuntu-1204.html nikhilshah123sh.blogspot.com]. [http://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/ Bighadoop.wordpress] contains an example.
* MapReduce in R by [https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md RevolutionAnalytics] with a few examples.
* https://twitter.com/hashtag/rhadoop
* [http://bigd8ta.com/step-by-step-guide-to-setting-up-an-r-hadoop-system/ Bigd8ta.com] based on Ubuntu 14.04.
=== Snowdoop: an alternative to MapReduce algorithm ===
* http://matloff.wordpress.com/2014/11/26/how-about-a-snowdoop-package/
* http://matloff.wordpress.com/2014/12/26/snowdooppartools-update/comment-page-1/#comment-665
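
The MapReduce pattern behind Hadoop, RHadoop's rmr2 and Snowdoop can be tried out without any cluster. The sketch below is a minimal in-memory word count in base R, assuming the data fit in memory: the input is split into chunks (standing in for blocks stored on different nodes), each chunk is mapped to word counts, and the partial counts are reduced by summing them. It only illustrates the pattern; `map_chunk()` and `reduce_counts()` are hypothetical helpers, not functions from rmr2 or Snowdoop.

```r
# Minimal in-memory MapReduce-style word count (illustration only;
# map_chunk() and reduce_counts() are hypothetical helper names).
map_chunk <- function(lines) {
  # Map step: split the lines into lower-case words and count them
  words <- unlist(strsplit(tolower(lines), "[^a-z]+"))
  words <- words[nzchar(words)]
  table(words)
}

reduce_counts <- function(a, b) {
  # Reduce step: merge two named count vectors by summing shared keys
  keys <- union(names(a), names(b))
  out <- setNames(numeric(length(keys)), keys)
  out[names(a)] <- out[names(a)] + as.numeric(a)
  out[names(b)] <- out[names(b)] + as.numeric(b)
  out
}

text <- c("the quick brown fox", "jumps over the lazy dog",
          "the dog sleeps", "a fox runs")

# Two chunks of two lines each, standing in for blocks on two nodes
chunks <- split(text, rep(1:2, each = 2))
partial <- lapply(chunks, map_chunk)
counts <- Reduce(reduce_counts, partial)
counts <- counts[order(names(counts))]
print(counts)
```

Swapping `lapply()` for `parallel::mclapply()` would run the map step on several cores on Unix-alikes, which is closer in spirit to the chunk-per-worker approach discussed in the Snowdoop posts.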


== [http://cran.r-project.org/web/packages/XML/index.html XML] ==
On Ubuntu, we need to install libxml2-dev before we can install the XML package.
<pre>
sudo apt-get update
sudo apt-get install libxml2-dev
</pre>

On CentOS, install libxml2 and its development files:
<pre>
yum -y install libxml2 libxml2-devel
</pre>

==== Full configuration ====
<pre>
  Interfaces supported:      X11, tcltk
  External libraries:        readline
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling, Java
</pre>

==== Update: R 3.0.1 on Beaglebone Black (armv7a) + Ubuntu 13.04 ====
See the page [[Beaglebone#Build R on BBB|here]].
==== Update: R 3.1.3 & R 3.2.0 on Raspberry Pi 2 ====
It took 134 minutes to run 'make -j 4' on RPi 2 using R 3.1.3.

But I got an error when I ran './configure; make -j 4' using R 3.2.0. The errors start when compiling the <main/connections.c> file, with 'undefined reference to ....'. The gcc version is 4.6.3.

==== Raspbian stretch on Pi zero W ====
Edit '''/etc/apt/sources.list''' and add the following two lines:
<pre>
deb http://mirrordirector.raspbian.org/raspbian/ stretch main contrib non-free rpi
deb-src http://mirrordirector.raspbian.org/raspbian/ stretch main contrib non-free rpi
</pre>
Now we are ready to run '''sudo apt-get build-dep r-base'''. Note that this will install lots of packages:
<syntaxhighlight lang='bash'>
pi@raspberrypi:~ $ sudo apt-get build-dep r-base
Reading package lists... Done
Reading package lists... Done
Building dependency tree     
Reading state information... Done
The following NEW packages will be installed:
  adwaita-icon-theme autoconf automake autopoint autotools-dev bison ca-certificates-java dconf-gsettings-backend
  dconf-service debhelper default-jdk default-jdk-headless default-jre default-jre-headless dh-autoreconf
  dh-strip-nondeterminism fontconfig fontconfig-config fonts-cabin fonts-comfortaa fonts-croscore
  fonts-crosextra-caladea fonts-crosextra-carlito fonts-dejavu-core fonts-dejavu-extra fonts-ebgaramond
  fonts-ebgaramond-extra fonts-font-awesome fonts-freefont-otf fonts-freefont-ttf fonts-gfs-artemisia
  fonts-gfs-complutum fonts-gfs-didot fonts-gfs-neohellenic fonts-gfs-olga fonts-gfs-solomos fonts-junicode
  fonts-lato fonts-linuxlibertine fonts-lmodern fonts-lobster fonts-lobstertwo fonts-noto-hinted
  fonts-oflb-asana-math fonts-roboto-hinted fonts-sil-gentium fonts-sil-gentium-basic fonts-sil-gentiumplus
  fonts-sil-gentiumplus-compact fonts-stix gettext gfortran gfortran-6 gir1.2-freedesktop gir1.2-glib-2.0
  gir1.2-pango-1.0 glib-networking glib-networking-common glib-networking-services gsettings-desktop-schemas
  gtk-update-icon-cache hicolor-icon-theme icu-devtools intltool-debian java-common libarchive-zip-perl libasyncns0
  libatk-bridge2.0-0 libatk-wrapper-java libatk-wrapper-java-jni libatk1.0-0 libatk1.0-data libatspi2.0-0
  libavahi-client3 libbison-dev libblas-common libblas-dev libblas3 libbz2-dev libcairo-gobject2
  libcairo-script-interpreter2 libcairo2 libcairo2-dev libcolord2 libcroco3 libcups2 libcupsimage2
  libcurl4-openssl-dev libdatrie1 libdconf1 libepoxy0 libexpat1-dev libfile-stripnondeterminism-perl libflac8
  libfontconfig1 libfontconfig1-dev libfontenc1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgfortran-6-dev
  libgfortran3 libgif7 libgirepository-1.0-1 libgl1-mesa-glx libglapi-mesa libglib2.0-bin libglib2.0-dev
  libgraphite2-3 libgraphite2-dev libgs9 libgs9-common libgtk-3-0 libgtk-3-common libgtk2.0-0 libgtk2.0-common
  libharfbuzz-dev libharfbuzz-gobject0 libharfbuzz-icu0 libharfbuzz0b libice-dev libice6 libicu-dev libijs-0.35
  libjbig-dev libjbig0 libjbig2dec0 libjpeg-dev libjpeg62-turbo-dev libjson-glib-1.0-0 libjson-glib-1.0-common
  libkpathsea6 liblapack-dev liblapack3 liblcms2-2 liblzma-dev libncurses5-dev libnspr4 libnss3 libogg0 libopenjp2-7
  libpango-1.0-0 libpango1.0-dev libpangocairo-1.0-0 libpangoft2-1.0-0 libpangoxft-1.0-0 libpaper-utils libpaper1
  libpcre16-3 libpcre3-dev libpcre32-3 libpcrecpp0v5 libpixman-1-0 libpixman-1-dev libpoppler64 libpotrace0
  libproxy1v5 libptexenc1 libpthread-stubs0-dev libpulse0 libreadline-dev librest-0.7-0 librsvg2-2 librsvg2-common
  libsm-dev libsm6 libsndfile1 libsoup-gnome2.4-1 libsoup2.4-1 libsynctex1 libtcl8.6 libtexlua52 libtexluajit2
  libtext-unidecode-perl libthai-data libthai0 libtiff5 libtiff5-dev libtiffxx5 libtinfo-dev libtk8.6 libtool
  libvorbis0a libvorbisenc2 libx11-dev libx11-xcb1 libxau-dev libxaw7 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0
  libxcb-present0 libxcb-render0 libxcb-render0-dev libxcb-shape0 libxcb-shm0 libxcb-shm0-dev libxcb-sync1
  libxcb1-dev libxcomposite1 libxcursor1 libxdamage1 libxdmcp-dev libxext-dev libxfixes3 libxfont1 libxfont2
  libxft-dev libxft2 libxi6 libxinerama1 libxkbfile1 libxml-libxml-perl libxml-namespacesupport-perl
  libxml-sax-base-perl libxml-sax-perl libxmu6 libxpm4 libxrandr2 libxrender-dev libxrender1 libxshmfence1 libxss-dev
  libxss1 libxt-dev libxt6 libxtst6 libxv1 libxxf86dga1 libxxf86vm1 libzzip-0-13 m4 mpack openjdk-8-jdk
  openjdk-8-jdk-headless openjdk-8-jre openjdk-8-jre-headless po-debconf poppler-data preview-latex-style t1utils
  tcl8.6 tcl8.6-dev tex-common texinfo texlive-base texlive-binaries texlive-extra-utils texlive-fonts-extra
  texlive-fonts-recommended texlive-generic-recommended texlive-latex-base texlive-latex-extra
  texlive-latex-recommended texlive-pictures tk8.6 tk8.6-dev ttf-adf-accanthis ttf-adf-gillius ttf-adf-universalis
  x11-common x11-utils x11-xkb-utils x11proto-core-dev x11proto-input-dev x11proto-kb-dev x11proto-render-dev
  x11proto-scrnsaver-dev x11proto-xext-dev xdg-utils xfonts-base xfonts-encodings xfonts-utils xorg-sgml-doctools
  xserver-common xtrans-dev xvfb
0 upgraded, 276 newly installed, 0 to remove and 0 not upgraded.
Need to get 518 MB of archives.
After this operation, 1,578 MB of additional disk space will be used.
Do you want to continue? [Y/n] n
$ ./configure --with-x=no
$ time make
...
make[1]: Leaving directory '/home/pi/R-3.5.1'


real 213m9.985s
user 206m50.825s
sys 3m23.482s
</syntaxhighlight>

=== XML ===
* http://giventhedata.blogspot.com/2012/06/r-and-web-for-beginners-part-ii-xml-in.html. It gives an example of extracting the values of each XML tag for all nodes and saving them in a data frame using '''xmlSApply()'''.
* http://www.quantumforest.com/2011/10/reading-html-pages-in-r-for-text-processing/
* https://tonybreyal.wordpress.com/2011/11/18/htmltotext-extracting-text-from-html-via-xpath/
* https://www.tutorialspoint.com/r/r_xml_files.htm
* https://www.datacamp.com/community/tutorials/r-data-import-tutorial#xml
* [http://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf Extracting data from XML]: PubMed and Zillow are used as examples, with xmlTreeParse(), xmlRoot(), xmlName() and xmlSApply().
* https://yihui.name/en/2010/10/grabbing-tables-in-webpages-using-the-xml-package/
{{Pre}}
library(XML)


# Read and parse HTML file
doc.html = htmlTreeParse('http://apiolaza.net/babel.html', useInternal = TRUE)

# Extract all the paragraphs (HTML tag is p, starting at
# the root of the document). Unlist flattens the list to
# create a character vector.
doc.text = unlist(xpathApply(doc.html, '//p', xmlValue))

# Replace all newlines by spaces
doc.text = gsub('\n', ' ', doc.text)

# Join all the elements of the character vector into a single
# character string, separated by spaces
doc.text = paste(doc.text, collapse = ' ')
</pre>

This post http://stackoverflow.com/questions/25315381/using-xpathsapply-to-scrape-xml-attributes-in-r can be used to monitor new releases from github.com.
{{Pre}}
> library(RCurl) # getURL()
> library(XML)   # htmlParse and xpathSApply
> xData <- getURL("https://github.com/alexdobin/STAR/releases")
> doc = htmlParse(xData)
> plain.text <- xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
  # I looked at the page source, searched for 2.5.3a and found the tag
  # <span class="css-truncate-target">2.5.3a</span>
> plain.text
 [1] "2.5.3a"      "2.5.2b"      "2.5.2a"      "2.5.1b"      "2.5.1a"
 [6] "2.5.0c"      "2.5.0b"      "STAR_2.5.0a" "STAR_2.4.2a" "STAR_2.4.1d"
>
> # try bwa
> xData <- getURL("https://github.com/lh3/bwa/releases")
> doc = htmlParse(xData)
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
[1] "v0.7.15" "v0.7.13"

> # try picard
> xData <- getURL("https://github.com/broadinstitute/picard/releases")
> doc = htmlParse(xData)
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
 [1] "2.9.1" "2.9.0" "2.8.3" "2.8.2" "2.8.1" "2.8.0" "2.7.2" "2.7.1" "2.7.0"
[10] "2.6.0"
</pre>
This method can be used to monitor new tags/releases from projects like [https://github.com/Ultimaker/Cura/releases Cura], BWA, Picard and [https://github.com/alexdobin/STAR/releases STAR]. But for some projects like [https://github.com/ncbi/sra-tools sratools], the '''class''' attribute of the '''span''' element ("css-truncate-target") can be different (such as "tag-name").

=== xmlview ===
* http://rud.is/b/2016/01/13/cobble-xpath-interactively-with-the-xmlview-package/

== RCurl ==
On Ubuntu, we need to install the following packages (the first one is for the XML package, which RCurl suggests):
{{Pre}}
sudo apt-get install libxml2-dev
sudo apt-get install libcurl4-openssl-dev
</pre>

=== Scrape google scholar results ===
https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/googleScholarXScraper/googleScholarXScraper.R

No google ID is required.

It does not seem to work anymore; I got the following error:
<pre>
  Error in data.frame(footer = xpathLVApply(doc, xpath.base, "/font/span[@class='gs_fl']",  :
  arguments imply differing number of rows: 2, 0
</pre>

=== Install all dependencies for building R ===
This is a comprehensive list, even larger than the one pulled in by r-base-dev.
<syntaxhighlight lang='bash'>
root@debian:/mnt/usb/R-2.15.2# apt-get build-dep r-base
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  libreadline5-dev
The following NEW packages will be installed:
  bison ca-certificates ca-certificates-java debhelper defoma ed file fontconfig gettext
  gettext-base html2text intltool-debian java-common libaccess-bridge-java
  libaccess-bridge-java-jni libasound2 libasyncns0 libatk1.0-0 libaudit0 libavahi-client3
  libavahi-common-data libavahi-common3 libblas-dev libblas3gf libbz2-dev libcairo2
  libcairo2-dev libcroco3 libcups2 libdatrie1 libdbus-1-3 libexpat1-dev libflac8
  libfontconfig1-dev libfontenc1 libfreetype6-dev libgif4 libglib2.0-dev libgtk2.0-0
  libgtk2.0-common libice-dev libjpeg62-dev libkpathsea5 liblapack-dev liblapack3gf libnewt0.52
  libnspr4-0d libnss3-1d libogg0 libopenjpeg2 libpango1.0-0 libpango1.0-common libpango1.0-dev
  libpcre3-dev libpcrecpp0 libpixman-1-0 libpixman-1-dev libpng12-dev libpoppler5 libpulse0
  libreadline-dev libreadline6-dev libsm-dev libsndfile1 libthai-data libthai0 libtiff4-dev
  libtiffxx0c2 libunistring0 libvorbis0a libvorbisenc2 libxaw7 libxcb-render-util0
  libxcb-render-util0-dev libxcb-render0 libxcb-render0-dev libxcomposite1 libxcursor1
  libxdamage1 libxext-dev libxfixes3 libxfont1 libxft-dev libxi6 libxinerama1 libxkbfile1
  libxmu6 libxmuu1 libxpm4 libxrandr2 libxrender-dev libxss-dev libxt-dev libxtst6 luatex m4
  openjdk-6-jdk openjdk-6-jre openjdk-6-jre-headless openjdk-6-jre-lib openssl pkg-config
  po-debconf preview-latex-style shared-mime-info tcl8.5-dev tex-common texi2html texinfo
  texlive-base texlive-binaries texlive-common texlive-doc-base texlive-extra-utils
  texlive-fonts-recommended texlive-generic-recommended texlive-latex-base texlive-latex-extra
  texlive-latex-recommended texlive-pictures tk8.5-dev tzdata-java whiptail x11-xkb-utils
  x11proto-render-dev x11proto-scrnsaver-dev x11proto-xext-dev xauth xdg-utils xfonts-base
  xfonts-encodings xfonts-utils xkb-data xserver-common xvfb zlib1g-dev
0 upgraded, 136 newly installed, 1 to remove and 0 not upgraded.
Need to get 139 MB of archives.
After this operation, 410 MB of additional disk space will be used.
Do you want to continue [Y/n]?
</syntaxhighlight>

=== Instruction of installing a development version of R under Ubuntu ===
https://github.com/wch/r-source/wiki  (works on Ubuntu 12.04)

Note that texi2dvi has to be installed first to avoid the following error. It is better to follow the Ubuntu instructions (https://github.com/wch/r-source/wiki/Ubuntu-build-instructions) when we work on Ubuntu OS.
<syntaxhighlight lang='bash'>
$ (cd doc/manual && make front-matter html-non-svn)
creating RESOURCES
/bin/bash: number-sections: command not found
make: [../../doc/RESOURCES] Error 127 (ignored)
</syntaxhighlight>

To build R, run the following script. To run the built R, type 'bin/R'.
<pre>
# Get recommended packages if necessary
tools/rsync-recommended

R_PAPERSIZE=letter                              \
R_BATCHSAVE="--no-save --no-restore"            \
R_BROWSER=xdg-open                              \
PAGER=/usr/bin/pager                            \
PERL=/usr/bin/perl                              \
R_UNZIPCMD=/usr/bin/unzip                       \
R_ZIPCMD=/usr/bin/zip                           \
R_PRINTCMD=/usr/bin/lpr                         \
LIBnn=lib                                       \
AWK=/usr/bin/awk                                \
CC="ccache gcc"                                 \
CFLAGS="-ggdb -pipe -std=gnu99 -Wall -pedantic" \
CXX="ccache g++"                                \
CXXFLAGS="-ggdb -pipe -Wall -pedantic"          \
FC="ccache gfortran"                            \
F77="ccache gfortran"                           \
MAKE="make"                                     \
./configure                                     \
    --prefix=/usr/local/lib/R-devel             \
    --enable-R-shlib                            \
    --with-blas                                 \
    --with-lapack                               \
    --with-readline

#CC="clang -O3"                                 \
#CXX="clang++ -O3"                              \

# Workaround for explicit SVN check introduced by
# https://github.com/wch/r-source/commit/4f13e5325dfbcb9fc8f55fc6027af9ae9c7750a3
# Test on Ubuntu 14.04

# Need to build FAQ
(cd doc/manual && make front-matter html-non-svn)

rm -f non-tarball

# Get current SVN revision from git log and save in SVN-REVISION
echo -n 'Revision: ' > SVN-REVISION
git log --format=%B -n 1 \
  | grep "^git-svn-id" \
  | sed -E 's/^git-svn-id: https:\/\/svn.r-project.org\/R\/.*?@([0-9]+).*$/\1/' \
  >> SVN-REVISION
echo -n 'Last Changed Date: ' >>  SVN-REVISION
git log -1 --pretty=format:"%ad" --date=iso | cut -d' ' -f1 >> SVN-REVISION

# End workaround

# Set this to the number of cores on your computer
make --jobs=4
</pre>

If we DO NOT use the --depth option in the git clone command, we can use git checkout SHA1 (40 characters) to get a certain version of the code.
<pre>
git checkout f1d91a0b34dbaa6ac807f3852742e3d646fbe95e # plot(<dendrogram>): Bug 15215 fixed 5/2/2015
git checkout trunk                                    # switch back to trunk
</pre>
The svn revision number for a given git revision can be found in the blue box (git-svn-id) on the GitHub commit page. For example, [https://github.com/wch/r-source/commit/f1d91a0b34dbaa6ac807f3852742e3d646fbe95e this revision] has svn revision number 68302, while the trunk at the time of writing was at 68319.
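
The sed pipeline in the build script above pulls this number out of the git-svn-id line of the last commit message. The same extraction can be sketched in R, for example to check which svn revision a checkout corresponds to. The commit message below is made up for illustration: only the revision 68302 comes from the example above, the all-zero repository UUID is a placeholder, and `svn_revision()` is a hypothetical helper.

```r
# Pull the svn revision number out of a commit message's git-svn-id line.
# svn_revision() is a hypothetical helper written for this example.
svn_revision <- function(msg) {
  lines <- strsplit(msg, "\n", fixed = TRUE)[[1]]
  id_line <- grep("^git-svn-id:", lines, value = TRUE)
  if (length(id_line) == 0) return(NA_integer_)
  # Keep only the digits between '@' and the following space
  as.integer(sub("^git-svn-id: [^@]*@([0-9]+).*$", "\\1", id_line[1]))
}

# Illustrative message; in a real checkout it could come from
#   paste(system2("git", c("log", "--format=%B", "-n", "1"), stdout = TRUE),
#         collapse = "\n")
msg <- paste(
  "plot(<dendrogram>): Bug 15215 fixed",
  "",
  "git-svn-id: https://svn.r-project.org/R/trunk@68302 00000000-0000-0000-0000-000000000000",
  sep = "\n"
)
svn_revision(msg)
```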


Now suppose we have run 'git checkout trunk' and built R-devel successfully. To build R for a certain svn or git revision, we run 'git checkout SHA1', 'make distclean', the code that generates the ''SVN-REVISION'' file (it updates the revision number), and finally './configure' and 'make'.
<pre>
time (./configure --with-recommended-packages=no && make --jobs=5)
</pre>

The timing is 4m36s if I skip the recommended packages and 7m37s if I don't. This is based on a Xeon W3690 @ 3.47GHz.

The full bash script is available on [https://gist.github.com/arraytools/684a316f09a350a9850f Github Gist].

=== Install multiple versions of R on Ubuntu ===
* [https://support.rstudio.com/hc/en-us/articles/215488098-Installing-multiple-versions-of-R-on-Linux Installing multiple versions of R on Linux], especially on RStudio Server, Mar 13, 2018.
** Some common locations are '''/usr/lib/R/bin''', '''/usr/local/bin''' (create softlinks for the binaries here), '''/usr/bin'''.
** When building R from source, specify '''prefix'''. In the following example, RStudio IDE can detect R.
<pre>
$ ./configure --prefix=/opt/R/3.5.0 --enable-R-shlib
$ make
$ sudo make install
$ which R
$ tree -L 3 /opt/R/3.5.0/
/opt/R/3.5.0/
├── bin
│   ├── R
│   └── Rscript
├── lib
│   ├── pkgconfig
│   │   └── libR.pc
│   └── R
│       ├── bin
│       ├── COPYING
│       ├── doc
│       ├── etc
│       ├── include
│       ├── lib
│       ├── library
│       ├── modules
│       ├── share
│       └── SVN-REVISION
└── share
    └── man
        └── man1
</pre>
* http://r.789695.n4.nabble.com/Installing-different-versions-of-R-simultaneously-on-Linux-td879536.html
* [[R#Instruction_of_installing_a_development_version_of_R_under_Ubuntu|Instruction_of_installing_a_development_version_of_R_under_Ubuntu]]. You can launch the devel version of R using the 'RD' command.
* [https://stackoverflow.com/questions/26897335/how-can-i-load-a-specific-version-of-r-in-linux Use 'export PATH']
* http://stackoverflow.com/questions/24019503/installing-multiple-versions-of-r
* http://stackoverflow.com/questions/8343686/how-to-install-2-different-r-versions-on-debian

To install the devel version of R alongside the current version of R, see [http://sites.psu.edu/theubunturblog/2012/08/09/installing-the-development-version-of-r-on-ubuntu-alongside-the-current-version-of-r/ this post]. For example, you need a script that builds r-devel but installs it in a location different from the stable version of R (e.g. use --prefix=/usr/local/R-X.Y.Z in the ''configure'' command). Note that the executable is installed in "/usr/local/lib/R-devel/bin", but that can be changed to another location like "/usr/local/bin".

Another option is to use '''docker'''.

=== Minimal installation of R from source ===
Assume we have installed g++ (or build-essential) and gfortran (Ubuntu has only gcc pre-installed, but not g++):
<pre>
sudo apt-get install build-essential gfortran
</pre>
Then we can go ahead and build a minimal R.
<pre>
wget http://cran.rstudio.com/src/base/R-3/R-3.1.1.tar.gz
tar -xzvf R-3.1.1.tar.gz; cd R-3.1.1
./configure --with-x=no --with-recommended-packages=no --with-readline=no
</pre>
See ./configure --help. This still builds the base packages: base, compiler, datasets, graphics, grDevices, grid, methods, parallel, splines, stats, stats4, tcltk, tools, and utils.

Note that at the end of 'make', it shows the error message 'cannot find any java interpreter. Please make sure java is on your PATH or set JAVA_HOME correspondingly'. Even with this error message, we can use R by typing bin/R.

To check whether we have Java installed, type 'java -version'.
<pre>
$ java -version
java version "1.6.0_32"
OpenJDK Runtime Environment (IcedTea6 1.13.4) (6b32-1.13.4-4ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
</pre>

=== [https://cran.r-project.org/web/packages/devtools/index.html devtools] ===
The '''devtools''' package depends on curl, which in turn needs some system libraries. If we just need to install a package, consider the [[#remotes|remotes]] package, which is suggested by the [https://cran.r-project.org/web/packages/BiocManager/index.html BiocManager] package.
{{Pre}}
# Ubuntu 14.04
sudo apt-get install libcurl4-openssl-dev

# Ubuntu 16.04, 18.04
sudo apt-get install build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev

# Ubuntu 20.04
sudo apt-get install -y libxml2-dev libcurl4-openssl-dev libssl-dev
</pre>

[https://github.com/wch/movies/issues/3 Lazy-load database XXX is corrupt. internal error -3]. This often happens when you use install_github to install a package that is currently loaded; try restarting R and running the app again.

NB. According to the output of '''apt-cache show r-cran-devtools''', the binary package is very old, although '''apt-cache show r-base''' and [https://cran.r-project.org/bin/linux/ubuntu/#supported-packages supported packages] like ''survival'' show the latest version.

=== [https://github.com/hadley/httr httr] ===
httr imports the curl, jsonlite, mime, openssl and R6 packages.

When I tried to install the httr package, I got the following error:
<pre>
Configuration failed because openssl was not found. Try installing:
* deb: libssl-dev (Debian, Ubuntu, etc)
* rpm: openssl-devel (Fedora, CentOS, RHEL)
* csw: libssl_dev (Solaris)
* brew: openssl (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’
</pre>
After I ran '''sudo apt-get install libssl-dev''' in the terminal (Debian), installing the httr package went smoothly. Nice httr!


Real example: see [http://stackoverflow.com/questions/27371372/httr-retrieving-data-with-post this post]. Unfortunately I did not get a table result; I only got an html file (R 3.2.5, httr 1.1.0 on Ubuntu and Debian).

=== Recommended packages ===
R can be installed without the recommended packages, so keep this in mind when writing code. [https://github.com/wch/r-source/commit/f1f01a73f8c7aa3af8b564efd4254cb0aaa7d83d Some people have assumed that a `recommended' package can safely be used unconditionally, but this is not so.]
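
In practice this means checking for a recommended package before using it, rather than calling `library()` unconditionally. A minimal sketch with base R's `requireNamespace()`, which returns `FALSE` instead of throwing an error when a package is absent. MASS is used because the ARM session log earlier on this page shows `library(MASS)` failing; `robust_center()` and its median fallback are illustrative choices, not taken from the linked commit.

```r
# Use a recommended package only when it is installed.
# robust_center() and the median fallback are illustrative.
robust_center <- function(x) {
  if (requireNamespace("MASS", quietly = TRUE)) {
    # Huber M-estimate of location from MASS
    MASS::huber(x)$mu
  } else {
    # Fallback for an R built with --with-recommended-packages=no
    median(x)
  }
}

has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)

has_pkg("stats")            # a base package, always available
has_pkg("notARealPackage")  # FALSE rather than an error
robust_center(c(1, 2, 3, 4, 100))
```

Writing R Extensions recommends the same conditional pattern for packages listed in a package's Suggests field.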


Since the httr package is used by many other packages, it helps to look at how others use it, for example the [https://github.com/ropensci/aRxiv aRxiv] package.

[https://www.statsandr.com/blog/a-package-to-download-free-springer-books-during-covid-19-quarantine/ A package to download free Springer books during Covid-19 quarantine], [https://www.radmuzom.com/2020/05/03/an-update-to-an-adventure-in-downloading-books/ An update to "An adventure in downloading books"] (rvest package)

=== [http://cran.r-project.org/web/packages/curl/ curl] ===
curl is independent of the RCurl package.
* http://cran.r-project.org/web/packages/curl/vignettes/intro.html
* https://www.opencpu.org/posts/curl-release-0-8/
{{Pre}}
library(curl)
h <- new_handle()
handle_setform(h,
  name="aaa", email="bbb"
)
req <- curl_fetch_memory("http://localhost/d/phpmyql3_scripts/ch02/form2.html", handle = h)
rawToChar(req$content)
</pre>

=== [http://ropensci.org/packages/index.html rOpenSci] packages ===
'''rOpenSci''' contains packages that allow access to data repositories through the R statistical programming environment.

== [https://cran.r-project.org/web/packages/remotes/index.html remotes] ==
Download and install R packages stored in 'GitHub', 'BitBucket', or plain 'subversion' or 'git' repositories. This package is a lightweight replacement of the 'install_*' functions in 'devtools'. Also, remotes does not require any extra OS-level library (at least on Ubuntu 16.04).

=== Run R commands on bash terminal ===
http://pacha.hk/2017-10-20_r_on_ubuntu_17_10.html
<syntaxhighlight lang='bash'>
# Install R
sudo apt-get update
sudo apt-get install gdebi libxml2-dev libssl-dev libcurl4-openssl-dev r-base r-base-dev

# Install common packages
R --vanilla << EOF
install.packages(c("tidyverse","data.table","dtplyr","devtools","roxygen2"), repos = "https://cran.rstudio.com/")
q()
EOF

# Export to HTML/Excel
R --vanilla << EOF
install.packages(c("htmlTable","openxlsx"), repos = "https://cran.rstudio.com/")
q()
EOF

# Blog tools
R --vanilla << EOF
install.packages(c("knitr","rmarkdown"), repos='http://cran.us.r-project.org')
q()
EOF
</syntaxhighlight>

=== R CMD ===
* R CMD build someDirectory - create a package
* R CMD check somePackage_1.2-3.tar.gz - check a package
* R CMD INSTALL somePackage_1.2-3.tar.gz - install a package from its source

=== bin/R (shell script) and bin/exec/R (binary executable) on Linux OS ===
'''bin/R''' is just a shell script that launches the '''bin/exec/R''' program. So if we run the following program
<pre>
# test.R
cat("-- reading arguments\n", sep = "");
cmd_args = commandArgs();
for (arg in cmd_args) cat(" ", arg, "\n", sep="");
</pre>
from the command line like
<syntaxhighlight lang='bash'>
$ R --slave --no-save --no-restore --no-environ --silent --args arg1=abc < test.R
# OR using Rscript
-- reading arguments
  /home/brb/R-3.0.1/bin/exec/R
  --slave
  --no-save
  --no-restore
  --no-environ
  --silent
  --args
  arg1=abc
</syntaxhighlight>
we can see that R actually calls the '''bin/exec/R''' program.
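
Inside the script, `commandArgs(trailingOnly = TRUE)` returns only the arguments that follow `--args` (for the call above, just `arg1=abc`), so the interpreter's own flags do not have to be filtered out by hand. A small sketch that turns `key=value` arguments into a named list; `parse_kv_args()` is a hypothetical helper, and every value arrives as a character string.

```r
# Turn "key=value" command-line arguments into a named list.
# parse_kv_args() is a hypothetical helper, not part of base R.
parse_kv_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  kv <- args[grepl("=", args, fixed = TRUE)]
  keys <- sub("=.*$", "", kv)      # text before the first '='
  vals <- sub("^[^=]*=", "", kv)   # text after the first '='
  setNames(as.list(vals), keys)
}

# Simulate what commandArgs(trailingOnly = TRUE) would return for
#   R --slave --no-save --args arg1=abc n=10 < test.R
opts <- parse_kv_args(c("arg1=abc", "n=10"))
opts$arg1           # "abc"
as.integer(opts$n)  # values are strings, so convert explicitly
```

Saved as a script, this could be run with `Rscript test.R arg1=abc n=10`; Rscript places the trailing arguments after an implicit `--args`.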


Example of installing a development branch from GitHub with '''remotes''':
{{Pre}}
# https://github.com/henrikbengtsson/matrixstats
remotes::install_github('HenrikBengtsson/matrixStats@develop')
</pre>

=== CentOS 6.x ===
Install the development tools (make, gcc, gdb, ...), the CentOS counterpart of Ubuntu's build-essential.
<pre>
su
yum groupinstall "Development Tools"
yum install kernel-devel kernel-headers
</pre>
Install readline and X11 (the X11 libraries are probably not necessary if we use '''./configure --with-x=no'''):
<pre>
yum install readline-devel
yum install libX11 libX11-devel libXt libXt-devel
</pre>
Install libpng (already there) and the libpng-devel library. This is for web applications, because png (and possibly svg) is the standard and preferred graphics format there. If we want to output other graphics formats, we have to follow the guide in the [http://cran.r-project.org/doc/manuals/R-admin.html#Getting-the-source-files R-admin manual] to install extra graphics libraries on Linux.
<pre>
yum install libpng-devel
rpm -qa | grep "libpng"
# make sure both libpng and libpng-devel exist.
</pre>
Install Java. One possibility is to download it from [http://www.oracle.com/technetwork/java/javase/downloads/index.html Oracle]. We want to download jdk-7u45-linux-x64.rpm and jre-7u45-linux-x64.rpm (assuming a 64-bit OS).
<pre>
rpm -Uvh jdk-7u45-linux-x64.rpm
rpm -Uvh jre-7u45-linux-x64.rpm
# Check
java -version
</pre>
Now we are ready to build R by using "./configure" and then "make" commands.
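
After the build, base R's `capabilities()` reports which bitmap formats this R can actually write, which is a quick sanity check after installing the graphics libraries above. A minimal sketch that checks png, jpeg and cairo support and falls back to `pdf()`, which needs no external image library; the plot and the temporary output file are illustrative.

```r
# Which bitmap formats can this R build write?
caps <- capabilities(c("png", "jpeg", "cairo"))
print(caps)

# Write a plot as PNG when a cairo-backed png() device is available
# (works without an X11 display), otherwise fall back to pdf().
use_png <- caps[["png"]] && caps[["cairo"]]
out_file <- tempfile(fileext = if (use_png) ".png" else ".pdf")
if (use_png) png(out_file, type = "cairo") else pdf(out_file)
plot(1:10, main = "capabilities() check")
invisible(dev.off())
file.exists(out_file)
```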


We can make R accessible from any directory by either running the "make install" command or creating an R_HOME environment variable and adding its bin directory to the PATH environment variable, such as
<pre>
export R_HOME="path to R"
export PATH=$PATH:$R_HOME/bin
</pre>

== DirichletMultinomial ==
On Ubuntu, we do
<pre>
sudo apt-get install libgsl0-dev
</pre>


== Create GUI ==
=== [http://cran.r-project.org/web/packages/gWidgets/index.html gWidgets] ===

== [http://cran.r-project.org/web/packages/GenOrd/index.html GenOrd]: Generate ordinal and discrete variables with given correlation matrix and marginal distributions ==
See [http://statistical-research.com/simulating-random-multivariate-correlated-data-categorical-variables/?utm_source=rss&utm_medium=rss&utm_campaign=simulating-random-multivariate-correlated-data-categorical-variables here].

== Install R on Mac ==
A binary version of R is available on Mac OS X.

Note that personal R packages are installed in the '''~/Library/R''' directory. More specifically, packages for R 3.3.x are installed in '''~/Library/R/3.3/library'''.

For R 3.4.x, the R packages go to '''/Library/Frameworks/R.framework/Versions/3.4/Resources/library'''. The advantages of using this folder are: 1. the folder is writable by anyone; 2. even the built-in packages can be upgraded by users.
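
To see which library folders a given R installation actually uses, and whether the current user can write to them, base R's `.libPaths()` and `file.access()` are enough. A minimal sketch; the paths it prints depend on the R version and platform, so no output is shown here.

```r
# Library folders searched by this R session, in order; install.packages()
# installs into the first one unless 'lib' is given.
lib_dirs <- .libPaths()
print(lib_dirs)

# file.access(mode = 2) returns 0 when the folder is writable
writable <- file.access(lib_dirs, mode = 2) == 0
print(data.frame(library = lib_dirs, writable = writable, row.names = NULL))

# Default personal library for this R version (may not exist yet)
user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
print(user_lib)

# Folder a particular package was found in
print(find.package("stats"))
```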
== json ==
[[R_web#json|R web -> json]]


=== gfortran ===
== Map ==
macOS does not include gfortran. So we cannot compile package like [https://cran.r-project.org/web/packages/quantreg/index.html quantreg] which is required by the '''car''' package. Another example is [https://cran.rstudio.com/web/packages/robustbase/ robustbase] package.
=== [https://rstudio.github.io/leaflet/ leaflet] ===
* rstudio.github.io/leaflet/#installation-and-use
* https://metvurst.wordpress.com/2015/07/24/mapview-basic-interactive-viewing-of-spatial-data-in-r-6/


[https://cran.r-project.org/bin/macosx/tools/ Development Tools and Libraries] for R of R on Mac OS X.
=== choroplethr ===
* http://blog.revolutionanalytics.com/2014/01/easy-data-maps-with-r-the-choroplethr-package-.html
* http://www.arilamstein.com/blog/2015/06/25/learn-to-map-census-data-in-r/
* http://www.arilamstein.com/blog/2015/09/10/user-question-how-to-add-a-state-border-to-a-zip-code-map/


For now, I am using gfortran 6.1 downloaded from https://gcc.gnu.org/wiki/GFortranBinaries#MacOS on my OS X El Capitan (10.11).
=== ggplot2 ===
[https://randomjohn.github.io/r-maps-with-census-data/ How to make maps with Census data in R]


== Upgrade R ==
== [http://cran.r-project.org/web/packages/googleVis/index.html googleVis] ==
* [http://lcolladotor.github.io/2017/05/04/Updating-R/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+FellgernonBit-rstats+%28L.+Collado-Torres+-+rstats%29#.WQ5mibgrJD8 R 3.4.0]
See an example from [[R#RJSONIO|RJSONIO]] above.


== Online Editor ==
== [https://cran.r-project.org/web/packages/googleAuthR/index.html googleAuthR] ==
We can run R on web browsers without installing it on local machines (similar to [/ideone.com Ideone.com] for C++. It does not require an account either (cf RStudio).  
Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.


=== rstudio.cloud ===
== gtrendsR - Google Trends ==
* [http://blog.revolutionanalytics.com/2015/12/download-and-plot-google-trends-data-with-r.html Download and plot Google Trends data with R]
* [https://datascienceplus.com/analyzing-google-trends-data-in-r/ Analyzing Google Trends Data in R]
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=microarray%20analysis microarray analysis] from 2004-04-01
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=ngs%20next%20generation%20sequencing ngs next generation sequencing] from 2004-04-01
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=dna%20sequencing dna sequencing] from 2004-01-01.
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=rna%20sequencing rna sequencing] from 2004-01-01. It can be seen RNA sequencing >> DNA sequencing.
* [http://www.kdnuggets.com/2017/09/python-vs-r-data-science-machine-learning.html?utm_content=buffere1df7&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer Python vs R – Who Is Really Ahead in Data Science, Machine Learning?] and [https://stackoverflow.blog/2017/09/06/incredible-growth-python/ The Incredible Growth of Python] by [https://twitter.com/drob?lang=en David Robinson]


=== [https://www.rdocumentation.org/ RDocumentation] ===
== quantmod ==
The interactive engine is based on [https://github.com/datacamp/datacamp-light DataCamp Light]
[http://www.thertrader.com/2015/12/13/maintaining-a-database-of-price-files-in-r/ Maintaining a database of price files in R]. It consists of 3 steps.


For example, [https://www.rdocumentation.org/packages/dplyr/versions/0.5.0/topics/tbl_df tbl_df] function from dplyr package.
# Initial data downloading
# Update existing data
# Create a batch file


The website [https://cdn.datacamp.com/dcl/standalone-example.html DataCamp] allows us to run ''library()'' in the Script window. After that, we can use the packages in the ''R Console''.
== [http://cran.r-project.org/web/packages/caret/index.html caret] ==
* http://topepo.github.io/caret/index.html & https://github.com/topepo/caret/
* https://www.r-project.org/conferences/useR-2013/Tutorials/kuhn/user_caret_2up.pdf
* https://github.com/cran/caret source code mirrored on github
* Cheatsheet https://www.rstudio.com/resources/cheatsheets/
* [https://daviddalpiaz.github.io/r4sl/the-caret-package.html Chapter 21 of "R for Statistical Learning"]


[http://documents.datacamp.com/default_r_packages.txt Here] is a list of (common) R packages that users can use on the web.
== Tool for connecting Excel with R ==
* https://bert-toolkit.com/
* [http://www.thertrader.com/2016/11/30/bert-a-newcomer-in-the-r-excel-connection/ BERT: a newcomer in the R Excel connection]
* http://blog.revolutionanalytics.com/2018/08/how-to-use-r-with-excel.html


The packages on RDocumentation may be outdated. For example, the current stringr on CRAN is v1.2.0 (2/18/2017) but RDocumentation has v1.1.0 (8/19/2016).
== write.table ==
=== Output a named vector ===
<pre>
vec <- c(a = 1, b = 2, c = 3)
write.csv(vec, file = "my_file.csv", quote = FALSE)
x <- read.csv("my_file.csv", row.names = 1)
vec2 <- x[, 1]
names(vec2) <- rownames(x)
all.equal(vec, vec2)

# one liner: row names of a 'matrix' become the names of a vector
vec3 <- as.matrix(read.csv('my_file.csv', row.names = 1))[, 1]
all.equal(vec, vec3)
</pre>

== Web Applications ==


See also CRAN Task View: [http://cran.r-project.org/web/views/WebTechnologies.html Web Technologies and Services]
=== Avoid leading empty column to header ===
 
[https://stackoverflow.com/a/2478624 write.table writes unwanted leading empty column to header when has rownames]
<pre>
write.table(a, 'a.txt', col.names=NA)
# Or, better:
write.table(data.frame("SeqId"=rownames(a), a), "a.txt", row.names=FALSE)
</pre>

=== TexLive ===
TexLive can be installed in 2 ways:
* Ubuntu repository; it does not include the '''tlmgr''' package manager utility.
* [http://tug.org/texlive/ Official website]
==== texlive-latex-extra ====
https://packages.debian.org/sid/texlive-latex-extra
 
For example, framed and titling packages are included.
 
==== tlmgr - TeX Live package manager ====
https://www.tug.org/texlive/tlmgr.html
 
=== [https://yihui.name/tinytex/ TinyTex] ===
https://github.com/yihui/tinytex
 
=== [https://github.com/hadley/pkgdown pkgdown]: create a website for your package ===
[http://lbusettspatialr.blogspot.com/2017/08/building-website-with-pkgdown-short.html Building a website with pkgdown: a short guide]
 
=== Rmarkdown: create HTML5 web, slides and more ===
* http://rmarkdown.rstudio.com/html_document_format.html
* [https://www.rstudio.com/resources/videos/r-markdown-eight-ways/ R Markdown: Eight ways]
* https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
* https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf
* Chunk options http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html
 
HTML5 slides examples
* http://yihui.name/slides/knitr-slides.html
* http://yihui.name/slides/2012-knitr-RStudio.html
* http://yihui.name/slides/2011-r-dev-lessons.html#slide1
* http://inundata.org/R_talks/BARUG/#intro
 
Software requirement
* Rstudio
* knitr, XML, RCurl (See [http://www.omegahat.org/RCurl/FAQ.html omegahat] or [[R#RCurl|this internal link]] for installation on Ubuntu)
* [http://johnmacfarlane.net/pandoc/ pandoc package] This is a command line tool. I am testing it on Windows 7.
 
Slide #22 gives instructions for creating
* regular html file by using RStudio -> Knit HTML button
* HTML5 slides by using pandoc from command line.


Files:
* Rcmd source: [https://github.com/yihui/knitr-examples/blob/master/009-slides.Rmd 009-slides.Rmd]. Note that IE 8 was not supported by GitHub. For IE 9, be sure to turn off "Compatibility View".
* markdown output: 009-slides.md
* HTML output: 009-slides.html

We can create the Rcmd source in RStudio via File -> New -> R Markdown.

There are 4 ways to produce slides with pandoc:
* S5
* DZSlides
* Slidy
* Slideous

Use the markdown file (md) and convert it with pandoc:
<syntaxhighlight lang='bash'>
pandoc -s -S -i -t dzslides --mathjax html5_slides.md -o html5_slides.html
</syntaxhighlight>

If we are comfortable with HTML and CSS code, open the html file (generated by pandoc) and modify the CSS style at will.

=== Add blank field AND column names in write.table ===
* '''write.table'''(, row.names = TRUE), which is the default, writes one field fewer on the 1st (header) row than on the data rows: the header has no name for the row-name column.
** Suppose x is (n x 2).
** write.table(x, sep="\t") will generate a file with 2 elements on the 1st row and 3 elements on every other row.
** read.table(file) will return an object of size (n x 2).
** read.delim(file) and read.delim2(file) will also be correct.
* Note that '''write.csv'''() does not have this issue.
** Suppose x is (n x 2).
** Suppose we use write.csv(x, file). The csv file will be ((n+1) x 3) because of the header row.
** If we use read.csv(file), the object is (n x 3). So we need to use '''read.csv(file, row.names = 1)'''.
* Add a blank field AND the column names in write.table() with col.names=NA; see [https://stackoverflow.com/a/2478624 write.table writes unwanted leading empty column to header when has rownames]
:<syntaxhighlight lang="rsplus">
write.table(a, 'a.txt', col.names=NA)
</syntaxhighlight>
* '''readr::write_tsv'''() does not include row names in the output file.
=== read.delim(, row.names=1) and write.table(, row.names=TRUE) ===
[https://www.statology.org/read-delim-in-r/ How to Use read.delim Function in R]
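A minimal round-trip check of the two header layouts discussed here. The matrix m, its row/column names and the temporary files are made up for illustration: the default write.table() header is one field short (so read.delim() picks up the row names by itself), while col.names=NA writes an empty leading header field (so read.delim() needs row.names=1).
<syntaxhighlight lang='rsplus'>
# Hypothetical 3 x 2 matrix with row names
m <- matrix(1:6, nrow = 3, dimnames = list(c("g1", "g2", "g3"), c("A", "B")))
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")

# Default: header has 2 fields, data rows have 3 (row name + 2 values)
write.table(m, f1, sep = "\t", quote = FALSE)
# col.names = NA: header gets an empty leading field, so every row has 3 fields
write.table(m, f2, sep = "\t", quote = FALSE, col.names = NA)

hdr1 <- readLines(f1, n = 1)  # "A\tB"
hdr2 <- readLines(f2, n = 1)  # "\tA\tB"

# Header one field short: the first column becomes the row names automatically
d1 <- read.delim(f1)
# Empty leading header field: ask for the first column as row names explicitly
d2 <- read.delim(f2, row.names = 1)
</syntaxhighlight>
Both d1 and d2 should come back as 3 x 2 data frames with row names g1, g2, g3.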


==== Built-in examples from rmarkdown ====
Case 1: no row.names
<syntaxhighlight lang='rsplus'>
# This is done on my ODroid xu4 running Ubuntu Mate 15.10 (Wily)
# I used sudo apt-get install pandoc in shell
# and install.packages("rmarkdown") in R 3.2.3
 
library(rmarkdown)
rmarkdown::render("~/R/armv7l-unknown-linux-gnueabihf-library/3.2/rmarkdown/rmarkdown/templates/html_vignette/skeleton/skeleton.Rmd")
# the output <skeleton.html> is located under the same dir as <skeleton.Rmd>
</syntaxhighlight>
 
Note that the images are embedded in the html file as '''Base64''' data. See
* http://stackoverflow.com/questions/1207190/embedding-base64-images
* [https://en.wikipedia.org/wiki/Data_URI_scheme Data URI scheme]
* http://www.r-bloggers.com/embed-images-in-rd-documents/
* [https://groups.google.com/forum/#!topic/knitr/NfzCGhZTlu4 How to not embed Base64 images in RMarkdown]
* [http://www.networkx.nl/programming/upload-plots-as-png-file-to-your-wordpress/ Upload plots as PNG file to your wordpress]
 
Templates
* https://github.com/rstudio/rticles/tree/master/inst/rmarkdown/templates
* https://github.com/rstudio/rticles/blob/master/inst/rmarkdown/templates/jss_article/resources/template.tex
 
==== Knit button ====
* It calls rmarkdown::render()
* R Markdown = knitr + Pandoc
* rmarkdown::render() = knitr::knit() + a system() call to pandoc
 
==== Pandoc's Markdown ====
Originally Pandoc is for html.
 
Extensions
* YAML '''metadata'''
* Latex Math
* syntax highlight
* embed raw HTML/Latex (raw HTML only works for HTML output and raw Latex only for Latex/pdf output)
* tables
* footnotes
* citations
 
Types of output documents
* Latex/pdf, HTML, Word
* beamer, ioslides, Slidy, reveal.js
* Ebooks
* ...
 
Some examples:
<pre>
pandoc test.md -o test.html
pandoc test.md -s --mathjax -o test.html
pandoc test.md -o test.docx
pandoc test.md -o test.pdf
pandoc test.md --latex-engine=xelatex -o test.pdf   # pandoc >= 2.0: use --pdf-engine=xelatex
pandoc test.md -o test.epub
</pre>
Check out ?rmarkdown::pandoc_convert().

<pre>
write.table(df, 'my_data.txt', quote=FALSE, sep='\t', row.names=FALSE)
my_df <- read.delim('my_data.txt')  # the rownames will be 1, 2, 3, ...
</pre>
Case 2: with row.names. '''Note:''' if we open the text file in Excel, we'll see the 1st row is missing one header at the end. It is actually missing the column name for the 1st column.
 
When you click the Knit button in RStudio, you will see the actual command that is executed.
 
==== Global options ====
Suppose I want to create a simple Markdown-only document without executing any code. Instead of adding eval = FALSE to each code chunk, I can insert the following between the YAML header and the content. Even bash chunks will not be executed.
<pre>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```
</pre>

<pre>
write.table(df, 'my_data.txt', quote=FALSE, sep='\t', row.names=TRUE)
my_df <- read.delim('my_data.txt')  # it will automatically assign the rownames
</pre>


==== Examples/gallery ====
Some examples of creating papers (with references) based on knitr can be found in the [http://yihui.name/knitr/demo/showcase/ Papers and reports] section of the knitr website.
* https://rmarkdown.rstudio.com/gallery.html
* https://github.com/EBI-predocs/knitr-example
* https://github.com/timchurches/meta-analyses
* http://www.gastonsanchez.com/depot/knitr-slides

==== Read the docs Sphinx theme and journal article formats ====
http://blog.rstudio.org/2016/03/21/r-markdown-custom-formats/

== Read/Write Excel files package ==
* http://www.milanor.net/blog/?p=779
* [https://www.displayr.com/how-to-read-an-excel-file-into-r/?utm_medium=Feed&utm_source=Syndication flipAPI]. One useful feature of DownloadXLSX, which is not supported by the readxl package, is that it can read Excel files directly from a URL.
* [http://cran.r-project.org/web/packages/xlsx/index.html xlsx]: depends on Java
** [https://stackoverflow.com/a/17976604 Export both Image and Data from R to an Excel spreadsheet]
* [http://cran.r-project.org/web/packages/openxlsx/index.html openxlsx]: does not depend on Java, but depends on a zip application. On Windows, it seems to work without installing Rtools. However, it cannot read xls files; it only works on xlsx files.
** It can't be used to open .xls or .xlm files.
** When I tried the package on an xlsx file, I got a warning: No data found on worksheet. 6/28/2018
** [https://fabiomarroni.wordpress.com/2018/08/07/use-r-to-write-multiple-tables-to-a-single-excel-file/ Use R to write multiple tables to a single Excel file]
* [https://github.com/hadley/readxl readxl]: it has no external dependencies (e.g. Java), but it can only read, not write, Excel files.
** It is part of tidyverse package. The [https://readxl.tidyverse.org/index.html readxl] website provides several articles for more examples.
** [https://github.com/rstudio/webinars/tree/master/36-readxl readxl webinar].
** One advantage of read_excel (as with read_csv in the readr package) is that the data imports into an easy-to-print object with three classes: '''tbl_df''', '''tbl''' and '''data.frame'''.
** For writing to Excel formats, use writexl or openxlsx package.
:<syntaxhighlight lang='rsplus'>
library(readxl)
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
    col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
    guess_max = min(1000, n_max), progress = readxl_progress(),
    .name_repair = "unique")
# Example
read_excel(path, range = cell_cols("c:cx"), col_types = "numeric")
</syntaxhighlight>
* [https://ropensci.org/blog/technotes/2017/09/08/writexl-release writexl]: zero dependency xlsx writer for R
:<syntaxhighlight lang='rsplus'>
library(writexl)
mylst <- list(sheet1name = df1, sheet2name = df2)
write_xlsx(mylst, "output.xlsx")
</syntaxhighlight>


* [https://github.com/rstudio/rticles rticles] package
* [https://github.com/juba/rmdformats rmdformats] package

For the Chromosome column, integer values become strings (but converted to double, so 5 becomes 5.000000) or NA (empty on sheets).
{{Pre}}
> head(read_excel("~/Downloads/BRCA.xls", 4)[ , -9], 3)
  UniqueID (Double-click) CloneID UGCluster
1                  HK1A1  21652 Hs.445981
2                  HK1A2  22012 Hs.119177
3                  HK1A4  22293 Hs.501376
                                                    Name Symbol EntrezID
1 Catenin (cadherin-associated protein), alpha 1, 102kDa CTNNA1    1495
2                              ADP-ribosylation factor 3  ARF3      377
3                          Uroporphyrinogen III synthase  UROS    7390
  Chromosome      Cytoband ChimericClusterIDs Filter
1  5.000000        5q31.2              <NA>      1
2  12.000000        12q13              <NA>      1
3      <NA> 10q25.2-q26.3              <NA>      1
</pre>


==== rmarkdown news ====
* [http://blog.rstudio.org/2016/03/21/rmarkdown-v0-9-5/ floating table of contents and tabbed sections]

The hidden worksheets become visible. (I am not sure what the first rows of the output mean.)
{{Pre}}
> excel_sheets("~/Downloads/BRCA.xls")
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 01 00 00 00 9a 0c 00 00 1a 00
DEFINEDNAME: 21 00 00 01 0b 00 00 00 04 00 00 00 00 00 00 0d 3b 03 00 00 00 9b 0c 00 00 0a 00
DEFINEDNAME: 21 00 00 01 0b 00 00 00 03 00 00 00 00 00 00 0d 3b 02 00 00 00 9a 0c 00 00 06 00
[1] "Experiment descriptors" "Filtered log ratio"    "Gene identifiers"     
[4] "Gene annotations"      "CollateInfo"            "GeneSubsets"         
[7] "GeneSubsetsTemp"     
</pre>


==== Useful tricks when including images in Rmarkdown documents ====
http://blog.revolutionanalytics.com/2017/06/rmarkdown-tricks.html

Chinese characters work too.
{{Pre}}
> read_excel("~/Downloads/testChinese.xlsx", 1)
  中文 B C
1    a b c
2    1 2 3
</pre>


==== Converting Rmarkdown to F1000Research LaTeX Format ====
[https://www.bioconductor.org/packages/release/bioc/html/BiocWorkflowTools.html BiocWorkflowTools] package and [https://f1000research.com/articles/7-431/ paper]

==== icons for rmarkdown ====
https://ropensci.org/technotes/2018/05/15/icon/

To read all worksheets, we need a convenience function:
{{Pre}}
read_excel_allsheets <- function(filename) {
    sheets <- readxl::excel_sheets(filename)
    sheets <- sheets[-1] # Skip sheet 1 (specific to this file)
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X, col_types = "numeric"))
    names(x) <- sheets
    x
}
dcfile <- "table0.77_dC_biospear.xlsx"
dc <- read_excel_allsheets(dcfile)
# Each component (eg dc[[1]]) is a tibble.
</pre>


==== Reproducible data analysis ====
* http://blog.jom.link/implementation_basic_reproductible_workflow.html

==== Automatic document production with R ====
https://itsalocke.com/improving-automatic-document-production-with-r/

==== Documents with logos, watermarks, and corporate styles ====
http://ellisp.github.io/blog/2017/09/09/rmarkdown

==== rticles and pinp for articles ====
* https://cran.r-project.org/web/packages/rticles/index.html
* http://dirk.eddelbuettel.com/code/pinp.html

=== [https://cran.r-project.org/web/packages/readr/ readr] ===
Compared to base equivalents like '''read.csv()''', '''readr''' is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn’t munge the column names.

[https://blog.rstudio.org/2016/08/05/readr-1-0-0/ 1.0.0] released. [https://www.tidyverse.org/blog/2021/07/readr-2-0-0/ readr 2.0.0] adds built-in support for reading multiple files at once, fast multi-threaded lazy reading and automatic guessing of delimiters, among other changes.

Consider a [http://www.cs.utoronto.ca/~juris/data/cmapbatch/instmatx.21.txt text file] where the table (6100 x 22) has duplicated row names and the (1,1) element is empty. The column names are all unique.
* read.delim() will treat the first column as row names, but it does not allow duplicated row names. Even if we use row.names=NULL, it still does not read the file correctly. It does give warnings (EOF within quoted string & number of items read is not a multiple of the number of columns). The dim is 5177 x 22.
* readr::read_delim(Filename, "\t") will miss the last column. The dim is 6100 x 21.
* '''data.table::fread(Filename, sep = "\t")''' will detect that the number of column names is less than the number of columns. It adds 1 extra default column name for the first column, which is guessed to be row names or an index. The dim is 6100 x 22. (Winner!)

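If data.table is not available, the same idea as fread() (give the unnamed first column its own name instead of turning it into row names) can be sketched in base R by reading the header and the data separately. This is a sketch under assumptions: the file is tab-delimited and its header is either one field short or starts with an empty field; read_short_header() and the default column name "RowName" are hypothetical. quote = "" is used because the read.delim() warning above (EOF within quoted string) hints at stray quote characters, but I have not checked that for this particular file.
<syntaxhighlight lang='rsplus'>
# Hypothetical helper: read a table whose header lacks the first column's name
# and whose first column has duplicated IDs (so it cannot be used as row names)
read_short_header <- function(path, first_name = "RowName") {
  # Header fields exactly as written in the file
  hdr <- scan(path, what = "", sep = "\t", nlines = 1, quiet = TRUE, quote = "")
  # Data rows only; quote = "" keeps quote characters as ordinary text
  dat <- read.delim(path, header = FALSE, skip = 1, quote = "",
                    stringsAsFactors = FALSE)
  if (length(hdr) == ncol(dat) - 1) {
    hdr <- c(first_name, hdr)          # header is one field short
  } else if (length(hdr) == ncol(dat) && hdr[1] == "") {
    hdr[1] <- first_name               # header starts with an empty field
  } else {
    stop("unexpected header: ", length(hdr), " fields for ", ncol(dat), " columns")
  }
  names(dat) <- hdr
  dat  # duplicated IDs stay in an ordinary column
}
</syntaxhighlight>
Usage would be e.g. read_short_header("instmatx.21.txt"); I have not run it on that file, so the resulting dimensions are not claimed here.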

The '''readr::read_csv()''' function is about as fast as the '''data.table::fread()''' function. ''For files beyond 100MB in size fread() and read_csv() can be expected to be around 5 times faster than read.csv().'' See 5.3 of the Efficient R Programming book.

Note that '''data.table::fread()''' can read a selection of the columns.

=== Speed comparison ===
[https://predictivehacks.com/the-fastest-way-to-read-and-write-file-in-r/ The Fastest Way To Read And Write Files In R]. data.table >> readr >> base.

=== Markdown language ===
According to [http://en.wikipedia.org/wiki/Markdown wikipedia]:

''Markdown is a lightweight markup language, originally created by John Gruber with substantial contributions from Aaron Swartz, allowing people “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”.''
 
== [http://cran.r-project.org/web/packages/ggplot2/index.html ggplot2] ==
See [[Ggplot2|ggplot2]]


* Markup is a general term for content formatting (such as HTML), while Markdown is a lightweight markup syntax that is converted into HTML markup.
== Data Manipulation & Tidyverse ==
See [[Tidyverse|Tidyverse]].


* [http://stackoverflow.com/editing-help Nice summary from stackoverflow.com] and more complete list from [https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet github].
== Data Science ==
See [[Data_science|Data science]] page


* An example https://gist.github.com/jeromyanglim/2716336
== microbenchmark & rbenchmark ==
* [https://cran.r-project.org/web/packages/microbenchmark/index.html microbenchmark]
** [https://www.r-bloggers.com/using-the-microbenchmark-package-to-compare-the-execution-time-of-r-expressions/ Using the microbenchmark package to compare the execution time of R expressions]
* [https://cran.r-project.org/web/packages/rbenchmark/index.html rbenchmark] (not updated since 2012)


* [http://daringfireball.net/projects/markdown/basics basics] and [http://daringfireball.net/projects/markdown/syntax syntax]
== Plot, image ==
=== [http://cran.r-project.org/web/packages/jpeg/index.html jpeg] ===
If we want to create an image like the one in this wiki's left-hand side panel, we can use the '''jpeg''' package to read an existing plot and then edit and save it.


* Convert mediawiki to markdown using online conversion tool from [http://johnmacfarlane.net/pandoc/try/ pandoc].
We can also use the jpeg package to import and manipulate a jpg image. See [http://moderndata.plot.ly/fun-with-heatmaps-and-plotly/ Fun with Heatmaps and Plotly].


* [http://support.mashery.com/docs/customizing_your_portal/Markdown_Cheat_Sheet Cheat sheet].
=== EPS/postscript format ===
<ul>
<li>Don't use postscript().  


* [http://dillinger.io/ Cloud-enabled HTML5 markdown editor]
<li>Use cairo_ps(). See [http://www.sthda.com/english/wiki/saving-high-resolution-ggplots-how-to-preserve-semi-transparency Saving High-Resolution ggplots: How to Preserve Semi-Transparency]. It works on base R plots too.
<syntaxhighlight lang='r'>
cairo_ps(filename = "survival-curves.eps",
        width = 7, height = 7, pointsize = 12,
        fallback_resolution = 300)
print(p) # or any base R plots statements
dev.off()
</syntaxhighlight>


* [http://www.crypti.cc/markdown-here/livedemo.html live demo]
* [https://github.com/dgrapov/TeachingDemos/blob/master/Demos/OPLS/OPLS%20example.md Example hosted on GitHub]
<li>[https://stackoverflow.com/a/8147482 Export a graph to .eps file with R].
* The results look the same as using cairo_ps().
* The file created by setEPS() + postscript() is much smaller than the one created by cairo_ps().
* However, '''grep''' can find the characters shown on the plot generated by cairo_ps() but not on the plot generated by setEPS() + postscript().
<pre>
setEPS()
postscript("whatever.eps") # 483 KB
plot(rnorm(20000))
dev.off()
# grep rnorm whatever.eps # Not found!

cairo_ps("whatever_cairo.eps")  # 2.4 MB
plot(rnorm(20000))
dev.off()
# grep rnorm whatever_cairo.eps  # Found!
</pre>


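To repeat the grep test above from inside R (e.g. on Windows without grep), a small base-R helper can search the raw bytes of the file; reading bytes rather than lines avoids trouble with binary content in the EPS file. file_contains() is a hypothetical name, not an existing function.
<syntaxhighlight lang='rsplus'>
# Hypothetical helper: TRUE if the literal string 'pattern' occurs in the file
file_contains <- function(path, pattern) {
  bytes <- readBin(path, what = "raw", n = file.size(path))  # whole file as bytes
  length(grepRaw(pattern, bytes, fixed = TRUE)) > 0          # integer(0) if absent
}

# Usage, with the two files from the example above:
# file_contains("whatever.eps", "rnorm")
# file_contains("whatever_cairo.eps", "rnorm")
</syntaxhighlight>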
* [http://www.rstudio.com/ide/docs/r_markdown R markdown file] and use it in [http://www.rstudio.com/ide/docs/authoring/using_markdown RStudio]. Customizing Chunk Options can be found in [http://yihui.name/knitr/options knitr page] and [http://rpubs.com/gallery/options rpubs.com].
<li> View EPS files
* Linux: evince. It is installed by default.
* Mac: evince. ''' brew install evince'''
* Windows. Install '''ghostscript''' [https://www.npackd.org/p/com.ghostscript.Ghostscript64/9.20 9.20] (10.x does not work with ghostview/GSview) and '''ghostview/GSview''' (5.0). In Ghostview, open Options -> Advanced Configure. Change '''Ghostscript DLL''' path AND '''Ghostscript include Path''' according to the ghostscript location ("C:\.


<li>Edit EPS files: Inkscape
* Step 1: open the EPS file
* Step 2: EPS Input: Determine page orientation from text direction 'Page by page' - OK
* Step 3: PDF Import Settings: the default is "Internal import", but we should choose '''"Cairo import"'''.
* Step 4: '''Zoom in''' first.
* Step 5: Click on the '''Layers and Objects''' tab on the RHS. Now we can select any lines or letters and edit them as we like. The selected objects are highlighted in the "Layers and Objects" panel, so we can also select multiple objects by their object names. The selected objects can then be rotated (Object -> Rotate 90 CW), for example.
* Step 6: We can save the plot in any format, e.g. svg, eps, pdf, html, ...
</ul>

==== RStudio ====
RStudio is the best editor.

Markdown has two drawbacks: 1. it does not support TOC natively. 2. RStudio cannot show headers in the editor.

Therefore, use rmarkdown format instead of markdown.

=== png and resolution ===
It seems people use '''res=300''' as a definition of high resolution.
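The pixel/inch arithmetic used in this section (pixels = inches x DPI, e.g. 8 in x 300 = 2400 px) can be checked on an actual file by reading the PNG header. Below is a base-R sketch, not a library function: png_info() is a hypothetical helper that reads width and height from the IHDR chunk and, if a pHYs chunk with unit = meter is present, converts pixels per meter to DPI (1 inch = 0.0254 m). Since ?png says no resolution is recorded by default, dpi_x/dpi_y may come back NA.
<syntaxhighlight lang='rsplus'>
# Hypothetical helper: width/height (IHDR) and, if recorded, DPI (pHYs) of a PNG
png_info <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  png_sig <- as.raw(c(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A))
  if (!identical(readBin(con, "raw", 8), png_sig)) stop("not a PNG file: ", path)
  be_int <- function(bytes) readBin(bytes, "integer", size = 4, endian = "big")
  info <- list(width = NA_integer_, height = NA_integer_,
               dpi_x = NA_real_, dpi_y = NA_real_)
  repeat {
    len <- readBin(con, "integer", size = 4, endian = "big")  # chunk data length
    if (length(len) == 0) break                               # end of file
    type <- rawToChar(readBin(con, "raw", 4))                 # e.g. "IHDR", "pHYs"
    data <- readBin(con, "raw", len)
    readBin(con, "raw", 4)                                    # skip the CRC
    if (type == "IHDR") {
      info$width  <- be_int(data[1:4])
      info$height <- be_int(data[5:8])
    } else if (type == "pHYs" && as.integer(data[9]) == 1L) { # unit 1 = meter
      info$dpi_x <- be_int(data[1:4]) * 0.0254                # pixels/m -> pixels/inch
      info$dpi_y <- be_int(data[5:8]) * 0.0254
    } else if (type == "IEND") {
      break
    }
  }
  info
}

# Usage (assumes a working png() device):
# png("heatmap.png", width = 8, height = 6, units = "in", res = 300)
# plot(1:10); dev.off()
# info <- png_info("heatmap.png")
# info$width / info$dpi_x   # width in inches, only if a DPI was recorded
</syntaxhighlight>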
<ul>
<li>Bottom line: fix res=300 and adjust height/width as needed. The defaults are 72 ppi and height=width=480 px. If we increase res to 300 at a fixed pixel size, the text font size increases, lines become thicker and the plot looks zoomed in.
<li>[https://stackoverflow.com/a/51194014 Saving high resolution plot in png].
<pre>
png("heatmap.png", width = 8, height = 6, units='in', res = 300)
# we can adjust width/height as we like
# the pixel values will be width=8*300 and height=6*300 which is equivalent to
# 8*300 * 6*300/10^6 = 4.32 Megapixels (1M pixels = 10^6 pixels) in camera terms
# However, if we use png(, width=8*300, height=6*300, units='px') without res, it will
# produce a plot with a very large figure body and a tiny text font size.

# It seems the following command gives the same result as above
png("heatmap.png", width = 8*300, height = 6*300, res = 300) # default units="px"
</pre>
<li>Chapter 14.5 [https://r-graphics.org/recipe-output-bitmap Outputting to Bitmap (PNG/TIFF) Files] by R Graphics Cookbook
* Changing the resolution affects the size (in pixels) of graphical objects like text, lines, and points.
<li>[https://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html 10 tips for making your R graphics look their best] David Smith
* In Word you can resize the graphic to an appropriate size, but the high resolution gives you the flexibility to choose a size while not compromising on the quality.  I'd recommend '''at least 1200 pixels''' on the longest side for standard printers.
<li>[https://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/png.html ?png]. The png function has default settings ppi=72, height=480, width=480, units="px".
* By default no resolution is recorded in the file, except for BMP.
* [https://www.adobe.com/creativecloud/file-types/image/comparison/bmp-vs-png.html BMP vs PNG format]. If you need a smaller file size and don’t mind a lossless compression, PNG might be a better choice. If you need to retain as much detail as possible and don’t mind a larger file size, BMP could be the way to go.
** '''Compression''': BMP files are raw and uncompressed, meaning they’re large files that retain as much detail as possible. On the other hand, PNG files are compressed but still lossless. This means you can reduce or expand PNGs without losing any information.
** '''File size''': BMPs are larger than PNGs. This is because PNG files automatically compress, and can be compressed again to make the file even smaller.
** '''Common uses''': BMP retains the maximum amount of detail, while PNGs are good for small illustrations, sketches, drawings, logos and icons.
** '''Quality''': No difference
** '''Transparency''': PNG supports transparency while BMP doesn't
<li>Some comparisons of aspect ratios
* 11/8.5=1.29  (US Letter paper)
* 8/6=1.33    (plot output)
* 1440/900=1.6 (my display)
<li>[https://babichmorrowc.github.io/post/2019-05-23-highres-figures/ Setting resolution and aspect ratios in R]
<li>The effect of the '''res''' parameter on a simple plot. [https://www.tutorialspoint.com/how-to-change-the-resolution-of-a-plot-in-base-r How to change the resolution of a plot in base R?]
<li>[https://danieljhocking.wordpress.com/2013/03/12/high-resolution-figures-in-r/ High Resolution Figures in R].
<li>[https://magesblog.com/post/2013-10-29-high-resolution-graphics-with-r/ High resolution graphics with R]
<li>[https://stackoverflow.com/questions/8399100/r-plot-size-and-resolution R plot: size and resolution]
<li>[https://stackoverflow.com/a/22815896 How can I increase the resolution of my plot in R?], [https://cran.r-project.org/web/packages/devEMF/index.html devEMF] package
<li>See [[Images#Anti-alias_%E4%BF%AE%E9%82%8A|Images -> Anti-alias]].
<li>How to check DPI on PNG
* '''The width of a PNG file in terms of inches cannot be determined directly from the file itself''', as the file contains pixel dimensions, not physical dimensions. However, '''you can calculate the width in inches if you know the resolution (DPI, dots per inch) of the image'''. Remember that converting pixel measurements to physical measurements like inches involves a specific resolution (DPI), and different devices may display the same image at different sizes due to having different resolutions.
<li>[https://community.rstudio.com/t/save-high-resolution-figures-from-r-300dpi/62016/3 Cairo] case.
</ul>

=== [http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol HTTP protocol] ===
* http://en.wikipedia.org/wiki/File:Http_request_telnet_ubuntu.png
* [http://en.wikipedia.org/wiki/Query_string Query string]
* How to capture an http header? Use '''curl -i en.wikipedia.org'''.
* [http://trac.webkit.org/wiki/WebInspector Web Inspector]. Built into Chrome. Right click on any page and choose 'Inspect Element'.
* [http://en.wikipedia.org/wiki/Web_server Web server]
* [http://www.paulgriffiths.net/program/c/webserv.php Simple TCP/IP web server]
* [http://jmarshall.com/easy/http/ HTTP Made Really Easy]
* [http://www.manning.com/hethmon/ Illustrated Guide to HTTP]
* [http://www.ibm.com/developerworks/systems/library/es-nweb/ nweb: a tiny, safe Web server with 200 lines]
* [http://sourceforge.net/projects/tinyhttpd/ Tiny HTTPd]

An HTTP server is conceptually simple:
# Open port 80 for listening
# When contact is made, gather a little information (mainly the GET line; you can ignore the rest for now)
# Translate the request into a file request
# Open the file and spit it back at the client

It gets more difficult depending on how much of HTTP you want to support: POST is a little more complicated, and so are scripts, handling multiple requests, etc.

=== PowerPoint ===
<ul>
<li>For PowerPoint presentations, I found it useful to use svg() to generate a small figure. Then when we enlarge the plot, the text font is enlarged too. According to [https://www.rdocumentation.org/packages/grDevices/versions/3.6.2/topics/cairo svg], by default, width = 7, height = 7, pointsize = 12, family = '''sans'''.
<li>Try the following code. The font size is the same for both plots/files. However, the first plot can be enlarged without losing its quality.
<pre>
svg("svg4.svg", width=4, height=4)
plot(1:10, main="width=4, height=4")
dev.off()

svg("svg7.svg", width=7, height=7) # default
plot(1:10, main="width=7, height=7")
dev.off()
</pre>
</ul>

=== magick ===
https://cran.r-project.org/web/packages/magick/

See an example [[:File:Progpreg.png|here]] I created.


=== [http://cran.r-project.org/web/packages/Cairo/index.html Cairo] ===
See [[Heatmap#White_strips_.28artifacts.29|White strips problem]] in png() or tiff().

==== Example in R ====
<syntaxhighlight lang='r'>
> co <- socketConnection(port=8080, server=TRUE, blocking=TRUE)
> # Now open a web browser and type http://localhost:8080/index.html
> readLines(co,1)
[1] "GET /index.html HTTP/1.1"
> readLines(co,1)
[1] "Host: localhost:8080"
> readLines(co,1)
[1] "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0"
> readLines(co,1)
[1] "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
> readLines(co,1)
[1] "Accept-Language: en-US,en;q=0.5"
> readLines(co,1)
[1] "Accept-Encoding: gzip, deflate"
> readLines(co,1)
[1] "Connection: keep-alive"
> readLines(co,1)
[1] ""
</syntaxhighlight>
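The session above only reads the request. The remaining steps of the outline (translate the request into a file request, send the file back) could be sketched as plain R functions that can be tested without a socket. This is a minimal sketch, not a complete HTTP implementation: parse_request_line(), request_to_file() and build_response() are hypothetical helpers, only a simple request line is handled, and the ".." check is a crude guard against paths outside the document root.
<syntaxhighlight lang='rsplus'>
# Split a request line such as "GET /index.html HTTP/1.1" into its three parts
parse_request_line <- function(line) {
  parts <- strsplit(line, " ", fixed = TRUE)[[1]]
  if (length(parts) != 3) stop("malformed request line: ", line)
  list(method = parts[1], path = parts[2], version = parts[3])
}

# Map the requested path to a file under the document root 'root'
request_to_file <- function(req, root) {
  path <- sub("\\?.*$", "", req$path)          # drop a query string, if any
  if (path == "/") path <- "/index.html"       # default page
  if (grepl("..", path, fixed = TRUE)) stop("refusing path outside root: ", path)
  file.path(root, sub("^/", "", path))
}

# Build the text that would be written back to the client
build_response <- function(body, status = "200 OK", type = "text/html") {
  paste0("HTTP/1.1 ", status, "\r\n",
         "Content-Type: ", type, "\r\n",
         "Content-Length: ", nchar(body, type = "bytes"), "\r\n",
         "Connection: close\r\n",
         "\r\n",
         body)
}

# Usage with a socket as in the session above (blocks until a browser connects;
# the document root is an assumption):
# co   <- socketConnection(port = 8080, server = TRUE, blocking = TRUE)
# req  <- parse_request_line(readLines(co, 1))
# f    <- request_to_file(req, root = "/home/brb/Downloads")
# ok   <- file.exists(f)
# body <- if (ok) paste(readLines(f), collapse = "\n") else "Not found"
# writeLines(build_response(body, if (ok) "200 OK" else "404 Not Found"), co)
# close(co)
</syntaxhighlight>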


=== geDevices ===
* [https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/ Saving R Graphics across OSs]. Use png(type="cairo-png") or the [https://cran.r-project.org/web/packages/ragg/index.html ragg] package, which can be incorporated into RStudio.
* [https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/ Setting the Graphics Device in a RMarkdown Document]

=== [https://cran.r-project.org/web/packages/cairoDevice/ cairoDevice] ===
PS. I am not sure about the advantage of the functions in this package compared to R's own functions (e.g. Cairo_svg() vs svg()).

For Ubuntu, we need to install 2 libraries and 1 R package, '''RGtk2'''.
<pre>
sudo apt-get install libgtk2.0-dev libcairo2-dev
</pre>

On Windows, we may get the error: '''unable to load shared object 'C:/Program Files/R/R-3.0.2/library/cairoDevice/libs/x64/cairoDevice.dll' '''. We need to follow the instructions [http://tolstoy.newcastle.edu.au/R/e6/help/09/05/15613.html here].

==== Example in C ([http://blog.abhijeetr.com/2010/04/very-simple-http-server-writen-in-c.html Very simple http server written in C], 187 lines) ====
Create a simple hello world html page and save it as <[http://en.wikipedia.org/wiki/List_of_Hello_world_program_examples#H index.html]> in the current directory (/home/brb/Downloads/)

Launch the server program (assuming we have done ''gcc http_server.c -o http_server'')
<pre>
$ ./http_server -p 50002
Server started at port no. 50002 with root directory as /home/brb/Downloads
</pre>

Secondly, open a browser and type http://localhost:50002/index.html. The server will respond
<pre>
GET /index.html HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

file: /home/brb/Downloads/index.html
GET /favicon.ico HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

file: /home/brb/Downloads/favicon.ico
GET /favicon.ico HTTP/1.1
Host: localhost:50003
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

file: /home/brb/Downloads/favicon.ico
</pre>
The browser will show the page from <index.html> on the server.

The only bad thing is that the code does not close the port. For example, if I use Ctrl+C to close the program and try to re-launch it with the same port, it will complain '''socket() or bind(): Address already in use'''.

=== dpi requirement for publication ===
[http://www.cookbook-r.com/Graphs/Output_to_a_file/ For import into PDF-incapable programs (MS Office)]

=== sketcher: photo to sketch effects ===
https://htsuda.net/sketcher/

=== httpgd ===
* https://nx10.github.io/httpgd/ A graphics device for R that is accessible via network protocols. It displays graphics in web browsers.
* [https://youtu.be/uxyhmhRVOfw Three tricks to make IDEs other than Rstudio better for R development]

== [http://igraph.org/r/ igraph] ==
[[R_web#igraph|R web -> igraph]]


==== Another Example in C (55 lines) ====
http://mwaidyanatha.blogspot.com/2011/05/writing-simple-web-server-in-c.html

The response is embedded in the C code.

If we test the server program by opening a browser and typing "http://localhost:15000/", the server receives the following 7 lines:
<pre>
GET / HTTP/1.1
Host: localhost:15000
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
</pre>

If we include a non-executable file's name in the URL, we will be able to download that file. Try "http://localhost:15000/client.c".

If we use the telnet program to test, we can type anything we want:
<pre>
$ telnet localhost 15000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
ThisCanBeAnything        <=== This is what I typed in the client and it is also shown on server
HTTP/1.1 200 OK          <=== From here is what I got from server
Content-length: 37Content-Type: text/html

HTML_DATA_HERE_AS_YOU_MENTIONED_ABOVE <=== The html tags are not passed from server, interesting!
Connection closed by foreign host.
$
</pre>

See also more examples under [[C#Socket_Programming_Examples_using_C.2FC.2B.2B.2FQt|C page]].

== Identifying dependencies of R functions and scripts ==
https://stackoverflow.com/questions/8761857/identifying-dependencies-of-r-functions-and-scripts
{{Pre}}
library(mvbutils)
foodweb(where = "package:batr")

foodweb(find.funs("package:batr"), prune = "survRiskPredict", lwd = 2)

foodweb(find.funs("package:batr"), prune = "classPredict", lwd = 2)
</pre>

== [http://cran.r-project.org/web/packages/iterators/ iterators] ==
An iterator is useful instead of a for-loop when the data is already a '''collection'''. It can be used to iterate over a vector, data frame, matrix or file.

Iterators can be combined with the foreach package; http://www.exegetic.biz/blog/2013/11/iterators-in-r/ has more elaboration.

== Colors ==
* [https://scales.r-lib.org/ scales] package. This is used by the ggplot2 package.
<ul>
<li>[https://cran.r-project.org/web/packages/colorspace/index.html colorspace]: A Toolbox for Manipulating and Assessing Colors and Palettes. Popular! Many reverse imports/suggests; e.g. ComplexHeatmap. See my [[Ggplot2#colorspace_package|ggplot2]] page.
<pre>
library(colorspace)
hcl_palettes(plot = TRUE) # a quick overview
hcl_palettes(palette = "Dark 2", n = 5, plot = TRUE)
q4 <- qualitative_hcl(4, palette = "Dark 3")
</pre>
</ul>
* [https://statisticsglobe.com/create-color-range-between-two-colors-in-r Create color range between two colors in R] using colorRampPalette()
* [http://novyden.blogspot.com/2013/09/how-to-expand-color-palette-with-ggplot.html How to expand color palette with ggplot and RColorBrewer]
* palette_explorer() function from the [https://cran.r-project.org/web/packages/tmaptools/index.html tmaptools] package. See [https://www.computerworld.com/article/3184778/data-analytics/6-useful-r-functions-you-might-not-know.html selecting color palettes with shiny].
* [http://www.cookbook-r.com/ Cookbook for R]
* [http://ggplot2.tidyverse.org/reference/scale_brewer.html Sequential, diverging and qualitative colour scales/palettes from colorbrewer.org]: scale_colour_brewer(), scale_fill_brewer(), ...
* http://colorbrewer2.org/
* It seems there is no way to get only 2 colors, no matter which set name we use (brewer.pal() requires n >= 3).
* To see the set names used in brewer.pal, see
** [https://www.rdocumentation.org/packages/RColorBrewer/versions/1.1-2/topics/RColorBrewer RColorBrewer::display.brewer.all()]
** [https://rpubs.com/flowertear/224344 Output]
** Especially, '''[http://colorbrewer2.org/#type=qualitative&scheme=Set1&n=4 Set1]''' from http://colorbrewer2.org/
* To list all R color names, use colors().
** [http://research.stowers.org/mcm/efg/R/Color/Chart/ColorChart.pdf Color Chart] (includes Hex and RGB) & [http://research.stowers.org/mcm/efg/Report/UsingColorInR.pdf Using Color in R] from http://research.stowers.org
** Code to generate rectangles with colored background https://www.r-graph-gallery.com/42-colors-names/
* http://www.bauer.uh.edu/parks/truecolor.htm Interactive RGB, Alpha and Color Picker
* http://deanattali.com/blog/colourpicker-package/ Not sure what it is doing
* [http://www.lifehack.org/484519/how-to-choose-the-best-colors-for-your-data-charts How to Choose the Best Colors For Your Data Charts]
* [http://sape.inf.usi.ch/quick-reference/ggplot2/colour Color names in R]
<ul>
<li>[https://stackoverflow.com/questions/28461326/convert-hex-color-code-to-color-name convert hex value to color names]
{{Pre}}
library(plotrix)
# color.id() identifies the closest named match to a color
sapply(rainbow(4), color.id)
sapply(palette(), color.id)
sapply(RColorBrewer::brewer.pal(4, "Set1"), color.id)
</pre>
</li></ul>
* [https://www.rdocumentation.org/packages/grDevices/versions/3.5.3/topics/hsv hsv()] function. [https://eranraviv.com/matrix-style-screensaver-in-r/ Matrix-style screensaver in R]
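A short sketch of three of the palette helpers listed above: grDevices::colorRampPalette() for a gradient between two colors, RColorBrewer::brewer.pal() (which refuses n < 3, so taking the first two colors of a 3-color palette is one workaround), and colorspace::qualitative_hcl(). It assumes RColorBrewer and colorspace are installed. The end-point colors "navy" and "firebrick" are arbitrary choices.
<syntaxhighlight lang='rsplus'>
# Gradient of 5 colours between two end points (base R, grDevices)
ramp  <- colorRampPalette(c("navy", "firebrick"))
grad5 <- ramp(5)
print(grad5)

# brewer.pal() needs n >= 3 (a smaller n gives a warning and 3 colours),
# so take the first two colours of a 3-colour palette instead
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  set1_two <- RColorBrewer::brewer.pal(3, "Set1")[1:2]
  print(set1_two)
  # maximum number of classes supported by a few qualitative palettes
  print(RColorBrewer::brewer.pal.info[c("Set1", "Set3", "Paired"), "maxcolors"])
}

# HCL-based qualitative palette (same call as in the colorspace example)
if (requireNamespace("colorspace", quietly = TRUE)) {
  q4 <- colorspace::qualitative_hcl(4, palette = "Dark 3")
  print(q4)
}
</syntaxhighlight>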
Below is an example using the option ''scale_fill_brewer''(palette = "[http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=9 Paired]"). See the source code at [https://gist.github.com/JohannesFriedrich/c7d80b4e47b3331681cab8e9e7a46e17 gist]. Note that only the '''Paired''' and '''Set3''' palettes in the '''qualitative scheme''' can support up to 12 classes.

According to the colorbrewer website, '''qualitative''' schemes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes.

[[:File:GgplotPalette.svg]]

==== Others  ====
* http://rosettacode.org/wiki/Hello_world/ (Different languages)
* http://kperisetla.blogspot.com/2012/07/simple-http-web-server-in-c.html (Windows web server)
* http://css.dzone.com/articles/web-server-c (handling HTTP GET requests, handling content types (txt, html, jpg, zip, rar, pdf, php etc.), sending proper HTTP error codes, serving files from a web root, changing the web root in a config file, zero-copy optimization using the sendfile method, and php file handling)
* https://github.com/gtungatkar/Simple-HTTP-server
* https://github.com/davidmoreno/onion

=== shiny ===
See [[Shiny|Shiny]].

=== [https://www.rplumber.io/ plumber]: Turning your R code into an API ===
* https://github.com/trestletech/plumber
* https://www.rstudio.com/resources/videos/plumber-turning-your-r-code-into-an-api/

=== Docker ===
* [https://blog.ouseful.info/2016/05/03/using-docker-as-a-personal-productvity-tool-running-command-line-apps/ Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers]
* [https://peerj.com/preprints/3181.pdf#page=8 Dockerized RStudio server] from Duke University. 110 containers were set up on a cloud server (4 cores, 28GB RAM, 400GB disk). Each container has its own port number. Each student is mapped to a single container. https://github.com/mccahill/docker-rstudio
* [http://sas-and-r.blogspot.com/2016/12/rstudio-in-cloud-with-amazon-lightsail.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+SASandR+%28SAS+and+R%29 RStudio in the cloud with Amazon Lightsail and docker]
* Mark McCahill (RStudio + Docker)
** http://sites.duke.edu/researchcomputing/files/2014/09/mccahill-DockerDays.pdf
** https://github.com/mccahill/docker-rstudio
** https://hub.docker.com/r/mccahill/rstudio/~/dockerfile/
* [https://github.com/Bioconductor-notebooks/BiocImageBuilder BiocImageBuilder]
** [https://github.com/Bioconductor-notebooks/Identification-of-Differentially-Expressed-Genes-for-Ectopic-Pregnancy/blob/master/CaseStudy1_EctopicPregnancy.ipynb Reproducible Bioconductor Workflow w/ browser-based interactive notebooks+Container].
** [http://biorxiv.org/content/early/2017/06/01/144816 Paper]
** Original [http://www.rna-seqblog.com/reproducible-bioconductor-workflows-using-browser-based-interactive-notebooks-and-containers/ post].
* [https://www.opencpu.org/posts/opencpu-with-docker/ Why Use Docker with R? A DevOps Perspective]

=== [http://cran.r-project.org/web/packages/httpuv/index.html httpuv] ===
http and WebSocket library.

See also the [https://cran.r-project.org/web/packages/servr/index.html servr] package, which can start an HTTP server in R to serve static files, or dynamic documents that can be converted to HTML files (e.g., R Markdown) under a given directory.

=== [http://rapache.net/ RApache] ===

=== [http://rpubs.com/gaston/colortools colortools] ===
Tools that allow users to generate color schemes and palettes

=== [https://github.com/daattali/colourpicker colourpicker] ===
A Colour Picker Tool for Shiny and for Selecting Colours in Plots

=== eyedroppeR ===
[http://gradientdescending.com/select-colours-from-an-image-in-r-with-eyedropper/ Select colours from an image in R with {eyedroppeR}]

== [https://github.com/kevinushey/rex rex] ==
Friendly Regular Expressions

== [http://cran.r-project.org/web/packages/formatR/index.html formatR] ==
'''The best strategy to avoid failure is to put comments in complete lines or after complete R expressions.'''
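A minimal sketch of the tip above. It assumes a recent formatR in which tidy_source() accepts code through its text argument and invisibly returns a list whose text.tidy element holds the reformatted lines (output = FALSE suppresses printing). The comment sits on its own line, so the snippet parses and gets tidied.
<syntaxhighlight lang='rsplus'>
if (requireNamespace("formatR", quietly = TRUE)) {
  messy <- c("x<-1",
             "# a comment on its own line is safe",
             "if(x>0){y<-x*2}")
  # tidy the code given as text instead of reading a file or the clipboard
  res <- formatR::tidy_source(text = messy, width.cutoff = 60, output = FALSE)
  tidy_txt <- paste(res$text.tidy, collapse = "\n")
  cat(tidy_txt, "\n")
}
</syntaxhighlight>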


See also [http://stackoverflow.com/questions/3017877/tool-to-auto-format-r-code this discussion] on Stack Overflow about reformatting R code.

<pre>
library(formatR)
tidy_source("Input.R", file = "output.R", width.cutoff = 70)

tidy_source("clipboard")
# default width is getOption("width") which is 127 in my case.
</pre>

Some issues
* Comments appearing at the beginning of a line within a long complete statement will break tidy_source().
<pre>
cat("abcd",
    # This is my comment
    "defg")
</pre>
will result in
<pre>
> tidy_source("clipboard")
Error in base::parse(text = code, srcfile = NULL) :
  3:1: unexpected string constant
2: invisible(".BeGiN_TiDy_IdEnTiFiEr_HaHaHa# This is my comment.HaHaHa_EnD_TiDy_IdEnTiFiEr")
3: "defg"
  ^
</pre>
* Comments appearing at the end of a line within a long complete statement ''won't break'' tidy_source(), but tidy_source() cannot relocate/tidy the comma.
<pre>
cat("abcd"
    ,"defg"  # This is my comment
  ,"ghij")
</pre>
will become
<pre>
cat("abcd", "defg"  # This is my comment
, "ghij")
</pre>
Still bad!!
* However, a comment at the end of a line within a long complete statement can also ''break'' tidy_source(). For example,
<pre>
cat("</p>",
"<HR SIZE=5 WIDTH=\"100%\" NOSHADE>",
ifelse(codeSurv == 0,"<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>", #4/9/09
                    "<h3><a name='Genes'><b><u>Genes significantly associated with survival:</u></b></a></h3>"),
file=ExternalFileName, sep="\n", append=T)
</pre>
will result in
<pre>
> tidy_source("clipboard", width.cutoff=70)
Error in base::parse(text = code, srcfile = NULL) :
  3:129: unexpected SPECIAL
2: "<HR SIZE=5 WIDTH=\"100%\" NOSHADE>" ,
3: ifelse ( codeSurv == 0 , "<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>" , %InLiNe_IdEnTiFiEr%
</pre>
* The ''width.cutoff'' parameter does not always work. For example, there is no change at all for the following snippet, though I hoped it would move the cat() to the next line.
<pre>
if (codePF & !GlobalTest & !DoExactPermTest) cat(paste("Multivariate Permutations test was computed based on",
    NumPermutations, "random permutations"), "<BR>", " ", file = ExternalFileName,
    sep = "\n", append = T)
</pre>
* It merges lines, though I don't always want that. For example
<pre>
cat("abcd"
    ,"defg" 
  ,"ghij")
</pre>
will become
<pre>
cat("abcd", "defg", "ghij")  
</pre>

=== [http://cran.r-project.org/web/packages/gWidgetsWWW/index.html gWidgetsWWW] ===
* http://www.jstatsoft.org/v49/i10/paper
* [https://github.com/jverzani/gWidgetsWWW2 gWidgetsWWW2]: gWidgetsWWW based on Rook
* [http://www.r-statistics.com/2012/11/comparing-shiny-with-gwidgetswww2-rapache/ Compare shiny with gWidgetsWWW2.rapache]

=== [http://cran.r-project.org/web/packages/Rook/index.html Rook] ===
Since R 2.13, the internal web server has been exposed.

[https://docs.google.com/present/view?id=0AUTe_sntp1JtZGdnbjVicTlfMzFuZDQ5dmJxNw Tutorial from useR2012] and [https://github.com/rstats/RookTutorial Jeffrey Horner]

Here is another [http://www.rinfinance.com/agenda/2011/JeffHorner.pdf one] from http://www.rinfinance.com.

Rook is also supported by rApache. See http://rapache.net/manual.html.

Google group. https://groups.google.com/forum/?fromgroups#!forum/rrook

Advantages
* The web applications are created on the desktop, whether it is Windows, Mac or Linux.
* No Apache is needed.
* We can create [http://jeffreyhorner.tumblr.com/post/4723187316/introducing-rook multiple applications] at the same time. This addresses a limitation of rApache.

----

4 lines of code [http://jeffreybreen.wordpress.com/2011/04/25/4-lines-of-r-to-get-you-started-using-the-rook-web-server-interface/ example].
<pre>
library(Rook)
s <- Rhttpd$new()
s$start(quiet=TRUE)
s$print()
s$browse(1)  # OR s$browse("RookTest")
</pre>
Notice that after the s$browse() command, the cursor will return to R because the command is just a shortcut to open the web page http://127.0.0.1:10215/custom/RookTest.

[[File:Rook.png|100px]]
[[File:Rook2.png|100px]]
[[File:Rookapprnorm.png|100px]]

We can add Rook '''applications''' to the server; see ?Rhttpd.
<pre>
s$add(
     app=system.file('exampleApps/helloworld.R',package='Rook'),name='hello'
)
s$add(
    app=system.file('exampleApps/helloworldref.R',package='Rook'),name='helloref'
)
s$add(
    app=system.file('exampleApps/summary.R',package='Rook'),name='summary'
)

s$print()

#Server started on 127.0.0.1:10221
#[1] RookTest http://127.0.0.1:10221/custom/RookTest
#[2] helloref http://127.0.0.1:10221/custom/helloref
#[3] summary  http://127.0.0.1:10221/custom/summary
#[4] hello    http://127.0.0.1:10221/custom/hello

#  Stops the server but doesn't uninstall the app
## Not run:
s$stop()

## End(Not run)
s$remove(all=TRUE)
rm(s)
</pre>
For example, the interface and the source code of the ''summary'' app are given below
[[File:Rookappsummary.png|100px]]
<nowiki>
app <- function(env) {
    req <- Rook::Request$new(env)
    res <- Rook::Response$new()
    res$write('Choose a CSV file:\n')
    res$write('<form method="POST" enctype="multipart/form-data">\n')
    res$write('<input type="file" name="data">\n')
    res$write('<input type="submit" name="Upload">\n</form>\n<br>')
    if (!is.null(req$POST())){
        data <- req$POST()[['data']]
        res$write("<h3>Summary of Data</h3>")
        res$write("<pre>")
        res$write(paste(capture.output(summary(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
        res$write("</pre>")
        res$write("<h3>First few lines (head())</h3>")
        res$write("<pre>")
        res$write(paste(capture.output(head(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
        res$write("</pre>")
    }
    res$finish()
}
</nowiki>
More examples:
* http://lamages.blogspot.com/2012/08/rook-rocks-example-with-googlevis.html
* [http://www.road2stat.com/cn/r/rook.html Self-organizing map]
* Deploy Rook apps with rApache. [http://jeffreyhorner.tumblr.com/post/27861973339/deploy-rook-apps-with-rapache-part-i Part one] and [http://jeffreyhorner.tumblr.com/post/33814488298/deploy-rook-apps-part-ii part two].
* [https://rud.is/b/2016/07/05/a-simple-prediction-web-service-using-the-new-firery-package/ A Simple Prediction Web Service Using the New fiery Package]


=== [https://code.google.com/p/sumo/ sumo] ===
Sumo is a fully-functional web application template that exposes an authenticated user's R session within Java server pages. See the paper http://journal.r-project.org/archive/2012-1/RJournal_2012-1_Bergsma+Smith.pdf.

=== [http://www.stat.ucla.edu/~jeroen/stockplot Stockplot] ===

=== [http://www.rforge.net/FastRWeb/ FastRWeb] ===
http://cran.r-project.org/web/packages/FastRWeb/index.html

=== [http://sysbio.mrc-bsu.cam.ac.uk/Rwui/tutorial/Instructions.html Rwui] ===

=== [http://cran.r-project.org/web/packages/CGIwithR/index.html CGIwithR] and [http://cran.r-project.org/web/packages/WebDevelopR/ WebDevelopR] ===
CGIwithR still works with older versions of R although it has been removed from CRAN. Its successor is WebDevelopR, whose vignette (2013) provides a review of several available methods.

=== [http://www.rstudio.com/ide/docs/advanced/manipulate manipulate] from RStudio ===
This is not a web application, but the '''manipulate''' package can be used to create interactive plots within the R(Studio) environment easily. Its source is available [https://github.com/rstudio/rstudio/tree/master/src/cpp/r/R/packages/manipulate here].

Mathematica also has a Manipulate function for plotting; see [http://reference.wolfram.com/mathematica/tutorial/IntroductionToManipulate.html here].

== styler ==
https://cran.r-project.org/web/packages/styler/index.html Pretty-prints R code without changing the user's formatting intent.

== Download papers ==
=== [http://cran.r-project.org/web/packages/biorxivr/index.html biorxivr] ===
Search and Download Papers from the bioRxiv Preprint Server (biology)

=== [http://cran.r-project.org/web/packages/aRxiv/index.html aRxiv] ===
Interface to the arXiv API

=== [https://cran.r-project.org/web/packages/pdftools/index.html pdftools] ===
* http://ropensci.org/blog/2016/03/01/pdftools-and-jeroen
* http://r-posts.com/how-to-extract-data-from-a-pdf-file-with-r/
* https://ropensci.org/technotes/2018/12/14/pdftools-20/

== [https://github.com/ColinFay/aside aside]: set it aside ==
An RStudio addin to run long R commands aside from your current session.

== Teaching ==
* [https://cran.r-project.org/web/packages/smovie/vignettes/smovie-vignette.html smovie]: Some Movies to Illustrate Concepts in Statistics
== Organize R research project ==
* [https://cran.r-project.org/web/views/ReproducibleResearch.html CRAN Task View: Reproducible Research]
* [https://ntguardian.wordpress.com/2019/02/04/organizing-r-research-projects-cpat-case-study/ Organizing R Research Projects: CPAT, A Case Study]
* [https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ Project-oriented workflow]. It suggests the [https://github.com/r-lib/here here] package. Don't use '''setwd()''' and '''rm(list = ls())'''.
** [https://rstats.wtf/safe-paths.html Practice safe paths]. Use projects and the [https://cran.r-project.org/web/packages/here/index.html here] package.
** In RStudio, if we try to send a few lines of code and one of the line contains '''setwd()''', it will give a message: ''The working directory was changed to XXX inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.''
** [http://jenrichmond.rbind.io/post/how-to-use-the-here-package/ how to use the `here` package]
** No update for the ''here'' package after 2020-12. Consider [https://github.com/r-lib/usethis usethis] package (Automate project and package setup).
* drake project
** [https://ropensci.org/blog/2018/02/06/drake/ The prequel to the drake R package]
** [https://ropenscilabs.github.io/drake-manual/index.html The drake R Package User Manual]
* [https://docs.ropensci.org/targets/ targets] package
* [http://projecttemplate.net/ ProjectTemplate]
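A minimal sketch of the project-oriented workflow above. It assumes the here package is installed and that the script runs inside a project directory (for example one containing an .Rproj file or a .git folder); if no such marker is found, here falls back to the starting working directory. The data/raw.csv path is hypothetical.
<syntaxhighlight lang='rsplus'>
if (requireNamespace("here", quietly = TRUE)) {
  root <- here::here()                       # project root found by here
  csv_path <- here::here("data", "raw.csv")  # hypothetical file under <root>/data/
  print(root)
  print(csv_path)
  # read.csv(csv_path)  # same path from any subdirectory; no setwd() needed
}
</syntaxhighlight>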


=== [https://github.com/att/rcloud RCloud] ===
RCloud is an environment for collaboratively creating and sharing data analysis scripts. RCloud lets you mix analysis code in R, HTML5, Markdown, Python, and others. Much like Sage, iPython notebooks and Mathematica, RCloud provides a notebook interface that lets you easily record a session and annotate it with text, equations, and supporting images.

See also the [http://user2014.stat.ucla.edu/abstracts/talks/193_Harner.pdf Talk] at useR! 2014.

=== How to save (and load) datasets in R (.RData vs .Rds file) ===
[https://rcrastinate.rbind.io/post/how-to-save-and-load-data-in-r-an-overview/ How to save (and load) datasets in R: An overview]
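A small sketch of the practical difference between the two formats, using only base R. The data frame and the temporary file names are made up for illustration. An .rds file stores one object and you choose its name when reading it back. An .RData file stores several objects and restores them under their original names.
<syntaxhighlight lang='rsplus'>
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# .rds: a single object; the name is chosen on reading
rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)
df2 <- readRDS(rds_file)

# .RData: one or more objects, restored under their original names
rdata_file <- tempfile(fileext = ".RData")
x <- letters[1:3]
save(df, x, file = rdata_file)
rm(df, x)
loaded <- load(rdata_file)   # load() returns the names of the restored objects
print(loaded)
</syntaxhighlight>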
=== Naming convention ===
<ul>
<li>[https://stackoverflow.com/a/1946879 What is your preferred style for naming variables in R?]
* Use of period separator: they can get mixed up in simple method dispatch. However, it is used by base R ([https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/make.names make.names()], read.table(), et al)
* Use of underscores: really annoying for ESS users
* '''camelCase''': Winner
<li>However, the [https://stackoverflow.com/a/13413278 survey] said (no surprises perhaps) that
* '''lowerCamelCase''' was most often used for function names,
* '''period.separated''' names most often used for parameters.
<li>[https://datamanagement.hms.harvard.edu/collect/file-naming-conventions What are file naming conventions?]
<li>[https://www.r-bloggers.com/2014/07/consistent-naming-conventions-in-r/ Consistent naming conventions in R]
<li>http://adv-r.had.co.nz/Style.html
<li>[https://www.r-bloggers.com/2011/07/testing-for-valid-variable-names/ Testing for valid variable names]
<li>R reserved words ?Reserved
* [https://www.datamentor.io/r-programming/reserved-words/ R Reserved Words]
* Among these words, if, else, repeat, while, function, for, '''in''', next and break are used for conditions, loops and user defined functions.
<li>Microarray/RNA-seq data
<pre>
clinicalDesignData  # clnDesignData
geneExpressionData  # gExpData
geneAnnotationData  # gAnnoData

dataClinicalDesign
dataGeneExpression
dataAnnotation
</pre>
<pre>
# Search all variables ending with .Data
ls(pattern = "\\.Data$")
# Search all variables starting with data_
ls(pattern = "^data_")
</pre>
</ul>

=== Efficient Data Management in R ===
[https://www.mzes.uni-mannheim.de/socialsciencedatalab/article/efficient-data-r/ Efficient Data Management in R]. .Rprofile, renv package and dplyr package.

== Text to speech ==
[https://shirinsplayground.netlify.com/2018/06/googlelanguager/ Text-to-Speech with the googleLanguageR package]

== Speech to text ==
https://github.com/ggerganov/whisper.cpp and an R package [https://github.com/bnosac/audio.whisper audio.whisper]

== Weather data ==
* [https://github.com/ropensci/prism prism] package
* [http://www.weatherbase.com/weather/weather.php3?s=507781&cityname=Rockville-Maryland-United-States-of-America Weatherbase]

=== Dropbox access ===
[https://cran.r-project.org/web/packages/rdrop2/index.html rdrop2] package

=== Web page scraping ===
http://www.slideshare.net/schamber/web-data-from-r#btnNext

==== [https://cran.r-project.org/web/packages/xml2/ xml2] package ====
The rvest package depends on xml2.

==== [https://cran.r-project.org/web/packages/purrr/index.html purrr] ====
* https://purrr.tidyverse.org/
* [http://data.library.virginia.edu/getting-started-with-the-purrr-package-in-r/ Getting started with the purrr package in R], especially the [https://www.rdocumentation.org/packages/purrr/versions/0.2.5/topics/map map()] function.

==== [https://cran.r-project.org/web/packages/rvest/index.html rvest] ====
[http://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/ Easy web scraping with R]


On Ubuntu, we need to install two packages first!
<syntaxhighlight lang='bash'>
sudo apt-get install libcurl4-openssl-dev # OR libcurl4-gnutls-dev
sudo apt-get install libxml2-dev
</syntaxhighlight>

* https://github.com/hadley/rvest
* [http://datascienceplus.com/visualizing-obesity-across-united-states-by-using-data-from-wikipedia/ Visualizing obesity across United States by using data from Wikipedia]
* [https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/ rvest tutorial: scraping the web using R]
* https://renkun.me/pipeR-tutorial/Examples/rvest.html
* http://zevross.com/blog/2015/05/19/scrape-website-data-with-the-new-r-package-rvest/
* [https://datascienceplus.com/google-scholar-scraping-with-rvest/ Google scholar scraping with rvest package]

== logR ==
https://github.com/jangorecki/logR

== Progress bar ==
https://github.com/r-lib/progress#readme

Configurable progress bars; they may include the percentage, elapsed time, and/or the estimated completion time. They work in terminals, in 'Emacs' 'ESS', 'RStudio', 'Windows' 'Rgui' and 'macOS'.
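A minimal sketch of the progress package linked above. It assumes the package is installed. `Sys.sleep()` stands in for real work, and the format string uses the `:bar`, `:percent` and `:eta` tokens. In base R, utils::txtProgressBar() is a simpler alternative.
<syntaxhighlight lang='rsplus'>
if (requireNamespace("progress", quietly = TRUE)) {
  n <- 20
  pb <- progress::progress_bar$new(
    format = "  processing [:bar] :percent eta: :eta",
    total = n, clear = FALSE, width = 60
  )
  for (i in seq_len(n)) {
    Sys.sleep(0.05)  # stand-in for real work
    pb$tick()        # advance the bar by one step
  }
}
</syntaxhighlight>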


== cron ==
* [https://github.com/bnosac/cronr cronR]
* [https://mathewanalytics.com/building-a-simple-pipeline-in-r/ Building a Simple Pipeline in R]

== beepr: Play A Short Sound ==
https://www.rdocumentation.org/packages/beepr/versions/1.3/topics/beep. Try sound=3 "fanfare", 4 "complete", 5 "treasure", 7 "shotgun", 8 "mario".

== utils package ==
https://www.rdocumentation.org/packages/utils/versions/3.6.2

== tools package ==
* https://www.rdocumentation.org/packages/tools/versions/3.6.2
* [https://www.r-bloggers.com/2023/08/three-four-r-functions-i-enjoyed-this-week/ Where in the file are there non ASCII characters?], [https://rdocumentation.org/packages/tools/versions/3.6.2/topics/showNonASCII tools::showNonASCIIfile(<filename>)]

==== Animate ====
* [https://guyabel.com/post/football-kits/ Animating Changes in Football Kits using R]: rvest, tidyverse, xml2, purrr & magick
* [https://guyabel.com/post/animated-directional-chord-diagrams/ Animated Directional Chord Diagrams]: tweenr & magick

==== [https://cran.r-project.org/web/packages/V8/index.html V8]: Embedded JavaScript Engine for R ====
[https://rud.is/b/2017/07/25/r%E2%81%B6-general-attys-distributions/ R⁶ — General (Attys) Distributions]: V8, rvest, ggbeeswarm, hrbrthemes and tidyverse packages are used.

==== [http://cran.r-project.org/web/packages/pubmed.mineR/index.html pubmed.mineR] ====
Text mining of PubMed Abstracts (http://www.ncbi.nlm.nih.gov/pubmed). The algorithms are designed for two formats (text and XML) from PubMed.

[https://github.com/jtleek/swfdr R code for scraping the P-values from pubmed, calculating the Science-wise False Discovery Rate, et al] (Jeff Leek)

=== Diving Into Dynamic Website Content with splashr ===
https://rud.is/b/2017/02/09/diving-into-dynamic-website-content-with-splashr/

=== Send email ===
==== [https://github.com/rpremraj/mailR/ mailR] ====
Easiest. Requires the rJava package (not trivial to install, see [[#RJava|rJava]]). mailR is an interface to Apache Commons Email to send emails from within R. See also [http://unamatematicaseltigre.blogspot.com/2016/12/how-to-send-bulk-email-to-your-students.html send bulk email].

Before using the mailR package, we followed [https://support.google.com/accounts/answer/6010255?hl=en here] to set '''Allow less secure apps: 'ON' '''; otherwise we might get the error ''Error: EmailException (Java): Sending the email to the following server failed : smtp.gmail.com:465''. Once we turn on this option, we may get an email notifying us of the change. Note that the recipient does not have to be a Gmail address.
<syntaxhighlight lang='rsplus'>
# Placeholder addresses and credentials; replace with your own
> send.mail(from = "sender@gmail.com",
          to = c("recipient1@gmail.com", "Recipient 2 <recipient2@gmail.com>"),
          replyTo = c("Reply to someone else <someone.else@gmail.com>"),
          subject = "Subject of the email",
          body = "Body of the email",
          smtp = list(host.name = "smtp.gmail.com", port = 465, user.name = "gmail_username", passwd = "password", ssl = TRUE),
          authenticate = TRUE,
          send = TRUE)
[1] "Java-Object{org.apache.commons.mail.SimpleEmail@7791a895}"
</syntaxhighlight>

==== [https://cran.r-project.org/web/packages/gmailr/index.html gmailr] ====
More complicated. gmailr provides access to Google's Gmail RESTful API. [https://cran.r-project.org/web/packages/gmailr/vignettes/sending_messages.html Vignette] and an example [http://stackoverflow.com/questions/30144876/send-html-message-using-gmailr here]. Note that it does not use a password; it uses a '''json''' file for OAuth authentication downloaded from https://console.cloud.google.com/. See also https://github.com/jimhester/gmailr/issues/1.
<syntaxhighlight lang='rsplus'>
library(gmailr)
gmail_auth('mysecret.json', scope = 'compose')

test_email <- mime() %>%
  to("to@gmail.com") %>%
  from("from@gmail.com") %>%
  subject("This is a subject") %>%
  html_body("<html><body>I wish <b>this</b> was bold</body></html>")
send_message(test_email)
</syntaxhighlight>

==== [https://cran.r-project.org/web/packages/sendmailR/index.html sendmailR] ====
sendmailR provides a simple SMTP client. It is not clear how to use the package (i.e. where to enter the password).

= Different ways of using R =
[https://www.amazon.com/Extending-Chapman-Hall-John-Chambers/dp/1498775713 Extending R] by John M. Chambers (2016)

== 10 things R can do that might surprise you ==
https://simplystatistics.org/2019/03/13/10-things-r-can-do-that-might-surprise-you/

== R call C/C++ ==
Mainly talks about .C() and .Call().

Note that with .C(), scalars and arrays must be passed as pointers. So if we want to access a function not exported from a package, we may need to modify the function so that its arguments are pointers.

* [http://cran.r-project.org/doc/manuals/R-exts.html R-Extension manual] of course.
* [http://r-pkgs.had.co.nz/src.html Compiled Code] chapter from 'R Packages' by Hadley Wickham
* http://faculty.washington.edu/kenrice/sisg-adv/sisg-07.pdf
* http://www.stat.berkeley.edu/scf/paciorek-cppWorkshop.pdf (Very useful)
* http://www.stat.harvard.edu/ccr2005/
* http://mazamascience.com/WorkingWithData/?p=1099
* [https://youtube.com/playlist?list=PLwc48KSH3D1OkObQ22NHbFwEzof2CguJJ Make an R package with C++ code] (a playlist from youtube)
* [https://working-with-data.mazamascience.com/2021/07/16/using-r-calling-c-code-hello-world/ Using R – Calling C code ‘Hello World!’]
* [http://www.haowulab.org//pages/computing.html Computing tip] by Hao Wu

=== .Call ===
* [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/CallExternal ?.Call]
* [http://mazamascience.com/WorkingWithData/?p=1099 Using R — .Call(“hello”)]
* http://adv-r.had.co.nz/C-interface.html
* [https://working-with-data.mazamascience.com/2021/07/16/using-r-callhello/ Using R – .Call(“hello”)]
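A minimal, self-contained sketch of the .Call() interface. It assumes a working compiler toolchain (Rtools on Windows) so that `R CMD SHLIB` can build a shared library. The C function add_one() and the temporary file names are hypothetical. PACKAGE is set to the name under which dyn.load() registers the library.
<syntaxhighlight lang='rsplus'>
# C source using the R API: take a numeric scalar, return it plus one
c_code <- c(
  "#include <R.h>",
  "#include <Rinternals.h>",
  "SEXP add_one(SEXP x) {",
  "  return ScalarReal(asReal(x) + 1.0);",
  "}"
)
src <- tempfile(pattern = "addone", fileext = ".c")
writeLines(c_code, src)
dll <- sub("\\.c$", .Platform$dynlib.ext, src)

# Build the shared library (same as running 'R CMD SHLIB -o <dll> <src>' in a shell)
r_bin <- shQuote(file.path(R.home("bin"), "R"))
status <- system2(r_bin, c("CMD", "SHLIB", "-o", shQuote(dll), shQuote(src)))
stopifnot(status == 0)

dyn.load(dll)
dll_name <- tools::file_path_sans_ext(basename(dll))  # name registered by dyn.load()
res <- .Call("add_one", 41, PACKAGE = dll_name)
print(res)
dyn.unload(dll)
</syntaxhighlight>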


=== [http://www.ncbi.nlm.nih.gov/geo/ GEO (Gene Expression Omnibus)] ===
Be sure to add the ''PACKAGE'' parameter to avoid an error like
See [[GEO#R_packages|this internal link]].
<pre>
cvfit <- cv.grpsurvOverlap(X, Surv(time, event), group,
                            cv.ind = cv.ind, seed=1, penalty = 'cMCP')
Error in .Call("standardize", X) :
  "standardize" not resolved from current namespace (grpreg)
</pre>


=== Interactive html output ===
==== [http://cran.r-project.org/web/packages/sendplot/index.html sendplot] ====
==== [http://cran.r-project.org/web/packages/RIGHT/index.html RIGHT] ====
The supported plot types include scatterplot, barplot, box plot, line plot and pie plot.

In addition to tooltip boxes, the package can create a [http://righthelp.github.io/tutorial/interactivity table showing all information about selected nodes].


=== NAMESPACE file & useDynLib ===
* https://cran.r-project.org/doc/manuals/r-release/R-exts.html#useDynLib
* We don't need to put double quotes around the names of C/Fortran subroutines in .C() or .Fortran() when those routines are listed in useDynLib(), because R objects with the same names are then created in the namespace.
* digest package example: [https://github.com/cran/digest/blob/master/NAMESPACE NAMESPACE] and [https://github.com/cran/digest/blob/master/R/digest.R R functions] using .Call().
* stats example: [https://github.com/wch/r-source/blob/trunk/src/library/stats/NAMESPACE NAMESPACE]

(From [https://cran.r-project.org/doc/manuals/r-release/R-exts.html#dyn_002eload-and-dyn_002eunload Writing R Extensions manual]) Loading is most often done automatically based on the '''useDynLib()''' declaration in the '''NAMESPACE''' file, but may be done explicitly via a call to '''library.dynam()'''. This has the form
{{Pre}}
library.dynam("libname", package, lib.loc)
</pre>


=== library.dynam.unload() ===
* https://stat.ethz.ch/R-manual/R-devel/library/base/html/library.dynam.html
* http://r-pkgs.had.co.nz/src.html. The '''library.dynam.unload()''' function should be called from the package's '''.onUnload()''' function, which can be defined in any R file of the package.
* digest package example [https://github.com/cran/digest/blob/master/R/zzz.R zzz.R]


=== gcc ===
[http://rorynolan.rbind.io/2019/06/30/strexgcc/ Coping with varying `gcc` versions and capabilities in R packages]


=== Primitive functions ===
[https://nathaneastwood.github.io/2020/02/01/primitive-functions-list/ Primitive Functions List]


==== [http://cran.r-project.org/web/packages/d3Network/index.html d3Network] ====
* http://christophergandrud.github.io/d3Network/ (old)
* https://christophergandrud.github.io/networkD3/ (new)
<source lang="rsplus">
library(d3Network)

Source <- c("A", "A", "A", "A", "B", "B", "C", "C", "D")
Target <- c("B", "C", "D", "J", "E", "F", "G", "H", "I")
NetworkData <- data.frame(Source, Target)

d3SimpleNetwork(NetworkData, height = 800, width = 1024, file="tmp.html")
</source>


==== [http://cran.r-project.org/web/packages/htmlwidgets/ htmlwidgets for R] ====
Embed widgets in R Markdown documents and Shiny web applications.

* Official website http://www.htmlwidgets.org/.
* [http://deanattali.com/blog/htmlwidgets-tips/ How to write a useful htmlwidgets in R: tips and walk-through a real example]


==== [http://cran.r-project.org/web/packages/networkD3/index.html networkD3] ====
This is a port of Christopher Gandrud's [http://christophergandrud.github.io/d3Network/ d3Network] package to the htmlwidgets framework.


==== [http://cran.r-project.org/web/packages/scatterD3/index.html scatterD3] ====
scatterD3 is an HTML R widget for interactive scatter plot visualization. It is based on the htmlwidgets R package and on the d3.js JavaScript library.


==== [https://github.com/bwlewis/rthreejs rthreejs] - Create interactive 3D scatter plots, network plots, and globes ====
[http://bwlewis.github.io/rthreejs/ Examples]


==== [http://blog.rstudio.org/2015/06/24/d3heatmap/ d3heatmap] ====
A package that generates interactive heatmaps using d3.js and htmlwidgets. The following screenshots show three features:
* Shows the row/column/value under the mouse cursor
* Zooms in on a region (clicking on the zoomed-in image brings back the original heatmap)
* Highlights a row or a column (clicking the label of another row highlights that row; clicking the same label again brings back the original image)

[[File:D3heatmap mouseover.png|200px]] [[File:D3heatmap zoomin.png|200px]] [[File:D3heatmap highlight.png|200px]]


==== [https://cran.r-project.org/web/packages/svgPanZoom/index.html svgPanZoom] ====
This 'htmlwidget' provides pan and zoom interactivity to R graphics, including 'base', 'lattice', and 'ggplot2'. The interactivity is provided through the 'svg-pan-zoom.js' library.


==== DT: An R interface to the DataTables library ====
* http://blog.rstudio.org/2015/06/24/dt-an-r-interface-to-the-datatables-library/


==== plotly ====
* [http://moderndata.plot.ly/power-curves-r-plotly-ggplot2/ Power curves] and ggplot2.
* [http://moderndata.plot.ly/time-series-charts-by-the-economist-in-r-using-plotly/ TIME SERIES CHARTS BY THE ECONOMIST IN R USING PLOTLY] & [https://moderndata.plot.ly/interactive-r-visualizations-with-d3-ggplot2-rstudio/ FIVE INTERACTIVE R VISUALIZATIONS WITH D3, GGPLOT2, & RSTUDIO]
* [http://moderndata.plot.ly/filled-chord-diagram-in-r-using-plotly/ Filled chord diagram]
* [https://moderndata.plot.ly/dashboards-in-r-with-shiny-plotly/ DASHBOARDS IN R WITH SHINY & PLOTLY]
* [https://plot.ly/r/shiny-tutorial/ Plotly Graphs in Shiny]
** [https://plot.ly/r/shiny-gallery/ Gallery]
** [https://plot.ly/r/shinyapp-UN-simple/ Single time series]
** [https://plot.ly/r/shinyapp-UN-advanced/ Multiple time series]
* [https://www.r-exercises.com/2017/09/28/how-to-plot-basic-charts-with-plotly/ How to plot basic charts with plotly]
* [https://www.displayr.com/how-to-add-trend-lines-in-r-using-plotly/?utm_medium=Feed&utm_source=Syndication How to add Trend Lines in R Using Plotly]


=== Amazon ===
[https://github.com/56north/Rmazon Download product information and reviews from Amazon.com]
<syntaxhighlight lang='bash'>
sudo apt-get install libxml2-dev
sudo apt-get install libcurl4-openssl-dev
</syntaxhighlight>
and in R
<syntaxhighlight lang='rsplus'>
install.packages("devtools")
install.packages("XML")
install.packages("pbapply")
install.packages("dplyr")
devtools::install_github("56north/Rmazon")
product_info <- Rmazon::get_product_info("1593273843")
reviews <- Rmazon::get_reviews("1593273843")
reviews[1,6] # only show partial characters from the 1st review
nchar(reviews[1,6])
as.character(reviews[1,6]) # show the complete text from the 1st review
</syntaxhighlight>


=== [https://cran.r-project.org/web/packages/gutenbergr/index.html gutenbergr] ===
[https://blog.jumpingrivers.com/posts/2018/tidytext_edinbr_2018/ Edinbr: Text Mining with R]


=== Twitter ===
[http://www.masalmon.eu/2017/03/19/facesofr/ Faces of #rstats Twitter]


== SEXP ==
Some examples from packages

* [https://www.bioconductor.org/packages/release/bioc/html/sva.html sva] package has one C code function


== R call Fortran ==
* [https://stat.ethz.ch/pipermail/r-devel/2015-March/070851.html R call Fortran 90]
* [https://www.r-bloggers.com/the-need-for-speed-part-1-building-an-r-package-with-fortran-or-c/ The Need for Speed Part 1: Building an R Package with Fortran (or C)] (Very detailed)


== Embedding R ==
* See [http://cran.r-project.org/doc/manuals/R-exts.html#Linking-GUIs-and-other-front_002dends-to-R Writing R Extensions] Manual Chapter 8.
* [http://www.ci.tuwien.ac.at/Conferences/useR-2004/abstracts/supplements/Urbanek.pdf Talk by Simon Urbanek] in UseR 2004.
* [http://epub.ub.uni-muenchen.de/2085/1/tr012.pdf Technical report] by Friedrich Leisch in 2007.
* https://stat.ethz.ch/pipermail/r-help/attachments/20110729/b7d86ed7/attachment.pl


=== An very simple example (do not return from shell) from Writing R Extensions manual ===
The command-line R front-end, R_HOME/bin/exec/R, is one such example. Its source code is in file <src/main/Rmain.c>.

This example can be run by
<pre>R_HOME/bin/R CMD R_HOME/bin/exec/R</pre>

Note:
# '''R_HOME/bin/exec/R''' is the R binary. However, it cannot be launched directly unless R_HOME and LD_LIBRARY_PATH are set up. Again, this is explained in the Writing R Extensions manual.
# '''R_HOME/bin/R''' is a shell-script front-end that users invoke. It sets up the environment for the executable and can be copied to ''/usr/local/bin/R''. When we run ''R_HOME/bin/R'', it actually runs ''R_HOME/bin/R CMD R_HOME/bin/exec/R'' (see line 259 of ''R_HOME/bin/R'' as in R 3.0.2), which shows the central role of ''R_HOME/bin/exec/R''.

More examples of embedding can be found in the ''tests/Embedding'' directory. Read <index.html> for more information about these test examples.


=== An example from Bioconductor workshop ===
* What is covered in this section is different from [[R#Create_a_standalone_Rmath_library|Create and use a standalone Rmath library]].
* Use the eval() function. See R-Ext [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Unix_002dalikes 8.1] and [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Windows 8.2] and [http://cran.r-project.org/doc/manuals/R-exts.html#Evaluating-R-expressions-from-C 5.11].
* http://stackoverflow.com/questions/2463437/r-from-c-simplest-possible-helloworld (obtained from searching R_tryEval on google)
* http://stackoverflow.com/questions/7457635/calling-r-function-from-c

Example:

Create [https://gist.github.com/arraytools/7d32d92fee88ffc029365d178bc09e75#file-embed-c embed.c] file.
Then build the executable. Note that I don't need to set the R_HOME variable.
<pre>
cd
tar xzvf
cd R-3.0.1
./configure --enable-R-shlib
make
cd tests/Embedding
make
~/R-3.0.1/bin/R CMD ./Rtest

nano embed.c
# Using a single line will give an error and does not show the real problem.
# ../../bin/R CMD gcc -I../../include -L../../lib -lR embed.c
# A better way is to run compile and link separately
gcc -I../../include -c embed.c
gcc -o embed embed.o -L../../lib -lR -lRblas
../../bin/R CMD ./embed
</pre>

Note that if we want to call the executable file ./embed directly, we must set up the R environment by specifying the '''R_HOME''' variable and including the directories used in linking R in '''LD_LIBRARY_PATH'''. This is based on the information provided by [http://cran.r-project.org/doc/manuals/r-devel/R-exts.html Writing R Extensions].
<pre>
export R_HOME=/home/brb/Downloads/R-3.0.2
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib
./embed # No need to include R CMD in front.
</pre>
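The two exports above can be wrapped in a small helper, so that R_HOME and LD_LIBRARY_PATH apply only to the embedded program and not to the rest of the shell session. This is a minimal sketch, not a tool from R itself. It assumes R was built with <code>--enable-R-shlib</code>, so that libR.so (and libRblas.so) are in R_HOME/lib. The function name <code>run_with_r_env</code> is hypothetical. Unlike the exports above, it prepends R's lib directory. It also avoids the empty path entry that <code>$LD_LIBRARY_PATH:...</code> produces when the variable is unset, because the dynamic loader treats an empty entry as the current directory.

<syntaxhighlight lang='bash'>
#!/usr/bin/env bash
# Hypothetical helper: run a program that embeds R (e.g. ./embed) without "R CMD".
# The function body is a subshell "( ... )", so the exports below do not
# leak into the calling shell.
run_with_r_env() (
  r_home="$1"; shift
  export R_HOME="$r_home"
  # Prepend $R_HOME/lib; ${VAR:+...} appends ":<old value>" only when VAR is non-empty.
  export LD_LIBRARY_PATH="$R_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  exec "$@"
)

# Usage with the paths from this section:
#   run_with_r_env /home/brb/Downloads/R-3.0.2 ./embed
</syntaxhighlight>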


Question: How do we create a data frame in C? Answer: [https://stat.ethz.ch/pipermail/r-devel/2013-August/067107.html Use data.frame() via an eval() call from C]. Or see the code in stats/src/model.c, as part of model.frame.default. Or use Rcpp as shown [https://stat.ethz.ch/pipermail/r-devel/2013-August/067109.html here].


=== Create a Simple Socket Server in R ===
This example comes from this [http://epub.ub.uni-muenchen.de/2085/1/tr012.pdf paper].

Create an R function:
<pre>
simpleServer <- function(port = 6543)
{
  # blocking = TRUE makes readLines() wait for the client's next line
  # instead of returning character(0) when nothing has arrived yet
  sock <- socketConnection(port = port, server = TRUE, blocking = TRUE)
  on.exit(close(sock))
  cat("\nWelcome to R!\nR> ", file = sock)
  while ((line <- readLines(sock, n = 1)) != "quit")
  {
    cat(paste("socket >", line, "\n"))   # echo the request on the server console
    out <- capture.output(try(eval(parse(text = line))))
    writeLines(out, con = sock)
    cat("\nR> ", file = sock)
  }
}
</pre>
Then run simpleServer(). Open another terminal and try to communicate with the server
<pre>
$ telnet localhost 6543
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Welcome to R!
R> summary(iris[, 3:5])
  Petal.Length    Petal.Width          Species 
Min.  :1.000  Min.  :0.100  setosa    :50 
1st Qu.:1.600  1st Qu.:0.300  versicolor:50 
Median :4.350  Median :1.300  virginica :50 
Mean  :3.758  Mean  :1.199                 
3rd Qu.:5.100  3rd Qu.:1.800                 
Max.  :6.900  Max.  :2.500                 

R> quit
Connection closed by foreign host.
</pre>


=== OCR ===
[http://ropensci.org/blog/blog/2016/11/16/tesseract Tesseract package: High Quality OCR in R]


== Creating local repository for CRAN and Bioconductor (focus on Windows binary packages only) ==
Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf
=== How to set up a local repository ===
* CRAN specific: http://cran.r-project.org/mirror-howto.html
* Bioconductor specific: http://www.bioconductor.org/about/mirrors/mirror-how-to/
* [https://rstudio.github.io/packrat/custom-repos.html How to Set Up a Custom CRAN-like Repository]

General guide: http://cran.r-project.org/doc/manuals/R-admin.html#Setting-up-a-package-repository

Utilities such as install.packages can be pointed at any CRAN-style repository, and R users may want to set up their own. The ‘base’ of a repository is a URL such as http://www.omegahat.org/R/: this must be an URL scheme that download.packages supports (which also includes ‘ftp://’ and ‘file://’, but not on most systems ‘https://’). '''Under that base URL there should be directory trees for one or more of the following types of package distributions:'''

* "source": located at src/contrib and containing .tar.gz files. Other forms of compression can be used, e.g. .tar.bz2 or .tar.xz files.
* '''"win.binary": located at bin/windows/contrib/x.y for R versions x.y.z and containing .zip files for Windows.'''
* "mac.binary.leopard": located at bin/macosx/leopard/contrib/x.y for R versions x.y.z and containing .tgz files.

Each terminal directory must also contain a PACKAGES file. This can be a concatenation of the DESCRIPTION files of the packages separated by blank lines, but only a few of the fields are needed. The simplest way to set up such a file is to use function write_PACKAGES in the tools package, and its help explains which fields are needed. Optionally there can also be a PACKAGES.gz file, a gzip-compressed version of PACKAGES—as this will be downloaded in preference to PACKAGES it should be included for large repositories. (If you have a mis-configured server that does not report correctly non-existent files you will need PACKAGES.gz.)

To add your repository to the list offered by setRepositories(), see the help file for that function.

A repository can contain subdirectories, when the descriptions in the PACKAGES file of packages in subdirectories must include a line of the form

<nowiki>Path: path/to/subdirectory</nowiki>

—once again write_PACKAGES is the simplest way to set this up.

==== Space requirement if we want to mirror WHOLE repository ====
* The whole CRAN takes about 92GB (rsync -avn cran.r-project.org::CRAN > ~/Downloads/cran).
* Bioconductor is big (> 64G for BioC 2.11). Please check the size of what will be transferred with e.g. (rsync -avn bioconductor.org::2.11 > ~/Downloads/bioc) and make sure you have enough room on your local disk before you start.

On the other hand, if we only care about the Windows binary part, the space requirement is greatly reduced.
* CRAN: 2.7GB
* Bioconductor: 28GB.

==== Misc notes ====
* If the binary package was built on R 2.15.1, then it cannot be installed on R 2.15.2. But the reverse is OK.
* Remember to use the "--delete" option in rsync; otherwise old versions of packages will be installed.
* The repository still needs the src directory. If it is missing, we will get an error
<pre>
Warning: unable to access index for repository http://arraytools.no-ip.org/CRAN/src/contrib
Warning message:
package ‘glmnet’ is not available (for R version 2.15.2)  
</pre>
The error was given by the available.packages() function.

To bypass the requirement of the src directory, I can use
<pre>
install.packages("glmnet", contriburl = contrib.url(getOption('repos'), "win.binary"))
</pre>
but there may be a problem when we use the biocLite() command.

I found a workaround. Since the error comes from the missing CRAN/src directory, we just need to make sure the directory CRAN/src/contrib exists AND either CRAN/src/contrib/PACKAGES or CRAN/src/contrib/PACKAGES.gz exists.
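The repository layout described in this section can be prepared and checked from the shell. It consists of a src/contrib tree plus bin/windows/contrib/x.y, and each terminal directory holds a PACKAGES or PACKAGES.gz index. This is a minimal sketch. The helper names <code>make_repo_skeleton</code> and <code>missing_indexes</code> are hypothetical, and the check only tests whether an index file exists, not whether its contents are valid.

<syntaxhighlight lang='bash'>
#!/usr/bin/env bash
# Hypothetical helpers for a CRAN-style mirror of Windows binary packages.

# Create the directory trees for one R x.y version, e.g.
#   make_repo_skeleton ~/Rmirror/CRAN 2.15
make_repo_skeleton() {
  local repo="$1" rver="$2"
  mkdir -p "$repo/src/contrib" "$repo/bin/windows/contrib/$rver"
}

# Print every terminal directory that has neither PACKAGES nor PACKAGES.gz.
# Without an index, available.packages() warns
# "unable to access index for repository ...".
missing_indexes() {
  local repo="$1" dir
  for dir in "$repo/src/contrib" "$repo"/bin/windows/contrib/*/; do
    dir="${dir%/}"              # drop the trailing slash added by the glob
    [ -d "$dir" ] || continue   # skip an unmatched glob pattern
    if [ ! -f "$dir/PACKAGES" ] && [ ! -f "$dir/PACKAGES.gz" ]; then
      printf '%s\n' "$dir"
    fi
  done
}
</syntaxhighlight>

For example, <code>make_repo_skeleton ~/Rmirror/CRAN 2.15</code> followed by <code>missing_indexes ~/Rmirror/CRAN</code> lists the directories that still need an index. You can fill them either by copying PACKAGES/PACKAGES.gz from CRAN with rsync or by generating them with write_PACKAGES() from the tools package.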
==== To create CRAN repository ====
Before creating a local repository, do a dry run first. You don't want to be surprised by how long it takes to mirror a directory.

Dry run (-n option). Redirect the output to a text file for examination.
<pre>
rsync -avn cran.r-project.org::CRAN > crandryrun.txt
</pre>
To mirror only part of the repository, it is necessary to create the directories before running the rsync command.
<pre>
cd
mkdir -p ~/Rmirror/CRAN/bin/windows/contrib/2.15
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/windows/contrib/2.15/ ~/Rmirror/CRAN/bin/windows/contrib/2.15
# (the rsync command above is one line, with a space before ~/Rmirror)

# src directory is very large (~27GB) since it contains source code for each R version.
# We just need the files PACKAGES and PACKAGES.gz in CRAN/src/contrib. So I comment out the following line.
# rsync -rtlzv --delete cran.r-project.org::CRAN/src/ ~/Rmirror/CRAN/src/
mkdir -p ~/Rmirror/CRAN/src/contrib
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES.gz ~/Rmirror/CRAN/src/contrib/
</pre>
And optionally
<pre>
library(tools)
write_PACKAGES("~/Rmirror/CRAN/bin/windows/contrib/2.15", type="win.binary")
</pre>
and if we want to get the src directory
<pre>
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/*.tar.gz ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/2.15.3 ~/Rmirror/CRAN/src/contrib/
</pre>

We can use '''du -h''' to check the folder size.

For example (as of 1/7/2013),
<pre>
$ du -k ~/Rmirror --max-depth=1 --exclude ".*" | sort -nr | cut -f2 | xargs -d '\n' du -sh
30G /home/brb/Rmirror
28G /home/brb/Rmirror/Bioc
2.7G /home/brb/Rmirror/CRAN
</pre>


=== [http://www.rforge.net/Rserve/doc.html Rserve] ===
Note that Rserve is launched the same way as a C program that embeds R (i.e. via ''R CMD''). See [[R#An_example_from_Bioconductor_workshop|Example from Bioconductor workshop]].

See my [[Rserve]] page.


=== outsider ===
* [https://joss.theoj.org/papers/10.21105/joss.02038 outsider]: Install and run programs, outside of R, inside of R
* [https://github.com/stephenturner/om..bcftools Run bcftools with outsider in R]


=== (Commercial) [http://www.statconn.com/ StatconnDcom] ===


=== [http://rdotnet.codeplex.com/ R.NET] ===


=== [https://cran.r-project.org/web/packages/rJava/index.html rJava] ===
* [https://jozefhajnala.gitlab.io/r/r901-primer-java-from-r-1/ A primer in using Java from R - part 1]
* Note that rJava is needed by the [https://cran.r-project.org/web/packages/xlsx/index.html xlsx] package.

Terminal
{{Pre}}
# jdk 7
sudo apt-get install openjdk-7-*
update-alternatives --config java
# oracle jdk 8
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get -y install openjdk-8-jdk
</pre>
 
and then run the following (thanks to http://stackoverflow.com/questions/12872699/error-unable-to-load-installed-packages-just-now) to fix the error: libjvm.so: cannot open shared object file: No such file or directory.
* Create the file '''/etc/ld.so.conf.d/java.conf''' with the following entries:
<pre>
/usr/lib/jvm/java-8-oracle/jre/lib/amd64
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server
</pre>
* And then run '''sudo ldconfig'''

Now go back to R
{{Pre}}
install.packages("rJava")
</pre>
Done!

If the above does not work, a simple way (under Ubuntu) is to run
<pre>
sudo apt-get install r-cran-rjava
</pre>
which will also install the packages 'default-jre' (under '''/usr/lib/jvm''') and 'default-jre-headless'.


=== RCaller ===


==== To create Bioconductor repository ====
Dry run
<pre>
rsync -avn bioconductor.org::2.11 > biocdryrun.txt
</pre>
Then create the directories before running rsync.
<syntaxhighlight lang='bash'>
cd
mkdir -p ~/Rmirror/Bioc
wget -N http://www.bioconductor.org/biocLite.R -P ~/Rmirror/Bioc
</syntaxhighlight>
where '''-N''' (timestamping) makes wget download the file again only if the remote copy's size or timestamp has changed, and '''-P''' specifies an output directory, not a file name.

Optionally, we can add the following in order to see the Bioconductor front page.
<syntaxhighlight lang='bash'>
rsync -zrtlv --delete bioconductor.org::2.11/BiocViews.html ~/Rmirror/Bioc/packages/2.11/
rsync -zrtlv --delete bioconductor.org::2.11/index.html ~/Rmirror/Bioc/packages/2.11/
</syntaxhighlight>

The software part (aka bioc directory) installation:
<syntaxhighlight lang='bash'>
cd
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib
rsync -zrtlv --delete bioconductor.org::2.11/bioc/bin/windows/ ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
# Either rsync whole src directory or just essential files
# rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/ ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
# Optionally the html part
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/html
rsync -zrtlv --delete bioconductor.org::2.11/bioc/html/ ~/Rmirror/Bioc/packages/2.11/bioc/html
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/vignettes/ ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/news
rsync -zrtlv --delete bioconductor.org::2.11/bioc/news/ ~/Rmirror/Bioc/packages/2.11/bioc/news
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/licenses
rsync -zrtlv --delete bioconductor.org::2.11/bioc/licenses/ ~/Rmirror/Bioc/packages/2.11/bioc/licenses
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/manuals
rsync -zrtlv --delete bioconductor.org::2.11/bioc/manuals/ ~/Rmirror/Bioc/packages/2.11/bioc/manuals
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/readmes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/readmes/ ~/Rmirror/Bioc/packages/2.11/bioc/readmes
</syntaxhighlight>
and the annotation (aka data directory) part:
<syntaxhighlight lang='bash'>
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib
# one line for each of the following
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/bin/windows/ ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
</syntaxhighlight>
and the experiment directory:
<syntaxhighlight lang='bash'>
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
</syntaxhighlight>
and the extra directory:
<syntaxhighlight lang='bash'>
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/src/contrib
# one line for each of the following
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
</syntaxhighlight>
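The experiment and extra commands above differ only in the tree name and the sub-directory, so they can be generated in a loop. This is a minimal sketch that assumes the Bioconductor 2.11 / R 2.15 paths used above. The function <code>mirror_bioc_indexes</code> and the <code>RUN</code> variable are hypothetical. By default it only prints the commands (RUN=echo), which doubles as a dry run. Call it with an empty RUN to execute them.

<syntaxhighlight lang='bash'>
#!/usr/bin/env bash
# Hypothetical helper: fetch only the PACKAGES/PACKAGES.gz indexes of the
# Bioconductor "data/experiment" and "extra" trees (the "cheating" above).
#   $1 = Bioconductor version (e.g. 2.11), $2 = R x.y version (e.g. 2.15),
#   $3 = local mirror root (e.g. ~/Rmirror/Bioc)
# RUN defaults to "echo" (print only); use RUN= to execute the commands.
mirror_bioc_indexes() {
  local bioc="$1" rver="$2" dest="$3" run="${RUN-echo}"
  local tree sub f
  for tree in data/experiment extra; do
    for sub in "bin/windows/contrib/$rver" src/contrib; do
      $run mkdir -p "$dest/packages/$bioc/$tree/$sub"
      for f in PACKAGES PACKAGES.gz; do
        $run rsync -zrtlv --delete "bioconductor.org::$bioc/$tree/$sub/$f" \
          "$dest/packages/$bioc/$tree/$sub/"
      done
    done
  done
}

# Print the 12 commands first, then run them:
#   mirror_bioc_indexes 2.11 2.15 ~/Rmirror/Bioc
#   RUN= mirror_bioc_indexes 2.11 2.15 ~/Rmirror/Bioc
</syntaxhighlight>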


==== sync Bioconductor software packages ====
To keep a copy of the bioc/source (software packages) code only,
<syntaxhighlight lang='bash'>
$ mkdir -p ~/bioc_release/bioc/
$ rsync -zrtlv --delete master.bioconductor.org::release/bioc/src ~/bioc_release/bioc/

$ du -h ~/bioc_release/bioc/
# 20GB, 1565 items, Bioc 3.7
</syntaxhighlight>
Note: ''-z'' compresses file data during the transfer, ''-r'' recurses into directories, ''-t'' preserves modification times, ''-l'' copies symbolic links as symbolic links and ''-v'' increases verbosity. The option ''-zrtlv'' can be replaced by the common options ''-avz''. Archive mode ''-a'' equals ''-rlptgoD'', so it also preserves permissions, group, owner and device/special files.

To get the old versions of a package (after the release of a version of Bioconductor), check out the ''Archive'' folder.

Now we can create a cron job to do the sync. ''' ''Note'' ''' my observation is that Bioconductor has a daily update around 10:45AM, so I set the time to 11:00AM.
<syntaxhighlight lang='bash'>
echo "00 11 * * * rsync -avz --delete master.bioconductor.org::release/bioc/src ~/bioc_release/bioc/" >> \
  ~/Documents/cronjob  # every day at 11:00 AM
crontab ~/Documents/cronjob  # installs the file as the user's crontab (replacing the current one)
crontab -l
</syntaxhighlight>


=== RApache ===
* http://www.stat.ucla.edu/~jeroen/files/seminar.pdf


=== Rscript, arguments and commandArgs() ===
[https://www.r-bloggers.com/passing-arguments-to-an-r-script-from-command-lines/ Passing arguments to an R script from command lines]

Syntax:
 
<pre>
$ Rscript --help
Usage: /path/to/Rscript [--options] [-e expr [-e expr2 ...] | file] [args]
</pre>

Example: save the following as <sillyScript.R>
<pre>
args = commandArgs(trailingOnly=TRUE)
# test if there is at least one argument: if not, return an error
if (length(args)==0) {
  stop("At least one argument must be supplied (input file).", call.=FALSE)
} else if (length(args)==1) {
  # default output file
  args[2] = "out.txt"
}
cat("args[1] = ", args[1], "\n", sep = "")
cat("args[2] = ", args[2], "\n", sep = "")
</pre>
<pre>
Rscript --vanilla sillyScript.R iris.txt out.txt
# args[1] = iris.txt
# args[2] = out.txt
</pre>


=== To test local repository ===

==== Create soft links in Apache server ====
<pre>
su
ln -s /home/brb/Rmirror/CRAN /var/www/html/CRAN
ln -s /home/brb/Rmirror/Bioc /var/www/html/Bioc
ls -l /var/www/html
</pre>
The soft links themselves always show mode 777 (lrwxrwxrwx), and their permissions are not used. What matters is that the target directories, and every parent directory such as /home/brb, are readable and searchable by the Apache user, and that Apache is allowed to follow symbolic links (Options FollowSymLinks).
==== To test CRAN ====
Replace the host name arraytools.no-ip.org with the IP address 10.133.2.111 if necessary.
<pre>
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages("glmnet")
</pre>
We can test whether the backup server is working by installing a package that was removed from CRAN. For example, 'ForImp' was removed from CRAN on 11/8/2012, but I still have a local copy built on R 2.15.2 (rsync was run on 11/6/2012).
<pre>
r <- getOption("repos"); r["CRAN"] <- "http://cran.r-project.org"
r <- c(r, BRB='http://arraytools.no-ip.org/CRAN')
#                       CRAN                            CRANextra                                  BRB
# "http://cran.r-project.org" "http://www.stats.ox.ac.uk/pub/RWin"  "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages('ForImp')
</pre>


Note that by default, the CRAN mirror is selected interactively.
<pre>
> getOption("repos")
                                CRAN                            CRANextra
                            "@CRAN@" "http://www.stats.ox.ac.uk/pub/RWin"
</pre>

==== To test Bioconductor ====
<pre>
# CRAN part:
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
# Bioconductor part:
options("BioC_mirror" = "http://arraytools.no-ip.org/Bioc")
source("http://bioconductor.org/biocLite.R")
# This source biocLite.R line can be placed either before or after the previous 2 lines
biocLite("aCGH")
</pre>

If there is a connection problem, check the folder permissions.
<pre>
chmod -R 755 ~/CRAN/bin
</pre>

* Note that if a binary package was created for R 2.15.1, then it can be installed under R 2.15.1 but not R 2.15.2. The R console will show "package xxx is not available (for R version 2.15.2)".

* For binary installs, the function also checks for the availability of a source package on the same repository, and reports if the source package has a later version, or is available but no binary version is.
So, for example, if the mirror does not have contents under the src directory, we need to run the following line in order to run the ''install.packages()'' function successfully.
<pre>
options(install.packages.check.source = "no")
</pre>

* If we only mirror the essential directories, we can run biocLite() successfully. However, the R console will give some warnings
<pre>
> biocLite("aCGH")
BioC_mirror: http://arraytools.no-ip.org/Bioc

Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'aCGH'
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
trying URL 'http://arraytools.no-ip.org/Bioc/packages/2.11/bioc/bin/windows/contrib/2.15/aCGH_1.36.0.zip'
Content type 'application/zip' length 2431158 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb

package ‘aCGH’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\limingc\AppData\Local\Temp\Rtmp8IGGyG\downloaded_packages
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
> library()
</pre>


=== Rscript, #! Shebang and optparse package ===
<ul>
<li>Writing [https://www.r-bloggers.com/2014/05/r-scripts/ R scripts] like linux bash files.
<li>[https://www.makeuseof.com/shebang-in-linux/ What Is the Shebang (#!) Character Sequence in Linux?]
<li>[https://blog.rmhogervorst.nl/blog/2020/04/14/where-does-the-output-of-rscript-go/ Where does the output of Rscript go?]
<li>Create a file <shebang.R>.
<pre>
#!/usr/bin/env Rscript
print("shebang works")
</pre>
Then in the command line
<pre>
chmod u+x shebang.R
./shebang.R
</pre>
<li>[http://www.cureffi.org/2014/01/15/running-r-batch-mode-linux/ Running R in batch mode on Linux]
<li>[https://cran.r-project.org/web/packages/optparse/index.html optparse] package. Check out its vignette.
<li>[https://cran.r-project.org/web/packages/getopt/index.html getopt]: C-Like 'getopt' Behavior.
</ul>


=== [http://dirk.eddelbuettel.com/code/littler.html littler] ===
Provides hash-bang (#!) capability for R

FAQs:
* [http://stackoverflow.com/questions/3205302/difference-between-rscript-and-littler Difference between Rscript and littler]
* [https://stackoverflow.com/questions/3412911/r-exe-rcmd-exe-rscript-exe-and-rterm-exe-whats-the-difference What's the difference between Rscript and R CMD BATCH]
* [https://stackoverflow.com/questions/21969145/why-or-when-is-rscript-or-littler-better-than-r-cmd-batch Why (or when) is Rscript (or littler) better than R CMD BATCH?]
{{Pre}}
root@ed5f80320266:/# ls -l /usr/bin/{r,R*}
# R 3.5.2 docker container
-rwxr-xr-x 1 root root 82632 Jan 26 18:26 /usr/bin/r        # binary, can be used for 'shebang' lines, r --help
                                              # Example: r --verbose -e "date()"
-rwxr-xr-x 1 root root  8722 Dec 20 11:35 /usr/bin/R       # text, R --help
                                              # Example: R -q -e "date()"
-rwxr-xr-x 1 root root 14552 Dec 20 11:35 /usr/bin/Rscript  # binary, can be used for 'shebang' lines, Rscript --help
                                              # It won't show the startup message when it is used in the command line.
                                              # Example: Rscript -e "date()"
</pre>

We can install littler in two ways.
* install.packages("littler"). This will install the latest version, but the binary 'r' program is only available under the package/bin directory (e.g. ''~/R/x86_64-pc-linux-gnu-library/3.4/littler/bin/r''). You need to create a soft link in order to access it globally.
* sudo apt install littler. This will install 'r' globally; however, the installed version may be old.

After the installation, the vignette contains several examples. The off-line vignette has a table of contents. Nice! The [http://dirk.eddelbuettel.com/code/littler.examples.html web version of examples] does not have the TOC.

'''r''' was not meant to run interactively like '''R'''. See ''man r''.


=== RInside: Embed R in C++ ===
See [[R#RInside|RInside]]


(''From RInside documentation'') The RInside package makes it easier to embed R in your C++ applications. There is no code you would execute directly from the R environment. Rather, you write C++ programs that embed R, as illustrated by some of the included examples.


The included examples are armadillo, eigen, mpi, qt, standard, threads and wt.


To run 'make' when there is no global R installation, we need to modify the file <Makefile>. If we only want to build one executable, we can run, for example, 'make rinside_sample1'.


To run any of the resulting executables, we need to set the '''LD_LIBRARY_PATH''' variable, something like
<pre>export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib </pre>


=== CRAN repository directory structure ===
The information below is specific to R 2.15.2. There are linux and macosx subdirectories wherever there is a windows subdirectory.
<pre>
bin/windows/contrib/2.15
src/contrib
  /contrib/2.15.2
  /contrib/Archive
web/checks
  /dcmeta
  /packages
  /views
</pre>


A clickable map [http://taichi.selfip.net:81/RmirrorMap/Rmirror.html]


=== CRAN package download statistics from RStudio ===
* Daily download statistics http://cran-logs.rstudio.com/. Note that the page is split into 'package' downloads and 'R' downloads. It tracks
** Package: date, time, size, r_version, r_arch, r_os, package, version, country, ip_id.
** R: date, time, size, R version, os (win/src/osx), country, ip_id (reset daily).
* https://www.r-bloggers.com/finally-tracking-cran-packages-downloads/. The code still works.
* https://strengejacke.wordpress.com/2015/03/07/cran-download-statistics-of-any-packages-rstats/


=== Bioconductor package download statistics ===
http://bioconductor.org/packages/stats/

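The snippet below is a minimal sketch for working with one day of the cran-logs data. It assumes the daily files follow the pattern ''http://cran-logs.rstudio.com/YYYY/YYYY-MM-DD.csv.gz'' for package downloads and ''YYYY-MM-DD-r.csv.gz'' for R downloads. It also assumes the package log has the ''package'' and ''ip_id'' columns listed in the download-statistics notes. The helper names cran_log_url() and summarise_log() are hypothetical.
<syntaxhighlight lang='rsplus'>
# Hypothetical helper; the URL pattern is an assumption about cran-logs.rstudio.com
cran_log_url <- function(date, type = c("package", "r")) {
  type <- match.arg(type)
  date <- as.Date(date)
  suffix <- if (type == "r") "-r" else ""
  sprintf("http://cran-logs.rstudio.com/%s/%s%s.csv.gz",
          format(date, "%Y"), format(date, "%Y-%m-%d"), suffix)
}

# Hypothetical helper: total downloads and distinct ip_id values per package
summarise_log <- function(log) {
  downloads <- table(log$package)
  unique_ip <- tapply(log$ip_id, log$package, function(x) length(unique(x)))
  out <- data.frame(package   = names(downloads),
                    downloads = as.integer(downloads),
                    unique_ip = as.integer(unique_ip[names(downloads)]))
  out <- out[order(-out$downloads, out$package), ]
  rownames(out) <- NULL
  out
}

# Usage (needs internet access):
# tmp <- tempfile(fileext = ".csv.gz")
# download.file(cran_log_url("2024-05-01"), tmp, mode = "wb")
# head(summarise_log(read.csv(tmp)))   # read.csv() decompresses .gz files itself
</syntaxhighlight>
Counting distinct ip_id values gives a rough idea of how many different users downloaded a package on that day, because ip_id is reset daily.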

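R itself assumes the repository layout shown in the CRAN directory structure when it looks for packages. As a quick check, utils::contrib.url() maps a repository URL and a package type to the matching subdirectory. The mirror URL below is only an example.
<syntaxhighlight lang='rsplus'>
repo <- "https://cloud.r-project.org"   # any CRAN mirror; this one is just an example

# Source packages live under src/contrib
contrib.url(repo, type = "source")

# Windows binaries live under bin/windows/contrib/<major.minor R version>
contrib.url(repo, type = "win.binary")

# The R version component used in the binary path, e.g. "4.4"
paste(R.version$major, strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1], sep = ".")

# available.packages(contriburl = contrib.url(repo, "source")) lists the
# packages found in that directory (needs internet access).
</syntaxhighlight>
This shows why a mirror that lacks one of these subdirectories triggers "unable to access index for repository" warnings.
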
=== Bioconductor repository directory structure ===
The information below is specific to Bioc 2.11 (R 2.15). There are linux and macosx subdirectories wherever there is a windows subdirectory.
<pre>
bioc/bin/windows/contrib/2.15
     /html
     /install
     /license
     /manuals
     /news
     /src
     /vignettes
data/annotation/bin/windows/contrib/2.15
              /html
              /licenses
              /manuals
              /src
              /vignettes
    /experiment/bin/windows/contrib/2.15
                /html
                /manuals
                /src/contrib
                /vignettes
extra/bin/windows/contrib
    /html
    /src
    /vignettes
</pre>


The real build process looks like this (check <Makefile> for the complete rules):
<pre>
g++ -I/home/brb/Downloads/R-3.0.2/include \
    -I/home/brb/Downloads/R-3.0.2/library/Rcpp/include \
    -I/home/brb/Downloads/R-3.0.2/library/RInside/include -g -O2 -Wall \
    -I/usr/local/include  \
    rinside_sample0.cpp  \
    -L/home/brb/Downloads/R-3.0.2/lib -lR  -lRblas -lRlapack \
    -L/home/brb/Downloads/R-3.0.2/library/Rcpp/lib -lRcpp \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/Rcpp/lib \
    -L/home/brb/Downloads/R-3.0.2/library/RInside/lib -lRInside \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/RInside/lib \
    -o rinside_sample0
</pre>


Hello World example of embedding R in C++.
<pre>
#include <RInside.h>                    // for the embedded R via RInside

int main(int argc, char *argv[]) {
    RInside R(argc, argv);              // create an embedded R instance
    R["txt"] = "Hello, world!\n";       // assign a char* (string) to 'txt'
    R.parseEvalQ("cat(txt)");           // eval the init string, ignoring any returns
    exit(0);
}
</pre>


=== List all R packages from CRAN/Bioconductor ===
<s>
Check my daily result based on R 2.15 and Bioc 2.11 in [http://taichi.selfip.net:81/Rsummary/R_reposit.html]

# [http://taichi.selfip.net:81/Rsummary/cran.html CRAN]
# [http://taichi.selfip.net:81/Rsummary/bioc.html Bioc software]
# [http://taichi.selfip.net:81/Rsummary/annotation.html Bioc annotation]
# [http://taichi.selfip.net:81/Rsummary/experiment.html Bioc experiment]
</s>


See [http://www.r-pkg.org/pkglist METACRAN] for packages hosted on CRAN. The 'https://github.com/metacran/PACKAGES' file contains the latest update.


== r-hub: the everything-builder the R community needs ==
https://github.com/r-hub/proposal
=== Introducing R-hub, the R package builder service ===
* https://www.rstudio.com/resources/videos/r-hub-overview/
* http://blog.revolutionanalytics.com/2016/10/r-hub-public-beta.html


== Parallel Computing ==
# [http://shop.oreilly.com/product/0636920021421.do Example code] for the book Parallel R by McCallum and Weston.
# [http://www.win-vector.com/blog/2016/01/parallel-computing-in-r/ A gentle introduction to parallel computing in R]
# [http://www.stat.berkeley.edu/scf/paciorek-distribComp.pdf An introduction to distributed memory parallelism in R and C]
# [http://danielmarcelino.com/parallel-processing/ Parallel Processing: When does it worth?]

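Whether parallel processing is worth it depends on how long each task takes compared with the cost of shipping work to the workers. The sketch below checks this, assuming a deliberately slow toy function (slow_square, made up for illustration) and a 2-worker socket cluster from the parallel package.
<syntaxhighlight lang='rsplus'>
library(parallel)

# Hypothetical toy task: sleeps to mimic an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.25)
  x^2
}

cl <- makeCluster(2)                       # PSOCK cluster with 2 workers
t_seq <- system.time(r_seq <- lapply(1:4, slow_square))[["elapsed"]]
t_par <- system.time(r_par <- parLapply(cl, 1:4, slow_square))[["elapsed"]]
stopCluster(cl)

identical(r_seq, r_par)                    # same results either way
c(sequential = t_seq, parallel = t_par)    # elapsed seconds
</syntaxhighlight>
If slow_square is replaced by a cheap function such as sqrt, the parallel version is typically slower, because the communication overhead then dominates.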

=== Security warning from Windows/Mac ===
It seems safe to choose 'Cancel' when Windows Firewall tries to block the R program after we use '''makeCluster()''' to create a socket cluster.
<pre>
library(parallel)
cl <- makeCluster(2)
clusterApply(cl, 1:2, get("+"), 3)
stopCluster(cl)
</pre>
[[File:WindowsSecurityAlert.png|100px]]  [[File:RegisterDoParallel mac.png|150px]]


To see the current firewall settings, click the Windows Start button, search for 'Firewall' and choose 'Windows Firewall with Advanced Security'. Under 'Inbound Rules' we can see which programs (e.g. R for Windows GUI front-end, or Rserve) have rules. These rules are labeled 'Private' in the 'Profile' column. Each program may appear twice: one rule for the 'TCP' protocol and one for 'UDP'.


=== Detect number of cores ===
<syntaxhighlight lang='rsplus'>
parallel::detectCores()
</syntaxhighlight>
Don't rely on the default mc.cores = getOption("mc.cores", 2L) in mclapply() (it is simply 2 unless the mc.cores option has been set) unless you are developing a package.


However, it is a different story when we run R code on an HPC cluster. Read the discussion [https://stackoverflow.com/questions/28954991/whether-to-use-the-detectcores-function-in-r-to-specify-the-number-of-cores-for Whether to use the detectCores function in R to specify the number of cores for parallel processing?]


On NIH's biowulf, even when I request an interactive session with 4 cores, parallel::detectCores() returns 56, i.e. all the cores on the node. This number is the same as the output of the bash command '''grep processor /proc/cpuinfo''' or (better) '''lscpu'''. Likewise, '''free -hm''' reports the node's full 125GB instead of my requested memory (4GB by default).


The RInside Hello World example can be compared with the Hello World example in Qt.
<pre>
#include <QApplication>
#include <QPushButton>

// Qt 4/5 version; the original Qt 3 tutorial also called app.setMainWidget( &hello )
int main( int argc, char **argv )
{
    QApplication app( argc, argv );

    QPushButton hello( "Hello world!" );
    hello.resize( 100, 30 );

    hello.show();
    return app.exec();
}
</pre>


=== [http://www.rfortran.org/ RFortran] ===
RFortran is an open source project with the following aim:


''To provide an easy to use Fortran software library that enables Fortran programs to transfer data and commands to and from R.''


It works only on Windows with Microsoft Visual Studio installed :(


== Call R from other languages ==
=== C ===
[http://sebastian-mader.net/programming/using-r-from-c-c/ Using R from C/C++]


Error: [https://stackoverflow.com/questions/43662542/not-resolved-from-current-namespace-error-when-calling-c-routines-from-r “not resolved from current namespace” error, when calling C routines from R]


Solution: wrap your C/Fortran symbols in '''getNativeSymbolInfo()'''. Search Google for ''r dyn.load not resolved from current namespace''.


=== parallel package ===
The parallel package has been included with R since 2.14.0. It is derived from the snow and multicore packages and provides many of the same functions as those packages.


The parallel package provides several *apply functions that let R users quickly adapt their code to parallel computing.


* makeCluster(makePSOCKcluster, makeForkCluster), stopCluster. Other cluster types are passed to package '''snow'''.
* '''clusterCall''', clusterEvalQ: source R files and/or load libraries
* clusterSplit
* '''clusterApply''', '''clusterApplyLB''' (vs the foreach package)
* '''clusterExport''': export variables
* clusterMap
* parLapply, parSapply, parApply, parRapply, parCapply
* parLapplyLB, parSapplyLB (load balance version)
* clusterSetRNGStream, nextRNGStream, nextRNGSubStream


Examples (See ?[http://www.inside-r.org/r-doc/parallel/clusterApply clusterApply])
<syntaxhighlight lang='rsplus'>
library(parallel)
cl <- makeCluster(2, type = "PSOCK")     # type = "SOCK" would hand the job to the snow package
clusterApply(cl, 1:2, function(x) x*3)    # OR clusterApply(cl, 1:2, get("*"), 3)
# [[1]]
# [1] 3
#
# [[2]]
# [1] 6
parSapply(cl, 1:20, get("+"), 3)
# [1]  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
stopCluster(cl)
</syntaxhighlight>
An example of using clusterCall() or clusterEvalQ()
<syntaxhighlight lang='rsplus'>
library(parallel)

cl <- makeCluster(4)
clusterCall(cl, function() {
  source("test.R")
})
# clusterEvalQ(cl, {
#    source("test.R")
# })

## do some parallel work
stopCluster(cl)
</syntaxhighlight>

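Given the detectCores() caveat on clusters such as biowulf, a worker count can prefer the scheduler's allocation when one is visible. This minimal sketch assumes a Slurm cluster that exports SLURM_CPUS_PER_TASK. The helper name n_workers is hypothetical. The parallelly package's availableCores() is a more complete alternative.
<syntaxhighlight lang='rsplus'>
library(parallel)

# Hypothetical helper: number of workers to use.
# 1. Honour an explicit Slurm allocation if SLURM_CPUS_PER_TASK is set.
# 2. Otherwise use all detected cores but `reserve` (detectCores() may return NA).
n_workers <- function(reserve = 1L) {
  slurm <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = "")
  if (nzchar(slurm)) return(as.integer(slurm))
  n <- detectCores()
  if (is.na(n)) n <- 1L
  max(1L, as.integer(n) - reserve)
}

n <- n_workers()
# mclapply() relies on forking, so only a single core can be used on Windows
cores <- if (.Platform$OS.type == "windows") 1L else n
res <- mclapply(1:8, function(i) i^2, mc.cores = cores)
</syntaxhighlight>
Passing mc.cores explicitly also avoids the silent default of 2 workers mentioned in the detectCores() notes.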

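clusterEvalQ() and clusterExport() from the parallel package can be tried with base R only. Workers start as fresh R sessions, so packages and master-side objects must be sent to them explicitly. In this sketch, scale_factor and rescale are made-up names standing in for real variables and functions.
<syntaxhighlight lang='rsplus'>
library(parallel)

cl <- makeCluster(2)

scale_factor <- 10                       # a variable that exists only on the master
rescale <- function(x) x * scale_factor  # a function that relies on it

# Load packages on every worker ...
invisible(clusterEvalQ(cl, library(stats)))
# ... and copy the needed objects from the master's global environment
clusterExport(cl, c("scale_factor", "rescale"))

res <- parSapply(cl, 1:3, function(i) rescale(i))
stopCluster(cl)
res   # 10 20 30
</syntaxhighlight>
Without the clusterExport() call, the workers would stop with an error because they cannot find rescale().
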
=== [http://cran.r-project.org/web/packages/snow/index.html snow] package ===
Supported cluster types are "SOCK", "PVM", "MPI", and "NWS".


=== [http://cran.r-project.org/web/packages/multicore/index.html multicore] package ===
This package has been removed from CRAN.


Consider using package ‘parallel’ instead.


=== JRI ===
http://www.rforge.net/JRI/


=== rpy2 ===
http://rpy.sourceforge.net/rpy2.html


== Create a standalone Rmath library ==
R has many math and statistical functions, and we can easily use them in our C/C++/Fortran code. The definitive guide is Chapter 9 "The standalone Rmath library" of the [http://cran.r-project.org/doc/manuals/R-admin.html#The-standalone-Rmath-library R-admin manual].


Here is my experience based on R 3.0.2 on Windows.


=== [http://cran.r-project.org/web/packages/foreach/index.html foreach] package ===
This package needs one of the following packages as a parallel backend:
* doParallel - Foreach parallel adaptor for the parallel package
* doSNOW - Foreach parallel adaptor for the snow package
* doMC - Foreach parallel adaptor for the multicore package. Used in the [https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html glmnet] vignette.
* doMPI - Foreach parallel adaptor for the Rmpi package
* doRedis - Foreach parallel adapter for the rredis package


<syntaxhighlight lang='rsplus'>
library(foreach)
library(doParallel)

m <- matrix(rnorm(9), 3, 3)

cl <- makeCluster(2, type = "PSOCK")
registerDoParallel(cl) # register the parallel backend with the foreach package
foreach(i=1:nrow(m), .combine=rbind) %dopar%
  (m[i,] / mean(m[i,]))

stopCluster(cl)
</syntaxhighlight>


=== Create a static library <libRmath.a> and a dynamic library <Rmath.dll> ===
Suppose we have downloaded the R source code and built R from source. See [[R#Build_R_from_its_source|Build_R_from_its_source]]. Then the following 2 lines generate the files <libRmath.a> and <Rmath.dll> in the C:\R\R-3.0.2\src\nmath\standalone directory.
<pre>
cd C:\R\R-3.0.2\src\nmath\standalone
make -f Makefile.win
</pre>


=== Use Rmath library in our code ===
<pre>
set CPLUS_INCLUDE_PATH=C:\R\R-3.0.2\src\include
set LIBRARY_PATH=C:\R\R-3.0.2\src\nmath\standalone
# Note: it is LIBRARY_PATH (searched by the linker at build time), not LD_LIBRARY_PATH.

# Created <RmathEx1.cpp> from the book "Statistical Computing in C++ and R" web site
# http://math.la.asu.edu/~eubank/CandR/ch4Code.cpp
# It is OK to save the cpp file under any directory.

# Link against the static library <libRmath.a>
g++ RmathEx1.cpp -lRmath -lm -o RmathEx1.exe
# OR, to force static linking explicitly
g++ RmathEx1.cpp -Wl,-Bstatic -lRmath -lm -o RmathEx1.exe

# Force linking against the dynamic library <Rmath.dll>
g++ RmathEx1.cpp Rmath.dll -lm -o RmathEx1Dll.exe
</pre>
Test the executable program. Note that the statically linked ''RmathEx1.exe'' can be copied to and run on another computer without R installed. Isn't it cool!
<pre>
c:\R>RmathEx1
Enter a argument for the normal cdf:
1
Enter a argument for the chi-squared cdf:
1
Prob(Z <= 1) = 0.841345
Prob(Chi^2 <= 1)= 0.682689
</pre>

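The numbers printed by RmathEx1 can be cross-checked in R itself, because the standalone library exposes the same distribution functions. The C call pnorm(x, mu, sigma, lower_tail, log_p) corresponds to R's pnorm(q, mean, sd, lower.tail, log.p). Likewise, pchisq(x, df, lower_tail, log_p) matches R's pchisq().
<syntaxhighlight lang='rsplus'>
# Same computations as RmathEx1 with both inputs equal to 1
p_norm  <- pnorm(1, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
p_chisq <- pchisq(1, df = 1, lower.tail = TRUE, log.p = FALSE)
round(c(p_norm, p_chisq), 6)
# [1] 0.841345 0.682689

# Sanity check: a chi-squared(1) variable is Z^2, so P(Z^2 <= 1) = 2 * P(Z <= 1) - 1
all.equal(p_chisq, 2 * p_norm - 1)
</syntaxhighlight>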

Below is the cpp program <RmathEx1.cpp>.
<pre>
//RmathEx1.cpp
#define MATHLIB_STANDALONE
#include <iostream>
#include "Rmath.h"

using std::cout; using std::cin; using std::endl;

int main()
{
  double x1, x2;
  cout << "Enter a argument for the normal cdf:" << endl;
  cin >> x1;
  cout << "Enter a argument for the chi-squared cdf:" << endl;
  cin >> x2;

  cout << "Prob(Z <= " << x1 << ") = " <<
    pnorm(x1, 0, 1, 1, 0)  << endl;   // pnorm(x, mu, sigma, lower_tail, log_p)
  cout << "Prob(Chi^2 <= " << x2 << ")= " <<
    pchisq(x2, 1, 1, 0) << endl;      // pchisq(x, df, lower_tail, log_p)
  return 0;
}
</pre>


== Calling R.dll directly ==
See Section 8.2.2 of [http://cran.r-project.org/doc/manuals/R-exts.html#Calling-R_002edll-directly Writing R Extensions]. This is related to embedding R under Windows. The file <R.dll> on Windows is like <libR.so> on Linux.


== Create HTML report ==
[http://www.bioconductor.org/packages/release/bioc/html/ReportingTools.html ReportingTools] (Jason Hackney) from Bioconductor. See [[Genome#ReportingTools|Genome->ReportingTools]].


=== [http://cran.r-project.org/web/packages/htmlTable/index.html htmlTable] package ===
The htmlTable package is intended for generating tables using HTML formatting. This format is compatible with Markdown when used for HTML output. The most basic table can easily be created by just passing a matrix or a data.frame to the htmlTable function.
* http://cran.r-project.org/web/packages/htmlTable/vignettes/general.html
* http://gforge.se/2014/01/fast-track-publishing-using-knitr-part-iv/
* [http://gforge.se/2020/07/news-in-htmltable-2-0/ News in htmlTable 2.0]


=== [https://cran.r-project.org/web/packages/formattable/index.html formattable] ===
* https://github.com/renkun-ken/formattable
* http://www.magesblog.com/2016/01/formatting-table-output-in-r.html
* [https://www.displayr.com/formattable/ Make Beautiful Tables with the Formattable Package]


See also the October 2015 post [http://blog.revolutionanalytics.com/2015/10/updates-to-the-foreach-package-and-its-friends.html Updates to the foreach package and its friends].
* [https://statcompute.wordpress.com/2015/12/13/calculate-leave-one-out-prediction-for-glm/ Cross validation in prediction for glm]
* [http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/#The_foreach_package How-to go parallel in R – basics + tips]


==== combine list of lists ====
* .combine argument https://stackoverflow.com/questions/27279164/output-list-of-two-rbinded-data-frames-with-foreach-in-r
* [https://stackoverflow.com/questions/9519543/merge-two-lists-in-r Merge lists] by mapply() or base::Map()

<syntaxhighlight lang='rsplus'>
comb <- function(...) {
  mapply('cbind', ..., SIMPLIFY=FALSE)
}

library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl) # register the parallel backend with the foreach package

m <- rbind(rep(1,3), rep(2,3))

# nrow(m) can represent the number of permutations (2 in this toy example)
tmp <- foreach(i=1:nrow(m)) %dopar% {
  a <- m[i, ]
  b <- a * 10
  list(a, b)
}; tmp
# [[1]]
# [[1]][[1]]
# [1] 1 1 1
#
# [[1]][[2]]
# [1] 10 10 10
#
#
# [[2]]
# [[2]][[1]]
# [1] 2 2 2
#
# [[2]][[2]]
# [1] 20 20 20

foreach(i=1:nrow(m), .combine = "comb") %dopar% {
  a <- m[i,]
  b <- a * 10
  list(a, b)
}
# [[1]]
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    2
# [3,]    1    2
#
# [[2]]
#      [,1] [,2]
# [1,]   10   20
# [2,]   10   20
# [3,]   10   20
stopCluster(cl)
</syntaxhighlight>

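To see what a custom .combine function such as comb receives, the combination step can be imitated sequentially. For an arbitrary .combine function, foreach by default calls it with two arguments at a time (.multicombine = FALSE). That is the same as folding the per-iteration results with Reduce(). This is a sketch for understanding the mechanism, not a replacement for foreach.
<syntaxhighlight lang='rsplus'>
comb <- function(...) {
  mapply('cbind', ..., SIMPLIFY = FALSE)
}

m <- rbind(rep(1, 3), rep(2, 3))

# Per-iteration results, as the foreach body in the combine example returns them
results <- lapply(seq_len(nrow(m)), function(i) {
  a <- m[i, ]
  b <- a * 10
  list(a, b)
})

# Pairwise folding, which is how foreach applies a two-argument .combine
out <- Reduce(comb, results)
out[[1]]   # 3 x 2 matrix: a column of 1s and a column of 2s
out[[2]]   # 3 x 2 matrix: a column of 10s and a column of 20s
</syntaxhighlight>
Checking a .combine function this way, without a cluster, makes it easier to debug before switching to %dopar%.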

==== Replacing double loops ====
* https://stackoverflow.com/questions/30927693/how-can-i-parallelize-a-double-for-loop-in-r
* http://www.exegetic.biz/blog/2013/08/the-wonders-of-foreach/
<syntaxhighlight lang='rsplus'>
library(foreach)
library(doParallel)

nc <- 4
nr <- 2

cores <- detectCores()
cl <- makeCluster(max(1, cores[1] - 1))
registerDoParallel(cl)
# set.seed(1234) # does not work
# set.seed(1234, "L'Ecuyer-CMRG") # does not work either
# library("doRNG")
# registerDoRNG(seed = 1985)    # does not work with nested foreach
# Error in list(e1 = list(args = (1:nr)(), argnames = "i", evalenv = <environment>,  :
#  nested/conditional foreach loops are not supported yet.
m <- foreach (i = 1:nr, .combine='rbind') %:% # nesting operator
  foreach (j = 1:nc, .combine='c') %dopar% {  # 'c' makes each row a numeric vector
    rnorm(1, i*5, j) # code to parallelise
}
m
stopCluster(cl)
</syntaxhighlight>
Note that since setting the random seed (see the ''set seed and doRNG'' section) does not work with nested loops, it is better to convert a nested loop (two indices) into a single loop (one index).


=== [https://github.com/crubba/htmltab htmltab] package ===
This package is NOT used to CREATE an HTML report but to EXTRACT HTML tables.


=== [http://cran.r-project.org/web/packages/ztable/index.html ztable] package ===
Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.


== Create academic report ==
[http://cran.r-project.org/web/packages/reports/index.html reports] package in CRAN and in [https://github.com/trinker/reports github] repository. The youtube video gives an overview of the package.


== Create pdf and epub files ==
{{Pre}}
# Idea:
#       knitr        pdflatex
#  rnw -------> tex ----------> pdf
library(knitr)
knit("example.rnw") # create example.tex file
</pre>
* A very simple example <002-minimal.Rnw> from [http://yihui.name/knitr/demo/minimal/ yihui.name] works fine on Linux.
{{Pre}}
git clone https://github.com/yihui/knitr-examples.git
</pre>
* <knitr-minimal.Rnw>. I had no problem creating the pdf file on Windows, but at first could not generate the pdf from the tex file on Linux. Some people suggested running '''sudo apt-get install texlive-fonts-recommended''' to install the missing fonts. It works!


To see a real example, check out the [http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html DESeq2] package (inst/doc subdirectory). In addition to DESeq2, I also needed to install the '''DESeq, BiocStyle, airway, vsn, gplots''', and '''pasilla''' packages from Bioconductor. Note that it is best to use a sudo/admin account to install packages.


Alternatively, start with a markdown file. Download the example <001-minimal.Rmd> and remove its last line, which fetches a png file from the internet.
{{Pre}}
# Idea:
#        knitr        pandoc
#  rmd -------> md ----------> pdf

git clone https://github.com/yihui/knitr-examples.git
cd knitr-examples
R -e "library(knitr); knit('001-minimal.Rmd')"
pandoc 001-minimal.md -o 001-minimal.pdf # requires pdflatex to be installed!
</pre>


==== set seed and [https://cran.r-project.org/web/packages/doRNG/ doRNG] package ====
* [https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf Vignette], [https://www.rdocumentation.org/packages/doRNG/versions/1.7.1/topics/doRNG-package Documentation]
* [http://michaeljkoontz.weebly.com/uploads/1/9/9/4/19940979/parallel.pdf#page=4 doRNG] package example
* [https://stackoverflow.com/questions/8358098/how-to-set-seed-for-random-simulations-with-foreach-and-domc-packages How to set seed for random simulations with foreach and doMC packages?]
* Use '''clusterSetRNGStream()''' from the parallel package; see [http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/ How-to go parallel in R – basics + tips]
* http://www.stat.colostate.edu/~scharfh/CSP_parallel/handouts/foreach_handout.html#random-numbers <syntaxhighlight lang='rsplus'>
library("doRNG") # doRNG does not need to be loaded after doParallel
library("doParallel")

cl <- makeCluster(2)
registerDoParallel(cl)

registerDoRNG(seed = 1234) # works for a single loop
m1 <- foreach(i = 1:5, .combine = 'c') %dopar% rnorm(1)
registerDoRNG(seed = 1234)
m2 <- foreach(i = 1:5, .combine = 'c') %dopar% rnorm(1)
identical(m1, m2)
stopCluster(cl)

attr(m1, "rng") <- NULL # remove rng attribute
</syntaxhighlight>
* Another way to set the seed is to supply '''[https://www.rdocumentation.org/packages/doRNG/versions/1.7.1/topics/%25dorng%25 .options.RNG]''' in the foreach() call together with the %dorng% operator. <syntaxhighlight lang='rsplus'>
r1 <- foreach(i=1:4, .options.RNG=1234) %dorng% { runif(1) }
</syntaxhighlight>

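For simple cases, clusterSetRNGStream() from the parallel package is a base-R alternative to doRNG. It gives each worker its own L'Ecuyer-CMRG stream derived from one seed. Reruns with the same seed, the same number of workers and the same task split therefore reproduce the results. The names draw_one and run_once are hypothetical.
<syntaxhighlight lang='rsplus'>
library(parallel)

draw_one <- function(i) rnorm(1)          # hypothetical task: one random draw

# Hypothetical helper: n draws on a fresh 2-worker cluster seeded with `seed`
run_once <- function(seed, n = 6) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)   # one independent stream per worker
  unlist(parLapply(cl, seq_len(n), draw_one))
}

r1 <- run_once(1234)
r2 <- run_once(1234)
r3 <- run_once(4321)
identical(r1, r2)   # TRUE: same seed, same workers, same task split
identical(r1, r3)   # FALSE: different seed
</syntaxhighlight>
The reproducibility relies on parLapply() splitting the tasks into the same chunks each time. Load-balancing functions such as parLapplyLB() may assign tasks to workers differently between runs.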

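The ''Replacing double loops'' note recommends turning a nested loop into a single loop. This sketch does so with base R only: expand.grid() enumerates every (i, j) pair, a single index k runs over its rows, and clusterSetRNGStream() makes the result reproducible. It mirrors the nr = 2, nc = 4 toy example; the names cell and flat_loop are hypothetical.
<syntaxhighlight lang='rsplus'>
library(parallel)

nr <- 2
nc <- 4
grid <- expand.grid(i = seq_len(nr), j = seq_len(nc))  # one row per (i, j); i varies fastest

# Body of the former double loop, written for row k of the grid
cell <- function(k, g) rnorm(1, mean = g$i[k] * 5, sd = g$j[k])

flat_loop <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  vals <- parSapply(cl, seq_len(nrow(grid)), cell, g = grid)
  matrix(vals, nrow = nr, ncol = nc)   # column-major fill matches the expand.grid order
}

m1 <- flat_loop(1985)
m2 <- flat_loop(1985)
identical(m1, m2)   # TRUE
</syntaxhighlight>
The same single-index idea works with foreach plus doRNG (for example, the .options.RNG argument with %dorng%), since only one loop level remains.
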
To create an epub file (not successful yet on Windows; figures are missing on Linux)
{{Pre}}
# Idea:
#        knitr        pandoc
#   rnw -------> tex ----------> markdown or epub

library(knitr)
knit("DESeq2.Rnw") # create DESeq2.tex
system("pandoc  -f latex -t markdown -o DESeq2.md DESeq2.tex")
</pre>


Convert tex to epub
* http://tex.stackexchange.com/questions/156668/tex-to-epub-conversion


=== [https://www.rdocumentation.org/packages/knitr/versions/1.20/topics/kable kable()] for tables ===
Create Tables In LaTeX, HTML, Markdown And ReStructuredText
* https://rmarkdown.rstudio.com/lesson-7.html
* https://stackoverflow.com/questions/20942466/creating-good-kable-output-in-rstudio
* http://kbroman.org/knitr_knutshell/pages/figs_tables.html
* https://blogs.reed.edu/ed-tech/2015/10/creating-nice-tables-using-r-markdown/
* [https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html kableExtra] package


==== Export libraries and variables ====
* http://stat.ethz.ch/R-manual/R-devel/library/parallel/html/clusterApply.html
<syntaxhighlight lang='rsplus'>
clusterEvalQ(cl, {
  library(biospear)
  library(glmnet)
  library(survival)
})
clusterExport(cl, c("var1", "foo2"))
</syntaxhighlight>


==== Summarize the results ====
foreach returns the result as a list. For example, if each component is a matrix, we can use
* Reduce("+", res)/length(res) # Reduce("+", res, na.rm = TRUE) does not work
* apply(simplify2array(res), 1:2, mean, na.rm = TRUE)
to get the average of the matrices in the list. The first form returns NA wherever any matrix has an NA; the second ignores the missing values.


=== snowfall package ===
http://www.imbi.uni-freiburg.de/parallel/docs/Reisensburg2009_TutParallelComputing_Knaus_Porzelius.pdf

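This is a small check of the two ways of averaging a list of foreach result matrices given in the result-summary tip. The list res is made up for illustration. The last part sketches an NA-aware variant of the Reduce() approach, as an assumption about what "Reduce with na.rm" was meant to do.
<syntaxhighlight lang='rsplus'>
# A made-up foreach result: a list of 2 x 2 matrices, one containing an NA
res <- list(matrix(1:4, 2), matrix(5:8, 2), matrix(c(NA, 2, 3, 4), 2))

avg_reduce <- Reduce("+", res) / length(res)
avg_apply  <- apply(simplify2array(res), 1:2, mean, na.rm = TRUE)

# NA-aware Reduce: sum the non-missing values and divide by how many there are
zeroed   <- lapply(res, function(x) replace(x, is.na(x), 0))
counts   <- Reduce("+", lapply(res, function(x) !is.na(x)))
avg_safe <- Reduce("+", zeroed) / counts

avg_reduce[1, 1]   # NA
avg_apply[1, 1]    # 3, the mean of 1 and 5
all.equal(avg_safe, avg_apply)   # TRUE
</syntaxhighlight>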

=== [http://cran.r-project.org/web/packages/Rmpi/index.html Rmpi] package ===
Some examples/tutorials


* http://trac.nchc.org.tw/grid/wiki/R-MPI_Install
* http://www.arc.vt.edu/resources/software/r/index.php
* https://www.sharcnet.ca/help/index.php/Using_R_and_MPI
* http://math.acadiau.ca/ACMMaC/Rmpi/examples.html
* http://www.umbc.edu/hpcf/resources-tara/how-to-run-R.html
* [http://www.slideshare.net/bytemining/taking-r-to-the-limit-high-performance-computing-in-r-part-1-parallelization-la-r-users-group-727 Ryan Rosario]
* http://pj.freefaculty.org/guides/Rcourse/parallel-1/parallel-1.pdf
* http://biowulf.nih.gov/apps/R.html


== Create Word report ==


=== Using the power of Word ===
[https://www.rforecology.com/post/exporting-tables-from-r-to-microsoft-word/ How to go from R to nice tables in Microsoft Word]


=== OpenMP ===
* [http://www.parallelr.com/r-and-openmp-boosting-compiled-code-on-multi-core-cpu-s/ R and openMP: boosting compiled code on multi-core cpu-s] from parallelr.com.


=== [http://www.bioconductor.org/packages/release/bioc/html/BiocParallel.html BiocParallel] ===
* [http://rpubs.com/seandavi/KallistoFromR Orchestrating a small, parallel, RNA-seq pre-processing workflow using R]


=== knitr + pandoc ===
* http://www.r-statistics.com/2013/03/write-ms-word-document-using-r-with-as-little-overhead-as-possible/
* http://www.carlboettiger.info/2012/04/07/writing-reproducibly-in-the-open-with-knitr.html
* http://rmarkdown.rstudio.com/articles_docx.html


It is better to create the rmd file in RStudio. RStudio provides a template for rmd files and a quick reference to the R Markdown language.
<pre>
# Idea:
#        knitr      pandoc
#  rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd") # create md and html files
</pre>
and then
<pre>
FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))
</pre>
Note: for some reason, if I run the above two commands several times, knit2html() stops working properly. However, clicking the 'Knit HTML' button in RStudio makes it work again.

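The pandoc shell commands used for Word output can be wrapped in a small R function so that one markdown file is converted into several formats at once. This is a sketch assuming pandoc is on the PATH; pandoc_targets and md_convert are hypothetical names, and pdf output would additionally need a LaTeX engine such as pdflatex.
<syntaxhighlight lang='rsplus'>
# Hypothetical helper: output file names for each requested format
pandoc_targets <- function(md, formats) {
  paste0(tools::file_path_sans_ext(md), ".", formats)
}

# Hypothetical helper: convert one markdown file into several formats with pandoc
md_convert <- function(md, formats = c("docx", "odt", "html")) {
  pandoc <- Sys.which("pandoc")
  if (!nzchar(pandoc)) stop("pandoc was not found on the PATH")
  targets <- pandoc_targets(md, formats)
  for (out in targets) {
    # -s produces a standalone document; pandoc infers the format from the extension
    status <- system2(pandoc, c("-s", shQuote(md), "-o", shQuote(out)))
    if (status != 0) warning("pandoc failed to create ", out)
  }
  invisible(targets)
}

# Usage, after knit() has produced example.md:
# md_convert("example.md")
</syntaxhighlight>
Using system2() with shQuote() keeps file names that contain spaces intact on both Windows and Linux.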

Another way to get a docx file is to use pander:
<pre>
library(knitr)
library(pander)
name = "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
Pandoc.brew(file = paste0(name, ".md"), output = name, convert = "docx")
</pre>


Note that once we have used knitr to create an md file, we can use the pandoc shell command to convert it to different formats:
* A pdf file: pandoc -s report.md -t latex -o report.pdf
* An html file: pandoc -s report.md -o report.html (with the -c flag a CSS stylesheet can be linked easily)
* OpenOffice: pandoc report.md -o report.odt
* Word docx: pandoc report.md -o report.docx


We can also create an epub file for reading on a Kobo e-reader. For example, download [https://gist.github.com/jeromyanglim/2716336 this file] and save it as example.Rmd. I needed to remove the line containing the link to http://i.imgur.com/RVNmr.jpg since it causes an error when I run pandoc (perhaps my pandoc version is too old). Now we just run these 2 lines to get the epub file. Amazing!
<pre>
knit("example.Rmd")
pandoc("example.md", format="epub")
</pre>


PS. If we don't remove the link, we get an error message (pandoc 1.10.1 on Windows 7)
<pre>
> pandoc("Rmd_to_Epub.md", format="epub")
executing pandoc  -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"
pandoc.exe: .\.\http://i.imgur.com/RVNmr.jpg: openBinaryFile: invalid argument (Invalid argument)
Error in (function (input, format, ext, cfg)  : conversion failed
In addition: Warning message:
running command 'pandoc  -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"' had status 1
</pre>


=== [https://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] ===


=== future & [https://cran.r-project.org/web/packages/future.apply/index.html future.apply] package ===
* [https://alexioannides.com/2016/11/02/asynchronous-and-distributed-programming-in-r-with-the-future-package/ Asynchronous and Distributed Programming in R with the Future Package]
* [https://www.jottr.org/2018/06/23/future.apply_1.0.0/ Parallelize Any Base R Apply Function]


=== Apache Spark ===
* [http://files.meetup.com/3576292/Dubravko%20Dulic%20SparkR%20June%202016.pdf Introduction to Apache Spark]


=== Microsoft R Server ===
* [http://files.meetup.com/3576292/Stefan%20Cronjaeger%20R%20Server.pdf Microsoft R '''Server'''] (not Microsoft R Open)


=== GPU ===
* [http://www.parallelr.com/r-gpu-programming-for-all-with-gpur/ GPU Programming for All with ‘gpuR’] from parallelr.com. The gpuR package is available on [https://cran.r-project.org/web/packages/gpuR/index.html CRAN].
* [https://cran.r-project.org/web/packages/gputools/index.html gputools]


=== Threads ===
* [https://cran.r-project.org/web/packages/Rdsm/index.html Rdsm] package
* [https://random-remarks.net/2016/12/11/a-very-experimental-threading-in-r/ (A Very) Experimental Threading in R] and a post from [https://matloff.wordpress.com/2016/12/11/threading-in-r/ Mad Scientist]


=== Benchmark ===
[http://rpsychologist.com/benchmark-parallel-sim Are parallel simulations in the cloud worth it? Benchmarking my MBP vs my Workstation vs Amazon EC2]


=== pander ===
Try pandoc[1] with a minimal reproducible example; you might give my "[http://cran.r-project.org/web/packages/pander/ pander]" package [2] a try too:
<pre>
library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')
</pre>
Where the content of the "minimal.brew" file is something you might have
got used to with Sweave - although it's using "brew" syntax instead. See
the examples of pander [3] for more details. Please note that pandoc should
be installed first, which is pretty easy on Windows.


# http://johnmacfarlane.net/pandoc/
# http://rapporter.github.com/pander/
# http://rapporter.github.com/pander/#examples


=== R2wd ===
Use the [http://cran.r-project.org/web/packages/R2wd/ R2wd] package. However, only 32-bit R is supported, and sometimes it cannot produce all 'table's.
<pre>
> library(R2wd)
> wdGet()
Loading required package: rcom
Loading required package: rscproxy
rcom requires a current version of statconnDCOM installed.
To install statconnDCOM type
    installstatconnDCOM()

This will download and install the current version of statconnDCOM

You will need a working Internet connection
because installation needs to download a file.
Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() :
  argument is of length zero
</pre>

The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.


R functions for timing code
<syntaxhighlight lang='rsplus'>
# Method 1
system.time( invisible(rnorm(10000)))

# Method 2
btime <- Sys.time()
invisible(rnorm(10000))
Sys.time() - btime
</syntaxhighlight>


== Cloud Computing ==


=== Install R on Amazon EC2 ===
http://randyzwitch.com/r-amazon-ec2/


=== Convert from pdf to word ===
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert


=== rtf ===
Use the [http://cran.r-project.org/web/packages/rtf/ rtf] package for Rich Text Format (RTF) output.


=== [https://www.rdocumentation.org/packages/xtable/versions/1.8-2 xtable] ===
Package xtable can produce HTML output.
{{Pre}}
library(xtable)
print(xtable(X), type="html", file="table.html")   # X is a data frame or matrix
</pre>


If you save the output to a file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.


=== Bioconductor on Amazon EC2 ===
http://www.bioconductor.org/help/bioconductor-cloud-ami/


== Big Data Analysis ==
* http://blog.comsysto.com/2013/02/14/my-favorite-community-links/
* [http://www.xmind.net/m/LKF2/ R for big data] in one picture


== Useful R packages ==
* [https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages Quick list of useful R packages]
* [https://github.com/qinwf/awesome-R awesome-R]
* [https://stevenmortimer.com/one-r-package-a-day/ One R package a day]


=== RInside ===
* http://dirk.eddelbuettel.com/code/rinside.html
* http://dirk.eddelbuettel.com/papers/rfinance2010_rcpp_rinside_tutorial_handout.pdf


==== Ubuntu ====
=== officer ===
With RInside, R can be embedded in a graphical application. For example, $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt directory includes source code of a Qt application to show a kernel density plot with various options like kernel functions, bandwidth and an R command text box to generate the random data. See my demo on [http://www.youtube.com/watch?v=UQ8yKQcPTg0 Youtube]. I have tested this '''qtdensity''' example successfully using Qt 4.8.5.
<ul>
# Follow the instruction [[#cairoDevice|cairoDevice]] to install required libraries for cairoDevice package and then cairoDevice itself.
<li>[https://cran.r-project.org/web/packages/officer/index.html CRAN]. Microsoft Word, Microsoft Powerpoint and HTML documents generation from R.  
# Install [[Qt|Qt]]. Check 'qmake' command becomes available by typing 'whereis qmake' or 'which qmake' in terminal.
<li>The [https://gist.github.com/arraytools/4f182b036ae7f95a31924ba5d5d3f069 gist] includes a comprehensive example that encompasses various elements such as sections, subsections, and tables. It also incorporates a detailed paragraph, along with visual representations created using base R plots and ggplots.  
# Open Qt Creator from Ubuntu start menu/Launcher. Open the project file $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt/qtdensity.pro in Qt Creator.
<li>Add a line space
# Under Qt Creator, hit 'Ctrl + R' or the big green triangle button on the lower-left corner to build/run the project. If everything works well, you shall see the ''interactive'' program qtdensity appears on your desktop.
<pre>
[[File:qtdensity.png|100px]].
doc <- body_add_par(doc, "")


With RInside + [http://www.webtoolkit.eu/wt Wt web toolkit] installed, we can also create a web application. To demonstrate the example in ''examples/wt'' directory, we can do
# Function to add n line spaces
body_add_par_n <- function (doc, n) {
  for(i in 1:n){
    doc <- body_add_par(doc, "")
  }
  return(doc)
}
doc <- body_add_par_n(doc, 3)  # the function returns the modified document
</pre>
<li>[https://ardata-fr.github.io/officeverse/officer-for-word.html Figures] from the documentation of '''officeverse'''.
<li>See [https://stackoverflow.com/a/25427314 Data frame to word table?].
<li>See [[Office#Tables|Office]] page for some code.
<li>[https://www.r-bloggers.com/2020/07/how-to-read-and-create-word-documents-in-r/ How to read and create Word Documents in R], which also shows how to extract tables from Word documents.
<pre>
<pre>
cd ~/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/wt
x = read_docx("myfile.docx")
make
content <- docx_summary(x) # a data frame with one row per paragraph or table cell
sudo ./wtdensity --docroot . --http-address localhost --http-port 8080
grep("nlme", content$text, ignore.case = TRUE, value = TRUE)
</pre>
</pre>
Then we can go to the browser's address bar and type ''http://localhost:8080'' to see how it works (a screenshot is in [http://dirk.eddelbuettel.com/blog/2011/11/30/ here]).
</ul>
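A minimal, self-contained sketch of the officer Word workflow above, assuming a recent officer version where read_docx(), body_add_par(), body_add_table(), print(..., target =) and docx_summary() are available. The helper body_add_blank_lines() and the output file name are hypothetical. Each body_add_*() call returns the modified document, so its result has to be assigned back.

<syntaxhighlight lang='rsplus'>
library(officer)

# Hypothetical helper: append n empty paragraphs (blank lines) to an rdocx object
body_add_blank_lines <- function(doc, n = 1) {
  for (i in seq_len(n)) {
    doc <- body_add_par(doc, "")
  }
  doc
}

doc <- read_docx()   # start from officer's default Word template
doc <- body_add_par(doc, "Summary of mtcars", style = "heading 1")
doc <- body_add_par(doc, "The first rows of the data set are shown below.",
                    style = "Normal")
doc <- body_add_blank_lines(doc, 2)
doc <- body_add_table(doc, value = head(mtcars[, 1:4]))

out_file <- file.path(tempdir(), "mtcars_summary.docx")   # hypothetical output path
print(doc, target = out_file)                             # writes the .docx file

# Read the file back and search its text
content <- docx_summary(read_docx(out_file))
grep("mtcars", content$text, ignore.case = TRUE, value = TRUE)
</syntaxhighlight>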


==== Windows 7 ====
== Powerpoint ==
To make RInside work on Windows, try the following
<ul>
# Make sure R is installed under '''C:\''' instead of '''C:\Program Files''' if we don't want to get an error like ''g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory''.
<li>[https://cran.r-project.org/web/packages/officer/index.html officer] package (formerly ReporteRs). [http://theautomatic.net/2020/07/28/how-to-create-powerpoint-reports-with-r/ How to create powerpoint reports with R]
# Install RTools
</li>
# Install the RInside package from source (the binary version gives an [http://stackoverflow.com/questions/13137770/fatal-error-unable-to-open-the-base-package error])
<li>[https://davidgohel.github.io/flextable/ flextable] (imports '''officer''')
# Create a DOS batch file containing necessary paths in PATH environment variable
</li>
<li>[https://stackoverflow.com/a/21558466 R data.frame to table image for presentation].
<pre>
<pre>
@echo off
library(gridExtra)
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
grid.newpage()
set PATH=C:\R\R-3.0.1\bin\i386;%PATH%
grid.table(mydf)
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
</pre>
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
</li>
set R_HOME=C:\R\R-3.0.1
<li>[https://bookdown.org/yihui/rmarkdown/powerpoint-presentation.html Rmarkdown]
echo Setting environment for using R
</li>
cmd
</ul>
</pre>
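A minimal sketch of building a one-slide PowerPoint file with officer, assuming the layout "Title and Content" and the master "Office Theme" from the template that read_pptx() uses by default. ph_with() places a value (here a title string and a data frame, which is rendered as a table) into a placeholder chosen with ph_location_type(). The output path is hypothetical.

<syntaxhighlight lang='rsplus'>
library(officer)

ppt <- read_pptx()   # default template
ppt <- add_slide(ppt, layout = "Title and Content", master = "Office Theme")
ppt <- ph_with(ppt, value = "First rows of mtcars",
               location = ph_location_type(type = "title"))
ppt <- ph_with(ppt, value = head(mtcars[, 1:4]),      # a data frame becomes a table
               location = ph_location_type(type = "body"))

out_file <- file.path(tempdir(), "mtcars_slides.pptx")   # hypothetical output path
print(ppt, target = out_file)
</syntaxhighlight>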
 
In the Windows command prompt, run
== PDF manipulation ==
<pre>
[https://github.com/pridiltal/staplr staplr]
cd C:\R\R-3.0.1\library\RInside\examples\standard
make -f Makefile.win
</pre>
Now we can test by running any of the executable files that '''make''' generates, for example ''rinside_sample0''.
<pre>
rinside_sample0
</pre>


As for the Qt application qtdensity, we need to make sure the same version of MinGW was used to build RInside/Rcpp and Qt. See some discussions in
== R Graphs Gallery ==
* http://stackoverflow.com/questions/12280707/using-rinside-with-qt-in-windows
* [https://www.facebook.com/pages/R-Graph-Gallery/169231589826661 Romain François]
* http://www.mail-archive.com/rcpp-devel@lists.r-forge.r-project.org/msg04377.html
* [http://shinyapps.stat.ubc.ca/r-graph-catalog/ R Graph Catalog] written using R + Shiny. The source code is available on [https://github.com/jennybc/r-graph-catalog Github].
So the Qt and Wt web tool applications on Windows may or may not be possible.
* Forest plot. See the packages [https://cran.r-project.org/web/packages/rmeta/index.html rmeta] and [https://cran.r-project.org/web/packages/forestplot/ forestplot]. The forest plot can be used to plot the quantities like relative risk (with 95% CI) in survival data.
** [http://www.danieldsjoberg.com/bstfun/dev/reference/add_inline_forest_plot.html Inline forest plot]


=== GUI ===
== COM client or server ==
==== Qt and R ====
* http://cran.r-project.org/web/packages/qtbase/index.html [https://stat.ethz.ch/pipermail/r-devel/2015-July/071495.html QtDesigner is such a tool, and its output is compatible with the qtbase R package]
* http://qtinterfaces.r-forge.r-project.org


=== tkrplot ===
=== Client ===
On Ubuntu, we need to install tk packages, such as by
* [http://www.omegahat.org/RDCOMClient/ RDCOMClient] where [http://cran.r-project.org/web/packages/excel.link/index.html excel.link] depends on it.
<pre>
* [https://www.r-bloggers.com/2024/06/how-to-execute-vba-code-in-excel-via-r-using-rdcomclient/ How to Execute VBA Code in Excel via R using RDCOMClient]
sudo apt-get install tk-dev
</pre>


=== Hadoop (eg ~100 terabytes) ===
=== Server ===
See also [http://cran.r-project.org/web/views/HighPerformanceComputing.html HighPerformanceComputing]
[http://www.omegahat.org/RDCOMServer/ RDCOMServer]


* RHadoop
== Use R under proxy ==
* Hive
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
* [http://cran.r-project.org/web/packages/mapReduce/ MapReduce]. Introduction by [http://www.linuxjournal.com/content/introduction-mapreduce-hadoop-linux Linux Journal].
* http://www.techspritz.com/category/tutorials/hadoopmapredcue/ Single node or multinode cluster setup using Ubuntu with VirtualBox (Excellent)
* [http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ Running Hadoop on Ubuntu Linux (Single-Node Cluster)]
* Ubuntu 12.04 http://www.youtube.com/watch?v=WN2tJk_oL6E and [https://www.dropbox.com/s/05aurcp42asuktp/Chiu%20Hadoop%20Pig%20Install%20Instructions.docx instruction]
* Linux Mint http://blog.hackedexistence.com/installing-hadoop-single-node-on-linux-mint
* http://www.r-bloggers.com/search/hadoop


==== [https://github.com/RevolutionAnalytics/RHadoop/wiki RHadoop] ====
== RStudio ==
* [http://www.rdatamining.com/tutorials/r-hadoop-setup-guide RDataMining.com] based on Mac.
* [https://github.com/rstudio/rstudio Github]
* Ubuntu 12.04 - [http://crishantha.com/wp/?p=1414 Crishantha.com], [http://nikhilshah123sh.blogspot.com/2014/03/setting-up-rhadoop-in-ubuntu-1204.html nikhilshah123sh.blogspot.com]. [http://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/ Bighadoop.wordpress] contains an example.
* Installing RStudio (1.0.44) on Ubuntu does not install Java, even though the source code contains 37.5% Java.
* MapReduce in R by [https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md RevolutionAnalytics] with a few examples.
* [https://www.rstudio.com/products/rstudio/download/preview/ Preview]
* https://twitter.com/hashtag/rhadoop
 
* [http://bigd8ta.com/step-by-step-guide-to-setting-up-an-r-hadoop-system/ Bigd8ta.com] based on Ubuntu 14.04.
=== rstudio.cloud ===
https://rstudio.cloud/
 
=== Launch RStudio ===
[[Rstudio#Multiple_versions_of_R|Multiple versions of R]]
 
=== Create .Rproj file ===
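If you have an existing package that doesn't have an .Rproj file, you can use '''devtools::use_rstudio("path/to/package")''' to add it. In newer versions this function has moved to the usethis package as '''usethis::use_rstudio()''', which acts on the active project.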
 
With an RStudio project file, you can
* Restore .RData into workspace at startup
* Save workspace to .RData on exit (or '''save.image'''("Robj.RData") & load("Robj.RData"))
* Always save history (even if no saving .RData, '''savehistory'''(".Rhistory") & loadhistory(".Rhistory"))
* etc
 
=== package search ===
https://github.com/RhoInc/CRANsearcher
 
=== Git ===
* (Video) [https://www.rstudio.com/resources/videos/happy-git-and-gihub-for-the-user-tutorial/ Happy Git and GitHub for the useR – Tutorial]
* [https://owi.usgs.gov/blog/beyond-basic-git/ Beyond Basic R - Version Control with Git]


==== Snowdoop: an alternative to MapReduce algorithm ====
== Visual Studio ==
* http://matloff.wordpress.com/2014/11/26/how-about-a-snowdoop-package/
[http://blog.revolutionanalytics.com/2017/05/r-and-python-support-now-built-in-to-visual-studio-2017.html R and Python support now built in to Visual Studio 2017]
* http://matloff.wordpress.com/2014/12/26/snowdooppartools-update/comment-page-1/#comment-665


=== [http://cran.r-project.org/web/packages/XML/index.html XML] ===
== List files using regular expression ==
On Ubuntu, we need to install libxml2-dev before we can install XML package.
* Extension
<pre>
list.files(pattern = "\\.txt$")
</pre>
where the dot (.) is a metacharacter that matches any single character, so it is escaped as "\\." to match a literal dot; "$" anchors the match at the end of the file name.
* Start with
<pre>
<pre>
sudo apt-get update
list.files(pattern = "^Something")
sudo apt-get install libxml2-dev
</pre>
</pre>


On CentOS,
Using '''Sys.glob()''' (shell wildcards instead of regular expressions):
<pre>
<pre>
yum -y install libxml2 libxml2-devel
> Sys.glob("~/Downloads/*.txt")
[1] "/home/brb/Downloads/ip.txt"      "/home/brb/Downloads/valgrind.txt"
</pre>
</pre>
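The difference between an escaped and an unescaped dot, and between regular expressions and glob wildcards, is easiest to see on a scratch directory. This sketch creates a few made-up file names under tempdir(); none of them come from the original page.

<syntaxhighlight lang='rsplus'>
# Scratch directory with a few made-up file names
tmp <- file.path(tempdir(), "listfiles-demo")
unlink(tmp, recursive = TRUE)
dir.create(tmp)
invisible(file.create(file.path(tmp, c("notes.txt", "Something_1.csv",
                                       "Something_2.txt", "data.txt.bak", "mytxt"))))

# "\\.txt$": escaped dot = a literal ".", "$" = end of the name
txt <- list.files(tmp, pattern = "\\.txt$")
# Unescaped: "." matches any character, so "mytxt" matches ".txt$" as well
loose <- list.files(tmp, pattern = ".txt$")
# "^Something": names that start with "Something"
some <- list.files(tmp, pattern = "^Something")
# Sys.glob() uses shell wildcards, not regular expressions
globbed <- basename(Sys.glob(file.path(tmp, "*.txt")))
</syntaxhighlight>

Here txt and globbed contain notes.txt and Something_2.txt, while loose additionally picks up mytxt.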


==== XML ====
== Hidden tool: rsync in Rtools ==
* http://giventhedata.blogspot.com/2012/06/r-and-web-for-beginners-part-ii-xml-in.html. It gave an example of extracting the XML-values from each XML-tag for all nodes and save them in a data frame using '''xmlSApply()'''.
<pre>
* http://www.quantumforest.com/2011/10/reading-html-pages-in-r-for-text-processing/
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
* https://tonybreyal.wordpress.com/2011/11/18/htmltotext-extracting-text-from-html-via-xpath/
sending incremental file list
* https://www.tutorialspoint.com/r/r_xml_files.htm
a.exe
* https://www.datacamp.com/community/tutorials/r-data-import-tutorial#xml
 
* [http://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf Extracting data from XML] PubMed and Zillow are used to illustrate. xmlTreeParse(), xmlRoot(),  xmlName() and xmlSApply().
sent 323142 bytes  received 31 bytes  646346.00 bytes/sec
* https://yihui.name/en/2010/10/grabbing-tables-in-webpages-using-the-xml-package/
total size is 1198416 speedup is 3.71
<syntaxhighlight lang='rsplus'>
 
library(XML)
c:\Rtools\bin>
</pre>


# Read and parse HTML file
Unfortunately, if the destination is a network drive, I got a "permission denied (13)" error. See also [https://superuser.com/a/69764 rsync file permissions on windows].
doc.html = htmlTreeParse('http://apiolaza.net/babel.html', useInternal = TRUE)


# Extract all the paragraphs (HTML tag is p, starting at
== Install rgdal package (geospatial Data) on ubuntu ==
# the root of the document). Unlist flattens the list to
Terminal
# create a character vector.
{{Pre}}
doc.text = unlist(xpathApply(doc.html, '//p', xmlValue))
sudo apt-get install libgdal1-dev libproj-dev # https://stackoverflow.com/a/44389304
sudo apt-get install libgdal1i # Ubuntu 16.04 https://stackoverflow.com/a/12143411
</pre>


# Replace all newlines by spaces
R
doc.text = gsub('\n', ' ', doc.text)
{{Pre}}
install.packages("rgdal")
</pre>


# Join all the elements of the character vector into a single
== Install sf package ==
# character string, separated by spaces
I got the following error even though I had installed some libraries.
doc.text = paste(doc.text, collapse = ' ')
<pre>
</syntaxhighlight>
checking GDAL version >= 2.0.1... no
configure: error: sf is not compatible with GDAL versions below 2.0.1
</pre>
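The paragraph-extraction steps in the XML example (parse the page, select every p element, replace newlines, join) can be tried offline on an inline HTML string. This sketch assumes the XML package; htmlParse(..., asText = TRUE) parses a string instead of a file or URL, and the HTML content is made up for illustration.

<syntaxhighlight lang='rsplus'>
library(XML)

# Made-up HTML standing in for a downloaded page
html <- "<html><body>
<p>First
paragraph.</p>
<p>Second paragraph.</p>
<div>Not a paragraph.</div>
</body></html>"

doc <- htmlParse(html, asText = TRUE)

# Text of all p elements anywhere in the document, as a character vector
paras <- xpathSApply(doc, "//p", xmlValue)

# Replace newlines by spaces, then join into a single string
paras <- gsub("\n", " ", paras)
doc_text <- paste(paras, collapse = " ")
doc_text
</syntaxhighlight>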
Then I followed these instructions:
{{Pre}}
sudo apt remove libgdal-dev
sudo apt remove libproj-dev
sudo apt remove gdal-bin
sudo add-apt-repository ppa:ubuntugis/ubuntugis-stable


This post http://stackoverflow.com/questions/25315381/using-xpathsapply-to-scrape-xml-attributes-in-r can be used to monitor new releases from github.com.
sudo apt update
<syntaxhighlight lang='rsplus'>
sudo apt-cache policy libgdal-dev # Make sure a version >= 2.0 appears
> library(RCurl) # getURL()
> library(XML)  # htmlParse and xpathSApply
> xData <- getURL("https://github.com/alexdobin/STAR/releases")
> doc = htmlParse(xData)
> plain.text <- xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
  # I look at the source code and search 2.5.3a and find the tag as
  # <span class="css-truncate-target">2.5.3a</span>
> plain.text
[1] "2.5.3a"      "2.5.2b"      "2.5.2a"      "2.5.1b"      "2.5.1a"   
[6] "2.5.0c"      "2.5.0b"      "STAR_2.5.0a" "STAR_2.4.2a" "STAR_2.4.1d"
>
> # try bwa
> xData <- getURL("https://github.com/lh3/bwa/releases")
> doc = htmlParse(xData)
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
[1] "v0.7.15" "v0.7.13"


> # try picard
sudo apt install libgdal-dev # works on ubuntu 20.04 too
> xData <- getURL("https://github.com/broadinstitute/picard/releases")
                            # the previous lines are not needed
> doc = htmlParse(xData)
</pre>
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
[1] "2.9.1" "2.9.0" "2.8.3" "2.8.2" "2.8.1" "2.8.0" "2.7.2" "2.7.1" "2.7.0"
[10] "2.6.0"
</syntaxhighlight>
This method can be used to monitor new tags/releases from some projects like [https://github.com/Ultimaker/Cura/releases Cura], BWA, Picard, [https://github.com/alexdobin/STAR/releases STAR]. But for some projects like [https://github.com/ncbi/sra-tools sratools] the '''class''' attribute in the '''span''' element ("css-truncate-target") can be different (such as "tag-name").
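Because the class name differs between projects, one option is to match both names with a single XPath union. A minimal sketch assuming the XML package; the HTML fragment is made up to stand in for a downloaded releases page (the real page markup may differ), so no network access is needed.

<syntaxhighlight lang='rsplus'>
library(XML)

# Made-up fragment standing in for a releases page
html <- '<html><body>
  <span class="css-truncate-target">2.5.3a</span>
  <span class="css-truncate-target">2.5.2b</span>
  <span class="tag-name">v0.7.15</span>
  <span class="other">ignore me</span>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# "|" combines two location paths; spans with any other class are skipped
xp <- "//span[@class='css-truncate-target'] | //span[@class='tag-name']"
tags <- xpathSApply(doc, xp, xmlValue)
tags
</syntaxhighlight>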


==== xmlview ====
== Database ==
* http://rud.is/b/2016/01/13/cobble-xpath-interactively-with-the-xmlview-package/
* https://cran.r-project.org/web/views/Databases.html
* [http://blog.revolutionanalytics.com/2017/08/a-modern-database-interface-for-r.html A modern database interface for R]


=== RCurl ===
=== [http://cran.r-project.org/web/packages/RSQLite/index.html RSQLite] ===
On Ubuntu, we need to install the packages (the first one is for XML package that RCurl suggests)
* https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html
<syntaxhighlight lang='bash'>
* https://github.com/rstats-db/RSQLite
# Test on Ubuntu 14.04
sudo apt-get install libxml2-dev
sudo apt-get install libcurl4-openssl-dev
</syntaxhighlight>


==== Scrape google scholar results ====
'''Creating a new database''':
https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/googleScholarXScraper/googleScholarXScraper.R
{{Pre}}
library(DBI)


No google ID is required
mydb <- dbConnect(RSQLite::SQLite(), "my-db.sqlite")
dbDisconnect(mydb)
unlink("my-db.sqlite")


It does not seem to work:
# temporary database
<pre>
mydb <- dbConnect(RSQLite::SQLite(), "")
Error in data.frame(footer = xpathLVApply(doc, xpath.base, "/font/span[@class='gs_fl']",  :
dbDisconnect(mydb)
  arguments imply differing number of rows: 2, 0
</pre>
</pre>


==== [https://cran.r-project.org/web/packages/devtools/index.html devtools] ====
'''Loading data''':
'''devtools''' package depends on Curl.
{{Pre}}
<syntaxhighlight lang='bash'>
mydb <- dbConnect(RSQLite::SQLite(), "")
# Test on Ubuntu 14.04
dbWriteTable(mydb, "mtcars", mtcars)
sudo apt-get install libcurl4-openssl-dev
dbWriteTable(mydb, "iris", iris)
</syntaxhighlight>
 
dbListTables(mydb)


==== [https://github.com/hadley/httr httr] ====
dbListFields(mydb, "mtcars")
httr imports curl, jsonlite, mime, openssl and R6 packages.


When I tried to install httr package, I got an error and some message:
dbReadTable(mydb, "mtcars")
<pre>
Configuration failed because openssl was not found. Try installing:
* deb: libssl-dev (Debian, Ubuntu, etc)
* rpm: openssl-devel (Fedora, CentOS, RHEL)
* csw: libssl_dev (Solaris)
* brew: openssl (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’
</pre>
</pre>
After I ran '''sudo apt-get install libssl-dev''' in the terminal (Debian), the httr package installed smoothly. Nice httr!


Real example: see [http://stackoverflow.com/questions/27371372/httr-retrieving-data-with-post this post]. Unfortunately, I did not get a table as the result; I only got an HTML file (R 3.2.5, httr 1.1.0 on Ubuntu and Debian).
'''Queries''':
{{Pre}}
dbGetQuery(mydb, 'SELECT * FROM mtcars LIMIT 5')


Since the httr package is used by many other packages, it helps to look at how they use it, for example the [https://github.com/ropensci/aRxiv aRxiv] package.
dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < 4.6')


==== [http://cran.r-project.org/web/packages/curl/ curl] ====
dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < :x', params = list(x = 4.6))
curl is independent of RCurl package.


* http://cran.r-project.org/web/packages/curl/vignettes/intro.html
res <- dbSendQuery(mydb, "SELECT * FROM mtcars WHERE cyl = 4")
* https://www.opencpu.org/posts/curl-release-0-8/
dbFetch(res)
dbClearResult(res)
</pre>
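A self-contained version of the query steps, assuming DBI and RSQLite are installed. It uses an in-memory database (":memory:"), positional ? placeholders whose values are passed through params instead of being pasted into the SQL string, and it clears the result set from dbSendQuery() before disconnecting.

<syntaxhighlight lang='rsplus'>
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")   # temporary in-memory database
dbWriteTable(con, "mtcars", mtcars)

# One step: the value 4 is bound to the ? placeholder
four_cyl <- dbGetQuery(con, "SELECT mpg, cyl FROM mtcars WHERE cyl = ?",
                       params = list(4))

# Several steps: send, bind, fetch, then clear the result set
res <- dbSendQuery(con, "SELECT COUNT(*) AS n FROM mtcars WHERE cyl = ?")
dbBind(res, params = list(4))
n4 <- dbFetch(res)$n
dbClearResult(res)

dbDisconnect(con)
</syntaxhighlight>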


<syntaxhighlight lang='rsplus'>
'''Batched queries''':
library(curl)
{{Pre}}
h <- new_handle()
handle_setform(h,
rs <- dbSendQuery(mydb, 'SELECT * FROM mtcars')
   name="aaa", email="bbb"
while (!dbHasCompleted(rs)) {
)
  df <- dbFetch(rs, n = 10)
req <- curl_fetch_memory("http://localhost/d/phpmyql3_scripts/ch02/form2.html", handle = h)
  print(nrow(df))
rawToChar(req$content)
}
</syntaxhighlight>
 
dbClearResult(rs)
</pre>


==== [http://ropensci.org/packages/index.html rOpenSci] packages ====
'''Multiple parameterised queries''':
'''rOpenSci''' contains packages that allow access to data repositories through the R statistical programming environment
{{Pre}}
rs <- dbSendQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" = :x')
dbBind(rs, params = list(x = seq(4, 4.4, by = 0.1)))
nrow(dbFetch(rs))
#> [1] 4
dbClearResult(rs)
</pre>


=== DirichletMultinomial ===
'''Statements''':
On Ubuntu, we do
{{Pre}}
<pre>
dbExecute(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < 4')
sudo apt-get install libgsl0-dev
#> [1] 0
rs <- dbSendStatement(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < :x')
dbBind(rs, params = list(x = 4.5))
dbGetRowsAffected(rs)
#> [1] 4
dbClearResult(rs)
</pre>
</pre>
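A short sketch of the statement workflow on a fresh in-memory copy of iris, assuming DBI and RSQLite. dbExecute() returns the number of affected rows, which can be checked against the same condition evaluated in R.

<syntaxhighlight lang='rsplus'>
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)

# Column names containing a dot must be quoted as identifiers in SQL
n_deleted <- dbExecute(con, 'DELETE FROM iris WHERE "Sepal.Length" < ?',
                       params = list(4.5))
n_left <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM iris")$n

dbDisconnect(con)
c(deleted = n_deleted, left = n_left)
</syntaxhighlight>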


=== Create GUI ===
=== [https://cran.r-project.org/web/packages/sqldf/ sqldf] ===
==== [http://cran.r-project.org/web/packages/gWidgets/index.html gWidgets] ====
Manipulate R data frames using SQL. Depends on RSQLite. [http://datascienceplus.com/a-use-of-gsub-reshape2-and-sqldf-with-healthcare-data/ A use of gsub, reshape2 and sqldf with healthcare data]


=== [http://cran.r-project.org/web/packages/GenOrd/index.html GenOrd]: Generate ordinal and discrete variables with given correlation matrix and marginal distributions ===
=== [https://cran.r-project.org/web/packages/RPostgreSQL/index.html RPostgreSQL] ===
[http://statistical-research.com/simulating-random-multivariate-correlated-data-categorical-variables/?utm_source=rss&utm_medium=rss&utm_campaign=simulating-random-multivariate-correlated-data-categorical-variables here]


=== [http://cran.r-project.org/web/packages/rjson/index.html rjson] ===
=== [[MySQL#Use_through_R|RMySQL]] ===
http://heuristically.wordpress.com/2013/05/20/geolocate-ip-addresses-in-r/
* http://datascienceplus.com/bringing-the-powers-of-sql-into-r/
* See [[MySQL#Installation|here]] about the installation of the required package ('''libmysqlclient-dev''') in Ubuntu.


=== [http://cran.r-project.org/web/packages/RJSONIO/index.html RJSONIO] ===
=== MongoDB ===
==== Accessing Bitcoin Data with R ====
* http://www.r-bloggers.com/r-and-mongodb/
http://blog.revolutionanalytics.com/2015/11/accessing-bitcoin-data-with-r.html
* http://watson.nci.nih.gov/~sdavis/blog/rmongodb-using-R-with-mongo/


==== Plot IP on google map ====
=== odbc ===
* http://thebiobucket.blogspot.com/2011/12/some-fun-with-googlevis-plotting-blog.html#more  (RCurl, RJSONIO, plyr, googleVis)
* http://devblog.icans-gmbh.com/using-the-maxmind-geoip-api-with-r/ (RCurl, RJSONIO, maps)
* http://cran.r-project.org/web/packages/geoPlot/index.html (geoPlot package, deprecated as of 8/12/2013)
* http://archive09.linux.com/feature/135384  (Not R) ApacheMap
* http://batchgeo.com/features/geolocation-ip-lookup/    (Not R)  (Enter a spreadsheet of adress, city, zip or a column of IPs and it will show the location on google map)
* http://code.google.com/p/apachegeomap/


The following example is modified from the first item in the list above.
=== RODBC ===
<pre>
require(RJSONIO) # fromJSON
require(RCurl)  # getURL


temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt",
=== DBI ===
                          ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)
names(ip) <- "IP"
nr = nrow(ip)
Lon <- as.numeric(rep(NA, nr))
Lat <- Lon
Coords <- data.frame(Lon, Lat)
ip2coordinates <- function(ip) {
  api <- "http://freegeoip.net/json/"
  get.ips <- getURL(paste(api, URLencode(ip), sep=""))
  # result <- ldply(fromJSON(get.ips), data.frame)
  result <- data.frame(fromJSON(get.ips))
  names(result)[1] <- "ip.address"
  return(result)
}


for (i in 1:nr){
=== [https://cran.r-project.org/web/packages/dbplyr/index.html dbplyr] ===
  cat(i, "\n")
* To use databases with dplyr, you need to first install dbplyr
  try(
* https://db.rstudio.com/dplyr/
  Coords[i, 1:2] <- ip2coordinates(ip$IP[i])[c("longitude", "latitude")]
* Five commonly used backends: RMySQL, RPostgreSQL, RSQLite, ODBC, bigrquery.
  )
* http://www.datacarpentry.org/R-ecology-lesson/05-r-and-databases.html
}
# append to log-file:
logfile <- data.frame(ip, Lat = Coords$Lat, Long = Coords$Lon,
                                      LatLong = paste(round(Coords$Lat, 1), round(Coords$Lon, 1), sep = ":"))
log_gmap <- logfile[!is.na(logfile$Lat), ]


require(googleVis) # gvisMap
'''Create a new SQLite database''':
gmap <- gvisMap(log_gmap, "LatLong",
{{Pre}}
                options = list(showTip = TRUE, enableScrollWheel = TRUE,
surveys <- read.csv("data/surveys.csv")
                              mapType = 'hybrid', useMapTypeControl = TRUE,
plots <- read.csv("data/plots.csv")
                              width = 1024, height = 800))
plot(gmap)
</pre>
[[File:GoogleVis.png|200px]]


The plot.gvis() method in the googleVis package also shows how the startDynamicHelp() function in the tools package can be used to launch an HTTP server. See
my_db_file <- "portal-database.sqlite"
[http://jeffreyhorner.tumblr.com/page/3 Jeffrey Horner's note about deploying Rook App].
my_db <- src_sqlite(my_db_file, create = TRUE)


=== Map ===
copy_to(my_db, surveys)
==== [https://rstudio.github.io/leaflet/ leaflet] ====
copy_to(my_db, plots)
* rstudio.github.io/leaflet/#installation-and-use
my_db
* https://metvurst.wordpress.com/2015/07/24/mapview-basic-interactive-viewing-of-spatial-data-in-r-6/
</pre>


==== choroplethr ====
'''Connect to a database''':
* http://blog.revolutionanalytics.com/2014/01/easy-data-maps-with-r-the-choroplethr-package-.html
{{Pre}}
* http://www.arilamstein.com/blog/2015/06/25/learn-to-map-census-data-in-r/
download.file(url = "https://ndownloader.figshare.com/files/2292171",
* http://www.arilamstein.com/blog/2015/09/10/user-question-how-to-add-a-state-border-to-a-zip-code-map/
              destfile = "portal_mammals.sqlite", mode = "wb")


==== ggplot2 ====
library(dbplyr)
[https://randomjohn.github.io/r-maps-with-census-data/ How to make maps with Census data in R]
library(dplyr)
mammals <- src_sqlite("portal_mammals.sqlite")
</pre>


=== [http://cran.r-project.org/web/packages/googleVis/index.html googleVis] ===
'''Querying the database with the SQL syntax''':
See an example from [[R#RJSONIO|RJSONIO]] above.
{{Pre}}
tbl(mammals, sql("SELECT year, species_id, plot_id FROM surveys"))
</pre>


=== [https://cran.r-project.org/web/packages/googleAuthR/index.html googleAuthR] ===
'''Querying the database with the dplyr syntax''':
Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.
{{Pre}}
surveys <- tbl(mammals, "surveys")
surveys %>%
    select(year, species_id, plot_id)
head(surveys, n = 10)


=== gtrendsR - Google Trends ===
show_query(head(surveys, n = 10)) # show which SQL commands are actually sent to the database
* [http://blog.revolutionanalytics.com/2015/12/download-and-plot-google-trends-data-with-r.html Download and plot Google Trends data with R]
</pre>
* [https://datascienceplus.com/analyzing-google-trends-data-in-r/ Analyzing Google Trends Data in R]
 
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=microarray%20analysis microarray analysis] from 2004-01-01
'''Simple database queries''':
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=ngs%20next%20generation%20sequencing ngs next generation sequencing] from 2004-01-01
{{Pre}}
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=dna%20sequencing dna sequencing] from 2004-01-01.
surveys %>%
* [https://trends.google.com/trends/explore?date=2004-01-01%202017-09-04&q=rna%20sequencing rna sequencing] from 2004-01-01. It can be seen RNA sequencing >> DNA sequencing.
  filter(weight < 5) %>%
* [http://www.kdnuggets.com/2017/09/python-vs-r-data-science-machine-learning.html?utm_content=buffere1df7&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer Python vs R – Who Is Really Ahead in Data Science, Machine Learning?] and [https://stackoverflow.blog/2017/09/06/incredible-growth-python/ The Incredible Growth of Python] by [https://twitter.com/drob?lang=en David Robinson]
  select(species_id, sex, weight)
</pre>


=== quantmod ===
'''Laziness''' (instruct R to stop being lazy):
[http://www.thertrader.com/2015/12/13/maintaining-a-database-of-price-files-in-r/ Maintaining a database of price files in R]. It consists of 3 steps.
{{Pre}}
data_subset <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight) %>%
  collect()
</pre>
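The lazy evaluation above can be reproduced without downloading the portal database. This sketch assumes dplyr, dbplyr, DBI and RSQLite are installed and uses an in-memory copy of mtcars instead of the surveys table: the pipeline only builds SQL, and rows reach R when collect() is called.

<syntaxhighlight lang='rsplus'>
library(DBI)
library(dplyr)   # dbplyr must also be installed for database-backed tables

con <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars")     # upload a local data frame
cars_db <- tbl(con, "mtcars")      # lazy reference; no rows pulled yet

query <- cars_db %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, wt)

show_query(query)                  # the SQL that will be sent
local_df <- collect(query)         # only now are the rows fetched into R

dbDisconnect(con)
</syntaxhighlight>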


# Initial data downloading
'''Complex database queries''':
# Update existing data
{{Pre}}
# Create a batch file
plots <- tbl(mammals, "plots")
plots # The plot_id column features in the plots table


=== [http://cran.r-project.org/web/packages/Rcpp/index.html Rcpp] ===
surveys # The plot_id column also features in the surveys table


* [http://lists.r-forge.r-project.org/pipermail/rcpp-devel/ Discussion archive]
# Join databases method 1
* (Video) [https://www.rstudio.com/resources/videos/extending-r-with-c-a-brief-introduction-to-rcpp/ Extending R with C++: A Brief Introduction to Rcpp]
plots %>%
* [http://dirk.eddelbuettel.com/blog/2017/06/13/#007_c++14_r_travis C++14, R and Travis -- A useful hack]
  filter(plot_id == 1) %>%
  inner_join(surveys) %>%
  collect()
</pre>
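The join can also be tried on small stand-in tables. This sketch assumes dplyr, dbplyr, DBI and RSQLite; the plots_df and surveys_df data frames are hypothetical miniatures of the portal tables. Naming the key in by avoids the message dplyr otherwise prints about which columns it joined on.

<syntaxhighlight lang='rsplus'>
library(DBI)
library(dplyr)   # dbplyr must also be installed

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical miniature versions of the "plots" and "surveys" tables
plots_df <- data.frame(plot_id = 1:3,
                       plot_type = c("Control", "Rodent Exclosure", "Control"))
surveys_df <- data.frame(record_id = 1:6,
                         plot_id = c(1L, 1L, 2L, 3L, 3L, 3L),
                         species_id = c("DM", "DO", "DM", "PF", "DM", "DO"))
copy_to(con, plots_df, "plots")
copy_to(con, surveys_df, "surveys")

plots <- tbl(con, "plots")
surveys <- tbl(con, "surveys")

plot1 <- plots %>%
  filter(plot_id == 1) %>%
  inner_join(surveys, by = "plot_id") %>%
  collect()

dbDisconnect(con)
plot1
</syntaxhighlight>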


It may be necessary to install dependency packages for RcppEigen.
=== NoSQL ===
<syntaxhighlight lang='bash'>
[https://ropensci.org/technotes/2018/01/25/nodbi/ nodbi: the NoSQL Database Connector]
sudo apt-get install libblas-dev liblapack-dev
sudo apt-get install gfortran
</syntaxhighlight>


==== Speed Comparison ====
== Github ==
* [http://blog.revolutionanalytics.com/2015/06/a-comparison-of-high-performance-computing-techniques-in-r.html A comparison of high-performance computing techniques in R]. It compares Rcpp to an R looping operator (like mapply), a parallelized version of a looping operator (like mcmapply), explicit parallelization, via the parallel package or the ParallelR suite.
* In the following example, C++ avoids the overhead of creating an intermediate object (e.g. a logical vector of the same length as the original vector); the C++ code uses an intermediate scalar instead. So C++ beats R on memory management in this case.
<syntaxhighlight lang='rsplus'>
# http://blog.mckuhn.de/2016/03/avoiding-unnecessary-memory-allocations.html
library(Rcpp)


`%count<%` <- cppFunction('
=== R source  ===
size_t count_less(NumericVector x, NumericVector y) {
https://github.com/wch/r-source/ Updated daily and worth visiting regularly. Click '''1000+ commits''' to look at the daily changes.
  const size_t nx = x.size();
  const size_t ny = y.size();
  if (nx > 1 && ny > 1) stop("Only one parameter can be a vector!");
  size_t count = 0;
  if (nx == 1) {
    double c = x[0];
    for (size_t i = 0; i < ny; i++) count += c < y[i];
  } else {
    double c = y[0];
    for (size_t i = 0; i < nx; i++) count += x[i] < c;
  }
  return count;
}
')


set.seed(42)
If we are interested in a certain branch (say 3.2), look for R-3-2-branch.


N <- 10^7
=== R packages (only) source (metacran) ===
v <- runif(N, 0, 10000)
* https://github.com/cran/ by [https://github.com/gaborcsardi Gábor Csárdi], the author of '''[http://igraph.org/ igraph]''' software.


# Testing on my ODROID-XU4 running Ubuntu 15.10
=== Bioconductor packages source ===
system.time(sum(v < 5000))
<strike>[https://stat.ethz.ch/pipermail/bioc-devel/2015-June/007675.html Announcement], https://github.com/Bioconductor-mirror </strike>
#  user  system elapsed
#  1.135  0.305  1.453
system.time(v %count<% 5000)
#  user  system elapsed
#  0.535  0.000  0.540
</syntaxhighlight>
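The memory argument can be made concrete in plain R: sum(v < 5000) first builds a logical vector of length N, at roughly 4 bytes per element. The sketch below measures that temporary and, as a pure-R middle ground, counts in chunks so that only a chunk-sized temporary exists at a time. The helper count_less_chunked() and the chunk size are hypothetical choices, and N is reduced from the 10^7 used above to keep it quick.

<syntaxhighlight lang='rsplus'>
set.seed(42)
N <- 10^6
v <- runif(N, 0, 10000)

# The intermediate logical vector that sum(v < 5000) allocates
tmp <- v < 5000
bytes <- as.numeric(object.size(tmp))   # about 4 bytes per element plus a small header
bytes / N

# Hypothetical helper: count chunk by chunk to bound the temporary allocation
count_less_chunked <- function(x, cutoff, chunk = 1e5) {
  total <- 0
  for (s in seq(1, length(x), by = chunk)) {
    e <- min(s + chunk - 1, length(x))
    total <- total + sum(x[s:e] < cutoff)
  }
  total
}

count_less_chunked(v, 5000) == sum(v < 5000)
</syntaxhighlight>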
* [https://www.enchufa2.es/archives/boost-the-speed-of-r-calls-from-rcpp.html Boost the speed of R calls from Rcpp]


==== Use Rcpp in RStudio ====
=== Send local repository to Github in R by using reports package ===
RStudio makes it easy to use Rcpp package.
http://www.youtube.com/watch?v=WdOI_-aZV0Y


Open RStudio and click New File -> C++ File. This creates a C++ template in the RStudio editor:
=== My collection ===
<pre>
* https://github.com/arraytools
#include <Rcpp.h>
* https://gist.github.com/4383351 heatmap using leukemia data
using namespace Rcpp;
* https://gist.github.com/4382774 heatmap using sequential data
* https://gist.github.com/4484270 biocLite


// Below is a simple example of exporting a C++ function to R. You can
=== How to download ===
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar)


// For more on using Rcpp click the Help button on the editor toolbar
Clone ~ Download.
 
* Command line
// [[Rcpp::export]]
int timesTwo(int x) {
  return x * 2;
}
</pre>
Now in R console, type
<pre>
<pre>
library(Rcpp)
git clone https://gist.github.com/4484270.git
sourceCpp("~/Downloads/timesTwo.cpp")
timesTwo(9)
# [1] 18
</pre>
</pre>
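If you would rather not save a .cpp file first, Rcpp::sourceCpp() also accepts the C++ source as a string through its code argument. This sketch assumes a working compiler toolchain (Rtools on Windows); the function name timesTwoVec is hypothetical and relies on Rcpp sugar so that it works on a whole numeric vector.

<syntaxhighlight lang='rsplus'>
library(Rcpp)

sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector timesTwoVec(NumericVector x) {
  return x * 2;   // Rcpp sugar: element-wise multiplication
}
')

timesTwoVec(c(1, 2.5, -3))
# [1]  2  5 -6
</syntaxhighlight>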
See more examples on http://adv-r.had.co.nz/Rcpp.html and [http://blog.revolutionanalytics.com/2017/08/kmeans-r-rcpp.html Calculating a fuzzy kmeans membership matrix]
This will create a subdirectory called '4484270' with all cloned files there.


If we want to test the Boost library, we can try it in RStudio. Consider the following example from [http://stackoverflow.com/questions/19034564/can-the-bh-r-package-link-to-boost-math-and-numeric stackoverflow.com].
* Within R
<pre>
<pre>
// [[Rcpp::depends(BH)]]
library(devtools)
#include <Rcpp.h>
source_gist("4484270")
#include <boost/foreach.hpp>
#include <boost/math/special_functions/gamma.hpp>
 
#define foreach BOOST_FOREACH
 
using namespace boost::math;
 
//[[Rcpp::export]]
Rcpp::NumericVector boost_gamma( Rcpp::NumericVector x ) {
  foreach( double& elem, x ) {
    elem = boost::math::tgamma(elem);
  };
 
  return x;
}
</pre>
</pre>
Then, in the R console:
or
First download the json file from
https://api.github.com/users/MYUSERLOGIN/gists
and then
<pre>
<pre>
boost_gamma(0:10 + 1)
library(RJSONIO)
#  [1]      1      1      2      6      24    120    720    5040   40320
x <- fromJSON("~/Downloads/gists.json")
# [10]  362880 3628800
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
  cmd <- paste0("git clone https://gist.github.com/", x, ".git")
  system(cmd)
})
</pre>


identical( boost_gamma(0:10 + 1), factorial(0:10) )
=== Jekyll ===
# [1] TRUE
[http://statistics.rainandrhino.org/2015/12/15/jekyll-r-blogger-knitr-hyde.html An Easy Start with Jekyll, for R-Bloggers]
 
== Connect R with Arduino ==
* https://zhuhao.org/post/connect-arduino-chips-with-r/
* http://lamages.blogspot.com/2012/10/connecting-real-world-to-r-with-arduino.html
* http://jean-robert.github.io/2012/11/11/thermometer-R-using-Arduino-Java.html
* http://bio7.org/?p=2049
* http://www.rforge.net/Arduino/svn.html
 
== Android App ==
* [https://play.google.com/store/apps/details?id=appinventor.ai_RInstructor.R2&hl=zh_TW R Instructor] $4.84
* [http://realxyapp.blogspot.tw/2010/12/statistical-distribution.html Statistical Distribution] (Not R related app)
* [https://datascienceplus.com/data-driven-introspection-of-my-android-mobile-usage-in-r/ Data-driven Introspection of my Android Mobile usage in R]
 
== Common plots tips ==
=== Create an empty plot ===
'''plot.new()'''   
 
=== Overlay plots ===
[https://finnstats.com/index.php/2021/08/15/how-to-overlay-plots-in-r/ How to Overlay Plots in R-Quick Guide with Example].
<pre>
#Step1:-create scatterplot
plot(x1, y1)
#Step 2:-overlay line plot
lines(x2, y2)
#Step3:-overlay scatterplot
points(x2, y2)
</pre>
</pre>


==== Example 1. convolution example ====
First, the Rcpp package should be installed (I am working on a Linux system). Next we try one example shipped with the Rcpp package.

PS. If R is not available on the system PATH (for example, if we built it ourselves), we need to modify the 'Makefile' by replacing the 'R' command with its complete path (4 places).
<pre>
cd ~/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/examples/ConvolveBenchmarks/
make
R
</pre>
Then type the following in an R session to see how it works. Note that we don't need to issue '''library(Rcpp)''' in R.
<pre>
dyn.load("convolve3_cpp.so")
x <- .Call("convolve3cpp", 1:3, 4:6)
x # 4 13 28 27 18
</pre>

If we have our own cpp file, we need to use the following way to create a dynamically loaded library file. Note that the ` character ([http://bash.cyberciti.biz/guide/Command_substitution grave accent]) is not the single quote '. If you mistakenly use ', it won't work.
<pre>
export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
R CMD SHLIB xxxx.cpp
</pre>

=== Save the par() and restore it ===
'''Example 1''': Don't use old.par <- par() directly, since no.readonly = FALSE by default.
* With '''no.readonly = TRUE''', the [https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/par par()] function returns only the graphical parameters '''that can be set''', i.e. those that can later be restored.
* par() called with no arguments returns a named list of all graphical parameters, including the read-only ones ("cin", "cra", "csi", "cxy", "din" and "page").
* If we use par(old.par) where old.par <- par(), we will get several warning messages like 'In par(op) : graphical parameter "cin" cannot be set'.
<pre>
old.par <- par(no.readonly = TRUE); par(mar = c(5, 4, 4, 2) - 2)
# OR in one step
old.par <- par(mar = c(5, 4, 4, 2) - 2)
## do plotting stuff with new settings
par(old.par)
</pre>
'''Example 2''': Use it inside a function with the [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/on.exit on.exit()] function.
<pre>
ex <- function() {
  old.par <- par(no.readonly = TRUE) # all par settings which
                                     # could be changed.
  on.exit(par(old.par))
  ## ... do lots of par() settings and plots
  ## ...
  invisible() #-- now, par(old.par) will be executed
}
</pre>

'''Example 3''': par() settings belong to the current graphics device, so calling par() inside a function changes them for that device after the function returns. If the function opens its own device (e.g. with png()) and closes it with dev.off(), the settings disappear with that device and the screen device is unaffected.
<pre>
ex <- function() { par(mar=c(5,4,4,1)) }
ex()
par()$mar
</pre>
<pre>
ex = function() { png("~/Downloads/test.png"); par(mar=c(5,4,4,1)); dev.off()}
ex()
par()$mar
</pre>

==== Example 2. Use together with inline package ====
* http://adv-r.had.co.nz/C-interface.html#calling-c-functions-from-r
<pre>
library(inline)
src <-'
Rcpp::NumericVector xa(a);
Rcpp::NumericVector xb(b);
int n_xa = xa.size(), n_xb = xb.size();

Rcpp::NumericVector xab(n_xa + n_xb - 1);
for (int i = 0; i < n_xa; i++)
  for (int j = 0; j < n_xb; j++)
    xab[i + j] += xa[i] * xb[j];
return xab;
'
fun <- cxxfunction(signature(a = "numeric", b = "numeric"),
                   src, plugin = "Rcpp")
fun(1:3, 1:4)
# [1]  1  4 10 16 17 12
</pre>
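The nested loop in the C++ examples above computes the full discrete convolution of two vectors. A minimal pure-R sketch of the same loop (the function name conv_r is hypothetical) can be handy for checking a compiled version on small inputs. It is much slower than the C++ code for long vectors.
<syntaxhighlight lang='rsplus'>
# Pure-R version of the loop xab[i + j] += xa[i] * xb[j];
# R indices start at 1, so the output position is i + j - 1.
conv_r <- function(a, b) {
  ab <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a))
    for (j in seq_along(b))
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
  ab
}
conv_r(1:3, 4:6)  # 4 13 28 27 18, as with convolve3cpp
conv_r(1:3, 1:4)  # 1 4 10 16 17 12, as with the inline version
</syntaxhighlight>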


==== Example 3. Calling an R function ====
=== Grouped boxplots ===
* [http://r-video-tutorial.blogspot.com/2013/06/box-plot-with-r-tutorial.html Step by step to create a grouped boxplots]
** the 'at' parameter in boxplot() to change the equally spaced box positions
** embed par(mar=) in boxplot()
** mtext(line=) to fix the problem of the xlab overlapping the axis labels
* [https://stackoverflow.com/questions/28426026/plotting-boxplots-of-multiple-y-variables-using-ggplot2-qplot-or-others ggplot2 approach] (Hint: '''facet_grid''' is used)
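A minimal base-graphics sketch of the 'at' and mtext(line=) tips above, using the built-in ToothGrowth data (tooth length by supplement type within each dose). The box positions, colors and margins are arbitrary choices for illustration.
<syntaxhighlight lang='rsplus'>
op <- par(mar = c(5, 4, 2, 1))
# Groups are OJ.0.5, VC.0.5, OJ.1, VC.1, OJ.2, VC.2 (supp varies fastest)
b <- boxplot(len ~ supp + dose, data = ToothGrowth,
             at = c(1, 2, 4, 5, 7, 8),            # leave a gap between dose groups
             col = rep(c("orange", "lightblue"), 3),
             xaxt = "n", xlab = "", ylab = "Tooth length")
axis(1, at = c(1.5, 4.5, 7.5), labels = c("0.5", "1", "2"))
mtext("Dose (mg/day)", side = 1, line = 2.5)     # 'line' keeps the title clear of tick labels
legend("topleft", legend = c("OJ", "VC"), fill = c("orange", "lightblue"), bty = "n")
par(op)
</syntaxhighlight>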


==== [http://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] ====
=== [https://www.samruston.co.uk/ Weather Time Line] ===
The plot looks similar to a boxplot though it is not. See a [https://www.samruston.co.uk/images/screens/screen_2.png screenshot] on Android by [https://www.samruston.co.uk/ Sam Ruston].


=== [http://cran.r-project.org/web/packages/caret/index.html caret] ===
* http://topepo.github.io/caret/index.html & https://github.com/topepo/caret/
* https://www.r-project.org/conferences/useR-2013/Tutorials/kuhn/user_caret_2up.pdf
* https://github.com/cran/caret source code mirrored on github
* Cheatsheet https://www.rstudio.com/resources/cheatsheets/

=== Horizontal bar plot ===
{{Pre}}
library(ggplot2)
dtf <- data.frame(x = c("ETB", "PMA", "PER", "KON", "TRA",
                        "DDR", "BUM", "MAT", "HED", "EXP"),
                  y = c(.02, .11, -.01, -.03, -.03, .02, .1, -.01, -.02, 0.06))
ggplot(dtf, aes(x, y)) +
  geom_bar(stat = "identity", aes(fill = x), show.legend = FALSE) +
  coord_flip() + xlab("") + ylab("Fold Change")
</pre>

[[:File:Ggplot2bar.svg]]

=== Tool for connecting Excel with R ===
* https://bert-toolkit.com/
* [http://www.thertrader.com/2016/11/30/bert-a-newcomer-in-the-r-excel-connection/ BERT: a newcomer in the R Excel connection]
* http://blog.revolutionanalytics.com/2018/08/how-to-use-r-with-excel.html


=== Include bar values in a barplot ===
* https://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r
* [http://stackoverflow.com/questions/12481430/how-to-display-the-frequency-at-the-top-of-each-factor-in-a-barplot-in-r barplot(), text() and axis()] functions. The data can be from a table() object.
* [https://stackoverflow.com/questions/11938293/how-to-label-a-barplot-bar-with-positive-and-negative-bars-with-ggplot2 How to label a barplot bar with positive and negative bars with ggplot2]

Use text(), or geom_text() if we are using the ggplot2 package. See an example [http://dsgeek.com/2014/09/19/Customizingggplot2charts.html here] or [https://rpubs.com/escott8908/RGC_Ch3_Gar_Graphs this].

For stacked barplots, see [http://t-redactyl.io/blog/2016/01/creating-plots-in-r-using-ggplot2-part-4-stacked-bar-plots.html this] post.

=== Read/Write Excel files package ===
* http://www.milanor.net/blog/?p=779
* [https://www.displayr.com/how-to-read-an-excel-file-into-r/?utm_medium=Feed&utm_source=Syndication flipAPI]. One useful feature of DownloadXLSX, which is not supported by the readxl package, is that it can read Excel files directly from a URL.
* [http://cran.r-project.org/web/packages/xlsx/index.html xlsx]: depends on Java.
* [http://cran.r-project.org/web/packages/openxlsx/index.html openxlsx]: does not depend on Java, but depends on a zip application. On Windows, it seems to be OK without installing Rtools. It cannot read xls files; it works on xlsx files.
** When I tried the package to read an xlsx file, I got a warning: No data found on worksheet. 6/28/2018
** [https://fabiomarroni.wordpress.com/2018/08/07/use-r-to-write-multiple-tables-to-a-single-excel-file/ Use R to write multiple tables to a single Excel file]
* [https://github.com/hadley/readxl readxl]: it does not depend on anything, although it can only read, not write, Excel files. [https://github.com/rstudio/webinars/tree/master/36-readxl readxl webinar]. One advantage of read_excel (as with read_csv in the readr package) is that the data imports into an easy-to-print tibble whose class is '''tbl_df''', '''tbl''' and '''data.frame'''.
* [https://ropensci.org/blog/technotes/2017/09/08/writexl-release writexl]: zero-dependency xlsx writer for R

Tested it on an Ubuntu machine with R 3.1.3 using the <BRCA.xls> file. Usage:
<syntaxhighlight lang='rsplus'>
library(readxl)
read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, na = "", skip = 0)
</syntaxhighlight>
For the Chromosome column, integer values become strings (but converted to double, so 5 becomes 5.000000) or NA (empty on sheets).
<syntaxhighlight lang='rsplus'>
> head(read_excel("~/Downloads/BRCA.xls", 4)[ , -9], 3)
  UniqueID (Double-click) CloneID UGCluster
1                  HK1A1  21652 Hs.445981
2                  HK1A2  22012 Hs.119177
3                  HK1A4  22293 Hs.501376
                                                    Name Symbol EntrezID
1 Catenin (cadherin-associated protein), alpha 1, 102kDa CTNNA1    1495
2                              ADP-ribosylation factor 3  ARF3      377
3                          Uroporphyrinogen III synthase  UROS    7390
  Chromosome      Cytoband ChimericClusterIDs Filter
1  5.000000        5q31.2              <NA>      1
2  12.000000        12q13              <NA>      1
3      <NA> 10q25.2-q26.3              <NA>      1
</syntaxhighlight>

The hidden worksheets become visible (not sure what the first rows of the output mean).
<syntaxhighlight lang='rsplus'>
> excel_sheets("~/Downloads/BRCA.xls")
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 01 00 00 00 9a 0c 00 00 1a 00
DEFINEDNAME: 21 00 00 01 0b 00 00 00 04 00 00 00 00 00 00 0d 3b 03 00 00 00 9b 0c 00 00 0a 00
DEFINEDNAME: 21 00 00 01 0b 00 00 00 03 00 00 00 00 00 00 0d 3b 02 00 00 00 9a 0c 00 00 06 00
[1] "Experiment descriptors" "Filtered log ratio"    "Gene identifiers"     
[4] "Gene annotations"      "CollateInfo"            "GeneSubsets"         
[7] "GeneSubsetsTemp"     
</syntaxhighlight>


Chinese characters work too.
<syntaxhighlight lang='rsplus'>
> read_excel("~/Downloads/testChinese.xlsx", 1)
  中文 B C
1     a b c
2     1 2 3
</syntaxhighlight>

To read all worksheets, we can write a convenience function:
<syntaxhighlight lang='rsplus'>
read_excel_allsheets <- function(filename) {
    sheets <- readxl::excel_sheets(filename)
    sheets <- sheets[-1] # Skip sheet 1
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X, col_types = "numeric"))
    names(x) <- sheets
    x
}
dcfile <- "table0.77_dC_biospear.xlsx"
dc <- read_excel_allsheets(dcfile)
# Each component (eg dc[[1]]) is a tibble.
</syntaxhighlight>

=== Grouped barplots ===
* https://www.r-graph-gallery.com/barplot/, https://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2/ (simplest, no error bars)
{{Pre}}
library(ggplot2)
# mydata <- data.frame(OUTGRP, INGRP, value)
ggplot(mydata, aes(fill=INGRP, y=value, x=OUTGRP)) +
      geom_bar(position="dodge", stat="identity")
</pre>
* https://datascienceplus.com/building-barplots-with-error-bars/. The error bars define 2 se (95% interval) for the black-and-white version and 1 se (68% interval) for ggplots. Be careful.
{{Pre}}
> 1 - 2*(1-pnorm(1))
[1] 0.6826895
> 1 - 2*(1-pnorm(1.96))
[1] 0.9500042
</pre>
* [http://stackoverflow.com/questions/27466035/adding-values-to-barplot-of-table-in-r two bars in one factor] (stack). The data can be a 2-dim matrix with numerical values.
* [http://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r two bars in one factor], [https://stats.stackexchange.com/questions/14118/drawing-multiple-barplots-on-a-graph-in-r Drawing multiple barplots on a graph in R] (next to each other)
** [https://datascienceplus.com/building-barplots-with-error-bars/ Include error bars]
* [http://bl.ocks.org/patilv/raw/7360425/ Three variables] barplots
* [https://peltiertech.com/stacked-bar-chart-alternatives/ More alternatives] (not done by R)

=== Unicode symbols ===
[https://www.r-bloggers.com/2024/09/mind-reader-game-and-unicode-symbols/ Mind reader game, and Unicode symbols]


=== [https://cran.r-project.org/web/packages/readr/ readr] ===
Note: '' '''readr''' package is not designed to read Excel files.''

Compared to base equivalents like '''read.csv()''', '''readr''' is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn’t munge the column names.

[https://blog.rstudio.org/2016/08/05/readr-1-0-0/ 1.0.0] released.

The '''read_csv()''' function from the '''readr''' package is comparable in speed to the '''fread()''' function from the '''data.table''' package. ''For files beyond 100MB in size fread() and read_csv() can be expected to be around 5 times faster than read.csv().'' See Section 5.3 of the ''Efficient R Programming'' book.

Note that '''fread()''' can read in only a selection of the columns (its '''select''' argument).

=== Math expression ===
* [https://www.rdocumentation.org/packages/grDevices/versions/3.5.0/topics/plotmath ?plotmath]
* https://stackoverflow.com/questions/4973898/combining-paste-and-expression-functions-in-plot-labels
* Some cases
** Use the [https://www.rdocumentation.org/packages/base/versions/3.6.0/topics/expression expression()] function
** No backslash is needed; use ''eta'' instead of ''\eta''. ''eta'' will be recognized as a special keyword in expression()
** Use parentheses instead of curly braces; use ''hat(eta)'' instead of ''hat{eta}''
** Summary: use expression(hat(eta)) instead of expression(\hat{\eta})
** [] gives a subscript, while ^ gives a superscript. See [https://statisticsglobe.com/add-subscript-and-superscript-to-plot-in-r Add Subscript and Superscript to Plot in R]
** Spacing can be done with ~.
** Mix math symbols and text using paste()
** Use substitute() and paste() if we need to substitute text (this part is advanced)
{{Pre}}
# Example data (assumed) for the plots below
x <- 1:10; y <- x^2; py <- round(pi, 4); ee <- round(exp(1), 4)

# Expressions
plot(x,y, xlab = expression(hat(x)[t]),
    ylab = expression(phi^{rho + a}),
    main = "Pure Expressions")

# Superscript
plot(1:10, main = expression("My Title"^2))
# Subscript
plot(1:10, main = expression("My Title"[2]))

# Expressions with Spacing
# '~' is to add space and '*' is to squish characters together
plot(1:10, xlab= expression(Delta * 'C'))
plot(x,y, xlab = expression(hat(x)[t] ~ z ~ w),
    ylab = expression(phi^{rho + a} * z * w),
    main = "Pure Expressions with Spacing")

# Expressions with Text
plot(x,y,
    xlab = expression(paste("Text here ", hat(x), " here ", z^rho, " and here")),
    ylab = expression(paste("Here is some text of ", phi^{rho})),
    main = "Expressions with Text")

# Substituting Expressions
plot(x,y,
    xlab = substitute(paste("Here is ", pi, " = ", p), list(p = py)),
    ylab = substitute(paste("e is = ", e ), list(e = ee)),
    main = "Substituted Expressions")
</pre>
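A commonly used alternative to substitute() + paste() is bquote(), which splices the value of .(name) into a plotmath expression. A small sketch using the built-in cars data; the variable names fit, b1 and e are arbitrary.
<syntaxhighlight lang='rsplus'>
fit <- lm(dist ~ speed, data = cars)
b1 <- unname(round(coef(fit)["speed"], 2))   # fitted slope, rounded
e <- bquote(hat(beta)[1] == .(b1))          # the value of b1 is inserted here
plot(dist ~ speed, data = cars, main = e)    # title shows beta-hat with subscript 1
abline(fit)
</syntaxhighlight>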


=== [http://cran.r-project.org/web/packages/ggplot2/index.html ggplot2] ===
Books
* [http://r4ds.had.co.nz/graphics-for-communication.html R for Data Science] Chapter 28 Graphics for communication
* [http://www.cookbook-r.com/Graphs/ R Graphics Cookbook] by Winston Chang. Lots of recipes. For example, the [http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/ Axes] chapter explains how to set/hide tick marks.
* [https://leanpub.com/hitchhikers_ggplot2 The Hitchhiker's Guide to Ggplot2 in R]
* [http://ggplot2.org/book/ ggplot2 book] and its [https://github.com/hadley/ggplot2-book source code]. Before I built the (pdf version of the) book, I needed to follow [https://github.com/hadley/ggplot2-book/issues/118 this suggestion] by running the following in R before calling '''make'''.
<pre>
devtools::install_github("hadley/oldbookdown")
</pre>
* [http://blog.revolutionanalytics.com/2017/09/data-visualization-for-social-science.html Data Visualization for Social Science]
* [https://www.packtpub.com/big-data-and-business-intelligence/r-graph-essentials R Graph Essentials] by David Lillis. Chapters 3 and 4.

Some examples:
* [http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Jitter%20Plot Top 50 ggplot2 Visualizations] - The Master List
* http://blog.diegovalle.net/2015/01/the-74-most-violent-cities-in-mexico.html
* [http://shiny.stat.ubc.ca/r-graph-catalog/ R Graph Catalog]

Introduction
* https://www.youtube.com/watch?v=SaJCKpYX5Lo&t=2742

Cheat sheet
* https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

=== Impose a line to a scatter plot ===
* abline + lsfit # least squares
{{Pre}}
plot(cars)
abline(lsfit(cars[, 1], cars[, 2]))
# OR
abline(lm(cars[,2] ~ cars[,1]))
</pre>
* abline + line # robust line fitting
{{Pre}}
plot(cars)
(z <- line(cars))
abline(coef(z), col = 'green')
</pre>
* lines
{{Pre}}
plot(cars)
fit <- lm(cars[,2] ~ cars[,1])
lines(cars[,1], fitted(fit), col="blue")
lines(stats::lowess(cars), col='red')
</pre>

=== How to actually make a quality scatterplot in R: axis(), mtext() ===
[https://www.r-bloggers.com/2021/08/how-to-actually-make-a-quality-scatterplot-in-r/ How to actually make a quality scatterplot in R]

=== 3D scatterplot ===
* [http://sthda.com/english/wiki/scatterplot3d-3d-graphics-r-software-and-data-visualization Scatterplot3d: 3D graphics - R software and data visualization]. [https://stackoverflow.com/a/24510286 how to add legend to scatterplot3d in R] and consider '''xpd=TRUE'''.
* [[R_web#plotly|R web > plotly]]

=== Rotating x axis labels for barplot ===
https://stackoverflow.com/questions/10286473/rotating-x-axis-labels-in-r-for-barplot
{{Pre}}
mytable <- table(sub(" .*", "", rownames(mtcars)))  # example data: car makes in mtcars
barplot(mytable,main="Car makes",ylab="Frequency",xlab="make",las=2)  # las=2: labels perpendicular to the axis
</pre>

=== Set R plots x axis to show at y=0 ===
https://stackoverflow.com/questions/3422203/set-r-plots-x-axis-to-show-at-y-0
{{Pre}}
plot(1:10, rnorm(10), ylim=c(0,10), yaxs="i")
</pre>


=== Different colors of axis labels in barplot ===
See [https://stackoverflow.com/questions/18839731/vary-colors-of-axis-labels-in-r-based-on-another-variable Vary colors of axis labels in R based on another variable]

Method 1: Add the labels one color at a time with separate axis() calls, because the 'col.axis' argument cannot accept more than one color.
{{Pre}}
tN <- table(Ni <- stats::rpois(100, lambda = 5))
r <- barplot(tN, col = rainbow(20), axisnames = FALSE)  # r = bar midpoints
axis(1, r[1], LETTERS[1], col.axis="red", col="red")
axis(1, r[2], LETTERS[2], col.axis="blue", col = "blue")
</pre>

Method 2: text(), which accepts multiple colors in the 'col' parameter, but we need to find the (x, y) positions ourselves.
{{Pre}}
r <- barplot(tN, col = rainbow(20), axisnames = FALSE)
text(r[4:6], par("usr")[3]-2 , LETTERS[4:6], col=c("black","red","blue"), xpd=TRUE)
</pre>

==== Examples from 'R for Data Science' book - Aesthetic mappings ====
<syntaxhighlight lang='rsplus'>
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# template
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# add another variable through color, size, alpha or shape
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# add another variable through facets
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

# add another 2 variables through facets
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
</syntaxhighlight>

=== Use text() to draw labels on X/Y-axis including rotation ===
* adj = 1 means top/right alignment. For left-bottom alignment, set adj = 0. The default is to center the text. [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/text ?text]
* [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/par par("usr")] gives the extremes of the user coordinates of the plotting region in the form c(x1, x2, y1, y2).
** par("usr") is determined ''after'' a plot has been created
** [http://sphaerula.com/legacy/R/placingTextInPlots.html Example of using the "usr" parameter]
* https://datascienceplus.com/building-barplots-with-error-bars/
{{Pre}}
par(mar = c(5, 6, 4, 5) + 0.1)
plot(..., xaxt = "n") # "n" suppresses plotting of the axis; need mtext() and axis() to supplement
text(x = barCenters, y = par("usr")[3] - 1, srt = 45,
    adj = 1, labels = myData$names, xpd = TRUE)
</pre>
* https://www.r-bloggers.com/rotated-axis-labels-in-r-plots/

=== Vertically stacked plots with the same x axis ===
https://stackoverflow.com/questions/11794436/stacking-multiple-plots-vertically-with-the-same-x-axis-but-different-y-axes-in

=== Include labels on the top axis/margin: axis() and mtext() ===
<pre>
plot(1:4, rnorm(4), axes = FALSE)
axis(3, at=1:4, labels = LETTERS[1:4], tick = FALSE, line = -0.5) # las, cex.axis
box()
mtext("Groups selected", cex = 0.8, line = 1.5) # default side = 3
</pre>
See also [[#15_Questions_All_R_Users_Have_About_Plots| 15_Questions_All_R_Users_Have_About_Plots]]

mtext() can also be used to annotate each plot with the script name, date, ...
<pre>
mtext(text=paste("Prepared on", format(Sys.time(), "%d %B %Y at %H:%M")),
      adj=.99,  # text align to right
      cex=.75, side=3, las=1, line=2)
</pre>

ggplot2 uses '''breaks''' instead of the '''at''' parameter. See [[Ggplot2#Add_axis_on_top_or_right_hand_side|ggplot2 &rarr; Add axis on top or right hand side]], [[Ggplot2#ggplot2::scale_-_axes.2Faxis.2C_legend|ggplot2 &rarr; scale_x_continuous(name, breaks, labels)]] and the [https://ggplot2.tidyverse.org/reference/scale_continuous.html scale_continuous documentation].

=== Legend tips ===
[https://r-coder.com/add-legend-r/ Add legend to a plot in R]

[https://stackoverflow.com/a/36842578 Increase/decrease legend font size] '''cex''' & [[Ggplot2#Legend_size|ggplot2]] package case.
{{Pre}}
plot(rnorm(100))
# op <- par(cex=2)
legend("topleft", legend = 1:4, col=1:4, pch=1, lwd=2, lty = 1, cex =2)
# par(op)
</pre>

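The legend options in this section (cex, inset, bty, title) can be combined in a single call. A minimal sketch with two simulated random-walk series; the data, labels and colors are made up for illustration.
<syntaxhighlight lang='rsplus'>
set.seed(1)
y1 <- cumsum(rnorm(50)); y2 <- cumsum(rnorm(50))
plot(y1, type = "l", ylim = range(y1, y2), ylab = "value")
lines(y2, col = "red")
lg <- legend("topleft", legend = c("series 1", "series 2"),
             col = c("black", "red"), lty = 1,
             cex = 0.8,        # smaller legend text
             inset = 0.02,     # move 2% of the plot region away from the edges
             bty = "n",        # no box around the legend
             title = "Groups") # legend title
</syntaxhighlight>
legend() invisibly returns the box and text positions (lg$rect, lg$text), which can help when placing extra annotations.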
'''legend inset'''. The default is 0. It is the distance (a fraction from 0 to 1 of the plot region) by which the legend is moved away from the x and y axes. The inset argument with [https://stackoverflow.com/a/10528078 negative values moves the legend outside the plot].
<pre>
legend("bottomright", inset=.05, ...)
</pre>

'''legend without a box'''
<pre>
legend(..., bty = "n")
</pre>

'''Add a legend title'''
<pre>
legend(..., title = "")
</pre>

[https://stackoverflow.com/a/60971923 Add a common legend to multiple plots]. Use the layout function.

==== Examples from 'R for Data Science' book - Geometric objects ====
<syntaxhighlight lang='rsplus'>
# Points
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Smoothed
ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# Points + smoother
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# Colored points + smoother
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
</syntaxhighlight>

=== Superimpose a density plot or any curves ===
Use '''lines()'''.

Example 1
{{Pre}}
plot(cars, main = "Stopping Distance versus Speed")
lines(stats::lowess(cars))

set.seed(1); x <- rnorm(200); y <- rnorm(200, 1); z <- rnorm(200, 2)  # example data (assumed)
plot(density(x), col = "#6F69AC", lwd = 3, xlim = c(-3, 5))
lines(density(y), col = "#95DAC1", lwd = 3)
lines(density(z), col = "#FFEBA1", lwd = 3)
</pre>

Example 2
{{Pre}}
require(survival)
n = 10000
beta1 = 2; beta2 = -1
lambdaT = 1 # baseline hazard
lambdaC = 2 # hazard of censoring
set.seed(1234)
x1 = rnorm(n,0)
x2 = rnorm(n,0)
# true event time
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2))
C <- rweibull(n, shape=1, scale=lambdaC)    
time = pmin(T,C) 
status <- 1*(T <= C)
status2 <- 1-status
plot(survfit(Surv(time, status2) ~ 1),
     ylab="Survival probability",
     main = 'Exponential censoring time')
xseq <- seq(.1, max(time), length =100)
func <- function(x) 1-pweibull(x, shape = 1, scale = lambdaC)
lines(xseq, func(xseq), col = 'red') # survival function of Weibull
</pre>

Example 3. Use ggplot(df, aes(x = x, color = factor(grp))) + geom_density(). Then each density curve will represent the data from one "grp".

==== Examples from 'R for Data Science' book - Transformation ====
<syntaxhighlight lang='rsplus'>
# y axis = counts
# bar plot
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
# Or
ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

# y axis = proportion
# (in ggplot2 >= 3.4.0, after_stat(prop) replaces the deprecated ..prop..)
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))

# bar plot with 2 variables
ggplot(data = diamonds) +
   geom_bar(mapping = aes(x = cut, fill = clarity))
</syntaxhighlight>

==== [https://github.com/cttobin/ggthemr ggthemr]: Themes for ggplot2 ====
* http://www.shanelynn.ie/themes-and-colours-for-r-ggplots-with-ggthemr/


==== ggedit & ggplotgui – interactive ggplot aesthetic and theme editor ====
* https://www.r-statistics.com/2016/11/ggedit-interactive-ggplot-aesthetic-and-theme-editor/
* https://github.com/gertstulp/ggplotgui/. It allows changing the text (axis, title, font size), themes, legend, et al. A docker website was set up for the online version.

=== log scale ===
If we set the y-axis to use a log scale, then what we display is the value log(Y) or log10(Y), though we still label the axis using the input values. For example, when we plot c(1, 10, 100) using the log scale, it is as if we draw log10(c(1, 10, 100)) = c(0, 1, 2) on the plot but label the axis using the true values c(1, 10, 100).
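A small base-R sketch of the idea above: with log = "y", the points are placed at log10 of the values while the tick labels stay in the original units. par("usr") then holds the y limits on the log10 scale.
<syntaxhighlight lang='rsplus'>
y <- c(1, 10, 100)
log10(y)                               # 0 1 2: the positions used on a log10 axis
plot(1:3, y, log = "y", type = "b", xlab = "index")  # ticks still labelled 1, 10, 100
par("ylog")                            # TRUE once a log-scale y-axis is in use
10^par("usr")[3:4]                     # back-transform the log10 limits to data units
</syntaxhighlight>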


[[:File:Logscale.png]]

==== ggconf: Simpler Appearance Modification of 'ggplot2' ====
https://github.com/caprice-j/ggconf


==== Plotting individual observations and group means ====
https://drsimonj.svbtle.com/plotting-individual-observations-and-group-means-with-ggplot2

=== Custom scales ===
[https://rcrastinate.rbind.io/post/using-custom-scales-with-the-scales-package/ Using custom scales with the 'scales' package]


== Time series ==
* [https://www.amazon.com/Applied-Time-Analysis-R-Second/dp/1498734227 Applied Time Series Analysis with R]
* [http://www.springer.com/us/book/9780387759586 Time Series Analysis With Applications in R]

==== Colors ====
* [http://novyden.blogspot.com/2013/09/how-to-expand-color-palette-with-ggplot.html How to expand color palette with ggplot and RColorBrewer]
* palette_explorer() function from the [https://cran.r-project.org/web/packages/tmaptools/index.html tmaptools] package. See [https://www.computerworld.com/article/3184778/data-analytics/6-useful-r-functions-you-might-not-know.html selecting color palettes with shiny].
* [http://www.ucl.ac.uk/~zctpep9/Archived%20webpages/Cookbook%20for%20R%20%C2%BB%20Colors%20(ggplot2).htm Cookbook for R]
* [http://ggplot2.tidyverse.org/reference/scale_brewer.html Sequential, diverging and qualitative colour scales/palettes from colorbrewer.org]: scale_colour_brewer(), scale_fill_brewer(), ...
* http://colorbrewer2.org/
* brewer.pal() does not return fewer than 3 colors, whatever palette name we use (asking for n = 2 gives a warning and 3 colors).
* To see the set names used in brewer.pal, see
** RColorBrewer::display.brewer.all()
** For example, [http://colorbrewer2.org/#type=qualitative&scheme=Set1&n=4 Set1] from http://colorbrewer2.org/
* To list all R color names, colors()
* [https://stackoverflow.com/questions/28461326/convert-hex-color-code-to-color-name convert hex value to color names] <syntaxhighlight lang='rsplus'>
library(plotrix)
sapply(rainbow(4), color.id)
sapply(RColorBrewer::brewer.pal(4, "Set1"), color.id)
</syntaxhighlight>
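Since brewer.pal() will not return fewer than 3 colors, one workaround is to subset a 3-color palette. Another, assuming R >= 4.0.0, is base R's palette.colors(), which ships the ColorBrewer qualitative palettes (e.g. "Set 1") and accepts n = 2. A minimal sketch; the barplot data are made up.
<syntaxhighlight lang='rsplus'>
# RColorBrewer route (needs the package): subset a 3-color palette
# cols2 <- RColorBrewer::brewer.pal(3, "Set1")[1:2]
# Base R (>= 4.0.0) route:
cols2 <- unname(palette.colors(2, palette = "Set 1"))
cols2
barplot(c(a = 3, b = 5), col = cols2)
</syntaxhighlight>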


Below is an example using the option ''scale_fill_brewer''(palette = "[http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=9 Paired]"). See the source code at [https://gist.github.com/JohannesFriedrich/c7d80b4e47b3331681cab8e9e7a46e17 gist]. Note that only the 'Paired' and 'Set3' palettes in the '''qualitative scheme''' support up to 12 classes ('Set1' stops at 9).

According to the information on the ColorBrewer website, '''qualitative''' schemes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes.

[[File:GgplotPalette.svg|300px]]

=== Time series stock price plot ===
* http://blog.revolutionanalytics.com/2015/08/plotting-time-series-in-r.html (ggplot2, xts, [https://rstudio.github.io/dygraphs/ dygraphs])
* [https://datascienceplus.com/visualize-your-portfolios-performance-and-generate-a-nice-report-with-r/ Visualize your Portfolio’s Performance and Generate a Nice Report with R]
* https://timelyportfolio.github.io/rCharts_time_series/history.html
{{Pre}}
library(quantmod)
getSymbols("AAPL")
getSymbols("IBM") # similar to AAPL
getSymbols("CSCO") # much smaller than AAPL, IBM
getSymbols("DJI") # Dow Jones, huge
chart_Series(Cl(AAPL), TA="add_TA(Cl(IBM), col='blue', on=1); add_TA(Cl(CSCO), col = 'green', on=1)",
    col='orange', subset = '2017::2017-08')

tail(Cl(DJI))
</pre>

=== tidyquant: Getting stock data ===
[http://varianceexplained.org/r/stock-changes/ The 'largest stock profit or loss' puzzle: efficient computation in R]


==== subplot ====
https://ikashnitsky.github.io/2017/subplots-in-maps/

=== Timeline plot ===
* https://stackoverflow.com/questions/20695311/chronological-timeline-with-points-in-time-and-format-date
* [https://github.com/shosaco/vistime vistime] - Pretty Timelines in R


==== Easy way to mix multiple graphs on the same page ====
* http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
* [http://www.sthda.com/english/wiki/ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page Easy Way to Mix Multiple Graphs on The Same Page]. Four packages are included: '''ggpubr, cowplot, gridExtra''' and '''grid'''.
* [https://cran.rstudio.com/web/packages/egg/ egg]: Extensions for 'ggplot2', to Align Plots, Plot insets, and Set Panel Sizes.
* [http://www.sharpsightlabs.com/blog/master-small-multiple/ Why you should master small multiple chart]
* [https://cran.r-project.org/web/packages/gridExtra/index.html gridExtra]
** [https://datascienceplus.com/machine-learning-results-one-plot-to-rule-them-all/ Machine Learning Results in R: one plot to rule them all!]

=== Clockify ===
[https://datawookie.dev/blog/2021/09/clockify-time-tracking-from-r/ Clockify]


== Circular plot ==
* http://freakonometrics.hypotheses.org/20667 which uses [https://cran.r-project.org/web/packages/circlize/ circlize] package; see also the '''ComplexHeatmap''' package.
* https://www.biostars.org/p/17728/
* [https://cran.r-project.org/web/packages/RCircos/ RCircos] package from CRAN.
* [http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html OmicCircos] from Bioconductor.

==== x and y labels ====
https://stackoverflow.com/questions/10438752/adding-x-and-y-axis-labels-in-ggplot2 or the '''Labels''' part of the [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet]


You can set the labels with xlab() and ylab(), with labs(), or as part of the scale_*() call.
<pre>
labs(x = "sample size", y = "ngenes (glmnet)")
</pre>

== Word cloud ==
* [http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know Text mining and word cloud fundamentals in R : 5 simple steps you should know]
* [https://www.displayr.com/alternatives-word-cloud/ 7 Alternatives to Word Clouds for Visualizing Long Lists of Data]
* [https://www.littlemissdata.com/blog/steam-data-art1 Data + Art STEAM Project: Initial Results]
* [https://github.com/lepennec/ggwordcloud?s=09 ggwordcloud]

== Text mining ==
* [https://cran.r-project.org/web/packages/tm/index.html tm] package. It was used by [https://github.com/jtleek/swfdr/blob/master/getPvalues.R R code] of [https://doi.org/10.1093/biostatistics/kxt007 An estimate of the science-wise false discovery rate and application to the top medical literature].


==== ylim and xlim in ggplot2 ====
https://stackoverflow.com/questions/3606697/how-to-set-limits-for-axes-in-ggplot2-r-plots or the '''Zooming''' part of the [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet]

Use one of the following. Note that scale_x_continuous(limits = ) and xlim() drop observations outside the limits before statistics (smoothers, boxplots, ...) are computed, whereas coord_cartesian() only zooms in.
* + scale_x_continuous(limits = c(-5000, 5000))
* + coord_cartesian(xlim = c(-5000, 5000))
* + xlim(-5000, 5000)

== World map ==
[https://www.enchufa2.es/archives/visualising-ssh-attacks-with-r.html Visualising SSH attacks with R] ([https://cran.r-project.org/package=rworldmap rworldmap] and [https://cran.r-project.org/package=rgeolocate rgeolocate] packages)

== Diagram/flowchart/Directed acyclic diagrams (DAGs) ==
* [https://finnstats.com/index.php/2021/06/29/transition-plot-in-r-change-in-time-visualization/ Transition plot in R-change in time visualization]

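The three options are not equivalent. Setting the limits on the scale, either with scale_x_continuous(limits = ...) or with xlim(), turns out-of-range values into NA before any statistic is computed. coord_cartesian() keeps all the data and only zooms the view. The sketch below shows the difference. It assumes ggplot2 is installed and uses a made-up toy data frame `d`. layer_data() returns the data that ggplot2 built for a layer.

<syntaxhighlight lang='rsplus'>
library(ggplot2)

# hypothetical toy data: x = 1..10
d <- data.frame(x = 1:10, y = (1:10)^2)

# scale limits: points with x outside [0, 5] become NA (dropped with a warning when drawn)
p_scale <- ggplot(d, aes(x, y)) + geom_point() +
  scale_x_continuous(limits = c(0, 5))

# coord_cartesian: all 10 points are kept; only the visible window changes
p_zoom <- ggplot(d, aes(x, y)) + geom_point() +
  coord_cartesian(xlim = c(0, 5))

sum(is.na(layer_data(p_scale)$x))  # points censored by the scale limits
nrow(layer_data(p_zoom))           # all rows retained
</syntaxhighlight>

The difference matters for layers such as geom_smooth() or geom_boxplot(), whose statistics change when data are censored.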

=== [https://cran.r-project.org/web/packages/DiagrammeR/index.html DiagrammeR] ===
* [https://blog.rstudio.com/2015/05/01/rstudio-v0-99-preview-graphviz-and-diagrammer/ Graphviz and DiagrammeR]
* http://rich-iannone.github.io/DiagrammeR/
** [http://rich-iannone.github.io/DiagrammeR/io.html#r-markdown rmarkdown]
** [http://rich-iannone.github.io/DiagrammeR/graphviz_and_mermaid.html graphviz and mermaid] doc and examples
* https://donlelek.github.io/2015-03-31-dags-with-r/
* [https://mikeyharper.uk/flowcharts-in-r-using-diagrammer/ Data-driven flowcharts in R using DiagrammeR]


==== Center title ====
See the '''Legends''' part of the [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet].
<pre>
ggtitle("MY TITLE") +
  theme(plot.title = element_text(hjust = 0.5))
</pre>


=== [https://cran.r-project.org/web/packages/diagram/ diagram] ===
Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams


=== DAGitty (browser-based and R package) ===
* http://dagitty.net/
* https://cran.r-project.org/web/packages/dagitty/index.html


==== Time series plot ====
* [http://sharpsightlabs.com/blog/line-chart-ggplot2-amzn/ How to make a line chart with ggplot2]
* [http://ggplot2.tidyverse.org/reference/scale_brewer.html#palettes Colour palettes]. Note that each palette has a maximum number of colours. For example, ''Accent'' from the Qualitative category gives the warning "In RColorBrewer::brewer.pal(n, pal) : n too large, allowed maximum for palette Accent is 8" when more colours are requested.


Multiple lines plot: https://stackoverflow.com/questions/14860078/plot-multiple-lines-data-series-each-with-unique-color-in-r
<syntaxhighlight lang='rsplus'>
library(ggplot2)
set.seed(45)
nc <- 9
df <- data.frame(x=rep(1:5, nc), val=sample(1:100, 5*nc),
                  variable=rep(paste0("category", 1:nc), each=5))
# plot
# http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=9
ggplot(data = df, aes(x=x, y=val)) +
    geom_line(aes(colour=variable)) +
    scale_colour_manual(values=c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99", "#e31a1c", "#fdbf6f", "#ff7f00", "#cab2d6"))
</syntaxhighlight>
Compare this with the old-fashioned base graphics way:
<syntaxhighlight lang='rsplus'>
dat <- matrix(runif(40,1,20),ncol=4) # make data
matplot(dat, type = c("b"),pch=1,col = 1:4) #plot
legend("topleft", legend = 1:4, col=1:4, pch=1) # optional legend
</syntaxhighlight>


=== dagR ===
* https://cran.r-project.org/web/packages/dagR


==== Github style calendar plot ====
* https://mvuorre.github.io/post/2016/2016-03-24-github-waffle-plot/
* https://gist.github.com/marcusvolz/84d69befef8b912a3781478836db9a75 from [https://github.com/marcusvolz/strava Create artistic visualisations with your exercise data]


=== Gmisc ===
[http://gforge.se/2020/08/easy-flowchart/ Easiest flowcharts eveR?]


=== Concept Maps ===
[https://github.com/rstudio/concept-maps/ concept-maps] where the diagrams are generated from https://app.diagrams.net/.


=== flow ===
[https://cran.r-project.org/web/packages/flow/ flow], [https://predictivehacks.com/?all-tips=how-to-draw-flow-diagrams-in-r How To Draw Flow Diagrams In R]


==== geom_errorbar(): error bars ====
* Can ggplot2 do this? https://www.nature.com/articles/nature25173/figures/1
* [https://stackoverflow.com/questions/14069629/plotting-confidence-intervals plotCI() from the plotrix package or geom_errorbar() from ggplot2 package]
* http://sape.inf.usi.ch/quick-reference/ggplot2/geom_errorbar
* [http://ggplot2.tidyverse.org/reference/geom_linerange.html Vertical error bars]
* [http://ggplot2.tidyverse.org/reference/geom_errorbarh.html Horizontal error bars]
* [http://timelyportfolio.blogspot.com/2012/08/horizon-on-ggplot2.html Horizontal panel plot] example and [http://timelyportfolio.blogspot.com/2012/08/plotxts-with-moving-average-panel.html more]
* [https://stackoverflow.com/questions/13032777/scatter-plot-with-error-bars R does not draw error bars out of the box]. Base R's arrows() can draw them: arrows(x0, y0, x1, y1, code=3, angle=90, length=.05, col). See
** [https://datascienceplus.com/building-barplots-with-error-bars/ Building Barplots with Error Bars]. Note that the segments() statement is not necessary.
** https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/arrows
* Toy example (see this [https://www.nature.com/articles/nature25173/figures/1 nature paper])
<syntaxhighlight lang='rsplus'>
set.seed(301)
x <- rnorm(10)
SE <- rnorm(10)
y <- 1:10

par(mfrow=c(2,1))
par(mar=c(0,4,4,4))
xlim <- c(-4, 4)
plot(x[1:5], 1:5, xlim=xlim, ylim=c(0+.1,6-.1), yaxs="i", xaxt = "n", ylab = "", pch = 16, las=1)
mtext("group 1", 4, las = 1, adj = 0, line = 1) # las=text rotation, adj=alignment, line=spacing
par(mar=c(5,4,0,4))
plot(x[6:10], 6:10, xlim=xlim, ylim=c(5+.1,11-.1), yaxs="i", ylab ="", pch = 16, las=1, xlab="")
arrows(x[6:10]-SE[6:10], 6:10, x[6:10]+SE[6:10], 6:10, code=3, angle=90, length=0)
mtext("group 2", 4, las = 1, adj = 0, line = 1)
</syntaxhighlight>


[[File:Stklnpt.svg|350px]]

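The arrows() recipe above also gives the classic "barplot with error bars". The sketch below is a minimal version that uses only base graphics and made-up toy data (`dat`, three groups). It computes group means and standard errors with tapply(). barplot() returns the bar midpoints, and these become the x positions of the error bars.

<syntaxhighlight lang='rsplus'>
# Hypothetical toy data: three groups of 10 observations each
set.seed(1)
dat <- data.frame(group = rep(c("A", "B", "C"), each = 10),
                  value = c(rnorm(10, 5), rnorm(10, 7), rnorm(10, 6)))

# Group means and standard errors (SE = sd / sqrt(n))
means <- tapply(dat$value, dat$group, mean)
ses   <- tapply(dat$value, dat$group, function(v) sd(v) / sqrt(length(v)))

# barplot() returns the x positions of the bar midpoints
mids <- barplot(means, ylim = c(0, max(means + ses) * 1.1), ylab = "mean value")

# code = 3 draws a head at both ends; angle = 90 turns the heads into flat caps
arrows(x0 = mids, y0 = means - ses, x1 = mids, y1 = means + ses,
       code = 3, angle = 90, length = 0.05)
</syntaxhighlight>

Setting length = 0 instead (as in the toy example above) draws plain segments without caps.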

== Venn Diagram ==
[[Venn_diagram|Venn diagram]]


==== text labels on scatterplots: ggrepel package ====
[https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html ggrepel] package. Found on [https://simplystatistics.org/2018/01/22/the-dslabs-package-provides-datasets-for-teaching-data-science/ Some datasets for teaching data science] by Rafael Irizarry.


== hexbin plot ==
* [https://datasciencetut.com/how-to-create-a-hexbin-chart-in-r/ How to create a hexbin chart in R]
* [https://cran.r-project.org/web/packages/hextri/index.html hextri]: Hexbin Plots with Triangles. See an example in this [https://www.pnas.org/content/117/48/30266#F4 paper] about the postpi method.


==== graphics::smoothScatter ====
[https://www.inwt-statistics.com/read-blog/smoothscatter-with-ggplot2-513.html smoothScatter with ggplot2]


== Bump chart/Metro map ==
https://dominikkoch.github.io/Bump-Chart/


=== Data Manipulation & Tidyverse ===
* [https://www.rstudio.com/resources/webinars/pipelines-for-data-analysis-in-r/ Pipelines for data analysis in R], [https://www.rstudio.com/resources/videos/data-science-in-the-tidyverse/ Data Science in the Tidyverse]
<pre>
  Import
    |
    | readr, readxl
    | haven, DBI, httr  +----- Visualize ------+
    |                    |    ggplot2, ggvis    |
    |                    |                      |
  Tidy ------------- Transform
  tibble              dplyr                  Model
  tidyr                  |                    broom
                          +------ Model ---------+
</pre>
* [http://r4ds.had.co.nz/ R for Data Science] and the [http://tidyverse.org/ tidyverse] package (a collection of packages including '''ggplot2, tibble, tidyr, readr, purrr''' & '''dplyr''').
** tidyverse, among others, was used in [http://juliasilge.com/blog/Mining-CRAN-DESCRIPTION/ Mining CRAN DESCRIPTION Files] (tbl_df(), %>%, summarise(), count(), mutate(), arrange(), unite(), ggplot(), filter(), select(), ...). Note that I had a problem reproducing the result: I needed to run ''cran <- cran[, -14]'' to remove the MD5sum column.
** [http://brettklamer.com/diversions/statistical/compile-r-for-data-science-to-a-pdf/ Compile R for Data Science to a PDF]
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling with dplyr and tidyr Cheat Sheet]
* [https://hbctraining.github.io/Intro-to-R/lessons/07_intro_tidyverse.html Data Wrangling with Tidyverse] from the Harvard Chan School of Public Health.
* [http://datascienceplus.com/best-packages-for-data-manipulation-in-r/ Best packages for data manipulation in R]. It demonstrates how to perform the same tasks with the '''data.table''' and '''dplyr''' packages. '''data.table''' is faster and may be the go-to package when performance and memory are the constraints.


== Amazing/special plots ==
See [[Amazing_plot|Amazing plot]].


==== [http://rpubs.com/danmirman/Rgroup-part1 5 most useful data manipulation functions] ====
* subset() for making subsets of data (natch)
* merge() for combining data sets in a smart and easy way
* melt() from the reshape2 package for converting from wide to long data formats
* dcast() from the reshape2 package for converting from long to wide data formats (or just use [https://datascienceplus.com/building-barplots-with-error-bars/ tapply()]), and for making summary tables
* ddply() from the plyr package for split-apply-combine operations, which cover a huge swath of the trickiest data operations

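The first two functions in the list are in base R. Here is a minimal sketch with two made-up toy tables (`people`, `scores`) that share an `id` key. It shows subset() with a row condition and a column selection, and merge() as an inner join and as a left join (all.x = TRUE).

<syntaxhighlight lang='rsplus'>
# Hypothetical toy tables sharing an "id" key
people <- data.frame(id = 1:4, name = c("Ann", "Bob", "Cy", "Di"), age = c(23, 35, 41, 29))
scores <- data.frame(id = c(2, 3, 4, 5), score = c(88, 92, 75, 60))

# subset(): rows with age > 25, keeping only two columns
older <- subset(people, age > 25, select = c(id, name))

# merge(): inner join on the common column "id" (only ids 2, 3 and 4 match)
both <- merge(people, scores, by = "id")

# all.x = TRUE keeps every row of 'people' (a left join); missing scores become NA
left <- merge(people, scores, by = "id", all.x = TRUE)
</syntaxhighlight>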

== Google Analytics ==
=== GAR package ===
http://www.analyticsforfun.com/2015/10/query-your-google-analytics-data-with.html


== Linear Programming ==
http://www.r-bloggers.com/modeling-and-solving-linear-programming-with-r-free-book/


== Linear Algebra ==
* [https://jimskinner.github.io/post/elegant-linear-algebra-in-r-with-the-matrix-package/ Elegant linear algebra in R with the Matrix package]. The Matrix package is used.
* [https://datascienceplus.com/linear-algebra-for-machine-learning-and-deep-learning-in-r/ Linear Algebra for Machine Learning and Deep Learning in R]. The MASS package is used.


==== [https://cran.r-project.org/web/packages/data.table/index.html data.table] ====
Fast aggregation of large data (e.g. 100GB in RAM or just a file of several GB), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread).


Question: how do we make use of multiple cores with the data.table package? (Recent versions run many operations multithreaded via OpenMP; see getDTthreads() and setDTthreads().)


* [https://github.com/rstudio/cheatsheets/raw/master/datatable.pdf Cheat sheet] from [https://www.rstudio.com/resources/cheatsheets/ RStudio]
* [https://www.r-bloggers.com/importing-data-into-r-part-two/ Reading large data tables in R]
<syntaxhighlight lang='rsplus'>
library(data.table)
x <- fread("mylargefile.txt")
</syntaxhighlight>
* Note that in data.table versions before 1.9.8, ''x[, 2]'' returned the constant 2. To select the second column in those versions, use ''x[, 2, with=FALSE]'' or ''x[, V2]'', where V2 is the column name. Since 1.9.8, ''x[, 2]'' selects the column directly. See FAQ #1 in [http://datatable.r-forge.r-project.org/datatable-faq.pdf data.table].
* [http://r-norberg.blogspot.com/2016/06/understanding-datatable-rolling-joins.html Understanding data.table Rolling Joins]
* [https://rollingyours.wordpress.com/2016/06/14/fast-aggregation-of-large-data-with-the-data-table-package/ Intro to The data.table Package]
* In the [https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro-vignette.html Introduction to data.table] vignette, the internally optimized order() that data.table uses inside ''DT[order(col)]'' was SLOWER than base::order() on my Odroid xu4 (running Ubuntu 14.04.4 trusty on uSD):
<syntaxhighlight lang='rsplus'>
odt = data.table(col=sample(1e7))
(t1 <- system.time(ans1 <- odt[base::order(col)]))  ## uses order from base R
#  user  system elapsed
#  2.730  0.210  2.947
(t2 <- system.time(ans2 <- odt[order(col)]))        ## uses data.table's order
#  user  system elapsed
#  2.830  0.215  3.052
(identical(ans1, ans2))
# [1] TRUE
</syntaxhighlight>
* [https://jangorecki.github.io/blog/2016-06-30/Boost-Your-Data-Munging-with-R.html Boost Your Data Munging with R]

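Below is a minimal sketch of the two idioms discussed above. It assumes data.table 1.9.8 or later is installed and uses the built-in mtcars data. The first call selects a column by number; with = FALSE is written out for clarity, although recent versions infer it. The second call is a grouped aggregation with `by`. `.N` is data.table's special symbol for the number of rows in each group.

<syntaxhighlight lang='rsplus'>
library(data.table)

# Convert a built-in data frame to a data.table
dt <- as.data.table(mtcars)

# Select the 2nd column by position (with = FALSE is implied for numbers in recent versions)
col2 <- dt[, 2, with = FALSE]

# Grouped aggregation: mean mpg and row count for each number of cylinders
agg <- dt[, .(mean_mpg = mean(mpg), n = .N), by = cyl]
agg
</syntaxhighlight>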

== Amazon Alexa ==
* http://blagrants.blogspot.com/2016/02/theres-party-at-alexas-place.html


==== reshape & reshape2 ====
* [http://r-exercises.com/2016/07/06/data-shape-transformation-with-reshape/ Data Shape Transformation With Reshape()]
* Use the '''acast()''' function in the reshape2 package to convert a long data.frame used for analysis into a table-like matrix (or array) that is good for display. '''dcast()''' returns a data.frame instead.
* http://lamages.blogspot.com/2013/10/creating-matrix-from-long-dataframe.html


== R and Singularity ==
https://rviews.rstudio.com/2017/03/29/r-and-singularity/


== Teach kids about R with Minecraft ==
http://blog.revolutionanalytics.com/2017/06/teach-kids-about-r-with-minecraft.html


== Secure API keys ==
[http://blog.revolutionanalytics.com/2017/07/secret-package.html Securely store API keys in R scripts with the "secret" package]


== Credentials and secrets ==
[https://datascienceplus.com/how-to-manage-credentials-and-secrets-safely-in-r/ How to manage credentials and secrets safely in R]


==== [http://cran.r-project.org/web/packages/tidyr/index.html tidyr] ====
An evolution of reshape2. It is designed specifically for data tidying (not general reshaping or aggregating) and works well with dplyr data pipelines.


* [https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html vignette("tidy-data")] & [https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf Cheat sheet]
* Main functions
** Reshape data: '''gather()''' & '''spread()''' (superseded by '''pivot_longer()''' & '''pivot_wider()''' since tidyr 1.0.0)
** Split cells: '''separate()''' & '''unite()'''
** Handle missing: drop_na() & fill() & replace_na()
* http://blog.rstudio.org/2014/07/22/introducing-tidyr/
* http://rpubs.com/seandavi/GEOMetadbSurvey2014
* http://timelyportfolio.github.io/rCharts_factor_analytics/factors_with_new_R.html
* [http://www.milanor.net/blog/reshape-data-r-tidyr-vs-reshape2/ tidyr vs reshape2]


Make wide tables long with '''gather()''' (see 6.3.1 of Efficient R Programming)
<syntaxhighlight lang='rsplus'>
library(tidyr)
library(efficient)  # provides the pew data
data(pew) # wide table
dim(pew) # 18 x 10,  (religion, '<$10k', '$10--20k', '$20--30k', ..., '>150k')
pewt <- gather(data = pew, key = Income, value = Count, -religion)
dim(pewt) # 162 x 3,  (religion, Income, Count)

args(gather)
# function(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
</syntaxhighlight>
where the three arguments of gather() are:
* data: a data frame whose column names will become row values
* key: the name of the new categorical variable into which the column names of the original data set are converted
* value: the name of the new column holding the cell values


In this example, the 'religion' column will not be included (-religion).
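Since tidyr 1.0.0, gather() is superseded by pivot_longer(). The sketch below uses tidyr's built-in `relig_income`, a similar Pew religion-by-income table, so it runs without the efficient package. As in the gather() call above, `cols = -religion` keeps religion as an identifier and stacks every other column. `names_to` and `values_to` play the roles of `key` and `value`.

<syntaxhighlight lang='rsplus'>
library(tidyr)

# relig_income ships with tidyr: one row per religion, one column per income bracket
dim(relig_income)

# Stack every column except religion into (income, count) pairs
long <- pivot_longer(relig_income, cols = -religion,
                     names_to = "income", values_to = "count")
dim(long)
</syntaxhighlight>

Each original row contributes one long row per income column, so the long table has nrow × (ncol − 1) rows.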
== Hide a password ==
=== keyring package ===
* https://cran.r-project.org/web/packages/keyring/index.html
* [http://theautomatic.net/2019/06/25/how-to-hide-a-password-in-r-with-the-keyring-package/ How to hide a password in R with the Keyring package]


==== dplyr, plyr packages ====
* Essential functions: 3 row functions, 3 column functions and 1 mixed function.
<pre>
           select, mutate, rename
            +------------------+
filter      +                  +
arrange     +                  +
group_by    +                  +
            + summarise        +
            +------------------+
</pre>
* These functions work on data frames and tibble objects.
<syntaxhighlight lang='rsplus'>
library(dplyr)
iris %>% filter(Species == "setosa") %>% count()
head(iris %>% filter(Species == "setosa") %>% arrange(Sepal.Length))
</syntaxhighlight>
* [http://r4ds.had.co.nz/transform.html Data Transformation] in the book '''R for Data Science'''. Five key functions in the '''dplyr''' package:
** Filter rows: filter()
** Arrange rows: arrange()
** Select columns: select()
** Add new variables: mutate()
** Grouped summaries: group_by() & summarise()
<syntaxhighlight lang='rsplus'>
library(dplyr)
library(nycflights13)  # provides the flights data

# filter
jan1 <- filter(flights, month == 1, day == 1)
filter(flights, month == 11 | month == 12)
filter(flights, arr_delay <= 120, dep_delay <= 120)
df <- tibble(x = c(1, NA, 3))
filter(df, x > 1)
filter(df, is.na(x) | x > 1)

# arrange
arrange(flights, year, month, day)
arrange(flights, desc(arr_delay))

# select
select(flights, year, month, day)
select(flights, year:day)
select(flights, -(year:day))

# mutate
flights_sml <- select(flights,
  year:day,
  ends_with("delay"),
  distance,
  air_time
)
mutate(flights_sml,
  gain = arr_delay - dep_delay,
  speed = distance / air_time * 60
)
# if you only want to keep the new variables
transmute(flights,
  gain = arr_delay - dep_delay,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)

# summarise()
by_day <- group_by(flights, year, month, day)
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))

# pipe. Note summarise() can return more than 1 variable.
delays <- flights %>%
  group_by(dest) %>%
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")
flights %>%
  group_by(year, month, day) %>%
  summarise(mean = mean(dep_delay, na.rm = TRUE))
</syntaxhighlight>
* Videos
** [https://youtu.be/jWjqLW-u3hc Hands-on dplyr tutorial for faster data manipulation in R] by Data School. At 17:00, it compares the '''%>%''' operator, '''with()''' and '''aggregate()''' for finding group means.
** https://youtu.be/aywFompr1F4 (shorter video) by Roger Peng
** https://youtu.be/8SGif63VW6E by Hadley Wickham
** [https://www.rstudio.com/resources/videos/tidy-eval-programming-with-dplyr-tidyr-and-ggplot2/ Tidy eval: Programming with dplyr, tidyr, and ggplot2]. The bang-bang "!!" operator was introduced for use in a function call.
* [https://csgillespie.github.io/efficientR/data-carpentry.html#dplyr Efficient R Programming]
* [http://www.r-exercises.com/2017/07/19/data-wrangling-transforming-23/ Data wrangling: Transformation] from R-exercises.
* [https://rollingyours.wordpress.com/2016/06/29/express-intro-to-dplyr/ Express Intro to dplyr] by rollingyours.
* [https://martinsbioblogg.wordpress.com/2017/05/21/using-r-when-using-do-in-dplyr-dont-forget-the-dot/ When using do() in dplyr, don't forget the dot].
* [http://martinsbioblogg.wordpress.com/2013/03/24/using-r-reading-tables-that-need-a-little-cleaning/ stringr and plyr]. A '''data.frame''' is pretty much a list of vectors, so we use plyr to apply over the list and stringr to search and replace in the vectors.
* https://randomjohn.github.io/r-maps-with-census-data/ (dplyr and stringr are used)
* [https://datascienceplus.com/5-interesting-subtle-insights-from-ted-videos-data-analysis-in-r/ 5 interesting subtle insights from TED videos data analysis in R]
* [https://www.mango-solutions.com/blog/what-is-tidy-eval-and-why-should-i-care What is tidy eval and why should I care?]


=== getPass ===
[https://cran.r-project.org/web/packages/getPass/README.html getPass]


== Vision and image recognition ==
* [https://www.stoltzmaniac.com/google-vision-api-in-r-rooglevision/ Google vision API in R] – RoogleVision
* [http://www.bnosac.be/index.php/blog/66-computer-vision-algorithms-for-r-users Computer Vision Algorithms for R users] and https://github.com/bnosac/image


== Creating a Dataset from an Image ==
[https://ivelasq.rbind.io/blog/reticulate-data-recreation/ Creating a Dataset from an Image in R Markdown using reticulate]


== Turn pictures into coloring pages ==
https://gist.github.com/jeroen/53a5f721cf81de2acba82ea47d0b19d0


== Numerical optimization ==
[https://cran.r-project.org/web/views/NumericalMathematics.html CRAN Task View: Numerical Mathematics], [https://cran.r-project.org/web/views/Optimization.html CRAN Task View: Optimization and Mathematical Programming]


* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/uniroot.html uniroot]: One Dimensional Root (Zero) Finding. This is used in [http://onlinelibrary.wiley.com/doi/10.1002/sim.7178/full simulating survival data for predefined censoring rate]
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html optimize]: One Dimensional Optimization
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optim.html optim]: General-purpose optimization based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms.
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/constrOptim.html constrOptim]: Linearly Constrained Optimization
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/nlm.html nlm]: Non-Linear Minimization
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/nls.html nls]: Nonlinear Least Squares
* [https://blogs.rstudio.com/ai/posts/2021-04-22-torch-for-optimization/ torch for optimization]. L-BFGS optimizer.

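Below is a minimal sketch of the first three functions in this list (all in the base stats package), each applied to a toy problem with a known answer. uniroot() needs an interval whose end points give function values of opposite sign. optimize() searches a 1-D interval. optim() starts from a point; here it uses the BFGS method on the Rosenbrock function, whose minimum is at (1, 1).

<syntaxhighlight lang='rsplus'>
# 1. uniroot(): find x in [0, 2] with x^2 - 2 = 0, i.e. sqrt(2)
r <- uniroot(function(x) x^2 - 2, interval = c(0, 2), tol = 1e-10)
r$root

# 2. optimize(): 1-D minimisation; (x - 3)^2 + 1 has its minimum at x = 3
o <- optimize(function(x) (x - 3)^2 + 1, interval = c(0, 10))
o$minimum

# 3. optim(): multi-dimensional minimisation of the Rosenbrock function
rosen <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
fit <- optim(c(-1.2, 1), rosen, method = "BFGS")
fit$par         # close to c(1, 1)
fit$convergence # 0 means successful convergence
</syntaxhighlight>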

== Ryacas: R Interface to the 'Yacas' Computer Algebra System ==
[https://blog.ephorie.de/doing-maths-symbolically-r-as-a-computer-algebra-system-cas Doing Maths Symbolically: R as a Computer Algebra System (CAS)]


==== stringr ====
* https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
* [https://github.com/rstudio/cheatsheets/raw/master/strings.pdf stringr Cheat sheet] (2 pages; this link downloads the PDF file directly)


== Game ==
* [https://kbroman.org/miner_book/?s=09 R Programming with Minecraft]
* [https://cran.r-project.org/web/packages/pixelpuzzle/index.html pixelpuzzle]
* [https://www.rostrum.blog/2022/09/24/pixeltrix/ Interactive pixel art in R with {pixeltrix}]
* [https://rtaoist.blogspot.com/2021/03/r-shiny-maths-games-for-6-years-old.html Shiny math game]
* [https://cran.microsoft.com/web/packages/mazing/index.html mazing]: Utilities for Making and Plotting Mazes
* [https://github.com/jeroenjanssens/raylibr/blob/main/demo/snake.R snake] which is based on [https://github.com/jeroenjanssens/raylibr raylibr]


== Music ==
* [https://flujoo.github.io/gm/ gm]. Requires [https://musescore.org/en MuseScore], free and open-source music notation software.


== SAS ==
[https://github.com/MangoTheCat/sasMap sasMap] Static code analysis for SAS scripts


= R packages =
[[R_packages|R packages]]


= Tricks =


== Getting help ==
* http://stackoverflow.com/questions/tagged/r; the [https://stackoverflow.com/tags/r/info R tag info page] lists resources.
* https://stat.ethz.ch/pipermail/r-help/
* https://stat.ethz.ch/pipermail/r-devel/


== Better Coder/coding, best practices ==
* http://www.mango-solutions.com/wp/2015/10/10-top-tips-for-becoming-a-better-coder/
* [https://www.rstudio.com/rviews/2016/12/02/writing-good-r-code-and-writing-well/ Writing Good R Code and Writing Well]
* [http://www.thertrader.com/2018/09/01/r-code-best-practices/ R Code – Best practices]
* [https://stackoverflow.com/a/2258292 What best practices do you use for programming in R?]
* [https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9169?campaign=woletoc Best practices in statistical computing] Sanchez 2021


== [https://en.wikipedia.org/wiki/Scientific_notation#E-notation E-notation] ==
6.022E23 (or 6.022e23) is equivalent to 6.022×10^23


== Getting user's home directory ==
See [https://cran.r-project.org/bin/windows/base/rw-FAQ.html#What-are-HOME-and-working-directories_003f What are HOME and working directories?]
{{Pre}}
# Windows
normalizePath("~")  # "C:\\Users\\brb\\Documents"
Sys.getenv("R_USER") # "C:/Users/brb/Documents"
Sys.getenv("HOME")   # "C:/Users/brb/Documents"

# Mac
normalizePath("~")   # [1] "/Users/brb"
Sys.getenv("R_USER") # [1] ""
Sys.getenv("HOME")  # "/Users/brb"

# Linux
normalizePath("~")  # [1] "/home/brb"
Sys.getenv("R_USER") # [1] ""
Sys.getenv("HOME")  # [1] "/home/brb"
</pre>


== tempdir() ==
* The path is a per-session temporary directory. In parallel use, R processes forked by functions such as '''mclapply''' and '''makeForkCluster''' in package '''parallel''' share a per-session temporary directory.
* [https://www.r-bloggers.com/2024/07/r-set-temporary-folder-for-r-in-rstudio-server/ Set temporary folder for R in Rstudio server]


==== [https://github.com/smbache/magrittr magrittr] ====
* [https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html Vignettes]
* [http://www.win-vector.com/blog/2018/04/magrittr-and-wrapr-pipes-in-r-an-examination/ magrittr and wrapr Pipes in R, an Examination]


Instead of nested statements, it uses the pipe operator '''%>%''', which makes the code easier to read. Impressive!
<syntaxhighlight lang='rsplus'>
x %>% f    # f(x)
x %>% f(y)  # f(x, y)
x %>% f(arg=y)  # f(x, arg=y)
x %>% f(z, .) # f(z, x)
x %>% f(y) %>% g(z)  #  g(f(x, y), z)

x %>% select(which(colSums(!is.na(.))>0))  # remove columns with all missing data
x %>% select(which(colSums(!is.na(.))>0)) %>% filter((rowSums(!is.na(.))>0)) # remove all-NA columns _and_ rows
</syntaxhighlight>
* [https://stackoverflow.com/questions/27100678/how-to-extract-subset-an-element-from-a-list-with-the-magrittr-pipe Subset an element from a list]
<syntaxhighlight lang='rsplus'>
iris$Species
iris[["Species"]]

iris %>%
`[[`("Species")

iris %>%
`[[`(5)

iris %>%
  subset(select = "Species")
</syntaxhighlight>
* '''Split-apply-combine''': group + summarize + sort/arrange + top n. The following example is from [https://csgillespie.github.io/efficientR/data-carpentry.html#data-aggregation Efficient R programming].
<syntaxhighlight lang='rsplus'>
data(wb_ineq, package = "efficient")
wb_ineq %>%
  filter(grepl("g", Country)) %>%
  group_by(Year) %>%
  summarise(gini = mean(gini, na.rm  = TRUE)) %>%
  arrange(desc(gini)) %>%
  top_n(n = 5)
</syntaxhighlight>
* [https://drdoane.com/writing-pipe-friendly-functions/ Writing Pipe-friendly Functions]
* http://rud.is/b/2015/02/04/a-step-to-the-right-in-r-assignments/
* http://rpubs.com/tjmahr/pipelines_2015
* http://danielmarcelino.com/i-loved-this-crosstable/
* http://moderndata.plot.ly/using-the-pipe-operator-in-r-with-plotly/
* Videos
** [https://www.rstudio.com/resources/videos/writing-readable-code-with-pipes/ Writing Readable Code with Pipes]
** [https://youtu.be/iIBTI_qiq9g Pipes in R - An Introduction to magrittr package]
<syntaxhighlight lang='rsplus'>
# Examples from R for Data Science-Import, Tidy, Transform, Visualize, and Model
library(magrittr)  # %T>%, %$% and %<>% come from magrittr
diamonds <- ggplot2::diamonds
diamonds2 <- diamonds %>% dplyr::mutate(price_per_carat = price / carat)

pryr::object_size(diamonds)
pryr::object_size(diamonds2)
pryr::object_size(diamonds, diamonds2)

rnorm(100) %>% matrix(ncol = 2) %>% plot() %>% str()
rnorm(100) %>% matrix(ncol = 2) %T>% plot() %>% str() # 'tee' pipe
    # %T>% works like %>% except that it returns the left-hand side (rnorm(100) %>% matrix(ncol = 2))
    # instead of the right-hand side.

# If a function does not have a data frame based API, you can use %$%.
# It exposes the variables in a data frame.
mtcars %$% cor(disp, mpg)

# For assignment, magrittr provides the %<>% operator
mtcars <- mtcars %>% transform(cyl = cyl * 2) # can be simplified by
mtcars %<>% transform(cyl = cyl * 2)
</syntaxhighlight>


Upsides of using magrittr: no need to create intermediate objects, and the code is easy to read.


When not to use the pipe:
* your pipes are longer than (say) 10 steps
* you have multiple inputs or outputs
* functions that use the current environment: assign(), get(), load()
* functions that use lazy evaluation: tryCatch(), try()

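Since R 4.1.0, base R also has a native pipe, '''|>'''. Like '''%>%''', it inserts the left-hand side as the first argument of the call on the right. It has no '''.''' placeholder; R 4.2.0 added '''_''' for named arguments. The sketch below is a minimal comparison with a nested call. It uses datasets::mtcars explicitly, so it is not affected by the in-place %<>% modification above.

<syntaxhighlight lang='rsplus'>
# Nested call: read from the inside out
nested <- round(mean(subset(datasets::mtcars, cyl == 4)$mpg), 1)

# Same computation with the native pipe (R >= 4.1.0):
# each left-hand side becomes the first argument of the next call
piped <- datasets::mtcars |>
  subset(cyl == 4) |>
  with(mean(mpg)) |>
  round(1)

identical(nested, piped)
</syntaxhighlight>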

== Distinguish Windows and Linux/Mac, R.Version() ==
identical(.Platform$OS.type, "unix") returns TRUE on Mac and Linux.


* [https://www.r-bloggers.com/identifying-the-os-from-r/ Identifying the OS from R]
* [https://stackoverflow.com/questions/4747715/how-to-check-the-os-within-r How to check the OS within R]
<pre>
get_os <- function(){
  sysinf <- Sys.info()
  if (!is.null(sysinf)){
    os <- sysinf['sysname']
    if (os == 'Darwin')
      os <- "osx"
  } else { ## mystery machine
    os <- .Platform$OS.type
    if (grepl("^darwin", R.version$os))
      os <- "osx"
    if (grepl("linux-gnu", R.version$os))
      os <- "linux"
  }
  tolower(os)
}
</pre>
<pre>
names(R.Version())
#  [1] "platform"       "arch"           "os"             "system"
#  [5] "status"         "major"          "minor"          "year"
#  [9] "month"          "day"            "svn rev"        "language"
# [13] "version.string" "nickname"
getRversion()
# [1] ‘4.3.0’
</pre>


==== outer() ====


==== Genomic sequence ====
* chartr
<syntaxhighlight lang='rsplus'>
> yourSeq <- "AAAACCCGGGTTTNNN"
> chartr("ACGT", "TGCA", yourSeq)
[1] "TTTTGGGCCCAAANNN"
</syntaxhighlight>

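chartr() returns only the complement. Reading the opposite strand 5' to 3' also requires reversing the string. The sketch below builds on the example above with a hypothetical helper, `revcomp()`, written in base R. It handles upper- and lower-case bases and leaves other letters such as N unchanged.

<syntaxhighlight lang='rsplus'>
# Hypothetical helper: reverse complement of one or more DNA strings
revcomp <- function(s) {
  comp <- chartr("ACGTacgt", "TGCAtgca", s)   # A<->T, C<->G; other letters unchanged
  # split each string into characters, reverse them, and paste back together
  vapply(strsplit(comp, ""), function(ch) paste(rev(ch), collapse = ""), character(1))
}

revcomp("AAAACCCGGGTTTNNN")  # "NNNAAACCCGGGTTTT"
</syntaxhighlight>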

== Rprofile.site, Renviron.site (all platforms) and Rconsole (Windows only) ==
* https://cran.r-project.org/doc/manuals/r-release/R-admin.html ('''Rprofile.site'''). Put R statements here.
* https://cran.r-project.org/doc/manuals/r-release/R-exts.html ('''Renviron.site'''). Define environment variables here.
* https://cran.r-project.org/doc/manuals/r-release/R-intro.html ('''Rprofile.site, Renviron.site, Rconsole''' (Windows only))
* [http://blog.revolutionanalytics.com/2015/11/how-to-store-and-use-authentication-details-with-r.html How to store and use webservice keys and authentication details]
* [http://itsalocke.com/use-rprofile-give-important-notifications/ Use your .Rprofile to give you important notifications]
* [https://rviews.rstudio.com/2017/04/19/r-for-enterprise-understanding-r-s-startup/ *R for Enterprise: Understanding R’s Startup]
* [https://support.rstudio.com/hc/en-us/articles/360047157094-Managing-R-with-Rprofile-Renviron-Rprofile-site-Renviron-site-rsession-conf-and-repos-conf *Managing R with .Rprofile, .Renviron, Rprofile.site, Renviron.site, rsession.conf, and repos.conf]


If we want to install R packages to a personal directory, follow [https://stat.ethz.ch/pipermail/r-devel/2015-July/071562.html this]. Just add the line
<pre>
R_LIBS_SITE=F:/R/library
</pre>
to the file '''R_HOME/etc/x64/Renviron.site'''. In R, run '''Sys.getenv("R_LIBS_SITE")''' or '''Sys.getenv("R_LIBS_USER")''' to query the environment variable. See [https://stat.ethz.ch/R-manual/R-devel/library/base/html/EnvVar.html Environment Variables].


=== Data Science ===
==== How to prepare data for collaboration ====
[https://peerj.com/preprints/3139.pdf How to share data for collaboration]. In particular, [https://peerj.com/preprints/3139.pdf#page=7 Page 7] has some (raw data) variable coding guidelines.
* naming variables: use meaningful variable names, no spaces in column headers, avoid separators (except an underscore)
* coding variables: be consistent, no spelling errors
* date and time: YYYY-MM-DD (ISO 8601 standard). Note that Excel interprets a gene symbol such as "Oct-4" as a date and reformats it.
* missing data: "NA". Do not leave any cells blank.
* using a '''code book''' file (*.docx for example): any lengthy explanation about variables should be put here. See p5 for an example.


Five types of data:
* continuous
* ordinal
* categorical
* missing
* censored

Some extras from [https://peerj.com/preprints/3183/ Data organization in spreadsheets] (the paper appears in [https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989 American Statistician])
* No empty cells
* Put one thing in a cell
* Make a rectangle
* No calculations in the raw data files
* Create a '''data dictionary''' (same as '''code book''')

==== [https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/complete.cases complete.cases()] ====
Count the number of rows in a data frame that have missing values with
<syntaxhighlight lang='rsplus'>
sum(!complete.cases(dF))
</syntaxhighlight>
<pre>
> tmp <- matrix(1:6, 3, 2)
> tmp
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> tmp[2,1] <- NA
> complete.cases(tmp)
[1]  TRUE FALSE  TRUE
</pre>

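The same logical vector can also drop the incomplete rows. Below is a minimal sketch with a made-up toy data frame `dF` that has missing values in two different rows. Indexing with complete.cases() keeps the same rows as na.omit(). colSums(is.na(dF)) shows which columns the NAs are in.

<syntaxhighlight lang='rsplus'>
# Hypothetical toy data frame with missing values in rows 2 and 3
dF <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

cc <- complete.cases(dF)   # TRUE for rows with no NA in any column
sum(!cc)                   # number of incomplete rows
colSums(is.na(dF))         # number of NAs per column
dF_complete <- dF[cc, ]    # keep only complete rows (same rows as na.omit(dF))
</syntaxhighlight>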

=== What is the best place to save Rconsole on Windows platform ===
Put/create the file <Rconsole> under the ''C:/Users/USERNAME/Documents'' folder. That way, no matter how R is upgraded or downgraded, it always finds my preferences.

My preferred settings:
* Font: Consolas (it will be shown as "TT Consolas" in Rconsole)
* Size: 12
* background: black
* normaltext: white
* usertext: GreenYellow or orange (close to RStudio's Cobalt theme) or sienna1 or SpringGreen or tan1 or yellow
and others (default options):
* pagebg: white
* pagetext: navy
* highlight: DarkRed
* dataeditbg: white
* dataedittext: navy (View() function)
* dataedituser: red
* editorbg: white (edit() function)
* editortext: black
 
A copy of the Rconsole is saved in [https://gist.github.com/arraytools/ed16a486e19702ae94bde4212ad59ecb github].

=== How R starts up ===
https://rstats.wtf/r-startup.html

=== startup - Friendly R Startup Configuration ===
https://github.com/henrikbengtsson/startup

==== Wrangling categorical data in R ====
https://peerj.com/preprints/3163.pdf

Some approaches:
* options(stringsAsFactors=FALSE)
* Use the '''tidyverse''' package

Base R approach:
<syntaxhighlight lang='rsplus'>
# read.csv() returns character columns by default since R 4.0.0,
# so ask for factors explicitly to make levels() meaningful
GSS <- read.csv("XXX.csv", stringsAsFactors = TRUE)
GSS$BaseLaborStatus <- GSS$LaborStatus
levels(GSS$BaseLaborStatus)
summary(GSS$BaseLaborStatus)
# convert to character before replacing values, then back to a factor
GSS$BaseLaborStatus <- as.character(GSS$BaseLaborStatus)
GSS$BaseLaborStatus[GSS$BaseLaborStatus == "Temp not working"] <- "Temporarily not working"
GSS$BaseLaborStatus[GSS$BaseLaborStatus == "Unempl, laid off"] <- "Unemployed, laid off"
GSS$BaseLaborStatus[GSS$BaseLaborStatus == "Working fulltime"] <- "Working full time"
GSS$BaseLaborStatus[GSS$BaseLaborStatus == "Working parttime"] <- "Working part time"
GSS$BaseLaborStatus <- factor(GSS$BaseLaborStatus)
</syntaxhighlight>

Tidyverse approach:
<syntaxhighlight lang='rsplus'>
library(dplyr)
GSS <- GSS %>%
    mutate(tidyLaborStatus =
        recode(LaborStatus,
            `Temp not working` = "Temporarily not working",
            `Unempl, laid off` = "Unemployed, laid off",
            `Working fulltime` = "Working full time",
            `Working parttime` = "Working part time"))
</syntaxhighlight>
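Without dplyr, the same relabelling can be written as a named lookup vector, which avoids one assignment line per level. The sketch below uses a small made-up vector in place of GSS$LaborStatus. Levels that are not in the lookup table (for example "Retired") are left unchanged.

<syntaxhighlight lang='rsplus'>
# made-up values standing in for GSS$LaborStatus
labor <- factor(c("Temp not working", "Working fulltime", "Retired",
                  "Working parttime", "Unempl, laid off"))

# names = old labels, values = new labels
lookup <- c("Temp not working" = "Temporarily not working",
            "Unempl, laid off" = "Unemployed, laid off",
            "Working fulltime" = "Working full time",
            "Working parttime" = "Working part time")

x <- as.character(labor)
hit <- x %in% names(lookup)    # only relabel values found in the table
x[hit] <- lookup[x[hit]]
labor_clean <- factor(x)
levels(labor_clean)
</syntaxhighlight>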


=== [http://cran.r-project.org/web/packages/jpeg/index.html jpeg] ===
To create an image like the one on this wiki's left-hand side panel, we can use the '''jpeg''' package to read an existing plot, then edit and save it.

We can also use the jpeg package to import and manipulate a jpg image. See [http://moderndata.plot.ly/fun-with-heatmaps-and-plotly/ Fun with Heatmaps and Plotly].

=== [http://cran.r-project.org/web/packages/Cairo/index.html Cairo] ===
See [[Heatmap#White_strips_.28artifacts.29|White strips problem]] in png() or tiff().

=== [https://cran.r-project.org/web/packages/cairoDevice/ cairoDevice] ===
PS. I am not sure what advantage the functions in this package have over R's own functions (e.g. Cairo_svg() vs svg()).

On Ubuntu, we need to install 2 libraries and 1 R package, '''RGtk2'''.
<pre>
sudo apt-get install libgtk2.0-dev libcairo2-dev
</pre>

On Windows, we may get the error '''unable to load shared object 'C:/Program Files/R/R-3.0.2/library/cairoDevice/libs/x64/cairoDevice.dll' '''. To fix it, follow the instructions [http://tolstoy.newcastle.edu.au/R/e6/help/09/05/15613.html here].

=== [http://igraph.org/r/ igraph] ===
[https://shiring.github.io/genome/2016/12/14/homologous_genes_part2_post creating directed networks with igraph]

== Saving and loading history automatically: .Rprofile & local() ==
<ul>
<li>[http://stat.ethz.ch/R-manual/R-patched/library/utils/html/savehistory.html savehistory("filename")]. It saves every command from the start of the session up to the savehistory() call to a text file.
<li>'''.Rprofile''' will automatically be loaded when R is started from that directory.
<li>Don't do things in your .Rprofile that affect how R code runs, such as loading a package like dplyr or ggplot or setting an option such as stringsAsFactors = FALSE. See [https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ Project-oriented workflow].
<li>A '''.Rprofile''' file is also created/used by the '''packrat''' package to restore a packrat environment. See the packrat/init.R file and [[R_packages|R packages &rarr; packrat]].
<li>[http://www.statmethods.net/interface/customizing.html Customizing Startup] from R in Action, [http://www.onthelambda.com/2014/09/17/fun-with-rprofile-and-customizing-r-startup/ Fun with .Rprofile and customizing R startup]
* You can also place a '''.Rprofile''' file in any directory that you are going to run R from or in the user home directory.
* At startup, R will source the '''Rprofile.site''' file. It will then look for a '''.Rprofile''' file to source in the current working directory. If it doesn't find it, it will look for one in the user's home directory.
<pre>
options(continue="  ") # default is "+ "
options(prompt="R> ", continue=" ")
options(editor="nano") # default is "vi" on Linux
# options(htmlhelp=TRUE)

local({r <- getOption("repos")
      r["CRAN"] <- "https://cran.rstudio.com"
      options(repos=r)})

.First <- function(){
  # library(tidyverse)
  cat("\nWelcome at", date(), "\n")
}

.Last <- function(){
  cat("\nGoodbye at ", date(), "\n")
}
</pre>
<li>https://stackoverflow.com/questions/16734937/saving-and-loading-history-automatically
<li>The history file is always read from the $HOME directory, and it is overwritten by each new session. Both problems can be solved by defining the '''R_HISTFILE''' environment variable.
<li>The [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/eval local()] function can be used in the .Rprofile file to set up the environment without leaving new variables behind (change repository, install packages, load libraries, source R files, run the system() function, file/directory I/O, etc.).
</ul>
'''Linux''' or '''Mac'''

In '''~/.profile''' or '''~/.bashrc''' I put:
<pre>
export R_HISTFILE=~/.Rhistory
</pre>
In '''~/.Rprofile''' I put:
<pre>
if (interactive()) {
  if (.Platform$OS.type == "unix")  .First <- function() try(utils::loadhistory("~/.Rhistory"))
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
}
</pre>

'''Windows'''

If you launch R by clicking its icon on the Windows Desktop, R starts in the '''C:\Users\$USER\Documents''' directory. So we can create a new file '''.Rprofile''' in this directory.
<pre>
if (interactive()) {
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
}
</pre>
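The local({ ... }) idiom from the .Rprofile example suits startup files. The temporary variable used to build the repository setting exists only inside local(), so nothing is left in the global environment, while the options() side effect persists. A small sketch, using the mirror URL from the .Rprofile above:

<syntaxhighlight lang='rsplus'>
local({
  r <- getOption("repos")              # current repository setting (may be NULL)
  r["CRAN"] <- "https://cran.rstudio.com"
  options(repos = r)                   # the option change survives local()
})

getOption("repos")[["CRAN"]]           # "https://cran.rstudio.com"
exists("r", envir = globalenv(), inherits = FALSE)   # FALSE: 'r' did not leak
</syntaxhighlight>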


=== Identifying dependencies of R functions and scripts ===
https://stackoverflow.com/questions/8761857/identifying-dependencies-of-r-functions-and-scripts
<syntaxhighlight lang='rsplus'>
library(mvbutils)
foodweb(where = "package:batr")

foodweb( find.funs("package:batr"), prune="survRiskPredict", lwd=2)

foodweb( find.funs("package:batr"), prune="classPredict", lwd=2)
</syntaxhighlight>

== Disable "Save workspace image?" prompt when exit R? ==
[https://stackoverflow.com/a/4996252 How to disable "Save workspace image?" prompt in R?]

== R release versions ==
[http://cran.r-project.org/web/packages/rversions/index.html rversions]: Query the main 'R' 'SVN' repository to find the released versions & dates.

== getRversion() ==
<pre>
getRversion()
[1] ‘4.3.0’
</pre>
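getRversion() returns a version object rather than a plain string, so it can be compared directly with a version string. This is handy in scripts or in .Rprofile when code has to branch on the R version. In the minimal sketch below, the 4.1.0 threshold is used because that release added the native pipe |>.

<syntaxhighlight lang='rsplus'>
v <- getRversion()
class(v)          # "R_system_version" "package_version" "numeric_version"
v >= "3.0.0"      # TRUE on any current R

if (v >= "4.1.0") {
  msg <- "native pipe available"
} else {
  msg <- "use magrittr's %>% instead"
}
msg
</syntaxhighlight>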


=== [http://cran.r-project.org/web/packages/iterators/ iterators] ===
An iterator can be more convenient than a for-loop when the data is already a '''collection'''. It can be used to iterate over a vector, data frame, matrix or file.

Iterators can be combined with the foreach package. http://www.exegetic.biz/blog/2013/11/iterators-in-r/ has more elaboration.

=== Colors ===
* http://www.bauer.uh.edu/parks/truecolor.htm Interactive RGB, Alpha and Color Picker
* http://deanattali.com/blog/colourpicker-package/ Not sure what it is doing
* [http://www.lifehack.org/484519/how-to-choose-the-best-colors-for-your-data-charts How to Choose the Best Colors For Your Data Charts]
* [http://novyden.blogspot.com/2013/09/how-to-expand-color-palette-with-ggplot.html How to expand color palette with ggplot and RColorBrewer]
* [http://sape.inf.usi.ch/quick-reference/ggplot2/colour Color names in R]

==== [http://rpubs.com/gaston/colortools colortools] ====
Tools that allow users to generate color schemes and palettes

==== [https://github.com/daattali/colourpicker colourpicker] ====
A Colour Picker Tool for Shiny and for Selecting Colours in Plots

=== [https://github.com/kevinushey/rex rex] ===
Friendly Regular Expressions

== Detect number of running R instances in Windows ==
* http://stackoverflow.com/questions/15935931/detect-number-of-running-r-instances-in-windows-within-r
<pre>
C:\Program Files\R>tasklist /FI "IMAGENAME eq Rscript.exe"
INFO: No tasks are running which match the specified criteria.

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rgui.exe"

Image Name                    PID Session Name        Session#    Mem Usage
============================================================================
Rgui.exe                      1096 Console                    1    44,712 K

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rserve.exe"

Image Name                    PID Session Name        Session#    Mem Usage
============================================================================
Rserve.exe                    6108 Console                    1    381,796 K
</pre>
In R, we can use
<pre>
> system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE)
[1] ""
[2] "Image Name                    PID Session Name        Session#    Mem Usage"
[3] "============================================================================"
[4] "Rgui.exe                      1096 Console                    1    44,804 K"

> # number of Rgui.exe instances (the output starts with 3 header lines)
> length(system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE))-3
</pre>
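Subtracting 3 assumes tasklist always prints exactly three header lines. That assumption fails when no process matches, because tasklist then prints only the INFO line shown above. A slightly more robust sketch counts the lines that start with the image name. Note that count_instances() is a hypothetical helper. Also, the system() call works only on Windows, so the example feeds the function the output captured above.

<syntaxhighlight lang='rsplus'>
# hypothetical helper: count processes named `image` in tasklist output
count_instances <- function(lines, image) {
  sum(startsWith(trimws(lines), image))
}

# On Windows one would use:
# lines <- system('tasklist /FI "IMAGENAME eq Rgui.exe"', intern = TRUE)
lines <- c("",
           "Image Name                    PID Session Name        Session#    Mem Usage",
           "============================================================================",
           "Rgui.exe                      1096 Console                    1    44,804 K")
count_instances(lines, "Rgui.exe")         # 1

no_match <- "INFO: No tasks are running which match the specified criteria."
count_instances(no_match, "Rscript.exe")   # 0 (length(no_match) - 3 would be negative)
</syntaxhighlight>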


=== [http://cran.r-project.org/web/packages/formatR/index.html formatR] ===
'''The best strategy to avoid failure is to put comments in complete lines or after complete R expressions.'''
 
See also [http://stackoverflow.com/questions/3017877/tool-to-auto-format-r-code this discussion] on stackoverflow about R code reformatting.
<pre>
library(formatR)
tidy_source("Input.R", file = "output.R", width.cutoff=70)
tidy_source("clipboard")
# default width is getOption("width") which is 127 in my case.
</pre>

Some issues
* Comments appearing at the beginning of a line within a long complete statement. This will break tidy_source().
<pre>
cat("abcd",
    # This is my comment
    "defg")
</pre>
will result in
<pre>
> tidy_source("clipboard")
Error in base::parse(text = code, srcfile = NULL) :
   3:1: unexpected string constant
2: invisible(".BeGiN_TiDy_IdEnTiFiEr_HaHaHa# This is my comment.HaHaHa_EnD_TiDy_IdEnTiFiEr")
3: "defg"
   ^
</pre>
* Comments appearing at the end of a line within a long complete statement ''won't break'' tidy_source(), but tidy_source() cannot re-locate/tidy the comma.
<pre>
cat("abcd"
    ,"defg"  # This is my comment
  ,"ghij")
</pre>
will become
<pre>
cat("abcd", "defg"  # This is my comment
, "ghij")
</pre>
Still bad!!
* In other cases, comments appearing at the end of a line within a long complete statement ''break'' the tidy_source() function. For example,
<pre>
cat("</p>",
"<HR SIZE=5 WIDTH=\"100%\" NOSHADE>",
ifelse(codeSurv == 0,"<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>", #4/9/09
                    "<h3><a name='Genes'><b><u>Genes significantly associated with survival:</u></b></a></h3>"),
file=ExternalFileName, sep="\n", append=T)
</pre>
will result in
<pre>
> tidy_source("clipboard", width.cutoff=70)
Error in base::parse(text = code, srcfile = NULL) :
  3:129: unexpected SPECIAL
2: "<HR SIZE=5 WIDTH=\"100%\" NOSHADE>" ,
3: ifelse ( codeSurv == 0 , "<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>" , %InLiNe_IdEnTiFiEr%
</pre>
* The ''width.cutoff'' parameter does not always work. For example, nothing changes for the following snippet, though I hoped it would move the cat() to the next line.
<pre>
if (codePF & !GlobalTest & !DoExactPermTest) cat(paste("Multivariate Permutations test was computed based on",
    NumPermutations, "random permutations"), "<BR>", " ", file = ExternalFileName,
    sep = "\n", append = T)
</pre>
* It merges lines, though I don't always want that. For example
<pre>
cat("abcd"
    ,"defg"
  ,"ghij")
</pre>
will become
<pre>
cat("abcd", "defg", "ghij")
</pre>

== Editor ==
http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs

<ul>
<li>Emacs + ESS. ESS is useful when I want to tidy R code (the tidy_source() function in the formatR package sometimes gives errors; e.g. when I tested it on an R file like <GetComparisonResults.R> from BRB-ArrayTools v4.4 stable).
* Edit the file ''C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el'' with something like
<pre>
(setq-default inferior-R-program-name
              "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")
</pre>
* [https://blog.rwhitedwarf.com/post/use_emacs_for_r/ Using Emacs for R] 2022
</ul>
* [http://www.rstudio.com/ Rstudio] - editor/R terminal/R graphics/file browser/package manager. The new version (0.98) also provides step-by-step debugging. See also [https://www.rstudio.com/rviews/2016/11/11/easy-tricks-you-mightve-missed/ RStudio Tricks]
* [http://www.geany.org/ geany] - I like the feature that it shows defined functions on the side panel even for R code. RStudio can also do this (see the bottom of the code panel).
* [http://rgedit.sourceforge.net/ Rgedit] which can split the screen into two panes and run R in the bottom pane. See [http://www.stattler.com/article/using-gedit-or-rgedit-r here].
* Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html

== GUI for Data Analysis ==
[https://www.r-bloggers.com/2023/06/update-to-data-science-software-popularity/ Update to Data Science Software Popularity] 6/7/2023
 
=== BlueSky Statistics ===
* https://www.blueskystatistics.com/Default.asp
* [https://r4stats.com/articles/software-reviews/bluesky/ A Comparative Review of the BlueSky Statistics GUI for R]
 
=== Rcmdr ===
http://cran.r-project.org/web/packages/Rcmdr/index.html. After loading a dataset, click Statistics -> Fit models. Then select Linear regression, Linear model, GLM, Multinomial logit model, Ordinal regression model, Linear mixed model, or Generalized linear mixed model. However, Rcmdr does not include, e.g., random forest, SVM, glmnet, etc.
 
=== Deducer ===
http://cran.r-project.org/web/packages/Deducer/index.html
 
=== jamovi ===
* https://www.jamovi.org/
* [http://r4stats.com/2019/01/09/updated-review-jamovi/ Updated Review: jamovi User Interface to R]
 
== Scope ==
See
* [http://cran.r-project.org/doc/manuals/R-intro.html#Assignment-within-functions Assignments within functions] in the '''An Introduction to R''' manual.
 
=== source() ===
* [https://twitter.com/henrikbengtsson/status/1563849697084809219?s=20&t=nStcqVabAQ_HvJ2FaBloNQ source() assigns to the global environment, not the calling environment, which might not be what you want/expect]. Instead, use source("file.R", local = TRUE) to avoid assigning functions and variables to the global environment.
* [[#How_to_exit_a_sourced_R_script|source()]] does not work like C's preprocessor, where statements in header files are literally inserted into the code. It does not work when you define a variable in a function but want to use it outside the function (even through '''source()''').
 
{{Pre}}
## foo.R ##
cat(ArrayTools, "\n")
## End of foo.R
 
# 1. Error
predict <- function() {
  ArrayTools <- "C:/Program Files" # or through load() function
  source("foo.R")                  # or through a function call; foo()
}
predict()   # Object ArrayTools not found
 
# 2. OK. Make the variable global
predict <- function() {
  ArrayTools <<- "C:/Program Files"
  source("foo.R")
}
predict() 
ArrayTools
 
# 3. OK. Create a global variable
ArrayTools <- "C:/Program Files"
predict <- function() {
  source("foo.R")
}
predict()
</pre>
 
'''Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function.'''
 
Example 1.
<pre>
> ttt <- data.frame(type=letters[1:5], JpnTest=rep("999", 5), stringsAsFactors = F)
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999
> jpntest <- function() { ttt$JpnTest[1] ="N5"; print(ttt)}
> jpntest()
  type JpnTest
1    a      N5
2    b     999
3    c     999
4    d     999
5    e     999
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999
</pre>
 
Example 2. [http://stackoverflow.com/questions/1236620/global-variables-in-r How can we set global variables inside a function?] The answer is to use the "<<-" operator or the '''assign(, , envir = .GlobalEnv)''' function.
 
Other resource: [http://adv-r.had.co.nz/Functions.html Advanced R] by Hadley Wickham.
 
Example 3. [https://stackoverflow.com/questions/1169534/writing-functions-in-r-keeping-scoping-in-mind Writing functions in R, keeping scoping in mind]
 
=== New environment ===
* http://adv-r.had.co.nz/Environments.html.
* [https://www.r-bloggers.com/2011/06/environments-in-r/ Environments in R]
* load(), attach(), with().
* [https://stackoverflow.com/questions/33109379/how-to-switch-to-a-new-environment-and-stick-into-it How to switch to a new environment and stick into it?] seems not possible!
 
Run the same function on a bunch of R objects
{{Pre}}
mye = new.env()
load(<filename>, mye)
# store the result back into the environment (assigning to 'n' would only overwrite the loop variable)
for (n in names(mye)) <nowiki>mye[[n]] <- tibble::as_tibble(mye[[n]])</nowiki>
</pre>
 
Just look at the contents of an rda file without saving them anywhere (?load)
<pre>
local({
  load("myfile.rda")
  ls()
})
</pre>
Or use '''attach()''', which is a wrapper of load(). It creates an environment and slots it into the search list right after the global environment, then populates it with the objects we're attaching.
{{Pre}}
attach("all.rda") # safer and will warn about masked objects w/ same name in .GlobalEnv
ls(pos = 2)
##  also typically need to cleanup the search path:
detach("file:all.rda")
</pre>
If we want to read data from the internet, '''load()''' works but attach() does not.
<pre>
con <- url("http://some.where.net/R/data/example.rda")
## print the value to see what objects were created.
print(load(con))
close(con)
# Github example
# https://stackoverflow.com/a/62954840
</pre>
[https://stackoverflow.com/a/39621091 source() case].
<pre>
myEnv <- new.env()
source("some_other_script.R", local=myEnv)
attach(myEnv, name="sourced_scripts")
search()
ls(2)
ls(myEnv)
with(myEnv, print(x))
</pre>
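Here is a self-contained version of the "run the same function on a bunch of R objects" idea. It saves two made-up objects to a temporary .rda file and loads them into a fresh environment, so nothing lands in the global environment. It then applies a function to every object with eapply(). The object names a and b are for illustration only.

<syntaxhighlight lang='rsplus'>
rda <- tempfile(fileext = ".rda")
local({
  a <- 1:3
  b <- data.frame(x = 1:2, y = c("u", "v"))
  save(a, b, file = rda)            # save() looks up a and b in this local frame
})

mye <- new.env()
loaded <- load(rda, envir = mye)    # load() returns the names of the restored objects
loaded

# apply one function to every object in the environment (order of the list is not guaranteed)
sizes <- eapply(mye, NROW)
sizes[order(names(sizes))]          # a = 3, b = 2

# the objects live in 'mye', not in the global environment
exists("a", envir = globalenv(), inherits = FALSE)   # FALSE
</syntaxhighlight>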

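The first foo.R example ("# 1. Error") fails because source() evaluates the file in the global environment by default. As the source() bullet in the Scope section suggests, source(local = TRUE) instead evaluates the file in the environment of the calling function. The function's local variable is then visible, and nothing is written to the global environment. The runnable sketch below writes a hypothetical foo.R-like script to a temporary file.

<syntaxhighlight lang='rsplus'>
script <- tempfile(fileext = ".R")
writeLines('msg <- paste("ArrayTools is", ArrayTools)', script)

run_script <- function() {
  ArrayTools <- "C:/Program Files"   # local variable only
  source(script, local = TRUE)       # evaluate the file in this function's frame
  msg                                # created by the script, also local
}

run_script()                         # "ArrayTools is C:/Program Files"
exists("msg", envir = globalenv(), inherits = FALSE)   # FALSE
</syntaxhighlight>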

=== Download papers ===
==== [http://cran.r-project.org/web/packages/biorxivr/index.html biorxivr] ====
Search and Download Papers from the bioRxiv Preprint Server

==== [http://cran.r-project.org/web/packages/aRxiv/index.html aRxiv] ====
Interface to the arXiv API

==== [https://cran.r-project.org/web/packages/pdftools/index.html pdftools] ====
* http://ropensci.org/blog/2016/03/01/pdftools-and-jeroen
* http://r-posts.com/how-to-extract-data-from-a-pdf-file-with-r/

=== str( , max) function ===
Use the '''max.level''' parameter to avoid a long display of the structure of a complex R object. Use '''give.attr = FALSE''' to hide the attributes (and '''give.head = FALSE''' to hide the type/length prefix of each component). See [https://www.rdocumentation.org/packages/utils/versions/3.6.1/topics/str ?str]
 
If we use str() on a function, e.g. str(lm), the output is similar to args(lm).

For a complicated list object, it is useful to use the '''max.level''' argument; e.g. str(x, max.level = 1).

For a large data frame, we can convert it to a tibble, which prints only the first rows; e.g. mydf %>% as_tibble() (or mydf %>% tibble()).
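Here is a quick illustration of the str() arguments on a made-up nested list. Setting max.level = 1 stops str() at the top level, and give.attr = FALSE drops the attribute lines. The "units" attribute is added only for this example.

<syntaxhighlight lang='rsplus'>
nested <- list(id = 1:3,
               meta = list(source = "lab", info = list(batch = 7)))
attr(nested$id, "units") <- "mm"

str(nested)                                    # full structure, including the attribute
str(nested, max.level = 1)                     # top level only
str(nested, max.level = 1, give.attr = FALSE)  # top level, attributes hidden

full  <- capture.output(str(nested))
short <- capture.output(str(nested, max.level = 1, give.attr = FALSE))
length(short) < length(full)                   # TRUE
</syntaxhighlight>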
=== [https://github.com/ColinFay/aside aside]: set it aside ===
An RStudio addin to run long R commands aside your current session.


=== Teaching ===
* [https://cran.r-project.org/web/packages/smovie/vignettes/smovie-vignette.html smovie]: Some Movies to Illustrate Concepts in Statistics

=== tidy() function ===
broom::tidy() provides a simplified form of an R object (obtained from running some analysis). See [[Tidyverse#broom|here]].


=== packrat on [https://cran.r-project.org/web/packages/packrat/ cran] & [https://rstudio.github.io/packrat/ github] for reproducible research ===
* Videos:
** https://www.rstudio.com/resources/webinars/managing-package-dependencies-in-r-with-packrat/
** https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-managing-part-3/
* [https://rstudio.github.io/packrat/limitations.html limitations].
* [https://stackoverflow.com/questions/36187543/using-r-with-git-and-packrat Git and packrat]

'''Create a snapshot''':
* Do we really need to call packrat::snapshot()? The [https://rstudio.github.io/packrat/walkthrough.html walk through] page says it is not needed, but the lock file was not updated in my testing.
* I got an error when packrat tried to fetch the source code of packages from Bioconductor and local repositories: it was trying to fetch the source of these packages from CRAN.
** In the normal case, the packrat/packrat.lock file contains two entries in the 'Repos' field (line 4).
** The cause of the error was that I ran snapshot() after quitting and restarting R. So the solution is to add the Bioconductor and local repositories to options(repos).
** So why is it important to run snapshot()?
** Check out the [https://groups.google.com/forum/#!forum/packrat-discuss forum].
<syntaxhighlight lang='rsplus'>
> dir.create("~/projects/babynames", recursive = TRUE)
> packrat::init("~/projects/babynames")
Initializing packrat project in directory:
- "~/projects/babynames"

Adding these packages to packrat:
            _
    packrat  0.4.9-3

Fetching sources for packrat (0.4.9-3) ... OK (CRAN current)
Snapshot written to '/home/brb/projects/babynames/packrat/packrat.lock'
Installing packrat (0.4.9-3) ...
OK (built source)
Initialization complete!
Unloading packages in user library:
- packrat
Packrat mode on. Using library in directory:
- "~/projects/babynames/packrat/lib"

> install.packages("reshape2")
> packrat::snapshot()

> system("tree -L 2 ~/projects/babynames/packrat/")
/home/brb/projects/babynames/packrat/
├── init.R
├── lib
│   └── x86_64-pc-linux-gnu
├── lib-ext
│   └── x86_64-pc-linux-gnu
├── lib-R            # base packages
│   └── x86_64-pc-linux-gnu
├── packrat.lock
├── packrat.opts
└── src
    ├── bitops
    ├── glue
    ├── magrittr
    ├── packrat
    ├── plyr
    ├── Rcpp
    ├── reshape2
    ├── stringi
    └── stringr
</syntaxhighlight>

=== View all objects present in a package, ls() ===
https://stackoverflow.com/a/30392688. In the case of an R package created by Rcpp.package.skeleton("mypackage"), we will get
{{Pre}}
> devtools::load_all("mypackage")
> search()
 [1] ".GlobalEnv"        "devtools_shims"    "package:mypackage"
 [4] "package:stats"     "package:graphics"  "package:grDevices"
 [7] "package:utils"     "package:datasets"  "package:methods"
[10] "Autoloads"         "package:base"

> ls("package:mypackage")
[1] "_mypackage_rcpp_hello_world" "evalCpp"                    "library.dynam.unload"     
[4] "rcpp_hello_world"           "system.file"
</pre>

Note that the first argument of ls() (or detach()) is used to specify the environment. It can be
* an integer (the position in the ‘search’ list);
* the character string name of an element in the search list;
* an explicit ‘environment’ (including using ‘sys.frame’ to access the currently active function calls).

== Speedup R code ==
* [http://datascienceplus.com/strategies-to-speedup-r-code/ Strategies to speedup R code] from DataScience+

=== Profiler ===
* [https://www.rstudio.com/resources/videos/understand-code-performance-with-the-profiler/ Understand Code Performance with the profiler] (Video)
* [https://github.com/atheriel/xrprof-package xrprof] package, [https://www.infoworld.com/article/3604688/top-r-tips-and-news-from-rstudio-global-2021.amp.html Top R tips and news from RStudio Global 2021]

== && vs & ==
See https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/Logic.
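The following small sketch contrasts the two forms. '''&''' works elementwise on vectors. '''&&''' expects single TRUE/FALSE values and skips the right-hand side once the answer is known. The helper rhs() is hypothetical and only counts how often it is evaluated.

<syntaxhighlight lang='rsplus'>
calls <- 0
rhs <- function() {          # hypothetical helper that records each evaluation
  calls <<- calls + 1
  TRUE
}

c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE (elementwise)

FALSE && rhs()   # FALSE; rhs() is never called
FALSE & rhs()    # FALSE; rhs() is called
calls            # 1

# to use a vector inside if(), reduce it to a single value first
x <- c(2, NA, 5)
if (!anyNA(x) && all(x > 0)) "all positive" else "missing values present"
</syntaxhighlight>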


* The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The return is a vector.
* '''The longer form''' evaluates left to right, examining only the first element of each vector, and returns one value. '''Evaluation proceeds only until the result is determined.''' Since R 4.3.0, an argument of length greater than one is an error.
* The idea of the longer form && in R seems to be the same as the && operator in the Linux shell; see [https://youtu.be/AVXYq8aL47Q?t=1475 here].
* [https://medium.com/biosyntax/single-or-double-and-operator-and-or-operator-in-r-442f00332d5b Single or double?: AND operator and OR operator in R]. The confusion might come from the inconsistency when choosing these operators in different languages. For example, in C, & performs bitwise AND, while && does Boolean logical AND.
* [https://www.tjmahr.com/think-of-stricter-logical-operators/ Think of && as a stricter &]
<pre>
c(T,F,T) & c(T,T,T)
# [1]  TRUE FALSE  TRUE
c(T,F,T) && c(T,T,T)
# [1] TRUE    (R < 4.3.0, first elements only; an error in R >= 4.3.0)
c(T,F,T) && c(F,T,T)
# [1] FALSE   (R < 4.3.0, first elements only; an error in R >= 4.3.0)
c(T,F,T) && c(NA,T,T)
# [1] NA      (R < 4.3.0, first elements only; an error in R >= 4.3.0)
</pre>
<pre>
# Assume 'b' is not defined
> if (TRUE && b==3) cat("end")
Error: object 'b' not found
> if (FALSE && b==3) cat("end")
> # No error since the 2nd condition is never evaluated
</pre>
It's useful in functions because we don't need nested if statements. In the following example, if 'arg' is missing, the argument 'L' is never evaluated, so there is no error.
<pre>
> foo <- function(arg, L) {
  # Suppose 'L' is meaningful only if 'arg' is provided
  #
  # Evaluate 'L' only if 'arg' is provided
  #
  if (!missing(arg) && L) {
    print("L is true")
  } else {
    print("Either arg is missing or L is FALSE")
  }
}
> foo()
[1] "Either arg is missing or L is FALSE"
> foo("a", F)
[1] "Either arg is missing or L is FALSE"
> foo("a", T)
[1] "L is true"
</pre>
Other examples: '''&&''' is more flexible than '''&'''.
<pre>
nspot <- ifelse(missing(rvm) || !rvm, nrow(exprTrain), sum(filter))
 
if (!is.null(exprTest) && any(is.na(exprTest))) { ... }
</pre>

== for-loop, control flow ==
* [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Control ?Control]
* '''next''' can be used to skip the rest of the current iteration of the inner-most loop
* [https://www.programiz.com/r/ifelse-function ifelse() Function]

'''Restoring snapshots''':

Suppose a packrat project was created on Ubuntu 16.04 and we now want to repeat the analysis on Ubuntu 18.04. We first copy the whole project directory ('babynames') to Ubuntu 18.04. Then we should delete the library subdirectory ('packrat/lib'), which contains binary files (*.so) that do not work on the new OS. After deleting the library subdirectory, start R from the project directory. Now if we run the '''packrat::restore()''' command, it will re-install all missing libraries. Bingo!

Note: some OS-level libraries (e.g. libXXX-dev) need to be installed manually beforehand for this to work.
<syntaxhighlight lang='rsplus'>
$ rm -rf ~/projects/babynames/packrat/lib
$ cd ~/projects/babynames/
$ R
>
> packrat::status()
> remove.packages("plyr")
> packrat::status()
> packrat::restore()
</syntaxhighlight>


'''Set Up a Custom CRAN-like Repository''':

See https://rstudio.github.io/packrat/custom-repos.html. Note that the personal repository name ('sushi' in this example), used in the "Repository" field of the personal package, will be recorded in the <packrat/packrat.lock> file. So as long as we work on the same computer, it is easy to restore a packrat project containing packages from a personal repository.

'''[https://rstudio.github.io/packrat/commands.html Common functions]''':
* packrat::init()
* packrat::snapshot()
* packrat::restore()
* packrat::clean()
* packrat::status()
* packrat::install_local() # http://rstudio.github.io/packrat/limitations.html
* packrat::bundle() # see @28:44 of the [https://www.rstudio.com/resources/webinars/managing-package-dependencies-in-r-with-packrat/ video]
* packrat::unbundle() # see @29:17 of the same video. This will rebuild all packages
* packrat::on(), packrat::off()
* packrat::get_opts()
* packrat::set_opts() # http://rstudio.github.io/packrat/limitations.html
* packrat::opts$local.repos("~/local-cran")
* packrat::opts$external.packages(c("devtools")) # break the isolation
* packrat::extlib()
* packrat::with_extlib()
* packrat::project_dir(), .libPaths()

'''Warning'''
* If we download and modify some function definition from a CRAN package without changing the DESCRIPTION file or the package name, the snapshot created using packrat::snapshot() will contain the package source from CRAN instead of the local repository. This is because (I guess) the DESCRIPTION file contains a field 'Repository' with the value 'CRAN'.

== Vectorization ==
* [https://en.wikipedia.org/wiki/Vectorization_%28mathematics%29 Vectorization (Mathematics)] from wikipedia
* [https://en.wikipedia.org/wiki/Array_programming Array programming] from wikipedia
* [https://en.wikipedia.org/wiki/SIMD Single instruction, multiple data (SIMD)] from wikipedia
* [https://stackoverflow.com/a/1422181 What is vectorization] stackoverflow
* http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html
* https://github.com/vsbuffalo/devnotes/wiki/R-and-Vectorization
* [https://statcompute.wordpress.com/2018/09/16/why-vectorize/ Why Vectorize?] statcompute.wordpress.com
* [https://www.jimhester.com/2018/04/12/vectorize/ Beware of Vectorize] from Jim Hester
* [https://github.com/henrikbengtsson/matrixstats matrixStats]: Functions that Apply to Rows and Columns of Matrices (and to Vectors). E.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). [https://github.com/HenrikBengtsson/matrixStats/wiki/Benchmark-reports Benchmark reports].

=== sapply vs vectorization ===
[http://theautomatic.net/2019/03/13/speed-test-sapply-vs-vectorization/ Speed test: sapply vs vectorization]

=== lapply vs for loop ===
* [https://stackoverflow.com/a/42440872 lapply vs for loop - Performance R]
* https://code-examples.net/en/q/286e03a
* [https://johanndejong.wordpress.com/2016/07/07/r-are-apply-loops-faster-than-for-loops/ R: are *apply loops faster than for loops?]

=== [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/split split()] and sapply() ===
split() can be used to split a vector, columns or rows. See [https://stackoverflow.com/questions/3302356/how-to-split-a-data-frame How to split a data frame?]
<ul>
<li>split() divides the data in the '''vector''' or '''data frame''' x into the groups defined by f. The syntax is
{{Pre}}
split(x, f, drop = FALSE, …)
</pre>
<li>split() + cut(). [https://www.r-bloggers.com/2024/10/how-to-split-data-into-equal-sized-groups-in-r-a-comprehensive-guide-for-beginners/ How to Split Data into Equal Sized Groups in R: A Comprehensive Guide for Beginners]
<li>[https://stackoverflow.com/a/3321659 Split a vector into chunks]. split() returns a list of values (or of indices, if we split the indices), and the indices can be used in lapply() to subset the data. Useful for '''split() + lapply() + do.call()''' or '''split() + sapply()''' operations.
<pre>
d <- 1:10
chunksize <- 4
ceiling(1:10/4)
# [1] 1 1 1 1 2 2 2 2 3 3
split(d, ceiling(seq_along(d)/chunksize))
# $`1`
# [1] 1 2 3 4
#
# $`2`
# [1] 5 6 7 8
#
# $`3`
# [1]  9 10
do.call(c, lapply(split(d, ceiling(seq_along(d)/chunksize)), function(x) sum(x)) )
#  1  2  3
# 10 26 19

# bigmemory vignette
planeindices <- split(1:nrow(x), x[,'TailNum'])
planeStart <- sapply(planeindices,
                     function(i) birthmonth(x[i, c('Year','Month'),
                                            drop=FALSE]))
</pre>
<li>Split rows of a data frame/matrix; e.g. rows represent genes. The data frame/matrix is split directly.
{{Pre}}
split(mtcars, mtcars$cyl)

split(data.frame(matrix(1:20, nr=10)), ceiling(1:10/chunksize)) # data.frame/tibble works
split.data.frame(matrix(1:20, nr=10), ceiling(1:10/chunksize))  # split.data.frame() works for matrices
</pre>
<li>Split columns of a data frame/matrix.
{{Pre}}
ma <- cbind(x = 1:10, y = (-4:5)^2, z = 11:20)
split(ma, cbind(rep(1,10), rep(2, 10), rep(1,10))) # not an interesting example
# $`1`
#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#
# $`2`
#  [1] 16  9  4  1  0  1  4  9 16 25
</pre>
<li>split() + sapply() to merge columns. See below [[#Mean_of_duplicated_columns:_rowMeans.3B_compute_Means_by_each_row|Mean of duplicated columns]] for more detail.
<li>split() + sapply() to split a vector. See the [https://www.rdocumentation.org/packages/genefilter/versions/1.54.2/topics/nsFilter nsFilter()] function, which can remove duplicated probesets/rows using unique Entrez Gene IDs ('''genefilter''' package). The source code of [https://github.com/Bioconductor/genefilter/blob/b86f2cf47cf420b1444188bfe970714a7cc7f33b/R/nsFilter.R#L224 nsFilter()] and [https://github.com/Bioconductor/genefilter/blob/b86f2cf47cf420b1444188bfe970714a7cc7f33b/R/all.R#L170 findLargest()].
{{Pre}}
tSsp = split.default(testStat, lls)
# testStat is a vector of numerics including probeset IDs as names
# lls is a vector of entrez IDs (same length as testStat)
# tSsp is a list of the same length as unique elements of lls.
sapply(tSsp, function(x) names(which.max(x)))
# return a vector of probeset IDs of length of unique entrez IDs
</pre>
</ul>

=== Text to speech ===
[https://shirinsplayground.netlify.com/2018/06/googlelanguager/ Text-to-Speech with the googleLanguageR package]

=== Weather data ===
* [https://github.com/ropensci/prism prism] package
* [http://www.weatherbase.com/weather/weather.php3?s=507781&cityname=Rockville-Maryland-United-States-of-America Weatherbase]

== Different ways of using R ==

=== dyn.load ===
Error: [https://stackoverflow.com/questions/43662542/not-resolved-from-current-namespace-error-when-calling-c-routines-from-r “not resolved from current namespace” error, when calling C routines from R]

Solution: add '''getNativeSymbolInfo()''' around your C/Fortran symbols. Search Google: r dyn.load not resolved from current namespace

=== R call C/C++ ===
This mainly covers .C() and .Call().

* [http://cran.r-project.org/doc/manuals/R-exts.html R-Extension manual] of course.
* http://faculty.washington.edu/kenrice/sisg-adv/sisg-07.pdf
* http://www.stat.berkeley.edu/scf/paciorek-cppWorkshop.pdf (Very useful)
* http://www.stat.harvard.edu/ccr2005/
* http://mazamascience.com/WorkingWithData/?p=1099

=== SEXP ===
Some examples from packages

* [https://www.bioconductor.org/packages/release/bioc/html/sva.html sva] package has one C code function

=== R call Fortran 90 ===
* https://stat.ethz.ch/pipermail/r-devel/2015-March/070851.html

=== Embedding R ===

* See [http://cran.r-project.org/doc/manuals/R-exts.html#Linking-GUIs-and-other-front_002dends-to-R Writing R Extensions] Manual Chapter 8.

=== strsplit and sapply ===
{{Pre}}
> namedf <- c("John ABC", "Mary CDE", "Kat FGH")
> strsplit(namedf, " ")
[[1]]
[1] "John" "ABC"

[[2]]
[1] "Mary" "CDE"

[[3]]
[1] "Kat" "FGH"
> sapply(strsplit(namedf, " "), "[", 1)
* [http://www.ci.tuwien.ac.at/Conferences/useR-2004/abstracts/supplements/Urbanek.pdf Talk by Simon Urbanek] in UseR 2004.
[1] "John" "Mary" "Kat"
* [http://epub.ub.uni-muenchen.de/2085/1/tr012.pdf Technical report] by Friedrich Leisch in 2007.
> sapply(strsplit(namedf, " "), "[", 2)
* https://stat.ethz.ch/pipermail/r-help/attachments/20110729/b7d86ed7/attachment.pl
[1] "ABC" "CDE" "FGH"
</pre>
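The same extraction can also be written with vapply(). Setting FUN.VALUE = character(1) makes R check that each call returns exactly one string. The sketch below reuses the namedf vector from above. The do.call(rbind, ...) variant assumes every name splits into the same number of pieces.

<syntaxhighlight lang='r'>
namedf <- c("John ABC", "Mary CDE", "Kat FGH")
parts <- strsplit(namedf, " ", fixed = TRUE)

# vapply() checks that every call returns exactly one character string
first <- vapply(parts, `[`, character(1), 1)
last  <- vapply(parts, `[`, character(1), 2)

# When every element has the same number of pieces,
# the list can be turned into a character matrix in one step
m <- do.call(rbind, parts)

print(first)
print(last)
print(m)
</syntaxhighlight>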


==== A very simple example (does not return to the shell) from the Writing R Extensions manual ====
The command-line R front-end, R_HOME/bin/exec/R, is one such example. Its source code is in file <src/main/Rmain.c>.

This example can be run by
<pre>R_HOME/bin/R CMD R_HOME/bin/exec/R</pre>

Note:
# '''R_HOME/bin/exec/R''' is the R binary. However, it cannot be launched directly unless R_HOME and LD_LIBRARY_PATH are set up. Again, this is explained in the Writing R Extensions manual.
# '''R_HOME/bin/R''' is a shell-script front-end that users invoke. It sets up the environment for the executable, and it can be copied to ''/usr/local/bin/R''. When we run ''R_HOME/bin/R'', it actually runs ''R_HOME/bin/R CMD R_HOME/bin/exec/R'' (see line 259 of ''R_HOME/bin/R'' as in R 3.0.2), which shows the important role of ''R_HOME/bin/exec/R''.

More examples of embedding can be found in the ''tests/Embedding'' directory. Read <index.html> for more information about these test examples.

=== Mean of duplicated columns: rowMeans; compute Means by each row ===
<ul>
<li>[https://stackoverflow.com/questions/35925529/reduce-columns-of-a-matrix-by-a-function-in-r Reduce columns of a matrix by a function in R]. To use rowMedians() instead of rowMeans(), we need to install [https://cran.r-project.org/web/packages/matrixStats/index.html matrixStats] from CRAN.
<syntaxhighlight lang='r'>
set.seed(1)
x <- matrix(1:60, nr=10); x[1, 2:3] <- NA
colnames(x) <- c("b", "b", "b", "c", "a", "a"); x
res <- sapply(split(1:ncol(x), colnames(x)),
              function(i) rowMeans(x[, i, drop=FALSE], na.rm = TRUE))
res  # notice the sorting of columns
#        a  b  c
#  [1,] 46  1 31
#  [2,] 47 12 32
#  [3,] 48 13 33
#  [4,] 49 14 34
#  [5,] 50 15 35
#  [6,] 51 16 36
#  [7,] 52 17 37
#  [8,] 53 18 38
#  [9,] 54 19 39
# [10,] 55 20 40

# vapply() is safer than sapply().
# The 3rd arg in vapply() is a template of the return value.
res2 <- vapply(split(1:ncol(x), colnames(x)),
               function(i) rowMeans(x[, i, drop=FALSE], na.rm = TRUE),
               rep(0, nrow(x)))
</syntaxhighlight>
</li>
<li>[https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/colSums colSums, rowSums, colMeans, rowMeans] (no group variable). These functions are equivalent to using ‘apply’ with ‘FUN = mean’ or ‘FUN = sum’ with appropriate margins, but are a lot faster.
{{Pre}}
rowMeans(x, na.rm=T)
# [1] 31 27 28 29 30 31 32 33 34 35

apply(x, 1, mean, na.rm=T)
# [1] 31 27 28 29 30 31 32 33 34 35
</pre>
</li>
<li>[https://cran.r-project.org/web/packages/matrixStats/index.html matrixStats]: Functions that Apply to Rows and Columns of Matrices (and to Vectors)
</li>
<li>[https://www.statforbiology.com/2020/stat_r_tidyverse_columnwise/ From ''for()'' loops to the ''split-apply-combine'' paradigm for column-wise tasks: the transition for a dinosaur]
</li>
</ul>
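This sketch does the same column merge without split(). It rebuilds the matrix x from the example above. rowsum() sums rows by group, so applying it to t(x) sums the duplicated columns. Dividing by a rowsum() of the non-missing indicators then reproduces rowMeans(..., na.rm = TRUE). As with the sapply() version, the groups come out sorted (a, b, c).

<syntaxhighlight lang='r'>
x <- matrix(1:60, nrow = 10); x[1, 2:3] <- NA
colnames(x) <- c("b", "b", "b", "c", "a", "a")

# rowsum() groups rows, so work on the transpose to group columns
sums   <- rowsum(t(x), colnames(x), na.rm = TRUE)
# Count the non-missing cells per group so NAs do not deflate the mean
counts <- rowsum(t(!is.na(x)) * 1, colnames(x))
res3   <- t(sums / counts)

# Reference result from the split() + sapply() approach
res <- sapply(split(seq_len(ncol(x)), colnames(x)),
              function(i) rowMeans(x[, i, drop = FALSE], na.rm = TRUE))
print(res3)
</syntaxhighlight>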
==== An example from Bioconductor workshop ====
* What is covered in this section is different from [[R#Create_a_standalone_Rmath_library|Create and use a standalone Rmath library]].
* Use eval() function. See R-Ext [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Unix_002dalikes 8.1] and [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Windows 8.2] and [http://cran.r-project.org/doc/manuals/R-exts.html#Evaluating-R-expressions-from-C 5.11].
* http://stackoverflow.com/questions/2463437/r-from-c-simplest-possible-helloworld (obtained from searching R_tryEval on google)
* http://stackoverflow.com/questions/7457635/calling-r-function-from-c

Example:

Create <embed.c> file
<pre>
#include <Rembedded.h>
#include <Rdefines.h>

static void doSplinesExample();

int
main(int argc, char *argv[])
{
    Rf_initEmbeddedR(argc, argv);
    doSplinesExample();
    Rf_endEmbeddedR(0);
    return 0;
}

static void
doSplinesExample()
{
    SEXP e, result;
    int errorOccurred;

    // create and evaluate 'library(splines)'
    PROTECT(e = lang2(install("library"), mkString("splines")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    if (errorOccurred) {
        // handle error
    }
    UNPROTECT(1);

    // 'options(FALSE)' ...
    PROTECT(e = lang2(install("options"), ScalarLogical(0)));
    // ... modified to 'options(example.ask=FALSE)' (this is obscure)
    SET_TAG(CDR(e), install("example.ask"));
    R_tryEval(e, R_GlobalEnv, NULL);
    UNPROTECT(1);

    // 'example("ns")'
    PROTECT(e = lang2(install("example"), mkString("ns")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    UNPROTECT(1);
}
</pre>
Then build the executable. Note that I don't need to set the R_HOME variable.
<pre>
cd
tar xzvf
cd R-3.0.1
./configure --enable-R-shlib
make
cd tests/Embedding
make
~/R-3.0.1/bin/R CMD ./Rtest

nano embed.c
# Using a single line will give an error and cannot show the real problem.
# ../../bin/R CMD gcc -I../../include -L../../lib -lR embed.c
# A better way is to run compile and link separately
gcc -I../../include -c embed.c
gcc -o embed embed.o -L../../lib -lR -lRblas
../../bin/R CMD ./embed
</pre>

Note that if we want to call the executable file ./embed directly, we need to set up the R environment by specifying the '''R_HOME''' variable and including the directories used in linking R in '''LD_LIBRARY_PATH'''. This is based on the information provided by [http://cran.r-project.org/doc/manuals/r-devel/R-exts.html Writing R Extensions].
<pre>
export R_HOME=/home/brb/Downloads/R-3.0.2
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib
./embed # No need to include R CMD in front.
</pre>

=== Mean of duplicated rows: colMeans and rowsum ===
<ul>
<li>[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/colSums colMeans(x, na.rm = FALSE, dims = 1)] takes the mean of each column (i.e. averages over rows) and returns a vector. Other functions based on the same idea include '''colSums, rowSums, rowMeans'''.
{{Pre}}
x <- matrix(1:60, nr=10); x[1, 2:3] <- NA; x
rownames(x) <- c(rep("b", 2), rep("c", 3), rep("d", 4), "a") # move 'a' to the last
res <- sapply(split(1:nrow(x), rownames(x)),
              function(i) colMeans(x[i, , drop=FALSE], na.rm = TRUE))
res <- t(res) # transpose is needed since sapply() will form the resulting matrix by columns
res  # still a matrix, rows are ordered
#   [,1] [,2] [,3] [,4] [,5] [,6]
# a 10.0 20.0 30.0 40.0 50.0 60.0
# b  1.5 12.0 22.0 31.5 41.5 51.5
# c  4.0 14.0 24.0 34.0 44.0 54.0
# d  7.5 17.5 27.5 37.5 47.5 57.5
table(rownames(x))
# a b c d
# 1 2 3 4

aggregate(x, list(rownames(x)), FUN=mean, na.rm = T) # EASY, but it becomes a data frame, rows are ordered
#   Group.1   V1   V2   V3   V4   V5   V6
# 1       a 10.0 20.0 30.0 40.0 50.0 60.0
# 2       b  1.5 12.0 22.0 31.5 41.5 51.5
# 3       c  4.0 14.0 24.0 34.0 44.0 54.0
# 4       d  7.5 17.5 27.5 37.5 47.5 57.5
</pre>
<li>[[Arraytools#Reducing_multiple_probes.2Fprobe_sets_to_one_per_gene_symbol|Reduce multiple probes by the maximally expressed probe (set) measured by average intensity across arrays]]
</li>
<li>[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/rowsum rowsum(x, group, reorder = TRUE, …)] sums rows within each group and returns a matrix. Note that it is not the same as rowSums(), and there is no "colsum" function. ''It has a speed advantage over sapply+colSums or aggregate.''
{{Pre}}
group <- rownames(x)
rowsum(x, group, na.rm=T)/as.vector(table(group))
# Caution: table(group) also counts the NA cells, so row b (columns 2-3)
# differs from the colMeans(..., na.rm = TRUE) result above
#   [,1] [,2] [,3] [,4] [,5] [,6]
# a 10.0 20.0 30.0 40.0 50.0 60.0
# b  1.5  6.0 11.0 31.5 41.5 51.5
# c  4.0 14.0 24.0 34.0 44.0 54.0
# d  7.5 17.5 27.5 37.5 47.5 57.5
</pre>
</li>
</ul>
* [https://stackoverflow.com/questions/25198442/how-to-calculate-mean-median-per-group-in-a-dataframe-in-r How to calculate mean/median per group in a dataframe in r] where '''doBy''' and '''dplyr''' are recommended.
* [https://cran.r-project.org/web/packages/matrixStats/index.html matrixStats]: Functions that Apply to Rows and Columns of Matrices (and to Vectors)
* [https://cran.r-project.org/web/packages/doBy/ doBy] package
* [http://stackoverflow.com/questions/7881660/finding-the-mean-of-all-duplicates use ave() and unique()]
* [http://stackoverflow.com/questions/17383635/average-between-duplicated-rows-in-r data.table package]
* [http://stackoverflow.com/questions/10180132/consolidate-duplicate-rows plyr package]
<ul>
<li>'''by()''' function. [https://thomasadventure.blog/posts/calculating-change-from-baseline-in-r/ Calculating change from baseline in R]
</li>
<li>See [https://finnstats.com/index.php/2021/06/20/aggregate-function-in-r/ '''aggregate''' Function in R- A powerful tool for data frames] & [https://finnstats.com/index.php/2021/06/01/summarize-in-r-data-summarization-in-r/ summarize in r, Data Summarization In R] </li>
<li>[http://www.statmethods.net/management/aggregate.html aggregate()] function. Too slow! http://slowkow.com/2015/01/28/data-table-aggregate/. [http://www.win-vector.com/blog/2015/10/dont-use-statsaggregate/ Don't use aggregate] post.
{{Pre}}
> attach(mtcars)
> dim(mtcars)
[1] 32 11
> head(mtcars)
                  mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4        21.0  6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag    21.0  6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8  4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4  6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7  8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant          18.1  6  225 105 2.76 3.460 20.22  1  0    3    1
> with(mtcars, table(cyl, vs))
  vs
cyl  0  1
  4  1 10
  6  3  4
  8 14  0
> aggdata <- aggregate(mtcars, by=list(cyl,vs),  FUN=mean, na.rm=TRUE)
> print(aggdata)
  Group.1 Group.2      mpg cyl  disp      hp    drat      wt    qsec vs
1      4      0 26.00000  4 120.30  91.0000 4.430000 2.140000 16.70000  0
2      6      0 20.56667  6 155.00 131.6667 3.806667 2.755000 16.32667  0
3      8      0 15.10000  8 353.10 209.2143 3.229286 3.999214 16.77214  0
4      4      1 26.73000  4 103.62  81.8000 4.035000 2.300300 19.38100  1
5      6      1 19.12500  6 204.55 115.2500 3.420000 3.388750 19.21500  1
        am    gear    carb
1 1.0000000 5.000000 2.000000
2 1.0000000 4.333333 4.666667
3 0.1428571 3.285714 3.500000
4 0.7000000 4.000000 1.500000
5 0.0000000 3.500000 2.500000
> detach(mtcars)

# Another example: select rows with a minimum value from a certain column (yval in this case)
> mydf <- read.table(header=T, text='
id xval yval
A 1  1
A -2  2
B 3  3
B 4  4
C 5  5
')
> x = mydf$xval
> y = mydf$yval
> aggregate(mydf[, c(2,3)], by=list(id=mydf$id), FUN=function(x) x[which.min(y)])
  id xval yval
1  A    1    1
2  B    3    3
3  C    5    5
# Caution: y is the whole yval column, so which.min(y) is always 1 and the call above
# simply returns the first row of each group (correct here only by coincidence).
# A per-group version:
> do.call(rbind, lapply(split(mydf, mydf$id), function(d) d[which.min(d$yval), ]))
  id xval yval
A  A    1    1
B  B    3    3
C  C    5    5
</pre>
</li>
</ul>

=== Mean by Group ===
[https://statisticsglobe.com/mean-by-group-in-r Mean by Group in R (2 Examples) | dplyr Package vs. Base R]
<pre>
aggregate(x = iris$Sepal.Length,                # Specify data column
          by = list(iris$Species),              # Specify group indicator
          FUN = mean)                           # Specify function (i.e. mean)
</pre>
<pre>
library(dplyr)
iris %>%                                        # Specify data frame
  group_by(Species) %>%                         # Specify group indicator
  summarise_at(vars(Sepal.Length),              # Specify column
               list(name = mean))               # Specify function
</pre>
* [https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/ave ave(x, ..., FUN)]
* aggregate(x, by, FUN)
* by(x, INDICES, FUN): returns a list-like object of class "by"
* tapply(): returns results as a matrix or array. Useful for a [https://en.wikipedia.org/wiki/Jagged_array ragged array].
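A base-R sketch (no packages assumed) that ties the functions above together on the built-in iris data:

* tapply() returns one mean per group.
* ave() returns the group mean repeated on every row.
* rowsum() divided by the group sizes gives the same numbers.

For a matrix with missing values, dividing by table(group), as in the rowsum() example earlier, also counts the NA cells. Dividing by a rowsum() of !is.na(x) counts only the observed ones.

<syntaxhighlight lang='r'>
# Per-group means of Sepal.Length, three ways
g <- iris$Species
m_tapply <- tapply(iris$Sepal.Length, g, mean)            # one value per group
m_ave    <- ave(iris$Sepal.Length, g, FUN = mean)          # one value per row
m_rowsum <- rowsum(iris$Sepal.Length, g)[, 1] / as.vector(table(g))

# NA-aware version of the rowsum() trick for a matrix with missing values
x <- matrix(1:60, nrow = 10); x[1, 2:3] <- NA
group <- c(rep("b", 2), rep("c", 3), rep("d", 4), "a")
na_aware <- rowsum(x, group, na.rm = TRUE) / rowsum((!is.na(x)) * 1, group)

print(m_tapply)
print(na_aware)
</syntaxhighlight>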
== Apply family ==
Vectorize, aggregate, apply, by, eapply, lapply, mapply, rapply, replicate, scale, sapply, split, tapply, and vapply.
The following list gives a hierarchical relationship among these functions.
* '''apply'''(X, MARGIN, FUN, ...) – Apply Functions Over Array Margins
* '''lapply'''(X, FUN, ...) – Apply a Function over a List (including a data frame) or Vector X.
** '''sapply'''(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) – Apply a Function over a List or Vector
*** '''replicate'''(n, expr, simplify = "array")
** '''mapply'''(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) – Multivariate version of sapply
*** '''Vectorize'''(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE) - Vectorize a Scalar Function
*** '''Map'''(FUN, ...) A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.
** '''vapply'''(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) – similar to sapply, but has a pre-specified type of return value
** '''rapply'''(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...) – A recursive version of lapply
* '''tapply'''(V, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE) – Apply a Function Over a [https://en.wikipedia.org/wiki/Jagged_array "Ragged" Array]. V is typically a vector where split() will be applied. INDEX is a list of one or more factors.
** '''aggregate'''(D, by, FUN, ..., simplify = TRUE, drop = TRUE) - Apply a function to each '''column''' of the subset data frames split by factors. FUN (such as mean(), weighted.mean(), sum()) is a simple function applied to a vector. D is typically a data frame. This is used to '''summarize''' data.
** '''by'''(D, INDICES, FUN, ..., simplify = TRUE) - Apply a Function to each '''subset data frame''' split by factors. FUN (such as summary(), lm()) is applied to a data frame. D is typically a data frame.
* '''eapply'''(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE) – Apply a Function over values in an environment
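A quick base-R check of a few relationships stated in the list above, using small made-up inputs. Each comparison should be TRUE:

* sapply(..., simplify = FALSE) gives what lapply() gives.
* Map() is mapply(..., SIMPLIFY = FALSE).
* vapply() is sapply() with a declared result template.
* replicate() is sapply() over a sequence.
* tapply() is split() followed by an apply.

<syntaxhighlight lang='r'>
v  <- c(5, 1, 4, 2, 8, 3)
g  <- c("x", "y", "x", "y", "x", "y")
xs <- list(a = 1:3, b = 4:6)

# lapply() and sapply(simplify = FALSE) return the same named list
same_list <- identical(sapply(xs, sum, simplify = FALSE), lapply(xs, sum))

# Map() is mapply() with SIMPLIFY = FALSE, so it always returns a list
same_map <- identical(Map(`+`, 1:3, 4:6), mapply(`+`, 1:3, 4:6, SIMPLIFY = FALSE))

# vapply() is sapply() with a declared result template (here one integer)
same_vapply <- identical(vapply(xs, sum, integer(1)), sapply(xs, sum))

# replicate() re-evaluates an expression via sapply(); same seed, same draws
set.seed(1); r1 <- replicate(3, runif(1))
set.seed(1); r2 <- sapply(1:3, function(i) runif(1))
same_replicate <- identical(r1, r2)

# tapply() is split() followed by an apply over the pieces
same_tapply <- isTRUE(all.equal(as.vector(tapply(v, g, sum)),
                                unname(sapply(split(v, g), sum))))

print(c(same_list, same_map, same_vapply, same_replicate, same_tapply))
</syntaxhighlight>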


[https://www.queryhome.com/tech/76799/r-difference-between-apply-vs-sapply-vs-lapply-vs-tapply Difference between apply vs sapply vs lapply vs tapply?]
* apply - When you want to apply a function to the rows or columns (or both) of a matrix; the output is one-dimensional if only rows or only columns are selected, otherwise it is a 2D matrix.
* lapply - When you want to apply a function to each element of a list in turn and get a list back.
* sapply - When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.
* tapply - When you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

Some short examples:
* [http://people.stern.nyu.edu/ylin/r_apply_family.html stern.nyu.edu].
* [http://www.datasciencemadesimple.com/apply-function-r/ Apply Function in R – apply vs lapply vs sapply vs mapply vs tapply vs rapply vs vapply] from datasciencemadesimple.com.
* [https://stackoverflow.com/a/7141669 How to use which one (apply family) when?]

=== Apply vs for loop ===
Note that apply's performance is not always better than a for loop. See
* http://tolstoy.newcastle.edu.au/R/help/06/05/27255.html (answered by Brian Ripley)
* https://stat.ethz.ch/pipermail/r-help/2014-October/422455.html (has one example)
* [https://johanndejong.wordpress.com/2016/07/07/r-are-apply-loops-faster-than-for-loops/ R: are *apply loops faster than for loops?]. The author said '' 'an important reason for using *apply() functions may instead be that they fit the functional programming paradigm better, where everything is done using function calls and side effects are reduced'... The scope of the variables defined within f is limited to f, and variables defined outside f cannot be modified inside f (except using the special scoping assignment operator <<-).  ''
** [http://adv-r.had.co.nz/Functional-programming.html Functional programming]
* [https://privefl.github.io/blog/why-loops-are-slow-in-r/ Why loops are slow in R]
* [https://stackoverflow.com/a/18763102 Why is `unlist(lapply)` faster than `sapply`?]

Question: Create a data frame in C? Answer: [https://stat.ethz.ch/pipermail/r-devel/2013-August/067107.html Use data.frame() via an eval() call from C]. Or see the code in stats/src/model.c, as part of model.frame.default. Or use Rcpp as [https://stat.ethz.ch/pipermail/r-devel/2013-August/067109.html here].
 
Reference: http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf

=== Progress bar ===
[http://peter.solymos.org/code/2016/09/11/what-is-the-cost-of-a-progress-bar-in-r.html What is the cost of a progress bar in R?]

The package 'pbapply' creates a text-mode progress bar; it works on all platforms. On the Windows platform, check out [http://www.theanalystatlarge.com/for-loop-tracking-windows-progress-bar/ this post]. It uses the winProgressBar() and setWinProgressBar() functions.
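If we don't want an extra package, the utils package that ships with R (attached by default) provides txtProgressBar() and setTxtProgressBar(), which also work on every platform. A minimal sketch; the squaring step is only a placeholder for real per-item work.

<syntaxhighlight lang='r'>
n   <- 20
out <- vector("list", n)

pb <- txtProgressBar(min = 0, max = n, style = 3)  # style 3: bar plus percentage
for (i in seq_len(n)) {
  out[[i]] <- i^2            # placeholder for the real per-item computation
  setTxtProgressBar(pb, i)   # move the bar to position i
}
close(pb)
</syntaxhighlight>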


[https://www.jottr.org/2020/07/04/progressr-erum2020-slides/ e-Rum 2020 Slides on Progressr] by Henrik Bengtsson. [https://www.jottr.org/2021/06/11/progressr-0.8.0/ progressr 0.8.0: RStudio's progress bar, Shiny progress updates, and absolute progress], [https://www.r-bloggers.com/2022/06/progressr-0-10-1-plyr-now-supports-progress-updates-also-in-parallel/ progressr 0.10.1: Plyr Now Supports Progress Updates also in Parallel]

==== Create a Simple Socket Server in R ====
This example comes from this [http://epub.ub.uni-muenchen.de/2085/1/tr012.pdf paper].

Create an R function
<pre>
simpleServer <- function(port = 6543)
{
  # blocking = TRUE so that readLines() waits for a complete line from the client
  sock <- socketConnection(port = port, server = TRUE, blocking = TRUE)
  on.exit(close(sock))
  cat("\nWelcome to R!\nR>", file = sock)
  while ((line <- readLines(sock, n = 1)) != "quit")
  {
    cat(paste("socket >", line, "\n"))
    out <- capture.output(try(eval(parse(text = line))))
    writeLines(out, con = sock)
    cat("\nR> ", file = sock)
  }
}
</pre>
Then run simpleServer(). Open another terminal and try to communicate with the server
<pre>
$ telnet localhost 6543
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Welcome to R!
R> summary(iris[, 3:5])
  Petal.Length    Petal.Width          Species
 Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :4.350   Median :1.300   virginica :50
 Mean   :3.758   Mean   :1.199
 3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :6.900   Max.   :2.500

R> quit
Connection closed by foreign host.
</pre>

==== [http://www.rforge.net/Rserve/doc.html Rserve] ====
Note that Rserve is launched in the same way as the C program that embeds R. See [[R#Call_R_from_C.2FC.2B.2B|Call R from C/C++]] or [[R#An_Example_from_Bioconductor_Workshop|Example from Bioconductor workshop]].

=== simplify option in sapply() ===
<pre>
library(KEGGREST)

names1 <- keggGet(c("hsa05340", "hsa05410"))
names2 <- sapply(names1, function(x) x$GENE)
length(names2)  # same if we use lapply() above
# [1] 2

names3 <- keggGet(c("hsa05340"))
names4 <- sapply(names3, function(x) x$GENE)
length(names4)  # may or may not be what we expect
# [1] 76
# With a single pathway, sapply() simplifies the length-1 list to the GENE vector itself
names4 <- sapply(names3, function(x) x$GENE, simplify = FALSE)
length(names4)  # same if we use lapply() w/o simplify
# [1] 1
</pre>

=== lapply and its friends Map(), Reduce(), Filter() from the base package for manipulating lists ===
* Examples of using lapply() + split() on a data frame. See [http://rollingyours.wordpress.com/category/r-programming-apply-lapply-tapply/ rollingyours.wordpress.com].
<ul>
<li>mapply() [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/mapply documentation]. [https://stackoverflow.com/questions/9519543/merge-two-lists-in-r Use mapply() to merge lists].
<pre>
mapply(rep, 1:4, 4:1)
mapply(rep, times = 1:4, x = 4:1)
mapply(function(x, y) seq_len(x) + y,
       c(a = 1, b = 2, c = 3),  # names from first
       c(A = 10, B = 0, C = -10))
# firstList and secondList: two lists of the same length
mapply(c, firstList, secondList, SIMPLIFY=FALSE)
</pre>
</li>
<li>[https://bensstats.wordpress.com/2020/10/06/robservations-3-finding-the-expected-value-of-the-maximum-of-two-bivariate-normal-variables-with-simulation/ Finding the Expected value of the maximum of two Bivariate Normal variables with simulation] sapply + mapply.
<pre>
# x: a two-column matrix of simulated bivariate normal draws (one pair per row)
z <- mapply(function(u, v) { max(u, v) },
            u = x[, 1], v = x[, 2])
# same values as the vectorized pmax(x[, 1], x[, 2])
</pre>
</li>
<li>[http://www.brodrigues.co/functional_programming_and_unit_testing_for_data_munging/fprog.html Map() and Reduce()] in functional programming </li>
<li>Map(), Reduce(), and Filter() from [http://adv-r.had.co.nz/Functionals.html#functionals-fp Advanced R] by Hadley
<ul>
<li>If you have two or more lists (or data frames) that you need to process in <span style="color: red">parallel</span>, use '''Map()'''. One good example is computing weighted.mean(), which requires two input objects. Map() is similar to the '''mapply()''' function and is more concise than '''lapply()'''. [http://adv-r.had.co.nz/Functionals.html#functionals-loop Advanced R] has a comment that Map() is better than mapply().
{{Pre}}
# Syntax: Map(f, ...)

xs <- replicate(5, runif(10), simplify = FALSE)
ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
Map(weighted.mean, xs, ws)

# instead of a more clumsy way
lapply(seq_along(xs), function(i) {
  weighted.mean(xs[[i]], ws[[i]])
})
</pre>
</li>
<li>Reduce() reduces a vector, x, to a single value by <span style="color: red">recursively</span> calling a function, f, two arguments at a time. A good example of using the '''Reduce()''' function is to read a list of matrix files and merge them. See [https://stackoverflow.com/questions/29820029/how-to-combine-multiple-matrix-frames-into-one-using-r How to combine multiple matrix frames into one using R?]
{{Pre}}
# Syntax: Reduce(f, x, ...)

> m1 <- data.frame(id=letters[1:4], val=1:4)
> m2 <- data.frame(id=letters[2:6], val=2:6)
> merge(m1, m2, "id", all = T)
  id val.x val.y
1  a     1    NA
2  b     2     2
3  c     3     3
4  d     4     4
5  e    NA     5
6  f    NA     6
> m <- list(m1, m2)
> Reduce(function(x,y) merge(x,y, "id",all=T), m)
  id val.x val.y
1  a     1    NA
2  b     2     2
3  c     3     3
4  d     4     4
5  e    NA     5
6  f    NA     6
</pre>
</li>
</ul>
</li>
</ul>
* [https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting/ Playing Map() and Reduce() in R – Subsetting] - using parallel and future packages. [https://statcompute.wordpress.com/2018/09/22/union-multiple-data-frames-with-different-column-names/ Union Multiple Data.Frames with Different Column Names]
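A small sketch of the three list helpers named in this section. The data frames d1, d2 and d3 are made up for illustration.

* Reduce() merges a list of data frames two at a time. This is the merge() pattern above, extended to three inputs.
* accumulate = TRUE keeps the intermediate results.
* Filter() keeps the elements for which a predicate returns TRUE.

<syntaxhighlight lang='r'>
d1 <- data.frame(id = c("a", "b", "c"), v1 = 1:3)
d2 <- data.frame(id = c("b", "c", "d"), v2 = 4:6)
d3 <- data.frame(id = c("a", "c", "d"), v3 = 7:9)

# Reduce(): full outer join of all data frames by 'id', two at a time
merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE),
                 list(d1, d2, d3))

# accumulate = TRUE returns the running results: 1, 1+2, 1+2+3, ...
running <- Reduce(`+`, 1:5, accumulate = TRUE)

# Filter(): keep only the list elements that satisfy a predicate
nums <- list(1.5, "a", 2L, NULL, 3)
only_numeric <- Filter(is.numeric, nums)

print(merged)
print(running)
</syntaxhighlight>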
 
=== sapply & vapply ===
* [http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply This] discusses why '''vapply''' is safer and faster than sapply.
* [http://adv-r.had.co.nz/Functionals.html#functionals-loop Vector output: sapply and vapply] from Advanced R (Hadley Wickham).
* [http://theautomatic.net/2018/11/13/those-other-apply-functions/ THOSE “OTHER” APPLY FUNCTIONS…]. rapply(), vapply() and eapply() are covered.
* [http://theautomatic.net/2019/03/13/speed-test-sapply-vs-vectorization/ Speed test: sapply vs. vectorization]
* sapply can be used in plotting; for example, [https://cran.r-project.org/web/packages/glmnet/vignettes/relax.pdf#page=13 glmnet relax vignette] uses '''sapply(myList, lines, col="grey") ''' to draw multiple lines simultaneously on a list of matrices.
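A minimal sketch of why vapply() is called safer. The input is made up so that one element produces an empty result. In that case sapply() silently switches from returning a vector to returning a list. vapply() instead stops with an error, because the result does not match the FUN.VALUE template.

<syntaxhighlight lang='r'>
good <- list(1:3, 4:6)
bad  <- list(1:3, integer(0))   # second element gives an empty result below

# sapply(): a vector for 'good', but a list for 'bad'
s_good <- sapply(good, max)
s_bad  <- sapply(bad, function(v) v[v > 2])

# vapply(): exactly one integer per element, otherwise an error
v_good <- vapply(good, max, integer(1))
v_bad  <- tryCatch(vapply(bad, function(v) v[v > 2], integer(1)),
                   error = function(e) conditionMessage(e))

print(class(s_bad))
print(v_bad)   # the error message raised by vapply()
</syntaxhighlight>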
 
See parallel::parSapply() for a parallel version of sapply(1:n, function(x)). We can use this technique to speed up [https://github.com/SRTRdevhub/C_Statistic_Github/blob/master/Simulation_Demonstration.Rmd#L115 this example].
 
=== rapply - recursive version of lapply ===
* http://4dpiecharts.com/tag/recursive/
* [https://github.com/wch/r-source/search?utf8=%E2%9C%93&q=rapply Search in R source code]. Mainly [https://github.com/wch/r-source/blob/trunk/src/library/stats/R/dendrogram.R r-source/src/library/stats/R/dendrogram.R].
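A short sketch of the two common rapply() modes on a made-up nested list:

* how = "unlist" applies the function to every leaf of the listed classes and flattens the answers. Leaves of other classes are dropped.
* how = "replace" keeps the nesting and leaves non-matching leaves untouched.

<syntaxhighlight lang='r'>
nested <- list(a = 1:3, b = list(c = 4.5, d = "text", e = list(f = 10)))

# Sum every numeric leaf and flatten the answers into a named vector
sums <- rapply(nested, sum, classes = c("numeric", "integer"), how = "unlist")

# Double every numeric leaf but keep the original list structure
doubled <- rapply(nested, function(x) x * 2,
                  classes = c("numeric", "integer"), how = "replace")

print(sums)
str(doubled)
</syntaxhighlight>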


See my [[Rserve]] page.
=== replicate ===
https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r
{{Pre}}
> replicate(5, rnorm(3))
          [,1]      [,2]      [,3]      [,4]        [,5]
[1,]  0.2509130 -0.3526600 -0.3170790  1.064816 -0.53708856
[2,]  0.5222548  1.5343319  0.6120194 -1.811913 -1.09352459
[3,] -1.9905533 -0.8902026 -0.5489822  1.308273  0.08773477
</pre>
 
See [[#parallel_package|parSapply()]] for a parallel version of replicate().


==== (Commercial) [http://www.statconn.com/ StatconnDcom] ====

==== [http://rdotnet.codeplex.com/ R.NET] ====

==== [https://cran.r-project.org/web/packages/rJava/index.html rJava] ====
* [https://jozefhajnala.gitlab.io/r/r901-primer-java-from-r-1/ A primer in using Java from R - part 1]
* Note that rJava is needed by the [https://cran.r-project.org/web/packages/xlsx/index.html xlsx] package.

Terminal
<syntaxhighlight lang='bash'>
# jdk 7
sudo apt-get install openjdk-7-*
update-alternatives --config java

# oracle jdk 8
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get -y install openjdk-8-jdk
</syntaxhighlight>
and then run the following (thanks to http://stackoverflow.com/questions/12872699/error-unable-to-load-installed-packages-just-now) to fix the error "libjvm.so: cannot open shared object file: No such file or directory".
* Create the file '''/etc/ld.so.conf.d/java.conf''' with the following entries:
<pre>
/usr/lib/jvm/java-8-oracle/jre/lib/amd64
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server
</pre>
* And then run '''sudo ldconfig'''

Now go back to R
<syntaxhighlight lang='rsplus'>
install.packages("rJava")
</syntaxhighlight>
Done!

If the above does not work, a simple alternative (under Ubuntu) is to run
<pre>
sudo apt-get install r-cran-rjava
</pre>
which will also install the packages 'default-jre' (under '''/usr/lib/jvm''') and 'default-jre-headless'.

=== Vectorize ===
* [https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/Vectorize Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE)]: creates a function wrapper that vectorizes a scalar function. Its value is a list, vector or array. It calls '''mapply()'''.
{{Pre}}
> rep(1:4, 4:1)
[1] 1 1 1 1 2 2 2 3 3 4
> vrep <- Vectorize(rep.int)
> vrep(1:4, 4:1)
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4
</pre>
* [http://biolitika.si/vectorizing-functions-in-r-is-easy.html Vectorizing functions in R is easy]
{{Pre}}
> rweibull(1, 1, c(1, 2)) # n = 1, so only the first scale value (1) is used
[1] 2.17123
> Vectorize("rweibull")(n=1, shape = 1, scale = c(1, 2))
[1] 1.6491761 0.9610109
</pre>
* https://blogs.msdn.microsoft.com/gpalem/2013/03/28/make-vectorize-your-friend-in-r/
{{Pre}}
myfunc <- function(a, b) a*b
myfunc(1, 2) # 2
myfunc(3, 5) # 15
myfunc(c(1,3), c(2,5)) # 2 15
Vectorize(myfunc)(c(1,3), c(2,5)) # 2 15

myfunc2 <- function(a, b) if (length(a) == 1) a * b else NA
myfunc2(1, 2) # 2
myfunc2(3, 5) # 15
myfunc2(c(1,3), c(2,5)) # NA
Vectorize(myfunc2)(c(1, 3), c(2, 5)) # 2 15

Vectorize(myfunc2)(c(1, 3, 6), c(2, 5)) # 2 15 12
                                        # the shorter argument is recycled (with a warning)
</pre>
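A sketch of a typical reason to call Vectorize(). The function rate() below is a hypothetical piecewise function written with if/else, so it only accepts a scalar. In recent R versions, rate(c(0, 2)) fails because `if` needs a single condition. Wrapping rate() with Vectorize() lets it be used where a vectorized function is required, such as integrate(), which calls its integrand with a whole vector of points. sapply() gives the same values, consistent with Vectorize() being built on mapply().

<syntaxhighlight lang='r'>
# A scalar-only function: 'if' needs a single TRUE/FALSE (hypothetical example)
rate <- function(t, cutoff = 1) {
  if (t < cutoff) 0.5 * t else 0.5 + (t - cutoff)
}

vrate <- Vectorize(rate)          # vectorizes over all arguments by default
tt <- c(0, 0.5, 1, 2)

v1 <- vrate(tt)                   # element-wise evaluation
v2 <- sapply(tt, rate)            # same values via sapply()
# integrate() passes a vector of points, so it needs the vectorized version
area <- integrate(vrate, lower = 0, upper = 2)$value

print(v1)
print(area)
</syntaxhighlight>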


==== RCaller ====
== plyr and dplyr packages ==
[https://peerj.com/collections/50-practicaldatascistats/ Practical Data Science for Stats - a PeerJ Collection]


==== RApache ====
[http://www.jstatsoft.org/v40/i01/paper The Split-Apply-Combine Strategy for Data Analysis] (plyr package) in J. Stat Software.
* http://www.stat.ucla.edu/~jeroen/files/seminar.pdf


==== [http://dirk.eddelbuettel.com/code/littler.html littler] ====
[http://seananderson.ca/courses/12-plyr/plyr_2012.pdf A quick introduction to plyr] with a summary of apply functions in R and compare them with functions in plyr package.
Provides hash-bang (#!) capability for R


[http://stackoverflow.com/questions/3205302/difference-between-rscript-and-littler Difference between Rscript and littler]
# plyr has a common syntax -- easier to remember
# plyr requires less code since it takes care of the input and output format
# plyr can easily be run in parallel -- faster


We can install littler in two ways:
* install.packages("littler"). This installs the latest version, but the binary 'r' program is only available under the package's bin directory (e.g. ''~/R/x86_64-pc-linux-gnu-library/3.4/littler/bin/r''). You need to create a soft link to make it accessible globally.
* sudo apt install littler. This installs 'r' globally; however, the packaged version may be old.

After the installation, the package vignette contains several examples. The offline vignette has a table of contents. Nice! The [http://dirk.eddelbuettel.com/code/littler.examples.html web version of examples] does not have the TOC.

'''r''' was not meant to run interactively like '''R'''. See ''man r''.

Tutorials
* [http://dplyr.tidyverse.org/articles/dplyr.html Introduction to dplyr] from http://dplyr.tidyverse.org/.
* A video of the [http://cran.r-project.org/web/packages/dplyr/index.html dplyr] package can be found on [http://vimeo.com/103872918 vimeo].
* [http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/ Hands-on dplyr tutorial for faster data manipulation in R] from dataschool.io.

Examples of using dplyr:
* [http://wiekvoet.blogspot.com/2015/03/medicines-under-evaluation.html Medicines under evaluation]
* [http://rpubs.com/seandavi/GEOMetadbSurvey2014 CBI GEO Metadata Survey]
* [http://datascienceplus.com/r-for-publication-by-page-piccinini-lesson-3-logistic-regression/ Logistic Regression] by Page Piccinini. mutate(), inner_join() and %>%.
* [http://rpubs.com/turnersd/plot-deseq-results-multipage-pdf DESeq2 post analysis] select(), gather(), arrange() and %>%.

=== [https://cran.r-project.org/web/packages/tibble/ tibble] ===
[https://www.r-bloggers.com/2024/08/tidy-dataframes-but-not-tibbles/ Tidy DataFrames but not Tibbles]

Tibble objects:
* do not have row names (cf. data frames);
* never change the type of the inputs (e.g. strings are never converted to factors);
* never change the names of variables.
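A minimal sketch of these properties, assuming the tibble package (version 2.0 or later, for the `rownames` argument of as_tibble()) is installed. The data are made up for illustration. It also shows that single-bracket column indexing keeps the result a tibble, so a plain vector should be extracted with `[[` or `$` before passing it to functions such as match().

```r
library(tibble)

# Made-up example: a column name containing a space, plus explicit row names
df <- data.frame(`my var` = c("a", "b", "c"), y = 1:3,
                 row.names = c("r1", "r2", "r3"))
names(df)          # "my.var" "y"   (data.frame() repairs the name)
has_rownames(df)   # TRUE

tb <- tibble(`my var` = c("a", "b", "c"), y = 1:3)
names(tb)          # "my var" "y"   (kept as is)
has_rownames(tb)   # FALSE

# as_tibble() drops row names by default; rownames = "id" keeps them as a column
tb2 <- as_tibble(df, rownames = "id")
names(tb2)         # "id" "my.var" "y"

# [ always returns a tibble, so extract a vector with [[ or $ before match()
inherits(tb[, 1], "tbl_df")   # TRUE
match("b", tb[[1]])           # 2
```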


To show all rows or columns of a tibble object,
<pre>
print(tbObj, n = Inf)
print(tbObj, width = Inf)
</pre>

If we extract a column with single brackets (e.g. tbObj[, 1]) and use it in match(), we get zero matches. The reason is that single-bracket indexing on a tibble always returns a tibble, not a vector.

'''Subsetting''': to [https://stackoverflow.com/questions/21618423/extract-a-dplyr-tbl-column-as-a-vector extract a column from a tibble object], use '''[[''' or '''$''' or dplyr::pull(). [https://www.datanovia.com/en/lessons/select-data-frame-columns-in-r/ Select Data Frame Columns in R].
{{Pre}}
TibbleObject$VarName
# OR
TibbleObject[["VarName"]]
# OR
pull(TibbleObject, VarName) # won't be a tibble object anymore

# For multiple columns, use select()
dplyr::select(TibbleObject, -c(VarName1, VarName2)) # still a tibble object
# OR
dplyr::select(TibbleObject, 2:5) # columns 2 to 5, by position
</pre>

'''Convert a data frame to a tibble'''. See [http://www.sthda.com/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data Tibble Data Format in R: Best and Modern Way to Work with Your Data].
<pre>
library(tibble)
my_data <- as_tibble(iris)
class(my_data)
</pre>

=== llply() ===
llply() is equivalent to lapply() except that it preserves labels and can display a progress bar, which is handy for long-running computations.
<pre>
library(org.Hs.eg.db)  # provides the org.Hs.egGO mapping
LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])
# plyr::llply(rLLID, function(x) get("org.Hs.egGO")[[x]], .progress = "text") adds a progress bar
</pre>
where rLLID is a list of Entrez Gene IDs. For example,
<pre>
get("org.Hs.egGO")[["6772"]]
</pre>
returns a list of 49 GOs.

=== ddply() ===
http://lamages.blogspot.com/2012/06/transforming-subsets-of-data-in-r-with.html

=== ldply() ===
[http://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication/ An R Script to Automatically download PubMed Citation Counts By Year of Publication]

=== Performance/speed comparison ===
[https://www.r-bloggers.com/2023/01/performance-comparison-of-converting-list-to-data-frame-with-r-language/ Performance comparison of converting list to data.frame with R language]

== Using R's set.seed() to set seeds for use in C/C++ (including Rcpp) ==
http://rorynolan.rbind.io/2018/09/30/rcsetseed/

=== get_seed() ===
See the same blog post.
{{Pre}}
get_seed <- function() {
  sample.int(.Machine$integer.max, 1)
}
</pre>
Note: .Machine$integer.max = 2147483647 = 2^31 - 1.

=== Random seeds ===
Initially there is no seed; when one is first required, R creates it from the current time and the process ID, so different sessions give different results by default. See [https://stat.ethz.ch/R-manual/R-patched/library/base/html/Random.html ?Random].
<pre>
set.seed(as.numeric(Sys.time()))  # seed from the current time (in seconds)

set.seed(as.numeric(Sys.Date()))  # same seed for each day
</pre>

=== .Machine and the largest integer, double ===
See [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/.Machine ?.Machine].
{{Pre}}
                          Linux/Mac  32-bit Windows 64-bit Windows
double.eps              2.220446e-16  2.220446e-16  2.220446e-16
double.neg.eps          1.110223e-16  1.110223e-16  1.110223e-16
double.xmin            2.225074e-308  2.225074e-308  2.225074e-308
double.xmax            1.797693e+308  1.797693e+308  1.797693e+308
double.base            2.000000e+00  2.000000e+00  2.000000e+00
double.digits          5.300000e+01  5.300000e+01  5.300000e+01
double.rounding        5.000000e+00  5.000000e+00  5.000000e+00
double.guard            0.000000e+00  0.000000e+00  0.000000e+00
double.ulp.digits      -5.200000e+01  -5.200000e+01  -5.200000e+01
double.neg.ulp.digits  -5.300000e+01  -5.300000e+01  -5.300000e+01
double.exponent        1.100000e+01  1.100000e+01  1.100000e+01
double.min.exp        -1.022000e+03  -1.022000e+03  -1.022000e+03
double.max.exp          1.024000e+03  1.024000e+03  1.024000e+03
integer.max            2.147484e+09  2.147484e+09  2.147484e+09
sizeof.long            8.000000e+00  4.000000e+00  4.000000e+00
sizeof.longlong        8.000000e+00  8.000000e+00  8.000000e+00
sizeof.longdouble      1.600000e+01  1.200000e+01  1.600000e+01
sizeof.pointer          8.000000e+00  4.000000e+00  8.000000e+00
</pre>

=== NA when overflow ===
<pre>
tmp <- 156287L
tmp*tmp
# [1] NA
# Warning message:
# In tmp * tmp : NAs produced by integer overflow
.Machine$integer.max
# [1] 2147483647
</pre>

==== RInside: Embed R in C++ ====
See [[R#RInside|RInside]]

(''From RInside documentation'') The RInside package makes it easier to embed R in your C++ applications. There is no code you would execute directly from the R environment. Rather, you write C++ programs that embed R, which is illustrated by some of the included examples.

The included examples are armadillo, eigen, mpi, qt, standard, threads and wt.

To run 'make' when we don't have a global R, we should modify the file <Makefile>. Also, if we just want to create one executable file, we can run, for example, 'make rinside_sample1'.

To run any of the resulting executables, we need to set the '''LD_LIBRARY_PATH''' variable so that R's shared libraries can be found, something like
<pre>export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib </pre>

The real build process looks like this (check <Makefile> for completeness):
<pre>
g++ -I/home/brb/Downloads/R-3.0.2/include \
    -I/home/brb/Downloads/R-3.0.2/library/Rcpp/include \
    -I/home/brb/Downloads/R-3.0.2/library/RInside/include -g -O2 -Wall \
    -I/usr/local/include  \
    rinside_sample0.cpp  \
    -L/home/brb/Downloads/R-3.0.2/lib -lR  -lRblas -lRlapack \
    -L/home/brb/Downloads/R-3.0.2/library/Rcpp/lib -lRcpp \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/Rcpp/lib \
    -L/home/brb/Downloads/R-3.0.2/library/RInside/lib -lRInside \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/RInside/lib \
    -o rinside_sample0
</pre>

Hello World example of embedding R in C++.
<pre>
#include <RInside.h>                    // for the embedded R via RInside

int main(int argc, char *argv[]) {

    RInside R(argc, argv);              // create an embedded R instance

    R["txt"] = "Hello, world!\n";       // assign a char* (string) to 'txt'

    R.parseEvalQ("cat(txt)");           // eval the init string, ignoring any returns

    exit(0);
}
</pre>

The above can be compared to the Hello world example in Qt (Qt 3 API).
<pre>
#include <qapplication.h>
#include <qpushbutton.h>

int main( int argc, char **argv )
{
    QApplication app( argc, argv );

    QPushButton hello( "Hello world!", 0 );
    hello.resize( 100, 30 );

    app.setMainWidget( &hello );   // Qt 3 only; removed in Qt 4
    hello.show();
    return app.exec();
}
</pre>


==== [http://www.rfortran.org/ RFortran] ====
RFortran is an open source project with the following aim:

''To provide an easy to use Fortran software library that enables Fortran programs to transfer data and commands to and from R.''

It works only on the Windows platform with Microsoft Visual Studio installed.

=== Call R from other languages ===
==== JRI ====
http://www.rforge.net/JRI/

== How to select a seed for simulation or randomization ==
* [https://sciprincess.wordpress.com/2019/03/14/how-to-select-a-seed-for-simulation-or-randomization/ How to select a seed for simulation or randomization]
* [https://www.makeuseof.com/tag/lesson-gamers-rng/ What Is RNG? A Lesson for Gamers ]

== set.seed() allow alphanumeric seeds ==
https://stackoverflow.com/a/10913336

== set.seed(), for loop and saving random seeds ==
<ul>
<li>[https://www.jottr.org/2020/09/21/detect-when-the-random-number-generator-was-used/ Detect When the Random Number Generator Was Used]
<pre>
if (interactive()) {
  invisible(addTaskCallback(local({
    last <- .GlobalEnv$.Random.seed
   
    function(...) {
      curr <- .GlobalEnv$.Random.seed
      if (!identical(curr, last)) {
        msg <- "NOTE: .Random.seed changed"
        if (requireNamespace("crayon", quietly=TRUE)) msg <- crayon::blurred(msg)
        message(msg)
        last <<- curr
      }
      TRUE
    }
  }), name = "RNG tracker"))
}
</pre>
</li>
<li>http://r.789695.n4.nabble.com/set-seed-and-for-loop-td3585857.html. This is useful when we want to debug a particular iteration.
<pre>
set.seed(1001)
data <- vector("list", 30)
seeds <- vector("list", 30)
for(i in 1:30) {
  seeds[[i]] <- .Random.seed
  data[[i]] <- runif(5)
}
# If we save and load .Random.seed from a file using scan(), make
# sure to convert its type from doubles to integers.
# Otherwise, .Random.seed will complain!

.Random.seed <- seeds[[23]]  # restore
data.23 <- runif(5)
data.23
data[[23]]
</pre>
</li>
</ul>
* [https://www.rdocumentation.org/packages/impute/versions/1.46.0/topics/impute.knn impute.knn]
* Duncan Murdoch: ''This works in this example, but wouldn't work with all RNGs, because some of them save state outside of .Random.seed.  See ?.Random.seed for details.''
* Uwe Ligges's comment: ''set.seed() actually generates a seed. See ?set.seed that points us to .Random.seed (and relevant references!) which contains the actual current seed.''
* Petr Savicky's comment is also useful in the situation when it is not difficult to re-generate the data.
* [http://www.questionflow.org/2019/08/13/local-randomness-in-r/ Local randomness in R].
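A minimal base-R sketch of "local randomness": evaluate an expression under a temporary seed, then restore the caller's `.Random.seed`, so the surrounding random stream is unaffected. The helper name `with_local_seed()` is hypothetical; withr::with_seed() packages the same idea. As Duncan Murdoch notes above, this assumes an RNG (such as the default Mersenne-Twister) whose whole state lives in `.Random.seed`.

```r
# Hypothetical helper: run `expr` with a temporary seed, then restore RNG state
with_local_seed <- function(seed, expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(".Random.seed", envir = genv)
    }
  }, add = TRUE)
  set.seed(seed)
  expr  # lazy evaluation: the expression is evaluated here, after set.seed()
}

set.seed(1001)
a <- runif(1)

set.seed(1001)
x1 <- with_local_seed(23, runif(3))
b <- runif(1)         # continues the 1001 stream as if nothing happened
identical(a, b)       # TRUE

x2 <- with_local_seed(23, runif(3))
identical(x1, x2)     # TRUE
```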


== sample() ==
=== sample() inaccurate on very large populations, fixed in R 3.6.0 ===
* [https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494 The default method for generating from a discrete uniform distribution (used in ‘sample()’, for instance) has been changed]. In prior versions, the probability of generating each integer could vary from equal by up to 0.04% (or possibly more if generating more than a million different integers). See also [https://www.r-bloggers.com/whats-new-in-r-3-6-0/amp/ What's new in R 3.6.0] by David Smith.
{{Pre}}
# R 3.5.3
set.seed(123)
m <- (2/5)*2^32
m > 2^31
# [1] FALSE
log10(m)
# [1] 9.23502
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2)
#      0      1
# 400070 599930
</pre>
* [https://blog.daqana.com/en/fast-sampling-support-in-dqrng/ Fast sampling support in dqrng]
* Differences in the output of sample() between R 3.5.3 and R 3.6.0
{{Pre}}
# R 3.5.3
# docker run --net=host -it --rm r-base:3.5.3
> set.seed(1234)
> sample(5)
[1] 1 3 2 4 5

# R 3.6.0
# docker run --net=host -it --rm r-base:3.6.0
> set.seed(1234)
> sample(5)
[1] 4 5 2 3 1
> RNGkind(sample.kind = "Rounding")
Warning message:
In RNGkind(sample.kind = "Rounding") : non-uniform 'Rounding' sampler used
> set.seed(1234)
> sample(5)
[1] 1 3 2 4 5
</pre>

=== Getting different results with set.seed() in RStudio ===
[https://community.rstudio.com/t/getting-different-results-with-set-seed/31624/2 Getting different results with set.seed()].  ''It's possible that you're loading an R package that is changing the requested random number generator; RNGkind().''

=== dplyr::sample_n() ===
The function has a [https://dplyr.tidyverse.org/reference/sample.html weight] parameter. For example, if we have download statistics for each day and want to sample days according to their download numbers, we can use this function.

==== rpy2 ====
http://rpy.sourceforge.net/rpy2.html

=== Create a standalone Rmath library ===
R has many math and statistical functions. We can easily use these functions in our C/C++/Fortran code. The definitive guide is Chapter 9 "The standalone Rmath library" of the [http://cran.r-project.org/doc/manuals/R-admin.html#The-standalone-Rmath-library R-admin manual].

Here is my experience based on R 3.0.2 on Windows OS.

==== Create a static library <libRmath.a> and a dynamic library <Rmath.dll> ====
Suppose we have downloaded the R source code and built R from source. See [[R#Build_R_from_its_source|Build_R_from_its_source]]. Then the following two commands generate the files <libRmath.a> and <Rmath.dll> under the C:\R\R-3.0.2\src\nmath\standalone directory.
<pre>
cd C:\R\R-3.0.2\src\nmath\standalone
make -f Makefile.win
</pre>
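Before linking C/C++ code against the new library, it helps to know which values the test program <RmathEx1.cpp> used below should print; R itself exposes the same distribution functions. A minimal sketch, assuming the Rmath.h prototypes `pnorm(x, mu, sigma, lower_tail, log_p)` and `pchisq(x, df, lower_tail, log_p)`. Note that R's pchisq() has an extra `ncp` argument in third position, so positional C-style calls do not carry over; the R calls below name their arguments.

```r
# R's pnorm() has the same argument order as the C version;
# R's pchisq() inserts 'ncp' as its 3rd argument, so name the arguments.
p_norm  <- pnorm(1, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
p_chisq <- pchisq(1, df = 1, lower.tail = TRUE, log.p = FALSE)

# 6 significant digits, the default precision of C++ std::cout
print(p_norm, digits = 6)    # [1] 0.841345
print(p_chisq, digits = 6)   # [1] 0.682689

# Sanity check: chi-squared(1) is Z^2, so P(Z^2 <= 1) = 2 * P(Z <= 1) - 1
all.equal(p_chisq, 2 * p_norm - 1)   # TRUE
```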


==== Use Rmath library in our code ====
<pre>
set CPLUS_INCLUDE_PATH=C:\R\R-3.0.2\src\include
set LIBRARY_PATH=C:\R\R-3.0.2\src\nmath\standalone
# It is not LD_LIBRARY_PATH in above.

# Created <RmathEx1.cpp> from the book "Statistical Computing in C++ and R" web site
# http://math.la.asu.edu/~eubank/CandR/ch4Code.cpp
# It is OK to save the cpp file under any directory.

# Force to link against the static library <libRmath.a>
g++ RmathEx1.cpp -lRmath -lm -o RmathEx1.exe
# OR
g++ RmathEx1.cpp -Wl,-Bstatic -lRmath -lm -o RmathEx1.exe

# Force to link against dynamic library <Rmath.dll>
g++ RmathEx1.cpp Rmath.dll -lm -o RmathEx1Dll.exe
</pre>
Test the executable program. Note that the executable program ''RmathEx1.exe'' can be copied to and run on another computer without R installed. Isn't that cool!
<pre>
c:\R>RmathEx1
Enter an argument for the normal cdf:
1
Enter an argument for the chi-squared cdf:
1
Prob(Z <= 1) = 0.841345
Prob(Chi^2 <= 1)= 0.682689
</pre>

Below is the cpp program <RmathEx1.cpp>.
<pre>
//RmathEx1.cpp
#define MATHLIB_STANDALONE
#include <iostream>
#include "Rmath.h"

using std::cout; using std::cin; using std::endl;

int main()
{
  double x1, x2;
  cout << "Enter an argument for the normal cdf:" << endl;
  cin >> x1;
  cout << "Enter an argument for the chi-squared cdf:" << endl;
  cin >> x2;

  cout << "Prob(Z <= " << x1 << ") = " <<
    pnorm(x1, 0, 1, 1, 0) << endl;
  cout << "Prob(Chi^2 <= " << x2 << ")= " <<
    pchisq(x2, 1, 1, 0) << endl;
  return 0;
}
</pre>

== Regular Expression ==
See [[Regular_expression|here]].

== Read rrd file ==
* https://en.wikipedia.org/wiki/RRDtool
* http://oss.oetiker.ch/rrdtool/
* https://github.com/pldimitrov/Rrd
* http://plamendimitrov.net/blog/2014/08/09/r-package-for-working-with-rrd-files/

== on.exit() ==
Examples of using on.exit(). In these examples, '''add = TRUE''' is used so that each exit action is appended to the list of actions performed when the function exits, rather than replacing the previous actions.
<ul>
<li>Database connections
<pre>
library(DBI)
library(RSQLite)
sqlite_get_query <- function(db, sql) {
  conn <- dbConnect(RSQLite::SQLite(), db)
  on.exit(dbDisconnect(conn), add = TRUE)
  dbGetQuery(conn, sql)
}
</pre>
</li>
<li>File connections
<pre>
read_chars <- function(file_name) {
  conn <- file(file_name, "r")
  on.exit(close(conn), add = TRUE)
  readChar(conn, file.info(file_name)$size)
}
</pre>
</li>
<li>Temporary files
<pre>
history_lines <- function() {
  f <- tempfile()
  on.exit(unlink(f), add = TRUE)
  savehistory(f)
  readLines(f, encoding = "UTF-8")
}
</pre>
</li>
<li>Printing messages
<pre>
myfun = function(x) {
  on.exit(print("first"))
  on.exit(print("second"), add = TRUE)  # runs after "first"
  return(x)
}
</pre>
</li>
</ul>

== file, connection ==
* [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/cat cat()] and [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/scan scan()] (read data into a vector or list from the console or file)
* write() (base R has no read(); use scan() to read)
* read.table() and write.table()
{{Pre}}
out = file('tmp.txt', 'w')
writeLines("abcd", out)
writeLines("eeeeee", out)
close(out)
readLines('tmp.txt')
unlink('tmp.txt')
args(writeLines)
# function (text, con = stdout(), sep = "\n", useBytes = FALSE)

foo <- function() {
   con <- file()
   ...
   on.exit(close(con))
   ...
}
</pre>

[https://r.789695.n4.nabble.com/Why-I-get-this-error-Error-in-close-connection-f-invalid-connection-td904413.html Error in close.connection(f) : invalid connection]. If we want to use '''close(con)''', we have to specify how to '''open''' the connection, because read.table()/read.delim() closes (and thereby destroys) a connection that it had to open itself. For example:
<pre>
con <- gzfile(FileName, "r") # Or gzfile(FileName, open = 'r')
x <- read.delim(con)
close(con)
</pre>
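Combining the two ideas above, a minimal sketch of a reader that opens a gzip connection explicitly and lets on.exit() close it, so the connection is released even if reading fails. The helper name `read_gz_table()` and the temporary file contents are made up for illustration.

```r
# Hypothetical helper: we open the connection, so we are responsible for closing it
read_gz_table <- function(path) {
  con <- gzfile(path, open = "r")
  on.exit(close(con), add = TRUE)
  read.delim(con)   # read.delim() leaves an already-open connection open
}

# Write a small gzip-compressed, tab-delimited example file
path <- tempfile(fileext = ".txt.gz")
out <- gzfile(path, open = "w")
writeLines(c("gene\tcount", "g1\t10", "g2\t20"), out)
close(out)

x <- read_gz_table(path)
x
#   gene count
# 1   g1    10
# 2   g2    20
unlink(path)
```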


=== withr package ===
https://cran.r-project.org/web/packages/withr/index.html . It is a suggested package (reverse suggests) of [https://cran.r-project.org/web/packages/languageserver/index.html languageserver].

=== Calling R.dll directly ===
See Section 8.2.2 of [http://cran.r-project.org/doc/manuals/R-exts.html#Calling-R_002edll-directly Writing R Extensions]. This is related to embedding R under Windows. The file <R.dll> on Windows is like <libR.so> on Linux.


=== [https://bookdown.org/ bookdown.org] ===
The website hosts many open-source books written with R Markdown.
* [https://blog.rstudio.org/2016/12/02/announcing-bookdown/ Announce bookdown]
* [https://bookdown.org/yihui/bookdown/ bookdown package]: Authoring Books and Technical Documents with R Markdown
* [http://brettklamer.com/diversions/statistical/compile-r-for-data-science-to-a-pdf/ Compile R for Data Science to a PDF]

==== Writing a R book and self-publishing it in Amazon ====
https://msperlin.github.io/2017-02-16-Writing-a-book/

=== Scheduling R Markdown Reports via Email ===
http://www.analyticsforfun.com/2016/01/scheduling-r-markdown-reports-via-email.html

== Clipboard (?connections), textConnection(), pipe() ==
<ul>
<li>On Windows, we can use readClipboard() and writeClipboard().
{{Pre}}
source("clipboard")
read.table("clipboard")
</pre></li>
<li>Clipboard -> R. Reading/writing the clipboard on macOS. Use the [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/textConnection textConnection()] function:
{{Pre}}
x <- read.delim(textConnection("<USE_KEYBOARD_TO_PASTE_FROM_CLIPBOARD>"))
# Or on Mac
x <- read.delim(pipe("pbpaste"))
# safely ignore the warning: incomplete final line found by readTableHeader on 'pbpaste'
</pre>
An example is to copy data from [https://stackoverflow.com/questions/28426026/plotting-boxplots-of-multiple-y-variables-using-ggplot2-qplot-or-others?answertab=active#tab-top this post]. In this case we need to use read.table() instead of read.delim().
</li>
<li>R -> clipboard on Mac. Note: '''pbcopy''' and '''pbpaste''' are macOS terminal commands. See [http://osxdaily.com/2007/03/05/manipulating-the-clipboard-from-the-command-line/ pbcopy & pbpaste: Manipulating the Clipboard from the Command Line].
* pbcopy: takes standard input and places it in the clipboard buffer
* pbpaste: takes data from the clipboard buffer and writes it to the standard output
{{Pre}}
clip <- pipe("pbcopy", "w")
write.table(apply(x, 1, mean), file = clip, row.names = FALSE, col.names = FALSE)
# write.table(data.frame(Var1, Var2), file = clip, row.names = FALSE, quote = FALSE, sep = "\t")
close(clip)
</pre>
</li>
<li>Clipboard -> Excel.
* Method 1: Paste icon -> Text import wizard -> Delimited (Tab, uncheck Space) or Fixed width depending on the situation -> Finish.
* Method 2: Ctrl+v first. Then choose Data -> Text to Columns. Fixed width -> Next -> Next -> Finish.
</li>
<li>On Linux, we need to install "xclip". See [https://stackoverflow.com/questions/45799496/r-copy-from-clipboard-in-ubuntu-linux R Copy from Clipboard in Ubuntu Linux]. It seems to work.
{{Pre}}
# sudo apt-get install xclip
read.table(pipe("xclip -selection clipboard -o", open = "r"))
</pre>
</li>
</ul>

=== clipr ===
[https://cran.rstudio.com/web/packages/clipr/ clipr]: Read and Write from the System Clipboard

== read/manipulate binary data ==
* x <- readBin(fn, raw(), file.info(fn)$size)
* rawToChar(x[1:16])
* See Biostrings C API

== String Manipulation ==
* [https://www.gastonsanchez.com/r4strings/ Handling Strings with R] (ebook) by Gaston Sanchez.
* [http://blog.revolutionanalytics.com/2018/06/handling-strings-with-r.html A guide to working with character data in R] (6/22/2018)
* Chapter 7 of the book 'Data Manipulation with R' by Phil Spector.
* Chapter 7 of the book 'R Cookbook' by Paul Teetor.
* Chapter 2 of the book 'Using R for Data Management, Statistical Analysis and Graphics' by Horton and Kleinman.
* http://www.endmemo.com/program/R/deparse.php. '''It includes lots of examples for each R function it lists.'''
* [http://theautomatic.net/2019/05/17/four-ways-to-reverse-a-string-in-r/ Four ways to reverse a string in R]
* [https://statisticaloddsandends.wordpress.com/2022/05/05/a-short-note-on-the-startswith-function/ A short note on the startsWith function]
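A minimal sketch of two of the tasks linked above, using only base R: reversing a string in two ways (not necessarily the same ways as in the linked post) and testing prefixes and suffixes with startsWith()/endsWith() (available since R 3.3.0) instead of a regular expression. The example strings are made up.

```r
x <- "stressed"

# 1. split into single characters, reverse, paste back together
rev1 <- paste(rev(strsplit(x, "")[[1]]), collapse = "")

# 2. reverse the Unicode code points
rev2 <- intToUtf8(rev(utf8ToInt(x)))

rev1                    # "desserts"
identical(rev1, rev2)   # TRUE

# Prefix/suffix tests without grepl("^chr", ...)
startsWith(c("chr1", "scaffold_2", "chr10"), "chr")   # TRUE FALSE TRUE
endsWith("data.csv", ".csv")                          # TRUE
```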


=== format(): padding with zero ===
<pre>
ngenes <- 10
genenames <- paste0("bm", gsub(" ", "0", format(1:ngenes))); genenames
#  [1] "bm01" "bm02" "bm03" "bm04" "bm05" "bm06" "bm07" "bm08" "bm09" "bm10"
</pre>

=== noquote() ===
[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/noquote noquote]: print character strings without quotes.

=== stringr package ===
* https://stringr.tidyverse.org/index.html
* [https://stringr.tidyverse.org/articles/from-base.html Vignette compares stringr functions to their base R equivalents]
* When I tried trimws() on data obtained from readxl::read_excel(), trimws() did not remove the whitespace but [https://stringr.tidyverse.org/reference/str_trim.html stringr::str_trim()] did. See [https://stackoverflow.com/questions/45050617/trimws-bug-leading-whitespace-not-removed trimws bug? leading whitespace not removed].

=== glue package ===
<ul>
<li>[https://cran.r-project.org/web/packages/glue/index.html glue]. Useful in loops and in functions such as ggtitle() or ggsave(). The expression inside the curly braces {R-Expression} is evaluated.
<syntaxhighlight lang='r'>
library(glue)
name <- "John"
age <- 30
glue("My name is {name} and I am {age} years old.")
# My name is John and I am 30 years old.

price <- 9.99
quantity <- 3
total <- glue("The total cost is {round(price * quantity, 2)}.")
# Inside the curly braces {}, the expression round(price * quantity, 2) is evaluated.
print(total)
# The total cost is 29.97.
</syntaxhighlight>
The syntax of glue() is quite similar to Python's formatted string literals ([https://www.pythontutorial.net/python-basics/python-f-strings/ f-strings]), which embed variables and expressions inside curly braces.
<syntaxhighlight lang='python'>
name = "John"
age = 30
print(f"My name is {name} and I am {age} years old.")
# My name is John and I am 30 years old.

price = 9.99
quantity = 3
total = f"The total cost is {price * quantity:.2f}."
print(total)
# The total cost is 29.97.
</syntaxhighlight>
</li>
<li>[https://en.wikipedia.org/wiki/String_interpolation String interpolation] </li>
</ul>

=== Raw data type ===
[https://twitter.com/hadleywickham/status/1387747735441395712 Fun with strings], [https://en.wikipedia.org/wiki/Cyrillic_alphabets Cyrillic alphabets]
<pre>
a1 <- "А"
a2 <- "A"
a1 == a2
# [1] FALSE
charToRaw("А")
# [1] d0 90
charToRaw("A")
# [1] 41
</pre>

=== number of characters limit ===
[https://twitter.com/eddelbuettel/status/1438326822635180036 It's a limit on a (single) input line in the REPL]

=== Comparing strings to numeric ===
[https://stackoverflow.com/a/57348393 ">" coerces the number to a string before comparing].
<syntaxhighlight lang='r' inline>"10" < 2 # TRUE</syntaxhighlight>

== HTTPs connection ==
HTTPS connections became the default in R 3.2.2. See
* http://blog.rstudio.org/2015/08/17/secure-https-connections-for-r/
* http://blog.revolutionanalytics.com/2015/08/good-advice-for-security-with-r.html

[http://developer.r-project.org/blosxom.cgi/R-devel/2016/12/15#n2016-12-15 R 3.3.2 patched] The internal methods of ‘download.file()’ and ‘url()’ now report if they are unable to follow the redirection of a ‘http://’ URL to a ‘https://’ URL (rather than failing silently).

== setInternet2 ==
There was a bug in ftp downloading in R 3.2.2 (r69053) on Windows, though it is fixed now in the R 3.2 patched branch.

Read the [https://stat.ethz.ch/pipermail/r-devel/2015-August/071595.html discussion] reported on 8/8/2015. The error only happened with ftp, not http, connections. The final solution is explained in [https://stat.ethz.ch/pipermail/r-devel/2015-August/071623.html this post]. The following demonstrates the original problem.
<pre>
url <- paste0("ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/",
              "GCF_000001405.13.assembly.txt")
f1 <- tempfile()
download.file(url, f1)
</pre>
It seems the bug was fixed in the R 3.2 branch. See the [https://github.com/wch/r-source/commit/3a02ed3a50ba17d9a093b315bf5f31ffc0e21b89 8/16/2015] patch r69089, where a new argument INTERNET_FLAG_PASSIVE was added to the [https://msdn.microsoft.com/en-us/library/windows/desktop/aa385098%28v=vs.85%29.aspx InternetOpenUrl()] function of the [https://msdn.microsoft.com/en-us/library/windows/desktop/aa385473%28v=vs.85%29.aspx wininet] library. [http://slacksite.com/other/ftp.html This article] and [http://stackoverflow.com/questions/1699145/what-is-the-difference-between-active-and-passive-ftp this post] explain the differences between active and passive FTP.

The following R command shows the exact svn revision of the R you are currently using.
<pre>
R.Version()$"svn rev"
</pre>

If setInternet2(T), then the https protocol is supported in download.file().

When setInternet2(T) is enabled by default, download.file() does not work for the ftp protocol (this is used in the getGEO() function of the GEOquery package). If I use setInternet2(F), download.file() works again for the ftp protocol.

The setInternet2() function is defined in [https://github.com/wch/r-source/commits/trunk/src/library/utils/R/windows/sysutils.R R> src> library> utils > R > windows > sysutils.R].

'''R up to 3.2.2'''
<pre>
setInternet2 <- function(use = TRUE) .Internal(useInternet2(use))
</pre>
See also
* <src/include/Internal.h> (declare do_setInternet2()),
* <src/main/names.c> (show do_setInternet2() in C)
* <src/main/internet.c>  (define do_setInternet2() in C).

Note that setInternet2(T) became the default in R 3.2.2. To revert to the previous default, use setInternet2(FALSE). See the <doc/NEWS.pdf> file. Using setInternet2(F) avoids the getGEO() error, but it disables https file downloads with the download.file() function. In R < 3.2.2, it is also possible to download from https by setInternet2(T).

'''R 3.3.0'''
<pre>
setInternet2 <- function(use = TRUE) {
    if(!is.na(use)) stop("use != NA is defunct")
    NA
}
</pre>
Note that setInternet2.Rd says "As from R 3.3.0 it changes nothing, and only use = NA is accepted". Also NEWS.Rd says setInternet2() has no effect and will be removed in due course.

=== Create presentation file (beamer) ===
* http://rmarkdown.rstudio.com/beamer_presentation_format.html
* http://www.theresearchkitchen.com/archives/1017 (markdown and presentation files)
* http://rmarkdown.rstudio.com/

# Create the Rmd file in RStudio via File -> R Markdown. Select Presentation and choose PDF (Beamer) as the output format.
# Edit the template created by RStudio.
# Click the 'Knit PDF' button (Ctrl+Shift+K) to create and display the PDF file.

An example of Rmd is
<pre>
---
title: "My Example"
author: You Know Me
date: Dec 32, 2014
output: beamer_presentation
---

## R Markdown

This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents.
For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any
embedded R code chunks within the document.

## Slide with Bullets

- Bullet 1
- Bullet 2
- Bullet 3. Mean is $\frac{1}{n} \sum_{i=1}^n x_i$.
$$
\mu = \frac{1}{n} \sum_{i=1}^n x_i
$$

## New slide

![picture of BDGE](/home/brb/Pictures/BDGEFinished.png)

## Slide with R Code and Output

```{r}
summary(cars)
```

## Slide with Plot

```{r, echo=FALSE}
plot(cars)
```
</pre>

=== Create HTML report ===
[http://www.bioconductor.org/packages/release/bioc/html/ReportingTools.html ReportingTools] (Jason Hackney) from Bioconductor.

==== [http://cran.r-project.org/web/packages/htmlTable/index.html htmlTable] package ====
The htmlTable package is intended for generating tables using HTML formatting. This format is compatible with Markdown when used for HTML output. The most basic table can easily be created by just passing a matrix or a data.frame to the htmlTable() function.
* http://cran.r-project.org/web/packages/htmlTable/vignettes/general.html
* http://gforge.se/2014/01/fast-track-publishing-using-knitr-part-iv/

==== formattable ====
http://www.magesblog.com/2016/01/formatting-table-output-in-r.html

==== [https://github.com/crubba/htmltab htmltab] package ====
This package is NOT used to CREATE html reports but to EXTRACT html tables.

==== [http://cran.r-project.org/web/packages/ztable/index.html ztable] package ====
Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.

=== Create academic report ===
[http://cran.r-project.org/web/packages/reports/index.html reports] package in CRAN and in [https://github.com/trinker/reports github] repository. The youtube video gives an overview of the package.

=== Create pdf and epub files ===
<syntaxhighlight lang='rsplus'>
# Idea:
#        knitr        pdflatex
#  rnw -------> tex ----------> pdf
library(knitr)
knit("example.rnw") # create example.tex file
</syntaxhighlight>

* A very simple example <002-minimal.Rnw> from [http://yihui.name/knitr/demo/minimal/ yihui.name] works fine on Linux.
<syntaxhighlight lang='bash'>
git clone https://github.com/yihui/knitr-examples.git
</syntaxhighlight>
* <knitr-minimal.Rnw>. I had no problem creating the pdf file on Windows but still could not generate the pdf on Linux from the tex file. Some people suggested running '''sudo apt-get install texlive-fonts-recommended''' to install the missing fonts. It works!


To see a real example, check out the [http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html DESeq2] package (inst/doc subdirectory). In addition to DESeq2, I also need to install the '''DESeq, BiocStyle, airway, vsn, gplots''', and '''pasilla''' packages from Bioconductor. Note that it is best to use a sudo/admin account to install packages.

Or start with a markdown file. Download the example <001-minimal.Rmd> and remove the last line, which downloads a png file from the internet.
<syntaxhighlight lang='bash'>
# Idea:
#        knitr        pandoc
#  rmd -------> md ----------> pdf

git clone https://github.com/yihui/knitr-examples.git
cd knitr-examples
R -e "library(knitr); knit('001-minimal.Rmd')"
pandoc 001-minimal.md -o 001-minimal.pdf # requires pdflatex to be installed !!
</syntaxhighlight>

== Finite, Infinite and NaN Numbers: is.finite(), is.infinite(), is.nan() ==
In R, basically all mathematical functions (including basic arithmetic) are supposed to work properly with +/- '''Inf''' and '''NaN''' as input or output.

See [https://stat.ethz.ch/R-manual/R-devel/library/base/html/is.finite.html ?is.finite].

[https://datasciencetut.com/how-to-replace-inf-values-with-na-in-r/ How to replace Inf with NA in All or Specific Columns of the Data Frame]
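A minimal sketch of how is.finite(), is.infinite(), is.nan() and is.na() classify the special values, followed by one base-R way (not necessarily the linked article's) to replace +/-Inf with NA in every numeric column of a data frame. The data frame is made up for illustration; note that NaN is left untouched because is.infinite(NaN) is FALSE.

```r
x <- c(1, -1, 0, NA) / 0
x                    # Inf -Inf  NaN   NA
is.finite(x)         # FALSE FALSE FALSE FALSE
is.infinite(x)       #  TRUE  TRUE FALSE FALSE
is.nan(x)            # FALSE FALSE  TRUE FALSE
is.na(x)             # FALSE FALSE  TRUE  TRUE  (NaN also counts as NA)

# Replace +/-Inf by NA in all numeric columns; character columns are skipped
df <- data.frame(a = c(1, Inf, 3), b = c(-Inf, 2, NaN), s = c("u", "v", "w"))
num <- vapply(df, is.numeric, logical(1))
df[num] <- lapply(df[num], function(col) replace(col, is.infinite(col), NA))
df
```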


To create an epub file (not success yet on Windows OS, missing figures on Linux OS)
== replace() function ==
<syntaxhighlight lang='rsplus'>
* [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/replace replace](vector, index, values)
# Idea:
* https://stackoverflow.com/a/11811147
#        knitr        pandoc
#  rnw -------> tex ----------> markdown or epub


library(knitr)
== File/path operations ==
knit("DESeq2.Rnw") # create DESeq2.tex
* list.files(, include.dirs =F, recursive = T, pattern = "\\.csv$", all.files = TRUE)
system("pandoc  -f latex -t markdown -o DESeq2.md DESeq2.tex")
* file.info()
</syntaxhighlight>
* dir.create()
* file.create()
* file.copy()
* file.exists()
<ul>
<li>'''basename'''() - remove the parent path, '''dirname'''() - returns the part of the path up to but excluding the last path separator
<pre>
> file.path("~", "Downloads")
[1] "~/Downloads"
> dirname(file.path("~", "Downloads"))
[1] "/home/brb"
> basename(file.path("~", "Downloads"))
[1] "Downloads"
</pre>
</li></ul>
* '''path.expand'''("~/.Renviron")  # "/home/brb/.Renviron"
<ul>
<li> '''normalizePath'''() # Express File Paths in Canonical Form
<pre>
> cat(normalizePath(c(R.home(), tempdir())), sep = "\n")
/usr/lib/R
/tmp/RtmpzvDhAe
</pre>
</li>
<li>[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/system.file system.file()] - Finds the full file names of files in packages etc
<pre>
> system.file("extdata", "ex1.bam", package="Rsamtools")
[1] "/home/brb/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools/extdata/ex1.bam"
</pre>
</li></ul>
* tools::file_path_sans_ext() - [https://stackoverflow.com/a/29114021 remove the file extension] or the sub() function.


== read/download/source a file from internet ==
=== Simple text file http ===
<pre>
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
</pre>
<pre>
## Windows OS, epub cannot be built
pandoc:
Error:
"source" (line 41, column 7):
unexpected "k"
expecting "{document}"


## Linux OS, epub missing figures and R codes.
## First install texlive base and extra packages
## sudo apt-get install texlive-latex-base texlive-latex-extra
pandoc: Could not find media `figure/SchwederSpjotvoll-1', skipping...
pandoc: Could not find media `figure/sortedP-1', skipping...
pandoc: Could not find media `figure/figHeatmap2c-1', skipping...
pandoc: Could not find media `figure/figHeatmap2b-1', skipping...
pandoc: Could not find media `figure/figHeatmap2a-1', skipping...
pandoc: Could not find media `figure/plotCountsAdv-1', skipping...
pandoc: Could not find media `figure/plotCounts-1', skipping...
pandoc: Could not find media `figure/MA-1', skipping...
pandoc: Could not find media `figure/MANoPrior-1', skipping...
</pre>
The problems include at least:
* figures need to be generated under the same directory as the source code
* figures cannot be in PDF format (DESeq2 generates figures in both PDF and PNG formats)
* the R code is missing


Convert tex to epub
* http://tex.stackexchange.com/questions/156668/tex-to-epub-conversion


=== Zip, RData, gz file and url() function ===
<pre>
x <- read.delim(gzfile("filename.txt.gz"), nrows=10)
</pre>
<pre>
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
</pre>
Here the url() function works like file(), gzfile(), bzfile(), xzfile(), unz(), pipe(), fifo() and socketConnection(): all of them create connections. By default a connection is not opened (except for socketConnection()), but it can be opened by giving a non-empty value to the argument ‘open’. See ?url.
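A self-contained sketch of reading a gzip-compressed file through a gzfile() connection, without needing the internet. It first writes a small tab-delimited file to a temporary .gz path (the data are made up), then reads part of it back.
<syntaxhighlight lang='rsplus'>
tmp <- tempfile(fileext = ".txt.gz")

# Write a small tab-delimited table through a gzip connection
con <- gzfile(tmp, open = "w")
write.table(data.frame(id = 1:3, score = c(2.5, 3.1, 4.0)),
            con, sep = "\t", row.names = FALSE, quote = FALSE)
close(con)

# read.delim() opens the unopened gzfile() connection and closes it when done
x <- read.delim(gzfile(tmp), nrows = 2)
x

# Raw decompressed lines; open and close the connection explicitly
con2 <- gzfile(tmp)
lines <- readLines(con2)
close(con2)
lines
</syntaxhighlight>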


Another example is [https://stackoverflow.com/a/9548672 Read gzipped csv directly from a url in R]
<pre>
con <- gzcon(url(paste("http://dumps.wikimedia.org/other/articlefeedback/",
                      "aa_combined-20110321.csv.gz", sep="")))
txt <- readLines(con)
close(con)
dat <- read.csv(textConnection(txt))
</pre>

Another example of using url() is
<pre>
load(url("http://www.example.com/example.RData"))
</pre>

This does not work with load(), dget() or read.table() for files on '''OneDrive'''. In fact, I cannot even use wget with files shared from OneDrive. The following trick works: [https://mangolassi.it/topic/19276/how-to-configure-a-onedrive-file-for-use-with-wget How to configure a OneDrive file for use with wget].
 
'''Dropbox''' is easy and works with load(), wget, etc.

For .RData files on GitHub, see [https://stackoverflow.com/a/46875562 R download .RData] or [https://stackoverflow.com/a/56670130 Directly loading .RData from github].


=== Create Word report ===
 
==== knitr + pandoc ====
* http://www.r-statistics.com/2013/03/write-ms-word-document-using-r-with-as-little-overhead-as-possible/
* http://www.carlboettiger.info/2012/04/07/writing-reproducibly-in-the-open-with-knitr.html
* http://rmarkdown.rstudio.com/articles_docx.html
 
It is easier to create the Rmd file in RStudio, which provides a template for Rmd files and a quick reference to the R Markdown language.
<pre>
# Idea:
#        knitr      pandoc
#  rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd") # create md and html files
</pre>
and then
<pre>
FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))
</pre>
Note. For some reason, if I run the above two commands several times, knit2html() stops working properly. Clicking the 'Knit HTML' button in RStudio makes it work again.


Another way is
<pre>
library(knitr)
library(pander)
name = "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
Pandoc.brew(file = paste0(name, ".md"), output = name, convert = "docx")
</pre>


Once knitr has created the md file, we can use the pandoc shell command to convert it to other formats:
* A pdf file: pandoc -s report.md -t latex -o report.pdf
* A html file: pandoc -s report.md -o report.html (use the -c flag to add a CSS file)
* Openoffice: pandoc report.md -o report.odt
* Word docx: pandoc report.md -o report.docx


We can also create an epub file for reading on a Kobo e-reader. For example, download [https://gist.github.com/jeromyanglim/2716336 this file] and save it as example.Rmd. I had to remove the line containing the link to http://i.imgur.com/RVNmr.jpg because it causes an error when I run pandoc (possibly because my pandoc version is too old). Then just run these 2 lines to get the epub file. Amazing!
<pre>
library(knitr)
knit("example.Rmd")
pandoc("example.md", format="epub")
</pre>


PS. If we don't remove the link, we get the following error message (pandoc 1.10.1 on Windows 7):
<pre>
> pandoc("Rmd_to_Epub.md", format="epub")

executing pandoc  -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"
pandoc.exe: .\.\http://i.imgur.com/RVNmr.jpg: openBinaryFile: invalid argument (Invalid argument)
Error in (function (input, format, ext, cfg) : conversion failed
In addition: Warning message:
running command 'pandoc  -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"' had status 1
</pre>


=== zip function ===
This includes the 'hallmarkFiles' root folder in the paths of the files inside the zip.
<pre>
zip(zipfile = 'myFile.zip',
    files = dir('hallmarkFiles', full.names = TRUE))

# Verify/view the files. 'list = TRUE' won't extract
unzip('myFile.zip', list = TRUE)
</pre>


=== [http://cran.r-project.org/web/packages/downloader/index.html downloader] package ===
This package provides a wrapper for the download.file function, making it possible to download files over https on Windows, Mac OS X, and other Unix-like platforms. The RCurl package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.
 
=== Google drive file based on https using [http://www.omegahat.org/RCurl/FAQ.html RCurl] package ===
{{Pre}}
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))
</pre>
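The read.csv(textConnection(...)) pattern used with getURL() parses text that is already in memory. A minimal offline sketch with made-up lines standing in for downloaded content; the `text =` argument of read.csv()/read.table() is an equivalent shortcut that also closes its connection for you.
<syntaxhighlight lang='rsplus'>
# Hypothetical CSV content, e.g. what getURL() or readLines() might return
txt <- c("gene,count",
         "TP53,12",
         "BRCA1,7")

# Via an explicit text connection (close it when done)
tc <- textConnection(txt)
dat1 <- read.csv(tc)
close(tc)

# Same result with the text= argument
dat2 <- read.csv(text = txt)
all.equal(dat1, dat2)
dat1
</syntaxhighlight>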


==== pander ====
Try pandoc[1] with a minimal reproducible example; you might also give my "[http://cran.r-project.org/web/packages/pander/ pander]" package [2] a try:
<pre>
library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')
</pre>
The content of the "minimal.brew" file is something you might have
got used to with Sweave, although it uses "brew" syntax instead. See
the examples of pander [3] for more details. Please note that pandoc should
be installed first, which is pretty easy on Windows.


# http://johnmacfarlane.net/pandoc/
# http://rapporter.github.com/pander/
# http://rapporter.github.com/pander/#examples


=== Google sheet file using [https://github.com/jennybc/googlesheets googlesheets] package ===
[http://www.opiniomics.org/reading-data-from-google-sheets-into-r/ Reading data from google sheets into R]


=== Github files https using RCurl package ===
* http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
* http://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
<pre>
library(RCurl)
x = getURL("https://gist.github.com/arraytools/6671098/raw/c4cb0ca6fe78054da8dbe253a05f7046270d5693/GeneIDs.txt",
            ssl.verifypeer = FALSE)
read.table(text=x)
</pre>
* [http://cran.r-project.org/web/packages/gistr/index.html gistr] package
 
== data summary table ==
=== summarytools: create summary tables for vectors and data frames ===
https://github.com/dcomtois/summarytools. R Package for quickly and neatly summarizing vectors and data frames.


=== skimr: A frictionless, pipeable approach to dealing with summary statistics ===
[https://ropensci.org/blog/2017/07/11/skimr/ skimr for useful and tidy summary statistics]


==== R2wd ====
Use the [http://cran.r-project.org/web/packages/R2wd/ R2wd] package. However, only 32-bit R is supported, and sometimes it cannot produce all the tables.
<pre>
> library(R2wd)
> wdGet()
Loading required package: rcom
Loading required package: rscproxy
rcom requires a current version of statconnDCOM installed.
To install statconnDCOM type
    installstatconnDCOM()

This will download and install the current version of statconnDCOM

You will need a working Internet connection
because installation needs to download a file.
Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() :
   argument is of length zero
</pre>


The solution is to launch 32-bit R instead of 64-bit R, since statconnDCOM does not support 64-bit R.


=== modelsummary ===
[https://cloud.r-project.org/web/packages/modelsummary/index.html modelsummary]: Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready


=== broom ===
[[Tidyverse#broom|Tidyverse->broom]]


=== Create publication tables using '''tables''' package ===
See p13 for an example [http://www.ianwatson.com.au/stata/tabout_tutorial.pdf#page=13 here].
 
R's [http://cran.r-project.org/web/packages/tables/index.html tables] package is the best solution. For example,
{{Pre}}
> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
+          (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )
                                                 
                Sepal.Length      Sepal.Width     
 Species    n   mean         sd   mean        sd  
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':  150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
</pre>
and
<pre>
# This example shows some of the less common options       
> Sex <- factor(sample(c("Male", "Female"), 100, rep=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, rep=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
  s <- format(x, digits=2)
  even <- ((1:length(s)) %% 2) == 0
  s[even] <- sprintf("(%s)", s[even])
  s
}
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )
                  Status               
Sex    Statistic high   low    medium
Female mean        4.88   4.96   5.17
       sd        (1.20) (0.82) (1.35)
Male   mean        4.45   4.31   5.05
       sd        (1.01) (0.93) (0.75)
</pre>
=== fgsea example ===
[http://www.bioconductor.org/packages/release/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html  vignette] & [https://github.com/ctlab/fgsea/blob/master/R/plot.R#L28 source code]


==== Convert from pdf to word ====
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert


==== rtf ====
Use the [http://cran.r-project.org/web/packages/rtf/ rtf] package for Rich Text Format (RTF) output.


==== [https://www.rdocumentation.org/packages/xtable/versions/1.8-2 xtable] ====
Package xtable will produce html output. <syntaxhighlight lang='rsplus'>print(xtable(X), type="html")</syntaxhighlight>
If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.


=== (archived) ClinReport: Statistical Reporting in Clinical Trials ===
https://cran.r-project.org/web/packages/ClinReport/index.html


== Append figures to PDF files ==
[https://stackoverflow.com/a/13274272 How to append a plot to an existing pdf file]. Hint: use the recordPlot() function.


== Save base graphics as pseudo-objects ==
[https://www.andrewheiss.com/blog/2016/12/08/save-base-graphics-as-pseudo-objects-in-r/ Save base graphics as pseudo-objects in R]. Note that this approach has some drawbacks.
<pre>
# df is not defined in this snippet; hypothetical example data:
set.seed(1)
df <- data.frame(x = 1:100, y = rnorm(100))

pdf(NULL)
dev.control(displaylist="enable")
plot(df$x, df$y)
text(40, 0, "Random")
text(60, 2, "Text")
lines(stats::lowess(df$x, df$y))
p1.base <- recordPlot()
invisible(dev.off())

# Display the saved plot
grid::grid.newpage()
p1.base
</pre>
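Base R's pdf() device overwrites an existing file rather than appending to it. One workaround in the spirit of the recordPlot() hint is to record plots first and then replay them, one per page, into a single new PDF. A minimal sketch; the plots and the temporary file name are placeholders.
<syntaxhighlight lang='rsplus'>
# Record a base-graphics plot on a null device, without drawing to the screen
pdf(NULL)
dev.control(displaylist = "enable")
plot(1:10, main = "Recorded plot")
p_saved <- recordPlot()
invisible(dev.off())

# Write a multi-page PDF: a fresh plot first, then the recorded plot
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(cars, main = "Page 1")
replayPlot(p_saved)          # replays the recorded plot as the next page
invisible(dev.off())

file.exists(out)
</syntaxhighlight>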


==== [http://cran.r-project.org/web/packages/ReporteRs/index.html ReporteRs] ====
Microsoft Word, Microsoft Powerpoint and HTML documents generation from R. The source code is hosted on https://github.com/davidgohel/ReporteRs


[https://statbandit.wordpress.com/2016/10/28/a-quick-exploration-of-reporters/ A quick exploration]


=== R Graphs Gallery ===
* [https://www.facebook.com/pages/R-Graph-Gallery/169231589826661 Romain François]
* [http://shinyapps.stat.ubc.ca/r-graph-catalog/ R Graph Catalog] written using R + Shiny. The source code is available on [https://github.com/jennybc/r-graph-catalog Github].
* Forest plot. See the packages [https://cran.r-project.org/web/packages/rmeta/index.html rmeta] and [https://cran.r-project.org/web/packages/forestplot/ forestplot]. A forest plot can show quantities such as the relative risk (with 95% CI) in survival data.


== Extracting tables from PDFs ==  
<ul>
<li>[http://datascienceplus.com/extracting-tables-from-pdfs-in-r-using-the-tabulizer-package/ extracting Tables from PDFs in R] using Tabulizer. This needs the [https://cran.r-project.org/web/packages/rJava/index.html rJava] package. Linux works fine, but an issue appeared on my macOS 10.12 Sierra: '''Library not loaded: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/libjvm.dylib. Referenced from: /Users/XXXXXXX/Library/R/3.5/library/rJava/libs/rJava.so'''.
</li>
<li>
[https://docs.ropensci.org/pdftools/ pdftools] - Text Extraction, Rendering and Converting of PDF Documents. [https://ropensci.org/technotes/2018/12/14/pdftools-20/ pdf_text() and pdf_data()] functions.  
{{Pre}}
library(pdftools)
pdf_file <- "https://github.com/ropensci/tabulizer/raw/master/inst/examples/data.pdf"
txt <- pdf_text(pdf_file) # length = number of pages
# Suppose the table we are interested in is on page 1
cat(txt[1]) # Good but not in a data frame format

pdf_data(pdf_file)[[1]]  # data frame/tibble format
</pre>
However, it does not seem to work on [http://www.bloodjournal.org/content/109/8/3177/tab-figures-only Table S6]; the tabulizer package handles this case better.


This is another example: [https://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247490327&idx=1&sn=cca7d4423426318e0c23adb098cf0ad7&chksm=9b485bacac3fd2ba2196b380c59b5eab9d29795d3334b040f50a2fa58124ec6e3be9472829e0&scene=21#wechat_redirect Amazing skill: automated batch extraction of tables from PDFs (神技能-自动化批量从PDF里面提取表格)]
</li>
<li>[https://www.linuxuprising.com/2019/05/how-to-convert-pdf-to-text-on-linux-gui.html?m=1 How To Convert PDF To Text On Linux (GUI And Command Line)]. It worked when I tested it on my PDF file.
{{Pre}}
sudo apt install poppler-utils
pdftotext -layout input.pdf output.txt
pdftotext -layout -f 3 -l 4 input.pdf output.txt # from page 3 to 4.
</pre>
</li>
<li>[https://www.adobe.com/acrobat/how-to/pdf-to-excel-xlsx-converter.html Convert PDF files into Excel spreadsheets] using Adobe Acrobat. See [https://helpx.adobe.com/acrobat/how-to/extract-pages-from-pdf.html How to extract pages from a PDF]. Note that the PDF file cannot be opened directly in Excel; it is a binary format that Excel does not recognize.</li>
<li>I found it easier to copy a column (this works) from the PDF and paste it into Excel.</li>
<li>[https://www.r-bloggers.com/2024/04/tabulapdf-extract-tables-from-pdf-documents/ tabulapdf: Extract Tables from PDF Documents]</li>
</ul>


== Print tables ==


=== addmargins() ===
* [https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/addmargins addmargins]. Puts Arbitrary Margins On Multidimensional Tables Or Arrays.
* [https://datasciencetut.com/how-to-put-margins-on-tables-or-arrays-in-r/ How to put margins on tables or arrays in R?]


=== tableone ===
* https://cran.r-project.org/web/packages/tableone/
* [https://datascienceplus.com/table-1-and-the-characteristics-of-study-population/ Table 1 and the Characteristics of Study Population]
* [https://www.jianshu.com/p/e76f2b708d45 How to quickly draw Table 1 of a paper (a three-line table of baseline characteristics)?]
* See Table 1 from [https://boiled-data.github.io/ClassificationDiabetes.html Tidymodels Machine Learning: Diabetes Classification]


=== Some examples ===
Cox models
* [https://aacrjournals.org/clincancerres/article/27/12/3383/671420/Integrative-Genomic-Analysis-of-Gemcitabine Integrative Genomic Analysis of Gemcitabine Resistance in Pancreatic Cancer by Patient-derived Xenograft Models]


=== COM client or server ===


==== Client ====
[http://www.omegahat.org/RDCOMClient/ RDCOMClient], on which [http://cran.r-project.org/web/packages/excel.link/index.html excel.link] depends.


==== Server ====
[http://www.omegahat.org/RDCOMServer/ RDCOMServer]
 
=== Use R under proxy ===
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
 
=== RStudio ===
* [https://github.com/rstudio/rstudio Github]
* Installing RStudio (1.0.44) on Ubuntu does not install Java, even though 37.5% of the source code is Java.
* [https://www.rstudio.com/products/rstudio/download/preview/ Preview]


==== rstudio.cloud ====
https://rstudio.cloud/


==== Launch RStudio ====
If multiple versions of R are detected, RStudio may fail to launch: a java-like clock keeps spinning without stopping. The trick is to hold down the Ctrl key while clicking RStudio. A dialog then appears for choosing which R to use.


[[File:RStudio.jpg|100px]]


==== Create .Rproj file ====
If you have an existing package that doesn't have an .Rproj file, you can use devtools::use_rstudio("path/to/package") to add it (newer versions provide usethis::use_rstudio() instead).
 
With an RStudio project file, you can
* Restore .RData into workspace at startup
* Save workspace to .RData on exit
* Always save history (even if no saving .RData)
* etc


==== package search ====
https://github.com/RhoInc/CRANsearcher


==== Git ====
* (Video) [https://www.rstudio.com/resources/videos/happy-git-and-gihub-for-the-user-tutorial/ Happy Git and GitHub for the useR – Tutorial]
* [https://owi.usgs.gov/blog/beyond-basic-git/ Beyond Basic R - Version Control with Git]


=== Visual Studio ===
[http://blog.revolutionanalytics.com/2017/05/r-and-python-support-now-built-in-to-visual-studio-2017.html R and Python support now built in to Visual Studio 2017]


=== finalfit package ===
* https://cran.r-project.org/web/packages/finalfit/index.html. Lots of vignettes.
** [https://cran.r-project.org/web/packages/finalfit/vignettes/survival.html Survival]. It fits both univariate and multivariate regressions and reports the results for both of them.
* [https://finalfit.org/index.html summary_factorlist()] from the finalfit package.
* [https://www.r-bloggers.com/2018/05/elegant-regression-results-tables-and-plots-in-r-the-finalfit-package/ Elegant regression results tables and plots in R: the finalfit package]


=== table1 ===
* https://cran.r-project.org/web/packages/table1/
* [https://www.rdatagen.net/post/2023-09-26-nice-looking-table-1-with-standardized-mean-difference/ Creating a nice looking Table 1 with standardized mean differences (SMD)]. SMD is the difference in group means divided by the pooled standard deviation (and is defined differently for categorical measures). Note that the pooled standard deviation defined here differs from the one used by '''[[T-test#Two_sample_test_assuming_equal_variance|t.test]]''' when we assume equal variances in the two samples.


=== gtsummary ===
* [https://education.rstudio.com/blog/2020/07/gtsummary/ Presentation-Ready Summary Tables with gtsummary]
* [https://www.danieldsjoberg.com/gtsummary/ gtsummary] & on [https://cloud.r-project.org/web/packages/gtsummary/index.html CRAN]  
** [https://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html tbl_summary()]. The output appears in the RStudio "Viewer" pane.
* An example: [https://boiled-data.github.io/ClassificationDiabetes.html Tidymodels Machine Learning: Diabetes Classification]. The table is saved in a png file, and the column variable is the response.


=== gt* ===
* [https://cran.r-project.org/web/packages/gt/index.html gt]: Easily Create Presentation-Ready Display Tables
* [https://www.r-bloggers.com/2024/02/introduction-to-clinical-tables-with-the-gt-package/ Introduction to Clinical Tables with the {gt} Package]
* [https://www.youtube.com/watch?v=qFOFMed18T4 Add any Plot to your {gt} table]


=== dplyr ===
https://stackoverflow.com/a/34587522. The output includes counts and proportions in a publication-like fashion.


=== tables::tabular() ===


=== gmodels::CrossTable() ===
https://www.statmethods.net/stats/frequencies.html


=== List files using regular expression ===
* Extension
<pre>
list.files(pattern = "\\.txt$")
</pre>
where the dot (.) is a metacharacter that matches any character, so it is escaped as \\. to match a literal dot; $ anchors the match at the end of the file name.
* Start with
<pre>
list.files(pattern = "^Something")
</pre>


Using '''Sys.glob()''' as
<pre>
> Sys.glob("~/Downloads/*.txt")
[1] "/home/brb/Downloads/ip.txt"      "/home/brb/Downloads/valgrind.txt"
</pre>


=== base::prop.table(x, margin) ===
[http://developer.r-project.org/blosxom.cgi/R-devel/2020/02/13#n2020-02-13 New functions ‘proportions()’ and ‘marginSums()’. These should replace the unfortunately named ‘prop.table()’ and ‘margin.table()’.] for R 4.0.0.
<pre>
R> m <- matrix(1:4, 2)
R> prop.table(m, 1) # row proportions
          [,1]      [,2]
[1,] 0.2500000 0.7500000
[2,] 0.3333333 0.6666667
R> prop.table(m, 2) # column proportions
          [,1]      [,2]
[1,] 0.3333333 0.4285714
[2,] 0.6666667 0.5714286
</pre>


=== stats::xtabs() ===
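A minimal sketch of stats::xtabs() with the built-in mtcars data, followed by addmargins() and the R >= 4.0.0 helpers proportions() and marginSums() (prop.table() and margin.table() give the same numbers).
<syntaxhighlight lang='rsplus'>
# Two-way contingency table: number of cars by cylinders and gears
tab <- xtabs(~ cyl + gear, data = mtcars)
tab

addmargins(tab)                         # adds a "Sum" row and column
marginSums(tab, 1)                      # row totals, like margin.table(tab, 1)
round(proportions(tab, margin = 1), 2)  # row proportions, like prop.table(tab, 1)
</syntaxhighlight>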


=== stats::ftable() ===
{{Pre}}
> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age                   
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex                    
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class                 
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am              
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8   
        am   0  1  0  1  0  1
vs gear                      
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
> 
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
         dnn = c("Cylinders", "V/S", "Transmission", "Gears"))

          Cylinders     4     6     8   
          Transmission  0  1  0  1  0  1
V/S Gears                               
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0
</pre>


=== Hidden tool: rsync in Rtools ===
<pre>
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
a.exe

sent 323142 bytes received 31 bytes 646346.00 bytes/sec
total size is 1198416 speedup is 3.71

c:\Rtools\bin>
</pre>
rsync works best when we need to sync a folder.
<pre>
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/Eula.txt
binary/cherrytree.lnk
binary/depends64.chm
binary/depends64.dll
binary/depends64.exe
binary/mtputty.exe
binary/procexp.chm
binary/procexp.exe
binary/pscp.exe
binary/putty.exe
binary/sqlite3.exe
binary/wget.exe

sent 4115294 bytes received 244 bytes 1175868.00 bytes/sec
total size is 8036311 speedup is 1.95

c:\Rtools\bin>rm c:\users\limingc\Documents\binary\procexp.exe
cygwin warning:
  MS-DOS style path detected: c:\users\limingc\Documents\binary\procexp.exe
  Preferred POSIX equivalent is: /cygdrive/c/users/limingc/Documents/binary/procexp.exe
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/procexp.exe

sent 1767277 bytes received 35 bytes 3534624.00 bytes/sec
total size is 8036311 speedup is 4.55

c:\Rtools\bin>
</pre>

Unfortunately, if the destination is a network drive, I may get a "permission denied (13)" error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows

=== Install rgdal package (geospatial Data) on ubuntu ===
Terminal
<syntaxhighlight lang='bash'>
sudo apt-get install libgdal1-dev libproj-dev
</syntaxhighlight>

R
<syntaxhighlight lang='rsplus'>
install.packages("rgdal")
</syntaxhighlight>

=== Set up Emacs on Windows ===
Edit the file ''C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el'' with something like
<pre>
(setq-default inferior-R-program-name
              "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")
</pre>

=== Database ===
[http://blog.revolutionanalytics.com/2017/08/a-modern-database-interface-for-r.html A modern database interface for R]

==== [http://cran.r-project.org/web/packages/RSQLite/index.html RSQLite] ====
* https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html
* https://github.com/rstats-db/RSQLite

'''Creating a new database''':
<syntaxhighlight lang='rsplus'>
library(DBI)
mydb <- dbConnect(RSQLite::SQLite(), "my-db.sqlite")
dbDisconnect(mydb)
unlink("my-db.sqlite")

# temporary database
mydb <- dbConnect(RSQLite::SQLite(), "")
dbDisconnect(mydb)
</syntaxhighlight>


'''Loading data''':
<syntaxhighlight lang='rsplus'>
mydb <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(mydb, "mtcars", mtcars)
dbWriteTable(mydb, "iris", iris)

dbListTables(mydb)

dbListFields(mydb, "mtcars")

dbReadTable(mydb, "mtcars")
</syntaxhighlight>


'''Queries''':
<syntaxhighlight lang='rsplus'>
dbGetQuery(mydb, 'SELECT * FROM mtcars LIMIT 5')

dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < 4.6')

dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < :x', params = list(x = 4.6))

res <- dbSendQuery(mydb, "SELECT * FROM mtcars WHERE cyl = 4")
dbFetch(res)
dbClearResult(res)
</syntaxhighlight>


'''Batched queries''':
<syntaxhighlight lang='rsplus'>
rs <- dbSendQuery(mydb, 'SELECT * FROM mtcars')
while (!dbHasCompleted(rs)) {
  df <- dbFetch(rs, n = 10)
  print(nrow(df))
}
dbClearResult(rs)
</syntaxhighlight>


'''Multiple parameterised queries''':
<syntaxhighlight lang='rsplus'>
rs <- dbSendQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" = :x')
dbBind(rs, param = list(x = seq(4, 4.4, by = 0.1)))
nrow(dbFetch(rs))
#> [1] 4
dbClearResult(rs)
</syntaxhighlight>


'''Statements''':
<syntaxhighlight lang='rsplus'>
dbExecute(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < 4')
#> [1] 0
rs <- dbSendStatement(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < :x')
dbBind(rs, param = list(x = 4.5))
dbGetRowsAffected(rs)
#> [1] 4
dbClearResult(rs)
</syntaxhighlight>


==== [https://cran.r-project.org/web/packages/sqldf/ sqldf] ====
Manipulate R data frames using SQL. Depends on RSQLite. [http://datascienceplus.com/a-use-of-gsub-reshape2-and-sqldf-with-healthcare-data/ A use of gsub, reshape2 and sqldf with healthcare data]


==== [https://cran.r-project.org/web/packages/RPostgreSQL/index.html RPostgreSQL] ====


==== [[MySQL#Use_through_R|RMySQL]] ====
* http://datascienceplus.com/bringing-the-powers-of-sql-into-r/
* See [[MySQL#Installation|here]] about the installation of the required package ('''libmysqlclient-dev''') in Ubuntu.


==== MongoDB ====
* http://www.r-bloggers.com/r-and-mongodb/
* http://watson.nci.nih.gov/~sdavis/blog/rmongodb-using-R-with-mongo/


==== odbc ====


== tracemem, data type, copy ==
[http://stackoverflow.com/questions/18359940/r-programming-vector-a1-2-avoid-copying-the-whole-vector/18361181#18361181 How to avoid copying a long vector]


== Tell if the current R is running in 32-bit or 64-bit mode ==
<pre>
8 * .Machine$sizeof.pointer
</pre>
where '''sizeof.pointer''' is the number of bytes in a C SEXP (pointer) type and 8 is the number of bits per byte. The result is 64 on a 64-bit build and 32 on a 32-bit build.


== 32- and 64-bit ==
See [http://cran.r-project.org/doc/manuals/R-admin.html#Choosing-between-32_002d-and-64_002dbit-builds R-admin.html].
* For speed you may want to use a 32-bit build, but to handle large datasets a 64-bit build.
* Even on 64-bit builds of R there are limits on the size of R objects, some of which stem from the use of 32-bit integers (especially in FORTRAN code). For example, the dimensions of an array are limited to 2^31 - 1.
* Since R 2.15.0, it is possible to select '64-bit Files' from the standard installer even on a 32-bit version of Windows (2012/3/30).


== Handling length 2^31 and more in R 3.0.0 ==
From R News for 3.0.0 release:

''There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.''


In R 2.15.2, trying to create a vector of length 2^31 gives an error:
<pre>
> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector
</pre>


However, for R 3.0.0 (tested on my 64-bit Ubuntu machine with 16 GB RAM, using R compiled by myself):
<pre>
> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
>
</pre>
Note:
# A vector of length 2^31 has about 2 billion (2 Giga) elements. Stored as doubles it takes about 16 GB (2^31*8/2^20 MB) of memory.
# On Windows, it is almost impossible to work with data of length 2^31 if the machine has less than 16 GB of memory, because virtual memory on Windows does not work well. For example, when I tested on my 12 GB Windows 7 machine, the whole system froze for several minutes before I forced a power-off.
# My slide at http://goo.gl/g7sGX shows screenshots of running the above command on my Ubuntu and RHEL machines. As you can see, Linux is pretty good at handling data larger than system RAM. That said, as long as your Linux system is 64-bit, you can probably work on large data without too much pain.
# For large datasets, it makes sense to use a database or specially crafted packages like [http://cran.r-project.org/web/packages/bigmemory/ bigmemory] or [http://cran.r-project.org/web/packages/ff/ ff] or [https://privefl.github.io/bigstatsr/ bigstatsr].
# [https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17330 [[<- for index 2^31 fails]
 
== NA in index ==
* Question: what is seq(1, 3)[c(1, 2, NA)]?
Answer: 1 2 NA. The NA index is kept and returns NA for that element.


* Question: What is TRUE & NA?
Answer: NA


* Question: What is FALSE & NA?
Answer: FALSE


* Question: c("A", "B", NA) != "" ?
Answer: TRUE TRUE NA


* Question: which(c("A", "B", NA) != "") ?
Answer: 1 2


* Question: c(1, 2, NA) != "" & !is.na(c(1, 2, NA)) ?
Answer: TRUE TRUE FALSE


* Question: c("A", "B", NA) != "" & !is.na(c("A", "B", NA)) ?
Answer: TRUE TRUE FALSE


'''Conclusion''': To exclude empty strings or NAs for numeric or character data, we can use '''which()''' or a convenience function '''keep.complete <- function(x) x != "" & !is.na(x)'''. This always returns logical values without NAs.


Don't rely on x != "" alone or on !is.na(x) alone.


==== RODBC ====
=== Some functions ===
* X %>% [https://tidyr.tidyverse.org/reference/drop_na.html tidyr::drop_na()]
* '''stats::na.omit()''' and '''stats::complete.cases()'''. [https://statisticsglobe.com/na-omit-r-example/ NA Omit in R | 3 Example Codes for na.omit (Data Frame, Vector & by Column)]
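The answers above can be checked with a short, self-contained snippet (the data frame `df` is made up for illustration). It also shows why `x != ""` alone is not enough: used as an index, its NA produces an NA element instead of dropping it.

<syntaxhighlight lang='rsplus'>
x <- c("A", "B", NA)

seq(1, 3)[c(1, 2, NA)]   # 1 2 NA: an NA index returns NA
TRUE & NA                # NA: the result depends on the unknown value
FALSE & NA               # FALSE: FALSE whatever the unknown value is
x != ""                  # TRUE TRUE NA
x[x != ""]               # "A" "B" NA: the NA survives logical subsetting
which(x != "")           # 1 2: which() drops the NA

keep.complete <- function(x) x != "" & !is.na(x)
keep.complete(x)         # TRUE TRUE FALSE
x[keep.complete(x)]      # "A" "B"

# Row-wise NA handling for a data frame
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
complete.cases(df)       # TRUE FALSE FALSE
na.omit(df)              # only the first row is left
</syntaxhighlight>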


== Constant and 'L' ==
Add 'L' after a whole-number constant to make it an integer (rather than a double) literal. For example,
{{Pre}}
for(i in 1L:n) { }
if (max.lines > 0L) { }
label <- paste0(n-i+1L, ": ")
n <- length(x);  if(n == 0L) { }
</pre>

== Vector/Arrays ==
R indexes arrays from 1 like Fortran, not from 0 like C or Python.

=== remove integer(0) ===
[https://stackoverflow.com/a/27980810 How to remove integer(0) from a vector?]

=== Append some elements ===
[https://www.r-bloggers.com/2023/09/3-r-functions-that-i-enjoy/ append() and its after argument]

=== setNames() ===
Assign names to a vector
<pre>
z <- setNames(1:3, c("a", "b", "c"))
# OR
z <- 1:3; names(z) <- c("a", "b", "c")
# OR
z <- c("a"=1, "b"=2, "c"=3) # does not work if the names are computed, e.g. stored in x[1], x[2], x[3]
</pre>

== Factor ==
=== labels argument ===
We can specify the factor levels and new labels using the factor() function.
{{Pre}}
sex <- factor(sex, levels = c("0", "1"), labels = c("Male", "Female"))
drug_treatment <- factor(drug_treatment, levels = c("Placebo", "Low dose", "High dose"))
health_status <- factor(health_status, levels = c("Healthy", "Alzheimer's"))

factor(rev(letters[1:3]), labels = c("A", "B", "C"))
# C B A
# Levels: A B C
</pre>

=== Create a factor/categorical variable from a continuous variable: cut() and dplyr::case_when() ===
* [https://www.spsanderson.com/steveondata/posts/2024-03-20/index.html Mastering Data Segmentation: A Guide to Using the cut() Function in R]
:<syntaxhighlight lang='r'>
cut(
    c(0, 10, 30),
    breaks = c(0, 30, 50, Inf),
    labels = c("Young", "Middle-aged", "Elderly")
) # Default include.lowest = FALSE
# [1] <NA>  Young Young
</syntaxhighlight>
* https://dplyr.tidyverse.org/reference/case_when.html
* [https://rpubs.com/DaveRosenman/ifelsealternative Using dplyr’s mutate and case_when functions as alternative for if else statement]
* [http://www.datasciencemadesimple.com/case-statement-r-using-case_when-dplyr/ Case when in R using case_when() Dplyr – case_when in R]
* [https://predictivehacks.com/how-to-convert-continuous-variables-into-categorical-by-creating-bins/ How To Convert Continuous Variables Into Categorical By Creating Bins]
<ul>
<li>[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/cut ?cut]
{{Pre}}
set.seed(1)
x <- rnorm(100)
# min(x) itself falls outside (min(x), -1] and becomes NA
facVar <- cut(x, c(min(x), -1, 1, max(x)), labels = c("low", "medium", "high"))
table(facVar, useNA = "ifany")
# facVar
#  low medium  high  <NA>
#    10    74    15      1
</pre>
Note the option '''include.lowest = TRUE''' is needed when we use cut() + quantile(); otherwise the smallest value becomes NA because the intervals are of the form '''(a, b]'''.
<pre>
x2 <- cut(x, quantile(x, 0:2/2), include.lowest = TRUE) # split x into 2 levels
x2 <- cut(x, quantile(x, 0:3/3), include.lowest = TRUE) # split x into 3 levels

library(tidyverse); library(magrittr)
set.seed(1)
breaks <- quantile(runif(100), probs=seq(0, 1, len=20))
x <- runif(50)
bins <- cut(x, breaks=unique(breaks), include.lowest=TRUE, right=TRUE)
data.frame(sc=x, bins=bins) %>%
  group_by(bins) %>%
  summarise(n=n()) %>%
  ggplot(aes(x = bins, y = n)) +
    geom_col(color = "black", fill = "#90AACB") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90)) +
    theme(legend.position = "none") + coord_flip()
</pre>
<li>[https://www.spsanderson.com/steveondata/posts/2024-03-20/index.html A Guide to Using the cut() Function in R]
<li>[https://youtu.be/7oyiPBjLAWY?t=2480 tibble object]
{{Pre}}
library(tidyverse)
tibble(age_yrs = c(0, 4, 10, 15, 24, 55),
      age_cat = case_when(
          age_yrs < 2 ~ "baby",
          age_yrs < 13 ~ "kid",
          age_yrs < 20 ~ "teen",
          TRUE        ~ "adult")
)
</pre>
</li>
<li>[https://youtu.be/JsNqXLl3eFc?t=96 R tip: Learn dplyr’s case_when() function]
<pre>
case_when(
  condition1 ~ value1,
  condition2 ~ value2,
  TRUE ~ ValueAnythingElse
)
# Example
case_when(
  x %% 2 == 0 ~ "even",
  x %% 2 == 1 ~ "odd",
  TRUE ~ "Neither even nor odd"
)
</pre>
</ul>

==== DBI ====

==== [https://cran.r-project.org/web/packages/dbplyr/index.html dbplyr] ====
* To use databases with dplyr, you first need to install dbplyr.
* https://db.rstudio.com/dplyr/
* Five commonly used backends: RMySQL, RPostgreSQL, RSQLite, ODBC, bigrquery.
* http://www.datacarpentry.org/R-ecology-lesson/05-r-and-databases.html

'''Create a new SQLite database''':
<syntaxhighlight lang='rsplus'>
library(dplyr)
surveys <- read.csv("data/surveys.csv")
plots <- read.csv("data/plots.csv")

my_db_file <- "portal-database.sqlite"
my_db <- src_sqlite(my_db_file, create = TRUE)  # src_sqlite() is deprecated since dplyr 1.0.0

copy_to(my_db, surveys)
copy_to(my_db, plots)
my_db
</syntaxhighlight>

'''Connect to a database''':
<syntaxhighlight lang='rsplus'>
download.file(url = "https://ndownloader.figshare.com/files/2292171",
              destfile = "portal_mammals.sqlite", mode = "wb")

library(dbplyr)
library(dplyr)
mammals <- src_sqlite("portal_mammals.sqlite")
</syntaxhighlight>

'''Querying the database with the SQL syntax''':
<syntaxhighlight lang='rsplus'>
tbl(mammals, sql("SELECT year, species_id, plot_id FROM surveys"))
</syntaxhighlight>

'''Querying the database with the dplyr syntax''':
<syntaxhighlight lang='rsplus'>
surveys <- tbl(mammals, "surveys")
surveys %>%
    select(year, species_id, plot_id)
head(surveys, n = 10)

show_query(head(surveys, n = 10)) # show which SQL commands are actually sent to the database
</syntaxhighlight>

'''Simple database queries''':
<syntaxhighlight lang='rsplus'>
surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)
</syntaxhighlight>

'''Laziness''' (dbplyr queries are lazy; collect() forces the query to run and pulls the result into R):
<syntaxhighlight lang='rsplus'>
data_subset <- surveys %>%
   filter(weight < 5) %>%
   select(species_id, sex, weight) %>%
   collect()
</syntaxhighlight>


'''Complex database queries''':
<syntaxhighlight lang='rsplus'>
plots <- tbl(mammals, "plots")
plots # The plot_id column features in the plots table

surveys # The plot_id column also features in the surveys table

# Join databases method 1
plots %>%
  filter(plot_id == 1) %>%
  inner_join(surveys, by = "plot_id") %>%
  collect()
</syntaxhighlight>

=== How to change one of the level to NA ===
https://stackoverflow.com/a/25354985. Note that the level itself is removed from the factor's levels, and the affected values become NA.
<pre>
x <- factor(c("a", "b", "c", "NotPerformed"))
levels(x)[levels(x) == 'NotPerformed'] <- NA
levels(x)  # "a" "b" "c"; x[4] is now NA
</pre>

[https://webbedfeet.netlify.app/post/creating-missing-values-in-factors/ Creating missing values in factors]

=== Concatenating two factor vectors ===
Not trivial. [https://stackoverflow.com/a/5068939 How to concatenate factors, without them being converted to integer level?].
<pre>
unlist(list(f1, f2))
# unlist(list(factor(letters[1:5]), factor(letters[5:2])))
</pre>
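A minimal sketch of the `unlist()` approach with two made-up factors. `unlist()` handles a list of factors specially: the result is a factor whose levels are the union of the input levels. To my knowledge, since R 4.1.0 `c(f1, f2)` also returns a factor (older versions returned the integer codes), but going through character works on any version.

<syntaxhighlight lang='rsplus'>
f1 <- factor(c("a", "b"))
f2 <- factor(c("c", "a"))

# unlist() combines a list of factors into one factor
f <- unlist(list(f1, f2))
f
# [1] a b c a
# Levels: a b c

# Version-independent alternative: go through character
g <- factor(c(as.character(f1), as.character(f2)))
levels(g)  # "a" "b" "c"
</syntaxhighlight>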


==== NoSQL ====
[https://ropensci.org/technotes/2018/01/25/nodbi/ nodbi: the NoSQL Database Connector]

=== droplevels() ===
[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/droplevels droplevels()]: drop unused levels from a factor or, more commonly, from factors in a data frame.
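A short illustration with made-up data of why droplevels() is needed: subsetting keeps all of the original levels, so tables and plots still show the empty categories.

<syntaxhighlight lang='rsplus'>
f <- factor(c("a", "b", "c"))
f2 <- f[f != "c"]
levels(f2)              # "a" "b" "c": the unused level is kept
table(f2)               # level c has a count of 0
levels(droplevels(f2))  # "a" "b"

# Same issue after subsetting a data frame
df <- data.frame(g = factor(c("x", "y", "z")), v = 1:3)
sub <- subset(df, v < 3)
levels(sub$g)              # "x" "y" "z"
levels(droplevels(sub)$g)  # "x" "y"
</syntaxhighlight>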


=== Github ===
==== R source  ====
https://github.com/wch/r-source/ is updated daily and is worth checking regularly. Click '''1000+ commits''' to see the daily changes.

If we are interested in a certain branch (say 3.2), look for R-3-2-branch.

==== R packages (only) source (metacran) ====
* https://github.com/cran/ by [https://github.com/gaborcsardi Gábor Csárdi], the author of the '''[http://igraph.org/ igraph]''' software.

==== Bioconductor packages source ====
<strike>[https://stat.ethz.ch/pipermail/bioc-devel/2015-June/007675.html Announcement], https://github.com/Bioconductor-mirror </strike>

==== Send local repository to Github in R by using reports package ====
http://www.youtube.com/watch?v=WdOI_-aZV0Y

==== My collection ====
* https://github.com/arraytools
* https://gist.github.com/4383351 heatmap using leukemia data
* https://gist.github.com/4382774 heatmap using sequential data
* https://gist.github.com/4484270 biocLite

=== factor(x , levels = ...) vs levels(x) <-  ===
<span style="color: red">Note: [https://stat.ethz.ch/R-manual/R-devel/library/base/html/levels.html levels(x)] <- ... sets/renames the levels; it does not reorder them.</span> Use <s>'''relevel()'''</s> or '''factor()''' to reorder.

{| class="wikitable"
! Function !! Effect
|-
| [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/levels levels()]<br />[https://www.rdocumentation.org/packages/plyr/versions/1.8.9/topics/revalue plyr::revalue()]<br />[https://rdocumentation.org/packages/forcats/versions/1.0.0/topics/fct_recode forcats::fct_recode()]
| rename levels
|-
| [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/factor factor(, levels)]
| reorder levels
|}

<syntaxhighlight lang='rsplus'>
sizes <- factor(c("small", "large", "large", "small", "medium"))
sizes
#> [1] small  large  large  small  medium
#> Levels: large medium small

sizes2 <- factor(sizes, levels = c("small", "medium", "large")) # reorder levels but data is not changed
sizes2
# [1] small  large  large  small  medium
# Levels: small medium large

sizes3 <- sizes
levels(sizes3) <- c("small", "medium", "large") # rename, not reorder
                                                # large -> small
                                                # medium -> medium
                                                # small -> large
sizes3
# [1] large  small  small  large  medium
# Levels: small medium large
</syntaxhighlight>
A regression example.
<syntaxhighlight lang='rsplus'>
set.seed(1)
x <- sample(1:2, 500, replace = TRUE)
y <- round(x + rnorm(500), 3)
x <- as.factor(x)
sample_data <- data.frame(x, y)
# create linear model
summary(lm( y~x, sample_data))
# Coefficients:
#            Estimate Std. Error t value Pr(>|t|)   
# (Intercept)  0.96804    0.06610  14.65  <2e-16 ***
# x2          0.99620    0.09462  10.53  <2e-16 ***

# Wrong way to change the baseline level to '2':
# the fit is unchanged; only the labels in the printout are swapped (levels<- renames, it does not reorder)
levels(sample_data$x) <- c("2", "1")
summary(lm( y~x, sample_data))
# Coefficients:
#            Estimate Std. Error t value Pr(>|t|)   
# (Intercept)  0.96804    0.06610  14.65  <2e-16 ***
# x1          0.99620    0.09462  10.53  <2e-16 ***

# Correct way to change the baseline level to '2':
# relevel() the original factor x (sample_data$x was relabelled above).
# The x1 estimate flips its sign and the intercept becomes the mean of group '2'.
sample_data$x <- relevel(x, ref = "2")
summary(lm( y~x, sample_data))
# Coefficients:
#            Estimate Std. Error t value Pr(>|t|)   
# (Intercept)  1.96425    0.06770  29.01  <2e-16 ***
# x1          -0.99620    0.09462  -10.53  <2e-16 ***
</syntaxhighlight>
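Because levels(x) <- c(...) renames by position, it is easy to get wrong. Base R's levels<- also accepts a named list of the form new = old (see ?levels), which renames by name and can merge levels. A small sketch on the same sizes factor; the new labels S, M, L and notLarge are only illustrative.

<syntaxhighlight lang='rsplus'>
sizes <- factor(c("small", "large", "large", "small", "medium"))

# Rename by name (new = "old"); the list order also sets the new level order
sizes4 <- sizes
levels(sizes4) <- list(S = "small", M = "medium", L = "large")
sizes4
# [1] S L L S M
# Levels: S M L

# Several old levels can be merged into one new level
sizes5 <- sizes
levels(sizes5) <- list(notLarge = c("small", "medium"), large = "large")
table(sizes5)  # notLarge: 3, large: 2
</syntaxhighlight>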


==== How to download ====
Clone ~ Download (cloning a gist downloads all of its files).
* Command line
<pre>
git clone https://gist.github.com/4484270.git
</pre>
This will create a subdirectory called '4484270' with all cloned files there.

* Within R
<pre>
library(devtools)
source_gist("4484270")
</pre>
Or, to clone all gists of a user, first download the json file from
https://api.github.com/users/MYUSERLOGIN/gists
and then run
<pre>
library(RJSONIO)
x <- fromJSON("~/Downloads/gists.json")
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
   cmd <- paste0("git clone https://gist.github.com/", x, ".git")
   system(cmd)
})
</pre>

=== stats::relevel() ===
[https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/relevel relevel]. This function can only be used to change the '''reference level''' of a factor variable. '''It does not directly create an arbitrary order of levels'''. It is therefore mainly useful for models such as lm() or aov().

=== reorder(), levels() and boxplot() ===
<ul>
<li>[https://www.r-bloggers.com/2023/09/how-to-reorder-boxplots-in-r-a-comprehensive-guide/ How to Reorder Boxplots in R: A Comprehensive Guide] (tapply() method, simple & effective)
<li>[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/reorder.factor.html reorder()]. This is useful in barplot (ggplot2::geom_col()) where we want to sort the bars by a numerical variable.
<pre>
# Syntax:
# newFac <- with(df, reorder(fac, vec, FUN=mean)) # newFac is like fac except it has a new order

(bymedian <- with(InsectSprays, reorder(spray, count, median)) )
class(bymedian)
levels(bymedian)
boxplot(count ~ bymedian, data = InsectSprays,
        xlab = "Type of spray", ylab = "Insect count",
        main = "InsectSprays data", varwidth = TRUE,
        col = "lightgray") # boxplots are sorted according to the new levels
boxplot(count ~ spray, data = InsectSprays,
        xlab = "Type of spray", ylab = "Insect count",
        main = "InsectSprays data", varwidth = TRUE,
        col = "lightgray") # not sorted
</pre>
<li>[http://www.deeplytrivial.com/2020/05/statistics-sunday-my-2019-reading.html Statistics Sunday: My 2019 Reading] (reorder function)
</ul>

=== factor() vs ordered() ===
<pre>
factor(levels=c("a", "b", "c"), ordered=TRUE)
# ordered(0)
# Levels: a < b < c

factor(levels=c("a", "b", "c"))
# factor(0)
# Levels: a b c

ordered(levels=c("a", "b", "c"))
# Error in factor(x, ..., ordered = TRUE) :
#  argument "x" is missing, with no default
# Pass x explicitly instead: ordered(character(0), levels=c("a", "b", "c"))
</pre>

== Data frame ==
* http://adv-r.had.co.nz/Data-structures.html#data-frames. '''A data frame is a list of equal-length vectors'''. So a data frame is neither a vector nor a matrix, though it looks like a matrix.
* http://blog.datacamp.com/15-easy-solutions-data-frame-problems-r/

=== stringsAsFactors = FALSE ===
http://www.win-vector.com/blog/2018/03/r-tip-use-stringsasfactors-false/

Setting '''options(stringsAsFactors=FALSE)''' forces R to import character data as character vectors instead of factors.

Since R 4.0.0, [https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/ stringsAsFactors = FALSE] is the default. This also affects the read.table() function.

=== check.names = FALSE ===
Note that this option does not affect row names: if the row names contain special characters such as dashes, spaces or parentheses, they are not modified.
<pre>
> data.frame("1a"=1:2, "2a"=1:2, check.names = FALSE)
   1a 2a
1  1  1
2  2  2
> data.frame("1a"=1:2, "2a"=1:2) # default
   X1a X2a
1  1  1
2  2  2
</pre>
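The renaming done by check.names = TRUE comes from make.names(), so calling it directly is a quick way to predict what a column name will become. A small sketch with made-up names; "A1BG-AS1" is just a gene-style name containing a dash.

<syntaxhighlight lang='rsplus'>
make.names(c("1a", "2a", "A1BG-AS1", "my col"))
# "X1a" "X2a" "A1BG.AS1" "my.col"

make.names(c("a", "a", "b"), unique = TRUE)
# "a" "a.1" "b"

names(data.frame("1a" = 1:2, "2a" = 1:2))                      # "X1a" "X2a"
names(data.frame("1a" = 1:2, "2a" = 1:2, check.names = FALSE)) # "1a" "2a"
</syntaxhighlight>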


==== Jekyll ====
[http://statistics.rainandrhino.org/2015/12/15/jekyll-r-blogger-knitr-hyde.html An Easy Start with Jekyll, for R-Bloggers]

=== Create unique rownames: make.unique() ===
<pre>
mydf <- data.frame(value = 1:15)  # hypothetical 15-row data frame
groupCodes <- c(rep("Cont",5), rep("Tre1",5), rep("Tre2",5))
rownames(mydf) <- make.unique(groupCodes)
</pre>
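What make.unique() produces: the first occurrence is left as is and later duplicates get a .1, .2, ... suffix; the separator can be changed with `sep`. A small check with shortened, made-up group codes:

<syntaxhighlight lang='rsplus'>
groupCodes <- c(rep("Cont", 3), rep("Tre1", 2))
make.unique(groupCodes)
# "Cont" "Cont.1" "Cont.2" "Tre1" "Tre1.1"
make.unique(groupCodes, sep = "_")
# "Cont" "Cont_1" "Cont_2" "Tre1" "Tre1_1"
</syntaxhighlight>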


=== data.frame() will change rownames ===
<pre>
class(df2)
# [1] "matrix" "array"
rownames(df2)[c(9109, 44999)]
# [1] "A1CF"    "A1BG-AS1"
rownames(data.frame(df2))[c(9109, 44999)]
# [1] "A1CF"    "A1BG.AS1"
</pre>

=== Connect R with Arduino ===
* http://lamages.blogspot.com/2012/10/connecting-real-world-to-r-with-arduino.html
* http://jean-robert.github.io/2012/11/11/thermometer-R-using-Arduino-Java.html
* http://bio7.org/?p=2049
* http://www.rforge.net/Arduino/svn.html
 
=== Android App ===
* [https://play.google.com/store/apps/details?id=appinventor.ai_RInstructor.R2&hl=zh_TW R Instructor] $4.84
* [http://realxyapp.blogspot.tw/2010/12/statistical-distribution.html Statistical Distribution] (not an R-related app)

=== Print a data frame without rownames ===
<pre>
# Method 1: reset the row names to the default 1, 2, ... (these are still printed)
rownames(df1) <- NULL

# Method 2: do not print the row names at all
print(df1, row.names = FALSE)
</pre>
 
=== Convert data frame factor columns to characters ===
[https://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters Convert data.frame columns from factors to characters]
{{Pre}}
# Method 1:
bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)
 
# Method 2:
bob[] <- lapply(bob, as.character)
</pre>
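A self-contained version of Method 2 with a made-up data frame `bob`. Note that `bob[] <- lapply(bob, as.character)` converts every column, numeric ones included; the `[]` keeps the data.frame structure.

<syntaxhighlight lang='rsplus'>
bob <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))

all_chr <- bob
all_chr[] <- lapply(all_chr, as.character)  # [] keeps the data.frame structure
sapply(all_chr, class)  # both columns are now "character", including the integer id
</syntaxhighlight>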
 
[https://stackoverflow.com/a/2853231 To replace only factor columns]:
<pre>
# Method 1:
i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)

# Method 2:
library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob
# mutate_if() is superseded in dplyr >= 1.0.0; the equivalent is
# bob <- bob %>% mutate(across(where(is.factor), as.character))
</pre>

=== Common plots tips ===
==== Grouped boxplots ====
* [http://sphaerula.com/legacy/R/boxplotTwoWay.html Box Plots of Two-Way Layout]
* [http://r-video-tutorial.blogspot.com/2013/06/box-plot-with-r-tutorial.html Step by step to create a grouped boxplots]
** the 'at' argument of boxplot() changes the (by default equally spaced) box positions
** set par(mar = ...) before calling boxplot() to adjust the margins
** use mtext(line = ...) when the x-axis label overlaps the tick labels.
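A minimal base-graphics sketch of these three tips on simulated data (the data frame `d` and its columns are made up): `at` leaves a gap between groups, `par(mar = ...)` enlarges the bottom margin, and `mtext(line = ...)` places the x-axis title below the group labels. The plot is written to a temporary PDF so the code also runs non-interactively.

<syntaxhighlight lang='rsplus'>
set.seed(1)
d <- data.frame(
  y   = rnorm(120),
  g   = rep(c("A", "B", "C"), each = 40),
  trt = rep(c("ctl", "trt"), times = 60)
)

pdf(tf <- tempfile(fileext = ".pdf"))
op <- par(mar = c(6, 4, 2, 1))           # extra room at the bottom
# y ~ trt + g gives 6 boxes: ctl/trt within each of A, B, C
b <- boxplot(y ~ trt + g, data = d,
             at = c(1, 2, 4, 5, 7, 8),   # skip 3 and 6 to separate the groups
             col = c("white", "grey"), xaxt = "n", xlab = "")
axis(1, at = c(1.5, 4.5, 7.5), labels = c("A", "B", "C"))
mtext("Group", side = 1, line = 3)       # axis title below the labels
legend("topright", legend = c("ctl", "trt"), fill = c("white", "grey"))
par(op)
dev.off()
</syntaxhighlight>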


==== [https://www.samruston.co.uk/ Weather Time Line] ====
The plot looks similar to a boxplot though it is not. See a [https://www.samruston.co.uk/images/screens/screen_2.png screenshot] on Android by [https://www.samruston.co.uk/ Sam Ruston].

=== Sort Or Order A Data Frame ===
[https://howtoprogram.xyz/2018/01/07/r-how-to-order-a-data-frame/ How To Sort Or Order A Data Frame In R]
# df[order(df$x), ], df[order(df$x, decreasing = TRUE), ], df[order(df$x, df$y), ]
# library(plyr); arrange(df, x), arrange(df, desc(x)), arrange(df, x, y)
# library(dplyr); df %>% arrange(x), df %>% arrange(desc(x)), df %>% arrange(x, y)
# library(doBy); orderBy(~x, df), orderBy(~ -x, df), orderBy(~ x+y, df)
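A small base-R check of the order() variants listed above, on a made-up data frame. A minus sign on a numeric key sorts that key in decreasing order while the other key stays increasing.

<syntaxhighlight lang='rsplus'>
df <- data.frame(x = c(2, 1, 2), y = c("b", "c", "a"))

df[order(df$x), ]                     # rows 2, 1, 3
df[order(df$x, decreasing = TRUE), ]  # the two rows with x = 2 first, then x = 1
df[order(-df$x, df$y), ]              # x descending, then y ascending: rows 3, 1, 2
</syntaxhighlight>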


=== data.frame to vector ===
<pre>
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
class(df)
# [1] "data.frame"
class(t(df))
# [1] "matrix" "array"
class(unlist(df))
# [1] "numeric"

# Method 1: convert the data frame to a matrix using as.matrix()
# and then convert the matrix to a vector using as.vector() or c()
mat <- as.matrix(df)
vec1 <- as.vector(mat)   # [1] 1 2 3 4 5 6
vec2 <- c(mat)

# Method 2: convert the data frame to a matrix using t() (transpose)
# and then convert the matrix to a vector using as.vector() or c()
vec3 <- as.vector(t(df)) # [1] 1 4 2 5 3 6
vec4 <- c(t(df))

# Not working
as.vector(df)
# $x
# [1] 1 2 3
# $y
# [1] 4 5 6

# Method 3: unlist() - easiest solution
unlist(df)
# x1 x2 x3 y1 y2 y3
#  1  2  3  4  5  6
unlist(df, use.names = FALSE) # dplyr::pull() only extracts a single column
# [1] 1 2 3 4 5 6
</pre>
Q: Why can't as.vector(df) convert a data frame into a vector?

A: A data frame is a list of vectors (its columns), and a list is itself a (generic) vector. as.vector() only drops attributes such as the class and the row names; it does not concatenate the columns, which may even be of different types. Therefore as.vector() returns the underlying list structure of the data frame instead of an atomic vector.

However, when you transpose the data frame using t(), it gets converted into a matrix. A matrix in R is a vector with dimensions, so all of its elements must be of the same type; if they are not, R coerces them to a common type. Once you have a matrix, as.vector() can easily convert it into a vector.

==== Horizontal bar plot ====
<syntaxhighlight lang='rsplus'>
library(ggplot2)
dtf <- data.frame(x = c("ETB", "PMA", "PER", "KON", "TRA",
                        "DDR", "BUM", "MAT", "HED", "EXP"),
                  y = c(.02, .11, -.01, -.03, -.03, .02, .1, -.01, -.02, 0.06))
ggplot(dtf, aes(x, y)) +
  geom_bar(stat = "identity", aes(fill = x), show.legend = FALSE) +
  coord_flip() + xlab("") + ylab("Fold Change")
</syntaxhighlight>

[[File:Ggplot2bar.svg|300px]]

==== Include bar values in a barplot ====
* https://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r.
* [http://stackoverflow.com/questions/12481430/how-to-display-the-frequency-at-the-top-of-each-factor-in-a-barplot-in-r barplot(), text() and axis()] functions. The data can be from a table() object.
* [https://stackoverflow.com/questions/11938293/how-to-label-a-barplot-bar-with-positive-and-negative-bars-with-ggplot2 How to label a barplot bar with positive and negative bars with ggplot2]

Use text() in base graphics.

Or use geom_text() if we are using the ggplot2 package. See an example [http://dsgeek.com/2014/09/19/Customizingggplot2charts.html here] or [https://rpubs.com/escott8908/RGC_Ch3_Gar_Graphs this].

For stacked barplots, see [http://t-redactyl.io/blog/2016/01/creating-plots-in-r-using-ggplot2-part-4-stacked-bar-plots.html this] post.

==== Grouped barplots ====
* https://www.r-graph-gallery.com/barplot/, https://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2/ (simplest, no error bars)<syntaxhighlight lang='rsplus'>
library(ggplot2)
# mydata <- data.frame(OUTGRP, INGRP, value)
ggplot(mydata, aes(fill=INGRP, y=value, x=OUTGRP)) +
      geom_bar(position="dodge", stat="identity")
</syntaxhighlight>
* https://datascienceplus.com/building-barplots-with-error-bars/. The error bars show 2 SE (about a 95% interval) in the black-and-white version but 1 SE (about a 68% interval) in the ggplot2 version. Be careful.<syntaxhighlight lang='rsplus'>
> 1 - 2*(1-pnorm(1))
[1] 0.6826895
> 1 - 2*(1-pnorm(1.96))
[1] 0.9500042
</syntaxhighlight>
* [http://stackoverflow.com/questions/27466035/adding-values-to-barplot-of-table-in-r two bars in one factor] (stack). The data can be a 2-dim matrix with numerical values.
* [http://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r two bars in one factor], [https://stats.stackexchange.com/questions/14118/drawing-multiple-barplots-on-a-graph-in-r Drawing multiple barplots on a graph in R] (next to each other)
** [https://datascienceplus.com/building-barplots-with-error-bars/ Include error bars]
* [http://bl.ocks.org/patilv/raw/7360425/ Three variables] barplots
* [https://peltiertech.com/stacked-bar-chart-alternatives/ More alternatives] (not done by R)

==== Math expression ====
* [https://www.rdocumentation.org/packages/grDevices/versions/3.5.0/topics/plotmath ?plotmath]
* https://stackoverflow.com/questions/4973898/combining-paste-and-expression-functions-in-plot-labels
* http://vis.supstat.com/2013/04/mathematical-annotation-in-r/
* https://andyphilips.github.io/blog/2017/08/16/mathematical-symbols-in-r-plots.html

<syntaxhighlight lang='rsplus'>
# Hypothetical example data so that the calls below run
x <- 1:10; y <- x^2
py <- pi; ee <- exp(1)

# Expressions
plot(x,y, xlab = expression(hat(x)[t]),
    ylab = expression(phi^{rho + a}),
    main = "Pure Expressions")

# Expressions with Spacing
# '~' adds a space and '*' juxtaposes terms without a space
plot(1:10, xlab= expression(Delta * 'C'))
plot(x,y, xlab = expression(hat(x)[t] ~ z ~ w),
    ylab = expression(phi^{rho + a} * z * w),
    main = "Pure Expressions with Spacing")

# Expressions with Text
plot(x,y,
    xlab = expression(paste("Text here ", hat(x), " here ", z^rho, " and here")),
    ylab = expression(paste("Here is some text of ", phi^{rho})),
    main = "Expressions with Text")

# Substituting Expressions
plot(x,y,
    xlab = substitute(paste("Here is ", pi, " = ", p), list(p = py)),
    ylab = substitute(paste("e is = ", e ), list(e = ee)),
    main = "Substituted Expressions")
</syntaxhighlight>

=== Using cbind() to merge vectors together? ===
It’s a common mistake to try and create a data frame by cbind()ing vectors together. This doesn’t work because cbind() will create a matrix unless one of the arguments is already a data frame. Instead use data.frame() directly. See [http://adv-r.had.co.nz/Data-structures.html#data-frames Advanced R -> Data structures] chapter.

=== cbind NULL and data.frame ===
[https://9to5tutorial.com/cbind-can-t-combine-null-with-dataframe cbind can't combine NULL with dataframe]. Adding as.matrix() fixes the problem.

=== merge ===
* [https://thomasadventure.blog/posts/r-merging-datasets/ All You Need To Know About Merging (Joining) Datasets in R]. If we want to merge/join by the row names, we can use '''tibble::rownames_to_column()'''; see [https://stackoverflow.com/a/42418771 dplyr left_join() by rownames].
* [https://www.geeksforgeeks.org/merge-dataframes-by-row-names-in-r/ Merge DataFrames by Row Names in R]
* [https://jozefhajnala.gitlab.io/r/r006-merge/ How to perform merges (joins) on two or more data frames with base R, tidyverse and data.table]
* [https://www.dummies.com/programming/r/how-to-use-the-merge-function-with-data-sets-in-r/ How to understand the different types of merge]

Special characters in the matching variable can cause trouble when we use merge() or dplyr::inner_join(). My guess is that R internally converts df2 (a matrix, not a data frame) to a data frame, so row names containing special characters like "-" are changed. This still does not explain the situation when I
<pre>
class(df1); class(df2)
# [1] "data.frame"  # 2 x 2
# [1] "matrix" "array" # 52439 x 2
rownames(df1)
# [1] "A1CF"    "A1BG-AS1"
merge(df1, df2[c(9109, 44999), ], by=0)
#  Row.names 786-0 A498 ACH-000001 ACH-000002
# 1  A1BG-AS1    0    0  7.321358  6.908333
# 2     A1CF    0    0  3.011470  1.189578
merge(df1, df2[c(9109, 38959:44999), ], by= 0) # still correct
merge(df1, df2[c(9109, 38958:44999), ], by= 0) # same as merge(df1, df2, by=0)
#  Row.names 786-0 A498 ACH-000001 ACH-000002
# 1     A1CF    0    0    3.01147  1.189578
rownames(df2)[38958:38959]
# [1] "ITFG2-AS1"  "ADGRD1-AS1"

rownames(df1)[2] <- "A1BGAS1"
rownames(df2)[44999] <- "A1BGAS1"
merge(df1, df2, by= 0)
#  Row.names 786-0 A498 ACH-000001 ACH-000002
# 1  A1BGAS1    0    0  7.321358  6.908333
# 2     A1CF    0    0  3.011470  1.189578
</pre>
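A minimal, self-contained example of merging by row names with `by = 0` (equivalently `by = "row.names"`), using made-up gene names. The key comes back as a character column called Row.names, and by default the rows are sorted by it.

<syntaxhighlight lang='rsplus'>
df1 <- data.frame(a = 1:2, row.names = c("g1", "g2"))
df2 <- data.frame(b = c(10, 20, 30), row.names = c("g2", "g3", "g1"))

m <- merge(df1, df2, by = 0)   # inner join on the row names
m$Row.names                    # "g1" "g2"
m$b                            # 30 10

# Restore the key as row names if needed
rownames(m) <- m$Row.names
m$Row.names <- NULL
</syntaxhighlight>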


=== is.matrix: data.frame is not necessarily a matrix ===
See [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/matrix ?matrix]. is.matrix returns TRUE '''if x is a vector and has a "dim" attribute of length 2''' and FALSE otherwise.

An example of an object that is a data frame (is.data.frame() returns TRUE) but not a matrix (is.matrix() returns FALSE) is the one created by
<pre>
X <- data.frame(x=1:2, y=3:4)
</pre>
The 'X' object is NOT a vector and it does NOT have the "dim" attribute. It has only 3 attributes: "names", "row.names" & "class". Note that dim() still works and returns the correct dimensions even though there is no "dim" attribute, because dim() has a data.frame method.

Another example that is a data frame but not a matrix is the built-in object ''cars''; see ?matrix. It is not a vector either.

==== Rotating x axis labels for barplot ====
https://stackoverflow.com/questions/10286473/rotating-x-axis-labels-in-r-for-barplot
<syntaxhighlight lang='rsplus'>
barplot(mytable,main="Car makes",ylab="Frequency",xlab="make",las=2) # las=2: labels perpendicular to the axis
</syntaxhighlight>

==== Set R plots x axis to show at y=0 ====
https://stackoverflow.com/questions/3422203/set-r-plots-x-axis-to-show-at-y-0
<syntaxhighlight lang='rsplus'>
plot(1:10, rnorm(10), ylim=c(0,10), yaxs="i") # yaxs="i": no extra padding, so the x axis sits at y = 0
</syntaxhighlight>


==== Different colors of axis labels in barplot ====
See [https://stackoverflow.com/questions/18839731/vary-colors-of-axis-labels-in-r-based-on-another-variable Vary colors of axis labels in R based on another variable]

Method 1: add the labels one color at a time with separate axis() calls, because the 'col.axis' argument cannot accept more than one color.
<syntaxhighlight lang='rsplus'>
tN <- table(Ni <- stats::rpois(100, lambda = 5))
r <- barplot(tN, col = rainbow(20))  # r holds the bar midpoints
axis(1, at = r[1], labels = LETTERS[1], col.axis="red", col="red")
axis(1, at = r[2], labels = LETTERS[2], col.axis="blue", col = "blue")
</syntaxhighlight>

Method 2: text() accepts multiple colors in its 'col' parameter, but we need to find the (x, y) positions ourselves.
<syntaxhighlight lang='rsplus'>
r <- barplot(tN, col = rainbow(20), axisnames = FALSE)
text(r[4:6], par("usr")[3]-2 , LETTERS[4:6], col=c("black","red","blue"), xpd=TRUE)
</syntaxhighlight>

==== Use [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/text text()] to draw labels on X/Y-axis including rotation ====
* adj = 1 means top/right alignment. The default is to center the text.
* [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/par par("usr")] gives the extremes of the user coordinates of the plotting region of the form c(x1, x2, y1, y2).
** par("usr") is determined ''after'' a plot has been created
** [http://sphaerula.com/legacy/R/placingTextInPlots.html Example of using the "usr" parameter]
* https://datascienceplus.com/building-barplots-with-error-bars/
<syntaxhighlight lang='rsplus'>
par(mar = c(5, 6, 4, 5) + 0.1)
plot(..., xaxt = "n") # "n" suppresses plotting of the axis; need mtext() and axis() to supplement
text(x = barCenters, y = par("usr")[3] - 1, srt = 45,
    adj = 1, labels = myData$names, xpd = TRUE)
</syntaxhighlight>
* https://www.r-bloggers.com/rotated-axis-labels-in-r-plots/

=== Convert a data frame to a matrix: as.matrix() vs data.matrix() ===
Suppose a data frame X records the modification times of some files.
* is.data.frame(X) shows TRUE but is.matrix(X) shows FALSE.
* as.matrix(X) keeps the times (formatted as character strings). The returned object is not a data frame anymore.
* [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.matrix data.matrix(X)] converts the times to numerical values. So use data.matrix() if you need a numeric matrix. The returned object is not a data frame anymore.
<syntaxhighlight lang='r'>
library(magrittr)  # for %>% and the . placeholder
# latex directory contains cache files from knitting an rmarkdown file
X <- list.files("latex/", full.names = TRUE) %>%
    grep("RData", ., value = TRUE) %>%
    file.info() %>%
    `[`("mtime")
X %>% is.data.frame() # TRUE
X %>% is.matrix() # FALSE
X %>% as.matrix() %>% is.matrix() # TRUE
X %>% data.matrix() %>% is.matrix() # TRUE
X %>% as.matrix() %>% "["(1:2, ) # timestamps
X %>% data.matrix() %>% "["(1:2, ) # numeric
</syntaxhighlight>

* The '''as.matrix()''' function is used to coerce an object into a matrix. It can be used with various types of R objects, such as vectors, data frames, and arrays.
* The '''data.matrix()''' function is specifically designed for converting a data frame into a matrix by coercing all columns to numeric values. If the data frame contains non-numeric columns, such as character or factor columns, data.matrix() will convert them to numeric values if possible (e.g., by converting factors to their integer codes).
* See the following example where as.matrix() and data.matrix() return different results.
<syntaxhighlight lang='r'>
df <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"))
mat <- as.matrix(df)
mat
#      a  b
# [1,] "1" "x"
# [2,] "2" "y"
# [3,] "3" "z"
class(mat)
# [1] "matrix" "array"
mat2 <- data.matrix(df)
mat2
#      a b
# [1,] 1 1
# [2,] 2 2
# [3,] 3 3
class(mat2)
# [1] "matrix" "array"
typeof(mat)
# [1] "character"
typeof(mat2)
# [1] "double"
</syntaxhighlight>


==== Vertically stacked plots with the same x axis ====
https://stackoverflow.com/questions/11794436/stacking-multiple-plots-vertically-with-the-same-x-axis-but-different-y-axes-in


=== Time series ===
* [https://www.amazon.com/Applied-Time-Analysis-R-Second/dp/1498734227 Applied Time Series Analysis with R]
* [http://www.springer.com/us/book/9780387759586 Time Series Analysis With Applications in R]


==== Time series stock price plot ====
* http://blog.revolutionanalytics.com/2015/08/plotting-time-series-in-r.html (ggplot2, xts, [https://rstudio.github.io/dygraphs/ dygraphs])
* https://timelyportfolio.github.io/rCharts_time_series/history.html


<syntaxhighlight lang='rsplus'>
library(quantmod)
getSymbols("AAPL")
getSymbols("IBM") # similar to AAPL
getSymbols("CSCO") # much smaller than AAPL, IBM
getSymbols("DJI") # Dow Jones, huge
chart_Series(Cl(AAPL), TA="add_TA(Cl(IBM), col='blue', on=1); add_TA(Cl(CSCO), col = 'green', on=1)",  
    col='orange', subset = '2017::2017-08')
tail(Cl(DJI))
</syntaxhighlight>


=== matrix vs data.frame ===
Case 1: colnames() is safer than names() if the object could be a data frame or a matrix.
<pre>
Browse[2]> names(res2$surv.data.new[[index]])
NULL
Browse[2]> colnames(res2$surv.data.new[[index]])
[1] "time"  "status" "treat"  "AKT1"  "BRAF"  "FLOT2"  "MTOR"  "PCK2"  "PIK3CA"
[10] "RAF1" 
Browse[2]> mode(res2$surv.data.new[[index]])
[1] "numeric"
Browse[2]> is.matrix(res2$surv.data.new[[index]])
[1] TRUE
Browse[2]> dim(res2$surv.data.new[[index]])
[1] 991  10
</pre>

Case 2:
{{Pre}}
ip1 <- installed.packages()[,c(1,3:4)] # class(ip1) = 'matrix'
unique(ip1$Priority)
# Error in ip1$Priority : $ operator is invalid for atomic vectors
unique(ip1[, "Priority"])  # OK

ip2 <- as.data.frame(installed.packages()[,c(1,3:4)], stringsAsFactors = FALSE) # matrix -> data.frame
unique(ip2$Priority)     # OK
</pre>

The length of a matrix and a data frame is different.
{{Pre}}
> length(matrix(1:6, 3, 2))
[1] 6
> length(data.frame(matrix(1:6, 3, 2)))
[1] 2
> x <- data.frame(matrix(1:6))
> x[1]
  X1
1  1
2  2
3  3
4  4
5  5
6  6
> x[[1]]
[1] 1 2 3 4 5 6
</pre>
So the length of a data frame is the number of columns. When we use sapply() on a data frame, the function is applied to each column of the data frame.


=== How to Remove Duplicates ===
[https://www.r-bloggers.com/2021/08/how-to-remove-duplicates-in-r-with-example/ How to Remove Duplicates in R with Example]
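Here is a minimal base-R sketch using a toy data frame with made-up values. duplicated() flags rows that repeat an earlier row, and unique() drops them. Applying duplicated() to a single column keeps only the first row for each value of that column.
<syntaxhighlight lang='rsplus'>
# Toy data (hypothetical): rows 1 and 3 are identical
df <- data.frame(id = c(1, 2, 1, 3), grp = c("a", "b", "a", "b"))

dups   <- duplicated(df)          # FALSE FALSE TRUE FALSE: TRUE marks a repeat of an earlier row
dedup1 <- df[!duplicated(df), ]   # keep the first occurrence of each row
dedup2 <- unique(df)              # the data frame method of unique() gives the same result

# Judge duplicates on one column only (here grp): first row per group is kept
dedup3 <- df[!duplicated(df$grp), ]
</syntaxhighlight>
Use fromLast = TRUE in duplicated() to keep the last occurrence instead of the first.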


=== Convert a matrix (not data frame) of characters to numeric ===
[https://stackoverflow.com/a/20791975 Just change the mode of the object]
{{Pre}}
> tmp <- cbind(a=c("0.12", "0.34"), b =c("0.567", "0.890")); tmp
     a      b      
[1,] "0.12" "0.567"
[2,] "0.34" "0.890"
> is.data.frame(tmp) # FALSE
> is.matrix(tmp)    # TRUE
> sum(tmp)
Error in sum(tmp) : invalid 'type' (character) of argument
> mode(tmp)  # "character"
> mode(tmp) <- "numeric"
> sum(tmp)
[1] 1.917
</pre>


==== Timeline plot ====
https://stackoverflow.com/questions/20695311/chronological-timeline-with-points-in-time-and-format-date


=== Circular plot ===
* http://freakonometrics.hypotheses.org/20667 which uses the [https://cran.r-project.org/web/packages/circlize/ circlize] package.
* https://www.biostars.org/p/17728/
* [https://cran.r-project.org/web/packages/RCircos/ RCircos] package from CRAN.
* [http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html OmicCircos] from Bioconductor.


=== Convert Data Frame Row to Vector ===
Use as.numeric(df[i, ]) or c(t(df[i, ])). Note that '''c()''' applied directly to a data frame row, c(df[i, ]), returns a list rather than an atomic vector.


=== Word cloud ===
* [http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know Text mining and word cloud fundamentals in R : 5 simple steps you should know]
* [https://www.displayr.com/alternatives-word-cloud/ 7 Alternatives to Word Clouds for Visualizing Long Lists of Data]
* [https://www.littlemissdata.com/blog/steam-data-art1 Data + Art STEAM Project: Initial Results]


=== World map ===
[https://www.enchufa2.es/archives/visualising-ssh-attacks-with-r.html Visualising SSH attacks with R] ([https://cran.r-project.org/package=rworldmap rworldmap] and [https://cran.r-project.org/package=rgeolocate rgeolocate] packages)


=== Convert characters to integers ===
mode(x) <- "integer"
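The one-liner above is expanded here into a short sketch. `mode<-` calls as.integer() internally and then restores the object's attributes, so a character matrix stays a matrix after conversion. The example vectors are made up.
<syntaxhighlight lang='rsplus'>
x <- c("10", "20", "30")        # character vector (hypothetical values)
mode(x) <- "integer"            # same values as as.integer(x)

m <- matrix(c("1", "2", "3", "4"), nrow = 2)
mode(m) <- "integer"            # dim attribute is kept: still a 2 x 2 matrix

# Strings that are not numbers become NA (R normally warns "NAs introduced by coercion")
y <- suppressWarnings(as.integer(c("7", "oops")))
</syntaxhighlight>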


=== [https://cran.r-project.org/web/packages/DiagrammeR/index.html DiagrammeR] ===
http://rich-iannone.github.io/DiagrammeR/


=== Non-Standard Evaluation ===
[https://thomasadventure.blog/posts/understanding-nse-part1/ Understanding Non-Standard Evaluation. Part 1: The Basics]
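Here is a minimal sketch of the basic NSE idea, using two hypothetical helper names, show_expr() and my_filter(). substitute() captures the unevaluated argument. eval() can then evaluate that argument inside a data frame, which is roughly how subset() and with() find column names without quotes.
<syntaxhighlight lang='rsplus'>
# Capture the expression passed in, not its value (hypothetical helper)
show_expr <- function(x) deparse(substitute(x))
e <- show_expr(a + b)    # "a + b", even though a and b do not exist

# Evaluate a captured condition with the data frame's columns in scope (hypothetical helper)
my_filter <- function(df, cond) {
  keep <- eval(substitute(cond), df, parent.frame())  # columns first, then the caller's variables
  df[keep, , drop = FALSE]
}
res <- my_filter(mtcars, cyl == 6 & mpg > 20)
</syntaxhighlight>
Because the caller's environment is passed as the enclosure, the condition can also refer to ordinary variables that are not columns.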


=== Venn Diagram ===
* limma http://www.ats.ucla.edu/stat/r/faq/venn.htm - only black and white?
* VennDiagram - input has to be the numbers instead of the original vector?
* http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams and the [http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/overLapper.R R code] or the [http://www.bioconductor.org/packages/release/bioc/html/systemPipeR.html Bioc package systemPipeR]
<syntaxhighlight lang='rsplus'>
# systemPipeR package method
library(systemPipeR)
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18))
OLlist <- overLapper(setlist[1:3], type="vennsets")
vennPlot(list(OLlist))                           

# R script source method
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/overLapper.R")
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18))  
# or (obtained by dput(setlist))
setlist <- structure(list(A = c("o", "h", "u", "p", "i", "s", "a", "w",  
"b", "z", "n", "c", "k", "j", "y", "m", "t", "q"), B = c("h",
"r", "x", "y", "b", "t", "d", "o", "m", "q", "g", "v", "c", "u",
"f", "z"), C = c("b", "e", "t", "u", "s", "j", "o", "k", "d",
"l", "g", "i", "w", "n", "p", "a", "y", "x", "m", "z"), D = c("f",
"g", "b", "k", "j", "m", "e", "q", "i", "d", "o", "l", "c", "t",
"x", "r", "s", "u", "w", "a", "z", "n"), E = c("u", "w", "o",
"k", "n", "h", "p", "z", "l", "m", "r", "d", "q", "s", "x", "b",
"v", "t"), F = c("o", "j", "r", "c", "l", "l", "u", "b", "f",
"d", "u", "m", "y", "t", "y", "s", "a", "g", "t", "m", "x", "m"
)), .Names = c("A", "B", "C", "D", "E", "F"))

OLlist <- overLapper(setlist[1:3], type="vennsets")
counts <- list(sapply(OLlist$Venn_List, length))
vennPlot(counts=counts)                          
</syntaxhighlight>
[[File:Vennplot.png|250px]]


=== Select Data Frame Columns in R ===
This is part of the DATA MANIPULATION IN R series from [https://www.datanovia.com/en/lessons/select-data-frame-columns-in-r/ datanovia.com]
* pull(): Extract column values as a vector. The column of interest can be specified either by name or by index.
* select(): Extract one or multiple columns as a data table. It can be also used to remove columns from the data frame.
* select_if(): Select columns based on a particular condition. One can use this function to, for example, select columns if they are numeric.
* Helper functions - starts_with(), ends_with(), contains(), matches(), one_of(): Select columns/variables based on their names

Another way is to use the dollar sign '''$''' operator (?"$") to extract a column from a data frame.
<pre>
class(USArrests)  # "data.frame"
USArrests$"Assault"
</pre>
Note that for both data frame and matrix objects, we can use the '''[''' operator to extract columns and/or rows.
<pre>
USArrests[c("Alabama", "Alaska"), c("Murder", "Assault")]
#         Murder Assault
# Alabama   13.2     236
# Alaska    10.0     263
USArrests[c("Murder", "Assault")]  # all rows

tmp <- data(package="datasets")
class(tmp$results)  # "matrix" "array"
tmp$results[, "Item"]
# Same method can be used if rownames are available in a matrix
</pre>
Note for a '''data.table''' object, we can extract columns using the column names without double quotes.
<pre>
library(data.table)
data.table(USArrests)[1:2, list(Murder, Assault)]
</pre>
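Rough base-R counterparts of pull(), select(), select_if() and starts_with() are sketched below, using the built-in USArrests and iris data sets. The mapping to the dplyr verbs is my own approximation, not taken from the datanovia lesson.
<syntaxhighlight lang='rsplus'>
# pull()-like: one column as a vector
v1 <- USArrests$Assault
v2 <- USArrests[["Assault"]]
v3 <- USArrests[, "Assault"]               # a single column drops to a vector by default

# select()-like: the result stays a data frame
d1 <- USArrests[, "Assault", drop = FALSE]
d2 <- USArrests[c("Murder", "Assault")]

# select_if(is.numeric)-like: keep only the numeric columns
d3 <- iris[vapply(iris, is.numeric, logical(1))]

# starts_with("Sepal")-like: select by a name prefix
d4 <- iris[startsWith(names(iris), "Sepal")]
</syntaxhighlight>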


=== Bump chart/Metro map ===
https://dominikkoch.github.io/Bump-Chart/


=== Add columns to a data frame ===
[https://datasciencetut.com/how-to-add-columns-to-a-data-frame-in-r/ How to add columns to a data frame in R]
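Here is a short sketch of common base-R ways to add columns, using a made-up one-column data frame:
<syntaxhighlight lang='rsplus'>
df <- data.frame(x = 1:3)              # hypothetical starting data
df$y <- df$x * 2                       # $<- adds (or overwrites) a column
df[["z"]] <- c("a", "b", "c")          # [[<- with a column name works the same way
df <- cbind(df, w = df$x + 10)         # cbind() appends a column at the end
df <- transform(df, v = x + y)         # transform() evaluates the expression inside df
</syntaxhighlight>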


=== Exclude/drop/remove data frame columns ===
* [https://datasciencetut.com/remove-columns-from-a-data-frame/ How to Remove Columns from a data frame in R]
* [https://www.listendata.com/2015/06/r-keep-drop-columns-from-data-frame.html R: keep / drop columns from data frame]
<pre>
# method 1
df = subset(mydata, select = -c(x,z) )

# method 2
drop <- c("x","z")
df = mydata[,!(names(mydata) %in% drop)]

# method 3: dplyr
library(dplyr)
mydata2 = select(mydata, -a, -x, -y)
mydata2 = select(mydata, -c(a, x, y))
mydata2 = select(mydata, -a:-y)

# method 4: base R, drop columns whose names start with "INC"
mydata2 = mydata[,!grepl("^INC",names(mydata))]
</pre>


=== Remove Rows from the data frame ===
[https://datasciencetut.com/remove-rows-from-the-data-frame-in-r/ Remove Rows from the data frame in R]


=== Amazing plots ===
==== New R logo 2/11/2016 ====
* http://rud.is/b/2016/02/11/plot-the-new-svg-r-logo-with-ggplot2/
* https://www.stat.auckland.ac.nz/~paul/Reports/Rlogo/Rlogo.html
<syntaxhighlight lang='rsplus'>
library(sp)
library(maptools)
library(ggplot2)
library(ggthemes)
# rgeos requires the installation of GEOS from http://trac.osgeo.org/geos/
system("curl http://download.osgeo.org/geos/geos-3.5.0.tar.bz2 | tar jx")
system("cd geos-3.5.0; ./configure; make; sudo make install")
library(rgeos)
r_wkt_gist_file <- "https://gist.githubusercontent.com/hrbrmstr/07d0ccf14c2ff109f55a/raw/db274a39b8f024468f8550d7aeaabb83c576f7ef/rlogo.wkt"
if (!file.exists("rlogo.wkt")) download.file(r_wkt_gist_file, "rlogo.wkt")
rlogo <- readWKT(paste0(readLines("rlogo.wkt", warn=FALSE))) # rgeos
rlogo_shp <- SpatialPolygonsDataFrame(rlogo, data.frame(poly=c("halo", "r"))) # sp
rlogo_poly <- fortify(rlogo_shp, region="poly") # ggplot2
ggplot(rlogo_poly) +
  geom_polygon(aes(x=long, y=lat, group=id, fill=id)) +
  scale_fill_manual(values=c(halo="#b8babf", r="#1e63b5")) +
  coord_equal() +
  theme_map() +
  theme(legend.position="none")
</syntaxhighlight>


==== 3D plot ====
Using [https://chitchatr.wordpress.com/2010/06/28/fun-with-persp-function/ persp] function to create the following plot.

[[File:3dpersp.png|200px]]
<syntaxhighlight lang='rsplus'>
### Random pattern
# Create matrix with random values with dimension of final grid
  rand <- rnorm(441, mean=0.3, sd=0.1)
  mat.rand <- matrix(rand, nrow=21)
# Create another matrix for the colors. Start by making all cells green
  fill <- matrix("green3", nr = 21, nc = 21)
# Change colors in each cell based on corresponding mat.rand value
  fcol <- fill
  fcol[] <- terrain.colors(40)[cut(mat.rand,
    stats::quantile(mat.rand, seq(0,1, len = 41),
    na.rm=T), include.lowest = TRUE)]
# Create concave surface using a quadratic (x^2) function
  x <- -10:10
  y <- x^2
  y <- as.matrix(y)
  y1 <- y
  for(i in 1:20){tmp <- cbind(y,y1); y1 <- tmp[,1]; y <- tmp;}
  mat <- tmp[1:21, 1:21]
# Plot it up!
  persp(1:21, 1:21, t(mat)/10, theta = 90, phi = 35,col=fcol,
    scale = FALSE, axes = FALSE, box = FALSE)

### Organized pattern
# Same as before
  rand <- rnorm(441, mean=0.3, sd=0.1)
# Create concave surface using a quadratic (x^2) function
  x <- -10:10
  y <- x^2
  y <- as.matrix(y)
  y1 <- y
  # each pass appends one more x^2 column, giving a 21 x 21 surface
  for(i in 1:20){tmp <- cbind(y,y1); y1 <- tmp[,1]; y <- tmp;}
  mat <- tmp[1:21, 1:21]
###Organize rand by y and put into matrix form
  o <- order(rand,as.vector(mat))
  o.tmp <- cbind(rand[o], rev(sort(as.vector(mat))))
  mat.org <- matrix(o.tmp[,1], nrow=21)
  half.1 <- mat.org[,seq(1,21,2)]
  half.2 <- mat.org[,rev(seq(2,20,2))]
  full <- cbind(half.1, half.2)
  full <- t(full)
# Again, create color matrix and populate using rand values
zi <- full[-1, -1] + full[-1, -21] + full[-21,-1] + full[-21, -21]
fill <- matrix("green3", nr = 20, nc = 20)
fcol <- fill
fcol[] <- terrain.colors(40)[cut(zi,
        stats::quantile(zi, seq(0,1, len = 41), na.rm=T),
        include.lowest = TRUE)]
# Plot it up!       
persp(1:21, 1:21, t(mat)/10, theta = 90, phi = 35,col=t(fcol),
    scale = FALSE, axes = FALSE, box = FALSE)
</syntaxhighlight>


==== Christmas tree ====
http://wiekvoet.blogspot.com/2014/12/merry-christmas.html
<syntaxhighlight lang='rsplus'>
# http://blogs.sas.com/content/iml/2012/12/14/a-fractal-christmas-tree/
# Each column of L is a 2x2 linear transformation (R fills the matrix column-wise)
# Christmas tree
L <- matrix(
    c(0.03,  0,    0  ,  0.1,
        0.85, 0.00,  0.00, 0.85,
        0.8,   0.00,  0.00, 0.8,
        0.2,  -0.08,  0.15, 0.22,
        -0.2,  0.08,  0.15, 0.22,
        0.25, -0.1,   0.12, 0.25,
        -0.2,   0.1,  0.12, 0.2),
    nrow=4)
# ... and each column of B is a translation vector
B <- matrix(
    c(0, 0,
        0, 1.5,
        0, 1.5,
        0, 0.85,
        0, 0.85,
        0, 0.3,
        0, 0.4),
    nrow=2)

prob = c(0.02, 0.6,.08, 0.07, 0.07, 0.07, 0.07)

# Iterate the discrete stochastic map
N = 1e5 #5 #  number of iterations
x = matrix(NA,nrow=2,ncol=N)
x[,1] = c(0,2)   # initial point
k <- sample(1:7,N,prob,replace=TRUE) # values 1-7

for (i in 2:N)
  x[,i] = crossprod(matrix(L[,k[i]],nrow=2),x[,i-1]) + B[,k[i]] # iterate

# Plot the iteration history
png('card.png')
par(bg='darkblue',mar=rep(0,4))  
plot(x=x[1,],y=x[2,],
    col=grep('green',colors(),value=TRUE),
    axes=FALSE,
    cex=.1,
    xlab='',
    ylab='' )#,pch='.')

bals <- sample(N,20)
points(x=x[1,bals],y=x[2,bals]-.1,
    col=c('red','blue','yellow','orange'),
     cex=2,
    pch=19
)
text(x=-.7,y=8,
    labels='Merry',
    adj=c(.5,.5),
    srt=45,
    vfont=c('script','plain'),
    cex=3,
    col='gold'
)
text(x=0.7,y=8,
    labels='Christmas',
    adj=c(.5,.5),
    srt=-45,
    vfont=c('script','plain'),
    cex=3,
    col='gold'
)
dev.off() # close the png device so card.png is written
</syntaxhighlight>
[[File:XMastree.png|150px]]


==== Happy Thanksgiving ====
[http://blog.revolutionanalytics.com/2015/11/happy-thanksgiving.html Turkey]

[[File:Turkey.png|150px]]


==== Happy Valentine's Day ====
https://rud.is/b/2017/02/14/geom%E2%9D%A4%EF%B8%8F/


==== treemap ====
http://ipub.com/treemap/

[[File:TreemapPop.png|150px]]


=== Danger of selecting rows from a data frame ===
A row of a data frame is still a (one-row) data frame, so data.frame() spreads it into separate columns; a row of a matrix is a named vector.
<pre>
> dim(cars)
[1] 50 2
> data.frame(a=cars[1,], b=cars[2, ])
   a.speed a.dist b.speed b.dist
1      4      2       4    10
> dim(data.frame(a=cars[1,], b=cars[2, ]))
[1] 1 4
> cars2 = as.matrix(cars)
> data.frame(a=cars2[1,], b=cars2[2, ])
      a  b
speed 4  4
dist  2 10
</pre>


=== Creating data frame using structure() function ===
[https://tomaztsql.wordpress.com/2019/05/27/creating-data-frame-using-structure-function-in-r/ Creating data frame using structure() function in R]


=== Create an empty data.frame ===
https://stackoverflow.com/questions/10689055/create-an-empty-data-frame
<pre>
# the column types default as logical per vector(), but are then overridden
a = data.frame(matrix(vector(), 5, 3,
              dimnames=list(c(), c("Date", "File", "User"))),
              stringsAsFactors=F)
str(a) # NA, but they are logical, not numeric.
a[1,1] <- rnorm(1)
str(a)

# similar to above
a <- data.frame(matrix(NA, nrow = 2, ncol = 3))

# different data type
a <- data.frame(x1 = character(),
                x2 = numeric(),
                x3 = factor(),
                stringsAsFactors = FALSE)
</pre>


=== Objects from subsetting a row in a data frame vs matrix ===
* [https://stackoverflow.com/a/23534617 Warning: row names were found from a short variable and have been discarded]
<ul>
<li>Subsetting with a repeated row index creates repeated rows, and R makes their row names unique (e.g. "a.1"), which may be unexpected.
<pre>
R> z <- data.frame(x=1:3, y=2:4)
R> rownames(z) <- letters[1:3]
R> rownames(z)[c(1,1)]
[1] "a" "a"
R> rownames(z[c(1,1),])
[1] "a"  "a.1"
R> z[c(1,1), ]
     x y
a    1 2
a.1  1 2
</pre>
</li>
<li>[https://stackoverflow.com/a/2545548 Convert a dataframe to a vector (by rows)] The solution is as.vector(t(mydf[i, ])) or unlist(mydf[i, ]); note that c(mydf[i, ]) alone returns a list. My example:
{{Pre}}
str(trainData)
# 'data.frame': 503 obs. of  500 variables:
#  $ bm001: num  0.429 1 -0.5 1.415 -1.899 ...
#  $ bm002: num  0.0568 1 0.5 0.3556 -1.16 ...
# ...
trainData[1:3, 1:3]
#        bm001      bm002    bm003
# 1  0.4289449 0.05676296 1.657966
# 2  1.0000000 1.00000000 1.000000
# 3 -0.5000000 0.50000000 0.500000
o <- data.frame(time = trainData[1, ], status = trainData[2, ], treat = trainData[3, ], t(TData))
# Warning message:
# In data.frame(time = trainData[1, ], status = trainData[2, ], treat = trainData[3, :
#  row names were found from a short variable and have been discarded
</pre>

'trees' data from the 'datasets' package
<pre>
trees[1:3,]
#  Girth Height Volume
# 1  8.3    70  10.3
# 2  8.6    65  10.3
# 3  8.8    63  10.2

# Wrong ways:
data.frame(trees[1,] , trees[2,])
#  Girth Height Volume Girth.1 Height.1 Volume.1
# 1  8.3    70  10.3    8.6      65    10.3
data.frame(time=trees[1,] , status=trees[2,])
#  time.Girth time.Height time.Volume status.Girth status.Height status.Volume
# 1        8.3          70        10.3          8.6            65          10.3
data.frame(time=as.vector(trees[1,]) , status=as.vector(trees[2,]))
#  time.Girth time.Height time.Volume status.Girth status.Height status.Volume
# 1        8.3          70        10.3          8.6            65          10.3
data.frame(time=c(trees[1,]) , status=c(trees[2,]))
# time.Girth time.Height time.Volume status.Girth status.Height status.Volume
# 1        8.3          70        10.3          8.6            65          10.3

# Right ways:
# method 1: dropping row names
data.frame(time=c(t(trees[1,])) , status=c(t(trees[2,])))
# OR
data.frame(time=as.numeric(trees[1,]) , status=as.numeric(trees[2,]))
#  time status
# 1  8.3    8.6
# 2 70.0  65.0
# 3 10.3  10.3
# method 2: keeping row names
data.frame(time=t(trees[1,]) , status=t(trees[2,]))
#          X1  X2
# Girth  8.3  8.6
# Height 70.0 65.0
# Volume 10.3 10.3
data.frame(time=unlist(trees[1,]) , status=unlist(trees[2,]))
#        time status
# Girth  8.3    8.6
# Height 70.0  65.0
# Volume 10.3  10.3

# Method 3: convert a data frame to a matrix
is.matrix(trees)
# [1] FALSE
trees2 <- as.matrix(trees)
data.frame(time=trees2[1,] , status=trees2[2,]) # row names are kept
#        time status
# Girth  8.3    8.6
# Height 70.0  65.0
# Volume 10.3  10.3

dim(trees[1,])
# [1] 1 3
dim(trees2[1, ])
# NULL
trees[1, ]  # notice the row name '1' on the left hand side
#  Girth Height Volume
# 1  8.3    70  10.3
trees2[1, ]
#  Girth Height Volume
#    8.3  70.0  10.3
</pre>
</li>
</ul>
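Building on the z example above, here is a short sketch of two follow-ups. The first clears the automatically de-duplicated row names. The second turns a single row into a named vector with unlist(), which assumes all columns share one type, since mixed types would be coerced to a common type.
<syntaxhighlight lang='rsplus'>
z <- data.frame(x = 1:3, y = 2:4)
rownames(z) <- letters[1:3]

z2 <- z[c(1, 1), ]
before <- rownames(z2)     # "a" "a.1": R made the repeated names unique
rownames(z2) <- NULL       # drop them if they carry no meaning (back to "1", "2")

r1 <- unlist(z["a", ])     # one row as a named vector: c(x = 1L, y = 2L)
</syntaxhighlight>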


=== Convert a list to data frame ===
[https://www.statology.org/convert-list-to-data-frame-r/ How to Convert a List to a Data Frame in R].
<pre>
# method 1
data.frame(t(sapply(my_list,c)))

# method 2
library(dplyr)
bind_rows(my_list) # OR bind_cols(my_list)

# method 3
library(data.table)
rbindlist(my_list)
</pre>


==== [https://en.wikipedia.org/wiki/Voronoi_diagram Voronoi diagram] ====
* https://www.stat.auckland.ac.nz/~paul/Reports/VoronoiTreemap/voronoiTreeMap.html
* http://letstalkdata.com/2014/05/creating-voronoi-diagrams-with-ggplot/


==== Silent Night ====
[[File:Silentnight.png|200px]]

<syntaxhighlight lang='rsplus'>
# https://aschinchon.wordpress.com/2014/03/13/the-lonely-acacia-is-rocked-by-the-wind-of-the-african-night/
depth <- 9
angle<-30 #Between branches division
L <- 0.90 #Decreasing rate of branches by depth
nstars <- 300 #Number of stars to draw
mstars <- matrix(runif(2*nstars), ncol=2)
branches <- rbind(c(1,0,0,abs(jitter(0)),1,jitter(5, amount = 5)), data.frame())
colnames(branches) <- c("depth", "x1", "y1", "x2", "y2", "inertia")
for(i in 1:depth)
{
  df <- branches[branches$depth==i,]
  for(j in 1:nrow(df))
  {
    branches <- rbind(branches, c(df[j,1]+1, df[j,4], df[j,5], df[j,4]+L^(2*i+1)*sin(pi*(df[j,6]+angle)/180),
                                  df[j,5]+L^(2*i+1)*cos(pi*(df[j,6]+angle)/180), df[j,6]+angle+jitter(10, amount = 8)))
    branches <- rbind(branches, c(df[j,1]+1, df[j,4], df[j,5], df[j,4]+L^(2*i+1)*sin(pi*(df[j,6]-angle)/180),
                                  df[j,5]+L^(2*i+1)*cos(pi*(df[j,6]-angle)/180), df[j,6]-angle+jitter(10, amount = 8)))
  }
}
nodes <- rbind(as.matrix(branches[,2:3]), as.matrix(branches[,4:5]))
png("image.png", width = 1200, height = 600)
plot.new()
par(mai = rep(0, 4), bg = "gray12")
plot(nodes, type="n", xlim=c(-7, 3), ylim=c(0, 5))
for (i in 1:nrow(mstars))
{
  points(x=10*mstars[i,1]-7, y=5*mstars[i,2], col = "blue4", cex=.7, pch=16)
  points(x=10*mstars[i,1]-7, y=5*mstars[i,2], col = "blue",  cex=.3, pch=16)
  points(x=10*mstars[i,1]-7, y=5*mstars[i,2], col = "white", cex=.1, pch=16)
}
# The moon
points(x=-5, y=3.5, cex=40, pch=16, col="lightyellow")
# The tree
for (i in 1:nrow(branches)) {
  lines(x=branches[i,c(2,4)], y=branches[i,c(3,5)],
    col = paste("gray", as.character(sample(seq(from=50, to=round(50+5*branches[i,1]), by=1), 1)), sep = ""),
    lwd=(65/(1+3*branches[i,1])))
}
rm(branches)
dev.off()
</syntaxhighlight>


==== The Travelling Salesman Portrait ====
https://fronkonstin.com/2018/04/04/the-travelling-salesman-portrait/


=== tibble and data.table ===
* [[R#tibble | tibble]]
* [[Tidyverse#data.table|data.table]]


=== Clean a dataset ===
[https://finnstats.com/index.php/2021/04/04/how-to-clean-the-datasets-in-r/ How to clean the datasets in R]


=== Google Analytics ===
==== GAR package ====
http://www.analyticsforfun.com/2015/10/query-your-google-analytics-data-with.html


=== Linear Programming ===
http://www.r-bloggers.com/modeling-and-solving-linear-programming-with-r-free-book/


=== Read rrd file ===
* https://en.wikipedia.org/wiki/RRDtool
* http://oss.oetiker.ch/rrdtool/
* https://github.com/pldimitrov/Rrd
* http://plamendimitrov.net/blog/2014/08/09/r-package-for-working-with-rrd-files/


=== Amazon Alexa ===
* http://blagrants.blogspot.com/2016/02/theres-party-at-alexas-place.html


=== R and Singularity ===
https://www.rstudio.com/rviews/2017/03/29/r-and-singularity/


== matrix ==


=== Define and subset a matrix ===
* [https://www.tutorialkart.com/r-tutorial/r-matrix/ Matrix in R]
** When a vector is turned into a matrix, the data fill the matrix column by column ('''byrow''' = FALSE by default).
** When subsetting a matrix, the format is '''X[rows, columns]''' or '''X[y-axis, x-axis]'''.
<pre>
data <- c(2, 4, 7, 5, 10, 1)
A <- matrix(data, ncol = 3)
print(A)
#      [,1] [,2] [,3]
# [1,]    2    7   10
# [2,]    4    5    1

A[1:1, 2:3, drop = FALSE]
#      [,1] [,2]
# [1,]    7   10
</pre>
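The sketch below makes the column-wise fill concrete. It compares the default with byrow = TRUE on the same data vector.
<syntaxhighlight lang='rsplus'>
data <- c(2, 4, 7, 5, 10, 1)
A <- matrix(data, ncol = 3)                 # default byrow = FALSE: filled column by column
B <- matrix(data, ncol = 3, byrow = TRUE)   # filled row by row
A[1, ]    # 2 7 10
B[1, ]    # 2 4 7
# byrow = TRUE is the same as filling column-wise and then transposing
same <- identical(B, t(matrix(data, nrow = 3)))
</syntaxhighlight>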


=== Teach kids about R with Minecraft ===
http://blog.revolutionanalytics.com/2017/06/teach-kids-about-r-with-minecraft.html


=== Prevent automatic conversion of single column to vector ===
use '''drop = FALSE''' such as mat[, 1, drop = FALSE].
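Here is a small sketch showing what drop = FALSE changes, for both a matrix and a data frame. The objects are made up for illustration.
<syntaxhighlight lang='rsplus'>
mat <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
v <- mat[, 1]                        # dimension dropped: a plain integer vector of length 3
m <- mat[, 1, drop = FALSE]          # still a 3 x 1 matrix; column name "a" is kept

df <- data.frame(a = 1:3, b = 4:6)
col_vec <- df[, "a"]                 # one data frame column also drops to a vector
col_df  <- df[, "a", drop = FALSE]   # a one-column data frame
</syntaxhighlight>
Writing drop = FALSE inside functions that may receive a single column keeps later code such as ncol() or apply() from failing unexpectedly.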


=== Secure API keys ===
[http://blog.revolutionanalytics.com/2017/07/secret-package.html Securely store API keys in R scripts with the "secret" package]


=== complete.cases(): remove rows with missing in any column ===
It works on a sequence of vectors, matrices and data frames.
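Here is a minimal sketch with made-up data. complete.cases() returns TRUE for rows with no NA in any column, and several objects with the same number of rows can be passed together.
<syntaxhighlight lang='rsplus'>
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))  # hypothetical data
cc <- complete.cases(df)        # TRUE FALSE FALSE TRUE
df_cc <- df[cc, ]               # keep rows without any missing value
df_omit <- na.omit(df)          # same rows; also records which rows were dropped

w <- c(10, 20, 30, NA)          # an extra vector aligned with the rows of df
cc2 <- complete.cases(df, w)    # a row must be complete in df and in w
</syntaxhighlight>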


=== NROW vs nrow ===
[https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/nrow ?nrow]. Use NROW/NCOL instead of nrow/ncol to treat vectors as 1-column matrices.


=== Vision and image recognition ===
* [https://www.stoltzmaniac.com/google-vision-api-in-r-rooglevision/ Google Vision API in R] – RoogleVision
* [http://www.bnosac.be/index.php/blog/66-computer-vision-algorithms-for-r-users Computer Vision Algorithms for R users] and https://github.com/bnosac/image


=== matrix (column-major order) multiply a vector ===
* Matrices in R [https://en.wikipedia.org/wiki/Row-_and_column-major_order#Programming_languages_and_libraries (like Fortran) are stored in column-major order]. This means the elements of a column slice such as A[,1] are contiguous in memory.
{{Pre}}
> matrix(1:6, 3,2)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> matrix(1:6, 3,2) * c(1,2,3) # c(1,2,3) will be recycled to form a matrix. Good quiz.
     [,1] [,2]
[1,]    1    4
[2,]    4   10
[3,]    9   18
> matrix(1:6, 3,2) * c(1,2,3,4) # c(1,2,3,4) will be recycled (with a warning, since 6 is not a multiple of 4)
     [,1] [,2]
[1,]    1   16
[2,]    4    5
[3,]    9   12
</pre>
* [https://stackoverflow.com/a/20596490 How to divide each row of a matrix by elements of a vector in R]


=== add a vector to all rows of a matrix ===
[https://stackoverflow.com/a/39443126 add a vector to all rows of a matrix]. sweep() or rep() works best.


=== Turn pictures into coloring pages ===
https://gist.github.com/jeroen/53a5f721cf81de2acba82ea47d0b19d0


=== Numerical optimization ===
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/uniroot.html uniroot]: One Dimensional Root (Zero) Finding. This is used in [http://onlinelibrary.wiley.com/doi/10.1002/sim.7178/full simulating survival data for predefined censoring rate]
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html optimize]: One Dimensional Optimization
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optim.html optim]: General-purpose optimization based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms.
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/constrOptim.html constrOptim]: Linearly Constrained Optimization
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/nlm.html nlm]: Non-Linear Minimization
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/nls.html nls]: Nonlinear Least Squares


== R packages ==
=== R package management ===
==== Package related functions from package 'utils' ====
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/available.packages.html available.packages()]; see packageStatus().
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/download.packages.html download.packages()]
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/packageStatus.html packageStatus(), update(), upgrade()]. packageStatus() will return a list with two components:
# inst - a data frame with columns as the matrix returned by '''installed.packages''' plus "Status", a factor with levels c("ok", "upgrade"). Note: the manual does not mention the "unavailable" case (but I do get it) in R 3.2.0?
# avail - a data frame with columns as the matrix returned by '''available.packages''' plus "Status", a factor with levels c("installed", "not installed", "unavailable"). Note: I don't get the "unavailable" case in R 3.2.0?
<pre>
> x <- packageStatus()
> names(x)
[1] "inst"  "avail"
> dim(x[['inst']])
[1] 225  17
> x[['inst']][1:3, ]
              Package                            LibPath Version Priority              Depends Imports
acepack      acepack C:/Program Files/R/R-3.1.2/library 1.3-3.3    <NA>                  <NA>    <NA>
adabag        adabag C:/Program Files/R/R-3.1.2/library    4.0    <NA> rpart, mlbench, caret    <NA>
affxparser affxparser C:/Program Files/R/R-3.1.2/library  1.38.0    <NA>          R (>= 2.6.0)   <NA>
          LinkingTo                                                        Suggests Enhances
acepack        <NA>                                                            <NA>    <NA>
adabag          <NA>                                                            <NA>    <NA>
affxparser      <NA> R.oo (>= 1.18.0), R.utils (>= 1.32.4),\nAffymetrixDataTestFiles    <NA>
                      License License_is_FOSS License_restricts_use OS_type MD5sum NeedsCompilation Built
acepack    MIT + file LICENSE            <NA>                  <NA>    <NA>  <NA>              yes 3.1.2
adabag            GPL (>= 2)            <NA>                  <NA>    <NA>  <NA>              no 3.1.2
affxparser        LGPL (>= 2)            <NA>                  <NA>    <NA>  <NA>            <NA> 3.1.1
                Status
acepack            ok
adabag              ok
affxparser unavailable
> dim(x[['avail']])
[1] 6538  18
> x[['avail']][1:3, ]
                Package Version Priority                        Depends        Imports LinkingTo
A3                  A3   0.9.2    <NA> R (>= 2.15.0), xtable, pbapply          <NA>      <NA>
ABCExtremes ABCExtremes    1.0    <NA>      SpatialExtremes, combinat          <NA>      <NA>
ABCanalysis ABCanalysis  1.0.1    <NA>                    R (>= 2.10) Hmisc, plotrix      <NA>
                      Suggests Enhances   License License_is_FOSS License_restricts_use OS_type Archs
A3          randomForest, e1071    <NA> GPL (>= 2)            <NA>                  <NA>   <NA>  <NA>
ABCExtremes                <NA>    <NA>      GPL-2            <NA>                  <NA>    <NA>  <NA>
ABCanalysis                <NA>    <NA>      GPL-3           <NA>                  <NA>   <NA>  <NA>
            MD5sum NeedsCompilation File                                      Repository        Status
A3            <NA>            <NA> <NA> http://cran.rstudio.com/bin/windows/contrib/3.1 not installed
ABCExtremes   <NA>            <NA> <NA> http://cran.rstudio.com/bin/windows/contrib/3.1 not installed
ABCanalysis  <NA>            <NA> <NA> http://cran.rstudio.com/bin/windows/contrib/3.1 not installed
</pre>
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/packageDescription.html packageVersion(), packageDescription()]
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/install.packages.html install.packages()], [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/remove.packages.html remove.packages()].
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/installed.packages.html installed.packages()]; see packageStatus().
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/update.packages.html update.packages(), old.packages(), new.packages()]
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/setRepositories.html setRepositories()]
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/contrib.url.html contrib.url()]
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/chooseCRANmirror.html chooseCRANmirror()], [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/chooseBioCmirror.html chooseBioCmirror()]
* [http://stat.ethz.ch/R-manual/R-devel/library/utils/html/globalVariables.html suppressForeignCheck()]


==== install.packages() ====
By default, install.packages() will check versions and install uninstalled packages shown in 'Depends', 'Imports', and 'LinkingTo' fields. See [http://cran.r-project.org/doc/manuals/r-release/R-exts.html R-exts] manual.

If we want to install packages listed in 'Suggests' field, we should specify it explicitly by using ''dependencies'' argument:
<pre>
install.packages(XXXX, dependencies = c("Depends", "Imports", "Suggests", "LinkingTo"))
# OR
install.packages(XXXX, dependencies = TRUE)
</pre>
For example, if I use a plain install.packages() command to install [http://cran.r-project.org/web/packages/downloader/index.html downloader] package
<pre>
install.packages("downloader")
</pre>
it will only install 'digest' and 'downloader' packages. If I use
<pre>
install.packages("downloader", dependencies=TRUE)
</pre>
it will also install the 'testthat' package.

The '''install.packages''' function source code can be found in R -> src -> library -> utils -> R -> [https://github.com/wch/r-source/blob/trunk/src/library/utils/R/packages2.R packages2.R] file from [https://github.com/wch/r-source Github] repository (put 'install.packages' in the search box).
==== Check installed Bioconductor version ====
Following [https://www.biostars.org/p/150920/ this post], use '''tools:::.BioC_version_associated_with_R_version()'''.

''Mind the '.' in front of the 'BioC'. It may be possible for some installed packages to have been sourced from a different BioC version.''

<syntaxhighlight lang='rsplus'>
tools:::.BioC_version_associated_with_R_version() # `3.6'
tools:::.BioC_version_associated_with_R_version() == '3.6'  # TRUE
</syntaxhighlight>


=== sparse matrix ===
[https://stackoverflow.com/a/10555270 R convert matrix or data frame to sparseMatrix]

To extract a column of a sparseMatrix as a regular vector, wrap it in '''as.vector()'''.


== Attributes ==
* [https://statisticaloddsandends.wordpress.com/2020/10/19/attributes-in-r/ Attributes in R]
* [http://adv-r.had.co.nz/Data-structures.html#attributes Data structures] in "Advanced R"


== Names ==
[https://masalmon.eu/2023/11/06/functions-dealing-with-names/ Useful functions for dealing with object names]. (Un)Setting object names: stats::setNames(), unname() and rlang::set_names()


==== CRAN Package Depends on Bioconductor Package ====
=== Print a vector by suppressing names ===
For example, if I run ''install.packages("NanoStringNorm")'' to install the [https://cran.r-project.org/web/packages/NanoStringNorm/index.html package] from CRAN, I may get
Use '''unname'''. sapply(, , USE.NAMES = FALSE).
<pre>
ERROR: dependency ‘vsn’ is not available for package ‘NanoStringNorm’
</pre>
This is because the NanoStringNorm package depends on the vsn package which is on Bioconductor.


Another instance is CRAN's 'biospear' depends on Bioc's 'survcomp'.
== format.pval/print p-values/format p values ==
[https://rdrr.io/r/base/format.pval.html format.pval()]. By default it will show 5 significant digits (getOption("digits")-2).
{{Pre}}
> set.seed(1); format.pval(c(stats::runif(5), pi^-100, NA))
[1] "0.26551" "0.37212" "0.57285" "0.90821" "0.20168" "< 2e-16" "NA"
> format.pval(c(0.1, 0.0001, 1e-27))
[1] "1e-01"  "1e-04"  "<2e-16"


One solution is to run a line '''setRepositories(ind=1:2)'''. See [http://stackoverflow.com/questions/14343817/cran-package-depends-on-bioconductor-package-installing-error this post] or [https://stackoverflow.com/questions/34617306/r-package-with-cran-and-bioconductor-dependencies this one]. Note that the default repository list can be found at (Ubuntu) '''/usr/lib/R/etc/repositories''' file.
R> pvalue
<syntaxhighlight lang='rsplus'>
[1] 0.0004632104
options("repos") # display the available repositories (only CRAN)
R> print(pvalue, digits =20)
setRepositories(ind=1:2)
[1] 0.00046321036188223807528
options("repos") # CRAN and bioc are included
R> format.pval(pvalue)
#                                        CRAN
[1] "0.00046321"
#                "https://cloud.r-project.org"
R> format.pval(pvalue * 1e-1)
# "https://bioconductor.org/packages/3.6/bioc"
[1] "4.6321e-05"
R> format.pval(0.00004632)
[1] "4.632e-05"
R> getOption("digits")
[1] 7
</pre>


install.packages("biospear") # it will prompt to select CRAN
=== Return type ===
The format.pval() function returns a string, so it’s not appropriate to use the returned object for operations like sorting.


install.packages("biospear", repos = "http://cran.rstudio.com") # NOT work since bioc repos is erased
=== Wrong number of digits in format.pval() ===
</syntaxhighlight>  
See [https://stackoverflow.com/questions/59779131/wrong-number-of-digits-in-format-pval here]. The solution is to apply round() and then format.pval().
<pre>
x <- c(6.25433625041843e-05, NA, 0.220313341361346, NA, 0.154029880744594,
  0.0378437685448703, 0.023358329881356, NA, 0.0262561986351483,
  0.000251274794673796)
format.pval(x, digits=3)
# [1] "6.25e-05" "NA"      "0.220313" "NA"      "0.154030" "0.037844" "0.023358"
# [8] "NA"      "0.026256" "0.000251"


This will also install the '''BiocInstaller''' package if it has not been installed before. See also [https://www.bioconductor.org/install/ Install Bioconductor Packages].
round(x, 3) |> format.pval(digits=3, eps=.001)
# [1] "<0.001" "NA"    "0.220"  "NA"    "0.154"  "0.038"  "0.023"  "NA"
# [9] "0.026"  "<0.001"
</pre>
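Because format.pval() returns strings, one way to keep sorting and filtering correct is to leave the numeric p-values untouched and add a formatted copy only for display. A minimal sketch; the gene names and p-values are made up for illustration:

```r
# Toy results table; gene names and p-values are made up.
res <- data.frame(gene = c("g1", "g2", "g3"),
                  p    = c(0.2, 1.2e-5, 0.04),
                  stringsAsFactors = FALSE)

# Sort and filter on the numeric column ...
res <- res[order(res$p), ]
sig <- res[res$p < 0.05, ]

# ... and format only for display, in a separate character column.
res$p_fmt <- format.pval(res$p, digits = 2, eps = 1e-4)
print(res, row.names = FALSE)
```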


=== dplyr::mutate_if() ===
<pre>
library(dplyr)
df <- data.frame(
  char_var = c("A", "B", "C"),
  num_var1 = c(1.123456, 2.123456, 3.123456),
  num_var2 = c(4.654321, 5.654321, 6.654321),
  stringsAsFactors = FALSE
)


# Round numerical variables to 4 digits after the decimal point
df_rounded <- df %>%
  mutate_if(is.numeric, round, digits = 4)
# mutate_if() is superseded in dplyr >= 1.0.0; the across() equivalent is
# df %>% mutate(across(where(is.numeric), ~ round(.x, digits = 4)))
</pre>


== Customize R: options() ==


=== Change the default R repository, my .Rprofile ===
[[Rstudio#Change_repository|Change R repository]]


Edit the global Rprofile file. On *NIX platforms, it's located in /usr/lib/R/library/base/R/Rprofile, although local '''.Rprofile''' settings take precedence.


For example, I can specify the R mirror I like by creating a '''.Rprofile''' file under my home directory. Another good choice of repository is '''cloud.r-project.org'''.


Type '''file.edit("~/.Rprofile")'''
{{Pre}}
local({
  r = getOption("repos")
  r["CRAN"] = "https://cran.rstudio.com/"
  options(repos = r)
})
options(continue = " ", editor = "nano")
message("Hi MC, loading ~/.Rprofile")
if (interactive()) {
  .Last <- function() try(savehistory("~/.Rhistory"))
}
</pre>


==== Install a tar.gz (e.g. an archived package) from a local directory ====
<syntaxhighlight lang='bash'>
R CMD INSTALL <package-name>.tar.gz
</syntaxhighlight>
Or in R:
<syntaxhighlight lang='rsplus'>
devtools::install(<pathtopackage>) # this will use 'R CMD INSTALL' to install the package.
                                   # It will try to install dependencies of the package from CRAN,
                                   # if they're not already installed.
install.packages(<pathtopackage>, repos = NULL)
</syntaxhighlight>


The installation process can be nasty due to dependency issues. Consider the 'biospear' package
<pre>
biospear - plsRcox (archived) - plsRglm (archived) - bipartite
                              - lars
                              - pls
                              - kernlab
                              - mixOmics
                              - risksetROC
                              - survcomp (Bioconductor)
                              - rms
</pre>
So in order to install the 'plsRcox' package, we need to do the following steps. Note: the plsRcox package came back to CRAN on 6/2/2018.
<syntaxhighlight lang='rsplus'>
# For curl
system("apt update")
system("apt install curl libcurl4-openssl-dev libssl-dev")


# For X11
system("apt install libcgal-dev libglu1-mesa-dev")
system("apt install libfreetype6-dev") # https://stackoverflow.com/questions/31820865/error-in-installing-rgl-package
</syntaxhighlight>


<syntaxhighlight lang='rsplus'>
# biocLite() was later replaced by BiocManager::install() (Bioconductor 3.8 and later)
source("https://bioconductor.org/biocLite.R")
biocLite("survcomp") # this has to be run before the next command of installing a bunch of packages from CRAN


install.packages("https://cran.r-project.org/src/contrib/Archive/biospear/biospear_1.0.1.tar.gz",
                repos = NULL, type="source")
# ERROR: dependencies ‘pkgconfig’, ‘cobs’, ‘corpcor’, ‘devtools’, ‘glmnet’, ‘grplasso’, ‘mboost’, ‘plsRcox’,
# ‘pROC’, ‘PRROC’, ‘RCurl’, ‘survAUC’ are not available for package ‘biospear’
install.packages(c("pkgconfig", "cobs", "corpcor", "devtools", "glmnet", "grplasso", "mboost",
                  "plsRcox", "pROC", "PRROC", "RCurl", "survAUC"))
# optional: install.packages(c("doRNG", "mvnfast"))
install.packages("https://cran.r-project.org/src/contrib/Archive/biospear/biospear_1.0.1.tar.gz",
                repos = NULL, type="source")
# OR
# devtools::install_github("cran/biospear")
library(biospear) # verify
</syntaxhighlight>


To install the deprecated Bioconductor packages 'inSilicoDb' and 'inSilicoMerging':
<syntaxhighlight lang='rsplus'>
biocLite(c('rjson', 'Biobase', 'RCurl'))


# destination directory is required
# download.file("http://www.bioconductor.org/packages/3.3/bioc/src/contrib/inSilicoDb_2.7.0.tar.gz",
#              "~/Downloads/inSilicoDb_2.7.0.tar.gz")
# download.file("http://www.bioconductor.org/packages/3.3/bioc/src/contrib/inSilicoMerging_1.15.0.tar.gz",
#              "~/Downloads/inSilicoMerging_1.15.0.tar.gz")
# ~/Downloads or $HOME/Downloads won't work in untar()
# untar("~/Downloads/inSilicoDb_2.7.0.tar.gz", exdir="/home/brb/Downloads")
# untar("~/Downloads/inSilicoMerging_1.15.0.tar.gz", exdir="/home/brb/Downloads")
# install.packages("~/Downloads/inSilicoDb", repos = NULL)
# install.packages("~/Downloads/inSilicoMerging", repos = NULL)
install.packages("http://www.bioconductor.org/packages/3.3/bioc/src/contrib/inSilicoDb_2.7.0.tar.gz",
                repos = NULL, type = "source")
install.packages("http://www.bioconductor.org/packages/3.3/bioc/src/contrib/inSilicoMerging_1.15.0.tar.gz",
                repos = NULL, type = "source")
</syntaxhighlight>
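The CRAN archive URLs used above follow a fixed pattern, so a small helper can build them. `cran_archive_url()` is a hypothetical name for illustration; the URL pattern is taken from the biospear example. As that example shows, `install.packages(..., repos = NULL)` does not resolve dependencies, so those still have to be installed first.

```r
# Hypothetical helper: build the URL of a source tarball in the CRAN archive,
# following the pattern <repo>/src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz
cran_archive_url <- function(pkg, version,
                             repo = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", repo, pkg, pkg, version)
}

url <- cran_archive_url("biospear", "1.0.1")
print(url)

# Installing needs network access; dependencies are NOT installed
# automatically when repos = NULL:
# install.packages(url, repos = NULL, type = "source")
```

The '''remotes''' package offers '''remotes::install_version("biospear", version = "1.0.1")''' as a ready-made alternative that locates the archived tarball itself.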


==== Query an R package installed locally ====
<pre>
packageDescription("MASS")
packageVersion("MASS")
</pre>


==== Query an R package (from CRAN) basic information ====
<syntaxhighlight lang='rsplus'>
packageStatus() # Summarize information about installed packages


available.packages() # List Available Packages at CRAN-like Repositories
</syntaxhighlight>
The '''available.packages()''' command is useful for understanding package dependencies. Use '''setRepositories()''' or 'RGUI -> Packages -> select repositories' to select repositories and '''options()$repos''' to check or change the repositories.


'''packageStatus()''' is another useful function for querying how many packages are in the repositories, how many have been installed, and the status of individual packages (installed or not, needs to be upgraded or not).
<syntaxhighlight lang='rsplus'>
> options()$repos
                      CRAN
"https://cran.rstudio.com/"


> packageStatus()
Number of installed packages:

                                      ok upgrade unavailable
  C:/Program Files/R/R-3.0.1/library 110      0          1


Number of available packages (each package counted only once):

                                                                                    installed not installed
  http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0                            76          4563
  http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.0                                0            5
  http://www.bioconductor.org/packages/2.12/bioc/bin/windows/contrib/3.0                  16          625
  http://www.bioconductor.org/packages/2.12/data/annotation/bin/windows/contrib/3.0        4          686
> tmp <- available.packages()
> str(tmp)
 chr [1:5975, 1:17] "A3" "ABCExtremes" "ABCp2" "ACCLMA" "ACD" "ACNE" "ADGofTest" "ADM3" "AER" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:5975] "A3" "ABCExtremes" "ABCp2" "ACCLMA" ...
  ..$ : chr [1:17] "Package" "Version" "Priority" "Depends" ...
> tmp[1:3,]
            Package      Version Priority Depends                    Imports LinkingTo Suggests
A3          "A3"          "0.9.2" NA      "xtable, pbapply"          NA      NA        "randomForest, e1071"
ABCExtremes "ABCExtremes" "1.0"  NA      "SpatialExtremes, combinat" NA      NA        NA
ABCp2      "ABCp2"      "1.1"  NA      "MASS"                      NA      NA        NA
            Enhances License      License_is_FOSS License_restricts_use OS_type Archs MD5sum NeedsCompilation File
A3          NA      "GPL (>= 2)" NA              NA                    NA      NA    NA    NA              NA
ABCExtremes NA      "GPL-2"      NA              NA                    NA      NA    NA    NA              NA
ABCp2      NA      "GPL-2"      NA              NA                    NA      NA    NA    NA              NA
            Repository
A3          "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0"
ABCExtremes "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0"
ABCp2      "http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/3.0"
</syntaxhighlight>
The following commands find which packages depend on Rcpp and which of those come from the Bioconductor repository.
<syntaxhighlight lang='rsplus'>
> pkgName <- "Rcpp"
> rownames(tmp)[grep(pkgName, tmp[,"Depends"])]
> tmp[grep("Rcpp", tmp[,"Depends"]), "Depends"]


> ind <- intersect(grep(pkgName, tmp[,"Depends"]), grep("bioconductor", tmp[, "Repository"]))
> rownames(grep)[ind]
NULL
> rownames(tmp)[ind]
 [1] "ddgraph"            "DESeq2"            "GeneNetworkBuilder" "GOSemSim"          "GRENITS"
 [6] "mosaics"            "mzR"                "pcaMethods"        "Rdisop"            "Risa"
[11] "rTANDEM"
</syntaxhighlight>


==== CRAN vs Bioconductor packages ====
<syntaxhighlight lang='rsplus'>
> R.version # 3.4.3
# CRAN
> x <- available.packages()
> dim(x)
[1] 12581    17


# Bioconductor Soft
> biocinstallRepos()
                                              BioCsoft
          "https://bioconductor.org/packages/3.6/bioc"
                                                BioCann
"https://bioconductor.org/packages/3.6/data/annotation"
                                                BioCexp
"https://bioconductor.org/packages/3.6/data/experiment"
                                                  CRAN
                            "https://cran.rstudio.com/"
> y <- available.packages(repos = biocinstallRepos()[1])
> dim(y)
[1] 1477  17
> intersect(x[, "Package"], y[, "Package"])
character(0)
# Bioconductor Annotation
> dim(available.packages(repos = biocinstallRepos()[2]))
[1] 909  17
# Bioconductor Experiment
> dim(available.packages(repos = biocinstallRepos()[3]))
[1] 326  17


# CRAN + All Bioconductor
> z <- available.packages(repos = biocinstallRepos())
> dim(z)
[1] 15292    17
</syntaxhighlight>


==== Downloading Bioconductor package with an old R ====
When I tried to download the [https://bioconductor.org/packages/release/bioc/html/GenomicDataCommons.html GenomicDataCommons] package using R 3.4.4 with Bioc 3.6 (the current R version is 3.5.0), it could only install version 1.2.0 instead of the latest version 1.4.1.


Running biocLite("BiocUpgrade") to upgrade Bioc from 3.6 to 3.7 does not work either:
<syntaxhighlight lang='rsplus'>
source("https://bioconductor.org/biocLite.R")
biocLite("BiocUpgrade")
# Error: Bioconductor version 3.6 cannot be upgraded with R version 3.4.4
</syntaxhighlight>


=== Change the default web browser for utils::browseURL() ===
When I run the help.start() function in LXLE, it cannot find its default web browser (seamonkey). The solution is to put
<pre>
options(browser='seamonkey')
</pre>
in the '''.Rprofile''' of your home directory. If the browser is not in the global PATH, we need to put its full path above.


For one-time use, we can pass the ''browser'' argument to the help.start() function:
{{Pre}}
> help.start(browser="seamonkey")
If the browser launched by 'seamonkey' is already running, it is *not*
    restarted, and you must switch to its window.
Otherwise, be patient ...
</pre>


Alternatively, we can set the R_BROWSER variable in ~/.Renviron or etc/Renviron (creating the file if needed). See
* [https://stat.ethz.ch/pipermail/r-help/2003-August/037484.html Changing default browser in options()].
* https://stat.ethz.ch/R-manual/R-devel/library/utils/html/browseURL.html


=== Change the default editor ===
On my Linux and Mac machines, the default editor is "vi". To change it to "nano",
{{Pre}}
options(editor = "nano")
</pre>


=== Change prompt and remove '+' sign ===
See https://stackoverflow.com/a/1448823.
{{Pre}}
options(prompt="R> ", continue=" ")
</pre>


=== digits ===
* [https://gist.github.com/arraytools/26a0b359541f4fc9fddc8f0a0c94489e Read and compute the sum of a numeric matrix file] using R vs Python vs C++. Note that by default R does not show digits after the decimal point because the number is large.
* [https://stackoverflow.com/a/2288013 Controlling number of decimal digits in print output in R]
* [https://stackoverflow.com/a/10712012 ?print.default]
* [https://stackoverflow.com/a/12135122 Formatting Decimal places in R, round()]. [https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/format format()], where '''nsmall''' controls the minimum number of digits to the right of the decimal point
* [https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17668 numerical error in round() causing round to even to fail] 2019-12-05
<ul>
<li>[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Round signif()] rounds x to n significant digits.
<pre>
R> signif(pi, 3)
[1] 3.14
R> signif(pi, 5)
[1] 3.1416
</pre>
</li>
</ul>
* The default of 7 digits may be too small. For example, '''if a number is very large, then we may not be able to see (enough) digits after the decimal point'''. The acceptable range is 1-22. See the following examples.
In R,
{{Pre}}
> options()$digits # Default
[1] 7
> print(.1+.2, digits=18)
[1] 0.300000000000000044
> 100000.07 + .04
[1] 100000.1
> options(digits = 16)
> 100000.07 + .04
[1] 100000.11
</pre>


In Python,
{{Pre}}
>>> 100000.07 + .04
100000.11
</pre>


=== [https://stackoverflow.com/questions/5352099/how-to-disable-scientific-notation Disable scientific notation in printing]: options(scipen) ===
[https://datasciencetut.com/how-to-turn-off-scientific-notation-in-r/ How to Turn Off Scientific Notation in R?]


This also helps with write.table() results. For example, 0.0003 won't become 3e-4 in the output file.
{{Pre}}
> numer = 29707; denom = 93874
> c(numer/denom, numer, denom)
[1] 3.164561e-01 2.970700e+04 9.387400e+04


# Method 1. Without changing the global option
> format(c(numer/denom, numer, denom), scientific=FALSE)
[1] "    0.3164561" "29707.0000000" "93874.0000000"


# Method 2. Change the global option
> options(scipen=999)
> numer/denom
[1] 0.3164561
> c(numer/denom, numer, denom)
[1]    0.3164561 29707.0000000 93874.0000000
> c(4/5, numer, denom)
[1]    0.8 29707.0 93874.0
</pre>
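Method 2 changes the option for the rest of the session. A middle ground is to raise scipen only while one expression is evaluated and then restore the previous value. `with_scipen()` below is a hypothetical helper written for illustration; the '''withr''' package provides a general-purpose `withr::with_options()` for the same job.

```r
# Hypothetical helper: evaluate `expr` with a temporary scipen value,
# restoring the previous setting afterwards (even if `expr` fails).
with_scipen <- function(expr, scipen = 999) {
  op <- options(scipen = scipen)  # options() returns the old value(s)
  on.exit(options(op), add = TRUE)
  expr                            # lazily evaluated here, under the new option
}

format(1e-4)               # scientific with the default scipen = 0
with_scipen(format(1e-4))  # "0.0001"

# Typical use: write a data frame without scientific notation
df <- data.frame(p = c(0.0003, 0.5))
out <- tempfile(fileext = ".csv")
with_scipen(write.csv(df, out, row.names = FALSE))
readLines(out)
```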


==== Analyzing data on CRAN packages ====
New undocumented function in R 3.4.0: '''tools::CRAN_package_db()'''


http://blog.revolutionanalytics.com/2017/05/analyzing-data-on-cran-packages.html


==== Install personal R packages after upgrade R, .libPaths() ====
Scenario: We have already installed many R packages under R 3.1.X in the user's directory. Now we upgrade R to a new version (3.2.X) and would like to have these packages available in R 3.2.X.


<span style="color:#0000FF">For Windows OS, refer to [http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What_0027s-the-best-way-to-upgrade_003f R for Windows FAQ]</span>

The following method works on Linux and Windows.

<span style="color:#FF0000">Make sure only one instance of R is running</span>
<pre>
# Step 1. update R's built-in packages and install them on my personal directory
update.packages(ask=FALSE, checkBuilt = TRUE, repos="http://cran.rstudio.com")


# Step 2. copy packages from the old personal library (3.1) to the new one (3.2)
.libPaths() # The first one is my personal directory
# [1] "/home/brb/R/x86_64-pc-linux-gnu-library/3.2"
# [2] "/usr/local/lib/R/site-library"
# [3] "/usr/lib/R/site-library"
# [4] "/usr/lib/R/library"

Sys.getenv("R_LIBS_USER") # equivalent to .libPaths()[1]
ul <- unlist(strsplit(Sys.getenv("R_LIBS_USER"), "/"))
src <- file.path(paste(ul[1:(length(ul)-1)], collapse="/"), "3.1")
des <- file.path(paste(ul[1:(length(ul)-1)], collapse="/"), "3.2")
pkg <- dir(src, full.names = TRUE)
if (!file.exists(des)) dir.create(des) # If 3.2 subdirectory does not exist yet!
file.copy(pkg, des, overwrite=FALSE, recursive = TRUE)

# Step 3. update Bioconductor packages
source("http://www.bioconductor.org/biocLite.R")
biocLite(ask = FALSE)
</pre>


<span style="color:#0000FF">From Robert Kabacoff (R in Action)</span>
* If you have a customized '''Rprofile.site file''' (see appendix B), save a copy outside of R.
* Launch your current version of R and issue the following statements
<pre>
oldip <- installed.packages()[,1]
save(oldip, file="path/installedPackages.Rdata")
</pre>
where ''path'' is a directory outside of R.

* Download and install the newer version of R.
* If you saved a customized version of the Rprofile.site file in step 1, copy it into the new installation.
* Launch the new version of R, and issue the following statements

<pre>
load("path/installedPackages.Rdata")
newip <- installed.packages()[,1]
for(i in setdiff(oldip, newip))
  install.packages(i)
</pre>
where path is the location specified in step 2.
* Delete the old installation (optional).

This approach will install only packages that are available from CRAN. It won't find packages obtained from other locations; in fact, the process will display a list of packages that can't be installed. For packages obtained from Bioconductor, for example, use the following method to update them
<pre>
source("http://bioconductor.org/biocLite.R")
biocLite(PKGNAME)
</pre>


=== Suppress warnings: options() and capture.output() ===
Use [https://www.rdocumentation.org/packages/base/versions/3.4.1/topics/options options()]. If ''warn'' is negative, all warnings are ignored. If ''warn'' is zero (the default), warnings are stored until the top-level function returns.
{{Pre}}
op <- options("warn")
options(warn = -1)
....
options(op)


# OR

warnLevel <- options()$warn
options(warn = -1)
...
options(warn = warnLevel)
</pre>


[https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/warning suppressWarnings()]
<pre>
suppressWarnings( foo() )


foo <- capture.output(
bar <- suppressWarnings(
{print( "hello, world" );
  warning("unwanted" )} ) )
</pre>


[https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/capture.output capture.output()]
<pre>
str(iris, max.level=1) %>% capture.output(file = "/tmp/iris.txt")
</pre>

=== Convert warnings into errors ===
Use '''options(warn=2)'''.

=== demo() function ===
<ul>
<li>[https://stackoverflow.com/a/18746519 How to wait for a keypress in R?] PS [https://stat.ethz.ch/R-manual/R-devel/library/base/html/readline.html readline()] is different from readLines().
<pre>
for(i in 1:2) { print(i); readline("Press [enter] to continue")}
</pre>
</li>
<li>Hit 'ESC' or Ctrl+c to skip the prompt "Hit <Return> to see next plot:" </li>
<li>demo() uses [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/options options()] to ask users to hit Enter on each plot
<pre>
op <- options(device.ask.default = ask) # ask = TRUE
on.exit(options(op), add = TRUE)
</pre>
</li>
</ul>
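The demo() snippet relies on options() invisibly returning the previous values, so `op <- options(...)` followed by `on.exit(options(op), add = TRUE)` restores them when the function exits, even after an error. The same idiom covers the ''warn'' and ''digits'' examples. `with_temp_options()` is a hypothetical general-purpose helper sketched for illustration:

```r
# Hypothetical helper: evaluate `code` with temporary option values.
# `new` is a named list such as list(digits = 3) or list(warn = -1).
with_temp_options <- function(new, code) {
  op <- options(new)                # set new values, keep the old ones
  on.exit(options(op), add = TRUE)  # restore on normal exit or error
  code                              # evaluated lazily, after options are set
}

with_temp_options(list(digits = 3), format(pi))      # "3.14"
with_temp_options(list(warn = -1), as.numeric("a"))  # NA, warning ignored
getOption("digits")                                  # back to the previous value
```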


==== List vignettes from a package ====
<syntaxhighlight lang='rsplus'>
vignette(package=PACKAGENAME)
</syntaxhighlight>


==== List data from a package ====
<syntaxhighlight lang='rsplus'>
data(package=PACKAGENAME)
</syntaxhighlight>


==== List installed packages and versions ====
* http://heuristicandrew.blogspot.com/2015/06/list-of-user-installed-r-packages-and.html
* [http://cran.r-project.org/web/packages/checkpoint/index.html checkpoint] package


<syntaxhighlight lang='rsplus'>
ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
unique(ip$Priority)
# [1] <NA>        base        recommended
# Levels: base recommended
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)
</syntaxhighlight>

==== Query the names of outdated packages ====
<pre>
psi <- packageStatus()$inst
subset(psi, Status == "upgrade", drop = FALSE)
#                    Package                                  LibPath    Version    Priority                Depends
# RcppArmadillo RcppArmadillo C:/Users/brb/Documents/R/win-library/3.2 0.5.100.1.0        <NA>                  <NA>
# Matrix              Matrix      C:/Program Files/R/R-3.2.0/library      1.2-0 recommended R (>= 2.15.2), methods
#                                            Imports LinkingTo                Suggests
# RcppArmadillo                      Rcpp (>= 0.11.0)     Rcpp RUnit, Matrix, pkgKitten
# Matrix        graphics, grid, stats, utils, lattice      <NA>              expm, MASS
#                                            Enhances    License License_is_FOSS License_restricts_use OS_type MD5sum
# RcppArmadillo                                  <NA> GPL (>= 2)           <NA>                  <NA>    <NA>  <NA>
# Matrix        MatrixModels, graph, SparseM, sfsmisc GPL (>= 2)            <NA>                  <NA>    <NA>  <NA>
#              NeedsCompilation Built  Status
# RcppArmadillo              yes 3.2.0 upgrade
# Matrix                    yes 3.2.0 upgrade
</pre>


The above output does not show the versions of the latest packages on CRAN. The following snippet adds them.
<pre>
psi <- packageStatus()$inst
pl <- unname(psi$Package[psi$Status == "upgrade"])  # List package names
ap <- as.data.frame(available.packages()[, c(1,2,3)], stringsAsFactors = FALSE)  # latest versions on CRAN
out <- cbind(subset(psi, Status == "upgrade")[, c("Package", "Version")], ap[match(pl, ap$Package), "Version"])
colnames(out)[2:3] <- c("OldVersion", "NewVersion")
rownames(out) <- NULL
out
#        Package  OldVersion  NewVersion
# 1 RcppArmadillo 0.5.100.1.0 0.5.200.1.0
# 2        Matrix      1.2-0      1.2-1
</pre>


To also consider the packages from Bioconductor, we have the following code. Note that "3.1" is the Bioconductor version and "3.2" is the R version. See the [http://bioconductor.org/about/release-announcements/#release-versions Bioconductor release versions] page.
<pre>
psic <- packageStatus(repos = c(contrib.url(getOption("repos")),
                                "http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2",
                                "http://www.bioconductor.org/packages/3.1/data/annotation/bin/windows/contrib/3.2"))$inst
subset(psic, Status == "upgrade", drop = FALSE)
pl <- unname(psic$Package[psic$Status == "upgrade"])


# ap <- as.data.frame(available.packages()[, c(1,2,3)], stringsAsFactors = FALSE)
ap  <- as.data.frame(available.packages(c(contrib.url(getOption("repos")),
                                "http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2",
                                "http://www.bioconductor.org/packages/3.1/data/annotation/bin/windows/contrib/3.2"))[, c(1:3)],
                      stringsAsFactors = FALSE)

out <- cbind(subset(psic, Status == "upgrade")[, c("Package", "Version")], ap[match(pl, ap$Package), "Version"])
colnames(out)[2:3] <- c("OldVersion", "NewVersion")
rownames(out) <- NULL
out
#        Package  OldVersion  NewVersion
# 1        limma      3.24.5      3.24.9
# 2 RcppArmadillo 0.5.100.1.0 0.5.200.1.0
# 3        Matrix      1.2-0      1.2-1
</pre>


==== Searching for packages in CRAN ====
* [http://blog.revolutionanalytics.com/2015/06/fishing-for-packages-in-cran.html Fishing for packages in CRAN]
* [http://blog.revolutionanalytics.com/2017/01/cran-10000.html CRAN now has 10,000 R packages. Here's how to find the ones you need]


==== [https://cran.r-project.org/web/packages/cranly/ cranly] visualisations and summaries for R packages ====
[https://rviews.rstudio.com/2018/05/31/exploring-r-packages/ Exploring R packages with cranly]


== sprintf ==
=== paste, paste0, sprintf ===
[https://www.r-bloggers.com/paste-paste0-and-sprintf/ this post], [https://www.r-bloggers.com/2023/09/3-r-functions-that-i-enjoy/ 3 R functions that I enjoy]


=== sep vs collapse in paste() ===
* sep is used if we supply '''multiple separate objects''' to paste(). A more powerful function is [https://tidyr.tidyverse.org/reference/unite.html tidyr::unite()].
* collapse is used to make the output of length 1. It is commonly used if we have only one input object.
<pre>
R> paste("a", "A", sep=",") # multi-vec -> multi-vec
[1] "a,A"
R> paste(c("Elon", "Taylor"), c("Musk", "Swift"))
[1] "Elon Musk"    "Taylor Swift"
# OR
R> sprintf("%s %s", c("Elon", "Taylor"), c("Musk", "Swift"))


R> paste(c("a", "A"), collapse="-") # one-vec/multi-vec -> length-1 string
[1] "a-A"


# When used together, sep is applied first and collapse second
R> paste(letters[1:3], LETTERS[1:3], sep=",", collapse=" - ")
[1] "a,A - b,B - c,C"
R> paste(letters[1:3], LETTERS[1:3], sep=",")
[1] "a,A" "b,B" "c,C"
R> paste(letters[1:3], LETTERS[1:3], sep=",") |> paste(collapse=" - ")
[1] "a,A - b,B - c,C"
</pre>


=== Format number as fixed width, with leading zeros ===
* https://stackoverflow.com/questions/8266915/format-number-as-fixed-width-with-leading-zeros
* https://stackoverflow.com/questions/14409084/pad-with-leading-zeros-to-common-width?rq=1

{{Pre}}
# sprintf()
a <- seq(1,101,25)
sprintf("name_%03d", a)
[1] "name_001" "name_026" "name_051" "name_076" "name_101"

# formatC()
paste("name", formatC(a, width=3, flag="0"), sep="_")
[1] "name_001" "name_026" "name_051" "name_076" "name_101"


# gsub()
paste0("bm", gsub(" ", "0", format(5:15)))
# [1] "bm05" "bm06" "bm07" "bm08" "bm09" "bm10" "bm11" "bm12" "bm13" "bm14" "bm15"
</pre>


=== formatC and prettyNum (prettifying numbers) ===
* [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/formatC formatC() & prettyNum()]
* [[R#format.pval|format.pval()]]
<pre>
R> (x <- 1.2345 * 10 ^ (-8:4))
[1] 1.2345e-08 1.2345e-07 1.2345e-06 1.2345e-05 1.2345e-04 1.2345e-03
[7] 1.2345e-02 1.2345e-01 1.2345e+00 1.2345e+01 1.2345e+02 1.2345e+03
[13] 1.2345e+04
R> formatC(x)
[1] "1.234e-08" "1.234e-07" "1.234e-06" "1.234e-05" "0.0001234" "0.001234"
[7] "0.01235"  "0.1235"    "1.234"    "12.34"    "123.4"    "1234"
[13] "1.234e+04"
R> formatC(x, digits=3)
[1] "1.23e-08" "1.23e-07" "1.23e-06" "1.23e-05" "0.000123" "0.00123"
[7] "0.0123"   "0.123"    "1.23"    "12.3"    " 123"    "1.23e+03"
[13] "1.23e+04"
R> formatC(x, digits=3, format="e")
[1] "1.234e-08" "1.234e-07" "1.234e-06" "1.234e-05" "1.234e-04" "1.234e-03"
[7] "1.235e-02" "1.235e-01" "1.234e+00" "1.234e+01" "1.234e+02" "1.234e+03"
[13] "1.234e+04"


R> x <- .000012345
R> prettyNum(x)
[1] "1.2345e-05"
R> x <- .00012345
R> prettyNum(x)
[1] "0.00012345"
</pre>


=== format(x, scientific = TRUE) vs round() vs format.pval() ===
Print numeric data in exponential format, so .0001 prints as 1e-4
<syntaxhighlight lang='r'>
format(c(0.00001156, 0.84134, 2.1669), scientific = TRUE, digits=4)
# [1] "1.156e-05" "8.413e-01" "2.167e+00"
round(c(0.00001156, 0.84134, 2.1669), digits=4)
# [1] 0.0000 0.8413 2.1669


format.pval(c(0.00001156, 0.84134, 2.1669)) # output is char vector
# [1] "1.156e-05" "0.84134"  "2.16690"
format.pval(c(0.00001156, 0.84134, 2.1669), digits=4)
# [1] "1.156e-05" "0.8413"    "2.1669"
</syntaxhighlight>
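When small values must keep a fixed number of significant digits rather than decimal places, signif(), formatC() and sprintf() are alternatives to round(), which turns 0.00001156 into 0 above. A short comparison on a single value (note that formatC()'s ''digits'' with format = "e" counts digits after the decimal point, unlike format()):

```r
x <- 0.00001156

round(x, 4)                               # 0: decimal places, small values vanish
signif(x, 3)                              # numeric, keeps 3 significant digits
format(x, scientific = TRUE, digits = 3)  # "1.16e-05"
formatC(x, format = "e", digits = 2)      # "1.16e-05": 2 digits after the point
sprintf("%.2e", x)                        # "1.16e-05"
```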


==== Query top downloaded packages ====
== Creating publication quality graphs in R ==
* [https://github.com/metacran/cranlogs cranlogs] package - Download Logs from the RStudio CRAN Mirror
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/
* http://blog.revolutionanalytics.com/2015/06/working-with-the-rstudio-cran-logs.html


==== Would you like to use a personal library instead? ====
== HDF5 : Hierarchical Data Format==
The problem can happen if the R was installed to the C:\Program Files\R folder by ''users'' but then some main packages want to be upgraded. R will always pops a message 'Would you like to use a personal library instead?'.  
HDF5 is an open binary file format for storing and managing large, complex datasets. The file format was developed by the HDF Group, and is widely used in scientific computing.


To suppress the message and use the personal library always,
* https://en.wikipedia.org/wiki/Hierarchical_Data_Format
* Run R as administrator. If you do that, main packages can be upgraded from C:\Program Files\R\R-X.Y.Z\library folder.
* [https://support.hdfgroup.org/HDF5/ HDF5 tutorial] and others
* [[Main_Page#Writable_R_package_directory_cannot_be_found|Writable R package directory cannot be found]] and a [[Main_Page#Download_required_R.2FBioconductor_.28software.29_packages|this]]. A solution here is to change the security of the R library folder so the user has a full control on the folder.
* [http://www.bioconductor.org/packages/release/bioc/html/rhdf5.html rhdf5] package
* [https://cran.r-project.org/bin/windows/base/rw-FAQ.html#Does-R-run-under-Windows-Vista_003f Does R run under Windows Vista/7/8/Server 2008?] There are 3 ways to get around the issue.
* rhdf5 is used by [http://amp.pharm.mssm.edu/archs4/data.html ARCHS4] where you can download R program that will download hdf5 file storing expression and metadata such as gene ID, sample/GSM ID, tissues, et al.
* [https://cran.r-project.org/bin/windows/base/rw-FAQ.html#I-don_0027t-have-permission-to-write-to-the-R_002d3_002e3_002e2_005clibrary-directory I don’t have permission to write to the R-3.3.2\library directory]


Actually the following hints will help us to create a convenient function UpdateMainLibrary() which will install updated main packages in the user's ''Documents'' directory without a warning dialog.
== Formats for writing/saving and sharing data ==
* '''.libPaths()''' only returns 1 string "C:/Program Files/R/R-x.y.z/library" on the machines that does not have this problem
[http://www.econometricsbysimulation.com/2016/12/efficiently-saving-and-sharing-data-in-r_46.html Efficiently Saving and Sharing Data in R]
* '''.libPaths()''' returns two strings "C:/Users/USERNAME/Documents/R/win-library/x.y" & "C:/Program Files/R/R-x.y.z/library" on machines with the problem.
<syntaxhighlight lang='rsplus'>
UpdateMainLibrary <- function() {
  # Update outdated main/site packages by installing newer copies into the
  # personal library. Used to avoid the prompt
  # 'Would you like to use a personal library instead?'
  if (length(.libPaths()) == 1) return(invisible(NULL))

  ind_mloc <- grep("Program", .libPaths())   # main library, e.g. 2
  ind_ploc <- grep("Documents", .libPaths()) # personal library, e.g. 1
  if (length(ind_mloc) > 0L && length(ind_ploc) > 0L) {
    # outdated packages in the main library (NULL if nothing is outdated)
    old_main <- old.packages(.libPaths()[ind_mloc[1]])
    if (is.null(old_main)) return(invisible(NULL))
    # skip packages that already have a copy in the personal library
    in_personal <- old_main[, "Package"] %in%
      installed.packages(.libPaths()[ind_ploc[1]])[, "Package"]
    oldpac <- old_main[!in_personal, "Package"]
    if (length(oldpac) > 0L)
      install.packages(oldpac, .libPaths()[ind_ploc[1]])
  }
}
</syntaxhighlight>


==== Warning: cannot remove prior installation of package ====
== Write unix format files on Windows and vice versa ==
http://stackoverflow.com/questions/15932152/unloading-and-removing-a-loaded-package-withouth-restarting-r
https://stat.ethz.ch/pipermail/r-devel/2012-April/063931.html


Instance 1.
== with() and within() functions ==
* [https://www.r-bloggers.com/2023/07/simplify-your-code-with-rs-powerful-functions-with-and-within/ Simplify Your Code with R’s Powerful Functions: with() and within()]
* within() is similar to with() except that it returns a modified copy of the data set, so it is used to create new columns and merge them with the original data. If we just want to create a single new column, we can simply use df$newVar <- .... The following example is from [http://www.youtube.com/watch?v=pZ6Bnxg9E8w&list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP youtube video].
<pre>
<pre>
# Install the latest hgu133plus2cdf package
closePr <- with(mariokart, totalPr - shipPr)
# Remove/Uninstall hgu133plus2.db package
head(closePr, 20)
# Put/Install an old version of IRanges (eg version 1.18.2 while currently it is version 1.18.3)
# Test on R 3.0.1
library(hgu133plus2cdf) # hgu133pluscdf does not depend or import IRanges
source("http://bioconductor.org/biocLite.R")
biocLite("hgu133plus2.db", ask=FALSE) # hgu133plus2.db imports IRanges
# Warning:cannot remove prior installation of package 'IRanges'
# Open Windows Explorer and check IRanges folder. Only see libs subfolder.
</pre>


Note:
mk <- within(mariokart, {
* In the above example, all packages were installed under C:\Program Files\R\R-3.0.1\library\.
            closePr <- totalPr - shipPr
* In another instance where I could not reproduce the problem, new R packages were installed under C:\Users\xxx\Documents\R\win-library\3.0\. The difference is that the IRanges package CAN be updated, but packageVersion("IRanges") in R still shows the old version.
    })
* The above were tested on a desktop.
head(mk) # new column closePr


Instance 2.
mk <- mariokart
<pre>
aggregate(. ~ wheels + cond, mk, mean)
# On a fresh R 3.2.0, I install Bioconductor's pkgDepTools & lumi packages. Then I close R, re-open it,
# create mean according to each level of (wheels, cond)
# and install the pkgDepTools package again.
> source("http://bioconductor.org/biocLite.R")
Bioconductor version 3.1 (BiocInstaller 1.18.2), ?biocLite for help
> biocLite("pkgDepTools")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.2), R version 3.2.0.
Installing package(s) ‘pkgDepTools’
trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/pkgDepTools_1.34.0.zip'
Content type 'application/zip' length 390579 bytes (381 KB)
downloaded 381 KB


package ‘pkgDepTools’ successfully unpacked and MD5 sums checked
aggregate(totalPr ~ wheels + cond, mk, mean)
Warning: cannot remove prior installation of package ‘pkgDepTools’


The downloaded binary packages are in
tapply(mk$totalPr, mk[, c("wheels", "cond")], mean)
        C:\Users\brb\AppData\Local\Temp\RtmpYd2l7i\downloaded_packages
> library(pkgDepTools)
Error in library(pkgDepTools) : there is no package called ‘pkgDepTools’
</pre>
</pre>
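A self-contained sketch of the same with()/within()/aggregate()/tapply() pattern. The mariokart data comes from the openintro package, so this version swaps in the built-in mtcars data (an assumption made only so the example runs anywhere); the new column name hp_per_wt is hypothetical.

<syntaxhighlight lang='rsplus'>
# with(): evaluate an expression using the columns of a data frame
hp_per_wt <- with(mtcars, hp / wt)

# within(): return a modified copy of the data frame with a new column
mt <- within(mtcars, {
  hp_per_wt <- hp / wt
})

# Mean mpg for each combination of cyl and am (formula interface)
agg <- aggregate(mpg ~ cyl + am, data = mt, FUN = mean)
agg

# The same means laid out as a two-way table
tab <- tapply(mt$mpg, mt[, c("cyl", "am")], mean)
tab
</syntaxhighlight>

Note that within() does not change mtcars itself; the result has to be assigned (here to mt).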
The pkgDepTools library folder appears in C:\Users\brb\Documents\R\win-library\3.2, but it is empty. Oddly, if I repeat the above steps, I cannot reproduce the problem.


==== Warning: Unable to move temporary installation ====
== stem(): stem-and-leaf plot (alternative to histogram), bar chart on terminals ==
The problem seems to happen only on virtual machines (VirtualBox).
* https://en.wikipedia.org/wiki/Stem-and-leaf_display
* '''Warning: unable to move temporary installation `C:\Users\brb\Documents\R\win-library\3.0\fileed8270978f5\quadprog`  to `C:\Users\brb\Documents\R\win-library\3.0\quadprog`''' when I run install.packages("forecast").
* [https://www.dataanalytics.org.uk/tally-plots-in-r/ Tally plots in R]
* '''Warning: unable to move temporary installation ‘C:\Users\brb\Documents\R\win-library\3.2\file5e0104b5b49\plyr’ to ‘C:\Users\brb\Documents\R\win-library\3.2\plyr’ ''' when I run biocLite("lumi"). The other dependency packages look fine, although I was not sure whether some unknown problem could occur (it does; see below).
* https://stackoverflow.com/questions/14736556/ascii-plotting-functions-for-r
* [https://cran.r-project.org/web/packages/txtplot/index.html txtplot] package
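stem() prints a text stem-and-leaf display to the console, so it works in a plain terminal without any graphics device. A minimal sketch using the built-in faithful data; capture.output() is used only to keep the printed lines as a character vector.

<syntaxhighlight lang='rsplus'>
# Text stem-and-leaf display of the eruption durations (printed to console)
stem(faithful$eruptions)

# scale > 1 spreads the display over more lines (finer stems)
stem(faithful$eruptions, scale = 2)

# Keep the printed display as a character vector, e.g. to write to a log file
lines_out <- capture.output(stem(faithful$eruptions))
</syntaxhighlight>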


Here are my troubleshooting notes.
== Plot histograms as lines ==
# If I ignore the warning and load the lumi package, I get an error.
https://stackoverflow.com/a/16681279. This is useful when we want to compare the distributions of different statistics.
# If I run biocLite("lumi") again, it only downloads & installs lumi without checking for the missing 'plyr' package. Therefore, loading the lumi package gives an error again.
# Even if I install the plyr package manually, library(lumi) gives another error: the mclust package is missing.
<pre>
<pre>
> biocLite("lumi")
x2 <- hist(out2$EB, plot = FALSE)   # compute bins/densities without drawing
trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/BiocInstaller_1.18.2.zip'
y2 <- hist(out2$Bench, plot = FALSE)
Content type 'application/zip' length 114097 bytes (111 KB)
z2 <- hist(out2$EB0.001, plot = FALSE)
downloaded 111 KB
...
package ‘lumi’ successfully unpacked and MD5 sums checked


The downloaded binary packages are in
plot(x=x2$mids, y=x2$density, type="l")
        C:\Users\brb\AppData\Local\Temp\RtmpyUjsJD\downloaded_packages
lines(y2$mids, y2$density, lty=2, lwd=2)
Old packages: 'BiocParallel', 'Biostrings', 'caret', 'DESeq2', 'gdata', 'GenomicFeatures', 'gplots', 'Hmisc', 'Rcpp', 'RcppArmadillo', 'rgl',
lines(z2$mids, z2$density, lty=3, lwd=2)
  'stringr'
</pre>
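A self-contained sketch of plotting histograms as lines. The out2 data frame is not shown on this page, so the columns below are simulated stand-ins (hypothetical data). The sketch uses hist(..., plot = FALSE) to get the bin midpoints and densities without drawing, a common set of breaks so the curves are comparable, and lwd (line width) for the extra lines; pdf(NULL) is only there so it runs non-interactively.

<syntaxhighlight lang='rsplus'>
set.seed(1)
# Hypothetical data standing in for out2$EB, out2$Bench, out2$EB0.001
out2 <- data.frame(EB = rnorm(500), Bench = rnorm(500, sd = 1.5),
                   EB0.001 = rnorm(500, mean = 0.5))

# Common bins for all three columns; compute histograms without drawing them
breaks <- pretty(range(unlist(out2)), n = 30)
h <- lapply(out2, hist, breaks = breaks, plot = FALSE)

pdf(NULL)  # null device; remove this line (and dev.off()) to plot on screen
ymax <- max(sapply(h, function(z) max(z$density)))
plot(h$EB$mids, h$EB$density, type = "l", ylim = c(0, ymax),
     xlab = "value", ylab = "density")
lines(h$Bench$mids, h$Bench$density, lty = 2, lwd = 2)
lines(h$EB0.001$mids, h$EB0.001$density, lty = 3, lwd = 2)
legend("topright", legend = names(h), lty = 1:3, lwd = c(1, 2, 2))
invisible(dev.off())
</syntaxhighlight>

Setting ylim from the largest density keeps the dashed and dotted curves from being clipped by the first plot() call.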
Update all/some/none? [a/s/n]: a
also installing the dependencies ‘Rsamtools’, ‘GenomicAlignments’, ‘plyr’, ‘rtracklayer’, ‘gridExtra’, ‘stringi’, ‘magrittr’


trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/Rsamtools_1.20.1.zip'
== Histogram with density line ==
Content type 'application/zip' length 8138197 bytes (7.8 MB)
<pre>
downloaded 7.8 MB
hist(x, prob = TRUE)
...
lines(density(x), col = 4, lwd = 2)
package ‘Rsamtools’ successfully unpacked and MD5 sums checked
</pre>
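A runnable version of the histogram-plus-density recipe on a simulated bimodal sample (the data are an assumption; x is not defined on this page). prob = TRUE puts the histogram on the density scale; without it the bars show counts and the density curve would look almost flat.

<syntaxhighlight lang='rsplus'>
set.seed(42)
x <- c(rnorm(300), rnorm(200, mean = 4))   # simulated bimodal sample

pdf(NULL)  # null device so the sketch runs non-interactively
hist(x, prob = TRUE, breaks = 30, main = "Histogram with density line")
d <- density(x)               # kernel density estimate
lines(d, col = 4, lwd = 2)    # col = 4 is blue in the default palette
invisible(dev.off())
</syntaxhighlight>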
package ‘GenomicAlignments’ successfully unpacked and MD5 sums checked
The overlaid density may look strange for data such as counts from single-cell RNASeq or p-values from RNASeq (there is a peak around x=0).
package ‘plyr’ successfully unpacked and MD5 sums checked
Warning: unable to move temporary installation ‘C:\Users\brb\Documents\R\win-library\3.2\file5e0104b5b49\plyr’
        to ‘C:\Users\brb\Documents\R\win-library\3.2\plyr’
package ‘rtracklayer’ successfully unpacked and MD5 sums checked
package ‘gridExtra’ successfully unpacked and MD5 sums checked
package ‘stringi’ successfully unpacked and MD5 sums checked
package ‘magrittr’ successfully unpacked and MD5 sums checked
package ‘BiocParallel’ successfully unpacked and MD5 sums checked
package ‘Biostrings’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘Biostrings’
package ‘caret’ successfully unpacked and MD5 sums checked
package ‘DESeq2’ successfully unpacked and MD5 sums checked
package ‘gdata’ successfully unpacked and MD5 sums checked
package ‘GenomicFeatures’ successfully unpacked and MD5 sums checked
package ‘gplots’ successfully unpacked and MD5 sums checked
package ‘Hmisc’ successfully unpacked and MD5 sums checked
package ‘Rcpp’ successfully unpacked and MD5 sums checked
package ‘RcppArmadillo’ successfully unpacked and MD5 sums checked
package ‘rgl’ successfully unpacked and MD5 sums checked
package ‘stringr’ successfully unpacked and MD5 sums checked


The downloaded binary packages are in
== Graphical Parameters, Axes and Text, Combining Plots ==
        C:\Users\brb\AppData\Local\Temp\RtmpyUjsJD\downloaded_packages
[http://www.statmethods.net/advgraphs/axes.html statmethods.net]
> library(lumi)
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
  there is no package called ‘plyr’
Error: package or namespace load failed for ‘lumi’
> search()
[1] ".GlobalEnv"            "package:BiocInstaller" "package:Biobase"      "package:BiocGenerics"  "package:parallel"      "package:stats"       
[7] "package:graphics"      "package:grDevices"    "package:utils"        "package:datasets"      "package:methods"      "Autoloads"           
[13] "package:base"       
> biocLite("lumi")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.2), R version 3.2.0.
Installing package(s) ‘lumi’
trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/lumi_2.20.1.zip'
Content type 'application/zip' length 18185326 bytes (17.3 MB)
downloaded 17.3 MB


package ‘lumi’ successfully unpacked and MD5 sums checked
== 15 Questions All R Users Have About Plots ==
See [https://www.datacamp.com/tutorial/15-questions-about-r-plots 15 Questions All R Users Have About Plots]. This is a tremendous post. It covers the built-in plot() function and ggplot() from ggplot2 package.


The downloaded binary packages are in
# How To Draw An Empty R Plot? plot.new()
        C:\Users\brb\AppData\Local\Temp\RtmpyUjsJD\downloaded_packages
# How To Set The Axis Labels And Title Of The R Plots?
> search()
# How To Add And Change The Spacing Of The Tick Marks Of Your R Plot? axis()  
[1] ".GlobalEnv"            "package:BiocInstaller" "package:Biobase"      "package:BiocGenerics"  "package:parallel"      "package:stats"       
# How To Create Two Different X- or Y-axes? par(new=TRUE), axis(), mtext(). [https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/par ?par].
[7] "package:graphics"      "package:grDevices"    "package:utils"        "package:datasets"      "package:methods"      "Autoloads"           
# How To Add Or Change The R Plot’s Legend? legend()
[13] "package:base"       
# How To Draw A Grid In Your R Plot? [https://r-charts.com/base-r/grid/ grid()]
> library(lumi)
# How To Draw A Plot With A PNG As Background? rasterImage() from the '''png''' package
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
# How To Adjust The Size Of Points In An R Plot? cex argument
  there is no package called ‘plyr’
# How To Fit A Smooth Curve To Your R Data? loess() and lines()
Error: package or namespace load failed for ‘lumi’
# How To Add Error Bars In An R Plot? arrows()
> biocLite("plyr")
# How To Save A Plot As An Image On Disc
BioC_mirror: http://bioconductor.org
# How To Plot Two R Plots Next To Each Other? '''par(mfrow)'''[which means Multiple Figures (use ROW-wise)], '''gridBase''' package, '''lattice''' package
Using Bioconductor version 3.1 (BiocInstaller 1.18.2), R version 3.2.0.
# How To Plot Multiple Lines Or Points? plot(), lines()
Installing package(s) ‘plyr’
# How To Fix The Aspect Ratio For Your R Plots? asp parameter
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/plyr_1.8.2.zip'
# What Is The Function Of hjust And vjust In ggplot2?
Content type 'application/zip' length 1128621 bytes (1.1 MB)
downloaded 1.1 MB


package ‘plyr’ successfully unpacked and MD5 sums checked
== jitter function ==
* https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/jitter
** jitter(x, amount = a) adds uniform random noise between -a and a (i.e. runif(n, -a, a)) to each element of x
* [https://stackoverflow.com/a/17552046 What does the “jitter” function do in R?]
* [https://www.r-bloggers.com/2023/09/when-to-use-jitter/ When to use Jitter]
* [https://stats.stackexchange.com/a/146174 How to calculate Area Under the Curve (AUC), or the c-statistic, by hand]
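A small sketch showing what amount does: every jittered value stays within amount of the original, which is what makes jitter() useful for spreading tied points in a scatter or strip plot. The grouping data are made up for illustration.

<syntaxhighlight lang='rsplus'>
set.seed(123)
x <- rep(1:3, each = 5)              # heavily tied x values

# amount = 0.2: each value gets runif(n, -0.2, 0.2) added
xj <- jitter(x, amount = 0.2)

# Typical use: spread tied points horizontally in a scatter plot
pdf(NULL)
plot(xj, rnorm(length(x)), xlab = "group (jittered)", ylab = "y")
invisible(dev.off())
</syntaxhighlight>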


The downloaded binary packages are in
:[[File:Jitterbox.png|200px]]
        C:\Users\brb\AppData\Local\Temp\RtmpyUjsJD\downloaded_packages


> library(lumi)
== Scatterplot with the "rug" function ==
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
<pre>
  there is no package called ‘mclust’
require(stats) # both 'density' and its default method
Error: package or namespace load failed for ‘lumi’
with(faithful, {
 
    plot(density(eruptions, bw = 0.15))
> ?biocLite
    rug(eruptions)
Warning messages:
    rug(jitter(eruptions, amount = 0.01), side = 3, col = "light blue")
1: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
})
  cannot open compressed file 'C:/Users/brb/Documents/R/win-library/3.2/Biostrings/DESCRIPTION', probable reason 'No such file or directory'
2: In find.package(if (is.null(package)) loadedNamespaces() else package,  :
  there is no package called ‘Biostrings’
> library(lumi)
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
  there is no package called ‘mclust’
In addition: Warning messages:
1: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
  cannot open compressed file 'C:/Users/brb/Documents/R/win-library/3.2/Biostrings/DESCRIPTION', probable reason 'No such file or directory'
2: In find.package(if (is.null(package)) loadedNamespaces() else package,  :
  there is no package called ‘Biostrings’
Error: package or namespace load failed for ‘lumi’
</pre>
</pre>
[[:File:RugFunction.png]]


[http://r.789695.n4.nabble.com/unable-to-move-temporary-installation-td4521714.html Other people have also had a similar problem]. A possible cause is that a virus scanner locks the files, so R cannot move them.
See also the [https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/stripchart.html stripchart()] function which produces one dimensional scatter plots (or dot plots) of the given data.
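A minimal stripchart() sketch on the built-in faithful data. The split into "long wait"/"short wait" groups at the median waiting time is an illustrative choice, not something from the original; method = "jitter" spreads tied values, in the same spirit as the rug(jitter(...)) call above.

<syntaxhighlight lang='rsplus'>
# Illustrative grouping: waiting time above/below its median
grp <- ifelse(faithful$waiting > median(faithful$waiting),
              "long wait", "short wait")
dat <- data.frame(eruptions = faithful$eruptions, grp = grp)

pdf(NULL)
# One-dimensional scatter plot of eruption duration for each group
stripchart(eruptions ~ grp, data = dat, method = "jitter", jitter = 0.1,
           pch = 16, xlab = "eruption duration (min)")
invisible(dev.off())
</syntaxhighlight>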


Some possible solutions:
== Identify/Locate Points in a Scatter Plot ==
# Delete ALL folders under R/library (e.g. C:/Progra~1/R/R-3.2.0/library) folder and install the main package again using install.packages() or biocLite().
<ul>
# For specific package like 'lumi' from Bioconductor, we can [[R#Bioconductor.27s_pkgDepTools_package|find out all dependency packages]] and then install them one by one.
<li>[https://www.rdocumentation.org/packages/graphics/versions/3.5.1/topics/identify ?identify]
# Find out and install the top-level package whose dependency packages are missing.
<li>[https://stackoverflow.com/a/23234142 Using the identify function in R]
## This is based on the fact that install.packages() or biocLite() '''sometimes''' just checks & installs the 'Depends' and 'Imports' packages and '''won't install all packages recursively'''
## We can do a small experiment by removing a package which is not directly depended on/imported by another package; e.g. 'iterators' is not depended on/imported by 'glmnet' directly but only indirectly. So if we run '''remove.packages("iterators"); install.packages("glmnet")''', the 'iterators' package is still missing.
## A real example is if the missing packages are 'Biostrings', 'limma', 'mclust' (these are packages that 'minfi' directly depends/imports although they should be installed when I run biocLite("lumi") command), then I should just run the command '''remove.packages("minfi"); biocLite("minfi")'''. If we just run biocLite("lumi") or biocLite("methylumi"), the missing packages won't be installed.
 
==== Error in download.file(url, destfile, method, mode = "wb", ...) ====
HTTP status was '404 Not Found'
 
Tested on an existing R-3.2.0 session. Note that VariantAnnotation 1.14.4 was just uploaded to Bioc.
<pre>
<pre>
> biocLite("COSMIC.67")
plot(x, y)
BioC_mirror: http://bioconductor.org
identify(x, y, labels = names, plot = TRUE)  
Using Bioconductor version 3.1 (BiocInstaller 1.18.3), R version 3.2.0.
# Use left clicks to select points we want to identify and "esc" to stop the process
Installing package(s) ‘COSMIC.67’
# This will put the labels on the plot and also return the indices of points
also installing the dependency ‘VariantAnnotation’
# [1] 143
names[143]
</pre>
</ul>


trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/VariantAnnotation_1.14.3.zip'
== Draw a single plot with two different y-axes ==
Error in download.file(url, destfile, method, mode = "wb", ...) :
* http://www.gettinggeneticsdone.com/2015/04/r-single-plot-with-two-different-y-axes.html
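A base-graphics sketch of the two-y-axes technique (par(new = TRUE), axis(), mtext()). The two series x^2 and log(x) are made up so that their scales clearly differ; the extra right margin leaves room for the second axis label.

<syntaxhighlight lang='rsplus'>
x  <- 1:10
y1 <- x^2      # plotted against the left axis
y2 <- log(x)   # plotted against the right axis (different scale)

pdf(NULL)
op <- par(mar = c(5, 4, 4, 5) + 0.1)   # widen the right margin
plot(x, y1, type = "b", pch = 16, ylab = "x^2")
par(new = TRUE)                         # draw the next plot on top
plot(x, y2, type = "b", pch = 17, col = "red",
     axes = FALSE, xlab = "", ylab = "")
axis(side = 4, col.axis = "red")        # right-hand axis for y2
mtext("log(x)", side = 4, line = 3, col = "red")
par(op)                                 # restore the old margins
invisible(dev.off())
</syntaxhighlight>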
  cannot open URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/VariantAnnotation_1.14.3.zip'
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
  cannot open: HTTP status was '404 Not Found'
Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  download of package ‘VariantAnnotation’ failed
installing the source package ‘COSMIC.67’


trying URL 'http://bioconductor.org/packages/3.1/data/experiment/src/contrib/COSMIC.67_1.4.0.tar.gz'
== Draw Color Palette ==
Content type 'application/x-gzip' length 40999037 bytes (39.1 MB)
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/
</pre>


However, when I tested on a new R-3.2.0 (just installed in a VM), the COSMIC package installation was successful, and VariantAnnotation version 1.14.4 was installed (this version had just been updated on Bioconductor that day).
=== Default palette before R 4.0 ===
palette() # black, red, green3, blue, cyan, magenta, yellow, gray


The cause of the error is that the '''[https://github.com/wch/r-source/blob/trunk/src/library/utils/R/packages.R available.packages()]''' function first reads the rds file from a cache in tempdir() (C:\Users\XXXX\AppData\Local\Temp\RtmpYYYYYY). See lines 51-55 of <packages.R>.
<pre>
<pre>
dest <- file.path(tempdir(),
# Example from Coursera "Statistics for Genomic Data Science" by Jeff Leek
                  paste0("repos_", URLencode(repos, TRUE), ".rds"))
tropical = c('darkorange', 'dodgerblue', 'hotpink', 'limegreen', 'yellow')
if(file.exists(dest)) {
palette(tropical)
    res0 <- readRDS(dest)
plot(1:5, 1:5, col=1:5, pch=16, cex=5)
} else {
    ...
</pre>
</pre>
Since my R session had been opened one week earlier, the cached rds file it read contained outdated information. Note that Bioconductor does not keep the source or binary files of old package versions. This explains why the biocLite() function broke. When I restarted R, the problem was gone.


Looking at the source code of available.packages(), we see that the '''cacheOK''' option of the download.file() function could be used.
=== New palette in R 4.0.0 ===
[https://youtu.be/I4k0LkTOKvU?t=464 R 4.0: 3 new features], [https://blog.revolutionanalytics.com/2020/04/r-400-is-released.html R 4.0.0 now available, and a look back at R's history]. For example, we can select the "ggplot2" palette so that base graphics charts match the color scheme of ggplot2.
<pre>
<pre>
download.file(url, destfile, method, cacheOK = FALSE, quiet = TRUE, mode ="wb")
R> palette()
[1] "black"  "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
[8] "gray62"
R> palette.pals()
[1] "R3"              "R4"              "ggplot2"       
[4] "Okabe-Ito"      "Accent"          "Dark 2"       
[7] "Paired"          "Pastel 1"        "Pastel 2"     
[10] "Set 1"          "Set 2"          "Set 3"         
[13] "Tableau 10"      "Classic Tableau" "Polychrome 36" 
[16] "Alphabet"
R> palette.colors(palette='R4') # same as palette()
[1] "#000000" "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
[8] "#9E9E9E"
R> palette("R3")  # nothing is printed, but the palette has changed
R> palette()
[1] "black"  "red"    "green3"  "blue"    "cyan"    "magenta" "yellow"
[8] "gray" 
R> palette("R4") # reset to the default color palette; OR palette("default")
 
R> scales::show_col(palette.colors(palette = "Okabe-Ito"))
R> for(id in palette.pals()) {
    scales::show_col(palette.colors(palette = id))
    title(id)
    readline("Press [enter] to continue")  
  }
</pre>
</pre>
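Because palette() changes a global setting, it can be handy to switch palettes temporarily and then restore the previous one. A minimal sketch (requires R >= 4.0 for the named palettes; the choice of "Okabe-Ito" is only an example):

<syntaxhighlight lang='rsplus'>
old_pal <- palette()          # remember the current palette
palette("Okabe-Ito")          # switch to a built-in palette by name

pdf(NULL)
# Factor codes (1, 2, 3) index into the current palette
with(iris, plot(Sepal.Length, Petal.Length, col = Species, pch = 16))
invisible(dev.off())

new_pal <- palette()          # the palette that was in effect for the plot
palette(old_pal)              # restore the previous palette
</syntaxhighlight>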
 
The '''palette''' function can also be used to change the color palette. See [https://data.library.virginia.edu/setting-up-color-palettes-in-r/ Setting up Color Palettes in R]
==== Another case: Error in download.file(url, destfile, method, mode = "wb", ...) ====
<pre>
<pre>
> install.packages("quantreg")
palette("ggplot2")
 
palette(palette()[-1]) # Remove 'black'
  There is a binary version available but the source version is later:
  # OR palette(palette.colors(palette = "ggplot2")[-1] )
        binary source needs_compilation
with(iris, plot(Sepal.Length, Petal.Length, col = Species, pch=16))
quantreg  5.33  5.34              TRUE


Do you want to install from sources the package which needs compilation?
cc <- palette()
y/n: n
palette(c(cc,"purple","brown")) # Add two colors
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/quantreg_5.33.tgz'
Warning in install.packages :
  cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/quantreg_5.33.tgz': HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) :
  cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/quantreg_5.33.tgz'
Warning in install.packages :
  download of package ‘quantreg’ failed
</pre>
</pre>
It seems the binary package could not be found on the mirror, so the solution here is to download the package from the main R server. Note that after I successfully installed the binary package from the main R server, I removed it in R and tried to install the binary package from the rstudio.com mirror again, and this time it worked.
<pre>
<pre>
> install.packages("quantreg", repos = "https://cran.r-project.org")
R> colors() |> length() # [1] 657
trying URL 'https://cran.r-project.org/bin/macosx/el-capitan/contrib/3.4/quantreg_5.34.tgz'
R> colors(distinct = T) |> length() # [1] 502
Content type 'application/x-gzip' length 1863561 bytes (1.8 MB)
==================================================
downloaded 1.8 MB
</pre>
</pre>


==== Another case: Error in download.file() on Windows 7 ====
=== evoPalette ===
For some reason, IE 8 cannot interpret https://ftp.ncbi.nlm.nih.gov though it understands ftp://ftp.ncbi.nlm.nih.gov.
[http://gradientdescending.com/evolve-new-colour-palettes-in-r-with-evopalette/ Evolve new colour palettes in R with evoPalette]


This is tested using R 3.4.3.
=== rtist ===
<pre>
[https://github.com/tomasokal/rtist?s=09 rtist]: Use the palettes of famous artists in your own visualizations.
> download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7848/soft/GSE7848_family.soft.gz", "test.soft.gz")
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7848/soft/GSE7848_family.soft.gz'
Error in download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7848/soft/GSE7848_family.soft.gz",  :
  cannot open URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7848/soft/GSE7848_family.soft.gz'
In addition: Warning message:
In download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7848/soft/GSE7848_family.soft.gz",  :
  InternetOpenUrl failed: 'An error occurred in the secure channel support'


> download.file("ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7848/soft/GSE7848_family.soft.gz", "test.soft.gz")
== SVG ==
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7848/soft/GSE7848_family.soft.gz'
=== Embed svg in html ===
downloaded 9.1 MB
* http://www.magesblog.com/2016/02/using-svg-graphics-in-blog-posts.html
 
=== svglite ===
svglite produces better SVG output than R's built-in svg() device. It is used by ggsave() to save .svg files.
[https://www.rstudio.com/blog/svglite-1-2-0/ svglite 1.2.0], [https://r-graphics.org/recipe-output-vector-svg R Graphics Cookbook].
 
=== pdf -> svg ===
Using Inkscape. See [https://robertgrantstats.wordpress.com/2017/09/07/svg-from-stats-software-the-good-the-bad-and-the-ugly/ this post].
 
=== svg -> png ===
[https://laustep.github.io/stlahblog/posts/SVG2PNG.html SVG to PNG] using the [https://cran.rstudio.com/web/packages/gyro/index.html gyro] package
 
== read.table ==
=== clipboard ===
{{Pre}}
source("clipboard")
read.table("clipboard")
</pre>
</pre>


==== Error in unloadNamespace(package) ====
=== inline text ===
<pre>
{{Pre}}
> d3heatmap(mtcars, scale = "column", colors = "Blues")
mydf <- read.table(header=T, text='
Error: 'col_numeric' is not an exported object from 'namespace:scales'
cond yval
> packageVersion("scales")
    A 2
[1] ‘0.2.5’
    B 2.5
> library(scales)
    C 1.6
Error in unloadNamespace(package) :
')
  namespace ‘scales’ is imported by ‘ggplot2’ so cannot be unloaded
In addition: Warning message:
package ‘scales’ was built under R version 3.2.1  
Error in library(scales) :
  Package ‘scales’ version 0.2.4 cannot be unloaded
> search()
[1] ".GlobalEnv"            "package:d3heatmap"      "package:ggplot2"     
[4] "package:microbenchmark" "package:COSMIC.67"      "package:BiocInstaller"
[7] "package:stats"          "package:graphics"      "package:grDevices"   
[10] "package:utils"          "package:datasets"      "package:methods"     
[13] "Autoloads"              "package:base"
</pre>
</pre>
If I open a new R session, the above error will not happen!


The problem occurred because the installed 'scales' package was older than the version required by the d3heatmap package/function. See [https://github.com/rstudio/d3heatmap/issues/16 this post]. When I then upgraded the 'scales' package in the same session, the loaded version was ''locked'' because it was ''imported'' by the ''ggplot2'' package, so it could not be unloaded.
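For the inline-text approach described earlier, read.table()'s text argument reads directly from a string, which is convenient for small reproducible examples. A self-contained sketch using the same small table:

<syntaxhighlight lang='rsplus'>
mydf <- read.table(header = TRUE, text = "
cond yval
   A  2
   B  2.5
   C  1.6
")
str(mydf)   # blank lines are skipped; whitespace separates fields
</syntaxhighlight>

In R >= 4.0 the cond column is read as character (stringsAsFactors = FALSE by default); in older versions it is a factor.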
=== http(s) connection ===
{{Pre}}
library(RCurl)  # provides getURL()
# ssl.verifypeer = FALSE skips SSL certificate verification
temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt", 
              ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)
</pre>


==== Unload a package ====
=== read only specific columns ===
See an example below.
Use the 'colClasses' option in read.table, read.delim, etc. For example, the following example reads only the 3rd column of the text file and converts the result from a data frame to a vector (the trailing [, 1]). Note that NULL must be given in double quotes ("NULL").
<pre>
{{Pre}}
require(splines)
x <- read.table("var_annot.vcf", colClasses = c(rep("NULL", 2), "character", rep("NULL", 7)),
detach(package:splines, unload=TRUE)
                skip=62, header=T, stringsAsFactors = FALSE)[, 1]
#
system.time(x <- read.delim("Methylation450k.txt",
                colClasses = c("character", "numeric", rep("NULL", 188)), stringsAsFactors = FALSE))
</pre>
</pre>
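A self-contained sketch of the colClasses trick on a small inline tab-delimited table (made-up data standing in for var_annot.vcf). Quoted "NULL" entries drop columns; the trailing [, 1] turns the one-column data frame into a vector. Counting the columns first uses scan(), as in the column-counting approach on this page.

<syntaxhighlight lang='rsplus'>
txt <- "id\tchrom\tgene\tscore
r1\tchr1\tTP53\t0.5
r2\tchr2\tBRCA1\t1.5
r3\tchr3\tEGFR\t2.5"

# Number of columns, read from the first line only
ncols <- length(scan(text = txt, what = "character", sep = "\t",
                     nlines = 1, quiet = TRUE))

# Keep only the 3rd column: every other column is "NULL" (quoted)
genes <- read.delim(text = txt,
                    colClasses = c(rep("NULL", 2), "character",
                                   rep("NULL", ncols - 3)))[, 1]
genes
</syntaxhighlight>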


==== [http://www.r-pkg.org/ METACRAN] - Search and browse all CRAN/R packages ====
To know the number of columns, we might want to read the first row first.
* Source code on https://github.com/metacran. The 'PACKAGES' file is updated regularly to Github.
{{Pre}}
* [https://stat.ethz.ch/pipermail/r-devel/2015-May/thread.html Announcement] on R/mailing list
library(magrittr)
* Author's homepage on http://gaborcsardi.org/.
scan("var_annot.vcf", sep="\t", what="character", skip=62, nlines=1, quiet=TRUE) %>% length()
</pre>


==== New R packages as reported by [http://dirk.eddelbuettel.com/cranberries/ CRANberries] ====
Another method is to use '''pipe()''', '''cut''' or '''awk'''. See [https://stackoverflow.com/questions/2193742/ways-to-read-only-select-columns-from-a-file-into-r-a-happy-medium-between-re ways to read only selected columns from a file into R]
http://blog.revolutionanalytics.com/2015/07/mranspackages-spotlight.html


=== check.names = FALSE in read.table() ===
<pre>
<pre>
#----------------------------
gx <- read.table(file, header = T, row.names =1)
# SCRAPE CRANBERRIES FILES TO COUNT NEW PACKAGES AND PLOT
colnames(gx) %>% grep("[^[:alnum:] ]", ., value = TRUE)
#
# [1] "hCG_1642354" "IGH."       "IGHV1.69"   "IGKV1.5"     "IGKV2.24"   "KRTAP13.2"
library(ggplot2)
# [7] "KRTAP19.1"   "KRTAP2.4"   "KRTAP5.9"   "KRTAP6.3"   "Kua.UEV"
# Build a vector of the directories of interest
year <- c("2013","2014","2015")
month <- c("01","02","03","04","05","06","07","08","09","10","11","12")
span <-c(rep(month,2),month[1:7])
dir <- "http://dirk.eddelbuettel.com/cranberries"


url2013 <- file.path(dir,"2013",month)
gx <- read.table(file, header = T, row.names =1, check.names = FALSE)
url2014 <- file.path(dir,"2014",month)
colnames(gx) %>% grep("[^[:alnum:] ]", ., value = TRUE)
url2015 <- file.path(dir,"2015",month[1:7])
# [1] "hCG_1642354" "IGH@"        "IGHV1-69"    "IGKV1-5"    "IGKV2-24"    "KRTAP13-2"
url <- c(url2013,url2014,url2015)
# [7] "KRTAP19-1"  "KRTAP2-4"    "KRTAP5-9"    "KRTAP6-3"    "Kua-UEV" 
</pre>
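A reproducible sketch of the check.names behaviour using a small inline table with gene-like column names (made-up values). With the default check.names = TRUE, names are passed through make.names(), so "@" and "-" become "."; check.names = FALSE keeps them as they are.

<syntaxhighlight lang='rsplus'>
txt <- "gene IGH@ IGHV1-69 hCG_1642354
s1 1 2 3
s2 4 5 6"

gx1 <- read.table(text = txt, header = TRUE, row.names = 1)
gx2 <- read.table(text = txt, header = TRUE, row.names = 1,
                  check.names = FALSE)
colnames(gx1)   # syntactically valid names
colnames(gx2)   # original names kept
</syntaxhighlight>

Columns with non-syntactic names must then be accessed with backticks or brackets, e.g. gx2$`IGHV1-69` or gx2[["IGH@"]].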


# Read each directory and count the new packages
=== setNames() ===
new_p <- vector()
Change the colnames. See an example from [https://www.tidymodels.org/start/models/ tidymodels]
for(i in url){
  raw.data <- readLines(i)
  new_p[i] <- length(grep("New package",raw.data,value=TRUE))
}


# Plot
=== Testing for valid variable names ===
time <- seq(as.Date("2013-01-01"), as.Date("2015-07-01"), by="months")
[https://www.r-bloggers.com/testing-for-valid-variable-names/ Testing for valid variable names]
new_pkgs <- data.frame(time,new_p)


ggplot(new_pkgs, aes(time,y=new_p)) +
=== make.names(): Make syntactically valid names out of character vectors ===
  geom_line() + xlab("") + ylab("Number of new packages") +
* [https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html make.names()]
  geom_smooth(method='lm') + ggtitle("New R packages as reported by CRANberries")  
* A valid variable name consists of letters, numbers and the '''dot''' or '''underscore''' characters, and starts with a letter or with a dot that is not followed by a number. See [https://www.tutorialspoint.com/r/r_variables.htm R variables].
<pre>
make.names("abc-d") # [1] "abc.d"
</pre>
</pre>
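A short sketch combining make.names() with setNames() to clean up column names. Invalid characters become ".", names starting with a digit get an "X" prefix, and unique = TRUE de-duplicates; the example names are made up.

<syntaxhighlight lang='rsplus'>
nm1 <- make.names(c("abc-d", "1x", "ok", "ok"))
nm2 <- make.names(c("abc-d", "1x", "ok", "ok"), unique = TRUE)
nm1   # "abc.d" "X1x" "ok" "ok"
nm2   # "abc.d" "X1x" "ok" "ok.1"

# setNames() returns a copy of the object with new names
df  <- data.frame(a = 1:2, b = 3:4)
df2 <- setNames(df, make.names(c("sample-1", "sample-2")))
names(df2)   # "sample.1" "sample.2"
</syntaxhighlight>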


==== Top new packages in 2015 ====
== Serialization ==
* [http://opiateforthemass.es/articles/R-packages-in-2015/ 2015 R packages roundup] by CHRISTOPH SAFFERLING
If we want to pass an R object to a C program (which reads the data with recv()), we can use writeBin() to send the stream size first and then use the serialize() function to send the stream itself over a connection. See the
* [http://gforge.se/2016/01/r-trends-in-2015/ R trends in 2015] by MAX GORDON
[https://stat.ethz.ch/pipermail/r-devel/attachments/20130628/56473803/attachment.pl post] on R mailing list.
 
<pre>
==== Speeding up package installation ====
> a <- list(1,2,3)
* http://blog.jumpingrivers.com/posts/2017/speed_package_installation/
> a_serial <- serialize(a, NULL)
* [http://dirk.eddelbuettel.com/blog/2017/11/27/#011_faster_package_installation_one (Much) Faster Package (Re-)Installation via Caching]
> a_length <- length(a_serial)
* [http://dirk.eddelbuettel.com/blog/2017/12/13/#013_faster_package_installation_two (Much) Faster Package (Re-)Installation via Caching, part 2]
> a_length
 
[1] 70
=== R package dependencies ===
> writeBin(as.integer(a_length), connection, endian="big")
* Package tools' functions package.dependencies(), pkgDepends(), etc are deprecated now, mostly in favor of package_dependencies() which is both more flexible and efficient. See [https://cran.rstudio.com/doc/manuals/r-release/NEWS.html R 3.3.0 News].
> serialize(a, connection)
</pre>
In the C++ process, I first receive one int to get the length,
and then read <length> bytes from the connection.
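A self-contained sketch of this length-prefixed framing. It writes to an in-memory rawConnection() instead of a socket (an assumption so the example runs on its own); the writeBin()/readBin() calls would be the same on a socketConnection. The 4-byte big-endian length is what a C/C++ receiver would decode (e.g. with ntohl()) before reading the payload.

<syntaxhighlight lang='rsplus'>
a <- list(1, 2, 3)
payload <- serialize(a, NULL)       # raw vector holding the serialized object

# Writer side: 4-byte big-endian length prefix, then the payload bytes
con <- rawConnection(raw(0), "wb")
writeBin(length(payload), con, size = 4L, endian = "big")
writeBin(payload, con)
framed <- rawConnectionValue(con)   # everything that would go over the wire
close(con)

# Reader side (what the receiving process does): length first, then payload
con <- rawConnection(framed, "rb")
n <- readBin(con, what = "integer", size = 4L, endian = "big")
bytes <- readBin(con, what = "raw", n = n)
close(con)
b <- unserialize(bytes)
</syntaxhighlight>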


==== Depends, Imports, Suggests, Enhances, LinkingTo ====
See [https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-Dependencies Writing R Extensions] and [[#install.packages.28.29|install.packages()]].

* Depends: packages that this package depends on. They are attached before the current package when library() or require() is called, so it is better to use ''Imports'' instead of ''Depends'' whenever you can. The ‘Depends’ field can also specify a dependence on a certain version of R.
* Imports: lists packages whose '''namespaces''' are imported from (as specified in the NAMESPACE file) but which do not need to be attached.
* Suggests: lists packages that are not necessarily needed. This includes packages used only in examples, tests or vignettes, and packages loaded in the body of functions.
* Enhances: lists packages “enhanced” by the package at hand, e.g., by providing methods for classes from these packages, or ways to handle objects from these packages.
* LinkingTo: A package that wishes to make use of '''header''' files in other packages needs to declare them as a comma-separated list in the field ‘LinkingTo’ in the DESCRIPTION file.

==== Bioconductor's [http://www.bioconductor.org/packages/release/bioc/html/pkgDepTools.html pkgDepTools] package ====
This is an example of querying the dependencies of the notorious 'lumi' package, which often broke the installation script. I am using R 3.2.0 and Bioconductor 3.1.

The '''getInstallOrder''' function is useful to get a list of all (recursive) dependency packages.
<pre>
source("http://bioconductor.org/biocLite.R")
if (!require(pkgDepTools)) {
  biocLite("pkgDepTools", ask = FALSE)
  library(pkgDepTools)
}
MkPlot <- FALSE

library(BiocInstaller)
biocUrl <- biocinstallRepos()["BioCsoft"]
biocDeps <- makeDepGraph(biocUrl, type="source", dosize=FALSE) # pkgDepTools defines its makeDepGraph()

PKG <- "lumi"
if (MkPlot) {
  if (!require(Biobase))  {
    biocLite("Biobase", ask = FALSE)
    library(Biobase)
  }
  if (!require(Rgraphviz))  {
    biocLite("Rgraphviz", ask = FALSE)
    library(Rgraphviz)
  }
  categoryNodes <- c(PKG, names(acc(biocDeps, PKG)[[1]]))
  categoryGraph <- subGraph(categoryNodes, biocDeps)
  nn <- makeNodeAttrs(categoryGraph, shape="ellipse")
  plot(categoryGraph, nodeAttrs=nn)  # Complete but plot is too complicated & font is too small.
}

system.time(allDeps <- makeDepGraph(biocinstallRepos(), type="source",
                          keep.builtin=TRUE, dosize=FALSE)) # takes a little while
#    user  system elapsed
# 175.737  10.994 186.875
# Warning messages:
# 1: In .local(from, to, graph) : edges replaced: ‘SNPRelate|gdsfmt’
# 2: In .local(from, to, graph) :
#  edges replaced: ‘RCurl|methods’, ‘NA|bitops’

# When needed.only=TRUE, only those dependencies not currently installed are included in the list.
x1 <- sort(getInstallOrder(PKG, allDeps, needed.only=TRUE)$packages); x1
[1] "affy"                              "affyio"                         
[3] "annotate"                          "AnnotationDbi"                   
[5] "base64"                            "beanplot"                       
[7] "Biobase"                          "BiocParallel"                   
[9] "biomaRt"                          "Biostrings"                     
[11] "bitops"                            "bumphunter"                     
[13] "colorspace"                        "DBI"                             
[15] "dichromat"                        "digest"                         
[17] "doRNG"                            "FDb.InfiniumMethylation.hg19"   
[19] "foreach"                          "futile.logger"                   
[21] "futile.options"                    "genefilter"                     
[23] "GenomeInfoDb"                      "GenomicAlignments"               
[25] "GenomicFeatures"                  "GenomicRanges"                   
[27] "GEOquery"                          "ggplot2"                         
[29] "gtable"                            "illuminaio"                     
[31] "IRanges"                          "iterators"                       
[33] "labeling"                          "lambda.r"                       
[35] "limma"                            "locfit"                         
[37] "lumi"                              "magrittr"                       
[39] "matrixStats"                      "mclust"                         
[41] "methylumi"                        "minfi"                           
[43] "multtest"                          "munsell"                         
[45] "nleqslv"                          "nor1mix"                         
[47] "org.Hs.eg.db"                      "pkgmaker"                       
[49] "plyr"                              "preprocessCore"                 
[51] "proto"                            "quadprog"                       
[53] "RColorBrewer"                      "Rcpp"                           
[55] "RCurl"                            "registry"                       
[57] "reshape"                          "reshape2"                       
[59] "rngtools"                          "Rsamtools"                       
[61] "RSQLite"                          "rtracklayer"                     
[63] "S4Vectors"                        "scales"                         
[65] "siggenes"                          "snow"                           
[67] "stringi"                          "stringr"                         
[69] "TxDb.Hsapiens.UCSC.hg19.knownGene" "XML"                             
[71] "xtable"                            "XVector"                         
[73] "zlibbioc"                       

# When needed.only=FALSE the complete list of dependencies is given regardless of the set of currently installed packages.
x2 <- sort(getInstallOrder(PKG, allDeps, needed.only=FALSE)$packages); x2
[1] "affy"                              "affyio"                            "annotate"                       
[4] "AnnotationDbi"                    "base64"                            "beanplot"                       
[7] "Biobase"                          "BiocGenerics"                      "BiocInstaller"                   
[10] "BiocParallel"                      "biomaRt"                          "Biostrings"                     
[13] "bitops"                            "bumphunter"                        "codetools"                       
[16] "colorspace"                        "DBI"                              "dichromat"                       
[19] "digest"                            "doRNG"                            "FDb.InfiniumMethylation.hg19"   
[22] "foreach"                          "futile.logger"                    "futile.options"                 
[25] "genefilter"                        "GenomeInfoDb"                      "GenomicAlignments"               
[28] "GenomicFeatures"                  "GenomicRanges"                    "GEOquery"                       
[31] "ggplot2"                          "graphics"                          "grDevices"                       
[34] "grid"                              "gtable"                            "illuminaio"                     
[37] "IRanges"                          "iterators"                        "KernSmooth"                     
[40] "labeling"                          "lambda.r"                          "lattice"                         
[43] "limma"                            "locfit"                            "lumi"                           
[46] "magrittr"                          "MASS"                              "Matrix"                         
[49] "matrixStats"                      "mclust"                            "methods"                         
[52] "methylumi"                        "mgcv"                              "minfi"                           
[55] "multtest"                          "munsell"                          "nleqslv"                         
[58] "nlme"                              "nor1mix"                          "org.Hs.eg.db"                   
[61] "parallel"                          "pkgmaker"                          "plyr"                           
[64] "preprocessCore"                    "proto"                            "quadprog"                       
[67] "RColorBrewer"                      "Rcpp"                              "RCurl"                           
[70] "registry"                          "reshape"                          "reshape2"                       
[73] "rngtools"                          "Rsamtools"                        "RSQLite"                         
[76] "rtracklayer"                      "S4Vectors"                        "scales"                         
[79] "siggenes"                          "snow"                              "splines"                         
[82] "stats"                            "stats4"                            "stringi"                         
[85] "stringr"                          "survival"                          "tools"                           
[88] "TxDb.Hsapiens.UCSC.hg19.knownGene" "utils"                            "XML"                             
[91] "xtable"                            "XVector"                          "zlibbioc"

> sort(setdiff(x2, x1)) # Not all R's base packages are included; e.g. 'base', 'boot', ...
[1] "BiocGenerics"  "BiocInstaller" "codetools"    "graphics"      "grDevices"   
[6] "grid"          "KernSmooth"    "lattice"      "MASS"          "Matrix"     
[11] "methods"      "mgcv"          "nlme"          "parallel"      "splines"     
[16] "stats"        "stats4"        "survival"      "tools"        "utils" 
</pre>
[[File:Lumi rgraphviz.svg|200px]]

== socketConnection ==
See ?socketConnection.

=== Simple example ===
This example is adapted from the socketConnection manual page.

Open one R session (the server):
<pre>
con1 <- socketConnection(port = 22131, server = TRUE) # wait until a connection from some client
writeLines(LETTERS, con1)
close(con1)
</pre>

Open another R session (the client):
<pre>
con2 <- socketConnection(Sys.info()["nodename"], port = 22131)
# the connection is non-blocking, so we may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
  Sys.sleep(1)
  z <- readLines(con2)
  if(length(z)) print(z)
}
close(con2)
</pre>

=== Use nc in client ===
The client does not have to be R. We can use telnet, nc, etc. See the post [https://stat.ethz.ch/pipermail/r-sig-hpc/2009-April/000144.html here]. For example, on the client machine, we can issue
<pre>
nc localhost 22131  [ENTER]
</pre>
The client then waits and shows anything written by the server. The nc connection is terminated once close(con1) is run on the server.


If I use the command
<pre>
nc -v -w 2 localhost -z 22130-22135
</pre>
then the connection is established only for a short time, which means the cursor on the server machine is returned. If we issue the same nc command again on the client machine, it shows that the connection to port 22131 is refused. PS. The "-w" switch sets the timeout, in seconds, for connects and final net reads.

A post I have not had a chance to read yet: http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html

==== [http://cran.r-project.org/web/packages/miniCRAN/ miniCRAN package] ====
The '''miniCRAN''' package can be used to identify package dependencies or to create a local CRAN repository. It also works with repositories other than CRAN, such as Bioconductor.

* http://blog.revolutionanalytics.com/2014/07/dependencies-of-popular-r-packages.html
* http://www.r-bloggers.com/introducing-minicran-an-r-package-to-create-a-private-cran-repository/
* http://www.magesblog.com/2014/09/managing-r-package-dependencies.html
* [http://blog.revolutionanalytics.com/2015/10/using-minicran-in-azure-ml.html Using miniCRAN in Azure ML]
* [http://www.mango-solutions.com/wp/2016/01/minicran-developing-internal-cran-repositories/ developing internal CRAN Repositories]
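For packages that are already installed, base R can do a similar lookup without any network access. This is a sketch, assuming only base R: tools::package_dependencies() accepts a package database matrix through its <code>db</code> argument, and installed.packages() returns such a matrix. The package stats is used only because it is always installed.

```r
# installed.packages() returns a matrix with Package/Depends/Imports/LinkingTo
# columns, which package_dependencies() accepts as its 'db' argument,
# so no repository access is needed.
db <- installed.packages()

# Direct "strong" dependencies (Depends, Imports, LinkingTo by default)
direct <- tools::package_dependencies("stats", db = db)
print(direct)

# Recursive dependencies
deps <- tools::package_dependencies("stats", db = db, recursive = TRUE)
print(deps$stats)

# Reverse dependencies: installed packages that depend on 'stats'
rdeps <- tools::package_dependencies("stats", db = db, reverse = TRUE)
print(head(rdeps$stats))
```

Passing <code>db = available.packages()</code> instead queries the configured repositories, which needs network access.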


Before we start R, we need to install some packages from the Ubuntu terminal. See [[R#Ubuntu.2FDebian_2|here]].
<syntaxhighlight lang='rsplus'>
# Consider glmnet package (today is 4/29/2015)
# Version: 2.0-2
# Depends: Matrix (≥ 1.0-6), utils, foreach
# Suggests: survival, knitr, lars
if (!require("miniCRAN"))  {
  install.packages("miniCRAN", dependencies = TRUE, repos="http://cran.rstudio.com") # include 'igraph' in Suggests.
  library(miniCRAN)
}
if (!"igraph" %in% installed.packages()[,1]) install.packages("igraph")

tags <- "glmnet"
pkgDep(tags, suggests=TRUE, enhances=TRUE) # same as pkgDep(tags)
#  [1] "glmnet"    "Matrix"    "foreach"   "codetools" "iterators" "lattice"   "evaluate"  "digest"
#  [9] "formatR"   "highr"     "markdown"  "stringr"   "yaml"      "mime"      "survival"  "knitr"
# [17] "lars"

dg <- makeDepGraph(tags, suggests=TRUE, enhances=TRUE) # miniCRAN defines its makeDepGraph()
plot(dg, legendPosition = c(-1, 1), vertex.size=20)
</syntaxhighlight>

[[File:MiniCRAN dep.svg|300px]] [[File:pkgDepTools dep.svg|300px]]
[[File:Glmnet dep.svg|300px]]

We can also display the dependencies of a package from the [http://cran.r-project.org/web/packages/miniCRAN/vignettes/miniCRAN-non-CRAN-repos.html Bioconductor] repository.
<syntaxhighlight lang='rsplus'>
tags <- "DESeq2"
# Depends S4Vectors, IRanges, GenomicRanges, Rcpp (>= 0.10.1), RcppArmadillo (>= 0.3.4.4)
# Imports BiocGenerics(>= 0.7.5), Biobase, BiocParallel, genefilter, methods, locfit, geneplotter, ggplot2, Hmisc
# Suggests RUnit, gplots, knitr, RColorBrewer, BiocStyle, airway, pasilla (>= 0.2.10), DESeq, vsn
# LinkingTo    Rcpp, RcppArmadillo
index <- function(url, type="source", filters=NULL, head=5, cols=c("Package", "Version")){
  contribUrl <- contrib.url(url, type=type)
  available.packages(contribUrl, type=type, filters=filters)
}

bioc <- local({
  env <- new.env()
  on.exit(rm(env))
  evalq(source("http://bioconductor.org/biocLite.R", local=TRUE), env)
  biocinstallRepos() # return URLs
})

bioc
#                                              BioCsoft
#            "http://bioconductor.org/packages/3.0/bioc"
#                                                BioCann
# "http://bioconductor.org/packages/3.0/data/annotation"
#                                                BioCexp
# "http://bioconductor.org/packages/3.0/data/experiment"
#                                              BioCextra
#          "http://bioconductor.org/packages/3.0/extra"
#                                                  CRAN
#                                "http://cran.fhcrc.org"
#                                             CRANextra
#                  "http://www.stats.ox.ac.uk/pub/RWin"
str(index(bioc["BioCsoft"])) # similar to cranJuly2014 object

system.time(dg <- makeDepGraph(tags, suggests=TRUE, enhances=TRUE, availPkgs = index(bioc["BioCsoft"]))) # Very quick!
plot(dg, legendPosition = c(-1, 1), vertex.size=20)
</syntaxhighlight>
[[File:deseq2 dep.svg|300px]] [[File:Lumi dep.svg|300px]]

The dependencies of [http://www.bioconductor.org/packages/release/bioc/html/GenomicFeatures.html GenomicFeatures] and [http://www.bioconductor.org/packages/release/bioc/html/GenomicAlignments.html GenomicAlignments] are more complicated, so we set the 'suggests' option to FALSE.
<syntaxhighlight lang='rsplus'>
tags <- "GenomicAlignments"
dg <- makeDepGraph(tags, suggests=FALSE, enhances=FALSE, availPkgs = index(bioc["BioCsoft"]))
plot(dg, legendPosition = c(-1, 1), vertex.size=20)
</syntaxhighlight>
[[File:Genomicfeature dep dep.svg|300px]] [[File:Genomicalignments dep.svg|300px]]

==== [http://mran.revolutionanalytics.com/ MRAN] (CRAN only) ====
* http://blog.revolutionanalytics.com/2014/10/explore-r-package-connections-at-mran.html

=== Use curl command in client ===
On the server,
<pre>
con1 <- socketConnection(port = 8080, server = TRUE)
</pre>

On the client,
<pre>
curl --trace-ascii debugdump.txt http://localhost:8080/
</pre>

Then go to the server,
<pre>
# print the HTTP request line by line until the blank line that ends the headers
while(nchar(x <- readLines(con1, 1)) > 0) cat(x, "\n")

close(con1) # return cursor in the client machine
</pre>

=== Use telnet command in client ===
On the server,
<pre>
con1 <- socketConnection(port = 8080, server = TRUE)
</pre>

On the client,
<pre>
sudo apt-get install telnet
telnet localhost 8080
abcdefg
hijklmn
qestst
</pre>

Go to the server,
<pre>
readLines(con1, 1)
readLines(con1, 1)
readLines(con1, 1)
close(con1) # return cursor in the client machine
</pre>

A [http://blog.gahooa.com/2009/01/23/basics-of-telnet-and-http/ tutorial] about using telnet for HTTP requests, and [http://unixhelp.ed.ac.uk/tables/telnet_commands.html this] summary of telnet commands.

== Subsetting ==
See [http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Subset-assignment Subset assignment] in the R Language Definition and [http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Manipulation-of-functions Manipulation of functions].

The result of the command '''x[3:5] <- 13:15''' is as if the following had been executed:
<pre>
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
</pre>
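A quick check of this equivalence, comparing ordinary subset assignment with an explicit call to the replacement function <code>`[<-`</code>. The variable names are illustrative; <code>tmp</code> plays the role of <code>`*tmp*`</code>.

```r
x <- 1:10

# Ordinary subset assignment
y1 <- x
y1[3:5] <- 13:15

# The same operation spelled out with the replacement function `[<-`
tmp <- x
y2 <- `[<-`(tmp, 3:5, value = 13:15)

print(identical(y1, y2))   # TRUE
print(y1)                  # values: 1 2 13 14 15 6 7 8 9 10
```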


==== [https://cran.r-project.org/web/packages/cranly/ cranly] ====
[https://cran.r-project.org/web/packages/cranly/vignettes/dependence_trees.html R package dependence trees]

=== Avoid Coercing Indices To Doubles ===
[https://www.jottr.org/2018/04/02/coercion-of-indices/ 1 or 1L]: index with integers such as 1L rather than doubles such as 1, which R has to coerce to integers internally.
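A small illustration of where doubles sneak into index vectors: a bare numeric literal is a double, so adding it to an integer index silently produces a double index. This is a sketch; the vectors are arbitrary.

```r
# Literal 1 is a double, 1L is an integer
print(typeof(1))            # "double"
print(typeof(1L))           # "integer"

idx <- seq_len(5)           # integer index vector
print(typeof(idx + 1))      # "double": arithmetic with a double promotes the result
print(typeof(idx + 1L))     # "integer": stays integer

x <- letters[1:10]
# Both select the same elements; only the type of the index differs
print(identical(x[idx + 1], x[idx + 1L]))   # TRUE
```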


=== Careful with NA values ===
Indexing with a logical condition that evaluates to NA returns a row of NAs. which(), base::subset() and dplyr::filter() drop the rows where the condition is NA.
<pre>
R> mydf = data.frame(a=1:3, b=c(NA,5,6))
R> mydf[mydf$b >5, ]
    a  b
NA NA NA
3  3  6
R> mydf[which(mydf$b >5), ]
  a b
3 3 6
R> dplyr::filter(mydf, b > 5)
  a b
1 3 6
R> subset(mydf, b>5)
  a b
3 3 6
</pre>

==== Reverse dependence ====
* http://romainfrancois.blog.free.fr/index.php?post/2011/10/30/Rcpp-reverse-dependency-graph


==== Install packages offline ====
http://www.mango-solutions.com/wp/2017/05/installing-packages-without-internet/

=== Implicit looping ===
A logical index that is shorter than the vector is recycled, so <code>x[i[1:3]]</code> reuses <code>i[1:3]</code> along all 10 elements of x.
<pre>
set.seed(1)
i <- sample(c(TRUE, FALSE), size=10, replace = TRUE)
# [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
sum(i)        # [1] 6
x <- 1:10
length(x[i])  # [1] 6
x[i[1:3]]     # [1]  1  3  4  6  7  9 10
length(x[i[1:3]]) # [1] 7
</pre>


==== Install a package and its dependencies locally ====
Dependencies are not installed automatically when you install a package from a local file. See [http://r.789695.n4.nabble.com/Windows-GUI-quot-Install-Packages-from-local-zip-files-quot-and-dependencies-td848173.html Windows-GUI: "Install Packages from local zip files" and dependencies]

== modelling ==
=== update() ===
* [https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/update ?update]
* [https://stackoverflow.com/a/5118337 Reusing a Model Built in R]
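A minimal sketch of update() on a model fitted to the built-in mtcars data. In the new formula, <code>.</code> stands for the corresponding side of the old formula, so terms can be added or dropped without retyping the whole model.

```r
fit <- lm(mpg ~ cyl + disp, data = mtcars)

fit2 <- update(fit, . ~ . + hp)     # add a term
fit3 <- update(fit, . ~ . - disp)   # drop a term
fit4 <- update(fit, data = subset(mtcars, am == 1))  # same formula, new data

print(formula(fit2))   # mpg ~ cyl + disp + hp
print(formula(fit3))   # mpg ~ cyl
print(nrow(model.frame(fit4)))   # number of manual-transmission cars
```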


=== Create a new R package, namespace, documentation ===
* http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf (highly recommended)
* https://stat.ethz.ch/pipermail/r-devel/2013-July/066975.html
* [http://stackoverflow.com/questions/7283134/what-is-the-benefit-of-import-in-a-namespace-in-r/7283511#7283511 Benefit of import in a namespace]
* This YouTube [http://www.youtube.com/watch?v=jGeCCxdZsDQ video] from Tyler Rinker teaches how to use RStudio to develop an R package and how to use Git for version control. Very useful!
* [https://github.com/jtleek/rpackages Developing R packages] by Jeff Leek at Johns Hopkins University.
* [http://r-pkgs.had.co.nz/ R packages] book by Hadley Wickham.
* [http://kbroman.org/pkg_primer/ R package primer], a minimal tutorial from Karl Broman.
* [https://datascienceplus.com/how-to-make-and-share-an-r-package-in-3-steps/ How to make and share an R package in 3 steps] (6/14/2017)

=== Extract all variable names in lm(), glm(), ... ===
<code>all.vars(formula(Model)[-2])</code> returns the names of the predictor variables; the <code>[-2]</code> drops the response (left-hand side) from the formula.
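A short sketch on the built-in mtcars data showing the difference between variables and model terms. all.vars() returns the underlying variable names, while the term labels keep transformations and interactions.

```r
fit <- lm(mpg ~ cyl + log(disp) + hp:wt, data = mtcars)
f <- formula(fit)

print(all.vars(f))        # all variables, response included: "mpg" "cyl" "disp" "hp" "wt"
print(all.vars(f[-2]))    # predictors only: "cyl" "disp" "hp" "wt"
print(attr(terms(f), "term.labels"))   # model terms: "cyl" "log(disp)" "hp:wt"
```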


==== R package depends vs imports ====
* http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
* http://stackoverflow.com/questions/9893791/imports-and-depends
* https://stat.ethz.ch/pipermail/r-devel/2013-August/067082.html

In the namespace era, Depends is never really needed: modern packages have no technical need for it. Loosely speaking, the only purpose of Depends today is to expose another package's functions to the user without re-exporting them.

* '''load''' = functions exported in myPkg are available to interested parties as myPkg::foo or via direct imports; essentially, the package can now be used.
* '''attach''' = the namespace (and thus all exported functions) is attached to the search path. The only effect is that you have now added the exported functions to the global pool of functions, sort of like dumping them in the workspace (for all practical purposes, not technically).
* '''import a function into a package''' = make sure that this function works in my package regardless of the search path (so I can write fn1 instead of pkg1::fn1 and still know it will come from pkg1 and not from someone's workspace or another package that chose the same name).

----
* https://stat.ethz.ch/pipermail/r-devel/2013-September/067451.html

The distinction is between "loading" and "attaching" a package. Loading it (which would be done if you had MASS::loglm, or imported it) guarantees that the package is initialized and in memory, but doesn't make it visible to the user without the explicit MASS:: prefix. Attaching it first loads it, then modifies the user's search list so the user can see it.

Loading is less intrusive, so it's preferred over attaching. Both library() and require() would attach it.

==== R package suggests ====
[https://cran.r-project.org/web/packages/stringr/index.html stringr] lists '''htmlwidgets''' under Suggests. Calling a function that needs a suggested package raises an error if that package is not installed.
<syntaxhighlight lang='rsplus'>
> library(stringr)
> str_view(c("abc", "a.c", "bef"), "a\\.c")
Error in loadNamespace(name) : there is no package called ‘htmlwidgets’
</syntaxhighlight>

==== Useful functions for accessing files in packages ====
* [https://stat.ethz.ch/R-manual/R-devel/library/base/html/system.file.html system.file()]
* [https://stat.ethz.ch/R-manual/R-devel/library/base/html/find.package.html path.package()] and normalizePath().
<syntaxhighlight lang='rsplus'>
> system.file(package = "batr")
[1] "f:/batr"
> system.file("extdata", package = "batr")

> path.package("batr")
[1] "f:\\batr"

# Sometimes it returns the forward slash format, e.g. C:/Program Files/R/R-3.4.0/library/batr,
# so it is best to wrap it in normalizePath().
> normalizePath(path.package("batr"))
</syntaxhighlight>

=== as.formula(): use a string in formula in lm(), glm(), ... ===
* [https://www.r-bloggers.com/2019/08/changing-the-variable-inside-an-r-formula/ Changing the variable inside an R formula]
* [https://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame How to succinctly write a formula with many variables from a data frame?]
{{Pre}}
? as.formula
xnam <- paste("x", 1:25, sep="")
fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+")))
</pre>
* [http://www.win-vector.com/blog/2018/09/r-tip-how-to-pass-a-formula-to-lm/ How to Pass A formula to lm], [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/bquote ?bquote], [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/eval ?eval]
{{Pre}}
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")

# Method 1. The 'Call' portion of the model is reported as “formula = f”
# our modeling effort,
# fully parameterized!
f <- as.formula(
  paste(outcome,
        paste(variables, collapse = " + "),
        sep = " ~ "))
print(f)
# mpg ~ cyl + disp + hp + carb

model <- lm(f, data = mtcars)
print(model)

# Call:
#  lm(formula = f, data = mtcars)
#
# Coefficients:
# (Intercept)         cyl        disp          hp        carb
#    34.021595    -1.048523    -0.026906    0.009349    -0.926863

# Method 2. eval() + bquote() + ".()"
format(terms(model))  #  or model$terms
# [1] "mpg ~ cyl + disp + hp + carb"

# The new line of code
model <- eval(bquote(  lm(.(f), data = mtcars)  ))

print(model)
# Call:
#  lm(formula = mpg ~ cyl + disp + hp + carb, data = mtcars)
#
# Coefficients:
#  (Intercept)          cyl        disp          hp        carb
#    34.021595    -1.048523    -0.026906    0.009349    -0.926863

# Note if we skip ".()" operator
> eval(bquote(  lm(f, data = mtcars)  ))

Call:
lm(formula = f, data = mtcars)

Coefficients:
(Intercept)          cyl        disp          hp        carb
  34.021595    -1.048523    -0.026906    0.009349    -0.926863
</pre>
* [https://statisticaloddsandends.wordpress.com/2019/08/24/changing-the-variable-inside-an-r-formula/ Changing the variable inside an R formula]: 1. as.formula() 2. subset by [[i]] 3. get() 4. eval(parse()).

=== reformulate ===
[https://www.r-bloggers.com/2023/06/simplifying-model-formulas-with-the-r-function-reformulate/ Simplifying Model Formulas with the R Function ‘reformulate()’]

=== I() function ===
I() isolates its argument from the formula parser, so that, e.g., <code>y ~ I(x^3)</code> uses the cube of x instead of the formula meaning of ^. See [https://stackoverflow.com/a/24192745 What does the capital letter "I" in R linear regression formula mean?], [https://stackoverflow.com/a/8055683 In R formulas, why do I have to use the I() function on power terms, like y ~ I(x^3)]
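A small sketch contrasting the formula meaning of ^ with I(), and using reformulate() to build the same formula from character strings. The simulated data are arbitrary.

```r
# Inside a formula, ^ means "crossing", so x^2 collapses to just x
print(attr(terms(y ~ x + x^2), "term.labels"))      # "x"

# I() isolates x^2 from the formula parser: it is the square of x
print(attr(terms(y ~ x + I(x^2)), "term.labels"))   # "x" "I(x^2)"

# reformulate() builds the same formula from character strings
f <- reformulate(c("x", "I(x^2)"), response = "y")
print(f)                                            # y ~ x + I(x^2)

set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 1 + 2 * d$x + 0.5 * d$x^2 + rnorm(20)
fit <- lm(f, data = d)
print(coef(fit))   # three coefficients: (Intercept), x, I(x^2)
```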


=== Aggregating results from linear model ===
https://stats.stackexchange.com/a/6862

==== Create R package with [https://github.com/hadley/devtools devtools] and [http://cran.r-project.org/web/packages/roxygen2/index.html roxygen2] ====
A useful [http://thepoliticalmethodologist.com/2014/08/14/building-and-maintaining-r-packages-with-devtools-and-roxygen2/ post] by Jacob Montgomery. Watch the [https://www.youtube.com/watch?v=9PyQlbAEujY#t=19 YouTube video] there.


The process requires three components: the RStudio software and the devtools and roxygen2 (which creates documentation from R code) packages.

[https://uoftcoders.github.io/studyGroup/lessons/r/packages/lesson/ MAKING PACKAGES IN R USING DEVTOOLS]

== Replacement function "fun(x) <- a" ==
[https://stackoverflow.com/questions/11563154/what-are-replacement-functions-in-r What are Replacement Functions in R?]
<pre>
R> xx <- c(1,3,66, 99)
R> "cutoff<-" <- function(x, value){
    x[x > value] <- Inf
    x
}
R> cutoff(xx) <- 65 # xx & 65 are both input
R> xx
[1]  1  3 Inf Inf

R> "cutoff<-"(x = xx, value = 65)
[1]  1  3 Inf Inf
</pre>
R evaluates the statement '''fun(x) <- a''' as '''x <- "fun<-"(x, value = a)'''.
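A replacement function can also take extra arguments between <code>x</code> and <code>value</code>, in which case <code>fun(x, i) <- a</code> is evaluated as <code>x <- "fun<-"(x, i, value = a)</code>. The function <code>elem<-</code> below is a hypothetical helper written only for illustration.

```r
# Hypothetical replacement function with an extra argument 'i';
# the last argument of a replacement function must be called 'value'
`elem<-` <- function(x, i, value) {
  x[i] <- value
  x
}

v <- c(10, 20, 30)
elem(v, 2) <- 99       # evaluated as v <- `elem<-`(v, 2, value = 99)
print(v)               # values: 10 99 30

# The explicit call gives the same result
w <- `elem<-`(c(10, 20, 30), 2, value = 99)
print(identical(v, w)) # TRUE
```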


[http://r-pkgs.had.co.nz/r.html R code workflow] from Hadley Wickham.

== S3 and S4 methods and signature ==
* How S4 works in R: https://www.rdocumentation.org/packages/methods/versions/3.5.1/topics/Methods_Details
* Software for Data Analysis: Programming with R by John Chambers
* Programming with Data: A Guide to the S Language by John Chambers
* [https://www.amazon.com/Extending-Chapman-Hall-John-Chambers/dp/1498775713 Extending R] by John M. Chambers, 2016
* https://www.rmetrics.org/files/Meielisalp2009/Presentations/Chalabi1.pdf
* [https://njtierney.github.io/r/missing%20data/rbloggers/2016/11/06/simple-s3-methods/ A Simple Guide to S3 Methods]
* [https://rstudio-education.github.io/hopr/s3.html Hands-On Programming with R] by Garrett Grolemund
* https://www.stat.auckland.ac.nz/S-Workshop/Gentleman/S4Objects.pdf
* [http://cran.r-project.org/web/packages/packS4/index.html packS4: Toy Example of S4 Package], [https://cran.r-project.org/doc/contrib/Genolini-S4tutorialV0-5en.pdf A (Not So) Short Introduction to S4]
* http://www.cyclismo.org/tutorial/R/s4Classes.html
* https://www.coursera.org/lecture/bioconductor/r-s4-methods-C4dNr
* https://www.bioconductor.org/help/course-materials/2013/UnderstandingRBioc2013/
* http://adv-r.had.co.nz/S4.html, http://adv-r.had.co.nz/OO-essentials.html
* [https://appsilon.com/object-oriented-programming-in-r-part-1/ Object-Oriented Programming in R (Part 1): An Introduction], [https://appsilon.com/object-oriented-programming-in-r-part-2/ Part 2: S3 Simplified]


[https://jozefhajnala.gitlab.io/r/r102-addin-roxytags/ RStudio:addins part 2 - roxygen documentation formatting made easy]
=== Debug an S4 function ===
* '''showMethods('FUNCTION')'''
* '''getMethod('FUNCTION', 'SIGNATURE') ''' 
* '''debug(, signature)'''
{{Pre}}
> args(debug)
function (fun, text = "", condition = NULL, signature = NULL)


[https://www.rstudio.com/wp-content/uploads/2015/06/devtools-cheatsheet.pdf devtools cheatsheet] (2 pages)
> library(genefilter) # Bioconductor
> showMethods("nsFilter")
Function: nsFilter (package genefilter)
eset="ExpressionSet"
> debug(nsFilter, signature="ExpressionSet")


How to use [http://rstudio-pubs-static.s3.amazonaws.com/2556_4e9f1c2af93b4683a19e2303a52bb2d5.html devtools::load_all("FolderName")]. load_all() loads any modified R files, and recompile and reload any modified C or Fortran files.
library(DESeq2)
<syntaxhighlight lang='rsplus'>
showMethods("normalizationFactors") # show the object class
# Step 1
                                    # "DESeqDataSet" in this case.
library(devtools)
getMethod(`normalizationFactors`, "DESeqDataSet") # get the source code
</pre>
See the [https://github.com/mikelove/DESeq2/blob/445ae6c61d06de69d465b57f23e1c743d9b4537d/R/methods.R#L367 source code] of '''normalizationFactors<-''' (setReplaceMethod() is used) and the [https://github.com/mikelove/DESeq2/blob/445ae6c61d06de69d465b57f23e1c743d9b4537d/R/methods.R#L385 source code] of '''estimateSizeFactors()'''. We can see how ''avgTxLength'' was used in estimateNormFactors().


# Step 2
Another example
dir.create(file.path("MyCode", "R"), recursive = TRUE)
<pre>
cat("foo=function(x){x*2}", file = file.path("MyCode", "R", "foo.R"))
library(GSVA)
write.dcf(list(Package = "MyCode", Title = "My Code for this project", Description = "To tackle this problem",
args(gsva) # function (expr, gset.idx.list, ...)
    Version = "0.0", License = "For my eyes only", Author = "First Last <noname@example.com>",
    Maintainer = "First Last <noname@example.com>"), file = file.path("MyCode", "DESCRIPTION"))
# OR
# create("path/to/package/pkgname")
# create() will create R/ directory, DESCRIPTION and NAMESPACE files.


# Step 3 (C/Fortran code, optional)
showMethods("gsva")
dir.create(file.path("MyCode", "src"))
# Function: gsva (package GSVA)
cat("void cfoo(double *a, double *b, double *c){*c=*a+*b;}\n", file = file.path("MyCode",
# expr="ExpressionSet", gset.idx.list="GeneSetCollection"
    "src", "cfoo.c"))
# expr="ExpressionSet", gset.idx.list="list"
cat("useDynLib(MyCode)\n", file = file.path("MyCode", "NAMESPACE"))
# expr="matrix", gset.idx.list="GeneSetCollection"
# expr="matrix", gset.idx.list="list"
# expr="SummarizedExperiment", gset.idx.list="GeneSetCollection"
# expr="SummarizedExperiment", gset.idx.list="list"


# Step 4
debug(gsva, signature = c(expr="matrix", gset.idx.list="list"))
load_all("MyCode")
# OR
# debug(gsva, signature = c("matrix", "list"))
gsva(y, geneSets, method="ssgsea", kcdf="Gaussian")
Browse[3]> debug(.gsva)
# return(ssgsea(expr, gset.idx.list, alpha = tau, parallel.sz = parallel.sz,
#      normalization = ssgsea.norm, verbose = verbose,
#      BPPARAM = BPPARAM))


# Step 5
isdebugged("gsva")
# Modify R/C/Fortran code and run load_all("MyCode")
# [1] TRUE
undebug(gsva)
</pre>


# Step 6 (Automatically generate the documentation, optional)
* '''getClassDef()''' in S4 ([http://www.bioconductor.org/help/course-materials/2014/Epigenomics/BiocForSequenceAnalysis.html Bioconductor course]).
document()
{{Pre}}
 
library(IRanges)
# Step 7 (Deployment, optional)
ir <- IRanges(start=c(10, 20, 30), width=5)
build("MyCode")
ir


# Step 8 (Install the package, optional)
class(ir)
install()
## [1] "IRanges"
</syntaxhighlight>
## attr(,"package")
## [1] "IRanges"


'''Note''':
getClassDef(class(ir))
# '''load_all("FolderName")''' will make the FolderName to become ''like'' a package to be loaded into the current R session so the 2nd item returned from '''search()''' will be '''"package:FolderName"'''. However, the ''FolderName'' does not exist under Program Files/R/R-X.Y.Z/library nor Documents/R/win-library/X.Y/ (Windows OS).
## Class "IRanges" [package "IRanges"]
# '''build("FolderName")''' will create a tarball in the current directory. User can install the new package for example using Packages -> Install packages from local files on Windows OS.
##
# For the simplest R package, the source code only contains a file <DESCRIPTION> and a folder <R> with individual R files in the text format.
## Slots:
##                                                                     
## Name:            start          width          NAMES    elementType
## Class:        integer        integer characterORNULL      character
##                                     
## Name:  elementMetadata        metadata
## Class: DataTableORNULL            list
##
## Extends:  
## Class "Ranges", directly
## Class "IntegerList", by class "Ranges", distance 2
## Class "RangesORmissing", by class "Ranges", distance 2
## Class "AtomicList", by class "Ranges", distance 3
## Class "List", by class "Ranges", distance 4
## Class "Vector", by class "Ranges", distance 5
## Class "Annotated", by class "Ranges", distance 6
##
## Known Subclasses: "NormalIRanges"
</pre>
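The note above says the simplest source package is just a ''DESCRIPTION'' file plus an ''R/'' folder. Below is a minimal sketch that writes such a skeleton into a temporary directory using base R only. The package name ''mypkg'', the function ''hello()'' and all DESCRIPTION field values are hypothetical placeholders. A NAMESPACE file is included so the exports are explicit.

```r
# Hypothetical package 'mypkg' written to a temporary directory
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: "Field: value" lines in DCF format (placeholder values)
writeLines(c(
  "Package: mypkg",
  "Version: 0.0.1",
  "Title: A Minimal Example Package",
  "Description: Illustrates the smallest useful package layout.",
  "Authors@R: person('Jane', 'Doe', email = 'jane@example.org', role = c('aut', 'cre'))",
  "License: GPL-3"
), file.path(pkg, "DESCRIPTION"))

# NAMESPACE: export every object whose name starts with a letter
writeLines('exportPattern("^[[:alpha:]]+")', file.path(pkg, "NAMESPACE"))

# R/: plain-text source files
writeLines('hello <- function(name = "world") paste("Hello,", name)',
           file.path(pkg, "R", "hello.R"))

# Read the DESCRIPTION back and list what was created
desc <- read.dcf(file.path(pkg, "DESCRIPTION"))
desc[, "Package"]                      # "mypkg"
list.files(pkg, recursive = TRUE)      # "DESCRIPTION" "NAMESPACE" "R/hello.R"
```

Running ''build()'' and ''install()'' (Steps 7 and 8) on such a folder should then produce and install a tarball. A package meant for CRAN needs more fields and checks.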


==== Binary packages ====
=== Check if a function is an S4 method ===
* No .R files in the ''R/'' directory. There are 3 files that store the parsed functions in an efficient file format. This is the result of loading all the R code and then saving the functions with ''save()''.
'''isS4(foo)'''
* A ''Meta/'' directory contains a number of Rds files. These files contain cached metadata about the package, like what topics the help files cover and parsed version of the ''DESCRIPTION'' file.
* An ''html/'' directory.
* A ''libs/'' directory if you have any code in the ''src/'' directory.
* The contents of ''inst/'' are moved to the top-level directory.
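Here is a small sketch of '''isS4()''' in action. The generic ''area()'' and the class ''Square'' are hypothetical and defined only for illustration. isS4() returns TRUE for S4 generics and S4 objects, and FALSE for ordinary functions such as mean().

```r
library(methods)

# Hypothetical S4 class, generic and method
setClass("Square", slots = c(side = "numeric"))
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Square", function(shape) shape@side^2)

sq <- new("Square", side = 3)

isS4(area)                      # TRUE: a standardGeneric object
isS4(sq)                        # TRUE: an instance of an S4 class
isS4(mean)                      # FALSE: an ordinary closure (mean is an S3 generic)
isGeneric("area")               # TRUE
existsMethod("area", "Square")  # TRUE: a method for exactly this signature
area(sq)                        # 9
```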


==== What is a library? ====
=== How to access the slots of an S4 object ===
A library is simply a directory containing installed packages.
* @ will let you access the slots of an S4 object.
* Note that often the best way to do this is to not access the slot directly but rather through an accessor function (e.g. coefs() rather than digging out the coefficients with $ or @). However, often such functions do not exist so you have to access the slots directly. This will mean that your code breaks if the internal implementation changes, however.
* [https://kasperdanielhansen.github.io/genbioconductor/html/R_S4.html#slots-and-accessor-functions R - S4 Classes and Methods] Hansen. '''getClass()''' or '''getClassDef()'''.
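The sketch below shows the three ways to read a slot. The class ''Fit'' and the accessor generic ''coefs()'' are hypothetical. The point is that code calling the accessor keeps working if the slot is later renamed, while code using @ does not.

```r
library(methods)

# Hypothetical class with two slots
setClass("Fit", slots = c(coefs = "numeric", call = "character"))

# Accessor generic and method, so callers never need to know the slot name
setGeneric("coefs", function(object) standardGeneric("coefs"))
setMethod("coefs", "Fit", function(object) object@coefs)

fit <- new("Fit", coefs = c(a = 1.5, b = -2), call = "fit(y ~ x)")

fit@coefs            # direct slot access with @
slot(fit, "coefs")   # the same, with the slot name given as a string
coefs(fit)           # accessor: preferred when one exists
slotNames("Fit")     # "coefs" "call"
```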


You can use ''.libPaths()'' to see which libraries are currently active.
=== setReplaceMethod() ===
<syntaxhighlight lang='rsplus'>
* [https://stackoverflow.com/a/24253311 What's the difference between setMethod(“$<-”) and set setReplaceMethod(“$”)?]
.libPaths()
* [https://stackoverflow.com/a/49267668 What is setReplaceMethod() and how does it work?]


lapply(.libPaths(), dir)
=== See what methods work on an object ===
</syntaxhighlight>
see what methods work on an object, e.g. a GRanges object:
<pre>
methods(class="GRanges")
</pre>
Or if you have an object, x:
<pre>
methods(class=class(x))
</pre>  
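Going back to '''setReplaceMethod()''' (see the Stack Overflow answers linked above), here is a minimal sketch that assumes a hypothetical class ''Person'' with a ''name'' slot. setReplaceMethod("name", ...) is shorthand for setMethod("name<-", ...).

```r
library(methods)

# Hypothetical class with one slot
setClass("Person", slots = c(name = "character"))

# Replacement generic: the name ends in "<-" and the last argument is 'value'
setGeneric("name<-", function(x, value) standardGeneric("name<-"))

# Equivalent to setMethod("name<-", "Person", ...)
setReplaceMethod("name", "Person", function(x, value) {
  x@name <- value
  validObject(x)   # re-check the object after modification
  x                # a replacement function must return the modified object
})

p <- new("Person", name = "Ann")
name(p) <- "Bob"   # evaluated as p <- `name<-`(p, value = "Bob")
p@name
existsMethod("name<-", "Person")
```

showMethods("name<-") can then be used to confirm which signatures have a method.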


==== Object names ====
=== View S3 function definition: double colon '::' and triple colon ':::' operators and getAnywhere() ===
* Variable and function names should be lower case.
?":::"
* Use an underscore (_) to separate words within a name (reserve . for S3 methods).
* [https://en.wikipedia.org/wiki/Camel_case Camel case] is a legitimate alternative, but be consistent! For example, preProcess(), twoClassData, createDataPartition(), trainingRows, trainPredictors, testPredictors, trainClasses, testClasses have been used in [https://cran.r-project.org/web/packages/AppliedPredictiveModeling/index.html Applied Predictive Modeling] by [http://appliedpredictivemodeling.com/ Kuhn & Johnson].
* Generally, variable names should be nouns and function names should be verbs.


==== Spacing ====
* pkg::name returns the value of the exported variable name in namespace pkg
* Add a space around the operators +, -, / and *.
* pkg:::name returns the value of the internal variable name
* Include a space around the assignment operators, <- and =.
* Add a space around any comparison operators such as == and <.


==== Indentation ====
<pre>
* Use two spaces to indent code.
base::"+"
* Never mix tabs and spaces.
stats:::coef.default
* RStudio can automatically convert the tab character to spaces (see Tools -> Global options -> Code).


==== formatR package ====
predict.ppr
Use formatR package to clean up poorly formatted code
# Error: object 'predict.ppr' not found
<syntaxhighlight lang='rsplus'>
stats::predict.ppr
install.packages("formatR")
# Error: 'predict.ppr' is not an exported object from 'namespace:stats'
formatR::tidy_dir("R")
stats:::predict.ppr  # OR 
</syntaxhighlight>
getS3method("predict", "ppr")
 
getS3method("t", "test")
</pre>
 
[https://stackoverflow.com/a/19226817 methods() + getAnywhere() functions]
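The following sketch ties the lookups above together for the unexported S3 method predict.ppr from stats. It uses only base/stats/utils functions and does not assume any particular printed output from getAnywhere().

```r
# predict.ppr is an S3 method registered by stats but not exported
exists("predict.ppr")                          # FALSE: not on the search path
msg <- tryCatch(stats::predict.ppr, error = conditionMessage)
msg                                            # the "not an exported object" error message
f1 <- stats:::predict.ppr                      # ':::' reads the internal object
f2 <- getS3method("predict", "ppr")            # look up the registered S3 method
identical(f1, f2)                              # TRUE: same function object
ga <- getAnywhere("predict.ppr")               # search everywhere, incl. namespaces
ga$where                                       # where copies were found
```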


Another way is to use the '''lintr''' package.
=== Read the source code (include Fortran/C, S3 and S4 methods) ===
<syntaxhighlight lang='rsplus'>
* [https://github.com/jimhester/lookup#readme lookup] package
install.packages("lintr")
* [https://blog.r-hub.io/2019/05/14/read-the-source/ Read the source]
lintr::lint_package()
* Find the source code in [https://stackoverflow.com/a/19226817 UseMethod("XXX")] for S3 methods.
</syntaxhighlight>


==== Minimal R package for submission ====
=== S3 method is overwritten ===
https://stat.ethz.ch/pipermail/r-devel/2013-August/067257.html and [http://cran.r-project.org/web/packages/policies.html CRAN Repository Policy].
For example, the select() method from dplyr is overwritten by [https://github.com/cran/grpreg/blob/master/NAMESPACE grpreg] package.


==== Continuous Integration: [https://travis-ci.org/ Travis-CI] (Linux, Mac) ====
An easy solution is to load grpreg before loading dplyr.  
* [http://juliasilge.com/blog/Beginners-Guide-to-Travis/  A Beginner's Guide to Travis-CI]
* [http://r-pkgs.had.co.nz/tests.html testthat] package
* http://johnmuschelli.com/neuroc/getting_ready_for_submission/index.html#61_travis


==== Continuous Integration: [https://www.appveyor.com/ Appveyor] (Windows) ====
* https://stackoverflow.com/a/14407095
* Appveyor is a continuous integration service that builds projects on Windows machines.
* [https://njtierney.github.io/r/missing%20data/rbloggers/2016/11/06/simple-s3-methods/ A Simple Guide to S3 Methods] and [https://github.com/njtierney/A-Simple-Guide-to-S3-Methods/blob/master/SimpleS3.Rmd its source]
* http://johnmuschelli.com/neuroc/getting_ready_for_submission/index.html#62_appveyor
* [https://developer.r-project.org/Blog/public/2019/08/19/s3-method-lookup/index.html S3 Method Lookup]
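The S3 guides above boil down to three pieces: a generic that calls UseMethod(), methods named ''generic.class'', and an optional default method. Here is a minimal sketch with a hypothetical generic ''describe()'' and class ''animal'':

```r
# Generic: UseMethod() dispatches on class(x)
describe <- function(x, ...) UseMethod("describe")

# Method for class "animal" (name pattern: generic.class)
describe.animal <- function(x, ...) paste("An animal called", x$name)

# Fallback used when no class-specific method is found
describe.default <- function(x, ...) paste("Something of class", class(x)[1])

# An S3 object is just a list with a class attribute
rex <- structure(list(name = "Rex"), class = "animal")
fido <- structure(list(name = "Fido"), class = c("dog", "animal"))

describe(rex)     # "An animal called Rex"
describe(fido)    # no describe.dog, so dispatch falls through to describe.animal
describe(1:3)     # "Something of class integer"
```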


==== Submit packages to cran ====
=== mcols() and DataFrame() from Bioc [http://bioconductor.org/packages/release/bioc/html/S4Vectors.html S4Vectors] package ===
* http://f.briatte.org/r/submitting-packages-to-cran
* mcols: Get or set the metadata columns.
* https://rmhogervorst.github.io/cleancode/blog/2016/07/09/submtting-to-cran-first-experience.html
* colData: SummarizedExperiment instances from GenomicRanges
* [http://johnmuschelli.com/neuroc/getting_ready_for_submission/index.html Preparing Your Package for Submission]
* DataFrame: The DataFrame class extends the DataTable virtual class and supports the storage of any type of object (with length and [ methods) as columns.  
* https://builder.r-hub.io/


=== Build R package faster using multicore ===
For example, in [http://www-huber.embl.de/DESeq2paper/vignettes/posterior.pdf Shrinkage of logarithmic fold changes] vignette of the DESeq2paper package
http://www.rexamine.com/2015/07/speeding-up-r-package-installation-process/
{{Pre}}
> mcols(ddsNoPrior[genes, ])
DataFrame with 2 rows and 21 columns
  baseMean  baseVar  allZero dispGeneEst    dispFit dispersion  dispIter dispOutlier  dispMAP
  <numeric> <numeric> <logical>  <numeric>  <numeric>  <numeric> <numeric>  <logical> <numeric>
1  163.5750  8904.607    FALSE  0.06263141 0.03862798  0.0577712        7      FALSE 0.0577712
2  175.3883 59643.515    FALSE  2.25306109 0.03807917  2.2530611        12        TRUE 1.6011440
  Intercept strain_DBA.2J_vs_C57BL.6J SE_Intercept SE_strain_DBA.2J_vs_C57BL.6J WaldStatistic_Intercept
  <numeric>                <numeric>    <numeric>                    <numeric>              <numeric>
1  6.210188                  1.735829    0.1229354                    0.1636645              50.515872
2  6.234880                  1.823173    0.6870629                    0.9481865                9.074686
  WaldStatistic_strain_DBA.2J_vs_C57BL.6J WaldPvalue_Intercept WaldPvalue_strain_DBA.2J_vs_C57BL.6J
                                <numeric>            <numeric>                            <numeric>
1                                10.60602        0.000000e+00                        2.793908e-26
2                                1.92280        1.140054e-19                        5.450522e-02
  betaConv  betaIter  deviance  maxCooks
  <logical> <numeric> <numeric> <numeric>
1      TRUE        3  210.4045 0.2648753
2      TRUE        9  243.7455 0.3248949
</pre>


The idea is to edit the '''/lib64/R/etc/Renviron''' file (where /lib64/R/etc/ is the directory returned by R.home("etc") in R) and set:
== Pipe ==
<ul>
<li>[https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/ Differences between the base R and magrittr pipes] 4/21/2023
<li>[https://win-vector.com/2020/12/05/r-is-getting-an-official-pipe-operator/ R is Getting an Official Pipe Operator], [https://win-vector.com/2020/12/07/my-opinion-on-rs-upcoming-pipe/ My Opinion on R’s Upcoming Pipe]
<li> a(b(x)) vs '''x |> b() |> a()'''. See [https://twitter.com/henrikbengtsson/status/1335328090390597632 this tweet] in R-dev 2020-12-04.
<pre>
<pre>
MAKE='make -j 8' # submit 8 jobs at once
e0 <- quote(a(b(x)))
e1 <- quote(x |> b() |> a())
identical(e0, e1)
</pre>
</pre>
Then build the R package as usual, for example,
</li>
<li>
[https://selbydavid.com/2021/05/18/pipes/ There are now 3 different R pipes]
</li>
<li>[https://stackoverflow.com/a/67629310 Error: The pipe operator requires a function call as RHS].
<pre>
<pre>
$ time R CMD INSTALL ~/R/stringi --preclean --configure-args='--disable-pkg-config'
# native pipe
foo |> bar()
# magrittr pipe
foo %>% bar
</pre>
</pre>
</li>
<li>[https://www.infoworld.com/article/3621369/use-the-new-r-pipe-built-into-r-41.html Use the new R pipe built into R 4.1] </li>
<li>[https://towardsdatascience.com/the-new-native-pipe-operator-in-r-cbc5fa8a37bd The New Native Pipe Operator in R] </li>
<li>[https://ivelasq.rbind.io/blog/understanding-the-r-pipe/ Understanding the native R pipe |> ] </li>
<li>[https://medium.com/number-around-us/navigating-the-data-pipes-an-r-programming-journey-with-mario-bros-1aa621af1926 Navigating the Data Pipes: An R Programming Journey with Mario Bros]
</ul>
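Because the native pipe is rewritten by the parser, the rules in the links above can be checked directly. This sketch assumes R >= 4.1.0 (for |> and the \(x) shorthand). The helper ''sq()'' is hypothetical.

```r
# Requires R >= 4.1.0 (native pipe |> and the \(x) lambda shorthand)
sq <- function(x) x^2                      # hypothetical helper

r1 <- c(1, 4, 9) |> sqrt() |> sum()        # same as sum(sqrt(c(1, 4, 9)))
r2 <- 1:3 |> sq()                          # RHS must be a call: `1:3 |> sq` is a parse error
r3 <- 1:3 |> (\(v) v + 1)()                # an anonymous function must be wrapped and called

# The parser turns the pipe into an ordinary nested call
identical(quote(x |> f() |> g(y)), quote(g(f(x), y)))   # TRUE
```

With magrittr, `1:3 %>% sq` also works without parentheses, which is one of the differences described in the tidyverse post above.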


== Tricks ==
Packages that take advantage of pipes:
<ul>
<li>[https://cran.r-project.org/web/packages/rstatix/index.html rstatix]: Pipe-Friendly Framework for Basic Statistical Tests
</ul>


=== Getting help ===
== findInterval() ==
* http://stackoverflow.com/questions/tagged/r; the [https://stackoverflow.com/tags/r/info R tag info page] lists more resources.
Related functions are cut() and split(). See also
* https://stat.ethz.ch/pipermail/r-help/
* [http://books.google.com/books?id=oKY5QeSWb4cC&pg=PT310&lpg=PT310&dq=r+findinterval3&source=bl&ots=YjNMkHrTMw&sig=y_wIA1um420xVCI5IoGivABge-s&hl=en&sa=X&ei=gm_yUrSqLKXesAS2_IGoBQ&ved=0CFIQ6AEwBTgo#v=onepage&q=r%20findinterval3&f=false R Graphs Cookbook]
* https://stat.ethz.ch/pipermail/r-devel/
* [http://adv-r.had.co.nz/Rcpp.html Hadley Wickham]
 
== Assign operator ==
* Earlier versions of R used underscore (_) as an assignment operator.
* [https://developer.r-project.org/equalAssign.html Assignments with the = Operator]
* In R 1.8.0 (2003), the underscore (_) assignment operator was removed. See [https://cran.r-project.org/src/base/NEWS.1 NEWS].
* In R 1.9.0 (2004), "_" is allowed in valid names. See [https://cran.r-project.org/src/base/NEWS.1 NEWS].
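The practical difference between = and <- (see the equalAssign note above) shows up inside function calls. Here is a small sketch. The ''demo_*'' variable names are hypothetical.

```r
demo_a = 5                   # at top level `=` assigns, just like `<-`
demo_b <- 5

s1 <- sum(demo_c = 1:3)      # inside a call `=` only names an argument
exists("demo_c")             # FALSE: nothing was assigned

s2 <- sum(demo_d <- 1:3)     # `<-` inside a call also assigns in the calling environment
exists("demo_d")             # TRUE

my_value <- 1                # `_` has been a valid character in names since R 1.9.0
```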


=== Better Coder/coding, best practices ===
: [[File:R162.png|200px]]
* http://www.mango-solutions.com/wp/2015/10/10-top-tips-for-becoming-a-better-coder/
* [https://www.rstudio.com/rviews/2016/12/02/writing-good-r-code-and-writing-well/ Writing Good R Code and Writing Well]
* [http://www.thertrader.com/2018/09/01/r-code-best-practices/ R Code – Best practices]


=== Change default R repository ===
== Operator precedence ==
Edit global Rprofile file. On *NIX platforms, it's located in /usr/lib/R/library/base/R/Rprofile although local .Rprofile settings take precedence.
The ':' operator has higher precedence than '-' so 0:N-1 evaluates to (0:N)-1, not 0:(N-1) like you probably wanted.
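Here is a quick check of this precedence rule, using N = 5 as an arbitrary example:

```r
N <- 5
a <- 0:N-1          # parsed as (0:N) - 1
b <- 0:(N-1)        # what was probably intended
a                   # -1 0 1 2 3 4
b                   # 0 1 2 3 4
-1:2                # unary minus binds tighter than ':', so this is (-1):2 = -1 0 1 2
seq_len(N) - 1      # 0 1 2 3 4; also gives an empty vector when N = 0
```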


For example, I can specify the R mirror I like by creating a single line <.Rprofile> file under my home directory.
== order(), rank() and sort() ==
If we want to find the indices of the first 25 genes with the smallest p-values, we can use '''order(pval)[1:25]'''.
<pre>
<pre>
local({
> x = sample(10)
  r = getOption("repos")
> x
  r["CRAN"] = "https://cran.rstudio.com/"
[1] 4  3 10  7  5  8  6  1  9  2
  options(repos = r)
> order(x)
})
  [1]  8 10  2  1  5  7  4  6  9  3
options(continue = " ")
> rank(x)
message("Hi MC, loading ~/.Rprofile")
[1]  4  3 10  7  5  8  6  1  9  2
if (interactive()) {
> rank(10*x)
  .Last <- function() try(savehistory("~/.Rhistory"))
[1]  4  3 10  7  5  8  6  1  9  2
}


> x[order(x)]
[1]  1  2  3  4  5  6  7  8  9 10
> sort(x)
[1]  1  2  3  4  5  6  7  8  9 10
</pre>
</pre>
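This is a sketch of the '''order(pval)[1:25]''' idiom from above on simulated data. runif() stands in for real p-values, and the 100 gene names are hypothetical.

```r
set.seed(1)
pval <- runif(100)                              # simulated p-values for 100 hypothetical genes
names(pval) <- paste0("gene", seq_along(pval))

top <- order(pval)[1:25]                        # indices of the 25 smallest p-values
head(pval[top])                                 # smallest first

# Sanity checks: the selection is sorted and nothing left out is smaller
!is.unsorted(pval[top])
max(pval[top]) <= min(pval[-top])
```

head(order(pval), 25) gives the same indices and does not produce NA when there are fewer than 25 p-values.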


=== Change the default web browser ===
=== relate order() and rank() ===
When I run help.start() function in LXLE, it cannot find its default web browser (seamonkey).
<ul>
<syntaxhighlight lang='rsplus'>
<li>Order to rank: rank(x) = order(order(x)) when x has no ties
> help.start()
<syntaxhighlight lang='r'>
If the browser launched by 'xdg-open' is already running, it is *not*
set.seed(1)
    restarted, and you must switch to its window.
x <- rnorm(5)
Otherwise, be patient ...
order(x)
> /usr/bin/xdg-open: 461: /usr/bin/xdg-open: x-www-browser: not found
# [1] 3 1 2 5 4
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: firefox: not found
rank(x)
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: mozilla: not found
# [1] 2 3 1 5 4
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: epiphany: not found
order(order(x))
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: konqueror: not found
# [1] 2 3 1 5 4
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: chromium-browser: not found
all(rank(x) == order(order(x)))
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: google-chrome: not found
# TRUE
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: links2: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: links: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: lynx: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: w3m: not found
xdg-open: no method available for opening 'http://127.0.0.1:27919/doc/html/index.html'
</syntaxhighlight>
</syntaxhighlight>


The solution is to put
<li>Order to rank, method 2: when x has no ties, rank(x)[order(x)] = 1:n, so the ranks can be filled in directly
<pre>
<syntaxhighlight lang='r'>
options(browser='seamonkey')
ord <- order(x)
</pre>
ranks <- integer(length(x))
in the '''.Rprofile''' of your home directory. If the browser is not in the global PATH, we need to put the full path above.
ranks[ord] <- seq_along(x)
ranks
# [1] 2 3 1 5 4
</syntaxhighlight>


For one-time only purpose, we can use the ''browser'' option in help.start() function:
<li>Rank to Order:  
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='r'>
> help.start(browser="seamonkey")
ranks <- rank(x)
If the browser launched by 'seamonkey' is already running, it is *not*
ord <- order(ranks)
    restarted, and you must switch to its window.
ord
Otherwise, be patient ...
# [1] 3 1 2 5 4
</syntaxhighlight>
</syntaxhighlight>
</ul>


Alternatively, we can make the change in ~/.Renviron or etc/Renviron (creating the file if it does not exist). See
=== OS-dependent results on sorting string vector ===
* [https://stat.ethz.ch/pipermail/r-help/2003-August/037484.html Changing default browser in options()].
Gene symbol case.
* https://stat.ethz.ch/R-manual/R-devel/library/utils/html/browseURL.html
<pre>
# mac:  
order(c("DC-UbP", "DC2")) # c(1,2)


=== Rconsole, Rprofile.site, Renviron.site files ===
# linux:
* https://cran.r-project.org/doc/manuals/r-release/R-admin.html ('''Rprofile.site''')
order(c("DC-UbP", "DC2")) # c(2,1)
* https://cran.r-project.org/doc/manuals/r-release/R-intro.html ('''Rprofile.site, Renviron.site, Rconsole''' (Windows only))
* https://cran.r-project.org/doc/manuals/r-release/R-exts.html  ('''Renviron.site''')
* [http://blog.revolutionanalytics.com/2015/11/how-to-store-and-use-authentication-details-with-r.html How to store and use webservice keys and authentication details]
* [http://itsalocke.com/use-rprofile-give-important-notifications/ Use your .Rprofile to give you important notifications]
 
If we'd like to install R packages to a personal directory, follow [https://stat.ethz.ch/pipermail/r-devel/2015-July/071562.html this]. Just add the line
<pre>
R_LIBS_SITE=F:/R/library
</pre>
</pre>
to the file '''R_HOME/etc/x64/Renviron.site'''.
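Because the gene-symbol order above depends on the OS locale, here is a sketch of one locale-independent option in base R. method = "radix" in sort()/order() compares strings in C-locale (byte) order regardless of the session locale. Note that this is not necessarily the same order as stringr::str_order(), which uses ICU collation.

```r
x1 <- c("DC2", "DC-UbP")
x2 <- c("2028_s_at", "202800_at")

# method = "radix" compares strings byte by byte, as in the C locale
sort(x1, method = "radix")     # "DC-UbP" "DC2"         ('-' is 0x2D, '2' is 0x32)
sort(x2, method = "radix")     # "202800_at" "2028_s_at" ('0' is 0x30, '_' is 0x5F)
order(x2, method = "radix")    # 2 1
```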


Note that on Windows OS, R/etc contains
Affymetrix ID case.
<pre>
<pre>
$ ls -l /c/Progra~1/r/r-3.2.0/etc
# mac:
total 142
order(c("202800_at", "2028_s_at")) # [1] 2 1
-rw-r--r--    1   Administ    1043 Jun 20  2013 Rcmd_environ
sort(c("202800_at", "2028_s_at")) # [1] "2028_s_at" "202800_at"
-rw-r--r--    1  Administ    1924 Mar 17  2010 Rconsole
-rw-r--r--    1  Administ      943 Oct  3  2011 Rdevga
-rw-r--r--    1  Administ      589 May 20  2013 Rprofile.site
-rw-r--r--    1  Administ  251894 Jan 17  2015 curl-ca-bundle.crt
drwxr-xr-x    1  Administ        0 Jun  8 10:30 i386
-rw-r--r--    1   Administ    1160 Dec 31  2014 repositories
-rw-r--r--    1  Administ    30188 Mar 17  2010 rgb.txt
drwxr-xr-x    3  Administ        0 Jun  8 10:30 x64


$ ls /c/Progra~1/r/r-3.2.0/etc/i386
# linux
Makeconf
order(c("202800_at", "2028_s_at")) # [1] 1 2
sort(c("202800_at", "2028_s_at")) # [1] "202800_at" "2028_s_at"
</pre>
Wrapping the character vector in factor() does not remove this difference.


$ cat /c/Progra~1/r/r-3.2.0/etc/Rconsole
The difference is related to locale. See
# Optional parameters for the console and the pager
# The system-wide copy is in R_HOME/etc.
# A user copy can be installed in `R_USER'.


## Style
* [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/locales ?locales] in R
# This can be `yes' (for MDI) or `no' (for SDI).
* In a Linux or macOS shell, type '''locale'''
  MDI = yes
* [https://stackoverflow.com/questions/39171613/sort-produces-different-results-in-ubuntu-and-windows sort() produces different results in Ubuntu and Windows]
# MDI = no
* To fix the inconsistency problem, we can set the locale in R code to "C" or use the stringr package (the locale is part of [https://www.rdocumentation.org/packages/stringr/versions/1.4.0/topics/str_order str_order()]'s arguments).
<pre>
# both mac and linux
stringr::str_order(c("202800_at", "2028_s_at")) # [1] 2 1
stringr::str_order(c("DC-UbP", "DC2")) # [1] 1 2


# the next two are only relevant for MDI
# Or setting the locale to "C"
toolbar = yes
Sys.setlocale("LC_ALL", "C"); sort(c("DC-UbP", "DC2"))
statusbar = no
# Or
Sys.setlocale("LC_COLLATE", "C"); sort(c("DC-UbP", "DC2"))
# But not
Sys.setlocale("LC_ALL", "en_US.UTF-8"); sort(c("DC-UbP", "DC2"))
</pre>


## Font.
=== unique() ===
# Please use only fixed width font.
unique() does not sort; it keeps the elements in the order of their first appearance. See [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/unique ?unique].
# If font=FixedFont the system fixed font is used; in this case
<pre>
# points and style are ignored. If font begins with "TT ", only
# mac & linux
# True Type fonts are searched for.
R> unique(c("DC-UbP", "DC2"))
font = TT Courier New
[1] "DC-UbP" "DC2"
points = 10
</pre>
style = normal # Style can be normal, bold, italic


# Dimensions (in characters) of the console.
== do.call ==
rows = 25
'''do.call''' constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
columns = 80
# Dimensions (in characters) of the internal pager.
pgrows = 25
pgcolumns = 80
# should options(width=) be set to the console width?
setwidthonresize = yes


# memory limits for the console scrolling buffer, in chars and lines
[https://www.r-bloggers.com/2023/05/the-do-call-function-in-r-unlocking-efficiency-and-flexibility/ The do.call() function in R: Unlocking Efficiency and Flexibility]
# NB: bufbytes is in bytes for R < 2.7.0, chars thereafter.
bufbytes = 250000
buflines = 8000


# Initial position of the console (pixels, relative to the workspace for MDI)
Below are some examples from the [https://stat.ethz.ch/R-manual/R-devel/library/base/html/do.call.html help].
# xconsole = 0
# yconsole = 0


# Dimension of MDI frame in pixels
* Usage
# Format (w*h+xorg+yorg) or use -ve w and h for offsets from right bottom
{{Pre}}
# This will come up maximized if w==0
do.call(what, args, quote = FALSE, envir = parent.frame())
# MDIsize = 0*0+0+0
# what: either a function or a non-empty character string naming the function to be called.
# MDIsize = 1000*800+100+0
# args: a list of arguments to the function call. The names attribute of args gives the argument names.
# MDIsize = -50*-50+50+50 # 50 pixels space all round
# quote: a logical value indicating whether to quote the arguments.
# envir: an environment within which to evaluate the call. This will be most useful
#        if what is a character string and the arguments are symbols or quoted expressions.
</pre>
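* do.call() is not the same as [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/lapply lapply()]. do.call() makes one call with the list elements as the arguments, while lapply() calls the function once for each list element. In the first example below, do.call() passes imag = 1:3 to the already vectorized complex() in a single call. lapply() instead passes 1:3 unnamed, so it is matched to complex()'s first argument, length.out.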
{{Pre}}
> do.call("complex", list(imag = 1:3))
[1] 0+1i 0+2i 0+3i
> lapply(list(imag = 1:3), complex)
$imag
[1] 0+0i
> complex(imag=1:3)
[1] 0+1i 0+2i 0+3i
> do.call(function(x) x+1, list(1:3))
[1] 2 3 4
</pre>
* Applying do.call with Multiple Arguments
<pre>
> do.call("sum", list(c(1,2,3,NA), na.rm = TRUE))
[1] 6
> do.call("sum", list(c(1,2,3,NA) ))
[1] NA
</pre>
* [https://www.stat.berkeley.edu/~s133/Docall.html do.call() allows you to call any R function, but instead of writing out the arguments one by one, you can use a list to hold the arguments of the function.]
{{Pre}}
> tmp <- expand.grid(letters[1:2], 1:3, c("+", "-"))
> length(tmp)
[1] 3
> tmp[1:4,]
  Var1 Var2 Var3
1    a    1    +
2    b    1    +
3    a    2    +
4    b    2    +
> c(tmp, sep = "")
$Var1
  [1] a b a b a b a b a b a b
Levels: a b


# The internal pager can displays help in a single window
$Var2
# or in multiple windows (one for each topic)
[1] 1 1 2 2 3 3 1 1 2 2 3 3
# pagerstyle can be set to `singlewindow' or `multiplewindows'
pagerstyle = multiplewindows


## Colours for console and pager(s)
$Var3
# (see rwxxxx/etc/rgb.txt for the known colours).
[1] + + + + + + - - - - - -
background = White
Levels: + -
normaltext = NavyBlue
usertext = Red
highlight = DarkRed


## Initial position of the graphics window
$sep
## (pixels, <0 values from opposite edge)
[1] ""
xgraphics = -25
> do.call("paste", c(tmp, sep = ""))
ygraphics = 0
[1] "a1+" "b1+" "a2+" "b2+" "a3+" "b3+" "a1-" "b1-" "a2-" "b2-" "a3-"
[12] "b3-"
</pre>
* ''envir'' and ''quote'' arguments.
{{Pre}}
> A <- 2
> f <- function(x) print(x^2)
> env <- new.env()
> assign("A", 10, envir = env)
> assign("f", f, envir = env)
> f <- function(x) print(x)
> f(A) 
[1] 2
> do.call("f", list(A))
[1] 2
> do.call("f", list(A), envir = env) 
[1] 4
> do.call(f, list(A), envir = env) 
[1] 2                      # f is the global function object itself, so envir is not used to look it up


## Language for messages
> eval(call("f", A))                     
language =
[1] 2
> eval(call("f", quote(A)))             
[1] 2
> eval(call("f", A), envir = env)       
[1] 4
> eval(call("f", quote(A)), envir = env) 
[1] 100
</pre>
* Good use case; see [https://stackoverflow.com/a/11892680 Get all Parameters as List]
{{Pre}}
> foo <- function(a=1, b=2, ...) {
        list(arg=do.call(c, as.list(match.call())[-1]))
  }
> foo()
$arg
NULL
> foo(a=1)
$arg
a
1
> foo(a=1, b=2, c=3)
$arg
a b c
1 2 3
</pre>
* do.call() + switch(). See [https://github.com/satijalab/seurat/blob/13b615c27eeeac85e5c928aa752197ac224339b9/R/preprocessing.R#L2450 an example] from Seurat::NormalizeData.
<pre>
do.call(
  what = switch(
    EXPR = margin,
    '1' = 'rbind',
    '2' = 'cbind',
    stop("'margin' must be 1 or 2")
  ),
  args = normalized.data
)
switch('a', 'a' = rnorm(3), 'b'=rnorm(4)) # switch returns a value
do.call(switch('a', 'a' = 'rnorm', 'b'='rexp'), args=list(n=4)) # switch returns a function
</pre>
* The function we want to call is a string that may change: [https://github.com/cran/glmnet/blob/master/R/cv.glmnet.raw.R#L66 glmnet]
<pre>
# Suppose we want to call cv.glmnet or cv.coxnet or cv.lognet or cv.elnet .... depending on the case
fun = paste("cv", subclass, sep = ".")
cvstuff = do.call(fun, list(predmat,y,type.measure,weights,foldid,grouped))
</pre>
 
=== expand.grid, mapply, vapply ===
[https://shikokuchuo.net/posts/10-combinations/ A faster way to generate combinations for mapply and vapply]


## Default setting for console buffering: 'yes' or 'no'
=== do.call vs mapply ===
buffered = yes
* do.call() can do what [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/mapply mapply()] does here, but it takes the arguments as a single list instead of as separate arguments. In that sense do.call() is closer to the [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/funprog base::Map()] function.
{{Pre}}
> mapply(paste, tmp[1], tmp[2], tmp[3], sep = "")
      Var1
[1,] "a1+"
[2,] "b1+"
[3,] "a2+"
[4,] "b2+"
[5,] "a3+"
[6,] "b3+"
[7,] "a1-"
[8,] "b1-"
[9,] "a2-"
[10,] "b2-"
[11,] "a3-"
[12,] "b3-"
# It does not work if we do not explicitly specify the arguments in mapply()
> mapply(paste, tmp, sep = "")
      Var1 Var2 Var3
[1,] "a"  "1"  "+"
[2,] "b"  "1"  "+"
[3,] "a"  "2"  "+"
[4,] "b"  "2"  "+"
[5,] "a"  "3"  "+"
[6,] "b"  "3"  "+"
[7,] "a"  "1"  "-"
[8,] "b"  "1"  "-"
[9,] "a"  "2"  "-"
[10,] "b"  "2"  "-"
[11,] "a"  "3"  "-"
[12,] "b"  "3"  "-"
</pre>
* mapply is useful for generating variables from a vector of parameters. For example, suppose we want to generate variables from an exponential/Weibull distribution with a vector of scale parameters (depending on some covariates). In this case we can use ([https://stackoverflow.com/a/17031993 Simulating Weibull distributions from vectors of parameters in R])
{{Pre}}
set.seed(1)
mapply(rweibull, 1, c(1, 10), MoreArgs=list(n=1))
# [1] 1.326108 9.885284
set.seed(1)
x <- replicate(1000, mapply(rweibull, 1, c(1, 10), MoreArgs=list(n=1)))
dim(x) # [1]  2 1000
rowMeans(x)
# [1]  1.032209 10.104131
</pre>
{{Pre}}
set.seed(1); Vectorize(rweibull)(n=1, shape=1, scale=c(1, 10))
# [1] 1.326108 9.885284
set.seed(1); x <- replicate(1000, Vectorize(rweibull)(n=1, shape=1, scale=c(1, 10)))
</pre>
</pre>
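To make the do.call()/mapply()/Map() comparison above concrete, the three calls below build the same strings from the expand.grid() table. stringsAsFactors = FALSE is an assumption added here so every column is a plain vector.

```r
tmp <- expand.grid(letters[1:2], 1:3, c("+", "-"), stringsAsFactors = FALSE)

# do.call(): a single call paste(Var1, Var2, Var3, sep = "") built from a list
a <- do.call(paste, c(tmp, sep = ""))

# mapply(): paste() is called once per row, columns given as separate arguments
b <- mapply(paste, tmp$Var1, tmp$Var2, tmp$Var3, sep = "", USE.NAMES = FALSE)

# Map(): like mapply() but always returns a list (no simplification)
m_vec <- unlist(Map(paste, tmp$Var1, tmp$Var2, tmp$Var3, sep = ""), use.names = FALSE)

identical(a, b) && identical(a, m_vec)   # TRUE
a[1:3]                                  # "a1+" "b1+" "a2+"
```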
and on Linux
<pre>
brb@brb-T3500:~$ whereis R
R: /usr/bin/R /etc/R /usr/lib/R /usr/bin/X11/R /usr/local/lib/R /usr/share/R /usr/share/man/man1/R.1.gz


brb@brb-T3500:~$ ls /usr/lib/R
=== do.call vs lapply ===
bin  COPYING  etc  lib  library  modules  site-library  SVN-REVISION
[https://stackoverflow.com/a/10801883 What's the difference between lapply and do.call?] It seems to me the best usage is combining both functions: '''do.call(..., lapply())'''


brb@brb-T3500:~$ ls /usr/lib/R/etc
* lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
javaconf  ldpaths  Makeconf  Renviron  Renviron.orig  Renviron.site  Renviron.ucf  repositories  Rprofile.site
* do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it. '''It is widely used, for example, to assemble lists into simpler structures (often with rbind or cbind).'''
* Map applies a function to the corresponding elements of given vectors... Map is a simple wrapper to mapply which does not attempt to simplify the result, similar to Common Lisp's mapcar (with arguments being recycled, however). Future versions may allow some control of the result type.


brb@brb-T3500:~$ ls /usr/local/lib/R
{{Pre}}
site-library
> lapply(iris, class) # same as Map(class, iris)
</pre>
$Sepal.Length
and
[1] "numeric"
<pre>
brb@brb-T3500:~$ cat /usr/lib/R/etc/Rprofile.site
##                                              Emacs please make this -*- R -*-
## empty Rprofile.site for R on Debian
##
## Copyright (C) 2008 Dirk Eddelbuettel and GPL'ed
##
## see help(Startup) for documentation on ~/.Rprofile and Rprofile.site


# ## Example of .Rprofile
$Sepal.Width
# options(width=65, digits=5)
[1] "numeric"
# options(show.signif.stars=FALSE)
# setHook(packageEvent("grDevices", "onLoad"),
#        function(...) grDevices::ps.options(horizontal=FALSE))
# set.seed(1234)
# .First <- function() cat("\n  Welcome to R!\n\n")
# .Last <- function()  cat("\n  Goodbye!\n\n")


# ## Example of Rprofile.site
$Petal.Length
# local({
[1] "numeric"
#  # add MASS to the default packages, set a CRAN mirror
#  old <- getOption("defaultPackages"); r <- getOption("repos")
#  r["CRAN"] <- "http://my.local.cran"
#  options(defaultPackages = c(old, "MASS"), repos = r)
#})
brb@brb-T3500:~$ cat /usr/lib/R/etc/Renviron.site
##                                              Emacs please make this -*- R -*-
## empty Renviron.site for R on Debian
##
## Copyright (C) 2008 Dirk Eddelbuettel and GPL'ed
##
## see help(Startup) for documentation on ~/.Renviron and Renviron.site


# ## Example ~/.Renviron on Unix
$Petal.Width
# R_LIBS=~/R/library
[1] "numeric"
# PAGER=/usr/local/bin/less


# ## Example .Renviron on Windows
$Species
# R_LIBS=C:/R/library
[1] "factor"
# MY_TCLTK="c:/Program Files/Tcl/bin"


# ## Example of setting R_DEFAULT_PACKAGES (from R CMD check)
> x <- lapply(iris, class)
# R_DEFAULT_PACKAGES='utils,grDevices,graphics,stats'
> do.call(c, x)
# # this loads the packages in the order given, so they appear on
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
# # the search path in reverse order.
  "numeric"    "numeric"    "numeric"    "numeric"    "factor"
brb@brb-T3500:~$
</pre>
</pre>
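The do.call(..., lapply()) combination mentioned above is a common way to collect per-group results. lapply() builds a list of small data frames, and one do.call(rbind, ...) stacks them. Here is a sketch on the built-in iris data; the summary columns are arbitrary choices.

```r
# One small data frame per species, built with lapply()
pieces <- lapply(split(iris, iris$Species), function(d) {
  data.frame(Species = d$Species[1],
             n = nrow(d),
             mean_petal = mean(d$Petal.Length))
})

# A single rbind() call receives all the pieces as its arguments
res <- do.call(rbind, pieces)
res
```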


==== What is the best place to save Rconsole on Windows platform ====
https://stackoverflow.com/a/10801902
Put (or create) the file <Rconsole> in the ''C:/Users/USERNAME/Documents'' folder so that R always finds my preferences, no matter how R is upgraded or downgraded.
* '''lapply''' applies a function '''over a list'''. So there will be several function calls.
* '''do.call''' calls a function with '''a list of arguments''' (... argument) such as [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/c c()] or [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/cbind rbind()/cbind()] or [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/sum sum] or [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/order order] or [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/Extract "["] or paste. So there is only one function call.
{{Pre}}
> X <- list(1:3,4:6,7:9)
> lapply(X,mean)
[[1]]
[1] 2
 
[[2]]
[1] 5


My preferred settings:
[[3]]
* Font: Consolas (it will be shown as "TT Consolas" in Rconsole)
[1] 8
* Size: 12
> do.call(sum, X)
* background: black
[1] 45
* normaltext: white
> sum(c(1,2,3), c(4,5,6), c(7,8,9))
* usertext: GreenYellow or orange (close to RStudio's Cobalt theme) or sienna1 or SpringGreen or tan1 or yellow
[1] 45
> do.call(mean, X) # Error
> do.call(rbind,X)
    [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> lapply(X,rbind)
[[1]]
    [,1] [,2] [,3]
[1,]    1    2    3


and others (default options)
[[2]]
* pagebg: white
    [,1] [,2] [,3]
* pagetext: navy
[1,]    4    5    6
* highlight: DarkRed
* dataeditbg: white
* dataedittext: navy (View() function)
* dataedituser: red
* editorbg: white (edit() function)
* editortext: black


=== Saving and loading history automatically: .Rprofile & local() ===
[[3]]
* http://stat.ethz.ch/R-manual/R-patched/library/utils/html/savehistory.html
    [,1] [,2] [,3]
* .Rprofile will automatically be loaded when R has started from that directory
[1,]    7    8    9
* .Rprofile has been created/used by the '''packrat''' package to restore a packrat environment. See the packrat/init.R file.
> mapply(mean, X, trim=c(0,0.5,0.1))
* [http://www.statmethods.net/interface/customizing.html Customizing Startup], [http://www.onthelambda.com/2014/09/17/fun-with-rprofile-and-customizing-r-startup/ Fun with .Rprofile and customizing R startup]  
[1] 2 5 8
* https://stackoverflow.com/questions/16734937/saving-and-loading-history-automatically
> mapply(mean, X)
* The history file will always be read from the $HOME directory and the history file will be overwritten by a new session. These two problems can be solved if we define '''R_HISTFILE''' system variable.
[1] 2 5 8
* [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/eval local()] function can be used in .Rprofile file to set up the environment even no new variables will be created (change repository, install packages, load libraries, source R files, run system() function, file/directory I/O, etc)
</pre>
Below is a good example to show the difference of lapply() and do.call() - [https://stackoverflow.com/a/42734863 Generating Random Strings].  
{{Pre}}
> set.seed(1)
> x <- replicate(2, sample(LETTERS, 4), FALSE)
> x
[[1]]
[1] "Y" "D" "G" "A"


'''Linux''' or '''Mac'''
[[2]]
[1] "B" "W" "K" "N"


In '''~/.profile''' or '''~/.bashrc''' I put:
> lapply(x, paste0)
<pre>
[[1]]
export R_HISTFILE=~/.Rhistory
[1] "Y" "D" "G" "A"
</pre>
 
In '''~/.Rprofile''' I put:
[[2]]
<pre>
[1] "B" "W" "K" "N"
if (interactive()) {
 
  if (.Platform$OS.type == "unix")  .First <- function() try(utils::loadhistory("~/.Rhistory"))
> lapply(x, paste0, collapse= "")
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
[[1]]
}
[1] "YDGA"
</pre>


'''Windows'''
[[2]]
[1] "BWKN"


If you launch R by clicking its icon from Windows Desktop, the R starts in '''C:\User\$USER\Documents''' directory. So we can create a new file '''.Rprofile''' in this directory.
> do.call(paste0, x)
<pre>
[1] "YB" "DW" "GK" "AN"
if (interactive()) {
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
}
</pre>
</pre>
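A sketch of the local() idea from the bullet list above, of the kind that could go in .Rprofile. The CRAN mirror URL is only an example choice. The helper variable r exists only inside local(), so nothing new is left in the global environment, while the option change persists.

```r
# Example .Rprofile snippet: set the CRAN mirror inside local() so the
# helper variable 'r' does not end up in the global environment.
local({
  r <- getOption("repos")                  # current repository setting
  r["CRAN"] <- "https://cloud.r-project.org"
  options(repos = r)
})

getOption("repos")["CRAN"]                          # the new mirror
exists("r", envir = globalenv(), inherits = FALSE)  # r was local to local()
```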


=== R release versions ===
[http://cran.r-project.org/web/packages/rversions/index.html rversions]: Query the main 'R' 'SVN' repository to find the released versions & dates.

=== Detect number of running R instances in Windows ===
* http://stackoverflow.com/questions/15935931/detect-number-of-running-r-instances-in-windows-within-r
<pre>
C:\Program Files\R>tasklist /FI "IMAGENAME eq Rscript.exe"
INFO: No tasks are running which match the specified criteria.

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rgui.exe"

Image Name                    PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
Rgui.exe                      1096 Console                    1    44,712 K

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rserve.exe"

Image Name                    PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
Rserve.exe                    6108 Console                    1    381,796 K
</pre>
In R, we can use
<pre>
> system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE)
[1] ""
[2] "Image Name                    PID Session Name        Session#    Mem Usage"
[3] "========================= ======== ================ =========== ============"
[4] "Rgui.exe                      1096 Console                    1    44,804 K"

> # number of Rgui.exe processes = output lines minus the 3 header lines
> length(system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE))-3
</pre>

=== Editor ===
http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs

* Emacs + ESS. ESS is useful when I want to tidy R code (the tidy_source() function in the formatR package sometimes gives errors, e.g. when I tested it on an R file like <GetComparisonResults.R> from BRB-ArrayTools v4.4 stable).
* [http://www.rstudio.com/ Rstudio] - editor/R terminal/R graphics/file browser/package manager. Version 0.98 added step-by-step debugging. See also [https://www.rstudio.com/rviews/2016/11/11/easy-tricks-you-mightve-missed/ RStudio Tricks]
* [http://www.geany.org/ geany] - I like the feature that it shows defined functions on the side panel, even for R code. RStudio can also do this (see the bottom of the code panel).
* [http://rgedit.sourceforge.net/ Rgedit], which can split the screen into two panes and run R in the bottom one. See [http://www.stattler.com/article/using-gedit-or-rgedit-r here].
* Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html

=== do.call + rbind + lapply ===
Lots of examples. See for example [https://stat.ethz.ch/pipermail/r-help/attachments/20140423/62d8d103/attachment.pl this one] for creating a data frame from a vector.
{{Pre}}
x <- readLines(textConnection("---CLUSTER 1 ---
3
4
5
6
---CLUSTER 2 ---
9
10
8
11"))

# create a list of where the 'clusters' are
clust <- c(grep("CLUSTER", x), length(x) + 1L)

# get size of each cluster
clustSize <- diff(clust) - 1L

# get cluster number
clustNum <- gsub("[^0-9]+", "", x[grep("CLUSTER", x)])

result <- do.call(rbind, lapply(seq(length(clustNum)), function(.cl){
    cbind(Object = x[seq(clust[.cl] + 1L, length = clustSize[.cl])]
        , Cluster = .cl
        )
    }))

result
     Object Cluster
[1,] "3"    "1"
[2,] "4"    "1"
[3,] "5"    "1"
[4,] "6"    "1"
[5,] "9"    "2"
[6,] "10"   "2"
[7,] "8"    "2"
[8,] "11"   "2"
</pre>

A 2nd example is to [http://datascienceplus.com/working-with-data-frame-in-r/ sort a data frame] by using do.call(order, list()).

Another example is to reproduce aggregate(): aggregate() = do.call() + by().
{{Pre}}
attach(mtcars)
do.call(rbind, by(mtcars, list(cyl, vs), colMeans))
# the above approach gives the same result as the following,
# except that it does not have the extra Group.x columns
aggregate(mtcars, list(cyl, vs), FUN=mean)
detach(mtcars)
</pre>
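The do.call(order, ...) idea for sorting a data frame can be sketched as follows, using a small made-up data frame df. order() accepts any number of sort keys through its ... argument, so passing a list of columns sorts by several keys at once without typing them out.

```r
# Hypothetical example data
df <- data.frame(g = c("b", "a", "b", "a"),
                 v = c(2, 3, 1, 1))

# order() takes any number of sort keys through '...';
# do.call() passes them from a list: first g, then v to break ties
keys <- c("g", "v")
sorted <- df[do.call(order, unname(as.list(df[keys]))), ]
sorted
```

unname() keeps a column name from accidentally matching one of order()'s own arguments (such as decreasing). To sort in descending order, append the flag to the argument list, e.g. c(unname(as.list(df[keys])), decreasing = TRUE).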


=== GUI for Data Analysis ===

==== Rcmdr ====
http://cran.r-project.org/web/packages/Rcmdr/index.html

==== Deducer ====
http://cran.r-project.org/web/packages/Deducer/index.html

=== Scope ===
See
* [http://cran.r-project.org/doc/manuals/R-intro.html#Assignment-within-functions Assignments within functions] in the '''An Introduction to R''' manual.
* [[#How_to_exit_a_sourced_R_script|source()]] does not work like C's preprocessor, where statements in header files are literally inserted into the code. By default (local = FALSE), source() evaluates the file in the global environment, so a variable defined inside a function is not visible to a file sourced from that function.
<syntaxhighlight lang='rsplus'>
## foo.R ##
cat(ArrayTools, "\n")
## End of foo.R

# 1. Error
predict <- function() {
  ArrayTools <- "C:/Program Files" # or through load() function
  source("foo.R")                 # or through a function call; foo()
}
predict()  # Error: object 'ArrayTools' not found

# 2. OK. Make the variable global
predict <- function() {
  ArrayTools <<- "C:/Program Files"
  source("foo.R")
}
predict()
ArrayTools

# 3. OK. Create a global variable
ArrayTools <- "C:/Program Files"
predict <- function() {
   source("foo.R")
}
predict()
</syntaxhighlight>

'''Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function.'''

Example 1.
<pre>
> ttt <- data.frame(type=letters[1:5], JpnTest=rep("999", 5), stringsAsFactors = F)
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999
> jpntest <- function() { ttt$JpnTest[1] ="N5"; print(ttt)}
> jpntest()
  type JpnTest
1    a      N5
2    b     999
3    c     999
4    d     999
5    e     999
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999
</pre>

Example 2. [http://stackoverflow.com/questions/1236620/global-variables-in-r How can we set global variables inside a function?] The answer is to use the "<<-" operator or the '''assign(, , envir = .GlobalEnv)''' function.

Other resource: [http://adv-r.had.co.nz/Functions.html Advanced R] by Hadley Wickham.

Example 3. [https://stackoverflow.com/questions/1169534/writing-functions-in-r-keeping-scoping-in-mind Writing functions in R, keeping scoping in mind]

=== Speedup R code ===
* [http://datascienceplus.com/strategies-to-speedup-r-code/ Strategies to speedup R code] from DataScience+

== Run examples ==
When we call help(FUN), the documentation is shown in the browser. After clicking its "Run examples" link, the browser shows
<pre>
example(FUN, package = "XXX") was run in the console
To view output in the browser, the knitr package must be installed
</pre>

== How to get examples from help file, example() ==
[https://blog.r-hub.io/2020/01/27/examples/ Code examples in the R package manuals]:
<pre>
# How to run all examples from a man page
example(within)

# How to check your examples?
devtools::run_examples()
testthat::test_examples()
</pre>

The following two methods return the example code of a help page (here ?acf) instead of running it. See [https://stat.ethz.ch/pipermail/r-help/2014-April/369342.html this post].

Method 1:
<pre>
example(acf, give.lines=TRUE)
</pre>
Method 2:
<pre>
Rd <- utils:::.getHelpFile(?acf)
tools::Rd2ex(Rd)
</pre>

== "[" and "[[" with the sapply() function ==
Suppose we want to extract the part of an ID like "ABC-123-XYZ" before the first hyphen.
<pre>
sapply(strsplit("ABC-123-XYZ", "-"), "[", 1)
</pre>
is the same as
<pre>
sapply(strsplit("ABC-123-XYZ", "-"), function(x) x[1])
</pre>

== Dealing with dates ==
<ul>
<li>Simple examples
<syntaxhighlight lang='rsplus'>
dates <- c("January 15, 2023", "December 31, 1999")
date_objects <- as.Date(dates, format = "%B %d, %Y") # format describes the input strings
# [1] "2023-01-15" "1999-12-31"
</syntaxhighlight>

<li>Find difference
<syntaxhighlight lang='rsplus'>
# Convert the dates to Date objects
date1 <- as.Date("6/29/21", format="%m/%d/%y")
date2 <- as.Date("11/9/21", format="%m/%d/%y")

# Calculate the difference in days
diff_days <- as.numeric(difftime(date2, date1, units="days")) # 133
# In months
diff_days / (365.25/12) # 4.36961

# OR using the lubridate package
library(lubridate)
# Convert the dates to Date objects
date1 <- mdy("6/29/21")
date2 <- mdy("11/9/21")
interval(date1, date2) %/% months(1)
</syntaxhighlight>

<li>http://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
<syntaxhighlight lang='rsplus'>
d1 = date()
class(d1) # "character"
d2 = Sys.Date()
class(d2) # "Date"

format(d2, "%a %b %d")

library(lubridate); ymd("20140108") # "2014-01-08" (a Date)
mdy("08/04/2013") # "2013-08-04"
dmy("03-04-2013") # "2013-04-03"
ymd_hms("2011-08-03 10:15:03") # "2011-08-03 10:15:03 UTC"
ymd_hms("2011-08-03 10:15:03", tz="Pacific/Auckland")
# "2011-08-03 10:15:03 NZST"
?Sys.timezone
x = dmy(c("1jan2013", "2jan2013", "31mar2013", "30jul2013"))
wday(x[1]) # 3
wday(x[1], label=TRUE) # Tues
</syntaxhighlight>

<li>http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
<li>http://rpubs.com/seandavi/GEOMetadbSurvey2014
<li>We want our dates and times as class "Date" or the class "POSIXct", "POSIXlt". For more information type ?POSIXlt.
<li>[https://cran.r-project.org/web/packages/anytime/index.html anytime] package
<li>Weeks to Christmas: <code>difftime(as.Date("2019-12-25"), Sys.Date(), units = "weeks")</code>
<li>[https://blog.rsquaredacademy.com/handling-date-and-time-in-r/ A Comprehensive Introduction to Handling Date & Time in R] 2020
<li>[https://www.spsanderson.com/steveondata/posts/rtip-2023-05-12/index.html Working with Dates and Times Pt 1]
* Three major functions: as.Date(), as.POSIXct(), and as.POSIXlt().
* '''POSIXct''' is a class in R that represents date-time data. The ct stands for “calendar time” and it represents the (signed) number of seconds since the beginning of 1970 (UTC) as a numeric vector. '''It stores a date-time as a single number.'''
* '''POSIXlt''' is a class in R that represents date-time data. It stands for “local time” and is a list with components as integer vectors, which can represent a vector of broken-down times. '''It stores a date-time as a list: sec, min, hour, mday, mon, year, wday, yday, isdst, zone, gmtoff'''.

<li>[https://www.r-bloggers.com/2023/11/r-lubridate-how-to-efficiently-work-with-dates-and-times-in-r/ R lubridate: How To Efficiently Work With Dates and Times in R] 2023
</ul>

== Nonstandard/non-standard evaluation, deparse/substitute and scoping ==
* [https://www.brodieg.com/2020/05/05/on-nse/ Standard and Non-Standard Evaluation in R]
* [http://adv-r.had.co.nz/Computing-on-the-language.html Nonstandard evaluation] from Advanced R book.
* [https://edwinth.github.io/blog/nse/ Non-standard evaluation, how tidy eval builds on base R]
* [https://cran.r-project.org/web/packages/lazyeval/vignettes/lazyeval.html Vignette] from the [https://cran.r-project.org/web/packages/lazyeval/index.html lazyeval] package. It is needed in three cases
** Labelling: turn an argument into a label
** Formulas
** Dot-dot-dot
* [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/substitute substitute(expr, env)] - capture expression. The return mode is a '''call'''.
** substitute() is often paired with '''deparse'''() to create informative labels for data sets and plots. The return mode of deparse() is '''character strings'''.
** Use 'substitute' to include the variable's name in a plot title, e.g. '''var <- rnorm(100); hist(var, main = substitute(paste("Dist of ", var)))''' shows the title "Dist of var" (the variable's name) instead of its values.
** [https://stackoverflow.com/a/34079727 Passing a variable name to a function in R]
** Example:
::<syntaxhighlight lang='rsplus'>
f <- function(x) {
   substitute(x)
}
f(1:10)
# 1:10
class(f(1:10)) # or mode()
# [1] "call"
g <- function(x) deparse(substitute(x))
g(1:10)
# [1] "1:10"
class(g(1:10)) # or mode()
# [1] "character"
</syntaxhighlight>
* quote(expr) - returns its argument unevaluated, like substitute() but without substituting any variables. [https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/noquote noquote] - print character strings without quotes
:<syntaxhighlight lang='rsplus'>
mode(quote(1:10))
# [1] "call"
</syntaxhighlight>
* eval(expr, envir), evalq(expr, envir) - eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this.
** The '''parent.frame()''' is necessary in cases like the [https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/update stats::update()] function used by [https://github.com/cran/glmnet/blob/master/R/relax.glmnet.R#L66 relax.glmnet()].
** Example:
::<syntaxhighlight lang='rsplus'>
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset1 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x)
  x[r, ]
}
x <- 4
condition <- 4
subset1(sample_df, a== 4) # same as subset(sample_df, a == 4)
subset1(sample_df, a== x) # WRONG! x is found in subset1's frame (the data frame)
subset1(sample_df, a == condition) # ERROR

subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x, parent.frame())
  x[r, ]
}
subset2(sample_df, a == 4) # same as subset(sample_df, a == 4)
subset2(sample_df, a == x) # 👌 x is looked up in the caller's environment
subset2(sample_df, a == condition) # 👍
</syntaxhighlight>
* deparse(expr) - turns unevaluated expressions into character strings. For example,
:<syntaxhighlight lang='rsplus'>
> deparse(args(lm))
[1] "function (formula, data, subset, weights, na.action, method = \"qr\", "
[2] "   model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, "
[3] "   contrasts = NULL, offset, ...) "
[4] "NULL"

> deparse(args(lm), width=20)
[1] "function (formula, data, "        "    subset, weights, "
[3] "    na.action, method = \"qr\", " "    model = TRUE, x = FALSE, "
[5] "    y = FALSE, qr = TRUE, "      "    singular.ok = TRUE, "
[7] "    contrasts = NULL, "          "    offset, ...) "
[9] "NULL"
</syntaxhighlight>
* parse(text) - returns the parsed but unevaluated expressions (an "expression" object). See [[R#Create_a_Simple_Socket_Server_in_R|Create a Simple Socket Server in R]] for the application of '''eval(parse(text))'''. Be cautious!
** [http://r.789695.n4.nabble.com/using-eval-parse-paste-in-a-loop-td849207.html eval(parse...)) should generally be avoided]
** [https://stackoverflow.com/questions/13649979/what-specifically-are-the-dangers-of-evalparse What specifically are the dangers of eval(parse(…))?]

Following is another example. Assume we have a bunch of functions (f1, f2, ...; each function implements a different algorithm) with the same input argument format (e.g. a1, a2). We want to run these functions on the same data (to compare their performance).
{{Pre}}
f1 <- function(x) x+1; f2 <- function(x) x+2; f3 <- function(x) x+3

f1(1:3)
f2(1:3)
f3(1:3)

# Or
myfun <- function(f, a) {
    eval(parse(text = f))(a)
}
myfun("f1", 1:3)
myfun("f2", 1:3)
myfun("f3", 1:3)

# Or with lapply
method <- c("f1", "f2", "f3")
res <- lapply(method, function(M) {
                    Mres <- eval(parse(text = M))(1:3)
                    return(Mres)
})
names(res) <- method
</pre>
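The f1/f2/f3 comparison can also be written without eval(parse(...)), which the links above warn against. A sketch with the same toy functions: get(name, mode = "function") looks each function up by name (match.fun() offers a similar lookup), so no text is ever parsed or evaluated.

```r
f1 <- function(x) x + 1; f2 <- function(x) x + 2; f3 <- function(x) x + 3

method <- c("f1", "f2", "f3")
# get(..., mode = "function") finds a function by its name;
# nothing is parsed, so arbitrary code in 'method' cannot be run
res <- lapply(method, function(M) get(M, mode = "function")(1:3))
names(res) <- method
res$f2   # 3 4 5
```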
 
=== library() accepts both quoted and unquoted strings ===
[https://stackoverflow.com/a/25210607 How can library() accept both quoted and unquoted strings]. The key lines are
<pre>
  if (!character.only)
    package <- as.character(substitute(package))
</pre>
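The same trick can be reused in our own functions. A minimal sketch (the function name pkg_name is hypothetical) that accepts either a bare name or a string, mimicking library()'s character.only switch:

```r
# Hypothetical helper mimicking library()'s argument handling
pkg_name <- function(package, character.only = FALSE) {
  if (!character.only)
    package <- as.character(substitute(package))  # bare name -> string
  package
}

pkg_name(stats)                     # "stats"
pkg_name("stats")                   # "stats"
p <- "utils"
pkg_name(p)                         # "p": the symbol itself, as with library(p)
pkg_name(p, character.only = TRUE)  # "utils": the value of p
```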
 
=== Lexical scoping ===
* [https://lgreski.github.io/dsdepot/2020/06/28/rObjectsSObjectsAndScoping.html R Objects, S Objects, and Lexical Scoping]
* [http://www.biostat.jhsph.edu/~rpeng/docs/R-classes-scope.pdf#page=31 Dynamic scoping vs Lexical scoping] and the example of [http://www.biostat.jhsph.edu/~rpeng/docs/R-classes-scope.pdf#page=41 optimization]
* [https://www.r-bloggers.com/2024/03/indicating-local-functions-in-r-scripts/ Indicating local functions in R scripts]
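A small illustration of lexical scoping (the variable and function names are arbitrary): f() looks up its free variable y where f was defined (here the global environment), not where it is called.

```r
y <- 10
f <- function() y          # y is free in f: looked up where f was defined
g <- function() {
  y <- 20                  # local to g; not visible to f
  f()
}
g()   # 10 under lexical scoping (dynamic scoping would give 20)
```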
 
== The ‘…’ argument ==
* See [http://cran.r-project.org/doc/manuals/R-intro.html#The-three-dots-argument Section 10.4 of An Introduction to R]. In particular, the expression '''list(...)''' evaluates all such arguments and returns them in a named list.
* [https://statisticaloddsandends.wordpress.com/2020/11/15/some-notes-when-using-dot-dot-dot-in-r/ Some notes when using dot-dot-dot (…) in R]
* [https://stackoverflow.com/questions/26684509/how-to-check-if-any-arguments-were-passed-via-ellipsis-in-r-is-missing How to check if any arguments were passed via “…” (ellipsis) in R? Is missing(…) valid?]
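A quick sketch of the points above (dots_info is a made-up name): list(...) collects the extra arguments with their names, and ...length(), available in recent versions of R, counts them without evaluating them, which is one way to check whether anything was passed through the dots.

```r
dots_info <- function(...) {
  list(n     = ...length(),       # number of arguments in '...'
       names = names(list(...)))  # their names, "" for unnamed ones
}

str(dots_info(a = 1, 2, b = "x"))  # n = 3, names = "a" "" "b"
str(dots_info())                   # n = 0, names = NULL
```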
 
== Functions ==
* https://adv-r.hadley.nz/functions.html
* [https://towardsdatascience.com/writing-better-r-functions-best-practices-and-tips-d48ef0691c24 Writing Better R Functions — Best Practices and Tips]. The [https://cran.r-project.org/web/packages/docstring/index.html docstring] package and "?" is interesting!
 
=== Function argument ===
[https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Argument-matching Argument matching] from [https://cran.r-project.org/doc/manuals/r-release/R-lang.html R Language Definition] manual.
 
Argument matching is augmented by the functions
* [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/match.arg match.arg],
* [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/match.call match.call]
* [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/match.fun match.fun].
 
Access to the partial matching algorithm used by R is via [https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/pmatch pmatch].
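A short sketch of match.arg(), match.call() and pmatch(). The function center is made up for illustration: match.arg() returns the first choice when the argument is missing and resolves unambiguous partial names.

```r
center <- function(x, type = c("mean", "median")) {
  type <- match.arg(type)   # default: first choice; partial names allowed
  switch(type, mean = mean(x), median = median(x))
}
center(c(1, 2, 9))            # 4 (type = "mean")
center(c(1, 2, 9), "med")     # 2 ("med" partially matches "median")

f <- function(x, y) match.call()
f(1, y = 2)                   # f(x = 1, y = 2): the call with full argument names

pmatch("med", c("mean", "median"))  # 2
pmatch("me",  c("mean", "median"))  # NA, because "me" is ambiguous
```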


=== Check function arguments ===
[https://blog.r-hub.io/2022/03/10/input-checking/ Checking the inputs of your R functions]: '''match.arg()''', '''stopifnot()'''

'''stopifnot()''': function argument sanity check
<ul>
<li>[https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/stopifnot stopifnot()]. ''stopifnot'' is a quick way to check multiple conditions on the input: the code stops as soon as one of the conditions is not satisfied. However, the default error messages are not pretty; since R 4.0.0, naming a condition, e.g. <code>stopifnot("x must be numeric" = is.numeric(x))</code>, sets a custom message.
<pre>
stopifnot(condition1, condition2, ...)
</pre>
</li>
<li>[https://rud.is/b/2020/05/19/mining-r-4-0-0-changelog-for-nuggets-of-gold-1-stopifnot/ Mining R 4.0.0 Changelog for Nuggets of Gold] </li>
</ul>

=== Profiler ===
(Video) [https://www.rstudio.com/resources/videos/understand-code-performance-with-the-profiler/ Understand Code Performance with the profiler]

=== Vectorization ===
* https://en.wikipedia.org/wiki/Vectorization_%28mathematics%29
* http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html
* https://github.com/vsbuffalo/devnotes/wiki/R-and-Vectorization
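A minimal illustration of vectorization in R (the function names are made up, and timings depend on the machine, so none are quoted). The loop and the vectorized expression compute the same squares, but x^2 processes the whole vector in a single call to a built-in operator instead of one R-level step per element.

```r
x <- runif(1e6)

# Explicit loop: one R-level operation per element
sq_loop <- function(x) {
  out <- numeric(length(x))                # preallocate the result
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# Vectorized: a single call that operates on the whole vector
sq_vec <- function(x) x^2

stopifnot(all.equal(sq_loop(x), sq_vec(x)))  # same result
system.time(sq_loop(x))
system.time(sq_vec(x))
```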


==== Mean of duplicated rows ====
* rowsum()
* [http://stackoverflow.com/questions/7881660/finding-the-mean-of-all-duplicates use ave() and unique()]
* [http://stackoverflow.com/questions/17383635/average-between-duplicated-rows-in-r data.table package]
* [http://stackoverflow.com/questions/10180132/consolidate-duplicate-rows plyr package]
* [http://www.statmethods.net/management/aggregate.html aggregate()] function. Too slow! http://slowkow.com/2015/01/28/data-table-aggregate/. [http://www.win-vector.com/blog/2015/10/dont-use-statsaggregate/ Don't use aggregate] post.
<syntaxhighlight lang='rsplus'>
> attach(mtcars)
> dim(mtcars)
[1] 32 11
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> aggdata <- aggregate(mtcars, by=list(cyl,vs), FUN=mean, na.rm=TRUE)
> print(aggdata)
  Group.1 Group.2      mpg cyl   disp       hp     drat       wt     qsec vs
1       4       0 26.00000   4 120.30  91.0000 4.430000 2.140000 16.70000  0
2       6       0 20.56667   6 155.00 131.6667 3.806667 2.755000 16.32667  0
3       8       0 15.10000   8 353.10 209.2143 3.229286 3.999214 16.77214  0
4       4       1 26.73000   4 103.62  81.8000 4.035000 2.300300 19.38100  1
5       6       1 19.12500   6 204.55 115.2500 3.420000 3.388750 19.21500  1
         am     gear     carb
1 1.0000000 5.000000 2.000000
2 1.0000000 4.333333 4.666667
3 0.1428571 3.285714 3.500000
4 0.7000000 4.000000 1.500000
5 0.0000000 3.500000 2.500000
> detach(mtcars)

# Another example: select rows with a minimum value from a certain column (yval in this case)
> mydf <- read.table(header=T, text='
id xval yval
A 1  1
A -2  2
B 3  3
B 4  4
C 5  5
')
> x = mydf$xval
> y = mydf$yval
> aggregate(mydf[, c(2,3)], by=list(id=mydf$id), FUN=function(x) x[which.min(y)])
  id xval yval
1  A    1    1
2  B    3    3
3  C    5    5
# Caution: which.min(y) uses the global y (all 5 rows), so it always picks the
# first row of each group; it matches here only because each group's first row
# has the smallest yval. A per-group version:
> do.call(rbind, lapply(split(mydf, mydf$id), function(d) d[which.min(d$yval), ]))
</syntaxhighlight>

=== Lazy evaluation in R functions arguments ===
* http://adv-r.had.co.nz/Functions.html
* https://stat.ethz.ch/pipermail/r-devel/2015-February/070688.html
* https://twitter.com/_wurli/status/1451459394009550850

'''R function arguments are lazy: they’re only evaluated if they’re actually used'''.


* Example 1. By default, R function arguments are lazy.
<pre>
f <- function(x) {
  999
}
f(stop("This is an error!"))
#> [1] 999
</pre>

* Example 2. If you want to ensure that an argument is evaluated you can use '''force()'''.
<pre>
add <- function(x) {
  force(x)
  function(y) x + y
}
adders2 <- lapply(1:10, add)
adders2[[1]](10)
#> [1] 11
adders2[[10]](10)
#> [1] 20
</pre>

* Example 3. Default arguments are evaluated inside the function.
<pre>
f <- function(x = ls()) {
  a <- 1
  x
}

# ls() evaluated inside f:
f()
# [1] "a" "x"

# ls() evaluated in global environment (the result depends on your workspace):
f(ls())
# [1] "add"    "adders" "f"
</pre>

* Example 4. Laziness is useful in if statements: the second condition below is evaluated only if the first is true.
<pre>
x <- NULL
if (!is.null(x) && x > 0) {   # && skips x > 0 when x is NULL

}
</pre>

=== Use of functions as arguments ===
[https://www.njtierney.com/post/2019/09/29/unexpected-function/ Just Quickly: The unexpected use of functions as arguments]

=== body() ===
[https://stackoverflow.com/a/51548945 Remove top axis title base plot]

=== Apply family ===
Vectorize, aggregate, apply, by, eapply, lapply, mapply, rapply, replicate, scale, sapply, split, tapply, and vapply. Check out [http://people.stern.nyu.edu/ylin/r_apply_family.html this].

The following list gives a hierarchical relationship among these functions.
* apply(X, MARGIN, FUN, ...) – Apply Functions Over Array Margins
* tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE) – Apply a Function Over a "Ragged" Array
** by(data, INDICES, FUN, ..., simplify = TRUE) - Apply a Function to a Data Frame Split by Factors
** aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE) - Compute Summary Statistics of Data Subsets
* lapply(X, FUN, ...) – Apply a Function over a List or Vector
** sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) – Apply a Function over a List or Vector
*** replicate(n, expr, simplify = "array")
** mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) – Multivariate version of sapply
*** Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE) - Vectorize a Scalar Function
** vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) – similar to sapply, but has a pre-specified type of return value
* rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...) – A recursive version of lapply
* eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE) – Apply a Function over values in an environment

Note that apply's performance is not always better than a for loop. See
* http://tolstoy.newcastle.edu.au/R/help/06/05/27255.html (answered by Brian Ripley)
* https://stat.ethz.ch/pipermail/r-help/2014-October/422455.html (has one example)

The package 'pbapply' creates a text-mode progress bar; it works on any platform. On Windows, check out [http://www.theanalystatlarge.com/for-loop-tracking-windows-progress-bar/ this post]. It uses the winProgressBar() and setWinProgressBar() functions.

==== Progress bar ====
[http://peter.solymos.org/code/2016/09/11/what-is-the-cost-of-a-progress-bar-in-r.html What is the cost of a progress bar in R?]

==== lapply and its friends Map(), Reduce(), Filter() from the base package for manipulating lists ====
* Examples of using lapply() + split() on a data frame. See [http://rollingyours.wordpress.com/category/r-programming-apply-lapply-tapply/ rollingyours.wordpress.com].
* mapply() [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/mapply documentation]. [https://stackoverflow.com/questions/9519543/merge-two-lists-in-r Use mapply() to merge lists].
* [http://www.brodrigues.co/functional_programming_and_unit_testing_for_data_munging/fprog.html Map() and Reduce()] in functional programming
* Map(), Reduce(), and Filter() from [http://adv-r.had.co.nz/Functionals.html#functionals-fp Advanced R] by Hadley
** If you have two or more lists (or data frames) that you need to process in <span style="color: red">parallel</span>, use '''Map()'''. One good example is computing weighted.mean(), which requires two input objects. Map() is similar to the '''mapply()''' function and is more concise than '''lapply()'''. [http://adv-r.had.co.nz/Functionals.html#functionals-loop Advanced R] comments that Map() is better than mapply(). <syntaxhighlight lang='rsplus'>
# Syntax: Map(f, ...)

xs <- replicate(5, runif(10), simplify = FALSE)
ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
Map(weighted.mean, xs, ws)

# instead of a more clumsy way
lapply(seq_along(xs), function(i) {
  weighted.mean(xs[[i]], ws[[i]])
})
</syntaxhighlight>
** Reduce() reduces a vector, x, to a single value by <span style="color: red">recursively</span> calling a function, f, two arguments at a time. A good example of using the '''Reduce()''' function is to read a list of matrix files and merge them. See [https://stackoverflow.com/questions/29820029/how-to-combine-multiple-matrix-frames-into-one-using-r How to combine multiple matrix frames into one using R?] <syntaxhighlight lang='rsplus'>
# Syntax: Reduce(f, x, ...)

> m1 <- data.frame(id=letters[1:4], val=1:4)
> m2 <- data.frame(id=letters[2:6], val=2:6)
> merge(m1, m2, "id", all = T)
  id val.x val.y
1  a     1    NA
2  b     2     2
3  c     3     3
4  d     4     4
5  e    NA     5
6  f    NA     6
> m <- list(m1, m2)
> Reduce(function(x,y) merge(x,y, "id",all=T), m)
  id val.x val.y
1  a     1    NA
2  b     2     2
3  c     3     3
4  d     4     4
5  e    NA     5
6  f    NA     6
</syntaxhighlight>
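Filter() appears in the heading above but has no example here. A small sketch of it, together with Reduce()'s accumulate = TRUE option (which keeps the intermediate results) and a minimal Map() call:

```r
# Filter(): keep the elements for which the predicate returns TRUE
evens <- Filter(function(x) x %% 2 == 0, 1:10)
evens                                   # 2 4 6 8 10

# Reduce(): fold left to right, two arguments at a time
Reduce(`+`, 1:4)                        # 10 = ((1 + 2) + 3) + 4
# accumulate = TRUE returns every intermediate value
Reduce(`+`, 1:4, accumulate = TRUE)     # 1 3 6 10

# Map(): parallel iteration over two vectors, returns a list
Map(function(a, b) a * b, 1:3, 4:6)     # list(4, 10, 18)
```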


==== sapply & vapply ====
=== Return functions in R ===
* [http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply This] discusses why '''vapply''' is safer and faster than sapply.
* [https://win-vector.com/2015/04/03/how-and-why-to-return-functions-in-r/ How and why to return functions in R]
* [http://adv-r.had.co.nz/Functionals.html#functionals-loop Vector output: sapply and vapply] from Advanced R (Hadley Wickham).
* See the doc & example from [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/taskCallback taskCallback - Create an R-level task callback manager]. [https://developer.r-project.org/TaskHandlers.pdf Top-level Task Callbacks in R].
* [https://purrple.cat/blog/2017/05/28/turn-r-users-insane-with-evil/ Turn R users insane with evil]


==== rapply - recursive version of lapply ====
=== anonymous function ===
* http://4dpiecharts.com/tag/recursive/
In R, the main difference between a lambda function (also known as an anonymous function) and a regular function is that a '''lambda function is defined without a name''', while a regular function is defined with a name.
* [https://github.com/wch/r-source/search?utf8=%E2%9C%93&q=rapply Search in R source code]. Mainly [https://github.com/wch/r-source/blob/trunk/src/library/stats/R/dendrogram.R r-source/src/library/stats/R/dendrogram.R].


==== replicate ====
<ul>
https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r
<li>See [[Tidyverse#Anonymous_functions|Tidyverse]] page
<li>But defining functions to use them only once is kind of overkill. That's why you can use so-called anonymous functions in R. For example, '''lapply(list(1,2,3), function(x) { x * x }) '''
<li>you can use lambda functions with many other functions in R that take a function as an argument. Some examples include '''sapply, apply, vapply, mapply, Map, Reduce, Filter''', and '''Find'''. These functions all work in a similar way to lapply by applying a function to elements of a list or vector.
<pre>
Reduce(function(x, y) x*y, list(1, 2, 3, 4)) # 24
</pre>
<li>[https://coolbutuseless.github.io/2019/03/13/anonymous-functions-in-r-part-1/ purrr anonymous function]
<li>[https://towardsdatascience.com/the-new-pipe-and-anonymous-function-syntax-in-r-54d98861014c The new pipe and anonymous function syntax in R 4.1.0]
<li>[http://adv-r.had.co.nz/Functional-programming.html#anonymous-functions Functional programming] from Advanced R
<li>[https://www.projectpro.io/recipes/what-are-anonymous-functions-r What are anonymous functions in R].
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
> replicate(5, rnorm(3))
> (function(x) x * x)(3)
          [,1]       [,2]      [,3]      [,4]        [,5]
[1] 9
[1,] 0.2509130 -0.3526600 -0.3170790  1.064816 -0.53708856
> (\(x) x * x)(3)
[2,]  0.5222548  1.5343319  0.6120194 -1.811913 -1.09352459
[1] 9
[3,] -1.9905533 -0.8902026 -0.5489822  1.308273 0.08773477
</syntaxhighlight>
</syntaxhighlight>
</ul>
 
== Backtick sign, infix/prefix/postfix operators ==
The backtick sign ` (not the single quote) refers to functions or variables that have otherwise reserved or illegal names; e.g. '&&', '+', '(', 'for', 'if', etc. See some examples in [http://adv-r.had.co.nz/Functions.html Advanced R] and [https://stackoverflow.com/a/36229703 What do backticks do in R?].
<pre>
iris %>% `[[`("Species")
</pre>
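A minimal sketch of what backticks make possible: calling an operator in prefix (function) form, passing it to sapply(), defining a user-defined infix operator (such operators must be named %...%; the %+% below is a hypothetical example, and ggplot2 exports its own %+%, so pick another name in real code) and using a non-syntactic column name.

<syntaxhighlight lang='rsplus'>
# Prefix (function-call) form of an infix operator, made possible by backticks
three   <- `+`(1, 2)                 # same as 1 + 2
shifted <- sapply(1:5, `+`, 3)       # passes `+` as a function: 4 5 6 7 8

# A user-defined infix operator must be named %...%;
# `%+%` here is a hypothetical string-concatenation operator
`%+%` <- function(a, b) paste0(a, b)
joined <- "foo" %+% "bar"            # "foobar"

# Backticks also let us refer to a non-syntactic column name
df  <- data.frame(`my col` = 1:2, check.names = FALSE)
col <- df$`my col`                   # 1 2

print(three)
print(shifted)
print(joined)
print(col)
</syntaxhighlight>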


'''[http://en.wikipedia.org/wiki/Infix_notation infix]''' operator, compared with prefix and postfix notation:
<pre>
1 + 2   # infix
+ 1 2   # prefix (not R syntax; the R prefix form is `+`(1, 2))
1 2 +   # postfix (not R syntax)
</pre>
Use with functions like sapply, e.g. '''sapply(1:5, `+`, 3)'''.
 
==== Vectorize ====
<syntaxhighlight lang='rsplus'>
> rweibull(1, 1, c(1, 2)) # n = 1: the other arguments are recycled to length n, so only scale = 1 is used
[1] 2.17123
> Vectorize("rweibull")(n=1, shape = 1, scale = c(1, 2))
[1] 1.6491761 0.9610109
</syntaxhighlight>
 
https://blogs.msdn.microsoft.com/gpalem/2013/03/28/make-vectorize-your-friend-in-r/
<syntaxhighlight lang='rsplus'>
myfunc <- function(a, b) a*b
myfunc(1, 2) # 2
myfunc(3, 5) # 15
myfunc(c(1,3), c(2,5)) # 2 15
Vectorize(myfunc)(c(1,3), c(2,5)) # 2 15


myfunc2 <- function(a, b) if (length(a) == 1) a * b else NA
myfunc2(1, 2) # 2
myfunc2(3, 5) # 15
myfunc2(c(1,3), c(2,5)) # NA
Vectorize(myfunc2)(c(1, 3), c(2, 5)) # 2 15
Vectorize(myfunc2)(c(1, 3, 6), c(2, 5)) # 2 15 12
                                        # the shorter argument is recycled (mapply() warns about the uneven lengths)
</syntaxhighlight>
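A common practical reason to reach for Vectorize(): functions such as integrate() call f with a whole vector of x values and require a result of the same length. A minimal sketch, where const_one() is a hypothetical function that ignores the length of its input; the last lines check that Vectorize() gives the same result as calling mapply() directly, which is what it wraps.

<syntaxhighlight lang='rsplus'>
const_one <- function(x) 1                # always returns a single value
# integrate(const_one, 0, 1)              # error: evaluation of function gave a result of wrong length
res <- integrate(Vectorize(const_one), 0, 1)
print(res$value)                          # 1 (up to rounding)

myfunc2 <- function(a, b) if (length(a) == 1) a * b else NA
v1 <- Vectorize(myfunc2)(c(1, 3), c(2, 5))
v2 <- mapply(myfunc2, c(1, 3), c(2, 5))
print(identical(v1, v2))                  # TRUE
</syntaxhighlight>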


=== plyr and dplyr packages ===
[https://peerj.com/collections/50-practicaldatascistats/ Practical Data Science for Stats - a PeerJ Collection]

[http://www.jstatsoft.org/v40/i01/paper The Split-Apply-Combine Strategy for Data Analysis] (plyr package) in J. Stat Software.
 
[http://seananderson.ca/courses/12-plyr/plyr_2012.pdf A quick introduction to plyr] with a summary of the apply functions in R and a comparison with the functions in the plyr package.
<pre>
# plyr has a common syntax -- easier to remember
# plyr requires less code since it takes care of the input and output format
# plyr can easily be run in parallel -- faster
</pre>

Tutorials
* [http://dplyr.tidyverse.org/articles/dplyr.html Introduction to dplyr] from http://dplyr.tidyverse.org/.
* A video of [http://cran.r-project.org/web/packages/dplyr/index.html dplyr] package can be found on [http://vimeo.com/103872918 vimeo].
* [http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/ Hands-on dplyr tutorial for faster data manipulation in R] from dataschool.io.

Examples of using dplyr:
* [http://wiekvoet.blogspot.com/2015/03/medicines-under-evaluation.html Medicines under evaluation]
* [http://rpubs.com/seandavi/GEOMetadbSurvey2014 CBI GEO Metadata Survey]
* [http://datascienceplus.com/r-for-publication-by-page-piccinini-lesson-3-logistic-regression/ Logistic Regression] by Page Piccinini. mutate(), inner_join() and %>%.
* [http://rpubs.com/turnersd/plot-deseq-results-multipage-pdf DESeq2 post analysis] select(), gather(), arrange() and %>%.

==== tibble ====
'''Tibbles''' are data frames, but slightly tweaked to work better in the '''tidyverse'''.

<syntaxhighlight lang='rsplus'>
> data(pew, package = "efficient")
> dim(pew)  
[1] 18 10
> class(pew) # tibble is also a data frame!!
[1] "tbl_df"     "tbl"        "data.frame"

> tidyr::gather(pew, key=Income, value = Count, -religion) # make wide tables long
# A tibble: 162 x 3
                                                      religion Income Count
                                                          <chr>  <chr> <int>
1                                                    Agnostic  <$10k    27
2                                                      Atheist  <$10k    12
...
> mean(tidyr::gather(pew, key=Income, value = Count, -religion)[, 3])
[1] NA
Warning message:
In mean.default(tidyr::gather(pew, key = Income, value = Count,  :
  argument is not numeric or logical: returning NA
> mean(tidyr::gather(pew, key=Income, value = Count, -religion)[[3]])
[1] 181.6975
</syntaxhighlight>

On a tibble, single-bracket indexing such as [, 3] returns a one-column tibble rather than a vector (a data.frame would drop to a vector), which is why mean() above returns NA. For the same reason, matching values against a column selected this way gives zero matches.

To [https://stackoverflow.com/questions/21618423/extract-a-dplyr-tbl-column-as-a-vector extract a column from a tibble object], use dplyr::pull().
<syntaxhighlight lang='rsplus'>
pull(TibbleObject, VarName) # won't be a tibble object anymore
# OR
TibbleObject$VarName
# OR
TibbleObject[["VarName"]]
</syntaxhighlight>

== Error handling and exceptions, tryCatch(), stop(), warning() and message() ==
<ul>
<li>http://adv-r.had.co.nz/Exceptions-Debugging.html </li>
<li>[https://www.r-bloggers.com/2023/11/catch-me-if-you-can-exception-handling-in-r/ Catch Me If You Can: Exception Handling in R] </li>
<li>Temporarily disable warning messages
<pre>
# Method 1:
suppressWarnings(expr)

# Method 2:
defaultW <- getOption("warn")
options(warn = -1)
[YOUR CODE]
options(warn = defaultW)
</pre>
</li>
<li>try() allows execution to continue even after an error has occurred. You can suppress the error message with '''try(..., silent = TRUE)'''.
<pre>
out <- try({
  a <- 1
  b <- "x"
  a + b
})
 
elements <- list(1:10, c(-1, 10), c(T, F), letters)
# wrap each call in try(); a plain lapply(elements, log) would stop at 'letters'
results <- lapply(elements, function(x) try(log(x), silent = TRUE))
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
</pre>
</li>
<li>tryCatch(): With tryCatch() you map conditions to handlers (like switch()), named functions that are called with the condition as an input. Note that try() is a simplified version of tryCatch().
<pre>
tryCatch(expr, ..., finally)

show_condition <- function(code) {
  tryCatch(code,
    error = function(c) "error",
    warning = function(c) "warning",
    message = function(c) "message"
  )
}
show_condition(stop("!"))
#> [1] "error"
show_condition(warning("?!"))
#> [1] "warning"
show_condition(message("?"))
#> [1] "message"
show_condition(10)
#> [1] 10
</pre>
Below is another snippet, from the available.packages() function:
{{Pre}}
z <- tryCatch(download.file(....), error = identity)
if (!inherits(z, "error")) STATEMENTS
</pre>
</li>
<li>The class of the value returned by tryCatch() is not fixed: it is the value of the expression if no condition is signalled, otherwise the value returned by the handler that caught the condition.
<pre>
result <- tryCatch({
  # Code that might generate an error or warning
  log(99)
}, warning = function(w) {
  # Code to handle warnings
  print(paste("Warning:", conditionMessage(w)))
}, error = function(e) {
  # Code to handle errors
  print(paste("Error:", conditionMessage(e)))
}, finally = {
  # Code to always run, regardless of whether an error or warning occurred
  print("Finished")
}) 
# numeric here, because log(99) signals nothing; with log(-1) the warning handler
# runs and 'result' is the character string it returns.
# 'finally' is run only for its side effect and does not change the returned value.
</pre>
<li>[https://www.bangyou.me/post/capture-logs/ Capture message, warnings and errors from an R function]
</li>
</ul>

=== suppressMessages() ===
suppressMessages(expression)

== List data type ==
=== Create an empty list ===
<pre>
out <- vector("list", length=3L) # OR out <- list()
for(j in 1:3) out[[j]] <- myfun(j)   # myfun() stands for any function returning one element

outlist <- as.list(seq(nfolds))      # nfolds (e.g. number of CV folds) is defined elsewhere
</pre>

=== Nested list of data frames ===
An array can only hold data of a single type, whereas read.csv() returns a data frame, which can contain both numerical and character data. A (nested) list can hold such data frames.
<pre>
res <- vector("list", 3)
names(res) <- paste0("m", 1:3)
for (i in seq_along(res)) {
  res[[i]] <- vector("list", 2) # second-level list with 2 elements
  names(res[[i]]) <- c("fc", "pre")
}

res[["m1"]][["fc"]] <- read.csv()   # supply the file to read
head(res$m1$fc) # Same as res[["m1"]][["fc"]]
</pre>

=== Using $ in R on a List ===
[https://www.statology.org/dollar-sign-in-r/ How to Use Dollar Sign ($) Operator in R]
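A minimal sketch of the main differences between $ and [[ on a plain list (the list is made up for illustration): $ does partial matching and takes its name literally, while [[ matches exactly by default and evaluates its argument.

<syntaxhighlight lang='rsplus'>
lst <- list(alpha = 1, beta = "b")

a_full    <- lst$alpha                    # 1
a_partial <- lst$al                       # 1: $ partially matches "alpha"
a_exact   <- lst[["al"]]                  # NULL: [[ needs the exact name
a_loose   <- lst[["al", exact = FALSE]]   # 1: partial matching on request

nm <- "beta"
by_dollar   <- lst$nm                     # NULL: looks for an element literally named "nm"
by_brackets <- lst[[nm]]                  # "b": the value of nm is used

str(list(a_full, a_partial, a_exact, a_loose, by_dollar, by_brackets))
</syntaxhighlight>

Setting options(warnPartialMatchDollar = TRUE) makes R warn whenever $ relies on a partial match.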


=== [http://adv-r.had.co.nz/Functions.html Calling a function given a list of arguments] ===
<pre>
> args <- list(c(1:10, NA, NA), na.rm = TRUE)
> do.call(mean, args)
[1] 5.5
> mean(c(1:10, NA, NA), na.rm = TRUE)
[1] 5.5
</pre>

==== llply() ====
llply is equivalent to lapply except that it will preserve labels and can display a progress bar. The progress bar is handy for a long-running job such as the following:
<pre>
LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])
</pre>
where rLLID is a list of Entrez IDs. For example,
<pre>
get("org.Hs.egGO")[["6772"]]
</pre>
returns a list of 49 GOs.


==== ddply() ====
http://lamages.blogspot.com/2012/06/transforming-subsets-of-data-in-r-with.html

=== Descend recursively through lists ===
<nowiki>x[[c(5,3)]] </nowiki> is the same as <nowiki>x[[5]][[3]]</nowiki>. See [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Extract ?Extract].
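A minimal sketch of recursive indexing with a vector inside [[ (the nested lists are made up): each element of the index vector descends one level, by position or, for a character vector, by name.

<syntaxhighlight lang='rsplus'>
x <- list(1, "a", TRUE, 4, list(10, 20, list(p = 30)))

deep <- x[[c(5, 3)]]       # list(p = 30)
same <- x[[5]][[3]]        # identical to deep
leaf <- x[[c(5, 3, 1)]]    # 30

# A character vector descends by name
y <- list(a = list(b = list(c = 42)))
named_leaf <- y[[c("a", "b", "c")]]   # 42

print(identical(deep, same))
print(leaf)
print(named_leaf)
</syntaxhighlight>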


=== Avoid if-else or switch ===
A step function created by stepfun() can stand in for a chain of if-else tests on x. The example below is from ?plot.stepfun.
<pre>
y0 <- c(1,2,4,3)
sfun0  <- stepfun(1:3, y0, f = 0)
sfun.2 <- stepfun(1:3, y0, f = .2)
sfun1  <- stepfun(1:3, y0, right = TRUE)

tt <- seq(0, 3, by = 0.1)
op <- par(mfrow = c(2,2))
plot(sfun0); plot(sfun0, xval = tt, add = TRUE, col.hor = "bisque")
plot(sfun.2);plot(sfun.2, xval = tt, add = TRUE, col = "orange") # all colors
plot(sfun1);lines(sfun1, xval = tt, col.hor = "coral")
##-- This is revealing :
plot(sfun0, verticals = FALSE,
    main = "stepfun(x, y0, f=f)  for f = 0, .2, 1")

for(i in 1:3)
  lines(list(sfun0, sfun.2, stepfun(1:3, y0, f = 1))[[i]], col = i)
legend(2.5, 1.9, paste("f =", c(0, 0.2, 1)), col = 1:3, lty = 1, y.intersp = 1)

par(op)
</pre>
[[:File:StepfunExample.svg]]

==== ldply() ====
[http://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication/ An R Script to Automatically download PubMed Citation Counts By Year of Publication]

=== set.seed(), for loop and saving random seeds ===
http://r.789695.n4.nabble.com/set-seed-and-for-loop-td3585857.html. This is useful when we want to debug a particular iteration.

<syntaxhighlight lang='rsplus'>
set.seed(1001)  
data <- vector("list", 30)
seeds <- vector("list", 30)
for(i in 1:30) {
  seeds[[i]] <- .Random.seed   # RNG state just before iteration i
  data[[i]] <- runif(5)  
}
.Random.seed <- seeds[[23]]  # restore the state of iteration 23
data.23 <- runif(5)  
data.23
data[[23]]                   # same values as data.23
</syntaxhighlight>
* Duncan Murdoch: ''This works in this example, but wouldn't work with all RNGs, because some of them save state outside of .Random.seed.  See ?.Random.seed for details.''
* Uwe Ligges's comment: ''set.seed() actually generates a seed. See ?set.seed that points us to .Random.seed (and relevant references!) which contains the actual current seed.''
* Petr Savicky's comment is also useful in the situation when it is not difficult to re-generate the data.

=== [https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/mclapply.html mclapply()] and [https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/clusterApply.html parLapply()] ===
==== mclapply() from the 'parallel' package is a multi-core version of lapply() ====
* Provide the number of cores to mclapply() with the '''mc.cores''' argument (the default is getOption("mc.cores", 2L), i.e. 2).
* Be careful about when the "L'Ecuyer-CMRG" RNG is needed and about its side effects.
* '''[https://stackoverflow.com/questions/15070377/r-doesnt-reset-the-seed-when-lecuyer-cmrg-rng-is-used R doesn't reset the seed when “L'Ecuyer-CMRG” RNG is used?]''' <syntaxhighlight lang='rsplus'>
library(parallel)
system.time(mclapply(1:1e4L, function(x) rnorm(x)))
system.time(mclapply(1:1e4L, function(x) rnorm(x), mc.cores = 4))


set.seed(1234)
mclapply(1:3, function(x) rnorm(x))
set.seed(1234)
mclapply(1:3, function(x) rnorm(x)) # cannot reproduce the result

set.seed(123, "L'Ecuyer")
mclapply(1:3, function(x) rnorm(x))
mclapply(1:3, function(x) rnorm(x)) # results are not changed once we have run set.seed( , "L'Ecuyer")

set.seed(1234)                     # use set.seed() in order to get a new reproducible result
mclapply(1:3, function(x) rnorm(x))
mclapply(1:3, function(x) rnorm(x)) # results are not changed
</syntaxhighlight>

Note
# [https://stackoverflow.com/questions/15070377/r-doesnt-reset-the-seed-when-lecuyer-cmrg-rng-is-used R doesn't reset the seed when “L'Ecuyer-CMRG” RNG is used?]
# Windows cannot run mclapply() in parallel: the implementation relies on forking, which Windows does not support, so on Windows mclapply() from the parallel package is a serial function. The ''parallelsugar'' package was created to work around this.
# Another choice on Windows is the parLapply() function in the parallel package.
# [https://stackoverflow.com/questions/17196261/understanding-the-differences-between-mclapply-and-parlapply-in-r Understanding the differences between mclapply and parLapply in R] You don't have to worry about '''reproducing''' your environment on each of the cluster workers if mclapply() is used. <syntaxhighlight lang='rsplus'>
library(parallel)
ncores <- as.integer( Sys.getenv('NUMBER_OF_PROCESSORS') )  # Windows-only variable; detectCores() is portable
cl <- makeCluster(getOption("cl.cores", ncores))
LLID2GOIDs2 <- parLapply(cl, rLLID, function(x) {
                                    library(org.Hs.eg.db); get("org.Hs.egGO")[[x]]}
                        )
stopCluster(cl)
</syntaxhighlight>It works: the computing time dropped from 100 sec to 29 sec on 4 cores.

==== mclapply() vs foreach() ====
https://stackoverflow.com/questions/44806048/r-mclapply-vs-foreach

== Open a new Window device ==
X11() or dev.new()

== par() ==
?par

=== text size (cex) and font size on main, lab & axis ===
* [https://www.statmethods.net/advgraphs/parameters.html Graphical Parameters] from statmethods.net.
* [https://designdatadecisions.wordpress.com/2015/06/09/graphs-in-r-overlaying-data-summaries-in-dotplots/ Overlaying Data Summaries in Dotplots]

Examples (the cex.* defaults are 1, except cex.main, which defaults to 1.2; font.* = 2 means bold):
* cex.main=0.9
* cex.sub
* cex.lab=0.8, font.lab=2 (x/y axis labels)
* cex.axis=0.8, font.axis=2 (axis/tick text/labels)
* col.axis="grey50"

A quick example to increase font size ('''cex.lab''', '''cex.axis''', '''cex.main''') and line width ('''lwd''') in a line plot and '''cex''' & '''lwd''' in the legend.
<pre>
plot(x=x$mids, y=x$density, type="l",
    xlab="p-value", ylab="Density", lwd=2,
    cex.lab=1.5, cex.axis=1.5,
    cex.main=1.5, main = "")
lines(y$mids, y$density, lty=2, lwd=2)
lines(z$mids, z$density, lty=3, lwd=2)
legend('topright',legend = c('Method A','Method B','Method C'),
      lty=c(2,1,3), lwd=c(2,2,2), cex = 1.5, xjust = 0.5, yjust = 0.5)
</pre>
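A minimal, self-contained variant of the pattern above (made-up data): par() returns the previous values of the parameters it sets, so saving them in op and calling par(op) afterwards restores them. pdf(NULL) is only there so the sketch runs without opening a window or writing a file.

<syntaxhighlight lang='rsplus'>
pdf(NULL)                                   # off-screen device, nothing is written
op <- par(cex.lab = 1.5, cex.axis = 1.5, cex.main = 1.5, lwd = 2)
plot(1:10, (1:10)^2, type = "l", main = "Bigger text",
     xlab = "x", ylab = "x squared")
lines(1:10, 10 * (1:10), lty = 2)           # lwd = 2 is inherited from par()
legend("topleft", legend = c("x^2", "10x"), lty = c(1, 2), cex = 1.5)
par(op)                                     # restore the previous settings
restored <- par(c("cex.lab", "cex.axis", "lwd"))
invisible(dev.off())
str(restored)
</syntaxhighlight>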


ggplot2 case (default font size is [https://ggplot2.tidyverse.org/articles/faq-customising.html 11 points]):
* plot.title
* plot.subtitle
* axis.title.x, axis.title.y: (x/y axis labels)
* axis.text.x & axis.text.y: (axis/tick text/labels)
<pre>
ggplot(df, aes(x, y)) +
  geom_point() +
  labs(title = "Title", subtitle = "Subtitle", x = "X-axis", y = "Y-axis") +
  theme(plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10))
</pre>

==== parallel vs doParallel package ====
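For reference, a minimal sketch of the socket-cluster route in the base parallel package (doParallel is not used here), which also works on Windows because it does not rely on forking. clusterSetRNGStream() gives each worker its own "L'Ecuyer-CMRG" stream, so re-seeding makes the parallel random numbers reproducible; two workers and the seed 123 are arbitrary choices.

<syntaxhighlight lang='rsplus'>
library(parallel)

cl <- makeCluster(2)                             # two PSOCK (socket) workers
clusterSetRNGStream(cl, iseed = 123)             # one L'Ecuyer-CMRG stream per worker
r1 <- parLapply(cl, 1:4, function(i) rnorm(1))

clusterSetRNGStream(cl, iseed = 123)             # reset the streams ...
r2 <- parLapply(cl, 1:4, function(i) rnorm(1))   # ... and get the same numbers
stopCluster(cl)

print(identical(r1, r2))                         # TRUE
</syntaxhighlight>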


==== parallelsugar package ====
* http://edustatistics.org/nathanvan/2015/10/14/parallelsugar-an-implementation-of-mclapply-for-windows/

If we load parallelsugar, parallel::mclapply (which uses fork-based clusters) is masked by parallelsugar::mclapply, which is implemented with socket clusters. The first timing below (40 s for four 10-second tasks) shows parallel::mclapply running serially, as it does on Windows.

<syntaxhighlight lang='rsplus'>
library(parallel)  

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed
##    0.00    0.00  40.06

library(parallelsugar)
##
## Attaching package: ‘parallelsugar’
##
## The following object is masked from ‘package:parallel’:
##
##    mclapply

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed
##    0.04    0.08  12.98
</syntaxhighlight>

=== Default font ===
* [https://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/png.html ?png].  The default font family is '''Arial''' on Windows and '''Helvetica''' otherwise.
* ''sans''. See [https://www.r-bloggers.com/2015/08/changing-the-font-of-r-base-graphic-plots/ Changing the font of R base graphic plots]
* [http://www.cookbook-r.com/Graphs/Fonts/ Fonts] from ''Cookbook for R''. It seems ggplot2 also uses '''sans''' as the default font.
* [https://www.r-bloggers.com/2021/07/using-different-fonts-with-ggplot2/ Using different fonts with ggplot2]
* [https://r-coder.com/plot-r/#Font_family R plot font family]
* [https://r-coder.com/custom-fonts-r/ Add custom fonts in R]

=== layout ===
* [https://blog.rsquaredacademy.com/data-visualization-with-r-combining-plots/ Data Visualization with R - Combining Plots]
* http://datascienceplus.com/adding-text-to-r-plot/

=== reset the settings ===
{{Pre}}
op <- par(mfrow=c(2,1), mar = c(5,7,4,2) + 0.1)
....
par(op) # mfrow=c(1,1), mar = c(5,4,4,2) + .1
</pre>

=== mtext (margin text) vs title ===
* https://datascienceplus.com/adding-text-to-r-plot/
* https://datascienceplus.com/mastering-r-plot-part-2-axis/

=== mgp (axis tick label locations or axis title) ===
# The margin line (in ‘mex’ units) for the axis title, axis labels and axis line.  Note that ‘mgp[1]’ affects the axis ‘title’ whereas ‘mgp[2:3]’ affect the tick-mark labels and the axis line.  The default is ‘c(3, 1, 0)’. To draw the axis labels closer to an axis, use e.g. mgp=c(1.5, .5, 0).
#* the default c(3,1,0) specifies the margin line for the '''axis title''', '''axis labels''' and '''axis line'''.
#* the axis title is drawn in the fourth line of the margin starting from the plot region, the axis labels are drawn in the second line and the axis line itself is the first line.
# [https://www.r-bloggers.com/2010/06/setting-graph-margins-in-r-using-the-par-function-and-lots-of-cow-milk/ Setting graph margins in R using the par() function and lots of cow milk]
# [https://statisticsglobe.com/move-axis-label-closer-to-plot-in-base-r Move Axis Label Closer to Plot in Base R (2 Examples)]
# http://rfunction.com/archives/1302 mgp: a numeric vector of length 3, which sets the axis label locations relative to the edge of the inner plot window. The first value represents the location of the '''labels/axis title''' (i.e. xlab and ylab in plot), the second the '''tick-mark labels''', and the third the '''tick marks'''. The default is c(3, 1, 0).

=== move axis title closer to axis ===
* [https://r-charts.com/base-r/title/ Setting a title and a subtitle]. Default is around 1.7(?). [https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/title ?title].
* [https://stackoverflow.com/a/30265996 move axis label closer to axis] '''title(, line)'''. This is useful when we use '''xaxt='n' ''' to hide the ticks and labels.
<pre>
title(ylab="Within-cluster variance", line=0,
      cex.lab=1.2, family="Calibri Light")
</pre>
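A minimal sketch of the two approaches above (made-up data; pdf(NULL) only so it runs non-interactively): mgp moves the axis title, tick labels and axis line together, while title(..., line =) places a single axis title on a chosen margin line after the axis has been suppressed with xaxt = "n".

<syntaxhighlight lang='rsplus'>
pdf(NULL)
op <- par(mfrow = c(1, 2), mgp = c(1.5, 0.5, 0))  # title on line 1.5, tick labels on line 0.5
plot(1:10, xlab = "x", ylab = "y", main = "mgp = c(1.5, 0.5, 0)")
par(mgp = c(3, 1, 0))                               # back to the default
plot(1:10, xaxt = "n", xlab = "", ylab = "y", main = "title(line = 1)")
title(xlab = "x (margin line 1)", line = 1)         # closer than the default line 3
par(op)                                             # restores mfrow and mgp
invisible(dev.off())
</syntaxhighlight>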


=== pch and point shapes ===
[[:File:R pch.png]]

See [https://www.statmethods.net/advgraphs/parameters.html here].

* Full circle: pch=16
* Display all possibilities: ggpubr::show_point_shapes()

=== Regular Expression ===
See [[Regular_expression|here]].

=== Clipboard (?connections) & textConnection() ===
<syntaxhighlight lang='rsplus'>
source("clipboard")
read.table("clipboard")
</syntaxhighlight>

* On Windows, we can use readClipboard() and writeClipboard().
* Reading/writing the clipboard this way seems less reliable on Linux/macOS, so an alternative is to paste the text into the [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/textConnection textConnection()] function:
<syntaxhighlight lang='rsplus'>
x <- read.delim(textConnection("<USE_KEYBOARD_TO_PASTE_FROM_CLIPBOARD>"))
</syntaxhighlight>
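When the text is already in R (pasted into the script or read from the clipboard), read.table()/read.csv() can parse it without a file, either through the text = argument or through an explicit textConnection(). A minimal sketch with made-up data; both routes give the same data frame.

<syntaxhighlight lang='rsplus'>
txt <- "id,score
a,1.5
b,2.0"

df1 <- read.csv(text = txt)        # 'text' argument, no connection needed

con <- textConnection(txt)         # an explicit text connection (opened on creation)
df2 <- read.csv(con)
close(con)

str(df1)
print(identical(df1, df2))         # TRUE
</syntaxhighlight>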


=== lty (line type) ===
[[:File:R lty.png]]

[https://finnstats.com/index.php/2021/06/11/line-types-in-r-lty-for-r-baseplot-and-ggplot/ Line types in R: Ultimate Guide For R Baseplot and ggplot]

See [http://www.sthda.com/english/wiki/line-types-in-r-lty here].

* Display all possibilities: ggpubr::show_line_types()

=== read/manipulate binary data ===
* x <- readBin(fn, raw(), file.info(fn)$size) (reads the whole file fn as raw bytes)
* rawToChar(x[1:16])
* See Biostrings C API

=== String Manipulation ===
* [http://gastonsanchez.com/blog/resources/how-to/2013/09/22/Handling-and-Processing-Strings-in-R.html ebook] by Gaston Sanchez.
* [http://blog.revolutionanalytics.com/2018/06/handling-strings-with-r.html A guide to working with character data in R] (6/22/2018)
* Chapter 7 of the book 'Data Manipulation with R' by Phil Spector.
* Chapter 7 of the book 'R Cookbook' by Paul Teetor.
* Chapter 2 of the book 'Using R for Data Management, Statistical Analysis and Graphics' by Horton and Kleinman.
* http://www.endmemo.com/program/R/deparse.php. '''It includes lots of examples for each R function it lists.'''

=== HTTPs connection ===
HTTPS connections became the default in R 3.2.2. See
* http://blog.rstudio.org/2015/08/17/secure-https-connections-for-r/
* http://blog.revolutionanalytics.com/2015/08/good-advice-for-security-with-r.html

[http://developer.r-project.org/blosxom.cgi/R-devel/2016/12/15#n2016-12-15 R 3.3.2 patched] The internal methods of ‘download.file()’ and ‘url()’ now report if they are unable to follow the redirection of a ‘http://’ URL to a ‘https://’ URL (rather than failing silently).


=== setInternet2 ===
There was a bug in ftp downloading in R 3.2.2 (r69053) on Windows; it has been fixed in R 3.2 patched.

Read the [https://stat.ethz.ch/pipermail/r-devel/2015-August/071595.html discussion] reported on 8/8/2015. The error only happened with ftp, not http, connections. The final solution is explained in [https://stat.ethz.ch/pipermail/r-devel/2015-August/071623.html this post]. The following demonstrated the original problem.
<pre>
url <- paste0("ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/",
              "GCF_000001405.13.assembly.txt")
f1 <- tempfile()
download.file(url, f1)
</pre>
It seems the bug was fixed in the R 3.2 branch. See the [https://github.com/wch/r-source/commit/3a02ed3a50ba17d9a093b315bf5f31ffc0e21b89 8/16/2015] patch r69089, where the INTERNET_FLAG_PASSIVE flag was added to the [https://msdn.microsoft.com/en-us/library/windows/desktop/aa385098%28v=vs.85%29.aspx InternetOpenUrl()] call of the [https://msdn.microsoft.com/en-us/library/windows/desktop/aa385473%28v=vs.85%29.aspx wininet] library. [http://slacksite.com/other/ftp.html This article] and [http://stackoverflow.com/questions/1699145/what-is-the-difference-between-active-and-passive-ftp this post] explain the difference between active and passive FTP.

The following R command shows the exact svn revision of the R you are currently using.
<pre>
R.Version()$"svn rev"
</pre>

With setInternet2(TRUE), the https protocol is supported in download.file().

When setInternet2(TRUE) is enabled by default, download.file() does not work for the ftp protocol (used by the getGEO() function of the GEOquery package). With setInternet2(FALSE), download.file() works again for ftp.

The setInternet2() function is defined in [https://github.com/wch/r-source/commits/trunk/src/library/utils/R/windows/sysutils.R R> src> library> utils > R > windows > sysutils.R].

'''R up to 3.2.2'''
<pre>
setInternet2 <- function(use = TRUE) .Internal(useInternet2(use))
</pre>
See also
* <src/include/Internal.h> (declare do_setInternet2()),
* <src/main/names.c> (show do_setInternet2() in C)
* <src/main/internet.c>  (define do_setInternet2() in C).

Note that setInternet2(TRUE) became the default in R 3.2.2; to revert to the previous default, use setInternet2(FALSE) (see the <doc/NEWS.pdf> file). setInternet2(FALSE) avoids the getGEO() error, but it disables https downloads with download.file(). In R < 3.2.2, https downloads were also possible after calling setInternet2(TRUE).

'''R 3.3.0'''
<pre>
setInternet2 <- function(use = TRUE) {
    if(!is.na(use)) stop("use != NA is defunct")
    NA
}
</pre>

Note that setInternet2.Rd says "As from \R 3.3.0 it changes nothing, and only \code{use = NA} is accepted". NEWS.Rd also says that setInternet2() has no effect and will be removed in due course.

=== las (label style) ===
* 0: The default, parallel to the axis
* 1: Always horizontal <syntaxhighlight lang='r' inline>boxplot(y~x, las=1)</syntaxhighlight>
* 2: Perpendicular to the axis
* 3: Always vertical

=== oma (outer margin), xpd, common title for two plots, 3 types of regions, multi-panel plots ===
<ul>
<li>The following trick is useful when we want to draw multiple plots with a common title.
{{Pre}}
par(mfrow=c(1,2),oma = c(0, 0, 2, 0))  # oma=c(0, 0, 0, 0) by default
plot(1:10, main="Plot 1")
plot(1:100,  main="Plot 2")
mtext("Title for Two Plots", outer = TRUE, cex = 1.5) # outer=FALSE by default
</pre>
<li>[[PCA#Visualization|PCA plot]] example (the plot in the middle)
<li>For the scatterplot3d() function, '''oma''' is not useful and I need to use '''xpd'''.
<li>[https://datascienceplus.com/mastering-r-plot-part-3-outer-margins/ Mastering R plot – Part 3: Outer margins] '''mtext()''' & '''par(xpd)'''.
<li>[https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/par ?par] about '''xpd''' option
* If FALSE (default), all plotting is clipped to the plot region,
* If TRUE, all plotting is clipped to the figure region,
* If NA, all plotting is clipped to the device region.
<li>3 types of regions. See [https://www.benjaminbell.co.uk/2018/02/creating-multi-panel-plots-and-figures.html Creating multi-panel plots and figures using layout()] & [https://www.seehuhn.de/blog/122 publication-quality figures with R, part 2]
* plot region,
* figure region,
* device region.
<li>[https://www.benjaminbell.co.uk/2018/02/creating-multi-panel-plots-and-figures.html Creating multi-panel plots and figures using layout()] includes several tricks including creating a picture-in-picture plot.
</ul>
 
=== no.readonly ===
par(no.readonly = TRUE) returns only the parameters that can be set, so its result can be passed back to par() to restore the settings. See [https://www.zhihu.com/question/54116933 What does the argument in par(no.readonly=TRUE) mean in R?], [https://www.jianshu.com/p/a716db5d30ef R-par()]
 
== Non-standard fonts in postscript and pdf graphics ==
https://cran.r-project.org/doc/Rnews/Rnews_2006-2.pdf#page=41

== NULL, NA, NaN, Inf ==
https://tomaztsql.wordpress.com/2018/07/04/r-null-values-null-na-nan-inf/
 
== save()/load() vs saveRDS()/readRDS() vs dput()/dget() vs dump()/source() ==
# saveRDS() can only save one R object while save() does not have this constraint.
# saveRDS() does not save the object's name, only a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized. See [http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/ this post].
<pre>
x <- 5
saveRDS(x, "myfile.rds")
x2 <- readRDS("myfile.rds")
identical(x, x2)  # TRUE
</pre>

[https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/dput dput]: Writes an ASCII text representation of an R object. The object name is not written (unlike '''dump''').
{{Pre}}
> data(pbc, package = "survival")
> names(pbc)
> dput(names(pbc))
c("id", "time", "status", "trt", "age", "sex", "ascites", "hepato",
"spiders", "edema", "bili", "chol", "albumin", "copper", "alk.phos",
"ast", "trig", "platelet", "protime", "stage")

> iris2 <- iris[1:2, ]
> dput(iris2)
structure(list(Sepal.Length = c(5.1, 4.9), Sepal.Width = c(3.5,
3), Petal.Length = c(1.4, 1.4), Petal.Width = c(0.2, 0.2), Species = structure(c(1L,
1L), .Label = c("setosa", "versicolor", "virginica"), class = "factor")), row.names = 1:2, class = "data.frame")
</pre>
=== Use 'verbose = TRUE' in load() ===
When we use load(), it is helpful to add 'verbose =TRUE' to see what objects get loaded.
 
=== What are RDS files anyways ===
[https://www.statworx.com/de/blog/archive-existing-rds-files/ Archive Existing RDS Files]
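A minimal round-trip sketch of the pairs compared above, using temporary files (the objects are made up): saveRDS()/readRDS() store one object without its name, save()/load() store several named objects and load() recreates them under those names, and dput()/dget() write a plain-text representation.

<syntaxhighlight lang='rsplus'>
f_rds <- tempfile(fileext = ".rds")
f_rda <- tempfile(fileext = ".RData")
f_txt <- tempfile(fileext = ".R")

x <- data.frame(a = 1:3, b = c("p", "q", "r"))

saveRDS(x, f_rds)
y <- readRDS(f_rds)                    # the name is chosen when reading
print(identical(x, y))                 # TRUE

u <- 1; v <- "two"
save(u, v, file = f_rda)               # several objects, names included
rm(u, v)
loaded <- load(f_rda, verbose = TRUE)  # recreates u and v, returns their names
print(loaded)                          # "u" "v"

dput(x, file = f_txt)                  # ASCII text, no object name
z <- dget(f_txt)
print(isTRUE(all.equal(x, z)))         # TRUE

unlink(c(f_rds, f_rda, f_txt))
</syntaxhighlight>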


=== read/download/source a file from internet ===
==== Simple text file http ====
<pre>
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
</pre>

==== Zip file and url() function ====
<pre>
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
</pre>
The url() function, like file(), gzfile(), bzfile(), xzfile(), unz(), pipe(), fifo() and socketConnection(), creates a connection. By default the connection is not opened (except for ‘socketConnection’), but it may be opened by setting a non-empty value of the argument ‘open’. See ?url. Here gzcon() wraps the url() connection so that the gzip-compressed stream is decompressed as it is read.

Another example of using url() is
<pre>
load(url("http://www.example.com/example.RData"))
</pre>

== [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/all.equal ==, all.equal(), identical()] ==
* ==: exact match
* '''all.equal''': compare R objects x and y testing ‘near equality’
* identical: The safe and reliable way to test two objects for being exactly equal.
{{Pre}}
x <- 1.0; y <- 0.99999999999
all.equal(x, y)
# [1] TRUE
identical(x, y)
# [1] FALSE
</pre>

Be careful about using "==" to return an index of matches in the case of data with missing values.
<pre>
R> c(1,2,NA)[c(1,2,NA) == 1]
[1]  1 NA
R> c(1,2,NA)[which(c(1,2,NA) == 1)]
[1] 1
</pre>

See also the [http://cran.r-project.org/web/packages/testthat/index.html testthat] package.
 
I found a case comparing two objects, one generated on ''Linux'' and the other on ''macOS'', where identical() gives FALSE but '''all.equal()''' returns TRUE. The difference had a magnitude of only about 1e-17.
 
=== waldo ===
* https://waldo.r-lib.org/ or [https://cloud.r-project.org/web/packages/waldo/index.html CRAN]. Find and concisely describe the difference between a pair of R objects.
* [https://predictivehacks.com/how-to-compare-objects-in-r/ How To Compare Objects In R]
 
=== diffobj: Compare/Diff R Objects ===
https://cran.r-project.org/web/packages/diffobj/index.html
 
== testthat ==
* https://github.com/r-lib/testthat
* [http://www.win-vector.com/blog/2019/03/unit-tests-in-r/ Unit Tests in R]
* [https://davidlindelof.com/machine-learning-in-r-start-with-an-end-to-end-test/ Start with an End-to-End Test]
* [https://www.r-bloggers.com/2023/12/a-beautiful-mind-writing-testable-r-code/ A Beautiful Mind: Writing Testable R Code]
 
== tinytest ==
[https://cran.r-project.org/web/packages/tinytest/index.html tinytest]: Lightweight but Feature Complete Unit Testing Framework
 
[https://cran.r-project.org/web/packages/ttdo/index.html ttdo] adds support for the 'diffobj' package for 'diff'-style comparison of R objects.
 
== Numerical Pitfall ==
[http://bayesfactor.blogspot.com/2016/05/numerical-pitfalls-in-computing-variance.html Numerical pitfalls in computing variance]
{{Pre}}
.1 - .3/3
## [1] 1.387779e-17
</pre>
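A minimal sketch of the comparison pitfalls above (base R only). all.equal() returns TRUE or a character string describing the difference, never FALSE, so inside if() it is usually wrapped in isTRUE(); its default tolerance is about 1.5e-8.

<syntaxhighlight lang='rsplus'>
x <- 0.1 + 0.2
exact    <- x == 0.3                    # FALSE: binary floating point
near     <- isTRUE(all.equal(x, 0.3))   # TRUE: equal within the tolerance
diff_msg <- all.equal(1, 1.1)           # "Mean relative difference: 0.1"
safe     <- isTRUE(all.equal(1, 1.1))   # FALSE, safe to use in if()
by_hand  <- abs(x - 0.3) < 1e-12        # explicit tolerance
same_obj <- identical(c(a = 1), 1)      # FALSE: the names attribute differs

print(c(exact = exact, near = near, safe = safe, by_hand = by_hand, same_obj = same_obj))
print(diff_msg)
</syntaxhighlight>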


==== [http://cran.r-project.org/web/packages/downloader/index.html downloader] package ====
This package provides a wrapper for the download.file function, making it possible to download files over https on Windows, Mac OS X, and other Unix-like platforms. The RCurl package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.

== Sys.getpid() ==
The process ID can be used to monitor the memory usage of the R process or to stop it. See [https://stat.ethz.ch/pipermail/r-devel/2016-November/073360.html this post].


==== Google drive file based on https using [http://www.omegahat.org/RCurl/FAQ.html RCurl] package ====
<pre>
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))
</pre>

== Sys.getenv() & make the script more portable ==
Remove all the secrets from the script and replace them with '''Sys.getenv("secretname")'''. You can save the secrets in an '''.Renviron''' file next to the script in the same project.
<pre>
$ for v in 1 2; do MY=$v Rscript -e "Sys.getenv('MY')"; done
[1] "1"
[1] "2"
$ echo $MY    # prints an empty line: MY=... only applies to the Rscript command

</pre>
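A minimal sketch of reading a secret from the environment instead of hard-coding it. MY_APP_TOKEN is a hypothetical variable name; in practice it would come from a line such as MY_APP_TOKEN=... in .Renviron (read at start-up, or re-read with readRenviron()), not from Sys.setenv() inside the script, which is used here only so the sketch is self-contained.

<syntaxhighlight lang='rsplus'>
Sys.setenv(MY_APP_TOKEN = "s3cr3t")              # stand-in for a .Renviron entry
token <- Sys.getenv("MY_APP_TOKEN")
print(nchar(token))                              # avoid printing the secret itself

missing <- Sys.getenv("MY_UNSET_VARIABLE_XYZ", unset = NA)  # NA instead of ""
if (is.na(missing)) message("MY_UNSET_VARIABLE_XYZ is not set")

Sys.unsetenv("MY_APP_TOKEN")
print(nzchar(Sys.getenv("MY_APP_TOKEN")))        # FALSE once removed
</syntaxhighlight>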


==== Google sheet file using [https://github.com/jennybc/googlesheets googlesheets] package ====
[http://www.opiniomics.org/reading-data-from-google-sheets-into-r/ Reading data from google sheets into R]

== How to write R codes ==
* [https://youtu.be/7oyiPBjLAWY Code smells and feels] from R Consortium
** write simple conditions,
** handle class properly,
** return and exit early,
** polymorphism,
** switch() [e.g., switch(var, value1=out1, value2=out2, value3=out3). Several examples in [https://github.com/cran/glmnet/blob/master/R/assess.glmnet.R#L103 glmnet] ]
** case_when(),
** %||%.
* [https://appsilon.com/write-clean-r-code/ 5 Tips for Writing Clean R Code] – Leave Your Code Reviewer Commentless
** Comments
** Strings
** Loops
** Code Sharing
** Good Programming Practices
 
== How to debug an R code ==
[[Debug#R|Debug R]]
 
== Locale bug (grep did not handle UTF-8 properly PR#16264) ==
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16264
 
== Path length in dir.create() (PR#17206) ==
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17206 (Windows only)


==== Github files https using RCurl package ====
* http://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
<pre>
library(RCurl)
x = getURL("https://gist.github.com/arraytools/6671098/raw/c4cb0ca6fe78054da8dbe253a05f7046270d5693/GeneIDs.txt",
            ssl.verifypeer = FALSE)
read.table(text=x)
</pre>
* [http://cran.r-project.org/web/packages/gistr/index.html gistr] package

=== Create publication tables using '''tables''' package ===
See p. 13 of http://www.ianwatson.com.au/stata/tabout_tutorial.pdf for an example.

R's [http://cran.r-project.org/web/packages/tables/index.html tables] package is the best solution. For example,
<pre>
> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
+          (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )

                Sepal.Length      Sepal.Width
 Species    n   mean         sd   mean        sd
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
</pre>
and
<pre>
# This example shows some of the less common options
> Sex <- factor(sample(c("Male", "Female"), 100, replace=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, replace=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
  s <- format(x, digits=2)
  even <- ((1:length(s)) %% 2) == 0
  s[even] <- sprintf("(%s)", s[even])
  s
}
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )
                  Status
 Sex    Statistic high   low    medium
 Female mean      4.88   4.96   5.17
        sd        (1.20) (0.82) (1.35)
 Male   mean      4.45   4.31   5.05
        sd        (1.01) (0.93) (0.75)
</pre>

See also a collection of R packages related to reproducible research in http://cran.r-project.org/web/views/ReproducibleResearch.html

=== Tabulizer- extracting tables from PDFs ===
[http://datascienceplus.com/extracting-tables-from-pdfs-in-r-using-the-tabulizer-package/ extracting Tables from PDFs in R]

== install.package() error, R_LIBS_USER is empty in R 3.4.1 & .libPaths() ==
* http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
* https://support.rstudio.com/hc/en-us/community/posts/115008369408-Since-update-to-R-3-4-1-R-LIBS-USER-is-empty and http://r.789695.n4.nabble.com/R-LIBS-USER-on-Ubuntu-16-04-td4740935.html. Modify '''/etc/R/Renviron''' (if you have sudo rights) by uncommenting line 43:
<pre>
R_LIBS_USER=${R_LIBS_USER-'~/R/x86_64-pc-linux-gnu-library/3.4'}
</pre>
* https://stackoverflow.com/questions/44873972/default-r-personal-library-location-is-null. Modify '''$HOME/.Renviron''' by adding a line
<pre>
R_LIBS_USER="${HOME}/R/${R_PLATFORM}-library/3.4"
</pre>
* http://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html. Play with .libPaths()

On Mac & R 3.4.0 (it's fine)
{{Pre}}
> Sys.getenv("R_LIBS_USER")
[1] "~/Library/R/3.4/library"
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"
</pre>

On Linux & R 3.3.1 (ARM)
{{Pre}}
> Sys.getenv("R_LIBS_USER")
[1] "~/R/armv7l-unknown-linux-gnueabihf-library/3.3"
> .libPaths()
[1] "/home/$USER/R/armv7l-unknown-linux-gnueabihf-library/3.3"
[2] "/usr/local/lib/R/library"
</pre>

On Linux & R 3.4.1 (*Problematic*)
{{Pre}}
> Sys.getenv("R_LIBS_USER")
[1] ""
> .libPaths()
[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"
</pre>
 
I need to specify the '''lib''' parameter when I use the '''install.packages''' command.
{{Pre}}
> install.packages("devtools", "~/R/x86_64-pc-linux-gnu-library/3.4")
> library(devtools)
Error in library(devtools) : there is no package called 'devtools'
 
# Specify lib.loc parameter will not help with the dependency package
> library(devtools, lib.loc = "~/R/x86_64-pc-linux-gnu-library/3.4")
Error: package or namespace load failed for 'devtools':
.onLoad failed in loadNamespace() for 'devtools', details:
  call: loadNamespace(name)
  error: there is no package called 'withr'


# A solution is to redefine .libPaths
> .libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
> library(devtools) # Works
</pre>

A better solution is to specify R_LIBS_USER in the '''~/.Renviron''' file or in '''~/.bash_profile'''; see [http://stat.ethz.ch/R-manual/R-patched/library/base/html/Startup.html ?Startup].

=== Create flat tables in R console using ftable() ===
<pre>
> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8
        am   0  1  0  1  0  1
vs gear
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
+        dnn = c("Cylinders", "V/S", "Transmission", "Gears"))
          Cylinders     4     6     8
          Transmission  0  1  0  1  0  1
V/S Gears
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0
</pre>

== Using external data from within another package ==
https://logfc.wordpress.com/2017/03/02/using-external-data-from-within-another-package/

== How to run R scripts from the command line ==
https://support.rstudio.com/hc/en-us/articles/218012917-How-to-run-R-scripts-from-the-command-line

== How to exit a sourced R script ==
* [http://stackoverflow.com/questions/25313406/how-to-exit-a-sourced-r-script How to exit a sourced R script]
* [http://r.789695.n4.nabble.com/Problem-using-the-source-function-within-R-functions-td907180.html Problem using the source-function within R-functions] ''' ''The best way to handle the generic sort of problem you are describing is to take those source'd files, and rewrite their content as functions to be called from your other functions.'' '''
* ‘source()’ and ‘example()’ have a new optional argument ‘catch.aborts’ which allows continued evaluation of the R code after an error. [https://developer.r-project.org/blosxom.cgi/R-devel/2023/10/11 4-devel] 2023/10/11.

== Decimal point & decimal comma ==
Countries using Arabic numerals with a decimal comma (Austria, Belgium, Brazil, France, Germany, Netherlands, Norway, South Africa, Spain, Sweden, ...) https://en.wikipedia.org/wiki/Decimal_mark

== setting seed locally (not globally) in R ==
https://stackoverflow.com/questions/14324096/setting-seed-locally-not-globally-in-r

== R's internal C API ==
https://github.com/hadley/r-internals

== cleancall package for C resource cleanup ==
[https://www.tidyverse.org/articles/2019/05/resource-cleanup-in-c-and-the-r-api/ Resource Cleanup in C and the R API]

== Random number generator ==
* https://cran.r-project.org/doc/manuals/R-exts.html#Random-numbers
* [https://stackoverflow.com/a/14555220 C code from R with .C(): random value is the same every time]
* [https://arxiv.org/pdf/2003.08009v2.pdf Random number generators produce collisions: Why, how many and more] Marius Hofert 2020 and the published paper in [https://www.tandfonline.com/doi/full/10.1080/00031305.2020.1782261 American Statistician] (including R code).
* R package examples. [https://github.com/cran/party/blob/5ddbd382f01fef2ab993401b43d1fc78d0b061fb/src/RandomForest.c party] package.

{{Pre}}
#include <R.h>

void myunif(void) {
  GetRNGstate();            /* read .Random.seed from R */
  double u = unif_rand();
  PutRNGstate();            /* write the updated seed back to R */
  Rprintf("%f\n", u);
}
</pre>

<pre>
$ R CMD SHLIB r_rand.c
$ R
R> dyn.load("r_rand.so")
R> set.seed(1)
R> .C("myunif")
0.265509
list()
R> .C("myunif")
0.372124
list()
R> set.seed(1)
R> .C("myunif")
0.265509
list()
</pre>

=== Test For Randomness ===
* [https://predictivehacks.com/how-to-test-for-randomness/ How To Test For Randomness]
* [https://www.r-bloggers.com/2021/08/test-for-randomness-in-r-how-to-check-dataset-randomness/ Test For Randomness in R-How to check Dataset Randomness]

== Different results in Mac and Linux ==
=== Random numbers: multivariate normal ===
Why does [https://www.rdocumentation.org/packages/MASS/versions/7.3-49/topics/mvrnorm MASS::mvrnorm()] give different results on Mac and Linux/Windows?

The reason could be the covariance matrix decomposition, which may depend on the LAPACK/BLAS libraries. See
* https://stackoverflow.com/questions/11567613/different-random-number-generation-between-os
* https://stats.stackexchange.com/questions/149321/generating-and-working-with-random-vectors-in-r
<ul>
<li>[https://stats.stackexchange.com/questions/61719/cholesky-versus-eigendecomposition-for-drawing-samples-from-a-multivariate-norma Cholesky versus eigendecomposition for drawing samples from a multivariate normal distribution]

See [https://gist.github.com/arraytools/0d7f0a02c233aefb9cefc6eb5f7b7754 this example]. A little more investigation shows that the eigenvalues differ slightly between macOS and Linux. See [https://gist.github.com/arraytools/0d7f0a02c233aefb9cefc6eb5f7b7754#file-mvtnorm_debug-r here].
</li>
</ul>

== rle() running length encoding ==
* https://en.wikipedia.org/wiki/Run-length_encoding
* [https://masterr.org/r/how-to-find-consecutive-repeats-in-r/ How to Find Consecutive Repeats in R]
* [https://www.r-bloggers.com/r-function-of-the-day-rle-2/amp/ R Function of the Day: rle]
* [https://blogs.reed.edu/ed-tech/2015/10/creating-nice-tables-using-r-markdown/ Creating nice tables using R Markdown]
* https://rosettacode.org/wiki/Run-length_encoding
* R's [https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/rle base::rle()] function
* R's [https://www.rdocumentation.org/packages/S4Vectors/versions/0.10.2/topics/Rle-class Rle class] from the S4Vectors package, which is used, for example, in the [http://genomicsclass.github.io/book/pages/iranges_granges.html IRanges/GRanges/GenomicRanges] packages

== citation() ==
{{Pre}}
citation()
citation("MASS")
toBibtex(citation())
</pre>
[https://www.r-bloggers.com/2024/05/notes-on-citing-r-and-r-packages/ Notes on Citing R and R Packages] with examples.

== R not responding request to interrupt stop process ==
[https://stackoverflow.com/a/43172530 R not responding request to interrupt stop process]. ''R is executing (for example) a C / C++ library call that doesn't provide R an opportunity to check for interrupts.'' This matches the case I ran into (the ''dist()'' function).

== Monitor memory usage ==
* x <- rnorm(2^27) will create an object of size 1 GB (2^27*8/2^20 = 1024 MB).
* Windows: memory.size(max=TRUE) (this became a non-functional stub in R 4.2.0)
* Linux
** RStudio: '''htop -p PID''' where PID is the process ID of ''/usr/lib/rstudio/bin/rsession'', not ''/usr/lib/rstudio/bin/rstudio''. I tested this by running ''x <- rnorm(2*1e8)''. The object size can be obtained through ''print(object.size(x), units = "auto")''. Note that 2e8*8/2^20 = 1525.88 MB (1e8 doubles take 1e8*8/2^20 = 762.9395 MB).
** R: '''htop -p PID''' where PID is the process ID of ''/usr/lib/R/bin/exec/R''. Alternatively, use '''htop -p `pgrep -f /usr/lib/R/bin/exec/R`'''
** To find the peak memory usage, use '''grep VmPeak /proc/$PID/status'''
* '''mem_used()''' function from the [https://cran.r-project.org/web/packages/pryr/index.html pryr] package. Its value did not agree with the memory reported by '''jobload''' on Biowulf, so I could not use it to measure the memory used when running mclapply().
* [https://cran.r-project.org/web/packages/peakRAM/index.html peakRAM]: Monitor the Total and Peak RAM Used by an Expression or Function
* [https://www.zxzyl.com/archives/1456/ Error: protect () : protection stack overflow] and [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Memory ?Memory]

References:
* [https://unix.stackexchange.com/questions/554/how-to-monitor-cpu-memory-usage-of-a-single-process How to monitor CPU/memory usage of a single process?]. ''htop -p $PID'' is recommended. It only shows the percentage of memory usage.
* [https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process '''Peak''' memory usage of a linux/unix process] ''grep VmPeak /proc/$PID/status'' is recommended.
* [https://serverfault.com/a/264856 How can I see the memory usage of a Linux process?] ''pmap $PID | tail -n 1'' is recommended. It shows the memory usage as an absolute value (e.g., 1722376K).
* [https://stackoverflow.com/a/6457769 How to check the amount of RAM in R] '''memfree <- as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo", intern=TRUE)); memfree '''

== Monitor Data ==
[https://www.jstatsoft.org/article/view/v098i01?s=09 Monitoring Data in R with the lumberjack Package]

== Pushover ==
* [https://rud.is/b/2020/01/29/monitoring-website-ssl-tls-certificate-expiration-times-with-r-openssl-pushoverr-and-dt/ Monitoring Website SSL/TLS Certificate Expiration Times with R, {openssl}, {pushoverr}, and {DT}]
* [https://cran.r-project.org/web/packages/pushoverr/ pushoverr]
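The rle() entry above can be illustrated with a short sketch (the vector x is made up) that finds the longest run of consecutive repeats and the start and end position of every run. inverse.rle() undoes the encoding.

<syntaxhighlight lang='rsplus'>
x <- c(1, 1, 2, 2, 2, 3, 1, 1)   # made-up example vector
r <- rle(x)                      # r$lengths: 2 3 1 2; r$values: 1 2 3 1

# Longest run of consecutive repeats (which.max() picks the first one on ties)
i <- which.max(r$lengths)
longest_value  <- r$values[i]    # 2
longest_length <- r$lengths[i]   # 3

# Start and end positions of every run
ends   <- cumsum(r$lengths)      # 2 5 6 8
starts <- ends - r$lengths + 1L  # 1 3 6 7

# inverse.rle() rebuilds the original vector
identical(inverse.rle(r), x)     # TRUE
</syntaxhighlight>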


=== tracemem, data type, copy ===
[http://stackoverflow.com/questions/18359940/r-programming-vector-a1-2-avoid-copying-the-whole-vector/18361181#18361181 How to avoid copying a long vector]

=== Tell if the current R is running in 32-bit or 64-bit mode ===
<pre>
8 * .Machine$sizeof.pointer
</pre>
where '''sizeof.pointer''' returns the number of ''bytes'' in a C SEXP type and 8 is the number of bits per byte, so the result is 32 or 64.

=== 32- and 64-bit ===
See [http://cran.r-project.org/doc/manuals/R-admin.html#Choosing-between-32_002d-and-64_002dbit-builds R-admin.html].
* For speed you may want to use a 32-bit build, but to handle large datasets a 64-bit build.
* Even on 64-bit builds of R there are limits on the size of R objects, some of which stem from the use of 32-bit integers (especially in FORTRAN code). For example, the dimensions of an array are limited to 2^31 - 1.
* Since R 2.15.0, it is possible to select '64-bit Files' from the standard installer even on a 32-bit version of Windows (2012/3/30).

=== Handling length 2^31 and more in R 3.0.0 ===
From the R NEWS for the 3.0.0 release:

''There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.''

In R 2.15.2, trying to create a vector of length 2^31 gives an error:
<pre>
> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector
</pre>

However, R 3.0.0 handles it (tested on my 64-bit Ubuntu with 16 GB RAM; R was compiled by myself):
<pre>
> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
>
</pre>
Note:
# A vector of length 2^31 has about 2.1 billion elements. Stored as doubles, it takes about 16 GB (2^31*8/2^20 = 16384 MB) of memory.
# On Windows, it is almost impossible to work with data of length 2^31 if the memory is less than 16 GB, because paging to virtual memory on Windows works poorly. For example, when I tested on my 12 GB Windows 7 machine, the whole system froze for several minutes and I had to power off the machine.
# My slide in http://goo.gl/g7sGX shows screenshots of running the above command on my Ubuntu and RHEL machines. Linux handles large (> system RAM) data pretty well, so as long as your Linux system is 64-bit, you can work on large data without too much pain.
# For large datasets, it makes sense to use a database or specially crafted packages like [http://cran.r-project.org/web/packages/bigmemory/ bigmemory] or [http://cran.r-project.org/web/packages/ff/ ff] or [https://privefl.github.io/bigstatsr/ bigstatsr].

= Resource =
== Books ==

* [https://forwards.github.io/rdevguide/ R Development Guide] R Contribution Working Group
* [https://rviews.rstudio.com/2021/11/04/bookdown-org/ An R Community Public Library] 2021-11-04
* A list of recommended books http://blog.revolutionanalytics.com/2015/11/r-recommended-reading.html
* [http://statisticalestimation.blogspot.com/2016/11/learning-r-programming-by-reading-books.html Learning R programming by reading books: A book list]
* [http://www.stats.ox.ac.uk/pub/MASS4/ Modern Applied Statistics with S] by William N. Venables and Brian D. Ripley
* [http://dirk.eddelbuettel.com/code/rcpp.html Seamless R and C++ Integration with Rcpp] by Dirk Eddelbuettel
* [http://www.amazon.com/Advanced-Chapman-Hall-CRC-Series/dp/1466586966/ref=pd_sim_b_6?ie=UTF8&refRID=0C98YDK5MRSTRY0ZX1DB Advanced R] by Hadley Wickham 2014
** http://brettklamer.com/diversions/statistical/compile-hadleys-advanced-r-programming-to-a-pdf/ Compile Hadley's Advanced R to a PDF
* [https://b-rodrigues.github.io/fput/ Functional programming and unit testing for data munging with R] by Bruno Rodrigues
* [http://www.amazon.com/Cookbook-OReilly-Cookbooks-Paul-Teetor/dp/0596809158/ref=pd_sim_b_3?ie=UTF8&refRID=0C98YDK5MRSTRY0ZX1DB R Cookbook] by Paul Teetor
* [http://www.amazon.com/Machine-Learning-R-Brett-Lantz/dp/1782162143/ref=pd_sim_b_13?ie=UTF8&refRID=1851BAX3M17CK00VSMA6 Machine Learning with R] by Brett Lantz
* [http://www.amazon.com/Everyone-Advanced-Analytics-Graphics-Addison-Wesley/dp/0321888030/ref=pd_sim_b_3?ie=UTF8&refRID=1851BAX3M17CK00VSMA6 R for Everyone] by [http://www.jaredlander.com/r-for-everyone/ Jared P. Lander]
* [http://www.amazon.com/The-Art-Programming-Statistical-Software/dp/1593273843/ref=pd_sim_b_2?ie=UTF8&refRID=1851BAX3M17CK00VSMA6 The Art of R Programming] by Norman Matloff
* [http://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485/ref=pd_sim_b_3?ie=UTF8&refRID=0H3NMWX7KTRAEB32902Q Applied Predictive Modeling] by Max Kuhn
* [http://www.amazon.com/R-Action-Robert-Kabacoff/dp/1935182390/ref=pd_sim_b_17?ie=UTF8&refRID=0H3NMWX7KTRAEB32902Q R in Action] by Robert Kabacoff
* [http://www.amazon.com/The-Book-Michael-J-Crawley/dp/0470973927/ref=pd_sim_b_6?ie=UTF8&refRID=0CNF2XK8VBGF5A6W3NE3 The R Book] by Michael J. Crawley
* Regression Modeling Strategies, with Applications to Linear Models, Survival Analysis and Logistic Regression by Frank E. Harrell
* Data Manipulation with R by Phil Spector
* [https://www.datanovia.com/en/courses/data-manipulation-in-r/ Data Manipulation in R] by Alboukadel Kassambara
* [https://rviews.rstudio.com/2017/05/19/efficient_r_programming/ Review of Efficient R Programming]
* [http://r-pkgs.had.co.nz/ R packages: Organize, Test, Document, and Share Your Code] by Hadley Wickham 2015
* [http://tidytextmining.com/ Text Mining with R: A Tidy Approach] and a [http://pacha.hk/2017-05-20_text_mining_with_r.html blog]
<ul>
<li>[https://github.com/csgillespie/efficientR Efficient R programming] by Colin Gillespie and Robin Lovelace. The HTML version of the book can be re-created by following the simple instructions in the [https://csgillespie.github.io/efficientR/building-the-book-from-source.html Appendix]. Note that the PDF version renders the expected output (mathematical notation, tables) better than the EPUB version.
{{Pre}}
# R 3.4.1
.libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
setwd("/tmp/efficientR/")
bookdown::render_book("index.Rmd", output_format = "bookdown::pdf_book")
# generated pdf file is located _book/_main.pdf

bookdown::render_book("index.Rmd", output_format = "bookdown::epub_book")
# generated epub file is located _book/_main.epub.
# This cannot be done in RStudio ("parse_dt" not resolved from current namespace (lubridate))
# but it is OK to run in an R terminal
</pre>
</li>
</ul>
* [https://learningstatisticswithr.com/book/ Learning statistics with R: A tutorial for psychology students and other beginners] by Danielle Navarro
* [https://rstats.wtf/ What They Forgot to Teach You About R] Jennifer Bryan & Jim Hester
* [http://knosof.co.uk/ESEUR/ Evidence-based Software Engineering] by Derek M. Jones
* [https://www.bigbookofr.com/index.html Big Book of R]
* [https://epirhandbook.com/?s=09 R for applied epidemiology and public health]
* [http://bendixcarstensen.com/EwR/ Epidemiology with R] and the [https://cran.r-project.org/web/packages/Epi/ Epi] package. The [https://rdrr.io/cran/Epi/man/ci.lin.html ci.lin()] function returns the CI from a glm() fit.
* [https://education.rstudio.com/learn/ RStudio &rarr; Finding Your Way To R]. Beginners/Intermediates/Experts
* [https://deepr.gagolewski.com/index.html Deep R Programming]

== Videos ==
* [https://www.infoworld.com/article/3411819/do-more-with-r-video-tutorials.html “Do More with R” video tutorials]. Search for R video tutorials by task, topic, or package. Most videos are shorter than 10 minutes.
* [https://www.youtube.com/@RLadiesGlobal/videos R-Ladies Global] (youtube)

=== Webinar ===
* [https://www.rstudio.com/resources/webinars/ RStudio] & its [https://github.com/rstudio/webinars github] repository

== useR! ==
* http://blog.revolutionanalytics.com/2017/07/revisiting-user2017.html
* [https://www.youtube.com/watch?v=JacpQdj1Vfc&list=PL4IzsxWztPdnyAKQQLxA4ucpaCLdsKvZw UseR 2018 workshop and tutorials]
* [http://www.user2019.fr/ UseR! 2019], [https://github.com/sowla/useR2019-materials tutorial], [https://www.mango-solutions.com/blog/user2019-roundup-workflow-reproducibility-and-friends Better workflow]
* [https://www.youtube.com/channel/UC_R5smHVXRYGhZYDJsnXTwg/playlists UseR! 2020 & 2021]
* [https://rviews.rstudio.com/2021/09/09/a-guide-to-binge-watching-r-medicine/ A Guide to Binge Watching R / Medicine 2021]
* [https://t.co/QBZwNoPJsC UseR! 2022]

== R consortium ==
https://www.youtube.com/channel/UC_R5smHVXRYGhZYDJsnXTwg/featured

== Blogs, Tips, Socials, Communities ==
* Google: revolutionanalytics In case you missed it
* [http://r4stats.com/articles/why-r-is-hard-to-learn/ Why R is hard to learn] by Bob Muenchen.
* [http://onetipperday.sterding.com/2016/02/my-15-practical-tips-for.html My 15 practical tips for a bioinformatician]
* [http://blog.revolutionanalytics.com/2017/06/r-community.html The R community is one of R's best features]
* [https://hbctraining.github.io/main/ Bioinformatics Training at the Harvard Chan Bioinformatics Core]
* The R Blog <s>https://developer.r-project.org/Blog/public/</s> https://blog.r-project.org/
* [https://www.dataquest.io/blog/top-tips-for-learning-r-from-africa-rs-shelmith-kariuki/ Top Tips for Learning R from Africa R’s Shelmith Kariuki]
* [https://smach.github.io/R4JournalismBook/HowDoI.html How Do I? …(do that in R)] by Sharon Machlis
* [https://www.t4rstats.com/ Twitter for R programmers]

== Bug Tracking System ==
https://bugs.r-project.org/bugzilla3/ and [https://bugs.r-project.org/bugzilla3/query.cgi Search existing bug reports]. Remember to select 'All' in the Status drop-down list.

Include the output of '''sessionInfo()''' when reporting a bug.

=== NA in index ===
* Question: what is seq(1, 3)[c(1, 2, NA)]?

Answer: 1 2 NA. The NA index keeps its position in the result, and the value returned there is NA.
 
* Question: What is TRUE & NA?
Answer: NA
 
* Question: What is FALSE & NA?
Answer: FALSE
 
* Question: c("A", "B", NA) != "" ?
Answer: TRUE TRUE NA
 
* Question: which(c("A", "B", NA) != "") ?
Answer: 1 2
 
* Question: c(1, 2, NA) != "" & !is.na(c(1, 2, NA)) ?
Answer: TRUE TRUE FALSE
 
* Question: c("A", "B", NA) != "" & !is.na(c("A", "B", NA)) ?
Answer: TRUE TRUE FALSE
 
'''Conclusion''': To exclude empty strings and NAs from numeric or character data, we can use '''which()''' or a convenience function '''keep.complete <- function(x) x != "" & !is.na(x)'''. This always returns logical values and never contains NAs.
 
Don't use only x != "" or only !is.na(x); each one alone misses one of the two cases.
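A sketch of the keep.complete() helper from the conclusion above, applied to made-up character and numeric vectors, next to the two one-sided checks it replaces:

<syntaxhighlight lang='rsplus'>
# TRUE only for entries that are neither "" nor NA
keep.complete <- function(x) x != "" & !is.na(x)

chr <- c("A", "", NA, "B")
num <- c(1, NA, 3)

keep.complete(chr)        # TRUE FALSE FALSE TRUE
chr[keep.complete(chr)]   # "A" "B"
num[keep.complete(num)]   # 1 3

# Each condition alone is not enough
chr[chr != ""]            # "A" NA "B"  (the NA index returns an NA element)
chr[!is.na(chr)]          # "A" ""  "B"
</syntaxhighlight>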
 
=== Constant ===
Add 'L' after an integer constant so that it is stored as an integer rather than a double. For example,
<syntaxhighlight lang='rsplus'>
for(i in 1L:n) { }
 
if (max.lines > 0L) { }
 
label <- paste0(n-i+1L, ": ")
 
n <- length(x);  if(n == 0L) { }
</syntaxhighlight>
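A quick way to see what the 'L' suffix changes. The benefit is mostly avoiding silent integer/double coercion, e.g. when comparing with length(), which usually returns an integer; in most code the speed difference is small.

<syntaxhighlight lang='rsplus'>
typeof(1)          # "double"
typeof(1L)         # "integer"
typeof(1:3)        # "integer"  (':' already returns integers here)
typeof(1L + 1)     # "double"   (mixing with a double promotes the result)
typeof(1L + 1L)    # "integer"
identical(1L, 1)   # FALSE: same value, different type
1L == 1            # TRUE: comparison coerces first
</syntaxhighlight>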
 
=== Data frame ===
* http://blog.datacamp.com/15-easy-solutions-data-frame-problems-r/
 
=== stringsAsFactors = FALSE ===
http://www.win-vector.com/blog/2018/03/r-tip-use-stringsasfactors-false/
 
=== data.frame to vector ===
<syntaxhighlight lang='rsplus'>
> a= matrix(1:6, 2,3)
> rownames(a) <- c("a", "b")
> colnames(a) <- c("x", "y", "z")
> a
  x y z
a 1 3 5
b 2 4 6
> unlist(data.frame(a))
x1 x2 y1 y2 z1 z2
1  2  3  4  5  6
</syntaxhighlight>
 
=== matrix vs data.frame ===
<pre>
ip1 <- installed.packages()[,c(1,3:4)] # class(ip1) = 'matrix'
unique(ip1$Priority)
# Error in ip1$Priority : $ operator is invalid for atomic vectors
unique(ip1[, "Priority"])  # OK
 
ip2 <- as.data.frame(installed.packages()[,c(1,3:4)], stringsAsFactors = FALSE) # matrix -> data.frame
unique(ip2$Priority)    # OK
</pre>
 
=== Print a vector by suppressing names ===
Use '''unname'''.
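For example (the vector and matrix below are made up), unname() drops the names of a vector and also the row/column names of a matrix:

<syntaxhighlight lang='rsplus'>
x <- c(a = 1, b = 2, c = 3)
x              # printed with a header line of names
unname(x)      # [1] 1 2 3

m <- matrix(1:4, 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
unname(m)      # the dimnames are removed as well
</syntaxhighlight>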
 
=== format.pval ===
<syntaxhighlight lang='rsplus'>
> args(format.pval)
function (pv, digits = max(1L, getOption("digits") - 2L), eps = .Machine$double.eps,
    na.form = "NA", ...)
 
> format.pval(c(stats::runif(5), pi^-100, NA))
[1] "0.19571" "0.46793" "0.71696" "0.93200" "0.74485" "< 2e-16" "NA"   
> format.pval(c(0.1, 0.0001, 1e-27))
[1] "1e-01"  "1e-04"  "<2e-16"
</syntaxhighlight>
 
=== sprintf ===
==== Format number as fixed width, with leading zeros ====
* https://stackoverflow.com/questions/8266915/format-number-as-fixed-width-with-leading-zeros
* https://stackoverflow.com/questions/14409084/pad-with-leading-zeros-to-common-width?rq=1
 
<syntaxhighlight lang='rsplus'>
# sprintf()
a <- seq(1,101,25)
sprintf("name_%03d", a)
[1] "name_001" "name_026" "name_051" "name_076" "name_101"
 
# formatC()
paste("name", formatC(a, width=3, flag="0"), sep="_")
[1] "name_001" "name_026" "name_051" "name_076" "name_101"
</syntaxhighlight>
 
==== sprintf does not print ====
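sprintf() only returns a character string; it does not print anything by itself. The value is shown only when it is auto-printed at the top level, so inside functions, loops, or source()d scripts wrap it in cat() or print().
<syntaxhighlight lang='rsplus'>
cat(sprintf('%5.2f\t%i\n', 1.234, 1234L))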
</syntaxhighlight>
 
=== Creating publication quality graphs in R ===
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/
 
=== HDF5 : Hierarchical Data Format===
HDF5 is an open binary file format for storing and managing large, complex datasets. The file format was developed by the HDF Group, and is widely used in scientific computing.
 
* https://en.wikipedia.org/wiki/Hierarchical_Data_Format
* [https://support.hdfgroup.org/HDF5/ HDF5 tutorial] and others
* [http://www.bioconductor.org/packages/release/bioc/html/rhdf5.html rhdf5] package
* rhdf5 is used by [http://amp.pharm.mssm.edu/archs4/data.html ARCHS4], where you can download an R program that downloads an HDF5 file storing expression data and metadata such as gene IDs, sample/GSM IDs, tissues, etc.
 
<syntaxhighlight lang='rsplus'>
> h5ls(destination_file)
  group                          name      otype  dclass          dim
0      /                          data  H5I_GROUP                     
1  /data                    expression H5I_DATASET INTEGER 35238 x 65429
2      /                          info  H5I_GROUP                     
3  /info                        author H5I_DATASET  STRING            1
4  /info                        contact H5I_DATASET  STRING            1
5  /info                  creation-date H5I_DATASET  STRING            1
6  /info                            lab H5I_DATASET  STRING            1
7  /info                        version H5I_DATASET  STRING            1
8      /                          meta  H5I_GROUP                     
9  /meta          Sample_channel_count H5I_DATASET  STRING        65429
10 /meta    Sample_characteristics_ch1 H5I_DATASET  STRING        65429
11 /meta        Sample_contact_address H5I_DATASET  STRING        65429
12 /meta            Sample_contact_city H5I_DATASET  STRING        65429
13 /meta        Sample_contact_country H5I_DATASET  STRING        65429
14 /meta      Sample_contact_department H5I_DATASET  STRING        65429
15 /meta          Sample_contact_email H5I_DATASET  STRING        65429
16 /meta      Sample_contact_institute H5I_DATASET  STRING        65429
17 /meta      Sample_contact_laboratory H5I_DATASET  STRING        65429
18 /meta            Sample_contact_name H5I_DATASET  STRING        65429
19 /meta          Sample_contact_phone H5I_DATASET  STRING        65429
20 /meta Sample_contact_zip-postal_code H5I_DATASET  STRING        65429
21 /meta        Sample_data_processing H5I_DATASET  STRING        65429
22 /meta          Sample_data_row_count H5I_DATASET  STRING        65429
23 /meta            Sample_description H5I_DATASET  STRING        65429
24 /meta    Sample_extract_protocol_ch1 H5I_DATASET  STRING        65429
25 /meta          Sample_geo_accession H5I_DATASET  STRING        65429
26 /meta        Sample_instrument_model H5I_DATASET  STRING        65429
27 /meta        Sample_last_update_date H5I_DATASET  STRING        65429
28 /meta      Sample_library_selection H5I_DATASET  STRING        65429
29 /meta          Sample_library_source H5I_DATASET  STRING        65429
30 /meta        Sample_library_strategy H5I_DATASET  STRING        65429
31 /meta            Sample_molecule_ch1 H5I_DATASET  STRING        65429
32 /meta            Sample_organism_ch1 H5I_DATASET  STRING        65429
33 /meta            Sample_platform_id H5I_DATASET  STRING        65429
34 /meta                Sample_relation H5I_DATASET  STRING        65429
35 /meta              Sample_series_id H5I_DATASET  STRING        65429
36 /meta        Sample_source_name_ch1 H5I_DATASET  STRING        65429
37 /meta                  Sample_status H5I_DATASET  STRING        65429
38 /meta        Sample_submission_date H5I_DATASET  STRING        65429
39 /meta    Sample_supplementary_file_1 H5I_DATASET  STRING        65429
40 /meta    Sample_supplementary_file_2 H5I_DATASET  STRING        65429
41 /meta              Sample_taxid_ch1 H5I_DATASET  STRING        65429
42 /meta                  Sample_title H5I_DATASET  STRING        65429
43 /meta                    Sample_type H5I_DATASET  STRING        65429
44 /meta                          genes H5I_DATASET  STRING        35238
</syntaxhighlight>
 
=== Formats for writing/saving and sharing data ===
[http://www.econometricsbysimulation.com/2016/12/efficiently-saving-and-sharing-data-in-r_46.html Efficiently Saving and Sharing Data in R]
 
=== Write unix format files on Windows and vice versa ===
https://stat.ethz.ch/pipermail/r-devel/2012-April/063931.html
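A minimal sketch, assuming the common approach of opening the file connection in binary mode ("wb"). In binary mode R does not translate "\n" into "\r\n" on Windows, so the `eol` argument of write.table() alone decides the line ending. The data frame and temporary file names are for illustration only.
<syntaxhighlight lang='rsplus'>
df <- data.frame(id = 1:2, name = c("a", "b"))

# Unix (LF) line endings, even when run on Windows:
# a binary-mode connection disables the text-mode "\n" -> "\r\n" translation
out_unix <- tempfile(fileext = ".txt")
con <- file(out_unix, open = "wb")
write.table(df, con, sep = "\t", quote = FALSE, row.names = FALSE, eol = "\n")
close(con)

# Windows (CRLF) line endings, even when run on Linux/macOS
out_win <- tempfile(fileext = ".txt")
con <- file(out_win, open = "wb")
write.table(df, con, sep = "\t", quote = FALSE, row.names = FALSE, eol = "\r\n")
close(con)

# Count carriage returns (byte 0x0d) to check which endings were written
count_cr <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  sum(bytes == as.raw(0x0d))
}
count_cr(out_unix)  # 0
count_cr(out_win)   # 3 (header line + 2 data rows)
</syntaxhighlight>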
 
=== with() and within() functions ===
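with() evaluates an expression using the columns of a data frame as variables. within() is similar, except that it returns a modified copy of the data frame, so it can be used to create new columns and merge them with the original data set. See [http://www.youtube.com/watch?v=pZ6Bnxg9E8w&list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP youtube video].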
<pre>
# 'mariokart' data set: from the openintro package
closePr <- with(mariokart, totalPr - shipPr)
head(closePr, 20)
 
mk <- within(mariokart, {
            closePr <- totalPr - shipPr
    })
head(mk) # new column closePr
 
mk <- mariokart
aggregate(. ~ wheels + cond, mk, mean)
# create mean according to each level of (wheels, cond)
 
aggregate(totalPr ~ wheels + cond, mk, mean)
 
tapply(mk$totalPr, mk[, c("wheels", "cond")], mean)
</pre>
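A self-contained variant of the same idea, using the built-in mtcars data instead of mariokart (which needs an add-on package). The new column names are made up for illustration; `wt` in mtcars is in units of 1000 lbs.
<syntaxhighlight lang='rsplus'>
# with(): compute a vector from columns; mtcars itself is unchanged
wt_kg <- with(mtcars, wt * 453.592)   # 1000 lbs -> kg
head(wt_kg, 3)

# within(): return a copy of the data frame with new columns added
mt <- within(mtcars, {
  wt_kg    <- wt * 453.592
  hp_per_t <- hp / wt
})
setdiff(names(mt), names(mtcars))     # the two new columns

# Group means, as with aggregate()/tapply() above
aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)
tapply(mtcars$mpg, mtcars[, c("cyl", "am")], mean)
</syntaxhighlight>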
 
=== stem(): stem-and-leaf plot, bar chart on terminals ===
* https://en.wikipedia.org/wiki/Stem-and-leaf_display
* https://stackoverflow.com/questions/14736556/ascii-plotting-functions-for-r
* [https://cran.r-project.org/web/packages/txtplot/index.html txtplot] package
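A quick base-R illustration of stem(), which prints a stem-and-leaf display to the console. The star "bar chart" loop at the end is only an ad-hoc workaround written for this note, not a function from the packages linked above.
<syntaxhighlight lang='rsplus'>
# stem-and-leaf display of the Old Faithful eruption times
stem(faithful$eruptions)

# 'scale' controls how many stems are used
stem(faithful$eruptions, scale = 0.5)

# A crude text bar chart: one row of '*' per category
counts <- table(mtcars$cyl)
for (k in names(counts)) cat(sprintf("%2s | %s\n", k, strrep("*", counts[[k]])))
#  4 | ***********
#  6 | *******
#  8 | **************
</syntaxhighlight>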
 
=== Graphical Parameters, Axes and Text, Combining Plots ===
[http://www.statmethods.net/advgraphs/axes.html statmethods.net]
 
=== 15 Questions All R Users Have About Plots ===
See http://blog.datacamp.com/15-questions-about-r-plots/. This is a tremendous post. It covers the built-in plot() function and ggplot() from the ggplot2 package.
 
# How To Draw An Empty R Plot? plot.new()
# How To Set The Axis Labels And Title Of The R Plots?
# How To Add And Change The Spacing Of The Tick Marks Of Your R Plot? axis()
# How To Create Two Different X- or Y-axes? par(new=TRUE), axis(), mtext()
# How To Add Or Change The R Plot’s Legend? legend()
# How To Draw A Grid In Your R Plot? grid()
# How To Draw A Plot With A PNG As Background? rasterImage() from the '''png''' package
# How To Adjust The Size Of Points In An R Plot? cex argument
# How To Fit A Smooth Curve To Your R Data? loess() and lines()
# How To Add Error Bars In An R Plot? arrows()
# How To Save A Plot As An Image On Disc? png(), pdf() and dev.off()
# How To Plot Two R Plots Next To Each Other? par(mfrow), '''gridBase''' package, '''lattice''' package
# How To Plot Multiple Lines Or Points? plot(), lines()
# How To Fix The Aspect Ratio For Your R Plots? asp parameter
# What Is The Function Of hjust And vjust In ggplot2?
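A small sketch combining several of the answers above with base graphics: point size (cex), a grid, a loess smooth, a legend, and saving to a file. It writes to a PDF in a temporary file; other devices such as png() follow the same open, plot, dev.off() pattern. The data are simulated.
<syntaxhighlight lang='rsplus'>
# Open a file device; everything drawn until dev.off() goes into the file
out <- tempfile(fileext = ".pdf")
pdf(out, width = 6, height = 4)

set.seed(1)
x <- 1:20
y <- x + rnorm(20)            # simulated data

plot(x, y, pch = 16, cex = 0.8, xlab = "x", ylab = "y", main = "Points and a loess fit")
grid()                                        # dotted reference grid
fit <- loess(y ~ x)
lines(x, predict(fit), col = "red", lwd = 2)  # smooth curve through the data
legend("topleft", legend = c("data", "loess"),
       pch = c(16, NA), lty = c(NA, 1), col = c("black", "red"))

dev.off()                                     # close the device to finish the file
file.exists(out)
</syntaxhighlight>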
 
=== Scatterplot with the "rug" function ===
<pre>
require(stats)  # both 'density' and its default method
with(faithful, {
    plot(density(eruptions, bw = 0.15))
    rug(eruptions)
    rug(jitter(eruptions, amount = 0.01), side = 3, col = "light blue")
})
</pre>
[[File:RugFunction.png|200px]]
 
See also the [https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/stripchart.html stripchart()] function which produces one dimensional scatter plots (or dot plots) of the given data.
 
=== Draw a single plot with two different y-axes ===
* http://www.gettinggeneticsdone.com/2015/04/r-single-plot-with-two-different-y-axes.html
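A minimal sketch of the usual base-graphics recipe: draw the first series, call par(new = TRUE) so the next plot() draws on top, suppress its axes, then add a right-hand axis with axis(4) and label it with mtext(). The two series are hypothetical numbers.
<syntaxhighlight lang='rsplus'>
x  <- 1:12
y1 <- c(5, 7, 9, 12, 16, 20, 23, 22, 18, 13, 8, 6)        # hypothetical temperature
y2 <- c(80, 70, 60, 55, 50, 40, 30, 35, 50, 65, 75, 85)  # hypothetical rainfall

op <- par(mar = c(5, 4, 4, 5) + 0.1)   # widen the right margin for the 2nd axis
plot(x, y1, type = "b", pch = 16, col = "blue", xlab = "Month", ylab = "Temperature")
par(new = TRUE)                        # the next plot() draws over the current one
plot(x, y2, type = "b", pch = 17, col = "red", axes = FALSE, xlab = "", ylab = "")
axis(side = 4)                         # right-hand y axis on the scale of y2
mtext("Rainfall", side = 4, line = 3)
par(op)                                # restore the margins
</syntaxhighlight>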
 
=== Draw Color Palette ===
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/
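A small sketch for looking at a palette as labelled colour swatches. `show_palette` is a hypothetical helper written for this note; hcl.colors() assumes R >= 3.6.0.
<syntaxhighlight lang='rsplus'>
# Hypothetical helper: one rectangle per colour, labelled with its hex code
show_palette <- function(pal) {
  n <- length(pal)
  plot(NULL, xlim = c(0, n), ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "")
  rect(0:(n - 1), 0, 1:n, 1, col = pal, border = NA)
  text(0:(n - 1) + 0.5, 0.5, labels = pal, srt = 90, cex = 0.7)
  invisible(pal)
}

pal <- hcl.colors(8, "viridis")   # needs R >= 3.6.0
show_palette(pal)
show_palette(rainbow(8))
</syntaxhighlight>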
 
=== SVG ===
==== Embed svg in html ====
* http://www.magesblog.com/2016/02/using-svg-graphics-in-blog-posts.html
 
==== svglite ====
https://blog.rstudio.org/2016/11/14/svglite-1-2-0/
 
==== pdf -> svg ====
Using Inkscape. See [https://robertgrantstats.wordpress.com/2017/09/07/svg-from-stats-software-the-good-the-bad-and-the-ugly/ this post].
 
=== read.table ===
==== clipboard ====
<syntaxhighlight lang="rsplus">
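source("clipboard")      # run R code copied to the clipboard (Windows)
read.table("clipboard")  # read a table copied to the clipboard (Windows)
# On macOS: read.table(pipe("pbpaste"))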
</syntaxhighlight>
 
==== inline text ====
<syntaxhighlight lang="rsplus">
mydf <- read.table(header=TRUE, text='
cond yval
    A 2
    B 2.5
    C 1.6
')
</syntaxhighlight>
 
==== http(s) connection ====
<syntaxhighlight lang="rsplus">
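library(RCurl)  # provides getURL()
temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt",
              ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)
# In recent R versions, read.table() can also read an https URL directly.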
</syntaxhighlight>
 
==== read only specific columns ====
Use the 'colClasses' option in read.table(), read.delim(), .... For example, the following code reads only the 3rd column of the text file, and the trailing [, 1] turns the one-column data frame into a vector. Note that NULL has to be quoted ("NULL") in colClasses.
<syntaxhighlight lang="rsplus">
x <- read.table("var_annot.vcf", colClasses = c(rep("NULL", 2), "character", rep("NULL", 7)),
                skip=62, header=T, stringsAsFactors = FALSE)[, 1]
#
system.time(x <- read.delim("Methylation450k.txt",
                colClasses = c("character", "numeric", rep("NULL", 188)), stringsAsFactors = FALSE))
</syntaxhighlight>
 
To know the number of columns, we might want to read the first row first.
<syntaxhighlight lang="rsplus">
library(magrittr)
scan("var_annot.vcf", sep="\t", what="character", skip=62, nlines=1, quiet=TRUE) %>% length()
</syntaxhighlight>
 
Another method is to use '''pipe()''', '''cut''' or '''awk'''. See [https://stackoverflow.com/questions/2193742/ways-to-read-only-select-columns-from-a-file-into-r-a-happy-medium-between-re ways to read only selected columns from a file into R]
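A self-contained sketch of the colClasses approach above, using a small temporary tab-delimited file as a stand-in for a wide file such as var_annot.vcf (the file contents are made up). The first line is read with scan() to count the columns, then every column except the 3rd is dropped with "NULL".
<syntaxhighlight lang='rsplus'>
path <- tempfile(fileext = ".txt")
writeLines(c("id\tchr\tgene\tscore\tnote",
             "1\tchr1\tTP53\t0.5\tx",
             "2\tchr2\tBRCA1\t0.9\ty"), path)

# Number of columns, from the first line only
ncols <- length(scan(path, what = "character", sep = "\t", nlines = 1, quiet = TRUE))
ncols                      # 5

# "NULL" (a quoted string) skips a column; keep column 3 as character
cls <- rep("NULL", ncols)
cls[3] <- "character"
genes <- read.delim(path, colClasses = cls)[, 1]   # 1-column data frame -> vector
genes                      # "TP53"  "BRCA1"
</syntaxhighlight>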
 
=== Serialization ===
To pass an R object to a C program (which reads it with recv()), we can first use writeBin() to output the size of the serialized stream and then use serialize() to output the stream itself. See the
[https://stat.ethz.ch/pipermail/r-devel/attachments/20130628/56473803/attachment.pl post] on the R mailing list.
<pre>
> a <- list(1,2,3)
> a_serial <- serialize(a, NULL)
> a_length <- length(a_serial)
> a_length
[1] 70
> writeBin(as.integer(a_length), connection, endian="big")  # 'connection': an open connection, e.g. from socketConnection()
> serialize(a, connection)
</pre>
In the C++ process, I first receive one int (4 bytes, big-endian, i.e. network byte order) to get the length, and
then read <length> bytes from the connection.
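A round-trip sketch of the same length-prefixed protocol, with a temporary file standing in for the socket. The reader does what the C++ side would do: read a 4-byte big-endian length, then that many bytes. (A C reader would typically convert the prefix with ntohl().)
<syntaxhighlight lang='rsplus'>
a <- list(1, 2, 3)
payload <- serialize(a, connection = NULL)    # raw vector
n <- length(payload)

# Writer: 4-byte big-endian length prefix, then the serialized bytes
path <- tempfile()
con <- file(path, open = "wb")
writeBin(as.integer(n), con, size = 4, endian = "big")
writeBin(payload, con)
close(con)

# Reader: length first, then exactly n_in bytes
con <- file(path, open = "rb")
n_in  <- readBin(con, what = "integer", size = 4, endian = "big")
bytes <- readBin(con, what = "raw", n = n_in)
close(con)

b <- unserialize(bytes)
identical(a, b)   # TRUE
</syntaxhighlight>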
 
=== socketConnection ===
See ?socketConnection.
 
==== Simple example ====
From the examples in ?socketConnection.

Open one R session (server)
<pre>
con1 <- socketConnection(port = 22131, server = TRUE) # wait until a connection from some client
writeLines(LETTERS, con1)
close(con1)
</pre>
 
Open another R session (client)
<pre>
con2 <- socketConnection(Sys.info()["nodename"], port = 22131)
# as non-blocking, may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
  Sys.sleep(1)
  z <- readLines(con2)
  if(length(z)) print(z)
}
close(con2)
</pre>
 
==== Use nc in client ====
 
The client does not have to be R; we can use telnet, nc, etc. See the post [https://stat.ethz.ch/pipermail/r-sig-hpc/2009-April/000144.html here]. For example, on the client machine, we can issue
<pre>
nc localhost 22131  [ENTER]
</pre>
Then the client will wait and show anything written from the server machine. The connection from nc will be terminated once close(con1) is given.
 
If I use the command
<pre>
nc -v -w 2 localhost -z 22130-22135
</pre>
then a connection is established only briefly (-z makes nc scan for listening ports without sending any data), so the cursor on the server machine is returned. If we issue the same nc command again on the client machine, it reports that the connection to port 22131 is refused. PS: the "-w" switch sets the timeout, in seconds, for connects and final net reads.
 
A related post I have not had a chance to read yet: http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html
 
==== Use curl command in client ====
On the server,
<pre>
con1 <- socketConnection(port = 8080, server = TRUE)
</pre>
 
On the client,
<pre>
curl --trace-ascii debugdump.txt http://localhost:8080/
</pre>
 
Then go to the server,
<pre>
while(nchar(x <- readLines(con1, 1)) > 0) cat(x, "\n")
 
close(con1) # return cursor in the client machine
</pre>
 
==== Use telnet command in client ====
On the server,
<pre>
con1 <- socketConnection(port = 8080, server = TRUE)
</pre>
 
On the client,
<pre>
sudo apt-get install telnet
telnet localhost 8080
abcdefg
hijklmn
qestst
</pre>
 
Go to the server,
<pre>
readLines(con1, 1)
readLines(con1, 1)
readLines(con1, 1)
close(con1) # return cursor in the client machine
</pre>
 
A [http://blog.gahooa.com/2009/01/23/basics-of-telnet-and-http/ tutorial] about using telnet to make HTTP requests, and [http://unixhelp.ed.ac.uk/tables/telnet_commands.html this] summary of telnet commands.
 
=== Subsetting ===
[http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Subset-assignment Subset assignment of R Language Definition] and [http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Manipulation-of-functions Manipulation of functions].
 
The result of the command '''x[3:5] <- 13:15''' is as if the following had been executed
<pre>
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
</pre>
 
==== Avoid Coercing Indices To Doubles ====
[https://www.jottr.org/2018/04/02/coercion-of-indices/ 1 or 1L]
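A few quick checks of which expressions give integers and which give doubles. A literal 1 is a double, while 1L, `:` and seq_len() give integers, so an index built from doubles has to be converted to integer positions internally.
<syntaxhighlight lang='rsplus'>
typeof(1)            # "double"
typeof(1L)           # "integer"
typeof(1:3)          # "integer"
typeof(c(1, 2))      # "double"
typeof(seq_len(3))   # "integer"

x <- c(10, 20, 30)
identical(x[2L], x[2])   # TRUE: same element either way
</syntaxhighlight>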
 
=== as.formula() ===
* [https://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame How to succinctly write a formula with many variables from a data frame?]
<syntaxhighlight lang='rsplus'>
? as.formula
xnam <- paste("x", 1:25, sep="")
fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+")))
</syntaxhighlight>
* [http://www.win-vector.com/blog/2018/09/r-tip-how-to-pass-a-formula-to-lm/ How to Pass A formula to lm], [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/bquote ?bquote], [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/eval ?eval]
<syntaxhighlight lang='rsplus'>
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")
 
# Method 1. as.formula() + paste(): fully parameterized, but the 'Call'
# portion of the model is reported as "formula = f"
f <- as.formula(
  paste(outcome,
        paste(variables, collapse = " + "),
        sep = " ~ "))
print(f)
# mpg ~ cyl + disp + hp + carb
 
model <- lm(f, data = mtcars)
print(model)
 
# Call:
#  lm(formula = f, data = mtcars)
#
# Coefficients:
#  (Intercept)          cyl        disp          hp        carb 
#    34.021595    -1.048523    -0.026906    0.009349    -0.926863 
 
# The formula can be recovered from the fitted model
format(terms(model))  #  or model$terms
# [1] "mpg ~ cyl + disp + hp + carb"

# Method 2. eval() + bquote() + ".()": .(f) splices the formula object into the call
model <- eval(bquote(  lm(.(f), data = mtcars)  ))
 
print(model)
# Call:
#  lm(formula = mpg ~ cyl + disp + hp + carb, data = mtcars)
#
# Coefficients:
#  (Intercept)          cyl        disp          hp        carb 
#    34.021595    -1.048523    -0.026906    0.009349    -0.926863 
 
# Note: if we skip the ".()" operator, the call shows "formula = f" again
eval(bquote(  lm(f, data = mtcars)  ))
# Call:
# lm(formula = f, data = mtcars)
#
# Coefficients:
# (Intercept)          cyl        disp          hp        carb 
#   34.021595    -1.048523    -0.026906    0.009349    -0.926863
</syntaxhighlight>
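As an alternative to pasting strings for as.formula(), base R's reformulate() builds the same formula from a character vector of term labels and a response name. A short comparison:
<syntaxhighlight lang='rsplus'>
xnam <- paste0("x", 1:3)

f1 <- as.formula(paste("y ~", paste(xnam, collapse = " + ")))
f2 <- reformulate(xnam, response = "y")   # no string pasting needed
f2                                        # y ~ x1 + x2 + x3
identical(deparse(f1), deparse(f2))       # TRUE

# Works with lm() like the pasted formula above
outcome <- "mpg"; variables <- c("cyl", "disp", "hp", "carb")
fit <- lm(reformulate(variables, response = outcome), data = mtcars)
coef(fit)
</syntaxhighlight>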
 
=== S3 and S4 methods ===
* How S4 works in R https://www.rdocumentation.org/packages/methods/versions/3.5.1/topics/Methods_Details
* Software for Data Analysis: Programming with R by John Chambers
* Programming with Data: A Guide to the S Language  by John Chambers
* https://www.rmetrics.org/files/Meielisalp2009/Presentations/Chalabi1.pdf
* https://www.stat.auckland.ac.nz/S-Workshop/Gentleman/S4Objects.pdf
* [http://cran.r-project.org/web/packages/packS4/index.html packS4: Toy Example of S4 Package]
* http://www.cyclismo.org/tutorial/R/s4Classes.html
* http://adv-r.had.co.nz/S4.html
 
To get the source code of S4 methods, we can use showMethods() and getMethod(). Note that getMethod() requires an exact signature match, while selectMethod() also follows inheritance. For example
<syntaxhighlight lang='rsplus'>
library(qrqc)
showMethods("gcPlot")
getMethod("gcPlot", "FASTQSummary") # get an error
showMethods("gcPlot", "FASTQSummary") # good.
</syntaxhighlight>
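A self-contained toy S4 example (the class, generic and method names are hypothetical) that shows what showMethods() and getMethod() report when the signature matches exactly:
<syntaxhighlight lang='rsplus'>
library(methods)

# Toy class, generic and method
setClass("Person", slots = c(name = "character", age = "numeric"))
setGeneric("describe", function(obj, ...) standardGeneric("describe"))
setMethod("describe", "Person", function(obj, ...) {
  paste0(obj@name, " (", obj@age, ")")
})

p <- new("Person", name = "Ada", age = 36)
describe(p)                          # "Ada (36)"

showMethods("describe")              # signatures that have methods
getMethod("describe", "Person")      # prints the method's source
existsMethod("describe", "Person")   # TRUE
</syntaxhighlight>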
 
* '''getClassDef()''' in S4 ([http://www.bioconductor.org/help/course-materials/2014/Epigenomics/BiocForSequenceAnalysis.html Bioconductor course]).
<syntaxhighlight lang='rsplus'>
library(IRanges)
ir <- IRanges(start=c(10, 20, 30), width=5)
ir
 
class(ir)
## [1] "IRanges"
## attr(,"package")
## [1] "IRanges"
 
getClassDef(class(ir))
## Class "IRanges" [package "IRanges"]
##
## Slots:
##                                                                     
## Name:            start          width          NAMES    elementType
## Class:        integer        integer characterORNULL      character
##                                     
## Name:  elementMetadata        metadata
## Class: DataTableORNULL            list
##
## Extends:
## Class "Ranges", directly
## Class "IntegerList", by class "Ranges", distance 2
## Class "RangesORmissing", by class "Ranges", distance 2
## Class "AtomicList", by class "Ranges", distance 3
## Class "List", by class "Ranges", distance 4
## Class "Vector", by class "Ranges", distance 5
## Class "Annotated", by class "Ranges", distance 6
##
## Known Subclasses: "NormalIRanges"
</syntaxhighlight>
 
==== See what methods work on an object ====
To see what methods work on an object, e.g. a GRanges object:
<syntaxhighlight lang='rsplus'>methods(class="GRanges")</syntaxhighlight> Or if you have an object, x: <syntaxhighlight lang='rsplus'>methods(class=class(x))</syntaxhighlight>
 
==== View S3 function definition: double colon '::' and triple colon ':::' operators ====
?":::"
 
* pkg::name returns the value of the exported variable name in namespace pkg
* pkg:::name returns the value of the internal variable name
 
<syntaxhighlight lang='rsplus'>
base::"+"
stats:::coef.default
</syntaxhighlight>
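Besides `:::`, utils provides getS3method() (look up a method by generic and class) and getAnywhere() (search all loaded namespaces). A short comparison on the same coef.default method:
<syntaxhighlight lang='rsplus'>
f1 <- stats:::coef.default                    # triple colon: internal object
f2 <- getS3method("coef", "default")          # look up by generic + class
f3 <- getAnywhere("coef.default")$objs[[1]]   # search all loaded namespaces

identical(f1, f2)   # TRUE
methods("coef")     # the coef methods that are available
</syntaxhighlight>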
 
==== mcols() and DataFrame() from Bioc [http://bioconductor.org/packages/release/bioc/html/S4Vectors.html S4Vectors] package ====
* mcols: Get or set the metadata columns.
* colData: Get or set the sample (column) data of SummarizedExperiment objects (originally in GenomicRanges, now in the SummarizedExperiment package).
* DataFrame: The DataFrame class extends the DataTable virtual class and supports the storage of any type of object (with length and [ methods) as columns.
 
For example, in [http://www-huber.embl.de/DESeq2paper/vignettes/posterior.pdf Shrinkage of logarithmic fold changes] vignette of the DESeq2paper package
<syntaxhighlight lang='rsplus'>
> mcols(ddsNoPrior[genes, ])
DataFrame with 2 rows and 21 columns
  baseMean  baseVar  allZero dispGeneEst    dispFit dispersion  dispIter dispOutlier  dispMAP
  <numeric> <numeric> <logical>  <numeric>  <numeric>  <numeric> <numeric>  <logical> <numeric>
1  163.5750  8904.607    FALSE  0.06263141 0.03862798  0.0577712        7      FALSE 0.0577712
2  175.3883 59643.515    FALSE  2.25306109 0.03807917  2.2530611        12        TRUE 1.6011440
  Intercept strain_DBA.2J_vs_C57BL.6J SE_Intercept SE_strain_DBA.2J_vs_C57BL.6J WaldStatistic_Intercept
  <numeric>                <numeric>    <numeric>                    <numeric>              <numeric>
1  6.210188                  1.735829    0.1229354                    0.1636645              50.515872
2  6.234880                  1.823173    0.6870629                    0.9481865                9.074686
  WaldStatistic_strain_DBA.2J_vs_C57BL.6J WaldPvalue_Intercept WaldPvalue_strain_DBA.2J_vs_C57BL.6J
                                <numeric>            <numeric>                            <numeric>
1                                10.60602        0.000000e+00                        2.793908e-26
2                                1.92280        1.140054e-19                        5.450522e-02
  betaConv  betaIter  deviance  maxCooks
  <logical> <numeric> <numeric> <numeric>
1      TRUE        3  210.4045 0.2648753
2      TRUE        9  243.7455 0.3248949
</syntaxhighlight>
 
=== findInterval() ===
Related functions are cut() and split(). See also
* [http://books.google.com/books?id=oKY5QeSWb4cC&pg=PT310&lpg=PT310&dq=r+findinterval3&source=bl&ots=YjNMkHrTMw&sig=y_wIA1um420xVCI5IoGivABge-s&hl=en&sa=X&ei=gm_yUrSqLKXesAS2_IGoBQ&ved=0CFIQ6AEwBTgo#v=onepage&q=r%20findinterval3&f=false R Graphs Cookbook]
* [http://adv-r.had.co.nz/Rcpp.html Hadley Wickham]
 
=== do.call, rbind, lapply ===
Lots of examples. See for example [https://stat.ethz.ch/pipermail/r-help/attachments/20140423/62d8d103/attachment.pl this one] for creating a data frame from a vector.
<syntaxhighlight lang='rsplus'>
x <- readLines(textConnection("---CLUSTER 1 ---
3
4
5
6
---CLUSTER 2 ---
9
10
8
11"))
 
# create a list of where the 'clusters' are
clust <- c(grep("CLUSTER", x), length(x) + 1L)
 
# get size of each cluster
clustSize <- diff(clust) - 1L
 
# get cluster number
clustNum <- gsub("[^0-9]+", "", x[grep("CLUSTER", x)])
 
result <- do.call(rbind, lapply(seq_along(clustNum), function(.cl){
    cbind(Object = x[seq(clust[.cl] + 1L, length = clustSize[.cl])]
        , Cluster = .cl
        )
    }))
 
result
#      Object Cluster
# [1,] "3"    "1"
# [2,] "4"    "1"
# [3,] "5"    "1"
# [4,] "6"    "1"
# [5,] "9"    "2"
# [6,] "10"   "2"
# [7,] "8"    "2"
# [8,] "11"   "2"
</syntaxhighlight>
 
A 2nd example is to [http://datascienceplus.com/working-with-data-frame-in-r/ sort a data frame] by using do.call(order, list()).
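A small sketch of the do.call(order, ...) idea: order() takes its sort keys through `...`, so a list of columns (a data frame subset is a list) can be passed with do.call(). The data frame is made up.
<syntaxhighlight lang='rsplus'>
df <- data.frame(g = c("b", "a", "b", "a"), v = c(2, 3, 1, 1))
keys <- c("g", "v")

sorted <- df[do.call(order, df[keys]), ]
sorted
#   g v
# 4 a 1
# 2 a 3
# 3 b 1
# 1 b 2

# Extra order() arguments can go into the same list
df[do.call(order, c(df[keys], decreasing = TRUE)), ]
</syntaxhighlight>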
 
=== How to get examples from help file ===
See [https://stat.ethz.ch/pipermail/r-help/2014-April/369342.html this post].
Method 1:
<pre>
example(acf, give.lines=TRUE)
</pre>
Method 2:
<pre>
Rd <- utils:::.getHelpFile(?acf)
tools::Rd2ex(Rd)
</pre>
 
=== "[" and "[[" with the sapply() function ===
Suppose we want to extract the part of an ID such as "ABC-123-XYZ" before the first hyphen.
<pre>
sapply(strsplit("ABC-123-XYZ", "-"), "[", 1)
</pre>
is the same as
<pre>
sapply(strsplit("ABC-123-XYZ", "-"), function(x) x[1])
</pre>
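The same trick on several IDs at once, together with a type-checked vapply() version using "[[" and a regular-expression alternative that skips strsplit(). The IDs are made up.
<syntaxhighlight lang='rsplus'>
ids <- c("ABC-123-XYZ", "DE-45-Q", "F-6-RS")
parts <- strsplit(ids, "-", fixed = TRUE)   # a list of character vectors

sapply(parts, "[", 1)                   # "ABC" "DE"  "F"
sapply(parts, "[", 2)                   # "123" "45"  "6"
vapply(parts, `[[`, character(1), 1)    # same, but the result type is checked

# Without strsplit(): drop everything from the first hyphen on
sub("-.*", "", ids)                     # "ABC" "DE"  "F"
</syntaxhighlight>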
 
=== Dealing with date ===
<pre>
d1 = date()
class(d1) # "character"
d2 = Sys.Date()
class(d2) # "Date"
 
format(d2, "%a %b %d")
 
library(lubridate); ymd("20140108") # "2014-01-08" (a Date; older lubridate versions returned POSIXct "2014-01-08 UTC")
mdy("08/04/2013") # "2013-08-04"
dmy("03-04-2013") # "2013-04-03"
ymd_hms("2011-08-03 10:15:03") # "2011-08-03 10:15:03 UTC"
ymd_hms("2011-08-03 10:15:03", tz="Pacific/Auckland")
# "2011-08-03 10:15:03 NZST"
?Sys.timezone
x = dmy(c("1jan2013", "2jan2013", "31mar2013", "30jul2013"))
wday(x[1]) # 3
wday(x[1], label=TRUE) # Tues
</pre>
* http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
* http://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
* http://rpubs.com/seandavi/GEOMetadbSurvey2014
* We want our dates and times as class "Date" or the class "POSIXct", "POSIXlt". For more information type ?POSIXlt.
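The same kinds of operations with base R only (no lubridate), for comparison. The weekday is taken from POSIXlt's `wday` field (0 = Sunday) because weekdays() returns locale-dependent names.
<syntaxhighlight lang='rsplus'>
d <- as.Date("2014-01-08")                        # ISO format needs no 'format'
as.Date("08/04/2013", format = "%m/%d/%Y")        # "2013-08-04"
format(d, "%Y/%m/%d")                             # "2014/01/08"

# Differences between dates
as.numeric(as.Date("2013-08-04") - as.Date("2013-01-01"))   # 215 (days)
difftime(as.Date("2013-08-04"), as.Date("2013-01-01"), units = "weeks")

# Day of week as a number (0 = Sunday), independent of the locale
as.POSIXlt(as.Date("2013-01-01"))$wday            # 2 (a Tuesday)

# Date-times: POSIXct with an explicit time zone
t1 <- as.POSIXct("2011-08-03 10:15:03", tz = "UTC")
format(t1, tz = "Pacific/Auckland", usetz = TRUE) # "2011-08-03 22:15:03 NZST"
</syntaxhighlight>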
 
=== [http://adv-r.had.co.nz/Computing-on-the-language.html Nonstandard evaluation] ===
* [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/substitute substitute(expr, env)] - capture expression.
** substitute() is often paired with deparse() to create informative labels for data sets and plots.
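** Use substitute() to show a variable's name, rather than its value, in a plot title, e.g.: '''var <- rnorm(100); hist(var, main=substitute(paste("Dist of ", var)))''' shows the title "Dist of var" (at the top level, substitute() leaves the symbol var as is and plotmath renders paste()).
* quote(expr) - returns its argument unevaluated. Unlike substitute(), it never replaces variables by their values from an environment.
* eval(expr, envir), evalq(expr, envir) - eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this, so evalq(expr, envir) is equivalent to eval(quote(expr), envir).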
* deparse(expr) - turns unevaluated expressions into character strings. For example,
<pre>
> deparse(args(lm))
[1] "function (formula, data, subset, weights, na.action, method = \"qr\", "
[2] "    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, "
[3] "    contrasts = NULL, offset, ...) "                                   
[4] "NULL"   
 
> deparse(args(lm), width=20)
[1] "function (formula, data, "        "    subset, weights, "         
[3] "    na.action, method = \"qr\", " "    model = TRUE, x = FALSE, " 
[5] "    y = FALSE, qr = TRUE, "      "    singular.ok = TRUE, "       
[7] "    contrasts = NULL, "          "    offset, ...) "             
[9] "NULL"
</pre>
* parse(text) - returns the parsed but unevaluated expressions in a list. See [[R#Create_a_Simple_Socket_Server_in_R|Create a Simple Socket Server in R]] for the application of '''eval(parse(text))'''. Be cautious!
** [http://r.789695.n4.nabble.com/using-eval-parse-paste-in-a-loop-td849207.html eval(parse...)) should generally be avoided]
** [https://stackoverflow.com/questions/13649979/what-specifically-are-the-dangers-of-evalparse What specifically are the dangers of eval(parse(…))?]
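A short sketch of these functions together. `arg_label` and `hist_named` are hypothetical helpers; inside a function, deparse(substitute(x)) turns the expression passed as x into a string, which is the usual way to build labels.
<syntaxhighlight lang='rsplus'>
# Inside a function, substitute(x) returns the expression passed as 'x'
arg_label <- function(x) deparse(substitute(x))
mydata <- c(2, 5, 3)
arg_label(mydata)              # "mydata"
arg_label(log(mydata + 1))     # "log(mydata + 1)"

# Label a histogram with the name of whatever was passed in
hist_named <- function(x) hist(x, main = paste("Distribution of", deparse(substitute(x))))
hist_named(mydata)

# quote() captures an expression; eval() evaluates it, here using columns of mtcars
e <- quote(mpg / wt)
head(eval(e, mtcars), 3)

# parse(text = ) turns a string into an unevaluated expression
eval(parse(text = "1 + 2"))    # 3
</syntaxhighlight>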
 
Following is another example. Assume we have a bunch of functions (f1, f2, ...; each implementing a different algorithm) with the same input argument format (e.g. a1, a2). We would like to run these functions on the same data (to compare their performance).
<syntaxhighlight lang='rsplus'>
f1 <- function(x) x+1; f2 <- function(x) x+2; f3 <- function(x) x+3
 
f1(1:3)
f2(1:3)
f3(1:3)
 
# Or
myfun <- function(f, a) {
    eval(parse(text = f))(a)
}
myfun("f1", 1:3)
myfun("f2", 1:3)
myfun("f3", 1:3)
 
# Or with lapply
method <- c("f1", "f2", "f3")
res <- lapply(method, function(M) {
                    Mres <- eval(parse(text = M))(1:3)
                    return(Mres)
})
names(res) <- method
</syntaxhighlight>
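Given the warnings about eval(parse(...)) above, the same loop can be written without parsing strings: match.fun() looks a function up by name, or the functions themselves can be stored in a named list.
<syntaxhighlight lang='rsplus'>
f1 <- function(x) x+1; f2 <- function(x) x+2; f3 <- function(x) x+3
method <- c("f1", "f2", "f3")

# Look the function up by name; no parse()/eval() needed
res <- lapply(method, function(M) match.fun(M)(1:3))
names(res) <- method
res$f3                 # 4 5 6

# Or keep the functions in a named list and call them directly
funs <- list(f1 = f1, f2 = f2, f3 = f3)
res2 <- lapply(funs, function(f) f(1:3))
identical(res, res2)   # TRUE
</syntaxhighlight>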
 
=== The ‘…’ argument ===
See [http://cran.r-project.org/doc/manuals/R-intro.html#The-three-dots-argument Section 10.4 of An Introduction to R]. In particular, the expression '''list(...)''' evaluates all such arguments and returns them in a named list.
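A small illustration of list(...) and of forwarding `...` to another function; `show_dots` and `my_mean` are hypothetical names.
<syntaxhighlight lang='rsplus'>
show_dots <- function(...) {
  args <- list(...)            # evaluates every argument passed through '...'
  cat("number of arguments:", length(args), "\n")
  names(args)                  # "" for unnamed arguments
}
show_dots(1, b = "two", c = 3:4)
# number of arguments: 3
# [1] ""  "b" "c"

# Forwarding '...' to another function
my_mean <- function(x, ...) mean(x, ...)
my_mean(c(1, NA, 3), na.rm = TRUE)   # 2
</syntaxhighlight>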
 
=== Lazy evaluation in R functions arguments ===
* http://adv-r.had.co.nz/Functions.html
* https://stat.ethz.ch/pipermail/r-devel/2015-February/070688.html
 
'''R function arguments are lazy — they’re only evaluated if they’re actually used'''.
 
* Example 1. By default, R function arguments are lazy.
<pre>
f <- function(x) {
  999
}
f(stop("This is an error!"))
#> [1] 999
</pre>
 
* Example 2. If you want to ensure that an argument is evaluated you can use '''force()'''.
<pre>
add <- function(x) {
  force(x)
  function(y) x + y
}
adders2 <- lapply(1:10, add)
adders2[[1]](10)
#> [1] 11
adders2[[10]](10)
#> [1] 20
</pre>
 
* Example 3. Default arguments are evaluated inside the function.
<pre>
f <- function(x = ls()) {
  a <- 1
  x
}
 
# ls() evaluated inside f:
f()
# [1] "a" "x"
 
# ls() evaluated in global environment (output depends on its contents):
f(ls())
# e.g. [1] "add"     "adders2" "f"
</pre>
 
* Example 4. Laziness is useful in if statements: below, x > 0 is evaluated only if !is.null(x) is TRUE, so no error occurs when x is NULL.
<pre>
x <- NULL
if (!is.null(x) && x > 0) {
  # not reached when x is NULL
}
</pre>
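A minimal sketch of why force() matters in Example 2: without it, the argument stays an unevaluated promise until the returned function is first called, so a change to the caller's variable in the meantime leaks in. The function names are hypothetical.
<syntaxhighlight lang='rsplus'>
make_adder        <- function(n) function(y) n + y              # 'n' is still a promise
make_adder_forced <- function(n) { force(n); function(y) n + y }

k  <- 1
a1 <- make_adder(k)
a2 <- make_adder_forced(k)
k  <- 10          # change k before a1 is ever called

a1(1)   # 11: the promise for n is evaluated now, when k is already 10
a2(1)   # 2:  n was fixed to 1 when force() ran
</syntaxhighlight>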
 
=== Backtick sign, infix/prefix/postfix operators ===
The backtick sign ` (not the single quote) refers to functions or variables that have otherwise reserved or illegal names; e.g. '&&', '+', '(', 'for', 'if', etc. See some examples in [http://adv-r.had.co.nz/Functions.html this note].
 
'''[http://en.wikipedia.org/wiki/Infix_notation infix]''' vs. prefix and postfix notation (only the infix line below is valid R syntax):
<pre>
1 + 2    # infix
+ 1 2    # prefix
1 2 +    # postfix
</pre>
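In R, an operator can still be called in prefix (function) form by quoting its name with backticks, and backticks also allow otherwise illegal variable names. User-defined infix operators must be named `%...%`; `%+%` below is a hypothetical example.
<syntaxhighlight lang='rsplus'>
`+`(1, 2)            # 3: the infix operator called in prefix form
sapply(1:3, `-`)     # -1 -2 -3 (unary minus applied to each element)

`my var` <- 5        # a syntactically invalid name, usable with backticks
`my var` * 2         # 10

`%+%` <- function(a, b) paste(a, b)
"R" %+% "rocks"      # "R rocks"
</syntaxhighlight>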
 
=== List data type ===
==== [http://adv-r.had.co.nz/Functions.html Calling a function given a list of arguments] ====
<pre>
> args <- list(c(1:10, NA, NA), na.rm = TRUE)
> do.call(mean, args)
[1] 5.5
> mean(c(1:10, NA, NA), na.rm = TRUE)
[1] 5.5
</pre>
 
=== Error handling and exceptions ===
* http://adv-r.had.co.nz/Exceptions-Debugging.html
* try() allows execution to continue even after an error has occurred. You can suppress the message with try(..., silent = TRUE).
<pre>
out <- try({
  a <- 1
  b <- "x"
  a + b
})
 
elements <- list(1:10, c(-1, 10), c(T, F), letters)
results <- lapply(elements, function(x) try(log(x), silent = TRUE))  # log(letters) fails
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
</pre>
* tryCatch(): With tryCatch() you map conditions to handlers (like switch()), named functions that are called with the condition as an input. Note that try() is a simplified version of tryCatch().
<pre>
tryCatch(expr, ..., finally)
 
show_condition <- function(code) {
  tryCatch(code,
    error = function(c) "error",
    warning = function(c) "warning",
    message = function(c) "message"
  )
}
show_condition(stop("!"))
#> [1] "error"
show_condition(warning("?!"))
#> [1] "warning"
show_condition(message("?"))
#> [1] "message"
show_condition(10)
#> [1] 10
</pre>
Below is another snippet, from the available.packages() function:
<pre>
z <- tryCatch(download.file(....), error = identity)
if (!inherits(z, "error")) STATEMENTS
</pre>
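A small sketch that combines the pieces above: separate handlers for warnings and errors, conditionMessage() to get the text of the condition, and a `finally` clause that always runs. `safe_log` is a hypothetical name.
<syntaxhighlight lang='rsplus'>
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {
      message("warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("error caught: ", conditionMessage(e))
      NA_real_
    },
    finally = message("finished")   # runs whether or not a condition occurred
  )
}

safe_log(10)    # 2.302585
safe_log(-1)    # NA (log(-1) warns "NaNs produced")
safe_log("a")   # NA (non-numeric argument error)
</syntaxhighlight>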
 
=== Using list type ===
==== Avoid if-else or switch ====
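From the examples in ?plot.stepfun: inside the loop, the step functions are picked from a list with [[i]] instead of with if-else or switch().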
<pre>
y0 <- c(1,2,4,3)
sfun0  <- stepfun(1:3, y0, f = 0)
sfun.2 <- stepfun(1:3, y0, f = .2)
sfun1  <- stepfun(1:3, y0, right = TRUE)
 
plot(sfun0, xlim = c(0, 5), ylim = c(0, 3.5))  # lines() needs an existing plot
for(i in 1:3)
  lines(list(sfun0, sfun.2, stepfun(1:3, y0, f = 1))[[i]], col = i)
legend(2.5, 1.9, paste("f =", c(0, 0.2, 1)), col = 1:3, lty = 1, y.intersp = 1)
</pre>
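The same idea without graphics: keep the alternatives in a named list and index it, instead of branching on a method name. The choice of summaries is only an illustration.
<syntaxhighlight lang='rsplus'>
x <- c(1, 2, 10)

summaries <- list(mean = mean, median = median, max = max)
summaries[["median"]](x)              # 2, picked by name instead of if/else

sapply(summaries, function(f) f(x))   # all at once: mean 4.333, median 2, max 10
</syntaxhighlight>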
 
=== Open a new Window device ===
dev.new() (portable), or the platform-specific X11() (Unix-alikes), windows() (Windows) and quartz() (macOS).
 
=== par() ===
?par
 
==== layout ====
http://datascienceplus.com/adding-text-to-r-plot/
 
==== reset the settings ====
<syntaxhighlight lang='rsplus'>
op <- par(mfrow=c(2,1), mar = c(5,7,4,2) + 0.1)
....
par(op) # restore the previous settings, e.g. mfrow=c(1,1), mar = c(5,4,4,2) + .1
</syntaxhighlight>
 
==== mtext (margin text) vs title ====
* https://datascienceplus.com/adding-text-to-r-plot/
* https://datascienceplus.com/mastering-r-plot-part-2-axis/
 
==== mgp (axis label locations) ====
# The margin line (in ‘mex’ units) for the axis title, axis labels and axis line.  Note that ‘mgp[1]’ affects ‘title’ whereas ‘mgp[2:3]’ affect ‘axis’.  The default is ‘c(3, 1, 0)’. To move the axis titles closer to the axes, we can use e.g. mgp=c(2.3, 1, 0).
# http://rfunction.com/archives/1302 mgp – A numeric vector of length 3, which sets the axis label locations relative to the edge of the inner plot window. The first value represents the location of the labels (i.e. xlab and ylab in plot), the second the tick-mark labels, and third the tick marks. The default is c(3, 1, 0).
 
==== pch ====
[[File:R pch.png|250px]]
 
([https://www.statmethods.net/advgraphs/parameters.html figure source])
==== lty (line type) ====
[[File:R lty.png|250px]]
 
([http://www.sthda.com/english/wiki/line-types-in-r-lty figure source])
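A rough sketch for reproducing the two charts above in base graphics: the 26 point symbols (pch 21 to 25 also take a fill colour via `bg`) and line types 1 to 6. The layout values are arbitrary.
<syntaxhighlight lang='rsplus'>
op <- par(mfrow = c(1, 2), mar = c(2, 2, 2, 1))

# pch 0-25 on a 6-column grid, each labelled with its number
plot(0:25 %% 6, 0:25 %/% 6, pch = 0:25, bg = "grey", cex = 1.5,
     xlim = c(-0.5, 5.5), ylim = c(-0.5, 4.5), axes = FALSE, main = "pch")
text(0:25 %% 6, 0:25 %/% 6, labels = 0:25, pos = 3, cex = 0.7)

# lty 1-6: solid, dashed, dotted, dotdash, longdash, twodash (0 = blank)
plot(NULL, xlim = c(0, 1), ylim = c(0.5, 6.5), axes = FALSE, main = "lty")
for (i in 1:6) lines(c(0, 1), c(i, i), lty = i, lwd = 2)
axis(2, at = 1:6, las = 1)

par(op)
</syntaxhighlight>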
 
==== las (label style) ====
* 0: The default, parallel to the axis
* 1: Always horizontal
* 2: Perpendicular to the axis
* 3: Always vertical
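A quick way to see the four `las` styles side by side:
<syntaxhighlight lang='rsplus'>
op <- par(mfrow = c(2, 2))
for (l in 0:3) {
  plot(1:10, las = l, main = paste("las =", l))   # tick labels rotated per 'las'
}
par(op)
</syntaxhighlight>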
 
==== oma (outer margin) ====
The following trick is useful when we want to draw multiple plots with a common title.
 
<syntaxhighlight lang='rsplus'>
par(mfrow=c(1,2),oma = c(0, 0, 2, 0))  # oma=c(0, 0, 0, 0) by default
plot(1:10,  main="Plot 1")
plot(1:100,  main="Plot 2")
mtext("Title for Two Plots", outer = TRUE, cex = 1.5) # outer=FALSE by default
</syntaxhighlight>
 
[https://datascienceplus.com/mastering-r-plot-part-3-outer-margins/ Mastering R plot – Part 3: Outer margins] '''mtext()''' & '''par(xpd)'''.
 
=== Suppress warnings ===
Use [https://www.rdocumentation.org/packages/base/versions/3.4.1/topics/options options()]. If ''warn'' is negative all warnings are ignored. If ''warn'' is zero (the default) warnings are stored until the top-level function returns. If ''warn'' is one, warnings are printed as they occur; if it is two or larger, warnings are turned into errors.
<syntaxhighlight lang='rsplus'>
op <- options("warn")
options(warn = -1)
....
options(op)
 
# OR
warnLevel <- options()$warn
options(warn = -1)
...
options(warn = warnLevel)
</syntaxhighlight>
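For a single expression, suppressWarnings() is usually simpler than changing options(). Conversely, options(warn = 2) turns warnings into errors, which helps to locate where a warning comes from:
<syntaxhighlight lang='rsplus'>
# Suppress warnings for one expression only
x <- suppressWarnings(as.numeric(c("1", "two", "3")))
x        # 1 NA 3

# Turn warnings into errors while debugging, then restore the old setting
op <- options(warn = 2)
res <- tryCatch(as.numeric("two"), error = function(e) conditionMessage(e))
options(op)
res      # "(converted from warning) NAs introduced by coercion"
</syntaxhighlight>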
 
=== NULL, NA, NaN, Inf ===
https://tomaztsql.wordpress.com/2018/07/04/r-null-values-null-na-nan-inf/
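A compact summary of how the predicate functions treat these special values:
<syntaxhighlight lang='rsplus'>
x <- c(1, NA, NaN, Inf, -Inf)

is.na(x)        # FALSE  TRUE  TRUE FALSE FALSE  (NaN also counts as NA)
is.nan(x)       # FALSE FALSE  TRUE FALSE FALSE
is.finite(x)    #  TRUE FALSE FALSE FALSE FALSE
is.infinite(x)  # FALSE FALSE FALSE  TRUE  TRUE

length(NULL)               # 0: NULL is the empty object, not a missing value
length(c(1, NULL, 2))      # 2: NULL disappears when combined
is.null(list(a = NULL)$a)  # TRUE
0 / 0                      # NaN
1 / 0                      # Inf
</syntaxhighlight>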
 
=== save() vs saveRDS() ===
# saveRDS() can only save one R object, while save() can save several.
# saveRDS() doesn’t save both the object and its name; it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized. See [http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/ this post].
<pre>
x <- 5
saveRDS(x, "myfile.rds")
x2 <- readRDS("myfile.rds")
identical(x, x2)  # TRUE
</pre>
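A side-by-side sketch of the two approaches. Loading into a new environment keeps load() from silently overwriting objects of the same name in the workspace; the file names are temporary.
<syntaxhighlight lang='rsplus'>
a <- 1:3; b <- letters[1:2]

# save(): several objects, restored under their original names
f_rdata <- tempfile(fileext = ".RData")
save(a, b, file = f_rdata)
e <- new.env()
loaded <- load(f_rdata, envir = e)   # returns the names of the restored objects
loaded                               # "a" "b"

# saveRDS(): exactly one object, assigned to whatever name you choose
f_rds <- tempfile(fileext = ".rds")
saveRDS(b, f_rds)
b_copy <- readRDS(f_rds)
identical(b, b_copy)                 # TRUE
</syntaxhighlight>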
 
=== [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/all.equal ==, all.equal(), identical()] ===
* ==: exact match
* all.equal: compare R objects x and y testing ‘near equality’
* identical: The safe and reliable way to test two objects for being exactly equal.
<syntaxhighlight lang='rsplus'>
x <- 1.0; y <- 0.99999999999
all.equal(x, y)
# [1] TRUE
identical(x, y)
# [1] FALSE
</syntaxhighlight>
 
See also the [http://cran.r-project.org/web/packages/testthat/index.html testthat] package.
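One pitfall with all.equal(): when the objects differ it returns a character string describing the difference, not FALSE, so wrap it in isTRUE() inside if() conditions.
<syntaxhighlight lang='rsplus'>
0.1 + 0.2 == 0.3                    # FALSE: binary floating point
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within the default tolerance

all.equal(1, 1.1)                   # "Mean relative difference: 0.1"
isTRUE(all.equal(1, 1.1))           # FALSE

identical(1L, 1)                    # FALSE: integer vs double
1L == 1                             # TRUE
</syntaxhighlight>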
 
=== Numerical Pitfall ===
[http://bayesfactor.blogspot.com/2016/05/numerical-pitfalls-in-computing-variance.html Numerical pitfalls in computing variance]
<syntaxhighlight lang='rsplus'>
.1 - .3/3
## [1] 1.387779e-17
</syntaxhighlight>
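A small illustration of the variance pitfall from the linked post: with a large offset and a small spread, the one-pass textbook formula based on mean(x^2) - mean(x)^2 loses all accuracy, while var() (which centres the data first) does not. The numbers are chosen only to make the effect obvious.
<syntaxhighlight lang='rsplus'>
x <- 1e9 + c(1, 2, 3)      # large offset, small spread

var(x)                     # 1

# One-pass formula n/(n-1) * (mean(x^2) - mean(x)^2)
n <- length(x)
naive <- n / (n - 1) * (mean(x^2) - mean(x)^2)
naive                      # not 1: x^2 is about 1e18, where adjacent doubles are 128 apart
</syntaxhighlight>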
 
=== Sys.getpid() ===
This can be used to monitor R process memory usage or stop the R process. See [https://stat.ethz.ch/pipermail/r-devel/2016-November/073360.html this post].
 
=== How to debug an R code ===
==== Using assign() in functions ====
For example, insert the following line to your function
<pre>
assign(envir=globalenv(), "GlobalVar", localvar)
</pre>
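A minimal sketch of this trick in a hypothetical function: the intermediate value stays available in the global environment after the function returns.
<syntaxhighlight lang='rsplus'>
f <- function(x) {
  localvar <- x * 2
  # Debugging aid: copy the local value to the global environment
  assign("GlobalVar", localvar, envir = globalenv())
  localvar + 1
}
f(5)          # 11
GlobalVar     # 10: inspect the intermediate value after f() has returned

# Alternative: step through f() interactively
# debugonce(f); f(5)
</syntaxhighlight>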
 
=== Debug lapply()/sapply() ===
* https://stackoverflow.com/questions/1395622/debugging-lapply-sapply-calls
* https://stat.ethz.ch/R-manual/R-devel/library/utils/html/recover.html. options(error=recover) lets you browse the call frames when an error occurs; use options(error=NULL) to turn it off.
 
=== Debugging with RStudio ===
* https://www.rstudio.com/resources/videos/debugging-techniques-in-rstudio/
* https://github.com/ajmcoqui/debuggingRStudio/blob/master/RStudio_Debugging_Cheatsheet.pdf
* https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio
 
=== Debug R source code ===
==== Build R with debug information ====
* [[R#Build_R_from_its_source|R -> Build R from its source on Windows]]
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/gdb.shtml
* [https://github.com/arraytools/r-debug My note of debugging cor() function]
 
==== .Call ====
* [https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Calling-_002eCall Writing R Extensions] manual.
 
==== Registering native routines ====
https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Registering-native-routines
 
Pay attention to the prefix argument '''.fixes''' (eg .fixes = "C_") in '''useDynLib()''' function in the NAMESPACE file.
 
==== Example of debugging cor() function ====
Note that R's cor() function calls the C routine C_cor via .Call().
<pre>
stats::cor
....
.Call(C_cor, x, y, na.method, method == "kendall")
</pre>
 
A step-by-step screenshot of debugging using the GNU debugger '''gdb''' can be found on my Github repository https://github.com/arraytools/r-debug.
 
=== Locale bug (grep did not handle UTF-8 properly PR#16264) ===
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16264
 
=== Path length in dir.create() (PR#17206) ===
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17206 (Windows only)
 
=== install.package() error, R_LIBS_USER is empty in R 3.4.1 ===
* https://support.rstudio.com/hc/en-us/community/posts/115008369408-Since-update-to-R-3-4-1-R-LIBS-USER-is-empty and http://r.789695.n4.nabble.com/R-LIBS-USER-on-Ubuntu-16-04-td4740935.html. Modify '''/etc/R/Renviron''' (if you have sudo rights) by uncommenting line 43.
<pre>
R_LIBS_USER=${R_LIBS_USER-'~/R/x86_64-pc-linux-gnu-library/3.4'}
</pre>
* https://stackoverflow.com/questions/44873972/default-r-personal-library-location-is-null. Modify '''$HOME/.Renviron''' by adding a line
<pre>
R_LIBS_USER="${HOME}/R/${R_PLATFORM}-library/3.4"
</pre>
* http://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html. Alternatively, change the library search path with .libPaths().
 
On Mac & R 3.4.0 (it's fine)
<syntaxhighlight lang='rsplus'>
> Sys.getenv("R_LIBS_USER")
[1] "~/Library/R/3.4/library"
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"
</syntaxhighlight>
 
On Linux & R 3.3.1 (ARM)
<syntaxhighlight lang='rsplus'>
> Sys.getenv("R_LIBS_USER")
[1] "~/R/armv7l-unknown-linux-gnueabihf-library/3.3"
> .libPaths()
[1] "/home/$USER/R/armv7l-unknown-linux-gnueabihf-library/3.3"
[2] "/usr/local/lib/R/library"
</syntaxhighlight>
 
On Linux & R 3.4.1 (*Problem*)
<syntaxhighlight lang='rsplus'>
> Sys.getenv("R_LIBS_USER")
[1] ""
> .libPaths()
[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"
</syntaxhighlight>
 
I need to specify the '''lib''' parameter when I use the '''install.packages''' command.
<syntaxhighlight lang='rsplus'>
> install.packages("devtools", "~/R/x86_64-pc-linux-gnu-library/3.4")
> library(devtools)
Error in library(devtools) : there is no package called 'devtools'
 
# Specifying lib.loc does not help with the dependency packages (withr is still not found)
> library(devtools, lib.loc = "~/R/x86_64-pc-linux-gnu-library/3.4")
Error: package or namespace load failed for 'devtools':
.onLoad failed in loadNamespace() for 'devtools', details:
  call: loadNamespace(name)
  error: there is no package called 'withr'
 
# A solution is to redefine .libPaths
> .libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
> library(devtools) # Works
</syntaxhighlight>
 
A better solution is to specify R_LIBS_USER in '''~/.Renviron''' file or '''~/.bash_profile'''; see [http://stat.ethz.ch/R-manual/R-patched/library/base/html/Startup.html ?Startup].
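For the current session only, the library can also be created and put first on the search path from within R. This is a minimal sketch; the helper name prepend_lib() is hypothetical, and the demo uses a temporary directory so it does not touch your real personal library.

<syntaxhighlight lang='rsplus'>
# Hypothetical helper: create a library directory if needed and search it first
prepend_lib <- function(lib) {
  lib <- path.expand(lib)
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib, .libPaths()))  # .libPaths() keeps only existing directories
  invisible(.libPaths())
}

# In practice you would pass e.g. "~/R/x86_64-pc-linux-gnu-library/3.4"
demo_lib <- file.path(tempdir(), "mylib")
prepend_lib(demo_lib)
.libPaths()[1]
</syntaxhighlight>

The permanent fix is still the R_LIBS_USER line in ~/.Renviron; this only affects the running session.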
 
=== Using external data from within another package ===
https://logfc.wordpress.com/2017/03/02/using-external-data-from-within-another-package/
 
=== How to exit a sourced R script ===
* [http://stackoverflow.com/questions/25313406/how-to-exit-a-sourced-r-script How to exit a sourced R script]
* [http://r.789695.n4.nabble.com/Problem-using-the-source-function-within-R-functions-td907180.html Problem using the source-function within R-functions] ''' ''The best way to handle the generic sort of problem you are describing is to take those source'd files, and rewrite their content as functions to be called from your other functions.'' '''
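Following that advice, here is a minimal sketch: wrap the script body in a function so that return() can stop it early without quitting R. The function name run_step() is hypothetical.

<syntaxhighlight lang='rsplus'>
run_step <- function(x) {
  if (!is.numeric(x)) {
    message("x is not numeric; skipping the rest of this step")
    return(invisible(NULL))  # exits the function, not the whole R session
  }
  sqrt(x)
}

run_step(16)
run_step("a")
</syntaxhighlight>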
 
=== Decimal point & decimal comma ===
Countries using Arabic numerals with a decimal comma include Austria, Belgium, Brazil, France, Germany, the Netherlands, Norway, South Africa, Spain, Sweden, ... See https://en.wikipedia.org/wiki/Decimal_mark
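In R, data written with a decimal comma can be read with read.csv2(), whose defaults are sep = ";" and dec = ","; format() can write numbers with a decimal comma. A small example with inline data:

<syntaxhighlight lang='rsplus'>
txt <- "x;y\n1,5;2,25\n3,75;4"
d <- read.csv2(text = txt)          # sep = ";" and dec = "," by default
d$x + d$y                           # the columns were parsed as numbers
format(1234.5, decimal.mark = ",")  # write with a decimal comma
</syntaxhighlight>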
 
=== setting seed locally (not globally) in R ===
https://stackoverflow.com/questions/14324096/setting-seed-locally-not-globally-in-r
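One way to do this in base R is to save .Random.seed, set the seed, evaluate the code, and restore the old state on exit. This is a minimal sketch (with_local_seed() is a hypothetical name); the withr package also provides a ready-made with_seed().

<syntaxhighlight lang='rsplus'>
with_local_seed <- function(seed, expr) {
  # Save the global RNG state; it may not exist yet in a fresh session
  old_seed <- if (exists(".Random.seed", envir = globalenv(), inherits = FALSE))
    get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (is.null(old_seed)) {
      rm(".Random.seed", envir = globalenv())
    } else {
      assign(".Random.seed", old_seed, envir = globalenv())
    }
  })
  set.seed(seed)
  expr  # lazily evaluated here, after the local seed is set
}

set.seed(1)
draws <- with_local_seed(42, runif(3))  # reproducible draws
next_draw <- runif(1)                   # continues the stream started by set.seed(1)
</syntaxhighlight>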


=== R's internal C API ===
https://github.com/hadley/r-internals

== License ==
* [http://www.win-vector.com/blog/2019/07/some-notes-on-gnu-licenses-in-r-packages/ Some Notes on GNU Licenses in R Packages]
* [https://moderndata.plot.ly/why-dash-uses-the-mit-license/ Why Dash uses the mit license (and not a copyleft gpl license)]

=== Random numbers: multivariate normal ===
Why does [https://www.rdocumentation.org/packages/MASS/versions/7.3-49/topics/mvrnorm MASS::mvrnorm()] give different results on Mac and on Linux/Windows?


The reason could be the decomposition of the covariance matrix, which may depend on the LAPACK/BLAS libraries in use. See
* https://stackoverflow.com/questions/11567613/different-random-number-generation-between-os
* https://stats.stackexchange.com/questions/149321/generating-and-working-with-random-vectors-in-r
* [https://stats.stackexchange.com/questions/61719/cholesky-versus-eigendecomposition-for-drawing-samples-from-a-multivariate-norma Cholesky versus eigendecomposition for drawing samples from a multivariate normal distribution]
== Interview questions ==
* Does R store matrices in column-major order or row-major order?
** Matrices are stored in column-major order, which means that elements are arranged and accessed by columns. This is in contrast to, for example, NumPy arrays in Python, which are stored in row-major (C) order by default.
<syntaxhighlight lang='rsplus'>
set.seed(1234)
junk <- biospear::simdata(n=500, p=500, q.main = 10, q.inter = 10,
                          prob.tt = .5, m0=1, alpha.tt= -.5,
                          beta.main= -.5, beta.inter= -.5, b.corr = .7, b.corr.by=25,
                          wei.shape = 1, recr=3, fu=2, timefactor=1)
## Method 1: MASS::mvrnorm()
## This is what simdata() uses. It gives different numbers on different OSes.
##
library(MASS)
set.seed(1234)
m0 <-1
n <- 500
prob.tt <- .5
p <- 500
b.corr.by <- 25
b.corr <- .7
data <- data.frame(treat = rbinom(n, 1, prob.tt) - 0.5)
n.blocks <- p%/%b.corr.by
covMat <- diag(n.blocks) %x%
  matrix(b.corr^abs(matrix(1:b.corr.by, b.corr.by, b.corr.by, byrow = TRUE) -
                    matrix(1:b.corr.by, b.corr.by, b.corr.by)), b.corr.by, b.corr.by)
diag(covMat) <- 1
data <- cbind(data, mvrnorm(n, rep(0, p), Sigma = covMat))
range(data)
# Mac: -4.963827  4.133723
# Linux/Windows: -4.327635  4.408097
packageVersion("MASS")
# Mac: [1] ‘7.3.49’
# Linux: [1] ‘7.3.49’
# Windows: [1] ‘7.3.47’
 
R.version$version.string
# Mac: [1] "R version 3.4.3 (2017-11-30)"
# Linux: [1] "R version 3.4.4 (2018-03-15)"
# Windows: [1] "R version 3.4.3 (2017-11-30)"
 
## Method 2: mvtnorm::rmvnorm()
library(mvtnorm)
set.seed(1234)
x <- rmvnorm(n=n, rep(0, p), sigma=covMat)
range(x)
# Mac: [1] -4.482566  4.459236
# Linux: [1] -4.482566  4.459236
 
## Method 3: mvnfast::rmvn()
set.seed(1234)
x <- mvnfast::rmvn(n, rep(0, p), covMat)
range(x)
# Mac: [1] -4.323585  4.355666
# Linux: [1] -4.323585  4.355666
 
library(microbenchmark)
library(MASS)
library(mvtnorm)
library(mvnfast)
microbenchmark(v1 <- rmvnorm(n=n, rep(0, p), sigma=covMat, "eigen"),
              v2 <- rmvnorm(n=n, rep(0, p), sigma=covMat, "svd"),
              v3 <- rmvnorm(n=n, rep(0, p), sigma=covMat, "chol"),
              v4 <- rmvn(n, rep(0, p), covMat),
              v5 <- mvrnorm(n, rep(0, p), Sigma = covMat))
# Unit: milliseconds
# v1-v3: mvtnorm::rmvnorm() with "eigen", "svd", "chol"; v4: mvnfast::rmvn(); v5: MASS::mvrnorm()
#  expr       min        lq      mean    median        uq      max neval cld
#    v1 296.55374 300.81089 306.72485 301.99339 304.46662 335.6137   100   d
#    v2 461.81867 466.98806 478.58536 470.44085 493.89041 571.7990   100   e
#    v3 118.33759 120.01829 125.85427 121.26185 122.21361 151.1658   100   b
#    v4  66.64675  69.89383  71.67996  70.52985  70.92923 100.2622   100   a
#    v5 291.19826 294.88038 301.88144 296.76028 299.50839 346.7049   100   c
</syntaxhighlight>
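A closer look inside mvrnorm() shows that the eigenvalues differ only at the level of rounding error (about 1e-15) between macOS and Linux, while the eigenvectors differ substantially (by up to 0.56 element-wise).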
<syntaxhighlight lang='rsplus'>
debug(mvrnorm)
set.seed(1234); x <- mvrnorm(n, rep(0, p), Sigma = covMat)
# eS:  the eigen() result inside mvrnorm() on macOS
# eS2: the same object obtained on Linux
Browse[2]> range(abs(eS$values - eS2$values))
# [1] 0.000000e+00 1.776357e-15
Browse[2]> var(as.vector(eS$vectors))
[1] 0.002000006
Browse[2]> var(as.vector(eS2$vectors))
[1] 0.001999987
Browse[2]> all.equal(eS$values, eS2$values)
[1] TRUE
Browse[2]> which(eS$values != eS2$values)
  [1]  6  7  8  9  10  11  12  13  14  20  22  23  24  25  26  27  28  29
  ...
[451] 494 495 496 497 499 500
Browse[2]> range(abs(eS$vectors - eS2$vectors))
[1] 0.0000000 0.5636919
</syntaxhighlight>
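A plausible explanation (my reading, not verified on both machines): covMat is built from 20 identical 25 x 25 blocks, so each eigenvalue occurs 20 times and the eigenvectors are only determined up to sign and rotation within each eigenspace. Different LAPACK/BLAS builds can then return different but equally valid eigenvectors. mvrnorm() multiplies the standard normal draws by essentially V diag(sqrt(values)), so it turns those vectors into different samples. The toy sketch below uses a 6 x 6 matrix made of two identical blocks (not the covMat above). It shows that flipping the sign of one eigenvector still reproduces Sigma exactly but changes the draws for the same seed.

<syntaxhighlight lang='rsplus'>
B <- 0.7^abs(outer(1:3, 1:3, "-"))  # one 3 x 3 block, same form as the covMat blocks
Sigma <- diag(2) %x% B              # two identical blocks: every eigenvalue repeats

e <- eigen(Sigma, symmetric = TRUE)
A <- e$vectors %*% diag(sqrt(e$values))  # factor of the kind mvrnorm() builds

V2 <- e$vectors
V2[, 1] <- -V2[, 1]                      # an equally valid eigenvector basis
A2 <- V2 %*% diag(sqrt(e$values))

set.seed(1234)
z <- matrix(rnorm(6 * 5), nrow = 6)      # standard normal draws, one column per sample
x1 <- t(A %*% z)
x2 <- t(A2 %*% z)
all.equal(A %*% t(A), Sigma)             # both factors reproduce Sigma ...
all.equal(A2 %*% t(A2), Sigma)
isTRUE(all.equal(x1, x2))                # ... but the samples differ
</syntaxhighlight>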


== Resource ==
* Explain the difference between == and === in R. Provide an example to illustrate their use.
** The == operator compares values element-wise: it returns TRUE where the values on the left and right sides are equal and FALSE otherwise. The === operator does not exist in base R. To test whether two whole objects are the same, use identical() (or all.equal() for numeric comparisons with a tolerance).
=== Books ===
* A list of recommended books http://blog.revolutionanalytics.com/2015/11/r-recommended-reading.html
* [http://statisticalestimation.blogspot.com/2016/11/learning-r-programming-by-reading-books.html Learning R programming by reading books: A book list]
* [http://www.stats.ox.ac.uk/pub/MASS4/ Modern Applied Statistics with S] by William N. Venables and Brian D. Ripley
* [http://dirk.eddelbuettel.com/code/rcpp.html Seamless R and C++ Integration with Rcpp] by Dirk Eddelbuettel
* [http://www.amazon.com/Advanced-Chapman-Hall-CRC-Series/dp/1466586966/ref=pd_sim_b_6?ie=UTF8&refRID=0C98YDK5MRSTRY0ZX1DB Advanced R] by Hadley Wickham 2014
** http://brettklamer.com/diversions/statistical/compile-hadleys-advanced-r-programming-to-a-pdf/ Compile Hadley's Advanced R to a PDF
* [http://www.brodrigues.co/functional_programming_and_unit_testing_for_data_munging/ Functional programming and unit testing for data munging with R] by Bruno Rodrigues
* [http://www.amazon.com/Cookbook-OReilly-Cookbooks-Paul-Teetor/dp/0596809158/ref=pd_sim_b_3?ie=UTF8&refRID=0C98YDK5MRSTRY0ZX1DB R Cookbook] by Paul Teetor
* [http://www.amazon.com/Machine-Learning-R-Brett-Lantz/dp/1782162143/ref=pd_sim_b_13?ie=UTF8&refRID=1851BAX3M17CK00VSMA6 Machine Learning with R] by Brett Lantz
* [http://www.amazon.com/Everyone-Advanced-Analytics-Graphics-Addison-Wesley/dp/0321888030/ref=pd_sim_b_3?ie=UTF8&refRID=1851BAX3M17CK00VSMA6 R for Everyone] by [http://www.jaredlander.com/r-for-everyone/ Jared P. Lander]
* [http://www.amazon.com/The-Art-Programming-Statistical-Software/dp/1593273843/ref=pd_sim_b_2?ie=UTF8&refRID=1851BAX3M17CK00VSMA6 The Art of R Programming] by Norman Matloff
* [http://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485/ref=pd_sim_b_3?ie=UTF8&refRID=0H3NMWX7KTRAEB32902Q Applied Predictive Modeling] by Max Kuhn
* [http://www.amazon.com/R-Action-Robert-Kabacoff/dp/1935182390/ref=pd_sim_b_17?ie=UTF8&refRID=0H3NMWX7KTRAEB32902Q R in Action] by Robert Kabacoff
* [http://www.amazon.com/The-Book-Michael-J-Crawley/dp/0470973927/ref=pd_sim_b_6?ie=UTF8&refRID=0CNF2XK8VBGF5A6W3NE3 The R Book] by Michael J. Crawley
* Regression Modeling Strategies, with Applications to Linear Models, Survival Analysis and Logistic Regression by Frank E. Harrell
* Data Manipulation with R by Phil Spector
* [https://rviews.rstudio.com/2017/05/19/efficient_r_programming/ Review of Efficient R Programming]
* [http://r-pkgs.had.co.nz/ R packages: Organize, Test, Document, and Share Your Code] by Hadley Wickham 2015
* [http://tidytextmining.com/ Text Mining with R: A Tidy Approach] and a [http://pacha.hk/2017-05-20_text_mining_with_r.html blog]
* [https://github.com/csgillespie/efficientR Efficient R programming] by Colin Gillespie and Robin Lovelace. Re-creating the HTML version of the book works if we follow the simple instructions in their [https://csgillespie.github.io/efficientR/building-the-book-from-source.html Appendix]. Note that the PDF version renders the expected output (mathematical notation, tables) better than the EPUB version.
<syntaxhighlight lang='rsplus'>
# R 3.4.1
.libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
setwd("/tmp/efficientR/")
bookdown::render_book("index.Rmd", output_format = "bookdown::pdf_book")
# the generated PDF file is _book/_main.pdf

bookdown::render_book("index.Rmd", output_format = "bookdown::epub_book")
# the generated EPUB file is _book/_main.epub
# This cannot be done in RStudio ("parse_dt" not resolved from current namespace (lubridate))
# but it is OK to run in an R terminal
</syntaxhighlight>
* What is the purpose of the apply() function in R? How does it differ from the for loop?
** The apply() function in R applies a function over the margins (rows or columns) of an array or matrix and collects the results, so no explicit indexing or pre-allocation is needed. It is often used as an alternative to a for loop over the rows or columns of a matrix, although internally it still loops in R code, so it is not necessarily faster.
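A minimal comparison of apply() and an explicit for loop over the rows of a small matrix, plus the vectorized rowSums() for reference. Note that matrix() fills column by column.

<syntaxhighlight lang='rsplus'>
m <- matrix(1:6, nrow = 2)          # 2 x 3 matrix: rows (1, 3, 5) and (2, 4, 6)
row_sums_apply <- apply(m, 1, sum)  # MARGIN = 1: rows
col_max_apply <- apply(m, 2, max)   # MARGIN = 2: columns

# The same row sums with an explicit loop and a pre-allocated result
row_sums_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) row_sums_loop[i] <- sum(m[i, ])

rowSums(m)                          # vectorized built-in, usually fastest
</syntaxhighlight>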


=== Webinar ===
* [https://www.rstudio.com/resources/webinars/ RStudio] & its [https://github.com/rstudio/webinars github] repository
* Describe the concept of factors in R. How are they used in data manipulation and analysis?
** Factors in R represent categorical data and are an essential data type for statistical modeling and analysis. A factor is stored as a vector of integer codes together with a levels attribute that holds the distinct values (categories) those codes refer to.
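A short illustration of the integer codes and levels behind a factor, and of changing the reference level (for example, the baseline category in a regression model):

<syntaxhighlight lang='rsplus'>
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))  # explicit, meaningful level order
as.integer(f)   # the stored integer codes
levels(f)       # the categories the codes refer to
table(f)        # counts per level, in level order

f2 <- relevel(f, ref = "medium")  # "medium" becomes the first (reference) level
levels(f2)
</syntaxhighlight>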


=== useR! ===
* http://blog.revolutionanalytics.com/2017/07/revisiting-user2017.html
* What is the significance of the attach() and detach() functions in R? When should they be used?
** The attach() function adds a data frame to the search path, so its columns can be accessed by name without the df$ prefix. The detach() function removes it from the search path again, which helps avoid naming conflicts and frees the copy of the data that attach() made. Because attached variables can mask (or be masked by) other objects, attach() is best kept to interactive use; with() or df$column is safer in scripts and packages.
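A minimal example of attach()/detach() and the with() alternative, using a small made-up data frame:

<syntaxhighlight lang='rsplus'>
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

attach(df)
avg_height <- mean(height)   # 'height' is found on the search path
detach(df)                   # remove the data frame from the search path again
still_visible <- exists("height")

# Alternative that does not touch the search path
avg_height2 <- with(df, mean(height))
</syntaxhighlight>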


=== Blogs, Tips, Socials, Communities ===
* Google: revolutionanalytics In case you missed it
* [http://r4stats.com/articles/why-r-is-hard-to-learn/ Why R is hard to learn] by Bob Muenchen.
* [http://onetipperday.sterding.com/2016/02/my-15-practical-tips-for.html My 15 practical tips for a bioinformatician]
* [http://blog.revolutionanalytics.com/2017/06/r-community.html The R community is one of R's best features]
* [https://hbctraining.github.io/main/ Bioinformatics Training at the Harvard Chan Bioinformatics Core]
* Explain the concept of vectorization in R. How does it impact the performance of R code?
** Vectorization in R refers to applying an operation to an entire vector or array at once, without writing an explicit loop. This can significantly improve performance, because the loop over the elements then runs in R's underlying compiled (C) code instead of in interpreted R code.
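A minimal side-by-side of an explicit loop and its vectorized equivalent. Both give the same result; on long vectors, the vectorized form is typically much faster.

<syntaxhighlight lang='rsplus'>
x <- c(1.5, 2, 3.5, 4)

# Explicit loop: one interpreted R iteration per element
sq_loop <- numeric(length(x))
for (i in seq_along(x)) sq_loop[i] <- x[i]^2

# Vectorized: a single call; the element-wise loop runs in compiled code
sq_vec <- x^2
</syntaxhighlight>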


=== Bug Tracking System ===
https://bugs.r-project.org/bugzilla3/ and [https://bugs.r-project.org/bugzilla3/query.cgi Search existing bug reports]. Remember to select 'All' in the Status drop-down list.

Use '''sessionInfo()''' (it reports the R version, platform and attached packages).

* Describe the difference between data.frame and matrix in R. When would you use one over the other?
** A data.frame in R is a two-dimensional structure that can store different types of data (e.g., numeric, character, factor) in its columns. It is similar to a table in a database.
** A matrix in R is also a two-dimensional structure, but it can only store elements of the same data type. It is more like a mathematical matrix.
** You would use a data.frame when you have heterogeneous data (i.e., different types of data) and need to work with it as a dataset. You would use a matrix when you have homogeneous data (i.e., the same type of data) and need to perform matrix operations.
* What are the benefits of using the dplyr package in R for data manipulation? Provide an example of how you would use dplyr to filter a data frame.
** The dplyr package provides a set of functions that make it easier to manipulate data frames in R.
** It uses a syntax that is easy to read and understand, making complex data manipulations more intuitive.
** To filter a data frame using dplyr, you can use the filter() function. For example, filter(df, column_name == value) would filter df to include only rows where column_name is equal to value.
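A small illustration of these answers using made-up data: converting a mixed-type data.frame to a matrix coerces everything to one type, matrix algebra needs a homogeneous numeric matrix, and row filtering can be done with base R or, if installed, dplyr::filter().

<syntaxhighlight lang='rsplus'>
df <- data.frame(name = c("a", "b", "c"), score = c(90, 75, 82),
                 stringsAsFactors = FALSE)
m <- as.matrix(df)      # mixed column types are coerced to character
typeof(m)

num <- matrix(c(1, 2, 3, 4), nrow = 2)
num_sq <- num %*% num   # matrix product on a numeric matrix

# Row filtering: base R equivalent of dplyr::filter(df, score > 80)
high <- df[df$score > 80, ]
if (requireNamespace("dplyr", quietly = TRUE)) {
  high_dplyr <- dplyr::filter(df, score > 80)
}
</syntaxhighlight>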

Latest revision as of 13:31, 18 October 2024

Install and upgrade R

Here

New release

Online Editor

We can run R in a web browser without installing it on the local machine (similar to Ideone.com for C++). It does not require an account either (cf. RStudio).

rdrr.io

It can produce graphics too. The package I am testing (cobs) is available too.

rstudio.cloud

RDocumentation

The interactive engine is based on DataCamp Light

For example, tbl_df function from dplyr package.

The DataCamp website allows us to run library() in the Script window. After that, we can use the packages in the R Console.

Here is a list of (common) R packages that users can use on the web.

The packages on RDocumentation may be outdated. For example, the current stringr on CRAN is v1.2.0 (2/18/2017) but RDocumentation has v1.1.0 (8/19/2016).

Web Applications

R web applications

Creating local repository for CRAN and Bioconductor

R repository

Parallel Computing

See R parallel.

Cloud Computing

Install R on Amazon EC2

http://randyzwitch.com/r-amazon-ec2/

Bioconductor on Amazon EC2

http://www.bioconductor.org/help/bioconductor-cloud-ami/

Big Data Analysis

bigmemory, biganalytics, bigtabulate

ff, ffbase

biglm

data.table

See data.table.

disk.frame

Split-apply-combine for Maximum Likelihood Estimation of a linear model

Apache arrow

Reproducible Research

Reproducible Environments

https://rviews.rstudio.com/2019/04/22/reproducible-environments/

checkpoint package

Some lessons in R coding

  1. Don't use rand() and srand() in C code. The result is platform dependent: in my experience Ubuntu/Debian/CentOS give the same result, but macOS and Windows give different ones. Use the Rcpp package and R's random number generator instead.
  2. Don't use the output of list.files() directly. Its order is platform dependent, even across Linux distributions. An extra sort helps!
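One plausible cause is that list.files() sorts names using the collation rules of the current locale. Below is a minimal sketch of a locale-independent re-sort, assuming C-locale (byte-wise) order is acceptable: sort(method = "radix") sorts strings in the C locale regardless of the system locale. The file names are made up and are chosen to stay distinct on case-insensitive file systems.

<syntaxhighlight lang='rsplus'>
dir <- file.path(tempdir(), "sortdemo")
dir.create(dir, showWarnings = FALSE)
invisible(file.create(file.path(dir, c("beta.txt", "Alpha.txt", "alpha2.txt", "_z.txt"))))

files <- list.files(dir)                   # order may depend on the locale
files_c <- sort(files, method = "radix")   # C-locale order on every platform
files_c
</syntaxhighlight>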

Useful R packages

Rcpp

http://cran.r-project.org/web/packages/Rcpp/index.html. See more here.

RInside : embed R in C++ code

Ubuntu

With RInside, R can be embedded in a graphical application. For example, $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt directory includes source code of a Qt application to show a kernel density plot with various options like kernel functions, bandwidth and an R command text box to generate the random data. See my demo on Youtube. I have tested this qtdensity example successfully using Qt 4.8.5.

  1. Follow the cairoDevice instructions to install the libraries required by the cairoDevice package, and then install cairoDevice itself.
  2. Install Qt. Check that the 'qmake' command is available by typing 'whereis qmake' or 'which qmake' in a terminal.
  3. Open Qt Creator from the Ubuntu start menu/Launcher. Open the project file $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt/qtdensity.pro in Qt Creator.
  4. In Qt Creator, press 'Ctrl + R' or click the big green triangle button in the lower-left corner to build and run the project. If everything works, the interactive program qtdensity should appear on your desktop.

File:qtdensity.png

With RInside + Wt web toolkit installed, we can also create a web application. To demonstrate the example in examples/wt directory, we can do

cd ~/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/wt
make
sudo ./wtdensity --docroot . --http-address localhost --http-port 8080

Then we can go to the browser's address bar and type http://localhost:8080 to see how it works (a screenshot is in here).

Windows 7

To make RInside work on Windows, try the following:

  1. Install R under C:\ instead of C:\Program Files; otherwise the space in the path causes errors like g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory.
  2. Install Rtools.
  3. Install the RInside package from source (the binary version gives an error).
  4. Create a DOS batch file that adds the necessary paths to the PATH environment variable:
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\R\R-3.0.1\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
set R_HOME=C:\R\R-3.0.1
echo Setting environment for using R
cmd

In the Windows command prompt, run

cd C:\R\R-3.0.1\library\RInside\examples\standard
make -f Makefile.win

Now we can test by running any of the executable files that make generates, for example rinside_sample0:

rinside_sample0

As for the Qt application qtdensity, we need to make sure the same version of MinGW was used to build RInside/Rcpp and Qt. See some discussions in

So building the Qt and Wt web applications on Windows may or may not be possible.

GUI

Qt and R

tkrplot

On Ubuntu, we need to install tk packages, such as by

sudo apt-get install tk-dev

reticulate - Interface to 'Python'

Python -> reticulate

Hadoop (eg ~100 terabytes)

See also HighPerformanceComputing

RHadoop

Snowdoop: an alternative to MapReduce algorithm

XML

On Ubuntu, we need to install libxml2-dev before we can install XML package.

sudo apt-get update
sudo apt-get install libxml2-dev

On CentOS,

yum -y install libxml2 libxml2-devel

XML

library(XML)

# Read and parse HTML file
doc.html = htmlTreeParse('http://apiolaza.net/babel.html', useInternal = TRUE)

# Extract all the paragraphs (HTML tag is p, starting at
# the root of the document). Unlist flattens the list to
# create a character vector.
doc.text = unlist(xpathApply(doc.html, '//p', xmlValue))

# Replace all newlines by spaces
doc.text = gsub('\n', ' ', doc.text)

# Join all the elements of the character vector into a single
# character string, separated by spaces
doc.text = paste(doc.text, collapse = ' ')

This post http://stackoverflow.com/questions/25315381/using-xpathsapply-to-scrape-xml-attributes-in-r can be used to monitor new releases from github.com.

> library(RCurl) # getURL()
> library(XML)   # htmlParse and xpathSApply
> xData <- getURL("https://github.com/alexdobin/STAR/releases")
> doc = htmlParse(xData)
> plain.text <- xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
  # I looked at the page source, searched for 2.5.3a and found it in the tag
  # <span class="css-truncate-target">2.5.3a</span>
> plain.text
 [1] "2.5.3a"      "2.5.2b"      "2.5.2a"      "2.5.1b"      "2.5.1a"     
 [6] "2.5.0c"      "2.5.0b"      "STAR_2.5.0a" "STAR_2.4.2a" "STAR_2.4.1d"
>
> # try bwa
> xData <- getURL("https://github.com/lh3/bwa/releases")
> doc = htmlParse(xData)
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
[1] "v0.7.15" "v0.7.13"

> # try picard
> xData <- getURL("https://github.com/broadinstitute/picard/releases")
> doc = htmlParse(xData)
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
 [1] "2.9.1" "2.9.0" "2.8.3" "2.8.2" "2.8.1" "2.8.0" "2.7.2" "2.7.1" "2.7.0"
[10] "2.6.0"

This method can be used to monitor new tags/releases from some projects like Cura, BWA, Picard, STAR. But for some projects like sratools the class attribute in the span element ("css-truncate-target") can be different (such as "tag-name").

xmlview

RCurl

On Ubuntu, we need to install the packages (the first one is for XML package that RCurl suggests)

# Test on Ubuntu 14.04
sudo apt-get install libxml2-dev
sudo apt-get install libcurl4-openssl-dev

Scrape google scholar results

https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/googleScholarXScraper/googleScholarXScraper.R

No google ID is required

It does not seem to work:

 Error in data.frame(footer = xpathLVApply(doc, xpath.base, "/font/span[@class='gs_fl']",  : 
  arguments imply differing number of rows: 2, 0 

devtools

The devtools package depends on curl, which in turn needs some system libraries. If we just need to install a package, consider the remotes package, which is suggested by the BiocManager package.

# Ubuntu 14.04
sudo apt-get install libcurl4-openssl-dev

# Ubuntu 16.04, 18.04
sudo apt-get install build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev

# Ubuntu 20.04
sudo apt-get install -y libxml2-dev libcurl4-openssl-dev libssl-dev

Lazy-load database XXX is corrupt. internal error -3. It often happens when you use install_github to install a package that's currently loaded; try restarting R and running the app again.

NB. According to the output of apt-cache show r-cran-devtools, the binary package is very old, although apt-cache show for r-base and for supported packages like survival shows the latest versions.

httr

httr imports curl, jsonlite, mime, openssl and R6 packages.

When I tried to install the httr package, I got the following error:

Configuration failed because openssl was not found. Try installing:
 * deb: libssl-dev (Debian, Ubuntu, etc)
 * rpm: openssl-devel (Fedora, CentOS, RHEL)
 * csw: libssl_dev (Solaris)
 * brew: openssl (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’

After I ran sudo apt-get install libssl-dev in a terminal (Debian), installing the httr package went smoothly. Nice httr!

Real example: see this post. Unfortunately I did not get a table result; I only get an html file (R 3.2.5, httr 1.1.0 on Ubuntu and Debian).

Since the httr package is used by many other packages, it helps to look at how they use it, for example the aRxiv package.

A package to download free Springer books during Covid-19 quarantine, An update to "An adventure in downloading books" (rvest package)

curl

curl is independent of RCurl package.

library(curl)
h <- new_handle()
handle_setform(h,
  name="aaa", email="bbb"
)
req <- curl_fetch_memory("http://localhost/d/phpmyql3_scripts/ch02/form2.html", handle = h)
rawToChar(req$content)

rOpenSci packages

rOpenSci contains packages that allow access to data repositories through the R statistical programming environment

remotes

Download and install R packages stored in 'GitHub', 'BitBucket', or plain 'subversion' or 'git' repositories. This package is a lightweight replacement of the 'install_*' functions in 'devtools'. Also remotes does not require any extra OS level library (at least on Ubuntu 16.04).

Example:

# https://github.com/henrikbengtsson/matrixstats
remotes::install_github('HenrikBengtsson/matrixStats@develop')

DirichletMultinomial

On Ubuntu, we do

sudo apt-get install libgsl0-dev

Create GUI

gWidgets

GenOrd: Generate ordinal and discrete variables with given correlation matrix and marginal distributions

here

json

R web -> json

Map

leaflet

choroplethr

ggplot2

How to make maps with Census data in R

googleVis

See an example from RJSONIO above.

googleAuthR

Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.

gtrendsR - Google Trends

quantmod

Maintaining a database of price files in R. It consists of 3 steps.

  1. Initial data downloading
  2. Update existing data
  3. Create a batch file

caret

Tool for connecting Excel with R

write.table

Output a named vector

vec <- c(a = 1, b = 2, c = 3)
write.csv(vec, file = "my_file.csv", quote = FALSE)
x = read.csv("my_file.csv", row.names = 1)
vec2 <- x[, 1]
names(vec2) <- rownames(x)
all.equal(vec, vec2)

# one liner: row names of a 'matrix' become the names of a vector
vec3 <- as.matrix(read.csv('my_file.csv', row.names = 1))[, 1]
all.equal(vec, vec3)

Avoid leading empty column to header

write.table writes unwanted leading empty column to header when has rownames

write.table(a, 'a.txt', col.names=NA)
# Or better by
write.table(data.frame("SeqId"=rownames(a), a), "a.txt", row.names=FALSE)

Add blank field AND column names in write.table

  • write.table() writes one field fewer on the first (header) row than on the data rows when row.names = TRUE, which is the default.
    • Suppose x is (n x 2).
    • write.table(x, sep="\t") will generate a file with 2 elements on the first row (and 3 on every other row).
    • read.table(file) will return an object of size (n x 2).
    • read.delim(file) and read.delim2(file) will also read it correctly.
  • Note that write.csv() does not have this issue.
    • Suppose x is (n x 2).
    • If we use write.csv(x, file), the csv file will have (n+1) rows and 3 columns because of the header row and the row names.
    • If we use read.csv(file), the object is (n x 3), so we need to use read.csv(file, row.names = 1).
  • adding blank field AND column names in write.table(); write.table writes unwanted leading empty column to header when has rownames
write.table(a, 'a.txt', col.names=NA)
  • readr::write_tsv() does not include row names in the output file
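The header behaviour above can be checked with a small round trip through temporary files. The data frame is made up, and n_header() is a hypothetical helper that counts the fields on the first line.

<syntaxhighlight lang='rsplus'>
x <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5), row.names = c("r1", "r2", "r3"))
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
write.table(x, f1, sep = "\t")                  # header row: "a" "b"
write.table(x, f2, sep = "\t", col.names = NA)  # header row: "" "a" "b"

# Hypothetical helper: number of fields on the header line
n_header <- function(f) length(strsplit(readLines(f, n = 1), "\t")[[1]])
c(n_header(f1), n_header(f2))

y1 <- read.table(f1, sep = "\t")  # header and row names are detected automatically
y2 <- read.table(f2, sep = "\t", header = TRUE, row.names = 1)
</syntaxhighlight>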

read.delim(, row.names=1) and write.table(, row.names=TRUE)

How to Use read.delim Function in R

Case 1: no row.names

write.table(df, 'my_data.txt', quote=FALSE, sep='\t', row.names=FALSE)
my_df <- read.delim('my_data.txt')  # the rownames will be 1, 2, 3, ...

Case 2: with row.names. Note: if we open the text file in Excel, the first row seems to be missing one header at the end. It is actually missing the column name for the first (row-name) column.

write.table(df, 'my_data.txt', quote=FALSE, sep='\t', row.names=TRUE)
my_df <- read.delim('my_data.txt')  # it will automatically assign the rownames

Read/Write Excel files package

  • http://www.milanor.net/blog/?p=779
  • flipAPI. One useful feature of DownloadXLSX, which is not supported by the readxl package, is that it can read Excel files directly from the URL.
  • xlsx: depends on Java
  • openxlsx: does not depend on Java, but depends on a zip application. On Windows, it seems to work without installing Rtools. It cannot read .xls files; it works on .xlsx files.
  • readxl: has no external dependencies, but it can only read (not write) Excel files.
    • It is part of tidyverse package. The readxl website provides several articles for more examples.
    • readxl webinar.
    • One advantage of read_excel (as with read_csv in the readr package) is that the data are imported into an easy-to-print object with three classes: tbl_df, tbl and data.frame.
    • For writing to Excel formats, use writexl or openxlsx package.
library(readxl)
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, 
    col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf, 
    guess_max = min(1000, n_max), progress = readxl_progress(), 
    .name_repair = "unique")
# Example
read_excel(path, range = cell_cols("c:cx"), col_types = "numeric")
  • writexl: zero dependency xlsx writer for R
library(writexl)
mylst <- list(sheet1name = df1, sheet2name = df2)
write_xlsx(mylst, "output.xlsx")

For the Chromosome column, integer values become strings (formatted as doubles, so 5 becomes 5.000000) or NA (empty cells on the sheet).

> head(read_excel("~/Downloads/BRCA.xls", 4)[ , -9], 3)
  UniqueID (Double-click) CloneID UGCluster
1                   HK1A1   21652 Hs.445981
2                   HK1A2   22012 Hs.119177
3                   HK1A4   22293 Hs.501376
                                                    Name Symbol EntrezID
1 Catenin (cadherin-associated protein), alpha 1, 102kDa CTNNA1     1495
2                              ADP-ribosylation factor 3   ARF3      377
3                          Uroporphyrinogen III synthase   UROS     7390
  Chromosome      Cytoband ChimericClusterIDs Filter
1   5.000000        5q31.2               <NA>      1
2  12.000000         12q13               <NA>      1
3       <NA> 10q25.2-q26.3               <NA>      1

The hidden worksheets become visible. (I am not sure what the DEFINEDNAME lines at the top of the output mean.)

> excel_sheets("~/Downloads/BRCA.xls")
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 01 00 00 00 9a 0c 00 00 1a 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 04 00 00 00 00 00 00 0d 3b 03 00 00 00 9b 0c 00 00 0a 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 03 00 00 00 00 00 00 0d 3b 02 00 00 00 9a 0c 00 00 06 00 
[1] "Experiment descriptors" "Filtered log ratio"     "Gene identifiers"      
[4] "Gene annotations"       "CollateInfo"            "GeneSubsets"           
[7] "GeneSubsetsTemp"       

Chinese characters work too.

> read_excel("~/Downloads/testChinese.xlsx", 1)
   中文 B C
1     a b c
2     1 2 3

To read all worksheets (here, all except the first one), we can write a convenience function:

read_excel_allsheets <- function(filename) {
    sheets <- readxl::excel_sheets(filename)
    sheets <- sheets[-1] # Skip sheet 1
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X, col_types = "numeric"))
    names(x) <- sheets
    x
}
dcfile <- "table0.77_dC_biospear.xlsx"
dc <- read_excel_allsheets(dcfile)
# Each component (e.g. dc[[1]]) is a tibble.

readr

Compared to base equivalents like read.csv(), readr is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn’t munge the column names.

readr 1.0.0 was released first; readr 2.0.0 adds built-in support for reading multiple files at once, fast multi-threaded lazy reading and automatic guessing of delimiters, among other changes.
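A minimal sketch of the multi-file reading mentioned above, assuming readr >= 2.0 (needed for the `id` and `show_col_types` arguments). The two temporary CSV files are made-up data; the example also shows that character columns stay character instead of becoming factors.

```r
library(readr)

# Two small CSV files with the same columns (illustrative data only)
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write_csv(data.frame(x = 1:2, y = c("a", "b")), f1)
write_csv(data.frame(x = 3L, y = "c"), f2)

# A vector of paths is read and row-bound in one call;
# `id` adds a column recording which file each row came from.
combined <- read_csv(c(f1, f2), id = "file", show_col_types = FALSE)
combined
class(combined$y)  # strings are not converted to factors
```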

Consider a text file where the table (6100 x 22) has duplicated row names and the (1,1) element is empty. The column names are all unique.

  • read.delim() will treat the first column as row names, but it does not allow duplicated row names. Even with row.names=NULL, it still does not read the file correctly. It does give warnings (EOF within quoted string & number of items read is not a multiple of the number of columns). The dim is 5177 x 22.
  • readr::read_delim(Filename, "\t") will miss the last column. The dim is 6100 x 21.
  • data.table::fread(Filename, sep = "\t") detects that there are fewer column names than columns and adds 1 extra default column name for the first column, which it guesses to be row names or an index. The dim is 6100 x 22. (Winner!)
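A small reproducible version of the situation above, assuming the data.table package is installed. The inline text is made-up data: its header has one name fewer than the data columns and the first column holds duplicated row names.

```r
library(data.table)

# Header has 2 names, data rows have 3 fields; "r1" appears twice as a row name
txt <- "a\tb\nr1\t1\t2\nr1\t3\t4\nr2\t5\t6"

# fread() should report that it added a default name for the first column
dt <- fread(text = txt, sep = "\t", header = TRUE)
dim(dt)    # the row names are kept as an ordinary first column
names(dt)
```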

The readr::read_csv() function is as fast as data.table::fread(). For files beyond 100MB in size, fread() and read_csv() can be expected to be around 5 times faster than read.csv(). See Section 5.3 of the Efficient R Programming book.

Note that data.table::fread() can read a selection of the columns.
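A minimal sketch of column selection with fread(), assuming data.table is installed; the temporary file and its columns are illustrative. `select =` keeps only the named columns, and `drop =` is the complementary option.

```r
library(data.table)

tmp <- tempfile(fileext = ".tsv")
fwrite(data.table(id = 1:3, score = c(2.5, 3.1, 4.7), label = c("x", "y", "z")),
       tmp, sep = "\t")

# Only the requested columns are read from the file
dt <- fread(tmp, sep = "\t", select = c("id", "label"))
dt
```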

Speed comparison

The Fastest Way To Read And Write Files In R. data.table >> readr >> base.

ggplot2

See ggplot2

Data Manipulation & Tidyverse

See Tidyverse.

Data Science

See Data science page

microbenchmark & rbenchmark

Plot, image

jpeg

If we want to create an image like the one on the left-hand side panel of this wiki, we can use the jpeg package to read an existing plot and then edit and save it.

We can also use the jpeg package to import and manipulate a jpg image. See Fun with Heatmaps and Plotly.

EPS/postscript format

  • Don't use postscript().
  • Use cairo_ps(). See Saving High-Resolution ggplots: How to Preserve Semi-Transparency. It works on base R plots too.
    cairo_ps(filename = "survival-curves.eps",
             width = 7, height = 7, pointsize = 12,
             fallback_resolution = 300)
    print(p) # or any base R plots statements
    dev.off()
  • Export a graph to .eps file with R.
    • The result looks the same as using cairo_ps().
    • The file produced by setEPS() + postscript() is much smaller than the one produced by cairo_ps().
    • However, grep can find the text shown on the plot generated by cairo_ps() but not on the one generated by setEPS() + postscript().
    setEPS()
    postscript("whatever.eps") # 483 KB
    plot(rnorm(20000))
    dev.off()
    # grep rnorm whatever.eps # Not found!
    
    cairo_ps("whatever_cairo.eps")   # 2.4 MB
    plot(rnorm(20000))
    dev.off()
    # grep rnorm whatever_cairo.eps  # Found!
    
  • View EPS files
    • Linux: evince. It is installed by default.
    • Mac: evince. brew install evince
    • Windows. Install ghostscript 9.20 (10.x does not work with ghostview/GSview) and ghostview/GSview (5.0). In Ghostview, open Options -> Advanced Configure. Change Ghostscript DLL path AND Ghostscript include Path according to the ghostscript location ("C:\.
  • Edit EPS files: Inkscape
    • Step 1: open the EPS file
    • Step 2: EPS Input: Determine page orientation from text direction 'Page by page' - OK
    • Step 3: PDF Import Settings: default is "Internal import", but we shall choose "Cairo import".
    • Step 4: Zoom in first.
    • Step 5: Click on Layers and Objects tab on the RHS. Now we can select any lines or letters and edit them as we like. The selected objects are highlighted in the "Layers and Objects" panel. That is, we can select multiple objects using object names. The selected objects can be rotated (Object -> Rotate 90 CW), for example.
    • Step 6: We can save the plot in any format, such as svg, eps, pdf, html, ...

png and resolution

It seems people use res=300 as a definition of high resolution.

  • Bottom line: fix res=300 and adjust height/width as needed. The default is res=72, height=width=480. If we increase res to 300 while keeping the same pixel dimensions, the text becomes larger, the lines become thicker and the plot looks zoomed in.
  • Saving high resolution plot in png.
    png("heatmap.png", width = 8, height = 6, units='in', res = 300) 
    # we can adjust width/height as we like
    # the pixel values will be width=8*300 and height=6*300 which is equivalent to 
    # 8*300 * 6*300/10^6 = 4.32 Megapixels (1M pixels = 10^6 pixels) in camera's term
    # However, if we use png(, width=8*300, height=6*300, units='px'), it will produce
    # a plot with very large figure body and tiny text font size.
    
    # It seems the following command gives the same result as above
    png("heatmap.png", width = 8*300, height = 6*300, res = 300) # default units="px"
    
  • Chapter 14.5 Outputting to Bitmap (PNG/TIFF) Files by R Graphics Cookbook
    • Changing the resolution affects the size (in pixels) of graphical objects like text, lines, and points.
  • 10 tips for making your R graphics look their best David Smith
    • In Word you can resize the graphic to an appropriate size, but the high resolution gives you the flexibility to choose a size while not compromising on the quality. I'd recommend at least 1200 pixels on the longest side for standard printers.
  • ?png. The png function has default settings ppi=72, height=480, width=480, units="px".
    • By default no resolution is recorded in the file, except for BMP.
    • BMP vs PNG format. If you need a smaller file size and don’t mind a lossless compression, PNG might be a better choice. If you need to retain as much detail as possible and don’t mind a larger file size, BMP could be the way to go.
      • Compression: BMP files are raw and uncompressed, meaning they’re large files that retain as much detail as possible. On the other hand, PNG files are compressed but still lossless. This means you can compress and decompress PNGs without losing any information.
      • File size: BMPs are larger than PNGs. This is because PNG files automatically compress, and can be compressed again to make the file even smaller.
      • Common uses: BMP contains a maximum amount of details while PNGs are good for small illustrations, sketches, drawings, logos and icons.
      • Quality: No difference
      • Transparency: PNG supports transparency while BMP doesn't
  • Some comparison about the ratio
    • 11/8.5=1.29 (US Letter paper)
    • 8/6=1.33 (plot output)
    • 1440/900=1.6 (my display)
  • Setting resolution and aspect ratios in R
  • The difference of res parameter for a simple plot. How to change the resolution of a plot in base R?
  • High Resolution Figures in R.
  • High resolution graphics with R
  • R plot: size and resolution
  • How can I increase the resolution of my plot in R?, devEMF package
  • See Images -> Anti-alias.
  • How to check DPI on PNG
    • The width of a PNG file in terms of inches cannot be determined directly from the file itself, as the file contains pixel dimensions, not physical dimensions. However, you can calculate the width in inches if you know the resolution (DPI, dots per inch) of the image. Remember that converting pixel measurements to physical measurements like inches involves a specific resolution (DPI), and different devices may display the same image at different sizes due to having different resolutions.
  • Cairo case.
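Following the DPI discussion above, here is a base-R sketch of a hypothetical helper, `png_info()` (not from any package), that reads the pixel size from a PNG's IHDR chunk and, when present, the recorded resolution from its pHYs chunk. pHYs stores pixels per metre, so multiplying by 0.0254 gives dpi. Per ?png, a file written without `res` may have no pHYs chunk, in which case the dpi fields stay NA. CRCs are not checked.

```r
png_info <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  sig <- readBin(con, "raw", 8)
  stopifnot(identical(sig, as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))))
  info <- list(width = NA, height = NA, dpi_x = NA, dpi_y = NA)
  repeat {
    len <- readBin(con, "integer", 1, size = 4, endian = "big")  # chunk data length
    if (length(len) == 0) break                                  # end of file
    type <- rawToChar(readBin(con, "raw", 4))
    data <- readBin(con, "raw", len)
    readBin(con, "raw", 4)                                       # skip the CRC
    if (type == "IHDR") {
      info$width  <- readBin(data[1:4], "integer", size = 4, endian = "big")
      info$height <- readBin(data[5:8], "integer", size = 4, endian = "big")
    } else if (type == "pHYs" && as.integer(data[9]) == 1L) {  # unit 1 = metre
      ppm_x <- readBin(data[1:4], "integer", size = 4, endian = "big")
      ppm_y <- readBin(data[5:8], "integer", size = 4, endian = "big")
      info$dpi_x <- round(ppm_x * 0.0254)                        # pixels/metre -> dpi
      info$dpi_y <- round(ppm_y * 0.0254)
    } else if (type == "IEND") {
      break
    }
  }
  info
}
```

For example, `info <- png_info("heatmap.png")` followed by `info$width / info$dpi_x` gives the width in inches. That is 8 for the `png("heatmap.png", width = 8, height = 6, units = "in", res = 300)` call above, assuming the device recorded the resolution.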

PowerPoint

  • For PowerPoint presentations, I find it useful to use svg() to generate a small figure. Then, when we enlarge the plot, the text is enlarged too. According to ?svg, the defaults are width = 7, height = 7, pointsize = 12, family = "sans".
  • Try the following code. The font size is the same for both plots/files. However, the first plot can be enlarged without losing its quality.
    svg("svg4.svg", width=4, height=4)
    plot(1:10, main="width=4, height=4")
    dev.off()
    
    svg("svg7.svg", width=7, height=7) # default
    plot(1:10, main="width=7, height=7")
    dev.off()
    

magick

https://cran.r-project.org/web/packages/magick/

See an example I created here.

Cairo

See White strips problem in png() or tiff().

geDevices

cairoDevice

PS. I am not sure what advantage the functions in this package have over R's own functions (e.g. Cairo_svg() vs svg()).

On Ubuntu, we need to install 2 libraries and 1 R package, RGtk2.

sudo apt-get install libgtk2.0-dev libcairo2-dev

On Windows, we may get the error: unable to load shared object 'C:/Program Files/R/R-3.0.2/library/cairoDevice/libs/x64/cairoDevice.dll'. We need to follow the instructions here.

dpi requirement for publication

For import into PDF-incapable programs (MS Office)

sketcher: photo to sketch effects

https://htsuda.net/sketcher/

httpgd

igraph

R web -> igraph

Identifying dependencies of R functions and scripts

https://stackoverflow.com/questions/8761857/identifying-dependencies-of-r-functions-and-scripts

library(mvbutils)
foodweb(where = "package:batr")

foodweb( find.funs("package:batr"), prune="survRiskPredict", lwd=2)

foodweb( find.funs("package:batr"), prune="classPredict", lwd=2)

iterators

An iterator is more useful than a for-loop when the data are already a collection. It can be used to iterate over a vector, data frame, matrix or file.

Iterators can be combined with the foreach package; http://www.exegetic.biz/blog/2013/11/iterators-in-r/ has more details.
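A minimal sketch, assuming the iterators and foreach packages are installed. `iter(m, by = "row")` walks over the rows of a matrix, `nextElem()` pulls one element at a time, and the same iterator can drive a `foreach` loop. The matrix is made-up example data.

```r
library(iterators)
library(foreach)

m <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)

# Row iterator: each nextElem() call returns the next row as a 1 x 3 matrix
it <- iter(m, by = "row")
nextElem(it)
nextElem(it)
# A third nextElem(it) would signal a "StopIteration" error.

# The same kind of iterator used as the loop source in foreach
row_sums <- foreach(r = iter(m, by = "row"), .combine = c) %do% sum(r)
row_sums
```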

Colors

  • scales package. This is used in ggplot2 package.
  • colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes. Popular! Many reverse imports/suggests; e.g. ComplexHeatmap. See my ggplot2 page.
    library(colorspace)
    hcl_palettes(plot = TRUE) # a quick overview
    hcl_palettes(palette = "Dark 2", n = 5, plot = TRUE)
    q4 <- qualitative_hcl(4, palette = "Dark 3")
    
  • convert hex value to color names
    library(plotrix)
    sapply(rainbow(4), color.id) # color.id is a function
              # it is used to identify closest match to a color
    sapply(palette(), color.id)
    sapply(RColorBrewer::brewer.pal(4, "Set1"), color.id)
    

Below is an example using the option scale_fill_brewer(palette = "Paired"). See the source code at gist. Note that among the qualitative palettes, only Paired and Set3 support up to 12 classes (Set1 stops at 9).

According to the ColorBrewer website, qualitative schemes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes.

File:GgplotPalette.svg
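To check how many classes each qualitative palette supports, a small sketch assuming the RColorBrewer package (the source of the palettes behind ggplot2's scale_fill_brewer()) is installed:

```r
library(RColorBrewer)

# brewer.pal.info lists every palette with its maximum number of colours and its category
qual <- brewer.pal.info[brewer.pal.info$category == "qual", ]
qual[order(-qual$maxcolors), "maxcolors", drop = FALSE]

# Qualitative palettes that go up to 12 classes
rownames(qual)[qual$maxcolors == 12]
```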

colortools

Tools that allow users to generate color schemes and palettes

colourpicker

A Colour Picker Tool for Shiny and for Selecting Colours in Plots

eyedroppeR

Select colours from an image in R with {eyedroppeR}

rex

Friendly Regular Expressions

formatR

The best strategy to avoid failure is to put comments in complete lines or after complete R expressions.

See also this StackOverflow discussion about R code reformatting.

library(formatR)
tidy_source("Input.R", file = "output.R", width.cutoff=70)
tidy_source("clipboard") 
# default width is getOption("width") which is 127 in my case.

Some issues

  • Comments appearing at the beginning of a line within a long complete statement. This will break tidy_source().
cat("abcd",
    # This is my comment
    "defg")

will result in

> tidy_source("clipboard")
Error in base::parse(text = code, srcfile = NULL) : 
  3:1: unexpected string constant
2: invisible(".BeGiN_TiDy_IdEnTiFiEr_HaHaHa# This is my comment.HaHaHa_EnD_TiDy_IdEnTiFiEr")
3: "defg"
   ^
  • Comments appearing at the end of a line within a long complete statement won't break tidy_source(), but tidy_source() cannot relocate/tidy the commas.
cat("abcd"
    ,"defg"   # This is my comment
  ,"ghij")

will become

cat("abcd", "defg"  # This is my comment
, "ghij") 

Still bad!!

  • Comments appearing at the end of a line within a long complete statement can also break the tidy_source() function. For example,
cat("</p>",
	"<HR SIZE=5 WIDTH=\"100%\" NOSHADE>",
	ifelse(codeSurv == 0,"<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>", #4/9/09
	                     "<h3><a name='Genes'><b><u>Genes significantly associated with survival:</u></b></a></h3>"), 
	file=ExternalFileName, sep="\n", append=T)

will result in

> tidy_source("clipboard", width.cutoff=70)
Error in base::parse(text = code, srcfile = NULL) : 
  3:129: unexpected SPECIAL
2: "<HR SIZE=5 WIDTH=\"100%\" NOSHADE>" ,
3: ifelse ( codeSurv == 0 , "<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>" , %InLiNe_IdEnTiFiEr%
  • The width.cutoff parameter does not always work. For example, nothing changes in the following snippet, although I hoped it would move cat() to the next line.
if (codePF & !GlobalTest & !DoExactPermTest) cat(paste("Multivariate Permutations test was computed based on", 
    NumPermutations, "random permutations"), "<BR>", " ", file = ExternalFileName, 
    sep = "\n", append = T)
  • It merges lines even when I don't want it to. For example
cat("abcd"
    ,"defg"  
  ,"ghij")

will become

cat("abcd", "defg", "ghij") 

styler

https://cran.r-project.org/web/packages/styler/index.html Pretty-prints R code without changing the user's formatting intent.
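For comparison with formatR, a minimal styler sketch, assuming the styler package is installed. `style_text()` restyles a string of code according to the tidyverse style guide; `style_file("script.R")` (hypothetical file, not run here) restyles a file in place.

```r
library(styler)

messy <- "f<-function( x,y ){x+y}"
tidy <- style_text(messy)   # spaces around <-, after commas, none inside ( )
tidy
```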

Download papers

biorxivr

Search and Download Papers from the bioRxiv Preprint Server (biology)

aRxiv

Interface to the arXiv API

pdftools

aside: set it aside

An RStudio addin to run long R commands aside your current session.

Teaching

  • smovie: Some Movies to Illustrate Concepts in Statistics

Organize R research project

How to save (and load) datasets in R (.RData vs .Rds file)

How to save (and load) datasets in R: An overview
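A minimal base-R sketch of the difference. An .rds file holds one object, and you choose its name when reading it back. An .RData file can hold several objects, which load() restores under their original names (possibly overwriting objects with the same names). Both formats deserialize R objects, so only read files from trusted sources. The temporary file names are illustrative.

```r
# .rds: one object per file; the name is chosen at read time
rds_file <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_file)
cars2 <- readRDS(rds_file)

# .RData: several objects per file, restored under their saved names
rdata_file <- tempfile(fileext = ".RData")
x <- 1:10
y <- letters[1:3]
save(x, y, file = rdata_file)
rm(x, y)
loaded <- load(rdata_file)   # load() invisibly returns the names it restored
loaded
```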

Naming convention

Efficient Data Management in R

Efficient Data Management in R. .Rprofile, renv package and dplyr package.

Text to speech

Text-to-Speech with the googleLanguageR package

Speech to text

https://github.com/ggerganov/whisper.cpp and an R package audio.whisper

Weather data

logR

https://github.com/jangorecki/logR

Progress bar

https://github.com/r-lib/progress#readme

Configurable Progress bars, they may include percentage, elapsed time, and/or the estimated completion time. They work in terminals, in 'Emacs' 'ESS', 'RStudio', 'Windows' 'Rgui' and the 'macOS'.
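A minimal sketch, assuming the progress package is installed. A bar is created with `progress_bar$new()` and advanced with `$tick()`. The format tokens `:bar`, `:percent` and `:eta` fill in the bar, percentage and estimated completion time. The loop body is a stand-in for real work.

```r
library(progress)

n <- 50
pb <- progress_bar$new(
  format = "  processing [:bar] :percent eta: :eta",
  total = n, clear = FALSE, width = 60
)
total <- 0
for (i in seq_len(n)) {
  total <- total + i   # placeholder for the real computation
  Sys.sleep(0.02)
  pb$tick()            # advance the bar by one step
}
```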

cron

beepr: Play A Short Sound

https://www.rdocumentation.org/packages/beepr/versions/1.3/topics/beep. Try sound=3 "fanfare", 4 "complete", 5 "treasure", 7 "shotgun", 8 "mario".

utils package

https://www.rdocumentation.org/packages/utils/versions/3.6.2

tools package

Different ways of using R

Extending R by John M. Chambers (2016)

10 things R can do that might surprise you

https://simplystatistics.org/2019/03/13/10-things-r-can-do-that-might-surprise-you/

R call C/C++

Mainly talks about .C() and .Call().

Note that with .C(), scalars and arrays must be passed as pointers. So if we want to call a C function in a package that is not exported (i.e. not written to be called this way), we may need to modify the function so that its arguments are pointers.

.Call

Be sure to add the PACKAGE parameter to avoid an error like

cvfit <- cv.grpsurvOverlap(X, Surv(time, event), group, 
                            cv.ind = cv.ind, seed=1, penalty = 'cMCP')
Error in .Call("standardize", X) : 
  "standardize" not resolved from current namespace (grpreg)

NAMESPACE file & useDynLib

(From Writing R Extensions manual) Loading is most often done automatically based on the useDynLib() declaration in the NAMESPACE file, but may be done explicitly via a call to library.dynam(). This has the form

library.dynam("libname", package, lib.loc) 

library.dynam.unload()

gcc

Coping with varying `gcc` versions and capabilities in R packages

Primitive functions

Primitive Functions List

SEXP

Some examples from packages

  • sva package has one C code function

R call Fortran

Embedding R

A very simple example (it does not return to the shell) from the Writing R Extensions manual:

The command-line R front-end, R_HOME/bin/exec/R, is one such example. Its source code is in file <src/main/Rmain.c>.

This example can be run by

R_HOME/bin/R CMD R_HOME/bin/exec/R

Note:

  1. R_HOME/bin/exec/R is the R binary. However, it cannot be launched directly unless R_HOME and LD_LIBRARY_PATH are set up. Again, this is explained in the Writing R Extensions manual.
  2. R_HOME/bin/R is a shell-script front-end that users invoke. It sets up the environment for the executable. It can be copied to /usr/local/bin/R. When we run R_HOME/bin/R, it actually runs R_HOME/bin/R CMD R_HOME/bin/exec/R (see line 259 of R_HOME/bin/R in R 3.0.2), which shows the important role of R_HOME/bin/exec/R.

More examples of embedding can be found in tests/Embedding directory. Read <index.html> for more information about these test examples.

An example from Bioconductor workshop

Example: Create the embed.c file, then build the executable. Note that I don't need to set the R_HOME variable.

cd 
tar xzvf 
cd R-3.0.1
./configure --enable-R-shlib
make
cd tests/Embedding
make
~/R-3.0.1/bin/R CMD ./Rtest

nano embed.c
# Doing it in a single line gives an error that does not show the real problem.
# ../../bin/R CMD gcc -I../../include -L../../lib -lR embed.c
# A better way is to run compile and link separately
gcc -I../../include -c embed.c
gcc -o embed embed.o -L../../lib -lR -lRblas
../../bin/R CMD ./embed

Note that if we want to call the executable file ./embed directly, we need to set up the R environment by specifying the R_HOME variable and by including the directories used in linking R in LD_LIBRARY_PATH. This is based on the information provided by Writing R Extensions.

export R_HOME=/home/brb/Downloads/R-3.0.2
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib
./embed # No need to include R CMD in front.

Question: How do we create a data frame in C? Answer: Use data.frame() via an eval() call from C. Or see the code in stats/src/model.c, as part of model.frame.default. Or use Rcpp as here.

Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf

Create a Simple Socket Server in R

This example comes from this paper.

Create an R function

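simpleServer <- function(port = 6543) {
  # blocking = TRUE so that readLines() waits for the client's next line
  sock <- socketConnection(port = port, server = TRUE, blocking = TRUE, open = "r+")
  on.exit(close(sock))
  cat("\nWelcome to R!\nR>", file = sock)
  # stop on "quit" or when the client closes the connection (no more lines)
  while (length(line <- readLines(sock, n = 1)) > 0 && line != "quit") {
    cat(paste("socket >", line, "\n"))                    # log the request on the server side
    out <- capture.output(try(eval(parse(text = line))))  # evaluate and capture printed output
    writeLines(out, con = sock)
    cat("\nR> ", file = sock)
  }
}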

Then run simpleServer(). Open another terminal and try to communicate with the server

$ telnet localhost 6543
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Welcome to R!
R> summary(iris[, 3:5])
  Petal.Length    Petal.Width          Species  
 Min.   :1.000   Min.   :0.100   setosa    :50  
 1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
 Median :4.350   Median :1.300   virginica :50  
 Mean   :3.758   Mean   :1.199                  
 3rd Qu.:5.100   3rd Qu.:1.800                  
 Max.   :6.900   Max.   :2.500                  

R> quit
Connection closed by foreign host.

Rserve

Note that Rserve is launched the same way as the C program with embedded R. See the example from the Bioconductor workshop.

See my Rserve page.

outsider

(Commercial) StatconnDcom

R.NET

rJava

Terminal

# jdk 7
sudo apt-get install openjdk-7-*
update-alternatives --config java
# oracle jdk 8
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get -y install openjdk-8-jdk

and then run the following (thanks to http://stackoverflow.com/questions/12872699/error-unable-to-load-installed-packages-just-now) to fix an error: libjvm.so: cannot open shared object file: No such file or directory.

  • Create the file /etc/ld.so.conf.d/java.conf with the following entries:
/usr/lib/jvm/java-8-oracle/jre/lib/amd64
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server
  • And then run sudo ldconfig

Now go back to R

install.packages("rJava")

Done!

If the above does not work, a simpler way (under Ubuntu) is to run

sudo apt-get install r-cran-rjava

which will also install the packages 'default-jre' (under /usr/lib/jvm) and 'default-jre-headless'.

RCaller

RApache

Rscript, arguments and commandArgs()

Passing arguments to an R script from the command line. Syntax:

$ Rscript --help
Usage: /path/to/Rscript [--options] [-e expr [-e expr2 ...] | file] [args]

Example:

# sillyScript.R
args <- commandArgs(trailingOnly = TRUE)
# test if there is at least one argument: if not, return an error
if (length(args) == 0) {
  stop("At least one argument must be supplied (input file).", call. = FALSE)
} else if (length(args) == 1) {
  # default output file
  args[2] <- "out.txt"
}
cat("args[1] = ", args[1], "\n")
cat("args[2] = ", args[2], "\n")

$ Rscript --vanilla sillyScript.R iris.txt out.txt
# args[1] =  iris.txt 
# args[2] =  out.txt

Rscript, #! Shebang and optparse package
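A minimal sketch of a shebang script using optparse, assuming the optparse package is installed; the option names mirror the sillyScript.R example above and are illustrative. Saved as, e.g., myscript.R and made executable (chmod +x myscript.R), it can be run as `./myscript.R -i iris.txt`. In a real script, `parse_args(parser)` reads `commandArgs(trailingOnly = TRUE)` itself. An explicit vector is passed below only so the example can be tried interactively.

```r
#!/usr/bin/env Rscript
library(optparse)

option_list <- list(
  make_option(c("-i", "--input"), type = "character", default = NULL,
              help = "input file name", metavar = "FILE"),
  make_option(c("-o", "--output"), type = "character", default = "out.txt",
              help = "output file name [default = %default]", metavar = "FILE")
)
parser <- OptionParser(option_list = option_list)

# In the script itself: opt <- parse_args(parser)
opt <- parse_args(parser, args = c("-i", "iris.txt"))
cat("input  =", opt$input, "\n")
cat("output =", opt$output, "\n")   # falls back to the default
```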

littler

Provides hash-bang (#!) capability for R

FAQs:

root@ed5f80320266:/# ls -l /usr/bin/{r,R*}
# R 3.5.2 docker container
-rwxr-xr-x 1 root root 82632 Jan 26 18:26 /usr/bin/r        # binary, can be used for 'shebang' lines, r --help
                                              # Example: r --verbose -e "date()"

-rwxr-xr-x 1 root root  8722 Dec 20 11:35 /usr/bin/R        # text, R --help
                                              # Example: R -q -e "date()"

-rwxr-xr-x 1 root root 14552 Dec 20 11:35 /usr/bin/Rscript  # binary, can be used for 'shebang' lines, Rscript --help
                                              # It won't show the startup message when it is used in the command line.
                                              # Example: Rscript -e "date()"

We can install littler in two ways.

  • install.packages("littler"). This will install the latest version but the binary 'r' program is only available under the package/bin directory (eg ~/R/x86_64-pc-linux-gnu-library/3.4/littler/bin/r). You need to create a soft link in order to access it globally.
  • sudo apt install littler. This will install 'r' globally; however, the installed version may be old.

After the installation, the vignette contains several examples. The offline vignette has a table of contents. Nice! The web version of the examples does not have the TOC.

r was not meant to run interactively like R. See man r.

RInside: Embed R in C++

See RInside

(From the RInside documentation) The RInside package makes it easier to embed R in your C++ applications. There is no code you would execute directly from the R environment. Rather, you write C++ programs that embed R, as illustrated by some of the included examples.

The included examples are armadillo, eigen, mpi, qt, standard, threads and wt.

To run 'make' when we don't have a global R, we should modify the file <Makefile>. Also if we just want to create one executable file, we can do, for example, 'make rinside_sample1'.

To run any executable program, we need to specify LD_LIBRARY_PATH variable, something like

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib 

The real build process looks like (check <Makefile> for completeness)

g++ -I/home/brb/Downloads/R-3.0.2/include \
    -I/home/brb/Downloads/R-3.0.2/library/Rcpp/include \
    -I/home/brb/Downloads/R-3.0.2/library/RInside/include -g -O2 -Wall \
    -I/usr/local/include   \
    rinside_sample0.cpp  \
    -L/home/brb/Downloads/R-3.0.2/lib -lR  -lRblas -lRlapack \
    -L/home/brb/Downloads/R-3.0.2/library/Rcpp/lib -lRcpp \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/Rcpp/lib \
    -L/home/brb/Downloads/R-3.0.2/library/RInside/lib -lRInside \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/RInside/lib \
    -o rinside_sample0

Hello World example of embedding R in C++.

#include <RInside.h>                    // for the embedded R via RInside

int main(int argc, char *argv[]) {

    RInside R(argc, argv);              // create an embedded R instance 

    R["txt"] = "Hello, world!\n";	// assign a char* (string) to 'txt'

    R.parseEvalQ("cat(txt)");           // eval the init string, ignoring any returns

    exit(0);
}

The above can be compared to the Hello world example in Qt.

#include <QApplication.h>
#include <QPushButton.h>

int main( int argc, char **argv )
{
    QApplication app( argc, argv );

    QPushButton hello( "Hello world!", 0 );
    hello.resize( 100, 30 );

    app.setMainWidget( &hello );
    hello.show();

    return app.exec();
}

RFortran

RFortran is an open source project with the following aim:

To provide an easy to use Fortran software library that enables Fortran programs to transfer data and commands to and from R.

It works only on Windows with Microsoft Visual Studio installed :(

Call R from other languages

C

Using R from C/C++

Error: “not resolved from current namespace” error, when calling C routines from R

Solution: add getNativeSymbolInfo() around your C/Fortran symbols. Search Google for "r dyn.load not resolved from current namespace".

JRI

http://www.rforge.net/JRI/

rpy2

http://rpy.sourceforge.net/rpy2.html

Create a standalone Rmath library

R has many math and statistical functions. We can easily use these functions in our C/C++/Fortran code. The definitive guide is Chapter 9 "The standalone Rmath library" of the R-admin manual.

Here is my experience based on R 3.0.2 on Windows OS.

Create a static library <libRmath.a> and a dynamic library <Rmath.dll>

Suppose we have downloaded R source code and build R from its source. See Build_R_from_its_source. Then the following 2 lines will generate files <libRmath.a> and <Rmath.dll> under C:\R\R-3.0.2\src\nmath\standalone directory.

cd C:\R\R-3.0.2\src\nmath\standalone
make -f Makefile.win

Use Rmath library in our code

set CPLUS_INCLUDE_PATH=C:\R\R-3.0.2\src\include
set LIBRARY_PATH=C:\R\R-3.0.2\src\nmath\standalone
# Note: the variable above is LIBRARY_PATH, not LD_LIBRARY_PATH.

# Created <RmathEx1.cpp> from the book "Statistical Computing in C++ and R" web site
# http://math.la.asu.edu/~eubank/CandR/ch4Code.cpp
# It is OK to save the cpp file under any directory.

# Force to link against the static library <libRmath.a>
g++ RmathEx1.cpp -lRmath -lm -o RmathEx1.exe
# OR
g++ RmathEx1.cpp -Wl,-Bstatic -lRmath -lm -o RmathEx1.exe

# Force to link against dynamic library <Rmath.dll>
g++ RmathEx1.cpp Rmath.dll -lm -o RmathEx1Dll.exe

Test the executable program. Note that the executable program RmathEx1.exe can be transferred to and run on another computer without R installed. Isn't it cool?

c:\R>RmathEx1
Enter a argument for the normal cdf:
1
Enter a argument for the chi-squared cdf:
1
Prob(Z <= 1) = 0.841345
Prob(Chi^2 <= 1)= 0.682689

Below is the cpp program <RmathEx1.cpp>.

//RmathEx1.cpp
#define MATHLIB_STANDALONE 
#include <iostream>
#include "Rmath.h"

using std::cout; using std::cin; using std::endl;

int main()
{
  double x1, x2;
  cout << "Enter a argument for the normal cdf:" << endl;
  cin >> x1;
  cout << "Enter a argument for the chi-squared cdf:" << endl;
  cin >> x2;

  cout << "Prob(Z <= " << x1 << ") = " << 
    pnorm(x1, 0, 1, 1, 0)  << endl;
  cout << "Prob(Chi^2 <= " << x2 << ")= " << 
    pchisq(x2, 1, 1, 0) << endl;
  return 0;
}
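The two Rmath calls above can be cross-checked from R itself. In Rmath, `pnorm(x, mu, sigma, lower_tail, log_p)` and `pchisq(x, df, lower_tail, log_p)` take the tail and log flags as plain `int` arguments (1 = TRUE, 0 = FALSE). The R equivalents below use named arguments, and `%g` formatting reproduces the six significant digits printed by the C++ program.

```r
x1 <- 1
x2 <- 1
p_norm  <- pnorm(x1, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)  # pnorm(x1, 0, 1, 1, 0)
p_chisq <- pchisq(x2, df = 1, lower.tail = TRUE, log.p = FALSE)          # pchisq(x2, 1, 1, 0)
cat(sprintf("Prob(Z <= %g) = %g\n", x1, p_norm))
cat(sprintf("Prob(Chi^2 <= %g)= %g\n", x2, p_chisq))
```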

Calling R.dll directly

See Section 8.2.2 of Writing R Extensions. This is related to embedding R under Windows. The file <R.dll> on Windows is like <libR.so> on Linux.

Create HTML report

ReportingTools (Jason Hackney) from Bioconductor. See Genome->ReportingTools.

htmlTable package

The htmlTable package is intended for generating tables using HTML formatting. This format is compatible with Markdown when used for HTML-output. The most basic table can easily be created by just passing a matrix or a data.frame to the htmlTable-function.
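A minimal sketch, assuming the htmlTable package is installed. The 2 x 2 counts are made-up data. htmlTable() returns the HTML as a character string, which can be printed or written to a file.

```r
library(htmlTable)

m <- matrix(c(12, 5, 7, 9), nrow = 2,
            dimnames = list(c("Group A", "Group B"), c("Responders", "Non-responders")))
tab <- htmlTable(m, caption = "A made-up 2 x 2 table")

out_file <- tempfile(fileext = ".html")
writeLines(as.character(tab), out_file)   # open this file in a browser or Word
```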

formattable

htmltab package

This package is NOT used to create HTML reports but to extract HTML tables.

ztable package

Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.

Create academic report

reports package in CRAN and in github repository. The youtube video gives an overview of the package.

Create pdf and epub files

# Idea:
#        knitr        pdflatex
#   rnw -------> tex ----------> pdf
library(knitr)
knit("example.rnw") # create example.tex file
  • A very simple example <002-minimal.Rnw> from yihui.name works fine on linux.
git clone https://github.com/yihui/knitr-examples.git
  • <knitr-minimal.Rnw>. I had no problem creating the pdf file on Windows, but I could not generate the pdf from the tex file on Linux. Some people suggested running sudo apt-get install texlive-fonts-recommended to install the missing fonts. It works!

To see a real example, check out the DESeq2 package (inst/doc subdirectory). In addition to DESeq2, I also needed to install the DESeq, BiocStyle, airway, vsn, gplots, and pasilla packages from Bioconductor. Note that it is best to use a sudo/admin account to install packages.

Or start with a markdown file. Download the example <001-minimal.Rmd> and remove the last line, which gets a png file from the internet.

# Idea:
#        knitr        pandoc
#   rmd -------> md ----------> pdf

git clone https://github.com/yihui/knitr-examples.git
cd knitr-examples
R -e "library(knitr); knit('001-minimal.Rmd')"
pandoc 001-minimal.md -o 001-minimal.pdf # require pdflatex to be installed !!

To create an epub file (not yet successful on Windows; figures are missing on Linux):

# Idea:
#        knitr        pandoc
#   rnw -------> tex ----------> markdown or epub

library(knitr)
knit("DESeq2.Rnw") # create DESeq2.tex
system("pandoc  -f latex -t markdown -o DESeq2.md DESeq2.tex")

Convert tex to epub

kable() for tables

Create Tables In LaTeX, HTML, Markdown And ReStructuredText

Create Word report

Using the power of Word

How to go from R to nice tables in Microsoft Word

knitr + pandoc

It is easier to create the rmd file in RStudio. RStudio provides a template for rmd files and also a quick reference to the R Markdown language.

# Idea:
#        knitr       pandoc
#   rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd") #Create md and html files

and then

FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))

Note. For some reason, if I run the above 2 commands several times, knit2html() stops working well. However, if I click the 'Knit HTML' button in RStudio, it then works again.

Another way is

library(knitr)
library(pander)
name <- "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
# `-name` (unary minus on a string) was a bug; the converted file is named after `output`
Pandoc.brew(file = paste0(name, ".md"), output = name, convert = "docx")

Note that once we have used knitr command to create a md file, we can use pandoc shell command to convert it to different formats:

  • A pdf file: pandoc -s report.md -t latex -o report.pdf
  • An html file: pandoc -s report.md -o report.html (with the -c flag, a CSS file can be added easily)
  • Openoffice: pandoc report.md -o report.odt
  • Word docx: pandoc report.md -o report.docx

We can also create an epub file for reading on a Kobo e-reader. For example, download this file and save it as example.Rmd. I need to remove the line containing the link to http://i.imgur.com/RVNmr.jpg since it causes an error when I run pandoc (possibly because my pandoc version is too old). Now we just run these 2 lines to get the epub file. Amazing!

knit("example.Rmd")
pandoc("example.md", format="epub")

PS. If we don't remove the link, we will get an error message (pandoc 1.10.1 on Windows 7)

> pandoc("Rmd_to_Epub.md", format="epub")
executing pandoc   -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"
pandoc.exe: .\.\http://i.imgur.com/RVNmr.jpg: openBinaryFile: invalid argument (Invalid argument)
Error in (function (input, format, ext, cfg)  : conversion failed
In addition: Warning message:
running command 'pandoc   -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"' had status 1

pander

Try pandoc [1] with a minimal reproducible example; you might give my "pander" package [2] a try too:

library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')

Where the content of the "minimal.brew" file is something you might have got used to with Sweave - although it's using "brew" syntax instead. See the examples of pander [3] for more details. Please note that pandoc should be installed first, which is pretty easy on Windows.

  1. http://johnmacfarlane.net/pandoc/
  2. http://rapporter.github.com/pander/
  3. http://rapporter.github.com/pander/#examples

R2wd

Use the R2wd package. However, it only works with 32-bit R, and sometimes it cannot produce all tables.

> library(R2wd)
> wdGet()
Loading required package: rcom
Loading required package: rscproxy
rcom requires a current version of statconnDCOM installed.
To install statconnDCOM type
     installstatconnDCOM()

This will download and install the current version of statconnDCOM

You will need a working Internet connection
because installation needs to download a file.
Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() : 
  argument is of length zero 

The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.

Convert from pdf to word

The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert

rtf

Use the rtf package for Rich Text Format (RTF) output.

xtable

Package xtable will produce html output.

library(xtable)
print(xtable(X), type="html")

If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.

officer

  • CRAN. Microsoft Word, Microsoft Powerpoint and HTML documents generation from R.
  • The gist includes a comprehensive example with sections, subsections, tables, a detailed paragraph, and figures created with base R plots and ggplot2.
  • Add a line space
    doc <- body_add_par(doc, "")
    
    # Function to add n empty paragraphs (line spaces)
    body_add_par_n <- function(doc, n) {
      for (i in seq_len(n)) {
        doc <- body_add_par(doc, "")
      }
      doc
    }
    doc <- body_add_par_n(doc, 3)
    
  • Figures from the documentation of officeverse.
  • See Data frame to word table?.
  • See Office page for some code.
  • How to read and create Word Documents in R where we can extracting tables from Word Documents.
    library(officer)
    x = read_docx("myfile.docx")
    content <- docx_summary(x) # a data frame; the text is in the 'text' column
    grep("nlme", content$text, ignore.case = TRUE, value = TRUE)
    

Powerpoint

PDF manipulation

staplr

R Graphs Gallery

COM client or server

Client

Server

RDCOMServer

Use R under proxy

http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy

RStudio

  • Github
  • Installing RStudio (1.0.44) on Ubuntu does not install Java, even though the source code is 37.5% Java.
  • Preview

rstudio.cloud

https://rstudio.cloud/

Launch RStudio

Multiple versions of R

Create .Rproj file

If you have an existing package that doesn't have an .Rproj file, you can use devtools::use_rstudio("path/to/package") to add it. In newer versions of devtools this function has moved to usethis::use_rstudio().

With an RStudio project file, you can

  • Restore .RData into workspace at startup
  • Save workspace to .RData on exit (or save.image("Robj.RData") & load("Robj.RData"))
  • Always save history (even if no saving .RData, savehistory(".Rhistory") & loadhistory(".Rhistory"))
  • etc

package search

https://github.com/RhoInc/CRANsearcher

Git

Visual Studio

R and Python support now built in to Visual Studio 2017

List files using regular expression

  • Extension
list.files(pattern = "\\.txt$")

where the dot (.) is a metacharacter that matches any character, so it is escaped as \\. to match a literal dot, and $ anchors the match at the end of the file name.

  • Start with
list.files(pattern = "^Something")

Using Sys.glob():

> Sys.glob("~/Downloads/*.txt")
[1] "/home/brb/Downloads/ip.txt"       "/home/brb/Downloads/valgrind.txt"
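A small self-contained sketch of the patterns above. It creates a few files with made-up names in a temporary directory, so nothing in ~/Downloads is assumed, and compares list.files() with Sys.glob(). It also uses utils::glob2rx(), which translates a wildcard pattern into a regular expression.

```r
# Scratch directory with a few hypothetical file names
d <- file.path(tempdir(), "pattern_demo")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("ip.txt", "valgrind.txt", "notes_txt", "Something.csv"))))

# Regular expression: "\\." is a literal dot, "$" anchors at the end of the name
list.files(d, pattern = "\\.txt$")       # "ip.txt" "valgrind.txt"
# Without the escape, "." matches any character, so "notes_txt" also matches
list.files(d, pattern = ".txt$")
# "^" anchors at the start of the name
list.files(d, pattern = "^Something")    # "Something.csv"

# Wildcard (glob) matching returns full paths
basename(Sys.glob(file.path(d, "*.txt")))
# glob2rx() converts a wildcard into the equivalent regular expression
glob2rx("*.txt")                         # "^.*\\.txt$"
```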

Hidden tool: rsync in Rtools

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
a.exe

sent 323142 bytes  received 31 bytes  646346.00 bytes/sec
total size is 1198416  speedup is 3.71

c:\Rtools\bin>

Unfortunately, if the destination is a network drive, I could get a "permission denied (13)" error. See also rsync file permissions on windows.

Install rgdal package (geospatial Data) on ubuntu

Terminal

sudo apt-get install libgdal1-dev libproj-dev # https://stackoverflow.com/a/44389304
sudo apt-get install libgdal1i # Ubuntu 16.04 https://stackoverflow.com/a/12143411

R

install.packages("rgdal")

Install sf package

I got the following error even though I had installed some libraries.

checking GDAL version >= 2.0.1... no
configure: error: sf is not compatible with GDAL versions below 2.0.1

Then I followed the instructions here:

sudo apt remove libgdal-dev
sudo apt remove libproj-dev
sudo apt remove gdal-bin
sudo add-apt-repository ppa:ubuntugis/ubuntugis-stable

sudo apt update
sudo apt-cache policy libgdal-dev # Make sure a version >= 2.0 appears 

sudo apt install libgdal-dev # on Ubuntu 20.04 this line alone works;
                             # the previous lines are not needed

Database

RSQLite

Creating a new database:

library(DBI)

mydb <- dbConnect(RSQLite::SQLite(), "my-db.sqlite")
dbDisconnect(mydb)
unlink("my-db.sqlite")

# temporary database
mydb <- dbConnect(RSQLite::SQLite(), "")
dbDisconnect(mydb)

Loading data:

mydb <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(mydb, "mtcars", mtcars)
dbWriteTable(mydb, "iris", iris)

dbListTables(mydb)

dbListFields(mydb, "mtcars")

dbReadTable(mydb, "mtcars")

Queries:

dbGetQuery(mydb, 'SELECT * FROM mtcars LIMIT 5')

dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < 4.6')

dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < :x', params = list(x = 4.6))

res <- dbSendQuery(mydb, "SELECT * FROM mtcars WHERE cyl = 4")
dbFetch(res)
dbClearResult(res)

Batched queries:

rs <- dbSendQuery(mydb, 'SELECT * FROM mtcars')
while (!dbHasCompleted(rs)) {
  df <- dbFetch(rs, n = 10)
  print(nrow(df))
}

dbClearResult(rs)

Multiple parameterised queries:

rs <- dbSendQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" = :x')
dbBind(rs, param = list(x = seq(4, 4.4, by = 0.1)))
nrow(dbFetch(rs))
#> [1] 4
dbClearResult(rs)

Statements:

dbExecute(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < 4')
#> [1] 0
rs <- dbSendStatement(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < :x')
dbBind(rs, param = list(x = 4.5))
dbGetRowsAffected(rs)
#> [1] 4
dbClearResult(rs)

sqldf

Manipulate R data frames using SQL. Depends on RSQLite. A use of gsub, reshape2 and sqldf with healthcare data

RPostgreSQL

RMySQL

MongoDB

odbc

RODBC

DBI

dbplyr

Create a new SQLite database:

surveys <- read.csv("data/surveys.csv")
plots <- read.csv("data/plots.csv")

my_db_file <- "portal-database.sqlite"
my_db <- src_sqlite(my_db_file, create = TRUE)

copy_to(my_db, surveys)
copy_to(my_db, plots)
my_db

Connect to a database:

download.file(url = "https://ndownloader.figshare.com/files/2292171",
              destfile = "portal_mammals.sqlite", mode = "wb")

library(dbplyr)
library(dplyr)
mammals <- src_sqlite("portal_mammals.sqlite")

Querying the database with the SQL syntax:

tbl(mammals, sql("SELECT year, species_id, plot_id FROM surveys"))

Querying the database with the dplyr syntax:

surveys <- tbl(mammals, "surveys")
surveys %>%
    select(year, species_id, plot_id)
head(surveys, n = 10)

show_query(head(surveys, n = 10)) # show which SQL commands are actually sent to the database

Simple database queries:

surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

Laziness (instruct R to stop being lazy):

data_subset <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight) %>%
  collect()

Complex database queries:

plots <- tbl(mammals, "plots")
plots # The plot_id column appears in the plots table

surveys # The plot_id column also appears in the surveys table

# Join the two tables on plot_id (method 1)
plots %>%
  filter(plot_id == 1) %>%
  inner_join(surveys, by = "plot_id") %>%
  collect()

NoSQL

nodbi: the NoSQL Database Connector

Github

R source

https://github.com/wch/r-source/ It is updated daily and worth visiting often. Click the commits link (1000+ commits) to look at the daily changes.

If we are interested in a certain branch (say 3.2), look for R-3-2-branch.

R packages (only) source (metacran)

Bioconductor packages source

Announcement, https://github.com/Bioconductor-mirror

Send local repository to Github in R by using reports package

http://www.youtube.com/watch?v=WdOI_-aZV0Y

My collection

How to download

Clone ~ Download.

  • Command line
git clone https://gist.github.com/4484270.git

This will create a subdirectory called '4484270' with all cloned files there.

  • Within R
library(devtools)
source_gist("4484270")

Or, first download the JSON file from

https://api.github.com/users/MYUSERLOGIN/gists

and then

library(RJSONIO)
x <- fromJSON("~/Downloads/gists.json")
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
  cmd <- paste0("git clone https://gist.github.com/", x, ".git")
  system(cmd)
})

Jekyll

An Easy Start with Jekyll, for R-Bloggers

Connect R with Arduino

Android App

Common plots tips

Create an empty plot

plot.new()

Overlay plots

How to Overlay Plots in R-Quick Guide with Example.

# Step 1: create a scatterplot
plot(x1, y1)
# Step 2: overlay a line plot
lines(x2, y2)
# Step 3: overlay a scatterplot
points(x2, y2)

Save the par() and restore it

Example 1: Don't save the settings with old.par <- par(), because no.readonly = FALSE by default. Use par(no.readonly = TRUE), which returns only the graphical parameters that can be set and therefore restored.

  • par() with no arguments returns a named list of all the graphical parameters, including read-only ones such as "cin"; par(no.readonly = TRUE) returns only those that can be set.
  • If we use par(old.par) where old.par <- par(), we will get several warning messages like 'In par(op) : graphical parameter "cin" cannot be set'.
old.par <- par(no.readonly = TRUE); par(mar = c(5, 4, 4, 2) - 2)  # OR in one step
old.par <- par(mar = c(5, 4, 4, 2) - 2)
## do plotting stuff with new settings
par(old.par)

Example 2: Use it inside a function with the on.exit() function.

ex <- function() {
   old.par <- par(no.readonly = TRUE) # all par settings which
                                      # could be changed.
   on.exit(par(old.par))
   ## ... do lots of par() settings and plots
   ## ...
   invisible() #-- now,  par(old.par)  will be executed
}

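Example 3: Graphical parameters belong to the current device, not to the function's environment, so par() inside a function still changes the current device after the function returns. In the second version, par() acts on the png device, which dev.off() closes, so the original device keeps its settings.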

ex <- function() { par(mar=c(5,4,4,1)) }
ex()
par()$mar
ex = function() { png("~/Downloads/test.png"); par(mar=c(5,4,4,1)); dev.off()}
ex()
par()$mar
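A minimal sketch that checks the save/restore idiom on a null device (pdf(NULL)), so it runs without a screen. The function name plot_small_margins() is made up for illustration. par() called with arguments returns the old values of only the parameters it changes, which is why par(op) restores them without the read-only warnings.

```r
pdf(NULL)                          # null device: no window or file is needed
default_mar <- par("mar")          # margins of the fresh device

# par() with arguments invisibly returns the *old* values of what it changed
op <- par(mar = c(2, 2, 1, 1), las = 1)
plot(1:10)
par(op)                            # restore only the changed parameters
restored_mar <- par("mar")

# Inside a function, on.exit() restores the settings even if plotting fails
plot_small_margins <- function() {
  op <- par(mar = c(1, 1, 1, 1))
  on.exit(par(op), add = TRUE)
  plot(1:10)
  par("mar")                       # margins while the function is running
}
inside_mar <- plot_small_margins()
after_mar <- par("mar")            # back to the previous margins

invisible(dev.off())               # closing the device discards all its settings
```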

Grouped boxplots

Weather Time Line

The plot looks similar to a boxplot though it is not. See a screenshot on Android by Sam Ruston.

Horizontal bar plot

library(ggplot2)
dtf <- data.frame(x = c("ETB", "PMA", "PER", "KON", "TRA", 
                        "DDR", "BUM", "MAT", "HED", "EXP"),
                  y = c(.02, .11, -.01, -.03, -.03, .02, .1, -.01, -.02, 0.06))
ggplot(dtf, aes(x, y)) +
  geom_bar(stat = "identity", aes(fill = x), show.legend = FALSE) + 
  coord_flip() + xlab("") + ylab("Fold Change")   

File:Ggplot2bar.svg

Include bar values in a barplot

Use text().

Or use geom_text() if we are using the ggplot2 package. See an example here or this.

For stacked barplot, see this post.
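A base-graphics sketch with made-up counts: barplot() returns the x positions of the bar midpoints, which can be passed straight to text(). The 20% ylim headroom is an arbitrary choice that keeps the labels inside the plot region.

```r
pdf(NULL)                                             # null device for the sketch
counts <- c(A = 3, B = 7, C = 5)                      # made-up counts
bp <- barplot(counts, ylim = c(0, max(counts) * 1.2)) # bp holds the bar midpoints
text(x = bp, y = counts, labels = counts, pos = 3)    # pos = 3 puts each label above its bar
invisible(dev.off())
as.vector(bp)                                         # 0.7 1.9 3.1 with the default width and space
```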

Grouped barplots

library(ggplot2)
# mydata <- data.frame(OUTGRP, INGRP, value)
ggplot(mydata, aes(fill=INGRP, y=value, x=OUTGRP)) + 
       geom_bar(position="dodge", stat="identity")
> 1 - 2*(1-pnorm(1))
[1] 0.6826895
> 1 - 2*(1-pnorm(1.96))
[1] 0.9500042

Unicode symbols

Mind reader game, and Unicode symbols

Math expression

# Expressions
plot(x,y, xlab = expression(hat(x)[t]),
     ylab = expression(phi^{rho + a}),
     main = "Pure Expressions")

# Superscript
plot(1:10, main = expression("My Title"^2)) 
# Subscript
plot(1:10, main = expression("My Title"[2]))  

# Expressions with Spacing
# '~' is to add space and '*' is to squish characters together
plot(1:10, xlab= expression(Delta * 'C'))
plot(x,y, xlab = expression(hat(x)[t] ~ z ~ w),
     ylab = expression(phi^{rho + a} * z * w),
     main = "Pure Expressions with Spacing")

# Expressions with Text
plot(x,y, 
     xlab = expression(paste("Text here ", hat(x), " here ", z^rho, " and here")), 
     ylab = expression(paste("Here is some text of ", phi^{rho})), 
     main = "Expressions with Text")

# Substituting Expressions
plot(x,y, 
     xlab = substitute(paste("Here is ", pi, " = ", p), list(p = py)), 
     ylab = substitute(paste("e is = ", e ), list(e = ee)), 
     main = "Substituted Expressions")
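The substitution example above assumes that x, y, py and ee already exist. Here is a self-contained sketch in which they get placeholder values; it also shows bquote(), where .() inserts the current value of a variable into a plotmath expression.

```r
pdf(NULL)
set.seed(1)
x <- rnorm(20); y <- rnorm(20)        # placeholder data
py <- round(pi, 4)                    # hypothetical values to put in the labels
ee <- round(exp(1), 4)

xlab <- substitute(paste("Here is ", pi, " = ", p), list(p = py))
ylab <- bquote(e == .(ee))            # .() is replaced by the value of ee
plot(x, y, xlab = xlab, ylab = ylab, main = "Substituted Expressions")
invisible(dev.off())

deparse(xlab)   # the label now contains the number 3.1416
deparse(ylab)   # "e == 2.7183"
```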

Impose a line to a scatter plot

  • abline + lsfit # least squares
plot(cars)
abline(lsfit(cars[, 1], cars[, 2]))
# OR
abline(lm(cars[,2] ~ cars[,1]))
  • abline + line # robust line fitting
plot(cars)
(z <- line(cars))
abline(coef(z), col = 'green')
  • lines
plot(cars)
fit <- lm(cars[,2] ~ cars[,1])
lines(cars[,1], fitted(fit), col="blue")
lines(stats::lowess(cars), col='red')
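A quick check with the built-in cars data that lsfit() and lm() estimate the same least-squares line, while line() fits Tukey's resistant line and generally gives somewhat different coefficients.

```r
fit_ls <- lsfit(cars$speed, cars$dist)    # least squares via lsfit()
fit_lm <- lm(dist ~ speed, data = cars)   # the same model via lm()
rbind(lsfit = unname(fit_ls$coefficients),
      lm    = unname(coef(fit_lm)))       # intercept and slope agree
z <- line(cars$speed, cars$dist)          # Tukey's resistant line
coef(z)                                   # intercept and slope of the robust fit
```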

How to actually make a quality scatterplot in R: axis(), mtext()

How to actually make a quality scatterplot in R

3D scatterplot

Rotating x axis labels for barplot

https://stackoverflow.com/questions/10286473/rotating-x-axis-labels-in-r-for-barplot

barplot(mytable,main="Car makes",ylab="Frequency",xlab="make",las=2)

Set R plots x axis to show at y=0

https://stackoverflow.com/questions/3422203/set-r-plots-x-axis-to-show-at-y-0

plot(1:10, rnorm(10), ylim=c(0,10), yaxs="i")

Different colors of axis labels in barplot

See Vary colors of axis labels in R based on another variable

Method 1: Add the labels one color at a time, because the 'col.axis' argument accepts only one color. The bar midpoints returned by barplot() give the label positions.

tN <- table(Ni <- stats::rpois(100, lambda = 5))
r <- barplot(tN, col = rainbow(20))
axis(1, r[1], LETTERS[1], col.axis="red", col="red")
axis(1, r[2], LETTERS[2], col.axis="blue", col = "blue")

Method 2: Use text(), whose 'col' argument accepts multiple colors, but we have to work out the (x, y) positions ourselves.

barplot(tN, col = rainbow(20), axisnames = F)
text(r[4:6], par("usr")[3]-2 , LETTERS[4:6], col=c("black","red","blue"), xpd=TRUE)

Use text() to draw labels on X/Y-axis including rotation

par(mar = c(5, 6, 4, 5) + 0.1)
plot(..., xaxt = "n") # "n" suppresses plotting of the axis; need mtext() and axis() to supplement
text(x = barCenters, y = par("usr")[3] - 1, srt = 45,
     adj = 1, labels = myData$names, xpd = TRUE)

Vertically stacked plots with the same x axis

https://stackoverflow.com/questions/11794436/stacking-multiple-plots-vertically-with-the-same-x-axis-but-different-y-axes-in

Include labels on the top axis/margin: axis() and mtext()

plot(1:4, rnorm(4), axes = FALSE)
axis(3, at=1:4, labels = LETTERS[1:4], tick = FALSE, line = -0.5) # las, cex.axis
box()
mtext("Groups selected", cex = 0.8, line = 1.5) # default side = 3

See also 15_Questions_All_R_Users_Have_About_Plots

This can be used to annotate each plot with the script name, date, ...

mtext(text=paste("Prepared on", format(Sys.time(), "%d %B %Y at %H:%M")), 
      adj=.99,  # text align to right 
      cex=.75, side=3, las=1, line=2)

ggplot2 uses the breaks argument instead of at. See ggplot2 → Add axis on top or right hand side, ggplot2 → scale_x_continuous(name, breaks, labels) and the scale_continuous documentation.

Legend tips

Add legend to a plot in R

Increase/decrease legend font size cex & ggplot2 package case.

plot(rnorm(100))
# op <- par(cex=2)
legend("topleft", legend = 1:4, col=1:4, pch=1, lwd=2, lty = 1, cex =2)
# par(op)

legend inset: the distance from the plot edges as a fraction (from 0 to 1) of the plot region; the default is 0. Negative values of inset move the legend outside the plot (use xpd = TRUE so that it is not clipped).

legend("bottomright", inset=.05, )
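A sketch of the negative-inset trick with made-up data. The wider right margin and xpd = TRUE are what let the legend be drawn outside the plot region. The inset value -0.35 is an arbitrary choice that depends on the device size. The same call also uses title and bty = "n".

```r
pdf(NULL)                                   # null device, no window needed
op <- par(mar = c(5, 4, 4, 8))              # widen the right margin for the legend
plot(1:10, col = rep(1:2, 5), pch = 19)
leg <- legend("topright", inset = c(-0.35, 0),   # negative inset: move right of the plot
              legend = c("group 1", "group 2"), col = 1:2, pch = 19,
              title = "Groups",             # legend title
              bty = "n",                    # no box around the legend
              xpd = TRUE)                   # allow drawing in the margin
par(op)
invisible(dev.off())
str(leg$rect)                               # legend() invisibly returns its position
```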

legend without a box

legend(, bty = "n")

Add a legend title

legend(, title = "")

Add a common legend to multiple plots. Use the layout function.

Superimpose a density plot or any curves

Use lines().

Example 1

plot(cars, main = "Stopping Distance versus Speed")
lines(stats::lowess(cars))

plot(density(x), col = "#6F69AC", lwd = 3)
lines(density(y), col = "#95DAC1", lwd = 3)
lines(density(z), col = "#FFEBA1", lwd = 3)
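Example 1 assumes x, y and z already exist. A self-contained sketch with simulated data: plot() fixes the axis limits from the first curve, so the limits are computed over all the densities first, and no curve is clipped.

```r
pdf(NULL)
set.seed(123)
x <- rnorm(200)                       # simulated data
y <- rnorm(200, mean = 1, sd = 0.5)
z <- rnorm(200, mean = -1, sd = 2)
dens <- list(x = density(x), y = density(y), z = density(z))

# Axis limits over all curves, so later curves are not cut off
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

cols <- c("#6F69AC", "#95DAC1", "#FFEBA1")
plot(dens$x, xlim = xlim, ylim = ylim, col = cols[1], lwd = 3, main = "Overlaid densities")
lines(dens$y, col = cols[2], lwd = 3)
lines(dens$z, col = cols[3], lwd = 3)
legend("topright", legend = names(dens), col = cols, lwd = 3)
invisible(dev.off())
```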

Example 2

require(survival)
n = 10000
beta1 = 2; beta2 = -1
lambdaT = 1 # baseline hazard
lambdaC = 2  # hazard of censoring
set.seed(1234)
x1 = rnorm(n,0)
x2 = rnorm(n,0)
# true event time
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)) 
C <- rweibull(n, shape=1, scale=lambdaC)   
time = pmin(T,C)  
status <- 1*(T <= C) 
status2 <- 1-status
plot(survfit(Surv(time, status2) ~ 1), 
     ylab="Survival probability",
     main = 'Exponential censoring time')
xseq <- seq(.1, max(time), length =100)
func <- function(x) 1-pweibull(x, shape = 1, scale = lambdaC)
lines(xseq, func(xseq), col = 'red') # survival function of Weibull

Example 3. Use ggplot(df, aes(x = x, color = factor(grp))) + geom_density(). Then each density curve will represent data from each "grp".

log scale

If we set the y-axis to a log scale, the points are placed at log10(Y), but the axis is still labeled with the original values. For example, when we plot c(1, 10, 100) on a log scale, the points are drawn at log10(c(1, 10, 100)) = c(0, 1, 2), while the axis labels show the true values c(1, 10, 100).

File:Logscale.png
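A small check of the point above on a null device. On a log axis, par("usr") holds the log10 values of the axis limits, while the tick labels show the original values.

```r
pdf(NULL)
v <- c(1, 10, 100)
log10(v)                          # positions used on the log axis: 0 1 2
plot(v, log = "y", yaxs = "i")    # ticks are labelled 1, 10, 100
usr <- par("usr")                 # y limits are stored on the log10 scale
ylog <- par("ylog")               # TRUE when the y axis is logarithmic
invisible(dev.off())
usr[3:4]                          # 0 2, i.e. 10^0 = 1 and 10^2 = 100
```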

Custom scales

Using custom scales with the 'scales' package

Time series

Time series stock price plot

library(quantmod)
getSymbols("AAPL")
getSymbols("IBM") # similar to AAPL
getSymbols("CSCO") # much smaller than AAPL, IBM
getSymbols("DJI") # Dow Jones, huge 
chart_Series(Cl(AAPL), TA="add_TA(Cl(IBM), col='blue', on=1); add_TA(Cl(CSCO), col = 'green', on=1)", 
    col='orange', subset = '2017::2017-08')

tail(Cl(DJI))

tidyquant: Getting stock data

The 'largest stock profit or loss' puzzle: efficient computation in R

Timeline plot

Clockify

Clockify

Circular plot

Word cloud

Text mining

World map

Visualising SSH attacks with R (rworldmap and rgeolocate packages)

Diagram/flowchart/Directed acyclic diagrams (DAGs)

DiagrammeR

diagram

Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams

DAGitty (browser-based and R package)

dagR

Gmisc

Easiest flowcharts eveR?

Concept Maps

concept-maps where the diagrams are generated from https://app.diagrams.net/.

flow

flow, How To Draw Flow Diagrams In R

Venn Diagram

Venn diagram

hexbin plot

Bump chart/Metro map

https://dominikkoch.github.io/Bump-Chart/

Amazing/special plots

See Amazing plot.

Google Analytics

GAR package

http://www.analyticsforfun.com/2015/10/query-your-google-analytics-data-with.html

Linear Programming

http://www.r-bloggers.com/modeling-and-solving-linear-programming-with-r-free-book/

Linear Algebra

Amazon Alexa

R and Singularity

https://rviews.rstudio.com/2017/03/29/r-and-singularity/

Teach kids about R with Minecraft

http://blog.revolutionanalytics.com/2017/06/teach-kids-about-r-with-minecraft.html

Secure API keys

Securely store API keys in R scripts with the "secret" package

Credentials and secrets

How to manage credentials and secrets safely in R

Hide a password

keyring package

getPass

getPass

Vision and image recognition

Creating a Dataset from an Image

Creating a Dataset from an Image in R Markdown using reticulate

Turn pictures into coloring pages

https://gist.github.com/jeroen/53a5f721cf81de2acba82ea47d0b19d0

Numerical optimization

CRAN Task View: Numerical Mathematics, CRAN Task View: Optimization and Mathematical Programming

Ryacas: R Interface to the 'Yacas' Computer Algebra System

Doing Maths Symbolically: R as a Computer Algebra System (CAS)

Game

Music

  • gm. Requires MuseScore, a free and open-source music notation software.

SAS

sasMap Static code analysis for SAS scripts

R packages

R packages

Tricks

Getting help

Better Coder/coding, best practices

E-notation

6.022E23 (or 6.022e23) is equivalent to 6.022×10^23
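A few checks in base R. Numbers written in E-notation are ordinary doubles, so comparisons with computed values should use a tolerance. format() controls whether the output uses E-notation.

```r
x <- 6.022e23
x == 6.022E23                          # TRUE: 'e' and 'E' mean the same
isTRUE(all.equal(x, 6.022 * 10^23))    # TRUE: compare computed values with a tolerance
format(x, scientific = TRUE)           # "6.022e+23"
1e5                                    # printed as 1e+05
format(1e5, scientific = FALSE)        # "100000"
```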

Getting user's home directory

See What are HOME and working directories?

# Windows
normalizePath("~")   # "C:\\Users\\brb\\Documents"
Sys.getenv("R_USER") # "C:/Users/brb/Documents"
Sys.getenv("HOME")   # "C:/Users/brb/Documents"

# Mac
normalizePath("~")   # [1] "/Users/brb"
Sys.getenv("R_USER") # [1] ""
Sys.getenv("HOME")   # "/Users/brb"

# Linux
normalizePath("~")   # [1] "/home/brb"
Sys.getenv("R_USER") # [1] ""
Sys.getenv("HOME")   # [1] "/home/brb"

tempdir()

  • The path is a per-session temporary directory. In parallel use, R processes forked by functions such as mclapply and makeForkCluster in package parallel share the same per-session temporary directory.
  • Set temporary folder for R in Rstudio server
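A short sketch of how tempdir() and tempfile() relate: tempfile() only generates a name inside the per-session directory (it does not create the file), and the directory itself is normally removed when the R session ends.

```r
td <- tempdir()                        # per-session temporary directory
f <- tempfile(fileext = ".csv")        # a new file name inside tempdir()
startsWith(f, td)                      # TRUE
file.exists(f)                         # FALSE: only the name has been generated
write.csv(head(mtcars), f)
file.exists(f)                         # TRUE
unlink(f)                              # tidy up explicitly anyway
```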

Distinguish Windows and Linux/Mac, R.Version()

identical(.Platform$OS.type, "unix") returns TRUE on Mac and Linux.

get_os <- function(){
  sysinf <- Sys.info()
  if (!is.null(sysinf)){
    os <- sysinf['sysname']
    if (os == 'Darwin')
      os <- "osx"
  } else { ## mystery machine
    os <- .Platform$OS.type
    if (grepl("^darwin", R.version$os))
      os <- "osx"
    if (grepl("linux-gnu", R.version$os))
      os <- "linux"
  }
  tolower(os)
}
names(R.Version())
#  [1] "platform"       "arch"           "os"             "system"        
#  [5] "status"         "major"          "minor"          "year"          
#  [9] "month"          "day"            "svn rev"        "language"      
# [13] "version.string" "nickname" 
getRversion()
# [1] ‘4.3.0’

Rprofile.site, Renviron.site (all platforms) and Rconsole (Windows only)

If we want to install R packages to a personal directory, follow this. Just add the line

R_LIBS_SITE=F:/R/library

to the file R_HOME/etc/x64/Renviron.site. In R, run Sys.getenv("R_LIBS_SITE") or Sys.getenv("R_LIBS_USER") to query the environment variable. See Environment Variables.

What is the best place to save Rconsole on Windows platform

Put (or create) the file <Rconsole> in the C:/Users/USERNAME/Documents folder, so that no matter how R is upgraded or downgraded, it always finds my preferences.

My preferred settings:

  • Font: Consolas (it will be shown as "TT Consolas" in Rconsole)
  • Size: 12
  • background: black
  • normaltext: white
  • usertext: GreenYellow or orange (close to RStudio's Cobalt theme) or sienna1 or SpringGreen or tan1 or yellow

and others (default options)

  • pagebg: white
  • pagetext: navy
  • highlight: DarkRed
  • dataeditbg: white
  • dataedittext: navy (View() function)
  • dataedituser: red
  • editorbg: white (edit() function)
  • editortext: black

A copy of the Rconsole is saved in github.

How R starts up

https://rstats.wtf/r-startup.html

startup - Friendly R Startup Configuration

https://github.com/henrikbengtsson/startup

Saving and loading history automatically: .Rprofile & local()

  • savehistory("filename"). It will save everything from the beginning to the command savehistory() to a text file.
  • .Rprofile will automatically be loaded when R has started from that directory
  • Don't do things in your .Rprofile that affect how R code runs, such as loading a package like dplyr or ggplot or setting an option such as stringsAsFactors = FALSE. See Project-oriented workflow.
  • .Rprofile has been created/used by the packrat package to restore a packrat environment. See the packrat/init.R file and R packages → packrat.
  • Customizing Startup from R in Action, Fun with .Rprofile and customizing R startup
    • You can also place a .Rprofile file in any directory that you are going to run R from or in the user home directory.
    • At startup, R will source the Rprofile.site file. It will then look for a .Rprofile file to source in the current working directory. If it doesn't find it, it will look for one in the user's home directory.
    options(continue="  ") # default is "+ "
    options(prompt="R> ", continue=" ")
    options(editor="nano") # default is "vi" on Linux
    # options(htmlhelp=TRUE) 
    
    local({r <- getOption("repos")
          r["CRAN"] <- "https://cran.rstudio.com"
          options(repos=r)})
    
    .First <- function(){
     # library(tidyverse)
     cat("\nWelcome at", date(), "\n")
    }
    
    .Last <- function(){
     cat("\nGoodbye at ", date(), "\n")
    }  
    
  • https://stackoverflow.com/questions/16734937/saving-and-loading-history-automatically
  • The history file will always be read from the $HOME directory and the history file will be overwritten by a new session. These two problems can be solved if we define R_HISTFILE system variable.
  • The local() function can be used in .Rprofile to run setup code (change the repository, install packages, load libraries, source R files, run the system() function, file/directory I/O, etc.) without leaving new variables in the global environment.
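A quick check of the local() idiom from the .Rprofile example above, run in an ordinary session: the repos option is changed, but the helper variable r does not remain in the global environment.

```r
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.rstudio.com"
  options(repos = r)
})
getOption("repos")[["CRAN"]]                          # "https://cran.rstudio.com"
exists("r", envir = globalenv(), inherits = FALSE)    # FALSE: 'r' stayed inside local()
```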

Linux or Mac

In ~/.profile or ~/.bashrc I put:

export R_HISTFILE=~/.Rhistory

In ~/.Rprofile I put:

if (interactive()) {
  if (.Platform$OS.type == "unix")  .First <- function() try(utils::loadhistory("~/.Rhistory")) 
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
}

Windows

If you launch R by clicking its icon on the Windows desktop, R starts in the C:\Users\USERNAME\Documents directory, so we can create a new .Rprofile file in this directory.

if (interactive()) {
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
}

Disable "Save workspace image?" prompt when exit R?

How to disable "Save workspace image?" prompt in R?

R release versions

rversions: Query the main 'R' 'SVN' repository to find the released versions & dates.

getRversion()

getRversion()
[1] ‘4.3.0’
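getRversion() returns a version object, so it can be compared with a version string directly. The sketch below shows why this is safer than comparing plain strings.

```r
getRversion() >= "4.3.0"               # TRUE or FALSE depending on the running R
"4.10.0" > "4.9.1"                     # FALSE: plain string comparison
package_version("4.10.0") > "4.9.1"    # TRUE: compares version components numerically
if (getRversion() >= "4.3.0") cat("Running R 4.3.0 or later\n")
```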

Detect number of running R instances in Windows

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rscript.exe"
INFO: No tasks are running which match the specified criteria.

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rgui.exe"

Image Name                     PID Session Name        Session#    Mem Usage
============================================================================
Rgui.exe                      1096 Console                    1     44,712 K

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rserve.exe"

Image Name                     PID Session Name        Session#    Mem Usage
============================================================================
Rserve.exe                    6108 Console                    1    381,796 K

In R, we can use

> system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE)
[1] ""                                                                            
[2] "Image Name                     PID Session Name        Session#    Mem Usage"
[3] "============================================================================"
[4] "Rgui.exe                      1096 Console                    1     44,804 K"

> length(system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE))-3

Editor

http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs

  • Emacs + ESS. ESS is useful when I want to tidy R code (the tidy_source() function in the formatR package sometimes gives errors, e.g. when I tested it on an R file like <GetComparisonResults.R> from BRB-ArrayTools v4.4 stable).
    • Edit the file C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el with something like
    (setq-default inferior-R-program-name
                  "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")
    

GUI for Data Analysis

Update to Data Science Software Popularity 6/7/2023

BlueSky Statistics

Rcmdr

http://cran.r-project.org/web/packages/Rcmdr/index.html. After loading a dataset, click Statistics -> Fit models. Then select Linear regression, Linear model, GLM, Multinomial logit model, Ordinal regression model, Linear mixed model, and Generalized linear mixed model. However, Rcmdr does not include, e.g. random forest, SVM, glmnet, et al.

Deducer

http://cran.r-project.org/web/packages/Deducer/index.html

jamovi

Scope

See

source()

## foo.R ##
cat(ArrayTools, "\n")
## End of foo.R

# 1. Error
predict <- function() {
  ArrayTools <- "C:/Program Files" # or through load() function 
  source("foo.R")                  # or through a function call; foo()
}
predict()   # Object ArrayTools not found

# 2. OK. Make the variable global
predict <- function() {
  ArrayTools <<- "C:/Program Files"
  source("foo.R")
}
predict()  
ArrayTools

# 3. OK. Create a global variable
ArrayTools <- "C:/Program Files"
predict <- function() {
  source("foo.R")
}
predict()
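Case 1 fails because source() evaluates the file in the global environment by default. Another option, sketched below, is source(local = TRUE), which evaluates the file in the calling function's environment, so no global variable is needed. The script is written to a temporary file here, and run_local() is a hypothetical name.

```r
# A one-line stand-in for foo.R that uses a variable it does not define
script <- tempfile(fileext = ".R")
writeLines('msg <- paste("ArrayTools is at", ArrayTools)', script)

run_local <- function() {
  ArrayTools <- "C:/Program Files"   # local to this function
  source(script, local = TRUE)       # evaluate the script in this function's environment
  msg                                # created by the script, also local
}
out <- run_local()
out                                                          # "ArrayTools is at C:/Program Files"
exists("ArrayTools", envir = globalenv(), inherits = FALSE)  # FALSE: nothing became global
```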

Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function.

Example 1.

> ttt <- data.frame(type=letters[1:5], JpnTest=rep("999", 5), stringsAsFactors = F)
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999
> jpntest <- function() { ttt$JpnTest[1] ="N5"; print(ttt)}
> jpntest()
  type JpnTest
1    a      N5
2    b     999
3    c     999
4    d     999
5    e     999
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999

Example 2. How can we set global variables inside a function? The answer is to use the "<<-" operator or assign(, , envir = .GlobalEnv) function.
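A sketch of both approaches, reusing the ttt data frame from Example 1. The names counter, bump() and jpntest_global() are made up for illustration. <<- assigns in the enclosing environment (the global one here, because the function is defined at top level), while assign() names the target environment explicitly.

```r
ttt <- data.frame(type = letters[1:5], JpnTest = rep("999", 5), stringsAsFactors = FALSE)

# 1. '<<-' assigns in the enclosing environment (global here)
counter <- 0
bump <- function() counter <<- counter + 1
bump(); bump()
counter                                   # 2

# 2. Copy, modify, and write back explicitly with assign()
jpntest_global <- function() {
  tmp <- get("ttt", envir = .GlobalEnv)   # read the global data frame
  tmp$JpnTest[1] <- "N5"                  # change the local copy
  assign("ttt", tmp, envir = .GlobalEnv)  # overwrite the global object
  invisible(tmp)
}
jpntest_global()
ttt$JpnTest                               # "N5" "999" "999" "999" "999"
```

Returning the modified object and reassigning it (ttt <- f(ttt)) avoids these side effects altogether and is usually easier to debug.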

Other resource: Advanced R by Hadley Wickham.

Example 3. Writing functions in R, keeping scoping in mind

New environment

Run the same function on a bunch of R objects

mye = new.env()
load(<filename>, mye)
for(n in names(mye)) mye[[n]] <- tibble::as_tibble(mye[[n]])
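A self-contained sketch of loading an .rda file into its own environment and applying the same function to every object in it. The two objects and the temporary file stand in for a real <filename>. Note that the loop assigns back into mye[[n]]; assigning to the loop variable n would change nothing.

```r
# Create a temporary .rda file holding two objects
rda <- tempfile(fileext = ".rda")
a <- head(mtcars, 3); b <- head(iris, 5)
save(a, b, file = rda)
rm(a, b)

mye <- new.env()
loaded <- load(rda, envir = mye)         # load() returns the names of the objects it created
sizes <- eapply(mye, nrow)               # apply one function to every object in 'mye'
for (n in names(mye)) mye[[n]] <- mye[[n]][1:2, ]   # modify each object in place
```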

Just look at the contents of rda file without saving to anywhere (?load)

local({
   load("myfile.rda")
   ls()
})

Or use attach() which is a wrapper of load(). It creates an environment and slots it into the list right after the global environment, then populates it with the objects we're attaching.

attach("all.rda") # safer and will warn about masked objects w/ same name in .GlobalEnv
ls(pos = 2)
##  also typically need to cleanup the search path:
detach("file:all.rda")

If we want to read data from the internet, load() works but attach() does not.

con <- url("http://some.where.net/R/data/example.rda")
## print the value to see what objects were created.
print(load(con))
close(con)
# Github example
# https://stackoverflow.com/a/62954840

source() case.

myEnv <- new.env()    
source("some_other_script.R", local=myEnv)
attach(myEnv, name="sourced_scripts")
search()
ls(2)
ls(myEnv)
with(myEnv, print(x))

str( , max) function

Use the max.level parameter to avoid a long display of the structure of a complex R object. Use give.attr = FALSE to hide the attributes (give.head = FALSE hides the type and length header instead). See ?str

If we use str() on a function like str(lm), it is equivalent to args(lm)

For a complicated list object, it is useful to use the max.level argument; e.g. str(, max.level = 1)

For a large data frame, we can convert it to a tibble, which prints only the first rows; e.g. mydf %>% as_tibble()
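A small illustration with a made-up nested list and vector: max.level limits the nesting depth shown, and give.attr = FALSE drops the attribute lines.

```r
x <- list(a = 1, b = list(c = 2, d = list(e = 3)))
str(x)                          # all levels
str(x, max.level = 1)           # top level only; 'b' is shown as "List of 2"

y <- structure(1:3, note = "made-up attribute")
str(y)                          # includes a line for attr(*, "note")
str(y, give.attr = FALSE)       # attributes hidden
str(y, give.head = FALSE)       # hides the "int [1:3]" header instead
```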

tidy() function

broom::tidy() provides a simplified form of an R object (obtained from running some analysis). See here.

View all objects present in a package, ls()

https://stackoverflow.com/a/30392688. In the case of an R package created by Rcpp.package.skeleton("mypackage"), we will get

> devtools::load_all("mypackage")
> search()
 [1] ".GlobalEnv"        "devtools_shims"    "package:mypackage"
 [4] "package:stats"     "package:graphics"  "package:grDevices"
 [7] "package:utils"     "package:datasets"  "package:methods"
[10] "Autoloads"         "package:base"

> ls("package:mypackage")
[1] "_mypackage_rcpp_hello_world" "evalCpp"                     "library.dynam.unload"       
[4] "rcpp_hello_world"            "system.file"

Note that the first argument of ls() (or detach()) is used to specify the environment. It can be

  • an integer (the position in the ‘search’ list);
  • the character string name of an element in the search list;
  • an explicit ‘environment’ (including using ‘sys.frame’ to access the currently active function calls).
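The three forms listed above can be checked against each other. This sketch uses the stats package, which is attached in a default R session.

```r
# The same environment, referred to in the three ways listed above.
pos_stats <- match("package:stats", search())    # integer position in search()
by_pos  <- ls(pos_stats)
by_name <- ls("package:stats")
by_env  <- ls(as.environment("package:stats"))
identical(by_pos, by_name) && identical(by_name, by_env)
```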

Speedup R code

Profiler

&& vs &

See https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/Logic.

  • The shorter forms (& and |) perform elementwise comparisons in much the same way as arithmetic operators. The return is a vector.
  • The longer forms (&& and ||) evaluate left to right and stop as soon as the result is determined. The return is a single value. Older versions of R silently examined only the first element of each vector; R 4.2.0 made arguments of length greater than one a warning and R 4.3.0 made them an error.
  • The idea of the longer form && in R seems to be the same as the && operator in the Linux shell; see here.
  • Single or double?: AND operator and OR operator in R. The confusion might come from the inconsistency when choosing these operators in different languages. For example, in C, & performs bitwise AND, while && does Boolean logical AND.
  • Think of && as a stricter &
c(T,F,T) & c(T,T,T)
# [1]  TRUE FALSE  TRUE
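# The next three calls only work in R < 4.3.0 (R 4.2.x warns); R >= 4.3.0
# signals an error because the arguments have length greater than one.
c(T,F,T) && c(T,T,T)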
# [1] TRUE
c(T,F,T) && c(F,T,T)
# [1] FALSE
c(T,F,T) && c(NA,T,T)
# [1] NA
# Assume 'b' is not defined
> if (TRUE && b==3) cat("end")
Error: object 'b' not found
> if (FALSE && b==3) cat("end")
> # No error since the 2nd condition is never evaluated
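In current R, && needs single values. When we start with logical vectors, we have to reduce them explicitly. This sketch shows all(), any() and explicit indexing as replacements for the old implicit first-element behaviour.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, TRUE)
x & y          # elementwise: TRUE FALSE TRUE
all(x & y)     # are all pairs TRUE?      FALSE
any(x & y)     # is at least one TRUE?    TRUE
x[1] && y[1]   # explicit first-element test
```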

It's useful in functions: we don't need nested if statements. In this example, if 'arg' is missing, 'L' is never evaluated, so there is no error even though 'L' has no value.

> foo <- function(arg, L) {
   # Suppose 'L' is meaningful only if 'arg' is provided
   # 
   # Evaluate 'L' only if 'arg' is provided
   #
   if (!missing(arg) && L) {
     print("L is true")
   } else {
     print("Either arg is missing or L is FALSE")
   }
 }
> foo()
[1] "Either arg is missing or L is FALSE"
> foo("a", F)
[1] "Either arg is missing or L is FALSE"
> foo("a", T)
[1] "L is true"

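Other examples: because && short-circuits, the right-hand condition is only evaluated (and only needs to be valid) when the left-hand condition is TRUE.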

nspot <- ifelse(missing(rvm) || !rvm, nrow(exprTrain), sum(filter))

if (!is.null(exprTest) && any(is.na(exprTest))) { ... }

for-loop, control flow

Vectorization

sapply vs vectorization

Speed test: sapply vs vectorization

lapply vs for loop

split() and sapply()

split() can be used to split a vector, columns or rows. See How to split a data frame?

  • Split divides the data in the vector or data frame x into the groups defined by f. The syntax is
    split(x, f, drop = FALSE, …)
    
  • split() + cut(). How to Split Data into Equal Sized Groups in R: A Comprehensive Guide for Beginners
  • Split a vector into chunks. split() returns a list. If we split indices instead of the data, the resulting index vectors can be used in lapply() to subset the data. This is useful for split() + lapply() + do.call() or split() + sapply() operations.
    d <- 1:10
    chunksize <- 4
    ceiling(1:10/4)
    # [1] 1 1 1 1 2 2 2 2 3 3
    split(d, ceiling(seq_along(d)/chunksize))
    # $`1`
    # [1] 1 2 3 4
    #
    # $`2`
    # [1] 5 6 7 8
    #
    # $`3`
    # [1]  9 10
    do.call(c, lapply(split(d, ceiling(seq_along(d)/4)), function(x) sum(x)) ) 
    #  1  2  3 
    # 10 26 19
    
    # from the bigmemory vignette (x and birthmonth() are defined there)
    planeindices <- split(1:nrow(x), x[,'TailNum'])
    planeStart <- sapply(planeindices,
                         function(i) birthmonth(x[i, c('Year','Month'),
                                                drop=FALSE]))
    
  • Split rows of a data frame/matrix; e.g. rows represent genes. The data frame/matrix is split directly.
    split(mtcars,mtcars$cyl)
    
    split(data.frame(matrix(1:20, nr=10) ), ceiling(1:10/chunksize)) # data.frame/tibble works
    split.data.frame(matrix(1:20, nr=10), ceiling(1:10/chunksize))   # split.data.frame() works for matrices
    
  • Split columns of a data frame/matrix.
    ma <- cbind(x = 1:10, y = (-4:5)^2, z = 11:20)
    split(ma, cbind(rep(1,10), rep(2, 10), rep(1,10))) # not an interesting example
    # $`1`
    #  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    #
    # $`2`
    #  [1] 16  9  4  1  0  1  4  9 16 25
    
  • split() + sapply() to merge columns. See below Mean of duplicated columns for more detail.
  • split() + sapply() to split a vector. See nsFilter() function which can remove duplicated probesets/rows using unique Entrez Gene IDs (genefilter package). The source code of nsFilter() and findLargest().
    tSsp = split.default(testStat, lls) 
    # testStat is a vector of numerics including probeset IDs as names
    # lls is a vector of entrez IDs (same length as testStat)
    # tSsp is a list with one element per unique value of lls.
    
    sapply(tSsp, function(x) names(which.max(x))) 
    # returns a vector of probeset IDs, one per unique Entrez ID
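A short split-apply-combine sketch on the built-in mtcars data. It uses unsplit() to put per-group results back in the original row order. Centering mpg within each cylinder group is just an illustrative task.

```r
pieces   <- split(mtcars$mpg, mtcars$cyl)   # list named "4", "6", "8"
grp_mean <- sapply(pieces, mean)            # one mean per group
# Center mpg within each group, then restore the original row order
centered <- unsplit(lapply(pieces, function(v) v - mean(v)), mtcars$cyl)
head(cbind(cyl = mtcars$cyl, mpg = mtcars$mpg, centered))
```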
    

strsplit and sapply

> namedf <- c("John ABC", "Mary CDE", "Kat FGH")
> strsplit(namedf, " ")
[[1]]
[1] "John" "ABC" 

[[2]]
[1] "Mary" "CDE" 

[[3]]
[1] "Kat" "FGH"

> sapply(strsplit(namedf, " "), "[", 1)
[1] "John" "Mary" "Kat" 
> sapply(strsplit(namedf, " "), "[", 2)
[1] "ABC" "CDE" "FGH"
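Two alternatives to sapply(..., "[", i). They assume every string splits into the same number of pieces, as in namedf.

```r
namedf <- c("John ABC", "Mary CDE", "Kat FGH")
parts <- strsplit(namedf, " ")
m <- do.call(rbind, parts)              # a 3 x 2 character matrix
m[, 1]                                  # "John" "Mary" "Kat"
vapply(parts, `[`, character(1), 2)     # type-checked version of sapply(..., "[", 2)
```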

Mean of duplicated columns: rowMeans; compute Means by each row

  • Reduce columns of a matrix by a function in R. To use rowMedians() instead of rowMeans(), we need to install matrixStats from CRAN.
    set.seed(1)
    x <- matrix(1:60, nr=10); x[1, 2:3] <- NA
    colnames(x) <- c("b", "b", "b", "c", "a", "a"); x
    res <- sapply(split(1:ncol(x), colnames(x)), 
                  function(i) rowMeans(x[, i, drop=F], na.rm = TRUE))
    res  # notice the sorting of columns
           a  b  c
     [1,] 46  1 31
     [2,] 47 12 32
     [3,] 48 13 33
     [4,] 49 14 34
     [5,] 50 15 35
     [6,] 51 16 36
     [7,] 52 17 37
     [8,] 53 18 38
     [9,] 54 19 39
    [10,] 55 20 40
    
    # vapply() is safer than sapply(). 
    # The 3rd arg in vapply() is a template of the return value.
    res2 <- vapply(split(1:ncol(x), colnames(x)), 
                   function(i) rowMeans(x[, i, drop=F], na.rm = TRUE),
                   rep(0, nrow(x)))
  • colSums, rowSums, colMeans, rowMeans (no group variable). These functions are equivalent to use of ‘apply’ with ‘FUN = mean’ or ‘FUN = sum’ with appropriate margins, but are a lot faster.
    rowMeans(x, na.rm=T)
    # [1] 31 27 28 29 30 31 32 33 34 35
    
    apply(x, 1, mean, na.rm=T)
    # [1] 31 27 28 29 30 31 32 33 34 35
    
  • matrixStats: Functions that Apply to Rows and Columns of Matrices (and to Vectors)
  • From for() loops to the split-apply-combine paradigm for column-wise tasks: the transition for a dinosaur

Mean of duplicated rows: colMeans and rowsum

  • colMeans(x, na.rm = FALSE, dims = 1) takes the mean of each column (averaging over rows) and returns a vector. Similar functions include colSums, rowSums and rowMeans.
    x <- matrix(1:60, nr=10); x[1, 2:3] <- NA; x
    rownames(x) <- c(rep("b", 2), rep("c", 3), rep("d", 4), "a") # move 'a' to the last
    res <- sapply(split(1:nrow(x), rownames(x)), 
                  function(i) colMeans(x[i, , drop=F], na.rm = TRUE))
    res <- t(res) # transpose is needed since sapply() will form the resulting matrix by columns
    res  # still a matrix, rows are ordered
    #   [,1] [,2] [,3] [,4] [,5] [,6]
    # a 10.0 20.0 30.0 40.0 50.0 60.0
    # b  1.5 12.0 22.0 31.5 41.5 51.5
    # c  4.0 14.0 24.0 34.0 44.0 54.0
    # d  7.5 17.5 27.5 37.5 47.5 57.5
    table(rownames(x))
    # a b c d
    # 1 2 3 4
    
    aggregate(x, list(rownames(x)), FUN=mean, na.rm = T) # EASY, but it becomes a data frame, rows are ordered
    #   Group.1   V1   V2   V3   V4   V5   V6
    # 1       a 10.0 20.0 30.0 40.0 50.0 60.0
    # 2       b  1.5 12.0 22.0 31.5 41.5 51.5
    # 3       c  4.0 14.0 24.0 34.0 44.0 54.0
    # 4       d  7.5 17.5 27.5 37.5 47.5 57.5
    
  • Reduce multiple probes by the maximally expressed probe (set) measured by average intensity across arrays
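  • rowsum(x, group, reorder = TRUE, …) sums the rows of x within each group and returns a matrix. This is very special: it is not the same as rowSums(), and there is no "colsum" function. It is faster than sapply() + colSums() or aggregate(). Dividing by the group sizes, as below, gives the mean only when there are no NAs. For group b, the NA in row 1 is dropped from the sums of columns 2 and 3 but still counted, so we get 6.0 and 11.0 instead of 12.0 and 22.0.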
    group <- rownames(x)
    rowsum(x, group, na.rm=T)/as.vector(table(group))
    #   [,1] [,2] [,3] [,4] [,5] [,6]
    # a 10.0 20.0 30.0 40.0 50.0 60.0
    # b  1.5  6.0 11.0 31.5 41.5 51.5
    # c  4.0 14.0 24.0 34.0 44.0 54.0
    # d  7.5 17.5 27.5 37.5 47.5 57.5
    
  • by() function. Calculating change from baseline in R
  • See aggregate Function in R- A powerful tool for data frames & summarize in r, Data Summarization In R
  • aggregate() function. Too slow! http://slowkow.com/2015/01/28/data-table-aggregate/. Don't use aggregate post.
    > attach(mtcars)
    > dim(mtcars)
    [1] 32 11
    > head(mtcars)
                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    > with(mtcars, table(cyl, vs))
       vs
    cyl  0  1
      4  1 10
      6  3  4
      8 14  0
    > aggdata <-aggregate(mtcars, by=list(cyl,vs),  FUN=mean, na.rm=TRUE)
    > print(aggdata)
      Group.1 Group.2      mpg cyl   disp       hp     drat       wt     qsec vs
    1       4       0 26.00000   4 120.30  91.0000 4.430000 2.140000 16.70000  0
    2       6       0 20.56667   6 155.00 131.6667 3.806667 2.755000 16.32667  0
    3       8       0 15.10000   8 353.10 209.2143 3.229286 3.999214 16.77214  0
    4       4       1 26.73000   4 103.62  81.8000 4.035000 2.300300 19.38100  1
    5       6       1 19.12500   6 204.55 115.2500 3.420000 3.388750 19.21500  1
             am     gear     carb
    1 1.0000000 5.000000 2.000000
    2 1.0000000 4.333333 4.666667
    3 0.1428571 3.285714 3.500000
    4 0.7000000 4.000000 1.500000
    5 0.0000000 3.500000 2.500000
    > detach(mtcars)
    
    # Another example: select rows with a minimum value from a certain column (yval in this case)
    > mydf <- read.table(header=T, text='
     id xval yval
     A 1  1
     A -2  2
     B 3  3
     B 4  4
     C 5  5
     ')
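    > # aggregate(..., FUN=function(x) x[which.min(y)]) does not work as intended: y is
    > # the whole yval column, so which.min(y) is always 1 and FUN just returns the first
    > # row of each group. Compare yval with its per-group minimum instead (ties keep all rows):
    > mydf[mydf$yval == ave(mydf$yval, mydf$id, FUN = min), ]
      id xval yval
    1  A    1    1
    3  B    3    3
    5  C    5    5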
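The rowsum() shortcut for group means divides by the group sizes, which is wrong when NAs are present. A sketch that divides by the number of non-NA values per group and column instead, reusing the matrix and row labels from the colMeans example:

```r
x <- matrix(1:60, nrow = 10)
x[1, 2:3] <- NA
group <- c(rep("b", 2), rep("c", 3), rep("d", 4), "a")

sums   <- rowsum(x, group, na.rm = TRUE)   # NAs dropped from the sums
counts <- rowsum((!is.na(x)) * 1, group)   # non-NA values per group and column
means  <- sums / counts
means   # row b is now 1.5 12.0 22.0 31.5 41.5 51.5, as from sapply() + colMeans()
```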
    

Mean by Group

Mean by Group in R (2 Examples) | dplyr Package vs. Base R

aggregate(x = iris$Sepal.Length,                # Specify data column
          by = list(iris$Species),              # Specify group indicator
          FUN = mean)                           # Specify function (i.e. mean)
library(dplyr)
iris %>%                                        # Specify data frame
  group_by(Species) %>%                         # Specify group indicator
  summarise_at(vars(Sepal.Length),              # Specify column
               list(name = mean))               # Specify function
  • ave(x, ..., FUN),
  • aggregate(x, by, FUN),
  • by(x, INDICES, FUN): return is a list
  • tapply(): return results as a matrix or array. Useful for ragged array.
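A quick comparison of the base R options on the built-in iris data. Each computes the same group means but returns a different shape.

```r
m_tapply <- tapply(iris$Sepal.Length, iris$Species, mean)              # one value per group (1-d array)
m_agg    <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean) # data frame
m_ave    <- ave(iris$Sepal.Length, iris$Species)                       # FUN defaults to mean; one value per row
m_tapply
m_agg
head(m_ave)
```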

Apply family

Vectorize, aggregate, apply, by, eapply, lapply, mapply, rapply, replicate, scale, sapply, split, tapply, and vapply.

The following list gives a hierarchical relationship among these functions.

  • apply(X, MARGIN, FUN, ...) – Apply Functions Over Array Margins
  • lapply(X, FUN, ...) – Apply a Function over a List (including a data frame) or Vector X.
    • sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) – Apply a Function over a List or Vector
      • replicate(n, expr, simplify = "array")
    • mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) – Multivariate version of sapply
      • Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE) - Vectorize a Scalar Function
      • Map(FUN, ...) A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.
    • vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) – similar to sapply, but has a pre-specified type of return value
    • rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...) – A recursive version of lapply
  • tapply(V, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE) – Apply a Function Over a "Ragged" Array. V is typically a vector where split() will be applied. INDEX is a list of one or more factors.
    • aggregate(D, by, FUN, ..., simplify = TRUE, drop = TRUE) - Apply a function to each column of each subset of a data frame split by factors. FUN (such as mean(), weighted.mean(), sum()) is a simple function applied to a vector. D is typically a data frame. This is used to summarize data.
    • by(D, INDICES, FUN, ..., simplify = TRUE) - Apply a Function to each subset data frame split by factors. FUN (such as summary(), lm()) is applied to a data frame. D is typically a data frame.
  • eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE) – Apply a Function over values in an environment

Difference between apply vs sapply vs lapply vs tapply?

  • apply - When you want to apply a function to the rows or columns (or both) of a matrix. The output is typically a vector if FUN returns a single value per row or column, and a matrix otherwise.
  • lapply - When you want to apply a function to each element of a list in turn and get a list back.
  • sapply - When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.
  • tapply - When you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

Some short examples:
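A sketch using small made-up inputs, one call per function described above:

```r
m <- matrix(1:6, nrow = 2)          # 2 x 3 matrix: rows 1 3 5 and 2 4 6
apply(m, 1, sum)                    # row sums: 9 12
apply(m, 2, sum)                    # column sums: 3 7 11

lst <- list(a = 1:3, b = 4:6)
lapply(lst, mean)                   # a list with elements a = 2, b = 5
sapply(lst, mean)                   # a named numeric vector
vapply(lst, mean, numeric(1))       # same, but the return type is checked

tapply(c(1, 2, 3, 4), c("x", "y", "x", "y"), sum)   # sums within groups: x = 4, y = 6
```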

Apply vs for loop

Note that, apply's performance is not always better than a for loop. See

Progress bar

What is the cost of a progress bar in R?

The package 'pbapply' creates a text-mode progress bar; it works on any platform. On Windows, check out this post. It uses the winProgressBar() and setWinProgressBar() functions.

e-Rum 2020 Slides on Progressr by Henrik Bengtsson. progressr 0.8.0: RStudio's progress bar, Shiny progress updates, and absolute progress, progressr 0.10.1: Plyr Now Supports Progress Updates also in Parallel
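Base R's utils package also has a text progress bar that needs no extra package. This is a minimal sketch in which squaring i stands in for real work.

```r
n  <- 5
pb <- txtProgressBar(min = 0, max = n, style = 3)   # style 3 shows a percentage
res <- numeric(n)
for (i in seq_len(n)) {
  res[i] <- i^2              # placeholder for the real work
  setTxtProgressBar(pb, i)   # advance the bar
}
close(pb)
```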

simplify option in sapply()

library(KEGGREST)

names1 <- keggGet(c("hsa05340", "hsa05410"))
names2 <- sapply(names1, function(x) x$GENE)
length(names2)  # same if we use lapply() above
# [1] 2

names3 <- keggGet(c("hsa05340"))
names4 <- sapply(names3, function(x) x$GENE)
length(names4)  # not 1: sapply() simplified the single result into a one-column matrix
# [1] 76
names4 <- sapply(names3, function(x) x$GENE, simplify = FALSE)
length(names4)  # same if we use lapply() w/o simplify 
# [1] 1
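The same pitfall can be reproduced without KEGG. In this sketch, f is a toy function that returns vectors of different lengths.

```r
f <- function(k) seq_len(k)
r1 <- sapply(c(2, 3), f)               # lengths differ: a list of 2
r2 <- sapply(3, f)                     # one result of length 3: a 3 x 1 matrix
r3 <- sapply(3, f, simplify = FALSE)   # always a list, like lapply()
str(r1); str(r2); str(r3)
```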

lapply and its friends Map(), Reduce(), Filter() from the base package for manipulating lists

  • mapply() documentation. Use mapply() to merge lists.
    mapply(rep, 1:4, 4:1)
    mapply(rep, times = 1:4, x = 4:1)
    mapply(function(x, y) seq_len(x) + y,
           c(a =  1, b = 2, c = 3),  # names from first
           c(A = 10, B = 0, C = -10))
    mapply(c, firstList, secondList, SIMPLIFY=FALSE)  # firstList, secondList: two lists of the same length
    
  • Finding the Expected value of the maximum of two Bivariate Normal variables with simulation sapply + mapply.
    # x: an n x 2 matrix of simulated bivariate normal draws (see the linked post)
    z <- mapply(function(u, v) { max(u, v) }, 
                u = x[, 1], v = x[, 2])
    
  • Map() and Reduce() in functional programming
  • Map(), Reduce(), and Filter() from Advanced R by Hadley
    • If you have two or more lists (or data frames) that you need to process in parallel, use Map(). One good example is to compute the weighted.mean() function that requires two input objects. Map() is similar to mapply() function and is more concise than lapply(). Advanced R has a comment that Map() is better than mapply().
      # Syntax: Map(f, ...)
      
      xs <- replicate(5, runif(10), simplify = FALSE)
      ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
      Map(weighted.mean, xs, ws)
      
      # instead of a more clumsy way
      lapply(seq_along(xs), function(i) {
        weighted.mean(xs[[i]], ws[[i]])
      })
      
    • Reduce() reduces a vector, x, to a single value by recursively calling a function, f, two arguments at a time. A good example of using Reduce() function is to read a list of matrix files and merge them. See How to combine multiple matrix frames into one using R?
      # Syntax: Reduce(f, x, ...)
      
      > m1 <- data.frame(id=letters[1:4], val=1:4)
      > m2 <- data.frame(id=letters[2:6], val=2:6)
      > merge(m1, m2, "id", all = T)
        id val.x val.y
      1  a     1    NA
      2  b     2     2
      3  c     3     3
      4  d     4     4
      5  e    NA     5
      6  f    NA     6
      > m <- list(m1, m2)
      > Reduce(function(x,y) merge(x,y, "id",all=T), m)
        id val.x val.y
      1  a     1    NA
      2  b     2     2
      3  c     3     3
      4  d     4     4
      5  e    NA     5
      6  f    NA     6
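Filter() is named above but has no example. This sketch also shows the accumulate option of Reduce(), which keeps the intermediate results.

```r
Filter(function(v) v %% 2 == 0, 1:10)   # keep the even numbers: 2 4 6 8 10
Reduce(`+`, 1:5, accumulate = TRUE)      # running sums: 1 3 6 10 15
```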
      

sapply & vapply

See parallel::parSapply() for a parallel version of sapply(1:n, function(x) ...). We can use this technique to speed up this example.
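A minimal sketch, assuming the machine can start two local worker processes. For a toy function like i^2, the cluster overhead outweighs any gain; the point is only that parSapply() returns the same result as sapply().

```r
library(parallel)
cl <- makeCluster(2)                               # two local worker processes
res_par <- parSapply(cl, 1:10, function(i) i^2)
stopCluster(cl)
res_seq <- sapply(1:10, function(i) i^2)
identical(res_par, res_seq)
```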

rapply - recursive version of lapply
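A small sketch of rapply() on a made-up nested list. It applies a function only to the numeric leaves and compares how = "unlist" with how = "replace".

```r
x <- list(a = c(1, 2), b = list(c = 3, d = "text"))
# how = "unlist": non-matching leaves become deflt (NULL) and disappear
rapply(x, function(v) v * 10, classes = "numeric", how = "unlist")
# how = "replace": the structure is kept, non-matching leaves are left as-is
r <- rapply(x, function(v) v * 10, classes = "numeric", how = "replace")
str(r)
```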

replicate

https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r

> replicate(5, rnorm(3))
           [,1]       [,2]       [,3]      [,4]        [,5]
[1,]  0.2509130 -0.3526600 -0.3170790  1.064816 -0.53708856
[2,]  0.5222548  1.5343319  0.6120194 -1.811913 -1.09352459
[3,] -1.9905533 -0.8902026 -0.5489822  1.308273  0.08773477

See parSapply() for a parallel version of replicate().

Vectorize

> rep(1:4, 4:1)
 [1] 1 1 1 1 2 2 2 3 3 4
> vrep <- Vectorize(rep.int)
> vrep(1:4, 4:1)
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4
> rweibull(1, 1, c(1, 2)) # n = 1, so only the first scale value (1) is used
[1] 2.17123
> Vectorize("rweibull")(n=1, shape = 1, scale = c(1, 2)) 
[1] 1.6491761 0.9610109
myfunc <- function(a, b) a*b
myfunc(1, 2) # 2
myfunc(3, 5) # 15
myfunc(c(1,3), c(2,5)) # 2 15
Vectorize(myfunc)(c(1,3), c(2,5)) # 2 15

myfunc2 <- function(a, b) if (length(a) == 1) a * b else NA
myfunc2(1, 2) # 2 
myfunc2(3, 5) # 15
myfunc2(c(1,3), c(2,5)) # NA
Vectorize(myfunc2)(c(1, 3), c(2, 5)) # 2 15
Vectorize(myfunc2)(c(1, 3, 6), c(2, 5)) # 2 15 12
                                        # the shorter argument is recycled (mapply() warns since 3 is not a multiple of 2)

plyr and dplyr packages

Practical Data Science for Stats - a PeerJ Collection

The Split-Apply-Combine Strategy for Data Analysis (plyr package) in J. Stat Software.

A quick introduction to plyr with a summary of apply functions in R and compare them with functions in plyr package.

  1. plyr has a common syntax -- easier to remember
  2. plyr requires less code since it takes care of the input and output format
  3. plyr can easily be run in parallel -- faster

Tutorials

Examples of using dplyr:

tibble

Tidy DataFrames but not Tibbles

Tibble objects

  • it does not have row names (cf data frame),
  • it never changes the type of the inputs (e.g. it never converts strings to factors!),
  • it never changes the names of variables

To show all rows or columns of a tibble object,

print(tbObj, n= Inf)

print(tbObj, width = Inf)

If we try to match() against a column taken with an index, e.g. tbObj[, 1], we get zero matches. Unlike a data frame, a tibble never drops to a vector, so tbObj[, 1] is still a one-column tibble.

Subsetting: to extract a column from a tibble object, use [[ or $ or dplyr::pull(). Select Data Frame Columns in R.

TibbleObject$VarName
# OR
TibbleObject[["VarName"]]
# OR
pull(TibbleObject, VarName) # won't be a tibble object anymore

# For multiple columns, use select()
dplyr::select(TibbleObject, -c(VarName1, VarName2)) # drop two columns; still a tibble object
# OR
dplyr::select(TibbleObject, 2:5) # columns 2 to 5; still a tibble object
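A sketch of the match() pitfall mentioned above, assuming the tibble package is installed. The one-column data frame is made up for illustration.

```r
library(tibble)
df <- data.frame(id = c("a", "b", "c"))
tb <- as_tibble(df)

class(df[, 1])        # "character": a data frame drops to a vector
class(tb[, 1])        # still a tibble: tibbles never drop
match("b", tb[, 1])   # no match, because the table is a one-column tibble
match("b", tb[[1]])   # 2
match("b", tb$id)     # 2
```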

Convert a data frame to a tibble See Tibble Data Format in R: Best and Modern Way to Work with Your Data

my_data <- as_tibble(iris)
class(my_data)

llply()

llply is equivalent to lapply except that it will preserve labels and can display a progress bar. This is handy for a long-running job such as:

LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])

where rLLID is a list of entrez ID. For example,

get("org.Hs.egGO")[["6772"]]

returns a list of 49 GOs.

ddply()

http://lamages.blogspot.com/2012/06/transforming-subsets-of-data-in-r-with.html

ldply()

An R Script to Automatically download PubMed Citation Counts By Year of Publication

Performance/speed comparison

Performance comparison of converting list to data.frame with R language

Using R's set.seed() to set seeds for use in C/C++ (including Rcpp)

http://rorynolan.rbind.io/2018/09/30/rcsetseed/

get_seed()

See the same blog

get_seed <- function() {
  sample.int(.Machine$integer.max, 1)
}

Note: .Machine$integer.max = 2147483647 = 2^31 - 1.

Random seeds

By default there is no seed when R starts. One is created from the current time and the process ID the first time a random number is needed. See ?Random.

set.seed(as.numeric(Sys.time()))

set.seed(as.numeric(Sys.Date()))  # same seed for each day

.Machine and the largest integer, double

See ?.Machine.

                          Linux/Mac  32-bit Windows 64-bit Windows
double.eps              2.220446e-16   2.220446e-16   2.220446e-16
double.neg.eps          1.110223e-16   1.110223e-16   1.110223e-16
double.xmin            2.225074e-308  2.225074e-308  2.225074e-308
double.xmax            1.797693e+308  1.797693e+308  1.797693e+308
double.base             2.000000e+00   2.000000e+00   2.000000e+00
double.digits           5.300000e+01   5.300000e+01   5.300000e+01
double.rounding         5.000000e+00   5.000000e+00   5.000000e+00
double.guard            0.000000e+00   0.000000e+00   0.000000e+00
double.ulp.digits      -5.200000e+01  -5.200000e+01  -5.200000e+01
double.neg.ulp.digits  -5.300000e+01  -5.300000e+01  -5.300000e+01
double.exponent         1.100000e+01   1.100000e+01   1.100000e+01
double.min.exp         -1.022000e+03  -1.022000e+03  -1.022000e+03
double.max.exp          1.024000e+03   1.024000e+03   1.024000e+03
integer.max             2.147484e+09   2.147484e+09   2.147484e+09
sizeof.long             8.000000e+00   4.000000e+00   4.000000e+00
sizeof.longlong         8.000000e+00   8.000000e+00   8.000000e+00
sizeof.longdouble       1.600000e+01   1.200000e+01   1.600000e+01
sizeof.pointer          8.000000e+00   4.000000e+00   8.000000e+00

NA when overflow

tmp <- 156287L
tmp*tmp
# [1] NA
# Warning message:
# In tmp * tmp : NAs produced by integer overflow
.Machine$integer.max
# [1] 2147483647
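Converting one operand to double avoids the integer overflow. Mixing in a double literal has the same effect.

```r
tmp <- 156287L
as.numeric(tmp) * tmp                          # 24425626369, computed in double precision
suppressWarnings(.Machine$integer.max + 1L)    # NA: integer overflow
.Machine$integer.max + 1                       # 2147483648: 1 is a double, so no overflow
```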

How to select a seed for simulation or randomization

Alphanumeric (string) seeds with set.seed()

https://stackoverflow.com/a/10913336
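set.seed() itself needs an integer, so a string must first be mapped to one. The helper string_to_seed() below is hypothetical. Its weighting of code points is an illustrative assumption, not the method from the linked answer, and very long strings could overflow the integer sum.

```r
# Hypothetical helper: map a string to an integer seed.
string_to_seed <- function(s) {
  codes <- utf8ToInt(s)   # Unicode code points (integers)
  # position-weighted sum, kept below .Machine$integer.max
  as.integer(sum(codes * seq_along(codes)) %% .Machine$integer.max)
}

set.seed(string_to_seed("my experiment")); a <- runif(3)
set.seed(string_to_seed("my experiment")); b <- runif(3)
identical(a, b)   # same string, same seed, same draws
```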

set.seed(), for loop and saving random seeds

  • Detect When the Random Number Generator Was Used
    if (interactive()) {
      invisible(addTaskCallback(local({
        last <- .GlobalEnv$.Random.seed
        
        function(...) {
          curr <- .GlobalEnv$.Random.seed
          if (!identical(curr, last)) {
            msg <- "NOTE: .Random.seed changed"
            if (requireNamespace("crayon", quietly=TRUE)) msg <- crayon::blurred(msg)
            message(msg)
            last <<- curr
          }
          TRUE
        }
      }), name = "RNG tracker"))
    }
    
  • http://r.789695.n4.nabble.com/set-seed-and-for-loop-td3585857.html. This question is legitimate when we want to debug on a certain iteration.
    set.seed(1001) 
    data <- vector("list", 30) 
    seeds <- vector("list", 30) 
    for(i in 1:30) { 
      seeds[[i]] <- .Random.seed 
      data[[i]] <- runif(5) 
    } 
     
    # If we save and load .Random.seed from a file using scan(), make
    # sure to convert its type from doubles to integers.
    # Otherwise, .Random.seed will complain!
    
    .Random.seed <- seeds[[23]]  # restore 
    data.23 <- runif(5) 
    data.23 
    data[[23]] 
    
  • impute.knn
  • Duncan Murdoch: This works in this example, but wouldn't work with all RNGs, because some of them save state outside of .Random.seed. See ?.Random.seed for details.
  • Uwe Ligges's comment: set.seed() actually generates a seed. See ?set.seed that points us to .Random.seed (and relevant references!) which contains the actual current seed.
  • Petr Savicky's comment is also useful in the situation when it is not difficult to re-generate the data.
  • Local randomness in R.
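An alternative to storing .Random.seed for every iteration is to set a seed derived from the iteration number. Any single iteration can then be re-created without the saved states. This is a sketch; base_seed is an arbitrary choice. For rigorous parallel work, the L'Ecuyer streams in the parallel package are the safer tool.

```r
base_seed <- 1001
draws <- lapply(1:30, function(i) {
  set.seed(base_seed + i)   # one reproducible seed per iteration
  runif(5)
})

# Re-create iteration 23 on its own, e.g. for debugging
set.seed(base_seed + 23)
draws.23 <- runif(5)
identical(draws.23, draws[[23]])
```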

sample()

sample() inaccurate on very large populations, fixed in R 3.6.0

# R 3.5.3
set.seed(123)
m <- (2/5)*2^32
m > 2^31
# [1] FALSE
log10(m)
# [1] 9.23502
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2)
#      0      1 
# 400070 599930 
# R 3.5.3
# docker run --net=host -it --rm r-base:3.5.3
> set.seed(1234)
> sample(5)
[1] 1 3 2 4 5

# R 3.6.0
# docker run --net=host -it --rm r-base:3.6.0
> set.seed(1234)
> sample(5)
[1] 4 5 2 3 1
> RNGkind(sample.kind = "Rounding")
Warning message:
In RNGkind(sample.kind = "Rounding") : non-uniform 'Rounding' sampler used
> set.seed(1234)
> sample(5)
[1] 1 3 2 4 5

Getting different results with set.seed() in RStudio

Getting different results with set.seed(). It's possible that you're loading an R package that is changing the requested random number generator; RNGkind().
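A sketch for checking and resetting the generators when a package may have changed them. Here set.seed() requests the R >= 3.6.0 defaults explicitly. RNGkind("default", "default", "default") resets all three.

```r
RNGkind()   # the three generators currently in use
set.seed(1234, kind = "Mersenne-Twister", normal.kind = "Inversion",
         sample.kind = "Rejection")
x1 <- sample(5)

RNGkind("default", "default", "default")   # reset all three to the defaults
set.seed(1234)
x2 <- sample(5)
identical(x1, x2)
```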

dplyr::sample_n()

The function has a weight parameter for weighted sampling. For example, if we have download statistics for each day and want to sample days in proportion to their download numbers, we can use this function.
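A minimal sketch, assuming dplyr is installed. The download counts are made up. Days with zero downloads get zero weight, so they are never picked. In recent dplyr, slice_sample(downloads, n = 3, replace = TRUE, weight_by = n) is the non-superseded equivalent.

```r
library(dplyr)
downloads <- data.frame(day = 1:5, n = c(10, 0, 5, 0, 20))   # hypothetical counts
set.seed(1)
picked <- sample_n(downloads, size = 3, replace = TRUE, weight = n)
picked
```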

Regular Expression

See here.

Read rrd file

on.exit()

Examples of using on.exit(). In all these examples, add = TRUE is used in the on.exit() call to ensure that each exit action is added to the list of actions to be performed when the function exits, rather than replacing the previous actions.

  • Database connections
    library(RSQLite)
    sqlite_get_query <- function(db, sql) {
      conn <- dbConnect(RSQLite::SQLite(), db)
      on.exit(dbDisconnect(conn), add = TRUE)
      dbGetQuery(conn, sql)
    }
    
  • File connections
    read_chars <- function(file_name) {
      conn <- file(file_name, "r")
      on.exit(close(conn), add = TRUE)
      readChar(conn, file.info(file_name)$size)
    }
    
  • Temporary files
    history_lines <- function() {
      f <- tempfile()
      on.exit(unlink(f), add = TRUE)
      savehistory(f)
      readLines(f, encoding = "UTF-8")
    }
    
  • Printing messages
    myfun = function(x) {
      on.exit(print("first"))
      on.exit(print("second"), add = TRUE)
      return(x)
    }
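A sketch of the order in which exit handlers run, and of the fact that they also run when the function fails. The after = FALSE argument of on.exit() (available since R 3.5.0) puts a handler at the front of the queue. The events vector exists only to record the order.

```r
events <- character()
f <- function() {
  on.exit(events <<- c(events, "first"), add = TRUE)
  on.exit(events <<- c(events, "second"), add = TRUE)
  on.exit(events <<- c(events, "runs before first"), add = TRUE, after = FALSE)
  invisible(NULL)
}
f()

# Exit handlers also run when the function signals an error
g <- function() {
  on.exit(events <<- c(events, "cleanup after error"), add = TRUE)
  stop("boom")
}
try(g(), silent = TRUE)
events
```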
    

file, connection

  • cat() and scan() (read data into a vector or list from the console or file)
  • readLines()/writeLines() and write()
  • read.table() and write.table()
out = file('tmp.txt', 'w')
writeLines("abcd", out)
writeLines("eeeeee", out)
close(out)
readLines('tmp.txt')
unlink('tmp.txt')
args(writeLines)
# function (text, con = stdout(), sep = "\n", useBytes = FALSE)

foo <- function() {
  con <- file()
  ...
  on.exit(close(con))
  ...
}

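Error in close.connection(f) : invalid connection. Functions such as read.delim() open an unopened connection and close it again when they finish, so a later close(con) fails. If we want to call close(con) ourselves, we have to open the connection explicitly; such as

con <- gzfile(FileName, "r") # Or gzfile(FileName, open = 'r')
x <- read.delim(con)
close(con)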
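A self-contained round trip through an explicitly opened gzip connection. It writes a small made-up table to a temporary file and reads it back.

```r
f <- tempfile(fileext = ".tsv.gz")
con <- gzfile(f, "w")   # opened explicitly for writing
write.table(data.frame(a = 1:3, b = c("x", "y", "z")), con,
            sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

con <- gzfile(f, "r")   # opened explicitly, so read.delim() leaves it open
x <- read.delim(con)
close(con)              # valid here, since we opened it
unlink(f)
x
```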

withr package

https://cran.r-project.org/web/packages/withr/index.html . Reverse suggested by languageserver.
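A minimal sketch, assuming withr is installed. with_options() and with_seed() change global state only while their code argument runs and restore it afterwards.

```r
library(withr)
old <- getOption("digits")
inside <- with_options(list(digits = 3), getOption("digits"))  # 3 only inside
getOption("digits") == old                                     # restored afterwards

a <- with_seed(42, runif(2))   # seed applies only to this expression
b <- with_seed(42, runif(2))
identical(a, b)
```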

Clipboard (?connections), textConnection(), pipe()

  • On Windows, we can use readClipboard() and writeClipboard().
    source("clipboard")
    read.table("clipboard")
    
  • Clipboard -> R. Reading/writing clipboard on macOS. Use textConnection() function:
    x <- read.delim(textConnection("<USE_KEYBOARD_TO_PASTE_FROM_CLIPBOARD>"))
    # Or on Mac
    x <- read.delim(pipe("pbpaste"))
    # safely ignore the warning: incomplete final line found by readTableHeader on 'pbpaste'
    

    An example is to copy data from this post. In this case we need to use read.table() instead of read.delim().

  • R -> clipboard on Mac. Note: pbcopy and pbpaste are macOS terminal commands. See pbcopy & pbpaste: Manipulating the Clipboard from the Command Line.
    • pbcopy: takes standard input and places it in the clipboard buffer
    • pbpaste: takes data from the clipboard buffer and writes it to the standard output
    clip <- pipe("pbcopy", "w")
    write.table(apply(x, 1, mean), file = clip, row.names=F, col.names=F)
    # write.table(data.frame(Var1, Var2), file = clip, row.names=F, quote=F, sep="\t")
    close(clip)
    
  • Clipboard -> Excel.
    • Method 1: Paste icon -> Text import wizard -> Delimit (Tab, uncheck Space) or Fixed width depending on the situation -> Finish.
    • Method 2: Ctrl+v first. Then choose Data -> Text to Columns. Fixed width -> Next -> Next -> Finish.
  • On Linux, we need to install "xclip". See R Copy from Clipboard in Ubuntu Linux. It seems to work.
    # sudo apt-get install xclip
    read.table(pipe("xclip -selection clipboard -o",open="r"))
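Pasted text can also be read without the clipboard. Here a tab-separated string stands in for the pasted data. The text= argument of read.table()/read.delim() creates the text connection for us.

```r
txt <- "a\tb\n1\t2\n3\t4"
x1 <- read.delim(textConnection(txt))
x2 <- read.delim(text = txt)   # same result, no explicit connection
identical(x1, x2)
```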
    

clipr

clipr: Read and Write from the System Clipboard

read/manipulate binary data

  • x <- readBin(fn, raw(), file.info(fn)$size)
  • rawToChar(x[1:16])
  • See Biostrings C API
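A small sketch of readBin() on files written with writeBin(). The text and the integers are arbitrary. For integers, size = 4 fixes the number of bytes per value on both sides.

```r
fn <- tempfile()
writeBin(charToRaw("Hello, binary world"), fn)
x <- readBin(fn, raw(), file.info(fn)$size)   # read every byte
rawToChar(x[1:5])                             # "Hello"

fn2 <- tempfile()
writeBin(c(1L, 256L, -1L), fn2, size = 4)     # three 4-byte integers
y <- readBin(fn2, integer(), n = 3, size = 4)
unlink(c(fn, fn2))
```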

String Manipulation

format(): padding with zero

ngenes <- 10
genenames <- paste0("bm", gsub(" ", "0", format(1:ngenes))); genenames
#  [1] "bm01" "bm02" "bm03" "bm04" "bm05" "bm06" "bm07" "bm08" "bm09" "bm10"
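sprintf() and formatC() give the same zero padding more directly. In this sketch the width is fixed at 2.

```r
ngenes <- 10
sprintf("bm%02d", 1:ngenes)                              # "bm01" ... "bm10"
paste0("bm", formatC(1:ngenes, width = 2, flag = "0"))   # same result
```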

noquote()

noquote() prints character strings without quotes.

stringr package

glue package

  • glue. Useful in a loop and some function like ggtitle() or ggsave(). Inside the curly braces {R-Expression}, the expression is evaluated.
    library(glue)
    name <- "John"
    age <- 30
    glue("My name is {name} and I am {age} years old.")
    # My name is John and I am 30 years old.
    
    price <- 9.99
    quantity <- 3
    total <- glue("The total cost is {round(price * quantity, 2)}.")
    # Inside the curly braces {}, the expression round(price * quantity, 2) is evaluated.
    print(total)
    # The total cost is 29.97.

    The syntax of glue() in R is quite similar to Python's f-strings (formatted string literals), which embed variables and expressions inside strings.

    name = "John"
    age = 30
    print(f"My name is {name} and I am {age} years old.")
    # My name is John and I am 30 years old.
    
    price = 9.99
    quantity = 3
    total = f"The total cost is {price * quantity:.2f}."
    print(total)
    # The total cost is 29.97.
  • String interpolation

Raw data type

Fun with strings, Cyrillic alphabets

a1 <- "А"
a2 <- "A"
a1 == a2
# [1] FALSE
charToRaw("А")
# [1] d0 90
charToRaw("A")
# [1] 41

number of characters limit

It's a limit on a (single) input line in the REPL

Comparing strings to numeric

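Comparison operators such as ">" and "<" coerce the number to a string before comparing, so the comparison is lexicographic: "10" < 2 # TRUE, because "10" sorts before "2".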
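Converting the string to a number first gives the numeric comparison:

```r
"10" < 2                # TRUE: compares the strings "10" and "2"
as.numeric("10") < 2    # FALSE: numeric comparison
```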

HTTPs connection

HTTPS connection becomes default in R 3.2.2. See

R 3.3.2 patched The internal methods of ‘download.file()’ and ‘url()’ now report if they are unable to follow the redirection of a ‘http://’ URL to a ‘https://’ URL (rather than failing silently)

setInternet2

There was a bug in FTP downloading in R 3.2.2 (r69053) on Windows; it has since been fixed in R 3.2-patched.

Read the discussion reported on 8/8/2015. The error only happened on ftp not http connection. The final solution is explained in this post. The following demonstrated the original problem.

url <- paste0("ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/",
              "GCF_000001405.13.assembly.txt")
f1 <- tempfile()
download.file(url, f1)

It seems the bug was fixed in R 3.2-branch. See 8/16/2015 patch r69089 where a new argument INTERNET_FLAG_PASSIVE was added to InternetOpenUrl() function of wininet library. This article and this post explain differences of active and passive FTP.

The following R command will show the exact svn revision for the R you are currently using.

R.Version()$"svn rev"

If setInternet2(T), then https protocol is supported in download.file().

When setInternet2(TRUE) is enabled (the default), download.file() does not work for the ftp protocol (which is used by the getGEO() function of the GEOquery package). If we use setInternet2(FALSE), download.file() works again for the ftp protocol.

The setInternet2() function is defined in src/library/utils/R/windows/sysutils.R of the R source.

R up to 3.2.2

setInternet2 <- function(use = TRUE) .Internal(useInternet2(use))

See also

  • <src/include/Internal.h> (declares do_setInternet2()),
  • <src/main/names.c> (lists do_setInternet2() in the table of internal functions),
  • <src/main/internet.c> (defines do_setInternet2() in C).

Note that setInternet2(TRUE) became the default in R 3.2.2. To revert to the previous default, use setInternet2(FALSE); see the <doc/NEWS.pdf> file. Using setInternet2(FALSE) avoids the getGEO() error, but it also disables https downloads with download.file(). In R < 3.2.2, it is also possible to download from https by calling setInternet2(TRUE).

R 3.3.0

setInternet2 <- function(use = TRUE) {
    if(!is.na(use)) stop("use != NA is defunct")
    NA
}

Note that setInternet2.Rd says "As from R 3.3.0 it changes nothing, and only use = NA is accepted." NEWS.Rd also says "setInternet2() has no effect and will be removed in due course."

Finite, Infinite and NaN Numbers: is.finite(), is.infinite(), is.nan()

In R, basically all mathematical functions (including basic Arithmetic) are supposed to work properly with +/- Inf and NaN as input or output.

See ?is.finite.
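A small illustration of the three predicates on a vector that mixes finite values, infinities, NaN and NA. The vector x is just example data. Note that NaN is also counted by is.na():

```r
x <- c(1, -Inf, Inf, NaN, NA)
is.finite(x)    # TRUE FALSE FALSE FALSE FALSE
is.infinite(x)  # FALSE TRUE TRUE FALSE FALSE
is.nan(x)       # FALSE FALSE FALSE TRUE FALSE
is.na(x)        # FALSE FALSE FALSE TRUE TRUE (NaN also counts as NA)

# Arithmetic produces these special values instead of raising errors
1 / 0           # Inf
-1 / 0          # -Inf
0 / 0           # NaN
Inf - Inf       # NaN
```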

How to replace Inf with NA in All or Specific Columns of the Data Frame

replace() function
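A minimal sketch of the replace() approach, assuming a hypothetical data frame df with two numeric columns and one character column. is.infinite() has no data frame method, so the sketch applies it column by column with lapply(). It only touches the numeric columns:

```r
# Hypothetical example data frame
df <- data.frame(a = c(1, Inf, 3), b = c(-Inf, 2, 5), g = c("x", "y", "z"))

# All numeric columns: replace() swaps in NA wherever is.infinite() is TRUE
num <- sapply(df, is.numeric)
df[num] <- lapply(df[num], function(v) replace(v, is.infinite(v), NA))

# A specific column only, e.g. column "a":
# df$a <- replace(df$a, is.infinite(df$a), NA)
df
```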

File/path operations

  • list.files(".", include.dirs = FALSE, recursive = TRUE, pattern = "\\.csv$", all.files = TRUE)
  • file.info()
  • dir.create()
  • file.create()
  • file.copy()
  • file.exists()
  • basename() removes the parent path; dirname() returns the part of the path up to, but excluding, the last path separator
    > file.path("~", "Downloads")
    [1] "~/Downloads"
    > dirname(file.path("~", "Downloads"))
    [1] "/home/brb"
    > basename(file.path("~", "Downloads"))
    [1] "Downloads"
    
  • path.expand("~/.Renviron") # "/home/brb/.Renviron"
  • normalizePath() # Express File Paths in Canonical Form
    > cat(normalizePath(c(R.home(), tempdir())), sep = "\n")
    /usr/lib/R
    /tmp/RtmpzvDhAe
    
  • system.file() - Finds the full file names of files in packages etc
    > system.file("extdata", "ex1.bam", package="Rsamtools")
    [1] "/home/brb/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools/extdata/ex1.bam"
    

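A short sketch combining several of the functions above. It works inside a throwaway folder under tempdir(), so nothing else on disk is touched. The folder and file names ("demo", "a.csv", ...) are made up for the example:

```r
# Build a small folder tree under the session's temporary directory
d <- file.path(tempdir(), "demo")
dir.create(d, showWarnings = FALSE)
file.create(file.path(d, c("a.csv", "b.txt")))
dir.create(file.path(d, "sub"), showWarnings = FALSE)
file.create(file.path(d, "sub", "c.csv"))

# Paths are returned relative to d because full.names = FALSE (the default)
csvs <- list.files(d, pattern = "\\.csv$", recursive = TRUE)
csvs                                    # "a.csv" "sub/c.csv"
file.exists(file.path(d, "a.csv"))      # TRUE
basename(file.path(d, "sub", "c.csv"))  # "c.csv"
file.info(file.path(d, "a.csv"))$size   # 0 (the file is empty)
```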
read/download/source a file from internet

Simple text file http

retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)

Zip, RData, gz file and url() function

x <- read.delim(gzfile("filename.txt.gz"), nrows=10)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)

Here url() belongs to the same family as file(), gzfile(), bzfile(), xzfile(), unz(), pipe(), fifo() and socketConnection(); all of them create connections. By default, the connection is not opened (except for ‘socketConnection’), but it may be opened by setting a non-empty value of the argument ‘open’. See ?url.
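A local sketch of the connection life cycle, using gzfile() so that no network access is needed. The connection is opened explicitly for writing. When reading, an unopened connection is handed to read.csv() and readLines(), which open it for the duration of the call and close it afterwards:

```r
# Write a small gzip-compressed CSV to a temporary file
f <- tempfile(fileext = ".csv.gz")
con <- gzfile(f, open = "wt")          # opened explicitly for writing text
write.csv(head(iris, 3), con, row.names = FALSE)
close(con)

# Read it back; the unopened connections are opened and closed for us
x <- read.csv(gzfile(f))
nrow(x)                                # 3
readLines(gzfile(f), n = 1)            # the header line
```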

Another example is Read gzipped csv directly from a url in R

con <- gzcon(url(paste("http://dumps.wikimedia.org/other/articlefeedback/",
                       "aa_combined-20110321.csv.gz", sep="")))
txt <- readLines(con)
dat <- read.csv(textConnection(txt))

Another example of using url() is

load(url("http://www.example.com/example.RData"))

This approach does not work with load(), dget(), or read.table() for files on OneDrive. In fact, I cannot even use wget with shared files from OneDrive. The following trick works: How to configure a OneDrive file for use with wget.

Dropbox is easy and works for load(), wget, ...

R download .RData or Directly loading .RData from github.

zip function

This will include the 'hallmarkFiles' root folder in the paths of the files inside the zip archive.

zip(zipfile = 'myFile.zip', 
    files = dir('hallmarkFiles', full.names = TRUE))

# Verify/view the files. 'list = TRUE' lists the contents without extracting them
unzip('myFile.zip', list = TRUE)

downloader package

This package provides a wrapper for the download.file function, making it possible to download files over https on Windows, Mac OS X, and other Unix-like platforms. The RCurl package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.

Google drive file based on https using RCurl package

require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))

Google sheet file using googlesheets package

Reading data from google sheets into R

Github files https using RCurl package

x = getURL("https://gist.github.com/arraytools/6671098/raw/c4cb0ca6fe78054da8dbe253a05f7046270d5693/GeneIDs.txt", 
            ssl.verifypeer = FALSE)
read.table(text=x)

data summary table

summarytools: create summary tables for vectors and data frames

https://github.com/dcomtois/summarytools. R Package for quickly and neatly summarizing vectors and data frames.

skimr: A frictionless, pipeable approach to dealing with summary statistics

skimr for useful and tidy summary statistics

modelsummary

modelsummary: Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready

broom

Tidyverse->broom

Create publication tables using tables package

See the example on p13 here.

R's tables package is the best solution. For example,

> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
+          (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )
                                                  
                Sepal.Length      Sepal.Width     
 Species    n   mean         sd   mean        sd  
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

and

# This example shows some of the less common options         
> Sex <- factor(sample(c("Male", "Female"), 100, rep=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, rep=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
  s <- format(x, digits=2)
  even <- ((1:length(s)) %% 2) == 0
  s[even] <- sprintf("(%s)", s[even])
  s
}
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )
                  Status              
 Sex    Statistic high   low    medium
 Female mean       4.88   4.96   5.17 
        sd        (1.20) (0.82) (1.35)
 Male   mean       4.45   4.31   5.05 
        sd        (1.01) (0.93) (0.75)

fgsea example

vignette & source code

(archived) ClinReport: Statistical Reporting in Clinical Trials

https://cran.r-project.org/web/packages/ClinReport/index.html

Append figures to PDF files

How to append a plot to an existing pdf file. Hint: use the recordPlot() function.

Save base graphics as pseudo-objects

Save base graphics as pseudo-objects in R. Note there are some cons with this approach.

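# Hypothetical example data; the original snippet assumes a data frame 'df' with columns x and y
set.seed(1)
df <- data.frame(x = 1:100, y = rnorm(100))
pdf(NULL)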
dev.control(displaylist="enable")
plot(df$x, df$y)
text(40, 0, "Random")
text(60, 2, "Text")
lines(stats::lowess(df$x, df$y))
p1.base <- recordPlot()
invisible(dev.off())

# Display the saved plot
grid::grid.newpage()
p1.base
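One way to follow the recordPlot() hint in "Append figures to PDF files": pdf() starts a new file rather than adding pages to an existing one. This sketch therefore records each plot and then replays all of them, one per page, into a fresh PDF. The plots and the output file name are only examples:

```r
# Record two base-graphics plots on an invisible device
pdf(NULL)
dev.control(displaylist = "enable")   # file devices do not keep a display list by default
plot(1:10, main = "First plot")
p1 <- recordPlot()
plot(sin, -pi, pi, main = "Second plot")
p2 <- recordPlot()
invisible(dev.off())

# Replay the recorded plots into a new multi-page PDF
out <- tempfile(fileext = ".pdf")
pdf(out)
replayPlot(p1)   # page 1
replayPlot(p2)   # page 2
invisible(dev.off())
```

To "append" later, keep the recorded plot objects and replay the old ones plus the new one into the file again.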

Extracting tables from PDFs

Print tables

addmargins()

tableone

Some examples

Cox models

finalfit package

table1

gtsummary

gt*

dplyr

https://stackoverflow.com/a/34587522. The output includes counts and proportions in a publication like fashion.

tables::tabular()

gmodels::CrossTable()

https://www.statmethods.net/stats/frequencies.html

base::prop.table(x, margin)

R 4.0.0 added the new functions ‘proportions()’ and ‘marginSums()’. These should replace the unfortunately named ‘prop.table()’ and ‘margin.table()’.

R> m <- matrix(1:4, 2)
R> prop.table(m, 1) # row percentage
          [,1]      [,2]
[1,] 0.2500000 0.7500000
[2,] 0.3333333 0.6666667
R> prop.table(m, 2) # column percentage
          [,1]      [,2]
[1,] 0.3333333 0.4285714
[2,] 0.6666667 0.5714286
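The same matrix with the newer names, which requires R >= 4.0.0. proportions() gives the same result as prop.table(), and marginSums() gives the row or column totals used as denominators:

```r
m <- matrix(1:4, 2)
proportions(m, 1)    # row percentages, same as prop.table(m, 1)
marginSums(m, 1)     # row sums: 4 6
marginSums(m, 2)     # column sums: 3 7
all.equal(proportions(m, 2), prop.table(m, 2))  # TRUE
```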

stats::xtabs()
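A short xtabs() sketch. With a count column on the left-hand side of the formula, xtabs() sums the counts. With no left-hand side, it counts rows. Titanic is converted to a data frame first, so its counts sit in the Freq column:

```r
tit <- as.data.frame(Titanic)            # one row per cell, counts in 'Freq'
tab <- xtabs(Freq ~ Class + Survived, data = tit)
tab                                      # e.g. 1st class: 122 No, 203 Yes

ct <- xtabs(~ cyl + am, data = mtcars)   # no left-hand side: count rows
ct                                       # e.g. 4 cylinders: 3 automatic, 8 manual
```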

stats::ftable()

> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age                   
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex                    
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class                 
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am              
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8   
        am   0  1  0  1  0  1
vs gear                      
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
> 
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
         dnn = c("Cylinders", "V/S", "Transmission", "Gears"))

          Cylinders     4     6     8   
          Transmission  0  1  0  1  0  1
V/S Gears                               
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0

tracemem, data type, copy

How to avoid copying a long vector
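A minimal copy-on-modify sketch. Assigning y <- x does not copy the data. The copy only happens when one of the two is modified. tracemem() needs R built with memory profiling, so the call is guarded with capabilities("profmem"), and the printed address varies:

```r
x <- c(1, 2, 3)
y <- x                                      # no copy yet: both names share one vector
if (capabilities("profmem")) tracemem(x)    # prints an address such as "<0x...>"
y[1] <- 100                                 # modifying y forces a copy (reported by tracemem)
if (capabilities("profmem")) untracemem(x)
x                                           # 1 2 3: x is unchanged
y                                           # 100 2 3
```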

Tell if the current R is running in 32-bit or 64-bit mode

8 * .Machine$sizeof.pointer

where sizeof.pointer is the number of bytes in a C SEXP type and 8 is the number of bits per byte, so the result is 32 or 64.

32- and 64-bit

See R-admin.html.

  • For speed you may want to use a 32-bit build, but to handle large datasets a 64-bit build.
  • Even on 64-bit builds of R there are limits on the size of R objects, some of which stem from the use of 32-bit integers (especially in FORTRAN code). For example, the dimensions of an array are limited to 2^31 - 1.
  • Since R 2.15.0, it is possible to select '64-bit Files' from the standard installer even on a 32-bit version of Windows (2012/3/30).

Handling length 2^31 and more in R 3.0.0

From R News for 3.0.0 release:

There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.

In R 2.15.2, if I try to assign a vector of length 2^31, I will get an error

> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector

However, R 3.0.0 handles it (tested on my 64-bit Ubuntu machine with 16 GB RAM; I compiled R myself):

> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
>

Note:

  1. A length of 2^31 is about 2.1 billion elements. Stored as doubles (8 bytes each), such a vector takes about 16 GB (2^31*8/2^20 = 16384 MB) of memory.
  2. On Windows, it is almost impossible to work with data of length 2^31 if the memory is less than 16 GB, because virtual memory (swapping to disk) does not work well on Windows. For example, when I tested on my 12 GB Windows 7 machine, the whole system froze for several minutes before I forced a power-off.
  3. My slide at http://goo.gl/g7sGX shows screenshots of running the above command on my Ubuntu and RHEL machines. As you can see, Linux is pretty good at handling data larger than system RAM. That said, as long as your Linux system is 64-bit, you can work with large data without too much pain.
  4. For large datasets, it makes sense to use a database or specially crafted packages like bigmemory, ff or bigstatsr.
  5. [[<- for index 2^31 fails

NA in index

  • Question: what is seq(1, 3)[c(1, 2, NA)]?

Answer: 1 2 NA. The NA index keeps its position in the result, and the value returned there is NA.

  • Question: What is TRUE & NA?

Answer: NA

  • Question: What is FALSE & NA?

Answer: FALSE

  • Question: c("A", "B", NA) != "" ?

Answer: TRUE TRUE NA

  • Question: which(c("A", "B", NA) != "") ?

Answer: 1 2

  • Question: c(1, 2, NA) != "" & !is.na(c(1, 2, NA)) ?

Answer: TRUE TRUE FALSE

  • Question: c("A", "B", NA) != "" & !is.na(c("A", "B", NA)) ?

Answer: TRUE TRUE FALSE

Conclusion: To exclude empty strings or NAs for numeric or character data, we can use which() or a convenience function keep.complete <- function(x) x != "" & !is.na(x). This is guaranteed to return logical values without NAs.

Don't use just x != "" or just !is.na(x).
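The answers above can be checked directly. keep.complete() is the convenience function from the conclusion. The extra test vector c("A", "", NA) is added here to show that empty strings are dropped too:

```r
seq(1, 3)[c(1, 2, NA)]        # 1 2 NA
TRUE & NA                     # NA
FALSE & NA                    # FALSE
x <- c("A", "B", NA)
x != ""                       # TRUE TRUE NA
which(x != "")                # 1 2 (which() skips NA)

keep.complete <- function(x) x != "" & !is.na(x)
keep.complete(x)              # TRUE TRUE FALSE
keep.complete(c(1, 2, NA))    # TRUE TRUE FALSE
keep.complete(c("A", "", NA)) # TRUE FALSE FALSE
x[keep.complete(x)]           # "A" "B"
```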

Some functions

Constant and 'L'

Add 'L' after an integer constant so that R stores it as an integer rather than a double. For example,

for(i in 1L:n) { }

if (max.lines > 0L) { }

label <- paste0(n-i+1L, ": ")

n <- length(x);  if(n == 0L) { }
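Why the suffix matters: a bare number like 1 is a double, and 1L is an integer. The two compare equal with == but are not identical(). Mixing them can silently promote integer results to doubles:

```r
typeof(1)         # "double"
typeof(1L)        # "integer"
identical(1L, 1)  # FALSE: same value, different storage type
1L == 1           # TRUE: == compares values after coercion
typeof(1:3)       # "integer" (the colon operator already returns integers)
typeof(2L + 1)    # "double": adding a double promotes the result
typeof(2L + 1L)   # "integer"
```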

Vector/Arrays

R indexes arrays from 1 like Fortran, not from 0 like C or Python.

remove integer(0)

How to remove integer(0) from a vector?
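A sketch assuming the integer(0) values are elements of a list, for example results of which() collected with lapply(). The list x is hypothetical. lengths() and Filter() both drop the empty elements. unlist() drops them silently when flattening:

```r
x <- list(1L, integer(0), 3L, integer(0))
x[lengths(x) > 0]    # list(1L, 3L)
Filter(length, x)    # same result: length 0 is treated as FALSE
unlist(x)            # 1 3: zero-length elements vanish when flattening
```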

Append some elements

append() and its after argument
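A few uses of the after argument. It gives the position after which the new values are inserted; after = 0 prepends, and the default is the end of the vector:

```r
append(1:5, 99, after = 2)  # 1 2 99 3 4 5
append(1:5, 0, after = 0)   # 0 1 2 3 4 5 (prepend)
append(1:5, 6)              # 1 2 3 4 5 6 (default after = length(x))
```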

setNames()

Assign names to a vector

z <- setNames(1:3, c("a", "b", "c"))
# OR
z <- 1:3; names(z) <- c("a", "b", "c")
# OR
z <- c("a"=1, "b"=2, "c"=3) # does not work if the names are stored in variables, e.g. x[1], x[2], x[3]

Factor

labels argument

We can specify the factor levels and new labels using the factor() function.

sex <- factor(sex, levels = c("0", "1"), labels = c("Male", "Female"))
drug_treatment <- factor(drug_treatment, levels = c("Placebo", "Low dose", "High dose"))
health_status <- factor(health_status, levels = c("Healthy", "Alzheimer's"))

factor(rev(letters[1:3]), labels = c("A", "B", "C"))
# C B A
# Levels: A B C

Create a factor/categorical variable from a continuous variable: cut() and dplyr::case_when()

cut(
     c(0, 10, 30), 
     breaks = c(0, 30, 50, Inf), 
     labels = c("Young", "Middle-aged", "Elderly")
 )  # Default include.lowest = FALSE
# [1] <NA>  Young Young
  • ?cut
    set.seed(1)
    x <- rnorm(100)
    facVar <- cut(x, c(min(x), -1, 1, max(x)), labels = c("low", "medium", "high"))
    table(facVar, useNA = "ifany")
    facVar
    #   low medium   high   <NA> 
    #    10     74     15      1 
    

    Note the option include.lowest = TRUE is needed when we use cut() + quantile(); otherwise the smallest value will become NA, since the intervals have the form (a, b].

    x2 <- cut(x, quantile(x, 0:2/2), include.lowest = TRUE) # split x into 2 levels
    x2 <- cut(x, quantile(x, 0:3/3), include.lowest = TRUE) # split x into 3 levels
    
    library(tidyverse); library(magrittr)
    set.seed(1)
    breaks <- quantile(runif(100), probs=seq(0, 1, len=20))
    x <- runif(50)
    bins <- cut(x, breaks=unique(breaks), include.lowest=T, right=T)
    
    data.frame(sc=x, bins=bins) %>% 
      group_by(bins) %>% 
      summarise(n=n()) %>% 
      ggplot(aes(x = bins, y = n)) + 
        geom_col(color = "black", fill = "#90AACB") + 
        theme_minimal() + 
        theme(axis.text.x = element_text(angle = 90)) + 
        theme(legend.position = "none") + coord_flip()
    
  • A Guide to Using the cut() Function in R
  • tibble object
    library(tidyverse)
    tibble(age_yrs = c(0, 4, 10, 15, 24, 55),
           age_cat = case_when(
              age_yrs < 2 ~ "baby",
              age_yrs < 13 ~ "kid",
              age_yrs < 20 ~ "teen",
              TRUE         ~ "adult")
    )
    
  • R tip: Learn dplyr’s case_when() function
    case_when(
      condition1 ~ value1, 
      condition2 ~ value2,
      TRUE ~ ValueAnythingElse
    )
    # Example
    case_when(
      x %%2 == 0 ~ "even",
      x %%2 == 1 ~ "odd",
      TRUE ~ "Neither even nor odd"
    )
    

How to change one of the level to NA

https://stackoverflow.com/a/25354985. Note that the factor level is removed.

x <- factor(c("a", "b", "c", "NotPerformed"))
levels(x)[levels(x) == 'NotPerformed'] <- NA

Creating missing values in factors

Concatenating two factor vectors

Not trivial. How to concatenate factors, without them being converted to integer level?.

unlist(list(f1, f2))
# unlist(list(factor(letters[1:5]), factor(letters[5:2])))
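The unlist() trick in full. When every element is a factor, unlist() returns a factor whose levels are the union of the input levels. The version check reflects the R 4.1.0 change that made c() on factors return a factor as well:

```r
f1 <- factor(letters[1:5])
f2 <- factor(letters[5:2])
u <- unlist(list(f1, f2))   # a factor, not integer codes
levels(u)                   # "a" "b" "c" "d" "e"

# Character-based alternative that works in any R version
factor(c(as.character(f1), as.character(f2)))

# Since R 4.1.0, c() also combines factors into a factor
if (getRversion() >= "4.1.0") print(levels(c(f1, f2)))
```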

droplevels()

droplevels(): drop unused levels from a factor or, more commonly, from factors in a data frame.
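A small example. Subsetting a factor keeps levels that no longer occur, and droplevels() removes them, both for a single factor and for factor columns of a data frame:

```r
f <- factor(c("a", "b", "c"))
g <- f[f != "c"]          # subsetting keeps the now-unused level "c"
levels(g)                 # "a" "b" "c"
levels(droplevels(g))     # "a" "b"

df <- data.frame(f = f, x = 1:3)
df2 <- droplevels(df[df$f != "c", ])   # also works on a whole data frame
levels(df2$f)                          # "a" "b"
```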

factor(x , levels = ...) vs levels(x) <-

Note levels(x) is to set/rename levels, not reorder. Use relevel() or factor() to reorder.

  • levels(), plyr::revalue(), forcats::fct_recode(): rename levels
  • factor(, levels): reorder levels
sizes <- factor(c("small", "large", "large", "small", "medium"))
sizes
#> [1] small  large  large  small  medium
#> Levels: large medium small

sizes2 <- factor(sizes, levels = c("small", "medium", "large")) # reorder levels but data is not changed
sizes2
# [1] small  large  large  small  medium
# Levels: small medium large

sizes3 <- sizes
levels(sizes3) <- c("small", "medium", "large") # rename, not reorder
                                                # large -> small
                                                # medium -> medium
                                                # small -> large 
sizes3
# [1] large  small  small  large  medium
# Levels: small medium large

A regression example.

set.seed(1)
x <- sample(1:2, 500, replace = TRUE)
y <- round(x + rnorm(500), 3)
x <- as.factor(x)
sample_data <- data.frame(x, y)
 
# create linear model
summary(lm( y~x, sample_data))
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.96804    0.06610   14.65   <2e-16 ***
# x2           0.99620    0.09462   10.53   <2e-16 ***

# Wrong way when we want to change the baseline level to '2'
# No change on the model fitting except the apparent change on the variable name in the printout
levels(sample_data$x) <- c("2", "1")
summary(lm( y~x, sample_data))
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.96804    0.06610   14.65   <2e-16 ***
# x1           0.99620    0.09462   10.53   <2e-16 ***

# Correct way if we want to change the baseline level to '2'
# The estimate was changed by flipping the sign from the original data
sample_data$x <- relevel(x, ref = "2")
summary(lm( y~x, sample_data))
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  1.96425    0.06770   29.01   <2e-16 ***
# x1          -0.99620    0.09462  -10.53   <2e-16 ***

stats::relevel()

relevel() can only be used to change the reference (first) level of a factor variable; it does not create an arbitrary order of levels. This makes it useful in lm(), aov(), etc., where the reference level serves as the baseline.

reorder(), levels() and boxplot()

  • How to Reorder Boxplots in R: A Comprehensive Guide (tapply() method, simple & effective)
  • reorder(). This is useful in bar plots (ggplot2::geom_col()) where we want to sort the bars by a numerical variable.
    # Syntax:
    # newFac <- with(df, reorder(fac, vec, FUN=mean)) # newFac is like fac except it has a new order
    
    (bymedian <- with(InsectSprays, reorder(spray, count, median)) )
    class(bymedian)
    levels(bymedian)
    boxplot(count ~ bymedian, data = InsectSprays,
            xlab = "Type of spray", ylab = "Insect count",
            main = "InsectSprays data", varwidth = TRUE,
            col = "lightgray") # boxplots are sorted according to the new levels
    boxplot(count ~ spray, data = InsectSprays,
            xlab = "Type of spray", ylab = "Insect count",
            main = "InsectSprays data", varwidth = TRUE,
            col = "lightgray") # not sorted
    
  • Statistics Sunday: My 2019 Reading (reorder function)

factor() vs ordered()

factor(levels=c("a", "b", "c"), ordered=TRUE)
# ordered(0)
# Levels: a < b < c

factor(levels=c("a", "b", "c"))
# factor(0)
# Levels: a b c

ordered(levels=c("a", "b", "c"))
# Error in factor(x, ..., ordered = TRUE) : 
#  argument "x" is missing, with no default
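What the ordering buys you. Comparisons and max() follow the level order of an ordered factor. ordered(x, levels = ...) is equivalent to factor(x, levels = ..., ordered = TRUE) once x is supplied. The vector size is example data:

```r
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
size                 # Levels: low < medium < high
size < "high"        # TRUE FALSE TRUE
max(size)            # high
identical(size, ordered(c("low", "high", "medium"),
                        levels = c("low", "medium", "high")))  # TRUE
```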

Data frame

stringsAsFactors = FALSE

http://www.win-vector.com/blog/2018/03/r-tip-use-stringsasfactors-false/

Setting options(stringsAsFactors = FALSE) forces R to import character data as character vectors.

In R 4.0.0, stringsAsFactors = FALSE became the default. This also affects the read.table() function.

check.names = FALSE

Note this option does not affect row names. So if the row names contain special symbols, like dashes, spaces, parentheses, etc., they will not be modified.

> data.frame("1a"=1:2, "2a"=1:2, check.names = FALSE)
  1a 2a
1  1  1
2  2  2
> data.frame("1a"=1:2, "2a"=1:2) # default
  X1a X2a
1   1   1
2   2   2

Create unique rownames: make.unique()

groupCodes <- c(rep("Cont",5), rep("Tre1",5), rep("Tre2",5))
rownames(mydf) <- make.unique(groupCodes)
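What make.unique() produces, on a shorter version of groupCodes. The first occurrence is kept as-is, and later duplicates get a suffix. mydf here is a hypothetical one-column data frame:

```r
groupCodes <- c(rep("Cont", 2), rep("Tre1", 3))
make.unique(groupCodes)             # "Cont" "Cont.1" "Tre1" "Tre1.1" "Tre1.2"
make.unique(groupCodes, sep = "_")  # "Cont" "Cont_1" "Tre1" "Tre1_1" "Tre1_2"

mydf <- data.frame(value = 1:5)     # hypothetical data frame
rownames(mydf) <- make.unique(groupCodes)
```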

data.frame() will change rownames

class(df2)
# [1] "matrix" "array"
rownames(df2)[c(9109, 44999)]
# [1] "A1CF"     "A1BG-AS1"
rownames(data.frame(df2))[c(9109, 44999)]
# [1] "A1CF"     "A1BG.AS1"

Print a data frame without rownames

# Method 1. 
rownames(df1) <- NULL

# Method 2. 
print(df1, row.names = FALSE)

Convert data frame factor columns to characters

Convert data.frame columns from factors to characters

# Method 1:
bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

# Method 2:
bob[] <- lapply(bob, as.character)

To replace only factor columns:

# Method 1:
i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)

# Method 2:
library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob

Sort Or Order A Data Frame

How To Sort Or Order A Data Frame In R

  1. df[order(df$x), ], df[order(df$x, decreasing = TRUE), ], df[order(df$x, df$y), ]
  2. library(plyr); arrange(df, x), arrange(df, desc(x)), arrange(df, x, y)
  3. library(dplyr); df %>% arrange(x), df %>% arrange(desc(x)), df %>% arrange(x, y)
  4. library(doBy); orderBy(~x, df), orderBy(~ -x, df), orderBy(~ x+y, df)
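A base-R sketch of method 1 on a small hypothetical data frame. Ties in x are broken by the second key y, and negating a numeric key sorts that key in decreasing order while the other key stays increasing:

```r
df <- data.frame(x = c(2, 1, 2, 3), y = c("b", "d", "a", "c"))
df[order(df$x), ]                     # x: 1 2 2 3
df[order(df$x, decreasing = TRUE), ]  # x: 3 2 2 1
df[order(df$x, df$y), ]               # ties in x broken by y: y is d a b c
df[order(-df$x, df$y), ]              # x decreasing, y increasing: y is c a b d
```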

data.frame to vector

df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

class(df)
# [1] "data.frame"
class(t(df))
# [1] "matrix" "array"
class(unlist(df))
# [1] "numeric"

# Method 1: Convert data frame to matrix using as.matrix()
# and then Convert matrix to vector using as.vector() or c()
mat <- as.matrix(df)
vec1 <- as.vector(mat)   # [1] 1 2 3 4 5 6
vec2 <- c(mat)

# Method 2: Convert data frame to matrix using t()/transpose
# and then Convert matrix to vector using as.vector() or c()
vec3 <- as.vector(t(df)) # [1] 1 4 2 5 3 6
vec4 <- c(t(df))

# Not working
as.vector(df)
# $x
# [1] 1 2 3
# $y
# [1] 4 5 6

# Method 3: unlist() - easiest solution
unlist(df)
# x1 x2 x3 y1 y2 y3 
#  1  2  3  4  5  6 
unlist(data.frame(df), use.names = F) # OR dplyr::pull()
# [1] 1 2 3 4 5 6

Q: Why as.vector(df) cannot convert a data frame into a vector?

A: as.vector() cannot be used directly to convert a data frame into an atomic vector. A data frame is a list of vectors (its columns), which may be of different types, and as.vector() does not concatenate the elements of a list. Instead, as.vector() returns the underlying list structure of the data frame.

However, when you transpose the data frame using t(), it gets converted into a matrix. A matrix in R is a vector with dimensions. Therefore, all elements of the matrix must be of the same type. If they are not, R will coerce them to be so. Once you have a matrix, as.vector() can easily convert it into a vector because all elements are of the same type.

Using cbind() to merge vectors together?

It’s a common mistake to try and create a data frame by cbind()ing vectors together. This doesn’t work because cbind() will create a matrix unless one of the arguments is already a data frame. Instead use data.frame() directly. See Advanced R -> Data structures chapter.

cbind NULL and data.frame

cbind() can't combine NULL with a data frame. Wrapping the data frame in as.matrix() fixes the problem.

merge

Special characters in the matched variable can cause trouble when we use merge() or dplyr::inner_join(). I guess R internally converts df2 (a matrix, not a data frame) to a data frame, so row names are changed if they contain special characters like "-". This still does not explain the situation when I

class(df1); class(df2)
# [1] "data.frame"  # 2 x 2
# [1] "matrix" "array" # 52439 x 2
rownames(df1)
# [1] "A1CF"     "A1BG-AS1"
merge(df1, df2[c(9109, 44999), ], by=0)
#   Row.names 786-0 A498 ACH-000001 ACH-000002
# 1  A1BG-AS1     0    0   7.321358   6.908333
# 2      A1CF     0    0   3.011470   1.189578
merge(df1, df2[c(9109, 38959:44999), ], by= 0) # still correct
merge(df1, df2[c(9109, 38958:44999), ], by= 0) # same as merge(df1, df2, by=0)
#   Row.names 786-0 A498 ACH-000001 ACH-000002
# 1      A1CF     0    0    3.01147   1.189578
rownames(df2)[38958:38959]
# [1] "ITFG2-AS1"  "ADGRD1-AS1"

rownames(df1)[2] <- "A1BGAS1"
rownames(df2)[44999] <- "A1BGAS1"
merge(df1, df2, by= 0)
#   Row.names 786-0 A498 ACH-000001 ACH-000002
# 1   A1BGAS1     0    0   7.321358   6.908333
# 2      A1CF     0    0   3.011470   1.189578

is.matrix: data.frame is not necessarily a matrix

See ?matrix. is.matrix returns TRUE if x is a vector and has a "dim" attribute of length 2 and FALSE otherwise.

An example that is a data frame (is.data.frame() returns TRUE) but not a matrix (is.matrix() returns FALSE) is an object returned by

X <- data.frame(x=1:2, y=3:4)

The 'X' object is NOT a vector and it does NOT have the "dim" attribute. It has only 3 attributes: "names", "row.names" and "class". Note that the dim() function still works and returns the correct result even though there is no "dim" attribute.

Another example that is a data frame but not a matrix is the built-in object cars; see ?matrix. It is not a vector either.
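These claims can be checked on X directly. dim() works because dim.data.frame() computes the dimensions from the row names and the number of columns; it does not read a "dim" attribute:

```r
X <- data.frame(x = 1:2, y = 3:4)
names(attributes(X))      # "names", "class" and "row.names" (no "dim")
is.data.frame(X)          # TRUE
is.matrix(X)              # FALSE
dim(X)                    # 2 2
is.matrix(as.matrix(X))   # TRUE: as.matrix() adds a real "dim" attribute
```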

Convert a data frame to a matrix: as.matrix() vs data.matrix()

Suppose I have a data frame X that records the modification times of some files.

  • is.data.frame(X) shows TRUE but is.matrix(X) shows FALSE
  • as.matrix(X) keeps the timestamps (as character strings). The returned object is not a data frame anymore.
  • data.matrix(X) converts the times to numeric values. So use data.matrix() if you want numeric data. The returned object is not a data frame anymore.
# latex directory contains cache files from knitting an rmarkdown file
X <- list.files("latex/", full.names = T) %>%
     grep("RData", ., value=T) %>% 
     file.info() %>%  
     `[`("mtime")
X %>% is.data.frame() # TRUE
X %>% is.matrix() # FALSE
X %>% as.matrix() %>% is.matrix() # TRUE
X %>% data.matrix() %>% is.matrix() # TRUE
X %>% as.matrix() %>% "["(1:2, ) # timestamps
X %>% data.matrix() %>% "["(1:2, ) # numeric
  • The as.matrix() function is used to coerce an object into a matrix. It can be used with various types of R objects, such as vectors, data frames, and arrays.
  • The data.matrix() function is specifically designed for converting a data frame into a matrix by coercing all columns to numeric values. If the data frame contains non-numeric columns, such as character or factor columns, data.matrix() will convert them to numeric values if possible (e.g., by converting factors to their integer codes).
  • See the following example where as.matrix() and data.matrix() return different results.
df <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"))
mat <- as.matrix(df)
mat
#      a   b  
# [1,] "1" "x"
# [2,] "2" "y"
# [3,] "3" "z"
class(mat)
# [1] "matrix" "array" 
mat2 <- data.matrix(df)
mat2
#      a b
# [1,] 1 1
# [2,] 2 2
# [3,] 3 3
class(mat2)
# [1] "matrix" "array" 
typeof(mat)
# [1] "character"
typeof(mat2)
# [1] "double"

matrix vs data.frame

Case 1: colnames() is safer than names() if the object could be a data frame or a matrix.

Browse[2]> names(res2$surv.data.new[[index]])
NULL
Browse[2]> colnames(res2$surv.data.new[[index]])
 [1] "time"   "status" "treat"  "AKT1"   "BRAF"   "FLOT2"  "MTOR"   "PCK2"   "PIK3CA"
[10] "RAF1"  
Browse[2]> mode(res2$surv.data.new[[index]])
[1] "numeric"
Browse[2]> is.matrix(res2$surv.data.new[[index]])
[1] TRUE
Browse[2]> dim(res2$surv.data.new[[index]])
[1] 991  10

Case 2:

ip1 <- installed.packages()[,c(1,3:4)] # class(ip1) = 'matrix'
unique(ip1$Priority)
# Error in ip1$Priority : $ operator is invalid for atomic vectors
unique(ip1[, "Priority"])   # OK

ip2 <- as.data.frame(installed.packages()[,c(1,3:4)], stringsAsFactors = FALSE) # matrix -> data.frame
unique(ip2$Priority)     # OK

The length of a matrix and a data frame is different.

> length(matrix(1:6, 3, 2))
[1] 6
> length(data.frame(matrix(1:6, 3, 2)))
[1] 2
> x[1]
  X1
1  1
2  2
3  3
4  4
5  5
6  6
> x1
[1] 1 2 3 4 5 6

So the length of a data frame is the number of columns. When we use sapply() function on a data frame, it will apply to each column of the data frame.

How to Remove Duplicates

How to Remove Duplicates in R with Example

Convert a matrix (not data frame) of characters to numeric

Just change the mode of the object

> tmp <- cbind(a = c("0.12", "0.34"), b = c("0.567", "0.890")); tmp
     a      b
[1,] "0.12" "0.567"
[2,] "0.34" "0.890"
> is.data.frame(tmp) # FALSE
> is.matrix(tmp)     # TRUE
> sum(tmp)
Error in sum(tmp) : invalid 'type' (character) of argument
> mode(tmp)  # "character"

> mode(tmp) <- "numeric"
> sum(tmp)
[1] 1.917

Convert Data Frame Row to Vector

as.numeric() or unlist(). Note that c() applied to a data frame row returns a list, not an atomic vector.
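A quick comparison on a small hypothetical data frame. as.numeric() drops the names, unlist() keeps the column names, and c() leaves the row as a list:

```r
df <- data.frame(a = 1:3, b = 4:6)
as.numeric(df[2, ])   # 2 5
unlist(df[2, ])       # named vector: a = 2, b = 5
is.list(c(df[2, ]))   # TRUE: c() keeps the list structure
```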

Convert characters to integers

mode(x) <- "integer"

Non-Standard Evaluation

Understanding Non-Standard Evaluation. Part 1: The Basics

Select Data Frame Columns in R

This is part of the DATA MANIPULATION IN R series from datanovia.com.

  • pull(): Extract column values as a vector. The column of interest can be specified either by name or by index.
  • select(): Extract one or multiple columns as a data table. It can be also used to remove columns from the data frame.
  • select_if(): Select columns based on a particular condition. One can use this function to, for example, select columns if they are numeric.
  • Helper functions - starts_with(), ends_with(), contains(), matches(), one_of(): Select columns/variables based on their names

Another way is to use the dollar sign $ operator (?"$") to extract a single column from a data frame.

class(USArrests)  # "data.frame"
USArrests$"Assault"

Note that for both data frame and matrix objects, we need to use the [ operator to extract columns and/or rows.

USArrests[c("Alabama", "Alaska"), c("Murder", "Assault")]
#         Murder Assault
# Alabama   13.2     236
# Alaska    10.0     263
USArrests[c("Murder", "Assault")]  # all rows

tmp <- data(package="datasets")
class(tmp$results)  # "matrix" "array" 
tmp$results[, "Item"]
# Same method can be used if rownames are available in a matrix

Note for a data.table object, we can extract columns using the column names without double quotes.

library(data.table)
data.table(USArrests)[1:2, list(Murder, Assault)]

Add columns to a data frame

How to add columns to a data frame in R

Exclude/drop/remove data frame columns

# method 1
df = subset(mydata, select = -c(x, z))

# method 2
drop <- c("x", "z")
df = mydata[, !(names(mydata) %in% drop)]

# method 3: dplyr
library(dplyr)
mydata2 = select(mydata, -a, -x, -y)
mydata2 = select(mydata, -c(a, x, y))
mydata2 = select(mydata, -a:-y)   # drop every column from a through y

# method 4: drop columns whose names start with "INC" (base R)
mydata2 = mydata[, !grepl("^INC", names(mydata))]

Remove Rows from the data frame

Remove Rows from the data frame in R

Danger of selecting rows from a data frame

> dim(cars)
[1] 50  2
> data.frame(a=cars[1,], b=cars[2, ])
  a.speed a.dist b.speed b.dist
1       4      2       4     10
> dim(data.frame(a=cars[1,], b=cars[2, ]))
[1] 1 4
> cars2 = as.matrix(cars)
> data.frame(a=cars2[1,], b=cars2[2, ])
      a  b
speed 4  4
dist  2 10

Creating data frame using structure() function

Creating data frame using structure() function in R

Create an empty data.frame

https://stackoverflow.com/questions/10689055/create-an-empty-data-frame

# the column types default as logical per vector(), but are then overridden
a = data.frame(matrix(vector(), 5, 3,
               dimnames=list(c(), c("Date", "File", "User"))),
               stringsAsFactors=F)
str(a) # all NA, but the columns are logical, not numeric
a[1,1] <- rnorm(1)
str(a)

# similar to above
a <- data.frame(matrix(NA, nrow = 2, ncol = 3))

# different data type
a <- data.frame(x1 = character(),
                x2 = numeric(),
                x3 = factor(),
                stringsAsFactors = FALSE)

Objects from subsetting a row in a data frame vs matrix

  • Subsetting with repeated row indices creates duplicated rows with unexpected row names.
    R> z <- data.frame(x=1:3, y=2:4)
    R> rownames(z) <- letters[1:3]
    R> rownames(z)[c(1,1)]
    [1] "a" "a"
    R> rownames(z[c(1,1),])
    [1] "a"   "a.1"
    R> z[c(1,1), ]
        x y
    a   1 2
    a.1 1 2
    
  • Convert a data frame row to a vector. The solution is as.vector(t(mydf[i, ])) or unlist(mydf[i, ]). Note that c(mydf[i, ]) returns a list. My example:
    str(trainData)
    # 'data.frame':	503 obs. of  500 variables:
    #  $ bm001: num  0.429 1 -0.5 1.415 -1.899 ...
    #  $ bm002: num  0.0568 1 0.5 0.3556 -1.16 ...
    # ...
    trainData[1:3, 1:3]
    #        bm001      bm002    bm003
    # 1  0.4289449 0.05676296 1.657966
    # 2  1.0000000 1.00000000 1.000000
    # 3 -0.5000000 0.50000000 0.500000
    o <- data.frame(time = trainData[1, ], status = trainData[2, ], treat = trainData[3, ], t(TData))
    # Warning message:
    # In data.frame(time = trainData[1, ], status = trainData[2, ], treat = trainData[3,  :
    #   row names were found from a short variable and have been discarded
    

    'trees' data from the 'datasets' package

    trees[1:3,]
    #   Girth Height Volume
    # 1   8.3     70   10.3
    # 2   8.6     65   10.3
    # 3   8.8     63   10.2
    
    # Wrong ways:
    data.frame(trees[1,] , trees[2,])
    #   Girth Height Volume Girth.1 Height.1 Volume.1
    # 1   8.3     70   10.3     8.6       65     10.3
    data.frame(time=trees[1,] , status=trees[2,])
    #   time.Girth time.Height time.Volume status.Girth status.Height status.Volume
    # 1        8.3          70        10.3          8.6            65          10.3
    data.frame(time=as.vector(trees[1,]) , status=as.vector(trees[2,]))
    #   time.Girth time.Height time.Volume status.Girth status.Height status.Volume
    # 1        8.3          70        10.3          8.6            65          10.3
    data.frame(time=c(trees[1,]) , status=c(trees[2,]))
    # time.Girth time.Height time.Volume status.Girth status.Height status.Volume
    # 1        8.3          70        10.3          8.6            65          10.3
    
    # Right ways:
    # method 1: dropping row names
    data.frame(time=c(t(trees[1,])) , status=c(t(trees[2,]))) 
    # OR
    data.frame(time=as.numeric(trees[1,]) , status=as.numeric(trees[2,]))
    #   time status
    # 1  8.3    8.6
    # 2 70.0   65.0
    # 3 10.3   10.3
    # method 2: keeping row names
    data.frame(time=t(trees[1,]) , status=t(trees[2,]))
    #          X1   X2
    # Girth   8.3  8.6
    # Height 70.0 65.0
    # Volume 10.3 10.3
    data.frame(time=unlist(trees[1,]) , status=unlist(trees[2,]))
    #        time status
    # Girth   8.3    8.6
    # Height 70.0   65.0
    # Volume 10.3   10.3
    
    # Method 3: convert a data frame to a matrix
    is.matrix(trees)
    # [1] FALSE
    trees2 <- as.matrix(trees)
    data.frame(time=trees2[1,] , status=trees2[2,]) # row names are kept
    #        time status
    # Girth   8.3    8.6
    # Height 70.0   65.0
    # Volume 10.3   10.3
    
    dim(trees[1,])
    # [1] 1 3
    dim(trees2[1, ])
    # NULL
    trees[1, ]  # notice the row name '1' on the left hand side
    #   Girth Height Volume
    # 1   8.3     70   10.3
    trees2[1, ]
    #  Girth Height Volume
    #    8.3   70.0   10.3
    

Convert a list to data frame

How to Convert a List to a Data Frame in R.

# method 1
data.frame(t(sapply(my_list,c)))

# method 2
library(dplyr)
bind_rows(my_list) # OR bind_cols(my_list)

# method 3
library(data.table)
rbindlist(my_list)

tibble and data.table

Clean a dataset

How to clean the datasets in R

matrix

Define and subset a matrix

  • Matrix in R
    • When a vector is turned into a matrix, the data are filled column-wise (byrow = FALSE by default).
    • When subsetting a matrix, the format is X[rows, columns], i.e. X[y-axis, x-axis].
data <- c(2, 4, 7, 5, 10, 1)
A <- matrix(data, ncol = 3)
print(A)
#      [,1] [,2] [,3]
# [1,]    2    7   10
# [2,]    4    5    1

A[1, 2:3, drop = FALSE]
#      [,1] [,2]
# [1,]    7   10

Prevent automatic conversion of single column to vector

Use drop = FALSE, e.g. mat[, 1, drop = FALSE].
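
A minimal sketch on a small named matrix. The names m, col_vec and col_mat are only for illustration. Without drop = FALSE, the single column loses its dimensions.

```r
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

col_vec <- m[, 1]                 # dimension dropped: plain vector 1 2 3
col_mat <- m[, 1, drop = FALSE]   # still a 3 x 1 matrix, column name kept

dim(col_vec)   # NULL
dim(col_mat)   # 3 1
```

The same argument works for data frames, e.g. df[, 1, drop = FALSE].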

complete.cases(): remove rows with missing values in any column

It works on a sequence of vectors, matrices and data frames.
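
complete.cases() returns a logical vector. You then use it to subset the rows you want to keep. A small made-up example:

```r
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

ok <- complete.cases(df)   # TRUE FALSE FALSE
df[ok, ]                   # keeps only the first row

# Several objects at once: a position is complete only if it is complete in all of them
complete.cases(c(1, 2, NA), c(NA, 5, 6))   # FALSE TRUE FALSE
```

na.omit(df) gives the same rows for a data frame.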

NROW vs nrow

?nrow. Use NROW/NCOL instead of nrow/ncol to treat vectors as 1-column matrices.
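
A short illustration with hypothetical objects v and m. nrow() and ncol() return NULL for a plain vector, while NROW() and NCOL() treat it as a one-column matrix.

```r
v <- 1:5
m <- matrix(1:6, nrow = 2)

nrow(v); ncol(v)   # NULL NULL: a plain vector has no dim attribute
NROW(v); NCOL(v)   # 5 1: treated as a 5 x 1 matrix
NROW(m); NCOL(m)   # 2 3: same as nrow()/ncol() for a matrix
```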

matrix (column-major order) multiply a vector

> matrix(1:6, 3,2)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> matrix(1:6, 3,2) * c(1,2,3) # c(1,2,3) will be recycled to form a matrix. Good quiz.
     [,1] [,2]
[1,]    1    4
[2,]    4   10
[3,]    9   18
> matrix(1:6, 3,2) * c(1,2,3,4) # c(1,2,3,4) will be recycled, with a warning since 6 is not a multiple of 4
     [,1] [,2]
[1,]    1   16
[2,]    4    5
[3,]    9   12

add a vector to all rows of a matrix

add a vector to all rows of a matrix. sweep() or rep() is the best.
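
A minimal sketch of both approaches, assuming a small 2 x 3 matrix m and a vector v with one value per column (both hypothetical). Because of the column-major recycling shown above, a plain m + v would not do this.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3: rows (1, 3, 5) and (2, 4, 6)
v <- c(10, 20, 30)           # one value per column

# sweep(): apply "+" along MARGIN = 2 (columns), so v is added to every row
r1 <- sweep(m, 2, v, FUN = "+")
# rep(): expand v so it lines up with R's column-major storage
r2 <- m + rep(v, each = nrow(m))
# Note: m + v would recycle v down the columns instead, which is not the same
r1   # rows (11, 23, 35) and (12, 24, 36)
```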

sparse matrix

R convert matrix or data frame to sparseMatrix

To get a column of a sparseMatrix as a regular vector, convert it with as.vector().

Attributes

Names

Useful functions for dealing with object names. (Un)Setting object names: stats::setNames(), unname() and rlang::set_names()

Print a vector by suppressing names

Use unname(), or sapply(X, FUN, USE.NAMES = FALSE).
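
A small sketch of both options. The vectors are made up for illustration. When X is a character vector, sapply() uses X as the names by default.

```r
x <- c(a = 1, b = 2)
unname(x)          # 1 2, names removed

s1 <- sapply(c("p", "q"), toupper)                      # names p, q taken from X
s2 <- sapply(c("p", "q"), toupper, USE.NAMES = FALSE)   # no names
```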

format.pval/print p-values/format p values

format.pval(). By default it will show 5 significant digits (getOption("digits")-2).

> set.seed(1); format.pval(c(stats::runif(5), pi^-100, NA))
[1] "0.26551" "0.37212" "0.57285" "0.90821" "0.20168" "< 2e-16" "NA"
> format.pval(c(0.1, 0.0001, 1e-27))
[1] "1e-01"  "1e-04"  "<2e-16"

R> pvalue
[1] 0.0004632104
R> print(pvalue, digits =20)
[1] 0.00046321036188223807528
R> format.pval(pvalue)
[1] "0.00046321"
R> format.pval(pvalue * 1e-1)
[1] "4.6321e-05"
R> format.pval(0.00004632)
[1] "4.632e-05"
R> getOption("digits")
[1] 7

Return type

The format.pval() function returns a string, so it’s not appropriate to use the returned object for operations like sorting.
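
The example below uses two made-up p-values. Sorting the formatted strings gives string order, not numeric order. Sort the numbers first and then format them.

```r
p <- c(0.5, 2e-5)

format.pval(p)          # "0.5"   "2e-05"
sort(format.pval(p))    # "0.5" first: string order, not numeric order
format.pval(sort(p))    # "2e-05" "0.5": sort the numbers, then format
```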

Wrong number of digits in format.pval()

See here. The solution is to apply round() and then format.pval().

x <- c(6.25433625041843e-05, NA, 0.220313341361346, NA, 0.154029880744594, 
   0.0378437685448703, 0.023358329881356, NA, 0.0262561986351483, 
   0.000251274794673796) 
format.pval(x, digits=3)
# [1] "6.25e-05" "NA"       "0.220313" "NA"       "0.154030" "0.037844" "0.023358"
# [8] "NA"       "0.026256" "0.000251"

round(x, 3) |> format.pval(digits=3, eps=.001)
# [1] "<0.001" "NA"     "0.220"  "NA"     "0.154"  "0.038"  "0.023"  "NA"
# [9] "0.026"  "<0.001"

dplyr::mutate_if()

library(dplyr)
df <- data.frame(
  char_var = c("A", "B", "C"),
  num_var1 = c(1.123456, 2.123456, 3.123456),
  num_var2 = c(4.654321, 5.654321, 6.654321),
  stringsAsFactors = FALSE
)

# Round numerical variables to 4 digits after the decimal point
df_rounded <- df %>%
  mutate_if(is.numeric, round, digits = 4)
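
In dplyr 1.0.0 and later, mutate_if() is superseded by across(). The sketch below should give the same result on the same df. It assumes dplyr >= 1.0.0, which provides where().

```r
library(dplyr)  # across() and where() need dplyr >= 1.0.0

df <- data.frame(
  char_var = c("A", "B", "C"),
  num_var1 = c(1.123456, 2.123456, 3.123456),
  num_var2 = c(4.654321, 5.654321, 6.654321),
  stringsAsFactors = FALSE
)

# Round every numeric column to 4 digits; character columns are left alone
df_rounded2 <- df %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = 4)))
```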

Customize R: options()

Change the default R repository, my .Rprofile

Change R repository

Edit global Rprofile file. On *NIX platforms, it's located in /usr/lib/R/library/base/R/Rprofile although local .Rprofile settings take precedence.

For example, I can specify the R mirror I like in a .Rprofile file under my home directory. Another good choice of repository is cloud.r-project.org.

Type file.edit("~/.Rprofile")

local({
  r = getOption("repos")
  r["CRAN"] = "https://cran.rstudio.com/"
  options(repos = r)
})
options(continue = "  ", editor = "nano")
message("Hi MC, loading ~/.Rprofile")
if (interactive()) {
  .Last <- function() try(savehistory("~/.Rhistory"))
}

Change the default web browser for utils::browseURL()

When I run help.start() function in LXLE, it cannot find its default web browser (seamonkey). The solution is to put

options(browser='seamonkey')

in the .Rprofile of your home directory. If the browser is not on the PATH, give its full path instead.

For one-time only purpose, we can use the browser option in help.start() function:

> help.start(browser="seamonkey")
If the browser launched by 'seamonkey' is already running, it is *not*
    restarted, and you must switch to its window.
Otherwise, be patient ...

Alternatively, we can make the change in ~/.Renviron or etc/Renviron (create the file if it does not exist), for example by setting R_BROWSER.

Change the default editor

On my Linux and mac, the default editor is "vi". To change it to "nano",

options(editor = "nano")

Change prompt and remove '+' sign

See https://stackoverflow.com/a/1448823.

options(prompt="R> ", continue=" ")

digits

  • signif() rounds x to n significant digits.
    R> signif(pi, 3)
    [1] 3.14
    R> signif(pi, 5)
    [1] 3.1416
    
  • The default of 7 digits may be too small. For example, for a very large number we may not see enough digits after the decimal point. The acceptable range is 1-22. See the following examples.

In R,

> options()$digits # Default
[1] 7
> print(.1+.2, digits=18)
[1] 0.300000000000000044
> 100000.07 + .04
[1] 100000.1
> options(digits = 16)
> 100000.07 + .04
[1] 100000.11

In Python,

>>> 100000.07 + .04
100000.11

Disable scientific notation in printing: options(scipen)

How to Turn Off Scientific Notation in R?

This also helps with write.table() results. For example, 0.0003 won't become 3e-4 in the output file.

> numer = 29707; denom = 93874
> c(numer/denom, numer, denom) 
[1] 3.164561e-01 2.970700e+04 9.387400e+04

# Method 1. Without changing the global option
> format(c(numer/denom, numer, denom), scientific=FALSE)
[1] "    0.3164561" "29707.0000000" "93874.0000000"

# Method 2. Change the global option
> options(scipen=999)
> numer/denom
[1] 0.3164561
> c(numer/denom, numer, denom)
[1]     0.3164561 29707.0000000 93874.0000000
> c(4/5, numer, denom)
[1]     0.8 29707.0 93874.0

Suppress warnings: options() and capture.output()

Use options(). If warn is negative, all warnings are ignored. If warn is zero (the default), warnings are stored until the top-level function returns.

op <- options("warn")
options(warn = -1)
....
options(op)

# OR
warnLevel <- options()$warn
options(warn = -1)
...
options(warn = warnLevel)
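
Inside a function, on.exit() is a convenient place to restore the old option. It also runs if the code in between throws an error. This is a sketch with a hypothetical helper, quiet_log():

```r
# Hypothetical helper: evaluate log(x) with warnings switched off,
# restoring the previous 'warn' setting when the function exits
quiet_log <- function(x) {
  op <- options(warn = -1)  # options() returns the old value, ready for restoring
  on.exit(options(op))
  log(x)
}

old_warn <- getOption("warn")
res <- quiet_log(-1)        # NaN, and no warning is reported
```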

suppressWarnings()

suppressWarnings( foo() )

foo <- capture.output(
  bar <- suppressWarnings({
    print("hello, world")
    warning("unwanted")
  })
)

capture.output()

# pass the str() call itself, so its printed output is what gets captured
capture.output(str(iris, max.level = 1), file = "/tmp/iris.txt")

Converts warnings into errors

options(warn=2)
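
With warn = 2, the converted warning can be caught like any other error. A small sketch, using as.integer("abc") as an example of code that normally only warns:

```r
old <- options(warn = 2)           # every warning becomes an error
res <- tryCatch(as.integer("abc"), # normally only a warning: NAs introduced by coercion
                error = function(e) "caught")
options(old)                       # restore the previous setting
res                                # "caught"
```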

demo() function

  • How to wait for a keypress in R? PS readline() is different from readLines().
    for(i in 1:2) { print(i); readline("Press [enter] to continue")}
    
  • Hit 'ESC' or Ctrl+c to skip the prompt "Hit <Return> to see next plot:"
  • demo() uses options() to ask users to hit Enter on each plot
    op <- options(device.ask.default = ask)  # ask = TRUE
    on.exit(options(op), add = TRUE)
    

sprintf

paste, paste0, sprintf

this post, 3 R functions that I enjoy

sep vs collapse in paste()

  • sep is used if we supply multiple separate objects to paste(). A more powerful function is tidyr::unite() function.
  • collapse is used to make the output of length 1. It is commonly used if we have only 1 input object
R> paste("a", "A", sep=",") # multi-vec -> multi-vec
[1] "a,A"
R> paste(c("Elon", "Taylor"), c("Musk", "Swift"))
[1] "Elon Musk"    "Taylor Swift"
# OR
R> sprintf("%s %s", c("Elon", "Taylor"), c("Musk", "Swift"))

R> paste(c("a", "A"), collapse="-") # one-vec/multi-vec  -> one-scalar
[1] "a-A"

# When used together, sep is applied first and collapse second
R> paste(letters[1:3], LETTERS[1:3], sep=",", collapse=" - ")
[1] "a,A - b,B - c,C"
R> paste(letters[1:3], LETTERS[1:3], sep=",")
[1] "a,A" "b,B" "c,C"
R> paste(letters[1:3], LETTERS[1:3], sep=",") |> paste(collapse=" - ")
[1] "a,A - b,B - c,C"

Format number as fixed width, with leading zeros

# sprintf()
a <- seq(1,101,25)
sprintf("name_%03d", a)
# [1] "name_001" "name_026" "name_051" "name_076" "name_101"

# formatC()
paste("name", formatC(a, width=3, flag="0"), sep="_")
# [1] "name_001" "name_026" "name_051" "name_076" "name_101"

# gsub()
paste0("bm", gsub(" ", "0", format(5:15)))
# [1] "bm05" "bm06" "bm07" "bm08" "bm09" "bm10" "bm11" "bm12" "bm13" "bm14" "bm15"

formatC and prettyNum (prettifying numbers)

R> (x <- 1.2345 * 10 ^ (-8:4))
 [1] 1.2345e-08 1.2345e-07 1.2345e-06 1.2345e-05 1.2345e-04 1.2345e-03
 [7] 1.2345e-02 1.2345e-01 1.2345e+00 1.2345e+01 1.2345e+02 1.2345e+03
[13] 1.2345e+04
R> formatC(x)
 [1] "1.234e-08" "1.234e-07" "1.234e-06" "1.234e-05" "0.0001234" "0.001234"
 [7] "0.01235"   "0.1235"    "1.234"     "12.34"     "123.4"     "1234"
[13] "1.234e+04"
R> formatC(x, digits=3)
 [1] "1.23e-08" "1.23e-07" "1.23e-06" "1.23e-05" "0.000123" "0.00123"
 [7] "0.0123"   "0.123"    "1.23"     "12.3"     " 123"     "1.23e+03"
[13] "1.23e+04"
R> formatC(x, digits=3, format="e")
 [1] "1.234e-08" "1.234e-07" "1.234e-06" "1.234e-05" "1.234e-04" "1.234e-03"
 [7] "1.235e-02" "1.235e-01" "1.234e+00" "1.234e+01" "1.234e+02" "1.234e+03"
[13] "1.234e+04"

R> x <- .000012345
R> prettyNum(x)
[1] "1.2345e-05"
R> x <- .00012345
R> prettyNum(x)
[1] "0.00012345"

format(x, scientific = TRUE) vs round() vs format.pval()

Print numeric data in exponential format, so .0001 prints as 1e-4

format(c(0.00001156, 0.84134, 2.1669), scientific = T, digits=4)
# [1] "1.156e-05" "8.413e-01" "2.167e+00"
round(c(0.00001156, 0.84134, 2.1669), digits=4)
# [1] 0.0000 0.8413 2.1669

format.pval(c(0.00001156, 0.84134, 2.1669)) # output is char vector
# [1] "1.156e-05" "0.84134"   "2.16690"
format.pval(c(0.00001156, 0.84134, 2.1669), digits=4)
# [1] "1.156e-05" "0.8413"    "2.1669"

Creating publication quality graphs in R

HDF5 : Hierarchical Data Format

HDF5 is an open binary file format for storing and managing large, complex datasets. The file format was developed by the HDF Group, and is widely used in scientific computing.

Formats for writing/saving and sharing data

Efficiently Saving and Sharing Data in R

Write unix format files on Windows and vice versa

https://stat.ethz.ch/pipermail/r-devel/2012-April/063931.html

with() and within() functions

closePr <- with(mariokart, totalPr - shipPr)
head(closePr, 20)

mk <- within(mariokart, {
             closePr <- totalPr - shipPr
     })
head(mk) # new column closePr

mk <- mariokart
aggregate(. ~ wheels + cond, mk, mean)
# create mean according to each level of (wheels, cond)

aggregate(totalPr ~ wheels + cond, mk, mean)

tapply(mk$totalPr, mk[, c("wheels", "cond")], mean)
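
The mariokart data may not be at hand. Here is the same pattern on the built-in mtcars data set (a stand-in, not the original example). with() returns a plain result, while within() returns a modified copy of the data frame.

```r
# mtcars (from the datasets package) stands in for mariokart here
hp_per_wt <- with(mtcars, hp / wt)                  # a plain vector

mt2 <- within(mtcars, { hp_per_wt <- hp / wt })     # a copy with a new column
"hp_per_wt" %in% names(mt2)      # TRUE
"hp_per_wt" %in% names(mtcars)   # FALSE: the original is untouched

# mean mpg for each combination of cyl and am
aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)
```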

stem(): stem-and-leaf plot (alternative to histogram), bar chart on terminals

Plot histograms as lines

https://stackoverflow.com/a/16681279. This is useful when we want to compare the distributions of different statistics.

# plot = FALSE computes the histogram without drawing it
x2 <- hist(out2$EB, plot = FALSE)
y2 <- hist(out2$Bench, plot = FALSE)
z2 <- hist(out2$EB0.001, plot = FALSE)

plot(x = x2$mids, y = x2$density, type = "l")
lines(y2$mids, y2$density, lty = 2, lwd = 2)
lines(z2$mids, z2$density, lty = 3, lwd = 2)
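
out2 is not available here, so the sketch below uses two simulated samples, a and b, to show the idea end to end. It also sets xlim and ylim so that both curves fit on the plot.

```r
set.seed(1)
a <- rnorm(1000)             # two made-up samples to compare
b <- rnorm(1000, mean = 1)

ha <- hist(a, plot = FALSE)  # compute bins and densities without drawing
hb <- hist(b, plot = FALSE)

plot(ha$mids, ha$density, type = "l",
     xlim = range(ha$mids, hb$mids),
     ylim = range(0, ha$density, hb$density),
     xlab = "value", ylab = "density")
lines(hb$mids, hb$density, lty = 2, lwd = 2)
legend("topright", legend = c("a", "b"), lty = 1:2)
```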

Histogram with density line

hist(x, prob = TRUE)
lines(density(x), col = 4, lwd = 2)

The overlaid density may look strange in some cases, for example for counts from single-cell RNA-seq or p-values from RNA-seq, where there is a peak around x = 0.

Graphical Parameters, Axes and Text, Combining Plots

statmethods.net

15 Questions All R Users Have About Plots

See 15 Questions All R Users Have About Plots. This is a tremendous post. It covers the built-in plot() function and ggplot() from ggplot2 package.

  1. How To Draw An Empty R Plot? plot.new()
  2. How To Set The Axis Labels And Title Of The R Plots?
  3. How To Add And Change The Spacing Of The Tick Marks Of Your R Plot? axis()
  4. How To Create Two Different X- or Y-axes? par(new=TRUE), axis(), mtext(). ?par.
  5. How To Add Or Change The R Plot’s Legend? legend()
  6. How To Draw A Grid In Your R Plot? grid()
  7. How To Draw A Plot With A PNG As Background? rasterImage() from the png package
  8. How To Adjust The Size Of Points In An R Plot? cex argument
  9. How To Fit A Smooth Curve To Your R Data? loess() and lines()
  10. How To Add Error Bars In An R Plot? arrows()
  11. How To Save A Plot As An Image On Disc
  12. How To Plot Two R Plots Next To Each Other? par(mfrow)[which means Multiple Figures (use ROW-wise)], gridBase package, lattice package
  13. How To Plot Multiple Lines Or Points? plot(), lines()
  14. How To Fix The Aspect Ratio For Your R Plots? asp parameter
  15. What Is The Function Of hjust And vjust In ggplot2?

jitter function
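
jitter() adds a small amount of random noise so that tied values do not overplot. A minimal sketch with made-up tied data, where amount sets the half-width of the uniform noise:

```r
set.seed(42)
x  <- rep(1:3, each = 20)           # heavily tied values
xj <- jitter(x, amount = 0.1)       # add uniform noise in [-0.1, 0.1]

# Without jitter the 20 points at each x would sit on top of each other
plot(xj, seq_along(xj), xlab = "jittered x", ylab = "index")
```

For one-dimensional data, stripchart(x, method = "jitter") does the jittering for you.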


Scatterplot with the "rug" function

require(stats)  # both 'density' and its default method
with(faithful, {
    plot(density(eruptions, bw = 0.15))
    rug(eruptions)
    rug(jitter(eruptions, amount = 0.01), side = 3, col = "light blue")
})


See also the stripchart() function which produces one dimensional scatter plots (or dot plots) of the given data.

Identify/Locate Points in a Scatter Plot

  • ?identify
  • Using the identify function in R
    plot(x, y)
    identify(x, y, labels = names, plot = TRUE) 
    # Use left clicks to select points we want to identify and "esc" to stop the process
    # This will put the labels on the plot and also return the indices of points
    # [1] 143
    names[143]
    

Draw a single plot with two different y-axes
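
A sketch of the par(new = TRUE) + axis() + mtext() approach listed in item 4 of the plot questions above. The data (x, x^2 and log(x)) are made up for illustration.

```r
x  <- 1:10
y1 <- x^2
y2 <- log(x)

op <- par(mar = c(5, 4, 4, 5) + 0.1)       # widen the right margin for the 2nd axis
plot(x, y1, type = "b", pch = 16, xlab = "x", ylab = "x^2")
par(new = TRUE)                            # draw the next plot on top of this one
plot(x, y2, type = "b", pch = 17, col = "red",
     axes = FALSE, xlab = "", ylab = "")   # no axes: the x-axis is already drawn
axis(side = 4, col.axis = "red")           # right-hand y-axis for y2
mtext("log(x)", side = 4, line = 3, col = "red")
par(op)                                    # restore the original margins
```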

Draw Color Palette

Default palette before R 4.0

palette() # black, red, green3, blue, cyan, magenta, yellow, gray

# Example from Coursera "Statistics for Genomic Data Science" by Jeff Leek
tropical = c('darkorange', 'dodgerblue', 'hotpink', 'limegreen', 'yellow')
palette(tropical)
plot(1:5, 1:5, col=1:5, pch=16, cex=5)

New palette in R 4.0.0

R 4.0: 3 new features, R 4.0.0 now available, and a look back at R's history. For example, we can select "ggplot2" palette to make the base graphics charts that match the color scheme of ggplot2.

R> palette() 
[1] "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
[8] "gray62"
R> palette.pals()
 [1] "R3"              "R4"              "ggplot2"        
 [4] "Okabe-Ito"       "Accent"          "Dark 2"         
 [7] "Paired"          "Pastel 1"        "Pastel 2"       
[10] "Set 1"           "Set 2"           "Set 3"          
[13] "Tableau 10"      "Classic Tableau" "Polychrome 36"  
[16] "Alphabet"
R> palette.colors(palette='R4') # same as palette()
[1] "#000000" "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
[8] "#9E9E9E"
R> palette("R3")  # nothing is printed, but the palette has changed
R> palette() 
[1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow" 
[8] "gray"  
R> palette("R4") # reset to the default color palette; OR palette("default")

R> scales::show_col(palette.colors(palette = "Okabe-Ito"))
R> for(id in palette.pals()) { 
     scales::show_col(palette.colors(palette = id))
     title(id)
     readline("Press [enter] to continue") 
   } 

The palette function can also be used to change the color palette. See Setting up Color Palettes in R

palette("ggplot2")
palette(palette()[-1]) # Remove 'black'
   # OR palette(palette.colors(palette = "ggplot2")[-1] )
with(iris, plot(Sepal.Length, Petal.Length, col = Species, pch=16))

cc <- palette()
palette(c(cc,"purple","brown")) # Add two colors
R> colors() |> length() # [1] 657
R> colors(distinct = T) |> length() # [1] 502

evoPalette

Evolve new colour palettes in R with evoPalette

rtist

rtist: Use the palettes of famous artists in your own visualizations.

SVG

Embed svg in html

svglite

svglite produces better SVG files than R's built-in svg(), and ggsave() uses it for SVG output. See svglite 1.2.0 and R Graphics Cookbook.

pdf -> svg

Using Inkscape. See this post.

svg -> png

SVG to PNG using the gyro package

read.table

clipboard

source("clipboard")
read.table("clipboard")

inline text

mydf <- read.table(header=T, text='
 cond yval
    A 2
    B 2.5
    C 1.6
')

http(s) connection

library(RCurl)  # provides getURL()
temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt", 
              ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)

read only specific columns

Use the 'colClasses' option in read.table, read.delim, .... For example, the following code reads only the 3rd column of the text file, and the trailing [, 1] turns the result from a data frame into a vector. Note that we have to put double quotes around NULL.

x <- read.table("var_annot.vcf", colClasses = c(rep("NULL", 2), "character", rep("NULL", 7)), 
                skip=62, header=T, stringsAsFactors = FALSE)[, 1]
# 
system.time(x <- read.delim("Methylation450k.txt", 
                colClasses = c("character", "numeric", rep("NULL", 188)), stringsAsFactors = FALSE))
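
A self-contained version of the same trick, using a small inline tab-delimited text instead of a file. The column names id, score and note are made up for illustration.

```r
txt <- "id\tscore\tnote\na\t1.5\tfoo\nb\t2.5\tbar"

# "NULL" (a string, in quotes) skips a column; [, 1] turns the one-column
# data frame into a plain vector
x <- read.delim(text = txt,
                colClasses = c("NULL", "numeric", "NULL"))[, 1]
x   # 1.5 2.5
```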

To know the number of columns, we might want to read the first row first.

library(magrittr)
scan("var_annot.vcf", sep="\t", what="character", skip=62, nlines=1, quiet=TRUE) %>% length()

Another method is to use pipe(), cut or awk. See ways to read only selected columns from a file into R

check.names = FALSE in read.table()

gx <- read.table(file, header = T, row.names =1)
colnames(gx) %>% grep("[^[:alnum:] ]", ., value = TRUE)
# [1] "hCG_1642354" "IGH."        "IGHV1.69"    "IGKV1.5"     "IGKV2.24"    "KRTAP13.2"  
# [7] "KRTAP19.1"   "KRTAP2.4"    "KRTAP5.9"    "KRTAP6.3"    "Kua.UEV"  

gx <- read.table(file, header = T, row.names =1, check.names = FALSE)
colnames(gx) %>% grep("[^[:alnum:] ]", ., value = TRUE)
# [1] "hCG_1642354" "IGH@"        "IGHV1-69"    "IGKV1-5"     "IGKV2-24"    "KRTAP13-2"  
# [7] "KRTAP19-1"   "KRTAP2-4"    "KRTAP5-9"    "KRTAP6-3"    "Kua-UEV"  

setNames()

Change the colnames. See an example from tidymodels
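
setNames() returns a copy of its first argument with the given names, which makes it handy inside pipes. A minimal sketch with made-up objects:

```r
v  <- setNames(1:3, c("a", "b", "c"))              # named vector in one step
df <- setNames(data.frame(1:2, 3:4), c("x", "y"))  # rename all columns of a data frame
unname(v)                                          # and back to an unnamed vector
```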

Testing for valid variable names

Testing for valid variable names

make.names(): Make syntactically valid names out of character vectors

  • make.names()
  • A syntactically valid name consists of letters, numbers, and the dot or underscore characters, and starts with a letter or with a dot not followed by a number. See R variables.
make.names("abc-d") # [1] "abc.d"

Serialization

If we want to pass an R object to C (use recv() function), we can use writeBin() to output the stream size and then use serialize() function to output the stream to a file. See the post on R mailing list.

> a <- list(1,2,3)
> a_serial <- serialize(a, NULL)
> a_length <- length(a_serial)
> a_length
[1] 70
> writeBin(as.integer(a_length), connection, endian="big")
> serialize(a, connection)

In C++ process, I receive one int variable first to get the length, and then read <length> bytes from the connection.
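
The length-prefix framing can be checked without a socket or a C program. The sketch below writes to an in-memory rawConnection() instead of the connection used above, and plays the receiver in R as well. The framing (a 4-byte big-endian integer followed by the payload) follows the description above. Using a raw connection is only an assumption made for testing.

```r
a <- list(1, 2, 3)
a_serial <- serialize(a, connection = NULL)   # raw vector of serialized bytes

# Sender: 4-byte big-endian length prefix, then the payload
con <- rawConnection(raw(0), "wb")
writeBin(length(a_serial), con, endian = "big")   # length() is an integer, so 4 bytes
writeBin(a_serial, con)
msg <- rawConnectionValue(con)
close(con)

# Receiver: read the length first, then exactly that many bytes
con <- rawConnection(msg, "rb")
n <- readBin(con, what = "integer", n = 1, size = 4, endian = "big")
payload <- readBin(con, what = "raw", n = n)
close(con)

b <- unserialize(payload)
identical(a, b)   # TRUE
```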

socketConnection

See ?socketConnection.

Simple example

from the socketConnection's manual.

Open one R session

con1 <- socketConnection(port = 22131, server = TRUE) # wait until a connection from some client
writeLines(LETTERS, con1)
close(con1)

Open another R session (client)

con2 <- socketConnection(Sys.info()["nodename"], port = 22131)
# as non-blocking, may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
   Sys.sleep(1)
   z <- readLines(con2)
   if(length(z)) print(z)
}
close(con2)

Use nc in client

The client does not have to be R. We can use telnet, nc, etc. See the post here. For example, on the client machine, we can issue

nc localhost 22131   [ENTER]

Then the client will wait and show anything written from the server machine. The connection from nc will be terminated once close(con1) is given.

If I use the command

nc -v -w 2 localhost -z 22130-22135

then the connection will be established for a short time which means the cursor on the server machine will be returned. If we issue the above nc command again on the client machine it will show the connection to the port 22131 is refused. PS. "-w" switch denotes the number of seconds of the timeout for connects and final net reads.

A post I have not had a chance to read yet: http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html

Use curl command in client

On the server,

con1 <- socketConnection(port = 8080, server = TRUE)

On the client,

curl --trace-ascii debugdump.txt http://localhost:8080/

Then go to the server,

while(nchar(x <- readLines(con1, 1)) > 0) cat(x, "\n")

close(con1) # return cursor in the client machine

Use telnet command in client

On the server,

con1 <- socketConnection(port = 8080, server = TRUE)

On the client,

sudo apt-get install telnet
telnet localhost 8080
abcdefg
hijklmn
qestst

Go to the server,

readLines(con1, 1)
readLines(con1, 1)
readLines(con1, 1)
close(con1) # return cursor in the client machine

A tutorial on using telnet to send HTTP requests, and a summary of using telnet.

Subsetting

Subset assignment of R Language Definition and Manipulation of functions.

The result of the command x[3:5] <- 13:15 is as if the following had been executed

`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
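
The equivalence can be checked directly. The sketch compares ordinary subset assignment with an explicit call to the `[<-` replacement function.

```r
x <- 1:10

y <- x
y[3:5] <- 13:15                       # ordinary subset assignment

z <- `[<-`(x, 3:5, value = 13:15)     # the call R effectively makes
identical(y, z)                       # TRUE
```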

Avoid Coercing Indices To Doubles

1 or 1L
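
A quick way to see which literals and generators give integers rather than doubles. Indices written this way do not need converting before use. This is a sketch; the performance gain, if any, depends on the code.

```r
class(1)            # "numeric": a double
class(1L)           # "integer"
typeof(1:3)         # "integer": ':' returns integers when possible
typeof(seq_len(3))  # "integer"
typeof(c(1, 2, 3))  # "double"
```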

Careful on NA value

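See the example below. Indexing with a logical vector that contains NA returns a row of NAs, whereas which(), base::subset() and dplyr::filter() drop rows where the condition is NA.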

R> mydf = data.frame(a=1:3, b=c(NA,5,6))
R> mydf[mydf$b >5, ]
    a  b
NA NA NA
3   3  6
R> mydf[which(mydf$b >5), ]
  a b
3 3 6
R> mydf %>% dplyr::filter(b > 5)
  a b
1 3 6
R> subset(mydf, b>5)
  a b
3 3 6

Implicit looping

set.seed(1)
i <- sample(c(TRUE, FALSE), size=10, replace = TRUE)
# [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
sum(i)        # [1] 6
x <- 1:10
length(x[i])  # [1] 6
x[i[1:3]]     # [1]  1  3  4  6  7  9 10
length(x[i[1:3]]) # [1] 7
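
The same recycling can be seen without random numbers. A short logical index is recycled to the length of x.

```r
x <- 1:10
x[c(TRUE, FALSE)]          # 1 3 5 7 9: every other element
x[c(TRUE, FALSE, TRUE)]    # 1 3 4 6 7 9 10: same pattern as x[i[1:3]] above
```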

modelling

update()

Extract all variable names in lm(), glm(), ...

all.vars(formula(Model)[-2])
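
A runnable version on mtcars, with a hypothetical model fit. Element 2 of a two-sided formula is the response, so [-2] leaves only the right-hand side. all.vars() keeps variable names and drops function calls such as log().

```r
fit <- lm(mpg ~ cyl + log(disp), data = mtcars)

all.vars(formula(fit))        # "mpg" "cyl" "disp"
all.vars(formula(fit)[-2])    # "cyl" "disp": the response is removed
```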

as.formula(): use a string in formula in lm(), glm(), ...

? as.formula
xnam <- paste("x", 1:25, sep="")
fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+")))
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")

# Method 1. The 'Call' portion of the model is reported as “formula = f” 
# our modeling effort, 
# fully parameterized!
f <- as.formula(
  paste(outcome, 
        paste(variables, collapse = " + "), 
        sep = " ~ "))
print(f)
# mpg ~ cyl + disp + hp + carb

model <- lm(f, data = mtcars)
print(model)

# Call:
#   lm(formula = f, data = mtcars)
# 
# Coefficients:
#   (Intercept)          cyl         disp           hp         carb  
#     34.021595    -1.048523    -0.026906     0.009349    -0.926863  

# Method 2. eval() + bquote() + ".()"
format(terms(model))  #  or model$terms
# [1] "mpg ~ cyl + disp + hp + carb"

# The new line of code
model <- eval(bquote(   lm(.(f), data = mtcars)   ))

print(model)
# Call:
#   lm(formula = mpg ~ cyl + disp + hp + carb, data = mtcars)
# 
# Coefficients:
#   (Intercept)          cyl         disp           hp         carb  
#     34.021595    -1.048523    -0.026906     0.009349    -0.926863  

# Note: if we skip the ".()" operator, the Call again shows 'formula = f'
> eval(bquote(   lm(f, data = mtcars)   ))

Call:
lm(formula = f, data = mtcars)

Coefficients:
(Intercept)          cyl         disp           hp         carb  
  34.021595    -1.048523    -0.026906     0.009349    -0.926863 

reformulate

Simplifying Model Formulas with the R Function ‘reformulate()’
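
reformulate() builds the same formula as the paste() + as.formula() approach above, without any string manipulation. A minimal sketch on mtcars:

```r
f <- reformulate(c("cyl", "disp"), response = "mpg")
f                              # mpg ~ cyl + disp
fit <- lm(f, data = mtcars)
```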

I() function

I() means "as is": it inhibits the interpretation of formula operators, so y ~ I(x^3) regresses y on the cube of x. See What does the capital letter "I" in R linear regression formula mean?, In R formulas, why do I have to use the I() function on power terms, like y ~ I(x^3)
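
The difference shows up in the fitted coefficients. In a formula, ^ means crossing, so wt^2 collapses to wt. Wrapped in I(), it is the arithmetic square. A sketch on mtcars:

```r
fit1 <- lm(mpg ~ wt + I(wt^2), data = mtcars)  # wt and its square
fit2 <- lm(mpg ~ wt^2, data = mtcars)          # '^' is formula crossing: same as mpg ~ wt
names(coef(fit1))   # "(Intercept)" "wt" "I(wt^2)"
names(coef(fit2))   # "(Intercept)" "wt"
```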

Aggregating results from linear model

https://stats.stackexchange.com/a/6862

Replacement function "fun(x) <- a"

What are Replacement Functions in R?

R> xx <- c(1,3,66, 99)
R> "cutoff<-" <- function(x, value){
     x[x > value] <- Inf
     x
 }
R> cutoff(xx) <- 65 # xx & 65 are both input
R> xx
[1]   1   3 Inf Inf

R> "cutoff<-"(x = xx, value = 65)
[1]   1   3 Inf Inf

R reads the statement fun(x) <- a as x <- "fun<-"(x, value = a).

S3 and S4 methods and signature

Debug an S4 function

  • showMethods('FUNCTION')
  • getMethod('FUNCTION', 'SIGNATURE')
  • debug(, signature)
> args(debug)
function (fun, text = "", condition = NULL, signature = NULL) 

> library(genefilter) # Bioconductor
> showMethods("nsFilter")
Function: nsFilter (package genefilter)
eset="ExpressionSet"
> debug(nsFilter, signature="ExpressionSet")

library(DESeq2)
showMethods("normalizationFactors") # show the object class
                                    # "DESeqDataSet" in this case.
getMethod(`normalizationFactors`, "DESeqDataSet") # get the source code

See the source code of normalizationFactors<- (setReplaceMethod() is used) and the source code of estimateSizeFactors(). We can see how avgTxLength was used in estimateNormFactors().

Another example

library(GSVA)
args(gsva) # function (expr, gset.idx.list, ...)

showMethods("gsva")
# Function: gsva (package GSVA)
# expr="ExpressionSet", gset.idx.list="GeneSetCollection"
# expr="ExpressionSet", gset.idx.list="list"
# expr="matrix", gset.idx.list="GeneSetCollection"
# expr="matrix", gset.idx.list="list"
# expr="SummarizedExperiment", gset.idx.list="GeneSetCollection"
# expr="SummarizedExperiment", gset.idx.list="list"

debug(gsva, signature = c(expr="matrix", gset.idx.list="list"))
# OR
# debug(gsva, signature = c("matrix", "list"))
gsva(y, geneSets, method="ssgsea", kcdf="Gaussian")
Browse[3]> debug(.gsva)
# return(ssgsea(expr, gset.idx.list, alpha = tau, parallel.sz = parallel.sz, 
#      normalization = ssgsea.norm, verbose = verbose, 
#      BPPARAM = BPPARAM))

isdebugged("gsva")
# [1] TRUE
undebug(gsva)
library(IRanges)
ir <- IRanges(start=c(10, 20, 30), width=5)
ir

class(ir)
## [1] "IRanges"
## attr(,"package")
## [1] "IRanges"

getClassDef(class(ir))
## Class "IRanges" [package "IRanges"]
## 
## Slots:
##                                                                       
## Name:            start           width           NAMES     elementType
## Class:         integer         integer characterORNULL       character
##                                       
## Name:  elementMetadata        metadata
## Class: DataTableORNULL            list
## 
## Extends: 
## Class "Ranges", directly
## Class "IntegerList", by class "Ranges", distance 2
## Class "RangesORmissing", by class "Ranges", distance 2
## Class "AtomicList", by class "Ranges", distance 3
## Class "List", by class "Ranges", distance 4
## Class "Vector", by class "Ranges", distance 5
## Class "Annotated", by class "Ranges", distance 6
## 
## Known Subclasses: "NormalIRanges"

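Check if a function is an S4 generic or method

isS4(foo)        # TRUE for S4 objects, including S4 generics and methods
isGeneric("foo") # TRUE if foo is an S4 generic function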

How to access the slots of an S4 object

  • @ will let you access the slots of an S4 object.
  • Note that often the best way to do this is not to access the slot directly but through an accessor function (e.g. coef() rather than digging out the coefficients with $ or @). When no accessor exists you have to access the slots directly, but then your code may break if the internal implementation changes.
  • R - S4 Classes and Methods Hansen. getClass() or getClassDef().
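A small, self-contained sketch of slot access. The class "Person" is hypothetical and exists only for illustration.

```r
library(methods)  # attached by default in R; explicit for Rscript

# Hypothetical class used only for illustration
setClass("Person", slots = c(name = "character", age = "numeric"))
p <- new("Person", name = "Ann", age = 30)

isS4(p)             # TRUE
p@name              # "Ann"  (direct slot access)
slot(p, "age")      # 30     (functional form of p@age)
slotNames("Person") # "name" "age"
getClass("Person")  # print the class definition

p@age <- 31         # direct assignment; the validity method is not re-run
validObject(p)      # run the validity checks explicitly
```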

setReplaceMethod()
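A sketch of an S4 replacement method, assuming a hypothetical "Person" class with hypothetical age() and `age<-` generics. setReplaceMethod("age", ...) is shorthand for setMethod("age<-", ...), and calling validObject() inside the method enforces the class validity check, which a bare p@age <- value would skip.

```r
library(methods)

# Hypothetical class, generics and methods for illustration
setClass("Person", slots = c(name = "character", age = "numeric"),
         validity = function(object) {
           if (any(object@age < 0)) "age must be non-negative" else TRUE
         })
setGeneric("age", function(x) standardGeneric("age"))
setGeneric("age<-", function(x, value) standardGeneric("age<-"))

setMethod("age", "Person", function(x) x@age)
setReplaceMethod("age", "Person", function(x, value) {
  x@age <- value
  validObject(x)   # re-check validity after the change
  x
})

p <- new("Person", name = "Ann", age = 30)
age(p) <- 31       # R evaluates: p <- `age<-`(p, value = 31)
age(p)             # 31
# age(p) <- -1     # would fail: "age must be non-negative"
```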

See what methods work on an object

To see what methods work on an object, e.g. a GRanges object:

methods(class="GRanges")

Or if you have an object, x:

methods(class=class(x))

View S3 function definition: double colon '::' and triple colon ':::' operators and getAnywhere()

?":::"

  • pkg::name returns the value of the exported variable name in namespace pkg
  • pkg:::name returns the value of the internal variable name
base::"+"
stats:::coef.default

predict.ppr
# Error: object 'predict.ppr' not found
stats::predict.ppr
# Error: 'predict.ppr' is not an exported object from 'namespace:stats'
stats:::predict.ppr  # OR  
getS3method("predict", "ppr")

getS3method("t", "test")

methods() + getAnywhere() functions
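A short sketch combining the two functions on the unexported stats method used above: methods() lists the S3 methods of a generic, and getAnywhere() finds an object even when it is not exported.

```r
# predict.ppr is an unexported S3 method in the stats namespace
methods("predict")            # unexported methods are marked with an asterisk
ga <- getAnywhere("predict.ppr")
ga$where                      # where copies of the object were found
f <- ga$objs[[1]]             # the function definition itself
identical(body(f), body(getS3method("predict", "ppr")))
```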

Read the source code (include Fortran/C, S3 and S4 methods)

S3 method is overwritten

For example, dplyr's select() is masked by grpreg's select() when grpreg is loaded after dplyr.

An easy solution is to load grpreg before loading dplyr, or to call dplyr::select() explicitly.
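A self-contained illustration of masking. Instead of installing dplyr and grpreg, it uses a hypothetical user-defined mean(): whichever definition is found first wins, while a namespace-qualified call (pkg::fun) always reaches the intended function.

```r
mean <- function(x, ...) "masked!"   # hypothetical definition that masks base::mean
mean(1:3)          # "masked!": this definition is found first
base::mean(1:3)    # 2: the :: call goes straight to the base namespace
rm(mean)           # remove the masking definition
mean(1:3)          # 2
```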

mcols() and DataFrame() from Bioc S4Vectors package

  • mcols: Get or set the metadata columns.
  • colData: get or set the sample-level column data of a SummarizedExperiment (SummarizedExperiment package; formerly part of GenomicRanges)
  • DataFrame: The DataFrame class extends the DataTable virtual class and supports the storage of any type of object (with length and [ methods) as columns.

For example, in the "Shrinkage of logarithmic fold changes" vignette of the DESeq2paper package:

> mcols(ddsNoPrior[genes, ])
DataFrame with 2 rows and 21 columns
   baseMean   baseVar   allZero dispGeneEst    dispFit dispersion  dispIter dispOutlier   dispMAP
  <numeric> <numeric> <logical>   <numeric>  <numeric>  <numeric> <numeric>   <logical> <numeric>
1  163.5750  8904.607     FALSE  0.06263141 0.03862798  0.0577712         7       FALSE 0.0577712
2  175.3883 59643.515     FALSE  2.25306109 0.03807917  2.2530611        12        TRUE 1.6011440
  Intercept strain_DBA.2J_vs_C57BL.6J SE_Intercept SE_strain_DBA.2J_vs_C57BL.6J WaldStatistic_Intercept
  <numeric>                 <numeric>    <numeric>                    <numeric>               <numeric>
1  6.210188                  1.735829    0.1229354                    0.1636645               50.515872
2  6.234880                  1.823173    0.6870629                    0.9481865                9.074686
  WaldStatistic_strain_DBA.2J_vs_C57BL.6J WaldPvalue_Intercept WaldPvalue_strain_DBA.2J_vs_C57BL.6J
                                <numeric>            <numeric>                            <numeric>
1                                10.60602         0.000000e+00                         2.793908e-26
2                                 1.92280         1.140054e-19                         5.450522e-02
   betaConv  betaIter  deviance  maxCooks
  <logical> <numeric> <numeric> <numeric>
1      TRUE         3  210.4045 0.2648753
2      TRUE         9  243.7455 0.3248949

Pipe

Packages take advantage of pipes

  • rstatix: Pipe-Friendly Framework for Basic Statistical Tests
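A short sketch of the native pipe on the built-in mtcars data. The |> operator needs R >= 4.1.0, and the `_` placeholder needs R >= 4.2.0 (it must be passed to a named argument).

```r
n4 <- mtcars |> subset(cyl == 4) |> nrow()
n4   # 11 cars with 4 cylinders

# `_` stands for the left-hand side and must be given to a named argument
fit_coef <- mtcars |> lm(mpg ~ wt, data = _) |> coef()
fit_coef
```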

findInterval()

Related functions are cut() and split().
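A small comparison of findInterval() and cut() on the same (made-up) breaks. findInterval() returns the index of the interval, while cut() returns a factor and gives NA outside the breaks.

```r
breaks <- c(0, 1, 2, 3)
x <- c(-1, 0, 1.5, 2.5, 3.2)

# Index i such that breaks[i] <= x < breaks[i + 1]; 0 below the first break,
# length(breaks) at or above the last break
findInterval(x, breaks)
# [1] 0 1 2 3 4

# cut() with right = FALSE uses the same [a, b) intervals
cut(x, breaks, right = FALSE)
# [1] <NA>  [0,1) [1,2) [2,3) <NA>
```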

Assign operator

  • Early versions of R used the underscore (_) as an assignment operator.
  • Assignments with the = Operator
  • In R 1.8.0 (2003), the underscore assignment operator was removed. See NEWS.
  • Since R 1.9.0 (2004), "_" is allowed in syntactically valid names. See NEWS.
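A short sketch of the main practical difference between = and <- inside a function call, using the base median() function:

```r
# '=' inside a call only names the argument; no variable x is created
median(x = 1:10)        # 5.5

# '<-' inside a call assigns in the calling environment, then passes the value
median(y <- 1:10)       # 5.5
y                       # 1 2 3 4 5 6 7 8 9 10

# At the top level, '=' and '<-' both assign
z = 3
```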

Operator precedence

The ':' operator has higher precedence than binary '-', so 0:N-1 evaluates to (0:N)-1, not 0:(N-1) as you probably intended.
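A quick check of the precedence rules with N = 3:

```r
N <- 3
0:N-1            # -1 0 1 2, because it is (0:N) - 1
0:(N-1)          # 0 1 2
-1:2             # -1 0 1 2: unary minus binds tighter than ':'
seq_len(N) - 1   # 0 1 2, an explicit alternative
```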

order(), rank() and sort()

If we want to find the indices of the first 25 genes with the smallest p-values, we can use order(pval)[1:25].

> x = sample(10)
> x
 [1]  4  3 10  7  5  8  6  1  9  2
> order(x)
 [1]  8 10  2  1  5  7  4  6  9  3
> rank(x)
 [1]  4  3 10  7  5  8  6  1  9  2
> rank(10*x)
 [1]  4  3 10  7  5  8  6  1  9  2

> x[order(x)]
 [1]  1  2  3  4  5  6  7  8  9 10
> sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10

relate order() and rank()

  • Order to rank: rank(x) equals order(order(x)) when x has no ties
    set.seed(1)
    x <- rnorm(5)
    order(x)
    # [1] 3 1 2 5 4
    rank(x)
    # [1] 2 3 1 5 4
    order(order(x))
    # [1] 2 3 1 5 4
    all(rank(x) == order(order(x)))
    # TRUE
  • Order to rank, method 2: invert the permutation returned by order()
    ord <- order(x)
    ranks <- integer(length(x))
    ranks[ord] <- seq_along(x)
    ranks
    # [1] 2 3 1 5 4
  • Rank to order: order(rank(x)) equals order(x) when there are no ties
    ranks <- rank(x)
    ord <- order(ranks)
    ord
    # [1] 3 1 2 5 4
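The identities above assume no ties. A minimal sketch with a tied vector shows where they differ: rank() averages ties by default, while order(order(x)) breaks ties by position, which matches rank(x, ties.method = "first").

```r
x <- c(10, 20, 10)
rank(x)                          # 1.5 3.0 1.5  (ties averaged by default)
order(order(x))                  # 1 3 2        (ties broken by position)
rank(x, ties.method = "first")   # 1 3 2
```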

OS-dependent results on sorting string vector

Gene symbol case.

# mac: 
order(c("DC-UbP", "DC2")) # c(1,2)

# linux: 
order(c("DC-UbP", "DC2")) # c(2,1)

Affymetrix ID case.

# mac:
order(c("202800_at", "2028_s_at")) # [1] 2 1
sort(c("202800_at", "2028_s_at")) # [1] "2028_s_at" "202800_at"

# linux
order(c("202800_at", "2028_s_at")) # [1] 1 2
sort(c("202800_at", "2028_s_at")) # [1] "202800_at" "2028_s_at"

It does not matter if we include factor() on the character vector.

The difference comes from the collation order of the current locale (LC_COLLATE); see ?Comparison. Locale-independent alternatives:

# both mac and linux
stringr::str_order(c("202800_at", "2028_s_at")) # [1] 2 1
stringr::str_order(c("DC-UbP", "DC2")) # [1] 1 2

# Or setting the locale to "C"
Sys.setlocale("LC_ALL", "C"); sort(c("DC-UbP", "DC2"))
# Or
Sys.setlocale("LC_COLLATE", "C"); sort(c("DC-UbP", "DC2"))
# But not with a language locale such as en_US.UTF-8
Sys.setlocale("LC_ALL", "en_US.UTF-8"); sort(c("DC-UbP", "DC2"))
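Another base R option that avoids changing the session locale: sort() and order() with method = "radix" compare strings in the C locale (byte order), independent of LC_COLLATE. A sketch with the IDs above:

```r
ids <- c("DC-UbP", "DC2", "202800_at", "2028_s_at")
# radix sorting always uses C-locale collation for character vectors
sort(ids, method = "radix")
# [1] "202800_at" "2028_s_at" "DC-UbP"    "DC2"
order(c("DC-UbP", "DC2"), method = "radix")           # 1 2
order(c("202800_at", "2028_s_at"), method = "radix")  # 1 2
```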

unique()

unique() does not sort; it returns the unique elements in the order of their first appearance. See ?unique.

# mac & linux
R> unique(c("DC-UbP", "DC2"))
[1] "DC-UbP" "DC2"

do.call

do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it.

The do.call() function in R: Unlocking Efficiency and Flexibility

Below are some examples from the help.

  • Usage
do.call(what, args, quote = FALSE, envir = parent.frame())
# what: either a function or a non-empty character string naming the function to be called.
# args: a list of arguments to the function call. The names attribute of args gives the argument names.
# quote: a logical value indicating whether to quote the arguments.
# envir: an environment within which to evaluate the call. This will be most useful
#        if what is a character string and the arguments are symbols or quoted expressions.
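  • do.call() is not the same as lapply(): do.call() makes a single call with the list elements as the arguments, while lapply() calls the function once for each element. complex() is already vectorized, so do.call("complex", list(imag = 1:3)) is simply complex(imag = 1:3); lapply() instead calls complex(1:3), where 1:3 is matched to the first argument, length.out.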
> do.call("complex", list(imag = 1:3))
[1] 0+1i 0+2i 0+3i
> lapply(list(imag = 1:3), complex)
$imag
[1] 0+0i
> complex(imag=1:3)
[1] 0+1i 0+2i 0+3i
> do.call(function(x) x+1, list(1:3))
[1] 2 3 4
  • Applying do.call with Multiple Arguments
> do.call("sum", list(c(1,2,3,NA), na.rm = TRUE))
[1] 6
> do.call("sum", list(c(1,2,3,NA) ))
[1] NA
> tmp <- expand.grid(letters[1:2], 1:3, c("+", "-"))
> length(tmp)
[1] 3
> tmp[1:4,]
  Var1 Var2 Var3
1    a    1    +
2    b    1    +
3    a    2    +
4    b    2    +
> c(tmp, sep = "")
$Var1
 [1] a b a b a b a b a b a b
Levels: a b

$Var2
 [1] 1 1 2 2 3 3 1 1 2 2 3 3

$Var3
 [1] + + + + + + - - - - - -
Levels: + -

$sep
[1] ""
> do.call("paste", c(tmp, sep = ""))
 [1] "a1+" "b1+" "a2+" "b2+" "a3+" "b3+" "a1-" "b1-" "a2-" "b2-" "a3-"
[12] "b3-"
  • environment and quote arguments.
> A <- 2
> f <- function(x) print(x^2)
> env <- new.env()
> assign("A", 10, envir = env)
> assign("f", f, envir = env)
> f <- function(x) print(x)
> f(A)   
[1] 2
> do.call("f", list(A))
[1] 2
> do.call("f", list(A), envir = env)  
[1] 4
> do.call(f, list(A), envir = env)   
[1] 2    # f here is the global function object, so envir is not used to look it up; A was already evaluated to 2

> eval(call("f", A))                      
[1] 2
> eval(call("f", quote(A)))               
[1] 2
> eval(call("f", A), envir = env)         
[1] 4
> eval(call("f", quote(A)), envir = env)  
[1] 100
> foo <- function(a=1, b=2, ...) { 
         list(arg=do.call(c, as.list(match.call())[-1])) 
  }
> foo()
$arg
NULL
> foo(a=1)
$arg
a 
1 
> foo(a=1, b=2, c=3)
$arg
a b c 
1 2 3 
  • do.call() + switch(). See an example from Seurat::NormalizeData.
do.call(
   what = switch(
     EXPR = margin,
     '1' = 'rbind',
     '2' = 'cbind',
     stop("'margin' must be 1 or 2")
   ),
   args = normalized.data
)
switch('a', 'a' = rnorm(3), 'b'=rnorm(4)) # switch returns a value
do.call(switch('a', 'a' = 'rnorm', 'b'='rexp'), args=list(n=4)) # switch returns a function name
  • The function we want to call is a string that may change: glmnet
# Suppose we want to call cv.glmnet or cv.coxnet or cv.lognet or cv.elnet .... depending on the case
fun = paste("cv", subclass, sep = ".")
cvstuff = do.call(fun, list(predmat,y,type.measure,weights,foldid,grouped))

expand.grid, mapply, vapply

A faster way to generate combinations for mapply and vapply

do.call vs mapply

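  • For a vectorized function such as paste(), do.call() and mapply() can give the same result, but they work differently: do.call() makes a single call with the list elements as arguments, whereas mapply() calls the function once for each set of corresponding elements. base::Map() is mapply() without simplification of the result.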
> mapply(paste, tmp[1], tmp[2], tmp[3], sep = "")
      Var1 
 [1,] "a1+"
 [2,] "b1+"
 [3,] "a2+"
 [4,] "b2+"
 [5,] "a3+"
 [6,] "b3+"
 [7,] "a1-"
 [8,] "b1-"
 [9,] "a2-"
[10,] "b2-"
[11,] "a3-"
[12,] "b3-"
# It does not work if we pass the whole data frame as one argument: mapply() then iterates over the columns
> mapply(paste, tmp, sep = "")
      Var1 Var2 Var3
 [1,] "a"  "1"  "+" 
 [2,] "b"  "1"  "+" 
 [3,] "a"  "2"  "+" 
 [4,] "b"  "2"  "+" 
 [5,] "a"  "3"  "+" 
 [6,] "b"  "3"  "+" 
 [7,] "a"  "1"  "-" 
 [8,] "b"  "1"  "-" 
 [9,] "a"  "2"  "-" 
[10,] "b"  "2"  "-" 
[11,] "a"  "3"  "-" 
[12,] "b"  "3"  "-" 
set.seed(1)
mapply(rweibull, 1, c(1, 10), MoreArgs=list(n=1))
# [1] 1.326108 9.885284
set.seed(1)
x <- replicate(1000, mapply(rweibull, 1, c(1, 10), MoreArgs=list(n=1)))
dim(x) # [1]  2 1000
rowMeans(x)
# [1]  1.032209 10.104131
set.seed(1); Vectorize(rweibull)(n=1, shape=1, scale=c(1, 10))
# [1] 1.326108 9.885284
set.seed(1); x <- replicate(1000, Vectorize(rweibull)(n=1, shape=1, scale=c(1, 10)))

do.call vs lapply

What's the difference between lapply and do.call? A common idiom combines both, do.call(..., lapply()): lapply() builds a list of results and do.call() passes all of them to a single call such as rbind().

  • lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
  • do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it. It is widely used, for example, to assemble lists into simpler structures (often with rbind or cbind).
  • Map applies a function to the corresponding elements of given vectors... Map is a simple wrapper to mapply which does not attempt to simplify the result, similar to Common Lisp's mapcar (with arguments being recycled, however). Future versions may allow some control of the result type.
> lapply(iris, class) # same as Map(class, iris)
$Sepal.Length
[1] "numeric"

$Sepal.Width
[1] "numeric"

$Petal.Length
[1] "numeric"

$Petal.Width
[1] "numeric"

$Species
[1] "factor"

> x <- lapply(iris, class)
> do.call(c, x)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

https://stackoverflow.com/a/10801902

  • lapply applies a function over a list. So there will be several function calls.
  • do.call calls a function with a list of arguments (... argument) such as c() or rbind()/cbind() or sum or order or "[" or paste. So there is only one function call.
> X <- list(1:3,4:6,7:9)
> lapply(X,mean)
[[1]]
[1] 2

[[2]]
[1] 5

[[3]]
[1] 8
> do.call(sum, X)
[1] 45
> sum(c(1,2,3), c(4,5,6), c(7,8,9))
[1] 45
> do.call(mean, X) # Error: this calls mean(1:3, 4:6, 7:9), so 4:6 is passed as trim
> do.call(rbind,X)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> lapply(X,rbind)
[[1]]
     [,1] [,2] [,3]
[1,]    1    2    3

[[2]]
     [,1] [,2] [,3]
[1,]    4    5    6

[[3]]
     [,1] [,2] [,3]
[1,]    7    8    9
> mapply(mean, X, trim=c(0,0.5,0.1))
[1] 2 5 8
> mapply(mean, X) 
[1] 2 5 8

Below is a good example showing the difference between lapply() and do.call(): Generating Random Strings.

> set.seed(1)
> x <- replicate(2, sample(LETTERS, 4), FALSE)
> x
[[1]]
[1] "Y" "D" "G" "A"

[[2]]
[1] "B" "W" "K" "N"

> lapply(x, paste0)
[[1]]
[1] "Y" "D" "G" "A"

[[2]]
[1] "B" "W" "K" "N"

> lapply(x, paste0, collapse= "")
[[1]]
[1] "YDGA"

[[2]]
[1] "BWKN"

> do.call(paste0, x)
[1] "YB" "DW" "GK" "AN"

do.call + rbind + lapply

Lots of examples. See for example this one, which turns a vector into a two-column table (the result is a character matrix; use as.data.frame() to get a data frame).

x <- readLines(textConnection("---CLUSTER 1 ---
 3
 4
 5
 6
 ---CLUSTER 2 ---
 9
 10
 8
 11"))

 # create a list of where the 'clusters' are
 clust <- c(grep("CLUSTER", x), length(x) + 1L)

 # get size of each cluster
 clustSize <- diff(clust) - 1L

 # get cluster number
 clustNum <- gsub("[^0-9]+", "", x[grep("CLUSTER", x)])

 result <- do.call(rbind, lapply(seq(length(clustNum)), function(.cl){
     cbind(Object = x[seq(clust[.cl] + 1L, length = clustSize[.cl])]
         , Cluster = .cl
         )
     }))

 result

     Object Cluster
[1,] "3"    "1"
[2,] "4"    "1"
[3,] "5"    "1"
[4,] "6"    "1"
[5,] "9"    "2"
[6,] "10"   "2"
[7,] "8"    "2"
[8,] "11"   "2"

A 2nd example is to sort a data frame by using do.call(order, list()).
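A minimal sketch of the do.call(order, ...) idea on mtcars: a data frame is a list of columns, so do.call() turns it into order(cyl, mpg) without typing each column.

```r
# Sort mtcars by cyl, then by mpg within cyl
keys <- mtcars[c("cyl", "mpg")]          # a data frame is a list of columns
ord <- do.call(order, keys)              # same as order(mtcars$cyl, mtcars$mpg)
sorted <- mtcars[ord, c("cyl", "mpg")]
head(sorted, 3)
```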

Another example is to reproduce aggregate(): aggregate() = do.call() + by().

do.call(rbind, by(mtcars, list(mtcars$cyl, mtcars$vs), colMeans))
# the above approach gives the same result as the following,
# except that it does not have the extra Group.1 and Group.2 columns
aggregate(mtcars, list(mtcars$cyl, mtcars$vs), FUN = mean)

Run examples

When we call help(FUN), the documentation is shown in the browser. If we run the examples from there, the browser shows:

example(FUN, package = "XXX") was run in the console
To view output in the browser, the knitr package must be installed

How to get examples from help file, example()

Code examples in the R package manuals:

# How to run all examples from a man page
example(within)

# How to check your examples?
devtools::run_examples() 
testthat::test_examples()

See this post. Method 1:

example(acf, give.lines=TRUE)

Method 2:

Rd <- utils:::.getHelpFile(?acf)
tools::Rd2ex(Rd)

"[" and "[[" with the sapply() function

Suppose we want to extract the part of an ID such as "ABC-123-XYZ" before the first hyphen.

sapply(strsplit("ABC-123-XYZ", "-"), "[", 1)

is the same as

sapply(strsplit("ABC-123-XYZ", "-"), function(x) x[1])
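The same idea works on a vector of IDs (the IDs below are made up). "[" and "[[" are ordinary functions, so extra arguments after them are passed on as the index; vapply() adds a type check, and sub() is a regex alternative.

```r
ids <- c("ABC-123-XYZ", "DEF-456-UVW")
parts <- strsplit(ids, "-")              # a list of character vectors
sapply(parts, "[", 1)                    # "ABC" "DEF"
sapply(parts, `[[`, 2)                   # "123" "456"
vapply(parts, "[", character(1), 3)      # type-safe: "XYZ" "UVW"
sub("-.*", "", ids)                      # regex alternative: "ABC" "DEF"
```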

Dealing with dates

  • Simple examples
    dates <- c("January 15, 2023", "December 31, 1999")
    date_objects <- as.Date(dates, format = "%B %d, %Y") # format is for the input
    # [1] "2023-01-15" "1999-12-31"
  • Find difference
    # Convert the dates to Date objects
    date1 <- as.Date("6/29/21", format="%m/%d/%y")
    date2 <- as.Date("11/9/21", format="%m/%d/%y")
    
    # Calculate the difference in days
    diff_days <- as.numeric(difftime(date2, date1, units="days")) # 133
    # In months
    diff_days / (365.25/12)  # 4.36961   
    
    # OR using the lubridate package
    library(lubridate)
    # Convert the dates to Date objects
    date1 <- mdy("6/29/21")
    date2 <- mdy("11/9/21")
    interval(date1, date2) %/% months(1)
  • http://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
    d1 = date()
    class(d1) # "character"
    d2 = Sys.Date()
    class(d2) # "Date"
    
    format(d2, "%a %b %d")
    
    library(lubridate); ymd("20140108") # "2014-01-08" (a Date; old versions returned POSIXct in UTC)
    mdy("08/04/2013") # "2013-08-04"
    dmy("03-04-2013") # "2013-04-03"
    ymd_hms("2011-08-03 10:15:03") # "2011-08-03 10:15:03 UTC" (a POSIXct)
    ymd_hms("2011-08-03 10:15:03", tz="Pacific/Auckland") 
    # "2011-08-03 10:15:03 NZST"
    ?Sys.timezone
    x = dmy(c("1jan2013", "2jan2013", "31mar2013", "30jul2013"))
    wday(x[1]) # 3 (2013-01-01 was a Tuesday; Sunday = 1 by default)
    wday(x[1], label=TRUE) # Tue
  • http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
  • http://rpubs.com/seandavi/GEOMetadbSurvey2014
  • We want our dates and times as class "Date" or the class "POSIXct", "POSIXlt". For more information type ?POSIXlt.
  • anytime package
  • Weeks to Christmas: difftime(as.Date("2019-12-25"), Sys.Date(), units = "weeks")
  • A Comprehensive Introduction to Handling Date & Time in R 2020
  • Working with Dates and Times Pt 1
    • Three major functions: as.Date(), as.POSIXct(), and as.POSIXlt().
    • POSIXct is a class in R that represents date-time data. The ct stands for “calendar time”; it stores the (signed) number of seconds since the beginning of 1970 (UTC) as a numeric (double) vector.
    • POSIXlt is a class in R that represents date-time data. It stands for “local time” and is a list with components as integer vectors, which can represent a vector of broken-down times. It stores date time as list:sec, min, hour, mday, mon, year, wday, yday, isdst, zone, gmtoff.
  • R lubridate: How To Efficiently Work With Dates and Times in R 2023
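A short sketch contrasting the two classes on one (arbitrary) time stamp: POSIXct is a single double, while POSIXlt is a list of components with 0-based months and years counted from 1900.

```r
ct <- as.POSIXct("2024-01-15 10:30:00", tz = "UTC")
typeof(unclass(ct))     # "double": seconds since 1970-01-01 00:00:00 UTC
as.numeric(ct)          # 1705314600

lt <- as.POSIXlt(ct)
lt$hour; lt$mday        # 10 15
lt$year + 1900          # 2024 (years are counted from 1900)
lt$mon + 1              # 1    (months are 0-based)
as.Date(ct)             # "2024-01-15"
```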

Nonstandard/non-standard evaluation, deparse/substitute and scoping

f <- function(x) {
  substitute(x)
}
f(1:10)
# 1:10
class(f(1:10)) # or mode()
# [1] "call"
g <- function(x) deparse(substitute(x))
g(1:10)
# [1] "1:10"
class(g(1:10)) # or mode()
# [1] "character"
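  • quote(expr) returns its argument unevaluated, like substitute(), but it never replaces variables with the expressions bound to them. (Not to be confused with noquote(), which prints character strings without quotes.)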
mode(quote(1:10))
# [1] "call"
  • eval(expr, envir), evalq(expr, envir) - eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this.
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset1 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x)
  x[r, ]
}
x <- 4
condition <- 4
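subset1(sample_df, a == 4) # same as subset(sample_df, a == 4)
subset1(sample_df, a == x) # WRONG: inside subset1(), x is the data frame argument, not the global x
subset1(sample_df, a == condition) # ERROR: condition is subset1()'s own unevaluated argument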

subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x, parent.frame())
  x[r, ]
}
subset2(sample_df, a == 4) # same as subset(sample_df, a == 4)
subset2(sample_df, a == x) # OK: x is looked up in the caller's environment (x = 4)
subset2(sample_df, a == condition) # OK: condition = 4 from the caller's environment
  • deparse(expr) - turns unevaluated expressions into character strings. For example,
> deparse(args(lm))
[1] "function (formula, data, subset, weights, na.action, method = \"qr\", " 
[2] "    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, "
[3] "    contrasts = NULL, offset, ...) "                                    
[4] "NULL"     

> deparse(args(lm), width=20)
[1] "function (formula, data, "        "    subset, weights, "           
[3] "    na.action, method = \"qr\", " "    model = TRUE, x = FALSE, "   
[5] "    y = FALSE, qr = TRUE, "       "    singular.ok = TRUE, "        
[7] "    contrasts = NULL, "           "    offset, ...) "               
[9] "NULL"
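A small sketch of the difference between quote() and substitute() inside a function: quote() always returns the literal symbol, while substitute() returns the expression the caller supplied (a and b need not exist, since nothing is evaluated).

```r
f <- function(x) {
  list(quoted      = quote(x),        # always the symbol x
       substituted = substitute(x))   # the expression passed as x
}
res <- f(a + b)
res$quoted                 # x
res$substituted            # a + b
deparse(res$substituted)   # "a + b"
```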

Following is another example. Assume we have several functions (f1, f2, ...; each implements a different algorithm) that take the same input arguments (e.g. a1, a2). We would like to run these functions on the same data to compare their performance.

f1 <- function(x) x+1; f2 <- function(x) x+2; f3 <- function(x) x+3

f1(1:3)
f2(1:3)
f3(1:3)

# Or
myfun <- function(f, a) {
    eval(parse(text = f))(a)
}
myfun("f1", 1:3)
myfun("f2", 1:3)
myfun("f3", 1:3)

# Or with lapply
method <- c("f1", "f2", "f3")
res <- lapply(method, function(M) {
                    Mres <- eval(parse(text = M))(1:3)
                    return(Mres)
})
names(res) <- method
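The function names can also be resolved without parse()/eval(). A minimal sketch with get(mode = "function"), which looks the name up like an ordinary variable but only accepts functions; match.fun() does the same for a single top-level call:

```r
f1 <- function(x) x + 1; f2 <- function(x) x + 2; f3 <- function(x) x + 3
method <- c("f1", "f2", "f3")

# setNames() keeps the method names on the result list
res <- lapply(setNames(method, method), function(M) get(M, mode = "function")(1:3))
res$f2               # 3 4 5
match.fun("f3")(1:3) # 4 5 6
```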

library() accept both quoted and unquoted strings

How can library() accept both quoted and unquoted package names? The key lines in its source code are

  if (!character.only) 
     package <- as.character(substitute(package))
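A hypothetical helper, pkg_name(), that copies just these lines shows the effect: an unquoted symbol becomes its name, a string passes through unchanged, and character.only = TRUE is needed when the name is stored in a variable.

```r
# Hypothetical helper that mimics library()'s handling of its first argument
pkg_name <- function(package, character.only = FALSE) {
  if (!character.only)
    package <- as.character(substitute(package))
  package
}
pkg_name(stats)        # "stats": the unevaluated symbol is turned into a string
pkg_name("stats")      # "stats": substitute() returns the string unchanged
p <- "stats"
pkg_name(p)            # "p": the variable name, not its value
pkg_name(p, character.only = TRUE)  # "stats"
```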

Lexical scoping

The ‘…’ argument

Functions

Function argument

Argument matching from R Language Definition manual.

Argument matching is augmented by the functions match.arg(), match.call() and match.fun().

Access to the partial matching algorithm used by R is via pmatch.

Check function arguments

Checking the inputs of your R functions: match.arg() , stopifnot()

stopifnot(): function argument sanity check

  • stopifnot() is a quick way to check several conditions on the input: the code stops as soon as any condition is not all TRUE. However, the default error messages are not very informative.
    stopifnot(condition1, condition2, ...)
    
  • Mining R 4.0.0 Changelog for Nuggets of Gold
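A sketch combining the two checks in a hypothetical function center(). match.arg() picks the first choice by default and accepts partial matches; since R 4.0.0, naming the conditions in stopifnot() sets custom error messages.

```r
# Hypothetical function combining match.arg() and stopifnot()
center <- function(x, type = c("mean", "median", "trimmed")) {
  type <- match.arg(type)                 # default: first choice
  stopifnot("x must be numeric"   = is.numeric(x),   # names become the
            "x must not be empty" = length(x) > 0)   # error messages
  switch(type,
         mean    = mean(x),
         median  = median(x),
         trimmed = mean(x, trim = 0.1))
}
center(c(1, 2, 3, 10))           # 4 (mean)
center(c(1, 2, 3, 10), "med")    # 2.5 ("med" partially matches "median")
try(center("a"))                 # Error: x must be numeric
try(center(1:3, "mode"))         # Error: 'arg' should be one of ...
```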

Lazy evaluation in R functions arguments

R function arguments are lazy — they’re only evaluated if they’re actually used.

  • Example 1. By default, R function arguments are lazy.
f <- function(x) {
  999
}
f(stop("This is an error!"))
#> [1] 999
  • Example 2. If you want to ensure that an argument is evaluated you can use force().
add <- function(x) {
  force(x)
  function(y) x + y
}
adders2 <- lapply(1:10, add)
adders2[[1]](10)
#> [1] 11
adders2[[10]](10)
#> [1] 20
  • Example 3. Default arguments are evaluated inside the function.
f <- function(x = ls()) {
  a <- 1
  x
}

# ls() evaluated inside f:
f()
# [1] "a" "x"

# ls() evaluated in global environment:
f(ls())
# [1] "add"    "adders" "f" 
  • Example 4. Short-circuit evaluation in if statements: && and || evaluate their second operand only when needed, so below x > 0 is evaluated only if !is.null(x) is TRUE.
x <- NULL
if (!is.null(x) && x > 0) {

}
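Why force() in Example 2 matters, as a minimal sketch with hypothetical make_adder functions: without force(), the argument stays a promise and is evaluated only at first use, so it sees a value changed after the closure was created.

```r
make_adder <- function(x) function(y) x + y                  # x stays a promise
make_adder_forced <- function(x) { force(x); function(y) x + y }

n <- 1
add_lazy   <- make_adder(n)
add_forced <- make_adder_forced(n)
n <- 100                 # change n before the lazy promise is evaluated

add_lazy(10)    # 110: x is evaluated at first use, when n is already 100
add_forced(10)  # 11:  x was evaluated when the adder was created
```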

Use of functions as arguments

Just Quickly: The unexpected use of functions as arguments

body()

Remove top axis title base plot
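A minimal sketch of body(): it returns a function's body as an unevaluated expression, and `body<-` replaces it.

```r
f <- function(x) x + 1
body(f)                  # x + 1
body(f) <- quote(x * 2)  # replace the body with an unevaluated expression
f(3)                     # 6
```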

Return functions in R

anonymous function

In R, the main difference between a lambda function (also known as an anonymous function) and a regular function is that a lambda function is defined without a name, while a regular function is defined with a name.

  • See Tidyverse page
  • But defining functions to use them only once is kind of overkill. That's why you can use so-called anonymous functions in R. For example, lapply(list(1,2,3), function(x) { x * x })
  • You can use lambda functions with many other functions in R that take a function as an argument. Some examples include sapply, apply, vapply, mapply, Map, Reduce, Filter, and Find. These functions all work in a similar way to lapply by applying a function to elements of a list or vector.
    Reduce(function(x, y) x*y, list(1, 2, 3, 4)) # 24
    
  • purrr anonymous function
  • The new pipe and anonymous function syntax in R 4.1.0
  • Functional programming from Advanced R
  • What are anonymous functions in R.
    > (function(x) x * x)(3)
    [1] 9
    > (\(x) x * x)(3)
    [1] 9

Backtick sign, infix/prefix/postfix operators

The backtick sign ` (not the single quote) refers to functions or variables that have otherwise reserved or illegal names; e.g. '&&', '+', '(', 'for', 'if', etc. See some examples in Advanced R and What do backticks do in R?.

iris %>% `[[`("Species")   # %>% comes from the magrittr package (also re-exported by dplyr)

infix operator.

1 + 2      # infix
`+`(1, 2)  # prefix (function-call) form; R has no "+ 1 2" syntax
# 1 2 +    # postfix: not valid R

Use with functions like sapply, e.g. sapply(1:5, `+`, 3) .
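A short sketch: backticks let an operator be passed as a function, and a user-defined infix operator must be named %...% (the %+% below is a hypothetical example).

```r
sapply(1:5, `+`, 3)       # 4 5 6 7 8: each call is `+`(i, 3)
`+`(1, 2)                 # 3, the prefix form of 1 + 2

# A user-defined infix operator (hypothetical example)
`%+%` <- function(a, b) paste0(a, b)
"foo" %+% "bar"           # "foobar"
```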

Error handling and exceptions, tryCatch(), stop(), warning() and message()

  • http://adv-r.had.co.nz/Exceptions-Debugging.html
  • Catch Me If You Can: Exception Handling in R
  • Temporarily disable warning messages
    # Method1: 
    suppressWarnings(expr)
    
    # Method 2:
    defaultW <- getOption("warn") 
    options(warn = -1) 
    [YOUR CODE] 
    options(warn = defaultW)
    
  • try() allows execution to continue even after an error has occurred. You can suppress the message with try(..., silent = TRUE).
    out <- try({
      a <- 1
      b <- "x"
      a + b
    })
    
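    elements <- list(1:10, c(-1, 10), c(T, F), letters)
    # Wrap each call in try() so that the error from log(letters) does not stop lapply()
    results <- lapply(elements, function(x) try(log(x), silent = TRUE))
    is.error <- function(x) inherits(x, "try-error")
    succeeded <- !sapply(results, is.error)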
    
  • tryCatch(): With tryCatch() you map conditions to handlers (like switch()), named functions that are called with the condition as an input. Note that try() is a simplified version of tryCatch().
    tryCatch(expr, ..., finally)
    
    show_condition <- function(code) {
      tryCatch(code,
        error = function(c) "error",
        warning = function(c) "warning",
        message = function(c) "message"
      )
    }
    show_condition(stop("!"))
    #> [1] "error"
    show_condition(warning("?!"))
    #> [1] "warning"
    show_condition(message("?"))
    #> [1] "message"
    show_condition(10)
    #> [1] 10
    

    Below is another snippet from available.packages() function,

    z <- tryCatch(download.file(....), error = identity)
    if (!inherits(z, "error")) STATEMENTS
    
  • The return class from tryCatch() may not be fixed.
    result <- tryCatch({
      # Code that might generate an error or warning
      log(99)
    }, warning = function(w) {
      # Code to handle warnings
      print(paste("Warning:", w))
    }, error = function(e) {
      # Code to handle errors
      print(paste("Error:", e))
    }, finally = {
      # Code to always run, regardless of whether an error or warning occurred
      print("Finished")
    })   
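    # numeric: the value of log(99). The value of 'finally' is never returned, but if a warning
    # or an error occurs (e.g. log(-1)), the handler's value, here a character string, is returned.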
    
  • Capture message, warnings and errors from a R function
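One way to keep the return type fixed is to make every handler return the same type as the expression. A minimal sketch with a hypothetical safe_log() that returns NA_real_ on a warning or an error, so vapply() can enforce numeric(1):

```r
# Hypothetical wrapper: always return a single number, NA on warning or error
safe_log <- function(x) {
  tryCatch(log(x),
           warning = function(w) NA_real_,
           error   = function(e) NA_real_)
}
res <- vapply(list(99, -1, "a"), safe_log, numeric(1))
res   # 4.59512 NA NA
```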

suppressMessages()

suppressMessages(expression)

List data type

Create an empty list

out <- vector("list", length=3L) # OR out <- list()
for(j in 1:3) out[[j]] <- myfun(j)

outlist <- as.list(seq(nfolds))

Nested list of data frames

An array can only hold data of a single type. read.csv() returns a data frame, which can contain both numerical and character data.

res <- vector("list", 3) 
names(res) <- paste0("m", 1:3)
for (i in seq_along(res)) {
  res[[i]] <- vector("list", 2)  # second-level list with 2 elements
  names(res[[i]]) <- c("fc", "pre")
}

res[["m1"]][["fc"]] <- read.csv("m1_fc.csv")  # hypothetical file name

head(res$m1$fc) # Same as res[["m1"]][["fc"]]

Using $ in R on a List

How to Use Dollar Sign ($) Operator in R

Calling a function given a list of arguments

> args <- list(c(1:10, NA, NA), na.rm = TRUE)
> do.call(mean, args)
[1] 5.5
> mean(c(1:10, NA, NA), na.rm = TRUE)
[1] 5.5

Descend recursively through lists

x[[c(5,3)]] is the same as x[[5]][[3]]. See ?Extract.
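A quick check of recursive indexing, which also works with names (the nested lists below are made up):

```r
x <- list(1, 2, 3, 4, list("a", "b", "c"))
x[[c(5, 3)]]                   # "c", same as x[[5]][[3]]

y <- list(cfg = list(db = list(host = "localhost")))
y[[c("cfg", "db", "host")]]    # "localhost"
```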

Avoid if-else or switch

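The following example from ?plot.stepfun selects one of several objects by indexing a list, list(sfun0, sfun.2, ...)[[i]], instead of using if-else or switch().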

y0 <- c(1,2,4,3)
sfun0  <- stepfun(1:3, y0, f = 0)
sfun.2 <- stepfun(1:3, y0, f = .2)
sfun1  <- stepfun(1:3, y0, right = TRUE)

tt <- seq(0, 3, by = 0.1)
op <- par(mfrow = c(2,2))
plot(sfun0); plot(sfun0, xval = tt, add = TRUE, col.hor = "bisque")
plot(sfun.2);plot(sfun.2, xval = tt, add = TRUE, col = "orange") # all colors
plot(sfun1);lines(sfun1, xval = tt, col.hor = "coral")
##-- This is  revealing :
plot(sfun0, verticals = FALSE,
     main = "stepfun(x, y0, f=f)  for f = 0, .2, 1")

for(i in 1:3)
  lines(list(sfun0, sfun.2, stepfun(1:3, y0, f = 1))[[i]], col = i)
legend(2.5, 1.9, paste("f =", c(0, 0.2, 1)), col = 1:3, lty = 1, y.intersp = 1)

par(op)

(Figure: output of the stepfun example above)

Open a new Window device

X11() or dev.new()

par()

?par

text size (cex) and font size on main, lab & axis

Examples (default is 1 for each of them):

  • cex.main=0.9
  • cex.sub
  • cex.lab=0.8, font.lab=2 (x/y axis labels)
  • cex.axis=0.8, font.axis=2 (axis/tick text/labels)
  • col.axis="grey50"

A quick example of increasing the font size (cex.lab, cex.axis, cex.main) and line width (lwd) in a line plot, and cex & lwd in the legend.

plot(x=x$mids, y=x$density, type="l", 
     xlab="p-value", ylab="Density", lwd=2, 
     cex.lab=1.5, cex.axis=1.5, 
     cex.main=1.5, main = "")
lines(y$mids, y$density, lty=2, lwd=2)
lines(z$mids, z$density, lty=3, lwd=2)
legend('topright',legend = c('Method A','Method B','Method C'),
       lty=c(2,1,3), lwd=c(2,2,2), cex = 1.5, xjust = 0.5, yjust = 0.5)

ggplot2 case (default font size is 11 points):

  • plot.title
  • plot.subtitle
  • axis.title.x, axis.title.y: (x/y axis labels)
  • axis.text.x & axis.text.y: (axis/tick text/labels)
library(ggplot2)  # df is assumed to be a data frame with columns x and y
ggplot(df, aes(x, y)) +
  geom_point() +
  labs(title = "Title", subtitle = "Subtitle", x = "X-axis", y = "Y-axis") +
  theme(plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10))

Default font

layout

reset the settings

op <- par(mfrow=c(2,1), mar = c(5,7,4,2) + 0.1) 
....
par(op) # mfrow=c(1,1), mar = c(5,4,4,2) + .1

mtext (margin text) vs title

mgp (axis tick label locations or axis title)

  1. The margin line (in ‘mex’ units) for the axis title, axis labels and axis line. Note that ‘mgp[1]’ affects the axis ‘title’ whereas ‘mgp[2:3]’ affect tick mark labels. The default is ‘c(3, 1, 0)’. If we like to make the axis labels closer to an axis, we can use mgp=c(1.5, .5, 0) for example.
    • the default is c(3,1,0) which specify the margin line for the axis title, axis labels and axis line.
    • the axis title is drawn in the fourth line of the margin starting from the plot region, the axis labels are drawn in the second line and the axis line itself is the first line.
  2. Setting graph margins in R using the par() function and lots of cow milk
  3. Move Axis Label Closer to Plot in Base R (2 Examples)
  4. http://rfunction.com/archives/1302 mgp – A numeric vector of length 3, which sets the axis label locations relative to the edge of the inner plot window. The first value is the location of the axis title (i.e. xlab and ylab in plot), the second the tick-mark labels, and the third the tick marks. The default is c(3, 1, 0).

move axis title closer to axis

title(ylab="Within-cluster variance", line=0, 
      cex.lab=1.2, family="Calibri Light")

pch and point shapes

(Figure: pch point shapes)

See here.

  • Full circle: pch=16
  • Display all possibilities: ggpubr::show_point_shapes()

lty (line type)

(Figure: lty line types)

Line types in R: Ultimate Guide For R Baseplot and ggplot

See here.

ggpubr::show_line_types()

las (label style)

0: The default, parallel to the axis

1: Always horizontal boxplot(y~x, las=1)

2: Perpendicular to the axis

3: Always vertical

oma (outer margin), xpd, common title for two plots, 3 types of regions, multi-panel plots

no.readonly

What does the argument in par(no.readonly=TRUE) mean in R?, R-par()
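A minimal sketch of the usual idiom: par(no.readonly = TRUE) returns every parameter that can be set (read-only ones such as "cin" are excluded), so the whole state can be restored with one par(op) call. A null PDF device is used here only so the example runs non-interactively.

```r
pdf(NULL)                           # null device: nothing is drawn to a file
op <- par(no.readonly = TRUE)       # all settable graphical parameters
par(mfrow = c(2, 1), mar = c(5, 7, 4, 2) + 0.1)
plot(1:10); plot(10:1)
par(op)                             # restore everything in one call
restored_mfrow <- par("mfrow")      # c(1, 1) again
dev.off()
```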

Non-standard fonts in postscript and pdf graphics

https://cran.r-project.org/doc/Rnews/Rnews_2006-2.pdf#page=41


NULL, NA, NaN, Inf

https://tomaztsql.wordpress.com/2018/07/04/r-null-values-null-na-nan-inf/
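A compact summary of how the four values behave in base R:

```r
length(NULL)          # 0: NULL is the empty object
length(NA)            # 1: NA is a missing value of length one
c(1, NULL, 3)         # 1 3   (NULL disappears)
c(1, NA, 3)           # 1 NA 3
0 / 0                 # NaN   (not a number)
1 / 0                 # Inf
is.na(NaN)            # TRUE: NaN also counts as missing
is.nan(NA)            # FALSE
is.finite(c(1, NA, NaN, Inf))   # TRUE FALSE FALSE FALSE
```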

save()/load() vs saveRDS()/readRDS() vs dput()/dget() vs dump()/source()

  1. saveRDS() can only save one R object while save() does not have this constraint.
  2. saveRDS() doesn't save both the object and its name; it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized. See this post.
x <- 5
saveRDS(x, "myfile.rds")
x2 <- readRDS("myfile.rds")
identical(x, x2)  # TRUE; the object was read back under a new name

dput: Writes an ASCII text representation of an R object. The object name is not written (unlike dump).

> data(pbc, package = "survival")
> names(pbc)
> dput(names(pbc))
c("id", "time", "status", "trt", "age", "sex", "ascites", "hepato", 
"spiders", "edema", "bili", "chol", "albumin", "copper", "alk.phos", 
"ast", "trig", "platelet", "protime", "stage")

> iris2 <- iris[1:2, ]
> dput(iris2)
structure(list(Sepal.Length = c(5.1, 4.9), Sepal.Width = c(3.5, 
3), Petal.Length = c(1.4, 1.4), Petal.Width = c(0.2, 0.2), Species = structure(c(1L, 
1L), .Label = c("setosa", "versicolor", "virginica"), class = "factor")), row.names = 1:2, class = "data.frame")
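A round-trip sketch contrasts the text-based pairs. dput()/dget() write and read only the value, while dump()/source() take object names and recreate the object under that name. Temporary files are used for illustration.

```r
iris2 <- iris[1:2, ]
f1 <- tempfile(fileext = ".R")
dput(iris2, file = f1)            # writes only the value, no object name
iris_copy <- dget(f1)             # so the caller assigns it a name
all.equal(iris_copy, iris2)       # TRUE

f2 <- tempfile(fileext = ".R")
dump("iris2", file = f2)          # dump() takes object names and writes "iris2 <- ..."
rm(iris2)
source(f2)                        # recreates iris2 under its original name
exists("iris2")                   # TRUE
```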

Use 'verbose = TRUE' in load()

When we use load(), it is helpful to add 'verbose = TRUE' to see which objects get loaded.

What are RDS files anyways

Archive Existing RDS Files

==, all.equal(), identical()

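  • ==: element-wise comparison of values; numbers must match exactly, and a comparison involving NA returns NA
  • all.equal: compare R objects x and y testing ‘near equality’ (up to a numeric tolerance)
  • identical: The safe and reliable way to test two objects for being exactly equal; it always returns a single TRUE or FALSE.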
x <- 1.0; y <- 0.99999999999
all.equal(x, y)
# [1] TRUE
identical(x, y)
# [1] FALSE

Be careful about using "==" to return an index of matches in the case of data with missing values.

R> c(1,2,NA)[c(1,2,NA) == 1]
[1]  1 NA
R> c(1,2,NA)[which(c(1,2,NA) == 1)]
[1] 1
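When the two objects differ, all.equal() returns a character string describing the difference rather than FALSE, so it is usually wrapped in isTRUE() inside if(). The sketch also shows %in% as an NA-safe alternative to == for building an index, and a tighter tolerance.

```r
x <- 1.0; y <- 0.99999999999
all.equal(1, 1.1)                            # a character string, not FALSE
if (isTRUE(all.equal(x, y))) message("x and y are nearly equal")
isTRUE(all.equal(1, 1 + 1e-6, tolerance = 1e-8))  # FALSE with a tighter tolerance

v <- c(1, 2, NA)
v[v %in% 1]                                  # 1: %in% returns FALSE (not NA) for NA
```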

See also the testthat package.

I found a case where, when comparing two objects (one generated on Linux and the other on macOS), identical() gave FALSE but all.equal() returned TRUE. The difference had a magnitude of only about 1e-17.

waldo

diffobj: Compare/Diff R Objects

https://cran.r-project.org/web/packages/diffobj/index.html

testthat

tinytest

tinytest: Lightweight but Feature Complete Unit Testing Framework

ttdo adds support for the 'diffobj' package for 'diff'-style comparison of R objects.

Numerical Pitfall

Numerical pitfalls in computing variance

.1 - .3/3
## [1] 0.00000000000000001388
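The variance pitfall can be reproduced with data that sit on a large offset (the values below are made up; the true sample variance is 30). The one-pass textbook formula subtracts two nearly equal numbers of size about 4e18 and loses all significant digits. Centering the data first, as var() also does, avoids this.

```r
x <- 1e9 + c(4, 7, 13, 16)   # true sample variance: 30
n <- length(x)

# One-pass formula: sum of squares minus n * mean^2 (catastrophic cancellation)
naive_var <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

# Two-pass formula: center first, then square
two_pass_var <- sum((x - mean(x))^2) / (n - 1)

c(naive = naive_var, two_pass = two_pass_var, var = var(x))
```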

Sys.getpid()

This can be used to monitor R process memory usage or stop the R process. See this post.
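A sketch of both uses follows. It reads the current and peak memory of this R process from /proc, which works on Linux only, so the block skips that step elsewhere. The commented line shows how another R session could stop the process with tools::pskill().

```r
pid <- Sys.getpid()
pid
# Linux only: current (VmRSS) and peak (VmPeak) memory of this R process
status_file <- file.path("/proc", pid, "status")
if (file.exists(status_file)) {
  print(grep("^Vm(RSS|Peak)", readLines(status_file), value = TRUE))
}
# From another R session: tools::pskill(pid) sends SIGTERM to this process
```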

Sys.getenv() & make the script more portable

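Remove all secrets from the script and replace them with Sys.getenv("secretname"). You can save the secrets in an .Renviron file next to the script in the same project; at startup R reads .Renviron from the current working directory, falling back to ~/.Renviron (see ?Startup).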

$ for v in 1 2; do MY=$v Rscript -e "Sys.getenv('MY')"; done
[1] "1"
[1] "2"
$ echo $MY    # empty: MY=$v only sets MY in the environment of that Rscript command

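Inside R, an unset variable comes back as "" by default. Passing unset = NA makes a missing secret easy to detect. MY_API_KEY and SURELY_UNSET_VAR_123 are hypothetical names; a real key would normally come from .Renviron rather than Sys.setenv().

```r
Sys.setenv(MY_API_KEY = "not-a-real-key")   # stand-in for a value from .Renviron
key <- Sys.getenv("MY_API_KEY")
unset_val <- Sys.getenv("SURELY_UNSET_VAR_123", unset = NA)  # NA instead of ""
if (is.na(unset_val)) message("SURELY_UNSET_VAR_123 is not set")

# After editing .Renviron, it can be re-read without restarting R:
# readRenviron("~/.Renviron")
```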

How to write R code

  • Code smells and feels from R Consortium
    • write simple conditions,
    • handle class properly,
    • return and exit early,
    • polymorphism,
    • switch() (e.g., switch(var, value1=out1, value2=out2, value3=out3); several examples in glmnet),
    • case_when(),
    • %||%.
  • 5 Tips for Writing Clean R Code – Leave Your Code Reviewer Commentless
    • Comments
    • Strings
    • Loops
    • Code Sharing
    • Good Programming Practices
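Several of the tips above can be combined in one small function. summarize_vec() is a hypothetical helper that returns early on empty input, dispatches with switch() and uses %||% for a NULL default. %||% is defined here with the same semantics as base R's version (added in R 4.4.0) so the sketch also runs on older R.

```r
`%||%` <- function(x, y) if (is.null(x)) y else x   # return y when x is NULL

summarize_vec <- function(x, type = c("mean", "median", "range"), digits = NULL) {
  if (length(x) == 0) return(NA_real_)   # return early on the edge case
  type <- match.arg(type)
  out <- switch(type,
    mean   = mean(x),
    median = median(x),
    range  = diff(range(x))
  )
  round(out, digits %||% 2)              # default to 2 digits when digits is NULL
}

summarize_vec(c(1, 2, 10))               # 4.33
summarize_vec(c(1, 2, 10), "range")      # 9
summarize_vec(numeric(0))                # NA
```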

How to debug an R code

Debug R

Locale bug (grep did not handle UTF-8 properly PR#16264)

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16264

Path length in dir.create() (PR#17206)

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17206 (Windows only)

install.packages() error: R_LIBS_USER is empty in R 3.4.1 & .libPaths()

R_LIBS_USER=${R_LIBS_USER-'~/R/x86_64-pc-linux-gnu-library/3.4'}
R_LIBS_USER="${HOME}/R/${R_PLATFORM}-library/3.4"

On Mac & R 3.4.0 (it's fine)

> Sys.getenv("R_LIBS_USER")
[1] "~/Library/R/3.4/library"
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"

On Linux & R 3.3.1 (ARM)

> Sys.getenv("R_LIBS_USER")
[1] "~/R/armv7l-unknown-linux-gnueabihf-library/3.3"
> .libPaths()
[1] "/home/$USER/R/armv7l-unknown-linux-gnueabihf-library/3.3"
[2] "/usr/local/lib/R/library"

On Linux & R 3.4.1 (*Problematic*)

> Sys.getenv("R_LIBS_USER")
[1] ""
> .libPaths()
[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"

I need to specify the lib parameter when I use the install.packages command.

> install.packages("devtools", "~/R/x86_64-pc-linux-gnu-library/3.4")
> library(devtools)
Error in library(devtools) : there is no package called 'devtools'

# Specify lib.loc parameter will not help with the dependency package
> library(devtools, lib.loc = "~/R/x86_64-pc-linux-gnu-library/3.4")
Error: package or namespace load failed for 'devtools':
 .onLoad failed in loadNamespace() for 'devtools', details:
  call: loadNamespace(name)
  error: there is no package called 'withr'

# A solution is to redefine .libPaths
> .libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
> library(devtools) # Works

A better solution is to specify R_LIBS_USER in the ~/.Renviron file or in ~/.bash_profile; see ?Startup.

Using external data from within another package

https://logfc.wordpress.com/2017/03/02/using-external-data-from-within-another-package/

How to run R scripts from the command line

https://support.rstudio.com/hc/en-us/articles/218012917-How-to-run-R-scripts-from-the-command-line

How to exit a sourced R script

Decimal point & decimal comma

Countries using Arabic numerals with a decimal comma (Austria, Belgium, Brazil, France, Germany, Netherlands, Norway, South Africa, Spain, Sweden, ...) https://en.wikipedia.org/wiki/Decimal_mark
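In R, decimal commas can be handled when formatting output, when parsing strings and when reading files. read.csv2() assumes sep = ";" and dec = ",", which is the usual CSV layout in these countries. The inline text below is made-up sample data.

```r
format(3.14, decimal.mark = ",")                   # "3,14"
as.numeric(sub(",", ".", "3,14", fixed = TRUE))    # 3.14
df <- read.csv2(text = "a;b\n1,5;2,25")            # ";" separator, "," decimal mark
df$a + df$b                                        # 3.75
```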

setting seed locally (not globally) in R

https://stackoverflow.com/questions/14324096/setting-seed-locally-not-globally-in-r
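A base-R sketch of a "local seed" works by saving the global .Random.seed, setting the seed, evaluating the expression and then restoring the old state. with_local_seed() is a hypothetical name; the withr package offers with_seed() for the same purpose.

```r
with_local_seed <- function(seed, expr) {
  genv <- globalenv()
  # .Random.seed may not exist yet in a fresh session
  old_seed <- if (exists(".Random.seed", envir = genv, inherits = FALSE))
    get(".Random.seed", envir = genv, inherits = FALSE) else NULL
  on.exit({
    if (is.null(old_seed)) rm(list = ".Random.seed", envir = genv)
    else assign(".Random.seed", old_seed, envir = genv)
  })
  set.seed(seed)
  expr   # lazily evaluated here, after set.seed()
}

set.seed(42)
b1 <- with_local_seed(1, runif(1))   # same as set.seed(1); runif(1)
b2 <- runif(1)                       # continues the seed-42 stream as if nothing happened
```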

R's internal C API

https://github.com/hadley/r-internals

cleancall package for C resource cleanup

Resource Cleanup in C and the R API

Random number generator

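#include <R.h>   /* GetRNGstate(), unif_rand(), PutRNGstate(), Rprintf() */

/* Draw one U(0,1) number from R's RNG, so set.seed() in R controls it */
void myunif(void){
  GetRNGstate();            /* load .Random.seed into the C-level RNG */
  double u = unif_rand();
  PutRNGstate();            /* write the updated state back to .Random.seed */
  Rprintf("%f\n",u);
}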
$ R CMD SHLIB r_rand.c
$ R
R> dyn.load("r_rand.so")
R> set.seed(1)
R> .C("myunif")
0.265509
list()
R> .C("myunif")
0.372124
list()
R> set.seed(1)
R> .C("myunif")
0.265509
list()

Test For Randomness

Different results in Mac and Linux

Random numbers: multivariate normal

Why MASS::mvrnorm() gives different result on Mac and Linux/Windows?

The reason could be the covariance matrix decomposition: mvrnorm() uses eigen(), and eigen() results (e.g., the signs of the eigenvectors) may depend on the LAPACK/BLAS libraries. See

rle(): run-length encoding
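rle() splits a vector into runs of equal consecutive values, and inverse.rle() undoes it. The sketch uses a made-up character vector and also finds the longest run of one value.

```r
x <- c("a", "a", "b", "b", "b", "a")
r <- rle(x)
r$lengths                          # 2 3 1
r$values                           # "a" "b" "a"
inverse.rle(r)                     # back to x
max(r$lengths[r$values == "b"])    # 3: longest run of "b"
```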

citation()

citation()
citation("MASS")
toBibtex(citation())

Notes on Citing R and R Packages with examples.

R not responding request to interrupt stop process

R not responding request to interrupt stop process. R is executing (for example) a C / C++ library call that doesn't give R an opportunity to check for interrupts. This matches the case I ran into (the dist() function).

Monitor memory usage

  • x <- rnorm(2^27) will create an object of the size 1GB (2^27*8/2^20=1024 MB).
  • Windows: memory.size(max=TRUE) (R < 4.2.0 only; since R 4.2.0 it is a stub that returns Inf)
  • Linux
    • RStudio: htop -p PID where PID is the process ID of /usr/lib/rstudio/bin/rsession, not /usr/lib/rstudio/bin/rstudio. This was determined by running x <- rnorm(2*1e8). The object size can be obtained through print(object.size(x), units = "auto"). Note that 1e8 doubles take 1e8*8/2^20 = 762.9395 MB, so rnorm(2*1e8) takes about 1525.9 MB.
    • R: htop -p PID where PID is the process ID of /usr/lib/R/bin/exec/R. Alternatively, use htop -p `pgrep -f /usr/lib/R/bin/exec/R`
    • To find the peak memory usage: grep VmPeak /proc/$PID/status
  • mem_used() from the pryr package. Its value did not agree with the memory reported by jobload on Biowulf, so I cannot use it to measure the memory used when running mclapply(). mem_used() is based on gc() in the current R process, so memory used by forked child processes is not counted.
  • peakRAM: Monitor the Total and Peak RAM Used by an Expression or Function
  • Error: protect () : protection stack overflow and ?Memory
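As a dependency-free alternative, gc() can report the peak of R's own heap. This is a sketch of the idea behind tools like peakRAM: reset the "max used" statistics, run the code, then read the peak. It counts only memory managed by R (Vcells are 8 bytes each), not memory allocated directly by C/C++ libraries or by forked children.

```r
invisible(gc(reset = TRUE))        # reset the "max used" statistics
x <- rnorm(1e7)                    # 1e7 doubles: 1e7 * 8 / 2^20 = 76.3 MB
print(object.size(x), units = "auto")
rm(x)
g <- gc()
peak_mb <- g["Vcells", "max used"] * 8 / 2^20   # peak vector heap in MB
peak_mb
```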

References:

Monitor Data

Monitoring Data in R with the lumberjack Package

Pushover

Monitoring Website SSL/TLS Certificate Expiration Times with R, {openssl}, {pushoverr}, and {DT}

pushoverr

Resource

Books

  • Efficient R programming by Colin Gillespie and Robin Lovelace. Re-creating the html version of the book works if we follow their simple instructions in the Appendix. Note that the pdf version renders the expected output (mathematical notation, tables) better than the epub version.
    # R 3.4.1
    .libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
    setwd("/tmp/efficientR/")
    bookdown::render_book("index.Rmd", output_format = "bookdown::pdf_book")
    # the generated pdf file is located at _book/_main.pdf
    
    bookdown::render_book("index.Rmd", output_format = "bookdown::epub_book")
    # the generated epub file is located at _book/_main.epub.
    # This cannot be done in RStudio ("parse_dt" not resolved from current namespace (lubridate))
    # but it is OK to run in an R terminal
    

Videos

Webinar

useR!

R consortium

https://www.youtube.com/channel/UC_R5smHVXRYGhZYDJsnXTwg/featured

Blogs, Tips, Socials, Communities

Bug Tracking System

https://bugs.r-project.org/bugzilla3/ and search existing bug reports. Remember to select 'All' in the Status drop-down list.

Include the output of sessionInfo() when reporting a bug.

License

Some Notes on GNU Licenses in R Packages

Why Dash uses the mit license (and not a copyleft gpl license)

Interview questions

  • Does R store matrices in column-major order or row-major order?
    • Matrices are stored in column-major order, which means that elements are arranged and accessed by columns. This is in contrast to languages like Python, where matrices (or arrays) are typically stored in row-major order.
  • Explain the difference between == and === in R. Provide an example to illustrate their use.
    • The == operator is used for testing equality of values in R. It returns TRUE if the values on the left and right sides are equal, otherwise FALSE. The === operator does not exist in base R.
  • What is the purpose of the apply() function in R? How does it differ from the for loop?
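    • The apply() function in R is used to apply a function over the margins of an array or matrix. It is often used as an alternative to a for loop for applying a function to each row or column of a matrix. Note that apply() itself loops in R code, so its main benefit is conciseness rather than speed; vectorized functions such as rowSums() and colMeans() are usually much faster.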
  • Describe the concept of factors in R. How are they used in data manipulation and analysis?
    • Factors in R are used to represent categorical data and are an essential data type for statistical modeling and analysis. A factor is stored as a vector of integer codes plus a 'levels' attribute holding the category labels that the codes refer to.
  • What is the significance of the attach() and detach() functions in R? When should they be used?
    • The attach() function adds a data frame to the search path in R, so its variables can be accessed by name without the df$ prefix. The detach() function removes it from the search path, which helps avoid naming conflicts and frees the copy of the data that attach() made. In scripts, with() or a function's data = argument is usually preferred, because attached variables can mask, or be masked by, other objects.
  • Explain the concept of vectorization in R. How does it impact the performance of R code?
    • Vectorization in R refers to applying an operation to entire vectors or arrays at once, without writing explicit loops. This can significantly improve performance because the looping happens in R's compiled C code rather than in interpreted R code.
  • Describe the difference between data.frame and matrix in R. When would you use one over the other?
    • A data.frame in R is a two-dimensional structure that can store different types of data (e.g., numeric, character, factor) in its columns. It is similar to a table in a database.
    • A matrix in R is also a two-dimensional structure, but it can only store elements of the same data type. It is more like a mathematical matrix.
    • You would use a data.frame when you have heterogeneous data (i.e., different types of data) and need to work with it as a dataset. You would use a matrix when you have homogeneous data (i.e., the same type of data) and need to perform matrix operations.
  • What are the benefits of using the dplyr package in R for data manipulation? Provide an example of how you would use dplyr to filter a data frame.
    • The dplyr package provides a set of functions that make it easier to manipulate data frames in R.
    • It uses a syntax that is easy to read and understand, making complex data manipulations more intuitive.
    • To filter a data frame using dplyr, you can use the filter() function. For example, filter(df, column_name == value) would filter df to include only rows where column_name is equal to value.
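A few base-R checks illustrate several of the answers above: column-major storage, the integer codes behind a factor, and apply() compared with a vectorized equivalent. All data are made up for illustration.

```r
# Column-major storage: matrix() fills column by column by default
m <- matrix(1:6, nrow = 2)
m[, 1]                 # 1 2: the first column holds the first two elements
as.vector(m)           # 1 2 3 4 5 6: the underlying storage order
m[2, 1] == m[2]        # TRUE: element [2, 1] is element 2 in storage order

# Factors: integer codes plus a "levels" attribute
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
as.integer(f)          # 1 2 1
levels(f)              # "low" "high"

# apply() over rows vs. the vectorized rowSums()
a <- apply(m, 1, sum)  # integer result: 9 12
b <- rowSums(m)        # double result: 9 12
identical(as.numeric(a), b)   # TRUE

# == is element-wise; identical() gives one answer for the whole object
c(1, 2) == c(1, 3)           # TRUE FALSE
identical(c(1, 2), c(1, 3))  # FALSE
```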