R: Difference between revisions

From 太極
Jump to navigation Jump to search
No edit summary
Line 1: Line 1:
== Install and upgrade R ==
= Install and upgrade R =
[[Install_R|Here]]
[[Install_R|Here]]


== Online Editor ==
= Online Editor =
We can run R on web browsers without installing it on local machines (similar to [/ideone.com Ideone.com] for C++. It does not require an account either (cf RStudio).  
We can run R on web browsers without installing it on local machines (similar to [/ideone.com Ideone.com] for C++. It does not require an account either (cf RStudio).  


=== [https://rdrr.io/snippets/ rdrr.io] ===
== [https://rdrr.io/snippets/ rdrr.io] ==
It can produce graphics too. The package I am testing ([https://www.rdocumentation.org/packages/cobs/versions/1.3-3/topics/cobs cobs]) is available too.
It can produce graphics too. The package I am testing ([https://www.rdocumentation.org/packages/cobs/versions/1.3-3/topics/cobs cobs]) is available too.


=== rstudio.cloud ===
== rstudio.cloud ==


=== [https://www.rdocumentation.org/ RDocumentation] ===
== [https://www.rdocumentation.org/ RDocumentation] ==
The interactive engine is based on [https://github.com/datacamp/datacamp-light DataCamp Light]
The interactive engine is based on [https://github.com/datacamp/datacamp-light DataCamp Light]


Line 21: Line 21:
The packages on RDocumentation may be outdated. For example, the current stringr on CRAN is v1.2.0 (2/18/2017) but RDocumentation has v1.1.0 (8/19/2016).
The packages on RDocumentation may be outdated. For example, the current stringr on CRAN is v1.2.0 (2/18/2017) but RDocumentation has v1.1.0 (8/19/2016).


== Web Applications ==
= Web Applications =


See also CRAN Task View: [http://cran.r-project.org/web/views/WebTechnologies.html Web Technologies and Services]
See also CRAN Task View: [http://cran.r-project.org/web/views/WebTechnologies.html Web Technologies and Services]


=== TexLive ===
== TexLive ==
TexLive can be installed by 2 ways
TexLive can be installed by 2 ways
* '''sudo apt install texlive''' It includes '''tlmgr''' utility for package manager.
* '''sudo apt install texlive''' It includes '''tlmgr''' utility for package manager.
* [http://tug.org/texlive/ Official website]  
* [http://tug.org/texlive/ Official website]  


==== texlive-latex-extra ====
=== texlive-latex-extra ===
https://packages.debian.org/sid/texlive-latex-extra
https://packages.debian.org/sid/texlive-latex-extra


For example, framed and titling packages are included.
For example, framed and titling packages are included.


==== tlmgr - TeX Live package manager ====
=== tlmgr - TeX Live package manager ===
https://www.tug.org/texlive/tlmgr.html
https://www.tug.org/texlive/tlmgr.html


=== [https://yihui.name/tinytex/ TinyTex] ===
== [https://yihui.name/tinytex/ TinyTex] ==
https://github.com/yihui/tinytex
https://github.com/yihui/tinytex


=== Rmarkdown: create HTML5 web, slides and more ===
== Rmarkdown: create HTML5 web, slides and more ==
[[Rmarkdown]]
[[Rmarkdown]]


=== [http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol HTTP protocol] ===
== [http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol HTTP protocol] ==


* http://en.wikipedia.org/wiki/File:Http_request_telnet_ubuntu.png
* http://en.wikipedia.org/wiki/File:Http_request_telnet_ubuntu.png
Line 66: Line 66:
It gets more difficult depending on how much of HTTP you want to support - POST is a little more complicated, scripts, handling multiple requests, etc.
It gets more difficult depending on how much of HTTP you want to support - POST is a little more complicated, scripts, handling multiple requests, etc.


==== Example in R ====
=== Example in R ===
<syntaxhighlight lang='r'>
<syntaxhighlight lang='r'>
> co <- socketConnection(port=8080, server=TRUE, blocking=TRUE)  
> co <- socketConnection(port=8080, server=TRUE, blocking=TRUE)  
Line 88: Line 88:
</syntaxhighlight>
</syntaxhighlight>


==== Example in C ([http://blog.abhijeetr.com/2010/04/very-simple-http-server-writen-in-c.html Very simple http server written in C], 187 lines) ====
=== Example in C ([http://blog.abhijeetr.com/2010/04/very-simple-http-server-writen-in-c.html Very simple http server written in C], 187 lines) ===


Create a simple hello world html page and save it as <[http://en.wikipedia.org/wiki/List_of_Hello_world_program_examples#H index.html]> in the current directory (/home/brb/Downloads/)
Create a simple hello world html page and save it as <[http://en.wikipedia.org/wiki/List_of_Hello_world_program_examples#H index.html]> in the current directory (/home/brb/Downloads/)
Line 133: Line 133:




==== Another Example in C (55 lines) ====
=== Another Example in C (55 lines) ===
http://mwaidyanatha.blogspot.com/2011/05/writing-simple-web-server-in-c.html  
http://mwaidyanatha.blogspot.com/2011/05/writing-simple-web-server-in-c.html  


Line 168: Line 168:
See also more examples under [[C#Socket_Programming_Examples_using_C.2FC.2B.2B.2FQt|C page]].
See also more examples under [[C#Socket_Programming_Examples_using_C.2FC.2B.2B.2FQt|C page]].


==== Others  ====
=== Others  ===
* http://rosettacode.org/wiki/Hello_world/ (Different languages)
* http://rosettacode.org/wiki/Hello_world/ (Different languages)
* http://kperisetla.blogspot.com/2012/07/simple-http-web-server-in-c.html (Windows web server)
* http://kperisetla.blogspot.com/2012/07/simple-http-web-server-in-c.html (Windows web server)
Line 175: Line 175:
* https://github.com/davidmoreno/onion
* https://github.com/davidmoreno/onion


=== shiny ===
== shiny ==
See [[Shiny|Shiny]].
See [[Shiny|Shiny]].


=== [https://www.rplumber.io/ plumber]: Turning your R code into a RESTful Web API ===
== [https://www.rplumber.io/ plumber]: Turning your R code into a RESTful Web API ==
* https://github.com/trestletech/plumber
* https://github.com/trestletech/plumber
* https://www.rstudio.com/resources/videos/plumber-turning-your-r-code-into-an-api/
* https://www.rstudio.com/resources/videos/plumber-turning-your-r-code-into-an-api/
Line 184: Line 184:
* [https://medium.com/@skyetetra/using-docker-to-deploy-an-r-plumber-api-863ccf91516d Using docker to deploy an R plumber API]
* [https://medium.com/@skyetetra/using-docker-to-deploy-an-r-plumber-api-863ccf91516d Using docker to deploy an R plumber API]


=== Docker ===
== Docker ==
* There are two major Docker images. They include gcc, gfortran, .... So it can be used to install Rcpp package for example.
* There are two major Docker images. They include gcc, gfortran, .... So it can be used to install Rcpp package for example.
** [https://hub.docker.com/_/r-base/?tab=tags Official] which supports version tags. The official Docker image's dockerfile still points to rocker/r-base.  
** [https://hub.docker.com/_/r-base/?tab=tags Official] which supports version tags. The official Docker image's dockerfile still points to rocker/r-base.  
Line 202: Line 202:
* [https://www.statworx.com/de/blog/running-your-r-script-in-docker/ Running your R script in Docker]. Goal: containerizing an R script to eventually execute it automatically each time the container is started, without any user interaction. An enhanced version of the instruction is at [https://github.com/arraytools/RinDocker this page].
* [https://www.statworx.com/de/blog/running-your-r-script-in-docker/ Running your R script in Docker]. Goal: containerizing an R script to eventually execute it automatically each time the container is started, without any user interaction. An enhanced version of the instruction is at [https://github.com/arraytools/RinDocker this page].


=== [http://cran.r-project.org/web/packages/httpuv/index.html httpuv] ===
== [http://cran.r-project.org/web/packages/httpuv/index.html httpuv] ==
http and WebSocket library.
http and WebSocket library.


See also the [https://cran.r-project.org/web/packages/servr/index.html servr] package which can start an HTTP server in R to serve static files, or dynamic documents that can be converted to HTML files (e.g., R Markdown) under a given directory.
See also the [https://cran.r-project.org/web/packages/servr/index.html servr] package which can start an HTTP server in R to serve static files, or dynamic documents that can be converted to HTML files (e.g., R Markdown) under a given directory.


=== [http://rapache.net/ RApache] ===
== [http://rapache.net/ RApache] ==


=== [http://cran.r-project.org/web/packages/gWidgetsWWW/index.html gWidgetsWWW] ===
== [http://cran.r-project.org/web/packages/gWidgetsWWW/index.html gWidgetsWWW] ==


* http://www.jstatsoft.org/v49/i10/paper
* http://www.jstatsoft.org/v49/i10/paper
Line 215: Line 215:
* [http://www.r-statistics.com/2012/11/comparing-shiny-with-gwidgetswww2-rapache/ Compare shiny with gWidgetsWWW2.rapache]
* [http://www.r-statistics.com/2012/11/comparing-shiny-with-gwidgetswww2-rapache/ Compare shiny with gWidgetsWWW2.rapache]


=== [http://cran.r-project.org/web/packages/Rook/index.html Rook] ===
== [http://cran.r-project.org/web/packages/Rook/index.html Rook] ==


Since R 2.13, the internal web server was exposed.
Since R 2.13, the internal web server was exposed.
Line 311: Line 311:
* [https://rud.is/b/2016/07/05/a-simple-prediction-web-service-using-the-new-firery-package/ A Simple Prediction Web Service Using the New fiery Package]
* [https://rud.is/b/2016/07/05/a-simple-prediction-web-service-using-the-new-firery-package/ A Simple Prediction Web Service Using the New fiery Package]


=== [https://code.google.com/p/sumo/ sumo] ===
== [https://code.google.com/p/sumo/ sumo] ==
Sumo is a fully-functional web application template that exposes an authenticated user's R session within java server pages. See the paper http://journal.r-project.org/archive/2012-1/RJournal_2012-1_Bergsma+Smith.pdf.
Sumo is a fully-functional web application template that exposes an authenticated user's R session within java server pages. See the paper http://journal.r-project.org/archive/2012-1/RJournal_2012-1_Bergsma+Smith.pdf.


=== [http://www.stat.ucla.edu/~jeroen/stockplot Stockplot] ===
== [http://www.stat.ucla.edu/~jeroen/stockplot Stockplot] ==


=== [http://www.rforge.net/FastRWeb/ FastRWeb] ===
== [http://www.rforge.net/FastRWeb/ FastRWeb] ==
http://cran.r-project.org/web/packages/FastRWeb/index.html
http://cran.r-project.org/web/packages/FastRWeb/index.html


=== WebDriver ===
== WebDriver ==
'WebDriver' Client for 'PhantomJS'  
'WebDriver' Client for 'PhantomJS'  


https://github.com/rstudio/webdriver
https://github.com/rstudio/webdriver


=== [http://sysbio.mrc-bsu.cam.ac.uk/Rwui/tutorial/Instructions.html Rwui] ===
== [http://sysbio.mrc-bsu.cam.ac.uk/Rwui/tutorial/Instructions.html Rwui] ==


=== [http://cran.r-project.org/web/packages/CGIwithR/index.html CGHWithR] and [http://cran.r-project.org/web/packages/WebDevelopR/ WebDevelopR] ===
== [http://cran.r-project.org/web/packages/CGIwithR/index.html CGHWithR] and [http://cran.r-project.org/web/packages/WebDevelopR/ WebDevelopR] ==
CGHwithR is still working with old version of R although it is removed from CRAN. Its successor is WebDevelopR. Its The vignette (year 2013) provides a review of several available methods.
CGHwithR is still working with old version of R although it is removed from CRAN. Its successor is WebDevelopR. Its The vignette (year 2013) provides a review of several available methods.


=== [http://www.rstudio.com/ide/docs/advanced/manipulate manipulate] from RStudio ===
== [http://www.rstudio.com/ide/docs/advanced/manipulate manipulate] from RStudio ==
This is not a web application. But the '''manipulate''' package can be used to create interactive plot within R(Studio) environment easily. Its source is available at [https://github.com/rstudio/rstudio/tree/master/src/cpp/r/R/packages/manipulate here].
This is not a web application. But the '''manipulate''' package can be used to create interactive plot within R(Studio) environment easily. Its source is available at [https://github.com/rstudio/rstudio/tree/master/src/cpp/r/R/packages/manipulate here].


Mathematica also has manipulate function for plotting; see [http://reference.wolfram.com/mathematica/tutorial/IntroductionToManipulate.html here].
Mathematica also has manipulate function for plotting; see [http://reference.wolfram.com/mathematica/tutorial/IntroductionToManipulate.html here].


=== [https://github.com/att/rcloud RCloud] ===
== [https://github.com/att/rcloud RCloud] ==
RCloud is an environment for collaboratively creating and sharing data analysis scripts. RCloud lets you mix analysis code in R, HTML5, Markdown, Python, and others. Much like Sage, iPython notebooks and Mathematica, RCloud provides a notebook interface that lets you easily record a session and annotate it with text, equations, and supporting images.
RCloud is an environment for collaboratively creating and sharing data analysis scripts. RCloud lets you mix analysis code in R, HTML5, Markdown, Python, and others. Much like Sage, iPython notebooks and Mathematica, RCloud provides a notebook interface that lets you easily record a session and annotate it with text, equations, and supporting images.


See also the [http://user2014.stat.ucla.edu/abstracts/talks/193_Harner.pdf Talk] in UseR 2014.
See also the [http://user2014.stat.ucla.edu/abstracts/talks/193_Harner.pdf Talk] in UseR 2014.


=== [https://github.com/cloudyr cloudyr] and [https://github.com/socialcopsdev/flyio flyio] - Input Output Files in R from Cloud or Local ===
== [https://github.com/cloudyr cloudyr] and [https://github.com/socialcopsdev/flyio flyio] - Input Output Files in R from Cloud or Local ==
https://blog.socialcops.com/inside-sc/announcements/flyio-r-package-interact-data-cloud/ Announcing flyio, an R Package to Interact with Data in the Cloud]
https://blog.socialcops.com/inside-sc/announcements/flyio-r-package-interact-data-cloud/ Announcing flyio, an R Package to Interact with Data in the Cloud]


=== Dropbox access ===
== Dropbox access ==
[https://cran.r-project.org/web/packages/rdrop2/index.html rdrop2] package
[https://cran.r-project.org/web/packages/rdrop2/index.html rdrop2] package


=== Web page scraping ===
== Web page scraping ==
http://www.slideshare.net/schamber/web-data-from-r#btnNext
http://www.slideshare.net/schamber/web-data-from-r#btnNext


==== [https://cran.r-project.org/web/packages/xml2/ xml2] package ====
=== [https://cran.r-project.org/web/packages/xml2/ xml2] package ===
rvest package depends on xml2.
rvest package depends on xml2.


==== [https://cran.r-project.org/web/packages/purrr/index.html purrr]: Functional Programming Tools ====
=== [https://cran.r-project.org/web/packages/purrr/index.html purrr]: Functional Programming Tools ===
* https://purrr.tidyverse.org/
* https://purrr.tidyverse.org/
* [http://colinfay.me/purrr-cookbook/ purrr cookbook]
* [http://colinfay.me/purrr-cookbook/ purrr cookbook]
Line 360: Line 360:
* [http://staff.math.su.se/hoehle/blog/2019/01/04/mathgenius.html Purr yourself into a math genius]
* [http://staff.math.su.se/hoehle/blog/2019/01/04/mathgenius.html Purr yourself into a math genius]


==== [https://cran.r-project.org/web/packages/rvest/index.html rvest] ====
=== [https://cran.r-project.org/web/packages/rvest/index.html rvest] ===
[http://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/ Easy web scraping with R]
[http://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/ Easy web scraping with R]


Line 377: Line 377:
* [https://datascienceplus.com/google-scholar-scraping-with-rvest/ Google scholar scraping with rvest package]
* [https://datascienceplus.com/google-scholar-scraping-with-rvest/ Google scholar scraping with rvest package]


==== Animate ====
=== Animate ===
* [https://guyabel.com/post/football-kits/ Animating Changes in Football Kits using R]: rvest, tidyverse, xml2, purrr & magick
* [https://guyabel.com/post/football-kits/ Animating Changes in Football Kits using R]: rvest, tidyverse, xml2, purrr & magick
* [https://guyabel.com/post/animated-directional-chord-diagrams/ Animated Directional Chord Diagrams] tweenr & magick
* [https://guyabel.com/post/animated-directional-chord-diagrams/ Animated Directional Chord Diagrams] tweenr & magick
Line 383: Line 383:
* [https://www.listendata.com/2019/05/create-animation-in-r-learn-with.html Create animation in R]: learn by examples (gganimate)
* [https://www.listendata.com/2019/05/create-animation-in-r-learn-with.html Create animation in R]: learn by examples (gganimate)


==== [https://cran.r-project.org/web/packages/V8/index.html V8]: Embedded JavaScript Engine for R ====
=== [https://cran.r-project.org/web/packages/V8/index.html V8]: Embedded JavaScript Engine for R ===
[https://rud.is/b/2017/07/25/r%E2%81%B6-general-attys-distributions/ R⁶ — General (Attys) Distributions]: V8, rvest, ggbeeswarm, hrbrthemes and tidyverse packages are used.
[https://rud.is/b/2017/07/25/r%E2%81%B6-general-attys-distributions/ R⁶ — General (Attys) Distributions]: V8, rvest, ggbeeswarm, hrbrthemes and tidyverse packages are used.


==== [http://cran.r-project.org/web/packages/pubmed.mineR/index.html pubmed.mineR] ====
=== [http://cran.r-project.org/web/packages/pubmed.mineR/index.html pubmed.mineR] ===
Text mining of PubMed Abstracts (http://www.ncbi.nlm.nih.gov/pubmed). The algorithms are designed for two formats (text and XML) from PubMed.
Text mining of PubMed Abstracts (http://www.ncbi.nlm.nih.gov/pubmed). The algorithms are designed for two formats (text and XML) from PubMed.


[https://github.com/jtleek/swfdr R code for scraping the P-values from pubmed, calculating the Science-wise False Discovery Rate, et al] (Jeff Leek)
[https://github.com/jtleek/swfdr R code for scraping the P-values from pubmed, calculating the Science-wise False Discovery Rate, et al] (Jeff Leek)


=== These R packages import sports, weather, stock data and more ===
== These R packages import sports, weather, stock data and more ==
* https://www.computerworld.com/article/3109890/data-analytics/these-r-packages-import-sports-weather-stock-data-and-more.html
* https://www.computerworld.com/article/3109890/data-analytics/these-r-packages-import-sports-weather-stock-data-and-more.html
* https://github.com/ALShum/rwunderground
* https://github.com/ALShum/rwunderground
Line 398: Line 398:
* <strike>http://ram-n.github.io/weatherData/ </strike>  (not working)
* <strike>http://ram-n.github.io/weatherData/ </strike>  (not working)


=== Diving Into Dynamic Website Content with splashr ===
== Diving Into Dynamic Website Content with splashr ==
https://rud.is/b/2017/02/09/diving-into-dynamic-website-content-with-splashr/
https://rud.is/b/2017/02/09/diving-into-dynamic-website-content-with-splashr/


=== Send email ===
== Send email ==
==== [https://github.com/rpremraj/mailR/ mailR] ====
=== [https://github.com/rpremraj/mailR/ mailR] ===
Easiest. Require rJava package (not trivial to install, see [[#RJava|rJava]]). mailR is an interface to Apache Commons Email to send emails from within R. See also [http://unamatematicaseltigre.blogspot.com/2016/12/how-to-send-bulk-email-to-your-students.html send bulk email]
Easiest. Require rJava package (not trivial to install, see [[#RJava|rJava]]). mailR is an interface to Apache Commons Email to send emails from within R. See also [http://unamatematicaseltigre.blogspot.com/2016/12/how-to-send-bulk-email-to-your-students.html send bulk email]


Line 421: Line 421:
[https://r-bar.net/mailr-smtp-webmail-starttls-tls-ssl/ MailR SMTP Setup (Gmail, Outlook, Yahoo) | STARTTLS]
[https://r-bar.net/mailr-smtp-webmail-starttls-tls-ssl/ MailR SMTP Setup (Gmail, Outlook, Yahoo) | STARTTLS]


==== [https://cran.r-project.org/web/packages/gmailr/index.html gmailr] ====
=== [https://cran.r-project.org/web/packages/gmailr/index.html gmailr] ===
More complicated. gmailr provides access the Google's gmail.com RESTful API. [https://cran.r-project.org/web/packages/gmailr/vignettes/sending_messages.html Vignette] and an example on [http://stackoverflow.com/questions/30144876/send-html-message-using-gmailr here]. Note that it does not use a password; it uses a '''json''' file for oauth authentication downloaded from https://console.cloud.google.com/. See also https://github.com/jimhester/gmailr/issues/1.
More complicated. gmailr provides access the Google's gmail.com RESTful API. [https://cran.r-project.org/web/packages/gmailr/vignettes/sending_messages.html Vignette] and an example on [http://stackoverflow.com/questions/30144876/send-html-message-using-gmailr here]. Note that it does not use a password; it uses a '''json''' file for oauth authentication downloaded from https://console.cloud.google.com/. See also https://github.com/jimhester/gmailr/issues/1.
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 435: Line 435:
</syntaxhighlight>
</syntaxhighlight>


==== [https://cran.r-project.org/web/packages/sendmailR/index.html sendmailR] ====  
=== [https://cran.r-project.org/web/packages/sendmailR/index.html sendmailR] ===  
sendmailR provides a simple SMTP client. It is not clear how to use the package (i.e. where to enter the password).
sendmailR provides a simple SMTP client. It is not clear how to use the package (i.e. where to enter the password).


==== emayili ====
=== emayili ===
[https://datawookie.netlify.com/blog/2019/05/emayili-sending-email-from-r/ emayili: Sending Email from R]
[https://datawookie.netlify.com/blog/2019/05/emayili-sending-email-from-r/ emayili: Sending Email from R]


=== [http://www.ncbi.nlm.nih.gov/geo/ GEO (Gene Expression Omnibus)] ===
== [http://www.ncbi.nlm.nih.gov/geo/ GEO (Gene Expression Omnibus)] ==
See [[GEO#R_packages|this internal link]].
See [[GEO#R_packages|this internal link]].


=== Interactive html output ===
== Interactive html output ==
==== [http://cran.r-project.org/web/packages/sendplot/index.html sendplot] ====
=== [http://cran.r-project.org/web/packages/sendplot/index.html sendplot] ===
==== [http://cran.r-project.org/web/packages/RIGHT/index.html RIGHT] ====
=== [http://cran.r-project.org/web/packages/RIGHT/index.html RIGHT] ===
The supported plot types include scatterplot, barplot, box plot, line plot and pie plot.
The supported plot types include scatterplot, barplot, box plot, line plot and pie plot.


In addition to tooltip boxes, the package can create a [http://righthelp.github.io/tutorial/interactivity table showing all information about selected nodes].
In addition to tooltip boxes, the package can create a [http://righthelp.github.io/tutorial/interactivity table showing all information about selected nodes].


==== r2d3 ====
=== r2d3 ===
[https://blog.rstudio.com/2018/10/05/r2d3-r-interface-to-d3-visualizations/ r2d3] - R Interface to D3 Visualizations
[https://blog.rstudio.com/2018/10/05/r2d3-r-interface-to-d3-visualizations/ r2d3] - R Interface to D3 Visualizations


==== [http://cran.r-project.org/web/packages/d3Network/index.html d3Network] ====
=== [http://cran.r-project.org/web/packages/d3Network/index.html d3Network] ===
* http://christophergandrud.github.io/d3Network/ (old)
* http://christophergandrud.github.io/d3Network/ (old)
* https://christophergandrud.github.io/networkD3/ (new)
* https://christophergandrud.github.io/networkD3/ (new)
Line 467: Line 467:
</source>
</source>


==== [http://cran.r-project.org/web/packages/htmlwidgets/ htmlwidgets for R] ====
=== [http://cran.r-project.org/web/packages/htmlwidgets/ htmlwidgets for R] ===
Embed widgets in R Markdown documents and Shiny web applications.  
Embed widgets in R Markdown documents and Shiny web applications.  


Line 473: Line 473:
* [http://deanattali.com/blog/htmlwidgets-tips/ How to write a useful htmlwidgets in R: tips and walk-through a real example]
* [http://deanattali.com/blog/htmlwidgets-tips/ How to write a useful htmlwidgets in R: tips and walk-through a real example]


==== [http://cran.r-project.org/web/packages/networkD3/index.html networkD3] ====
=== [http://cran.r-project.org/web/packages/networkD3/index.html networkD3] ===
This is a port of Christopher Gandrud's [http://christophergandrud.github.io/d3Network/ d3Network] package to the htmlwidgets framework.
This is a port of Christopher Gandrud's [http://christophergandrud.github.io/d3Network/ d3Network] package to the htmlwidgets framework.


==== [http://cran.r-project.org/web/packages/scatterD3/index.html scatterD3] ====
=== [http://cran.r-project.org/web/packages/scatterD3/index.html scatterD3] ===
scatterD3 is an HTML R widget for interactive scatter plots visualization. It is based on the htmlwidgets R package and on the d3.js javascript library.
scatterD3 is an HTML R widget for interactive scatter plots visualization. It is based on the htmlwidgets R package and on the d3.js javascript library.


==== dygraphs ====
=== dygraphs ===
* https://github.com/rstudio/dygraphs
* https://github.com/rstudio/dygraphs
* [https://www.r-bloggers.com/visualizations-for-algorithmic-trading-in-r/ Visualizations for Algorithmic Trading in R]
* [https://www.r-bloggers.com/visualizations-for-algorithmic-trading-in-r/ Visualizations for Algorithmic Trading in R]


==== [https://github.com/bwlewis/rthreejs rthreejs] - Create interactive 3D scatter plots, network plots, and globes ====
=== [https://github.com/bwlewis/rthreejs rthreejs] - Create interactive 3D scatter plots, network plots, and globes ===
[http://bwlewis.github.io/rthreejs/ Examples]
[http://bwlewis.github.io/rthreejs/ Examples]


==== d3heatmap ====
=== d3heatmap ===
See [[Heatmap#d3heatmap|R]]
See [[Heatmap#d3heatmap|R]]


==== [https://cran.r-project.org/web/packages/svgPanZoom/index.html svgPanZoom] ====
=== [https://cran.r-project.org/web/packages/svgPanZoom/index.html svgPanZoom] ===
This 'htmlwidget' provides pan and zoom interactivity to R graphics, including 'base', 'lattice', and 'ggplot2'. The interactivity is provided through the 'svg-pan-zoom.js' library.
This 'htmlwidget' provides pan and zoom interactivity to R graphics, including 'base', 'lattice', and 'ggplot2'. The interactivity is provided through the 'svg-pan-zoom.js' library.


==== DT: An R interface to the DataTables library ====
=== DT: An R interface to the DataTables library ===
* http://blog.rstudio.org/2015/06/24/dt-an-r-interface-to-the-datatables-library/
* http://blog.rstudio.org/2015/06/24/dt-an-r-interface-to-the-datatables-library/


==== plotly ====
=== plotly ===
* [http://moderndata.plot.ly/power-curves-r-plotly-ggplot2/ Power curves] and ggplot2.
* [http://moderndata.plot.ly/power-curves-r-plotly-ggplot2/ Power curves] and ggplot2.
* [http://moderndata.plot.ly/time-series-charts-by-the-economist-in-r-using-plotly/ TIME SERIES CHARTS BY THE ECONOMIST IN R USING PLOTLY] & [https://moderndata.plot.ly/interactive-r-visualizations-with-d3-ggplot2-rstudio/ FIVE INTERACTIVE R VISUALIZATIONS WITH D3, GGPLOT2, & RSTUDIO]
* [http://moderndata.plot.ly/time-series-charts-by-the-economist-in-r-using-plotly/ TIME SERIES CHARTS BY THE ECONOMIST IN R USING PLOTLY] & [https://moderndata.plot.ly/interactive-r-visualizations-with-d3-ggplot2-rstudio/ FIVE INTERACTIVE R VISUALIZATIONS WITH D3, GGPLOT2, & RSTUDIO]
Line 508: Line 508:
* [https://blog.methodsconsultants.com/posts/introduction-to-interactive-graphics-in-r-with-plotly/ Introduction to Interactive Graphics in R with plotly]
* [https://blog.methodsconsultants.com/posts/introduction-to-interactive-graphics-in-r-with-plotly/ Introduction to Interactive Graphics in R with plotly]


=== Amazon ===
== Amazon ==
[https://github.com/56north/Rmazon Download product information and reviews from Amazon.com]
[https://github.com/56north/Rmazon Download product information and reviews from Amazon.com]
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Line 548: Line 548:
</syntaxhighlight>
</syntaxhighlight>


=== [https://cran.r-project.org/web/packages/gutenbergr/index.html gutenbergr] ===
== [https://cran.r-project.org/web/packages/gutenbergr/index.html gutenbergr] ==
[https://blog.jumpingrivers.com/posts/2018/tidytext_edinbr_2018/ Edinbr: Text Mining with R]
[https://blog.jumpingrivers.com/posts/2018/tidytext_edinbr_2018/ Edinbr: Text Mining with R]


=== Twitter ===
== Twitter ==
[http://www.masalmon.eu/2017/03/19/facesofr/ Faces of #rstats Twitter]
[http://www.masalmon.eu/2017/03/19/facesofr/ Faces of #rstats Twitter]


=== OCR ===
== OCR ==
* [http://ropensci.org/blog/blog/2016/11/16/tesseract Tesseract package: High Quality OCR in R], [https://www.r-bloggers.com/how-to-do-optical-character-recognition-ocr-of-non-english-documents-in-r-using-tesseract/ How to do Optical Character Recognition (OCR) of non-English documents in R using Tesseract?]
* [http://ropensci.org/blog/blog/2016/11/16/tesseract Tesseract package: High Quality OCR in R], [https://www.r-bloggers.com/how-to-do-optical-character-recognition-ocr-of-non-english-documents-in-r-using-tesseract/ How to do Optical Character Recognition (OCR) of non-English documents in R using Tesseract?]
* https://cran.r-project.org/web/packages/abbyyR/index.html
* https://cran.r-project.org/web/packages/abbyyR/index.html


=== Wikipedia ===
== Wikipedia ==
[https://github.com/ironholds/wikipedir WikipediR]: R's MediaWiki API client library
[https://github.com/ironholds/wikipedir WikipediR]: R's MediaWiki API client library


== Creating local repository for CRAN and Bioconductor ==
= Creating local repository for CRAN and Bioconductor =
[[R_repository|R repository]]
[[R_repository|R repository]]


== Parallel Computing ==
= Parallel Computing =


# [http://shop.oreilly.com/product/0636920021421.do Example code] for the book Parallel R by McCallum and Weston.
# [http://shop.oreilly.com/product/0636920021421.do Example code] for the book Parallel R by McCallum and Weston.
Line 571: Line 571:
# [http://danielmarcelino.com/parallel-processing/Parallel Processing: When does it worth?]
# [http://danielmarcelino.com/parallel-processing/Parallel Processing: When does it worth?]


=== Security warning from Windows/Mac ===
== Security warning from Windows/Mac ==
It seems it is safe to choose 'Cancel' when Windows Firewall tried to block R program when we use '''makeCluster()''' to create a socket cluster.
It seems it is safe to choose 'Cancel' when Windows Firewall tried to block R program when we use '''makeCluster()''' to create a socket cluster.
<pre>
<pre>
Line 583: Line 583:
If we like to see current firewall settings, just click Windows Start button, search 'Firewall' and choose 'Windows Firewall with Advanced Security'. In the 'Inbound Rules', we can see what programs (like, R for Windows GUI front-end, or Rserve) are among the rules. These rules are called 'private' in the 'Profile' column. Note that each of them may appear twice because one is 'TCP' protocol and the other one has a 'UDP' protocol.
If we like to see current firewall settings, just click Windows Start button, search 'Firewall' and choose 'Windows Firewall with Advanced Security'. In the 'Inbound Rules', we can see what programs (like, R for Windows GUI front-end, or Rserve) are among the rules. These rules are called 'private' in the 'Profile' column. Note that each of them may appear twice because one is 'TCP' protocol and the other one has a 'UDP' protocol.


=== Detect number of cores ===
== Detect number of cores ==
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
parallel::detectCores()
parallel::detectCores()
Line 593: Line 593:
On NIH's biowulf, even I specify an interactive session with 4 cores, the parallel::detectCores() function returns 56. This number is the same as the output from the bash command '''grep processor /proc/cpuinfo''' or (better) '''lscpu'''. The '''free -hm''' also returns a full 125GB size instead of my requested size (4GB by default). The '''future::availableCores()''' fixes the problem. See [https://hpc.nih.gov/apps/R.html#parallel Biowulf's R webpage] for a detailed instructure.
On NIH's biowulf, even I specify an interactive session with 4 cores, the parallel::detectCores() function returns 56. This number is the same as the output from the bash command '''grep processor /proc/cpuinfo''' or (better) '''lscpu'''. The '''free -hm''' also returns a full 125GB size instead of my requested size (4GB by default). The '''future::availableCores()''' fixes the problem. See [https://hpc.nih.gov/apps/R.html#parallel Biowulf's R webpage] for a detailed instructure.


=== How can I print when using %dopar% ===
== How can I print when using %dopar% ==
https://stackoverflow.com/a/15078540
https://stackoverflow.com/a/15078540


Line 619: Line 619:
Setting outfile to the empty string ("") will prevent doParallel (or snow whatever) from redirecting the output, often resulting in the output from your print messages showing up on the terminal of the master process. Your foreach loop doesn't need to change at all.
Setting outfile to the empty string ("") will prevent doParallel (or snow whatever) from redirecting the output, often resulting in the output from your print messages showing up on the terminal of the master process. Your foreach loop doesn't need to change at all.


=== parallel package (including parLapply, parSapply) ===
== parallel package (including parLapply, parSapply) ==
Parallel package was included in R 2.14.0. It is derived from the snow and multicore packages and provides many of the same functions as those packages.
Parallel package was included in R 2.14.0. It is derived from the snow and multicore packages and provides many of the same functions as those packages.


Line 667: Line 667:
</syntaxhighlight>
</syntaxhighlight>


==== mclapply() from the 'parallel' package is a mult-core version of lapply() ====
=== mclapply() from the 'parallel' package is a mult-core version of lapply() ===
* Be providing the number of cores in mclapply() using '''mc.cores''' argument (2 is used by default)
* Be providing the number of cores in mclapply() using '''mc.cores''' argument (2 is used by default)
* Be careful on the need and the side-effect of using "L'Ecuyer-CMRG" seed.  
* Be careful on the need and the side-effect of using "L'Ecuyer-CMRG" seed.  
Line 703: Line 703:
</syntaxhighlight>It does work. Cut the computing time from 100 sec to 29 sec on 4 cores.
</syntaxhighlight>It does work. Cut the computing time from 100 sec to 29 sec on 4 cores.


==== mclapply() vs foreach() ====
=== mclapply() vs foreach() ===
https://stackoverflow.com/questions/44806048/r-mclapply-vs-foreach
https://stackoverflow.com/questions/44806048/r-mclapply-vs-foreach


==== parallel vs doParallel package ====
=== parallel vs doParallel package ===
doParallel is a '''foreach''' parallel adaptor for the '''parallel''' package.
doParallel is a '''foreach''' parallel adaptor for the '''parallel''' package.


Line 713: Line 713:
doParallel depends on foreach, iterators, parallel and utils packages.
doParallel depends on foreach, iterators, parallel and utils packages.


==== parallelsugar package ====
=== parallelsugar package ===
* http://edustatistics.org/nathanvan/2015/10/14/parallelsugar-an-implementation-of-mclapply-for-windows/
* http://edustatistics.org/nathanvan/2015/10/14/parallelsugar-an-implementation-of-mclapply-for-windows/


Line 738: Line 738:
</syntaxhighlight>
</syntaxhighlight>


=== [http://cran.r-project.org/web/packages/snow/index.html snow] package ===
== [http://cran.r-project.org/web/packages/snow/index.html snow] package ==


Supported cluster types are "SOCK", "PVM", "MPI", and "NWS".
Supported cluster types are "SOCK", "PVM", "MPI", and "NWS".


=== [http://cran.r-project.org/web/packages/multicore/index.html multicore] package ===
== [http://cran.r-project.org/web/packages/multicore/index.html multicore] package ==
This package is removed from CRAN.  
This package is removed from CRAN.  


Consider using package ‘parallel’ instead.
Consider using package ‘parallel’ instead.


=== [http://cran.r-project.org/web/packages/foreach/index.html foreach] package ===
== [http://cran.r-project.org/web/packages/foreach/index.html foreach] package ==
This package depends on one of the following  
This package depends on one of the following  
* doParallel - Foreach parallel adaptor for the parallel package
* doParallel - Foreach parallel adaptor for the parallel package
Line 775: Line 775:
* [http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/#The_foreach_package How-to go parallel in R – basics + tips]
* [http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/#The_foreach_package How-to go parallel in R – basics + tips]


==== combine list of lists ====
=== combine list of lists ===
* .combine argument https://stackoverflow.com/questions/27279164/output-list-of-two-rbinded-data-frames-with-foreach-in-r
* .combine argument https://stackoverflow.com/questions/27279164/output-list-of-two-rbinded-data-frames-with-foreach-in-r
* [https://stackoverflow.com/questions/9519543/merge-two-lists-in-r Merge lists] by mapply() or base::Map()
* [https://stackoverflow.com/questions/9519543/merge-two-lists-in-r Merge lists] by mapply() or base::Map()
Line 832: Line 832:
</syntaxhighlight>
</syntaxhighlight>


==== Replacing double loops ====
=== Replacing double loops ===
* https://stackoverflow.com/questions/30927693/how-can-i-parallelize-a-double-for-loop-in-r
* https://stackoverflow.com/questions/30927693/how-can-i-parallelize-a-double-for-loop-in-r
* http://www.exegetic.biz/blog/2013/08/the-wonders-of-foreach/
* http://www.exegetic.biz/blog/2013/08/the-wonders-of-foreach/
Line 860: Line 860:
Note that since the random seed (see the next session) does not work on nested loop, it is better to convert nested loop (two indices) to a single loop (one index).
Note that since the random seed (see the next session) does not work on nested loop, it is better to convert nested loop (two indices) to a single loop (one index).


==== set seed and [https://cran.r-project.org/web/packages/doRNG/ doRNG] package ====
=== set seed and [https://cran.r-project.org/web/packages/doRNG/ doRNG] package ===
* [https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf Vignette], [https://www.rdocumentation.org/packages/doRNG/versions/1.7.1/topics/doRNG-package Documentation]
* [https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf Vignette], [https://www.rdocumentation.org/packages/doRNG/versions/1.7.1/topics/doRNG-package Documentation]
* [http://michaeljkoontz.weebly.com/uploads/1/9/9/4/19940979/parallel.pdf#page=4 doRNG] package example
* [http://michaeljkoontz.weebly.com/uploads/1/9/9/4/19940979/parallel.pdf#page=4 doRNG] package example
Line 885: Line 885:
</syntaxhighlight>
</syntaxhighlight>


==== Export libraries, variables, functions ====
=== Export libraries, variables, functions ===
* http://stat.ethz.ch/R-manual/R-devel/library/parallel/html/clusterApply.html
* http://stat.ethz.ch/R-manual/R-devel/library/parallel/html/clusterApply.html
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 896: Line 896:
</syntaxhighlight>
</syntaxhighlight>


==== Summary the result ====
=== Summary the result ===
foreach returns the result in a list. For example, if each component is a matrix we can use
foreach returns the result in a list. For example, if each component is a matrix we can use


Line 904: Line 904:
to get the average of matrices over the list.
to get the average of matrices over the list.


==== Read a list files ====
=== Read a list files ===
[https://danielmarcelino.github.io/2018/reading-list-faster.html Reading List Faster With parallel, doParallel, and pbapply]
[https://danielmarcelino.github.io/2018/reading-list-faster.html Reading List Faster With parallel, doParallel, and pbapply]


==== Don't use too many cores: Error in unserialize(socklist<nowiki>[[n]]</nowiki>) ====
=== Don't use too many cores: Error in unserialize(socklist<nowiki>[[n]]</nowiki>) ===
error reading from connection. See https://stackoverflow.com/a/24352032
error reading from connection. See https://stackoverflow.com/a/24352032


==== Why is foreach() sometimes slower than for? ====
=== Why is foreach() sometimes slower than for? ===
See https://stackoverflow.com/a/10414280 or https://stackoverflow.com/q/16963808
See https://stackoverflow.com/a/10414280 or https://stackoverflow.com/q/16963808


Line 955: Line 955:
</syntaxhighlight>
</syntaxhighlight>


==== Progress bar (doSNOW + foreach) ====
=== Progress bar (doSNOW + foreach) ===
* [https://stackoverflow.com/a/40687156 How to show the progress of code in parallel computation in R?] It looks great.
* [https://stackoverflow.com/a/40687156 How to show the progress of code in parallel computation in R?] It looks great.
* [https://blog.revolutionanalytics.com/2015/02/monitoring-progress-of-a-foreach-parallel-job.html Monitoring progress of a foreach parallel job] It also works on doSNOW but not doParallel backend.
* [https://blog.revolutionanalytics.com/2015/02/monitoring-progress-of-a-foreach-parallel-job.html Monitoring progress of a foreach parallel job] It also works on doSNOW but not doParallel backend.


=== snowfall package ===
== snowfall package ==
http://www.imbi.uni-freiburg.de/parallel/docs/Reisensburg2009_TutParallelComputing_Knaus_Porzelius.pdf
http://www.imbi.uni-freiburg.de/parallel/docs/Reisensburg2009_TutParallelComputing_Knaus_Porzelius.pdf


=== [http://cran.r-project.org/web/packages/Rmpi/index.html Rmpi] package ===
== [http://cran.r-project.org/web/packages/Rmpi/index.html Rmpi] package ==
Some examples/tutorials
Some examples/tutorials


Line 974: Line 974:
* * http://biowulf.nih.gov/apps/R.html
* * http://biowulf.nih.gov/apps/R.html


=== OpenMP ===
== OpenMP ==
* [http://www.parallelr.com/r-and-openmp-boosting-compiled-code-on-multi-core-cpu-s/ R and openMP: boosting compiled code on multi-core cpu-s] from parallelr.com.
* [http://www.parallelr.com/r-and-openmp-boosting-compiled-code-on-multi-core-cpu-s/ R and openMP: boosting compiled code on multi-core cpu-s] from parallelr.com.


=== [http://www.bioconductor.org/packages/release/bioc/html/BiocParallel.html BiocParallel] ===
== [http://www.bioconductor.org/packages/release/bioc/html/BiocParallel.html BiocParallel] ==
* [http://rpubs.com/seandavi/KallistoFromR Orchestrating a small, parallel, RNA-seq pre-processing workflow using R]
* [http://rpubs.com/seandavi/KallistoFromR Orchestrating a small, parallel, RNA-seq pre-processing workflow using R]


=== [https://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] ===
== [https://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] ==


=== future & [https://cran.r-project.org/web/packages/future.apply/index.html future.apply] & doFuture packages ===
== future & [https://cran.r-project.org/web/packages/future.apply/index.html future.apply] & doFuture packages ==
* [https://alexioannides.com/2016/11/02/asynchronous-and-distributed-programming-in-r-with-the-future-package/ Asynchronous and Distributed Programming in R with the Future Package]
* [https://alexioannides.com/2016/11/02/asynchronous-and-distributed-programming-in-r-with-the-future-package/ Asynchronous and Distributed Programming in R with the Future Package]
* [https://www.jottr.org/2018/06/23/future.apply_1.0.0/ Parallelize Any Base R Apply Function]
* [https://www.jottr.org/2018/06/23/future.apply_1.0.0/ Parallelize Any Base R Apply Function]
Line 988: Line 988:
* [https://blog.revolutionanalytics.com/2019/01/future-package.html Use foreach with HPC schedulers thanks to the future package]
* [https://blog.revolutionanalytics.com/2019/01/future-package.html Use foreach with HPC schedulers thanks to the future package]


=== Apache Spark ===
== Apache Spark ==
* [http://files.meetup.com/3576292/Dubravko%20Dulic%20SparkR%20June%202016.pdf Introduction to Apache Spark]
* [http://files.meetup.com/3576292/Dubravko%20Dulic%20SparkR%20June%202016.pdf Introduction to Apache Spark]


=== Microsoft R Server ===
== Microsoft R Server ==
* [http://files.meetup.com/3576292/Stefan%20Cronjaeger%20R%20Server.pdf Microsoft R '''Server'''] (not Microsoft R Open)
* [http://files.meetup.com/3576292/Stefan%20Cronjaeger%20R%20Server.pdf Microsoft R '''Server'''] (not Microsoft R Open)


=== GPU ===
== GPU ==
* [http://www.parallelr.com/r-gpu-programming-for-all-with-gpur/ GPU Programming for All with ‘gpuR] from parallelr.com. The gpuR is available on [https://cran.r-project.org/web/packages/gpuR/index.html CRAN].
* [http://www.parallelr.com/r-gpu-programming-for-all-with-gpur/ GPU Programming for All with ‘gpuR] from parallelr.com. The gpuR is available on [https://cran.r-project.org/web/packages/gpuR/index.html CRAN].
* [https://cran.r-project.org/web/packages/gputools/index.html gputools]
* [https://cran.r-project.org/web/packages/gputools/index.html gputools]


=== Threads ===
== Threads ==
* [https://cran.r-project.org/web/packages/Rdsm/index.html Rdsm] package
* [https://cran.r-project.org/web/packages/Rdsm/index.html Rdsm] package
* [https://random-remarks.net/2016/12/11/a-very-experimental-threading-in-r/ (A Very) Experimental Threading in R] and a post from [https://matloff.wordpress.com/2016/12/11/threading-in-r/ Mad Scientist]
* [https://random-remarks.net/2016/12/11/a-very-experimental-threading-in-r/ (A Very) Experimental Threading in R] and a post from [https://matloff.wordpress.com/2016/12/11/threading-in-r/ Mad Scientist]


=== Benchmark ===
== Benchmark ==
[http://rpsychologist.com/benchmark-parallel-sim Are parallel simulations in the cloud worth it? Benchmarking my MBP vs my Workstation vs Amazon EC2]
[http://rpsychologist.com/benchmark-parallel-sim Are parallel simulations in the cloud worth it? Benchmarking my MBP vs my Workstation vs Amazon EC2]


Line 1,016: Line 1,016:
</syntaxhighlight>
</syntaxhighlight>


== Cloud Computing ==
= Cloud Computing =


=== Install R on Amazon EC2 ===
== Install R on Amazon EC2 ==
http://randyzwitch.com/r-amazon-ec2/
http://randyzwitch.com/r-amazon-ec2/


=== Bioconductor on Amazon EC2 ===
== Bioconductor on Amazon EC2 ==
http://www.bioconductor.org/help/bioconductor-cloud-ami/
http://www.bioconductor.org/help/bioconductor-cloud-ami/


== Big Data Analysis ==
= Big Data Analysis =
* [https://cran.r-project.org/web/views/HighPerformanceComputing.html CRAN Task View: High-Performance and Parallel Computing with R]
* [https://cran.r-project.org/web/views/HighPerformanceComputing.html CRAN Task View: High-Performance and Parallel Computing with R]
* [http://www.xmind.net/m/LKF2/ R for big data] in one picture
* [http://www.xmind.net/m/LKF2/ R for big data] in one picture
Line 1,033: Line 1,033:
** [https://olcf.ornl.gov/wp-content/uploads/2016/01/pbdr.pdf Programming with Big Data in R - pbdR] George Ostrouchov and Mike Matheson Oak Ridge National Laboratory
** [https://olcf.ornl.gov/wp-content/uploads/2016/01/pbdr.pdf Programming with Big Data in R - pbdR] George Ostrouchov and Mike Matheson Oak Ridge National Laboratory


=== bigmemory, biganalytics, bigtabulate ===
== bigmemory, biganalytics, bigtabulate ==


=== ff, ffbase ===
== ff, ffbase ==
* tapply does not work. [https://stackoverflow.com/questions/16470677/using-tapply-ave-functions-for-ff-vectors-in-r Using tapply, ave functions for ff vectors in R]
* tapply does not work. [https://stackoverflow.com/questions/16470677/using-tapply-ave-functions-for-ff-vectors-in-r Using tapply, ave functions for ff vectors in R]
* [http://www.bnosac.be/index.php/blog/12-popularity-bigdata-large-data-packages-in-r-and-ffbase-user-presentation Popularity bigdata / large data packages in R and ffbase useR presentation]
* [http://www.bnosac.be/index.php/blog/12-popularity-bigdata-large-data-packages-in-r-and-ffbase-user-presentation Popularity bigdata / large data packages in R and ffbase useR presentation]
Line 1,041: Line 1,041:
* [https://www.rdocumentation.org/packages/ffbase/versions/0.12.7/topics/ffbase-package ffbase] package
* [https://www.rdocumentation.org/packages/ffbase/versions/0.12.7/topics/ffbase-package ffbase] package


=== biglm ===
== biglm ==


=== data.table ===
== data.table ==
See [[#data.table_2|data.table]].
See [[#data.table_2|data.table]].


== Reproducible Research ==
= Reproducible Research =
http://cran.r-project.org/web/views/ReproducibleResearch.html
http://cran.r-project.org/web/views/ReproducibleResearch.html


=== Reproducible Environments ===
== Reproducible Environments ==
https://rviews.rstudio.com/2019/04/22/reproducible-environments/
https://rviews.rstudio.com/2019/04/22/reproducible-environments/


== Useful R packages ==
= Useful R packages =
* [https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages Quick list of useful R packages]
* [https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages Quick list of useful R packages]
* [https://github.com/qinwf/awesome-R awesome-R]
* [https://github.com/qinwf/awesome-R awesome-R]
* [https://stevenmortimer.com/one-r-package-a-day/ One R package a day]
* [https://stevenmortimer.com/one-r-package-a-day/ One R package a day]


=== RInside ===
== RInside ==
* http://dirk.eddelbuettel.com/code/rinside.html
* http://dirk.eddelbuettel.com/code/rinside.html
* http://dirk.eddelbuettel.com/papers/rfinance2010_rcpp_rinside_tutorial_handout.pdf
* http://dirk.eddelbuettel.com/papers/rfinance2010_rcpp_rinside_tutorial_handout.pdf


==== Ubuntu ====
=== Ubuntu ===
With RInside, R can be embedded in a graphical application. For example, $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt directory includes source code of a Qt application to show a kernel density plot with various options like kernel functions, bandwidth and an R command text box to generate the random data. See my demo on [http://www.youtube.com/watch?v=UQ8yKQcPTg0 Youtube]. I have tested this '''qtdensity''' example successfully using Qt 4.8.5.  
With RInside, R can be embedded in a graphical application. For example, $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt directory includes source code of a Qt application to show a kernel density plot with various options like kernel functions, bandwidth and an R command text box to generate the random data. See my demo on [http://www.youtube.com/watch?v=UQ8yKQcPTg0 Youtube]. I have tested this '''qtdensity''' example successfully using Qt 4.8.5.  
# Follow the instruction [[#cairoDevice|cairoDevice]] to install required libraries for cairoDevice package and then cairoDevice itself.
# Follow the instruction [[#cairoDevice|cairoDevice]] to install required libraries for cairoDevice package and then cairoDevice itself.
Line 1,077: Line 1,077:
Then we can go to the browser's address bar and type ''http://localhost:8080'' to see how it works (a screenshot is in [http://dirk.eddelbuettel.com/blog/2011/11/30/ here]).
Then we can go to the browser's address bar and type ''http://localhost:8080'' to see how it works (a screenshot is in [http://dirk.eddelbuettel.com/blog/2011/11/30/ here]).


==== Windows 7 ====
=== Windows 7 ===
To make RInside works on Windows OS, try the following  
To make RInside works on Windows OS, try the following  
# Make sure R is installed under '''C:\''' instead of '''C:\Program Files''' if we don't want to get an error like ''g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory''.
# Make sure R is installed under '''C:\''' instead of '''C:\Program Files''' if we don't want to get an error like ''g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory''.
Line 1,108: Line 1,108:
So the Qt and Wt web tool applications on Windows may or may not be possible.
So the Qt and Wt web tool applications on Windows may or may not be possible.


=== GUI ===
== GUI ==
==== Qt and R ====
=== Qt and R ===
* http://cran.r-project.org/web/packages/qtbase/index.html [https://stat.ethz.ch/pipermail/r-devel/2015-July/071495.html QtDesigner is such a tool, and its output is compatible with the qtbase R package]
* http://cran.r-project.org/web/packages/qtbase/index.html [https://stat.ethz.ch/pipermail/r-devel/2015-July/071495.html QtDesigner is such a tool, and its output is compatible with the qtbase R package]
* http://qtinterfaces.r-forge.r-project.org
* http://qtinterfaces.r-forge.r-project.org


=== tkrplot ===
== tkrplot ==
On Ubuntu, we need to install tk packages, such as by
On Ubuntu, we need to install tk packages, such as by
<pre>
<pre>
Line 1,119: Line 1,119:
</pre>
</pre>


=== reticulate - Interface to 'Python' ===
== reticulate - Interface to 'Python' ==
* https://cran.r-project.org/web/packages/reticulate/index.html, [https://github.com/rstudio/reticulate Github]
* https://cran.r-project.org/web/packages/reticulate/index.html, [https://github.com/rstudio/reticulate Github]
** Using Python in R markdown
** Using Python in R markdown
Line 1,177: Line 1,177:
</pre>
</pre>


=== Hadoop (eg ~100 terabytes) ===
== Hadoop (eg ~100 terabytes) ==
See also [http://cran.r-project.org/web/views/HighPerformanceComputing.html HighPerformanceComputing]
See also [http://cran.r-project.org/web/views/HighPerformanceComputing.html HighPerformanceComputing]


Line 1,189: Line 1,189:
* http://www.r-bloggers.com/search/hadoop
* http://www.r-bloggers.com/search/hadoop


==== [https://github.com/RevolutionAnalytics/RHadoop/wiki RHadoop] ====
=== [https://github.com/RevolutionAnalytics/RHadoop/wiki RHadoop] ===
* [http://www.rdatamining.com/tutorials/r-hadoop-setup-guide RDataMining.com] based on Mac.
* [http://www.rdatamining.com/tutorials/r-hadoop-setup-guide RDataMining.com] based on Mac.
* Ubuntu 12.04 - [http://crishantha.com/wp/?p=1414 Crishantha.com], [http://nikhilshah123sh.blogspot.com/2014/03/setting-up-rhadoop-in-ubuntu-1204.html nikhilshah123sh.blogspot.com].[http://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/ Bighadoop.wordpress] contains an example.
* Ubuntu 12.04 - [http://crishantha.com/wp/?p=1414 Crishantha.com], [http://nikhilshah123sh.blogspot.com/2014/03/setting-up-rhadoop-in-ubuntu-1204.html nikhilshah123sh.blogspot.com].[http://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/ Bighadoop.wordpress] contains an example.
Line 1,196: Line 1,196:
* [http://bigd8ta.com/step-by-step-guide-to-setting-up-an-r-hadoop-system/ Bigd8ta.com] based on Ubuntu 14.04.
* [http://bigd8ta.com/step-by-step-guide-to-setting-up-an-r-hadoop-system/ Bigd8ta.com] based on Ubuntu 14.04.


==== Snowdoop: an alternative to MapReduce algorithm ====
=== Snowdoop: an alternative to MapReduce algorithm ===
* http://matloff.wordpress.com/2014/11/26/how-about-a-snowdoop-package/
* http://matloff.wordpress.com/2014/11/26/how-about-a-snowdoop-package/
* http://matloff.wordpress.com/2014/12/26/snowdooppartools-update/comment-page-1/#comment-665
* http://matloff.wordpress.com/2014/12/26/snowdooppartools-update/comment-page-1/#comment-665


=== [http://cran.r-project.org/web/packages/XML/index.html XML] ===
== [http://cran.r-project.org/web/packages/XML/index.html XML] ==
On Ubuntu, we need to install libxml2-dev before we can install XML package.
On Ubuntu, we need to install libxml2-dev before we can install XML package.
<pre>
<pre>
Line 1,212: Line 1,212:
</pre>
</pre>


==== XML ====
=== XML ===
* http://giventhedata.blogspot.com/2012/06/r-and-web-for-beginners-part-ii-xml-in.html. It gave an example of extracting the XML-values from each XML-tag for all nodes and save them in a data frame using '''xmlSApply()'''.
* http://giventhedata.blogspot.com/2012/06/r-and-web-for-beginners-part-ii-xml-in.html. It gave an example of extracting the XML-values from each XML-tag for all nodes and save them in a data frame using '''xmlSApply()'''.
* http://www.quantumforest.com/2011/10/reading-html-pages-in-r-for-text-processing/
* http://www.quantumforest.com/2011/10/reading-html-pages-in-r-for-text-processing/
Line 1,267: Line 1,267:
This method can be used to monitor new tags/releases from some projects like [https://github.com/Ultimaker/Cura/releases Cura], BWA, Picard, [https://github.com/alexdobin/STAR/releases STAR]. But for some projects like [https://github.com/ncbi/sra-tools sratools] the '''class''' attribute in the '''span''' element ("css-truncate-target") can be different (such as "tag-name").
This method can be used to monitor new tags/releases from some projects like [https://github.com/Ultimaker/Cura/releases Cura], BWA, Picard, [https://github.com/alexdobin/STAR/releases STAR]. But for some projects like [https://github.com/ncbi/sra-tools sratools] the '''class''' attribute in the '''span''' element ("css-truncate-target") can be different (such as "tag-name").


==== xmlview ====
=== xmlview ===
* http://rud.is/b/2016/01/13/cobble-xpath-interactively-with-the-xmlview-package/
* http://rud.is/b/2016/01/13/cobble-xpath-interactively-with-the-xmlview-package/


=== RCurl ===
== RCurl ==
On Ubuntu, we need to install the packages (the first one is for XML package that RCurl suggests)
On Ubuntu, we need to install the packages (the first one is for XML package that RCurl suggests)
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Line 1,278: Line 1,278:
</syntaxhighlight>
</syntaxhighlight>


==== Scrape google scholar results ====
=== Scrape google scholar results ===
https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/googleScholarXScraper/googleScholarXScraper.R
https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/googleScholarXScraper/googleScholarXScraper.R


Line 1,289: Line 1,289:
</pre>
</pre>


==== [https://cran.r-project.org/web/packages/devtools/index.html devtools] ====
=== [https://cran.r-project.org/web/packages/devtools/index.html devtools] ===
'''devtools''' package depends on Curl. It actually depends on some system files. If we just need to install a package, consider the [[#remotes|remotes]] package which was suggested by the [https://cran.r-project.org/web/packages/BiocManager/index.html BiocManager] package.
'''devtools''' package depends on Curl. It actually depends on some system files. If we just need to install a package, consider the [[#remotes|remotes]] package which was suggested by the [https://cran.r-project.org/web/packages/BiocManager/index.html BiocManager] package.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Line 1,301: Line 1,301:
[https://github.com/wch/movies/issues/3 Lazy-load database XXX is corrupt. internal error -3]. It often happens when you use install_github to install a package that's currently loaded; try restarting R and running the app again.
[https://github.com/wch/movies/issues/3 Lazy-load database XXX is corrupt. internal error -3]. It often happens when you use install_github to install a package that's currently loaded; try restarting R and running the app again.


==== [https://github.com/hadley/httr httr] ====
=== [https://github.com/hadley/httr httr] ===
httr imports curl, jsonlite, mime, openssl and R6 packages.
httr imports curl, jsonlite, mime, openssl and R6 packages.


Line 1,324: Line 1,324:
Since httr package was used in many other packages, take a look at how others use it. For example, [https://github.com/ropensci/aRxiv aRxiv] package.
Since httr package was used in many other packages, take a look at how others use it. For example, [https://github.com/ropensci/aRxiv aRxiv] package.


==== [http://cran.r-project.org/web/packages/curl/ curl] ====
=== [http://cran.r-project.org/web/packages/curl/ curl] ===
curl is independent of RCurl package.
curl is independent of RCurl package.


Line 1,340: Line 1,340:
</syntaxhighlight>
</syntaxhighlight>


==== [http://ropensci.org/packages/index.html rOpenSci] packages ====
=== [http://ropensci.org/packages/index.html rOpenSci] packages ===
'''rOpenSci''' contains packages that allow access to data repositories through the R statistical programming environment
'''rOpenSci''' contains packages that allow access to data repositories through the R statistical programming environment


=== [https://cran.r-project.org/web/packages/remotes/index.html remotes] ===
== [https://cran.r-project.org/web/packages/remotes/index.html remotes] ==
Download and install R packages stored in 'GitHub', 'BitBucket', or plain 'subversion' or 'git' repositories. This package is a lightweight replacement of the 'install_*' functions in 'devtools'. Also remotes does not require any extra OS level library (at least on Ubuntu 16.04).
Download and install R packages stored in 'GitHub', 'BitBucket', or plain 'subversion' or 'git' repositories. This package is a lightweight replacement of the 'install_*' functions in 'devtools'. Also remotes does not require any extra OS level library (at least on Ubuntu 16.04).


Line 1,352: Line 1,352:
</syntaxhighlight>
</syntaxhighlight>


=== DirichletMultinomial ===
== DirichletMultinomial ==
On Ubuntu, we do
On Ubuntu, we do
<pre>
<pre>
Line 1,358: Line 1,358:
</pre>
</pre>


=== Create GUI ===
== Create GUI ==
==== [http://cran.r-project.org/web/packages/gWidgets/index.html gWidgets] ====
=== [http://cran.r-project.org/web/packages/gWidgets/index.html gWidgets] ===


=== [http://cran.r-project.org/web/packages/GenOrd/index.html GenOrd]: Generate ordinal and discrete variables with given correlation matrix and marginal distributions ===
== [http://cran.r-project.org/web/packages/GenOrd/index.html GenOrd]: Generate ordinal and discrete variables with given correlation matrix and marginal distributions ==
[http://statistical-research.com/simulating-random-multivariate-correlated-data-categorical-variables/?utm_source=rss&utm_medium=rss&utm_campaign=simulating-random-multivariate-correlated-data-categorical-variables here]
[http://statistical-research.com/simulating-random-multivariate-correlated-data-categorical-variables/?utm_source=rss&utm_medium=rss&utm_campaign=simulating-random-multivariate-correlated-data-categorical-variables here]


=== [http://cran.r-project.org/web/packages/rjson/index.html rjson] ===
== [http://cran.r-project.org/web/packages/rjson/index.html rjson] ==
http://heuristically.wordpress.com/2013/05/20/geolocate-ip-addresses-in-r/
http://heuristically.wordpress.com/2013/05/20/geolocate-ip-addresses-in-r/


=== [http://cran.r-project.org/web/packages/RJSONIO/index.html RJSONIO] ===
== [http://cran.r-project.org/web/packages/RJSONIO/index.html RJSONIO] ==
==== Accessing Bitcoin Data with R ====
=== Accessing Bitcoin Data with R ===
http://blog.revolutionanalytics.com/2015/11/accessing-bitcoin-data-with-r.html
http://blog.revolutionanalytics.com/2015/11/accessing-bitcoin-data-with-r.html


==== Plot IP on google map ====
=== Plot IP on google map ===
* http://thebiobucket.blogspot.com/2011/12/some-fun-with-googlevis-plotting-blog.html#more  (RCurl, RJONIO, plyr, googleVis)
* http://thebiobucket.blogspot.com/2011/12/some-fun-with-googlevis-plotting-blog.html#more  (RCurl, RJONIO, plyr, googleVis)
* http://devblog.icans-gmbh.com/using-the-maxmind-geoip-api-with-r/ (RCurl, RJONIO, maps)
* http://devblog.icans-gmbh.com/using-the-maxmind-geoip-api-with-r/ (RCurl, RJONIO, maps)
Line 1,427: Line 1,427:
[http://jeffreyhorner.tumblr.com/page/3 Jeffrey Horner's note about deploying Rook App].
[http://jeffreyhorner.tumblr.com/page/3 Jeffrey Horner's note about deploying Rook App].


=== Map ===
== Map ==
==== [https://rstudio.github.io/leaflet/ leaflet] ====
=== [https://rstudio.github.io/leaflet/ leaflet] ===
* rstudio.github.io/leaflet/#installation-and-use
* rstudio.github.io/leaflet/#installation-and-use
* https://metvurst.wordpress.com/2015/07/24/mapview-basic-interactive-viewing-of-spatial-data-in-r-6/
* https://metvurst.wordpress.com/2015/07/24/mapview-basic-interactive-viewing-of-spatial-data-in-r-6/


==== choroplethr ====
=== choroplethr ===
* http://blog.revolutionanalytics.com/2014/01/easy-data-maps-with-r-the-choroplethr-package-.html
* http://blog.revolutionanalytics.com/2014/01/easy-data-maps-with-r-the-choroplethr-package-.html
* http://www.arilamstein.com/blog/2015/06/25/learn-to-map-census-data-in-r/
* http://www.arilamstein.com/blog/2015/06/25/learn-to-map-census-data-in-r/
* http://www.arilamstein.com/blog/2015/09/10/user-question-how-to-add-a-state-border-to-a-zip-code-map/
* http://www.arilamstein.com/blog/2015/09/10/user-question-how-to-add-a-state-border-to-a-zip-code-map/


==== ggplot2 ====
=== ggplot2 ===
[https://randomjohn.github.io/r-maps-with-census-data/ How to make maps with Census data in R]
[https://randomjohn.github.io/r-maps-with-census-data/ How to make maps with Census data in R]


=== [http://cran.r-project.org/web/packages/googleVis/index.html googleVis] ===
== [http://cran.r-project.org/web/packages/googleVis/index.html googleVis] ==
See an example from [[R#RJSONIO|RJSONIO]] above.
See an example from [[R#RJSONIO|RJSONIO]] above.


=== [https://cran.r-project.org/web/packages/googleAuthR/index.html googleAuthR] ===
== [https://cran.r-project.org/web/packages/googleAuthR/index.html googleAuthR] ==
Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.
Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.


=== gtrendsR - Google Trends ===
== gtrendsR - Google Trends ==
* [http://blog.revolutionanalytics.com/2015/12/download-and-plot-google-trends-data-with-r.html Download and plot Google Trends data with R]
* [http://blog.revolutionanalytics.com/2015/12/download-and-plot-google-trends-data-with-r.html Download and plot Google Trends data with R]
* [https://datascienceplus.com/analyzing-google-trends-data-in-r/ Analyzing Google Trends Data in R]
* [https://datascienceplus.com/analyzing-google-trends-data-in-r/ Analyzing Google Trends Data in R]
Line 1,455: Line 1,455:
* [http://www.kdnuggets.com/2017/09/python-vs-r-data-science-machine-learning.html?utm_content=buffere1df7&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer Python vs R – Who Is Really Ahead in Data Science, Machine Learning?] and [https://stackoverflow.blog/2017/09/06/incredible-growth-python/ The Incredible Growth of Python] by [https://twitter.com/drob?lang=en David Robinson]
* [http://www.kdnuggets.com/2017/09/python-vs-r-data-science-machine-learning.html?utm_content=buffere1df7&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer Python vs R – Who Is Really Ahead in Data Science, Machine Learning?] and [https://stackoverflow.blog/2017/09/06/incredible-growth-python/ The Incredible Growth of Python] by [https://twitter.com/drob?lang=en David Robinson]


=== quantmod ===
== quantmod ==
[http://www.thertrader.com/2015/12/13/maintaining-a-database-of-price-files-in-r/ Maintaining a database of price files in R]. It consists of 3 steps.
[http://www.thertrader.com/2015/12/13/maintaining-a-database-of-price-files-in-r/ Maintaining a database of price files in R]. It consists of 3 steps.


Line 1,462: Line 1,462:
# Create a batch file
# Create a batch file


=== [http://cran.r-project.org/web/packages/Rcpp/index.html Rcpp] ===
== [http://cran.r-project.org/web/packages/Rcpp/index.html Rcpp] ==


* [http://lists.r-forge.r-project.org/pipermail/rcpp-devel/ Discussion archive]
* [http://lists.r-forge.r-project.org/pipermail/rcpp-devel/ Discussion archive]
Line 1,474: Line 1,474:
</syntaxhighlight>
</syntaxhighlight>


==== Speed Comparison ====
=== Speed Comparison ===
* [http://blog.revolutionanalytics.com/2015/06/a-comparison-of-high-performance-computing-techniques-in-r.html A comparison of high-performance computing techniques in R]. It compares Rcpp to an R looping operator (like mapply), a parallelized version of a looping operator (like mcmapply), explicit parallelization, via the parallel package or the ParallelR suite.
* [http://blog.revolutionanalytics.com/2015/06/a-comparison-of-high-performance-computing-techniques-in-r.html A comparison of high-performance computing techniques in R]. It compares Rcpp to an R looping operator (like mapply), a parallelized version of a looping operator (like mcmapply), explicit parallelization, via the parallel package or the ParallelR suite.
* In the following example, C++ avoids the overhead of creating an intermediate object (eg vector of the same length as the original vector). The c++ uses an intermediate scalar. So C++ wins R over memory management in this case. <syntaxhighlight lang='rsplus'>
* In the following example, C++ avoids the overhead of creating an intermediate object (eg vector of the same length as the original vector). The c++ uses an intermediate scalar. So C++ wins R over memory management in this case. <syntaxhighlight lang='rsplus'>
Line 1,529: Line 1,529:
* [https://www.enchufa2.es/archives/boost-the-speed-of-r-calls-from-rcpp.html Boost the speed of R calls from Rcpp]
* [https://www.enchufa2.es/archives/boost-the-speed-of-r-calls-from-rcpp.html Boost the speed of R calls from Rcpp]


==== Use Rcpp in RStudio ====
=== Use Rcpp in RStudio ===
RStudio makes it easy to use Rcpp package.
RStudio makes it easy to use Rcpp package.


Line 1,587: Line 1,587:
</pre>
</pre>


==== Example 1. convolution example ====
=== Example 1. convolution example ===
First, Rcpp package should be installed (I am working on Linux system). Next we try one example shipped in Rcpp package.
First, Rcpp package should be installed (I am working on Linux system). Next we try one example shipped in Rcpp package.


Line 1,610: Line 1,610:
</pre>
</pre>


==== Example 2. Use together with inline package ====
=== Example 2. Use together with inline package ===
* http://adv-r.had.co.nz/C-interface.html#calling-c-functions-from-r
* http://adv-r.had.co.nz/C-interface.html#calling-c-functions-from-r
<pre>
<pre>
Line 1,631: Line 1,631:
</pre>
</pre>


==== Example 3. Calling an R function ====
=== Example 3. Calling an R function ===


==== [http://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] ====
=== [http://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] ===


==== rTRNG: parallel RNG in R ====
=== rTRNG: parallel RNG in R ===
[https://mirai-solutions.ch/news/2019/06/10/rTRNG-avanced-parallel-RNG-R/ rTRNG: ADVANCED PARALLEL RNG IN R]
[https://mirai-solutions.ch/news/2019/06/10/rTRNG-avanced-parallel-RNG-R/ rTRNG: ADVANCED PARALLEL RNG IN R]


=== [http://cran.r-project.org/web/packages/caret/index.html caret] ===
== [http://cran.r-project.org/web/packages/caret/index.html caret] ==
* http://topepo.github.io/caret/index.html & https://github.com/topepo/caret/
* http://topepo.github.io/caret/index.html & https://github.com/topepo/caret/
* https://www.r-project.org/conferences/useR-2013/Tutorials/kuhn/user_caret_2up.pdf
* https://www.r-project.org/conferences/useR-2013/Tutorials/kuhn/user_caret_2up.pdf
Line 1,644: Line 1,644:
* Cheatsheet https://www.rstudio.com/resources/cheatsheets/
* Cheatsheet https://www.rstudio.com/resources/cheatsheets/


=== Tool for connecting Excel with R ===
== Tool for connecting Excel with R ==
* https://bert-toolkit.com/
* https://bert-toolkit.com/
* [http://www.thertrader.com/2016/11/30/bert-a-newcomer-in-the-r-excel-connection/ BERT: a newcomer in the R Excel connection]
* [http://www.thertrader.com/2016/11/30/bert-a-newcomer-in-the-r-excel-connection/ BERT: a newcomer in the R Excel connection]
* http://blog.revolutionanalytics.com/2018/08/how-to-use-r-with-excel.html
* http://blog.revolutionanalytics.com/2018/08/how-to-use-r-with-excel.html


=== Read/Write Excel files package ===
== Read/Write Excel files package ==
* http://www.milanor.net/blog/?p=779
* http://www.milanor.net/blog/?p=779
* [https://www.displayr.com/how-to-read-an-excel-file-into-r/?utm_medium=Feed&utm_source=Syndication flipAPI]. One useful feature of DownloadXLSX, which is not supported by the readxl package, is that it can read Excel files directly from the URL.  
* [https://www.displayr.com/how-to-read-an-excel-file-into-r/?utm_medium=Feed&utm_source=Syndication flipAPI]. One useful feature of DownloadXLSX, which is not supported by the readxl package, is that it can read Excel files directly from the URL.  
Line 1,723: Line 1,723:
</syntaxhighlight>
</syntaxhighlight>


=== [https://cran.r-project.org/web/packages/readr/ readr] (it is not designed to read Excel files) ===
== [https://cran.r-project.org/web/packages/readr/ readr] (it is not designed to read Excel files) ==


Compared to base equivalents like '''read.csv()''', '''readr''' is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn’t munge the column names.
Compared to base equivalents like '''read.csv()''', '''readr''' is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn’t munge the column names.
Line 1,733: Line 1,733:
Note that '''fread()''' can read-n a selection of the columns.
Note that '''fread()''' can read-n a selection of the columns.


=== Colors ===
== Colors ==
* [https://scales.r-lib.org/ scales] package. This is used in ggplot2 package.
* [https://scales.r-lib.org/ scales] package. This is used in ggplot2 package.
* [http://colorspace.r-forge.r-project.org/articles/colorspace.html colorspace]: A Toolbox for Manipulating and Assessing Colors and Palettes.
* [http://colorspace.r-forge.r-project.org/articles/colorspace.html colorspace]: A Toolbox for Manipulating and Assessing Colors and Palettes.
Line 1,761: Line 1,761:
[[File:GgplotPalette.svg|300px]]
[[File:GgplotPalette.svg|300px]]


=== [http://cran.r-project.org/web/packages/ggplot2/index.html ggplot2] ===
== [http://cran.r-project.org/web/packages/ggplot2/index.html ggplot2] ==
See [[Ggplot2|ggplot2]]
See [[Ggplot2|ggplot2]]


=== Data Manipulation & Tidyverse ===
== Data Manipulation & Tidyverse ==
* [https://www.tidyverse.org/ Tidyverse] Homepage
* [https://www.tidyverse.org/ Tidyverse] Homepage
* [https://www.rstudio.com/resources/webinars/pipelines-for-data-analysis-in-r/ Pipelines for data analysis in R], [https://www.rstudio.com/resources/videos/data-science-in-the-tidyverse/ Data Science in the Tidyverse]
* [https://www.rstudio.com/resources/webinars/pipelines-for-data-analysis-in-r/ Pipelines for data analysis in R], [https://www.rstudio.com/resources/videos/data-science-in-the-tidyverse/ Data Science in the Tidyverse]
Line 1,794: Line 1,794:
** computing summary statistics (pay to view)
** computing summary statistics (pay to view)


==== Install on Ubuntu ====
=== Install on Ubuntu ===
[https://stackoverflow.com/a/46983233 How to install Tidyverse on Ubuntu 16.04 and 17.04]
[https://stackoverflow.com/a/46983233 How to install Tidyverse on Ubuntu 16.04 and 17.04]
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Line 1,807: Line 1,807:
80 R packages will be installed after ''tidyverse'' has been installed.
80 R packages will be installed after ''tidyverse'' has been installed.


==== Install on Raspberry Pi/(ARM based) Chromebook ====
=== Install on Raspberry Pi/(ARM based) Chromebook ===
In additional to the requirements of installing on Ubuntu, I got an error when it is installing a dependent package [https://github.com/r-lib/fs/issues/146 fs]: '''undefined symbol: pthread_atfork'''. The [https://cran.r-project.org/web/packages/fs/index.html fs] package version is 1.2.6. The [https://github.com/r-lib/fs/issues/128#issuecomment-435552967 solution] is to add one line in fs/src/Makevars file and then install the "fs" package using the source on the local machine.
In additional to the requirements of installing on Ubuntu, I got an error when it is installing a dependent package [https://github.com/r-lib/fs/issues/146 fs]: '''undefined symbol: pthread_atfork'''. The [https://cran.r-project.org/web/packages/fs/index.html fs] package version is 1.2.6. The [https://github.com/r-lib/fs/issues/128#issuecomment-435552967 solution] is to add one line in fs/src/Makevars file and then install the "fs" package using the source on the local machine.


==== [http://rpubs.com/danmirman/Rgroup-part1 5 most useful data manipulation functions] ====
=== [http://rpubs.com/danmirman/Rgroup-part1 5 most useful data manipulation functions] ===
* subset() for making subsets of data (natch)
* subset() for making subsets of data (natch)
* merge() for combining data sets in a smart and easy way
* merge() for combining data sets in a smart and easy way
Line 1,817: Line 1,817:
* ddply()-plyr package for doing split-apply-combine operations, which covers a huge swath of the most tricky data operations
* ddply()-plyr package for doing split-apply-combine operations, which covers a huge swath of the most tricky data operations


==== [https://cran.r-project.org/web/packages/data.table/index.html data.table] ====
=== [https://cran.r-project.org/web/packages/data.table/index.html data.table] ===
Fast aggregation of large data (e.g. 100GB in RAM or just several GB size file), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread).
Fast aggregation of large data (e.g. 100GB in RAM or just several GB size file), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread).


Line 1,874: Line 1,874:
Question: how to make use multicore with data.table package?
Question: how to make use multicore with data.table package?


==== reshape & reshape2 (superceded by tidyr package) ====
=== reshape & reshape2 (superceded by tidyr package) ===
* [http://r-exercises.com/2016/07/06/data-shape-transformation-with-reshape/ Data Shape Transformation With Reshape()]
* [http://r-exercises.com/2016/07/06/data-shape-transformation-with-reshape/ Data Shape Transformation With Reshape()]
* Use '''acast()''' function in reshape2 package. It will convert data.frame used for analysis to a table-like data.frame good for display.
* Use '''acast()''' function in reshape2 package. It will convert data.frame used for analysis to a table-like data.frame good for display.
* http://lamages.blogspot.com/2013/10/creating-matrix-from-long-dataframe.html
* http://lamages.blogspot.com/2013/10/creating-matrix-from-long-dataframe.html


==== [http://cran.r-project.org/web/packages/tidyr/index.html tidyr] and benchmark ====
=== [http://cran.r-project.org/web/packages/tidyr/index.html tidyr] and benchmark ===
An evolution of reshape2. It's designed specifically for data tidying (not general reshaping or aggregating) and works well with dplyr data pipelines.
An evolution of reshape2. It's designed specifically for data tidying (not general reshaping or aggregating) and works well with dplyr data pipelines.


Line 1,916: Line 1,916:
In this example, the 'religion' column will not be included (-religion).
In this example, the 'religion' column will not be included (-religion).


==== dplyr, plyr packages ====
=== dplyr, plyr packages ===
* plyr package suffered from being slow in some cases. dplyr addresses this by porting much of the computation to C++. Another additional feature is the ability to work with data stored directly in an external '''database'''. The benefits of doing this are the data can be managed natively in a relational database, queries can be conducted on that database, and only the results of query returned.
* plyr package suffered from being slow in some cases. dplyr addresses this by porting much of the computation to C++. Another additional feature is the ability to work with data stored directly in an external '''database'''. The benefits of doing this are the data can be managed natively in a relational database, queries can be conducted on that database, and only the results of query returned.
* Essential functions: 3 rows functions, 3 column functions and 1 mixed function.
* Essential functions: 3 rows functions, 3 column functions and 1 mixed function.
Line 2,012: Line 2,012:
* [https://www.mango-solutions.com/blog/what-is-tidy-eval-and-why-should-i-care What is tidy eval and why should I care?]
* [https://www.mango-solutions.com/blog/what-is-tidy-eval-and-why-should-i-care What is tidy eval and why should I care?]


==== stringr ====
=== stringr ===
* https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
* https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
* [https://github.com/rstudio/cheatsheets/raw/master/strings.pdf stringr Cheat sheet] (2 pages, this will immediately download the pdf file)
* [https://github.com/rstudio/cheatsheets/raw/master/strings.pdf stringr Cheat sheet] (2 pages, this will immediately download the pdf file)
* [https://csgillespie.github.io/efficientR/data-carpentry.html#regular-expressions Efficient data carpentry &#8594; Regular expressions] from Efficient R programming book by Gillespie & Lovelace.
* [https://csgillespie.github.io/efficientR/data-carpentry.html#regular-expressions Efficient data carpentry &#8594; Regular expressions] from Efficient R programming book by Gillespie & Lovelace.


==== [https://github.com/smbache/magrittr magrittr] ====
=== [https://github.com/smbache/magrittr magrittr] ===
* [https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html Vignettes]
* [https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html Vignettes]
* [http://www.win-vector.com/blog/2018/04/magrittr-and-wrapr-pipes-in-r-an-examination/ magrittr and wrapr Pipes in R, an Examination]
* [http://www.win-vector.com/blog/2018/04/magrittr-and-wrapr-pipes-in-r-an-examination/ magrittr and wrapr Pipes in R, an Examination]
Line 2,108: Line 2,108:
* Functions that use lazy evaluation: tryCatch(), try()
* Functions that use lazy evaluation: tryCatch(), try()


==== outer() ====
=== outer() ===


==== Genomic sequence ====
=== Genomic sequence ===
* chartr
* chartr
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Line 2,118: Line 2,118:
</syntaxhighlight>
</syntaxhighlight>


==== broom ====
=== broom ===
[https://cran.r-project.org/web/packages/broom/index.html broom]: Convert Statistical Analysis Objects into Tidy Tibbles
[https://cran.r-project.org/web/packages/broom/index.html broom]: Convert Statistical Analysis Objects into Tidy Tibbles


==== lobstr package - dig into the internal representation and structure of R objects ====
=== lobstr package - dig into the internal representation and structure of R objects ===
[https://www.tidyverse.org/articles/2018/12/lobstr/ lobstr 1.0.0]
[https://www.tidyverse.org/articles/2018/12/lobstr/ lobstr 1.0.0]


=== Data Science ===
== Data Science ==
See [[Data_science|Data science]] page
See [[Data_science|Data science]] page


=== [http://cran.r-project.org/web/packages/jpeg/index.html jpeg] ===
== [http://cran.r-project.org/web/packages/jpeg/index.html jpeg] ==
If we want to create the image on this wiki left hand side panel, we can use the '''jpeg''' package to read an existing plot and then edit and save it.
If we want to create the image on this wiki left hand side panel, we can use the '''jpeg''' package to read an existing plot and then edit and save it.


We can also use the jpeg package to import and manipulate a jpg image. See [http://moderndata.plot.ly/fun-with-heatmaps-and-plotly/ Fun with Heatmaps and Plotly].
We can also use the jpeg package to import and manipulate a jpg image. See [http://moderndata.plot.ly/fun-with-heatmaps-and-plotly/ Fun with Heatmaps and Plotly].


=== [http://cran.r-project.org/web/packages/Cairo/index.html Cairo] ===
== [http://cran.r-project.org/web/packages/Cairo/index.html Cairo] ==
See [[Heatmap#White_strips_.28artifacts.29|White strips problem]] in png() or tiff().
See [[Heatmap#White_strips_.28artifacts.29|White strips problem]] in png() or tiff().


=== [https://cran.r-project.org/web/packages/cairoDevice/ cairoDevice] ===
== [https://cran.r-project.org/web/packages/cairoDevice/ cairoDevice] ==
PS. Not sure the advantage of functions in this package compared to R's functions (eg. Cairo_svg() vs svg()).
PS. Not sure the advantage of functions in this package compared to R's functions (eg. Cairo_svg() vs svg()).


Line 2,145: Line 2,145:
On Windows OS, we may got the error: '''unable to load shared object 'C:/Program Files/R/R-3.0.2/library/cairoDevice/libs/x64/cairoDevice.dll' '''. We need to follow the instruction in [http://tolstoy.newcastle.edu.au/R/e6/help/09/05/15613.html here].
On Windows OS, we may got the error: '''unable to load shared object 'C:/Program Files/R/R-3.0.2/library/cairoDevice/libs/x64/cairoDevice.dll' '''. We need to follow the instruction in [http://tolstoy.newcastle.edu.au/R/e6/help/09/05/15613.html here].


=== [http://igraph.org/r/ igraph] ===
== [http://igraph.org/r/ igraph] ==
[https://shiring.github.io/genome/2016/12/14/homologous_genes_part2_post creating directed networks with igraph]
[https://shiring.github.io/genome/2016/12/14/homologous_genes_part2_post creating directed networks with igraph]


=== Identifying dependencies of R functions and scripts ===
== Identifying dependencies of R functions and scripts ==
https://stackoverflow.com/questions/8761857/identifying-dependencies-of-r-functions-and-scripts
https://stackoverflow.com/questions/8761857/identifying-dependencies-of-r-functions-and-scripts
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 2,159: Line 2,159:
</syntaxhighlight>
</syntaxhighlight>


=== [http://cran.r-project.org/web/packages/iterators/ iterators] ===
== [http://cran.r-project.org/web/packages/iterators/ iterators] ==
Iterator is useful over for-loop if the data is already a '''collection'''. It can be used to iterate over a vector, data frame, matrix, file
Iterator is useful over for-loop if the data is already a '''collection'''. It can be used to iterate over a vector, data frame, matrix, file


Iterator can be combined to use with foreach package http://www.exegetic.biz/blog/2013/11/iterators-in-r/ has more elaboration.
Iterator can be combined to use with foreach package http://www.exegetic.biz/blog/2013/11/iterators-in-r/ has more elaboration.


=== Colors ===
== Colors ==
* http://www.bauer.uh.edu/parks/truecolor.htm Interactive RGB, Alpha and Color Picker  
* http://www.bauer.uh.edu/parks/truecolor.htm Interactive RGB, Alpha and Color Picker  
* http://deanattali.com/blog/colourpicker-package/ Not sure what it is doing
* http://deanattali.com/blog/colourpicker-package/ Not sure what it is doing
Line 2,171: Line 2,171:
* [http://sape.inf.usi.ch/quick-reference/ggplot2/colour Color names in R]
* [http://sape.inf.usi.ch/quick-reference/ggplot2/colour Color names in R]


==== [http://rpubs.com/gaston/colortools colortools] ====
=== [http://rpubs.com/gaston/colortools colortools] ===
Tools that allow users generate color schemes and palettes
Tools that allow users generate color schemes and palettes


==== [https://github.com/daattali/colourpicker colourpicker] ====
=== [https://github.com/daattali/colourpicker colourpicker] ===
A Colour Picker Tool for Shiny and for Selecting Colours in Plots
A Colour Picker Tool for Shiny and for Selecting Colours in Plots


==== [https://cran.r-project.org/web/packages/inlmisc/index.html inlmisc] ====
=== [https://cran.r-project.org/web/packages/inlmisc/index.html inlmisc] ===
[https://owi.usgs.gov/blog/tolcolors/ GetTolColors()]. Lots of examples.
[https://owi.usgs.gov/blog/tolcolors/ GetTolColors()]. Lots of examples.


=== [https://github.com/kevinushey/rex rex] ===
== [https://github.com/kevinushey/rex rex] ==
Friendly Regular Expressions
Friendly Regular Expressions


=== [http://cran.r-project.org/web/packages/formatR/index.html formatR] ===
== [http://cran.r-project.org/web/packages/formatR/index.html formatR] ==
'''The best strategy to avoid failure is to put comments in complete lines or after complete R expressions.'''
'''The best strategy to avoid failure is to put comments in complete lines or after complete R expressions.'''


Line 2,256: Line 2,256:
</pre>
</pre>


=== Download papers ===
== Download papers ==
==== [http://cran.r-project.org/web/packages/biorxivr/index.html biorxivr] ====
=== [http://cran.r-project.org/web/packages/biorxivr/index.html biorxivr] ===
Search and Download Papers from the bioRxiv Preprint Server
Search and Download Papers from the bioRxiv Preprint Server


==== [http://cran.r-project.org/web/packages/aRxiv/index.html aRxiv] ====
=== [http://cran.r-project.org/web/packages/aRxiv/index.html aRxiv] ===
Interface to the arXiv API
Interface to the arXiv API


==== [https://cran.r-project.org/web/packages/pdftools/index.html pdftools] ====
=== [https://cran.r-project.org/web/packages/pdftools/index.html pdftools] ===
* http://ropensci.org/blog/2016/03/01/pdftools-and-jeroen
* http://ropensci.org/blog/2016/03/01/pdftools-and-jeroen
* http://r-posts.com/how-to-extract-data-from-a-pdf-file-with-r/
* http://r-posts.com/how-to-extract-data-from-a-pdf-file-with-r/
* https://ropensci.org/technotes/2018/12/14/pdftools-20/
* https://ropensci.org/technotes/2018/12/14/pdftools-20/


=== [https://github.com/ColinFay/aside aside]: set it aside ===
== [https://github.com/ColinFay/aside aside]: set it aside ==
An RStudio addin to run long R commands aside your current session.
An RStudio addin to run long R commands aside your current session.


=== Teaching ===
== Teaching ==
* [https://cran.r-project.org/web/packages/smovie/vignettes/smovie-vignette.html smovie]: Some Movies to Illustrate Concepts in Statistics
* [https://cran.r-project.org/web/packages/smovie/vignettes/smovie-vignette.html smovie]: Some Movies to Illustrate Concepts in Statistics


=== Organize R research project ===
== Organize R research project ==
* [https://ntguardian.wordpress.com/2019/02/04/organizing-r-research-projects-cpat-case-study/ Organizing R Research Projects: CPAT, A Case Study]
* [https://ntguardian.wordpress.com/2019/02/04/organizing-r-research-projects-cpat-case-study/ Organizing R Research Projects: CPAT, A Case Study]
* [https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ Project-oriented workflow]. It suggests the [https://github.com/r-lib/here here] package. Don't use '''setwd()''' and '''rm(list = ls())'''.
* [https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ Project-oriented workflow]. It suggests the [https://github.com/r-lib/here here] package. Don't use '''setwd()''' and '''rm(list = ls())'''.
Line 2,283: Line 2,283:
** [https://ropenscilabs.github.io/drake-manual/index.html The drake R Package User Manual]
** [https://ropenscilabs.github.io/drake-manual/index.html The drake R Package User Manual]


==== How to save (and load) datasets in R (.Rdata vs .Rds file) ====
=== How to save (and load) datasets in R (.Rdata vs .Rds file) ===
[https://rcrastinate.rbind.io/post/how-to-save-and-load-data-in-r-an-overview/ How to save (and load) datasets in R: An overview]
[https://rcrastinate.rbind.io/post/how-to-save-and-load-data-in-r-an-overview/ How to save (and load) datasets in R: An overview]


=== Text to speech ===
== Text to speech ==
[https://shirinsplayground.netlify.com/2018/06/googlelanguager/ Text-to-Speech with the googleLanguageR package]
[https://shirinsplayground.netlify.com/2018/06/googlelanguager/ Text-to-Speech with the googleLanguageR package]


=== Weather data ===
== Weather data ==
* [https://github.com/ropensci/prism prism] package
* [https://github.com/ropensci/prism prism] package
* [http://www.weatherbase.com/weather/weather.php3?s=507781&cityname=Rockville-Maryland-United-States-of-America Weatherbase]
* [http://www.weatherbase.com/weather/weather.php3?s=507781&cityname=Rockville-Maryland-United-States-of-America Weatherbase]


=== logR ===
== logR ==
https://github.com/jangorecki/logR
https://github.com/jangorecki/logR


=== Progress bar ===
== Progress bar ==
https://github.com/r-lib/progress#readme
https://github.com/r-lib/progress#readme


Configurable Progress bars, they may include percentage, elapsed time, and/or the estimated completion time. They work in terminals, in 'Emacs' 'ESS', 'RStudio', 'Windows' 'Rgui' and the 'macOS'.
Configurable Progress bars, they may include percentage, elapsed time, and/or the estimated completion time. They work in terminals, in 'Emacs' 'ESS', 'RStudio', 'Windows' 'Rgui' and the 'macOS'.


=== cron ===
== cron ==
[https://github.com/bnosac/cronr cronR]
[https://github.com/bnosac/cronr cronR]


== Different ways of using R ==
= Different ways of using R =


=== 10 things R can do that might surprise you ===
== 10 things R can do that might surprise you ==
https://simplystatistics.org/2019/03/13/10-things-r-can-do-that-might-surprise-you/
https://simplystatistics.org/2019/03/13/10-things-r-can-do-that-might-surprise-you/


=== R call C/C++ ===
== R call C/C++ ==
Mainly talks about .C() and .Call().
Mainly talks about .C() and .Call().


Line 2,320: Line 2,320:
* http://mazamascience.com/WorkingWithData/?p=1099
* http://mazamascience.com/WorkingWithData/?p=1099


=== SEXP ===
== SEXP ==
Some examples from packages
Some examples from packages


* [https://www.bioconductor.org/packages/release/bioc/html/sva.html sva] package has one C code function
* [https://www.bioconductor.org/packages/release/bioc/html/sva.html sva] package has one C code function


=== R call Fortran ===
== R call Fortran ==
* [https://stat.ethz.ch/pipermail/r-devel/2015-March/070851.html R call Fortran 90]
* [https://stat.ethz.ch/pipermail/r-devel/2015-March/070851.html R call Fortran 90]
* [https://www.r-bloggers.com/the-need-for-speed-part-1-building-an-r-package-with-fortran-or-c/ The Need for Speed Part 1: Building an R Package with Fortran (or C)] (Very detailed)
* [https://www.r-bloggers.com/the-need-for-speed-part-1-building-an-r-package-with-fortran-or-c/ The Need for Speed Part 1: Building an R Package with Fortran (or C)] (Very detailed)


=== Embedding R ===
== Embedding R ==


* See [http://cran.r-project.org/doc/manuals/R-exts.html#Linking-GUIs-and-other-front_002dends-to-R Writing for R Extensions] Manual Chapter 8.
* See [http://cran.r-project.org/doc/manuals/R-exts.html#Linking-GUIs-and-other-front_002dends-to-R Writing for R Extensions] Manual Chapter 8.
Line 2,336: Line 2,336:
* https://stat.ethz.ch/pipermail/r-help/attachments/20110729/b7d86ed7/attachment.pl
* https://stat.ethz.ch/pipermail/r-help/attachments/20110729/b7d86ed7/attachment.pl


==== An very simple example (do not return from shell) from Writing R Extensions manual ====
=== An very simple example (do not return from shell) from Writing R Extensions manual ===
The command-line R front-end, R_HOME/bin/exec/R, is one such example. Its source code is in file <src/main/Rmain.c>.
The command-line R front-end, R_HOME/bin/exec/R, is one such example. Its source code is in file <src/main/Rmain.c>.


Line 2,348: Line 2,348:
More examples of embedding can be found in ''tests/Embedding'' directory. Read <index.html> for more information about these test examples.
More examples of embedding can be found in ''tests/Embedding'' directory. Read <index.html> for more information about these test examples.


==== An example from Bioconductor workshop ====
=== An example from Bioconductor workshop ===
* What is covered in this section is different from [[R#Create_a_standalone_Rmath_library|Create and use a standalone Rmath library]].
* What is covered in this section is different from [[R#Create_a_standalone_Rmath_library|Create and use a standalone Rmath library]].
* Use eval() function. See R-Ext [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Unix_002dalikes 8.1] and [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Windows 8.2] and [http://cran.r-project.org/doc/manuals/R-exts.html#Evaluating-R-expressions-from-C 5.11].
* Use eval() function. See R-Ext [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Unix_002dalikes 8.1] and [http://cran.r-project.org/doc/manuals/R-exts.html#Embedding-R-under-Windows 8.2] and [http://cran.r-project.org/doc/manuals/R-exts.html#Evaluating-R-expressions-from-C 5.11].
Line 2,427: Line 2,427:
Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf
Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf


==== Create a Simple Socket Server in R ====
=== Create a Simple Socket Server in R ===
This example is coming from this [http://epub.ub.uni-muenchen.de/2085/1/tr012.pdf paper].  
This example is coming from this [http://epub.ub.uni-muenchen.de/2085/1/tr012.pdf paper].  


Line 2,467: Line 2,467:
</pre>
</pre>


==== [http://www.rforge.net/Rserve/doc.html Rserve] ====
=== [http://www.rforge.net/Rserve/doc.html Rserve] ===
Note the way of launching Rserve is like the way we launch C program when R was embedded in C. See [[R#An_example_from_Bioconductor_workshop|Example from Bioconductor workshop]].
Note the way of launching Rserve is like the way we launch C program when R was embedded in C. See [[R#An_example_from_Bioconductor_workshop|Example from Bioconductor workshop]].


See my [[Rserve]] page.
See my [[Rserve]] page.


==== (Commercial) [http://www.statconn.com/ StatconnDcom] ====
=== (Commercial) [http://www.statconn.com/ StatconnDcom] ===


==== [http://rdotnet.codeplex.com/ R.NET] ====
=== [http://rdotnet.codeplex.com/ R.NET] ===


==== [https://cran.r-project.org/web/packages/rJava/index.html rJava] ====
=== [https://cran.r-project.org/web/packages/rJava/index.html rJava] ===
* [https://jozefhajnala.gitlab.io/r/r901-primer-java-from-r-1/ A primer in using Java from R - part 1]
* [https://jozefhajnala.gitlab.io/r/r901-primer-java-from-r-1/ A primer in using Java from R - part 1]
* Note rJava is needed by [https://cran.r-project.org/web/packages/xlsx/index.html xlsx] package.
* Note rJava is needed by [https://cran.r-project.org/web/packages/xlsx/index.html xlsx] package.
Line 2,512: Line 2,512:
which will create new package 'default-jre' (under '''/usr/lib/jvm''') and 'default-jre-headless'.
which will create new package 'default-jre' (under '''/usr/lib/jvm''') and 'default-jre-headless'.


==== RCaller ====
=== RCaller ===


==== RApache ====
=== RApache ===
* http://www.stat.ucla.edu/~jeroen/files/seminar.pdf
* http://www.stat.ucla.edu/~jeroen/files/seminar.pdf


==== [http://dirk.eddelbuettel.com/code/littler.html littler] ====
=== [http://dirk.eddelbuettel.com/code/littler.html littler] ===
Provides hash-bang (#!) capability for R
Provides hash-bang (#!) capability for R


Line 2,546: Line 2,546:
'''r''' was not meant to run interactively like '''R'''. See ''man r''.
'''r''' was not meant to run interactively like '''R'''. See ''man r''.


==== RInside: Embed R in C++ ====
=== RInside: Embed R in C++ ===
See [[R#RInside|RInside]]
See [[R#RInside|RInside]]


Line 2,608: Line 2,608:
</pre>
</pre>


==== [http://www.rfortran.org/ RFortran] ====
=== [http://www.rfortran.org/ RFortran] ===
RFortran is an open source project with the following aim:
RFortran is an open source project with the following aim:


Line 2,615: Line 2,615:
It works only on Windows platform with Microsoft Visual Studio installed:(
It works only on Windows platform with Microsoft Visual Studio installed:(


=== Call R from other languages ===
== Call R from other languages ==
==== C ====
=== C ===
[http://sebastian-mader.net/programming/using-r-from-c-c/ Using R from C/C++]
[http://sebastian-mader.net/programming/using-r-from-c-c/ Using R from C/C++]


Line 2,623: Line 2,623:
Solution: add '''getNativeSymbolInfo()''' around your C/Fortran symbols. Search Google:r dyn.load not resolved from current namespace
Solution: add '''getNativeSymbolInfo()''' around your C/Fortran symbols. Search Google:r dyn.load not resolved from current namespace


==== JRI ====
=== JRI ===
http://www.rforge.net/JRI/
http://www.rforge.net/JRI/


==== ryp2 ====
=== ryp2 ===
http://rpy.sourceforge.net/rpy2.html
http://rpy.sourceforge.net/rpy2.html


=== Create a standalone Rmath library ===
== Create a standalone Rmath library ==
R has many math and statistical functions. We can easily use these functions in our C/C++/Fortran. The definite guide of doing this is on Chapter 9 "The standalone Rmath library" of [http://cran.r-project.org/doc/manuals/R-admin.html#The-standalone-Rmath-library R-admin manual].
R has many math and statistical functions. We can easily use these functions in our C/C++/Fortran. The definite guide of doing this is on Chapter 9 "The standalone Rmath library" of [http://cran.r-project.org/doc/manuals/R-admin.html#The-standalone-Rmath-library R-admin manual].


Here is my experience based on R 3.0.2 on Windows OS.
Here is my experience based on R 3.0.2 on Windows OS.


==== Create a static library <libRmath.a> and a dynamic library <Rmath.dll> ====
=== Create a static library <libRmath.a> and a dynamic library <Rmath.dll> ===
Suppose we have downloaded R source code and build R from its source. See [[R#Build_R_from_its_source|Build_R_from_its_source]]. Then the following 2 lines will generate files <libRmath.a> and <Rmath.dll> under C:\R\R-3.0.2\src\nmath\standalone directory.
Suppose we have downloaded R source code and build R from its source. See [[R#Build_R_from_its_source|Build_R_from_its_source]]. Then the following 2 lines will generate files <libRmath.a> and <Rmath.dll> under C:\R\R-3.0.2\src\nmath\standalone directory.
<pre>
<pre>
Line 2,641: Line 2,641:
</pre>
</pre>


==== Use Rmath library in our code ====
=== Use Rmath library in our code ===
<pre>
<pre>
set CPLUS_INCLUDE_PATH=C:\R\R-3.0.2\src\include
set CPLUS_INCLUDE_PATH=C:\R\R-3.0.2\src\include
Line 2,695: Line 2,695:
</pre>
</pre>


=== Calling R.dll directly ===
== Calling R.dll directly ==
See Chapter 8.2.2 of [http://cran.r-project.org/doc/manuals/R-exts.html#Calling-R_002edll-directly|Writing R Extensions]. This is related to embedding R under Windows. The file <R.dll> on Windows is like <libR.so> on Linux.
See Chapter 8.2.2 of [http://cran.r-project.org/doc/manuals/R-exts.html#Calling-R_002edll-directly|Writing R Extensions]. This is related to embedding R under Windows. The file <R.dll> on Windows is like <libR.so> on Linux.


=== Create HTML report ===
== Create HTML report ==
[http://www.bioconductor.org/packages/release/bioc/html/ReportingTools.html ReportingTools] (Jason Hackney) from Bioconductor.
[http://www.bioconductor.org/packages/release/bioc/html/ReportingTools.html ReportingTools] (Jason Hackney) from Bioconductor.


==== [http://cran.r-project.org/web/packages/htmlTable/index.html htmlTable] package ====
=== [http://cran.r-project.org/web/packages/htmlTable/index.html htmlTable] package ===
The htmlTable package is intended for generating tables using HTML formatting. This format is compatible with Markdown when used for HTML-output. The most basic table can easily be created by just passing a matrix or a data.frame to the htmlTable-function.
The htmlTable package is intended for generating tables using HTML formatting. This format is compatible with Markdown when used for HTML-output. The most basic table can easily be created by just passing a matrix or a data.frame to the htmlTable-function.


Line 2,707: Line 2,707:
* http://gforge.se/2014/01/fast-track-publishing-using-knitr-part-iv/
* http://gforge.se/2014/01/fast-track-publishing-using-knitr-part-iv/


==== [https://cran.r-project.org/web/packages/formattable/index.html formattable] ====
=== [https://cran.r-project.org/web/packages/formattable/index.html formattable] ===
* https://github.com/renkun-ken/formattable
* https://github.com/renkun-ken/formattable
* http://www.magesblog.com/2016/01/formatting-table-output-in-r.html
* http://www.magesblog.com/2016/01/formatting-table-output-in-r.html
* [https://www.displayr.com/formattable/ Make Beautiful Tables with the Formattable Package]
* [https://www.displayr.com/formattable/ Make Beautiful Tables with the Formattable Package]


==== [https://github.com/crubba/htmltab htmltab] package ====
=== [https://github.com/crubba/htmltab htmltab] package ===
This package is NOT used to CREATE html report but EXTRACT html table.
This package is NOT used to CREATE html report but EXTRACT html table.


==== [http://cran.r-project.org/web/packages/ztable/index.html ztable] package ====
=== [http://cran.r-project.org/web/packages/ztable/index.html ztable] package ===
Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.
Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.


=== Create academic report ===
== Create academic report ==
[http://cran.r-project.org/web/packages/reports/index.html reports] package in CRAN and in [https://github.com/trinker/reports github] repository. The youtube video gives an overview of the package.
[http://cran.r-project.org/web/packages/reports/index.html reports] package in CRAN and in [https://github.com/trinker/reports github] repository. The youtube video gives an overview of the package.


=== Create pdf and epub files ===
== Create pdf and epub files ==
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
# Idea:
# Idea:
Line 2,788: Line 2,788:
* http://tex.stackexchange.com/questions/156668/tex-to-epub-conversion
* http://tex.stackexchange.com/questions/156668/tex-to-epub-conversion


==== [https://www.rdocumentation.org/packages/knitr/versions/1.20/topics/kable kable()] for tables ====
=== [https://www.rdocumentation.org/packages/knitr/versions/1.20/topics/kable kable()] for tables ===
Create Tables In LaTeX, HTML, Markdown And ReStructuredText
Create Tables In LaTeX, HTML, Markdown And ReStructuredText


Line 2,797: Line 2,797:
* [https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html kableExtra] package
* [https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html kableExtra] package


=== Create Word report ===
== Create Word report ==


==== knitr + pandoc ====
=== knitr + pandoc ===
* http://www.r-statistics.com/2013/03/write-ms-word-document-using-r-with-as-little-overhead-as-possible/
* http://www.r-statistics.com/2013/03/write-ms-word-document-using-r-with-as-little-overhead-as-possible/
* http://www.carlboettiger.info/2012/04/07/writing-reproducibly-in-the-open-with-knitr.html
* http://www.carlboettiger.info/2012/04/07/writing-reproducibly-in-the-open-with-knitr.html
Line 2,849: Line 2,849:
</pre>
</pre>


==== pander ====
=== pander ===
Try pandoc[1] with a minimal reproducible example, you might give a try to my "[http://cran.r-project.org/web/packages/pander/ pander]" package [2] too:
Try pandoc[1] with a minimal reproducible example, you might give a try to my "[http://cran.r-project.org/web/packages/pander/ pander]" package [2] too:


Line 2,866: Line 2,866:
# http://rapporter.github.com/pander/#examples
# http://rapporter.github.com/pander/#examples


==== R2wd ====
=== R2wd ===
Use [http://cran.r-project.org/web/packages/R2wd/ R2wd] package. However, only 32-bit R is allowed and sometimes it can not produce all 'table's.  
Use [http://cran.r-project.org/web/packages/R2wd/ R2wd] package. However, only 32-bit R is allowed and sometimes it can not produce all 'table's.  
<pre>
<pre>
Line 2,887: Line 2,887:
The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.
The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.


==== Convert from pdf to word ====
=== Convert from pdf to word ===
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert


==== rtf ====
=== rtf ===
Use [http://cran.r-project.org/web/packages/rtf/ rtf] package for Rich Text Format (RTF) Output.
Use [http://cran.r-project.org/web/packages/rtf/ rtf] package for Rich Text Format (RTF) Output.


==== [https://www.rdocumentation.org/packages/xtable/versions/1.8-2 xtable] ====
=== [https://www.rdocumentation.org/packages/xtable/versions/1.8-2 xtable] ===
Package xtable will produce html output. <syntaxhighlight lang='rsplus'>print(xtable(X), type="html")</syntaxhighlight>  
Package xtable will produce html output. <syntaxhighlight lang='rsplus'>print(xtable(X), type="html")</syntaxhighlight>  


If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.
If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.


==== [http://cran.r-project.org/web/packages/ReporteRs/index.html ReporteRs] ====
=== [http://cran.r-project.org/web/packages/ReporteRs/index.html ReporteRs] ===
Microsoft Word, Microsoft Powerpoint and HTML documents generation from R. The source code is hosted on https://github.com/davidgohel/ReporteRs
Microsoft Word, Microsoft Powerpoint and HTML documents generation from R. The source code is hosted on https://github.com/davidgohel/ReporteRs


[https://statbandit.wordpress.com/2016/10/28/a-quick-exploration-of-reporters/ A quick exploration]
[https://statbandit.wordpress.com/2016/10/28/a-quick-exploration-of-reporters/ A quick exploration]


=== PDF manipulation ===
== PDF manipulation ==
[https://github.com/pridiltal/staplr staplr]
[https://github.com/pridiltal/staplr staplr]


=== R Graphs Gallery ===
== R Graphs Gallery ==
* [https://www.facebook.com/pages/R-Graph-Gallery/169231589826661 Romain François]
* [https://www.facebook.com/pages/R-Graph-Gallery/169231589826661 Romain François]
* [http://shinyapps.stat.ubc.ca/r-graph-catalog/ R Graph Catalog] written using R + Shiny. The source code is available on [https://github.com/jennybc/r-graph-catalog Github].
* [http://shinyapps.stat.ubc.ca/r-graph-catalog/ R Graph Catalog] written using R + Shiny. The source code is available on [https://github.com/jennybc/r-graph-catalog Github].
* Forest plot. See the packages [https://cran.r-project.org/web/packages/rmeta/index.html rmeta] and [https://cran.r-project.org/web/packages/forestplot/ forestplot]. The forest plot can be used to plot the quantities like relative risk (with 95% CI) in survival data.
* Forest plot. See the packages [https://cran.r-project.org/web/packages/rmeta/index.html rmeta] and [https://cran.r-project.org/web/packages/forestplot/ forestplot]. The forest plot can be used to plot the quantities like relative risk (with 95% CI) in survival data.


=== COM client or server ===
== COM client or server ==


==== Client ====
=== Client ===


[http://www.omegahat.org/RDCOMClient/ RDCOMClient] where [http://cran.r-project.org/web/packages/excel.link/index.html excel.link] depends on it.
[http://www.omegahat.org/RDCOMClient/ RDCOMClient] where [http://cran.r-project.org/web/packages/excel.link/index.html excel.link] depends on it.


==== Server ====
=== Server ===
[http://www.omegahat.org/RDCOMServer/ RDCOMServer]
[http://www.omegahat.org/RDCOMServer/ RDCOMServer]


=== Use R under proxy ===
== Use R under proxy ==
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy


=== RStudio ===
== RStudio ==
* [https://github.com/rstudio/rstudio Github]
* [https://github.com/rstudio/rstudio Github]
* Installing RStudio (1.0.44) on Ubuntu will not install Java even the source code contains 37.5% Java??
* Installing RStudio (1.0.44) on Ubuntu will not install Java even the source code contains 37.5% Java??
* [https://www.rstudio.com/products/rstudio/download/preview/ Preview]
* [https://www.rstudio.com/products/rstudio/download/preview/ Preview]


==== rstudio.cloud ====
=== rstudio.cloud ===
https://rstudio.cloud/
https://rstudio.cloud/


==== Launch RStudio ====
=== Launch RStudio ===
[[Rstudio#Multiple_versions_of_R|Multiple versions of R]]
[[Rstudio#Multiple_versions_of_R|Multiple versions of R]]


==== Create .Rproj file ====
=== Create .Rproj file ===
If you have an existing package that doesn't have an .Rproj file, you can use devtools::use_rstudio("path/to/package") to add it.
If you have an existing package that doesn't have an .Rproj file, you can use devtools::use_rstudio("path/to/package") to add it.


Line 2,943: Line 2,943:
* etc
* etc


==== package search ====
=== package search ===
https://github.com/RhoInc/CRANsearcher
https://github.com/RhoInc/CRANsearcher


==== Git ====
=== Git ===
* (Video) [https://www.rstudio.com/resources/videos/happy-git-and-gihub-for-the-user-tutorial/ Happy Git and Gihub for the useR – Tutorial]
* (Video) [https://www.rstudio.com/resources/videos/happy-git-and-gihub-for-the-user-tutorial/ Happy Git and Gihub for the useR – Tutorial]
* [https://owi.usgs.gov/blog/beyond-basic-git/ Beyond Basic R - Version Control with Git]
* [https://owi.usgs.gov/blog/beyond-basic-git/ Beyond Basic R - Version Control with Git]


=== Visual Studio ===
== Visual Studio ==
[http://blog.revolutionanalytics.com/2017/05/r-and-python-support-now-built-in-to-visual-studio-2017.html R and Python support now built in to Visual Studio 2017]
[http://blog.revolutionanalytics.com/2017/05/r-and-python-support-now-built-in-to-visual-studio-2017.html R and Python support now built in to Visual Studio 2017]


=== List files using regular expression ===
== List files using regular expression ==
* Extension
* Extension
<pre>
<pre>
Line 2,970: Line 2,970:
</pre>
</pre>


=== Hidden tool: rsync in Rtools ===
== Hidden tool: rsync in Rtools ==
<pre>
<pre>
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
Line 3,023: Line 3,023:
Unforunately, if the destination is a network drive, I could get a permission denied (13) error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows
Unforunately, if the destination is a network drive, I could get a permission denied (13) error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows


=== Install rgdal package (geospatial Data) on ubuntu ===
== Install rgdal package (geospatial Data) on ubuntu ==
Terminal
Terminal
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Line 3,035: Line 3,035:
</syntaxhighlight>
</syntaxhighlight>


=== Install sf package ===
== Install sf package ==
I got the following error even I have installed some libraries.  
I got the following error even I have installed some libraries.  
<pre>
<pre>
Line 3,053: Line 3,053:
</syntaxhighlight>
</syntaxhighlight>


=== Set up Emacs on Windows ===
== Set up Emacs on Windows ==
Edit the file ''C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el'' with something like
Edit the file ''C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el'' with something like
<pre>
<pre>
Line 3,060: Line 3,060:
</pre>
</pre>


=== Database ===
== Database ==
* https://cran.r-project.org/web/views/Databases.html
* https://cran.r-project.org/web/views/Databases.html
* [http://blog.revolutionanalytics.com/2017/08/a-modern-database-interface-for-r.html A modern database interface for R]
* [http://blog.revolutionanalytics.com/2017/08/a-modern-database-interface-for-r.html A modern database interface for R]


==== [http://cran.r-project.org/web/packages/RSQLite/index.html RSQLite] ====
=== [http://cran.r-project.org/web/packages/RSQLite/index.html RSQLite] ===
* https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html
* https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html
* https://github.com/rstats-db/RSQLite
* https://github.com/rstats-db/RSQLite
Line 3,138: Line 3,138:
</syntaxhighlight>
</syntaxhighlight>


==== [https://cran.r-project.org/web/packages/sqldf/ sqldf] ====
=== [https://cran.r-project.org/web/packages/sqldf/ sqldf] ===
Manipulate R data frames using SQL. Depends on RSQLite. [http://datascienceplus.com/a-use-of-gsub-reshape2-and-sqldf-with-healthcare-data/ A use of gsub, reshape2 and sqldf with healthcare data]
Manipulate R data frames using SQL. Depends on RSQLite. [http://datascienceplus.com/a-use-of-gsub-reshape2-and-sqldf-with-healthcare-data/ A use of gsub, reshape2 and sqldf with healthcare data]


==== [https://cran.r-project.org/web/packages/RPostgreSQL/index.html RPostgreSQL] ====
=== [https://cran.r-project.org/web/packages/RPostgreSQL/index.html RPostgreSQL] ===


==== [[MySQL#Use_through_R|RMySQL]] ====
=== [[MySQL#Use_through_R|RMySQL]] ===
* http://datascienceplus.com/bringing-the-powers-of-sql-into-r/
* http://datascienceplus.com/bringing-the-powers-of-sql-into-r/
* See [[MySQL#Installation|here]] about the installation of the required package ('''libmysqlclient-dev''') in Ubuntu.
* See [[MySQL#Installation|here]] about the installation of the required package ('''libmysqlclient-dev''') in Ubuntu.


==== MongoDB ====
=== MongoDB ===
* http://www.r-bloggers.com/r-and-mongodb/
* http://www.r-bloggers.com/r-and-mongodb/
* http://watson.nci.nih.gov/~sdavis/blog/rmongodb-using-R-with-mongo/
* http://watson.nci.nih.gov/~sdavis/blog/rmongodb-using-R-with-mongo/


==== odbc ====
=== odbc ===


==== RODBC ====
=== RODBC ===


==== DBI ====
=== DBI ===


==== [https://cran.r-project.org/web/packages/dbplyr/index.html dbplyr] ====
=== [https://cran.r-project.org/web/packages/dbplyr/index.html dbplyr] ===
* To use databases with dplyr, you need to first install dbplyr
* To use databases with dplyr, you need to first install dbplyr
* https://db.rstudio.com/dplyr/
* https://db.rstudio.com/dplyr/
Line 3,230: Line 3,230:
</syntaxhighlight>
</syntaxhighlight>


==== NoSQL ====
=== NoSQL ===
[https://ropensci.org/technotes/2018/01/25/nodbi/ nodbi: the NoSQL Database Connector]
[https://ropensci.org/technotes/2018/01/25/nodbi/ nodbi: the NoSQL Database Connector]


=== Github ===
== Github ==


==== R source  ====
=== R source  ===
https://github.com/wch/r-source/  Daily update, interesting, should be visited every day. Clicking '''1000+ commits''' to look at daily changes.
https://github.com/wch/r-source/  Daily update, interesting, should be visited every day. Clicking '''1000+ commits''' to look at daily changes.


If we are interested in a certain branch (say 3.2), look for R-3-2-branch.
If we are interested in a certain branch (say 3.2), look for R-3-2-branch.


==== R packages (only) source (metacran) ====
=== R packages (only) source (metacran) ===
* https://github.com/cran/ by [https://github.com/gaborcsardi Gábor Csárdi], the author of '''[http://igraph.org/ igraph]''' software.
* https://github.com/cran/ by [https://github.com/gaborcsardi Gábor Csárdi], the author of '''[http://igraph.org/ igraph]''' software.


==== Bioconductor packages source ====
=== Bioconductor packages source ===
<strike>[https://stat.ethz.ch/pipermail/bioc-devel/2015-June/007675.html Announcement], https://github.com/Bioconductor-mirror </strike>
<strike>[https://stat.ethz.ch/pipermail/bioc-devel/2015-June/007675.html Announcement], https://github.com/Bioconductor-mirror </strike>


==== Send local repository to Github in R by using reports package ====
=== Send local repository to Github in R by using reports package ===
http://www.youtube.com/watch?v=WdOI_-aZV0Y
http://www.youtube.com/watch?v=WdOI_-aZV0Y


==== My collection ====
=== My collection ===
* https://github.com/arraytools
* https://github.com/arraytools
* https://gist.github.com/4383351 heatmap using leukemia data
* https://gist.github.com/4383351 heatmap using leukemia data
Line 3,255: Line 3,255:
* https://gist.github.com/4484270 biocLite  
* https://gist.github.com/4484270 biocLite  


==== How to download ====
=== How to download ===


Clone ~ Download.  
Clone ~ Download.  
Line 3,284: Line 3,284:
</pre>
</pre>


==== Jekyll ====
=== Jekyll ===
[http://statistics.rainandrhino.org/2015/12/15/jekyll-r-blogger-knitr-hyde.html An Easy Start with Jekyll, for R-Bloggers]
[http://statistics.rainandrhino.org/2015/12/15/jekyll-r-blogger-knitr-hyde.html An Easy Start with Jekyll, for R-Bloggers]


=== Connect R with Arduino ===
== Connect R with Arduino ==
* http://lamages.blogspot.com/2012/10/connecting-real-world-to-r-with-arduino.html
* http://lamages.blogspot.com/2012/10/connecting-real-world-to-r-with-arduino.html
* http://jean-robert.github.io/2012/11/11/thermometer-R-using-Arduino-Java.html
* http://jean-robert.github.io/2012/11/11/thermometer-R-using-Arduino-Java.html
Line 3,293: Line 3,293:
* http://www.rforge.net/Arduino/svn.html
* http://www.rforge.net/Arduino/svn.html


=== Android App ===
== Android App ==
* [https://play.google.com/store/apps/details?id=appinventor.ai_RInstructor.R2&hl=zh_TW R Instructor] $4.84
* [https://play.google.com/store/apps/details?id=appinventor.ai_RInstructor.R2&hl=zh_TW R Instructor] $4.84
* [http://realxyapp.blogspot.tw/2010/12/statistical-distribution.html Statistical Distribution] (Not R related app)
* [http://realxyapp.blogspot.tw/2010/12/statistical-distribution.html Statistical Distribution] (Not R related app)
* [https://datascienceplus.com/data-driven-introspection-of-my-android-mobile-usage-in-r/ Data-driven Introspection of my Android Mobile usage in R]
* [https://datascienceplus.com/data-driven-introspection-of-my-android-mobile-usage-in-r/ Data-driven Introspection of my Android Mobile usage in R]


=== Common plots tips ===
== Common plots tips ==
==== Grouped boxplots ====
=== Grouped boxplots ===
* [http://sphaerula.com/legacy/R/boxplotTwoWay.html Box Plots of Two-Way Layout]
* [http://sphaerula.com/legacy/R/boxplotTwoWay.html Box Plots of Two-Way Layout]
* [http://r-video-tutorial.blogspot.com/2013/06/box-plot-with-r-tutorial.html Step by step to create a grouped boxplots]
* [http://r-video-tutorial.blogspot.com/2013/06/box-plot-with-r-tutorial.html Step by step to create a grouped boxplots]
Line 3,307: Line 3,307:
* [https://stackoverflow.com/questions/28426026/plotting-boxplots-of-multiple-y-variables-using-ggplot2-qplot-or-others ggplot2 approach] (Hint: '''facet_grid''' is used)
* [https://stackoverflow.com/questions/28426026/plotting-boxplots-of-multiple-y-variables-using-ggplot2-qplot-or-others ggplot2 approach] (Hint: '''facet_grid''' is used)


==== [https://www.samruston.co.uk/ Weather Time Line] ====
=== [https://www.samruston.co.uk/ Weather Time Line] ===
The plot looks similar to a boxplot though it is not. See a [https://www.samruston.co.uk/images/screens/screen_2.png screenshot] on Android by [https://www.samruston.co.uk/ Sam Ruston].
The plot looks similar to a boxplot though it is not. See a [https://www.samruston.co.uk/images/screens/screen_2.png screenshot] on Android by [https://www.samruston.co.uk/ Sam Ruston].


==== Horizontal bar plot ====
=== Horizontal bar plot ===
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
library(ggplot2)
library(ggplot2)
Line 3,323: Line 3,323:
[[File:Ggplot2bar.svg|300px]]
[[File:Ggplot2bar.svg|300px]]


==== Include bar values in a barplot ====
=== Include bar values in a barplot ===
* https://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r.
* https://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r.
* [http://stackoverflow.com/questions/12481430/how-to-display-the-frequency-at-the-top-of-each-factor-in-a-barplot-in-r barplot(), text() and axis()] functions. The data can be from a table() object.
* [http://stackoverflow.com/questions/12481430/how-to-display-the-frequency-at-the-top-of-each-factor-in-a-barplot-in-r barplot(), text() and axis()] functions. The data can be from a table() object.
Line 3,334: Line 3,334:
For stacked barplot, see [http://t-redactyl.io/blog/2016/01/creating-plots-in-r-using-ggplot2-part-4-stacked-bar-plots.html this] post.
For stacked barplot, see [http://t-redactyl.io/blog/2016/01/creating-plots-in-r-using-ggplot2-part-4-stacked-bar-plots.html this] post.


==== Grouped barplots ====
=== Grouped barplots ===
* https://www.r-graph-gallery.com/barplot/, https://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2/ (simpliest, no error bars)<syntaxhighlight lang='rsplus'>
* https://www.r-graph-gallery.com/barplot/, https://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2/ (simpliest, no error bars)<syntaxhighlight lang='rsplus'>
library(ggplot2)
library(ggplot2)
Line 3,353: Line 3,353:
* [https://peltiertech.com/stacked-bar-chart-alternatives/ More alternatives] (not done by R)
* [https://peltiertech.com/stacked-bar-chart-alternatives/ More alternatives] (not done by R)


==== Math expression ====
=== Math expression ===
* [https://www.rdocumentation.org/packages/grDevices/versions/3.5.0/topics/plotmath ?plotmath]
* [https://www.rdocumentation.org/packages/grDevices/versions/3.5.0/topics/plotmath ?plotmath]
* https://stackoverflow.com/questions/4973898/combining-paste-and-expression-functions-in-plot-labels
* https://stackoverflow.com/questions/4973898/combining-paste-and-expression-functions-in-plot-labels
Line 3,391: Line 3,391:
</syntaxhighlight>
</syntaxhighlight>


==== Impose a line to a scatter plot ====
=== Impose a line to a scatter plot ===
* abline + lsfit # least squares
* abline + lsfit # least squares
: <syntaxhighlight lang='rsplus'>
: <syntaxhighlight lang='rsplus'>
Line 3,412: Line 3,412:
lines(stats::lowess(cars), col='red')
lines(stats::lowess(cars), col='red')
</syntaxhighlight>
</syntaxhighlight>
==== Rotating x axis labels for barplot ====
=== Rotating x axis labels for barplot ===
https://stackoverflow.com/questions/10286473/rotating-x-axis-labels-in-r-for-barplot
https://stackoverflow.com/questions/10286473/rotating-x-axis-labels-in-r-for-barplot
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 3,418: Line 3,418:
</syntaxhighlight>
</syntaxhighlight>


==== Set R plots x axis to show at y=0 ====
=== Set R plots x axis to show at y=0 ===
https://stackoverflow.com/questions/3422203/set-r-plots-x-axis-to-show-at-y-0
https://stackoverflow.com/questions/3422203/set-r-plots-x-axis-to-show-at-y-0
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 3,424: Line 3,424:
</syntaxhighlight>
</syntaxhighlight>


==== Different colors of axis labels in barplot ====
=== Different colors of axis labels in barplot ===
See [https://stackoverflow.com/questions/18839731/vary-colors-of-axis-labels-in-r-based-on-another-variable Vary colors of axis labels in R based on another variable]
See [https://stackoverflow.com/questions/18839731/vary-colors-of-axis-labels-in-r-based-on-another-variable Vary colors of axis labels in R based on another variable]


Line 3,441: Line 3,441:
</syntaxhighlight>
</syntaxhighlight>


==== Use [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/text text()] to draw labels on X/Y-axis including rotation ====
=== Use [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/text text()] to draw labels on X/Y-axis including rotation ===
* adj = 1 means top/rigth alignment. The default is to center the text.
* adj = 1 means top/rigth alignment. The default is to center the text.
* [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/par par("usr")] gives the extremes of the user coordinates of the plotting region of the form c(x1, x2, y1, y2).
* [https://www.rdocumentation.org/packages/graphics/versions/3.4.3/topics/par par("usr")] gives the extremes of the user coordinates of the plotting region of the form c(x1, x2, y1, y2).
Line 3,455: Line 3,455:
* https://www.r-bloggers.com/rotated-axis-labels-in-r-plots/
* https://www.r-bloggers.com/rotated-axis-labels-in-r-plots/


==== Vertically stacked plots with the same x axis ====
=== Vertically stacked plots with the same x axis ===
https://stackoverflow.com/questions/11794436/stacking-multiple-plots-vertically-with-the-same-x-axis-but-different-y-axes-in
https://stackoverflow.com/questions/11794436/stacking-multiple-plots-vertically-with-the-same-x-axis-but-different-y-axes-in


==== Increase/decrease legend font size ====
=== Increase/decrease legend font size ===
https://stackoverflow.com/a/36842578
https://stackoverflow.com/a/36842578
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 3,467: Line 3,467:
</syntaxhighlight>
</syntaxhighlight>


==== Superimpose a density plot or any curves ====
=== Superimpose a density plot or any curves ===
Use '''lines()'''.  
Use '''lines()'''.  


Line 3,500: Line 3,500:
</syntaxhighlight>
</syntaxhighlight>


==== Custom scales ====
=== Custom scales ===
[https://rcrastinate.rbind.io/post/using-custom-scales-with-the-scales-package/ Using custom scales with the 'scales' package]
[https://rcrastinate.rbind.io/post/using-custom-scales-with-the-scales-package/ Using custom scales with the 'scales' package]


=== Time series ===
== Time series ==
* [https://www.amazon.com/Applied-Time-Analysis-R-Second/dp/1498734227 Applied Time Series Analysis with R]
* [https://www.amazon.com/Applied-Time-Analysis-R-Second/dp/1498734227 Applied Time Series Analysis with R]
* [http://www.springer.com/us/book/9780387759586 Time Series Analysis With Applications in R]
* [http://www.springer.com/us/book/9780387759586 Time Series Analysis With Applications in R]


==== Time series stock price plot ====
=== Time series stock price plot ===
* http://blog.revolutionanalytics.com/2015/08/plotting-time-series-in-r.html (ggplot2, xts, [https://rstudio.github.io/dygraphs/ dygraphs])
* http://blog.revolutionanalytics.com/2015/08/plotting-time-series-in-r.html (ggplot2, xts, [https://rstudio.github.io/dygraphs/ dygraphs])
* [https://datascienceplus.com/visualize-your-portfolios-performance-and-generate-a-nice-report-with-r/ Visualize your Portfolio’s Performance and Generate a Nice Report with R]
* [https://datascienceplus.com/visualize-your-portfolios-performance-and-generate-a-nice-report-with-r/ Visualize your Portfolio’s Performance and Generate a Nice Report with R]
Line 3,524: Line 3,524:
</syntaxhighlight>
</syntaxhighlight>


==== Timeline plot ====
=== Timeline plot ===
https://stackoverflow.com/questions/20695311/chronological-timeline-with-points-in-time-and-format-date
https://stackoverflow.com/questions/20695311/chronological-timeline-with-points-in-time-and-format-date


=== Circular plot ===
== Circular plot ==
* http://freakonometrics.hypotheses.org/20667 which uses https://cran.r-project.org/web/packages/circlize/ circlize] package.
* http://freakonometrics.hypotheses.org/20667 which uses https://cran.r-project.org/web/packages/circlize/ circlize] package.
* https://www.biostars.org/p/17728/
* https://www.biostars.org/p/17728/
Line 3,533: Line 3,533:
* [http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html OmicCircos] from Bioconductor.
* [http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html OmicCircos] from Bioconductor.


=== Word cloud ===
== Word cloud ==
* [http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know Text mining and word cloud fundamentals in R : 5 simple steps you should know]
* [http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know Text mining and word cloud fundamentals in R : 5 simple steps you should know]
* [https://www.displayr.com/alternatives-word-cloud/ 7 Alternatives to Word Clouds for Visualizing Long Lists of Data]
* [https://www.displayr.com/alternatives-word-cloud/ 7 Alternatives to Word Clouds for Visualizing Long Lists of Data]
* [https://www.littlemissdata.com/blog/steam-data-art1 Data + Art STEAM Project: Initial Results]
* [https://www.littlemissdata.com/blog/steam-data-art1 Data + Art STEAM Project: Initial Results]


=== World map ===
== World map ==
[https://www.enchufa2.es/archives/visualising-ssh-attacks-with-r.html Visualising SSH attacks with R] ([https://cran.r-project.org/package=rworldmap rworldmap] and [https://cran.r-project.org/package=rgeolocate rgeolocate] packages)
[https://www.enchufa2.es/archives/visualising-ssh-attacks-with-r.html Visualising SSH attacks with R] ([https://cran.r-project.org/package=rworldmap rworldmap] and [https://cran.r-project.org/package=rgeolocate rgeolocate] packages)


=== Diagram/flowchart/Directed acyclic diagrams (DAGs) ===
== Diagram/flowchart/Directed acyclic diagrams (DAGs) ==


==== [https://cran.r-project.org/web/packages/DiagrammeR/index.html DiagrammeR] ====
=== [https://cran.r-project.org/web/packages/DiagrammeR/index.html DiagrammeR] ===
* http://rich-iannone.github.io/DiagrammeR/
* http://rich-iannone.github.io/DiagrammeR/
* https://donlelek.github.io/2015-03-31-dags-with-r/
* https://donlelek.github.io/2015-03-31-dags-with-r/


==== [https://cran.r-project.org/web/packages/diagram/ diagram] ====
=== [https://cran.r-project.org/web/packages/diagram/ diagram] ===
Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams
Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams


==== DAGitty (browser-based and R package) ====
=== DAGitty (browser-based and R package) ===
* http://dagitty.net/
* http://dagitty.net/
* https://cran.r-project.org/web/packages/dagitty/index.html
* https://cran.r-project.org/web/packages/dagitty/index.html


==== dagR ====
=== dagR ===
* https://cran.r-project.org/web/packages/dagR
* https://cran.r-project.org/web/packages/dagR


=== Venn Diagram ===
== Venn Diagram ==
* limma http://www.ats.ucla.edu/stat/r/faq/venn.htm - only black and white?
* limma http://www.ats.ucla.edu/stat/r/faq/venn.htm - only black and white?
* VennDiagram - input has to be the numbers instead of the original vector?
* VennDiagram - input has to be the numbers instead of the original vector?
Line 3,591: Line 3,591:
[[File:Vennplot.png|250px]]
[[File:Vennplot.png|250px]]


=== Bump chart/Metro map ===
== Bump chart/Metro map ==
https://dominikkoch.github.io/Bump-Chart/
https://dominikkoch.github.io/Bump-Chart/


=== Amazing plots ===
== Amazing plots ==
==== New R logo 2/11/2016 ====
=== New R logo 2/11/2016 ===
* http://rud.is/b/2016/02/11/plot-the-new-svg-r-logo-with-ggplot2/
* http://rud.is/b/2016/02/11/plot-the-new-svg-r-logo-with-ggplot2/
* https://www.stat.auckland.ac.nz/~paul/Reports/Rlogo/Rlogo.html
* https://www.stat.auckland.ac.nz/~paul/Reports/Rlogo/Rlogo.html
Line 3,623: Line 3,623:
</syntaxhighlight>
</syntaxhighlight>


==== 3D plot ====
=== 3D plot ===
Using [https://chitchatr.wordpress.com/2010/06/28/fun-with-persp-function/ persp] function to create the following plot. Code in [https://gist.github.com/arraytools/ef955d017cb6b9ef0690a4fe79f809f9 github].
Using [https://chitchatr.wordpress.com/2010/06/28/fun-with-persp-function/ persp] function to create the following plot. Code in [https://gist.github.com/arraytools/ef955d017cb6b9ef0690a4fe79f809f9 github].


[[File:3dpersp.png|200px]]
[[File:3dpersp.png|200px]]


==== Christmas tree ====
=== Christmas tree ===
http://wiekvoet.blogspot.com/2014/12/merry-christmas.html. Code in [https://gist.github.com/arraytools/668404a33d32a6652d4dddf5d294689e github].
http://wiekvoet.blogspot.com/2014/12/merry-christmas.html. Code in [https://gist.github.com/arraytools/668404a33d32a6652d4dddf5d294689e github].


[[File:XMastree.png|150px]]
[[File:XMastree.png|150px]]


==== Happy Thanksgiving ====
=== Happy Thanksgiving ===
[http://blog.revolutionanalytics.com/2015/11/happy-thanksgiving.html Turkey]
[http://blog.revolutionanalytics.com/2015/11/happy-thanksgiving.html Turkey]


[[File:Turkey.png|150px]]
[[File:Turkey.png|150px]]


==== Happy Valentine's Day ====
=== Happy Valentine's Day ===
* [https://rud.is/b/2017/02/14/geom%E2%9D%A4%EF%B8%8F/  Geom❤️] 2017
* [https://rud.is/b/2017/02/14/geom%E2%9D%A4%EF%B8%8F/  Geom❤️] 2017
* [http://www.theanalyticslab.nl/2019/02/14/nerds-on-valentines-day/ Happy Valentines day by Nerds] 2019
* [http://www.theanalyticslab.nl/2019/02/14/nerds-on-valentines-day/ Happy Valentines day by Nerds] 2019


==== treemap ====
=== treemap ===
http://ipub.com/treemap/
http://ipub.com/treemap/


[[File:TreemapPop.png|150px]]
[[File:TreemapPop.png|150px]]


==== [https://en.wikipedia.org/wiki/Voronoi_diagram Voronoi diagram] ====
=== [https://en.wikipedia.org/wiki/Voronoi_diagram Voronoi diagram] ===
* https://www.stat.auckland.ac.nz/~paul/Reports/VoronoiTreemap/voronoiTreeMap.html
* https://www.stat.auckland.ac.nz/~paul/Reports/VoronoiTreemap/voronoiTreeMap.html
* http://letstalkdata.com/2014/05/creating-voronoi-diagrams-with-ggplot/
* http://letstalkdata.com/2014/05/creating-voronoi-diagrams-with-ggplot/


==== Silent Night ====
=== Silent Night ===
[[File:Silentnight.png|200px]]
[[File:Silentnight.png|200px]]


The code in [https://gist.github.com/arraytools/6d841de27bec48fa0b72559e9aeb13ad github].
The code in [https://gist.github.com/arraytools/6d841de27bec48fa0b72559e9aeb13ad github].


==== The Travelling Salesman Portrait ====
=== The Travelling Salesman Portrait ===
https://fronkonstin.com/2018/04/04/the-travelling-salesman-portrait/
https://fronkonstin.com/2018/04/04/the-travelling-salesman-portrait/


==== Moon phase calendar ====
=== Moon phase calendar ===
https://chichacha.netlify.com/2018/05/26/making-calendar-with-ggplot-moon-phase-calendar/
https://chichacha.netlify.com/2018/05/26/making-calendar-with-ggplot-moon-phase-calendar/


==== Calendar heatmap ====
=== Calendar heatmap ===
https://stackoverflow.com/questions/26171068/add-dates-to-calendar-heat-map-r
https://stackoverflow.com/questions/26171068/add-dates-to-calendar-heat-map-r


==== Chaos ====
=== Chaos ===
[https://fronkonstin.com/2019/01/10/rcpp-camaron-de-la-isla-and-the-beauty-of-maths/ Rcpp, Camarón de la Isla and the Beauty of Maths]
[https://fronkonstin.com/2019/01/10/rcpp-camaron-de-la-isla-and-the-beauty-of-maths/ Rcpp, Camarón de la Isla and the Beauty of Maths]


=== Google Analytics ===
== Google Analytics ==
==== GAR package ====
=== GAR package ===
http://www.analyticsforfun.com/2015/10/query-your-google-analytics-data-with.html
http://www.analyticsforfun.com/2015/10/query-your-google-analytics-data-with.html


=== Linear Programming ===
== Linear Programming ==
http://www.r-bloggers.com/modeling-and-solving-linear-programming-with-r-free-book/
http://www.r-bloggers.com/modeling-and-solving-linear-programming-with-r-free-book/


=== Amazon Alexa ===
== Amazon Alexa ==
* http://blagrants.blogspot.com/2016/02/theres-party-at-alexas-place.html
* http://blagrants.blogspot.com/2016/02/theres-party-at-alexas-place.html


=== R and Singularity ===
== R and Singularity ==
https://www.rstudio.com/rviews/2017/03/29/r-and-singularity/
https://www.rstudio.com/rviews/2017/03/29/r-and-singularity/


=== Teach kids about R with Minecraft ===
== Teach kids about R with Minecraft ==
http://blog.revolutionanalytics.com/2017/06/teach-kids-about-r-with-minecraft.html
http://blog.revolutionanalytics.com/2017/06/teach-kids-about-r-with-minecraft.html


=== Secure API keys ===
== Secure API keys ==
[http://blog.revolutionanalytics.com/2017/07/secret-package.html Securely store API keys in R scripts with the "secret" package]
[http://blog.revolutionanalytics.com/2017/07/secret-package.html Securely store API keys in R scripts with the "secret" package]


=== Vision and image recognition ===
== Vision and image recognition ==
* https://www.stoltzmaniac.com/google-vision-api-in-r-rooglevision/ Google vision API IN R] – RoogleVision
* https://www.stoltzmaniac.com/google-vision-api-in-r-rooglevision/ Google vision API IN R] – RoogleVision
* [http://www.bnosac.be/index.php/blog/66-computer-vision-algorithms-for-r-users Computer Vision Algorithms for R users] and https://github.com/bnosac/image
* [http://www.bnosac.be/index.php/blog/66-computer-vision-algorithms-for-r-users Computer Vision Algorithms for R users] and https://github.com/bnosac/image


=== Turn pictures into coloring pages ===
== Turn pictures into coloring pages ==
https://gist.github.com/jeroen/53a5f721cf81de2acba82ea47d0b19d0
https://gist.github.com/jeroen/53a5f721cf81de2acba82ea47d0b19d0


=== Numerical optimization ===
== Numerical optimization ==
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/uniroot.html uniroot]: One Dimensional Root (Zero) Finding. This is used in [http://onlinelibrary.wiley.com/doi/10.1002/sim.7178/full simulating survival data for predefined censoring rate]
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/uniroot.html uniroot]: One Dimensional Root (Zero) Finding. This is used in [http://onlinelibrary.wiley.com/doi/10.1002/sim.7178/full simulating survival data for predefined censoring rate]
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html optimize]: One Dimensional Optimization
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html optimize]: One Dimensional Optimization
Line 3,702: Line 3,702:
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/nls.html nls]: Nonlinear Least Squares
* [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/nls.html nls]: Nonlinear Least Squares


== R packages ==
= R packages =
[[R_packages|R packages]]
[[R_packages|R packages]]


== Tricks ==
= Tricks =


=== Getting help ===
== Getting help ==
* http://stackoverflow.com/questions/tagged/r and [https://stackoverflow.com/tags/r/info R page] contains resources.  
* http://stackoverflow.com/questions/tagged/r and [https://stackoverflow.com/tags/r/info R page] contains resources.  
* https://stat.ethz.ch/pipermail/r-help/
* https://stat.ethz.ch/pipermail/r-help/
* https://stat.ethz.ch/pipermail/r-devel/
* https://stat.ethz.ch/pipermail/r-devel/


=== Better Coder/coding, best practices ===
== Better Coder/coding, best practices ==
* http://www.mango-solutions.com/wp/2015/10/10-top-tips-for-becoming-a-better-coder/
* http://www.mango-solutions.com/wp/2015/10/10-top-tips-for-becoming-a-better-coder/
* [https://www.rstudio.com/rviews/2016/12/02/writing-good-r-code-and-writing-well/ Writing Good R Code and Writing Well]
* [https://www.rstudio.com/rviews/2016/12/02/writing-good-r-code-and-writing-well/ Writing Good R Code and Writing Well]
* [http://www.thertrader.com/2018/09/01/r-code-best-practices/ R Code – Best practices]
* [http://www.thertrader.com/2018/09/01/r-code-best-practices/ R Code – Best practices]


=== [https://en.wikipedia.org/wiki/Scientific_notation#E-notation E-notation] ===
== [https://en.wikipedia.org/wiki/Scientific_notation#E-notation E-notation] ==
6.022E23 (or 6.022e23) is equivalent to 6.022×10^23
6.022E23 (or 6.022e23) is equivalent to 6.022×10^23


=== Change default R repository ===
== Change default R repository ==
[[Rstudio#Change_repository|Change R repository]]
[[Rstudio#Change_repository|Change R repository]]


Line 3,740: Line 3,740:
</pre>
</pre>


=== Change the default web browser ===
== Change the default web browser ==
When I run help.start() function in LXLE, it cannot find its default web browser (seamonkey).
When I run help.start() function in LXLE, it cannot find its default web browser (seamonkey).
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 3,779: Line 3,779:
* https://stat.ethz.ch/R-manual/R-devel/library/utils/html/browseURL.html
* https://stat.ethz.ch/R-manual/R-devel/library/utils/html/browseURL.html


=== Getting user's home directory ===
== Getting user's home directory ==
See [https://cran.r-project.org/bin/windows/base/rw-FAQ.html#What-are-HOME-and-working-directories_003f What are HOME and working directories?]
See [https://cran.r-project.org/bin/windows/base/rw-FAQ.html#What-are-HOME-and-working-directories_003f What are HOME and working directories?]
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 3,798: Line 3,798:
</syntaxhighlight>
</syntaxhighlight>


=== Rprofile.site, Renviron.site (all platforms) and Rconsole (Windows only) ===
== Rprofile.site, Renviron.site (all platforms) and Rconsole (Windows only) ==
* https://cran.r-project.org/doc/manuals/r-release/R-admin.html ('''Rprofile.site'''). Put R statements.
* https://cran.r-project.org/doc/manuals/r-release/R-admin.html ('''Rprofile.site'''). Put R statements.
* https://cran.r-project.org/doc/manuals/r-release/R-exts.html  ('''Renviron.site'''). Define environment variables.
* https://cran.r-project.org/doc/manuals/r-release/R-exts.html  ('''Renviron.site'''). Define environment variables.
Line 3,962: Line 3,962:
</pre>
</pre>


==== What is the best place to save Rconsole on Windows platform ====
=== What is the best place to save Rconsole on Windows platform ===
Put/create the file <Rconsole> under ''C:/Users/USERNAME/Documents'' folder so no matter how R was upgraded/downgraded, it always find my preference.
Put/create the file <Rconsole> under ''C:/Users/USERNAME/Documents'' folder so no matter how R was upgraded/downgraded, it always find my preference.


Line 3,982: Line 3,982:
* editortext: black
* editortext: black


=== Saving and loading history automatically: .Rprofile & local() ===
== Saving and loading history automatically: .Rprofile & local() ==
* http://stat.ethz.ch/R-manual/R-patched/library/utils/html/savehistory.html
* http://stat.ethz.ch/R-manual/R-patched/library/utils/html/savehistory.html
* '''.Rprofile''' will automatically be loaded when R has started from that directory
* '''.Rprofile''' will automatically be loaded when R has started from that directory
Line 4,035: Line 4,035:
</pre>
</pre>


=== R release versions ===
== R release versions ==
[http://cran.r-project.org/web/packages/rversions/index.html rversions]: Query the main 'R' 'SVN' repository to find the released versions & dates.
[http://cran.r-project.org/web/packages/rversions/index.html rversions]: Query the main 'R' 'SVN' repository to find the released versions & dates.


=== Detect number of running R instances in Windows ===
== Detect number of running R instances in Windows ==
* http://stackoverflow.com/questions/15935931/detect-number-of-running-r-instances-in-windows-within-r
* http://stackoverflow.com/questions/15935931/detect-number-of-running-r-instances-in-windows-within-r
<pre>
<pre>
Line 4,047: Line 4,047:


Image Name                    PID Session Name        Session#    Mem Usage
Image Name                    PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
============================================================================
Rgui.exe                      1096 Console                    1    44,712 K
Rgui.exe                      1096 Console                    1    44,712 K


Line 4,053: Line 4,053:


Image Name                    PID Session Name        Session#    Mem Usage
Image Name                    PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
============================================================================
Rserve.exe                    6108 Console                    1    381,796 K
Rserve.exe                    6108 Console                    1    381,796 K
</pre>
</pre>
Line 4,061: Line 4,061:
[1] ""                                                                             
[1] ""                                                                             
[2] "Image Name                    PID Session Name        Session#    Mem Usage"
[2] "Image Name                    PID Session Name        Session#    Mem Usage"
[3] "========================= ======== ================ =========== ============"
[3] "============================================================================"
[4] "Rgui.exe                      1096 Console                    1    44,804 K"
[4] "Rgui.exe                      1096 Console                    1    44,804 K"


Line 4,067: Line 4,067:
</pre>
</pre>


=== Editor ===
== Editor ==
http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs
http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs


Line 4,076: Line 4,076:
* Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html
* Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html


=== GUI for Data Analysis ===
== GUI for Data Analysis ==


==== Rcmdr ====
=== Rcmdr ===
http://cran.r-project.org/web/packages/Rcmdr/index.html
http://cran.r-project.org/web/packages/Rcmdr/index.html


==== Deducer ====
=== Deducer ===
http://cran.r-project.org/web/packages/Deducer/index.html
http://cran.r-project.org/web/packages/Deducer/index.html


==== jamovi ====
=== jamovi ===
* https://www.jamovi.org/
* https://www.jamovi.org/
* [http://r4stats.com/2019/01/09/updated-review-jamovi/ Updated Review: jamovi User Interface to R]
* [http://r4stats.com/2019/01/09/updated-review-jamovi/ Updated Review: jamovi User Interface to R]


=== Scope ===
== Scope ==
See  
See  
* [http://cran.r-project.org/doc/manuals/R-intro.html#Assignment-within-functions Assignments within functions] in the '''An Introduction to R''' manual.
* [http://cran.r-project.org/doc/manuals/R-intro.html#Assignment-within-functions Assignments within functions] in the '''An Introduction to R''' manual.
Line 4,156: Line 4,156:
Example 3. [https://stackoverflow.com/questions/1169534/writing-functions-in-r-keeping-scoping-in-mind Writing functions in R, keeping scoping in mind]
Example 3. [https://stackoverflow.com/questions/1169534/writing-functions-in-r-keeping-scoping-in-mind Writing functions in R, keeping scoping in mind]


==== New environment ====
=== New environment ===
http://adv-r.had.co.nz/Environments.html
http://adv-r.had.co.nz/Environments.html


Line 4,166: Line 4,166:
</syntaxhighlight>
</syntaxhighlight>


=== Speedup R code ===
== Speedup R code ==
* [http://datascienceplus.com/strategies-to-speedup-r-code/ Strategies to speedup R code] from DataScience+
* [http://datascienceplus.com/strategies-to-speedup-r-code/ Strategies to speedup R code] from DataScience+


=== Profiler ===
== Profiler ==
(Video) [https://www.rstudio.com/resources/videos/understand-code-performance-with-the-profiler/ Understand Code Performance with the profiler]
(Video) [https://www.rstudio.com/resources/videos/understand-code-performance-with-the-profiler/ Understand Code Performance with the profiler]


=== && vs & ===
== && vs & ==
See https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/Logic.  
See https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/Logic.  


The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector.  
The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector.  


=== Vectorization ===
== Vectorization ==
* [https://en.wikipedia.org/wiki/Vectorization_%28mathematics%29 Vectorization (Mathematics)] from wikipedia
* [https://en.wikipedia.org/wiki/Vectorization_%28mathematics%29 Vectorization (Mathematics)] from wikipedia
* [https://en.wikipedia.org/wiki/Array_programming Array programming] from wikipedia
* [https://en.wikipedia.org/wiki/Array_programming Array programming] from wikipedia
Line 4,188: Line 4,188:
* [https://github.com/henrikbengtsson/matrixstats matrixStats]: Functions that Apply to Rows and Columns of Matrices (and to Vectors). E.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). [https://github.com/HenrikBengtsson/matrixStats/wiki/Benchmark-reports Benchmark reports].
* [https://github.com/henrikbengtsson/matrixstats matrixStats]: Functions that Apply to Rows and Columns of Matrices (and to Vectors). E.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). [https://github.com/HenrikBengtsson/matrixStats/wiki/Benchmark-reports Benchmark reports].


==== sapply vs vectorization ====
=== sapply vs vectorization ===
[http://theautomatic.net/2019/03/13/speed-test-sapply-vs-vectorization/ Speed test: sapply vs vectorization]
[http://theautomatic.net/2019/03/13/speed-test-sapply-vs-vectorization/ Speed test: sapply vs vectorization]


==== [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/split split()] and sapply() ====
=== [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/split split()] and sapply() ===
split() can be used to split a vector, columns or rows. See [https://stackoverflow.com/questions/3302356/how-to-split-a-data-frame How to split a data frame?]
split() can be used to split a vector, columns or rows. See [https://stackoverflow.com/questions/3302356/how-to-split-a-data-frame How to split a data frame?]
* Split rows of a data frame/matrix <syntaxhighlight lang='rsplus'>
* Split rows of a data frame/matrix <syntaxhighlight lang='rsplus'>
Line 4,223: Line 4,223:
</syntaxhighlight>
</syntaxhighlight>


==== Mean of duplicated columns: rowMeans ====
=== Mean of duplicated columns: rowMeans ===
* [https://stackoverflow.com/questions/35925529/reduce-columns-of-a-matrix-by-a-function-in-r Reduce columns of a matrix by a function in R]  <syntaxhighlight lang='rsplus'>
* [https://stackoverflow.com/questions/35925529/reduce-columns-of-a-matrix-by-a-function-in-r Reduce columns of a matrix by a function in R]  <syntaxhighlight lang='rsplus'>
x <- matrix(1:60, nr=10); x[1, 2:3] <- NA
x <- matrix(1:60, nr=10); x[1, 2:3] <- NA
Line 4,246: Line 4,246:
* [https://cran.r-project.org/web/packages/matrixStats/index.html matrixStats]: Functions that Apply to Rows and Columns of Matrices (and to Vectors)
* [https://cran.r-project.org/web/packages/matrixStats/index.html matrixStats]: Functions that Apply to Rows and Columns of Matrices (and to Vectors)


==== Mean of duplicated rows: colMeans and rowsum ====
=== Mean of duplicated rows: colMeans and rowsum ===
* colMeans(x, na.rm = FALSE, dims = 1)
* colMeans(x, na.rm = FALSE, dims = 1)
: <syntaxhighlight lang='rsplus'>
: <syntaxhighlight lang='rsplus'>
Line 4,319: Line 4,319:
</syntaxhighlight>
</syntaxhighlight>


=== Apply family ===
== Apply family ==
Vectorize, aggregate, apply, by, eapply, lapply, mapply, rapply, replicate, scale, sapply, split, tapply, and vapply.  
Vectorize, aggregate, apply, by, eapply, lapply, mapply, rapply, replicate, scale, sapply, split, tapply, and vapply.  


Line 4,352: Line 4,352:
* https://stat.ethz.ch/pipermail/r-help/2014-October/422455.html (has one example)
* https://stat.ethz.ch/pipermail/r-help/2014-October/422455.html (has one example)


==== Progress bar ====
=== Progress bar ===
[http://peter.solymos.org/code/2016/09/11/what-is-the-cost-of-a-progress-bar-in-r.html What is the cost of a progress bar in R?]
[http://peter.solymos.org/code/2016/09/11/what-is-the-cost-of-a-progress-bar-in-r.html What is the cost of a progress bar in R?]


The package 'pbapply' creates a text-mode progress bar - it works on any platforms. On Windows platform, check out [http://www.theanalystatlarge.com/for-loop-tracking-windows-progress-bar/ this post]. It uses  winProgressBar() and setWinProgressBar() functions.
The package 'pbapply' creates a text-mode progress bar - it works on any platforms. On Windows platform, check out [http://www.theanalystatlarge.com/for-loop-tracking-windows-progress-bar/ this post]. It uses  winProgressBar() and setWinProgressBar() functions.


==== lapply and its friends Map(), Reduce(), Filter() from the base package for manipulating lists ====
=== lapply and its friends Map(), Reduce(), Filter() from the base package for manipulating lists ===
* Examples of using lapply() + split() on a data frame. See [http://rollingyours.wordpress.com/category/r-programming-apply-lapply-tapply/ rollingyours.wordpress.com].
* Examples of using lapply() + split() on a data frame. See [http://rollingyours.wordpress.com/category/r-programming-apply-lapply-tapply/ rollingyours.wordpress.com].
* mapply() [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/mapply documentation]. [https://stackoverflow.com/questions/9519543/merge-two-lists-in-r Use mapply() to merge lists].
* mapply() [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/mapply documentation]. [https://stackoverflow.com/questions/9519543/merge-two-lists-in-r Use mapply() to merge lists].
Line 4,399: Line 4,399:
* [https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting/ Playing Map() and Reduce() in R – Subsetting] - using parallel and future packages. [https://statcompute.wordpress.com/2018/09/22/union-multiple-data-frames-with-different-column-names/ Union Multiple Data.Frames with Different Column Names]
* [https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting/ Playing Map() and Reduce() in R – Subsetting] - using parallel and future packages. [https://statcompute.wordpress.com/2018/09/22/union-multiple-data-frames-with-different-column-names/ Union Multiple Data.Frames with Different Column Names]


==== sapply & vapply ====
=== sapply & vapply ===
* [http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply This] discusses why '''vapply''' is safer and faster than sapply.
* [http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply This] discusses why '''vapply''' is safer and faster than sapply.
* [http://adv-r.had.co.nz/Functionals.html#functionals-loop Vector output: sapply and vapply] from Advanced R (Hadley Wickham).
* [http://adv-r.had.co.nz/Functionals.html#functionals-loop Vector output: sapply and vapply] from Advanced R (Hadley Wickham).
Line 4,407: Line 4,407:
See parallel::parSapply() for a parallel version of sapply(1:n, function(x)). We can this technique to speed up [https://github.com/SRTRdevhub/C_Statistic_Github/blob/master/Simulation_Demonstration.Rmd#L115 this example].
See parallel::parSapply() for a parallel version of sapply(1:n, function(x)). We can this technique to speed up [https://github.com/SRTRdevhub/C_Statistic_Github/blob/master/Simulation_Demonstration.Rmd#L115 this example].


==== rapply - recursive version of lapply ====
=== rapply - recursive version of lapply ===
* http://4dpiecharts.com/tag/recursive/
* http://4dpiecharts.com/tag/recursive/
* [https://github.com/wch/r-source/search?utf8=%E2%9C%93&q=rapply Search in R source code]. Mainly [https://github.com/wch/r-source/blob/trunk/src/library/stats/R/dendrogram.R r-source/src/library/stats/R/dendrogram.R].
* [https://github.com/wch/r-source/search?utf8=%E2%9C%93&q=rapply Search in R source code]. Mainly [https://github.com/wch/r-source/blob/trunk/src/library/stats/R/dendrogram.R r-source/src/library/stats/R/dendrogram.R].


==== replicate ====
=== replicate ===
https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r
https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 4,423: Line 4,423:
See [[#parallel_package|parSapply()]] for a parallel version of replicate().
See [[#parallel_package|parSapply()]] for a parallel version of replicate().


==== Vectorize ====
=== Vectorize ===
* [https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/Vectorize Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE)]: creates a function wrapper that vectorizes a scalar function. Its value is a list or vector or array. It calls '''mapply()'''. <syntaxhighlight lang='rsplus'>
* [https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/Vectorize Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE)]: creates a function wrapper that vectorizes a scalar function. Its value is a list or vector or array. It calls '''mapply()'''. <syntaxhighlight lang='rsplus'>
> rep(1:4, 4:1)
> rep(1:4, 4:1)
Line 4,463: Line 4,463:
</syntaxhighlight>
</syntaxhighlight>


=== plyr and dplyr packages ===
== plyr and dplyr packages ==
[https://peerj.com/collections/50-practicaldatascistats/ Practical Data Science for Stats - a PeerJ Collection]
[https://peerj.com/collections/50-practicaldatascistats/ Practical Data Science for Stats - a PeerJ Collection]


Line 4,485: Line 4,485:
* [http://rpubs.com/turnersd/plot-deseq-results-multipage-pdf DESeq2 post analysis] select(), gather(), arrange() and %>%.  
* [http://rpubs.com/turnersd/plot-deseq-results-multipage-pdf DESeq2 post analysis] select(), gather(), arrange() and %>%.  


==== [https://cran.r-project.org/web/packages/tibble/ tibble] ====
=== [https://cran.r-project.org/web/packages/tibble/ tibble] ===
'''Tibbles''' are data frames, but slightly tweaked to work better in the '''tidyverse'''.
'''Tibbles''' are data frames, but slightly tweaked to work better in the '''tidyverse'''.


Line 4,533: Line 4,533:
</syntaxhighlight>
</syntaxhighlight>


==== llply() ====
=== llply() ===
llply is equivalent to lapply except that it will preserve labels and can display a progress bar. This is handy if we want to do a crazy thing.
llply is equivalent to lapply except that it will preserve labels and can display a progress bar. This is handy if we want to do a crazy thing.
<pre>
<pre>
Line 4,544: Line 4,544:
returns a list of 49 GOs.
returns a list of 49 GOs.


==== ddply() ====
=== ddply() ===
http://lamages.blogspot.com/2012/06/transforming-subsets-of-data-in-r-with.html
http://lamages.blogspot.com/2012/06/transforming-subsets-of-data-in-r-with.html


==== ldply() ====
=== ldply() ===
[http://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication/ An R Script to Automatically download PubMed Citation Counts By Year of Publication]
[http://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication/ An R Script to Automatically download PubMed Citation Counts By Year of Publication]


=== Using R's set.seed() to set seeds for use in C/C++ (including Rcpp) ===
== Using R's set.seed() to set seeds for use in C/C++ (including Rcpp) ==
http://rorynolan.rbind.io/2018/09/30/rcsetseed/
http://rorynolan.rbind.io/2018/09/30/rcsetseed/


==== get_seed() ====
=== get_seed() ===
See the same blog  
See the same blog  
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 4,561: Line 4,561:
</syntaxhighlight>
</syntaxhighlight>


=== How to select a seed for simulation or randomization ===
== How to select a seed for simulation or randomization ==
[https://sciprincess.wordpress.com/2019/03/14/how-to-select-a-seed-for-simulation-or-randomization/ How to select a seed for simulation or randomization]
[https://sciprincess.wordpress.com/2019/03/14/how-to-select-a-seed-for-simulation-or-randomization/ How to select a seed for simulation or randomization]


=== set.seed(), for loop and saving random seeds ===
== set.seed(), for loop and saving random seeds ==
http://r.789695.n4.nabble.com/set-seed-and-for-loop-td3585857.html. This question is legitimate when we want to debug on a certain iteration.
http://r.789695.n4.nabble.com/set-seed-and-for-loop-td3585857.html. This question is legitimate when we want to debug on a certain iteration.


Line 4,585: Line 4,585:
* Petr Savicky's comment is also useful in the situation when it is not difficult to re-generate the data.
* Petr Savicky's comment is also useful in the situation when it is not difficult to re-generate the data.


=== sample() inaccurate on very large populations, fixed in R 3.6.0 ===
== sample() inaccurate on very large populations, fixed in R 3.6.0 ==
* [https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494 The default method for generating from a discrete uniform distribution (used in ‘sample()’, for instance) has been changed]. In prior versions, the probability of generating each integer could vary from equal by up to 0.04% (or possibly more if generating more than a million different integers). See also [https://www.r-bloggers.com/whats-new-in-r-3-6-0/amp/ What's new in R 3.6.0] by David Smith. <syntaxhighlight lang='rsplus'>
* [https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494 The default method for generating from a discrete uniform distribution (used in ‘sample()’, for instance) has been changed]. In prior versions, the probability of generating each integer could vary from equal by up to 0.04% (or possibly more if generating more than a million different integers). See also [https://www.r-bloggers.com/whats-new-in-r-3-6-0/amp/ What's new in R 3.6.0] by David Smith. <syntaxhighlight lang='rsplus'>
# R 3.5.3
# R 3.5.3
Line 4,621: Line 4,621:
</syntaxhighlight>
</syntaxhighlight>


=== Regular Expression ===
== Regular Expression ==
See [[Regular_expression|here]].
See [[Regular_expression|here]].


=== Read rrd file ===
== Read rrd file ==
* https://en.wikipedia.org/wiki/RRDtool
* https://en.wikipedia.org/wiki/RRDtool
* http://oss.oetiker.ch/rrdtool/
* http://oss.oetiker.ch/rrdtool/
Line 4,630: Line 4,630:
* http://plamendimitrov.net/blog/2014/08/09/r-package-for-working-with-rrd-files/
* http://plamendimitrov.net/blog/2014/08/09/r-package-for-working-with-rrd-files/


=== file, connection ===
== file, connection ==
* [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/cat cat()] and [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/scan scan()] (read data into a vector or list from the console or file)
* [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/cat cat()] and [https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/scan scan()] (read data into a vector or list from the console or file)
* read() and write()
* read() and write()
Line 4,645: Line 4,645:
</syntaxhighlight>
</syntaxhighlight>


=== Clipboard (?connections), textConnection(), pipe() ===
== Clipboard (?connections), textConnection(), pipe() ==
* On Windows, we can use readClipboard() and writeClipboard(). <syntaxhighlight lang='rsplus'>
* On Windows, we can use readClipboard() and writeClipboard(). <syntaxhighlight lang='rsplus'>
source("clipboard")
source("clipboard")
Line 4,669: Line 4,669:
</syntaxhighlight>
</syntaxhighlight>


=== read/manipulate binary data ===
== read/manipulate binary data ==
* x <- readBin(fn, raw(), file.info(fn)$size)
* x <- readBin(fn, raw(), file.info(fn)$size)
* rawToChar(x[1:16])
* rawToChar(x[1:16])
* See Biostrings C API
* See Biostrings C API


=== String Manipulation ===
== String Manipulation ==
* [http://gastonsanchez.com/blog/resources/how-to/2013/09/22/Handling-and-Processing-Strings-in-R.html ebook] by Gaston Sanchez.
* [http://gastonsanchez.com/blog/resources/how-to/2013/09/22/Handling-and-Processing-Strings-in-R.html ebook] by Gaston Sanchez.
* [http://blog.revolutionanalytics.com/2018/06/handling-strings-with-r.html A guide to working with character data in R] (6/22/2018)
* [http://blog.revolutionanalytics.com/2018/06/handling-strings-with-r.html A guide to working with character data in R] (6/22/2018)
Line 4,683: Line 4,683:
* [http://theautomatic.net/2019/05/17/four-ways-to-reverse-a-string-in-r/ Four ways to reverse a string in R]
* [http://theautomatic.net/2019/05/17/four-ways-to-reverse-a-string-in-r/ Four ways to reverse a string in R]


=== HTTPs connection ===  
== HTTPs connection ==  
HTTPS connection becomes default in R 3.2.2. See  
HTTPS connection becomes default in R 3.2.2. See  
* http://blog.rstudio.org/2015/08/17/secure-https-connections-for-r/  
* http://blog.rstudio.org/2015/08/17/secure-https-connections-for-r/  
Line 4,690: Line 4,690:
[http://developer.r-project.org/blosxom.cgi/R-devel/2016/12/15#n2016-12-15 R 3.3.2 patched] The internal methods of ‘download.file()’ and ‘url()’ now report if they are unable to follow the redirection of a ‘http://’ URL to a ‘https://’ URL (rather than failing silently)
[http://developer.r-project.org/blosxom.cgi/R-devel/2016/12/15#n2016-12-15 R 3.3.2 patched] The internal methods of ‘download.file()’ and ‘url()’ now report if they are unable to follow the redirection of a ‘http://’ URL to a ‘https://’ URL (rather than failing silently)


=== setInternet2 ===
== setInternet2 ==
There was a bug in ftp downloading in R 3.2.2 (r69053) Windows though it is fixed now in R 3.2 patch.
There was a bug in ftp downloading in R 3.2.2 (r69053) Windows though it is fixed now in R 3.2 patch.


Line 4,734: Line 4,734:
Note that setInternet2.Rd says As from \R 3.3.0 it changes nothing, and only \code{use = NA} is accepted. Also NEWS.Rd says setInternet2() has no effect and will be removed in due course.
Note that setInternet2.Rd says As from \R 3.3.0 it changes nothing, and only \code{use = NA} is accepted. Also NEWS.Rd says setInternet2() has no effect and will be removed in due course.


=== File operation ===
== File operation ==
* list.files()
* list.files()
* file.info()
* file.info()
Line 4,741: Line 4,741:
* file.copy()
* file.copy()


=== read/download/source a file from internet ===
== read/download/source a file from internet ==
==== Simple text file http ====
=== Simple text file http ===
<pre>
<pre>
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)
</pre>
</pre>


==== Zip file and url() function ====
=== Zip file and url() function ===
<pre>
<pre>
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
Line 4,760: Line 4,760:
</pre>
</pre>


==== [http://cran.r-project.org/web/packages/downloader/index.html downloader] package ====
=== [http://cran.r-project.org/web/packages/downloader/index.html downloader] package ===
This package provides a wrapper for the download.file function, making it possible to download files over https on Windows, Mac OS X, and other Unix-like platforms. The RCurl package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.
This package provides a wrapper for the download.file function, making it possible to download files over https on Windows, Mac OS X, and other Unix-like platforms. The RCurl package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.


==== Google drive file based on https using [http://www.omegahat.org/RCurl/FAQ.html RCurl] package ====
=== Google drive file based on https using [http://www.omegahat.org/RCurl/FAQ.html RCurl] package ===
<pre>
<pre>
require(RCurl)
require(RCurl)
Line 4,770: Line 4,770:
</pre>
</pre>


==== Google sheet file using [https://github.com/jennybc/googlesheets googlesheets] package ====
=== Google sheet file using [https://github.com/jennybc/googlesheets googlesheets] package ===
[http://www.opiniomics.org/reading-data-from-google-sheets-into-r/ Reading data from google sheets into R]
[http://www.opiniomics.org/reading-data-from-google-sheets-into-r/ Reading data from google sheets into R]


==== Github files https using RCurl package ====
=== Github files https using RCurl package ===
* http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
* http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy
* http://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
* http://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
Line 4,783: Line 4,783:
* [http://cran.r-project.org/web/packages/gistr/index.html gistr] package
* [http://cran.r-project.org/web/packages/gistr/index.html gistr] package


=== summarytools: create summary tables for vectors and data frames ===
== summarytools: create summary tables for vectors and data frames ==
https://github.com/dcomtois/summarytools. R Package for quickly and neatly summarizing vectors and data frames.
https://github.com/dcomtois/summarytools. R Package for quickly and neatly summarizing vectors and data frames.


=== Create publication tables using '''tables''' package ===
== Create publication tables using '''tables''' package ==
See p13 for example at [http://www.ianwatson.com.au/stata/tabout_tutorial.pdf#page=13 here]
See p13 for example at [http://www.ianwatson.com.au/stata/tabout_tutorial.pdf#page=13 here]


Line 4,830: Line 4,830:
</syntaxhighlight>
</syntaxhighlight>


=== ClinReport: Statistical Reporting in Clinical Trials ===
== ClinReport: Statistical Reporting in Clinical Trials ==
https://cran.r-project.org/web/packages/ClinReport/index.html
https://cran.r-project.org/web/packages/ClinReport/index.html


=== Append figures to PDF files ===
== Append figures to PDF files ==
[https://stackoverflow.com/a/13274272 How to append a plot to an existing pdf file]. Hint: use the recordPlot() function.
[https://stackoverflow.com/a/13274272 How to append a plot to an existing pdf file]. Hint: use the recordPlot() function.


=== Extracting tables from PDFs ===  
== Extracting tables from PDFs ==  
* [http://datascienceplus.com/extracting-tables-from-pdfs-in-r-using-the-tabulizer-package/ extracting Tables from PDFs in R] using Tabulizer. This needs the [https://cran.r-project.org/web/packages/rJava/index.html rJava] package. Linux works fine. Some issue came out on my macOS 10.12 Sierra. '''Library not loaded: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/libjvm.dylib. Referenced from: /Users/XXXXXXX/Library/R/3.5/library/rJava/libs/rJava.so'''.  
* [http://datascienceplus.com/extracting-tables-from-pdfs-in-r-using-the-tabulizer-package/ extracting Tables from PDFs in R] using Tabulizer. This needs the [https://cran.r-project.org/web/packages/rJava/index.html rJava] package. Linux works fine. Some issue came out on my macOS 10.12 Sierra. '''Library not loaded: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/libjvm.dylib. Referenced from: /Users/XXXXXXX/Library/R/3.5/library/rJava/libs/rJava.so'''.  
* [https://cran.r-project.org/web/packages/pdftools/ pdftools] - Text Extraction, Rendering and Converting of PDF Documents. [https://ropensci.org/technotes/2018/12/14/pdftools-20/ pdf_text() and pdf_data()] functions. <syntaxhighlight lang='rsplus'>
* [https://cran.r-project.org/web/packages/pdftools/ pdftools] - Text Extraction, Rendering and Converting of PDF Documents. [https://ropensci.org/technotes/2018/12/14/pdftools-20/ pdf_text() and pdf_data()] functions. <syntaxhighlight lang='rsplus'>
Line 4,853: Line 4,853:
</syntaxhighlight>
</syntaxhighlight>


=== Create flat tables in R console using ftable() ===
== Create flat tables in R console using ftable() ==
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
> ftable(Titanic, row.vars = 1:3)
> ftable(Titanic, row.vars = 1:3)
Line 4,945: Line 4,945:
</syntaxhighlight>
</syntaxhighlight>


==== [https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/addmargins addmargins] ====
=== [https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/addmargins addmargins] ===
Puts Arbitrary Margins On Multidimensional Tables Or Arrays
Puts Arbitrary Margins On Multidimensional Tables Or Arrays


=== tracemem, data type, copy ===
== tracemem, data type, copy ==
[http://stackoverflow.com/questions/18359940/r-programming-vector-a1-2-avoid-copying-the-whole-vector/18361181#18361181 How to avoid copying a long vector]
[http://stackoverflow.com/questions/18359940/r-programming-vector-a1-2-avoid-copying-the-whole-vector/18361181#18361181 How to avoid copying a long vector]


=== Tell if the current R is running in 32-bit or 64-bit mode ===
== Tell if the current R is running in 32-bit or 64-bit mode ==
<pre>
<pre>
8 * .Machine$sizeof.pointer
8 * .Machine$sizeof.pointer
Line 4,957: Line 4,957:
where '''sizeof.pointer''' returns the number of *bytes* in a C SEXP type and '8' means number of bits per byte.
where '''sizeof.pointer''' returns the number of *bytes* in a C SEXP type and '8' means number of bits per byte.


=== 32- and 64-bit ===
== 32- and 64-bit ==
See [http://cran.r-project.org/doc/manuals/R-admin.html#Choosing-between-32_002d-and-64_002dbit-builds R-admin.html].
See [http://cran.r-project.org/doc/manuals/R-admin.html#Choosing-between-32_002d-and-64_002dbit-builds R-admin.html].
* For speed you may want to use a 32-bit build, but to handle large datasets a 64-bit build.
* For speed you may want to use a 32-bit build, but to handle large datasets a 64-bit build.
Line 4,963: Line 4,963:
* Since R 2.15.0, it is possible to select '64-bit Files' from the standard installer even on a 32-bit version of Windows (2012/3/30).
* Since R 2.15.0, it is possible to select '64-bit Files' from the standard installer even on a 32-bit version of Windows (2012/3/30).


=== Handling length 2^31 and more in R 3.0.0 ===
== Handling length 2^31 and more in R 3.0.0 ==


From R News for 3.0.0 release:
From R News for 3.0.0 release:
Line 4,997: Line 4,997:
# For large dataset, it makes sense to use database or specially crafted packages like [http://cran.r-project.org/web/packages/bigmemory/ bigmemory] or [http://cran.r-project.org/web/packages/ff/ ff] or [https://privefl.github.io/bigstatsr/ bigstatsr].
# For large dataset, it makes sense to use database or specially crafted packages like [http://cran.r-project.org/web/packages/bigmemory/ bigmemory] or [http://cran.r-project.org/web/packages/ff/ ff] or [https://privefl.github.io/bigstatsr/ bigstatsr].


=== NA in index ===
== NA in index ==
* Question: what is seq(1, 3)[c(1, 2, NA)]?
* Question: what is seq(1, 3)[c(1, 2, NA)]?


Line 5,024: Line 5,024:
Don't just use x != "" OR !is.na(x).
Don't just use x != "" OR !is.na(x).


=== Constant and 'L' ===
== Constant and 'L' ==
Add 'L' after a constant. For example,
Add 'L' after a constant. For example,
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 5,036: Line 5,036:
</syntaxhighlight>
</syntaxhighlight>


=== Factor ===
== Factor ==
We can specify the factor levels and new labels using the factor() function.
We can specify the factor levels and new labels using the factor() function.


Line 5,045: Line 5,045:
</syntaxhighlight>
</syntaxhighlight>


=== Data frame ===
== Data frame ==
* http://blog.datacamp.com/15-easy-solutions-data-frame-problems-r/
* http://blog.datacamp.com/15-easy-solutions-data-frame-problems-r/


==== stringsAsFactors = FALSE ====
=== stringsAsFactors = FALSE ===
http://www.win-vector.com/blog/2018/03/r-tip-use-stringsasfactors-false/
http://www.win-vector.com/blog/2018/03/r-tip-use-stringsasfactors-false/


==== Convert data frame factor columns to characters ====
=== Convert data frame factor columns to characters ===
[https://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters Convert data.frame columns from factors to characters]
[https://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters Convert data.frame columns from factors to characters]
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 5,061: Line 5,061:
</syntaxhighlight>
</syntaxhighlight>


==== data.frame to vector ====
=== data.frame to vector ===
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
> a= matrix(1:6, 2,3)
> a= matrix(1:6, 2,3)
Line 5,075: Line 5,075:
</syntaxhighlight>
</syntaxhighlight>


==== merge ====
=== merge ===
[https://jozefhajnala.gitlab.io/r/r006-merge/ How to perform merges (joins) on two or more data frames with base R, tidyverse and data.table]
[https://jozefhajnala.gitlab.io/r/r006-merge/ How to perform merges (joins) on two or more data frames with base R, tidyverse and data.table]


==== matrix vs data.frame ====
=== matrix vs data.frame ===
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
ip1 <- installed.packages()[,c(1,3:4)] # class(ip1) = 'matrix'
ip1 <- installed.packages()[,c(1,3:4)] # class(ip1) = 'matrix'
Line 5,108: Line 5,108:
So the length of a data frame is the number of columns. When we use sapply() function on a data frame, it will apply to each column of the data frame.
So the length of a data frame is the number of columns. When we use sapply() function on a data frame, it will apply to each column of the data frame.


==== Convert a matrix (not data frame) of characters to numeric ====
=== Convert a matrix (not data frame) of characters to numeric ===
[https://stackoverflow.com/a/20791975 Just change the mode of the object]
[https://stackoverflow.com/a/20791975 Just change the mode of the object]
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 5,126: Line 5,126:
</syntaxhighlight>
</syntaxhighlight>


==== Select Data Frame Columns in R ====
=== Select Data Frame Columns in R ===
This is part of series of DATA MANIPULATION IN R from [https://www.datanovia.com/en/lessons/select-data-frame-columns-in-r/ datanovia.com]
This is part of series of DATA MANIPULATION IN R from [https://www.datanovia.com/en/lessons/select-data-frame-columns-in-r/ datanovia.com]


Line 5,134: Line 5,134:
* Helper functions - starts_with(), ends_with(), contains(), matches(), one_of(): Select columns/variables based on their names
* Helper functions - starts_with(), ends_with(), contains(), matches(), one_of(): Select columns/variables based on their names


==== Creating data frame using structure() function ====
=== Creating data frame using structure() function ===
[https://tomaztsql.wordpress.com/2019/05/27/creating-data-frame-using-structure-function-in-r/ Creating data frame using structure() function in R]
[https://tomaztsql.wordpress.com/2019/05/27/creating-data-frame-using-structure-function-in-r/ Creating data frame using structure() function in R]


==== Warning: row names were found from a short variable and have been discarded ====
=== Warning: row names were found from a short variable and have been discarded ===
https://stackoverflow.com/a/23534617
https://stackoverflow.com/a/23534617


Line 5,158: Line 5,158:
</syntaxhighlight>
</syntaxhighlight>


=== matrix (column-major order) multiply a vector ===
== matrix (column-major order) multiply a vector ==
* [https://en.wikipedia.org/wiki/Row-_and_column-major_order#Programming_languages_and_libraries R (like Fortran) is following the column-major order]
* [https://en.wikipedia.org/wiki/Row-_and_column-major_order#Programming_languages_and_libraries R (like Fortran) is following the column-major order]


Line 5,179: Line 5,179:
</syntaxhighlight>
</syntaxhighlight>


=== Print a vector by suppressing names ===
== Print a vector by suppressing names ==
Use '''unname'''.
Use '''unname'''.


=== format.pval ===
== format.pval ==
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
> args(format.pval)
> args(format.pval)
Line 5,194: Line 5,194:
</syntaxhighlight>
</syntaxhighlight>


=== options(digits) ===
== options(digits) ==
* [https://stackoverflow.com/a/2288013 Controlling number of decimal digits in print output in R]
* [https://stackoverflow.com/a/2288013 Controlling number of decimal digits in print output in R]
* [https://stackoverflow.com/a/10712012 ?print.default]
* [https://stackoverflow.com/a/10712012 ?print.default]
Line 5,217: Line 5,217:
</syntaxhighlight>
</syntaxhighlight>


=== [https://stackoverflow.com/questions/5352099/how-to-disable-scientific-notation Disable scientific notation in printing]: options(scipen) ===
== [https://stackoverflow.com/questions/5352099/how-to-disable-scientific-notation Disable scientific notation in printing]: options(scipen) ==
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
> numer = 29707; denom = 93874
> numer = 29707; denom = 93874
Line 5,237: Line 5,237:
</syntaxhighlight>
</syntaxhighlight>


=== sprintf ===
== sprintf ==
==== Format number as fixed width, with leading zeros ====
=== Format number as fixed width, with leading zeros ===
* https://stackoverflow.com/questions/8266915/format-number-as-fixed-width-with-leading-zeros
* https://stackoverflow.com/questions/8266915/format-number-as-fixed-width-with-leading-zeros
* https://stackoverflow.com/questions/14409084/pad-with-leading-zeros-to-common-width?rq=1
* https://stackoverflow.com/questions/14409084/pad-with-leading-zeros-to-common-width?rq=1
Line 5,253: Line 5,253:
</syntaxhighlight>
</syntaxhighlight>


==== sprintf does not print ====
=== sprintf does not print ===
Use cat() or print() outside sprintf(). sprintf() do not print in a non interactive mode.
Use cat() or print() outside sprintf(). sprintf() do not print in a non interactive mode.
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 5,259: Line 5,259:
</syntaxhighlight>
</syntaxhighlight>


=== Creating publication quality graphs in R ===
== Creating publication quality graphs in R ==
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/


=== HDF5 : Hierarchical Data Format===
== HDF5 : Hierarchical Data Format==
HDF5 is an open binary file format for storing and managing large, complex datasets. The file format was developed by the HDF Group, and is widely used in scientific computing.
HDF5 is an open binary file format for storing and managing large, complex datasets. The file format was developed by the HDF Group, and is widely used in scientific computing.


Line 5,320: Line 5,320:
</syntaxhighlight>
</syntaxhighlight>


=== Formats for writing/saving and sharing data ===
== Formats for writing/saving and sharing data ==
[http://www.econometricsbysimulation.com/2016/12/efficiently-saving-and-sharing-data-in-r_46.html Efficiently Saving and Sharing Data in R]
[http://www.econometricsbysimulation.com/2016/12/efficiently-saving-and-sharing-data-in-r_46.html Efficiently Saving and Sharing Data in R]


=== Write unix format files on Windows and vice versa ===
== Write unix format files on Windows and vice versa ==
https://stat.ethz.ch/pipermail/r-devel/2012-April/063931.html
https://stat.ethz.ch/pipermail/r-devel/2012-April/063931.html


=== with() and within() functions ===
== with() and within() functions ==
within() is similar to with() except it is used to create new columns and merge them with the original data sets. See [http://www.youtube.com/watch?v=pZ6Bnxg9E8w&list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP youtube video].
within() is similar to with() except it is used to create new columns and merge them with the original data sets. See [http://www.youtube.com/watch?v=pZ6Bnxg9E8w&list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP youtube video].
<pre>
<pre>
Line 5,346: Line 5,346:
</pre>
</pre>


=== stem(): stem-and-leaf plot, bar chart on terminals ===
== stem(): stem-and-leaf plot, bar chart on terminals ==
* https://en.wikipedia.org/wiki/Stem-and-leaf_display
* https://en.wikipedia.org/wiki/Stem-and-leaf_display
* https://stackoverflow.com/questions/14736556/ascii-plotting-functions-for-r
* https://stackoverflow.com/questions/14736556/ascii-plotting-functions-for-r
* [https://cran.r-project.org/web/packages/txtplot/index.html txtplot] package
* [https://cran.r-project.org/web/packages/txtplot/index.html txtplot] package


=== Graphical Parameters, Axes and Text, Combining Plots ===
== Graphical Parameters, Axes and Text, Combining Plots ==
[http://www.statmethods.net/advgraphs/axes.html statmethods.net]
[http://www.statmethods.net/advgraphs/axes.html statmethods.net]


=== 15 Questions All R Users Have About Plots ===
== 15 Questions All R Users Have About Plots ==
See http://blog.datacamp.com/15-questions-about-r-plots/. This is a tremendous post. It covers the built-in plot() function and ggplot() from ggplot2 package.
See http://blog.datacamp.com/15-questions-about-r-plots/. This is a tremendous post. It covers the built-in plot() function and ggplot() from ggplot2 package.


Line 5,373: Line 5,373:
# What Is The Function Of hjust And vjust In ggplot2?
# What Is The Function Of hjust And vjust In ggplot2?


=== jitter function ===
== jitter function ==
* https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/jitter
* https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/jitter
* [https://statistical-programming.com/jitter-r-function-example/ The jitter R Function] 3 Example Codes (Basic Application & Boxplot Visualization)
* [https://statistical-programming.com/jitter-r-function-example/ The jitter R Function] 3 Example Codes (Basic Application & Boxplot Visualization)
Line 5,379: Line 5,379:
* [https://stats.stackexchange.com/a/146174 How to calculate Area Under the Curve (AUC), or the c-statistic, by hand]
* [https://stats.stackexchange.com/a/146174 How to calculate Area Under the Curve (AUC), or the c-statistic, by hand]


=== Scatterplot with the "rug" function ===
== Scatterplot with the "rug" function ==
<pre>
<pre>
require(stats)  # both 'density' and its default method
require(stats)  # both 'density' and its default method
Line 5,392: Line 5,392:
See also the [https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/stripchart.html stripchart()] function which produces one dimensional scatter plots (or dot plots) of the given data.
See also the [https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/stripchart.html stripchart()] function which produces one dimensional scatter plots (or dot plots) of the given data.


=== Identify/Locate Points in a Scatter Plot ===
== Identify/Locate Points in a Scatter Plot ==
[https://www.rdocumentation.org/packages/graphics/versions/3.5.1/topics/identify ?identify]
[https://www.rdocumentation.org/packages/graphics/versions/3.5.1/topics/identify ?identify]


=== Draw a single plot with two different y-axes ===
== Draw a single plot with two different y-axes ==
* http://www.gettinggeneticsdone.com/2015/04/r-single-plot-with-two-different-y-axes.html
* http://www.gettinggeneticsdone.com/2015/04/r-single-plot-with-two-different-y-axes.html


=== Draw Color Palette ===
== Draw Color Palette ==
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/
* http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/


=== SVG ===
== SVG ==
==== Embed svg in html ====
=== Embed svg in html ===
* http://www.magesblog.com/2016/02/using-svg-graphics-in-blog-posts.html
* http://www.magesblog.com/2016/02/using-svg-graphics-in-blog-posts.html


==== svglite ====
=== svglite ===
https://blog.rstudio.org/2016/11/14/svglite-1-2-0/
https://blog.rstudio.org/2016/11/14/svglite-1-2-0/


==== pdf -> svg ====
=== pdf -> svg ===
Using Inkscape. See [https://robertgrantstats.wordpress.com/2017/09/07/svg-from-stats-software-the-good-the-bad-and-the-ugly/ this post].
Using Inkscape. See [https://robertgrantstats.wordpress.com/2017/09/07/svg-from-stats-software-the-good-the-bad-and-the-ugly/ this post].


=== read.table ===
== read.table ==
==== clipboard ====
=== clipboard ===
<syntaxhighlight lang="rsplus">
<syntaxhighlight lang="rsplus">
source("clipboard")
source("clipboard")
Line 5,418: Line 5,418:
</syntaxhighlight>
</syntaxhighlight>


==== inline text ====
=== inline text ===
<syntaxhighlight lang="rsplus">
<syntaxhighlight lang="rsplus">
mydf <- read.table(header=T, text='
mydf <- read.table(header=T, text='
Line 5,428: Line 5,428:
</syntaxhighlight>
</syntaxhighlight>


==== http(s) connection ====
=== http(s) connection ===
<syntaxhighlight lang="rsplus">
<syntaxhighlight lang="rsplus">
temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt",  
temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt",  
Line 5,435: Line 5,435:
</syntaxhighlight>
</syntaxhighlight>


==== read only specific columns ====
=== read only specific columns ===
Use 'colClasses' option in read.table, read.delim, .... For example, the following example reads only the 3rd column of the text file and also changes its data type from a data frame to a vector. Note that we have include double quotes around NULL.
Use 'colClasses' option in read.table, read.delim, .... For example, the following example reads only the 3rd column of the text file and also changes its data type from a data frame to a vector. Note that we have include double quotes around NULL.
<syntaxhighlight lang="rsplus">
<syntaxhighlight lang="rsplus">
Line 5,453: Line 5,453:
Another method is to use '''pipe()''', '''cut''' or '''awk'''. See [https://stackoverflow.com/questions/2193742/ways-to-read-only-select-columns-from-a-file-into-r-a-happy-medium-between-re ways to read only selected columns from a file into R]
Another method is to use '''pipe()''', '''cut''' or '''awk'''. See [https://stackoverflow.com/questions/2193742/ways-to-read-only-select-columns-from-a-file-into-r-a-happy-medium-between-re ways to read only selected columns from a file into R]


=== Serialization ===
== Serialization ==
If we want to pass an R object to C (use recv() function), we can use writeBin() to output the stream size and then use serialize() function to output the stream to a file. See the  
If we want to pass an R object to C (use recv() function), we can use writeBin() to output the stream size and then use serialize() function to output the stream to a file. See the  
[https://stat.ethz.ch/pipermail/r-devel/attachments/20130628/56473803/attachment.pl post] on R mailing list.
[https://stat.ethz.ch/pipermail/r-devel/attachments/20130628/56473803/attachment.pl post] on R mailing list.
Line 5,468: Line 5,468:
then read <length> bytes from the connection.
then read <length> bytes from the connection.


=== socketConnection ===
== socketConnection ==
See ?socketconnection.  
See ?socketconnection.  


==== Simple example ====
=== Simple example ===
from the socketConnection's manual.
from the socketConnection's manual.


Line 5,494: Line 5,494:
</pre>
</pre>


==== Use nc in client ====
=== Use nc in client ===


The client does not have to be the R. We can use telnet, nc, etc. See the post [https://stat.ethz.ch/pipermail/r-sig-hpc/2009-April/000144.html here]. For example, on the client machine, we can issue
The client does not have to be the R. We can use telnet, nc, etc. See the post [https://stat.ethz.ch/pipermail/r-sig-hpc/2009-April/000144.html here]. For example, on the client machine, we can issue
Line 5,510: Line 5,510:
Some post I don't have a chance to read. http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html
Some post I don't have a chance to read. http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html


==== Use curl command in client ====
=== Use curl command in client ===
On the server,
On the server,
<pre>
<pre>
Line 5,528: Line 5,528:
</pre>
</pre>


==== Use telnet command in client ====
=== Use telnet command in client ===
On the server,
On the server,
<pre>
<pre>
Line 5,553: Line 5,553:
Some [http://blog.gahooa.com/2009/01/23/basics-of-telnet-and-http/ tutorial] about using telnet on http request. And [http://unixhelp.ed.ac.uk/tables/telnet_commands.html this] is a summary of using telnet.
Some [http://blog.gahooa.com/2009/01/23/basics-of-telnet-and-http/ tutorial] about using telnet on http request. And [http://unixhelp.ed.ac.uk/tables/telnet_commands.html this] is a summary of using telnet.


=== Subsetting ===
== Subsetting ==
[http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Subset-assignment Subset assignment of R Language Definition] and [http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Manipulation-of-functions Manipulation of functions].
[http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Subset-assignment Subset assignment of R Language Definition] and [http://lib.stat.cmu.edu/R/CRAN/doc/manuals/R-lang.html#Manipulation-of-functions Manipulation of functions].


Line 5,563: Line 5,563:
</pre>
</pre>


==== Avoid Coercing Indices To Doubles ====
=== Avoid Coercing Indices To Doubles ===
[https://www.jottr.org/2018/04/02/coercion-of-indices/ 1 or 1L]
[https://www.jottr.org/2018/04/02/coercion-of-indices/ 1 or 1L]


=== as.formula() ===
== as.formula() ==
* [https://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame How to succinctly write a formula with many variables from a data frame?]
* [https://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame How to succinctly write a formula with many variables from a data frame?]
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 5,624: Line 5,624:
</syntaxhighlight>
</syntaxhighlight>


=== S3 and S4 methods ===
== S3 and S4 methods ==
* How S4 works in R https://www.rdocumentation.org/packages/methods/versions/3.5.1/topics/Methods_Details
* How S4 works in R https://www.rdocumentation.org/packages/methods/versions/3.5.1/topics/Methods_Details
* Software for Data Analysis: Programming with R by John Chambers
* Software for Data Analysis: Programming with R by John Chambers
Line 5,688: Line 5,688:
</syntaxhighlight>
</syntaxhighlight>


==== See what methods work on an object ====
=== See what methods work on an object ===
see what methods work on an object, e.g. a GRanges object:  
see what methods work on an object, e.g. a GRanges object:  
<syntaxhighlight lang='rsplus'>methods(class="GRanges")</syntaxhighlight> Or if you have an object, x: <syntaxhighlight lang='rsplus'>methods(class=class(x))</syntaxhighlight>  
<syntaxhighlight lang='rsplus'>methods(class="GRanges")</syntaxhighlight> Or if you have an object, x: <syntaxhighlight lang='rsplus'>methods(class=class(x))</syntaxhighlight>  


==== View S3 function definition: double colon '::' and triple colon ':::' operators ====
=== View S3 function definition: double colon '::' and triple colon ':::' operators ===
?":::"
?":::"


Line 5,703: Line 5,703:
</syntaxhighlight>
</syntaxhighlight>


==== Read the source code (include Fortran/C, S3 and S4 methods) ====
=== Read the source code (include Fortran/C, S3 and S4 methods) ===
* [https://github.com/jimhester/lookup#readme lookup] package
* [https://github.com/jimhester/lookup#readme lookup] package
* [https://blog.r-hub.io/2019/05/14/read-the-source/ Read the source]
* [https://blog.r-hub.io/2019/05/14/read-the-source/ Read the source]


==== mcols() and DataFrame() from Bioc [http://bioconductor.org/packages/release/bioc/html/S4Vectors.html S4Vectors] package ====
=== mcols() and DataFrame() from Bioc [http://bioconductor.org/packages/release/bioc/html/S4Vectors.html S4Vectors] package ===
* mcols: Get or set the metadata columns.
* mcols: Get or set the metadata columns.
* colData: SummarizedExperiment instances from GenomicRanges
* colData: SummarizedExperiment instances from GenomicRanges
Line 5,734: Line 5,734:
</syntaxhighlight>
</syntaxhighlight>


=== findInterval() ===
== findInterval() ==
Related functions are cuts() and split(). See also
Related functions are cuts() and split(). See also
* [http://books.google.com/books?id=oKY5QeSWb4cC&pg=PT310&lpg=PT310&dq=r+findinterval3&source=bl&ots=YjNMkHrTMw&sig=y_wIA1um420xVCI5IoGivABge-s&hl=en&sa=X&ei=gm_yUrSqLKXesAS2_IGoBQ&ved=0CFIQ6AEwBTgo#v=onepage&q=r%20findinterval3&f=false R Graphs Cookbook]
* [http://books.google.com/books?id=oKY5QeSWb4cC&pg=PT310&lpg=PT310&dq=r+findinterval3&source=bl&ots=YjNMkHrTMw&sig=y_wIA1um420xVCI5IoGivABge-s&hl=en&sa=X&ei=gm_yUrSqLKXesAS2_IGoBQ&ved=0CFIQ6AEwBTgo#v=onepage&q=r%20findinterval3&f=false R Graphs Cookbook]
* [http://adv-r.had.co.nz/Rcpp.html Hadley Wickham]
* [http://adv-r.had.co.nz/Rcpp.html Hadley Wickham]


=== order(), rank() and sort() ===
== order(), rank() and sort() ==
If we want to find the indices of the first 25 genes with the smallest p-values, we can use '''order(pval)[1:25]'''.
If we want to find the indices of the first 25 genes with the smallest p-values, we can use '''order(pval)[1:25]'''.
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 5,758: Line 5,758:
</syntaxhighlight>
</syntaxhighlight>


=== do.call, rbind, lapply ===
== do.call, rbind, lapply ==
Lots of examples. See for example [https://stat.ethz.ch/pipermail/r-help/attachments/20140423/62d8d103/attachment.pl this one] for creating a data frame from a vector.
Lots of examples. See for example [https://stat.ethz.ch/pipermail/r-help/attachments/20140423/62d8d103/attachment.pl this one] for creating a data frame from a vector.
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 5,811: Line 5,811:
</syntaxhighlight>
</syntaxhighlight>


=== How to get examples from help file ===
== How to get examples from help file ==
See [https://stat.ethz.ch/pipermail/r-help/2014-April/369342.html this post].
See [https://stat.ethz.ch/pipermail/r-help/2014-April/369342.html this post].
Method 1:
Method 1:
Line 5,823: Line 5,823:
</pre>
</pre>


=== "[" and "[[" with the sapply() function ===
== "[" and "[[" with the sapply() function ==
Suppose we want to extract string from the id like "ABC-123-XYZ" before the first hyphen.
Suppose we want to extract string from the id like "ABC-123-XYZ" before the first hyphen.
<pre>
<pre>
Line 5,833: Line 5,833:
</pre>
</pre>


=== Dealing with date ===
== Dealing with date ==
<pre>
<pre>
d1 = date()
d1 = date()
Line 5,858: Line 5,858:
* We want our dates and times as class "Date" or the class "POSIXct", "POSIXlt". For more information type ?POSIXlt.
* We want our dates and times as class "Date" or the class "POSIXct", "POSIXlt". For more information type ?POSIXlt.


=== [http://adv-r.had.co.nz/Computing-on-the-language.html Nonstandard evaluation] and deparse/substitute ===
== [http://adv-r.had.co.nz/Computing-on-the-language.html Nonstandard evaluation] and deparse/substitute ==
* [https://cran.r-project.org/web/packages/lazyeval/vignettes/lazyeval.html Vignette] from the [https://cran.r-project.org/web/packages/lazyeval/index.html lazyeval] package. It is needed in three cases
* [https://cran.r-project.org/web/packages/lazyeval/vignettes/lazyeval.html Vignette] from the [https://cran.r-project.org/web/packages/lazyeval/index.html lazyeval] package. It is needed in three cases
** Labelling: turn an argument into a label
** Labelling: turn an argument into a label
Line 5,912: Line 5,912:
</syntaxhighlight>
</syntaxhighlight>


=== The ‘…’ argument ===
== The ‘…’ argument ==
See [http://cran.r-project.org/doc/manuals/R-intro.html#The-three-dots-argument Section 10.4 of An Introduction to R]. Especially, the expression '''list(...)''' evaluates all such arguments and returns them in a named list
See [http://cran.r-project.org/doc/manuals/R-intro.html#The-three-dots-argument Section 10.4 of An Introduction to R]. Especially, the expression '''list(...)''' evaluates all such arguments and returns them in a named list


=== Lazy evaluation in R functions arguments ===
== Lazy evaluation in R functions arguments ==
* http://adv-r.had.co.nz/Functions.html
* http://adv-r.had.co.nz/Functions.html
* https://stat.ethz.ch/pipermail/r-devel/2015-February/070688.html
* https://stat.ethz.ch/pipermail/r-devel/2015-February/070688.html
Line 5,967: Line 5,967:
</pre>
</pre>


=== Backtick sign, infix/prefix/postfix operators ===  
== Backtick sign, infix/prefix/postfix operators ==  
The backtick sign ` (not the single quote) refers to functions or variables that have otherwise reserved or illegal names; e.g. '&&', '+', '(', 'for', 'if', etc. See some examples in [http://adv-r.had.co.nz/Functions.html this note].
The backtick sign ` (not the single quote) refers to functions or variables that have otherwise reserved or illegal names; e.g. '&&', '+', '(', 'for', 'if', etc. See some examples in [http://adv-r.had.co.nz/Functions.html this note].


Line 5,977: Line 5,977:
</pre>
</pre>


=== List data type ===
== List data type ==
==== [http://adv-r.had.co.nz/Functions.html Calling a function given a list of arguments] ====
=== [http://adv-r.had.co.nz/Functions.html Calling a function given a list of arguments] ===
<pre>
<pre>
> args <- list(c(1:10, NA, NA), na.rm = TRUE)
> args <- list(c(1:10, NA, NA), na.rm = TRUE)
Line 5,987: Line 5,987:
</pre>
</pre>


=== Error handling and exceptions, tryCatch(), stop(), warning() and message() ===
== Error handling and exceptions, tryCatch(), stop(), warning() and message() ==
* http://adv-r.had.co.nz/Exceptions-Debugging.html
* http://adv-r.had.co.nz/Exceptions-Debugging.html
* try() allows execution to continue even after an error has occurred. You can suppress the message with try(..., silent = TRUE).
* try() allows execution to continue even after an error has occurred. You can suppress the message with try(..., silent = TRUE).
Line 6,028: Line 6,028:
</pre>
</pre>


=== Using list type ===
== Using list type ==
==== Avoid if-else or switch ====
=== Avoid if-else or switch ===
?plot.stepfun.
?plot.stepfun.
<pre>
<pre>
Line 6,054: Line 6,054:
[[File:StepfunExample.svg|400px]]
[[File:StepfunExample.svg|400px]]


=== Open a new Window device ===
== Open a new Window device ==
X11() or dev.new()
X11() or dev.new()


=== par() ===
== par() ==
?par  
?par  


==== text size and font on main, lab & axis ====
=== text size and font on main, lab & axis ===
* [https://www.statmethods.net/advgraphs/parameters.html Graphical Parameters] from statmethods.net.
* [https://www.statmethods.net/advgraphs/parameters.html Graphical Parameters] from statmethods.net.
* [https://designdatadecisions.wordpress.com/2015/06/09/graphs-in-r-overlaying-data-summaries-in-dotplots/ Overlaying Data Summaries in Dotplots]
* [https://designdatadecisions.wordpress.com/2015/06/09/graphs-in-r-overlaying-data-summaries-in-dotplots/ Overlaying Data Summaries in Dotplots]
Line 6,072: Line 6,072:
* col.axis="grey50"
* col.axis="grey50"


==== layout ====
=== layout ===
http://datascienceplus.com/adding-text-to-r-plot/
http://datascienceplus.com/adding-text-to-r-plot/


==== reset the settings ====
=== reset the settings ===
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
op <- par(mfrow=c(2,1), mar = c(5,7,4,2) + 0.1)  
op <- par(mfrow=c(2,1), mar = c(5,7,4,2) + 0.1)  
Line 6,082: Line 6,082:
</syntaxhighlight>
</syntaxhighlight>


==== mtext (margin text) vs title ====
=== mtext (margin text) vs title ===
* https://datascienceplus.com/adding-text-to-r-plot/
* https://datascienceplus.com/adding-text-to-r-plot/
* https://datascienceplus.com/mastering-r-plot-part-2-axis/
* https://datascienceplus.com/mastering-r-plot-part-2-axis/


==== mgp (axis label locations) ====
=== mgp (axis label locations) ===
# The margin line (in ‘mex’ units) for the axis title, axis labels and axis line.  Note that ‘mgp[1]’ affects ‘title’ whereas ‘mgp[2:3]’ affect ‘axis’.  The default is ‘c(3, 1, 0)’. If we like to make the axis labels closer to an axis, we can use mgp=c(2.3, 1, 0) for example.
# The margin line (in ‘mex’ units) for the axis title, axis labels and axis line.  Note that ‘mgp[1]’ affects ‘title’ whereas ‘mgp[2:3]’ affect ‘axis’.  The default is ‘c(3, 1, 0)’. If we like to make the axis labels closer to an axis, we can use mgp=c(2.3, 1, 0) for example.
# http://rfunction.com/archives/1302 mgp – A numeric vector of length 3, which sets the axis label locations relative to the edge of the inner plot window. The first value represents the location the labels (i.e. xlab and ylab in plot), the second the tick-mark labels, and third the tick marks. The default is c(3, 1, 0).
# http://rfunction.com/archives/1302 mgp – A numeric vector of length 3, which sets the axis label locations relative to the edge of the inner plot window. The first value represents the location the labels (i.e. xlab and ylab in plot), the second the tick-mark labels, and third the tick marks. The default is c(3, 1, 0).


==== pch ====
=== pch ===
[[File:R pch.png|250px]]
[[File:R pch.png|250px]]


Line 6,097: Line 6,097:
* Full circle: pch=16
* Full circle: pch=16


==== lty (line type) ====
=== lty (line type) ===
[[File:R lty.png|250px]]
[[File:R lty.png|250px]]


([http://www.sthda.com/english/wiki/line-types-in-r-lty figure source])
([http://www.sthda.com/english/wiki/line-types-in-r-lty figure source])


==== las (label style) ====
=== las (label style) ===
0: The default, parallel to the axis
0: The default, parallel to the axis


Line 6,111: Line 6,111:
3: Always vertical
3: Always vertical


==== oma (outer margin), common title for two plots ====
=== oma (outer margin), common title for two plots ===
The following trick is useful when we want to draw multiple plots with a common title.
The following trick is useful when we want to draw multiple plots with a common title.


Line 6,123: Line 6,123:
[https://datascienceplus.com/mastering-r-plot-part-3-outer-margins/ Mastering R plot – Part 3: Outer margins] '''mtext()''' & '''par(xpd)'''.
[https://datascienceplus.com/mastering-r-plot-part-3-outer-margins/ Mastering R plot – Part 3: Outer margins] '''mtext()''' & '''par(xpd)'''.


=== Non-standard fonts in postscript and pdf graphics ===
== Non-standard fonts in postscript and pdf graphics ==
https://cran.r-project.org/doc/Rnews/Rnews_2006-2.pdf#page=41
https://cran.r-project.org/doc/Rnews/Rnews_2006-2.pdf#page=41


=== Suppress warnings ===
== Suppress warnings ==
Use [https://www.rdocumentation.org/packages/base/versions/3.4.1/topics/options options()]. If ''warn'' is negative all warnings are ignored. If ''warn'' is zero (the default) warnings are stored until the top--level function returns.  
Use [https://www.rdocumentation.org/packages/base/versions/3.4.1/topics/options options()]. If ''warn'' is negative all warnings are ignored. If ''warn'' is zero (the default) warnings are stored until the top--level function returns.  
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
Line 6,141: Line 6,141:
</syntaxhighlight>
</syntaxhighlight>


=== NULL, NA, NaN, Inf ===
== NULL, NA, NaN, Inf ==
https://tomaztsql.wordpress.com/2018/07/04/r-null-values-null-na-nan-inf/
https://tomaztsql.wordpress.com/2018/07/04/r-null-values-null-na-nan-inf/


=== save() vs saveRDS() ===
== save() vs saveRDS() ==
# saveRDS() can only save one R object while save() does not have this constraint.
# saveRDS() can only save one R object while save() does not have this constraint.
# saveRDS() doesn’t save the both the object and its name it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized. See [http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/ this post].
# saveRDS() doesn’t save the both the object and its name it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized. See [http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/ this post].
Line 6,154: Line 6,154:
</pre>
</pre>


=== [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/all.equal ==, all.equal(), identical()] ===
== [https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/all.equal ==, all.equal(), identical()] ==
* ==: exact match
* ==: exact match
* all.equal: compare R objects x and y testing ‘near equality’
* all.equal: compare R objects x and y testing ‘near equality’
Line 6,168: Line 6,168:
See also the [http://cran.r-project.org/web/packages/testthat/index.html testhat] package.
See also the [http://cran.r-project.org/web/packages/testthat/index.html testhat] package.


=== testhat ===
== testhat ==
* https://github.com/r-lib/testthat
* https://github.com/r-lib/testthat
* [http://www.win-vector.com/blog/2019/03/unit-tests-in-r/ Unit Tests in R]
* [http://www.win-vector.com/blog/2019/03/unit-tests-in-r/ Unit Tests in R]


=== tinytest ===
== tinytest ==
[https://cran.r-project.org/web/packages/tinytest/index.html tinytest]: Lightweight but Feature Complete Unit Testing Framework
[https://cran.r-project.org/web/packages/tinytest/index.html tinytest]: Lightweight but Feature Complete Unit Testing Framework


=== Numerical Pitfall ===
== Numerical Pitfall ==
[http://bayesfactor.blogspot.com/2016/05/numerical-pitfalls-in-computing-variance.html Numerical pitfalls in computing variance]
[http://bayesfactor.blogspot.com/2016/05/numerical-pitfalls-in-computing-variance.html Numerical pitfalls in computing variance]
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Line 6,182: Line 6,182:
</syntaxhighlight>
</syntaxhighlight>


=== Sys.getpid() ===
== Sys.getpid() ==
This can be used to monitor R process memory usage or stop the R process. See [https://stat.ethz.ch/pipermail/r-devel/2016-November/073360.html this post].
This can be used to monitor R process memory usage or stop the R process. See [https://stat.ethz.ch/pipermail/r-devel/2016-November/073360.html this post].


=== How to debug an R code ===
== How to debug an R code ==
==== Using assign() in functions ====
=== Using assign() in functions ===
For example, insert the following line to your function
For example, insert the following line to your function
<pre>
<pre>
Line 6,192: Line 6,192:
</pre>
</pre>


=== Debug lapply()/sapply() ===
== Debug lapply()/sapply() ==
* https://stackoverflow.com/questions/1395622/debugging-lapply-sapply-calls
* https://stackoverflow.com/questions/1395622/debugging-lapply-sapply-calls
* https://stat.ethz.ch/R-manual/R-devel/library/utils/html/recover.html. Use options(error=NULL) to turn it off.
* https://stat.ethz.ch/R-manual/R-devel/library/utils/html/recover.html. Use options(error=NULL) to turn it off.


=== Debugging with RStudio ===
== Debugging with RStudio ==
* https://www.rstudio.com/resources/videos/debugging-techniques-in-rstudio/
* https://www.rstudio.com/resources/videos/debugging-techniques-in-rstudio/
* https://github.com/ajmcoqui/debuggingRStudio/blob/master/RStudio_Debugging_Cheatsheet.pdf
* https://github.com/ajmcoqui/debuggingRStudio/blob/master/RStudio_Debugging_Cheatsheet.pdf
* https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio
* https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio


=== Debug R source code ===
== Debug R source code ==
==== Build R with debug information ====
=== Build R with debug information ===
* [[R#Build_R_from_its_source|R -> Build R from its source on Windows]]
* [[R#Build_R_from_its_source|R -> Build R from its source on Windows]]
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/ (defunct)
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/ (defunct)
Line 6,277: Line 6,277:
* Same idea for the Rcpp situation. See [https://stackoverflow.com/questions/21226337/what-are-productive-ways-to-debug-rcpp-compiled-code-loaded-in-r-on-os-x-maveri What are productive ways to debug Rcpp compiled code loaded in R (on OS X Mavericks)?]
* Same idea for the Rcpp situation. See [https://stackoverflow.com/questions/21226337/what-are-productive-ways-to-debug-rcpp-compiled-code-loaded-in-r-on-os-x-maveri What are productive ways to debug Rcpp compiled code loaded in R (on OS X Mavericks)?]


==== .Call ====
=== .Call ===
* [https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Calling-_002eCall Writing R Extensions] manual.
* [https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Calling-_002eCall Writing R Extensions] manual.
* [http://adv-r.had.co.nz/C-interface.html R’s C interface] from Advanced R by Hadley Wickham
* [http://adv-r.had.co.nz/C-interface.html R’s C interface] from Advanced R by Hadley Wickham


==== Registering native routines ====
=== Registering native routines ===
https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Registering-native-routines
https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Registering-native-routines


Pay attention to the prefix argument '''.fixes''' (eg .fixes = "C_") in '''useDynLib()''' function in the NAMESPACE file.
Pay attention to the prefix argument '''.fixes''' (eg .fixes = "C_") in '''useDynLib()''' function in the NAMESPACE file.


==== Example of debugging cor() function ====
=== Example of debugging cor() function ===
Note that R's cor() function called a C function cor().
Note that R's cor() function called a C function cor().
<pre>
<pre>
Line 6,296: Line 6,296:
A step-by-step screenshot of debugging using the GNU debugger '''gdb''' can be found on my Github repository https://github.com/arraytools/r-debug.
A step-by-step screenshot of debugging using the GNU debugger '''gdb''' can be found on my Github repository https://github.com/arraytools/r-debug.


=== Locale bug (grep did not handle UTF-8 properly PR#16264) ===
== Locale bug (grep did not handle UTF-8 properly PR#16264) ==
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16264
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16264


=== Path length in dir.create() (PR#17206) ===
== Path length in dir.create() (PR#17206) ==
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17206 (Windows only)
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17206 (Windows only)


=== install.package() error, R_LIBS_USER is empty in R 3.4.1 & .libPaths() ===
== install.package() error, R_LIBS_USER is empty in R 3.4.1 & .libPaths() ==
* https://support.rstudio.com/hc/en-us/community/posts/115008369408-Since-update-to-R-3-4-1-R-LIBS-USER-is-empty and http://r.789695.n4.nabble.com/R-LIBS-USER-on-Ubuntu-16-04-td4740935.html. Modify '''/etc/R/Renviron''' (if you have a sudo right) by uncomment out line 43.
* https://support.rstudio.com/hc/en-us/community/posts/115008369408-Since-update-to-R-3-4-1-R-LIBS-USER-is-empty and http://r.789695.n4.nabble.com/R-LIBS-USER-on-Ubuntu-16-04-td4740935.html. Modify '''/etc/R/Renviron''' (if you have a sudo right) by uncomment out line 43.
<pre>
<pre>
Line 6,359: Line 6,359:
A better solution is to specify R_LIBS_USER in '''~/.Renviron''' file or '''~/.bash_profile'''; see [http://stat.ethz.ch/R-manual/R-patched/library/base/html/Startup.html ?Startup].
A better solution is to specify R_LIBS_USER in '''~/.Renviron''' file or '''~/.bash_profile'''; see [http://stat.ethz.ch/R-manual/R-patched/library/base/html/Startup.html ?Startup].


=== Using external data from within another package ===
== Using external data from within another package ==
https://logfc.wordpress.com/2017/03/02/using-external-data-from-within-another-package/
https://logfc.wordpress.com/2017/03/02/using-external-data-from-within-another-package/


=== How to run R scripts from the command line ===
== How to run R scripts from the command line ==
https://support.rstudio.com/hc/en-us/articles/218012917-How-to-run-R-scripts-from-the-command-line
https://support.rstudio.com/hc/en-us/articles/218012917-How-to-run-R-scripts-from-the-command-line


=== How to exit a sourced R script ===
== How to exit a sourced R script ==
* [http://stackoverflow.com/questions/25313406/how-to-exit-a-sourced-r-script How to exit a sourced R script]
* [http://stackoverflow.com/questions/25313406/how-to-exit-a-sourced-r-script How to exit a sourced R script]
* [http://r.789695.n4.nabble.com/Problem-using-the-source-function-within-R-functions-td907180.html Problem using the source-function within R-functions] ''' ''The best way to handle the generic sort of problem you are describing is to take those source'd files, and rewrite their content as functions to be called from your other functions.'' '''
* [http://r.789695.n4.nabble.com/Problem-using-the-source-function-within-R-functions-td907180.html Problem using the source-function within R-functions] ''' ''The best way to handle the generic sort of problem you are describing is to take those source'd files, and rewrite their content as functions to be called from your other functions.'' '''


=== Decimal point & decimal comma ===
== Decimal point & decimal comma ==
Countries using Arabic numerals with decimal comma (Austria, Belgium, Brazil France, Germany, Netherlands, Norway, South Africa, Spain, Sweden, ...) https://en.wikipedia.org/wiki/Decimal_mark
Countries using Arabic numerals with decimal comma (Austria, Belgium, Brazil France, Germany, Netherlands, Norway, South Africa, Spain, Sweden, ...) https://en.wikipedia.org/wiki/Decimal_mark


=== setting seed locally (not globally) in R ===
== setting seed locally (not globally) in R ==
https://stackoverflow.com/questions/14324096/setting-seed-locally-not-globally-in-r
https://stackoverflow.com/questions/14324096/setting-seed-locally-not-globally-in-r


=== R's internal C API ===
== R's internal C API ==
https://github.com/hadley/r-internals
https://github.com/hadley/r-internals


=== cleancall package for C resource cleanup ===
== cleancall package for C resource cleanup ==
[https://www.tidyverse.org/articles/2019/05/resource-cleanup-in-c-and-the-r-api/ Resource Cleanup in C and the R API]
[https://www.tidyverse.org/articles/2019/05/resource-cleanup-in-c-and-the-r-api/ Resource Cleanup in C and the R API]


=== Random numbers: multivariate normal ===
== Random numbers: multivariate normal ==
Why [https://www.rdocumentation.org/packages/MASS/versions/7.3-49/topics/mvrnorm MASS::mvrnorm()] gives different result on Mac and Linux/Windows?
Why [https://www.rdocumentation.org/packages/MASS/versions/7.3-49/topics/mvrnorm MASS::mvrnorm()] gives different result on Mac and Linux/Windows?


Line 6,486: Line 6,486:
</syntaxhighlight>
</syntaxhighlight>


=== rle() running length encoding ===
== rle() running length encoding ==
* https://en.wikipedia.org/wiki/Run-length_encoding
* https://en.wikipedia.org/wiki/Run-length_encoding
* [https://masterr.org/r/how-to-find-consecutive-repeats-in-r/ How to Find Consecutive Repeats in R]
* [https://masterr.org/r/how-to-find-consecutive-repeats-in-r/ How to Find Consecutive Repeats in R]
Line 6,495: Line 6,495:
* R's [https://www.rdocumentation.org/packages/S4Vectors/versions/0.10.2/topics/Rle-class Rle class] from S4Vectors package which was used in for example [http://genomicsclass.github.io/book/pages/iranges_granges.html IRanges/GRanges/GenomicRanges] package
* R's [https://www.rdocumentation.org/packages/S4Vectors/versions/0.10.2/topics/Rle-class Rle class] from S4Vectors package which was used in for example [http://genomicsclass.github.io/book/pages/iranges_granges.html IRanges/GRanges/GenomicRanges] package


=== citation() ===
== citation() ==
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
citation()
citation()
Line 6,502: Line 6,502:
</syntaxhighlight>
</syntaxhighlight>


=== Monitor memory usage ===
== Monitor memory usage ==
* Windows: memory.size(max=TRUE)
* Windows: memory.size(max=TRUE)
* Linux  
* Linux  
Line 6,514: Line 6,514:
* [https://serverfault.com/a/264856 How can I see the memory usage of a Linux process?] ''pmap $PID | tail -n 1'' is recommended. It shows the memory usage in absolute value (eg 1722376K).
* [https://serverfault.com/a/264856 How can I see the memory usage of a Linux process?] ''pmap $PID | tail -n 1'' is recommended. It shows the memory usage in absolute value (eg 1722376K).


== Resource ==
= Resource =
=== Books ===
== Books ==
* A list of recommended books http://blog.revolutionanalytics.com/2015/11/r-recommended-reading.html
* A list of recommended books http://blog.revolutionanalytics.com/2015/11/r-recommended-reading.html
* [http://statisticalestimation.blogspot.com/2016/11/learning-r-programming-by-reading-books.html Learning R programming by reading books: A book list]
* [http://statisticalestimation.blogspot.com/2016/11/learning-r-programming-by-reading-books.html Learning R programming by reading books: A book list]
Line 6,551: Line 6,551:
* [https://learningstatisticswithr.com/book/ Learning statistics with R: A tutorial for psychology students and other beginners] by Danielle Navarro
* [https://learningstatisticswithr.com/book/ Learning statistics with R: A tutorial for psychology students and other beginners] by Danielle Navarro


=== Webinar ===
== Webinar ==
* [https://www.rstudio.com/resources/webinars/ RStudio] & its [https://github.com/rstudio/webinars github] repository
* [https://www.rstudio.com/resources/webinars/ RStudio] & its [https://github.com/rstudio/webinars github] repository


=== useR! ===
== useR! ==
* http://blog.revolutionanalytics.com/2017/07/revisiting-user2017.html
* http://blog.revolutionanalytics.com/2017/07/revisiting-user2017.html


=== Blogs, Tips, Socials, Communities ===
== Blogs, Tips, Socials, Communities ==
* Google: revolutionanalytics In case you missed it
* Google: revolutionanalytics In case you missed it
* [http://r4stats.com/articles/why-r-is-hard-to-learn/ Why R is hard to learn] by Bob Musenchen.
* [http://r4stats.com/articles/why-r-is-hard-to-learn/ Why R is hard to learn] by Bob Musenchen.
Line 6,564: Line 6,564:
* [https://hbctraining.github.io/main/ Bioinformatics Training at the Harvard Chan Bioinformatics Core]
* [https://hbctraining.github.io/main/ Bioinformatics Training at the Harvard Chan Bioinformatics Core]


=== Bug Tracking System ===
== Bug Tracking System ==
https://bugs.r-project.org/bugzilla3/ and [https://bugs.r-project.org/bugzilla3/query.cgi Search existing bug reports]. Remember to select 'All' in the Status drop-down list.
https://bugs.r-project.org/bugzilla3/ and [https://bugs.r-project.org/bugzilla3/query.cgi Search existing bug reports]. Remember to select 'All' in the Status drop-down list.


Use '''sessionInfo()'''.
Use '''sessionInfo()'''.

Revision as of 20:36, 23 June 2019

Install and upgrade R

Here

Online Editor

We can run R on web browsers without installing it on local machines (similar to [/ideone.com Ideone.com] for C++. It does not require an account either (cf RStudio).

rdrr.io

It can produce graphics too. The package I am testing (cobs) is available too.

rstudio.cloud

RDocumentation

The interactive engine is based on DataCamp Light

For example, tbl_df function from dplyr package.

The website DataCamp allows to run library() on the Script window. After that, we can use the packages on R Console.

Here is a list of (common) R packages that users can use on the web.

The packages on RDocumentation may be outdated. For example, the current stringr on CRAN is v1.2.0 (2/18/2017) but RDocumentation has v1.1.0 (8/19/2016).

Web Applications

See also CRAN Task View: Web Technologies and Services

TexLive

TexLive can be installed by 2 ways

  • sudo apt install texlive It includes tlmgr utility for package manager.
  • Official website

texlive-latex-extra

https://packages.debian.org/sid/texlive-latex-extra

For example, framed and titling packages are included.

tlmgr - TeX Live package manager

https://www.tug.org/texlive/tlmgr.html

TinyTex

https://github.com/yihui/tinytex

Rmarkdown: create HTML5 web, slides and more

Rmarkdown

HTTP protocol

An HTTP server is conceptually simple:

  1. Open port 80 for listening
  2. When contact is made, gather a little information (get mainly - you can ignore the rest for now)
  3. Translate the request into a file request
  4. Open the file and spit it back at the client

It gets more difficult depending on how much of HTTP you want to support - POST is a little more complicated, scripts, handling multiple requests, etc.

Example in R

> co <- socketConnection(port=8080, server=TRUE, blocking=TRUE) 
> # Now open a web browser and type http://localhost:8080/index.html
> readLines(co,1)
[1] "GET /index.html HTTP/1.1"
> readLines(co,1)
[1] "Host: localhost:8080"
> readLines(co,1)
[1] "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0"
> readLines(co,1)
[1] "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
> readLines(co,1)
[1] "Accept-Language: en-US,en;q=0.5"
> readLines(co,1)
[1] "Accept-Encoding: gzip, deflate"
> readLines(co,1)
[1] "Connection: keep-alive"
> readLines(co,1)
[1] ""

Example in C (Very simple http server written in C, 187 lines)

Create a simple hello world html page and save it as <index.html> in the current directory (/home/brb/Downloads/)

Launch the server program (assume we have done gcc http_server.c -o http_server)

$ ./http_server -p 50002
Server started at port no. 50002 with root directory as /home/brb/Downloads

Secondly open a browser and type http://localhost:50002/index.html. The server will respond

GET /index.html HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

file: /home/brb/Downloads/index.html
GET /favicon.ico HTTP/1.1
Host: localhost:50002
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

file: /home/brb/Downloads/favicon.ico
GET /favicon.ico HTTP/1.1
Host: localhost:50003
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

file: /home/brb/Downloads/favicon.ico

The browser will show the page from <index.html> in server.

The only bad thing is the code does not close the port. For example, if I have use Ctrl+C to close the program and try to re-launch with the same port, it will complain socket() or bind(): Address already in use.


Another Example in C (55 lines)

http://mwaidyanatha.blogspot.com/2011/05/writing-simple-web-server-in-c.html

The response is embedded in the C code.

If we test the server program by opening a browser and type "http://localhost:15000/", the server received the follwing 7 lines

GET / HTTP/1.1
Host: localhost:15000
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

If we include a non-executable file's name in the url, we will be able to download that file. Try "http://localhost:15000/client.c".

If we use telnet program to test, wee need to type anything we want

$ telnet localhost 15000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
ThisCanBeAnything        <=== This is what I typed in the client and it is also shown on server
HTTP/1.1 200 OK          <=== From here is what I got from server
Content-length: 37Content-Type: text/html

HTML_DATA_HERE_AS_YOU_MENTIONED_ABOVE <=== The html tags are not passed from server, interesting!
Connection closed by foreign host.
$ 

See also more examples under C page.

Others

shiny

See Shiny.

plumber: Turning your R code into a RESTful Web API

Docker

httpuv

http and WebSocket library.

See also the servr package which can start an HTTP server in R to serve static files, or dynamic documents that can be converted to HTML files (e.g., R Markdown) under a given directory.

RApache

gWidgetsWWW

Rook

Since R 2.13, the internal web server was exposed.

Tutorual from useR2012 and Jeffrey Horner

Here is another one from http://www.rinfinance.com.

Rook is also supported by [rApache too. See http://rapache.net/manual.html.

Google group. https://groups.google.com/forum/?fromgroups#!forum/rrook

Advantage

  • the web applications are created on desktop, whether it is Windows, Mac or Linux.
  • No Apache is needed.
  • create multiple applications at the same time. This complements the limit of rApache.

4 lines of code example.

library(Rook)
s <- Rhttpd$new()
s$start(quiet=TRUE)
s$print()
s$browse(1)  # OR s$browse("RookTest")

Notice that after s$browse() command, the cursor will return to R because the command just a shortcut to open the web page http://127.0.0.1:10215/custom/RookTest.

Rook.png Rook2.png Rookapprnorm.png

We can add Rook application to the server; see ?Rhttpd.

s$add(
    app=system.file('exampleApps/helloworld.R',package='Rook'),name='hello'
)
s$add(
    app=system.file('exampleApps/helloworldref.R',package='Rook'),name='helloref'
)
s$add(
    app=system.file('exampleApps/summary.R',package='Rook'),name='summary'
)

s$print()

#Server started on 127.0.0.1:10221
#[1] RookTest http://127.0.0.1:10221/custom/RookTest
#[2] helloref http://127.0.0.1:10221/custom/helloref
#[3] summary  http://127.0.0.1:10221/custom/summary
#[4] hello    http://127.0.0.1:10221/custom/hello

#  Stops the server but doesn't uninstall the app
## Not run: 
s$stop()

## End(Not run)
s$remove(all=TRUE)
rm(s)

For example, the interface and the source code of summary app are given below

Rookappsummary.png

app <- function(env) {
    req <- Rook::Request$new(env)
    res <- Rook::Response$new()
    res$write('Choose a CSV file:\n')
    res$write('<form method="POST" enctype="multipart/form-data">\n')
    res$write('<input type="file" name="data">\n')
    res$write('<input type="submit" name="Upload">\n</form>\n<br>')

    if (!is.null(req$POST())){
	data <- req$POST()[['data']]
	res$write("<h3>Summary of Data</h3>");
	res$write("<pre>")
	res$write(paste(capture.output(summary(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
	res$write("</pre>")
	res$write("<h3>First few lines (head())</h3>");
	res$write("<pre>")
	res$write(paste(capture.output(head(read.csv(data$tempfile,stringsAsFactors=FALSE)),file=NULL),collapse='\n'))
	res$write("</pre>") 
    }
    res$finish()
}

More example:

sumo

Sumo is a fully-functional web application template that exposes an authenticated user's R session within java server pages. See the paper http://journal.r-project.org/archive/2012-1/RJournal_2012-1_Bergsma+Smith.pdf.

Stockplot

FastRWeb

http://cran.r-project.org/web/packages/FastRWeb/index.html

WebDriver

'WebDriver' Client for 'PhantomJS'

https://github.com/rstudio/webdriver

Rwui

CGHWithR and WebDevelopR

CGHwithR is still working with old version of R although it is removed from CRAN. Its successor is WebDevelopR. Its The vignette (year 2013) provides a review of several available methods.

manipulate from RStudio

This is not a web application. But the manipulate package can be used to create interactive plot within R(Studio) environment easily. Its source is available at here.

Mathematica also has manipulate function for plotting; see here.

RCloud

RCloud is an environment for collaboratively creating and sharing data analysis scripts. RCloud lets you mix analysis code in R, HTML5, Markdown, Python, and others. Much like Sage, iPython notebooks and Mathematica, RCloud provides a notebook interface that lets you easily record a session and annotate it with text, equations, and supporting images.

See also the Talk in UseR 2014.

cloudyr and flyio - Input Output Files in R from Cloud or Local

https://blog.socialcops.com/inside-sc/announcements/flyio-r-package-interact-data-cloud/ Announcing flyio, an R Package to Interact with Data in the Cloud]

Dropbox access

rdrop2 package

Web page scraping

http://www.slideshare.net/schamber/web-data-from-r#btnNext

xml2 package

rvest package depends on xml2.

purrr: Functional Programming Tools

rvest

Easy web scraping with R

On Ubuntu, we need to install two packages first!

sudo apt-get install libcurl4-openssl-dev # OR libcurl4-gnutls-dev

sudo apt-get install libxml2-dev

Animate

V8: Embedded JavaScript Engine for R

R⁶ — General (Attys) Distributions: V8, rvest, ggbeeswarm, hrbrthemes and tidyverse packages are used.

pubmed.mineR

Text mining of PubMed Abstracts (http://www.ncbi.nlm.nih.gov/pubmed). The algorithms are designed for two formats (text and XML) from PubMed.

R code for scraping the P-values from pubmed, calculating the Science-wise False Discovery Rate, et al (Jeff Leek)

These R packages import sports, weather, stock data and more

Diving Into Dynamic Website Content with splashr

https://rud.is/b/2017/02/09/diving-into-dynamic-website-content-with-splashr/

Send email

mailR

Easiest. Require rJava package (not trivial to install, see rJava). mailR is an interface to Apache Commons Email to send emails from within R. See also send bulk email

Before we use the mailR package, we have followed here to have Allow less secure apps: 'ON' ; or you might get an error Error: EmailException (Java): Sending the email to the following server failed : smtp.gmail.com:465. Once we turn on this option, we may get an email for the notification of this change. Note that the recipient can be other than a gmail.

> send.mail(from = "[email protected]",
  to = c("[email protected]", "Recipient 2 <[email protected]>"),
  replyTo = c("Reply to someone else <[email protected]>")
  subject = "Subject of the email",
  body = "Body of the email",
  smtp = list(host.name = "smtp.gmail.com", port = 465, user.name = "gmail_username", passwd = "password", ssl = TRUE),
  attach.files ="./myattachment.txt",
  authenticate = TRUE,
  send = TRUE)
[1] "Java-Object{org.apache.commons.mail.SimpleEmail@7791a895}"

MailR SMTP Setup (Gmail, Outlook, Yahoo) | STARTTLS

gmailr

More complicated. gmailr provides access the Google's gmail.com RESTful API. Vignette and an example on here. Note that it does not use a password; it uses a json file for oauth authentication downloaded from https://console.cloud.google.com/. See also https://github.com/jimhester/gmailr/issues/1.

library(gmailr)
gmail_auth('mysecret.json', scope = 'compose') 

test_email <- mime() %>%
  to("[email protected]") %>%
  from("[email protected]") %>%
  subject("This is a subject") %>%
  html_body("<html><body>I wish <b>this</b> was bold</body></html>")
send_message(test_email)

sendmailR

sendmailR provides a simple SMTP client. It is not clear how to use the package (i.e. where to enter the password).

emayili

emayili: Sending Email from R

GEO (Gene Expression Omnibus)

See this internal link.

Interactive html output

sendplot

RIGHT

The supported plot types include scatterplot, barplot, box plot, line plot and pie plot.

In addition to tooltip boxes, the package can create a table showing all information about selected nodes.

r2d3

r2d3 - R Interface to D3 Visualizations

d3Network

library(d3Network)

Source <- c("A", "A", "A", "A", "B", "B", "C", "C", "D") 
Target <- c("B", "C", "D", "J", "E", "F", "G", "H", "I") 
NetworkData <- data.frame(Source, Target) 

d3SimpleNetwork(NetworkData, height = 800, width = 1024, file="tmp.html")

htmlwidgets for R

Embed widgets in R Markdown documents and Shiny web applications.

networkD3

This is a port of Christopher Gandrud's d3Network package to the htmlwidgets framework.

scatterD3

scatterD3 is an HTML R widget for interactive scatter plots visualization. It is based on the htmlwidgets R package and on the d3.js javascript library.

dygraphs

rthreejs - Create interactive 3D scatter plots, network plots, and globes

Examples

d3heatmap

See R

svgPanZoom

This 'htmlwidget' provides pan and zoom interactivity to R graphics, including 'base', 'lattice', and 'ggplot2'. The interactivity is provided through the 'svg-pan-zoom.js' library.

DT: An R interface to the DataTables library

plotly

Amazon

Download product information and reviews from Amazon.com

sudo apt-get install libxml2-dev
sudo apt-get install libcurl4-openssl-dev

and in R

install.packages("devtools")
install.packages("XML")
install.packages("pbapply")
install.packages("dplyr")
devtools::install_github("56north/Rmazon")
product_info <- Rmazon::get_product_info("1593273843")
reviews <- Rmazon::get_reviews("1593273843")
reviews[1,6] # only show partial characters from the 1st review
nchar(reviews[1,6])
as.character(reviews[1,6]) # show the complete text from the 1st review

reviews <- Rmazon::get_reviews("B07BNGJXGS")
# Fetching 30 reviews of 'BOOX Note Ereader,Android 6.0 32 GB 10.3" Dual Touch HD Display'
#   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 02s
reviews
# A tibble: 30 x 6
   reviewRating reviewDate reviewFormat Verified_Purcha… reviewHeadline
          <dbl> <chr>      <lgl>        <lgl>            <chr>         
 1            4 May 23, 2… NA           TRUE             Good for PDF …
 2            3 May 8, 20… NA           FALSE            The reading s…
 3            5 May 17, 2… NA           TRUE             E-reader and …
 4            3 May 24, 2… NA           TRUE             Good hardware…
 5            3 June 21, … NA           TRUE             Poor QC       
 6            5 August 5,… NA           TRUE             Excellent for…
 7            5 May 31, 2… NA           TRUE             Especially li…
 8            5 July 4, 2… NA           TRUE             Android 6 rea…
 9            4 July 15, … NA           TRUE             Remember the …
10            4 June 9, 2… NA           TRUE             Overall fanta…
# ... with 20 more rows, and 1 more variable: reviewText <chr>
reviews[1, 6] # 6-th column is the review text

gutenbergr

Edinbr: Text Mining with R

Twitter

Faces of #rstats Twitter

OCR

Wikipedia

WikipediR: R's MediaWiki API client library

Creating local repository for CRAN and Bioconductor

R repository

Parallel Computing

  1. Example code for the book Parallel R by McCallum and Weston.
  2. A gentle introduction to parallel computing in R
  3. An introduction to distributed memory parallelism in R and C
  4. Processing: When does it worth?

Security warning from Windows/Mac

It seems it is safe to choose 'Cancel' when Windows Firewall tried to block R program when we use makeCluster() to create a socket cluster.

library(parallel)
cl <- makeCluster(2)
clusterApply(cl, 1:2, get("+"), 3)
stopCluster(cl)

WindowsSecurityAlert.png RegisterDoParallel mac.png

If we like to see current firewall settings, just click Windows Start button, search 'Firewall' and choose 'Windows Firewall with Advanced Security'. In the 'Inbound Rules', we can see what programs (like, R for Windows GUI front-end, or Rserve) are among the rules. These rules are called 'private' in the 'Profile' column. Note that each of them may appear twice because one is 'TCP' protocol and the other one has a 'UDP' protocol.

Detect number of cores

parallel::detectCores()

Don't use the default option getOption("mc.cores", 2L) (PS it only returns 2.) in mclapply() unless you are a developer for a package.

However, it is a different story when we run the R code in HPC cluster. Read the discussion Whether to use the detectCores function in R to specify the number of cores for parallel processing?

On NIH's biowulf, even I specify an interactive session with 4 cores, the parallel::detectCores() function returns 56. This number is the same as the output from the bash command grep processor /proc/cpuinfo or (better) lscpu. The free -hm also returns a full 125GB size instead of my requested size (4GB by default). The future::availableCores() fixes the problem. See Biowulf's R webpage for a detailed instructure.

How can I print when using %dopar%

https://stackoverflow.com/a/15078540

Create and register your cluster with something like:

library(doParallel)
cl<-makeCluster(4, outfile = "debugParallel.txt")
registerDoParallel(cl)
foreach(x=list(1, 2, "a"))  %dopar%  
{
  print(x)
}
stopCluster(cl)

And the debugParallel.txt will look like

starting worker pid=17718 on localhost:11704 at 15:10:29.665
starting worker pid=17733 on localhost:11704 at 15:10:29.850
starting worker pid=17747 on localhost:11704 at 15:10:30.025
starting worker pid=17761 on localhost:11704 at 15:10:30.183
[1] 1
[1] 2
[1] "a"

Setting outfile to the empty string ("") will prevent doParallel (or snow whatever) from redirecting the output, often resulting in the output from your print messages showing up on the terminal of the master process. Your foreach loop doesn't need to change at all.

parallel package (including parLapply, parSapply)

Parallel package was included in R 2.14.0. It is derived from the snow and multicore packages and provides many of the same functions as those packages.

The parallel package provides several *apply functions for R users to quickly modify their code using parallel computing.

  • makeCluster(makePSOCKcluster, makeForkCluster), stopCluster. Other cluster types are passed to package snow.
  • clusterCall, clusterEvalQ: source R files and/or load libraries
  • clusterSplit
  • clusterApply, clusterApplyLB (vs the foreach package)
  • clusterExport: export variables
  • clusterMap
  • parallel::mclapply() and parallel::parLapply()
  • parLapply, parSapply, parApply, parRapply, parCapply. Note that
    • parSapply() can be used to as a parallel version of the replicate() function. See this example.
    • An iteration parameter needs to be added to the first parameter of the main function.
  • parLapplyLB, parSapplyLB (load balance version)
  • clusterSetRNGStream, nextRNGStream, nextRNGSubStream

Examples (See ?clusterApply)

library(parallel)
cl <- makeCluster(2, type = "SOCK")
clusterApply(cl, 1:2, function(x) x*3)    # OR clusterApply(cl, 1:2, get("*"), 3)
# [[1]]
# [1] 3
#
# [[2]]
# [1] 6
parSapply(cl, 1:20, get("+"), 3)
#  [1]  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
stopCluster(cl)

An example of using clusterCall() or clusterEvalQ()

library(parallel)

cl <- makeCluster(4)
clusterCall(cl, function() { 
  source("test.R") 
})
# clusterEvalQ(cl, {
#    source("test.R")
# })

## do some parallel work
stopCluster(cl)

mclapply() from the 'parallel' package is a mult-core version of lapply()

  • Be providing the number of cores in mclapply() using mc.cores argument (2 is used by default)
  • Be careful on the need and the side-effect of using "L'Ecuyer-CMRG" seed.
  • R doesn't reset the seed when “L'Ecuyer-CMRG” RNG is used?
    library(parallel)
    system.time(mclapply(1:1e4L, function(x) rnorm(x)))
    system.time(mclapply(1:1e4L, function(x) rnorm(x), mc.cores = 4))
    
    set.seed(1234)
    mclapply(1:3, function(x) rnorm(x))
    set.seed(1234)
    mclapply(1:3, function(x) rnorm(x)) # cannot reproduce the result
    
    set.seed(123, "L'Ecuyer")
    mclapply(1:3, function(x) rnorm(x))
    mclapply(1:3, function(x) rnorm(x)) # results are not changed once we have run set.seed( , "L'Ecuyer")
    
    set.seed(1234)                      # use set.seed() in order to get a new reproducible result
    mclapply(1:3, function(x) rnorm(x)) 
    mclapply(1:3, function(x) rnorm(x)) # results are not changed
  • An example of using parallel::mcMap().

Note

  1. R doesn't reset the seed when “L'Ecuyer-CMRG” RNG is used?
  2. Windows OS can not use mclapply(). The mclapply() implementation relies on forking and Windows does not support forking. mclapply from the parallel package is implemented as a serial function on Windows systems. The parallelsugar package was created based on the above idea.
  3. Another choice for Windows OS is to use parLapply() function in parallel package.
  4. Understanding the differences between mclapply and parLapply in R You don't have to worry about reproducing your environment on each of the cluster workers if mclapply() is used.
    ncores <- as.integer( Sys.getenv('NUMBER_OF_PROCESSORS') )
    cl <- makeCluster(getOption("cl.cores", ncores))
    LLID2GOIDs2 <- parLapply(cl, rLLID, function(x) {
                                        library(org.Hs.eg.db); get("org.Hs.egGO")[[x]]} 
                            )
    stopCluster(cl)
    It does work. Cut the computing time from 100 sec to 29 sec on 4 cores.

mclapply() vs foreach()

https://stackoverflow.com/questions/44806048/r-mclapply-vs-foreach

parallel vs doParallel package

doParallel is a foreach parallel adaptor for the parallel package.

doParallel provides a parallel backend for the %dopar% function using the parallel package.

doParallel depends on foreach, iterators, parallel and utils packages.

parallelsugar package

If we load parallelsugar, the default implementation of parallel::mclapply, which used fork based clusters, will be overwritten by parallelsugar::mclapply, which is implemented with socket clusters.

library(parallel) 

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed 
##    0.00    0.00   40.06 

library(parallelsugar)
## 
## Attaching package: ‘parallelsugar’
## 
## The following object is masked from ‘package:parallel’:
## 
##     mclapply

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed 
##    0.04    0.08   12.98

snow package

Supported cluster types are "SOCK", "PVM", "MPI", and "NWS".

multicore package

This package is removed from CRAN.

Consider using package ‘parallel’ instead.

foreach package

This package depends on one of the following

  • doParallel - Foreach parallel adaptor for the parallel package
  • doSNOW - Foreach parallel adaptor for the snow package
  • doMC - Foreach parallel adaptor for the multicore package. Used in glmnet vignette.
  • doMPI - Foreach parallel adaptor for the Rmpi package
  • doRedis - Foreach parallel adapter for the rredis package

as a backend.

library(foreach)
library(doParallel)

m <- matrix(rnorm(9), 3, 3)

cl <- makeCluster(2, type = "SOCK")
registerDoParallel(cl) # register the parallel backend with the foreach package
foreach(i=1:nrow(m), .combine=rbind) %dopar%
  (m[i,] / mean(m[i,]))

stopCluster(cl)

See also this post Updates to the foreach package and its friends on Oct 2015.

combine list of lists

comb <- function(...) {
  mapply('cbind', ..., SIMPLIFY=FALSE)
}

library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl) # register the parallel backend with the foreach package

m <- rbind(rep(1,3), rep(2,3))

# nrow(m) can represents number of permutations (2 in this toy example)
tmp <- foreach(i=1:nrow(m)) %dopar% {
  a <- m[i, ]
  b <- a * 10
  list(a, b)
}; tmp
# [[1]]
# [[1]][[1]]
# [1] 1 1 1
#
# [[1]][[2]]
# [1] 10 10 10
#
#
# [[2]]
# [[2]][[1]]
# [1] 2 2 2
#
# [[2]][[2]]
# [1] 20 20 20

foreach(i=1:nrow(m), .combine = "comb") %dopar% {
  a <- m[i,]
  b <- a * 10
  list(a, b)
}
# [[1]]
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    2
# [3,]    1    2
#
# [[2]]
#      [,1] [,2]
# [1,]   10   20
# [2,]   10   20
# [3,]   10   20
stopCluster(cl)

Replacing double loops

library(foreach)
library(doParallel)

nc <- 4
nr <- 2

cores=detectCores()
cl <- makeCluster(cores[1]-1)
registerDoParallel(cl)
# set.seed(1234) # not work
# set.seed(1234, "L'Ecuyer-CMRG") # not work either
# library("doRNG")
# registerDoRNG(seed = 1985)    # not work with nested foreach
# Error in list(e1 = list(args = (1:nr)(), argnames = "i", evalenv = <environment>,  :
#  nested/conditional foreach loops are not supported yet.
m <- foreach (i = 1:nr, .combine='rbind') %:% # nesting operator
  foreach (j = 1:nc) %dopar% {
    rnorm(1, i*5, j) # code to parallelise
}
m
stopCluster(cl)

Note that since the random seed (see the next session) does not work on nested loop, it is better to convert nested loop (two indices) to a single loop (one index).

set seed and doRNG package

Export libraries, variables, functions

clusterEvalQ(cl, {
  library(biospear)
  library(glmnet)
  library(survival)
})
clusterExport(cl, list("var1", "foo2"))

Summary the result

foreach returns the result in a list. For example, if each component is a matrix we can use

  • Reduce("+", res)/length(res) # Reduce("+", res, na.rm = TRUE) not working
  • apply(simplify2array(res), 1:2, mean, na.rm = TRUE)

to get the average of matrices over the list.

Read a list files

Reading List Faster With parallel, doParallel, and pbapply

Don't use too many cores: Error in unserialize(socklist[[n]])

error reading from connection. See https://stackoverflow.com/a/24352032

Why is foreach() sometimes slower than for?

See https://stackoverflow.com/a/10414280 or https://stackoverflow.com/q/16963808

  • foreach is only advisable if you have relatively few rounds through very time consuming functions.
  • foreach() is best when the number of jobs does not hugely exceed the number of processors you will be using.
library(foreach)
library(parallel)
nCores <- future::availableCores()
cl <- makeCluster(nCores[[1]])
registerDoParallel(cl)

B<-10000

myFunc<-function() for(i in 1:B) sqrt(i)

myFunc2<-function() foreach(i = 1:B)  %do% sqrt(i)

myParFunc<-function() foreach(i = 1:B) %dopar% sqrt(i)

system.time(myFunc())
#   user  system elapsed 
#  0.001   0.002   0.004 
system.time(myFunc2())
#   user  system elapsed 
#  2.473   0.005   2.477 
system.time(myParFunc())
#   user  system elapsed 
#  3.560   0.242   3.829 

stopCluster(cl)

Another example: reduce the probe sets corresponding to the same gene symbol by taking the probe set with the largest gene expression. It is not just the speed problem; it runs out of memory on a 44928x393 matrix with a PC has 8GB RAM. On a smaller data with 5000 genes and 100 arrays it took 0.071 seconds using split() + sapply() but it took 39 seconds using the foreach method.

require(magrittr)
x <- matrix(rnorm(5000*100), nr=5000) # gene expression
y <- sample(3000, 5000, replace = TRUE) # gene symbol
rownames(x) <- names(y) <- 1:5000 # probe sets name

spt <- apply(x,1,mean,na.rm=TRUE) %>% split(., y)
ind <- sapply(spt, function(a) names(which.max(a)))
x.reduced <- x[ind, ]; rownames(x.reduced) <- names(spt)

Progress bar (doSNOW + foreach)

snowfall package

http://www.imbi.uni-freiburg.de/parallel/docs/Reisensburg2009_TutParallelComputing_Knaus_Porzelius.pdf

Rmpi package

Some examples/tutorials

OpenMP

BiocParallel

RcppParallel

future & future.apply & doFuture packages

Apache Spark

Microsoft R Server

GPU

Threads

Benchmark

Are parallel simulations in the cloud worth it? Benchmarking my MBP vs my Workstation vs Amazon EC2

R functions to run timing

# Method 1
system.time( invisible(rnorm(10000)))

# Method 2
btime <- Sys.time()
invisible(rnorm(10000))
Sys.time() - btime

Cloud Computing

Install R on Amazon EC2

http://randyzwitch.com/r-amazon-ec2/

Bioconductor on Amazon EC2

http://www.bioconductor.org/help/bioconductor-cloud-ami/

Big Data Analysis

bigmemory, biganalytics, bigtabulate

ff, ffbase

biglm

data.table

See data.table.

Reproducible Research

http://cran.r-project.org/web/views/ReproducibleResearch.html

Reproducible Environments

https://rviews.rstudio.com/2019/04/22/reproducible-environments/

Useful R packages

RInside

Ubuntu

With RInside, R can be embedded in a graphical application. For example, $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt directory includes source code of a Qt application to show a kernel density plot with various options like kernel functions, bandwidth and an R command text box to generate the random data. See my demo on Youtube. I have tested this qtdensity example successfully using Qt 4.8.5.

  1. Follow the instruction cairoDevice to install required libraries for cairoDevice package and then cairoDevice itself.
  2. Install Qt. Check 'qmake' command becomes available by typing 'whereis qmake' or 'which qmake' in terminal.
  3. Open Qt Creator from Ubuntu start menu/Launcher. Open the project file $HOME/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/qt/qtdensity.pro in Qt Creator.
  4. Under Qt Creator, hit 'Ctrl + R' or the big green triangle button on the lower-left corner to build/run the project. If everything works well, you shall see the interactive program qtdensity appears on your desktop.

Qtdensity.png.

With RInside + Wt web toolkit installed, we can also create a web application. To demonstrate the example in examples/wt directory, we can do

cd ~/R/x86_64-pc-linux-gnu-library/3.0/RInside/examples/wt
make
sudo ./wtdensity --docroot . --http-address localhost --http-port 8080

Then we can go to the browser's address bar and type http://localhost:8080 to see how it works (a screenshot is in here).

Windows 7

To make RInside works on Windows OS, try the following

  1. Make sure R is installed under C:\ instead of C:\Program Files if we don't want to get an error like g++.exe: error: Files/R/R-3.0.1/library/RInside/include: No such file or directory.
  2. Install RTools
  3. Instal RInside package from source (the binary version will give an error )
  4. Create a DOS batch file containing necessary paths in PATH environment variable
@echo off
set PATH=C:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;%PATH%
set PATH=C:\R\R-3.0.1\bin\i386;%PATH%
set PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
set PKG_CPPFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
set R_HOME=C:\R\R-3.0.1
echo Setting environment for using R
cmd

In the Windows command prompt, run

cd C:\R\R-3.0.1\library\RInside\examples\standard
make -f Makefile.win

Now we can test by running any of executable files that make generates. For example, rinside_sample0.

rinside_sample0

As for the Qt application qdensity program, we need to make sure the same version of MinGW was used in building RInside/Rcpp and Qt. See some discussions in

So the Qt and Wt web tool applications on Windows may or may not be possible.

GUI

Qt and R

tkrplot

On Ubuntu, we need to install tk packages, such as by

sudo apt-get install tk-dev

reticulate - Interface to 'Python'

def add_three(x):
    z = x + 3
    return z
---
title: "R Notebook"
output: html_notebook
---

```{r}
library(reticulate)
py_discover_config()
x <- 5
source_python("test.py")
y <- add_three(x)
print(y)
```

Pass R variables to Python. Works
```{python}
a = 7
print(r.x)
```

Pass python variables to R. Works.
```{r}
py$a
py_run_string("y = 10"); py$y
```

Hadoop (eg ~100 terabytes)

See also HighPerformanceComputing

RHadoop

Snowdoop: an alternative to MapReduce algorithm

XML

On Ubuntu, we need to install libxml2-dev before we can install XML package.

sudo apt-get update
sudo apt-get install libxml2-dev

On CentOS,

yum -y install libxml2 libxml2-devel

XML

library(XML)

# Read and parse HTML file
doc.html = htmlTreeParse('http://apiolaza.net/babel.html', useInternal = TRUE)

# Extract all the paragraphs (HTML tag is p, starting at
# the root of the document). Unlist flattens the list to
# create a character vector.
doc.text = unlist(xpathApply(doc.html, '//p', xmlValue))

# Replace all by spaces
doc.text = gsub('\n', ' ', doc.text)

# Join all the elements of the character vector into a single
# character string, separated by spaces
doc.text = paste(doc.text, collapse = ' ')

This post http://stackoverflow.com/questions/25315381/using-xpathsapply-to-scrape-xml-attributes-in-r can be used to monitor new releases from github.com.

> library(RCurl) # getURL()
> library(XML)   # htmlParse and xpathSApply
> xData <- getURL("https://github.com/alexdobin/STAR/releases")
> doc = htmlParse(xData)
> plain.text <- xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
  # I look at the source code and search 2.5.3a and find the tag as
  # <span class="css-truncate-target">2.5.3a</span>
> plain.text
 [1] "2.5.3a"      "2.5.2b"      "2.5.2a"      "2.5.1b"      "2.5.1a"     
 [6] "2.5.0c"      "2.5.0b"      "STAR_2.5.0a" "STAR_2.4.2a" "STAR_2.4.1d"
>
> # try bwa
> > xData <- getURL("https://github.com/lh3/bwa/releases")
> doc = htmlParse(xData)
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
[1] "v0.7.15" "v0.7.13"

> # try picard
> xData <- getURL("https://github.com/broadinstitute/picard/releases")
> doc = htmlParse(xData)
> xpathSApply(doc, "//span[@class='css-truncate-target']", xmlValue)
 [1] "2.9.1" "2.9.0" "2.8.3" "2.8.2" "2.8.1" "2.8.0" "2.7.2" "2.7.1" "2.7.0"
[10] "2.6.0"

This method can be used to monitor new tags/releases from some projects like Cura, BWA, Picard, STAR. But for some projects like sratools the class attribute in the span element ("css-truncate-target") can be different (such as "tag-name").

xmlview

RCurl

On Ubuntu, we need to install the packages (the first one is for XML package that RCurl suggests)

# Test on Ubuntu 14.04
sudo apt-get install libxml2-dev
sudo apt-get install libcurl4-openssl-dev

Scrape google scholar results

https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/googleScholarXScraper/googleScholarXScraper.R

No google ID is required

Seems not work

 Error in data.frame(footer = xpathLVApply(doc, xpath.base, "/font/span[@class='gs_fl']",  : 
  arguments imply differing number of rows: 2, 0 

devtools

devtools package depends on Curl. It actually depends on some system files. If we just need to install a package, consider the remotes package which was suggested by the BiocManager package.

# Ubuntu 14.04
sudo apt-get install libcurl4-openssl-dev

# Ubuntu 16.04
sudo apt-get install build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev

Lazy-load database XXX is corrupt. internal error -3. It often happens when you use install_github to install a package that's currently loaded; try restarting R and running the app again.

httr

httr imports curl, jsonlite, mime, openssl and R6 packages.

When I tried to install httr package, I got an error and some message:

Configuration failed because openssl was not found. Try installing:
 * deb: libssl-dev (Debian, Ubuntu, etc)
 * rpm: openssl-devel (Fedora, CentOS, RHEL)
 * csw: libssl_dev (Solaris)
 * brew: openssl (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’

It turns out after I run sudo apt-get install libssl-dev in the terminal (Debian), it would go smoothly with installing httr package. Nice httr!

Real example: see this post. Unfortunately I did not get a table result; I only get an html file (R 3.2.5, httr 1.1.0 on Ubuntu and Debian).

Since httr package was used in many other packages, take a look at how others use it. For example, aRxiv package.

curl

curl is independent of RCurl package.

library(curl)
h <- new_handle()
handle_setform(h,
  name="aaa", email="bbb"
)
req <- curl_fetch_memory("http://localhost/d/phpmyql3_scripts/ch02/form2.html", handle = h)
rawToChar(req$content)

rOpenSci packages

rOpenSci contains packages that allow access to data repositories through the R statistical programming environment

remotes

Download and install R packages stored in 'GitHub', 'BitBucket', or plain 'subversion' or 'git' repositories. This package is a lightweight replacement of the 'install_*' functions in 'devtools'. Also remotes does not require any extra OS level library (at least on Ubuntu 16.04).

Example:

# https://github.com/henrikbengtsson/matrixstats
remotes::install_github('HenrikBengtsson/matrixStats@develop')

DirichletMultinomial

On Ubuntu, we do

sudo apt-get install libgsl0-dev

Create GUI

gWidgets

GenOrd: Generate ordinal and discrete variables with given correlation matrix and marginal distributions

here

rjson

http://heuristically.wordpress.com/2013/05/20/geolocate-ip-addresses-in-r/

RJSONIO

Accessing Bitcoin Data with R

http://blog.revolutionanalytics.com/2015/11/accessing-bitcoin-data-with-r.html

Plot IP on google map

The following example is modified from the first of above list.

require(RJSONIO) # fromJSON
require(RCurl)   # getURL

temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt", 
                           ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)
names(ip) <- "IP"
nr = nrow(ip)
 
Lon <- as.numeric(rep(NA, nr))
Lat <- Lon
Coords <- data.frame(Lon, Lat)
 
ip2coordinates <- function(ip) {
  api <- "http://freegeoip.net/json/"
  get.ips <- getURL(paste(api, URLencode(ip), sep=""))
  # result <- ldply(fromJSON(get.ips), data.frame)
  result <- data.frame(fromJSON(get.ips))
  names(result)[1] <- "ip.address"
  return(result)
}

for (i in 1:nr){
  cat(i, "\n")
  try(
  Coords[i, 1:2] <- ip2coordinates(ip$IP[i])[c("longitude", "latitude")]
  )
}
 
# append to log-file:
logfile <- data.frame(ip, Lat = Coords$Lat, Long = Coords$Lon,
                                       LatLong = paste(round(Coords$Lat, 1), round(Coords$Lon, 1), sep = ":")) 
log_gmap <- logfile[!is.na(logfile$Lat), ]

require(googleVis) # gvisMap
gmap <- gvisMap(log_gmap, "LatLong",
                options = list(showTip = TRUE, enableScrollWheel = TRUE,
                               mapType = 'hybrid', useMapTypeControl = TRUE,
                               width = 1024, height = 800))
plot(gmap)

GoogleVis.png

The plot.gvis() method in googleVis packages also teaches the startDynamicHelp() function in the tools package, which was used to launch a http server. See Jeffrey Horner's note about deploying Rook App.

Map

leaflet

choroplethr

ggplot2

How to make maps with Census data in R

googleVis

See an example from RJSONIO above.

googleAuthR

Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.

gtrendsR - Google Trends

quantmod

Maintaining a database of price files in R. It consists of 3 steps.

  1. Initial data downloading
  2. Update existing data
  3. Create a batch file

Rcpp

It may be necessary to install dependency packages for RcppEigen.

sudo apt-get install libblas-dev liblapack-dev
sudo apt-get install gfortran

Speed Comparison

  • A comparison of high-performance computing techniques in R. It compares Rcpp to an R looping operator (like mapply), a parallelized version of a looping operator (like mcmapply), explicit parallelization, via the parallel package or the ParallelR suite.
  • In the following example, C++ avoids the overhead of creating an intermediate object (eg vector of the same length as the original vector). The c++ uses an intermediate scalar. So C++ wins R over memory management in this case.
    # http://blog.mckuhn.de/2016/03/avoiding-unnecessary-memory-allocations.html 
    library(Rcpp)
    
    `%count<%` <- cppFunction('
    size_t count_less(NumericVector x, NumericVector y) {
      const size_t nx = x.size();
      const size_t ny = y.size();
      if (nx > 1 & ny > 1) stop("Only one parameter can be a vector!");
      size_t count = 0;
      if (nx == 1) {
        double c = x[0];
        for (int i = 0; i < ny; i++) count += c < y[i];
      } else {
        double c = y[0];
        for (int i = 0; i < nx; i++) count += x[i] < c;
      }
      return count;
    }
    ')
    
    set.seed(42)
    
    N <- 10^7
    v <- runif(N, 0, 10000)
    
    # Testing on my ODroid xu4 running ubuntu 15.10
    system.time(sum(v < 5000))
    #   user  system elapsed
    #  1.135   0.305   1.453
    system.time(v %count<% 5000)
    #   user  system elapsed
    #  0.535   0.000   0.540
  • Why R for data science – and not Python?
    library(Rcpp)
     
    bmi_R <- function(weight, height) {
      weight / (height * height)
    }
    bmi_R(80, 1.85) # body mass index of person with 80 kg and 185 cm
    ## [1] 23.37473
     
    cppFunction("
      float bmi_cpp(float weight, float height) {
        return weight / (height * height);
      }
    ")
    bmi_cpp(80, 1.85) # same with cpp function
    ## [1] 23.37473
  • Boost the speed of R calls from Rcpp

Use Rcpp in RStudio

RStudio makes it easy to use Rcpp package.

Open RStudio, click New File -> C++ File. It will create a C++ template on the RStudio editor

#include <Rcpp.h>
using namespace Rcpp;

// Below is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp 
// function (or via the Source button on the editor toolbar)

// For more on using Rcpp click the Help button on the editor toolbar

// [[Rcpp::export]]
int timesTwo(int x) {
   return x * 2;
}

Now in R console, type

library(Rcpp)
sourceCpp("~/Downloads/timesTwo.cpp")
timesTwo(9)
# [1] 18

See more examples on http://adv-r.had.co.nz/Rcpp.html and Calculating a fuzzy kmeans membership matrix

If we wan to test Boost library, we can try it in RStudio. Consider the following example in stackoverflow.com.

// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/foreach.hpp>
#include <boost/math/special_functions/gamma.hpp>

#define foreach BOOST_FOREACH

using namespace boost::math;

//[[Rcpp::export]]
Rcpp::NumericVector boost_gamma( Rcpp::NumericVector x ) {
  foreach( double& elem, x ) {
    elem = boost::math::tgamma(elem);
  };

  return x;
}

Then the R console

boost_gamma(0:10 + 1)
#  [1]       1       1       2       6      24     120     720    5040   40320
# [10]  362880 3628800

identical( boost_gamma(0:10 + 1), factorial(0:10) )
# [1] TRUE

Example 1. convolution example

First, Rcpp package should be installed (I am working on Linux system). Next we try one example shipped in Rcpp package.

PS. If R was not available in global environment (such as built by ourselves), we need to modify 'Makefile' file by replacing 'R' command with its complete path (4 places).

cd ~/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/examples/ConvolveBenchmarks/
make
R

Then type the following in an R session to see how it works. Note that we don't need to issue library(Rcpp) in R.

dyn.load("convolve3_cpp.so")
x <- .Call("convolve3cpp", 1:3, 4:6)
x # 4 13 28 27 18

If we have our own cpp file, we need to use the following way to create dynamic loaded library file. Note that the character (grave accent) ` is not (single quote)'. If you mistakenly use ', it won't work.

export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
R CMD SHLIB xxxx.cpp

Example 2. Use together with inline package

library(inline)
src <-'
 Rcpp::NumericVector xa(a);
 Rcpp::NumericVector xb(b);
 int n_xa = xa.size(), n_xb = xb.size();

 Rcpp::NumericVector xab(n_xa + n_xb - 1);
 for (int i = 0; i < n_xa; i++)
 for (int j = 0; j < n_xb; j++)
 xab[i + j] += xa[i] * xb[j];
 return xab;
'
fun <- cxxfunction(signature(a = "numeric", b = "numeric"),
 src, plugin = "Rcpp")
fun(1:3, 1:4) 
# [1]  1  4 10 16 17 12

Example 3. Calling an R function

RcppParallel

rTRNG: parallel RNG in R

rTRNG: ADVANCED PARALLEL RNG IN R

caret

Tool for connecting Excel with R

Read/Write Excel files package

  • http://www.milanor.net/blog/?p=779
  • flipAPI. One useful feature of DownloadXLSX, which is not supported by the readxl package, is that it can read Excel files directly from the URL.
  • xlsx: depends on Java
  • openxlsx: not depend on Java. Depend on zip application. On Windows, it seems to be OK without installing Rtools. But it can not read xls file; it works on xlsx file.
  • readxl: it does not depend on anything although it can only read but not write Excel files.
    • It is part of tidyverse package. The readxl website provides several articles for more examples.
    • readxl webinar.
    • One advantage of read_excel (as with read_csv in the readr package) is that the data imports into an easy to print object with three attributes a tbl_df, a tbl and a data.frame.
    • For writing to Excel formats, use writexl or openxlsx package.
  • writexl: zero dependency xlsx writer for R
library(readxl)
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, 
    col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf, 
    guess_max = min(1000, n_max), progress = readxl_progress(), 
    .name_repair = "unique")
# Example
read_excel(path, range = cell_cols("c:cx"), col_types = "numeric")

For the Chromosome column, integer values becomes strings (but converted to double, so 5 becomes 5.000000) or NA (empty on sheets).

> head(read_excel("~/Downloads/BRCA.xls", 4)[ , -9], 3)
  UniqueID (Double-click) CloneID UGCluster
1                   HK1A1   21652 Hs.445981
2                   HK1A2   22012 Hs.119177
3                   HK1A4   22293 Hs.501376
                                                    Name Symbol EntrezID
1 Catenin (cadherin-associated protein), alpha 1, 102kDa CTNNA1     1495
2                              ADP-ribosylation factor 3   ARF3      377
3                          Uroporphyrinogen III synthase   UROS     7390
  Chromosome      Cytoband ChimericClusterIDs Filter
1   5.000000        5q31.2               <NA>      1
2  12.000000         12q13               <NA>      1
3       <NA> 10q25.2-q26.3               <NA>      1

The hidden worksheets become visible (Not sure what are those first rows mean in the output).

> excel_sheets("~/Downloads/BRCA.xls")
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 01 00 00 00 9a 0c 00 00 1a 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 04 00 00 00 00 00 00 0d 3b 03 00 00 00 9b 0c 00 00 0a 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 03 00 00 00 00 00 00 0d 3b 02 00 00 00 9a 0c 00 00 06 00 
[1] "Experiment descriptors" "Filtered log ratio"     "Gene identifiers"      
[4] "Gene annotations"       "CollateInfo"            "GeneSubsets"           
[7] "GeneSubsetsTemp"

The Chinese character works too.

> read_excel("~/Downloads/testChinese.xlsx", 1)
   中文 B C
1     a b c
2     1 2 3

To read all worksheets we need a convenient function

read_excel_allsheets <- function(filename) {
    sheets <- readxl::excel_sheets(filename)
    sheets <- sheets[-1] # Skip sheet 1
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X, col_types = "numeric"))
    names(x) <- sheets
    x
}
dcfile <- "table0.77_dC_biospear.xlsx"
dc <- read_excel_allsheets(dcfile)
# Each component (eg dc[[1]]) is a tibble.

readr (it is not designed to read Excel files)

Compared to base equivalents like read.csv(), readr is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn’t munge the column names.

1.0.0 released.

The read_csv() function from the readr package is as fast as fread() function from data.table package. For files beyond 100MB in size fread() and read_csv() can be expected to be around 5 times faster than read.csv(). See 5.3 of Efficient R Programming book.

Note that fread() can read-n a selection of the columns.

Colors

Below is an example using the option scale_fill_brewer(palette = "Paired"). See the source code at gist. Note that only 'set1' and 'set3' palettes in qualitative scheme can support up to 12 classes.

According to the information from the colorbrew website, qualitative schemes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes.

GgplotPalette.svg

ggplot2

See ggplot2

Data Manipulation & Tidyverse

   Import
     |
     | readr, readxl
     | haven, DBI, httr   +----- Visualize ------+
     |                    |    ggplot2, ggvis    |
     |                    |                      |
   Tidy ------------- Transform 
   tibble               dplyr                   Model 
   tidyr                  |                    broom
                          +------ Model ---------+
  • R for Data Science and tidyverse package (it is a collection of ggplot2, tibble, tidyr, readr, purrr & dplyr packages).
    • tidyverse, among others, was used at Mining CRAN DESCRIPTION Files (tbl_df(), %>%, summarise(), count(), mutate(), arrange(), unite(), ggplot(), filter(), select(), ...). Note that there is a problem to reproduce the result. I need to run cran <- cran[, -14] to remove the MD5sum column.
    • Compile R for Data Science to a PDF
  • Data Wrangling with dplyr and tidyr Cheat Sheet
  • Data Wrangling with Tidyverse from the Harvard Chan School of Public Health.
  • Best packages for data manipulation in R. It demonstrates to perform the same tasks using data.table and dplyr packages. data.table is faster and it may be a go-to package when performance and memory are the constraints.
  • DATA MANIPULATION IN R by Alboukadel Kassambara
    • subset data frame columns: pull(), select(), select_if(), other helper functions
    • subset (filter) data frame rows: slice(), filter(), filter_all(), filter_if(), filter_at(), sample_n(), top_n()
    • identify and remove duplicate rows: duplicated(), unique(), distinct()
    • ordering rows: arrange(), desc()
    • renaming and adding columns: rename()
    • compute and add new variables to a data frame: mutate(), transmutate()
    • computing summary statistics (pay to view)

Install on Ubuntu

How to install Tidyverse on Ubuntu 16.04 and 17.04

# Ubuntu >= 18.04. However, I get unmet dependencies errors on R 3.5.3.
# r-cran-curl : Depends: r-api-3.4
sudo apt-get install r-cran-curl r-cran-openssl r-cran-xml2

# Works fine on Ubuntu 16.04, 18.04
sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev

80 R packages will be installed after tidyverse has been installed.

Install on Raspberry Pi/(ARM based) Chromebook

In additional to the requirements of installing on Ubuntu, I got an error when it is installing a dependent package fs: undefined symbol: pthread_atfork. The fs package version is 1.2.6. The solution is to add one line in fs/src/Makevars file and then install the "fs" package using the source on the local machine.

5 most useful data manipulation functions

  • subset() for making subsets of data (natch)
  • merge() for combining data sets in a smart and easy way
  • melt()-reshape2 package for converting from wide to long data formats. See an example here where we want to combine multiple columns of values into 1 column. melt() is replaced by gather().
  • dcast()-reshape2 package for converting from long to wide data formats (or just use tapply()), and for making summary tables
  • ddply()-plyr package for doing split-apply-combine operations, which covers a huge swath of the most tricky data operations

data.table

Fast aggregation of large data (e.g. 100GB in RAM or just several GB size file), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread).

Some resources:

  • https://www.rdocumentation.org/packages/data.table/versions/1.12.0
  • R Packages: dplyr vs data.table
  • Cheat sheet from RStudio
  • Reading large data tables in R. fread(FILENAME)
  • Note that 'x[, 2] always return 2. If you want to do the thing you want, use x[, 2, with=FALSE] or x[, V2] where V2 is the header name. See the FAQ #1 in data.table.
  • Understanding data.table Rolling Joins
  • Intro to The data.table Package
    • Subsetting rows and/or columns
    • Alternative to using tapply(), aggregate(), table() to summarize data
    • Similarities to SQL, DT[i, j, by]
  • R : data.table (with 50 examples) from ListenData
    • Describe Data
    • Selecting or Keeping Columns
    • Rename Variables
    • Subsetting Rows / Filtering
    • Faster Data Manipulation with Indexing
    • Performance Comparison
    • Sorting Data
    • Adding Columns (Calculation on rows)
    • How to write Sub Queries (like SQL)
    • Summarize or Aggregate Columns
    • GROUP BY (Within Group Calculation)
    • Remove Duplicates
    • Extract values within a group
    • SQL's RANK OVER PARTITION
    • Cumulative SUM by GROUP
    • Lag and Lead
    • Between and LIKE Operator
    • Merging / Joins
    • Convert a data.table to data.frame
  • R Tutorial: data.table from dezyre.com
    • Syntax: DT[where, select|update|do, by]
    • Keys and setkey()
    • Fast grouping using j and by: DT[,sum(v),by=x]
    • Fast ordered joins: X[Y,roll=TRUE]
  • In the Introduction to data.table vignette, the data.table::order() function is SLOWER than base::order() from my Odroid xu4 (running Ubuntu 14.04.4 trusty on uSD)
    odt = data.table(col=sample(1e7))
    (t1 <- system.time(ans1 <- odt[base::order(col)]))  ## uses order from base R
    #   user  system elapsed 
    #  2.730   0.210   2.947 
    (t2 <- system.time(ans2 <- odt[order(col)]))        ## uses data.table's order
    #   user  system elapsed 
    #  2.830   0.215   3.052
    (identical(ans1, ans2))
    # [1] TRUE
  • Boost Your Data Munging with R
  • rbindlist(). One problem, it uses too much memory. In fact, when I try to analyze R package downloads, the command "dat <- rbindlist(logs)" uses up my 64GB memory (OS becomes unresponsive).

OpenMP enabled compiler for Mac. This instruction works on my Mac El Capitan (10.11.6) when I need to upgrade the data.table version from 1.11.4 to 1.11.6.

Question: how to make use multicore with data.table package?

reshape & reshape2 (superceded by tidyr package)

tidyr and benchmark

An evolution of reshape2. It's designed specifically for data tidying (not general reshaping or aggregating) and works well with dplyr data pipelines.

Make wide tables long with gather() (see 6.3.1 of Efficient R Programming)

library(tidyr)
library(efficient)
data(pew) # wide table
dim(pew) # 18 x 10,  (religion, '<$10k', '$10--20k', '$20--30k', ..., '>150k') 
pewt <- gather(data = pew, key = Income, value = Count, -religion)
dim(pew) # 162 x 3,  (religion, Income, Count)

args(gather)
# function(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)

where the three arguments of gather() requires:

  • data: a data frame in which column names will become row vaues
  • key: the name of the categorical variable into which the column names in the original datasets are converted.
  • value: the name of cell value columns

In this example, the 'religion' column will not be included (-religion).

dplyr, plyr packages

  • plyr package suffered from being slow in some cases. dplyr addresses this by porting much of the computation to C++. Another additional feature is the ability to work with data stored directly in an external database. The benefits of doing this are the data can be managed natively in a relational database, queries can be conducted on that database, and only the results of query returned.
  • Essential functions: 3 rows functions, 3 column functions and 1 mixed function.
           select, mutate, rename
            +------------------+
filter      +                  +
arrange     +                  +
group_by    +                  +
drop_na     +                  +
ungroup     + summarise        +
            +------------------+
  • These functions works on data frames and tibble objects.
iris %>% filter(Species == "setosa") %>% count()
head(iris %>% filter(Species == "setosa") %>% arrange(Sepal.Length))
  • dplyr tutorial from PH525x series (Biomedical Data Science by Rafael Irizarry and Michael Love). For select() function, some additional options to select columns based on a specific criteria include
    • start_with()/ ends_with() = Select columns that start/end with a character string
    • contains() = Select columns that contain a character string
    • matches() = Select columns that match a regular expression
    • one_of() = Select columns names that are from a group of names
  • Data Transformation in the book R for Data Science. Five key functions in the dplyr package:
    • Filter rows: filter()
    • Arrange rows: arrange()
    • Select columns: select()
    • Add new variables: mutate()
    • Grouped summaries: group_by() & summarise()
# filter
jan1 <- filter(flights, month == 1, day == 1)
filter(flights, month == 11 | month == 12)
filter(flights, arr_delay <= 120, dep_delay <= 120)
df <- tibble(x = c(1, NA, 3))
filter(df, x > 1)
filter(df, is.na(x) | x > 1)

# arrange
arrange(flights, year, month, day)
arrange(flights, desc(arr_delay))

# select
select(flights, year, month, day)
select(flights, year:day)
select(flights, -(year:day))

# mutate
flights_sml <- select(flights, 
  year:day, 
  ends_with("delay"), 
  distance, 
  air_time
)
mutate(flights_sml,
  gain = arr_delay - dep_delay,
  speed = distance / air_time * 60
)
# if you only want to keep the new variables
transmute(flights,
  gain = arr_delay - dep_delay,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)

# summarise()
by_day <- group_by(flights, year, month, day)
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))

# pipe. Note summarise() can return more than 1 variable.
delays <- flights %>% 
  group_by(dest) %>% 
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>% 
  filter(count > 20, dest != "HNL")
flights %>% 
  group_by(year, month, day) %>% 
  summarise(mean = mean(dep_delay, na.rm = TRUE))

stringr

magrittr

Instead of nested statements, it is using pipe operator %>%. So the code is easier to read. Impressive!

x %>% f     # f(x)
x %>% f(y)  # f(x, y)
x %>% f(arg=y)  # f(x, arg=y)
x %>% f(z, .) # f(z, x)
x %>% f(y) %>% g(z)  #  g(f(x, y), z)

x %>% select(which(colSums(!is.na(.))>0))  # remove columns with all missing data
x %>% select(which(colSums(!is.na(.))>0)) %>% filter((rowSums(!is.na(.))>0)) # remove all-NA columns _and_ rows
suppressPackageStartupMessages(library("dplyr"))
starwars %>%
  filter(., height > 200) %>%
  select(., height, mass) %>%
  head(.)
# instead of 
starwars %>%
  filter(height > 200) %>%
  select(height, mass) %>%
  head
iris$Species
iris[["Species"]]

iris %>%
`[[`("Species")

iris %>%
`[[`(5)

iris %>%
  subset(select = "Species")
  • Split-apply-combine: group + summarize + sort/arrange + top n. The following example is from Efficient R programming.
data(wb_ineq, package = "efficient")
wb_ineq %>% 
  filter(grepl("g", Country)) %>%
  group_by(Year) %>%
  summarise(gini = mean(gini, na.rm  = TRUE)) %>%
  arrange(desc(gini)) %>%
  top_n(n = 5)
# Examples from R for Data Science-Import, Tidy, Transform, Visualize, and Model
diamonds <- ggplot2::diamonds
diamonds2 <- diamonds %>% dplyr::mutate(price_per_carat = price / carat)

pryr::object_size(diamonds)
pryr::object_size(diamonds2)
pryr::object_size(diamonds, diamonds2)

rnorm(100) %>% matrix(ncol = 2) %>% plot() %>% str()
rnorm(100) %>% matrix(ncol = 2) %T>% plot() %>% str() # 'tee' pipe
    # %T>% works like %>% except that it returns the lefthand side (rnorm(100) %>% matrix(ncol = 2))  
    # instead of the righthand side.

# If a function does not have a data frame based api, you can use %$%.
# It explodes out the variables in a data frame.
mtcars %$% cor(disp, mpg) 

# For assignment, magrittr provides the %<>% operator
mtcars <- mtcars %>% transform(cyl = cyl * 2) # can be simplified by
mtcars %<>% transform(cyl = cyl * 2)

Upsides of using magrittr: no need to create intermediate objects, code is easy to read.

When not to use the pipe

  • your pipes are longer than (say) 10 steps
  • you have multiple inputs or outputs
  • Functions that use the current environment: assign(), get(), load()
  • Functions that use lazy evaluation: tryCatch(), try()

outer()

Genomic sequence

  • chartr
> yourSeq <- "AAAACCCGGGTTTNNN"
> chartr("ACGT", "TGCA", yourSeq)
[1] "TTTTGGGCCCAAANNN"

broom

broom: Convert Statistical Analysis Objects into Tidy Tibbles

lobstr package - dig into the internal representation and structure of R objects

lobstr 1.0.0

Data Science

See Data science page

jpeg

If we want to create the image on this wiki left hand side panel, we can use the jpeg package to read an existing plot and then edit and save it.

We can also use the jpeg package to import and manipulate a jpg image. See Fun with Heatmaps and Plotly.

Cairo

See White strips problem in png() or tiff().

cairoDevice

PS. Not sure the advantage of functions in this package compared to R's functions (eg. Cairo_svg() vs svg()).

For ubuntu OS, we need to install 2 libraries and 1 R package RGtk2.

sudo apt-get install libgtk2.0-dev libcairo2-dev

On Windows OS, we may got the error: unable to load shared object 'C:/Program Files/R/R-3.0.2/library/cairoDevice/libs/x64/cairoDevice.dll' . We need to follow the instruction in here.

igraph

creating directed networks with igraph

Identifying dependencies of R functions and scripts

https://stackoverflow.com/questions/8761857/identifying-dependencies-of-r-functions-and-scripts

library(mvbutils)
foodweb(where = "package:batr")

foodweb( find.funs("package:batr"), prune="survRiskPredict", lwd=2)

foodweb( find.funs("package:batr"), prune="classPredict", lwd=2)

iterators

Iterator is useful over for-loop if the data is already a collection. It can be used to iterate over a vector, data frame, matrix, file

Iterator can be combined to use with foreach package http://www.exegetic.biz/blog/2013/11/iterators-in-r/ has more elaboration.

Colors

colortools

Tools that allow users generate color schemes and palettes

colourpicker

A Colour Picker Tool for Shiny and for Selecting Colours in Plots

inlmisc

GetTolColors(). Lots of examples.

rex

Friendly Regular Expressions

formatR

The best strategy to avoid failure is to put comments in complete lines or after complete R expressions.

See also this discussion on stackoverflow talks about R code reformatting.

library(formatR)
tidy_source("Input.R", file = "output.R", width.cutoff=70)
tidy_source("clipboard") 
# default width is getOption("width") which is 127 in my case.

Some issues

  • Comments appearing at the beginning of a line within a long complete statement. This will break tidy_source().
cat("abcd",
    # This is my comment
    "defg")

will result in

> tidy_source("clipboard")
Error in base::parse(text = code, srcfile = NULL) : 
  3:1: unexpected string constant
2: invisible(".BeGiN_TiDy_IdEnTiFiEr_HaHaHa# This is my comment.HaHaHa_EnD_TiDy_IdEnTiFiEr")
3: "defg"
   ^
  • Comments appearing at the end of a line within a long complete statement won't break tidy_source() but tidy_source() cannot re-locate/tidy the comma sign.
cat("abcd"
    ,"defg"   # This is my comment
  ,"ghij")

will become

cat("abcd", "defg"  # This is my comment
, "ghij") 

Still bad!!

  • Comments appearing at the end of a line within a long complete statement breaks tidy_source() function. For example,
cat("</p>",
	"<HR SIZE=5 WIDTH=\"100%\" NOSHADE>",
	ifelse(codeSurv == 0,"<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>", #4/9/09
	                     "<h3><a name='Genes'><b><u>Genes significantly associated with survival:</u></b></a></h3>"), 
	file=ExternalFileName, sep="\n", append=T)

will result in

> tidy_source("clipboard", width.cutoff=70)
Error in base::parse(text = code, srcfile = NULL) : 
  3:129: unexpected SPECIAL
2: "<HR SIZE=5 WIDTH=\"100%\" NOSHADE>" ,
3: ifelse ( codeSurv == 0 , "<h3><a name='Genes'><b><u>Genes which are differentially expressed among classes:</u></b></a></h3>" , %InLiNe_IdEnTiFiEr%
  • width.cutoff parameter is not always working. For example, there is no any change for the following snippet though I hope it will move the cat() to the next line.
if (codePF & !GlobalTest & !DoExactPermTest) cat(paste("Multivariate Permutations test was computed based on", 
    NumPermutations, "random permutations"), "<BR>", " ", file = ExternalFileName, 
    sep = "\n", append = T)
  • It merges lines though I don't always want to do that. For example
cat("abcd"
    ,"defg"  
  ,"ghij")

will become

cat("abcd", "defg", "ghij") 

Download papers

biorxivr

Search and Download Papers from the bioRxiv Preprint Server

aRxiv

Interface to the arXiv API

pdftools

aside: set it aside

An RStudio addin to run long R commands aside your current session.

Teaching

  • smovie: Some Movies to Illustrate Concepts in Statistics

Organize R research project

How to save (and load) datasets in R (.Rdata vs .Rds file)

How to save (and load) datasets in R: An overview

Text to speech

Text-to-Speech with the googleLanguageR package

Weather data

logR

https://github.com/jangorecki/logR

Progress bar

https://github.com/r-lib/progress#readme

Configurable Progress bars, they may include percentage, elapsed time, and/or the estimated completion time. They work in terminals, in 'Emacs' 'ESS', 'RStudio', 'Windows' 'Rgui' and the 'macOS'.

cron

cronR

Different ways of using R

10 things R can do that might surprise you

https://simplystatistics.org/2019/03/13/10-things-r-can-do-that-might-surprise-you/

R call C/C++

Mainly talks about .C() and .Call().

Note that scalars and arrays must be passed using pointers. So if we want to access a function not exported from a package, we may need to modify the function to make the arguments as pointers.

SEXP

Some examples from packages

  • sva package has one C code function

R call Fortran

Embedding R

An very simple example (do not return from shell) from Writing R Extensions manual

The command-line R front-end, R_HOME/bin/exec/R, is one such example. Its source code is in file <src/main/Rmain.c>.

This example can be run by

R_HOME/bin/R CMD R_HOME/bin/exec/R

Note:

  1. R_HOME/bin/exec/R is the R binary. However, it couldn't be launched directly unless R_HOME and LD_LIBRARY_PATH are set up. Again, this is explained in Writing R Extension manual.
  2. R_HOME/bin/R is a shell-script front-end where users can invoke it. It sets up the environment for the executable. It can be copied to /usr/local/bin/R. When we run R_HOME/bin/R, it actually runs R_HOME/bin/R CMD R_HOME/bin/exec/R (see line 259 of R_HOME/bin/R as in R 3.0.2) so we know the important role of R_HOME/bin/exec/R.

More examples of embedding can be found in tests/Embedding directory. Read <index.html> for more information about these test examples.

An example from Bioconductor workshop

Example: Create <embed.c> file

#include <Rembedded.h>
#include <Rdefines.h>

static void doSplinesExample();
int
main(int argc, char *argv[])
{
    Rf_initEmbeddedR(argc, argv);
    doSplinesExample();
    Rf_endEmbeddedR(0);
    return 0;
}
static void
doSplinesExample()
{
    SEXP e, result;
    int errorOccurred;

    // create and evaluate 'library(splines)'
    PROTECT(e = lang2(install("library"), mkString("splines")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    if (errorOccurred) {
        // handle error
    }
    UNPROTECT(1);

    // 'options(FALSE)' ...
    PROTECT(e = lang2(install("options"), ScalarLogical(0)));
    // ... modified to 'options(example.ask=FALSE)' (this is obscure)
    SET_TAG(CDR(e), install("example.ask"));
    R_tryEval(e, R_GlobalEnv, NULL);
    UNPROTECT(1);

    // 'example("ns")'
    PROTECT(e = lang2(install("example"), mkString("ns")));
    R_tryEval(e, R_GlobalEnv, &errorOccurred);
    UNPROTECT(1);
}

Then build the executable. Note that I don't need to create R_HOME variable.

cd 
tar xzvf 
cd R-3.0.1
./configure --enable-R-shlib
make
cd tests/Embedding
make
~/R-3.0.1/bin/R CMD ./Rtest

nano embed.c
# Using a single line will give an error and cannot not show the real problem.
# ../../bin/R CMD gcc -I../../include -L../../lib -lR embed.c
# A better way is to run compile and link separately
gcc -I../../include -c embed.c
gcc -o embed embed.o -L../../lib -lR -lRblas
../../bin/R CMD ./embed

Note that if we want to call the executable file ./embed directly, we shall set up R environment by specifying R_HOME variable and including the directories used in linking R in LD_LIBRARY_PATH. This is based on the inform provided by Writing R Extensions.

export R_HOME=/home/brb/Downloads/R-3.0.2
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib
./embed # No need to include R CMD in front.

Question: Create a data frame in C? Answer: Use data.frame() via an eval() call from C. Or see the code is stats/src/model.c, as part of model.frame.default. Or using Rcpp as here.

Reference http://bioconductor.org/help/course-materials/2012/Seattle-Oct-2012/AdvancedR.pdf

Create a Simple Socket Server in R

This example is coming from this paper.

Create an R function

simpleServer <- function(port=6543)
{
  sock <- socketConnection ( port=port , server=TRUE)
  on.exit(close( sock ))
  cat("\nWelcome to R!\nR>" ,file=sock )
  while(( line <- readLines ( sock , n=1)) != "quit")
  {
    cat(paste("socket >" , line , "\n"))
    out<- capture.output (try(eval(parse(text=line ))))
    writeLines ( out , con=sock )
    cat("\nR> " ,file =sock )
  }
}

Then run simpleServer(). Open another terminal and try to communicate with the server

$ telnet localhost 6543
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Welcome to R!
R> summary(iris[, 3:5])
  Petal.Length    Petal.Width          Species  
 Min.   :1.000   Min.   :0.100   setosa    :50  
 1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
 Median :4.350   Median :1.300   virginica :50  
 Mean   :3.758   Mean   :1.199                  
 3rd Qu.:5.100   3rd Qu.:1.800                  
 Max.   :6.900   Max.   :2.500                  

R> quit
Connection closed by foreign host.

Rserve

Note the way of launching Rserve is like the way we launch C program when R was embedded in C. See Example from Bioconductor workshop.

See my Rserve page.

(Commercial) StatconnDcom

R.NET

rJava

Terminal

# jdk 7
sudo apt-get install openjdk-7-*
update-alternatives --config java
# oracle jdk 8
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get -y install openjdk-8-jdk

and then run the following (thanks to http://stackoverflow.com/questions/12872699/error-unable-to-load-installed-packages-just-now) to fix an error: libjvm.so: cannot open shared object file: No such file or directory.

  • Create the file /etc/ld.so.conf.d/java.conf with the following entries:
/usr/lib/jvm/java-8-oracle/jre/lib/amd64
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server
  • And then run sudo ldconfig

Now go back to R

install.packages("rJava")

Done!

If above does not work, a simple way is by (under Ubuntu) running

sudo apt-get install r-cran-rjava

which will create new package 'default-jre' (under /usr/lib/jvm) and 'default-jre-headless'.

RCaller

RApache

littler

Provides hash-bang (#!) capability for R

FAQs:

root@ed5f80320266:/# ls -l /usr/bin/{r,R*}
# R 3.5.2 docker container
-rwxr-xr-x 1 root root 82632 Jan 26 18:26 /usr/bin/r        # binary, can be used for 'shebang' lines, r --help
                                              # Example: r --verbose -e "date()"

-rwxr-xr-x 1 root root  8722 Dec 20 11:35 /usr/bin/R        # text, R --help
                                              # Example: R -q -e "date()"

-rwxr-xr-x 1 root root 14552 Dec 20 11:35 /usr/bin/Rscript  # binary, can be used for 'shebang' lines, Rscript --help
                                              # It won't show the startup message when it is used in the command line.
                                              # Example: Rscript -e "date()"

We can install littler using two ways.

  • install.packages("littler"). This will install the latest version but the binary 'r' program is only available under the package/bin directory (eg ~/R/x86_64-pc-linux-gnu-library/3.4/littler/bin/r). You need to create a soft link in order to access it globally.
  • sudo apt install littler. This will install 'r' globally; however, the installed version may be old.

After the installation, vignette contains several examples. The off-line vignette has a table of contents. Nice! The web version of examples does not have the TOC.

r was not meant to run interactively like R. See man r.

RInside: Embed R in C++

See RInside

(From RInside documentation) The RInside package makes it easier to embed R in your C++ applications. There is no code you would execute directly from the R environment. Rather, you write C++ programs that embed R which is illustrated by some the included examples.

The included examples are armadillo, eigen, mpi, qt, standard, threads and wt.

To run 'make' when we don't have a global R, we should modify the file <Makefile>. Also if we just want to create one executable file, we can do, for example, 'make rinside_sample1'.

To run any executable program, we need to specify LD_LIBRARY_PATH variable, something like

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/brb/Downloads/R-3.0.2/lib 

The real build process looks like (check <Makefile> for completeness)

g++ -I/home/brb/Downloads/R-3.0.2/include \
    -I/home/brb/Downloads/R-3.0.2/library/Rcpp/include \
    -I/home/brb/Downloads/R-3.0.2/library/RInside/include -g -O2 -Wall \
    -I/usr/local/include   \
    rinside_sample0.cpp  \
    -L/home/brb/Downloads/R-3.0.2/lib -lR  -lRblas -lRlapack \
    -L/home/brb/Downloads/R-3.0.2/library/Rcpp/lib -lRcpp \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/Rcpp/lib \
    -L/home/brb/Downloads/R-3.0.2/library/RInside/lib -lRInside \
    -Wl,-rpath,/home/brb/Downloads/R-3.0.2/library/RInside/lib \
    -o rinside_sample0

Hello World example of embedding R in C++.

#include <RInside.h>                    // for the embedded R via RInside

int main(int argc, char *argv[]) {

    RInside R(argc, argv);              // create an embedded R instance 

    R["txt"] = "Hello, world!\n";	// assign a char* (string) to 'txt'

    R.parseEvalQ("cat(txt)");           // eval the init string, ignoring any returns

    exit(0);
}

The above can be compared to the Hello world example in Qt.

#include <QApplication.h>
#include <QPushButton.h>

int main( int argc, char **argv )
{
    QApplication app( argc, argv );

    QPushButton hello( "Hello world!", 0 );
    hello.resize( 100, 30 );

    app.setMainWidget( &hello );
    hello.show();

    return app.exec();
}

RFortran

RFortran is an open source project with the following aim:

To provide an easy to use Fortran software library that enables Fortran programs to transfer data and commands to and from R.

It works only on Windows platform with Microsoft Visual Studio installed:(

Call R from other languages

C

Using R from C/C++

Error: “not resolved from current namespace” error, when calling C routines from R

Solution: add getNativeSymbolInfo() around your C/Fortran symbols. Search Google:r dyn.load not resolved from current namespace

JRI

http://www.rforge.net/JRI/

ryp2

http://rpy.sourceforge.net/rpy2.html

Create a standalone Rmath library

R has many math and statistical functions. We can easily use these functions in our C/C++/Fortran. The definite guide of doing this is on Chapter 9 "The standalone Rmath library" of R-admin manual.

Here is my experience based on R 3.0.2 on Windows OS.

Create a static library <libRmath.a> and a dynamic library <Rmath.dll>

Suppose we have downloaded R source code and build R from its source. See Build_R_from_its_source. Then the following 2 lines will generate files <libRmath.a> and <Rmath.dll> under C:\R\R-3.0.2\src\nmath\standalone directory.

cd C:\R\R-3.0.2\src\nmath\standalone
make -f Makefile.win

Use Rmath library in our code

set CPLUS_INCLUDE_PATH=C:\R\R-3.0.2\src\include
set LIBRARY_PATH=C:\R\R-3.0.2\src\nmath\standalone
# It is not LD_LIBRARY_PATH in above.

# Created <RmathEx1.cpp> from the book "Statistical Computing in C++ and R" web site
# http://math.la.asu.edu/~eubank/CandR/ch4Code.cpp
# It is OK to save the cpp file under any directory.

# Force to link against the static library <libRmath.a>
g++ RmathEx1.cpp -lRmath -lm -o RmathEx1.exe
# OR
g++ RmathEx1.cpp -Wl,-Bstatic -lRmath -lm -o RmathEx1.exe

# Force to link against dynamic library <Rmath.dll>
g++ RmathEx1.cpp Rmath.dll -lm -o RmathEx1Dll.exe

Test the executable program. Note that the executable program RmathEx1.exe can be transferred to and run in another computer without R installed. Isn't it cool!

c:\R>RmathEx1
Enter a argument for the normal cdf:
1
Enter a argument for the chi-squared cdf:
1
Prob(Z <= 1) = 0.841345
Prob(Chi^2 <= 1)= 0.682689

Below is the cpp program <RmathEx1.cpp>.

//RmathEx1.cpp
#define MATHLIB_STANDALONE 
#include <iostream>
#include "Rmath.h"

using std::cout; using std::cin; using std::endl;

int main()
{
  double x1, x2;
  cout << "Enter a argument for the normal cdf:" << endl;
  cin >> x1;
  cout << "Enter a argument for the chi-squared cdf:" << endl;
  cin >> x2;

  cout << "Prob(Z <= " << x1 << ") = " << 
    pnorm(x1, 0, 1, 1, 0)  << endl;
  cout << "Prob(Chi^2 <= " << x2 << ")= " << 
    pchisq(x2, 1, 1, 0) << endl;
  return 0;
}

Calling R.dll directly

See Chapter 8.2.2 of R Extensions. This is related to embedding R under Windows. The file <R.dll> on Windows is like <libR.so> on Linux.

Create HTML report

ReportingTools (Jason Hackney) from Bioconductor.

htmlTable package

The htmlTable package is intended for generating tables using HTML formatting. This format is compatible with Markdown when used for HTML-output. The most basic table can easily be created by just passing a matrix or a data.frame to the htmlTable-function.

formattable

htmltab package

This package is NOT used to CREATE html report but EXTRACT html table.

ztable package

Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.

Create academic report

reports package in CRAN and in github repository. The youtube video gives an overview of the package.

Create pdf and epub files

# Idea:
#        knitr        pdflatex
#   rnw -------> tex ----------> pdf
library(knitr)
knit("example.rnw") # create example.tex file
  • A very simple example <002-minimal.Rnw> from yihui.name works fine on linux.
git clone https://github.com/yihui/knitr-examples.git
  • <knitr-minimal.Rnw>. I have no problem to create pdf file on Windows but still cannot generate pdf on Linux from tex file. Some people suggested to run sudo apt-get install texlive-fonts-recommended to install missing fonts. It works!

To see a real example, check out DESeq2 package (inst/doc subdirectory). In addition to DESeq2, I also need to install DESeq, BiocStyle, airway, vsn, gplots, and pasilla packages from Bioconductor. Note that, it is best to use sudo/admin account to install packages.

Or starts with markdown file. Download the example <001-minimal.Rmd> and remove the last line of getting png file from internet.

# Idea:
#        knitr        pandoc
#   rmd -------> md ----------> pdf

git clone https://github.com/yihui/knitr-examples.git
cd knitr-examples
R -e "library(knitr); knit('001-minimal.Rmd')"
pandoc 001-minimal.md -o 001-minimal.pdf # require pdflatex to be installed !!

To create an epub file (not success yet on Windows OS, missing figures on Linux OS)

# Idea:
#        knitr        pandoc
#   rnw -------> tex ----------> markdown or epub

library(knitr)
knit("DESeq2.Rnw") # create DESeq2.tex
system("pandoc  -f latex -t markdown -o DESeq2.md DESeq2.tex")
## Windows OS, epub cannot be built
pandoc:
Error:
"source" (line 41, column 7):
unexpected "k"
expecting "{document}"

## Linux OS, epub missing figures and R codes.
## First install texlive base and extra packages
## sudo apt-get install texlive-latex-base texlive-latex-extra
pandoc: Could not find media `figure/SchwederSpjotvoll-1', skipping...
pandoc: Could not find media `figure/sortedP-1', skipping...
pandoc: Could not find media `figure/figHeatmap2c-1', skipping...
pandoc: Could not find media `figure/figHeatmap2b-1', skipping...
pandoc: Could not find media `figure/figHeatmap2a-1', skipping...
pandoc: Could not find media `figure/plotCountsAdv-1', skipping...
pandoc: Could not find media `figure/plotCounts-1', skipping...
pandoc: Could not find media `figure/MA-1', skipping...
pandoc: Could not find media `figure/MANoPrior-1', skipping...

The problems are at least

  • figures need to be generated under the same directory as the source code
  • figures cannot be in the format of pdf (DESeq2 generates both pdf and png files format)
  • missing R codes

Convert tex to epub

kable() for tables

Create Tables In LaTeX, HTML, Markdown And ReStructuredText

Create Word report

knitr + pandoc

It is better to create rmd file in RStudio. Rstudio provides a template for rmd file and it also provides a quick reference to R markdown language.

# Idea:
#        knitr       pandoc
#   rmd -------> md --------> docx
library(knitr)
knit2html("example.rmd") #Create md and html files

and then

FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))

Note. For example reason, if I play around the above 2 commands for several times, the knit2html() does not work well. However, if I click 'Knit HTML' button on the RStudio, it then works again.

Another way is

library(pander)
name = "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
Pandoc.brew(file = paste0(name, ".md"), output = paste0(-name, "docx"), convert = "docx")

Note that once we have used knitr command to create a md file, we can use pandoc shell command to convert it to different formats:

  • A pdf file: pandoc -s report.md -t latex -o report.pdf
  • A html file: pandoc -s report.md -o report.html (with the -c flag html files can be added easily)
  • Openoffice: pandoc report.md -o report.odt
  • Word docx: pandoc report.md -o report.docx

We can also create the epub file for reading on Kobo ereader. For example, download this file and save it as example.Rmd. I need to remove the line containing the link to http://i.imgur.com/RVNmr.jpg since it creates an error when I run pandoc (not sure if it is the pandoc version I have is too old). Now we just run these 2 lines to get the epub file. Amazing!

knit("example.Rmd")
pandoc("example.md", format="epub")

PS. If we don't remove the link, we will get an error message (pandoc 1.10.1 on Windows 7)

> pandoc("Rmd_to_Epub.md", format="epub")
executing pandoc   -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"
pandoc.exe: .\.\http://i.imgur.com/RVNmr.jpg: openBinaryFile: invalid argument (Invalid argument)
Error in (function (input, format, ext, cfg)  : conversion failed
In addition: Warning message:
running command 'pandoc   -f markdown -t epub -o Rmd_to_Epub.epub "Rmd_to_Epub.utf8md"' had status 1

pander

Try pandoc[1] with a minimal reproducible example, you might give a try to my "pander" package [2] too:

library(pander)
Pandoc.brew(system.file('examples/minimal.brew', package='pander'),
            output = tempfile(), convert = 'docx')

Where the content of the "minimal.brew" file is something you might have got used to with Sweave - although it's using "brew" syntax instead. See the examples of pander [3] for more details. Please note that pandoc should be installed first, which is pretty easy on Windows.

  1. http://johnmacfarlane.net/pandoc/
  2. http://rapporter.github.com/pander/
  3. http://rapporter.github.com/pander/#examples

R2wd

Use R2wd package. However, only 32-bit R is allowed and sometimes it can not produce all 'table's.

> library(R2wd)
> wdGet()
Loading required package: rcom
Loading required package: rscproxy
rcom requires a current version of statconnDCOM installed.
To install statconnDCOM type
     installstatconnDCOM()

This will download and install the current version of statconnDCOM

You will need a working Internet connection
because installation needs to download a file.
Error in if (wdapp[["Documents"]][["Count"]] == 0) wdapp[["Documents"]]$Add() : 
  argument is of length zero 

The solution is to launch 32-bit R instead of 64-bit R since statconnDCOM does not support 64-bit R.

Convert from pdf to word

The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveConvert

rtf

Use rtf package for Rich Text Format (RTF) Output.

xtable

Package xtable will produce html output.

print(xtable(X), type="html")

If you save the file and then open it with Word, you will get serviceable results. I've had better luck copying the output from xtable and pasting it into Excel.

ReporteRs

Microsoft Word, Microsoft Powerpoint and HTML documents generation from R. The source code is hosted on https://github.com/davidgohel/ReporteRs

A quick exploration

PDF manipulation

staplr

R Graphs Gallery

COM client or server

Client

RDCOMClient where excel.link depends on it.

Server

RDCOMServer

Use R under proxy

http://support.rstudio.org/help/kb/faq/configuring-r-to-use-an-http-proxy

RStudio

  • Github
  • Installing RStudio (1.0.44) on Ubuntu will not install Java even the source code contains 37.5% Java??
  • Preview

rstudio.cloud

https://rstudio.cloud/

Launch RStudio

Multiple versions of R

Create .Rproj file

If you have an existing package that doesn't have an .Rproj file, you can use devtools::use_rstudio("path/to/package") to add it.

With an RStudio project file, you can

  • Restore .RData into workspace at startup
  • Save workspace to .RData on exit
  • Always save history (even if no saving .RData)
  • etc

package search

https://github.com/RhoInc/CRANsearcher

Git

Visual Studio

R and Python support now built in to Visual Studio 2017

List files using regular expression

  • Extension
list.files(pattern = "\\.txt$")

where the dot (.) is a metacharacter. It is used to refer to any character.

  • Start with
list.files(pattern = "^Something")

Using Sys.glob()"' as

> Sys.glob("~/Downloads/*.txt")
[1] "/home/brb/Downloads/ip.txt"       "/home/brb/Downloads/valgrind.txt"

Hidden tool: rsync in Rtools

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/a.exe" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
a.exe

sent 323142 bytes  received 31 bytes  646346.00 bytes/sec
total size is 1198416  speedup is 3.71

c:\Rtools\bin>

And rsync works best when we need to sync folder.

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/Eula.txt
binary/cherrytree.lnk
binary/depends64.chm
binary/depends64.dll
binary/depends64.exe
binary/mtputty.exe
binary/procexp.chm
binary/procexp.exe
binary/pscp.exe
binary/putty.exe
binary/sqlite3.exe
binary/wget.exe

sent 4115294 bytes  received 244 bytes  1175868.00 bytes/sec
total size is 8036311  speedup is 1.95

c:\Rtools\bin>rm c:\users\limingc\Documents\binary\procexp.exe
cygwin warning:
  MS-DOS style path detected: c:\users\limingc\Documents\binary\procexp.exe
  Preferred POSIX equivalent is: /cygdrive/c/users/limingc/Documents/binary/procexp.exe
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames

c:\Rtools\bin>rsync -avz "/cygdrive/c/users/limingc/Downloads/binary" "/cygdrive/c/users/limingc/Documents/"
sending incremental file list
binary/
binary/procexp.exe

sent 1767277 bytes  received 35 bytes  3534624.00 bytes/sec
total size is 8036311  speedup is 4.55

c:\Rtools\bin>

Unforunately, if the destination is a network drive, I could get a permission denied (13) error. See also http://superuser.com/questions/69620/rsync-file-permissions-on-windows

Install rgdal package (geospatial Data) on ubuntu

Terminal

sudo apt-get install libgdal1-dev libproj-dev # https://stackoverflow.com/a/44389304
sudo apt-get install libgdal1i # Ubuntu 16.04 https://stackoverflow.com/a/12143411

R

install.packages("rgdal")

Install sf package

I got the following error even I have installed some libraries.

checking GDAL version >= 2.0.1... no
configure: error: sf is not compatible with GDAL versions below 2.0.1

Then I follow the instruction here

sudo apt remove libgdal-dev
sudo apt remove libproj-dev
sudo apt remove gdal-bin
sudo add-apt-repository ppa:ubuntugis/ubuntugis-stable

sudo apt update
sudo apt-cache policy libgdal-dev # Make sure a version >= 2.0 appears 
sudo apt install libgdal-dev

Set up Emacs on Windows

Edit the file C:\Program Files\GNU Emacs 23.2\site-lisp\site-start.el with something like

(setq-default inferior-R-program-name
              "c:/program files/r/r-2.15.2/bin/i386/rterm.exe")

Database

RSQLite

Creating a new database:

library(DBI)

mydb <- dbConnect(RSQLite::SQLite(), "my-db.sqlite")
dbDisconnect(mydb)
unlink("my-db.sqlite")

# temporary database
mydb <- dbConnect(RSQLite::SQLite(), "")
dbDisconnect(mydb)

Loading data:

mydb <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(mydb, "mtcars", mtcars)
dbWriteTable(mydb, "iris", iris)

dbListTables(mydb)

dbListFields(con, "mtcars")

dbReadTable(con, "mtcars")

Queries:

dbGetQuery(mydb, 'SELECT * FROM mtcars LIMIT 5')

dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < 4.6')

dbGetQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" < :x', params = list(x = 4.6))

res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
dbFetch(res)

Batched queries:

dbClearResult(rs)
rs <- dbSendQuery(mydb, 'SELECT * FROM mtcars')
while (!dbHasCompleted(rs)) {
  df <- dbFetch(rs, n = 10)
  print(nrow(df))
}

dbClearResult(rs)

Multiple parameterised queries:

rs <- dbSendQuery(mydb, 'SELECT * FROM iris WHERE "Sepal.Length" = :x')
dbBind(rs, param = list(x = seq(4, 4.4, by = 0.1)))
nrow(dbFetch(rs))
#> [1] 4
dbClearResult(rs)

Statements:

dbExecute(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < 4')
#> [1] 0
rs <- dbSendStatement(mydb, 'DELETE FROM iris WHERE "Sepal.Length" < :x')
dbBind(rs, param = list(x = 4.5))
dbGetRowsAffected(rs)
#> [1] 4
dbClearResult(rs)

sqldf

Manipulate R data frames using SQL. Depends on RSQLite. A use of gsub, reshape2 and sqldf with healthcare data

RPostgreSQL

RMySQL

MongoDB

odbc

RODBC

DBI

dbplyr

Create a new SQLite database:

surveys <- read.csv("data/surveys.csv")
plots <- read.csv("data/plots.csv")

my_db_file <- "portal-database.sqlite"
my_db <- src_sqlite(my_db_file, create = TRUE)

copy_to(my_db, surveys)
copy_to(my_db, plots)
my_db

Connect to a database:

download.file(url = "https://ndownloader.figshare.com/files/2292171",
              destfile = "portal_mammals.sqlite", mode = "wb")

library(dbplyr)
library(dplyr)
mammals <- src_sqlite("portal_mammals.sqlite")

Querying the database with the SQL syntax:

tbl(mammals, sql("SELECT year, species_id, plot_id FROM surveys"))

Querying the database with the dplyr syntax:

surveys <- tbl(mammals, "surveys")
surveys %>%
    select(year, species_id, plot_id)
head(surveys, n = 10)

show_query(head(surveys, n = 10)) # show which SQL commands are actually sent to the database

Simple database queries:

surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

Laziness (instruct R to stop being lazy):

data_subset <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight) %>%
  collect()

Complex database queries:

plots <- tbl(mammals, "plots")
plots # # The plot_id column features in the plots table

surveys # The plot_id column also features in the surveys table

# Join databases method 1
plots %>%
  filter(plot_id == 1) %>%
  inner_join(surveys) %>%
  collect()

NoSQL

nodbi: the NoSQL Database Connector

Github

R source

https://github.com/wch/r-source/ Daily update, interesting, should be visited every day. Clicking 1000+ commits to look at daily changes.

If we are interested in a certain branch (say 3.2), look for R-3-2-branch.

R packages (only) source (metacran)

Bioconductor packages source

Announcement, https://github.com/Bioconductor-mirror

Send local repository to Github in R by using reports package

http://www.youtube.com/watch?v=WdOI_-aZV0Y

My collection

How to download

Clone ~ Download.

  • Command line
git clone https://gist.github.com/4484270.git

This will create a subdirectory called '4484270' with all cloned files there.

  • Within R
library(devtools)
source_gist("4484270")

or First download the json file from

https://api.github.com/users/MYUSERLOGIN/gists

and then

library(RJSONIO)
x <- fromJSON("~/Downloads/gists.json")
setwd("~/Downloads/")
gist.id <- lapply(x, "[[", "id")
lapply(gist.id, function(x){
  cmd <- paste0("git clone https://gist.github.com/", x, ".git")
  system(cmd)
})

Jekyll

An Easy Start with Jekyll, for R-Bloggers

Connect R with Arduino

Android App

Common plots tips

Grouped boxplots

Weather Time Line

The plot looks similar to a boxplot though it is not. See a screenshot on Android by Sam Ruston.

Horizontal bar plot

library(ggplot2)
dtf <- data.frame(x = c("ETB", "PMA", "PER", "KON", "TRA", 
                        "DDR", "BUM", "MAT", "HED", "EXP"),
                  y = c(.02, .11, -.01, -.03, -.03, .02, .1, -.01, -.02, 0.06))
ggplot(dtf, aes(x, y)) +
  geom_bar(stat = "identity", aes(fill = x), show.legend = FALSE) + 
  coord_flip() + xlab("") + ylab("Fold Change")

Ggplot2bar.svg

Include bar values in a barplot

Use text().

Or use geom_text() if we are using the ggplot2 package. See an example here or this.

For stacked barplot, see this post.

Grouped barplots

Math expression

# Expressions
plot(x,y, xlab = expression(hat(x)[t]),
     ylab = expression(phi^{rho + a}),
     main = "Pure Expressions")

# Expressions with Spacing
# '~' is to add space and '*' is to squish characters together
plot(1:10, xlab= expression(Delta * 'C'))
plot(x,y, xlab = expression(hat(x)[t] ~ z ~ w),
     ylab = expression(phi^{rho + a} * z * w),
     main = "Pure Expressions with Spacing")

# Expressions with Text
plot(x,y, 
     xlab = expression(paste("Text here ", hat(x), " here ", z^rho, " and here")), 
     ylab = expression(paste("Here is some text of ", phi^{rho})), 
     main = "Expressions with Text")

# Substituting Expressions
plot(x,y, 
     xlab = substitute(paste("Here is ", pi, " = ", p), list(p = py)), 
     ylab = substitute(paste("e is = ", e ), list(e = ee)), 
     main = "Substituted Expressions")

Impose a line to a scatter plot

  • abline + lsfit # least squares
plot(cars)
abline(lsfit(cars[, 1], cars[, 2]))
# OR
abline(lm(cars[,2] ~ cars[,1]))
  • abline + line # robust line fitting
plot(cars)
(z <- line(cars))
abline(coef(z), col = 'green')
  • lines
plot(cars)
fit <- lm(cars[,2] ~ cars[,1])
lines(cars[,1], fitted(fit), col="blue")
lines(stats::lowess(cars), col='red')

Rotating x axis labels for barplot

https://stackoverflow.com/questions/10286473/rotating-x-axis-labels-in-r-for-barplot

barplot(mytable,main="Car makes",ylab="Freqency",xlab="make",las=2)

Set R plots x axis to show at y=0

https://stackoverflow.com/questions/3422203/set-r-plots-x-axis-to-show-at-y-0

plot(1:10, rnorm(10), ylim=c(0,10), yaxs="i")

Different colors of axis labels in barplot

See Vary colors of axis labels in R based on another variable

Method 1: Append labels for the 2nd, 3rd, ... color gradually because 'col.axis' argument cannot accept more than one color.

tN <- table(Ni <- stats::rpois(100, lambda = 5))
r <- barplot(tN, col = rainbow(20))
axis(1, 1, LETTERS[1], col.axis="red", col="red")
axis(1, 2, LETTERS[2], col.axis="blue", col = "blue")

Method 2: text() which can accept multiple colors in 'col' parameter but we need to find out the (x, y) by ourselves.

barplot(tN, col = rainbow(20), axisnames = F)
text(4:6, par("usr")[3]-2 , LETTERS[4:6], col=c("black","red","blue"), xpd=TRUE)

Use text() to draw labels on X/Y-axis including rotation

par(mar = c(5, 6, 4, 5) + 0.1)
plot(..., xaxt = "n") # "n" suppresses plotting of the axis; need mtext() and axis() to supplement
text(x = barCenters, y = par("usr")[3] - 1, srt = 45,
     adj = 1, labels = myData$names, xpd = TRUE)

Vertically stacked plots with the same x axis

https://stackoverflow.com/questions/11794436/stacking-multiple-plots-vertically-with-the-same-x-axis-but-different-y-axes-in

Increase/decrease legend font size

https://stackoverflow.com/a/36842578

plot(rnorm(100))
op <- par(cex=2)
legend("topleft", legend = 1:4, col=1:4, pch=1)
par(op)

Superimpose a density plot or any curves

Use lines().

Example 1

plot(cars, main = "Stopping Distance versus Speed")
lines(stats::lowess(cars))

Example 2

require(survival)
n = 10000
beta1 = 2; beta2 = -1
lambdaT = 1 # baseline hazard
lambdaC = 2  # hazard of censoring
set.seed(1234)
x1 = rnorm(n,0)
x2 = rnorm(n,0)
# true event time
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)) 
C <- rweibull(n, shape=1, scale=lambdaC)   
time = pmin(T,C)  
status <- 1*(T <= C) 
status2 <- 1-status
plot(survfit(Surv(time, status2) ~ 1), 
     ylab="Survival probability",
     main = 'Exponential censoring time')
xseq <- seq(.1, max(time), length =100)
func <- function(x) 1-pweibull(x, shape = 1, scale = lambdaC)
lines(xseq, func(xseq), col = 'red') # survival function of Weibull

Custom scales

Using custom scales with the 'scales' package

Time series

Time series stock price plot

library(quantmod)
getSymbols("AAPL")
getSymbols("IBM") # similar to AAPL
getSymbols("CSCO") # much smaller than AAPL, IBM
getSymbols("DJI") # Dow Jones, huge 
chart_Series(Cl(AAPL), TA="add_TA(Cl(IBM), col='blue', on=1); add_TA(Cl(CSCO), col = 'green', on=1)", 
    col='orange', subset = '2017::2017-08')

tail(Cl(DJI))

Timeline plot

https://stackoverflow.com/questions/20695311/chronological-timeline-with-points-in-time-and-format-date

Circular plot

Word cloud

World map

Visualising SSH attacks with R (rworldmap and rgeolocate packages)

Diagram/flowchart/Directed acyclic diagrams (DAGs)

DiagrammeR

diagram

Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams

DAGitty (browser-based and R package)

dagR

Venn Diagram

# systemPipeR package method
library(systemPipeR)
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18)) 
OLlist <- overLapper(setlist[1:3], type="vennsets")
vennPlot(list(OLlist))                             

# R script source method
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/overLapper.R") 
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18)) 
# or (obtained by dput(setlist))
setlist <- structure(list(A = c("o", "h", "u", "p", "i", "s", "a", "w", 
"b", "z", "n", "c", "k", "j", "y", "m", "t", "q"), B = c("h", 
"r", "x", "y", "b", "t", "d", "o", "m", "q", "g", "v", "c", "u", 
"f", "z"), C = c("b", "e", "t", "u", "s", "j", "o", "k", "d", 
"l", "g", "i", "w", "n", "p", "a", "y", "x", "m", "z"), D = c("f", 
"g", "b", "k", "j", "m", "e", "q", "i", "d", "o", "l", "c", "t", 
"x", "r", "s", "u", "w", "a", "z", "n"), E = c("u", "w", "o", 
"k", "n", "h", "p", "z", "l", "m", "r", "d", "q", "s", "x", "b", 
"v", "t"), F = c("o", "j", "r", "c", "l", "l", "u", "b", "f", 
"d", "u", "m", "y", "t", "y", "s", "a", "g", "t", "m", "x", "m"
)), .Names = c("A", "B", "C", "D", "E", "F"))

OLlist <- overLapper(setlist[1:3], type="vennsets")
counts <- list(sapply(OLlist$Venn_List, length))  
vennPlot(counts=counts)

Vennplot.png

Bump chart/Metro map

https://dominikkoch.github.io/Bump-Chart/

Amazing plots

New R logo 2/11/2016

library(sp)
library(maptools)
library(ggplot2)
library(ggthemes)
# rgeos requires the installation of GEOS from http://trac.osgeo.org/geos/
system("curl http://download.osgeo.org/geos/geos-3.5.0.tar.bz2 | tar jx")
system("cd geos-3.5.0; ./configure; make; sudo make install")
library(rgeos)
 
r_wkt_gist_file <- "https://gist.githubusercontent.com/hrbrmstr/07d0ccf14c2ff109f55a/raw/db274a39b8f024468f8550d7aeaabb83c576f7ef/rlogo.wkt"
if (!file.exists("rlogo.wkt")) download.file(r_wkt_gist_file, "rlogo.wkt")
rlogo <- readWKT(paste0(readLines("rlogo.wkt", warn=FALSE))) # rgeos
 
rlogo_shp <- SpatialPolygonsDataFrame(rlogo, data.frame(poly=c("halo", "r"))) # sp
rlogo_poly <- fortify(rlogo_shp, region="poly") # ggplot2
 
ggplot(rlogo_poly) + 
  geom_polygon(aes(x=long, y=lat, group=id, fill=id)) + 
  scale_fill_manual(values=c(halo="#b8babf", r="#1e63b5")) +
  coord_equal() + 
  theme_map() + 
  theme(legend.position="none")

3D plot

Using persp function to create the following plot. Code in github.

3dpersp.png

Christmas tree

http://wiekvoet.blogspot.com/2014/12/merry-christmas.html. Code in github.

XMastree.png

Happy Thanksgiving

Turkey

Turkey.png

Happy Valentine's Day

treemap

http://ipub.com/treemap/

TreemapPop.png

Voronoi diagram

Silent Night

Silentnight.png

The code in github.

The Travelling Salesman Portrait

https://fronkonstin.com/2018/04/04/the-travelling-salesman-portrait/

Moon phase calendar

https://chichacha.netlify.com/2018/05/26/making-calendar-with-ggplot-moon-phase-calendar/

Calendar heatmap

https://stackoverflow.com/questions/26171068/add-dates-to-calendar-heat-map-r

Chaos

Rcpp, Camarón de la Isla and the Beauty of Maths

Google Analytics

GAR package

http://www.analyticsforfun.com/2015/10/query-your-google-analytics-data-with.html

Linear Programming

http://www.r-bloggers.com/modeling-and-solving-linear-programming-with-r-free-book/

Amazon Alexa

R and Singularity

https://www.rstudio.com/rviews/2017/03/29/r-and-singularity/

Teach kids about R with Minecraft

http://blog.revolutionanalytics.com/2017/06/teach-kids-about-r-with-minecraft.html

Secure API keys

Securely store API keys in R scripts with the "secret" package

Vision and image recognition

Turn pictures into coloring pages

https://gist.github.com/jeroen/53a5f721cf81de2acba82ea47d0b19d0

Numerical optimization

R packages

R packages

Tricks

Getting help

Better Coder/coding, best practices

E-notation

6.022E23 (or 6.022e23) is equivalent to 6.022×10^23

Change default R repository

Change R repository

Edit global Rprofile file. On *NIX platforms, it's located in /usr/lib/R/library/base/R/Rprofile although local .Rprofile settings take precedence.

For example, I can specify the R mirror I like by creating a single line <.Rprofile> file under my home directory.

local({
  r = getOption("repos")
  r["CRAN"] = "https://cran.rstudio.com/"
  options(repos = r)
})
options(continue = "  ")
message("Hi MC, loading ~/.Rprofile")
if (interactive()) {
  .Last <- function() try(savehistory("~/.Rhistory"))
}

Change the default web browser

When I run help.start() function in LXLE, it cannot find its default web browser (seamonkey).

> help.start()
If the browser launched by 'xdg-open' is already running, it is *not*
    restarted, and you must switch to its window.
Otherwise, be patient ...
> /usr/bin/xdg-open: 461: /usr/bin/xdg-open: x-www-browser: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: firefox: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: mozilla: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: epiphany: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: konqueror: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: chromium-browser: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: google-chrome: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: links2: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: links: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: lynx: not found
/usr/bin/xdg-open: 461: /usr/bin/xdg-open: w3m: not found
xdg-open: no method available for opening 'http://127.0.0.1:27919/doc/html/index.html'

The solution is to put

options(browser='seamonkey')

in the .Rprofile of your home directory. If the browser is not in the global PATH, we need to put the full path above.

For one-time only purpose, we can use the browser option in help.start() function:

> help.start(browser="seamonkey")
If the browser launched by 'seamonkey' is already running, it is *not*
    restarted, and you must switch to its window.
Otherwise, be patient ...

We can work made a change (or create the file) ~/.Renviron or etc/Renviron. See

Getting user's home directory

See What are HOME and working directories?

# Windows
normalizePath("~")   # "C:\\Users\\brb\\Documents"
Sys.getenv("R_USER") # "C:/Users/brb/Documents"
Sys.getenv("HOME")   # "C:/Users/brb/Documents"

# Mac
normalizePath("~")   # [1] "/Users/brb"
Sys.getenv("R_USER") # [1] ""
Sys.getenv("HOME")   # "/Users/brb"

# Linux
normalizePath("~")   # [1] "/home/brb"
Sys.getenv("R_USER") # [1] ""
Sys.getenv("HOME")   # [1] "/home/brb"

Rprofile.site, Renviron.site (all platforms) and Rconsole (Windows only)

If we like to install R packages to a personal directory, follow this. Just add the line

R_LIBS_SITE=F:/R/library

to the file R_HOME/etc/x64/Renviron.site.

Note that on Windows OS, R/etc contains

$ ls -l /c/Progra~1/r/r-3.2.0/etc
total 142
-rw-r--r--    1   Administ     1043 Jun 20  2013 Rcmd_environ
-rw-r--r--    1   Administ     1924 Mar 17  2010 Rconsole
-rw-r--r--    1   Administ      943 Oct  3  2011 Rdevga
-rw-r--r--    1   Administ      589 May 20  2013 Rprofile.site
-rw-r--r--    1   Administ   251894 Jan 17  2015 curl-ca-bundle.crt
drwxr-xr-x    1   Administ        0 Jun  8 10:30 i386
-rw-r--r--    1   Administ     1160 Dec 31  2014 repositories
-rw-r--r--    1   Administ    30188 Mar 17  2010 rgb.txt
drwxr-xr-x    3   Administ        0 Jun  8 10:30 x64

$ ls /c/Progra~1/r/r-3.2.0/etc/i386
Makeconf

$ cat /c/Progra~1/r/r-3.2.0/etc/Rconsole
# Optional parameters for the console and the pager
# The system-wide copy is in R_HOME/etc.
# A user copy can be installed in `R_USER'.

## Style
# This can be `yes' (for MDI) or `no' (for SDI).
  MDI = yes
# MDI = no

# the next two are only relevant for MDI
toolbar = yes
statusbar = no

## Font.
# Please use only fixed width font.
# If font=FixedFont the system fixed font is used; in this case
# points and style are ignored. If font begins with "TT ", only
# True Type fonts are searched for.
font = TT Courier New
points = 10
style = normal # Style can be normal, bold, italic

# Dimensions (in characters) of the console.
rows = 25
columns = 80
# Dimensions (in characters) of the internal pager.
pgrows = 25
pgcolumns = 80
# should options(width=) be set to the console width?
setwidthonresize = yes

# memory limits for the console scrolling buffer, in chars and lines
# NB: bufbytes is in bytes for R < 2.7.0, chars thereafter.
bufbytes = 250000
buflines = 8000

# Initial position of the console (pixels, relative to the workspace for MDI)
# xconsole = 0
# yconsole = 0

# Dimension of MDI frame in pixels
# Format (w*h+xorg+yorg) or use -ve w and h for offsets from right bottom
# This will come up maximized if w==0
# MDIsize = 0*0+0+0
# MDIsize = 1000*800+100+0
# MDIsize = -50*-50+50+50  # 50 pixels space all round

# The internal pager can displays help in a single window
# or in multiple windows (one for each topic)
# pagerstyle can be set to `singlewindow' or `multiplewindows'
pagerstyle = multiplewindows

## Colours for console and pager(s)
# (see rwxxxx/etc/rgb.txt for the known colours).
background = White
normaltext = NavyBlue
usertext = Red
highlight = DarkRed

## Initial position of the graphics window
## (pixels, <0 values from opposite edge)
xgraphics = -25
ygraphics = 0

## Language for messages
language =

## Default setting for console buffering: 'yes' or 'no'
buffered = yes

and on Linux

brb@brb-T3500:~$ whereis R
R: /usr/bin/R /etc/R /usr/lib/R /usr/bin/X11/R /usr/local/lib/R /usr/share/R /usr/share/man/man1/R.1.gz

brb@brb-T3500:~$ ls /usr/lib/R
bin  COPYING  etc  lib  library  modules  site-library  SVN-REVISION

brb@brb-T3500:~$ ls /usr/lib/R/etc
javaconf  ldpaths  Makeconf  Renviron  Renviron.orig  Renviron.site  Renviron.ucf  repositories  Rprofile.site

brb@brb-T3500:~$ ls /usr/local/lib/R
site-library

and

brb@brb-T3500:~$ cat /usr/lib/R/etc/Rprofile.site
##                                              Emacs please make this -*- R -*-
## empty Rprofile.site for R on Debian
##
## Copyright (C) 2008 Dirk Eddelbuettel and GPL'ed
##
## see help(Startup) for documentation on ~/.Rprofile and Rprofile.site

# ## Example of .Rprofile
# options(width=65, digits=5)
# options(show.signif.stars=FALSE)
# setHook(packageEvent("grDevices", "onLoad"),
#         function(...) grDevices::ps.options(horizontal=FALSE))
# set.seed(1234)
# .First <- function() cat("\n   Welcome to R!\n\n")
# .Last <- function()  cat("\n   Goodbye!\n\n")

# ## Example of Rprofile.site
# local({
#  # add MASS to the default packages, set a CRAN mirror
#  old <- getOption("defaultPackages"); r <- getOption("repos")
#  r["CRAN"] <- "http://my.local.cran"
#  options(defaultPackages = c(old, "MASS"), repos = r)
#})
brb@brb-T3500:~$ cat /usr/lib/R/etc/Renviron.site
##                                              Emacs please make this -*- R -*-
## empty Renviron.site for R on Debian
##
## Copyright (C) 2008 Dirk Eddelbuettel and GPL'ed
##
## see help(Startup) for documentation on ~/.Renviron and Renviron.site

# ## Example ~/.Renviron on Unix
# R_LIBS=~/R/library
# PAGER=/usr/local/bin/less

# ## Example .Renviron on Windows
# R_LIBS=C:/R/library
# MY_TCLTK="c:/Program Files/Tcl/bin"

# ## Example of setting R_DEFAULT_PACKAGES (from R CMD check)
# R_DEFAULT_PACKAGES='utils,grDevices,graphics,stats'
# # this loads the packages in the order given, so they appear on
# # the search path in reverse order.
brb@brb-T3500:~$

What is the best place to save Rconsole on Windows platform

Put/create the file <Rconsole> under C:/Users/USERNAME/Documents folder so no matter how R was upgraded/downgraded, it always find my preference.

My preferred settings:

  • Font: Consolas (it will be shown as "TT Consolas" in Rconsole)
  • Size: 12
  • background: black
  • normaltext: white
  • usertext: GreenYellow or orange (close to RStudio's Cobalt theme) or sienna1 or SpringGreen or tan1 or yellow

and others (default options)

  • pagebg: white
  • pagetext: navy
  • highlight: DarkRed
  • dataeditbg: white
  • dataedittext: navy (View() function)
  • dataedituser: red
  • editorbg: white (edit() function)
  • editortext: black

Saving and loading history automatically: .Rprofile & local()

options(continue="  ") # default is "+ "
options(editor="nano") # default is "vi" on Linux
# options(htmlhelp=TRUE) 

local((r <- getOption("repos")
  r["CRAN"] <- "http://cran.rstudio.com"
  options(repos = r)))

.First <- function(){
 # library(Hmisc)
 cat("\nWelcome at", date(), "\n")
}

.Last <- function(){
 cat("\nGoodbye at ", date(), "\n")
}  
  • https://stackoverflow.com/questions/16734937/saving-and-loading-history-automatically
  • The history file will always be read from the $HOME directory and the history file will be overwritten by a new session. These two problems can be solved if we define R_HISTFILE system variable.
  • local() function can be used in .Rprofile file to set up the environment even no new variables will be created (change repository, install packages, load libraries, source R files, run system() function, file/directory I/O, etc)

Linux or Mac

In ~/.profile or ~/.bashrc I put:

export R_HISTFILE=~/.Rhistory

In ~/.Rprofile I put:

if (interactive()) {
  if (.Platform$OS.type == "unix")  .First <- function() try(utils::loadhistory("~/.Rhistory")) 
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
}

Windows

If you launch R by clicking its icon from Windows Desktop, the R starts in C:\User\$USER\Documents directory. So we can create a new file .Rprofile in this directory.

if (interactive()) {
  .Last <- function() try(savehistory(file.path(Sys.getenv("HOME"), ".Rhistory")))
}

R release versions

rversions: Query the main 'R' 'SVN' repository to find the released versions & dates.

Detect number of running R instances in Windows

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rscript.exe"
INFO: No tasks are running which match the specified criteria.

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rgui.exe"

Image Name                     PID Session Name        Session#    Mem Usage
============================================================================
Rgui.exe                      1096 Console                    1     44,712 K

C:\Program Files\R>tasklist /FI "IMAGENAME eq Rserve.exe"

Image Name                     PID Session Name        Session#    Mem Usage
============================================================================
Rserve.exe                    6108 Console                    1    381,796 K

In R, we can use

> system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE)
[1] ""                                                                            
[2] "Image Name                     PID Session Name        Session#    Mem Usage"
[3] "============================================================================"
[4] "Rgui.exe                      1096 Console                    1     44,804 K"

> length(system('tasklist /FI "IMAGENAME eq Rgui.exe" ', intern = TRUE))-3

Editor

http://en.wikipedia.org/wiki/R_(programming_language)#Editors_and_IDEs

  • Emacs + ESS. The ESS is useful in the case I want to tidy R code (the tidy_source() function in the formatR package sometimes gives errors; eg when I tested it on an R file like <GetComparisonResults.R> from BRB-ArrayTools v4.4 stable).
  • Rstudio - editor/R terminal/R graphics/file browser/package manager. The new version (0.98) also provides a new feature for debugging step-by-step. See also RStudio Tricks
  • geany - I like the feature that it shows defined functions on the side panel even for R code. RStudio can also do this (see the bottom of the code panel).
  • Rgedit which includes a feature of splitting screen into two panes and run R in the bottom panel. See here.
  • Komodo IDE with browser preview http://www.youtube.com/watch?v=wv89OOw9roI at 4:06 and http://docs.activestate.com/komodo/4.4/editor.html

GUI for Data Analysis

Rcmdr

http://cran.r-project.org/web/packages/Rcmdr/index.html

Deducer

http://cran.r-project.org/web/packages/Deducer/index.html

jamovi

Scope

See

  • Assignments within functions in the An Introduction to R manual.
  • source() does not work like C's preprocessor where statements in header files will be literally inserted into the code. It does not work when you define a variable in a function but want to use it outside the function (even through source())
## foo.R ##
cat(ArrayTools, "\n")
## End of foo.R

# 1. Error
predict <- function() {
  ArrayTools <- "C:/Program Files" # or through load() function 
  source("foo.R")                  # or through a function call; foo()
}
predict()   # Object ArrayTools not found

# 2. OK. Make the variable global
predict <- function() {
  ArrayTools <<- "C:/Program Files'
  source("foo.R")
}
predict()  
ArrayTools

# 3. OK. Create a global variable
ArrayTools <- "C:/Program Files"
predict <- function() {
  source("foo.R")
}
predict()

Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function.

Example 1.

> ttt <- data.frame(type=letters[1:5], JpnTest=rep("999", 5), stringsAsFactors = F)
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999
> jpntest <- function() { ttt$JpnTest[1] ="N5"; print(ttt)}
> jpntest()
  type JpnTest
1    a      N5
2    b     999
3    c     999
4    d     999
5    e     999
> ttt
  type JpnTest
1    a     999
2    b     999
3    c     999
4    d     999
5    e     999

Example 2. How can we set global variables inside a function? The answer is to use the "<<-" operator or assign(, , envir = .GlobalEnv) function.

Other resource: Advanced R by Hadley Wickham.

Example 3. Writing functions in R, keeping scoping in mind

New environment

http://adv-r.had.co.nz/Environments.html

Run the same function on a bunch of R objects

mye = new.env()
load(<filename>, mye)
for(n in names(mye)) n = as_tibble(mye[[n]])

Speedup R code

Profiler

(Video) Understand Code Performance with the profiler

&& vs &

See https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/Logic.

The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector.

Vectorization

sapply vs vectorization

Speed test: sapply vs vectorization

split() and sapply()

split() can be used to split a vector, columns or rows. See How to split a data frame?

  • Split rows of a data frame/matrix
    split(mtcars,mtcars$cyl)
  • Split columns of a data frame/matrix.
    ma <- cbind(x = 1:10, y = (-4:5)^2, z = 11:20)
    split(ma, cbind(rep(1,10), rep(2, 10), rep(1,10)))
    # $`1`
    #  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    #
    # $`2`
    #  [1] 16  9  4  1  0  1  4  9 16 25
  • split() + sapply() to merge columns. See below 'Mean of duplicated columns' for more detail.
  • split() + sapply() to split a vector. See nsFilter() function which can remove duplicated probesets/rows using unique Entrez Gene IDs (genefilter package). The source code of nsFilter() and findLargest().
    tSsp = split.default(testStat, lls) 
    # testStat is a vector of numerics including probeset IDs as names
    # lls is a vector of entrez IDs (same length as testStat)
    # tSSp is a list of the same length as unique elements of lls.
    
    sapply(tSsp, function(x) names(which.max(x))) 
    # return a vector of probset IDs of length of unique entrez IDs
And here is another example from the bigmemory vignette,
planeindices <- split(1:nrow(x), x[,'TailNum'])
planeStart <- sapply(planeindices,
                     function(i) birthmonth(x[i, c('Year','Month'),
                                            drop=FALSE]))

Mean of duplicated columns: rowMeans

  • Reduce columns of a matrix by a function in R
    x <- matrix(1:60, nr=10); x[1, 2:3] <- NA
    colnames(x) <- c("A","A", "b", "b", "b", "c"); x
    res <- sapply(split(1:ncol(x), colnames(x)), 
                  function(i) rowMeans(x[, i, drop=F], na.rm = TRUE))
    res
    
    # vapply() is safter than sapply(). 
    # The 3rd arg in vapply() is a template of the return value.
    res2 <- vapply(split(1:ncol(x), colnames(x)), 
                   function(i) rowMeans(x[, i, drop=F], na.rm = TRUE),
                   rep(0, nrow(x)))
  • colSums, rowSums, colMeans, rowMeans (no group variable). These functions are equivalent to use of ‘apply’ with ‘FUN = mean’ or ‘FUN = sum’ with appropriate margins, but are a lot faster.
    rowMeans(x, na.rm=T)
    # [1] 31 27 28 29 30 31 32 33 34 35
    
    apply(x, 1, mean, na.rm=T)
    # [1] 31 27 28 29 30 31 32 33 34 35
  • matrixStats: Functions that Apply to Rows and Columns of Matrices (and to Vectors)

Mean of duplicated rows: colMeans and rowsum

  • colMeans(x, na.rm = FALSE, dims = 1)
x <- matrix(1:60, nr=10); x[1, 2:3] <- NA; x
rownames(x) <- c(rep("a", 2), rep("b", 3), rep("c", 4), "d")
res <- sapply(split(1:nrow(x), rownames(x)), 
              function(i) colMeans(x[i, , drop=F], na.rm = TRUE))
res <- t(res) # transpose is needed since sapply() will form the resulting matrix by columns
  • rowsum(x, group, reorder = TRUE, …)
x <- matrix(runif(100), ncol = 5) # 20 x 5
group <- sample(1:8, 20, TRUE)
(xsum <- rowsum(x, group)) # 8 x 5
  • How to calculate mean/median per group in a dataframe in r where doBy and dplyr are recommended.
  • matrixStats: Functions that Apply to Rows and Columns of Matrices (and to Vectors)
  • doBy package
  • use ave() and unique()
  • data.table package
  • plyr package
  • aggregate() function. Too slow! http://slowkow.com/2015/01/28/data-table-aggregate/. Don't use aggregate post.
    > attach(mtcars)
    dim(mtcars)
    [1] 32 11
    > head(mtcars)
                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    > with(mtcars, table(cyl, vs))
       vs
    cyl  0  1
      4  1 10
      6  3  4
      8 14  0
    > aggdata <-aggregate(mtcars, by=list(cyl,vs),  FUN=mean, na.rm=TRUE)
    > print(aggdata)
      Group.1 Group.2      mpg cyl   disp       hp     drat       wt     qsec vs
    1       4       0 26.00000   4 120.30  91.0000 4.430000 2.140000 16.70000  0
    2       6       0 20.56667   6 155.00 131.6667 3.806667 2.755000 16.32667  0
    3       8       0 15.10000   8 353.10 209.2143 3.229286 3.999214 16.77214  0
    4       4       1 26.73000   4 103.62  81.8000 4.035000 2.300300 19.38100  1
    5       6       1 19.12500   6 204.55 115.2500 3.420000 3.388750 19.21500  1
             am     gear     carb
    1 1.0000000 5.000000 2.000000
    2 1.0000000 4.333333 4.666667
    3 0.1428571 3.285714 3.500000
    4 0.7000000 4.000000 1.500000
    5 0.0000000 3.500000 2.500000
    > detach(mtcars)
    
    # Another example: select rows with a minimum value from a certain column (yval in this case)
    > mydf <- read.table(header=T, text='
     id xval yval
     A 1  1
     A -2  2
     B 3  3
     B 4  4
     C 5  5
     ')
    > x = mydf$xval
    > y = mydf$yval
    > aggregate(mydf[, c(2,3)], by=list(id=mydf$id), FUN=function(x) x[which.min(y)])
      id xval yval
    1  A    1    1
    2  B    3    3
    3  C    5    5

Apply family

Vectorize, aggregate, apply, by, eapply, lapply, mapply, rapply, replicate, scale, sapply, split, tapply, and vapply.

The following list gives a hierarchical relationship among these functions.

  • apply(X, MARGIN, FUN, ...) – Apply a Functions Over Array Margins
  • lapply(X, FUN, ...) – Apply a Function over a List (including a data frame) or Vector X.
    • sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) – Apply a Function over a List or Vector
      • replicate(n, expr, simplify = "array")
    • mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) – Multivariate version of sapply
      • Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE) - Vectorize a Scalar Function
      • Map(FUN, ...) A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.
    • vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) – similar to sapply, but has a pre-specified type of return value
    • rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...) – A recursive version of lapply
  • tapply(V, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE) – Apply a Function Over a "Ragged" Array. V is typically a vector where split() will be applied. INDEX is a list of one or more factors.
    • aggregate(D, by, FUN, ..., simplify = TRUE, drop = TRUE) - Apply a function to each columns of subset data frame split by factors. FUN (such as mean(), weighted.mean(), sum()) is a simple function applied to a vector. D is typically a data frame.
    • by(D, INDICES, FUN, ..., simplify = TRUE) - Apply a Function to each subset data frame split by factors. FUN (such as summary(), lm()) is applied to a data frame. D is typically a data frame.
  • eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE) – Apply a Function over values in an environment

Difference between apply vs sapply vs lapply vs tapply?

  • apply - When you want to apply a function to the rows or columns or both of a matrix and output is a one-dimensional if only row or column is selected else it is a 2D-matrix
  • lapply - When you want to apply a function to each element of a list in turn and get a list back.
  • sapply - When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.
  • tapply - When you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

Some short examples:

Note that, apply's performance is not always better than a for loop. See

Progress bar

What is the cost of a progress bar in R?

The package 'pbapply' creates a text-mode progress bar - it works on any platforms. On Windows platform, check out this post. It uses winProgressBar() and setWinProgressBar() functions.

lapply and its friends Map(), Reduce(), Filter() from the base package for manipulating lists

  • Examples of using lapply() + split() on a data frame. See rollingyours.wordpress.com.
  • mapply() documentation. Use mapply() to merge lists.
  • Map() and Reduce() in functional programming
  • Map(), Reduce(), and Filter() from Advanced R by Hadley
    • If you have two or more lists (or data frames) that you need to process in parallel, use Map(). One good example is to compute the weighted.mean() function that requires two input objects. Map() is similar to mapply() function and is more concise than lapply(). Advanced R has a comment that Map() is better than mapply().
      # Syntax: Map(f, ...)
      
      xs <- replicate(5, runif(10), simplify = FALSE)
      ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
      Map(weighted.mean, xs, ws)
      
      # instead of a more clumsy way
      lapply(seq_along(xs), function(i) {
        weighted.mean(xs[[i]], ws[[i]])
      })
    • Reduce() reduces a vector, x, to a single value by recursively calling a function, f, two arguments at a time. A good example of using Reduce() function is to read a list of matrix files and merge them. See How to combine multiple matrix frames into one using R?
      # Syntax: Reduce(f, x, ...)
      
      > m1 <- data.frame(id=letters[1:4], val=1:4)
      > m2 <- data.frame(id=letters[2:6], val=2:6)
      > merge(m1, m2, "id", all = T)
        id val.x val.y
      1  a     1    NA
      2  b     2     2
      3  c     3     3
      4  d     4     4
      5  e    NA     5
      6  f    NA     6
      > m <- list(m1, m2)
      > Reduce(function(x,y) merge(x,y, "id",all=T), m)
        id val.x val.y
      1  a     1    NA
      2  b     2     2
      3  c     3     3
      4  d     4     4
      5  e    NA     5
      6  f    NA     6
  • Playing Map() and Reduce() in R – Subsetting - using parallel and future packages. Union Multiple Data.Frames with Different Column Names

sapply & vapply

See parallel::parSapply() for a parallel version of sapply(1:n, function(x)). We can this technique to speed up this example.

rapply - recursive version of lapply

replicate

https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r

> replicate(5, rnorm(3))
           [,1]       [,2]       [,3]      [,4]        [,5]
[1,]  0.2509130 -0.3526600 -0.3170790  1.064816 -0.53708856
[2,]  0.5222548  1.5343319  0.6120194 -1.811913 -1.09352459
[3,] -1.9905533 -0.8902026 -0.5489822  1.308273  0.08773477

See parSapply() for a parallel version of replicate().

Vectorize

  • Vectorize(FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE): creates a function wrapper that vectorizes a scalar function. Its value is a list or vector or array. It calls mapply().
    > rep(1:4, 4:1)
     [1] 1 1 1 1 2 2 2 3 3 4
    > vrep <- Vectorize(rep.int)
    > vrep(1:4, 4:1)
    [[1]]
    [1] 1 1 1 1
    
    [[2]]
    [1] 2 2 2
    
    [[3]]
    [1] 3 3
    
    [[4]]
    [1] 4
  • Vectorizing functions in R is easy
    > rweibull(1, 1, c(1, 2)) # no error but not sure what it gives?
    [1] 2.17123
    > Vectorize("rweibull")(n=1, shape = 1, scale = c(1, 2)) 
    [1] 1.6491761 0.9610109
  • https://blogs.msdn.microsoft.com/gpalem/2013/03/28/make-vectorize-your-friend-in-r/
    myfunc <- function(a, b) a*b
    myfunc(1, 2) # 2
    myfunc(3, 5) # 15
    myfunc(c(1,3), c(2,5)) # 2 15
    Vectorize(myfunc)(c(1,3), c(2,5)) # 2 15
    
    myfunc2 <- function(a, b) if (length(a) == 1) a * b else NA
    myfunc2(1, 2) # 2 
    myfunc2(3, 5) # 15
    myfunc2(c(1,3), c(2,5)) # NA
    Vectorize(myfunc2)(c(1, 3), c(2, 5)) # 2 15
    Vectorize(myfunc2)(c(1, 3, 6), c(2, 5)) # 2 15 12
                                            # parameter will be re-used

plyr and dplyr packages

Practical Data Science for Stats - a PeerJ Collection

The Split-Apply-Combine Strategy for Data Analysis (plyr package) in J. Stat Software.

A quick introduction to plyr with a summary of apply functions in R and compare them with functions in plyr package.

  1. plyr has a common syntax -- easier to remember
  2. plyr requires less code since it takes care of the input and output format
  3. plyr can easily be run in parallel -- faster

Tutorials

Examples of using dplyr:

tibble

Tibbles are data frames, but slightly tweaked to work better in the tidyverse.

Tibble objects

  • it does not have row names (cf data frame),
  • it never changes the type of the inputs (e.g. it never converts strings to factors!),
  • it never changes the names of variables

Tibbles Vignette

> data(pew, package = "efficient")
> dim(pew) 
[1] 18 10
> class(pew) # tibble is also a data frame!!
[1] "tbl_df"     "tbl"        "data.frame"

> tidyr::gather(pew, key=Income, value = Count, -religion) # make wide tables long
# A tibble: 162 x 3
                                                       religion Income Count
                                                          <chr>  <chr> <int>
 1                                                     Agnostic  <$10k    27
 2                                                      Atheist  <$10k    12
 ...
> mean(tidyr::gather(pew, key=Income, value = Count, -religion)[, 3])
[1] NA
Warning message:
In mean.default(tidyr::gather(pew, key = Income, value = Count,  :
  argument is not numeric or logical: returning NA
> mean(tidyr::gather(pew, key=Income, value = Count, -religion)[[3]])
[1] 181.6975

If we try to do a match on some column of a tibble object, we will get zero matches. The issue is we cannot use an index to get a tibble column.

Subsetting: to extract a column from a tibble object, use [[ or $ or dplyr::pull(). Select Data Frame Columns in R.

TibbleObject$VarName
# OR
TibbleObject[["VarName"]]
# OR
pull(TibbleObject, VarName) # won't be a tibble object anymore

dplyr::select(TibbleObject, -c(VarName1, VarName2)) # still a tibble object
# OR
dplyr::select(TibbleObject, 2:5) #

llply()

llply is equivalent to lapply except that it will preserve labels and can display a progress bar. This is handy if we want to do a crazy thing.

LLID2GOIDs <- lapply(rLLID, function(x) get("org.Hs.egGO")[[x]])

where rLLID is a list of entrez ID. For example,

get("org.Hs.egGO")[["6772"]]

returns a list of 49 GOs.

ddply()

http://lamages.blogspot.com/2012/06/transforming-subsets-of-data-in-r-with.html

ldply()

An R Script to Automatically download PubMed Citation Counts By Year of Publication

Using R's set.seed() to set seeds for use in C/C++ (including Rcpp)

http://rorynolan.rbind.io/2018/09/30/rcsetseed/

get_seed()

See the same blog

get_seed <- function() {
  sample.int(.Machine$integer.max, 1)
}

How to select a seed for simulation or randomization

How to select a seed for simulation or randomization

set.seed(), for loop and saving random seeds

http://r.789695.n4.nabble.com/set-seed-and-for-loop-td3585857.html. This question is legitimate when we want to debug on a certain iteration.

set.seed(1001) 
data <- vector("list", 30) 
seeds <- vector("list", 30) 
for(i in 1:30) { 
  seeds[[i]] <- .Random.seed 
  data[[i]] <- runif(5) 
} 
 
.Random.seed <- seeds[[23]]  # restore 
data.23 <- runif(5) 
data.23 
data[[23]]
  • Duncan Murdoch: This works in this example, but wouldn't work with all RNGs, because some of them save state outside of .Random.seed. See ?.Random.seed for details.
  • Uwe Ligges's comment: set.seed() actually generates a seed. See ?set.seed that points us to .Random.seed (and relevant references!) which contains the actual current seed.
  • Petr Savicky's comment is also useful in the situation when it is not difficult to re-generate the data.

sample() inaccurate on very large populations, fixed in R 3.6.0

  • The default method for generating from a discrete uniform distribution (used in ‘sample()’, for instance) has been changed. In prior versions, the probability of generating each integer could vary from equal by up to 0.04% (or possibly more if generating more than a million different integers). See also What's new in R 3.6.0 by David Smith.
    # R 3.5.3
    set.seed(123)
    m <- (2/5)*2^32
    m > 2^31
    # [1] FALSE
    log10(m)
    # [1] 9.23502
    x <- sample(m, 1000000, replace = TRUE)
    table(x %% 2)
    #      0      1 
    # 400070 599930
  • Fast sampling support in dqrng
  • Differences of the output of sample()
    # R 3.5.3
    # docker run --net=host -it --rm r-base:3.5.3
    > set.seed(1234)
    > sample(5)
    [1] 1 3 2 4 5
    
    # R 3.6.0
    # docker run --net=host -it --rm r-base:3.6.0
    > set.seed(1234)
    > sample(5)
    [1] 4 5 2 3 1
    > RNGkind(sample.kind = "Rounding")
    Warning message:
    In RNGkind(sample.kind = "Rounding") : non-uniform 'Rounding' sampler used
    > set.seed(1234)
    > sample(5)
    [1] 1 3 2 4 5

Regular Expression

See here.

Read rrd file

file, connection

  • cat() and scan() (read data into a vector or list from the console or file)
  • read() and write()
  • read.table() and write.table()
out = file('tmp.txt', 'w')
writeLines("abcd", out)
writeLines("eeeeee", out)
close(out)
readLines('tmp.txt')
unlink('tmp.txt')
args(writeLines)
# function (text, con = stdout(), sep = "\n", useBytes = FALSE)

Clipboard (?connections), textConnection(), pipe()

  • On Windows, we can use readClipboard() and writeClipboard().
    source("clipboard")
    read.table("clipboard")
  • reading/writing clipboard on macOS. Use textConnection() function:
    x <- read.delim(textConnection("<USE_KEYBOARD_TO_PASTE_FROM_CLIPBOARD>"))
    # Or on Mac
    x <- read.delim(pipe("pbpaste"))
    # safely ignore the warning: incomplete final line found by readTableHeader on 'pbpaste'
    An example is to copy data from this post. In this case we need to use read.table() instead of read.delim().
  • Write to clipboard on mac. Note: pbcopy and pbpaste are macOS terminal commands. See pbcopy & pbpaste: Manipulating the Clipboard from the Command Line.
    • pbcopy: takes standard input and places it in the clipboard buffer
    • pbpaste: takes data from the clipboard buffer and writes it to the standard output
clip <- pipe("pbcopy", "w")
write.table(apply(x, 1, mean), file = clip, row.names=F, col.names=F)
close(clip)
  • On Linux, we need to install "xclip". See R Copy from Clipboard in Ubuntu Linux. It seems to work.
    # sudo apt-get install xclip
    read.table(pipe("xclip -selection clipboard -o",open="r"))

read/manipulate binary data

  • x <- readBin(fn, raw(), file.info(fn)$size)
  • rawToChar(x[1:16])
  • See Biostrings C API

String Manipulation

HTTPs connection

HTTPS connection becomes default in R 3.2.2. See

R 3.3.2 patched The internal methods of ‘download.file()’ and ‘url()’ now report if they are unable to follow the redirection of a ‘http://’ URL to a ‘https://’ URL (rather than failing silently)

setInternet2

There was a bug in ftp downloading in R 3.2.2 (r69053) Windows though it is fixed now in R 3.2 patch.

Read the discussion reported on 8/8/2015. The error only happened on ftp not http connection. The final solution is explained in this post. The following demonstrated the original problem.

url <- paste0("ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/",
              "GCF_000001405.13.assembly.txt")
f1 <- tempfile()
download.file(url, f1)

It seems the bug was fixed in R 3.2-branch. See 8/16/2015 patch r69089 where a new argument INTERNET_FLAG_PASSIVE was added to InternetOpenUrl() function of wininet library. This article and this post explain differences of active and passive FTP.

The following R command will show the exact svn revision for the R you are currently using.

R.Version()$"svn rev"

If setInternet2(T), then https protocol is supported in download.file().

When setInternet(T) is enabled by default, download.file() does not work for ftp protocol (this is used in getGEO() function of the GEOquery package). If I use setInternet(F), download.file() works again for ftp protocol.

The setInternet2() function is defined in R> src> library> utils > R > windows > sysutils.R.

R up to 3.2.2

setInternet2 <- function(use = TRUE) .Internal(useInternet2(use))

See also

  • <src/include/Internal.h> (declare do_setInternet2()),
  • <src/main/names.c> (show do_setInternet2() in C)
  • <src/main/internet.c> (define do_setInternet2() in C).

Note that: setInternet2(T) becomes default in R 3.2.2. To revert to the previous default use setInternet2(FALSE). See the <doc/NEWS.pdf> file. If we use setInternet2(F), then it solves the bug of getGEO() error. But it disables the https file download using the download.file() function. In R < 3.2.2, it is also possible to download from https by setIneternet2(T).

R 3.3.0

setInternet2 <- function(use = TRUE) {
    if(!is.na(use)) stop("use != NA is defunct")
    NA
}

Note that setInternet2.Rd says As from \R 3.3.0 it changes nothing, and only \code{use = NA} is accepted. Also NEWS.Rd says setInternet2() has no effect and will be removed in due course.

File operation

  • list.files()
  • file.info()
  • dir.create()
  • file.create()
  • file.copy()

read/download/source a file from internet

Simple text file http

retail <- read.csv("http://robjhyndman.com/data/ausretail.csv",header=FALSE)

Zip file and url() function

con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)

Here url() function is like file(), gzfile(), bzfile(), xzfile(), unz(), pipe(), fifo(), socketConnection(). They are used to create connections. By default, the connection is not opened (except for ‘socketConnection’), but may be opened by setting a non-empty value of argument ‘open’. See ?url.

Another example of using url() is

load(url("http:/www.example.com/example.RData"))

downloader package

This package provides a wrapper for the download.file function, making it possible to download files over https on Windows, Mac OS X, and other Unix-like platforms. The RCurl package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.

Google drive file based on https using RCurl package

require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))

Google sheet file using googlesheets package

Reading data from google sheets into R

Github files https using RCurl package

x = getURL("https://gist.github.com/arraytools/6671098/raw/c4cb0ca6fe78054da8dbe253a05f7046270d5693/GeneIDs.txt", 
            ssl.verifypeer = FALSE)
read.table(text=x)

summarytools: create summary tables for vectors and data frames

https://github.com/dcomtois/summarytools. R Package for quickly and neatly summarizing vectors and data frames.

Create publication tables using tables package

See p13 for example at here

R's tables packages is the best solution. For example,

> library(tables)
> tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
+          (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )
                                                  
                Sepal.Length      Sepal.Width     
 Species    n   mean         sd   mean        sd  
 setosa      50 5.01         0.35 3.43        0.38
 versicolor  50 5.94         0.52 2.77        0.31
 virginica   50 6.59         0.64 2.97        0.32
 All        150 5.84         0.83 3.06        0.44
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
</pre>
and
<pre>
# This example shows some of the less common options         
> Sex <- factor(sample(c("Male", "Female"), 100, rep=TRUE))
> Status <- factor(sample(c("low", "medium", "high"), 100, rep=TRUE))
> z <- rnorm(100)+5
> fmt <- function(x) {
  s <- format(x, digits=2)
  even <- ((1:length(s)) %% 2) == 0
  s[even] <- sprintf("(%s)", s[even])
  s
}
> tabular( Justify(c)*Heading()*z*Sex*Heading(Statistic)*Format(fmt())*(mean+sd) ~ Status )
                  Status              
 Sex    Statistic high   low    medium
 Female mean       4.88   4.96   5.17 
        sd        (1.20) (0.82) (1.35)
 Male   mean       4.45   4.31   5.05 
        sd        (1.01) (0.93) (0.75)

ClinReport: Statistical Reporting in Clinical Trials

https://cran.r-project.org/web/packages/ClinReport/index.html

Append figures to PDF files

How to append a plot to an existing pdf file. Hint: use the recordPlot() function.

Extracting tables from PDFs

  • extracting Tables from PDFs in R using Tabulizer. This needs the rJava package. Linux works fine. Some issue came out on my macOS 10.12 Sierra. Library not loaded: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/libjvm.dylib. Referenced from: /Users/XXXXXXX/Library/R/3.5/library/rJava/libs/rJava.so.
  • pdftools - Text Extraction, Rendering and Converting of PDF Documents. pdf_text() and pdf_data() functions.
    library(pdftools)
    pdf_file <- "https://github.com/ropensci/tabulizer/raw/master/inst/examples/data.pdf"
    txt <- pdf_text(pdf_file) # length = number of pages
    # Suppose the table we are interested in is on page 1
    cat(txt[1]) # Good but not in a data frame format
    
    pdf_data(pdf_file)[[1]]  # data frame/tibble format
    However, it seems it does not work on Table S6. Tabulizer package is better at this case.
  • How To Convert PDF To Text On Linux (GUI And Command Line). It works when I tested my PDF file.
    sudo apt install poppler-utils
    pdftotext -layout input.pdf output.txt
    pdftotext -layout -f 3 -l 4 input.pdf output.txt # from page 3 to 4.

Create flat tables in R console using ftable()

> ftable(Titanic, row.vars = 1:3)
                   Survived  No Yes
Class Sex    Age                   
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20
> ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
             Survived  No Yes
Class Sex                    
1st   Male            118  62
      Female            4 141
2nd   Male            154  25
      Female           13  93
3rd   Male            422  88
      Female          106  90
Crew  Male            670 192
      Female            3  20
> ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
             Survived  No Yes
Sex    Class                 
Male   1st            118  62
       2nd            154  25
       3rd            422  88
       Crew           670 192
Female 1st              4 141
       2nd             13  93
       3rd            106  90
       Crew             3  20
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
> x
          gear  3  4  5
cyl vs am              
4   0  0        0  0  0
       1        0  0  1
    1  0        1  2  0
       1        0  6  1
6   0  0        0  0  0
       1        0  2  1
    1  0        2  2  0
       1        0  0  0
8   0  0       12  0  0
       1        0  0  2
    1  0        0  0  0
       1        0  0  0
> ftable(x, row.vars = c(2, 4))
        cyl  4     6     8   
        am   0  1  0  1  0  1
vs gear                      
0  3         0  0  0  0 12  0
   4         0  0  0  2  0  0
   5         0  1  0  1  0  2
1  3         1  0  2  0  0  0
   4         2  6  2  0  0  0
   5         0  1  0  0  0  0
> 
> ## Start with expressions, use table()'s "dnn" to change labels
> ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
         dnn = c("Cylinders", "V/S", "Transmission", "Gears"))

          Cylinders     4     6     8   
          Transmission  0  1  0  1  0  1
V/S Gears                               
0   3                   0  0  0  0 12  0
    4                   0  0  0  2  0  0
    5                   0  1  0  1  0  2
1   3                   1  0  2  0  0  0
    4                   2  6  2  0  0  0
    5                   0  1  0  0  0  0

addmargins

Puts Arbitrary Margins On Multidimensional Tables Or Arrays

tracemem, data type, copy

How to avoid copying a long vector

Tell if the current R is running in 32-bit or 64-bit mode

8 * .Machine$sizeof.pointer

where sizeof.pointer returns the number of *bytes* in a C SEXP type and '8' means number of bits per byte.

32- and 64-bit

See R-admin.html.

  • For speed you may want to use a 32-bit build, but to handle large datasets a 64-bit build.
  • Even on 64-bit builds of R there are limits on the size of R objects, some of which stem from the use of 32-bit integers (especially in FORTRAN code). For example, the dimensionas of an array are limited to 2^31 -1.
  • Since R 2.15.0, it is possible to select '64-bit Files' from the standard installer even on a 32-bit version of Windows (2012/3/30).

Handling length 2^31 and more in R 3.0.0

From R News for 3.0.0 release:

There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.

In R 2.15.2, if I try to assign a vector of length 2^31, I will get an error

> x <- seq(1, 2^31)
Error in from:to : result would be too long a vector

However, for R 3.0.0 (tested on my 64-bit Ubuntu with 16GB RAM. The R was compiled by myself):

> system.time(x <- seq(1,2^31))
   user  system elapsed
  8.604  11.060 120.815
> length(x)
[1] 2147483648
> length(x)/2^20
[1] 2048
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     183823     9.9     407500    21.8     350000    18.7
Vcells 2147764406 16386.2 2368247221 18068.3 2148247383 16389.9
>

Note:

  1. 2^31 length is about 2 Giga length. It takes about 16 GB (2^31*8/2^20 MB) memory.
  2. On Windows, it is almost impossible to work with 2^31 length of data if the memory is less than 16 GB because virtual disk on Windows does not work well. For example, when I tested on my 12 GB Windows 7, the whole Windows system freezes for several minutes before I force to power off the machine.
  3. My slide in http://goo.gl/g7sGX shows the screenshots of running the above command on my Ubuntu and RHEL machines. As you can see the linux is pretty good at handling large (> system RAM) data. That said, as long as your linux system is 64-bit, you can possibly work on large data without too much pain.
  4. For large dataset, it makes sense to use database or specially crafted packages like bigmemory or ff or bigstatsr.

NA in index

  • Question: what is seq(1, 3)[c(1, 2, NA)]?

Answer: It will reserve the element with NA in indexing and return the value NA for it.

  • Question: What is TRUE & NA?

Answer: NA

  • Question: What is FALSE & NA?

Answer: FALSE

  • Question: c("A", "B", NA) != "" ?

Answer: TRUE TRUE NA

  • Question: which(c("A", "B", NA) != "") ?

Answer: 1 2

  • Question: c(1, 2, NA) != "" & !is.na(c(1, 2, NA)) ?

Answer: TRUE TRUE FALSE

  • Question: c("A", "B", NA) != "" & !is.na(c("A", "B", NA)) ?

Answer: TRUE TRUE FALSE

Conclusion: In order to exclude empty or NA for numerical or character data type, we can use which() or a convenience function keep.complete(x) <- function(x) x != "" & !is.na(x). This will guarantee return logical values and not contain NAs.

Don't just use x != "" OR !is.na(x).

Constant and 'L'

Add 'L' after a constant. For example,

for(i in 1L:n) { }

if (max.lines > 0L) { }

label <- paste0(n-i+1L, ": ")

n <- length(x);  if(n == 0L) { }

Factor

We can specify the factor levels and new labels using the factor() function.

sex <- factor(sex, levels = c("0", "1"), labels = c("Male", "Female"))
drug_treatment <- factor(drug_treatment, levels = c("Placebo", "Low dose", "High dose"))
health_status <- factor(health_status, levels = c("Healthy", "Alzheimer's"))

Data frame

stringsAsFactors = FALSE

http://www.win-vector.com/blog/2018/03/r-tip-use-stringsasfactors-false/

Convert data frame factor columns to characters

Convert data.frame columns from factors to characters

# Method 1:
bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

# Method 2:
bob[] <- lapply(bob, as.character)

data.frame to vector

> a= matrix(1:6, 2,3)
> rownames(a) <- c("a", "b")
> colnames(a) <- c("x", "y", "z")
> a
  x y z
a 1 3 5
b 2 4 6
> unlist(data.frame(a))
x1 x2 y1 y2 z1 z2 
 1  2  3  4  5  6

merge

How to perform merges (joins) on two or more data frames with base R, tidyverse and data.table

matrix vs data.frame

ip1 <- installed.packages()[,c(1,3:4)] # class(ip1) = 'matrix'
unique(ip1$Priority)
# Error in ip1$Priority : $ operator is invalid for atomic vectors
unique(ip1[, "Priority"])   # OK

ip2 <- as.data.frame(installed.packages()[,c(1,3:4)], stringsAsFactors = FALSE) # matrix -> data.frame
unique(ip2$Priority)     # OK

The length of a matrix and a data frame is different.

> length(matrix(1:6, 3, 2))
[1] 6
> length(data.frame(matrix(1:6, 3, 2)))
[1] 2
> x[1]
  X1
1  1
2  2
3  3
4  4
5  5
6  6
> x[[1]]
[1] 1 2 3 4 5 6

So the length of a data frame is the number of columns. When we use sapply() function on a data frame, it will apply to each column of the data frame.

Convert a matrix (not data frame) of characters to numeric

Just change the mode of the object

tmp <- cbind(a=c("0.12", "0.34"), b =c("0.567", "0.890")); tmp
     a     b
1 0.12 0.567
2 0.34 0.890
> is.data.frame(tmp) # FALSE
> is.matrix(tmp)     # TRUE
> sum(tmp)
Error in sum(tmp) : invalid 'type' (character) of argument
> mode(tmp)  # "character"

> mode(tmp) <- "numeric"
> sum(tmp)
[1] 1.917

Select Data Frame Columns in R

This is part of series of DATA MANIPULATION IN R from datanovia.com

  • pull(): Extract column values as a vector. The column of interest can be specified either by name or by index.
  • select(): Extract one or multiple columns as a data table. It can be also used to remove columns from the data frame.
  • select_if(): Select columns based on a particular condition. One can use this function to, for example, select columns if they are numeric.
  • Helper functions - starts_with(), ends_with(), contains(), matches(), one_of(): Select columns/variables based on their names

Creating data frame using structure() function

Creating data frame using structure() function in R

Warning: row names were found from a short variable and have been discarded

https://stackoverflow.com/a/23534617

My example:

str(trainData)
# 'data.frame':	503 obs. of  500 variables:
#  $ bm001: num  0.429 1 -0.5 1.415 -1.899 ...
#  $ bm002: num  0.0568 1 0.5 0.3556 -1.16 ...
# ...
trainData[1:3, 1:3]
#        bm001      bm002    bm003
# 1  0.4289449 0.05676296 1.657966
# 2  1.0000000 1.00000000 1.000000
# 3 -0.5000000 0.50000000 0.500000
o <- data.frame(time = trainData[1, ], status = trainData[2, ], treat = trainData[3, ], t(TData))
# Warning message:
# In data.frame(time = trainData[1, ], status = trainData[2, ], treat = trainData[3,  :
#   row names were found from a short variable and have been discarded

matrix (column-major order) multiply a vector

> matrix(1:6, 3,2)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> matrix(1:6, 3,2) * c(1,2,3)
     [,1] [,2]
[1,]    1    4
[2,]    4   10
[3,]    9   18
> matrix(1:6, 3,2) * c(1,2,3,4)
     [,1] [,2]
[1,]    1   16
[2,]    4    5
[3,]    9   12

Print a vector by suppressing names

Use unname.

format.pval

> args(format.pval)
function (pv, digits = max(1L, getOption("digits") - 2L), eps = .Machine$double.eps, 
    na.form = "NA", ...) 

> format.pval(c(stats::runif(5), pi^-100, NA))
[1] "0.19571" "0.46793" "0.71696" "0.93200" "0.74485" "< 2e-16" "NA"     
> format.pval(c(0.1, 0.0001, 1e-27))
[1] "1e-01"  "1e-04"  "<2e-16"

options(digits)

In R,

> options()$digits # Default
[1] 7
> 100000.07 + .04
[1] 100000.1
> options(digits = 16)
> 100000.07 + .04
[1] 100000.11

In Python,

>>> 100000.07 + .04
100000.11

Disable scientific notation in printing: options(scipen)

> numer = 29707; denom = 93874
> c(numer/denom, numer, denom) 
[1] 3.164561e-01 2.970700e+04 9.387400e+04

# Method 1. Without changing the global option
> format(c(numer/denom, numer, denom), scientific=FALSE)
[1] "    0.3164561" "29707.0000000" "93874.0000000"

# Method 2. Change the global option
> options(scipen=999)
> numer/denom
[1] 0.3164561
> c(numer/denom, numer, denom)
[1]     0.3164561 29707.0000000 93874.0000000
> c(4/5, numer, denom)
[1]     0.8 29707.0 93874.0

sprintf

Format number as fixed width, with leading zeros

# sprintf()
a <- seq(1,101,25)
sprintf("name_%03d", a)
[1] "name_001" "name_026" "name_051" "name_076" "name_101"

# formatC()
paste("name", formatC(a, width=3, flag="0"), sep="_")
[1] "name_001" "name_026" "name_051" "name_076" "name_101"

sprintf does not print

Use cat() or print() outside sprintf(). sprintf() do not print in a non interactive mode.

cat(sprintf('%5.2f\t%i\n',1.234, l234))

Creating publication quality graphs in R

HDF5 : Hierarchical Data Format

HDF5 is an open binary file format for storing and managing large, complex datasets. The file format was developed by the HDF Group, and is widely used in scientific computing.

> h5ls(destination_file)
   group                           name       otype  dclass           dim
0      /                           data   H5I_GROUP                      
1  /data                     expression H5I_DATASET INTEGER 35238 x 65429
2      /                           info   H5I_GROUP                      
3  /info                         author H5I_DATASET  STRING             1
4  /info                        contact H5I_DATASET  STRING             1
5  /info                  creation-date H5I_DATASET  STRING             1
6  /info                            lab H5I_DATASET  STRING             1
7  /info                        version H5I_DATASET  STRING             1
8      /                           meta   H5I_GROUP                      
9  /meta           Sample_channel_count H5I_DATASET  STRING         65429
10 /meta     Sample_characteristics_ch1 H5I_DATASET  STRING         65429
11 /meta         Sample_contact_address H5I_DATASET  STRING         65429
12 /meta            Sample_contact_city H5I_DATASET  STRING         65429
13 /meta         Sample_contact_country H5I_DATASET  STRING         65429
14 /meta      Sample_contact_department H5I_DATASET  STRING         65429
15 /meta           Sample_contact_email H5I_DATASET  STRING         65429
16 /meta       Sample_contact_institute H5I_DATASET  STRING         65429
17 /meta      Sample_contact_laboratory H5I_DATASET  STRING         65429
18 /meta            Sample_contact_name H5I_DATASET  STRING         65429
19 /meta           Sample_contact_phone H5I_DATASET  STRING         65429
20 /meta Sample_contact_zip-postal_code H5I_DATASET  STRING         65429
21 /meta         Sample_data_processing H5I_DATASET  STRING         65429
22 /meta          Sample_data_row_count H5I_DATASET  STRING         65429
23 /meta             Sample_description H5I_DATASET  STRING         65429
24 /meta    Sample_extract_protocol_ch1 H5I_DATASET  STRING         65429
25 /meta           Sample_geo_accession H5I_DATASET  STRING         65429
26 /meta        Sample_instrument_model H5I_DATASET  STRING         65429
27 /meta        Sample_last_update_date H5I_DATASET  STRING         65429
28 /meta       Sample_library_selection H5I_DATASET  STRING         65429
29 /meta          Sample_library_source H5I_DATASET  STRING         65429
30 /meta        Sample_library_strategy H5I_DATASET  STRING         65429
31 /meta            Sample_molecule_ch1 H5I_DATASET  STRING         65429
32 /meta            Sample_organism_ch1 H5I_DATASET  STRING         65429
33 /meta             Sample_platform_id H5I_DATASET  STRING         65429
34 /meta                Sample_relation H5I_DATASET  STRING         65429
35 /meta               Sample_series_id H5I_DATASET  STRING         65429
36 /meta         Sample_source_name_ch1 H5I_DATASET  STRING         65429
37 /meta                  Sample_status H5I_DATASET  STRING         65429
38 /meta         Sample_submission_date H5I_DATASET  STRING         65429
39 /meta    Sample_supplementary_file_1 H5I_DATASET  STRING         65429
40 /meta    Sample_supplementary_file_2 H5I_DATASET  STRING         65429
41 /meta               Sample_taxid_ch1 H5I_DATASET  STRING         65429
42 /meta                   Sample_title H5I_DATASET  STRING         65429
43 /meta                    Sample_type H5I_DATASET  STRING         65429
44 /meta                          genes H5I_DATASET  STRING         35238

Formats for writing/saving and sharing data

Efficiently Saving and Sharing Data in R

Write unix format files on Windows and vice versa

https://stat.ethz.ch/pipermail/r-devel/2012-April/063931.html

with() and within() functions

within() is similar to with() except it is used to create new columns and merge them with the original data sets. See youtube video.

closePr <- with(mariokart, totalPr - shipPr)
head(closePr, 20)

mk <- within(mariokart, {
             closePr <- totalPr - shipPr
     })
head(mk) # new column closePr

mk <- mariokart
aggregate(. ~ wheels + cond, mk, mean)
# create mean according to each level of (wheels, cond)

aggregate(totalPr ~ wheels + cond, mk, mean)

tapply(mk$totalPr, mk[, c("wheels", "cond")], mean)

stem(): stem-and-leaf plot, bar chart on terminals

Graphical Parameters, Axes and Text, Combining Plots

statmethods.net

15 Questions All R Users Have About Plots

See http://blog.datacamp.com/15-questions-about-r-plots/. This is a tremendous post. It covers the built-in plot() function and ggplot() from ggplot2 package.

  1. How To Draw An Empty R Plot? plot.new()
  2. How To Set The Axis Labels And Title Of The R Plots?
  3. How To Add And Change The Spacing Of The Tick Marks Of Your R Plot? axis()
  4. How To Create Two Different X- or Y-axes? par(new=TRUE), axis(), mtext()
  5. How To Add Or Change The R Plot’s Legend? legend()
  6. How To Draw A Grid In Your R Plot? grid()
  7. How To Draw A Plot With A PNG As Background? rasterImage() from the png package
  8. How To Adjust The Size Of Points In An R Plot? cex argument
  9. How To Fit A Smooth Curve To Your R Data? loess() and lines()
  10. How To Add Error Bars In An R Plot? arrows()
  11. How To Save A Plot As An Image On Disc
  12. How To Plot Two R Plots Next To Each Other? par(mfrow), gridBase package, lattice package
  13. How To Plot Multiple Lines Or Points? plot(), lines()
  14. How To Fix The Aspect Ratio For Your R Plots? asp parameter
  15. What Is The Function Of hjust And vjust In ggplot2?

jitter function

Scatterplot with the "rug" function

require(stats)  # both 'density' and its default method
with(faithful, {
    plot(density(eruptions, bw = 0.15))
    rug(eruptions)
    rug(jitter(eruptions, amount = 0.01), side = 3, col = "light blue")
})

RugFunction.png

See also the stripchart() function which produces one dimensional scatter plots (or dot plots) of the given data.

Identify/Locate Points in a Scatter Plot

?identify

Draw a single plot with two different y-axes

Draw Color Palette

SVG

Embed svg in html

svglite

https://blog.rstudio.org/2016/11/14/svglite-1-2-0/

pdf -> svg

Using Inkscape. See this post.

read.table

clipboard

source("clipboard")
read.table("clipboard")

inline text

mydf <- read.table(header=T, text='
 cond yval
    A 2
    B 2.5
    C 1.6
')

http(s) connection

temp = getURL("https://gist.github.com/arraytools/6743826/raw/23c8b0bc4b8f0d1bfe1c2fad985ca2e091aeb916/ip.txt", 
                           ssl.verifypeer = FALSE)
ip <- read.table(textConnection(temp), as.is=TRUE)

read only specific columns

Use 'colClasses' option in read.table, read.delim, .... For example, the following example reads only the 3rd column of the text file and also changes its data type from a data frame to a vector. Note that we have include double quotes around NULL.

x <- read.table("var_annot.vcf", colClasses = c(rep("NULL", 2), "character", rep("NULL", 7)), 
                skip=62, header=T, stringsAsFactors = FALSE)[, 1]
# 
system.time(x <- read.delim("Methylation450k.txt", 
                colClasses = c("character", "numeric", rep("NULL", 188)), stringsAsFactors = FALSE))

To know the number of columns, we might want to read the first row first.

library(magrittr)
scan("var_annot.vcf", sep="\t", what="character", skip=62, nlines=1, quiet=TRUE) %>% length()

Another method is to use pipe(), cut or awk. See ways to read only selected columns from a file into R

Serialization

If we want to pass an R object to C (use recv() function), we can use writeBin() to output the stream size and then use serialize() function to output the stream to a file. See the post on R mailing list.

> a <- list(1,2,3)
> a_serial <- serialize(a, NULL)
> a_length <- length(a_serial)
> a_length
[1] 70
> writeBin(as.integer(a_length), connection, endian="big")
> serialize(a, connection)

In C++ process, I receive one int variable first to get the length, and then read <length> bytes from the connection.

socketConnection

See ?socketconnection.

Simple example

from the socketConnection's manual.

Open one R session

con1 <- socketConnection(port = 22131, server = TRUE) # wait until a connection from some client
writeLines(LETTERS, con1)
close(con1)

Open another R session (client)

con2 <- socketConnection(Sys.info()["nodename"], port = 22131)
# as non-blocking, may need to loop for input
readLines(con2)
while(isIncomplete(con2)) {
   Sys.sleep(1)
   z <- readLines(con2)
   if(length(z)) print(z)
}
close(con2)

Use nc in client

The client does not have to be the R. We can use telnet, nc, etc. See the post here. For example, on the client machine, we can issue

nc localhost 22131   [ENTER]

Then the client will wait and show anything written from the server machine. The connection from nc will be terminated once close(con1) is given.

If I use the command

nc -v -w 2 localhost -z 22130-22135

then the connection will be established for a short time which means the cursor on the server machine will be returned. If we issue the above nc command again on the client machine it will show the connection to the port 22131 is refused. PS. "-w" switch denotes the number of seconds of the timeout for connects and final net reads.

Some post I don't have a chance to read. http://digitheadslabnotebook.blogspot.com/2010/09/how-to-send-http-put-request-from-r.html

Use curl command in client

On the server,

con1 <- socketConnection(port = 8080, server = TRUE)

On the client,

curl --trace-ascii debugdump.txt http://localhost:8080/

Then go to the server,

while(nchar(x <- readLines(con1, 1)) > 0) cat(x, "\n")

close(con1) # return cursor in the client machine

Use telnet command in client

On the server,

con1 <- socketConnection(port = 8080, server = TRUE)

On the client,

sudo apt-get install telnet
telnet localhost 8080
abcdefg
hijklmn
qestst

Go to the server,

readLines(con1, 1)
readLines(con1, 1)
readLines(con1, 1)
close(con1) # return cursor in the client machine

Some tutorial about using telnet on http request. And this is a summary of using telnet.

Subsetting

Subset assignment of R Language Definition and Manipulation of functions.

The result of the command x[3:5] <- 13:15 is as if the following had been executed

`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)

Avoid Coercing Indices To Doubles

1 or 1L

as.formula()

? as.formula
xnam <- paste("x", 1:25, sep="")
fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+")))
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")

# Method 1. The 'Call' portion of the model is reported as “formula = f” 
# our modeling effort, 
# fully parameterized!
f <- as.formula(
  paste(outcome, 
        paste(variables, collapse = " + "), 
        sep = " ~ "))
print(f)
# mpg ~ cyl + disp + hp + carb

model <- lm(f, data = mtcars)
print(model)

# Call:
#   lm(formula = f, data = mtcars)
# 
# Coefficients:
#   (Intercept)          cyl         disp           hp         carb  
#     34.021595    -1.048523    -0.026906     0.009349    -0.926863  

# Method 2. eval() + bquote() + ".()"
format(terms(model))  #  or model$terms
# [1] "mpg ~ cyl + disp + hp + carb"

# The new line of code
model <- eval(bquote(   lm(.(f), data = mtcars)   ))

print(model)
# Call:
#   lm(formula = mpg ~ cyl + disp + hp + carb, data = mtcars)
# 
# Coefficients:
#   (Intercept)          cyl         disp           hp         carb  
#     34.021595    -1.048523    -0.026906     0.009349    -0.926863  

# Note if we skip ".()" operator
> eval(bquote(   lm(f, data = mtcars)   ))

Call:
lm(formula = f, data = mtcars)

Coefficients:
(Intercept)          cyl         disp           hp         carb  
  34.021595    -1.048523    -0.026906     0.009349    -0.926863

S3 and S4 methods

To get the source code of S4 methods, we can use showMethod(), getMethod() and showMethod(). For example

library(qrqc)
showMethods("gcPlot")
getMethod("gcPlot", "FASTQSummary") # get an error
showMethods("gcPlot", "FASTQSummary") # good.
  • Debug a S4 function
> library(genefilter) # Bioconductor
> showMethods("nsFilter")
Function: nsFilter (package genefilter)
eset="ExpressionSet"
> debug(nsFilter, signature="ExpressionSet")
library(IRanges)
ir <- IRanges(start=c(10, 20, 30), width=5)
ir

class(ir)
## [1] "IRanges"
## attr(,"package")
## [1] "IRanges"

getClassDef(class(ir))
## Class "IRanges" [package "IRanges"]
## 
## Slots:
##                                                                       
## Name:            start           width           NAMES     elementType
## Class:         integer         integer characterORNULL       character
##                                       
## Name:  elementMetadata        metadata
## Class: DataTableORNULL            list
## 
## Extends: 
## Class "Ranges", directly
## Class "IntegerList", by class "Ranges", distance 2
## Class "RangesORmissing", by class "Ranges", distance 2
## Class "AtomicList", by class "Ranges", distance 3
## Class "List", by class "Ranges", distance 4
## Class "Vector", by class "Ranges", distance 5
## Class "Annotated", by class "Ranges", distance 6
## 
## Known Subclasses: "NormalIRanges"

See what methods work on an object

see what methods work on an object, e.g. a GRanges object:

methods(class="GRanges")

Or if you have an object, x:

methods(class=class(x))

View S3 function definition: double colon '::' and triple colon ':::' operators

?":::"

  • pkg::name returns the value of the exported variable name in namespace pkg
  • pkg:::name returns the value of the internal variable name
base::"+"
stats:::coef.default

Read the source code (include Fortran/C, S3 and S4 methods)

mcols() and DataFrame() from Bioc S4Vectors package

  • mcols: Get or set the metadata columns.
  • colData: SummarizedExperiment instances from GenomicRanges
  • DataFrame: The DataFrame class extends the DataTable virtual class and supports the storage of any type of object (with length and [ methods) as columns.

For example, in Shrinkage of logarithmic fold changes vignette of the DESeq2paper package

> mcols(ddsNoPrior[genes, ])
DataFrame with 2 rows and 21 columns
   baseMean   baseVar   allZero dispGeneEst    dispFit dispersion  dispIter dispOutlier   dispMAP
  <numeric> <numeric> <logical>   <numeric>  <numeric>  <numeric> <numeric>   <logical> <numeric>
1  163.5750  8904.607     FALSE  0.06263141 0.03862798  0.0577712         7       FALSE 0.0577712
2  175.3883 59643.515     FALSE  2.25306109 0.03807917  2.2530611        12        TRUE 1.6011440
  Intercept strain_DBA.2J_vs_C57BL.6J SE_Intercept SE_strain_DBA.2J_vs_C57BL.6J WaldStatistic_Intercept
  <numeric>                 <numeric>    <numeric>                    <numeric>               <numeric>
1  6.210188                  1.735829    0.1229354                    0.1636645               50.515872
2  6.234880                  1.823173    0.6870629                    0.9481865                9.074686
  WaldStatistic_strain_DBA.2J_vs_C57BL.6J WaldPvalue_Intercept WaldPvalue_strain_DBA.2J_vs_C57BL.6J
                                <numeric>            <numeric>                            <numeric>
1                                10.60602         0.000000e+00                         2.793908e-26
2                                 1.92280         1.140054e-19                         5.450522e-02
   betaConv  betaIter  deviance  maxCooks
  <logical> <numeric> <numeric> <numeric>
1      TRUE         3  210.4045 0.2648753
2      TRUE         9  243.7455 0.3248949

findInterval()

Related functions are cuts() and split(). See also

order(), rank() and sort()

If we want to find the indices of the first 25 genes with the smallest p-values, we can use order(pval)[1:25].

> x = sample(10)
> x
 [1]  4  3 10  7  5  8  6  1  9  2
> order(x)
 [1]  8 10  2  1  5  7  4  6  9  3
> rank(x)
 [1]  4  3 10  7  5  8  6  1  9  2
> rank(10*x)
 [1]  4  3 10  7  5  8  6  1  9  2

> x[order(x)]
 [1]  1  2  3  4  5  6  7  8  9 10
> sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10

do.call, rbind, lapply

Lots of examples. See for example this one for creating a data frame from a vector.

x <- readLines(textConnection("---CLUSTER 1 ---
 3
 4
 5
 6
 ---CLUSTER 2 ---
 9
 10
 8
 11"))

 # create a list of where the 'clusters' are
 clust <- c(grep("CLUSTER", x), length(x) + 1L)

 # get size of each cluster
 clustSize <- diff(clust) - 1L

 # get cluster number
 clustNum <- gsub("[^0-9]+", "", x[grep("CLUSTER", x)])

 result <- do.call(rbind, lapply(seq(length(clustNum)), function(.cl){
     cbind(Object = x[seq(clust[.cl] + 1L, length = clustSize[.cl])]
         , Cluster = .cl
         )
     }))

 result

     Object Cluster
[1,] "3"    "1"
[2,] "4"    "1"
[3,] "5"    "1"
[4,] "6"    "1"
[5,] "9"    "2"
[6,] "10"   "2"
[7,] "8"    "2"
[8,] "11"   "2"

A 2nd example is to sort a data frame by using do.call(order, list()).

Another example is to reproduce aggregate(). aggregate() = do.call() + by().

attach(mtcars)
do.call(rbind, by(mtcars, list(cyl, vs), colMeans))
# the above approach give the same result as the following
# except it does not have an extra Group.x columns
aggregate(mtcars, list(cyl, vs), FUN=mean)

How to get examples from help file

See this post. Method 1:

example(acf, give.lines=TRUE)

Method 2:

Rd <- utils:::.getHelpFile(?acf)
tools::Rd2ex(Rd)

"[" and "[[" with the sapply() function

Suppose we want to extract string from the id like "ABC-123-XYZ" before the first hyphen.

sapply(strsplit("ABC-123-XYZ", "-"), "[", 1)

is the same as

sapply(strsplit("ABC-123-XYZ", "-"), function(x) x[1])

Dealing with date

d1 = date()
class(d1) # "character"
d2 = Sys.Date()
class(d2) # "Date"

format(d2, "%a %b %d")

library(lubridate); ymd("20140108") # "2014-01-08 UTC"
mdy("08/04/2013") # "2013-08-04 UTC"
dmy("03-04-2013") # "2013-04-03 UTC"
ymd_hms("2011-08-03 10:15:03") # "2011-08-03 10:15:03 UTC"
ymd_hms("2011-08-03 10:15:03", tz="Pacific/Auckland") 
# "2011-08-03 10:15:03 NZST"
?Sys.timezone
x = dmy(c("1jan2013", "2jan2013", "31mar2013", "30jul2013"))
wday(x[1]) # 3
wday(x[1], label=TRUE) # Tues

Nonstandard evaluation and deparse/substitute

  • Vignette from the lazyeval package. It is needed in three cases
    • Labelling: turn an argument into a label
    • Formulas
    • Dot-dot-dot
  • substitute(expr, env) - capture expression.
    • substitute() is often paired with deparse() to create informative labels for data sets and plots.
    • Use 'substitute' to include the variable's name in a plot title, e.g.: var <- "abc"; hist(var,main=substitute(paste("Dist of ", var))) will show the title "Dist of var" instead of "Dist of abc" in the title.
  • quote(expr) - similar to substitute() but do nothing?? noquote - print character strings without quotes
  • eval(expr, envir), evalq(expr, envir) - eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this.
  • deparse(expr) - turns unevaluated expressions into character strings. For example,
> deparse(args(lm))
[1] "function (formula, data, subset, weights, na.action, method = \"qr\", " 
[2] "    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, "
[3] "    contrasts = NULL, offset, ...) "                                    
[4] "NULL"     

> deparse(args(lm), width=20)
[1] "function (formula, data, "        "    subset, weights, "           
[3] "    na.action, method = \"qr\", " "    model = TRUE, x = FALSE, "   
[5] "    y = FALSE, qr = TRUE, "       "    singular.ok = TRUE, "        
[7] "    contrasts = NULL, "           "    offset, ...) "               
[9] "NULL"

Following is another example. Assume we have a bunch of functions (f1, f2, ...; each function implements a different algorithm) with same input arguments format (eg a1, a2). We like to run these function on the same data (to compare their performance).

f1 <- function(x) x+1; f2 <- function(x) x+2; f3 <- function(x) x+3

f1(1:3)
f2(1:3)
f3(1:3)

# Or
myfun <- function(f, a) {
    eval(parse(text = f))(a)
}
myfun("f1", 1:3)
myfun("f2", 1:3)
myfun("f3", 1:3)

# Or with lapply
method <- c("f1", "f2", "f3")
res <- lapply(method, function(M) {
                    Mres <- eval(parse(text = M))(1:3)
                    return(Mres)
})
names(res) <- method

The ‘…’ argument

See Section 10.4 of An Introduction to R. Especially, the expression list(...) evaluates all such arguments and returns them in a named list

Lazy evaluation in R functions arguments

R function arguments are lazy — they’re only evaluated if they’re actually used.

  • Example 1. By default, R function arguments are lazy.
f <- function(x) {
  999
}
f(stop("This is an error!"))
#> [1] 999
  • Example 2. If you want to ensure that an argument is evaluated you can use force().
add <- function(x) {
  force(x)
  function(y) x + y
}
adders2 <- lapply(1:10, add)
adders2[[1]](10)
#> [1] 11
adders2[[10]](10)
#> [1] 20
  • Example 3. Default arguments are evaluated inside the function.
f <- function(x = ls()) {
  a <- 1
  x
}

# ls() evaluated inside f:
f()
# [1] "a" "x"

# ls() evaluated in global environment:
f(ls())
# [1] "add"    "adders" "f" 
  • Example 4. Laziness is useful in if statements — the second statement below will be evaluated only if the first is true.
x <- NULL
if (!is.null(x) && x > 0) {

}

Backtick sign, infix/prefix/postfix operators

The backtick sign ` (not the single quote) refers to functions or variables that have otherwise reserved or illegal names; e.g. '&&', '+', '(', 'for', 'if', etc. See some examples in this note.

infix operator.

1 + 2    # infix
+ 1 2    # prefix
1 2 +    # postfix

List data type

Calling a function given a list of arguments

> args <- list(c(1:10, NA, NA), na.rm = TRUE)
> do.call(mean, args)
[1] 5.5
> mean(c(1:10, NA, NA), na.rm = TRUE)
[1] 5.5

Error handling and exceptions, tryCatch(), stop(), warning() and message()

out <- try({
  a <- 1
  b <- "x"
  a + b
})

elements <- list(1:10, c(-1, 10), c(T, F), letters)
results <- lapply(elements, log)
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
  • tryCatch(): With tryCatch() you map conditions to handlers (like switch()), named functions that are called with the condition as an input. Note that try() is a simplified version of tryCatch().
tryCatch(expr, ..., finally)

show_condition <- function(code) {
  tryCatch(code,
    error = function(c) "error",
    warning = function(c) "warning",
    message = function(c) "message"
  )
}
show_condition(stop("!"))
#> [1] "error"
show_condition(warning("?!"))
#> [1] "warning"
show_condition(message("?"))
#> [1] "message"
show_condition(10)
#> [1] 10

Below is another snippet from available.packages() function,

z <- tryCatch(download.file(....), error = identity)
if (!inherits(z, "error")) STATEMENTS

Using list type

Avoid if-else or switch

?plot.stepfun.

y0 <- c(1,2,4,3)
sfun0  <- stepfun(1:3, y0, f = 0)
sfun.2 <- stepfun(1:3, y0, f = .2)
sfun1  <- stepfun(1:3, y0, right = TRUE)

tt <- seq(0, 3, by = 0.1)
op <- par(mfrow = c(2,2))
plot(sfun0); plot(sfun0, xval = tt, add = TRUE, col.hor = "bisque")
plot(sfun.2);plot(sfun.2, xval = tt, add = TRUE, col = "orange") # all colors
plot(sfun1);lines(sfun1, xval = tt, col.hor = "coral")
##-- This is  revealing :
plot(sfun0, verticals = FALSE,
     main = "stepfun(x, y0, f=f)  for f = 0, .2, 1")

for(i in 1:3)
  lines(list(sfun0, sfun.2, stepfun(1:3, y0, f = 1))[[i]], col = i)
legend(2.5, 1.9, paste("f =", c(0, 0.2, 1)), col = 1:3, lty = 1, y.intersp = 1)

par(op)

StepfunExample.svg

Open a new Window device

X11() or dev.new()

par()

?par

text size and font on main, lab & axis

Examples:

  • cex.main=0.9
  • cex.lab=0.8
  • font.lab=2
  • cex.axis=0.8
  • font.axis=2
  • col.axis="grey50"

layout

http://datascienceplus.com/adding-text-to-r-plot/

reset the settings

op <- par(mfrow=c(2,1), mar = c(5,7,4,2) + 0.1) 
....
par(op) # mfrow=c(1,1), mar = c(5,4,4,2) + .1

mtext (margin text) vs title

mgp (axis label locations)

  1. The margin line (in ‘mex’ units) for the axis title, axis labels and axis line. Note that ‘mgp[1]’ affects ‘title’ whereas ‘mgp[2:3]’ affect ‘axis’. The default is ‘c(3, 1, 0)’. If we like to make the axis labels closer to an axis, we can use mgp=c(2.3, 1, 0) for example.
  2. http://rfunction.com/archives/1302 mgp – A numeric vector of length 3, which sets the axis label locations relative to the edge of the inner plot window. The first value represents the location the labels (i.e. xlab and ylab in plot), the second the tick-mark labels, and third the tick marks. The default is c(3, 1, 0).

pch

R pch.png

(figure source)

  • Full circle: pch=16

lty (line type)

R lty.png

(figure source)

las (label style)

0: The default, parallel to the axis

1: Always horizontal

2: Perpendicular to the axis

3: Always vertical

oma (outer margin), common title for two plots

The following trick is useful when we want to draw multiple plots with a common title.

par(mfrow=c(1,2),oma = c(0, 0, 2, 0))  # oma=c(0, 0, 0, 0) by default
plot(1:10,  main="Plot 1")
plot(1:100,  main="Plot 2")
mtext("Title for Two Plots", outer = TRUE, cex = 1.5) # outer=FALSE by default

Mastering R plot – Part 3: Outer margins mtext() & par(xpd).

Non-standard fonts in postscript and pdf graphics

https://cran.r-project.org/doc/Rnews/Rnews_2006-2.pdf#page=41

Suppress warnings

Use options(). If warn is negative all warnings are ignored. If warn is zero (the default) warnings are stored until the top--level function returns.

op <- options("warn")
options(warn = -1)
....
options(op)

# OR
warnLevel <- options()$warn
options(warn = -1)
...
options(warn = warnLevel)

NULL, NA, NaN, Inf

https://tomaztsql.wordpress.com/2018/07/04/r-null-values-null-na-nan-inf/

save() vs saveRDS()

  1. saveRDS() can only save one R object while save() does not have this constraint.
  2. saveRDS() doesn’t save the both the object and its name it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized. See this post.
x <- 5
saveRDS(x, "myfile.rds")
x2 <- readRDS("myfile.rds")
identical(mod, mod2, ignore.environment = TRUE)

==, all.equal(), identical()

  • ==: exact match
  • all.equal: compare R objects x and y testing ‘near equality’
  • identical: The safe and reliable way to test two objects for being exactly equal.
x <- 1.0; y <- 0.99999999999
all.equal(x, y)
# [1] TRUE
identical(x, y)
# [1] FALSE

See also the testhat package.

testhat

tinytest

tinytest: Lightweight but Feature Complete Unit Testing Framework

Numerical Pitfall

Numerical pitfalls in computing variance

.1 - .3/3
## [1] 0.00000000000000001388

Sys.getpid()

This can be used to monitor R process memory usage or stop the R process. See this post.

How to debug an R code

Using assign() in functions

For example, insert the following line to your function

 assign(envir=globalenv(), "GlobalVar", localvar)

Debug lapply()/sapply()

Debugging with RStudio

Debug R source code

Build R with debug information

$ ./configure --help
$ ./configure --enable-R-shlib --with-valgrind-instrumentation=2 \
                               --with-system-valgrind-headers \
               CFLAGS='-g -O0 -fPIC' \
               FFLAGS='-g -O0 -fPIC' \
               CXXFLAGS='-g -O0 -fPIC' \
               FCFLAGS='-g -O0 -fPIC' 
$ make -j4
$ sudo make install
# Make sure to create a file <src/Makevars> with something like: CFLAGS=-ggdb -O0
# Or more generally
# CFLAGS=-Wall -Wextra -pedantic -O0 -ggdb
# CXXFLAGS=-Wall -Wextra -pedantic -O0 -ggdb
# FFLAGS=-Wall -Wextra -pedantic -O0 -ggdb

$ tree nidemo
$ R CMD INSTALL nidemo
$ cat bug.R
$ R -f bug.R 
$ R -d gdb
(gdb) r
> library(nidemo)
> Ctrl+C
(gdb) b nid_buggy_freq
(gdb) c  # continue
> buggy_freq("nidemo/DESCRIPTION") # stop at breakpoint 1
(gdb) list
(gdb) n # step through
(gdb) # press RETURN a few times until you see the bug
(gdb) d 1 # delete the first break point
(gdb) b Rf_error # R's C entry point for the error function
(gdb) c
> buggy_freq("nidemo/DESCRIPTION")
(gdb) bt 5 # last 5 stack frames
(gdb) frame 2
(gdb) list
(gdb) p freq_data
(gdb) p ans
(gdb) call Rf_PrintValues(ans)
(gdb) call Rf_PrintValues(fname)
(gdb) q
# Edit buggy.c

$ R CMD INSTALL nidemo # re-install the package
$ R -f bug.R
$ R -d gdb
(gdb) run
> source("bug.R") # error happened
(gdb) bt 5 # show the last 5 frames
(gdb) frame 2
(gdb) list
(gdb) frame 1
(gdb) list
(gdb) p file
(gdb) p fh
(gdb) q
# Edit buggy.c

$ R CMD INSTALL nidemo
$ R -f bug.R

.Call

Registering native routines

https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Registering-native-routines

Pay attention to the prefix argument .fixes (eg .fixes = "C_") in useDynLib() function in the NAMESPACE file.

Example of debugging cor() function

Note that R's cor() function called a C function cor().

stats::cor
....
 .Call(C_cor, x, y, na.method, method == "kendall")

A step-by-step screenshot of debugging using the GNU debugger gdb can be found on my Github repository https://github.com/arraytools/r-debug.

Locale bug (grep did not handle UTF-8 properly PR#16264)

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16264

Path length in dir.create() (PR#17206)

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17206 (Windows only)

install.package() error, R_LIBS_USER is empty in R 3.4.1 & .libPaths()

R_LIBS_USER=${R_LIBS_USER-'~/R/x86_64-pc-linux-gnu-library/3.4'}
R_LIBS_USER="${HOME}/R/${R_PLATFORM}-library/3.4"

On Mac & R 3.4.0 (it's fine)

> Sys.getenv("R_LIBS_USER")
[1] "~/Library/R/3.4/library"
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"

On Linux & R 3.3.1 (ARM)

> Sys.getenv("R_LIBS_USER")
[1] "~/R/armv7l-unknown-linux-gnueabihf-library/3.3"
> .libPaths()
[1] "/home/$USER/R/armv7l-unknown-linux-gnueabihf-library/3.3"
[2] "/usr/local/lib/R/library"

On Linux & R 3.4.1 (*Problematic*)

> Sys.getenv("R_LIBS_USER")
[1] ""
> .libPaths()
[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"

I need to specify the lib parameter when I use the install.packages command.

> install.packages("devtools", "~/R/x86_64-pc-linux-gnu-library/3.4")
> library(devtools)
Error in library(devtools) : there is no package called 'devtools'

# Specify lib.loc parameter will not help with the dependency package
> library(devtools, lib.loc = "~/R/x86_64-pc-linux-gnu-library/3.4")
Error: package or namespace load failed for 'devtools':
 .onLoad failed in loadNamespace() for 'devtools', details:
  call: loadNamespace(name)
  error: there is no package called 'withr'

# A solution is to redefine .libPaths
> .libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
> library(devtools) # Works

A better solution is to specify R_LIBS_USER in ~/.Renviron file or ~/.bash_profile; see ?Startup.

Using external data from within another package

https://logfc.wordpress.com/2017/03/02/using-external-data-from-within-another-package/

How to run R scripts from the command line

https://support.rstudio.com/hc/en-us/articles/218012917-How-to-run-R-scripts-from-the-command-line

How to exit a sourced R script

Decimal point & decimal comma

Countries using Arabic numerals with decimal comma (Austria, Belgium, Brazil France, Germany, Netherlands, Norway, South Africa, Spain, Sweden, ...) https://en.wikipedia.org/wiki/Decimal_mark

setting seed locally (not globally) in R

https://stackoverflow.com/questions/14324096/setting-seed-locally-not-globally-in-r

R's internal C API

https://github.com/hadley/r-internals

cleancall package for C resource cleanup

Resource Cleanup in C and the R API

Random numbers: multivariate normal

Why MASS::mvrnorm() gives different result on Mac and Linux/Windows?

The reason could be the covariance matrix decomposition - and that may be due to the LAPACK/BLAS libraries. See

set.seed(1234)
junk <- biospear::simdata(n=500, p=500, q.main = 10, q.inter = 10, 
                          prob.tt = .5, m0=1, alpha.tt= -.5, 
                          beta.main= -.5, beta.inter= -.5, b.corr = .7, b.corr.by=25, 
                          wei.shape = 1, recr=3, fu=2, timefactor=1)
## Method 1: MASS::mvrnorm()
## This is simdata() has used. It gives different numbers on different OS.
## 
library(MASS)
set.seed(1234)
m0 <-1
n <- 500
prob.tt <- .5
p <- 500
b.corr.by <- 25
b.corr <- .7
data <- data.frame(treat = rbinom(n, 1, prob.tt) - 0.5)
n.blocks <- p%/%b.corr.by
covMat <- diag(n.blocks) %x% 
  matrix(b.corr^abs(matrix(1:b.corr.by, b.corr.by, b.corr.by, byrow = TRUE) - 
                    matrix(1:b.corr.by, b.corr.by, b.corr.by)), b.corr.by, b.corr.by)
diag(covMat) <- 1
data <- cbind(data, mvrnorm(n, rep(0, p), Sigma = covMat))
range(data)
# Mac: -4.963827  4.133723
# Linux/Windows: -4.327635  4.408097
packageVersion("MASS")
# Mac: [1] ‘7.3.49’
# Linux: [1] ‘7.3.49’
# Windows: [1] ‘7.3.47’

R.version$version.string
# Mac: [1] "R version 3.4.3 (2017-11-30)"
# Linux: [1] "R version 3.4.4 (2018-03-15)"
# Windows: [1] "R version 3.4.3 (2017-11-30)"

## Method 2: mvtnorm::rmvnorm()
library(mvtnorm)
set.seed(1234)
sigma <- matrix(c(4,2,2,3), ncol=2)
x <- rmvnorm(n=n, rep(0, p), sigma=covMat)
range(x)
# Mac: [1] -4.482566  4.459236
# Linux: [1] -4.482566  4.459236

## Method 3: mvnfast::rmvn()
set.seed(1234)
x <- mvnfast::rmvn(n, rep(0, p), covMat)
range(x)
# Mac: [1] -4.323585  4.355666
# Linux: [1] -4.323585  4.355666

library(microbenchmark)
library(MASS)
library(mvtnorm)
library(mvnfast)
microbenchmark(v1 <- rmvnorm(n=n, rep(0, p), sigma=covMat, "eigen"),
               v2 <- rmvnorm(n=n, rep(0, p), sigma=covMat, "svd"),
               v3 <- rmvnorm(n=n, rep(0, p), sigma=covMat, "chol"),
               v4 <- rmvn(n, rep(0, p), covMat),
               v5 <- mvrnorm(n, rep(0, p), Sigma = covMat))
Unit: milliseconds
expr       min        lq
v1 <- rmvnorm(n = n, rep(0, p), sigma = covMat, "eigen") 296.55374 300.81089
v2 <- rmvnorm(n = n, rep(0, p), sigma = covMat, "svd") 461.81867 466.98806
v3 <- rmvnorm(n = n, rep(0, p), sigma = covMat, "chol") 118.33759 120.01829
v4 <- rmvn(n, rep(0, p), covMat)  66.64675  69.89383
v5 <- mvrnorm(n, rep(0, p), Sigma = covMat) 291.19826 294.88038
mean    median        uq      max neval   cld
306.72485 301.99339 304.46662 335.6137   100    d 
478.58536 470.44085 493.89041 571.7990   100     e
125.85427 121.26185 122.21361 151.1658   100  b   
71.67996  70.52985  70.92923 100.2622   100 a    
301.88144 296.76028 299.50839 346.7049   100   c

A little more investigation shows the eigen values differ a little bit on macOS and Linux.

set.seed(1234); x <- mvrnorm(n, rep(0, p), Sigma = covMat)
debug(mvrnorm) 
# eS --- macOS
# eS2 -- Linux
Browse[2]> range(abs(eS$values - eS2$values))
# [1] 0.000000e+00 1.776357e-15
Browse[2]> var(as.vector(eS$vectors))
[1] 0.002000006
Browse[2]> var(as.vector(eS2$vectors))
[1] 0.001999987
Browse[2]> all.equal(eS$values, eS2$values)
[1] TRUE
Browse[2]> which(eS$values != eS2$values)
  [1]   6   7   8   9  10  11  12  13  14  20  22  23  24  25  26  27  28  29
  ...
[451] 494 495 496 497 499 500
Browse[2]> range(abs(eS$vectors - eS2$vectors))
[1] 0.0000000 0.5636919

rle() running length encoding

citation()

citation()
citation("MASS")
toBibtex(citation())

Monitor memory usage

  • Windows: memory.size(max=TRUE)
  • Linux
    • RStudio: htop -p PID where PID is the process ID of /usr/lib/rstudio/bin/rsession, not /usr/lib/rstudio/bin/rstudio. This is obtained by running x <- rnorm(2*1e8). The object size can be obtained through print(object.size(x), units = "auto"). Note that 1e8*8/2^20 = 762.9395.
    • R: htop -p PID where PID is the process ID of /usr/lib/R/bin/exec/R. Alternatively, use htop -p `pgrep -f /usr/lib/R/bin/exec/R`
    • To find the peak memory usage grep VmPeak /proc/$PID/status

References:

Resource

Books

# R 3.4.1
.libPaths(c("~/R/x86_64-pc-linux-gnu-library/3.4", .libPaths()))
setwd("/tmp/efficientR/")
bookdown::render_book("index.Rmd", output_format = "bookdown::pdf_book")
# generated pdf file is located _book/_main.pdf

bookdown::render_book("index.Rmd", output_format = "bookdown::epub_book")
# generated epub file is located _book/_main.epub.
# This cannot be done in RStudio ("parse_dt" not resolved from current namespace (lubridate))
# but it is OK to run in an R terminal

Webinar

useR!

Blogs, Tips, Socials, Communities

Bug Tracking System

https://bugs.r-project.org/bugzilla3/ and Search existing bug reports. Remember to select 'All' in the Status drop-down list.

Use sessionInfo().