Rcpp: Difference between revisions
(38 intermediate revisions by the same user not shown) | |||
Line 5: | Line 5: | ||
** Slides http://dirk.eddelbuettel.com/presentations/ | ** Slides http://dirk.eddelbuettel.com/presentations/ | ||
** http://dirk.eddelbuettel.com/papers/ | ** http://dirk.eddelbuettel.com/papers/ | ||
** [https://www.youtube.com/channel/UCgn_wsLrzD386E1TlQkE1VA youtube] | |||
* [http://adv-r.had.co.nz/Rcpp.html High performance functions with Rcpp] from Advanced R | * [http://adv-r.had.co.nz/Rcpp.html High performance functions with Rcpp] from Advanced R | ||
* [https://teuder.github.io/rcpp4everyone_en/ Rcpp for everyone] by Masaki E. Tsuda | * [https://teuder.github.io/rcpp4everyone_en/ Rcpp for everyone] by Masaki E. Tsuda | ||
Line 82: | Line 83: | ||
= Use Rcpp in RStudio = | = Use Rcpp in RStudio = | ||
RStudio makes it easy to use Rcpp package. | RStudio makes it easy to use Rcpp package. ''RStudio can immediately identify any error in a cpp file before we compile it.'' | ||
Open RStudio, click New File -> C++ File. It will create a C++ template on the RStudio editor | Open RStudio, click New File -> C++ File. It will create a C++ template on the RStudio editor | ||
Line 196: | Line 197: | ||
return(g(n-1) + g(n-2)); }') | return(g(n-1) + g(n-2)); }') | ||
// sapply(0:10, g) | // sapply(0:10, g) | ||
// depends argument | |||
cppFunction("arma::mat opg(arma::colvec x) { | |||
return x * x.t(); }", | |||
depends = "RcppArmadillo") | |||
</syntaxhighlight> | </syntaxhighlight> | ||
Line 219: | Line 225: | ||
// sapply(0:10, g) | // sapply(0:10, g) | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Another way to avoid creating the file is the following (from [https://gallery.rcpp.org/articles/sparse-matrix-class/ Constructing a Sparse Matrix Class in Rcpp]) | |||
<pre> | |||
testCode = ' | |||
#include <Rcpp.h> | |||
namespace Rcpp { | |||
class dgCMatrix { | |||
public: | |||
IntegerVector i, p, Dim; | |||
NumericVector x; | |||
List Dimnames; | |||
[SKIP] | |||
} | |||
// [[Rcpp::export]] | |||
Rcpp::dgCMatrix R_to_Cpp_to_R(Rcpp::dgCMatrix& mat){ | |||
return mat; | |||
}' | |||
Rcpp::sourceCpp(code = testCode) | |||
spmat <- abs(rsparsematrix(100, 100, 0.1)) | |||
spmat2 <- R_to_Cpp_to_R(spmat) | |||
all.equal(spmat, spmat2) | |||
</pre> | |||
== g++ command line options == | == g++ command line options == | ||
Line 297: | Line 327: | ||
** RcppArmadillo | ** RcppArmadillo | ||
** [https://cran.r-project.org/web/packages/princurve/index.html princurve] | ** [https://cran.r-project.org/web/packages/princurve/index.html princurve] | ||
* [https://www.youtube.com/playlist?list=PLwc48KSH3D1OkObQ22NHbFwEzof2CguJJ Make an R package with C++ code] (playlist in youtube) by Toby Hocking. | |||
== Armadillo == | == Armadillo == | ||
Line 534: | Line 565: | ||
https://gallery.rcpp.org/articles/passing-cpp-function-pointers/ | https://gallery.rcpp.org/articles/passing-cpp-function-pointers/ | ||
== RcppExamples package == | == Example 8: Matrix cross-distances using Rcpp == | ||
[https://www.mpjdem.xyz/2019/09/28/rcpp-crossdist/ Matrix cross-distances using Rcpp] | |||
== Example 9: RcppExamples package == | |||
There is no need to install it. Download the source and use Rcpp to run the examples. | There is no need to install it. Download the source and use Rcpp to run the examples. | ||
Line 591: | Line 625: | ||
== Some packages that import Rcpp == | == Some packages that import Rcpp == | ||
* [https://dirk.eddelbuettel.com/code/rcpp.cranusers.html CRAN usage] | |||
* [https://cran.r-project.org/web/packages/Rtsne/index.html Rtsne]; see [[Statistics#t-SNE|Statistics -> t-SNE]]. | * [https://cran.r-project.org/web/packages/Rtsne/index.html Rtsne]; see [[Statistics#t-SNE|Statistics -> t-SNE]]. | ||
* [https://github.com/LTLA/batchelor batchelor] and [https://bioconductor.org/books/release/OSCA/integrating-datasets.html#performing-mnn-correction fastMNN] for batch effect correction. | * [https://github.com/LTLA/batchelor batchelor] and [https://bioconductor.org/books/release/OSCA/integrating-datasets.html#performing-mnn-correction fastMNN] for batch effect correction. | ||
Line 596: | Line 631: | ||
* [https://github.com/immunogenomics/harmony Harmony] - Fast, sensitive and accurate integration of single-cell data | * [https://github.com/immunogenomics/harmony Harmony] - Fast, sensitive and accurate integration of single-cell data | ||
* [https://github.com/MarioniLab/DropletUtils/blob/master/src/downsample_run.cpp DropletUtils] and [https://github.com/LTLA/scuttle/tree/master/src scuttle] | * [https://github.com/MarioniLab/DropletUtils/blob/master/src/downsample_run.cpp DropletUtils] and [https://github.com/LTLA/scuttle/tree/master/src scuttle] | ||
* [https://cran.r-project.org/web/packages/CDSeq/index.html CDSeq]: A Complete Deconvolution Method using Sequencing Data | |||
== rcpp NumericMatrix Examples == | |||
[https://cpp.hotexamples.com/examples/rcpp/NumericMatrix/-/cpp-numericmatrix-class-examples.html C++ (Cpp) NumericMatrix Examples] | |||
= Rcpp sugar = | = Rcpp sugar = | ||
* Chapter 8 of Rcpp book | * Chapter 8 of Rcpp book | ||
* [http://dirk.eddelbuettel.com/code/rcpp/Rcpp-sugar.pdf Rcpp syntactic sugar] | * [http://dirk.eddelbuettel.com/code/rcpp/Rcpp-sugar.pdf Rcpp syntactic sugar] | ||
* [https://teuder.github.io/rcpp4everyone_en/210_rcpp_functions.html Chapter 21 R-like functions] from '''Rcpp for everyone'''. | |||
The motivation of ''Rcpp sugar'' is to bring a subset of the high-level R syntax in C++. For example, consider a simple R function <syntaxhighlight lang='rsplus'> | The motivation of ''Rcpp sugar'' is to bring a subset of the high-level R syntax in C++. For example, consider a simple R function <syntaxhighlight lang='rsplus'> | ||
Line 726: | Line 766: | ||
* The VectorBase class | * The VectorBase class | ||
* Example : sapply | * Example : sapply | ||
= Tips = | |||
== Use R functions == | |||
<ul> | |||
<li>[https://gallery.rcpp.org/articles/r-function-from-c++/ Calling R Functions from C++] | |||
<pre> | |||
testCode <- '#include <Rcpp.h> | |||
using namespace Rcpp; | |||
// [[Rcpp::export]] | |||
NumericVector callFunction(NumericVector x, Function f) { | |||
NumericVector res = f(x); | |||
return res; | |||
}' | |||
Rcpp::sourceCpp(code = testCode) | |||
set.seed(42) | |||
x <- rnorm(1e5) | |||
fivenum(x) | |||
# [1] -4.043276349 -0.682384496 -0.002066374 0.673324712 4.328091274 | |||
callFunction(x, fivenum) | |||
# [1] -4.043276349 -0.682384496 -0.002066374 0.673324712 4.328091274 | |||
</pre> | |||
</li> | |||
<li>[https://teuder.github.io/rcpp4everyone_en/230_R_function.html Chapter 23 Using R functions] from '''Rcpp for everyone''' | |||
</li> | |||
</ul> | |||
== Sparse matrix == | |||
* [https://gallery.rcpp.org/articles/sparse-matrix-class/ Constructing a Sparse Matrix Class in Rcpp]. See my note [[#sourceCpp.28.29:_the_actual_workhorse_behind_evalCpp.28.29_and_cppFunction.28.29|here]]. | |||
== Using Fortran in C++ == | |||
[https://gallery.rcpp.org/articles/Combining-R-with-Cpp-and-Fortran/ Extending R with C++ and Fortran] | |||
= [http://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] = | = [http://cran.r-project.org/web/packages/RcppParallel/index.html RcppParallel] = | ||
Line 736: | Line 809: | ||
Annoy library is a small and lightweight C++ template header library for very fast approximate nearest neighbors - originally developed to drive the famous Spotify music discovery algorithm. | Annoy library is a small and lightweight C++ template header library for very fast approximate nearest neighbors - originally developed to drive the famous Spotify music discovery algorithm. | ||
= RcppEigen = | |||
<ul> | |||
<li>[https://cran.r-project.org/web/packages/RcppEigen/ RcppEigen], [https://cran.r-project.org/web/packages/RcppEigen/readme/README.html README], [http://eigen.tuxfamily.org/dox-3.2/classEigen_1_1SelfAdjointEigenSolver.html SelfAdjointEigenSolver() documentation] | |||
</li> | |||
<li>[https://eigen.tuxfamily.org/index.php?title=Main_Page Eigen documentation], [https://eigen.tuxfamily.org/dox/GettingStarted.html Getting started] | |||
<pre> | |||
$ nano test.cpp | |||
$ tar -xzvf eigen-3.4.0.tar.gz | |||
$ g++ -I eigen-3.4.0/ test.cpp -o testeigen | |||
$ ./testeigen | |||
3 -1 | |||
2.5 1.5 | |||
</pre> | |||
</li> | |||
<li>[https://qiita.com/yohm/items/a03006790dc1e54a87be C++行列計算ライブラリEigen入門] | |||
</li> | |||
<li>Printing [https://stackoverflow.com/a/29557750 Rcpp: Combining Vectors into Matrix and printing rows using Rcout] '''Rf_PrintValue(mat) ;''' OR [https://stackoverflow.com/a/33979461 '''<<'''] operator. | |||
</li> | |||
</ul> | |||
== SVD == | |||
* [https://kohei-kawaguchi.github.io/EmpiricalIO/rcpp.html Chapter 20 Integrating with C++ using Rcpp and RcppEigen] | |||
* https://cran.r-project.org/web/packages/RcppArmadillo/news.html | |||
* [https://eigen.tuxfamily.org/dox/classEigen_1_1JacobiSVD.html Eigen::JacobiSVD] | |||
* [https://stackoverflow.com/a/16365713 RcppEigen svd is very slow] | |||
** On my mac, (svd, '''svdArma''', svdEigen) = (6.3, 3.0, 37.0) seconds for a 1000x1000 matrix | |||
** On my mac, (svd, svdArma, '''svdEigen''') = (.002, .056, .002) seconds for a 1000x20 matrix. Note the u or v matrices returned from svdArma is a squared matrix? When I increase the number of obs to 10,000, svd is faster than svdEigen (and svdArma). | |||
** On my mac, (svd, svdArma, '''svdEigen''') = (.006, .048, .002) seconds for a 20x1000 matrix. | |||
<pre> | |||
library(inline) | |||
codeArma=' | |||
arma::mat m = Rcpp::as<arma::mat>(m_); | |||
arma::mat u; | |||
arma::vec s; | |||
arma::mat v; | |||
arma::svd(u,s,v,m); | |||
return List::create( Rcpp::Named("u")=u, | |||
Rcpp::Named("d")=s, | |||
Rcpp::Named("v")=v ); | |||
' | |||
svdArma <- cxxfunction(signature(m_="matrix"),codeArma, plugin="RcppArmadillo") | |||
#----------------------------------------------------------------------- | |||
codeEigen=' | |||
const Eigen::Map<Eigen::MatrixXd> m (as<Eigen::Map<Eigen::MatrixXd> >(m_ )); | |||
Eigen::JacobiSVD <Eigen::MatrixXd>svd(m, | |||
Eigen::ComputeThinU|Eigen::ComputeThinV); | |||
return List::create( Rcpp::Named("u")=svd.matrixU(), | |||
Rcpp::Named("d")=svd.singularValues(), | |||
Rcpp::Named("v")=svd.matrixV() ); | |||
' | |||
svdEigen <- cxxfunction(signature(m_="matrix"), codeEigen, plugin="RcppEigen") | |||
#------------------------------------------------------------------------ | |||
n<-1000 | |||
m<-matrix(rnorm(n*n),n,n) | |||
system.time(s1<-svd(m)) # R | |||
m1<-s1$u %*% diag(s1$d) %*% t(s1$v) | |||
all.equal(m,m1) | |||
system.time(s2<-svdArma(m)) # Armadillo | |||
m2<-s2$u %*% diag(array(s2$d)) %*% t(s2$v) | |||
all.equal(m,m2) | |||
system.time(s3<-svdEigen(m)) # Eigen | |||
m3<-s3$u %*% diag(s3$d) %*% t(s3$v) | |||
all.equal(m,m3) | |||
</pre> | |||
= RcppArmadillo = | |||
* https://cran.r-project.org/web/packages/RcppArmadillo/index.html | |||
* https://dirk.eddelbuettel.com/code/rcpp.armadillo.html | |||
* [http://arma.sourceforge.net/docs.html Armadillo documentation] | |||
* [https://www.thecoatlessprofessor.com/programming/cpp/common-operations-with-rcpparmadillo/ COMMON OPERATIONS WITH RCPPARMADILLO] | |||
* It seems arma::vec is a column (not a row). | |||
* Example code: | |||
** [https://github.com/humengying0907/InstaPrism/blob/master/src/cppFuncs.cpp InstaPrism] | |||
== SVD == | |||
<ul> | |||
<li>[http://arma.sourceforge.net/docs.html#svds Armadillo doc] </li> | |||
<li>[https://gallery.rcpp.org/articles/divide-and-concquer-svd/ Performance of the divide-and-conquer SVD algorithm] | |||
<pre> | |||
testCode <- '#include <RcppArmadillo.h> | |||
// [[Rcpp::depends(RcppArmadillo)]] | |||
// [[Rcpp::export]] | |||
arma::mat baseSVD(const arma::mat & X) { | |||
arma::mat U, V; | |||
arma::vec S; | |||
arma::svd(U, S, V, X, "standard"); | |||
return U; | |||
} | |||
// [[Rcpp::export]] | |||
arma::mat dcSVD(const arma::mat & X) { | |||
arma::mat U, V; | |||
arma::vec S; | |||
arma::svd(U, S, V, X, "dc"); | |||
return U; | |||
}' | |||
Rcpp::sourceCpp(code = testCode) | |||
x <- matrix(rnorm(1000*20), 1000) # 1000 x 20 | |||
library(microbenchmark) | |||
microbenchmark(baseSVD(x), dcSVD(x), svd(x) | |||
# Unit: milliseconds | |||
# expr min lq mean median uq max neval cld | |||
# baseSVD(x) 35.074119 42.998798 52.654122 45.433862 51.203433 105.358475 100 b | |||
# dcSVD(x) 36.533591 42.863185 51.891296 44.923845 50.798086 175.189363 100 b | |||
# svd(x) 1.253838 1.536588 1.748942 1.610097 1.791908 4.349285 100 a | |||
system.time(baseSVD(x)); system.time(svd(x)) # similar result for a 20 x 1000 matrix | |||
# user system elapsed | |||
# 0.093 0.007 0.101 | |||
# user system elapsed | |||
# 0.004 0.000 0.004 | |||
</pre> | |||
</li> | |||
<li>[https://rcpp-devel.r-forge.r-project.narkive.com/bfMq3D3O/differences-between-rcppeigen-and-rcpparmadillo Differences between RcppEigen and RcppArmadillo] | |||
</li> | |||
</ul> | |||
== Moore-penrose inverse == | |||
https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse | |||
<pre> | |||
set.seed(1) | |||
y <- matrix(rnorm(20),4) | |||
rankMatrix(var(y)) | |||
solve(var(y)) | |||
# Error in solve.default(var(y)) : | |||
# system is computationally singular: reciprocal condition number = 6.53869e-18 | |||
MASS::ginv(var(y)) | |||
</pre> | |||
pinv() from [http://dirk.eddelbuettel.com/papers/RcppArmadillo-intro.pdf RcppArmadillo: Easily Extending R with High-Performance C++ Code] pdf | |||
<pre> | |||
testCode <- '#include <RcppArmadillo.h> | |||
// [[Rcpp::depends(RcppArmadillo)]] | |||
// [[Rcpp::export]] | |||
arma::mat arma_pinv(const arma::mat & X) { | |||
return arma::pinv(X);; | |||
} | |||
' | |||
Rcpp::sourceCpp(code = testCode) | |||
set.seed(1) | |||
y <- matrix(rnorm(20),4) | |||
arma_pinv(var(y)) | |||
all.equal(MASS::ginv(var(y)), arma_pinv(var(y))) # TRUE | |||
</pre> | |||
== Mahalanobis distance == | |||
* [[Statistics#Mahalanobis_distance_and_outliers_detection|Statistics -> Mahalanobis distance]] | |||
* [https://learn.stat.ubc.ca/~andy.leung/files/seminars-talks/2012/03/RcppDemo.pdf My first attempt of using Rcpp and RcppArmadillo]. Fibonacci sequence, Matrix multiplication (element-wise), Matrix determinant and inversion (Access R functions from C++), Rcpp syntactic sugar, Matrix multiplication. | |||
= cpp11 = | |||
* https://cpp11.r-lib.org/ | |||
* [https://www.tidyverse.org/blog/2021/09/updating-to-cpp11/ Pathway to success - updating your package to cpp11] | |||
* [https://pacha.dev/blog/2023/05/21/a-step-by-step-guide-to-write-an-r-package-that-uses-c-code-ubuntu/ A step by step guide to write an R package that uses C++ code (Ubuntu)] | |||
= R6 = | = R6 = | ||
[https://gallery.rcpp.org//articles/handling-R6-objects-in-rcpp/ Handling R6 objects in C++] | [https://gallery.rcpp.org//articles/handling-R6-objects-in-rcpp/ Handling R6 objects in C++] | ||
= RcppClock benchmark = | |||
[https://gallery.rcpp.org//articles/RcppClock-benchmarking-Rcpp-code/ Benchmarking Rcpp code with RcppClock] |
Latest revision as of 08:14, 5 August 2024
Misc
- http://cran.r-project.org/web/packages/Rcpp/index.html
- Rcpp Gallery https://gallery.rcpp.org/
- Web http://dirk.eddelbuettel.com/
- High performance functions with Rcpp from Advanced R
- Rcpp for everyone by Masaki E. Tsuda
- R Package Integration with Modern Reusable C++ Code Using Rcpp by Daniel Hanson
- Cheat sheet
- The Rcpp discussion archive
- (Video) Extending R with C++: A Brief Introduction to Rcpp
- C++14, R and Travis -- A useful hack
- For RcppEigen, it is necessary to install dependency packages.
sudo apt-get install libblas-dev liblapack-dev sudo apt-get install gfortran
Speed Comparison
- A comparison of high-performance computing techniques in R. It compares Rcpp to an R looping operator (like mapply), a parallelized version of a looping operator (like mcmapply), explicit parallelization, via the parallel package or the ParallelR suite.
- In the following example, C++ avoids the overhead of creating an intermediate object (eg vector of the same length as the original vector). The c++ uses an intermediate scalar. So C++ wins R over memory management in this case.
# http://blog.mckuhn.de/2016/03/avoiding-unnecessary-memory-allocations.html library(Rcpp) `%count<%` <- cppFunction(' size_t count_less(NumericVector x, NumericVector y) { const size_t nx = x.size(); const size_t ny = y.size(); if (nx > 1 & ny > 1) stop("Only one parameter can be a vector!"); size_t count = 0; if (nx == 1) { double c = x[0]; for (int i = 0; i < ny; i++) count += c < y[i]; } else { double c = y[0]; for (int i = 0; i < nx; i++) count += x[i] < c; } return count; } ') set.seed(42) N <- 10^7 v <- runif(N, 0, 10000) # Testing on my ODroid xu4 running ubuntu 15.10 system.time(sum(v < 5000)) # user system elapsed # 1.135 0.305 1.453 system.time(v %count<% 5000) # user system elapsed # 0.535 0.000 0.540
- Why R for data science – and not Python?
library(Rcpp) bmi_R <- function(weight, height) { weight / (height * height) } bmi_R(80, 1.85) # body mass index of person with 80 kg and 185 cm ## [1] 23.37473 cppFunction(" float bmi_cpp(float weight, float height) { return weight / (height * height); } ") bmi_cpp(80, 1.85) # same with cpp function ## [1] 23.37473
- Boost the speed of R calls from Rcpp
- Is the pain worth it?: Can Rcpp speed up Passing Bablok Regression? Several headers are included: vector, cmath, limits, algorithm, numeric.
Use Rcpp in RStudio
RStudio makes it easy to use Rcpp package. RStudio can immediately identify any error in a cpp file before we compile it.
Open RStudio, click New File -> C++ File. It will create a C++ template on the RStudio editor
#include <Rcpp.h> using namespace Rcpp; // Below is a simple example of exporting a C++ function to R. You can // source this function into an R session using the Rcpp::sourceCpp // function (or via the Source button on the editor toolbar) // For more on using Rcpp click the Help button on the editor toolbar // [[Rcpp::export]] int timesTwo(int x) { return x * 2; }
Now in R console, type
library(Rcpp) sourceCpp("~/Downloads/timesTwo.cpp") timesTwo(9) # [1] 18
See more examples on http://adv-r.had.co.nz/Rcpp.html and Calculating a fuzzy kmeans membership matrix
If we wan to test Boost library, we can try it in RStudio. Consider the following example in stackoverflow.com.
// [[Rcpp::depends(BH)]] #include <Rcpp.h> #include <boost/foreach.hpp> #include <boost/math/special_functions/gamma.hpp> #define foreach BOOST_FOREACH using namespace boost::math; //[[Rcpp::export]] Rcpp::NumericVector boost_gamma( Rcpp::NumericVector x ) { foreach( double& elem, x ) { elem = boost::math::tgamma(elem); }; return x; }
Then the R console
boost_gamma(0:10 + 1) # [1] 1 1 2 6 24 120 720 5040 40320 # [10] 362880 3628800 identical( boost_gamma(0:10 + 1), factorial(0:10) ) # [1] TRUE
From the useR 2019 tutorial
Number of packages depends on Rcpp
db <- tools::CRAN_package_db() # added in R 3.4.0 ## rows: number of pkgs, cols: different attributes nTot <- nrow(db) ## all direct Rcpp reverse depends, ie packages using Rcpp nRcpp <- length(tools::dependsOnPkgs("Rcpp", recursive=FALSE, installed=db)) nCompiled <- table(db[, "NeedsCompilation"])[["yes"]] propRcpp <- nRcpp / nCompiled * 100 data.frame(tot=nTot, totRcpp = nRcpp, totCompiled = nCompiled, RcppPctOfCompiled = propRcpp) # tot totRcpp totCompiled RcppPctOfCompiled # 1 14594 1710 3646 46.90071
evalCpp()
evalCpp("2 + 2") set.seed(42) evalCpp("Rcpp::rnorm(2)") library(Rcpp) evalCpp("std::numeric_limits<double>::max()")
cppFunction()
https://www.rdocumentation.org/packages/Rcpp/versions/1.0.1/topics/cppFunction
Note that the c/cpp function can be used as a regular R function without the need of .Call() function.
cppFunction(" int exampleCpp11() { auto x = 10; // guesses type return x; }", plugins=c("cpp11")) // exampleCpp11() Rcpp::cppFunction("Rcpp::NumericVector f(int n) { srand(1234); Rcpp::NumericVector res(n); for(int i=0; i<n; i++) res[i] = rand(); return(res); }") // f(); different result on different platform cppFunction("int f(int a, int b) { return(a + b); }") // f(21, 21) // fibonacci cppFunction('int g(int n) { if (n < 2) return(n); return(g(n-1) + g(n-2)); }') // sapply(0:10, g) // depends argument cppFunction("arma::mat opg(arma::colvec x) { return x * x.t(); }", depends = "RcppArmadillo")
sourceCpp(): the actual workhorse behind evalCpp() and cppFunction()
https://www.rdocumentation.org/packages/Rcpp/versions/1.0.1/topics/sourceCpp
Create the following in a file called testrcpp.cpp and in R run Rcpp::sourceCpp("testrcpp.cpp"). Any exported functions in .cpp will be available in an R session.
#include <Rcpp.h> using namespace Rcpp; // This is a simple example of exporting a C++ function to R. You can // source this function into an R session using the Rcpp::sourceCpp() // [[Rcpp::export]] NumericVector timesTwo(NumericVector x) { return x * 2; } // [[Rcpp::export]] int g(int n) { if (n < 2) return(n); return(g(n-1) + g(n-2)); } // timesTwo(c(2, 5.3)) // sapply(0:10, g)
Another way to avoid creating the file is the following (from Constructing a Sparse Matrix Class in Rcpp)
testCode = ' #include <Rcpp.h> namespace Rcpp { class dgCMatrix { public: IntegerVector i, p, Dim; NumericVector x; List Dimnames; [SKIP] } // [[Rcpp::export]] Rcpp::dgCMatrix R_to_Cpp_to_R(Rcpp::dgCMatrix& mat){ return mat; }' Rcpp::sourceCpp(code = testCode) spmat <- abs(rsparsematrix(100, 100, 0.1)) spmat2 <- R_to_Cpp_to_R(spmat) all.equal(spmat, spmat2)
g++ command line options
- header location via -I,
- library location via -L,
- library via -llibraryname
g++ -I/usr/include -c qnorm_rmath.cpp g++ -o qnorm_rmath qnorm_rmath.o -L/usr/lib -lRmath
C++ and R types
http://dirk.eddelbuettel.com/papers/useR2019_rcpp_tutorial.pdf#page=56
5 Types of Vectors
In R everything is a vector
// Vector // [[Rcpp::export]] double getMax(NumericVector v) { int n = v.size(); // vectors are describing double m = v[0]; // initialize for (int i=0; i<n; i++) { if v[i] > m { Rcpp::Rcout << ”Now ” << m << std::endl; m = v[i]; } } return(m); } // Matrix // [[Rcpp::export]] Rcpp::NumericVector colSums(Rcpp::NumericMatrix mat) { size_t cols = mat.cols(); Rcpp::NumericVector res(cols); for (size_t i=0; i<cols; i++) { res[i] = sum(mat.column(i)); } return(res); }
STL vectors (it copies data)
cppFunction("double getMax2(std::vector<double> v) { int n = v.size(); // vectors are describing double m = v[0]; // initialize for (int i=0; i<n; i++) { if (v[i] > m) { m = v[i]; } } return(m); }") getMax2(c(4,5,2))
Sugar functions
max()
Packages
- Overview
- Rcpp.package.skeleton("mypackage"). This will create a cpp source file src/rcpp_hello_world.cpp.
- Run Rcpp::compileAttributes() which generates src/RcppExports.cpp and R/RcppExports.R (do not edit by hand)
- Edit DESCRIPTION
- optional: Makevars and Makevars.win
- NAMESPACE
- rcpp_hello_world.Rd
- mypackage-package.Rd
- Writing a package that uses Rcpp vignette
- Thirteen Simple Steps for Creating An R Package with an External C++ Library
- Read mypackage/Read-and-delete-me
- The new package can be installed by devtools::load_all() or install.package("mypackage", repos = NULL, type = "source")
- From Advanced R. Before building the package, you’ll need to run Rcpp::compileAttributes(). This function scans the C++ files for Rcpp::export attributes and generates the code required to make the functions available in R. Re-run compileAttributes() whenever functions are added, removed, or have their signatures changed. This is done automatically by the devtools package and by Rstudio.
- Examples
- RcppExamples
- RcppArmadillo
- princurve
- Make an R package with C++ code (playlist in youtube) by Toby Hocking.
Armadillo
Machine Learning
Examples
Example 1. convolution example
First, Rcpp package should be installed (I am working on Linux system). Next we try one example shipped in Rcpp package.
PS. If R was not available in global environment (such as built by ourselves), we need to modify 'Makefile' file by replacing 'R' command with its complete path (4 places).
cd ~/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/examples/ConvolveBenchmarks/ make R
Then type the following in an R session to see how it works. Note that we don't need to issue library(Rcpp) in R.
dyn.load("convolve3_cpp.so") x <- .Call("convolve3cpp", 1:3, 4:6) x # 4 13 28 27 18
If we have our own cpp file, we need to use the following way to create dynamic loaded library file. Note that the character (grave accent) ` is not (single quote)'. If you mistakenly use ', it won't work.
export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"` export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"` R CMD SHLIB xxxx.cpp
Example 2. Use together with inline package
- Straight, curly, or compiled? linked from Where can I learn how to write C code to speed up slow R functions?
- Fortran 77/95 (cfunction)
library(inline) x <- as.numeric(1:10) n <- as.integer(10) code2 <- " integer i do 1 i=1, n 1 x(i) = x(i)**3 " cubefn2 <- cfunction(signature(n="integer", x="numeric"), implicit = "none", dim = c("", "(*)"), code2, convention=".Fortran") cubefn2(n, x)$x # [1] 1 8 27 64 125 216 343 512 729 1000
library(inline) src <-' Rcpp::NumericVector xa(a); Rcpp::NumericVector xb(b); int n_xa = xa.size(), n_xb = xb.size(); Rcpp::NumericVector xab(n_xa + n_xb - 1); for (int i = 0; i < n_xa; i++) for (int j = 0; j < n_xb; j++) xab[i + j] += xa[i] * xb[j]; return xab; ' fun <- cxxfunction(signature(a = "numeric", b = "numeric"), src, plugin = "Rcpp") fun(1:3, 1:4) # [1] 1 4 10 16 17 12
- Simulate VAR(1) data. See Section 1.3.3 C++ Solution of the book 'Seamless R and C++ Integration with Rcpp'.
a <- matrix(c(0.5, 0.1, 0.1, 0.5), nrow = 2) u <- matrix(rnorm(10000), ncol=2) code <- ' arma::mat coeff = Rcpp::as<arma::mat> (a); arma::mat errors = Rcpp::as<arma::mat>(u); int m = errors.n_rows; int n = errors.n_cols; arma::mat simdata(m, n); simdata.row(0) = arma::zeros<arma::mat>(1, n); for (int row=1; row<m; row++) { simdata.row(row) = simdata.row(row-1)*trans(coeff) + errors.row(row); } return Rcpp::wrap(simdata); ' rcppSim <- inline::cxxfunction(signature(a = "numeric", u = "numeric"), code, plugin = "RcppArmadillo") rcppData <- rcppSim(a, u) rSim <- function(coeff, errors) { simdata <- matrix(0, nrow(errors), ncol(errors)) for(row in 2:nrow(errors)) { simdata[row, ] <- coeff %*% simdata[(row-1), ] + errors[row, ] } return(simdata) } rData <- rSim(a, u) > benchmark(rcppSim(a, u), rSim(a, u), columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"), order= "relative") test replications elapsed relative user.self sys.self 1 rcppSim(a, u) 100 0.026 1.000 0.022 0.004 2 rSim(a, u) 100 1.557 59.885 1.557 0.000
Example 3. Calling Rmath functions, random number generation
- Please DO NOT USE rand(). Doing so will kick your package off CRAN too should you submit it.
- Timing normal RNGs
- Using Functions from Rmath.h
- http://dirk.eddelbuettel.com/papers/rcpp_sydney-rug_jul2013.pdf#page=33
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericVector mypnorm(NumericVector x) { int n = x.size(); NumericVector y(n); for (int i=0; i<n; i++) y[i] = R::pnorm(x[i], 0.0, 1.0, 1, 0); return y; }
sourceCpp("code/using-rmath-rng.cpp") x <- seq(-2, 2) mypnorm(x) pnorm(x)
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericMatrix rngCpp(const int N) { NumericMatrix X(N, 4); X(_, 0) = runif(N); X(_, 1) = rnorm(N); X(_, 2) = rt(N, 5); X(_, 3) = rbeta(N, 1, 1); return X; }
set.seed(42) # setting seed M1 <- rngCpp(5) M1 set.seed(42) # resetting seed M2 <- cbind( runif(5), rnorm(5), rt(5, 5), rbeta(5, 1, 1)) M2 all.equal(M1, M2)
Example 4: Bootstrap
See Rcpp introduction vignette.
R version
# Function declaration bootstrap_r <- function(ds, B = 1000) { # Preallocate storage for statistics boot_stat <- matrix(NA, nrow = B, ncol = 2) # Number of observations n <- length(ds) # Perform bootstrap for(i in seq_len(B)) { # Sample initial data gen_data <- ds[ sample(n, n, replace=TRUE) ] # Calculate sample data mean and SD boot_stat[i,] <- c(mean(gen_data), sd(gen_data)) } # Return bootstrap result return(boot_stat) }
Rcpp version
#include <Rcpp.h> // Function declaration with export tag // [[Rcpp::export]] Rcpp::NumericMatrix bootstrap_cpp(Rcpp::NumericVector ds, int B = 1000) { // Preallocate storage for statistics Rcpp::NumericMatrix boot_stat(B, 2); // Number of observations int n = ds.size(); // Perform bootstrap for(int i = 0; i < B; i++) { // Sample initial data Rcpp::NumericVector gen_data = ds[ floor(Rcpp::runif(n, 0, n)) ]; // Calculate sample mean and std dev boot_stat(i, 0) = mean(gen_data); boot_stat(i, 1) = sd(gen_data); } // Return bootstrap results return boot_stat; }
Comparison
# Set seed to generate data set.seed(512) # Generate data initdata <- rnorm(1000, mean = 21, sd = 10) # Use the same seed in R and C++ set.seed(883) # Perform bootstrap with C++ function result_cpp <- bootstrap_cpp(initdata) # Set a new _different_ seed for bootstrapping set.seed(883) # Perform bootstrap result_r <- bootstrap_r(initdata) # Compare output all.equal(result_r, result_cpp) library(rbenchmark) benchmark(r = bootstrap_r(initdata), cpp = bootstrap_cpp(initdata))[, 1:4]
Example 5: Calling R functions from C++
https://gallery.rcpp.org/articles/r-function-from-c++/
Example 6: Using Rcout for output synchronised with R
https://gallery.rcpp.org/articles/using-rcout/
Example 7: Passing user-supplied C++ functions
https://gallery.rcpp.org/articles/passing-cpp-function-pointers/
Example 8: Matrix cross-distances using Rcpp
Matrix cross-distances using Rcpp
Example 9: RcppExamples package
There is no need to install it. Download the source and use Rcpp to run the examples.
- RNGs.cpp
> library(Rcpp) > sourceCpp("~/Downloads/RcppExamples/src/RNGs.cpp") > set.seed(42) > x <- RcppRNGs(5L) rnorm rt rpois 1 1.3709584 -0.05492941 2 2 -0.5646982 -0.03887139 1 3 0.3631284 1.99724535 1 4 0.6328626 0.97864627 1 5 0.4042683 -216.32581066 0 > set.seed(42) > y <- data.frame(rnorm=rnorm(5), rt =rt(5,1), rpois=rpois(5,1)) > all.equal(x, y)
- MatrixExample.cpp. This uses STL transform() algorithm. The cpp needs to 'include' <cmath>.
> sourceCpp("~/Downloads/RcppExamples/src/MatrixExample.cpp") > M <- matrix((1:16)^2, 4) > MatrixExample(M)
- NumericVectorExample.cpp. This uses STL transform() algorithm. The cpp needs to 'include' <cmath>.
> sourceCpp("~/Downloads/RcppExamples/src/NumericVectorExample.cpp") > NumericVectorExample(seq(1,9)^2)
- StringVectorExample.cpp. This uses STL transform() algorithm. No extra needs to be 'included'. It transforms characters to lower case.
> sourceCpp("~/Downloads/RcppExamples/src/StringVectorExample.cpp") > StringVectorExample(c("Tick", "Tack", "Tock"))
- DataFrameExample.cpp
> sourceCpp("~/Downloads/RcppExamples/src/DataFrameExample.cpp") > D <- data.frame(a=1:3, b=LETTERS[1:3], c=as.Date("2011-01-01")+0:2, stringsAsFactors=FALSE) > val <- DataFrameExample(D)
- RcppListExample.cpp
> sourceCpp("~/Downloads/RcppExamples/src/ListExample.cpp") > params <- list(method='BFGS', tolerance=1.0e-5, maxIter=100, startDate=as.Date('2006-7-15')) # call the underlying C++ function > result <- ListExamples(params)
Some packages that import Rcpp
- CRAN usage
- Rtsne; see Statistics -> t-SNE.
- batchelor and fastMNN for batch effect correction.
- monocle3 - an analysis toolkit for single-cell RNA-Seq experiments.
- Harmony - Fast, sensitive and accurate integration of single-cell data
- DropletUtils and scuttle
- CDSeq: A Complete Deconvolution Method using Sequencing Data
rcpp NumericMatrix Examples
C++ (Cpp) NumericMatrix Examples
Rcpp sugar
- Chapter 8 of Rcpp book
- Rcpp syntactic sugar
- Chapter 21 R-like functions from Rcpp for everyone.
The motivation of Rcpp sugar is to bring a subset of the high-level R syntax in C++. For example, consider a simple R function
foo <- function(x, y) { ifelse (x < y, x*x, -(y*y) ) }
With Rcpp instead of writing
RcppExport SEXP foo( SEXP x, SEXP y){ Rcpp::NumericVector xx(x), yy(y) ; int n = xx.size() ; Rcpp::NumericVector res( n ) ; double x_ = 0.0, y_ = 0.0 ; for( int i=0; i<n; i++){ x_ = xx[i] ; y_ = yy[i] ; if( x_ < y_ ){ res[i] = x_ * x_ ; } else { res[i] = -( y_ * y_) ; } } return res ; }
, we can use the ifelse function from R
RcppExport SEXP foo( SEXP xs, SEXP ys){ Rcpp::NumericVector x(xs) ; Rcpp::NumericVector y(ys) ; return Rcpp::wrap( ifelse( x < y, x*x, -(y*y) )) ; }
Operators
- Binary arithmetic operators,
NumericVector x; NumericVector y; NumericVector res = x + y; NumericVector res = x * y; // element-wise multiplication NumericVector res = x + 2.0; NumericVector res = x * y + 7 / 2.0; // two expressions
- Binary logical operators: <, <=, >=, ==, !=
- Unary operators
NumericVector res = -x; LogicalVector res = ! x;
Functions
- Functions producing a single logical result
IntegerVector x = seq_len( 1000 ); all( x*x < 3 ); // Respect the concept of missing values (NA) in R, // expression generated by 'any' or 'all' cannot be converted // directly to 'bool'. Instead one must use 'is.true', // 'if_false' or 'is_na' book res = is_true( any (x < y) );
- Functions producing sugar expressions
is_na seq_along seq_len pmin and pmax ifelse sapply lapply mapply sign diff setdiff union intersect clamp unique sort_unique table duplicated
- Mathematical functions
NumericVector x, y; int k; double z; abs( x ) exp( x ) floor( x ) ceil( x ) pow(x, z) log(x); log10(x); sqrt(x); sin(x); cos(x); tan(x); sinh(x); cosh(x); tanh(x); asin(x); acos(x); atan(x); gamma(x); lgamma(x); digamma(x); trigamma(x); factorial(x); choose(n, k); beta(n, k); trunc(x); round(x, k); signif(x,k) mean(x); var(x); sd(x); sum(x); cumsum(x); min(x), max(x); range(x); which_min(x); which_max(x); setequal(x, y);
- The d/q/p/r statistical functions
NumericVector y1, y2, y3; x1 = dnorm(y1, 0, 1); x2 = pnorm(y2, 0, 1); x3 = qnorm(y3, 0, 1); x4 = rnorm(n, 0, 1); // Initialize the state of the random number generator RcppExport SEXP getRGamma() { RNGScope scope; NumericVector x = rgamma( 10, 1, 1 ); return x; }
Implementation
- The curiously recurring template pattern
- The VectorBase class
- Example : sapply
Tips
Use R functions
- Calling R Functions from C++
testCode <- '#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericVector callFunction(NumericVector x, Function f) { NumericVector res = f(x); return res; }' Rcpp::sourceCpp(code = testCode) set.seed(42) x <- rnorm(1e5) fivenum(x) # [1] -4.043276349 -0.682384496 -0.002066374 0.673324712 4.328091274 callFunction(x, fivenum) # [1] -4.043276349 -0.682384496 -0.002066374 0.673324712 4.328091274
- Chapter 23 Using R functions from Rcpp for everyone
Sparse matrix
- Constructing a Sparse Matrix Class in Rcpp. See my note here.
Using Fortran in C++
Extending R with C++ and Fortran
RcppParallel
rTRNG: parallel RNG in R
rTRNG: ADVANCED PARALLEL RNG IN R
RcppAnnoy
https://github.com/eddelbuettel/rcppannoy
Annoy library is a small and lightweight C++ template header library for very fast approximate nearest neighbors - originally developed to drive the famous Spotify music discovery algorithm.
RcppEigen
- RcppEigen, README, SelfAdjointEigenSolver() documentation
- Eigen documentation, Getting started
$ nano test.cpp $ tar -xzvf eigen-3.4.0.tar.gz $ g++ -I eigen-3.4.0/ test.cpp -o testeigen $ ./testeigen 3 -1 2.5 1.5
- C++行列計算ライブラリEigen入門
- Printing Rcpp: Combining Vectors into Matrix and printing rows using Rcout Rf_PrintValue(mat) ; OR << operator.
SVD
- Chapter 20 Integrating with C++ using Rcpp and RcppEigen
- https://cran.r-project.org/web/packages/RcppArmadillo/news.html
- Eigen::JacobiSVD
- RcppEigen svd is very slow
- On my mac, (svd, svdArma, svdEigen) = (6.3, 3.0, 37.0) seconds for a 1000x1000 matrix
- On my mac, (svd, svdArma, svdEigen) = (.002, .056, .002) seconds for a 1000x20 matrix. Note the u or v matrices returned from svdArma is a squared matrix? When I increase the number of obs to 10,000, svd is faster than svdEigen (and svdArma).
- On my mac, (svd, svdArma, svdEigen) = (.006, .048, .002) seconds for a 20x1000 matrix.
library(inline) codeArma=' arma::mat m = Rcpp::as<arma::mat>(m_); arma::mat u; arma::vec s; arma::mat v; arma::svd(u,s,v,m); return List::create( Rcpp::Named("u")=u, Rcpp::Named("d")=s, Rcpp::Named("v")=v ); ' svdArma <- cxxfunction(signature(m_="matrix"),codeArma, plugin="RcppArmadillo") #----------------------------------------------------------------------- codeEigen=' const Eigen::Map<Eigen::MatrixXd> m (as<Eigen::Map<Eigen::MatrixXd> >(m_ )); Eigen::JacobiSVD <Eigen::MatrixXd>svd(m, Eigen::ComputeThinU|Eigen::ComputeThinV); return List::create( Rcpp::Named("u")=svd.matrixU(), Rcpp::Named("d")=svd.singularValues(), Rcpp::Named("v")=svd.matrixV() ); ' svdEigen <- cxxfunction(signature(m_="matrix"), codeEigen, plugin="RcppEigen") #------------------------------------------------------------------------ n<-1000 m<-matrix(rnorm(n*n),n,n) system.time(s1<-svd(m)) # R m1<-s1$u %*% diag(s1$d) %*% t(s1$v) all.equal(m,m1) system.time(s2<-svdArma(m)) # Armadillo m2<-s2$u %*% diag(array(s2$d)) %*% t(s2$v) all.equal(m,m2) system.time(s3<-svdEigen(m)) # Eigen m3<-s3$u %*% diag(s3$d) %*% t(s3$v) all.equal(m,m3)
RcppArmadillo
- https://cran.r-project.org/web/packages/RcppArmadillo/index.html
- https://dirk.eddelbuettel.com/code/rcpp.armadillo.html
- Armadillo documentation
- COMMON OPERATIONS WITH RCPPARMADILLO
- It seems arma::vec is a column (not a row).
- Example code:
SVD
- Armadillo doc
- Performance of the divide-and-conquer SVD algorithm
testCode <- '#include <RcppArmadillo.h> // [[Rcpp::depends(RcppArmadillo)]] // [[Rcpp::export]] arma::mat baseSVD(const arma::mat & X) { arma::mat U, V; arma::vec S; arma::svd(U, S, V, X, "standard"); return U; } // [[Rcpp::export]] arma::mat dcSVD(const arma::mat & X) { arma::mat U, V; arma::vec S; arma::svd(U, S, V, X, "dc"); return U; }' Rcpp::sourceCpp(code = testCode) x <- matrix(rnorm(1000*20), 1000) # 1000 x 20 library(microbenchmark) microbenchmark(baseSVD(x), dcSVD(x), svd(x) # Unit: milliseconds # expr min lq mean median uq max neval cld # baseSVD(x) 35.074119 42.998798 52.654122 45.433862 51.203433 105.358475 100 b # dcSVD(x) 36.533591 42.863185 51.891296 44.923845 50.798086 175.189363 100 b # svd(x) 1.253838 1.536588 1.748942 1.610097 1.791908 4.349285 100 a system.time(baseSVD(x)); system.time(svd(x)) # similar result for a 20 x 1000 matrix # user system elapsed # 0.093 0.007 0.101 # user system elapsed # 0.004 0.000 0.004
- Differences between RcppEigen and RcppArmadillo
Moore-penrose inverse
https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse
set.seed(1) y <- matrix(rnorm(20),4) rankMatrix(var(y)) solve(var(y)) # Error in solve.default(var(y)) : # system is computationally singular: reciprocal condition number = 6.53869e-18 MASS::ginv(var(y))
pinv() from RcppArmadillo: Easily Extending R with High-Performance C++ Code pdf
testCode <- '#include <RcppArmadillo.h> // [[Rcpp::depends(RcppArmadillo)]] // [[Rcpp::export]] arma::mat arma_pinv(const arma::mat & X) { return arma::pinv(X);; } ' Rcpp::sourceCpp(code = testCode) set.seed(1) y <- matrix(rnorm(20),4) arma_pinv(var(y)) all.equal(MASS::ginv(var(y)), arma_pinv(var(y))) # TRUE
Mahalanobis distance
- Statistics -> Mahalanobis distance
- My first attempt of using Rcpp and RcppArmadillo. Fibonacci sequence, Matrix multiplication (element-wise), Matrix determinant and inversion (Access R functions from C++), Rcpp syntactic sugar, Matrix multiplication.
cpp11
- https://cpp11.r-lib.org/
- Pathway to success - updating your package to cpp11
- A step by step guide to write an R package that uses C++ code (Ubuntu)