Reproducible: Difference between revisions

From 太極

Revision as of 22:57, 21 December 2019

Common Workflow Language (CWL)

R

Rcwl package

Rmarkdown

Rmarkdown package

packrat

R packages → packrat

dockr package

'dockr': easy containerization for R

Docker & Singularity

Docker

Misc

  • 4 great free tools that can make your R work more efficient, reproducible and robust
  • digest: Create Compact Hash Digests of R Objects
  • memoise: Memoisation of Functions. Great for Shiny applications, but you need to understand how it works to take advantage of it. I modified the example from Efficient R by moving the data out of the function, so the data frame is passed in as an argument. The cache takes effect on the 2nd call with the same arguments. I don't use the benchmark() function because it repeats the same call many times, which favors memoise and masks the cost of the first (uncached) call.
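    library(ggplot2) # provides the mpg data set
    library(memoise)
    # The data frame is passed in as an argument instead of being read inside the function
    plot_mpg2 <- function(mpgdf, row_to_remove) {
      mpgdf <- mpgdf[-row_to_remove, ]
      plot(mpgdf$cty, mpgdf$hwy)
      lines(lowess(mpgdf$cty, mpgdf$hwy), col = 2)
    }
    m_plot_mpg2 <- memoise(plot_mpg2)
    # 1st memoised call: the function runs and the result is cached
    system.time(m_plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.019   0.003   0.025
    # Plain function, for comparison
    system.time(plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.018   0.003   0.024
    # 2nd memoised call with the same arguments: served from the cache,
    # so the function body (including the plotting) is not run again
    system.time(m_plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.000   0.000   0.001
    # The plain function does the full work every time
    system.time(plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.032   0.008   0.047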
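    Be careful when using it in simulations: a memoised function returns the cached result, so repeated calls with the same arguments do not draw new random numbers.
    f <- function(n = 1e5) {
      a <- rnorm(n)
      a
    }
    system.time(f1 <- f())
    mf <- memoise::memoise(f)
    system.time(f2 <- mf()) # 1st call: draws a sample and caches it
    system.time(f3 <- mf()) # 2nd call: returns the cached sample
    all.equal(f2, f3) # TRUE, i.e. no new random numbers were generated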
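One possible way to keep memoise usable in simulation code is to make the seed an explicit argument, so that it becomes part of the cache key. This is only a minimal sketch, assuming the memoise package is installed; sim_norm and its seed argument are hypothetical names introduced for illustration, not part of the original example.

```r
library(memoise)

# Hypothetical simulation function: the seed is an argument, so each
# distinct seed gets its own cache entry.
sim_norm <- function(n = 1e5, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  rnorm(n)
}
m_sim_norm <- memoise(sim_norm)

x1 <- m_sim_norm(seed = 1)  # computed and cached
x2 <- m_sim_norm(seed = 1)  # same arguments: served from the cache
x3 <- m_sim_norm(seed = 2)  # different arguments: a new draw

identical(x1, x2)  # TRUE
identical(x1, x3)  # FALSE

# forget() clears the cache, so the next call recomputes
forget(m_sim_norm)
```

With this design, a cached result is reused only when the caller explicitly asks for the same seed, so the caching is intentional rather than a surprise.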
  • reproducible: A Set of Tools that Enhance Reproducibility Beyond Package Management
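For the digest entry in the list above, the following sketch shows how a hash can be used to check whether two R objects have the same content. It assumes the digest package is installed; the data frames are made up for illustration and the choice of SHA-256 is arbitrary.

```r
library(digest)

# Illustrative objects (not from the original page)
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"))
df2 <- df1                         # an identical copy
df3 <- transform(df1, x = x + 1L)  # same structure, different values

h1 <- digest(df1, algo = "sha256")
h2 <- digest(df2, algo = "sha256")
h3 <- digest(df3, algo = "sha256")

h1 == h2   # TRUE: identical objects give the same hash
h1 == h3   # FALSE: changing the content changes the hash
nchar(h1)  # 64: a SHA-256 digest is 64 hexadecimal characters
```

Storing such a hash next to a saved result is one simple way to detect later whether the input data has changed.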