Reproducible: Difference between revisions

From 太極


== R ==
* [https://bioconductor.org/packages/release/bioc/html/Rcwl.html Rcwl] package
* [https://appsilon.com/reproducible-research-when-your-results-cant-be-reproduced/ Reproducible Research: What to do When Your Results Can’t Be Reproduced]. The article lists 3 danger zones:
** R session context
*** R version
*** Package versions
*** Using set.seed() for a reproducible randomization
*** Floating point accuracy
** Operating System (OS) context
*** System package versions
*** System locale
*** Environment variables
** Data versioning
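
Several of the items above can be checked from inside R. The following is a minimal sketch using base R only; which packages and environment variables to record (here <code>stats</code>, <code>utils</code>, <code>LANG</code>, <code>TZ</code>) is an assumption for illustration, not something taken from the linked article.

```r
# R session context: R version and package versions
r_version <- R.version.string
pkg_versions <- sapply(c("stats", "utils"),
                       function(p) as.character(packageVersion(p)))

# set.seed() makes a random draw repeatable within the same R setup
set.seed(123)
x1 <- runif(3)
set.seed(123)
x2 <- runif(3)
same_draws <- identical(x1, x2)                   # TRUE

# Floating point accuracy: compare with a tolerance, not with ==
exact_equal <- (0.1 + 0.2 == 0.3)                 # FALSE
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE

# OS context: locale and selected environment variables
locale <- Sys.getlocale()
env_vars <- Sys.getenv(c("LANG", "TZ"))

# Full summary of R version, OS, locale and attached packages
sessionInfo()
```

Saving the output of <code>sessionInfo()</code> alongside the results is a simple way to keep a record of the session context.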


= Rmarkdown =

Revision as of 19:47, 23 July 2020

Common Workflow Language (CWL)

R

Rmarkdown

Rmarkdown package

packrat

R packages → packrat

checkpoint

R → Reproducible Research

dockr package

'dockr': easy containerization for R

Docker & Singularity

Docker

Misc

  • 4 great free tools that can make your R work more efficient, reproducible and robust
  • digest: Create Compact Hash Digests of R Objects
  • memoise: Memoisation of Functions. Great for Shiny applications, but you need to understand how it works to take advantage of it. The example below is modified from Efficient R by moving the data out of the function and passing it in as an argument. The cache takes effect on the 2nd call. I don't use the benchmark() function because it performs the same operation each time, which favors memoise and masks some detail.
    library(ggplot2) # provides the mpg data set
    library(memoise)
    # The data frame is an argument, so it is part of the cache key
    plot_mpg2 <- function(mpgdf, row_to_remove) {
      mpgdf <- mpgdf[-row_to_remove, ]
      plot(mpgdf$cty, mpgdf$hwy)
      lines(lowess(mpgdf$cty, mpgdf$hwy), col = 2)
    }
    m_plot_mpg2 <- memoise(plot_mpg2)
    # 1st memoised call: runs the function and fills the cache
    system.time(m_plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.019   0.003   0.025
    # Original function: never cached
    system.time(plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.018   0.003   0.024
    # 2nd memoised call: served from the cache, so the plot is not redrawn
    system.time(m_plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.000   0.000   0.001
    system.time(plot_mpg2(mpg, 12))
    #   user  system elapsed
    #  0.032   0.008   0.047
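
To see how the cache behaves, it can help to count how often the wrapped function really runs. This is a minimal sketch; <code>slow_square</code> and the <code>calls</code> counter are hypothetical names made up for illustration.

```r
library(memoise)

calls <- 0
# Hypothetical function with a side effect that counts real executions
slow_square <- function(x) {
  calls <<- calls + 1
  x^2
}
m_square <- memoise(slow_square)

r1 <- m_square(3)  # new argument: the body runs
r2 <- m_square(3)  # same argument: cached value, the body is skipped
r3 <- m_square(4)  # different argument: the body runs again
calls              # 2
```

The counter only increases when the arguments change, which also shows that side effects (plots, messages, file writes) are not repeated on a cache hit.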
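Be careful when memoise is used in a simulation: a memoised function called with the same arguments returns the cached result, so no new random numbers are drawn.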
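f <- function(n = 1e5) {
  a <- rnorm(n)  # new random draws on every real call
  a
}
system.time(f1 <- f())
mf <- memoise::memoise(f)
system.time(f2 <- mf())  # runs f and caches the result
system.time(f3 <- mf())  # same arguments: cached result, no new draws
all.equal(f2, f3) # TRUE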
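
One possible way around this, sketched below under my own assumptions, is to make the seed an explicit argument so it becomes part of the cache key, or to clear the cache with <code>forget()</code> when fresh draws are needed. The function name <code>sim</code> and its <code>seed</code> argument are hypothetical.

```r
library(memoise)

# Hypothetical simulation function: the seed is an argument,
# so calls with different seeds get different cache entries
sim <- function(n = 1e5, seed = 1) {
  set.seed(seed)
  rnorm(n)
}
msim <- memoise(sim)

x1 <- msim(seed = 1)
x2 <- msim(seed = 1)       # cache hit: same values as x1
x3 <- msim(seed = 2)       # different seed: new draws
same_seed <- identical(x1, x2)   # TRUE
diff_seed <- identical(x1, x3)   # FALSE

# forget() clears the cache, so the next call runs sim() again
forget(msim)
```

With this design a repeated call is both cached and reproducible, while a new replicate only requires a new seed.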