Revision as of 19:47, 23 July 2020

Common Workflow Language (CWL)

https://www.commonwl.org/
Workflow systems turn raw data into scientific knowledge. Related tools and keywords: pipeline, Snakemake, Docker, Galaxy, Python, Conda, Workflow Definition Language (WDL), Nextflow. The best approach is to embed the workflow in a container; see "Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics" by Baichoo 2018.

R

Rcwl package
Reproducible Research: What to do When Your Results Can’t Be Reproduced. The article identifies 3 danger zones:
- R session context
  - R version
  - Package versions
  - Using set.seed() for a reproducible randomization
  - Floating point accuracy
- Operating System (OS) context
  - System package versions
  - System locale
  - Environment variables
- Data versioning
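
The danger zones listed above can be checked with base R alone. The following is a minimal sketch, not taken from the linked article: the names ctx, same_draws, exact_equal, near_equal and data_md5 are illustrative, and the temporary CSV file stands in for a real data set.

```r
# R session context: record the R version and loaded package versions.
ctx <- list(
  r_version = R.version.string,
  packages  = vapply(loadedNamespaces(),
                     function(p) as.character(utils::packageVersion(p)),
                     character(1)),
  # OS context: platform, locale and one environment variable.
  platform  = R.version$platform,
  locale    = Sys.getlocale(),
  env_lang  = Sys.getenv("LANG", unset = NA)
)

# Reproducible randomization: the same seed gives the same draws.
set.seed(42)
x1 <- runif(3)
set.seed(42)
x2 <- runif(3)
same_draws <- identical(x1, x2)

# Floating point accuracy: compare with a tolerance, not with ==.
exact_equal <- (0.1 + 0.2 == 0.3)
near_equal  <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Data versioning: store a checksum of the input file with the results.
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)
data_md5 <- unname(tools::md5sum(tmp))
```

Saving ctx and data_md5 next to the analysis output makes it possible to tell later whether a differing result comes from the code, the environment or the data.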

Rmarkdown

Rmarkdown package

packrat

R packages → packrat

checkpoint

R → Reproducible Research

dockr package

'dockr': easy containerization for R

Docker & Singularity

Docker

Misc

4 great free tools that can make your R work more efficient, reproducible and robust
digest: Create Compact Hash Digests of R Objects
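
A hash digest gives a compact fingerprint of an R object, which is one way to detect that data changed between runs. A minimal sketch, assuming the digest package is installed; the object names h1, h2, h3 and changed are illustrative only.

```r
library(digest)

# The same object always produces the same digest.
h1 <- digest(mtcars, algo = "sha256")
h2 <- digest(mtcars, algo = "sha256")

# Changing a single value produces a different digest.
changed <- mtcars
changed$mpg[1] <- changed$mpg[1] + 1
h3 <- digest(changed, algo = "sha256")

identical(h1, h2)  # same data, same hash
identical(h1, h3)  # modified data, different hash
```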

memoise: Memoisation of Functions. Great for Shiny applications, but you need to understand how it works to take advantage of it. I modified the example from Efficient R by moving the data out of the function and passing it in as an argument. The cache takes effect on the 2nd call. I don't use the benchmark() function because it repeats the same call many times, which favors memoise and masks some detail.

library(ggplot2) # provides the mpg data set
library(memoise)
# The data frame is passed in as an argument, so it is part of the cache key.
plot_mpg2 <- function(mpgdf, row_to_remove) {
  mpgdf <- mpgdf[-row_to_remove, ]
  plot(mpgdf$cty, mpgdf$hwy)
  lines(lowess(mpgdf$cty, mpgdf$hwy), col = 2)
}
m_plot_mpg2 <- memoise(plot_mpg2)
# 1st memoised call: runs the function and caches the result
system.time(m_plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.019   0.003   0.025
system.time(plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.018   0.003   0.024
# 2nd memoised call: served from the cache, the function body is not re-run
system.time(m_plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.000   0.000   0.001
system.time(plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.032   0.008   0.047

Be careful when using it in simulations: a memoised function returns the cached random draws instead of generating new ones.

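f <- function(n=1e5) {
  a <- rnorm(n)
  a
}
system.time(f1 <- f())
mf <- memoise::memoise(f)
system.time(f2 <- mf()) # 1st call: draws random numbers and caches them
system.time(f3 <- mf()) # 2nd call: returns the cached draws
all.equal(f2, f3) # TRUE: no new random numbers were generated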
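
Two ways to keep memoise usable in simulations are sketched below, assuming the memoise package is installed. The function names sim, msim, draw and mdraw are hypothetical. The first makes the seed an explicit argument, so each seed gets its own cache entry; the second clears the cache with memoise::forget() when fresh draws are wanted.

```r
library(memoise)

# Option 1: the seed is an argument, so it becomes part of the cache key.
sim <- function(seed, n = 5) {
  set.seed(seed)
  rnorm(n)
}
msim <- memoise(sim)
a <- msim(1)   # computed and cached
b <- msim(1)   # same seed: served from the cache
d <- msim(2)   # different seed: computed again

# Option 2: clear the cache to force new random draws.
draw  <- function(n) rnorm(n)
mdraw <- memoise(draw)
d1 <- mdraw(5)
d2 <- mdraw(5)   # cached, identical to d1
forget(mdraw)    # empties the cache of mdraw
d3 <- mdraw(5)   # recomputed with new random numbers
```

Option 1 keeps the results reproducible as well as cached, which usually fits simulation studies better than clearing the cache by hand.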

reproducible: A Set of Tools that Enhance Reproducibility Beyond Package Management
Improving reproducibility in computational biology research 2020

@@ Line 4: / Line 4: @@
 == R ==
-[https://bioconductor.org/packages/release/bioc/html/Rcwl.html Rcwl] package
+* [https://bioconductor.org/packages/release/bioc/html/Rcwl.html Rcwl] package
+* [https://appsilon.com/reproducible-research-when-your-results-cant-be-reproduced/ Reproducible Research: What to do When Your Results Can’t Be Reproduced]. 3 danger zones.
+** R session context
+*** R version
+*** Packages versions
+*** Using set.seed() for a reproducible randomization
+*** Floating point accuracy
+** Operating System (OS) context
+*** System packages versions
+*** System locale
+*** Environment variables
+** Data versioning
 = Rmarkdown =

Reproducible: Difference between revisions
