Reproducible
Revision as of 19:24, 19 March 2021
Common Workflow Language (CWL)
- https://www.commonwl.org/
- Workflow systems turn raw data into scientific knowledge. Related tools and concepts: pipelines, Snakemake, Docker, Galaxy, Python, Conda, Workflow Definition Language (WDL), Nextflow. The best practice is to embed the workflow in a container; see "Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics" by Baichoo et al. 2018.
R
- Rcwl package
- Reproducible Research: What to do When Your Results Can’t Be Reproduced. Three danger zones (a minimal R sketch for recording this context follows the list):
  - R session context
    - R version
    - Package versions
    - Using set.seed() for reproducible randomization
    - Floating point accuracy
  - Operating System (OS) context
    - System package versions
    - System locale
    - Environment variables
  - Data versioning
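A minimal sketch, using only base R functions, of how this context can be recorded alongside an analysis; the output file name session_context.txt is just an example.
 set.seed(2021)                               # fix the random number stream
 x <- rnorm(5)                                # now reproducible across runs

 sink("session_context.txt")                  # illustrative file name
 print(sessionInfo())                         # R version and attached package versions
 print(Sys.getlocale())                       # system locale
 print(Sys.getenv(c("LANG", "R_LIBS_USER")))  # selected environment variables
 print(.Machine$double.eps)                   # floating point precision of this build
 sink()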
Rmarkdown
Rmarkdown package
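A minimal sketch of rendering a self-contained R Markdown report from R; the file name report.Rmd and its contents are made up for illustration. Putting sessionInfo() in the last chunk records the package versions inside the rendered report itself.
 lines <- c(
   "---",
   "title: \"Reproducible report\"",
   "output: html_document",
   "---",
   "",
   "```{r}",
   "set.seed(2021)",
   "summary(rnorm(100))",
   "sessionInfo()  # record R and package versions in the rendered report",
   "```"
 )
 writeLines(lines, "report.Rmd")
 rmarkdown::render("report.Rmd")  # produces report.html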
packrat and renv
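A sketch of the basic renv workflow (renv supersedes packrat); these calls are the package's documented entry points, but the surrounding project setup is assumed.
 renv::init()      # create a project-private library and renv.lock
 # ... install packages and develop the analysis as usual ...
 renv::snapshot()  # record exact package versions in renv.lock
 # Later, or on another machine, after obtaining the project:
 renv::restore()   # reinstall the package versions recorded in renv.lock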
checkpoint
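A sketch of the checkpoint idea: pin package installation to the CRAN snapshot of a fixed date. The date below is arbitrary, and the argument name has changed between checkpoint releases, so treat the call as an illustration.
 library(checkpoint)
 checkpoint("2021-03-19")  # use the CRAN snapshot from this date for the project
 library(ggplot2)          # loads the version that was current on the snapshot date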
dockr package
'dockr': easy containerization for R
Docker & Singularity
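A hedged sketch of running an analysis script inside a version-pinned R container from R; the rocker/r-ver tag and the script name analysis.R are only examples.
 # Mount the working directory and run the script with a fixed R version.
 system("docker run --rm -v $(pwd):/work -w /work rocker/r-ver:4.0.4 Rscript analysis.R")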
targets package
targets: Democratizing Reproducible Analysis Pipelines, by Will Landau
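A minimal targets pipeline sketch; the target names (raw, fit) and the toy model are invented for illustration.
 # _targets.R -- the pipeline definition read by tar_make()
 library(targets)
 list(
   tar_target(raw, airquality),                   # data target
   tar_target(fit, lm(Ozone ~ Wind + Temp, raw))  # model target, depends on raw
 )

 # In the R console:
 # targets::tar_make()     # build only the targets that are outdated
 # targets::tar_read(fit)  # load a finished target from the store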
Snakemake
- Hypercluster: a flexible tool for parallelized unsupervised clustering optimization
- https://snakemake.readthedocs.io/en/stable/tutorial/setup.html#run-tutorial-for-free-in-the-cloud-via-gitpod
- https://hpc.nih.gov/apps/snakemake.html
Papers
High-throughput analysis suggests differences in journal false discovery rate by subject area and impact factor but not open access status (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03817-7)
Share your code and data
zenodo.org (https://zenodo.org/), which has been used by Demystifying "drop-outs" in single-cell UMI data (https://zenodo.org/record/3926915)
Misc
- 4 great free tools that can make your R work more efficient, reproducible and robust
- digest: Create Compact Hash Digests of R Objects
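A small sketch of using digest to fingerprint a result so that a later run can check it reproduces the same object; the choice of model and hash algorithm is illustrative.
 library(digest)
 fit <- lm(mpg ~ wt, data = mtcars)  # some analysis result
 digest(coef(fit), algo = "sha256")  # compact hash of the coefficients
 # Store the hash with the results; recompute it in a later run and compare
 # to confirm the analysis still yields the same numbers.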
- memoise: Memoisation of Functions. Great for Shiny applications, but you need to understand how it works in order to take advantage of it. The example below modifies the one from Efficient R by moving the data out of the function; the cache kicks in on the 2nd call of the memoised version. benchmark() is not used here because it repeats the same operation each time, which would favor memoise and hide this detail.
 library(ggplot2)   # provides the mpg data set
 library(memoise)

 plot_mpg2 <- function(mpgdf, row_to_remove) {
   mpgdf <- mpgdf[-row_to_remove, ]
   plot(mpgdf$cty, mpgdf$hwy)
   lines(lowess(mpgdf$cty, mpgdf$hwy), col = 2)
 }
 m_plot_mpg2 <- memoise(plot_mpg2)

 system.time(m_plot_mpg2(mpg, 12))
 #  user  system elapsed
 # 0.019   0.003   0.025
 system.time(plot_mpg2(mpg, 12))
 #  user  system elapsed
 # 0.018   0.003   0.024
 system.time(m_plot_mpg2(mpg, 12))   # 2nd memoised call: served from the cache
 #  user  system elapsed
 # 0.000   0.000   0.001
 system.time(plot_mpg2(mpg, 12))     # non-memoised version recomputes
 #  user  system elapsed
 # 0.032   0.008   0.047
- And be careful when memoise is used in simulations: the memoised function returns the cached result instead of drawing new random numbers, so repeated calls give identical output (all.equal() below returns TRUE).
 f <- function(n = 1e5) {
   a <- rnorm(n)
   a
 }
 system.time(f1 <- f())

 mf <- memoise::memoise(f)
 system.time(f2 <- mf())
 system.time(f3 <- mf())   # served from the cache, no new random draws
 all.equal(f2, f3)
 # TRUE
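If a fresh evaluation is needed, memoise provides forget() to clear the cache; a short sketch continuing the example above:
 memoise::forget(mf)  # drop the cached value
 f4 <- mf()           # re-runs rnorm(), producing new random numbers
 all.equal(f2, f4)    # no longer TRUE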
- reproducible: A Set of Tools that Enhance Reproducibility Beyond Package Management
- Improving reproducibility in computational biology research 2020