Reproducible: Difference between revisions

From 太極
Revision as of 18:24, 19 March 2021

Common Workflow Language (CWL)

R

Rmarkdown

Rmarkdown package

packrat and renv

R packages → packrat/renv

checkpoint

R → Reproducible Research

dockr package

'dockr': easy containerization for R

Docker & Singularity

Docker

targets package

targets: Democratizing Reproducible Analysis Pipelines, by Will Landau

Snakemake

Papers

High-throughput analysis suggests differences in journal false discovery rate by subject area and impact factor but not open access status (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03817-7)

Share your code and data

zenodo.org (https://zenodo.org/), which has been used by Demystifying "drop-outs" in single-cell UMI data (https://zenodo.org/record/3926915)

Misc

  • 4 great free tools that can make your R work more efficient, reproducible and robust
  • digest: Create Compact Hash Digests of R Objects
  • memoise: Memoisation of Functions. Great for Shiny applications, but you need to understand how it works to take advantage of it. I modified the example from Efficient R by moving the data out of the function, so the data frame is passed in as an argument. The cache takes effect on the 2nd call. I don't use the benchmark() function because it repeats the same operation on every iteration, which favors memoise and masks some detail.
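    library(ggplot2)  # provides the mpg data set
    library(memoise)
    # The data frame is an argument, so it is part of the memoise cache key.
    plot_mpg2 <- function(mpgdf, row_to_remove) {
      mpgdf <- mpgdf[-row_to_remove, ]
      plot(mpgdf$cty, mpgdf$hwy)
      lines(lowess(mpgdf$cty, mpgdf$hwy), col = 2)
    }
    m_plot_mpg2 <- memoise(plot_mpg2)
    system.time(m_plot_mpg2(mpg, 12))  # 1st memoised call: computed, then cached
    #   user  system elapsed
    #  0.019   0.003   0.025
    system.time(plot_mpg2(mpg, 12))    # plain function, always recomputed
    #   user  system elapsed
    #  0.018   0.003   0.024
    system.time(m_plot_mpg2(mpg, 12))  # 2nd memoised call: cache hit, the body (and the plot) is not re-run
    #   user  system elapsed
    #  0.000   0.000   0.001
    system.time(plot_mpg2(mpg, 12))    # plain function again
    #   user  system elapsed
    #  0.032   0.008   0.047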
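Be careful when using it in simulations: a memoised function called again with the same arguments returns the cached result, so no new random numbers are drawn.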
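f <- function(n = 1e5) {
  a <- rnorm(n)
  a
}
system.time(f1 <- f())
mf <- memoise::memoise(f)
system.time(f2 <- mf())  # first call: draws new random numbers and caches them
system.time(f3 <- mf())  # second call: returns the cached draws, no new sampling
all.equal(f2, f3) # TRUE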
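
One possible way around this, sketched under my own assumptions rather than taken from the memoise documentation: make the seed an explicit argument so that it becomes part of the cache key. The names sim_draws and m_sim_draws are hypothetical helpers introduced only for this illustration. memoise::forget() clears the cache when a fresh computation is wanted.

```r
library(memoise)

# Hypothetical helper: the seed is an argument, so each seed gets its own cache entry.
# Note that set.seed() also resets the global RNG state as a side effect.
sim_draws <- function(seed, n = 1e5) {
  set.seed(seed)
  rnorm(n)
}
m_sim_draws <- memoise(sim_draws)

draws1       <- m_sim_draws(1)  # computed and cached
draws1_again <- m_sim_draws(1)  # cache hit: same seed, same draws (intended here)
draws2       <- m_sim_draws(2)  # different seed: recomputed, different draws

identical(draws1, draws1_again)
identical(draws1, draws2)

forget(m_sim_draws)              # clear the cache
draws1_fresh <- m_sim_draws(1)   # recomputed, but reproducible because of the seed
identical(draws1, draws1_fresh)
```

With this layout a cache hit is harmless, because the cached value is exactly what a fresh call with the same seed would produce. Replicates of a simulation then differ only through the seed that is passed in.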