Reproducible: Difference between revisions
= Common Workflow Language (CWL) =
* https://www.commonwl.org/
* [https://www.nature.com/articles/d41586-019-02619-z Workflow systems turn raw data into scientific knowledge]. Pipeline, Snakemake, Docker, Galaxy, Python, Conda, Workflow Definition Language (WDL), [https://www.nextflow.io/ Nextflow]. The best approach is to embed the workflow in a container; see [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2446-1 Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics] by Baichoo 2018.
* [https://www.biorxiv.org/content/10.1101/2021.04.30.442204v1?rss=1&s=09 Simplifying the development of portable, scalable, and reproducible workflows] Piccolo 2021. | |||
== R ==
* [https://cran.r-project.org/web/views/ReproducibleResearch.html CRAN Task View: Reproducible Research]
* [https://bioconductor.org/packages/release/bioc/html/Rcwl.html Rcwl] package | |||
** [https://liubuntu.github.io/Bioc2020RCWL/ Connecting Bioconductor to other bioinformatics tools using Rcwl] from [https://bioc2020.bioconductor.org/workshops.html Bioc2020] | |||
* [https://appsilon.com/reproducible-research-when-your-results-cant-be-reproduced/ Reproducible Research: What to do When Your Results Can’t Be Reproduced]. It describes 3 danger zones:
** R session context
*** R version
*** Package versions
*** Using set.seed() for a reproducible randomization
*** Floating point accuracy
** Operating System (OS) context
*** System package versions
*** System locale
*** Environment variables
** Data versioning
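The first two danger zones can be recorded from within R itself. The following is a minimal sketch using base R only; the helper name <code>capture_context()</code> is hypothetical and the list of fields is just one possible choice, not something prescribed by the article.
<syntaxhighlight lang='r'>
# Hypothetical helper: collect the R session and OS context in one list
capture_context <- function() {
  list(
    r_version = R.version.string,
    # versions of all currently loaded packages
    packages  = vapply(loadedNamespaces(),
                       function(p) as.character(utils::packageVersion(p)),
                       character(1)),
    os_type   = .Platform$OS.type,
    locale    = Sys.getlocale(),
    env_path  = Sys.getenv("PATH")
  )
}
ctx <- capture_context()
str(ctx, max.level = 1)

# The same seed gives the same random draws
set.seed(42); x1 <- runif(3)
set.seed(42); x2 <- runif(3)
identical(x1, x2)

# Floating point accuracy: compare with a tolerance instead of '=='
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
</syntaxhighlight>
Saving such a list next to the results (e.g. with saveRDS()) makes it easier to see later which of the contexts has changed.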
<ul> | |||
<li>[https://pure.mpg.de/rest/items/item_3178013_4/component/file_3178471/content A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker], [https://brandmaier.github.io/reproducible-data-analysis-materials/BioPsy2020.html#1 Slides], [https://github.com/aaronpeikert/repro-talk Talks & Video]. The whole idea is implemented in the R package [https://github.com/aaronpeikert/repro repro]. The package creates an R project template, available in RStudio via New Project -> '''Create Example Repro Template'''. Note that the Makefile and Dockerfile can be inferred from the markdown.Rmd file. Note also that this approach does not make use of the '''renv''' package and cannot handle Bioconductor packages. It has four elements:
<ul> | |||
<li>Git folder of source code for version control (R project) </li> | |||
<li>Makefile. Make is a “recipe” language that describes how files depend on each other and how to resolve these dependencies.</li> | |||
<li>Docker software environment (Containerization)</li> | |||
<li>RMarkdown (dynamic document generation)</li> | |||
</ul> | |||
<pre> | |||
automake()  # Create '.repro/Dockerfile_packages',
            # '.repro/Makefile_Rmds' & 'Dockerfile'
            # and open <Makefile>
            # Modify <Makefile> by following the console output
rerun()     # Inspects the files of a project and suggests a way to
            # reproduce the project. So just follow the console output
            # by opening a terminal and typing
make docker && make -B DOCKER=TRUE
            # The above will generate the output html file in your browser
</pre> | |||
In the end, it calls the following command according to the console output, where 'reproproject' in this example is the Docker image name (the same as my project name, except that it is automatically converted to lowercase).
{{Pre}} | |||
docker run --rm --user 368262265 \ | |||
-v "/Full_Path_To_Project":"/home/rstudio/" \ | |||
reproproject Rscript \ | |||
-e 'rmarkdown::render("/home/rstudio//markdown.Rmd", "all")' | |||
</pre> | |||
</li> | |||
<li>[https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/ Advanced Reproducibility in Cancer Informatics] </li> | |||
<li>[https://github.com/ttimbers/jsm2023-teaching-reproducibility-and-responsible-workflow Teaching reproducibility and responsible workflow] (2023 JSM) </li> | |||
<li>[https://www.r-bloggers.com/2023/10/an-overview-of-whats-out-there-for-reproducibility-with-r/ An overview of what’s out there for reproducibility with R] 2023/10/5 | |||
<li>[https://www.youtube.com/watch?app=desktop&v=1jVJSPsC4Yo Building Reproducible Analytical Pipelines with R by Dr. Bruno André Rodrigues Coelho | Tunis R User] 2023/12/9. | |||
</ul> | |||
= Rmarkdown =
= packrat =
[[R_packages|R packages & | |||
* [https://cran.r-project.org/web/packages/packrat/ CRAN] & [https://rstudio.github.io/packrat/ Github] | |||
** [https://github.com/rstudio/packrat/issues?q=bioconductor Bioconductor] related issues | |||
* Videos: | |||
** https://www.rstudio.com/resources/webinars/managing-package-dependencies-in-r-with-packrat/ | |||
** https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-managing-part-3/ | |||
* Packrat stores not only all packages but also all project files.
* Packrat is integrated into RStudio’s user interface. It allows you to share projects with co-workers easily. See [https://rstudio.github.io/packrat/rstudio.html Using Packrat with RStudio].
* [https://rstudio.github.io/packrat/limitations.html limitations].
** The [https://cran.r-project.org/web/packages/XML/index.html XML] package requires the OS library ''libxml2'' to be installed. So it is not just an R package issue.
** [[Install_R#Ubuntu.2FDebian_goodies|Ubuntu goodies]] | |||
* [https://stackoverflow.com/questions/36187543/using-r-with-git-and-packrat Git and packrat]. The ''packrat/src'' directory can be very large. ''If you don't want them available in your git-repo, you simply add packrat/src/ to the .gitignore. But, this will mean that anyone accessing the git-repo will not have access to the package source code, and the files will be downloaded from CRAN, or from wherever the source line dictates within the packrat.lock file.'' | |||
** [https://www.joelnitta.com/post/packrat/ Using packrat with git for (better) version control] Jun 2018 | |||
* A scenario in which we need packrat: suppose we are developing a package under the current R-3.5.x. Our package requires the 'doRNG' package, which depends on the 'rngtools' package. A few months later a new R (3.6.0) is released, and a new release (1.3.1.1) of 'rngtools' requires R-3.6.0. If we then try to install 'doRNG' in R-3.5.x, it fails with an error: ''dependency 'rngtools' is not available for package 'doRNG' ''.
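The failure in the scenario above comes down to a version comparison between the running R and the R version a package requires. A minimal sketch in base R, assuming the requirement is given as a plain version string; the helper name <code>r_satisfies()</code> is hypothetical:
<syntaxhighlight lang='r'>
# Hypothetical helper: does an R version satisfy a "R (>= required)" dependency?
r_satisfies <- function(required, current = getRversion()) {
  package_version(as.character(current)) >= package_version(required)
}

r_satisfies("3.6.0", current = "3.5.3")   # FALSE: a package requiring R 3.6.0 cannot be installed
r_satisfies("3.6.0", current = "3.6.1")   # TRUE

# compareVersion() returns -1, 0 or 1
utils::compareVersion("3.5.3", "3.6.0")   # -1
</syntaxhighlight>
Comparing version objects rather than strings matters: as plain strings, "3.10.0" would sort before "3.9.0".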
== Create a snapshot == | |||
* Do we really need to call packrat::snapshot()? The [https://rstudio.github.io/packrat/walkthrough.html walk through] page says it is not needed, but in my testing the lock file was not updated without it.
* I got an error when packrat tried to fetch the source code of packages from Bioconductor and local repositories: it was trying to fetch the source of these two packages from CRAN.
** Normally, the packrat/packrat.lock file contains two entries in the 'Repos' field (line 4).
** The cause of the error is that I ran snapshot() after I had quit R and started it again. So the solution is to add the bioc and local repositories to options(repos).
** So why is running snapshot() important?
** Check out the [https://groups.google.com/forum/#!forum/packrat-discuss forum]. | |||
<syntaxhighlight lang='rsplus'> | |||
> dir.create("~/projects/babynames", recursive = TRUE)
> packrat::init("~/projects/babynames") | |||
Initializing packrat project in directory: | |||
- "~/projects/babynames" | |||
Adding these packages to packrat: | |||
_ | |||
packrat 0.4.9-3 | |||
Fetching sources for packrat (0.4.9-3) ... OK (CRAN current) | |||
Snapshot written to '/home/brb/projects/babynames/packrat/packrat.lock' | |||
Installing packrat (0.4.9-3) ... | |||
OK (built source) | |||
Initialization complete! | |||
Unloading packages in user library: | |||
- packrat | |||
Packrat mode on. Using library in directory: | |||
- "~/projects/babynames/packrat/lib" | |||
> install.packages("reshape2") | |||
> packrat::snapshot() | |||
> system("tree -L 2 ~/projects/babynames/packrat/") | |||
/home/brb/projects/babynames/packrat/ | |||
├── init.R | |||
├── lib | |||
│ └── x86_64-pc-linux-gnu | |||
├── lib-ext | |||
│ └── x86_64-pc-linux-gnu | |||
├── lib-R # base packages | |||
│ └── x86_64-pc-linux-gnu | |||
├── packrat.lock | |||
├── packrat.opts | |||
└── src | |||
├── bitops | |||
├── glue | |||
├── magrittr | |||
├── packrat | |||
├── plyr | |||
├── Rcpp | |||
├── reshape2 | |||
├── stringi | |||
└── stringr | |||
</syntaxhighlight> | |||
== Restoring snapshots == | |||
Suppose a packrat project was created on Ubuntu 16.04 and we now want to repeat the analysis on Ubuntu 18.04. We first copy the whole project directory ('babynames') to Ubuntu 18.04. Then we delete the library subdirectory ('packrat/lib'), which contains binary files (*.so) that do not work on the new OS. After deleting the library subdirectory, we start R from the project directory. Now if we run the '''packrat::restore()''' command, it will re-install all missing libraries. Bingo! NOTE: Maybe I should use '''packrat::bundle()''' instead of manually copying the whole project folder.
Note: some OS level libraries (e.g. libXXX-dev) need to be installed manually beforehand in order for the magic to work.
<syntaxhighlight lang='rsplus'> | |||
$ rm -rf ~/projects/babynames/packrat/lib | |||
$ cd ~/projects/babynames/ | |||
$ R | |||
> | |||
> packrat::status() | |||
> remove.packages("plyr") | |||
> packrat::status() | |||
> packrat::restore() | |||
</syntaxhighlight> | |||
== Workflow == | |||
<pre> | |||
setwd("ProjectDir") | |||
packrat::init() | |||
packrat::on() # packrat::search_path() | |||
install.packages() | |||
# For personal packages stored locally | |||
packrat::set_opts(local.repos = "~/git/R") | |||
packrat::install_local("digest") # dir name of the package | |||
library(YourPackageName) | |||
# double check all dependent ones have been installed | |||
packrat::snapshot() | |||
packrat::bundle() | |||
</pre> | |||
A bundle file (*.tar.gz) will be created under ProjectDir/packrat/src directory. '''Note this tar.gz file includes the whole project folder. ''' | |||
To unbundle the project in a new R environment/directory: | |||
<pre> | |||
setwd("NewDirectory") # optional | |||
packrat::unbundle(FullPathofBundleTarBall, ".") | |||
# this will create 'ProjectDir' | |||
# CPU is more important than disk speed | |||
# At the end, it will show the project has been unbundled and restored at ... | |||
setwd("ProjectDir") | |||
packrat::packrat_mode() # on | |||
.libPaths() # verify | |||
library() # Expect to see packages in our bundle | |||
# packrat::on() | |||
</pre> | |||
Example 1: The above method works for packages from Bioconductor; e.g. S4Vectors, which depends on BiocGenerics & BiocVersion only. However, the Bioconductor project does not have a snapshot repository like MRAN. So it is difficult to reproduce the environment for an earlier release of Bioconductor.
Example 2: bundle our in-house R package for future reproducibility. | |||
== Set Up a Custom CRAN-like Repository == | |||
See https://rstudio.github.io/packrat/custom-repos.html. Note the personal repository name ('sushi' in this example) used in "Repository" field of the personal package will be used in <packrat/packrat.lock> file. So as long as we work on the same computer, it is easy to restore a packrat project containing packages coming from personal repository. | |||
'''[https://rstudio.github.io/packrat/commands.html Common functions]''': | |||
* packrat::init() | |||
* packrat::snapshot(), packrat::restore() | |||
* packrat::clean() | |||
* packrat::status() | |||
* packrat::install_local() # http://rstudio.github.io/packrat/limitations.html | |||
* packrat::bundle() # see @28:44 of the [https://www.rstudio.com/resources/webinars/managing-package-dependencies-in-r-with-packrat/ video], packrat::unbundle() # see @29:17 of the same video. This will rebuild all packages | |||
* packrat::on(), packrat::off() | |||
* packrat::get_opts() | |||
* packrat::set_opts() # http://rstudio.github.io/packrat/limitations.html | |||
* packrat::opts$local.repos("~/local-cran") | |||
* packrat::opts$external.packages(c("devtools")) # break the isolation | |||
* packrat::extlib() | |||
* packrat::with_extlib() | |||
* packrat::project_dir(), .libPaths() | |||
== Warning == | |||
* If we download and modify some function definition from a package in CRAN without changing DESCRIPTION file or the package name, the snapshot created using packrat::snapshot() will contain the package source from CRAN instead of local repository. This is because (I guess) the DESCRIPTION file contains a field 'Repository' with the value 'CRAN'. | |||
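To check the guess above for a given package source directory, the 'Repository' field can be read from its DESCRIPTION file. This is a minimal sketch with base R; the helper name <code>description_repository()</code> and the demo package 'mypkg' are made up for illustration, and whether packrat really relies on this field is the assumption stated above, not something verified here.
<syntaxhighlight lang='r'>
# Hypothetical helper: return the 'Repository' field of a package source directory
# (NA if the field is absent)
description_repository <- function(pkg_dir) {
  dcf <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Repository")
  unname(dcf[1, "Repository"])
}

# Self-contained demo with a made-up package in a temporary directory
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(pkg_dir, showWarnings = FALSE)
writeLines(c("Package: mypkg", "Version: 0.0.1", "Repository: CRAN"),
           file.path(pkg_dir, "DESCRIPTION"))
description_repository(pkg_dir)
</syntaxhighlight>
If the field still says 'CRAN' for a locally modified package, changing the package name or the version in DESCRIPTION is one way to make the modified copy distinguishable.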
== Docker == | |||
[https://www.joelnitta.com/post/docker-and-packrat/ Docker and Packrat]. | |||
* This is a minimal example that installs a single package each from CRAN, bioconductor, and github to a Docker image using packrat. | |||
* All operations are done in the container. So the host OS does not need to have R installed. | |||
* The R script will install packrat in the container. It will also initialize packrat in the working directory and install R packages there. But its call to [https://www.rdocumentation.org/packages/packrat/versions/0.5.0/topics/snapshot packrat::snapshot()] uses '''snapshot.sources = FALSE''', since the goal is only to generate the packrat.lock file.
* The first part, generating packrat.lock, is not quite right since the file was generated in the container only. We should use '''-v''' in the ''docker run'' command. The github repository at https://github.com/joelnitta/docker-packrat-example has fixed the problem.
{{Pre}} | |||
$ git clone https://github.com/joelnitta/docker-packrat-example.git | |||
$ cd docker-packrat-example | |||
# Step 1: create the 'packrat.lock' file | |||
$ nano install_packages.R # note: nano is not available in the rstudio container | |||
# need to install additional OS level packages like libcurl | |||
# in rocker/rstudio. Probably rocker/tidyverse is better than rstudio | |||
# | |||
$ docker run -it -e DISABLE_AUTH=true -v $(pwd):/home/rstudio/project rocker/tidyverse:3.6.0 bash | |||
# Inside the container now | |||
$ cd /home/rstudio/project
$ time Rscript install_packages.R # generate 'packrat/packrat.lock' | |||
$ exit # It took 43 minutes. | |||
# Question: is there an easier way to generate packrat.lock without | |||
# wasting time to install lots of packages? | |||
# Step 2: build the image | |||
# Open another terminal/tab | |||
$ nano Dockerfile # change rocker image and R version. Make sure these two are the same as | |||
# we have used when we created the 'packrat.lock' file | |||
$ time docker build . -t mycontainer # It took 45 minutes. | |||
$ docker run -it mycontainer R | |||
# Step 3: check the packages defined in 'install_packages.R' are installed | |||
packageVersion("minimal") | |||
packageVersion("biospear") | |||
</pre> | |||
Questions: | |||
* After running the statement packrat::init(), it will leave a footprint of a hidden file '''.Rprofile''' in the current directory. PS: The [https://rstudio.github.io/packrat/walkthrough.html purpose of .Rprofile file] is to direct R to use the private package library (when it is started from the project directory). <syntaxhighlight lang='rsplus'> | |||
#### -- Packrat Autoloader (version 0.5.0) -- #### | |||
source("packrat/init.R") | |||
#### -- End Packrat Autoloader -- #### | |||
</syntaxhighlight> | |||
: If the 'packrat' directory was accidentally deleted, next time when you launch R it will show an error message because it cannot find the file. | |||
* The ownership of the 'packrat' directory will be root now. See this [https://rviews.rstudio.com/2018/01/18/package-management-for-reproducible-r-code/ Package Management for Reproducible R Code]. | |||
* This sophisticated approach does not save the package source code. If a package has been updated and the version we used has been moved to archive in CRAN, what will happen when we try to restore it? So it is probably better to use '''snapshot.sources = TRUE''' and run packrat::bundle(). | |||
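Regarding the error when the 'packrat' directory has been deleted: one possible workaround is to guard the autoloader line in '''.Rprofile'''. This is only a sketch; the function name <code>load_packrat_if_present()</code> is hypothetical and this is not what packrat itself writes into .Rprofile.
<syntaxhighlight lang='r'>
# Hypothetical guard: source packrat/init.R only if it exists in the
# current working directory, otherwise start R without packrat
load_packrat_if_present <- function() {
  if (!file.exists("packrat/init.R")) {
    message("packrat/init.R not found; starting R without packrat")
    return(invisible(FALSE))
  }
  source("packrat/init.R")
  invisible(TRUE)
}
load_packrat_if_present()
</syntaxhighlight>
With this guard, R starts normally (using the regular user library) instead of failing on the missing file.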
= renv: successor to the packrat package = | |||
* https://rstudio.github.io/renv/index.html | |||
* [https://blog.rstudio.com/2019/11/06/renv-project-environments-for-r/ release] 2019-11-6 | |||
* [https://rstudio.github.io/renv/articles/renv.html Introduction to renv] 2021-01-09 | |||
* [https://www.r-bloggers.com/2023/03/r-renv-how-to-manage-dependencies-in-r-projects-easily/ R renv: How to Manage Dependencies in R Projects Easily] 2023-03-22 | |||
* The [https://rstudio.github.io/renv/reference/migrate.html renv::migrate()] function makes it possible to migrate projects from '''Packrat''' to '''renv'''. | |||
* [https://blog.rstudio.com/2020/08/20/why-package-environment-management-is-critical-for-serious-data-science/ Why Package & Environment Management is Critical for Serious Data Science] and a [https://garciamikep.github.io/useR-webinar/#41 workflow]. | |||
* [https://medium.com/analytics-vidhya/deploying-an-r-shiny-app-on-heroku-free-tier-b31003858b68 Deploying an R Shiny app on Heroku free tier] | |||
* [https://github.com/rstudio/renv/issues?q=bioconductor Bioconductor] related questions | |||
* [https://daryavanichkina.com/posts/2021-07-28-renvhpc Installing packages on a PBS-Pro HPC cluster using renv] | |||
* [https://www.r-bloggers.com/2023/05/dependency-management/ Dependency Management] | |||
Compared to packrat:
* Many packages are difficult to build from sources. Your system will need to have a compatible compiler toolchain available. In some cases, R packages may depend on C/C++ features that aren't available in an older system toolchain, especially in some older Linux enterprise environments.
* '''renv no longer attempts to explicitly download and track R package source tarballs within your project.''' For packages from local sources, refer to [https://rstudio.github.io/renv/articles/local-sources.html this article].
* renv has its own discovery machinery that analyzes your R code to determine which R packages will be included in the lock file. However, we can instead capture ''all'' packages installed into the project library by using '''renv::settings$snapshot.type("all") '''
The renv package has neither a bundle() nor an unbundle() function.
<syntaxhighlight lang='r'> | |||
# mkdir renvdeseq2 | |||
setwd("renvdeseq2") | |||
renv::init(bioconductor = TRUE) | |||
# attempts to copy and reuse packages | |||
# already installed in your R libraries | |||
# We'll be asked to restart the R session if we | |||
# are not doing this in RStudio. | |||
renv::install("BiocManager") | |||
# method 1: this will only install packages under the curDir/renv/... folder | |||
BiocManager::install("DESeq2") | |||
# method 2: this will install packages in ~/.cache/R/renv/renv/... folder | |||
# therefore, the library can be reused by other needs. | |||
options(repos = BiocManager::repositories()) | |||
renv::install("DESeq2") | |||
renv::snapshot() # create renv.lock | |||
# it seems the lock file "renvdeseq2/renv.lock" does not | |||
# save any package info I just installed from Bioconductor | |||
# except the renv package. | |||
# Read https://rstudio.github.io/renv/articles/faq.html | |||
</syntaxhighlight> | |||
Find R package dependencies in a project | |||
<pre> | |||
renv::dependencies() | |||
</pre> | |||
The following line makes snapshot() write every package installed in the project library (whose packages renv keeps in its cache directory, e.g., ~/.cache/R/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/) to the renv.lock file. Note that the setting persists even after we restart R!
<pre> | |||
renv::settings$snapshot.type("all") # default is "implicit" | |||
renv::snapshot() | |||
</pre> | |||
Pass renv.lock to other people and/or clone the project repository | |||
<pre> | |||
# Make sure the 'renv' package has been installed on the remote computer | |||
install.packages("renv") | |||
renv::init() # install the packages declared in renv.lock | |||
</pre> | |||
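Before restoring, it can help to look at what a received renv.lock actually declares. The lock file is a JSON file, so a minimal sketch can read it with the '''jsonlite''' package (an assumption: jsonlite must be installed). The helper name <code>lockfile_packages()</code> and the tiny demo lock file are made up for illustration; only the "Packages" section with its "Package", "Version" and "Source" fields is used.
<syntaxhighlight lang='r'>
# Hypothetical helper: list the packages recorded in an renv.lock file
lockfile_packages <- function(path = "renv.lock") {
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  pkgs <- lock$Packages
  data.frame(
    Package = vapply(pkgs, function(p) p$Package, character(1)),
    Version = vapply(pkgs, function(p) p$Version, character(1)),
    Source  = vapply(pkgs, function(p) {
      if (is.null(p$Source)) NA_character_ else p$Source
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Self-contained demo with a made-up, minimal lock file
demo_lock <- tempfile(fileext = ".lock")
writeLines('{
  "R": {"Version": "4.3.2"},
  "Packages": {
    "renv": {"Package": "renv", "Version": "1.0.4",
             "Source": "Repository", "Repository": "CRAN"}
  }
}', demo_lock)
lockfile_packages(demo_lock)
</syntaxhighlight>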
Use '''[https://rstudio.github.io/renv/reference/migrate.html renv::migrate()]''' to port a Packrat project to renv. | |||
== renv::install() == | |||
[https://youtu.be/yc7ZB4F_dc0?t=346 Using renv to track the version of your packages in R (CC229)]. After renv::init(), it will identify some packages we don't have or have older versions... In the end we are informed some packages are not installed. '''Consider reinstalling these packages before snapshotting the lockfile'''. So go ahead and run '''renv::snapshot()'''. | |||
== Other sources == | |||
* [https://rstudio.github.io/renv/articles/renv.html#package-sources Package sources] | |||
* [https://stackoverflow.com/a/63373333 Use renv for private GitLab package] | |||
For example, for DeMixT from github, | |||
<pre> | |||
renv::init() | |||
renv::install("wwylab/DeMixT") | |||
# Error: package 'SummarizedExperiment' is not available | |||
renv::install("bioc::SummarizedExperiment") | |||
renv::install("wwylab/DeMixT") | |||
renv::snapshot() | |||
</pre> | |||
== install.packages() == | |||
It seems install.packages() also installs packages into the project directory. So for simple cases it is not clear what the difference between install.packages() and renv::install() is. But renv::install() is more flexible than install.packages().
Note that an installed package will not go into the lock file unless the project uses it. For example, we can create a simple R file that calls "library(PACKAGENAME)" and in the R console we can run "source(MySimple.R)". Now when we run '''renv::snapshot()''', the PACKAGENAME will be recorded.
If I open a project that has loaded an renv environment, then calling "install.packages()" installs new packages into renv's cache folder (e.g., ''~/.cache/R/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/'' in Linux). Note that the version number is recorded too (e.g., ''~/.cache/R/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgndep/1.2.1'' ).
== Reference == | |||
See [https://rstudio.github.io/renv/reference/index.html Reference]. | |||
== Bioconductor == | |||
[https://rstudio.github.io/renv/articles/bioconductor.html Using renv with Bioconductor] | |||
Create an Rmd file and include an R chunk "library(DESeq2)". Then run the following line | |||
<pre> | |||
renv::init(bioconductor = TRUE) | |||
</pre> | |||
and it will generate "renv.lock", ".Rprofile" files and "renv" directory. | |||
PS. | |||
* When we install a fresh [https://cran.r-project.org/bin/linux/ubuntu/ R in Ubuntu], we should install the required system packages with '''"sudo apt install r-base-dev curl libcurl4-openssl-dev libssl-dev libxml2-dev " '''before we can successfully run "BiocManager::install('DESeq2')".
* It is perfectly fine to run '''renv::init(bioconductor = TRUE)''' even if you have previously run renv::init() without the bioconductor argument. The bioconductor argument simply ensures that Bioconductor repositories are activated within your renv project. | |||
== renv::dependencies() == | |||
[https://rstudio.github.io/renv/reference/dependencies.html ?dependencies]. Find R packages used within a project. dependencies() will crawl files within your project, looking for R files and the packages used within those R files. | |||
<pre> | |||
df <- renv::dependencies("Some_Dir") | |||
</pre> | |||
From my testing, it also searches Rmd files.
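To get a feeling for what such a crawl does, here is a rough approximation in base R. It is '''not''' renv's actual discovery machinery: it only looks for library() and require() calls with a regular expression and misses, for example, pkg::fun() usage. The function name <code>find_library_calls()</code> and the demo files are made up.
<syntaxhighlight lang='r'>
# Hypothetical, simplified stand-in for renv::dependencies():
# scan .R and .Rmd files for library()/require() calls
find_library_calls <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.(R|Rmd)$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  pattern <- "(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  pkgs <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    m <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
    # keep the captured package name of every line that matched
    vapply(Filter(length, m), function(x) x[2], character(1))
  })
  sort(unique(as.character(unlist(pkgs))))
}

# Self-contained demo in a temporary project directory
demo_dir <- tempfile("proj")
dir.create(demo_dir)
writeLines(c("library(DESeq2)", "x <- 1"), file.path(demo_dir, "analysis.R"))
writeLines(c("```{r}", "require(\"ggplot2\")", "```"),
           file.path(demo_dir, "report.Rmd"))
find_library_calls(demo_dir)
</syntaxhighlight>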
== renv::record() == | |||
You can use the [https://rstudio.github.io/renv/reference/record.html record()] function from the renv package to record a new entry within an existing renv.lock file. | |||
<pre> | |||
renv::record("PACKAGENAME@VERSION")  # a package name and a version joined by "@"
</pre> | |||
However, the package is still not installed in the local directory. In other words, '''renv::record()''' seems to be the opposite of '''renv::install()''': renv::install() installs a package in the local directory even if the package is not used anywhere.
== renv::load() == | |||
[https://rstudio.github.io/renv/reference/load.html ?renv::load]. It is especially useful in Windows OS. | |||
Note that it does not change the working directory though. | |||
== renv::restore() == | |||
* For a renv-based project, we just need to share the text file '''renv.lock''' with our colleague. But for a packrat-based project, we need to run the bundle() command and pass a tar.gz file to our colleague.
* See the output message [https://gist.github.com/arraytools/6a8741c5cac70dfc36a0ff5321d2ee0d here]. This is based on renv 0.16.0 (2022-09-29).
* renv::restore() can be slow since it needs to compile packages from source. The "make" utility is required for it to work!
* My tips
** renv::restore() will restore from source. This can take a long time.
** Use P3M from Posit. Click "Setup" in [https://packagemanager.posit.co/client/#/ P3M] and follow the instructions there for your OS. For example, on Windows, I can run <syntaxhighlight lang='r' inline>options(repos = c(CRAN = "https://packagemanager.posit.co/cran/latest")) </syntaxhighlight>
** Even if we use P3M for package installation, a few packages still need to be installed from source. On Windows OS, we need to install [https://cran.rstudio.com/bin/windows/Rtools/ Rtools]. After installing Rtools by accepting the defaults, no further setup is needed. R will be able to recognize all new binaries. See [[R_packages#Windows:_Rtools|Windows -> Rtools]].
** After successfully calling renv::restore(), we need to restart R. If we use the R console instead of RStudio, we can use '''renv::load()'''. This is useful for the case of Windows OS + R console.
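The P3M tip above replaces the whole <code>repos</code> option. If other repositories (e.g. Bioconductor or a local one) are already set, a small wrapper can change only the CRAN entry. A minimal sketch with base R; the function name <code>use_p3m()</code> is hypothetical, and the URL is the one from the tip above.
<syntaxhighlight lang='r'>
# Hypothetical helper: point only the CRAN entry of options(repos) to P3M,
# keeping any other repositories that are already configured
use_p3m <- function(url = "https://packagemanager.posit.co/cran/latest") {
  repos <- getOption("repos")
  repos["CRAN"] <- url
  options(repos = repos)
  invisible(repos)
}

use_p3m()
getOption("repos")
</syntaxhighlight>
Calling it before renv::restore() (for instance from the project's .Rprofile) is one way to apply the setting in every session.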
== renv::update() == | |||
https://rstudio.github.io/renv/reference/update.html | |||
<pre> | |||
renv::update() # including Bioconductor, Github, Gitlab, Git, Bitbucket, ... | |||
renv::update(packages = c("dplyr", "ggplot2", "tidyr")) # update specific CRAN packages
renv::install("bioc::Biobase") # install/update specific Bioconductor package | |||
renv::update(packages = "mygithubpackage") | |||
</pre> | |||
== A case with issues using renv::snapshot() & renv::restore() == | |||
<ul> | |||
<li>[https://bioc.cran.dev/packages/3.17/bioc/html/BiocGenerics.html BiocGenerics] in Bioconductor 3.17 is now 0.46.0 but I have 0.45.3. Also the current Bioconductor 3.18 shows BiocGenerics version 0.48.1. | |||
<syntaxhighlight lang='r'> | |||
... | |||
* Project '~/Project' loaded. [renv 0.17.3] | |||
* The project is currently out-of-sync. | |||
* Use `renv::status()` for more details. | |||
> renv::snapshot() | |||
The following Bioconductor packages appear to be from a separate Bioconductor release: | |||
BiocGenerics [installed 0.45.3 != latest 0.46.0] | |||
renv may be unable to restore these packages. | |||
Bioconductor version: 3.17 | |||
The following package(s) have unsatisfied dependencies: | |||
MatrixModels requires Matrix (>= 1.6-0), but version 1.5-4 is installed | |||
Consider updating the required dependencies as appropriate. | |||
Do you want to proceed? [y/N]: N | |||
> packageVersion("BiocGenerics") | |||
[1] ‘0.45.3’ | |||
> packageVersion("Matrix") | |||
[1] ‘1.5.4’ | |||
> packageVersion("MatrixModels") | |||
[1] ‘0.5.3’ | |||
> packageVersion("renv") | |||
[1] ‘0.17.3’ | |||
</syntaxhighlight> | |||
Q: MatrixModels was not recorded in renv.lock, so why does renv::snapshot() show unsatisfied dependencies for the 'MatrixModels' package? I opened a terminal and listed the files in the directory "./renv/library/R-4.3/aarch64-apple-darwin20" by date, and decided to delete the package. In the end, I ran remove.packages("MatrixModels") and BiocManager::install("BiocGenerics") to update the package to the latest version in the (old) Bioconductor 3.17 release.</BR>
<li>(Cont.) When I run renv::restore() on another machine, I got an error related to BiocGenerics. | |||
<syntaxhighlight lang='r'> | |||
> renv::restore() | |||
It looks like you've called renv::restore() in a project that hasn't been activated yet. | |||
How would you like to proceed? | |||
1: Activate the project and use the project library. | |||
2: Do not activate the project and use the current library paths. | |||
3: Cancel and resolve the situation another way. | |||
Selection: 1 | |||
- renv activated -- please restart the R session. | |||
The following package(s) will be updated: | |||
# Bioconductor --------------------------------------------------------------- | |||
- BiocGenerics [0.46.0 -> 0.45.3] | |||
- IRanges [2.34.1 -> 2.34.0] | |||
- S4Vectors [0.38.2 -> 0.38.1] | |||
# CRAN ----------------------------------------------------------------------- | |||
- BiocManager [1.30.22 -> 1.30.20] | |||
... | |||
- Downloading S4Vectors from Bioconductor ... OK [819.2 Kb in 0.63s] | |||
- Downloading BiocGenerics from Bioconductor ... ERROR [error code 22] | |||
- Downloading BiocGenerics from Bioconductor ... ERROR [error code 22] | |||
- Downloading S4Vectors from Bioconductor ... ERROR [error code 22] | |||
Warning: failed to find source for 'S4Vectors 0.38.1' in package repositories | |||
Warning: failed to find source for 'BiocGenerics 0.45.3' in package repositories | |||
Warning: error downloading 'https://bioconductor.org/packages/3.17/bioc/src/contrib/Archive/BiocGenerics/BiocGenerics_0.45.3.tar.gz' [error code 22] | |||
Warning: error downloading 'https://cran.rstudio.com/src/contrib/Archive/BiocGenerics/BiocGenerics_0.45.3.tar.gz' [error code 22] | |||
Warning: error downloading 'https://cran.rstudio.com/src/contrib/Archive/S4Vectors/S4Vectors_0.38.1.tar.gz' [error code 22] | |||
Error: failed to retrieve package '[email protected]' | |||
Traceback (most recent calls last): | |||
9: renv::restore() | |||
8: renv_restore_run_actions(project, diff, current, lockfile, rebuild) | |||
7: retrieve(packages) | |||
6: handler(package, renv_retrieve_impl(package)) | |||
5: renv_retrieve_impl(package) | |||
4: renv_retrieve_bioconductor(record) | |||
3: renv_retrieve_repos(record) | |||
2: stopf("failed to retrieve package '%s'", renv_record_format_remote(record)) | |||
1: stop(sprintf(fmt, ...), call. = call.) | |||
</syntaxhighlight> | |||
<li>I go back to the original project. Run 'BiocManager::install("BiocGenerics")' and remove.packages("MatrixModels") | |||
<syntaxhighlight lang='r'> | |||
> renv::snapshot() | |||
The following package(s) will be updated in the lockfile: | |||
# Bioconductor ======================= | |||
- BiocGenerics [0.45.3 -> 0.46.0] | |||
# CRAN =============================== | |||
- Matrix [1.5-4 -> 1.6-5] | |||
... | |||
The version of R recorded in the lockfile will be updated: | |||
- R [4.3.1 -> 4.3.2] | |||
Do you want to proceed? [y/N]: y | |||
</syntaxhighlight> | |||
Now I copy renv.lock to another machine/place. Call renv::restore() to test again. | |||
<li>(Cont.) renv::restore() did show errors during processing, but failed to give a warning at the end.
<syntaxhighlight lang='r'> | |||
> renv::restore() | |||
It looks like you've called renv::restore() in a project that hasn't been activated yet. | |||
How would you like to proceed? | |||
1: Activate the project and use the project library. | |||
2: Do not activate the project and use the current library paths. | |||
3: Cancel and resolve the situation another way. | |||
Selection: 1 | |||
- renv activated -- please restart the R session. | |||
The following package(s) will be updated: | |||
... | |||
Do you want to proceed? [Y/n]: | |||
# Downloading packages ------------------------------------------------------- | |||
- Downloading vctrs from CRAN ... OK [file is up to date] | |||
- Downloading tinytex from CRAN ... OK [file is up to date] | |||
... | |||
- Downloading S4Vectors from Bioconductor ... OK [file is up to date] | |||
- Downloading mgcv from CRAN ... ERROR [error code 22] | |||
- Downloading mgcv from CRAN ... OK [file is up to date] | |||
- Downloading nlme from CRAN ... ERROR [error code 22] | |||
- Downloading nlme from CRAN ... OK [file is up to date] | |||
... | |||
Successfully downloaded 60 packages in 460 seconds. | |||
# Installing packages -------------------------------------------------------- | |||
- Installing clue ... OK [copied from cache] | |||
- Installing lattice ... OK [copied from cache] | |||
... | |||
The following loaded package(s) have been updated: | |||
- BiocManager | |||
- renv <------------ Something is wrong. Just 2 packages got installed. | |||
Restart your R session to use the new versions. | |||
> q() | |||
Save workspace image? [y/n/c]: n | |||
$ R | |||
- Project '~/Project' loaded. [renv 1.0.4] | |||
- One or more packages recorded in the lockfile are not installed. | |||
- Use `renv::status()` for more details. | |||
Warning message: | |||
renv 1.0.4 was loaded from project library, but this project is configured to use renv ${VERSION}. | |||
Use `renv::record("renv@1.0.4")` to record renv 1.0.4 in the lockfile.
Use `renv::restore(packages = "renv")` to install renv ${VERSION} into the project library. | |||
> packageVersion("renv") | |||
[1] ‘1.0.4’ | |||
> library() <-------------- Just show 2 packages in the renv directory. | |||
</syntaxhighlight> | |||
<li>(Cont.) I call renv::restore() a second time. Now library() shows a complete list.
<syntaxhighlight lang='r'> | |||
installed.packages(lib="./renv/library/R-4.3/x86_64-pc-linux-gnu") |> dim() | |||
[1] 227 16 | |||
</syntaxhighlight> | |||
I tested loading packages on the new machine and everything looks fine.
<li>It seems to be OK for the renv versions to differ between the old (0.17.3) and new (1.0.3) systems. But one problem with the old renv is [https://github.com/rstudio/renv/issues/1356 BiocVersion recorded in lockfile but not used in this project]. So I decided to upgrade the renv package, after which the warning is gone.
</ul> | |||
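Since a second renv::restore() fixed the incomplete library above, the retry can be scripted. The following is only a minimal sketch: <code>retry_until()</code> and its two callbacks are hypothetical names, and the commented usage assumes that <code>renv::status()</code> invisibly returns a list with a <code>synchronized</code> element, which should be checked against the installed renv version.

```r
# Re-run an action until a check passes or the attempts run out.
# `action` and `is_done` are hypothetical callbacks supplied by the caller.
retry_until <- function(action, is_done, max_tries = 3) {
  for (i in seq_len(max_tries)) {
    # Keep going even if one attempt throws, e.g. on a failed download
    tryCatch(
      action(),
      error = function(e) message("Attempt ", i, " failed: ", conditionMessage(e))
    )
    if (isTRUE(is_done())) {
      return(i)  # number of attempts that were needed
    }
  }
  stop("still not done after ", max_tries, " attempts")
}

# Possible use with renv (not run here; `$synchronized` is an assumption):
# retry_until(
#   action  = function() renv::restore(prompt = FALSE),
#   is_done = function() isTRUE(renv::status()$synchronized)
# )
```

The function returns the number of attempts used, so a value greater than 1 is a hint that the first restore was incomplete.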
== A case with only one CRAN package and the first time use == | |||
<ul> | |||
<li>I put glmnet in an R file. | |||
<li>renv::init() returned a warning message. | |||
<pre> | |||
> install.packages("renv") # 1.0.4 in R 4.3.2 | |||
> renv::init() | |||
renv: Project Environments for R | |||
Welcome to renv! It looks like this is your first time using renv. | |||
This is a one-time message, briefly describing some of renv's functionality. | |||
renv will write to files within the active project folder, including: | |||
- A folder 'renv' in the project directory, and | |||
- A lockfile called 'renv.lock' in the project directory. | |||
In particular, projects using renv will normally use a private, per-project | |||
R library, in which new packages will be installed. This project library is | |||
isolated from other R libraries on your system. | |||
In addition, renv will update files within your project directory, including: | |||
- .gitignore | |||
- .Rbuildignore | |||
- .Rprofile | |||
Finally, renv maintains a local cache of data on the filesystem, located at: | |||
- "~/.cache/R/renv" | |||
This path can be customized: please see the documentation in `?renv::paths`. | |||
Please read the introduction vignette with `vignette("renv")` for more information. | |||
You can browse the package documentation online at https://rstudio.github.io/renv/. | |||
Do you want to proceed? [y/N]: y | |||
- "~/.cache/R/renv" has been created. | |||
- Resolving missing dependencies ... | |||
# Downloading packages ------------------------------------------------------- | |||
- Downloading glmnet from CRAN ... OK [2.3 Mb in 0.17s] | |||
- Downloading foreach from CRAN ... OK [87.7 Kb] | |||
- Downloading iterators from CRAN ... OK [293.2 Kb in 0.11s] | |||
- Downloading shape from CRAN ... OK [631.3 Kb in 0.15s] | |||
- Downloading Rcpp from CRAN ... OK [3.3 Mb in 0.21s] | |||
- Downloading RcppEigen from CRAN ... OK [1.4 Mb in 0.15s] | |||
Successfully downloaded 6 packages in 2.6 seconds. | |||
# Installing packages -------------------------------------------------------- | |||
- Installing iterators ... OK [built from source and cached in 1.3s] | |||
- Installing foreach ... OK [built from source and cached in 1.4s] | |||
- Installing shape ... OK [built from source and cached in 1.5s] | |||
- Installing Rcpp ... OK [built from source and cached in 30s] | |||
- Installing RcppEigen ... OK [built from source and cached in 41s] | |||
- Installing glmnet ... OK [built from source and cached in 1.2m] | |||
The following required packages are not installed: | |||
- codetools [required by foreach] | |||
- Matrix [required by glmnet] | |||
- survival [required by glmnet] | |||
Consider reinstalling these packages before snapshotting the lockfile. | |||
The following package(s) will be updated in the lockfile: | |||
# CRAN ----------------------------------------------------------------------- | |||
- foreach [* -> 1.5.2] | |||
- glmnet [* -> 4.1-8] | |||
- iterators [* -> 1.0.14] | |||
- Rcpp [* -> 1.0.12] | |||
- RcppEigen [* -> 0.3.3.9.4] | |||
- renv [* -> 1.0.4] | |||
- shape [* -> 1.4.6.1] | |||
The version of R recorded in the lockfile will be updated: | |||
- R [* -> 4.3.2] | |||
- Lockfile written to "/tmp/test/renv.lock". | |||
- renv activated -- please restart the R session. | |||
> q() | |||
</pre> | |||
I copy renv.lock to renv-old.lock for later comparison. Note that the 'R' repository is "https://cloud.r-project.org".
<li>Quit and restart R. I get a warning message about an '''inconsistent state'''. The document [https://rstudio.github.io/renv/reference/status.html#lockfile-vs-dependencies- Report inconsistencies between lockfile, library, and dependencies -> Lockfile vs dependencies()] says to run '''renv::snapshot()''' to fix the problem. In this case, glmnet depends on Matrix, survival, ... which are among the recommended packages shipped with R.
<pre> | |||
- Project '/tmp/test' loaded. [renv 1.0.4] | |||
- The project is out-of-sync -- use `renv::status()` for details. | |||
> renv::status() | |||
The following package(s) are in an inconsistent state: | |||
package installed recorded used | |||
codetools y n y | |||
lattice y n y | |||
Matrix y n y | |||
survival y n y | |||
See ?renv::status() for advice on resolving these issues. | |||
> packageVersion("renv") | |||
[1] ‘1.0.4’ | |||
> renv::snapshot() | |||
The following package(s) will be updated in the lockfile: | |||
# CRAN ----------------------------------------------------------------------- | |||
- codetools [* -> 0.2-19] | |||
- lattice [* -> 0.22-5] | |||
- Matrix [* -> 1.6-1.1] | |||
- survival [* -> 3.5-7] | |||
Do you want to proceed? [Y/n]: | |||
- Lockfile written to "/tmp/test/renv.lock". | |||
> q() | |||
</pre> | |||
Close and open R again. No complaints.
<pre> | |||
- Project '/tmp/test' loaded. [renv 1.0.4] | |||
> renv::status() | |||
No issues found -- the project is in a consistent state. | |||
</pre> | |||
<li>Compare the renv-old.lock and current renv.lock files. ''Matrix, codetools, lattice'' and ''survival'' packages are added. | |||
</ul> | |||
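The comparison of renv-old.lock with renv.lock can also be done programmatically. This is a rough sketch in base R: <code>read_lock_versions()</code> and <code>lock_diff()</code> are hypothetical helpers, and the regular expressions assume the usual pretty-printed lockfile layout with one <code>"Package"</code> and one <code>"Version"</code> field per line. A real JSON parser would be more robust.

```r
# Extract "Package" -> "Version" pairs from a lockfile with regular expressions.
# Base R is used only to keep the sketch dependency-free.
read_lock_versions <- function(path) {
  lines <- readLines(path, warn = FALSE)
  pkg_idx <- grep('^\\s*"Package"\\s*:', lines)
  ver_idx <- grep('^\\s*"Version"\\s*:', lines)
  field <- function(x) sub('^\\s*"[A-Za-z]+"\\s*:\\s*"([^"]*)".*$', "\\1", x)
  pkgs <- field(lines[pkg_idx])
  # For each package record, take the first "Version" line that follows it
  vers <- vapply(
    pkg_idx,
    function(i) field(lines[ver_idx[ver_idx > i][1]]),
    character(1)
  )
  stats::setNames(vers, pkgs)
}

# Report packages that were added, removed, or changed version
lock_diff <- function(old_path, new_path) {
  old <- read_lock_versions(old_path)
  new <- read_lock_versions(new_path)
  common <- intersect(names(old), names(new))
  list(
    added   = setdiff(names(new), names(old)),
    removed = setdiff(names(old), names(new)),
    changed = common[old[common] != new[common]]
  )
}

# Example call (not run): lock_diff("renv-old.lock", "renv.lock")
```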
== A case from 'Survive with Omics' == | |||
https://ocbe-uio.github.io/survomics/survomics.html | |||
<ul> | |||
<li>Create a file ~/renv/survomics/test.R containing all the library() statements
<li>R - | |||
<syntaxhighlight lang='r'> | |||
install.packages("renv") | |||
renv::init(bioconductor = TRUE) | |||
q() | |||
</syntaxhighlight> | |||
<li>R - | |||
<syntaxhighlight lang='r'> | |||
renv::status() | |||
renv::install("psbcGroup") | |||
# fatal error: gsl/gsl_matrix.h: No such file or directory | |||
# Search 'gsl' in https://packagemanager.posit.co/client/#/repos/cran/setup | |||
system("sudo apt-get install -y libgsl0-dev") | |||
renv::install("psbcGroup") | |||
renv::install("nyiuab/BhGLM") | |||
q() | |||
</syntaxhighlight> | |||
<li>R - | |||
<syntaxhighlight lang='r'> | |||
renv::status() | |||
renv::snapshot() | |||
q() | |||
</syntaxhighlight> | |||
<li>R - NO MORE MESSAGES | |||
<li>I added "httpgd" package in "test.R". R - | |||
<syntaxhighlight lang='r'> | |||
renv::status() | |||
renv::install("httpgd") | |||
renv::snapshot() | |||
q() | |||
</syntaxhighlight> | |||
</ul> | |||
== Github examples == | |||
* https://github.com/amyfrancis97/DrivR-Base | |||
== rig system make-orthogonal == | |||
The command rig system make-orthogonal makes the installed versions of R [https://en.wikipedia.org/wiki/Orthogonality#Computer_science orthogonal]; that is, it ensures that different versions of R installed on the same system do not interfere with each other.
{{Pre}} | |||
$ cd ~/Project1 # This does not matter as RStudio does not care about this | |||
$ rig rstudio 4.3-arm64 # Good | |||
[INFO] Running open -n -a RStudio --env RSTUDIO_WHICH_R=/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/R | |||
$ rig rstudio 4.2-arm64 # Error | |||
[ERROR] R 4.2-arm64 is not orthogonal, it cannot run as a non-default. Run `rig system make-orthogonal`. | |||
$ rig system make-orthogonal # Fix the error | |||
[INFO] Running `sudo` for updating the R installations. This might need your password. | |||
Password: | |||
[INFO] Making all R versions orthogonal | |||
$ rig rstudio 4.2-arm64 # No more error, even though RStudio still opens the last project
                        # (not the one in the current working directory)
[INFO] Running open -n -a RStudio --env RSTUDIO_WHICH_R=/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/R | |||
</pre> | |||
== Summary so far == | |||
I have two ways to associate an R project with an R version (at least on my Mac). Both start by installing '''rig''' and using '''rig add''' to install multiple versions of R.
# Use "renv" to create an renv project. At the end, I run '''mv .Rprofile Rprofile'''. This prevents the renv environment from loading automatically (so the current default R version does not matter) and leaves me a backup of the current renv environment. If needed, I can still rename Rprofile back to .Rprofile and launch R/RStudio.
# Use "renv" to create an renv project. Use '''rig rstudio 4.2-arm64''' to launch RStudio and manually switch from the last open project to the desired one.
To use with RStudio IDE, see | |||
* '''How to launch a specific version of R from a specific directory''' from the [[Install_R#rig|rig]] page. It works well when the project directory is an renv directory. | |||
* My current solution; see [[Install_R#Mac|Install R]] (not specifically related to renv). | |||
{{Pre}} | |||
open -n -a RStudio ~/proj/proj.Rproj \ | |||
--env RSTUDIO_WHICH_R=/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/R | |||
</pre> | |||
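The same launch command can be assembled from R, which is handy when switching between several R versions. This is a sketch only: <code>rstudio_command()</code> is a hypothetical helper, and the framework path is the macOS layout shown above.

```r
# Build the arguments for macOS `open` so that RStudio starts a given project
# with a given R version. The helper name is hypothetical.
rstudio_command <- function(rproj, r_version,
                            framework = "/Library/Frameworks/R.framework/Versions") {
  r_bin <- file.path(framework, r_version, "Resources", "R")
  c("-n", "-a", "RStudio", shQuote(path.expand(rproj)),
    "--env", paste0("RSTUDIO_WHICH_R=", r_bin))
}

open_args <- rstudio_command("~/proj/proj.Rproj", "4.2-arm64")
cat("open", open_args, "\n")

# On macOS the command could then be run with:
# system2("open", open_args)
```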
== Videos == | |||
* [https://youtu.be/yjlEbIDevOs Kevin Ushey | renv: Project Environments for R | RStudio (2020)] | |||
* [https://youtu.be/GwVx_pf2uz4 E. David Aja | You should be using renv | RStudio (2022)] | |||
* [https://www.youtube.com/watch?v=N2STULZ1dYo {renv} For Reproducible Analyses] | |||
== Tips == | |||
<ul> | |||
<li>renv::init() will report any syntax errors in the project's R files
{{Pre}} | |||
> renv::init() | |||
WARNING: One or more problems were discovered while enumerating dependencies. | |||
/tmp/Project1/RRR.R | |||
------------------- | |||
ERROR 1: /tmp/Project1/RRR.R:2:1: unexpected '>' | |||
1: # This is a test for renv | |||
2: > | |||
^ | |||
Please see `?renv::dependencies` for more information. | |||
Do you want to proceed? [y/N]: y | |||
</pre> | |||
At the end, it will not record any packages from the '''R file''' in the renv.lock file. When we start R next time, we will see error/warning messages again | |||
<pre> | |||
* Project '/tmp/Project1' loaded. [renv 0.17.0] | |||
WARNING: One or more problems were discovered while enumerating dependencies. | |||
/tmp/Project1/RRR.R | |||
------------------- | |||
ERROR 1: /tmp/Project1/RRR.R:2:1: unexpected '>' | |||
1: # This is a test for renv | |||
2: > | |||
^ | |||
Please see `?renv::dependencies` for more information. | |||
Error: snapshot aborted | |||
Traceback (most recent calls last): | |||
43: source("renv/activate.R") | |||
42: withVisible(eval(ei, envir)) | |||
... | |||
1: stop(condition) | |||
[Previously saved workspace restored] | |||
</pre> | |||
<li>Even for a very simple R file/case, I find "rm -rf renv" will fail if I decide to "clean" the directory. | |||
{{Pre}} | |||
rm: cannot remove 'renv/sandbox/R-4.3/x86_64-pc-linux-gnu/9a444a72/compiler': Permission denied | |||
... | |||
</pre> | |||
<li> For code chunks that you’d explicitly like renv to ignore, you can include '''renv.ignore=TRUE''' in the chunk header | |||
<li>[https://rstudio.github.io/renv/reference/dependencies.html#ignoring-files Ignoring Files]: '''.gitignore''' and '''.renvignore''' | |||
<li>[https://rstudio.github.io/renv/reference/dependencies.html#errors Errors]: Use something like ''renv::settings$snapshot.type("explicit")'' Check out the [https://github.com/rstudio/renv/issues?page=2&q=snapshot.type github issues page] | |||
</ul> | |||
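To catch the syntax-error problem from the first tip before calling renv::init(), the project's R files can be parsed up front. A minimal sketch in base R, where <code>find_syntax_errors()</code> is a hypothetical helper; it only looks at <code>.R</code> files and skips the project's own <code>renv/</code> folder, whereas renv also scans other file types.

```r
# Parse every .R file under `dir` and collect the parse errors, if any.
find_syntax_errors <- function(dir = ".") {
  rel <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE)
  rel <- rel[!grepl("^renv/", rel)]  # skip renv's own infrastructure
  files <- file.path(dir, rel)
  msgs <- vapply(files, function(f) {
    tryCatch({
      parse(file = f)   # only parses the file, nothing is evaluated
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1))
  msgs[!is.na(msgs)]    # named by file path; empty if everything parses
}

# Example call (not run): find_syntax_errors("/tmp/Project1")
```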
== Hash == | |||
[https://stackoverflow.com/a/63961221 renv - manually overwrite package version in lock file]. The hash is used for caching; it allows renv::restore() to restore a package from the global renv cache if available, thereby avoiding a retrieve + build + install of the package. | |||
If it is not set, then renv will not use the cache and instead always try to retrieve the package from the declared source. | |||
== Cache and path customization == | |||
* [https://rstudio.github.io/renv/reference/paths.html ?renv::paths]. The path can be customized. | |||
* [https://arca-dpss.github.io/manual-open-science/rocker-chapter.html Chapter 13 Rocker] in The Open Science Manual. Make Your Scientific Research Accessible and Reproducible. | |||
* [https://github.com/robertdj/renv-docker A guide to getting {renv} projects into Docker images] | |||
On Linux all R packages under "renv/library/R-4.3/x86_64-pc-linux-gnu/" folder are just soft links to folders in the renv '''cache directory'''. So the project specific renv directory does not take much space. | |||
On my macOS, the cache directory is | |||
<pre> | |||
> renv::paths$cache() | |||
[1] "/Users/USERNAME/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3/aarch64-apple-darwin20" | |||
</pre> | |||
On my Linux system, the cache directory is | |||
<syntaxhighlight lang='r'> | |||
> renv::paths$cache() | |||
[1] "/home/USERNAME/.cache/R/renv/cache/v5/R-4.3/x86_64-pc-linux-gnu" | |||
</syntaxhighlight> | |||
On Windows, the cache directory is (replace $USER with the username) | |||
<pre> | |||
C:/Users/$USER/AppData/Local/R/cache/R/renv | |||
</pre> | |||
=== isolate() === | |||
* [https://stackoverflow.com/a/68170298 How can I copy and entire renv based project to a new PC (which does not have internet access)?] | |||
* [https://rstudio.github.io/renv/reference/isolate.html ?isolate] - Copy packages from the renv cache directly into the project library, so that the project can continue to function independently of the renv cache. Remember: normally the R packages under the renv/ directory are soft links to the renv cache directory. If we use isolate(), the R packages will be "copied" instead of "linked" to the project/renv folder.
* If you want to undo the isolation and revert back to using the renv cache, you can delete the packages in your project library and then call renv::restore(). This will reinstall the packages from the renv cache and create symlinks in your project library. | |||
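Whether a project library is currently linked to the cache or isolated can be checked by looking at the symlinks. A small sketch in base R; <code>library_links()</code> is a hypothetical helper and relies on <code>Sys.readlink()</code>, so it is mainly meaningful on Linux and macOS.

```r
# List the entries of a library folder and report which ones are symlinks.
library_links <- function(lib_dir) {
  pkgs <- list.files(lib_dir, full.names = TRUE)
  target <- Sys.readlink(pkgs)  # "" for regular folders, the link target otherwise
  data.frame(
    package = basename(pkgs),
    linked  = !is.na(target) & nzchar(target),
    target  = target,
    stringsAsFactors = FALSE
  )
}

# Example call inside a project (not run):
# library_links("renv/library/R-4.3/x86_64-pc-linux-gnu")
```

After isolate(), the <code>linked</code> column should be FALSE for every package; after undoing the isolation it should be TRUE again.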
== Set the default repository: PPM == | |||
<ul> | |||
<li>According to the [https://cran.r-project.org/web/packages/renv/news/news.html NEWS], renv 1.0.0 now uses Posit Public Package Manager by default, for new projects where the repositories have not already been configured externally. | |||
<li>The following works on Ubuntu 24.04 & R 4.4.0.<br />
Method 1: See [[R_packages#Posit_Package_Manager/RStudio_Package_Manager/PPM|R packages -> Posit Package Manager/RStudio Package Manager/PPM]]. <br />
Method 2:
{{Pre}} | |||
renv::install("Rcpp") # install.packages() also works | |||
renv::install("glmnet", repos = "https://packagemanager.posit.co/cran/latest") | |||
# require gfortran, so install.packages() failed | |||
renv::install("RcppArmadillo", repos = "https://packagemanager.posit.co/cran/latest") | |||
# install.packages() compilation failed | |||
renv::install("RcppEigen", repos = "https://packagemanager.posit.co/cran/latest") | |||
# install.packages() compilation failed | |||
</pre> | |||
Question: why does renv::restore() download source code from CRAN instead of binaries?
<li>https://rstudio.github.io/renv/reference/config.html suggests using
<pre> | |||
# default is TRUE | |||
options(renv.config.ppm.enabled = TRUE) | |||
</pre> | |||
<li>https://rstudio.github.io/renv/reference/settings.html suggests using
<pre> | |||
# default is TRUE | |||
options(renv.settings.ppm.enabled = TRUE) | |||
</pre> | |||
<li>[https://docs.posit.co/ide/user/ide/guide/environments/r/packages.html R Package Repositories] from posit. | |||
<li>[https://blog.djnavarro.net/posts/2022-01-10_setting-cran-repositories/ Setting CRAN repository options] 2022 Jan. Search for '''PPM''' (Posit Package Manager). | |||
</ul> | |||
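Instead of passing <code>repos=</code> to every renv::install() call, the default repository can be pointed at PPM once per session. This is a sketch under assumptions: <code>ppm_repo()</code> is a hypothetical helper, and the <code>__linux__/<distro></code> URL form is the one PPM uses to serve Linux binaries ("noble" being the Ubuntu 24.04 code name); please confirm the exact URL on the PPM setup page.

```r
# Compose a PPM CRAN URL. With a distro code name the URL points at the
# Linux binary repository; without one it is the plain source repository.
ppm_repo <- function(distro = NULL, snapshot = "latest",
                     base = "https://packagemanager.posit.co/cran") {
  if (is.null(distro)) {
    paste(base, snapshot, sep = "/")
  } else {
    paste(base, "__linux__", distro, snapshot, sep = "/")
  }
}

# Make PPM the default CRAN repository for this session
options(repos = c(CRAN = ppm_repo("noble")))
getOption("repos")
```

Putting the <code>options()</code> line into a project or user <code>.Rprofile</code> would make the setting persistent.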
=== Experiment === | |||
This assumes that no packages have been installed in the project folder yet.
<ul> | |||
<li>Create an R file with just one line: library(glmnet) | |||
<li>Launch a docker container | |||
<pre> | |||
docker run --rm -it -v $(pwd):/home/rstudio rocker/r-ver:4.3.3 | |||
</pre> | |||
<li>Inside the container, install the packages and create renv.lock. '''A crucial point to remember is that packages must be installed via the Posit Public Package Manager (PPM) in the first place. If they are not, enabling PPM later will not restore them from PPM, because only packages originally installed through PPM can be restored through it.'''
<pre> | |||
setwd("/home/rstudio") | |||
install.packages('renv', ask = F) | |||
options(renv.settings.ppm.enabled = TRUE) | |||
renv::init() # interactively. Enter 'y' to allow to create a local cache directory | |||
q() | |||
</pre> | |||
<li>Clean up before bootstrapping
<pre> | |||
sudo rm -rf renv | |||
sudo rm .Rprofile | |||
</pre> | |||
<li>Final testing | |||
<pre> | |||
docker run --rm -it -v $(pwd):/home/rstudio -w /home/rstudio rocker/r-ver:4.3.3 | |||
</pre> | |||
<pre> | |||
The following package(s) are missing entries in the cache: | |||
- foreach | |||
- glmnet | |||
- iterators | |||
- Rcpp | |||
- RcppEigen | |||
- shape | |||
These packages will need to be reinstalled. | |||
- Project '/home/rstudio' loaded. [renv 1.0.7] | |||
The following package(s) have broken symlinks into the cache: | |||
- foreach | |||
- glmnet | |||
- iterators | |||
- Rcpp | |||
- RcppEigen | |||
- shape | |||
Use `renv::repair()` to try and reinstall these packages. | |||
- None of the packages recorded in the lockfile are currently installed. | |||
- Would you like to restore the project library? [y/N]: y | |||
... | |||
# Installing packages ------------------------- | |||
... | |||
- Installing glmnet ... OK [installed binary and cached in 1.5s] | |||
packageVersion("glmnet") | |||
# [1] ‘4.1.8’ | |||
</pre> | |||
</ul> | |||
== Private R packages == | |||
[https://rstudio.github.io/renv/articles/cellar.html The Package Cellar] | |||
== Local R packages == | |||
Deprecated? | |||
* https://rstudio.github.io/renv/articles/local-sources.html | |||
* Since local R packages (whether source or binary) are not part of '''renv.lock''', the original location of these packages is not important when we first install them.
* To restore local R packages, put their source files into the '''renv/local''' directory.
<pre> | |||
# mkdir renvbiotrip | |||
setwd("renvbiotrip") | |||
renv::init() # we shall restart R according to the instruction | |||
# * Initializing project ... | |||
# * Discovering package dependencies ... Done! | |||
# * Copying packages into the cache ... Done! | |||
# The following package(s) will be updated in the lockfile: | |||
# CRAN =============================== | |||
# - renv [* -> 0.10.0] | |||
# * Lockfile written to '/tmp/renvbiotrip/renv.lock'. | |||
# * Project '/tmp/renvbiotrip' loaded. [renv 0.10.0] | |||
# * renv activated -- please restart the R session. | |||
renv::install("~/Downloads/MyPackage_0.1.1.tar.gz") | |||
# 1. The above command will take care of the dependencies. Cool !
# That is, we don't need to use the remotes package. | |||
# 2. The output will show if packages are installed from | |||
# 'linked cache' or from source | |||
renv::settings$snapshot.type("all") | |||
renv::snapshot() | |||
# It will give a message some package(s) were installed from an unknown source | |||
# renv may be unable to restore these packages in the future. | |||
</pre> | |||
Since the versions of the dependencies change over time, a renv.lock file created yesterday will likely differ from one created today (package versions and hashes).
Now we are ready to test the restoration. | |||
<ul> | |||
<li> | |||
Pass renv.lock and MyPackage_0.1.1.tar.gz to other people (different instruction if we pass the project repository?). Suppose we have copied renv.lock to renvbiotrip/ directory on a new computer. | |||
<pre> | |||
# mkdir renvbiotrip | |||
## Copy renv.lock to renvbiotrip/ | |||
# mkdir renvbiotrip/renv/local | |||
## Copy MyPackage_0.1.1.tar.gz (private packages) to renvbiotrip/renv/local | |||
install.packages("renv") | |||
renv::restore() # install the packages declared in renv.lock | |||
# The output will show if packages are installed from | |||
# 'linked cache' or from source | |||
library(MyPackage) # verify | |||
MyPackage::foo() # test | |||
</pre> | |||
</li> | |||
<li>We can test renv.lock in a Docker container from another directory to mimic the way of passing the file to other people. For example, | |||
<pre> | |||
docker run --rm -it -v $(pwd):/home/docker -w /home/docker r-base:4.0.0 | |||
</pre> | |||
</li> | |||
<li>We can create a docker image based on the renv.lock and MyPackage.tar.gz files. See the '''renvbiotrip''' repository.</li> | |||
</ul> | |||
Note that | |||
* If we issue renv::restore() instead of renv::init() on the destination machine, the packages will be installed into the global (non-project) library.
* It seems '''renv::init()''' is equivalent to '''renv::activate()''' AND '''renv::restore()''' on the destination machine. | |||
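The manual "copy the tarball to renv/local" step on the destination machine can be scripted. A minimal sketch in base R: <code>stage_local_source()</code> is a hypothetical helper, and the <code>renv/local</code> location is the one described in these notes (newer renv documentation also talks about a "cellar", so the folder may need adjusting).

```r
# Copy a package tarball into <project>/renv/local so that renv::restore()
# can find it there. Returns the path of the staged copy.
stage_local_source <- function(tarball, project = ".") {
  local_dir <- file.path(project, "renv", "local")
  dir.create(local_dir, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(tarball, local_dir, overwrite = TRUE)
  if (!ok) stop("could not copy ", tarball)
  file.path(local_dir, basename(tarball))
}

# Example (not run):
# stage_local_source("~/Downloads/MyPackage_0.1.1.tar.gz", "renvbiotrip")
# renv::restore()
```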
== The project library is out of sync with the lockfile == | |||
We'll get this message if we start R with a version different from what is in the "renv.lock" file. See [[#install_a_package_on_an_old_version_of_R|install a package on an old version of R]]. | |||
== graph == | |||
* Search for "graph" on https://rstudio.github.io/renv/index.html | |||
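* We install the [https://cran.r-project.org/web/packages/igraph/index.html igraph] package first before we can use renv::graph(). It seems no extra software was needed to install igraph. Still I got an error. The message refers to an ''edges'' argument, which suggests that the unqualified graph() call below was dispatched to igraph::graph() (masking after library(igraph)) rather than to renv::graph(); calling renv::graph(...) explicitly should avoid the clash.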
<pre> | |||
> graph(root = "devtools", leaf = "rlang") | |||
Error in inherits(edges, "formula") : | |||
argument "edges" is missing, with no default | |||
</pre> | |||
== renv issues == | |||
* [https://www.r-bloggers.com/2024/05/things-that-can-go-wrong-when-using-renv/ Things that can go wrong when using renv] | |||
== Docker == | |||
* https://environments.rstudio.com/docker.html | |||
* [https://biolitika.si/pin-r-package-versions-using-docker-and-renv.html Pin package versions in your production Docker image] | |||
<ul> | |||
<li>[https://rstudio.github.io/renv/articles/docker.html Using renv with Docker]. Note that there are two ways to use renv with Docker. One is to include the package installation in the Dockerfile, which embeds the packages into the image. The other is to add the appropriate R packages when the container is run.
<ol> | |||
<li>Creating Docker Images with renv (see [https://environments.rstudio.com/docker.html here] for 3 example Registries: Rocker Project/R-Hub/RStudio). Note: the '''r-base:X.X.X''' image does not include several important libraries like "curl". If we use r-base:X.X.X as the base image, we will run into errors when we call '''renv::restore()'''. Docker images from [https://bioconductor.org/help/docker/ Bioconductor] (which are based on rocker/rstudio) include these utilities.
<pre> | |||
RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))" | |||
WORKDIR /home/docker | |||
COPY renv.lock renv.lock | |||
ENV RENV_PATHS_LIBRARY renv/library | |||
RUN R -e 'renv::restore()' | |||
CMD ["R"] | |||
</pre> | |||
</li> | |||
<li>Running Docker Containers with renv. Note that repository name must be lowercase. | |||
<pre> | |||
docker build -t projectname . | |||
docker run --rm -it projectname # OR | |||
docker run --rm -it -v $(pwd):/home/docker projectname | |||
</pre> | |||
Question: how do we update a package within a container? (1) Start the container as root and update the packages in the container. (2) Run system("su docker") to switch to the user 'docker'. (3) Running system("su docker") exits R and drops to the shell; run "whoami" to double-check the current user and type "R" to enter R again.
Another simple but inferior way to test the Docker method is the following, assuming <renv.lock> is saved in the ProjectDir directory and the ProjectDir directory contains neither ''renv'' nor ''.Rprofile''. The big drawback of this approach is that the created renv directory and <.Rprofile> belong to the user ''root''.
<pre> | |||
docker run --rm -it -v "$(pwd)/ProjectDir":/home r-base:4.0.0
# Then, inside the R session started by the container:
install.packages("renv")
setwd("/home")
renv::init()
</pre> | |||
</li> | |||
<li>Back up images. [[Docker#How_to_copy_Docker_images_from_one_host_to_another_without_using_a_repository|How to copy Docker images from one host to another without using a repository]] by using the '''docker save''' command. | |||
</li> | |||
</ol> | |||
</li> | |||
</ul> | |||
* [https://www.r-bloggers.com/2021/08/setting-up-a-transparent-reproducible-r-environment-with-docker-renv/ Setting up a transparent reproducible R environment with Docker + renv] | |||
* [https://www.r-bloggers.com/2023/06/a-gentle-introduction-to-docker/ A Gentle Introduction to Docker]. docker build & renv. | |||
== pracpac package == | |||
[https://cran.r-project.org/web/packages/pracpac/index.html pracpac] - Practical 'R' Packaging in 'Docker' | |||
== Github actions == | |||
[https://orchid00.github.io/actions_sandbox/testing-with-renev.html Chapter 5 Testing with a reproducible environment] | |||
= checkpoint =
= Docker & Singularity =
[[Docker|Docker]]
= targets package = | |||
<ul> | |||
<li>[https://ropensci.org/blog/2021/02/03/targets/ targets: Democratizing Reproducible Analysis "Pipelines"] Will Landau. | |||
* It is similar to the Linux [https://linuxopsys.com/topics/make-command-in-linux make] command. | |||
* It’s designed to help with computationally demanding analysis projects. The package skips costly runtime for tasks that are already up to date. | |||
* An example. This pipeline reads in a CSV file, performs a transformation, and then generates a summary and a plot. | |||
:<syntaxhighlight lang='r'> | |||
# Save the pipeline below as _targets.R in the project root
library(targets)
# Packages that the targets need at run time
tar_option_set(packages = "ggplot2")
# Define the pipeline: _targets.R must end with a list of targets
list(
  tar_target(
    data_file,
    "data.csv",       # Assume you have a CSV file named "data.csv"
    format = "file"   # Track the file so that edits invalidate downstream targets
  ),
  tar_target(
    raw_data,
    read.csv(data_file)
  ),
  tar_target(
    transformed_data,
    transform(raw_data)  # Perform your transformation here
  ),
  tar_target(
    data_summary,        # Not named "summary", to avoid clashing with summary()
    summary(transformed_data)
  ),
  tar_target(
    scatter_plot,
    ggplot(transformed_data, aes(x = x, y = y)) +
      geom_point() +
      theme_minimal()
  )
)
# Run the pipeline from the R console (not from inside _targets.R):
# targets::tar_make()
</syntaxhighlight> | |||
: If the data.csv file doesn’t change, and the transformation function remains the same, the targets package won’t re-run those steps. It will directly use the results from the previous run, saving computational resources. This is the power of the targets package: it intelligently determines which parts of your analysis need to be updated and which parts can be skipped. | |||
: Please replace "data.csv" and transform() with your actual data file and transformation function. Also, replace aes(x = x, y = y) with the actual variables you want to plot. | |||
<li>[https://www.brodrigues.co/blog/2023-05-08-dock_dev_env/ Why you should consider working on a dockerized development environment] | |||
<li>[https://youtu.be/zs6LtT0PavM Building reproducible analytical pipelines with R at ReproTea (2023-07-19)]. renv, targets, docker, Dockerfile (packages from posit) and alternatives (Podman, Nix). Nice talk. | |||
</ul> | |||
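The "skip what is already up to date" idea can be illustrated without the targets package. The following is a toy sketch in base R, not how targets is implemented (targets also tracks the code and upstream targets, not just one input file); <code>make_step()</code> is a hypothetical helper that recomputes a result only when the MD5 checksum of its input file changes.

```r
# Return a function that re-runs `compute(input_file)` only when the
# checksum of `input_file` differs from the one seen on the previous run.
make_step <- function(input_file, compute) {
  last_hash <- NULL
  last_value <- NULL
  function() {
    hash <- unname(tools::md5sum(input_file))
    if (!identical(hash, last_hash)) {
      # First run, or the input changed: recompute and remember the result
      last_value <<- compute(input_file)
      last_hash <<- hash
      message("rebuilt")
    } else {
      message("skipped: up to date")
    }
    last_value
  }
}

# Example (not run):
# n_rows <- make_step("data.csv", function(p) nrow(read.csv(p)))
# n_rows()  # reads the file
# n_rows()  # reuses the stored value while data.csv is unchanged
```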
= rix package = | |||
* [https://github.com/b-rodrigues/rix Rix: Reproducible Environments with Nix] | |||
* [https://www.brodrigues.co/tags/nix/ Reproducible data science with Nix] by Bruno Rodrigues. | |||
* Videos: | |||
** [https://youtu.be/c1LhgeTTxaI Reproducible R development environments with Nix] 8/6/2023 | |||
** [https://youtu.be/R3t83-2aNwY Nix for R users with {rix} - running an old project with an old R and old packages] 8/25/2023 | |||
** [https://youtu.be/VXB4e11lHtw Reproducible R development on Github Actions with Nix] 11/12/2023 | |||
** [https://m.youtube.com/watch?v=eWt1oXatxw8 rix: An R package for reproducible dev environments with Nix (FOSDEM 2024)] 2/6/2024 and [https://raw.githack.com/b-rodrigues/fosdem2024_pres/targets-runs/rendered_slides/fosdem_pres.html#/title-slide Slide] | |||
== Building reproducible analytical pipelines with R == | |||
* [https://www.brodrigues.co/ Bruno Rodrigues], [https://www.brodrigues.co/about/books/ book]. | |||
* https://raps-with-r.dev/. | |||
* https://rap4mads.eu/index.html | |||
== Nix == | |||
<ul> | |||
<li>https://nixos.org/download.html | |||
<li>Current (2024/2/27) version 2.20.3. | |||
<syntaxhighlight lang='sh'> | |||
$ sh <(curl -L https://nixos.org/nix/install) --daemon | |||
$ nix-shell -p R rPackages.ggplot2 | |||
# to install a package | |||
$ nix-env -iA nixos.librewolf | |||
$ sudo nix-env -iA nixos.librewolf | |||
# to remove an installed package, | |||
$ nix-env -e [package_name] | |||
</syntaxhighlight> | |||
<li>Plots can be shown when we call a plot function in a nix interactive shell. | |||
<li>For some reason, the Bioconductor packages will need to compile when I run '''nix-build'''. | |||
<li>What is the difference between '''nix-env''' and '''nix-shell'''?
* '''nix-env''' is a global installation. It is similar to '''traditional package managers''' like '''apt, yum''', or '''brew'''. It is not ideal for reproducibility. | |||
* '''nix-shell''' is a local installation. These packages are not installed globally. The environment is temporary and isolated. A nix-shell will temporarily modify your $PATH environment variable. This can be used to try a piece of software before deciding to permanently install it. By specifying packages in a shell.nix or default.nix file, you can ensure consistent development environments across different machines or projects. | |||
<pre> | |||
$ nix-env -iA nixpkgs.rPackages.dplyr | |||
$ nix-shell -p rPackages.dplyr | |||
</pre> | |||
<li>[https://ostechnix.com/getting-started-nix-package-manager/ Getting Started With Nix Package Manager: A Beginner’s Guide 2024] | |||
<li>[https://ostechnix.com/install-openssh-nixos/ How To Install openSSH on NixOS] | |||
<li>NixOS | |||
* [https://tech.aufomm.com/my-nixos-journey-intro-and-installation/ My NixOS Journey - Intro and Installation] | |||
</li> | |||
</ul> | |||
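A minimal sketch of the shell.nix idea mentioned in the nix-env vs nix-shell item, assuming a channel is available as <nixpkgs>. The package list is only an illustration, and the file does not pin a nixpkgs revision, so the environment is only as reproducible as the channel it resolves to.

```shell
# Write a minimal shell.nix into the current project directory.
# Assumption: <nixpkgs> resolves to a channel that provides R and rPackages.
cat > shell.nix <<'EOF'
{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  # illustrative package list; replace with what the project needs
  buildInputs = [
    pkgs.R
    pkgs.rPackages.dplyr
    pkgs.rPackages.ggplot2
  ];
}
EOF

echo "wrote $(pwd)/shell.nix"
# With Nix installed, running `nix-shell` in this directory reads shell.nix
# and opens a shell in which R can load dplyr and ggplot2.
```

Keeping shell.nix under version control next to the analysis code lets collaborators start the same environment with a single nix-shell call.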
= Dev Containers =
[https://blog.revolutionanalytics.com/2022/08/dev-containers-for-r.html Easy R Tutorials with Dev Containers]
= conda, mamba =
[https://youtu.be/QI2Qg_1aySc How to create a conda or mamba environment for R programming to enhance reproducibility (CC230)] by Riffomonas Project
= Snakemake =
* [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03774-1 Hypercluster: a flexible tool for parallelized unsupervised clustering optimization]
* https://snakemake.readthedocs.io/en/stable/tutorial/setup.html#run-tutorial-for-free-in-the-cloud-via-gitpod
* https://hpc.nih.gov/apps/snakemake.html
* [https://academic.oup.com/bioinformatics/article/28/19/2520/290322?login=false Snakemake—a scalable bioinformatics workflow engine] (paper, 2012)
* [https://youtu.be/r9PWnEmz_tc An introduction to Snakemake tutorial for beginners (CC248)] by Riffomonas Project
= Papers =
[https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03817-7 High-throughput analysis suggests differences in journal false discovery rate by subject area and impact factor but not open access status]
= Share your code and data =
* [https://www.bmj.com/content/384/bmj.q324.full Mandatory data and code sharing for research published by The BMJ] 2024
* [https://www.r-bloggers.com/2023/11/deposits-r-package-delivers-a-common-workflow-for-r-users/ deposits R Package Delivers a Common Workflow for R Users] 2023/11/30
* [https://zenodo.org/ zenodo.org] which has been used by
** [https://zenodo.org/record/3926915 Demystifying "drop-outs" in single-cell UMI data]
** https://zenodo.org/record/1225670 [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1438-9 UMI-count modeling and differential expression analysis for single-cell RNA sequencing]
** [https://twitter.com/seandavis12/status/1715351568860524819 Zenodo empowers sharing research output of arbitrary size and format and receives @NIH and @NIHDataScience support for data sharing as a Generalist Repository].
* [https://osf.io/ OSF] which has been used by
** [https://osf.io/g4w28/ Methods for correcting inference based on outcomes predicted by machine learning]
** [https://osf.io/gcjn6/ Predictive performance of logistic regression after penalization and variance decomposition - A simulation study], [https://onlinelibrary.wiley.com/doi/10.1002/bimj.202200108 the paper] 2023
* codeocean.
** [https://codeocean.com/capsule/9934440/tree/v1 A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples]. The R code can be downloaded with git (Capsule -> Export -> Clone via Git). The data (3.4G zip file) has to be downloaded manually. The environment panel shows which packages have to be installed (apt-get, Bioconductor, R-CRAN, R-Github). It seems "Export" is more complete than "Clone via Git"; it even includes a Dockerfile.
** [https://github.com/dylkot/cNMF/ Consensus Non-negative Matrix factorization (cNMF) v1.2]
= Misc =
</syntaxhighlight>
* [https://cran.rstudio.com/web/packages/reproducible/index.html reproducible]: A Set of Tools that Enhance Reproducibility Beyond Package Management
* [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007881 Improving reproducibility in computational biology research] 2020
Revision as of 16:31, 15 May 2024
Common Workflow Language (CWL)
- https://www.commonwl.org/
- Workflow systems turn raw data into scientific knowledge. Pipeline, Snakemake, Docker, Galaxy, Python, Conda, Workflow Definition Language (WDL), Nextflow. The best approach is to embed the workflow in a container; see Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics by Baichoo 2018.
- Simplifying the development of portable, scalable, and reproducible workflows Piccolo 2021.
R
- CRAN Task View: Reproducible Research
- Rcwl package
- Reproducible Research: What to do When Your Results Can’t Be Reproduced. 3 danger zones.
  - R session context
    - R version
    - Package versions
    - Using set.seed() for a reproducible randomization
    - Floating point accuracy
  - Operating System (OS) context
    - System package versions
    - System locale
    - Environment variables
  - Data versioning
- A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker, Slides, Talks & Video. The whole idea is implemented in the R package repro. The package creates an R project template that can be used via RStudio -> New Project -> Create Example Repro Template. Note that the Makefile and Dockerfile can be inferred from the markdown.Rmd file. Note that this approach does not make use of the renv package, and it cannot handle Bioconductor packages. Four elements:
- Git folder of source code for version control (R project)
- Makefile. Make is a “recipe” language that describes how files depend on each other and how to resolve these dependencies.
- Docker software environment (Containerization)
- RMarkdown (dynamic document generation)
    automake() # Create '.repro/Dockerfile_packages',
               # '.repro/Makefile_Rmds' & 'Dockerfile'
               # and open <Makefile>
    # Modify <Makefile> by following the console output
    rerun()    # inspects the files of a project and suggests a way to
               # reproduce the project. So just follow the console output
               # by opening a terminal and typing
    make docker &&
    make -B DOCKER=TRUE
    # The above will generate the output html file in your browser
In the end, it calls the following command (taken from the console output), where 'reproproject' in this example is the Docker image name. The image name is the same as my project name, except that it is automatically converted to lowercase.
    docker run --rm --user 368262265 \
      -v "/Full_Path_To_Project":"/home/rstudio/" \
      reproproject Rscript \
      -e 'rmarkdown::render("/home/rstudio//markdown.Rmd", "all")'
- Advanced Reproducibility in Cancer Informatics
- Teaching reproducibility and responsible workflow (2023 JSM)
- An overview of what’s out there for reproducibility with R 2023/10/5
- Building Reproducible Analytical Pipelines with R by Dr. Bruno André Rodrigues Coelho | Tunis R User 2023/12/9.
Rmarkdown
Rmarkdown package
packrat
- CRAN & Github
- Bioconductor related issues
- Videos:
- Packrat will not only store all packages, but also all project files.
- Packrat is integrated in RStudio’s user interface. It allows you to share projects with co-workers easily. See Using Packrat with RStudio.
- limitations.
- XML package needs to install some OS library libxml2. So it is not just R package issue.
- Ubuntu goodies
- Git and packrat. The packrat/src directory can be very large. If you don't want them available in your git-repo, you simply add packrat/src/ to the .gitignore. But, this will mean that anyone accessing the git-repo will not have access to the package source code, and the files will be downloaded from CRAN, or from wherever the source line dictates within the packrat.lock file.
- A scenario in which we need packrat: suppose we are developing a package in the current R-3.5.X. Our package requires the 'doRNG' package, which depends on the 'rngtools' package. A few months later a new R (3.6.0) is released, and a new release (1.3.1.1) of 'rngtools' requires R-3.6.0. If we then try to install 'doRNG' in R-3.5.x, it fails with an error: dependency 'rngtools' is not available for package 'doRNG'.
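The .gitignore step from the "Git and packrat" item can be sketched as follows. add_ignore is a hypothetical helper (it is not part of git or packrat) that appends a pattern only when it is not already listed, so it is safe to run more than once.

```shell
# Add a pattern to a .gitignore file unless it is already listed.
# add_ignore is a hypothetical helper, not part of git or packrat.
add_ignore() {
  pattern="$1"
  file="${2:-.gitignore}"
  touch "$file"
  # -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}

# Keep the downloaded package sources out of the repository.
add_ignore 'packrat/src/'
```

With packrat/src/ ignored, packrat.lock is what tells packrat::restore() where to download each package source again.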
Create a snapshot
- Do we really need to call packrat::snapshot()? The walkthrough page says it is not needed, but in my testing the lock file was not updated without it.
- I got an error when packrat tried to fetch the source code of two packages that come from Bioconductor and a local repository: packrat tried to fetch both of them from CRAN.
- In the normal case, the packrat/packrat.lock file contains two entries in the 'Repos' field (line 4).
- The cause of the error was that I ran snapshot() after I had quit R and started it again. The solution is to add the bioc and local repositories to options(repos).
- So why is it important to run snapshot()?
- Check out the forum.
    > dir.create("~/projects/babynames", recu=T)
    > packrat::init("~/projects/babynames")
    Initializing packrat project in directory:
    - "~/projects/babynames"
    Adding these packages to packrat:
                _
        packrat   0.4.9-3
    Fetching sources for packrat (0.4.9-3) ... OK (CRAN current)
    Snapshot written to '/home/brb/projects/babynames/packrat/packrat.lock'
    Installing packrat (0.4.9-3) ... OK (built source)
    Initialization complete!
    Unloading packages in user library:
    - packrat
    Packrat mode on. Using library in directory:
    - "~/projects/babynames/packrat/lib"
    > install.packages("reshape2")
    > packrat::snapshot()
    > system("tree -L 2 ~/projects/babynames/packrat/")
    /home/brb/projects/babynames/packrat/
    ├── init.R
    ├── lib
    │   └── x86_64-pc-linux-gnu
    ├── lib-ext
    │   └── x86_64-pc-linux-gnu
    ├── lib-R            # base packages
    │   └── x86_64-pc-linux-gnu
    ├── packrat.lock
    ├── packrat.opts
    └── src
        ├── bitops
        ├── glue
        ├── magrittr
        ├── packrat
        ├── plyr
        ├── Rcpp
        ├── reshape2
        ├── stringi
        └── stringr
Restoring snapshots
Suppose a packrat project was created on Ubuntu 16.04 and we now want to repeat the analysis on Ubuntu 18.04. We first copy the whole project directory ('babynames') to Ubuntu 18.04. Then we delete the library subdirectory ('packrat/lib'), which contains binary files (*.so) that do not work on the new OS. After deleting the library subdirectory, we start R from the project directory. If we now run packrat::restore(), it will re-install all missing libraries. Bingo! NOTE: Maybe I should use packrat::bundle() instead of manually copying the whole project folder.
Note: some OS level libraries (e.g. libXXX-dev) need to be installed manually beforehand in order for the magic to work.
    $ rm -rf ~/projects/babynames/packrat/lib
    $ cd ~/projects/babynames/
    $ R
    >
    > packrat::status()
    > remove.packages("plyr")
    > packrat::status()
    > packrat::restore()
Workflow
    setwd("ProjectDir")
    packrat::init()
    packrat::on() # packrat::search_path()
    install.packages()
    # For personal packages stored locally
    packrat::set_opts(local.repos = "~/git/R")
    packrat::install_local("digest") # dir name of the package
    library(YourPackageName) # double check all dependent ones have been installed
    packrat::snapshot()
    packrat::bundle()
A bundle file (*.tar.gz) will be created under the ProjectDir/packrat/bundles directory (the default location of packrat::bundle()). Note that this tar.gz file includes the whole project folder.
To unbundle the project in a new R environment/directory:
    setwd("NewDirectory") # optional
    packrat::unbundle(FullPathofBundleTarBall, ".") # this will create 'ProjectDir'
      # CPU is more important than disk speed
      # At the end, it will show the project has been unbundled and restored at ...
    setwd("ProjectDir")
    packrat::packrat_mode() # on
    .libPaths() # verify
    library() # Expect to see packages in our bundle
    # packrat::on()
Example 1: The above method works for packages from Bioconductor; e.g. S4Vectors, which depends on BiocGenerics & BiocVersion only. However, the Bioconductor project does not have a snapshot repository like MRAN, so it is difficult to reproduce the environment for an earlier release of Bioconductor.
Example 2: bundle our in-house R package for future reproducibility.
Set Up a Custom CRAN-like Repository
See https://rstudio.github.io/packrat/custom-repos.html. Note the personal repository name ('sushi' in this example) used in "Repository" field of the personal package will be used in <packrat/packrat.lock> file. So as long as we work on the same computer, it is easy to restore a packrat project containing packages coming from personal repository.
- packrat::init()
- packrat::snapshot(), packrat::restore()
- packrat::clean()
- packrat::status()
- packrat::install_local() # http://rstudio.github.io/packrat/limitations.html
- packrat::bundle() # see @28:44 of the video, packrat::unbundle() # see @29:17 of the same video. This will rebuild all packages
- packrat::on(), packrat::off()
- packrat::get_opts()
- packrat::set_opts() # http://rstudio.github.io/packrat/limitations.html
- packrat::opts$local.repos("~/local-cran")
- packrat::opts$external.packages(c("devtools")) # break the isolation
- packrat::extlib()
- packrat::with_extlib()
- packrat::project_dir(), .libPaths()
Warning
- If we download and modify some function definition from a package in CRAN without changing DESCRIPTION file or the package name, the snapshot created using packrat::snapshot() will contain the package source from CRAN instead of local repository. This is because (I guess) the DESCRIPTION file contains a field 'Repository' with the value 'CRAN'.
Docker
- This is a minimal example that installs a single package each from CRAN, bioconductor, and github to a Docker image using packrat.
- All operations are done in the container. So the host OS does not need to have R installed.
- The R script will install packrat in the container. It will also initialize packrat in the working directory and install R packages there. But in the packrat::snapshot() it chooses snapshot.sources = FALSE. The goal is to generate packrat.lock file.
- The first part of generating packrat.lock is not quite right since the file was generated in the container only. We should use -v in the docker run command. The github repository at https://github.com/joelnitta/docker-packrat-example has fixed the problem.
    $ git clone https://github.com/joelnitta/docker-packrat-example.git
    $ cd docker-packrat-example
    # Step 1: create the 'packrat.lock' file
    $ nano install_packages.R
    # note: nano is not available in the rstudio container
    # need to install additional OS level packages like libcurl
    # in rocker/rstudio. Probably rocker/tidyverse is better than rstudio
    #
    $ docker run -it -e DISABLE_AUTH=true -v $(pwd):/home/rstudio/project rocker/tidyverse:3.6.0 bash
    # Inside the container now
    $ cd home/rstudio/project
    $ time Rscript install_packages.R # generate 'packrat/packrat.lock'
    $ exit
    # It took 43 minutes.
    # Question: is there an easier way to generate packrat.lock without
    # wasting time to install lots of packages?
    # Step 2: build the image
    # Open another terminal/tab
    $ nano Dockerfile
    # change rocker image and R version. Make sure these two are the same as
    # we have used when we created the 'packrat.lock' file
    $ time docker build . -t mycontainer
    # It took 45 minutes.
    $ docker run -it mycontainer R
    # Step 3: check the packages defined in 'install_packages.R' are installed
    packageVersion("minimal")
    packageVersion("biospear")
Questions:
- After running the statement packrat::init(), it will leave a footprint of a hidden file .Rprofile in the current directory. PS: The purpose of .Rprofile file is to direct R to use the private package library (when it is started from the project directory).
    #### -- Packrat Autoloader (version 0.5.0) -- ####
    source("packrat/init.R")
    #### -- End Packrat Autoloader -- ####
- If the 'packrat' directory was accidentally deleted, next time when you launch R it will show an error message because it cannot find the file.
- The ownership of the 'packrat' directory will be root now. See this Package Management for Reproducible R Code.
- This sophisticated approach does not save the package source code. If a package has been updated and the version we used has been moved to archive in CRAN, what will happen when we try to restore it? So it is probably better to use snapshot.sources = TRUE and run packrat::bundle().
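The .Rprofile footprint described in the questions above can be checked from a terminal before launching R. This is only a sketch: packrat_autoloader_state is a hypothetical helper, and it assumes the autoloader line looks exactly like the one shown above.

```shell
# packrat_autoloader_state is a hypothetical helper.
# Prints "stale" when .Rprofile still sources packrat/init.R but the file is
# gone, "active" when both are present, and "none" otherwise.
packrat_autoloader_state() {
  dir="${1:-.}"
  if grep -qF 'source("packrat/init.R")' "$dir/.Rprofile" 2>/dev/null; then
    if [ -f "$dir/packrat/init.R" ]; then
      echo "active"
    else
      echo "stale"
    fi
  else
    echo "none"
  fi
}

# Check the current directory.
packrat_autoloader_state .
```

A "stale" result corresponds to the case where the 'packrat' directory was deleted and R reports an error at startup; removing the autoloader lines from .Rprofile or restoring the directory resolves it.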
renv: successor to the packrat package
- https://rstudio.github.io/renv/index.html
- release 2019-11-6
- Introduction to renv 2021-01-09
- R renv: How to Manage Dependencies in R Projects Easily 2023-03-22
- The renv::migrate() function makes it possible to migrate projects from Packrat to renv.
- Why Package & Environment Management is Critical for Serious Data Science and a workflow.
- Deploying an R Shiny app on Heroku free tier
- Bioconductor related questions
- Installing packages on a PBS-Pro HPC cluster using renv
- Dependency Management
Compare to packrat:
- Many packages are difficult to build from sources. Your system will need to have a compatible compiler toolchain available. In some cases, R packages may depend on C/C++ features that aren't available in an older system toolchain, especially in some older Linux enterprise environments.
- renv no longer attempts to explicitly download and track R package source tarballs within your project. For packages from local sources, refer to this article.
- renv has its own discovery machinery that analyzes your R code to determine which R packages will be included in the lock file. However, we can instead choose to capture all packages installed into the project library by using renv::settings$snapshot.type("all")
The renv package has neither a bundle() nor an unbundle() function.
    # mkdir renvdeseq2
    setwd("renvdeseq2")
    renv::init(bioconductor = TRUE)
      # attempts to copy and reuse packages
      # already installed in your R libraries
      # We'll be asked to restart the R session if we
      # are not doing this in RStudio.
    renv::install("BiocManager")
    # method 1: this will only install packages under the curDir/renv/... folder
    BiocManager::install("DESeq2")
    # method 2: this will install packages in ~/.cache/R/renv/renv/... folder
    #   therefore, the library can be reused by other needs.
    options(repos = BiocManager::repositories())
    renv::install("DESeq2")
    renv::snapshot() # create renv.lock
      # it seems the lock file "renvdeseq2/renv.lock" does not
      # save any package info I just installed from Bioconductor
      # except the renv package.
      # Read https://rstudio.github.io/renv/articles/faq.html
Find R package dependencies in a project
renv::dependencies()
The following line makes snapshot() write all packages in the renv cache directory (e.g., ~/.cache/R/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/) to the renv.lock file. Note that the setting is persistent even after we restart R!
    renv::settings$snapshot.type("all") # default is "implicit"
    renv::snapshot()
Pass renv.lock to other people and/or clone the project repository
    # Make sure the 'renv' package has been installed on the remote computer
    install.packages("renv")
    renv::init() # install the packages declared in renv.lock
Use renv::migrate() to port a Packrat project to renv.
renv::install()
Using renv to track the version of your packages in R (CC229). After renv::init(), it will identify some packages we don't have or have older versions... In the end we are informed some packages are not installed. Consider reinstalling these packages before snapshotting the lockfile. So go ahead and run renv::snapshot().
Other sources
For example, for DeMixT from github,
    renv::init()
    renv::install("wwylab/DeMixT")
    # Error: package 'SummarizedExperiment' is not available
    renv::install("bioc::SummarizedExperiment")
    renv::install("wwylab/DeMixT")
    renv::snapshot()
install.packages()
It seems install.packages() also installs packages into the project directory, so it is not clear what the difference between install.packages() and renv::install() is in the simple case. But renv::install() is more flexible than install.packages().
Note that installed packages won't go into the lock file unless the project uses them. For example, we can create a simple R file that calls "library(PACKAGENAME)" and run "source('MySimple.R')" in the R console. Now when we run renv::snapshot(), PACKAGENAME will be recorded.
If I open a project that has loaded an renv environment, then calling "install.packages()" will install new packages into renv's cache folder (e.g., ~/.cache/R/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/ in Linux). Note that the version number is recorded too (e.g., ~/.cache/R/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgndep/1.2.1 ).
Reference
See Reference.
Bioconductor
Create an Rmd file and include an R chunk "library(DESeq2)". Then run the following line
renv::init(bioconductor = TRUE)
and it will generate "renv.lock", ".Rprofile" files and "renv" directory.
PS.
- When we install a fresh R in Ubuntu, we should run "sudo apt install r-base-dev curl libcurl4-openssl-dev libssl-dev libxml2-dev " system packages before we can successfully run "BiocManager::install('DESeq2')".
- It is perfectly fine to run renv::init(bioconductor = TRUE) even if you have previously run renv::init() without the bioconductor argument. The bioconductor argument simply ensures that Bioconductor repositories are activated within your renv project.
renv::dependencies()
?dependencies. Find R packages used within a project. dependencies() will crawl files within your project, looking for R files and the packages used within those R files.
df <- renv::dependencies("Some_Dir")
From my testing, it also searches Rmd files.
renv::record()
You can use the record() function from the renv package to record a new entry within an existing renv.lock file.
renv::record("[email protected]")
However, the package is still not installed in the local directory. In other words, renv::record() seems to be the opposite of renv::install(): record() adds an entry to the lockfile without installing anything, whereas renv::install() installs a package in the local directory even if the package is not used anywhere.
renv::load()
?renv::load. It is especially useful in Windows OS.
Note that it does not change the working directory though.
renv::restore()
- For renv-based project, we just need to share a text file renv.lock to our colleague. But for packrat-based project, we need to run bundle() command and pass a tar.gz file to our colleague.
- See the output message on here. This is based on renv 0.16.0 (2022-09-29).
- renv::restore() can be slow since it needs to compile packages from source. The "make" utility is required for it to work!
- My tips
- renv::restore() will use source to restore. This can take a long time.
- Use P3M from Posit. Click "Setup" in P3M and follow the instruction there for your OS. For example, on Windows, I can run
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/latest"))
- Even if we use P3M for package installation, a few packages still need to be installed from source. On Windows, we need to install Rtools. After installing Rtools with the default settings, no further setup is needed; R will recognize all the new binaries. See Windows -> Rtools.
- After successfully calling renv::restore(), we need to restart R. If we use the R console instead of RStudio, we can use renv::load(). This is useful for the case of Windows OS + R console.
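Since renv::restore() may need to compile packages from source, it can help to check for the build tools before starting. The sketch below is an assumption-laden example: only "make" is named in the notes above, while gcc and g++ are an assumed typical Linux toolchain; have_tool is a hypothetical helper.

```shell
# have_tool is a hypothetical helper: succeeds when a command is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# 'make' comes from the note above; gcc and g++ are an assumed typical
# toolchain on Linux and may differ on other systems.
for tool in make gcc g++; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

On Ubuntu, the r-base-dev system package mentioned earlier on this page is one way to get these tools.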
renv::update()
https://rstudio.github.io/renv/reference/update.html
    renv::update() # including Bioconductor, Github, Gitlab, Git, Bitbucket, ...
    renv::update(packages = c("dplyr", "ggplot2", "tidyr")) # update specific CRAN packages
    renv::install("bioc::Biobase") # install/update specific Bioconductor package
    renv::update(packages = "mygithubpackage")
A case with issues using renv::snapshot() & renv::restore()
- BiocGenerics in Bioconductor 3.17 is now 0.46.0 but I have 0.45.3. Also the current Bioconductor 3.18 shows BiocGenerics version 0.48.1.
    ...
    * Project '~/Project' loaded. [renv 0.17.3]
    * The project is currently out-of-sync.
    * Use `renv::status()` for more details.
    > renv::snapshot()
    The following Bioconductor packages appear to be from a separate Bioconductor release:
        BiocGenerics [installed 0.45.3 != latest 0.46.0]
    renv may be unable to restore these packages.
    Bioconductor version: 3.17
    The following package(s) have unsatisfied dependencies:
        MatrixModels requires Matrix (>= 1.6-0), but version 1.5-4 is installed
    Consider updating the required dependencies as appropriate.
    Do you want to proceed? [y/N]: N
    > packageVersion("BiocGenerics")
    [1] ‘0.45.3’
    > packageVersion("Matrix")
    [1] ‘1.5.4’
    > packageVersion("MatrixModels")
    [1] ‘0.5.3’
    > packageVersion("renv")
    [1] ‘0.17.3’
Q: MatrixModels was not recorded in renv.lock, so why does renv::snapshot() report unsatisfied dependencies for the 'MatrixModels' package? I opened a terminal and listed the files in the directory "./renv/library/R-4.3/aarch64-apple-darwin20" by date, and decided to delete the package. In the end, I ran remove.packages("MatrixModels") and BiocManager::install("BiocGenerics") to update BiocGenerics to the latest version in the (old) Bioconductor 3.17 release.
- (Cont.) When I ran renv::restore() on another machine, I got an error related to BiocGenerics.
    > renv::restore()
    It looks like you've called renv::restore() in a project that hasn't been activated yet.
    How would you like to proceed?
    1: Activate the project and use the project library.
    2: Do not activate the project and use the current library paths.
    3: Cancel and resolve the situation another way.
    Selection: 1
    - renv activated -- please restart the R session.
    The following package(s) will be updated:
    # Bioconductor ---------------------------------------------------------------
    - BiocGenerics [0.46.0 -> 0.45.3]
    - IRanges [2.34.1 -> 2.34.0]
    - S4Vectors [0.38.2 -> 0.38.1]
    # CRAN -----------------------------------------------------------------------
    - BiocManager [1.30.22 -> 1.30.20]
    ...
    - Downloading S4Vectors from Bioconductor ... OK [819.2 Kb in 0.63s]
    - Downloading BiocGenerics from Bioconductor ... ERROR [error code 22]
    - Downloading BiocGenerics from Bioconductor ... ERROR [error code 22]
    - Downloading S4Vectors from Bioconductor ... ERROR [error code 22]
    Warning: failed to find source for 'S4Vectors 0.38.1' in package repositories
    Warning: failed to find source for 'BiocGenerics 0.45.3' in package repositories
    Warning: error downloading 'https://bioconductor.org/packages/3.17/bioc/src/contrib/Archive/BiocGenerics/BiocGenerics_0.45.3.tar.gz' [error code 22]
    Warning: error downloading 'https://cran.rstudio.com/src/contrib/Archive/BiocGenerics/BiocGenerics_0.45.3.tar.gz' [error code 22]
    Warning: error downloading 'https://cran.rstudio.com/src/contrib/Archive/S4Vectors/S4Vectors_0.38.1.tar.gz' [error code 22]
    Error: failed to retrieve package '[email protected]'
    Traceback (most recent calls last):
    9: renv::restore()
    8: renv_restore_run_actions(project, diff, current, lockfile, rebuild)
    7: retrieve(packages)
    6: handler(package, renv_retrieve_impl(package))
    5: renv_retrieve_impl(package)
    4: renv_retrieve_bioconductor(record)
    3: renv_retrieve_repos(record)
    2: stopf("failed to retrieve package '%s'", renv_record_format_remote(record))
    1: stop(sprintf(fmt, ...), call. = call.)
- I went back to the original project and ran BiocManager::install("BiocGenerics") and remove.packages("MatrixModels").
    > renv::snapshot()
    The following package(s) will be updated in the lockfile:
    # Bioconductor =======================
    - BiocGenerics [0.45.3 -> 0.46.0]
    # CRAN ===============================
    - Matrix [1.5-4 -> 1.6-5]
    ...
    The version of R recorded in the lockfile will be updated:
    - R [4.3.1 -> 4.3.2]
    Do you want to proceed? [y/N]: y
Then I copied renv.lock to another machine/place and called renv::restore() to test again.
- (Cont.) renv::restore() showed errors during processing, but did not give a warning at the end.
    > renv::restore()
    It looks like you've called renv::restore() in a project that hasn't been activated yet.
    How would you like to proceed?
    1: Activate the project and use the project library.
    2: Do not activate the project and use the current library paths.
    3: Cancel and resolve the situation another way.
    Selection: 1
    - renv activated -- please restart the R session.
    The following package(s) will be updated:
    ...
    Do you want to proceed? [Y/n]:
    # Downloading packages -------------------------------------------------------
    - Downloading vctrs from CRAN ... OK [file is up to date]
    - Downloading tinytex from CRAN ... OK [file is up to date]
    ...
    - Downloading S4Vectors from Bioconductor ... OK [file is up to date]
    - Downloading mgcv from CRAN ... ERROR [error code 22]
    - Downloading mgcv from CRAN ... OK [file is up to date]
    - Downloading nlme from CRAN ... ERROR [error code 22]
    - Downloading nlme from CRAN ... OK [file is up to date]
    ...
    Successfully downloaded 60 packages in 460 seconds.
    # Installing packages --------------------------------------------------------
    - Installing clue ... OK [copied from cache]
    - Installing lattice ... OK [copied from cache]
    ...
    The following loaded package(s) have been updated:
    - BiocManager
    - renv          <------------ Something is wrong. Just 2 packages got installed.
    Restart your R session to use the new versions.
    > q()
    Save workspace image? [y/n/c]: n
    $ R
    - Project '~/Project' loaded. [renv 1.0.4]
    - One or more packages recorded in the lockfile are not installed.
    - Use `renv::status()` for more details.
    Warning message:
    renv 1.0.4 was loaded from project library, but this project is configured to use renv ${VERSION}.
    Use `renv::record("renv@1.0.4")` to record renv 1.0.4 in the lockfile.
    Use `renv::restore(packages = "renv")` to install renv ${VERSION} into the project library.
    > packageVersion("renv")
    [1] ‘1.0.4’
    > library()     <-------------- Just show 2 packages in the renv directory.
- (Cont.) I repeated the step of calling renv::restore(). Now library() shows a complete list.
    installed.packages(lib="./renv/library/R-4.3/x86_64-pc-linux-gnu") |> dim()
    [1] 227 16
I tested loading packages on the new machine and everything looks fine.
- It seems to be OK that the renv versions differ between the old (0.17.3) and new (1.0.3) systems. However, one problem with the old renv is a warning that BiocVersion is recorded in the lockfile but not used in this project. So I decided to upgrade the renv package; after the upgrade, the warning is gone.
A case with only one CRAN package and the first time use
- I put glmnet in an R file.
- renv::init() returned a warning message.
    > install.packages("renv") # 1.0.4 in R 4.3.2
    > renv::init()
    renv: Project Environments for R
    Welcome to renv! It looks like this is your first time using renv.
    This is a one-time message, briefly describing some of renv's functionality.
    renv will write to files within the active project folder, including:
    - A folder 'renv' in the project directory, and
    - A lockfile called 'renv.lock' in the project directory.
    In particular, projects using renv will normally use a private, per-project
    R library, in which new packages will be installed. This project library is
    isolated from other R libraries on your system.
    In addition, renv will update files within your project directory, including:
    - .gitignore
    - .Rbuildignore
    - .Rprofile
    Finally, renv maintains a local cache of data on the filesystem, located at:
    - "~/.cache/R/renv"
    This path can be customized: please see the documentation in `?renv::paths`.
    Please read the introduction vignette with `vignette("renv")` for more information.
    You can browse the package documentation online at https://rstudio.github.io/renv/.
    Do you want to proceed? [y/N]: y
    - "~/.cache/R/renv" has been created.
    - Resolving missing dependencies ...
    # Downloading packages -------------------------------------------------------
    - Downloading glmnet from CRAN ... OK [2.3 Mb in 0.17s]
    - Downloading foreach from CRAN ... OK [87.7 Kb]
    - Downloading iterators from CRAN ... OK [293.2 Kb in 0.11s]
    - Downloading shape from CRAN ... OK [631.3 Kb in 0.15s]
    - Downloading Rcpp from CRAN ... OK [3.3 Mb in 0.21s]
    - Downloading RcppEigen from CRAN ... OK [1.4 Mb in 0.15s]
    Successfully downloaded 6 packages in 2.6 seconds.
    # Installing packages --------------------------------------------------------
    - Installing iterators ... OK [built from source and cached in 1.3s]
    - Installing foreach ... OK [built from source and cached in 1.4s]
    - Installing shape ... OK [built from source and cached in 1.5s]
    - Installing Rcpp ... OK [built from source and cached in 30s]
    - Installing RcppEigen ... OK [built from source and cached in 41s]
    - Installing glmnet ... OK [built from source and cached in 1.2m]
    The following required packages are not installed:
    - codetools [required by foreach]
    - Matrix [required by glmnet]
    - survival [required by glmnet]
    Consider reinstalling these packages before snapshotting the lockfile.
    The following package(s) will be updated in the lockfile:
    # CRAN -----------------------------------------------------------------------
    - foreach [* -> 1.5.2]
    - glmnet [* -> 4.1-8]
    - iterators [* -> 1.0.14]
    - Rcpp [* -> 1.0.12]
    - RcppEigen [* -> 0.3.3.9.4]
    - renv [* -> 1.0.4]
    - shape [* -> 1.4.6.1]
    The version of R recorded in the lockfile will be updated:
    - R [* -> 4.3.2]
    - Lockfile written to "/tmp/test/renv.lock".
    - renv activated -- please restart the R session.
    > q()
I copy renv.lock to renv-old.lock so the two files can be compared later. Note that the 'R' repository recorded in the lockfile is "https://cloud.r-project.org".
- Quit and restart R. A warning about an inconsistent state appears. The renv document Report inconsistencies between lockfile, library, and dependencies -> Lockfile vs dependencies() says to run renv::snapshot() to fix the problem. In this case, glmnet depends on Matrix, survival, ..., which are among the recommended packages shipped with R.
- Project '/tmp/test' loaded. [renv 1.0.4]
- The project is out-of-sync -- use `renv::status()` for details.
> renv::status()
The following package(s) are in an inconsistent state:
 package   installed recorded used
 codetools y         n        y
 lattice   y         n        y
 Matrix    y         n        y
 survival  y         n        y
See ?renv::status() for advice on resolving these issues.
> packageVersion("renv")
[1] ‘1.0.4’
> renv::snapshot()
The following package(s) will be updated in the lockfile:
# CRAN -----------------------------------------------------------------------
- codetools [* -> 0.2-19]
- lattice [* -> 0.22-5]
- Matrix [* -> 1.6-1.1]
- survival [* -> 3.5-7]
Do you want to proceed? [Y/n]:
- Lockfile written to "/tmp/test/renv.lock".
> q()
Close and open R again. No more complaints.
- Project '/tmp/test' loaded. [renv 1.0.4]
> renv::status()
No issues found -- the project is in a consistent state.
- Compare the renv-old.lock and current renv.lock files. Matrix, codetools, lattice and survival packages are added.
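The comparison of the two lockfiles can also be done programmatically. The following is a minimal sketch, assuming the jsonlite package is installed (a lockfile is plain JSON with a top-level "Packages" entry); the helper name lock_diff() is hypothetical and the two tiny lockfiles are stand-ins written to temporary files.

```r
# Sketch: list packages added to or removed from a lockfile.
# Assumption: jsonlite is installed; lock_diff() is a hypothetical helper.
library(jsonlite)

lock_diff <- function(old_file, new_file) {
  old_pkgs <- names(fromJSON(old_file)$Packages)
  new_pkgs <- names(fromJSON(new_file)$Packages)
  list(added   = setdiff(new_pkgs, old_pkgs),
       removed = setdiff(old_pkgs, new_pkgs))
}

# Self-contained demo with two stand-in lockfiles
old <- tempfile(fileext = ".lock")
new <- tempfile(fileext = ".lock")
writeLines('{"Packages": {"glmnet": {"Version": "4.1-8"}}}', old)
writeLines(paste0('{"Packages": {"glmnet": {"Version": "4.1-8"}, ',
                  '"Matrix": {"Version": "1.6-1.1"}}}'), new)

res <- lock_diff(old, new)
print(res)
```

In the project above, the call would be lock_diff("renv-old.lock", "renv.lock").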
A case from 'Survive with Omics'
https://ocbe-uio.github.io/survomics/survomics.html
- Create a file ~/renv/survomics/test.R containing all the library() statements
- R -
install.packages("renv")
renv::init(bioconductor = TRUE)
q()
- R -
renv::status()
renv::install("psbcGroup")
# fatal error: gsl/gsl_matrix.h: No such file or directory
# Search 'gsl' in https://packagemanager.posit.co/client/#/repos/cran/setup
system("sudo apt-get install -y libgsl0-dev")
renv::install("psbcGroup")
renv::install("nyiuab/BhGLM")
q()
- R -
renv::status()
renv::snapshot()
q()
- R - NO MORE MESSAGES
- I added "httpgd" package in "test.R". R -
renv::status()
renv::install("httpgd")
renv::snapshot()
q()
Github examples
rig system make-orthogonal
The command rig system make-orthogonal makes the installed versions of R orthogonal. That is, it ensures that different versions of R installed on the same system do not interfere with each other.
$ cd ~/Project1 # This does not matter as RStudio does not care about this
$ rig rstudio 4.3-arm64 # Good
[INFO] Running open -n -a RStudio --env RSTUDIO_WHICH_R=/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/R
$ rig rstudio 4.2-arm64 # Error
[ERROR] R 4.2-arm64 is not orthogonal, it cannot run as a non-default. Run `rig system make-orthogonal`.
$ rig system make-orthogonal # Fix the error
[INFO] Running `sudo` for updating the R installations. This might need your password.
Password:
[INFO] Making all R versions orthogonal
$ rig rstudio 4.2-arm64 # No more error, even though RStudio still opens the last project,
                        # not the one based on the current working directory
[INFO] Running open -n -a RStudio --env RSTUDIO_WHICH_R=/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/R
Summary so far
I have two ways to associate an R project with an R version (at least on my Mac). In both cases, first install rig and use rig add to install multiple versions of R.
- Use "renv" to create an renv environment project. At the end, I run mv .Rprofile Rprofile. This prevents the renv environment from loading automatically (so the current default R version does not matter) and leaves me a backup of the current renv environment. If needed, I can still rename Rprofile back to .Rprofile and launch R/RStudio.
- Use "renv" to create an renv environment project. Use rig rstudio 4.2-arm64 to launch RStudio and manually change the project to the desired project (from the last open project).
To use with RStudio IDE, see
- How to launch a specific version of R from a specific directory from the rig page. It works well when the project directory is an renv directory.
- My current solution; see Install R (not specifically related to renv).
open -n -a RStudio ~/proj/proj.Rproj \
  --env RSTUDIO_WHICH_R=/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/R
Videos
- Kevin Ushey | renv: Project Environments for R | RStudio (2020)
- E. David Aja | You should be using renv | RStudio (2022)
- {renv} For Reproducible Analyses
Tips
- renv::init() will check for syntax errors
> renv::init()
WARNING: One or more problems were discovered while enumerating dependencies.
/tmp/Project1/RRR.R
ERROR 1: /tmp/Project1/RRR.R:2:1: unexpected '>'
1: # This is a test for renv
2: >
   ^
Please see `?renv::dependencies` for more information.
Do you want to proceed? [y/N]: y
At the end, it will not record any packages from the R file in the renv.lock file. When we start R next time, we will see the error/warning messages again:
* Project '/tmp/Project1' loaded. [renv 0.17.0]
WARNING: One or more problems were discovered while enumerating dependencies.
/tmp/Project1/RRR.R
-------------------
ERROR 1: /tmp/Project1/RRR.R:2:1: unexpected '>'
1: # This is a test for renv
2: >
   ^
Please see `?renv::dependencies` for more information.
Error: snapshot aborted
Traceback (most recent calls last):
43: source("renv/activate.R")
42: withVisible(eval(ei, envir))
...
1: stop(condition)
[Previously saved workspace restored]
- Even for a very simple R file/case, I find "rm -rf renv" will fail if I decide to "clean" the directory.
rm: cannot remove 'renv/sandbox/R-4.3/x86_64-pc-linux-gnu/9a444a72/compiler': Permission denied ...
- For code chunks that you’d explicitly like renv to ignore, you can include renv.ignore=TRUE in the chunk header
- Ignoring Files: .gitignore and .renvignore
- Errors: Try something like renv::settings$snapshot.type("explicit"). Also check out the GitHub issues page.
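Regarding the syntax-error tip above: since a single unparsable script is enough to make dependency discovery fail, it can help to check the scripts before calling renv::init(). The following is a minimal sketch using only base R; check_syntax() is a hypothetical helper, not part of renv, and the two temporary files are stand-ins for project scripts.

```r
# Sketch: find R scripts that fail to parse before calling renv::init().
# check_syntax() is a hypothetical helper built on base R's parse().
check_syntax <- function(files) {
  ok <- vapply(files, function(f) {
    tryCatch({
      parse(file = f)  # throws an error on a syntax problem
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  names(ok) <- basename(files)
  ok
}

# Demo: one valid script and one that reproduces the stray '>' from the log
good <- tempfile(fileext = ".R")
bad  <- tempfile(fileext = ".R")
writeLines("library(stats)", good)
writeLines(c("# This is a test for renv", ">"), bad)

result <- check_syntax(c(good, bad))
print(result)
```

For a real project, something like check_syntax(list.files(".", pattern = "\\.R$", full.names = TRUE)) would cover the top-level scripts.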
Hash
renv - manually overwrite package version in lock file. The hash is used for caching; it allows renv::restore() to restore a package from the global renv cache if available, thereby avoiding a retrieve + build + install of the package.
If it is not set, then renv will not use the cache and instead always try to retrieve the package from the declared source.
Cache and path customization
- ?renv::paths. The path can be customized.
- Chapter 13 Rocker in The Open Science Manual. Make Your Scientific Research Accessible and Reproducible.
- A guide to getting {renv} projects into Docker images
On Linux all R packages under "renv/library/R-4.3/x86_64-pc-linux-gnu/" folder are just soft links to folders in the renv cache directory. So the project specific renv directory does not take much space.
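The soft-link claim can be checked directly. The following is a minimal sketch using base R's Sys.readlink(), which returns an empty string for entries that are not symlinks; linked_packages() is a hypothetical helper, and the demo builds a stand-in library and cache in a temporary directory (it assumes a platform where file.symlink() works, such as Linux or macOS).

```r
# Sketch: report which entries of a library directory are symlinks.
# linked_packages() is a hypothetical helper, not an renv function.
linked_packages <- function(lib_dir) {
  entries <- list.files(lib_dir, full.names = TRUE)
  targets <- Sys.readlink(entries)  # "" for entries that are not symlinks
  data.frame(package = basename(entries),
             linked  = !is.na(targets) & nzchar(targets),
             target  = targets,
             stringsAsFactors = FALSE)
}

# Demo with a stand-in library: pkgA is linked into a fake cache, pkgB is a real folder
lib   <- tempfile("demo-lib")
cache <- tempfile("demo-cache")
dir.create(lib)
dir.create(file.path(cache, "pkgA"), recursive = TRUE)
dir.create(file.path(lib, "pkgB"))
file.symlink(file.path(cache, "pkgA"), file.path(lib, "pkgA"))

info <- linked_packages(lib)
print(info)
```

Inside an renv project, the same helper could be pointed at the project library, for example linked_packages(renv::paths$library()).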
On my macOS, the cache directory is
> renv::paths$cache() [1] "/Users/USERNAME/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3/aarch64-apple-darwin20"
On my Linux system, the cache directory is
> renv::paths$cache() [1] "/home/USERNAME/.cache/R/renv/cache/v5/R-4.3/x86_64-pc-linux-gnu"
On Windows, the cache directory is (replace $USER with the username)
C:/Users/$USER/AppData/Local/R/cache/R/renv
isolate()
- How can I copy and entire renv based project to a new PC (which does not have internet access)?
- ?isolate - Copy packages from the renv cache directly into the project library, so that the project can continue to function independently of the renv cache. Remember: normally the R packages under the renv/ directory are soft links to the renv cache directory. If we use isolate(), the R packages will be "copied" instead of "linked" to the project/renv folder.
- If you want to undo the isolation and revert back to using the renv cache, you can delete the packages in your project library and then call renv::restore(). This will reinstall the packages from the renv cache and create symlinks in your project library.
Set the default repository: PPM
- According to the NEWS, renv 1.0.0 now uses Posit Public Package Manager by default, for new projects where the repositories have not already been configured externally.
- The following works on Ubuntu 24.04 & R 4.4.0.
Method 1: See R packages -> Posit Package Manager/RStudio Package Manager/PPM.
Method 2:
renv::install("Rcpp") # install.packages() also works
renv::install("glmnet", repos = "https://packagemanager.posit.co/cran/latest") # requires gfortran, so install.packages() failed
renv::install("RcppArmadillo", repos = "https://packagemanager.posit.co/cran/latest") # install.packages() compilation failed
renv::install("RcppEigen", repos = "https://packagemanager.posit.co/cran/latest") # install.packages() compilation failed
Question: why does renv::restore() download source packages from CRAN instead of binaries?
- https://rstudio.github.io/renv/reference/config.html asks to use
# default is TRUE
options(renv.config.ppm.enabled = TRUE)
- https://rstudio.github.io/renv/reference/settings.html asks to use
# default is TRUE
options(renv.settings.ppm.enabled = TRUE)
- R Package Repositories from posit.
- Setting CRAN repository options 2022 Jan. Search for PPM (Posit Package Manager).
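To make the repository choice explicit, the PPM URL can be set as the default CRAN repository. The following is a minimal sketch: ppm_repo() is a hypothetical helper, the "__linux__/<codename>" URL form is the one PPM uses for Linux binary packages, and "noble" is the codename of Ubuntu 24.04. Whether binaries are actually served may also depend on the R version and the HTTP user agent, so treat this as a starting point rather than a guaranteed recipe.

```r
# Sketch: build a Posit Public Package Manager (PPM) repository URL.
# ppm_repo() is a hypothetical helper; "noble" is the Ubuntu 24.04 codename.
ppm_repo <- function(codename = NULL, snapshot = "latest") {
  base <- "https://packagemanager.posit.co/cran"
  if (is.null(codename)) {
    paste(base, snapshot, sep = "/")                         # source packages
  } else {
    paste(base, "__linux__", codename, snapshot, sep = "/")  # Linux binaries
  }
}

source_url <- ppm_repo()
binary_url <- ppm_repo("noble")

# Use the binary URL as the default CRAN repository for this session
options(repos = c(CRAN = binary_url))
print(getOption("repos"))
```

Putting the options() call in the project .Rprofile, before renv is activated, would make it apply to every session.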
Experiment
This assumes that no packages have been installed in the project folder yet.
- Create an R file with just one line: library(glmnet)
- Launch a docker container
docker run --rm -it -v $(pwd):/home/rstudio rocker/r-ver:4.3.3
- Inside the container, install packages and create renv.lock. A crucial point is that the packages must be installed via the Posit Public Package Manager (PPM) in the first place. If they are not, enabling PPM later will not make renv restore them from PPM: only packages originally installed through PPM can be restored from it.
setwd("/home/rstudio")
install.packages('renv', ask = F)
options(renv.settings.ppm.enabled = TRUE)
renv::init() # interactively. Enter 'y' to allow to create a local cache directory
q()
- Clean up before bootstrapping
sudo rm -rf renv
sudo rm .Rprofile
- Final testing
docker run --rm -it -v $(pwd):/home/rstudio -w /home/rstudio rocker/r-ver:4.3.3
The following package(s) are missing entries in the cache:
- foreach
- glmnet
- iterators
- Rcpp
- RcppEigen
- shape
These packages will need to be reinstalled.
- Project '/home/rstudio' loaded. [renv 1.0.7]
The following package(s) have broken symlinks into the cache:
- foreach
- glmnet
- iterators
- Rcpp
- RcppEigen
- shape
Use `renv::repair()` to try and reinstall these packages.
- None of the packages recorded in the lockfile are currently installed.
- Would you like to restore the project library? [y/N]: y
...
# Installing packages -------------------------
...
- Installing glmnet ... OK [installed binary and cached in 1.5s]
packageVersion("glmnet") # [1] ‘4.1.8’
Private R packages
Local R packages
Deprecated?
- https://rstudio.github.io/renv/articles/local-sources.html
- Since local R packages (whether source or binary) are not part of renv.lock, the original location of these packages does not matter when we first install them.
- When we try to restore local R packages, we can put the packages' source files into the renv/local directory.
# mkdir renvbiotrip
setwd("renvbiotrip")
renv::init() # we shall restart R according to the instruction
# * Initializing project ...
# * Discovering package dependencies ... Done!
# * Copying packages into the cache ... Done!
# The following package(s) will be updated in the lockfile:
# CRAN ===============================
# - renv [* -> 0.10.0]
# * Lockfile written to '/tmp/renvbiotrip/renv.lock'.
# * Project '/tmp/renvbiotrip' loaded. [renv 0.10.0]
# * renv activated -- please restart the R session.
renv::install("~/Downloads/MyPackage_0.1.1.tar.gz")
# 1. The above command will take care of the dependencies. Cool!
#    That is, we don't need to use the remotes package.
# 2. The output will show if packages are installed from
#    'linked cache' or from source
renv::settings$snapshot.type("all")
renv::snapshot()
# It will give a message that some package(s) were installed from an unknown source
# and that renv may be unable to restore these packages in the future.
Since the versions of the dependency packages change from time to time, a renv.lock file created yesterday will likely differ from one created today (package versions and hashes).
Now we are ready to test the restoration.
- Pass renv.lock and MyPackage_0.1.1.tar.gz to other people (different instruction if we pass the project repository?). Suppose we have copied renv.lock to the renvbiotrip/ directory on a new computer.
# mkdir renvbiotrip
## Copy renv.lock to renvbiotrip/
# mkdir renvbiotrip/renv/local
## Copy MyPackage_0.1.1.tar.gz (private packages) to renvbiotrip/renv/local
install.packages("renv")
renv::restore() # install the packages declared in renv.lock
# The output will show if packages are installed from
# 'linked cache' or from source
library(MyPackage) # verify
MyPackage::foo() # test
- We can test renv.lock in a Docker container from another directory to mimic the way of passing the file to other people. For example,
docker run --rm -it -v $(pwd):/home/docker -w /home/docker r-base:4.0.0
- We can create a docker image based on the renv.lock and MyPackage.tar.gz files. See the renvbiotrip repository.
Note that
- If we issue renv::restore() instead of renv::init() on the destination machine, the packages will be installed into the global environment.
- It seems renv::init() is equivalent to renv::activate() AND renv::restore() on the destination machine.
The project library is out of sync with the lockfile
We'll get this message if we start R with a version different from what is in the "renv.lock" file. See install a package on an old version of R.
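A quick way to see whether the running R matches the lockfile is to read the recorded version from renv.lock. The following is a minimal sketch, assuming the jsonlite package is installed and relying on the lockfile storing the version under the "R" entry as "Version"; both helper names are hypothetical and the demo lockfiles are stand-ins written to temporary files.

```r
# Sketch: compare the R version recorded in a lockfile with the running R.
# Assumption: jsonlite is installed; both helpers are hypothetical.
library(jsonlite)

lock_r_version <- function(lockfile = "renv.lock") {
  package_version(fromJSON(lockfile)$R$Version)
}

same_r_version <- function(lockfile = "renv.lock") {
  lock_r_version(lockfile) == getRversion()
}

# Demo: one lockfile recording the running version, one recording an old version
lock_now <- tempfile(fileext = ".lock")
lock_old <- tempfile(fileext = ".lock")
writeLines(sprintf('{"R": {"Version": "%s"}, "Packages": {}}',
                   as.character(getRversion())), lock_now)
writeLines('{"R": {"Version": "3.6.3"}, "Packages": {}}', lock_old)

match_now <- same_r_version(lock_now)
match_old <- same_r_version(lock_old)
print(c(current = match_now, old = match_old))
```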
graph
- Search for "graph" on https://rstudio.github.io/renv/index.html
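- We need to install the igraph package before we can use renv::graph(). No extra system software seemed to be needed to install igraph. Still, I got the error below. The message appears to come from igraph::graph() rather than renv::graph() (an unqualified graph() call resolves to igraph's function once igraph is attached), so calling renv::graph() with the namespace prefix may avoid it.
> graph(root = "devtools", leaf = "rlang")
Error in inherits(edges, "formula") : argument "edges" is missing, with no default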
renv issues
Docker
- Using renv with Docker. Note that there are two ways to follow the Docker approach. One is to include package installation in the Dockerfile, which embeds the packages into the image. The other is to add the appropriate R packages when the container is run.
- Creating Docker Images with renv (see here for 3 example Registries: Rocker Project/R-Hub/RStudio). Note: the r-base:X.X.X image does not include several important libraries like "curl". If we use r-base:X.X.X as the base image, we will run into errors when we call renv::restore(). The Docker images from Bioconductor (which are based on rocker/rstudio) include these utilities.
RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
WORKDIR /home/docker
COPY renv.lock renv.lock
ENV RENV_PATHS_LIBRARY renv/library
RUN R -e 'renv::restore()'
CMD ["R"]
- Running Docker Containers with renv. Note that repository name must be lowercase.
docker build -t projectname .
docker run --rm -it projectname
# OR
docker run --rm -it -v $(pwd):/home/docker projectname
Question: how to update a package within a container?
1. Start the container as root and update the packages in the container.
2. Run system("su docker") to switch to the user 'docker'.
3. Running system("su docker") exits R and drops to the shell. Run "whoami" to double-check the current user and type "R" to enter R again.
Another simple but inferior way to test the Docker method is the following, assuming <renv.lock> is saved in the ProjectDir directory and ProjectDir has neither renv nor .Rprofile. The big drawback of this approach is that the created renv directory and <.Rprofile> belong to the user root.
docker run --rm -it -v ProjectDir:/home r-base:4.0.0
install.packages("renv")
setwd("/home")
renv::init()
- Back up images. How to copy Docker images from one host to another without using a repository by using the docker save command.
- Setting up a transparent reproducible R environment with Docker + renv
- A Gentle Introduction to Docker. docker build & renv.
pracpac package
pracpac - Practical 'R' Packaging in 'Docker'
Github actions
Chapter 5 Testing with a reproducible environment
checkpoint
dockr package
'dockr': easy containerization for R
Docker & Singularity
targets package
- targets: Democratizing Reproducible Analysis "Pipelines" Will Landau.
- It is similar to the Linux make command.
- It’s designed to help with computationally demanding analysis projects. The package skips costly runtime for tasks that are already up to date.
- An example. This pipeline reads in a CSV file, performs a transformation, and then generates a summary and a plot.
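# Contents of _targets.R, saved in the project root
library(targets)
tar_option_set(packages = "ggplot2") # packages the targets need
# Define the pipeline as a list of targets
# (tar_plan() belongs to the tarchetypes package, so a plain list is used here)
list(
  tar_target(data_file, "data.csv", format = "file"), # track the file so that edits trigger a rerun
  tar_target(raw_data, read.csv(data_file)),
  tar_target(transformed_data, transform(raw_data)), # perform your transformation here
  tar_target(data_summary, summary(transformed_data)),
  tar_target(
    scatter_plot,
    ggplot(transformed_data, aes(x = x, y = y)) + geom_point() + theme_minimal()
  )
)
# Run the pipeline from the R console (not from inside _targets.R)
targets::tar_make()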
- If the data.csv file doesn’t change, and the transformation function remains the same, the targets package won’t re-run those steps. It will directly use the results from the previous run, saving computational resources. This is the power of the targets package: it intelligently determines which parts of your analysis need to be updated and which parts can be skipped.
- Please replace "data.csv" and transform() with your actual data file and transformation function. Also, replace aes(x = x, y = y) with the actual variables you want to plot.
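The skipping behaviour described above can be illustrated without the package. The following is a minimal base-R sketch of the general idea (rerun a step only when the hash of its input file changes); it is not how targets is implemented, and make_step() is a hypothetical helper.

```r
# Sketch: rerun a step only when the MD5 hash of its input file changes.
# This only illustrates the idea; it is not the targets implementation.
make_step <- function(step) {
  last_hash  <- NULL
  last_value <- NULL
  function(input_file) {
    hash <- unname(tools::md5sum(input_file))
    if (!identical(hash, last_hash)) {
      message("running step")
      last_value <<- step(input_file)
      last_hash  <<- hash
    } else {
      message("skipping: input unchanged")
    }
    last_value
  }
}

# Demo: count how often the (pretend) costly step really runs
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c(2, 4, 6)), csv, row.names = FALSE)

runs <- 0
read_step <- make_step(function(f) {
  runs <<- runs + 1
  read.csv(f)
})

d1 <- read_step(csv)  # runs
d2 <- read_step(csv)  # skipped, cached value returned
write.csv(data.frame(x = 1:4, y = c(2, 4, 6, 8)), csv, row.names = FALSE)
d3 <- read_step(csv)  # input changed, runs again
```

targets goes further than this sketch: it also tracks the code of each target and its upstream targets, and stores results on disk so they survive across R sessions.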
- Why you should consider working on a dockerized development environment
- Building reproducible analytical pipelines with R at ReproTea (2023-07-19). renv, targets, docker, Dockerfile (packages from posit) and alternatives (Podman, Nix). Nice talk.
rix package
- Rix: Reproducible Environments with Nix
- Reproducible data science with Nix by Bruno Rodrigues.
- Videos:
- Reproducible R development environments with Nix 8/6/2023
- Nix for R users with {rix} - running an old project with an old R and old packages 8/25/2023
- Reproducible R development on Github Actions with Nix 11/12/2023
- rix: An R package for reproducible dev environments with Nix (FOSDEM 2024) 2/6/2024 and Slide
Building reproducible analytical pipelines with R
Nix
- https://nixos.org/download.html
- Current (2024/2/27) version 2.20.3.
$ sh <(curl -L https://nixos.org/nix/install) --daemon
$ nix-shell -p R rPackages.ggplot2
# to install a package
$ nix-env -iA nixos.librewolf
$ sudo nix-env -iA nixos.librewolf
# to remove an installed package,
$ nix-env -e [package_name]
- Plots can be shown when we call a plot function in a nix interactive shell.
- For some reason, the Bioconductor packages will need to compile when I run nix-build.
- What is the difference between using nix-env and nix-shell?
- nix-env is a global installation. It is similar to traditional package managers like apt, yum, or brew. It is not ideal for reproducibility.
- nix-shell is a local installation. These packages are not installed globally. The environment is temporary and isolated. A nix-shell will temporarily modify your $PATH environment variable. This can be used to try a piece of software before deciding to permanently install it. By specifying packages in a shell.nix or default.nix file, you can ensure consistent development environments across different machines or projects.
$ nix-env -iA nixpkgs.rPackages.dplyr
$ nix-shell -p rPackages.dplyr
- Getting Started With Nix Package Manager: A Beginner’s Guide 2024
- How To Install openSSH on NixOS
- NixOS
Dev Containers
Easy R Tutorials with Dev Containers
conda, mamba
How to create a conda or mamba environment for R programming to enhance reproducibility (CC230) by Riffomonas Project
Snakemake
- Hypercluster: a flexible tool for parallelized unsupervised clustering optimization
- https://snakemake.readthedocs.io/en/stable/tutorial/setup.html#run-tutorial-for-free-in-the-cloud-via-gitpod
- https://hpc.nih.gov/apps/snakemake.html
- Snakemake—a scalable bioinformatics workflow engine (paper, 2012)
- An introduction to Snakemake tutorial for beginners (CC248) by Riffomonas Project
Papers
- zenodo.org which has been used by
- Demystifying "drop-outs" in single-cell UMI data
- https://zenodo.org/record/1225670 UMI-count modeling and differential expression analysis for single-cell RNA sequencing
- Zenodo empowers sharing research output of arbitrary size and format and receives @NIH and @NIHDataScience support for data sharing as a Generalist Repository.
- OSF which has been used by
- codeocean.
- A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples. The R code can be downloaded by git (Capsule -> Export -> Clone via Git). The data (3.4G zip file) has to be downloaded manually. The environment panel shows what packages have to be installed (apt-get, Bioconductor, R-CRAN, R-Github). It seems "Export" is more complete than "Clone via Git". It even includes a Dockerfile.
- Consensus Non-negative Matrix factorization (cNMF) v1.2
Misc
- 4 great free tools that can make your R work more efficient, reproducible and robust
- digest: Create Compact Hash Digests of R Objects
- memoise: Memoisation of Functions. Great for shiny applications. We need to understand how it works in order to take advantage of it. I modified the example from Efficient R by moving the data out of the function. The cache kicks in on the 2nd call. I don't use the benchmark() function since it repeats the same operation each time, which favors memoise and masks some detail.
library(ggplot2) # mpg
library(memoise)
plot_mpg2 <- function(mpgdf, row_to_remove) {
  mpgdf = mpgdf[-row_to_remove,]
  plot(mpgdf$cty, mpgdf$hwy)
  lines(lowess(mpgdf$cty, mpgdf$hwy), col=2)
}
m_plot_mpg2 = memoise(plot_mpg2)
system.time(m_plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.019   0.003   0.025
system.time(plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.018   0.003   0.024
system.time(m_plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.000   0.000   0.001
system.time(plot_mpg2(mpg, 12))
#   user  system elapsed
#  0.032   0.008   0.047
- And be careful when it is used in simulation.
f <- function(n=1e5) {
  a <- rnorm(n)
  a
}
system.time(f1 <- f())
mf <- memoise::memoise(f)
system.time(f2 <- mf())
system.time(f3 <- mf())
all.equal(f2, f3) # TRUE
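The simulation pitfall above arises because the cache is keyed on the arguments only, so a repeated call returns the first random draw. One possible workaround is to pass a replicate index, so that each replicate gets its own cache entry. The following is a minimal sketch with a hand-rolled stand-in for memoise::memoise() (memo() and draw() are hypothetical names), used so that the example runs with base R only.

```r
# Sketch: a tiny memoiser keyed on the arguments, standing in for memoise::memoise().
memo <- function(f) {
  cache <- list()
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (is.null(cache[[key]])) cache[[key]] <<- f(...)
    cache[[key]]
  }
}

# 'rep' is not used in the computation; it only makes the cache key differ
draw <- function(n = 5, rep = 1) rnorm(n)
mdraw <- memo(draw)

a <- mdraw()           # first call: draws random numbers
b <- mdraw()           # same arguments: cached draw is returned again
r1 <- mdraw(rep = 1)   # replicate 1
r2 <- mdraw(rep = 2)   # replicate 2: different key, fresh draw

identical(a, b)        # the pitfall: repeated calls are not independent
identical(r1, r2)
```

With the real memoise package the same trick should apply, since it also keys the cache on the function arguments.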
- reproducible: A Set of Tools that Enhance Reproducibility Beyond Package Management
- Improving reproducibility in computational biology research 2020