R Docker: Difference between revisions

From 太極
 
(3 intermediate revisions by the same user not shown)
Line 96: Line 96:
** [https://sbamin.com/blog/2016/02/running_rstudio_in_docker_environment/ Running RStudio in a docker container]
** [https://sbamin.com/blog/2016/02/running_rstudio_in_docker_environment/ Running RStudio in a docker container]
* [http://ropenscilabs.github.io/r-docker-tutorial/ R Docker tutorial] from ropenscilabs. It covers sharing your analysis.
* [http://ropenscilabs.github.io/r-docker-tutorial/ R Docker tutorial] from ropenscilabs. It covers sharing your analysis.
* [https://datawookie.dev/blog/2024/05/desert-island-docker-r-edition/ Desert Island Docker: R Edition]
* [https://datawookie.dev/blog/2024/05/desert-island-docker-r-edition/ Desert Island Docker: R Edition] & [https://www.youtube.com/watch?v=N1ew3jTDcvk video].


== Dockerfile ==
== Dockerfile ==
Line 526: Line 526:
</pre>
</pre>


= [http://www.bioconductor.org/help/docker/ Bioconductor] =
= Bioconductor =
[http://www.bioconductor.org/help/docker/ Bioconductor]
 
(2020-1-30)
(2020-1-30)
<ul>
<ul>
Line 538: Line 540:
     bioconductor/bioconductor_docker:devel
     bioconductor/bioconductor_docker:devel


docker run -it --user rstudio bioconductor/bioconductor_docker:RELEASE_3_10 R
docker run -it --user rstudio \
      --name bioc3.10 bioconductor/bioconductor_docker:RELEASE_3_10 R
      ## I did not add '--rm' so the container can be restarted if
      ## we use renv::restore()


docker run --rm -it -e DISABLE_AUTH=true -p 8787:8787 \
docker run --rm -it --name bioc3.19 -e DISABLE_AUTH=true -p 8787:8787 \
           bioconductor/bioconductor_docker:RELEASE_3_19 # R 4.4
           bioconductor/bioconductor_docker:RELEASE_3_19 # R 4.4


Line 547: Line 552:
One case (Bioc 3.14 works with R 4.1.3)
One case (Bioc 3.14 works with R 4.1.3)
<pre>
<pre>
docker run -it --name hungry_wiles \
docker run -it --name bioc3.14 \
       -v $(pwd):/home/rstudio -w /home/rstudio \
       -v $(pwd):/home/rstudio -w /home/rstudio \
       --user rstudio \
       --user rstudio \
       bioconductor/bioconductor_docker:RELEASE_3_14 R
       bioconductor/bioconductor_docker:RELEASE_3_14 R
# q()
# q() # this will stop the container
# docker restart hungry_wiles
# docker attach hungry_wiles
</pre>
</pre>
<syntaxhighlight lang='sh'>
docker start bioc3.14
docker attach bioc3.14 # attach your terminal to a running Docker container
</syntaxhighlight>
<li>https://github.com/Bioconductor/bioconductor_docker
<li>https://github.com/Bioconductor/bioconductor_docker
</ul>
</ul>

Latest revision as of 16:06, 15 November 2024

Use with R (r-base) & RStudio IDE: Rocker

  • Docker 101 for Data Scientists by RStudio
  • r-base (Official image, R version is tagged), RStudio
    • The oldest version of R is 3.1.2 (2014-10-31). docker run -it --rm r-base:3.1.2
    • Managing Users
    • An Introduction to Docker for R Users: how to write your own <Dockerfile>, install packages, run a script and get results.
    • Extensions from r-base. For example, r-spatial-base. It also mentions that the ropensci container is built upon rocker/rstudio.
    • The r-base image does not include pdflatex or git. They need to be installed manually.
    • Not sure if the Docker Official Image is the same as the one provided by Rocker Project.
    • NOTE: Plotting works by forwarding X11. The instruction depends on the host OS. See rocker Wiki or the command below. Creating graphics files inside a container is still OK 👌; see the example How to compile R Markdown documents using Docker.
      docker pull r-base:3.5.3
      docker run -it --rm rocker/drd RD              # a little smaller, 3.6GB for R 4.0
      docker run -it --rm rocker/drd R               # good to test the pipe operator (due in R 4.1.0)
      docker run -it --rm rocker/r-devel RD          # initial one, larger, 5.7GB for R 4.0
      docker run -it --rm rocker/r-devel R           # r-release
      docker run -it --rm r-base:3.5.3               # default is root "/"
      docker run -it --rm rocker/r-rspm:22.04        # seems the 'latest' tag is missing
      docker run -it --rm rocker/r-bspm:22.04        # better than r-rspm in the case of 'tidyverse'
                                                     # since bspm can take care of missing system OS libraries
                                                     # 'Many' Bioconductor packages like DESeq2/limma/sva are avail
      docker run -v ~/Downloads:/src -it --rm r-base # /src does not exist
      docker run -v ~/Downloads:/home/docker -it --rm r-base # /home/docker exists and is empty by default
                                                    # setwd("/home/docker")
      docker run -it --rm -u1000:1000 -e DISPLAY=$DISPLAY \
                 -v /tmp/.X11-unix:/tmp/.X11-unix \
                 -v $(pwd):/work -w /work r-base
      
      docker run -it --rm -p 8787:8787 \
             -v $(pwd):/home/rstudio/project \
             -e PASSWORD=mypassword \
             -w /home/rstudio/project rocker/tidyverse
      
      # Disable authorization
      docker run -it --rm -p 8787:8787 \
             -v $(pwd):/home/rstudio/project \
             -e DISABLE_AUTH=true \
             -w /home/rstudio/project rocker/tidyverse:4.2
      

      The "-u" option causes the error "s6-supervise (child): fatal: unable to exec run: Permission denied" (2/19/2023). According to the rocker/tidyverse documentation, the non-root default user rstudio is already set up as the RStudio Server user, so there is no need to specify a different user.

      docker run -ti --rm -v "$PWD":/home/docker -w /home/docker \
             -u docker r-base bash   # Non-root user
      
      docker run -ti --rm -v "$PWD":/home/rstudio -w /home/rstudio \
             -u rstudio rocker/rstudio bash  # Non-root user
      

Dockerfile

Create a new directory and, inside it, a new file 'Dockerfile' with the following content.

FROM debian:testing
MAINTAINER Dirk Eddelbuettel [email protected]
## Remain current
RUN apt-get update -qq
RUN apt-get dist-upgrade -y
RUN apt-get install -y --no-install-recommends r-base r-base-dev r-recommended littler
RUN ln -s /usr/share/doc/littler/examples/install.r /usr/local/bin/install.r

NOTE

  1. From r-base in DockerHub click the "latest" in the "Supported tags and respective Dockerfile links" section.
  2. I ran into errors when I used the above (short) Dockerfile. But the Dockerfile from rocker (leaving out the last line that launches R) works well. The R packages built into the image include 'docopt', 'magrittr', 'stringi', and 'stringr'.
  3. Installing R packages is possible after we launch a container, but it is not clear how to save the installed packages. The rocker wiki also mentions something about installing packages.
  4. See also How to save data in wiki.
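
One possible way to keep packages that were installed interactively is to snapshot the container with docker commit. The sketch below assumes the container was started without --rm; the function name, the container name 'r-with-pkgs', and the image name 'my-r-base:with-pkgs' are placeholders, not names from the rocker wiki.

```shell
# Sketch: save a container (including R packages installed inside it)
# as a new image. Assumes the container was started WITHOUT --rm, e.g.
#   docker run -it --name r-with-pkgs r-base
# Data in mounted volumes is not part of the snapshot.
save_container_as_image() {
  # $1: name or ID of an existing container
  # $2: repository:tag for the new image
  docker commit "$1" "$2"
}
```

Usage: save_container_as_image r-with-pkgs my-r-base:with-pkgs, then docker run -it --rm my-r-base:with-pkgs. For a reproducible setup, installing the packages in a Dockerfile is usually the cleaner route.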

install2.r

The install2.r command can be used to concisely describe the installation of R packages in the Dockerfile. See Rocker project - Extending images.
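
As a minimal sketch of this idea, the helper below writes a two-line Dockerfile that extends a Rocker image with install2.r. The base image tag and the package names (survival, glmnet) are illustrative choices, and write_dockerfile is a hypothetical helper name.

```shell
# Sketch: generate a Dockerfile that installs R packages with install2.r.
# Base image and package list are examples only; adjust as needed.
write_dockerfile() {
  # $1: target directory (defaults to the current directory)
  cat > "${1:-.}/Dockerfile" <<'EOF'
FROM rocker/tidyverse:4.2
# --error makes the build fail if a package cannot be installed
RUN install2.r --error survival glmnet
EOF
}
```

Usage: write_dockerfile . followed by docker build -t my-tidyverse . (the image name my-tidyverse is a placeholder).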

A quick run of an R script

docker run --rm \
  -v $(pwd):/tmp/working_dir \
  -w /tmp/working_dir \
  rocker/tidyverse:latest \
  Rscript my_script.R

docker run

Note that if we use the Dockerfile above to create an image, we will be dropped into a Linux shell. If we pull the rocker/r-base image from Docker Hub, we will be in the R console directly. See the last line of the rocker Dockerfile on GitHub.

Then run the following as an exercise (replace 21b6a9e8b9e8 with your own image ID, or use rocker/r-base). For simplicity, we can try the colortools package first, which does not depend on other packages and does not need to be compiled.

sudo docker build -t debian:testing-add-r . # create an image based on the above Dockerfile
wget http://cran.r-project.org/src/contrib/sanitizers_0.1.0.tar.gz
sudo docker run -v `pwd`:/mytmp -t 21b6a9e8b9e8 \
     R CMD check --no-manual --no-build-vignettes /mytmp/sanitizers_0.1.0.tar.gz
sudo docker run -v `pwd`:/mytmp -t 21b6a9e8b9e8 \
     Rdevel CMD check --no-manual --no-build-vignettes /mytmp/sanitizers_0.1.0.tar.gz

sudo docker search eddelbuettel
sudo docker pull eddelbuettel/docker-ubuntu-r   # default tag is 'latest'; actually older than the other tags
sudo docker images eddelbuettel/docker-ubuntu-r # see the tag column
sudo docker pull eddelbuettel/docker-ubuntu-r:add-r # the tag name can only be obtained from hub.docker.com
sudo docker images eddelbuettel/docker-ubuntu-r # see the tag column
sudo docker pull eddelbuettel/docker-ubuntu-r:add-r-devel
sudo docker images eddelbuettel/docker-ubuntu-r # see the tag column
sudo docker run -v `pwd`:/mytmp -t 54d865dbd2c9 R CMD check --no-manual --no-build-vignettes /mytmp/sanitizers_0.1.0.tar.gz

sudo docker run -t -i eddelbuettel/docker-ubuntu-r /bin/bash
$ sudo docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
debian                         testing-add-r       21b6a9e8b9e8        28 minutes ago      572.2 MB
ubuntu                         14.04               ed5a78b7b42b        4 days ago          188.3 MB
ubuntu                         latest              ed5a78b7b42b        4 days ago          188.3 MB
debian                         testing             88ba2870bfbe        7 weeks ago         154.7 MB
eddelbuettel/docker-ubuntu-r   add-r-devel         c998a74a1fb4        11 weeks ago        460.4 MB
eddelbuettel/docker-ubuntu-r   add-r               54d865dbd2c9        11 weeks ago        460.4 MB
eddelbuettel/docker-ubuntu-r   latest              a7cd5ddeb98e        5 months ago        515.4 MB

sudo docker logs xxxxx                # view the log
sudo docker restart xxxxx
sudo docker exec -it xxxx /bin/bash   # view any changes in R library
sudo docker stop xxxxx
sudo docker rm xxxxx

This is another example of using 'docker run' accompanying MotifBreakR package.

Testing a new R release

R 4.1.0

docker pull rocker/r-base:4.1.0 

alias dkrr='docker run --rm -it -u1000:1000 -v$(pwd):/work -w /work'
dkrr rocker/r-ubuntu:20.04 bash
dkrr r-base:latest R --version | head -1
dkrr r-base:3.6.3 R --version | head -1

# Assume we are in a directory called 'curse'
# (Yes you may need to add Depends and LaTeX support ...)
# Even the 'survival' package requires pdflatex in 'R CMD build' step.
# A toy package like https://cran.r-project.org/web/packages/QuadRoot/ works
dkrr rocker/r-base:4.1.0 R CMD build .  # this will create curse_1.0.0.tar.gz
dkrr rocker/r-base:4.1.0 R CMD check --no-vignettes --no-manual curse_1.0.0.tar.gz

A closer solution is to use the rocker/verse:4.0.4 image, but it gives different errors:

  • survival: LaTeX Error: File `fancyvrb.sty' not found.
  • glmnet: dependencies ‘foreach’, ‘shape’ are not available. A workaround solution (need to figure out the dependencies first):
    $ curl -s https://cran.r-project.org/src/contrib/glmnet_4.1-1.tar.gz | tar xzv 
    $ cd glmnet
    $ docker run --rm -it -v$(pwd):/work -w /work rocker/verse:4.0.4 bash
    # Rscript -e "install.packages(c('foreach', 'shape', 'knitr', 'lars', 'testthat', 'xfun', 'rmarkdown'))"
    # su rstudio
    $ R CMD build .
    $ exit
    # exit
    

    If we don't install the 'Suggests' packages, the build fails when it tries to build the vignette. A more relaxed solution is to add the option --no-build-vignettes to R CMD build.
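
The relaxed variant could look like the sketch below. It assumes the current directory is the package source (it contains DESCRIPTION); build_without_vignettes is a hypothetical helper name and the default image is the one used above.

```shell
# Sketch: build a source package in a container while skipping vignettes,
# so the 'Suggests' packages needed only for vignettes are not required.
build_without_vignettes() {
  # $1: image to use (defaults to rocker/verse:4.0.4)
  docker run --rm -v "$(pwd)":/work -w /work "${1:-rocker/verse:4.0.4}" \
    R CMD build --no-build-vignettes .
}
```

Usage: cd glmnet && build_without_vignettes. The container runs as root here, so the resulting tarball may be owned by root on a Linux host.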

rocker/verse images

The rocker/verse images contain the curl utility & libcurl4-openssl-dev & libxml2-dev which are needed when I use the groundhog package to install packages. The tidyverse images don't include these libraries/utility. Tested on 3.6.3.

Note that these versioned images may be using MRAN. For example for rocker/verse:3.6.3,

> options()$repo
                                            CRAN
"https://mran.microsoft.com/snapshot/2020-04-24"

For rocker/verse:4.2.3

> options()$repo
                                                             CRAN
"https://packagemanager.posit.co/cran/__linux__/jammy/2023-04-20"

For rocker/verse:4.3.0

> options()$repo
                                                         CRAN
"https://packagemanager.posit.co/cran/__linux__/jammy/latest"

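To compare the repository setting across tags without opening an interactive session, a small wrapper like the following may help. It is only a sketch: show_cran_repo is a hypothetical helper name, and it assumes the rocker/verse tag passed in exists on Docker Hub.

```shell
# Sketch: print the 'repos' option configured in a given rocker/verse tag.
show_cran_repo() {
  # $1: image tag, e.g. 3.6.3, 4.2.3 or 4.3.0
  docker run --rm "rocker/verse:$1" Rscript -e 'print(getOption("repos"))'
}
```

Usage: for t in 3.6.3 4.2.3 4.3.0; do show_cran_repo "$t"; done
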
Testing R packages

Multiple containers, cookies

Docker for R Package Development

http://www.jimhester.com/2017/10/13/docker/

A DevOps Perspective

Reproducible

Research papers

Debugging R memory problem

Docker image for debugging R memory problems using the r-debug container (valgrind)

Debugging with gcc problem

https://twitter.com/eddelbuettel/status/1232341601483182081

More examples

Building a Repository of Alpine-based Docker Images for R

Building Rstudio Image using Dockerfile

See Version-stable/rocker-versioned2 Rocker images repo and https://github.com/rocker-org/rocker-versioned2/tree/master/dockerfiles.

The 'Build images' -> 'Container definition files' section states that these (Dockerfile) files are generated by the build scripts under the build folder.

To build rstudio image,

git clone https://github.com/rocker-org/rocker-versioned2.git
cd rocker-versioned2
docker build -f dockerfiles/rstudio_4.4.0.Dockerfile -t rstudio_4.4.0 .

docker run -d -p 8787:8787 -e PASSWORD=rstudio \
           rstudio_4.4.0

docker run --rm rstudio_4.4.0 R -e 'R.version'

The build took 16 minutes on an 8th-gen CPU with 8GB RAM, and used 4GB of disk space.

Disable password in rocker/rstudio

https://rocker-project.org/images/versioned/rstudio.html. Note that the option -p 8787:8787 is required.

docker run --rm -ti -e DISABLE_AUTH=true \
    -p 8787:8787 rocker/rstudio

RStudio + Podman

RStudio in Docker – now share your R code effortlessly!. Markdown

https://harini.blog/2019/05/25/rstudio-and-rshiny-in-docker/

It is interesting the Dockerfile uses install2.r (R script with a shebang line) from the littler package to install R packages. See http://dirk.eddelbuettel.com/code/littler.examples.html or rocker/verse Dockerfile. But it is not clear how to install private R packages (mount host folder and use install.packages()).

Note that the image name given to docker build -t must be lower case: awesomer instead of awesomeR.
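
Docker repository (image) names must be lower case, so one way to avoid the error is to normalize the name before building. This is a small sketch; to_image_name is a hypothetical helper name.

```shell
# Sketch: turn an arbitrary project name into a lower-case image name.
to_image_name() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

Usage: docker build -t "$(to_image_name awesomeR)" . builds the image as awesomer.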

Also got an error when trying to build the image: Failed to fetch http://deb.debian.org/debian/dists/stretch/InRelease Temporary failure resolving 'deb.debian.org' .

I tried it again at home. The apt update part was OK but I got a new error: Error: installation of package ‘gifski’ had non-zero exit status

(Updated 9-18-2020) Try both the long and short commands using the tag '4.0.2' instead of '3.5.1'. Both work. The report file <example_report.pdf> is generated. In this example, the Rmd file is called through an R file. See the source code.

Dockerized RStudio Server, packages (renv)

HomeLab 6: Dockerized RStudio Server, packages, persistent storage and SSL certs

How to manage R package dependencies for shiny app deployment (docker)

How to manage R package dependencies for shiny app deployment (docker)

METACRAN web

https://github.com/metacran/metacranweb It works (2019-11-3).

Modified Dockerfile

Checking your Package for Compatibility with R 4.0.0

Best Practices for R with Docker

Best Practices for R with Docker

Warning: unable to load shared object 'R_X11.so'

Using rocker/rstudio:4.2 image, I got the following message when I use arrange_ggsurvplots().

Warning message:
In grSoftVersion() :
  unable to load shared object '/usr/local/lib/R/modules//R_X11.so':
  libXt.so.6: cannot open shared object file: No such file or directory

See https://github.com/rocker-org/rocker-versioned/issues/234, https://issuekiller.com/issues/rocker-org/rocker/81299432.

$ ls -l /usr/local/lib/R/modules//R_X11.so
-rwxr-xr-x 1 root root 665128 May  7 00:48 /usr/local/lib/R/modules//R_X11.so

$ ldd /usr/local/lib/R/modules//R_X11.so | grep libX
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f74b07bb000)
        libXt.so.6 => not found
        libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1 (0x00007f74af598000)
        libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f74af583000)
        libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f74ad479000)
        libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f74ad46f000)

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
...

My solution is

$ docker exec -it CONTAINER bash
# apt update
# apt install libxt-dev

Another case when I use ggsurvplot()

Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/usr/local/lib/R/site-library/xml2/libs/xml2.so':
  libxml2.so.2: cannot open shared object file: No such file or directory
$ ls -l /usr/local/lib/R/site-library/xml2/libs/
total 1404
-rwxrwxr-x 1 rstudio staff 1437536 Jun  3 12:04 xml2.so

$ ldd /usr/local/lib/R/site-library/xml2/libs/xml2.so
        linux-vdso.so.1 (0x00007ffd369b0000)
        libxml2.so.2 => not found
        libR.so => /usr/local/lib/R/lib/libR.so (0x00007faf7406a000)
 ...

Applying the same method seems to fix the problem.
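
The ldd check used in both cases can be wrapped in a small helper that prints only the unresolved libraries. This is a sketch that assumes the usual ldd output format shown above; missing_libs is a hypothetical helper name.

```shell
# Sketch: list shared libraries that ldd reports as "not found".
missing_libs() {
  # $1: path to a shared object, e.g. an R package's .so file
  ldd "$1" | awk '/=> not found/ { print $1 }'
}
```

Usage inside the container: missing_libs /usr/local/lib/R/modules//R_X11.so. The matching system package still has to be looked up separately; in the first case above, installing libxt-dev resolved libXt.so.6.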

The above problems happened on my Mac. When I checked the problematic files on my Ubuntu host using ldd, they did not show the same problem (but the path is a little different???)

# ls -l /usr/local/lib/R/site-library/xml2/libs/
total 1404
-rwxrwxr-x 1 rstudio staff 1437536 May 22 19:29 xml2.so

# ldd /usr/local/lib/R/site-library/xml2/libs/xml2.so
	linux-vdso.so.1 (0x00007fff6f51d000)
	libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007facc1dd8000)
	libR.so => not found
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007facc1bf6000)
 ...

A similar problem occurred when I tried to install the tidyverse package on top of a rocker/r-rspm:22.04 container. See also piggy-back on RSPM's system dependency data base #404. To fix it, run the following in the R session

system("apt update")
system("apt-get install -y libxml2-dev")  # -y avoids the interactive confirmation prompt

Bioconductor

Bioconductor

(2020-1-30)

  • https://bioconductor.org/help/docker/
  • Bioconductor releases
  • https://hub.docker.com/r/bioconductor/bioconductor_docker (supports releases as early as RELEASE_3_10 with R 3.6.3)
    docker run \
         -e PASSWORD=bioc \
         -p 8787:8787 \
         bioconductor/bioconductor_docker:devel
    
    docker run -it --user rstudio \
          --name bioc3.10 bioconductor/bioconductor_docker:RELEASE_3_10 R
          ## I did not add '--rm' so the container can be restarted if 
          ## we use renv::restore()
    
    docker run --rm -it --name bioc3.19 -e DISABLE_AUTH=true -p 8787:8787 \
              bioconductor/bioconductor_docker:RELEASE_3_19 # R 4.4
    
    docker run -it --user rstudio bioconductor/bioconductor_docker:devel bash
    

    One case (Bioc 3.14 works with R 4.1.3)

    docker run -it --name bioc3.14 \
           -v $(pwd):/home/rstudio -w /home/rstudio \
           --user rstudio \
           bioconductor/bioconductor_docker:RELEASE_3_14 R
    # q()  # this will stop the container
    
    docker start bioc3.14
    docker attach bioc3.14 # attach your terminal to a running Docker container
  • https://github.com/Bioconductor/bioconductor_docker
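
The release tags used above follow the pattern RELEASE_X_Y. A small helper can build the image reference from a Bioconductor version number; this is a sketch based only on the tags shown in this section, and bioc_image is a hypothetical helper name.

```shell
# Sketch: map a Bioconductor version such as 3.19 to its release image tag.
bioc_image() {
  # $1: Bioconductor version, e.g. 3.14
  printf 'bioconductor/bioconductor_docker:RELEASE_%s\n' \
    "$(printf '%s' "$1" | tr . _)"
}
```

Usage: docker run -it --user rstudio "$(bioc_image 3.14)" R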

(2019-10-15)

How I use Bioconductor with Docker, Part 2: More memory, faster Bioconductor with Docker

BiocImageBuilder

Reproducible Bioconductor workflows using browser-based interactive notebooks and containers

ARM64

Testing Packages on Linux ARM64 with GitHub Actions

Bioc Conference

  • Orchestra
  • Bioc2019 conference. Workshop material. The docker image had 13 downloads before the meeting (6/20/2019) and 32 downloads after it.
    • Download a tarball containing binary R packages. It bundles 605 packages based on R 3.6.0 and Bioconductor 3.10 (BiocManager 1.30.4).
    • Run the RStudio container. All R packages downloaded in the last step are mounted (user=rstudio, password=bioc). That is, built-in packages are at /usr/local/lib/R/library and custom packages are at /usr/local/lib/R/site-library. These two locations are what .libPaths() gives.
  • BioC 2020
    • Workshop packages were created using the BuildABiocWorkshop2020 template. From the 'Dockerfile', we see that each workshop's material is organized as an R package, so each workshop's package is built into the Docker image. There is no need to build the vignette again. NOTE: it takes a while to build the Docker image locally since each R package has to be compiled separately.
    • BioC 2020: Where Software and Biology Connect Opening Remarks
    • Taking the recount2 workshop as an example, we don't need to knit the Rmd file. To view the HTML vignette, type browseVignettes(package="recountWorkshop2020") and click the link "HTML". If a "requested page was not found" error appears, add help/ to the URL right after the hostname, e.g., http://localhost:8787/help/library/recountWorkshop2020/doc/recount-workshop.html. Another way to open the HTML without any tweak is to type help(package = 'recountWorkshop2020') -> User Guide -> HTML.
    • Bioc Asia 2020
    • The vignette may not include the R code. So the Rmd file is still needed to understand the content or do a practice.

single-cell RNA-Seq

  • Docker image with rstudio for single cell analysis (github), https://hub.docker.com/r/vbarrerab/singlecell-base. Other images: https://github.com/rnakato/docker_singlecell, https://hub.docker.com/r/leanderd/single-cell-analysis
    docker run -d -p 8787:8787 \
      --name scrna \
      -e USER='rstudio' \
      -e PASSWORD='rstudioSC' \
      -e ROOT=TRUE \
      -v /home/$USER/Documents/scrna:/home/rstudio/projects \
      vbarrerab/singlecell-base:R.4.0.3-BioC.3.11-ubuntu_20.04
    

    When I accidentally rebooted the computer, the installed packages were not lost. But it is safer to run docker stop XXX and later docker start XXX.

  • Image containing rstudio + conda + a set of helpful packages for single cell analysis,
  • docker hub,
  • Dockerfile
  • To use with Portainer, it is better to use composerize to convert the docker run command into a stack. Note that Portainer cannot take version 3.x, so I changed the version to 2. Below is a stack/docker-compose.yml file generated by composerize, with the version number modified.
    version: '2'
    services:
        singlecell-base:
            ports:
                - '8787:8787'
            container_name: scrna
            environment:
                - USER=rstudio
                - PASSWORD=rstudioSC
                - ROOT=TRUE
            volumes:
                - '/tmp/scrna:/home/rstudio/projects'
            image: 'vbarrerab/singlecell-base:R.4.0.3-BioC.3.11-ubuntu_20.04'
    

    Note: the right way to delete a stack is to stop the container, and then delete the container. The final step is to select the stack and remove it.

  • Note that I have 2 pythons installed. One is from the OS (/opt/conda/bin/python) whose version is 3.8.3. The other one is on (/home/rstudio/.conda/envs/sc_env/bin/python) whose version is 3.7.8. The $PATH variable will show differences.
  • Another scRNA-Seq course including a docker image (not tried yet): Analysis of single cell RNA-seq data (ebook, University of Cambridge Bioinformatics training unit) and the paper Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data Andrews 2020.

Cellar: interactive tool for analyzing single-cell omics data

Cellar is an interactive tool for analyzing single-cell omics data. Cellar is built in Python using the Dash framework and relies on several open-source packages.

Nanopore sequencing

DUESSELPORE Webserver and the paper