2. Build your own Docker container image and run it (locally or in the academic cluster)

First, make sure you are familiar with how Docker containers work.

This task can be done on your own computer or in the cloud-based Play with Docker environment. This tutorial uses the Play with Docker Labs Environment, but you are free to follow along on your own computer if you have Docker, OrbStack, or other Docker-compatible software installed. The Play with Docker Labs Environment is free and requires no installation, only a web browser and a Docker Hub account.

The Goal

Build your own Docker container image for a hypothetical project. Imagine that this project needs one or two R packages that were archived on CRAN and are not available in the Rocker container image.

To achieve this goal, find a few archived R packages on CRAN. Notable examples include rgdal and MortalitySmooth, among many others. Remember, “40% of all packages ever in CRAN got at one point archived”[^1]. CRAN does not have its own section for archived packages, so you might want to look at the CRANhaven Dashboard, where you can find recently archived packages.

Choose the Rocker Image version

Which Rocker image version to use depends on the archived package you selected. For example, the package MortalitySmooth was archived on 2020-12-10. If you use the Rocker RStudio container image with R v4.1.0, released on 18 May 2021, the R package installer in the container will behave as if it were 18 May 2021 and will try to install the package from the CRAN snapshot of that date. Since MortalitySmooth was archived before that date, it is absent from that snapshot, and you will not be able to install it with the standard install.packages() function; you will have to use the remotes::install_version() function from the remotes package instead. To avoid that, search the internet for the release dates of the R versions released just before the date the package was archived. Rocker images are configured to use the CRAN snapshot from the date of the R version release.

So your options are:

  • Use a more recent R version (and consequently a more recent Rocker image) and try installing MortalitySmooth with the remotes::install_version() function.

  • Use strictly an R version that was released just before the package was archived and install the package with the install.packages() function.
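The date reasoning behind these two options can be sketched as a small shell check. It only illustrates the rule as stated in this tutorial: the MortalitySmooth archival date and the R 4.1.0 release date come from the text, while the R 4.0.0 release date (2020-04-24) and the helper name install_route are assumptions added for the example, so verify the dates against the official R release history before relying on them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rocker or R): given the release date of an
# R version and the date a package was archived (both YYYY-MM-DD), print the
# installation route that the rule described above suggests.
install_route() {
  release=$(printf '%s' "$1" | tr -d '-')    # 2021-05-18 -> 20210518
  archived=$(printf '%s' "$2" | tr -d '-')
  if [ "$release" -lt "$archived" ]; then
    # R version released before the package was archived.
    echo "install.packages()"
  else
    # R version released after the package was archived.
    echo "remotes::install_version()"
  fi
}

ARCHIVED="2020-12-10"                    # MortalitySmooth archival date (from the text)
install_route "2021-05-18" "$ARCHIVED"   # R 4.1.0, release date from the text
install_route "2020-04-24" "$ARCHIVED"   # R 4.0.0, release date assumed for this example
```

The first call prints remotes::install_version() and the second prints install.packages(), which matches the two options listed above.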

Start the Play with Docker Environment

Go to https://labs.play-with-docker.com/ and log in with your Docker Hub account.

If this is the first time you are logging in, Play with Docker will request access to your Docker account. Click on the “Accept” button to proceed. Then click the large green “Start” button to start the environment.

Click on the “Add New Instance” button to start a new Docker container instance.

You should get a new terminal window with a command prompt.

Clone the Repository

Go to the minimal example repository and copy its URL.

Now clone the repository by pasting the following command into the terminal:

git clone https://github.com/Population-Dynamics-Lab/grid-sample-containerized.git

Check which folders you have in the current directory:

ls

You should see the grid-sample-containerized folder. Change the directory to the repository (you can type cd g and press Tab to autocomplete the folder name):

cd grid-sample-containerized

You can check the contents of the repository by listing the files in the directory again:

ls -al

Edit the Dockerfile

Now that you have selected the Rocker image version, you can edit the Dockerfile in the repository. The Dockerfile is a text file that contains instructions for building a Docker container image. Unlike in tutorial 1, ignore the install.R file in this tutorial; we will install packages directly in the Dockerfile.

Find the editor button in the middle of the screen and click on it.

A very simple file browser and editor (displayed when you click a file) will appear. You can edit the files in the repository directly in the browser. Remember to save changes.

To install the MortalitySmooth package, add this as the second line of the Dockerfile:

RUN install2.r --error --skipinstalled MortalitySmooth

So your final Dockerfile should look like this:

FROM rocker/rstudio:4.0.0
RUN install2.r --error --skipinstalled MortalitySmooth

If you were installing two packages, you would add them like this:

FROM rocker/rstudio:4.0.0
RUN install2.r --error --skipinstalled MortalitySmooth ggplot2

Remember to save the changes.
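If you prefer the terminal to the browser editor, the same two-line Dockerfile can be written with a here-document. This is a sketch of an alternative, not a step from the tutorial itself; note that it overwrites the Dockerfile in the current directory.

```shell
# Write the two-line Dockerfile shown above into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the text.
cat > Dockerfile <<'EOF'
FROM rocker/rstudio:4.0.0
RUN install2.r --error --skipinstalled MortalitySmooth
EOF

# Print the result to confirm what was written.
cat Dockerfile
```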

Build the Docker Container Image

Unlike in tutorial 1, you will not be using the Binder web service to build the Docker container image automatically for you. Instead, you will use the docker command, a command-line tool for interacting with Docker containers and images. Here you will use it to build a Docker image from the Dockerfile in the repository.

First make sure you are in a folder that has the Dockerfile in it. You can check the contents of the current directory by running:

ls -al

And you can quickly check what is in the Dockerfile by running:

cat Dockerfile

To build the Docker image, run the following command in the terminal:

docker build -t r-mort-smooth:4.0.0 .

Let us break down this command:

| Part of command | What it does |
|---|---|
| docker build | The base command used to build a Docker image from a Dockerfile. |
| -t r-mort-smooth:4.0.0 | The -t flag stands for “tag”. r-mort-smooth:4.0.0 is the name and tag given to the image: r-mort-smooth is the name of the image, and 4.0.0 is the tag, which often represents the version of the image. You can choose any name, and the tag can also be anything, but we use the same version as in the Dockerfile so that we know which R version is inside the container. |
| . | This very important . (dot) specifies the build context, which is the current directory. Docker will look for a Dockerfile in this directory to create the image. |

The container image will take about 3-6 minutes to build.

When the build is finished, you can check that it was added to the local container image storage:

docker images
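Before starting RStudio, you may want to confirm that the package actually loads inside the new image. One possible check, sketched below, replaces the image's default command with Rscript for a single short-lived container. The helper name check_pkg is made up for this example, and the check assumes Rscript is on the PATH inside the image.

```shell
# Hypothetical helper: try to load an R package inside an image without
# starting the RStudio server. The exit status is non-zero if loading fails.
check_pkg() {
  image="$1"
  pkg="$2"
  # --rm removes the temporary container as soon as Rscript exits.
  docker run --rm "$image" Rscript -e "library(${pkg})"
}

# Usage:
#   check_pkg r-mort-smooth:4.0.0 MortalitySmooth && echo "package loads"
```

If the package is missing from the image, R reports an error and the echo after && is not executed.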

Run the Docker Container from your Image

Now you have a Docker container image with the MortalitySmooth package installed. To run it, you can use the docker run command in the following way:

docker run --rm -p 8787:8787 -v $(pwd):/home/rstudio/my-project -e PASSWORD=somepass r-mort-smooth:4.0.0

Let us break down this command:

| Part of command | What it does |
|---|---|
| docker run | The base command used to run a Docker container from local or remote container image storage. |
| --rm | Makes the container temporary: it will be destroyed after you stop it. You can explore other options (e.g. how to name containers, make them persistent, and re-run the same ones after stopping) in the Docker documentation, but for now we want a disposable container. |
| -p 8787:8787 | Maps port 8787 inside the container to port 8787 on your computer, so that you can access RStudio in a web browser. Briefly, RStudio in a container is server software that works over a network; it is not exactly the same as RStudio on your laptop, even though it feels that way. This is why ports are necessary, but do not worry about it too much at the moment. |
| -v $(pwd):/home/rstudio/my-project | Maps the current directory (designated by $(pwd)), from which you are running the command, to a folder inside the container (/home/rstudio/my-project). Thanks to this, the containerized RStudio has access to your local folder, so you can run and edit your scripts. The /home/rstudio/ part is the default for Rocker containers, and the my-project part can be replaced with anything. Instead of the current directory you can provide /path/to/any/folder/on/your/computer. |
| -e PASSWORD=somepass | Sets the password. Use a good password, even though you are running locally. |
| r-mort-smooth:4.0.0 | The name and tag that you assigned earlier when you built the container image. |
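The same docker run command can be wrapped in a small function so that the image, the project folder, and the password are explicit inputs. This is only a sketch: the function name run_rstudio and the RSTUDIO_PASSWORD variable are hypothetical, and the flags are exactly the ones explained above.

```shell
# Hypothetical wrapper around the docker run command from this tutorial.
# $1: image name and tag; $2: host folder to mount (defaults to the current directory).
run_rstudio() {
  image="$1"
  project_dir="${2:-$(pwd)}"
  # Refuse to start without a password instead of falling back to a weak default.
  if [ -z "${RSTUDIO_PASSWORD:-}" ]; then
    echo "set RSTUDIO_PASSWORD first" >&2
    return 1
  fi
  # Quoting the -v argument keeps folders with spaces in their names working.
  docker run --rm \
    -p 8787:8787 \
    -v "${project_dir}:/home/rstudio/my-project" \
    -e "PASSWORD=${RSTUDIO_PASSWORD}" \
    "$image"
}

# Usage (blocks the terminal until the container is stopped):
#   RSTUDIO_PASSWORD=somepass run_rstudio r-mort-smooth:4.0.0
```

Passing the folder as an argument makes it easy to mount a project that lives somewhere other than the directory you happen to be in.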

The container starts almost instantly. In the Play with Docker service, a button with the port number will pop up:

Click on the port number to open RStudio in a new tab. Use the default login rstudio and the password you set in the docker run command.

If you are following along on your own computer, you can open a web browser and go to http://localhost:8787 to access RStudio. Use the default login rstudio and the password you set in the docker run command.

You can now use RStudio in the browser to check if the MortalitySmooth package is installed by running:

library(MortalitySmooth)

Stop the container

To stop the container, click the “power” button in the top right corner of the RStudio window. Close the web browser tab with RStudio. In the Play with Docker browser tab, click in the terminal and press Ctrl+C or Ctrl+\ to stop the container.
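If you are working on your own computer and the terminal running the container does not respond, the container can also be stopped from a second terminal. Below is a minimal sketch using docker ps and docker stop; the helper name stop_by_image is hypothetical.

```shell
# Hypothetical helper: stop every running container started from a given image.
stop_by_image() {
  image="$1"
  # -q prints only container IDs; the ancestor filter matches on the image.
  ids=$(docker ps -q --filter "ancestor=${image}")
  if [ -z "$ids" ]; then
    echo "no running container from ${image}"
    return 0
  fi
  # $ids is left unquoted on purpose so that several IDs become separate arguments.
  docker stop $ids
}

# Usage:
#   stop_by_image r-mort-smooth:4.0.0
```

Because the container was started with --rm, it is removed automatically once it stops.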

Video reference

For reference, here is the whole process in a sped-up sequence:

And here is a video at a more leisurely pace, where only the image build is sped up, and you can pause and rewind to any step:

Discussion

Now that you have created your own reproducible repository, think for a moment: how future-proof is it really? What does the reproducibility of your repository depend on? How could you future-proof it further?