Containers (Singularity & Docker)

Containers are a popular way of creating a reproducible software environment. Two widely used container solutions are Docker and Singularity; we support Singularity.

The Singularity user guides are a great resource for learning what you can do with Singularity.




Running a container


On green or gray nodes

There is a native installation of Singularity 3.8.7 from CentOS EPEL, so no modules need to be loaded.

pull the docker image you want, here ubuntu:18.04

singularity pull docker://ubuntu:18.04

write an sbatch file (here called ubuntu.slurm):

#!/bin/bash
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH --cpus-per-task=2   # Singularity can use multiple cores (-c is the short form of this option)
#SBATCH --mem-per-cpu=4000
singularity exec docker://ubuntu:18.04 cat /etc/issue
# or singularity exec ubuntu_18.04.sif cat /etc/issue

submit to the queueing system with

sbatch ubuntu.slurm

and when the resources become available, your job will be executed.

On amp nodes (not using GPU)

You need to load the module which comes from AI-Lab:

module load amp
module load Singularity

pull the docker image you want, here ubuntu:20.04:

singularity pull docker://ubuntu:20.04

write an sbatch file (here called ubuntu.slurm):

#!/bin/bash
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -p gpu
#SBATCH --mem-per-cpu=4000
module load amp
module load Singularity
singularity exec docker://ubuntu:20.04 cat /etc/issue
# or singularity exec ubuntu_20.04.sif cat /etc/issue

submit to the queueing system with

sbatch ubuntu.slurm

and when the resources become available, your job will be executed.
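The job script above refers to the pulled image as ubuntu_20.04.sif. As a minimal sketch, the helper below predicts that file name from a docker:// URI. The function name sif_name is hypothetical, and the naming rule (last path component, underscore, tag, with "latest" when no tag is given) is an assumption based on the default behaviour of singularity pull; image references with digests are not handled.

```shell
#!/bin/bash
# Predict the file name that `singularity pull` writes for a docker:// URI.
# Assumption: the default name is <last path component>_<tag>.sif and the
# tag defaults to "latest". Digest references (@sha256:...) are not handled.
sif_name() {
    local ref name tag
    ref="${1#docker://}"     # strip the URI scheme
    name="${ref##*/}"        # keep the last path component, e.g. ubuntu:20.04
    tag="latest"
    case "$name" in
        *:*) tag="${name##*:}"; name="${name%%:*}" ;;
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
}

sif_name docker://ubuntu:20.04      # prints ubuntu_20.04.sif
sif_name docker://pytorch/pytorch   # prints pytorch_latest.sif
```

Using the local .sif file in the job script avoids converting the Docker image again every time the job starts.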

On amp nodes (using GPU)

When running Singularity through SLURM (srun, sbatch), only GPUs reserved through SLURM are visible to Singularity.

Load the modules with

module load amp
module load cuda
module load Singularity

pull the docker image you want, here ubuntu:20.04:

singularity pull docker://ubuntu:20.04

write an sbatch file (here called ubuntu.slurm):

#!/bin/bash
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -p gpu
#SBATCH --gres=gpu:A100:1     #only use this if your job actually uses GPU
#SBATCH --mem-per-cpu=4000
module load amp
module load cuda
module load Singularity
singularity exec --nv docker://ubuntu:20.04 nvidia-smi
# or singularity exec --nv ubuntu_20.04.sif nvidia-smi
# the --nv option to singularity passes the GPU to it

submit to the queueing system with

sbatch ubuntu.slurm

and when the resources become available, your job will be executed.
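If the same job script is used both with and without a GPU reservation, the --nv option can be added only when a GPU was actually assigned. The sketch below assumes that SLURM sets CUDA_VISIBLE_DEVICES for jobs that request --gres=gpu, which may differ between SLURM configurations; the function name nv_flag is hypothetical.

```shell
#!/bin/bash
# Print "--nv" only when SLURM has assigned at least one GPU to the job.
# Assumption: SLURM sets CUDA_VISIBLE_DEVICES for jobs submitted with --gres=gpu.
nv_flag() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "--nv"
    fi
}

# Intended use inside a job script (commented out so the sketch runs anywhere):
# singularity exec $(nv_flag) ubuntu_20.04.sif nvidia-smi
```

Without a GPU reservation the function prints nothing, so the singularity command runs without --nv.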

For more on Singularity and GPUs, see https://sylabs.io/guides/3.9/user-guide/gpu.html.

Hints

By default there is no network isolation in Singularity, so there is no need to map any port (-p in Docker). If the process inside the container binds to an IP:port, it is immediately reachable on the host. Singularity also mounts $HOME and /tmp by default, and the directory you run the container from becomes the working directory within the container (unless that directory is not on the same filesystem as $HOME).

Singularity will use all cores reserved with --cpus-per-task. If fewer should be used, pass the singularity parameter --cpus; similarly, a container's memory can be restricted with the singularity parameter --memory. These parameters are useful if a single batch job starts several containers concurrently.
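As a rough sketch of the last point, the script below splits the CPUs and memory of one allocation evenly across several containers and prints the resulting command line. The helper share, the container count, the image name image.sif and the command my_command are hypothetical. The M suffix on the --memory value is an assumption, and whether --cpus and --memory are available depends on the installed Singularity version, so check singularity exec --help first.

```shell
#!/bin/bash
# Split one SLURM allocation evenly across several concurrent containers.
# share TOTAL N prints the integer share per container (rounded down, minimum 1).
share() {
    local total n s
    total="$1"; n="$2"
    s=$(( total / n ))
    [ "$s" -lt 1 ] && s=1
    echo "$s"
}

containers=4                               # hypothetical number of containers
total_cpus="${SLURM_CPUS_PER_TASK:-8}"     # fallback value only for trying this outside SLURM
mem_per_cpu="${SLURM_MEM_PER_CPU:-4000}"   # in MB, as in --mem-per-cpu=4000

cpus=$(share "$total_cpus" "$containers")
mem_mb=$(share "$(( total_cpus * mem_per_cpu ))" "$containers")

# Print the command line instead of running it, so the sketch is safe to try anywhere.
echo "singularity exec --cpus $cpus --memory ${mem_mb}M image.sif my_command"
```

In a real job script each container would be started in the background with these limits, followed by a wait for all of them.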



Example: Interactive PyTorch job (with GPU)


Start an interactive session on amp, make the modules available and run the docker image in singularity:

module load amp
module load Singularity
srun -t 1:00:00 -p gpu --gres=gpu:1 --pty bash
singularity exec --nv docker://pytorch/pytorch python

inside the container python session run

import torch
torch.cuda.is_available()
torch.cuda.get_device_name()

You can also shorten it to a single command

srun -t 1:00:00 -p gpu --mem 32G --gres=gpu:1 singularity exec --nv docker://pytorch/pytorch python -c "import torch;print(torch.cuda.is_available())"

which should give the same result (without the GPU name). If you remove the --nv flag, the result changes because Singularity no longer exposes the GPU.




Example: Interactive TensorFlow job (without GPU)


Start an interactive session on amp, make the modules available and run the docker image in singularity:

srun -t 1:00:00 -p gpu --pty bash
source /usr/share/lmod/lmod/init/bash
module load amp
module load Singularity/3.7.3
singularity run docker://tensorflow/tensorflow

inside the container run

python
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

The following is the “TensorFlow 2 quickstart for beginners” from https://www.tensorflow.org/tutorials/quickstart/beginner; continue inside the Python session:

import tensorflow as tf
print("TensorFlow version:", tf.__version__)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])
predictions = model(x_train[:1]).numpy()
predictions
tf.nn.softmax(predictions).numpy()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test,  y_test, verbose=2)
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])
probability_model(x_test[:5])



Example job for OpenDroneMap (ODM)


OpenDroneMap needs a writable directory for the data. This directory needs to contain a subdirectory named images.

Assume you keep your ODM projects in the directory opendronemap:

opendronemap
|
|-Laagna-2021
| |
| |-images
|
|-Paldiski-2015
| |
| |-images
|
|-Paldiski-2018
| |
| |-images
|
|-TalTech-2015
| |
| |-images

If you want to create a 3D model for Laagna-2021, you would run the following Singularity command:

singularity run --bind $(pwd)/opendronemap/Laagna-2021:/datasets/code docker://opendronemap/odm --project-path /datasets
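Because ODM needs a writable project directory with an images subdirectory, it can save queue time to check this before submitting a job. The following is a minimal sketch; the function name check_odm_project is hypothetical, and it checks only the two requirements stated above.

```shell
#!/bin/bash
# check_odm_project DIR: succeed only if DIR contains an images
# subdirectory and DIR itself is writable.
check_odm_project() {
    local project
    project="$1"
    if [ ! -d "$project/images" ]; then
        echo "missing images subdirectory in $project" >&2
        return 1
    fi
    if [ ! -w "$project" ]; then
        echo "$project is not writable" >&2
        return 1
    fi
    return 0
}

# Example use before launching the container (commented out so the sketch runs anywhere):
# check_odm_project "$(pwd)/opendronemap/Laagna-2021" || exit 1
```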

For creating a DEM, you would need to add --dsm and potentially bind an output directory with --bind "$(pwd)/odm_dem:/code/odm_dem" (--bind is the Singularity counterpart of Docker's -v).

GPU use for Singularity is enabled with the --nv switch. Be aware that ODM uses the GPU only for the matching, which is only a small percentage of the whole computation time.

The SLURM job-script looks like this:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task=10
#SBATCH --time 01:30:00
#SBATCH --partition gpu
#SBATCH --gres=gpu:A100:1

module load amp
module load Singularity

singularity run --nv --bind $(pwd)/opendronemap/Laagna-2021:/datasets/code docker://opendronemap/odm --project-path /datasets --dsm



Obtaining and Building Singularity Containers


To use a container on the cluster, you need to get the image from somewhere. For security reasons you cannot build containers on the cluster (even with --fakeroot), so there are two ways to get your containers onto the cluster.

From Container Registries

Singularity can pull Docker images from Docker container registries (most notably Docker Hub) and convert them directly into Singularity images. This is the method used in the previous examples. You can read more here: https://docs.sylabs.io/guides/3.9/user-guide/singularity_and_docker.html

You can also use GitHub’s Container Registry or TalTech’s Software Science GitLab. You will need to sign in with an access token to pull containers from these registries; more on that here: https://docs.sylabs.io/guides/3.9/user-guide/endpoint.html
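One way to supply an access token non-interactively is through the environment variables SINGULARITY_DOCKER_USERNAME and SINGULARITY_DOCKER_PASSWORD, which Singularity reads when pulling from docker:// sources. The sketch below loads the token from a file so that it does not end up in the shell history. The function name load_registry_token, the user name, the token file path, and the OWNER/IMAGE:TAG reference are hypothetical placeholders.

```shell
#!/bin/bash
# load_registry_token USERNAME TOKEN_FILE
# Export the credentials that Singularity uses for docker:// registries.
load_registry_token() {
    if [ ! -r "$2" ]; then
        echo "cannot read token file $2" >&2
        return 1
    fi
    SINGULARITY_DOCKER_USERNAME="$1"
    SINGULARITY_DOCKER_PASSWORD="$(cat "$2")"
    export SINGULARITY_DOCKER_USERNAME SINGULARITY_DOCKER_PASSWORD
}

# Example use (commented out; user name, path and image are placeholders):
# load_registry_token myuser "$HOME/.registry_token"
# singularity pull docker://ghcr.io/OWNER/IMAGE:TAG
```

Keep the token file readable only by yourself (for example chmod 600).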

Building images locally then moving to cluster

Since Singularity images are single files, you can transfer them easily with any tool used to sync data with the cluster (scp, rsync, etc.). You can build locally with either the singularity tool alone or with singularity and docker. For example, with docker:

docker build -t pytorch .
docker save pytorch | gzip > pytorch.tar.gz

These commands create a file pytorch.tar.gz, which you can either convert to a Singularity image locally with singularity build pytorch.sif docker-archive://pytorch.tar.gz (pytorch.sif being the output file), or move to the cluster and build from there. Building from a Docker archive is the only form of image building allowed on the cluster.
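The local steps can be wrapped in a small function, sketched below under the assumption that Docker and rsync are installed on your workstation. The function name ship_image and the host user@cluster.example.org are hypothetical; replace the host with your own login on the cluster.

```shell
#!/bin/bash
# ship_image IMAGE REMOTE_DIR
# Build a Docker image from the current directory, save it as a gzipped
# archive and copy the archive to the cluster with rsync.
ship_image() {
    local image remote archive
    image="$1"; remote="$2"
    archive="${image}.tar.gz"
    docker build -t "$image" . || return 1
    docker save "$image" | gzip > "$archive" || return 1
    rsync -av --progress "$archive" "$remote"
}

# Example call (commented out because it needs Docker and cluster access):
# ship_image pytorch user@cluster.example.org:containers/
# Then, on the cluster, convert the archive to a Singularity image:
# singularity build pytorch.sif docker-archive://pytorch.tar.gz
```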