Containers (Singularity & Docker)
Containers are a popular way of creating a reproducible software environment. The most common container solutions are Docker and Singularity; on the cluster we support Singularity.
The Singularity user guides are a great resource for learning what you can do with Singularity.
Running a container
On green or gray nodes
There is a native installation of Singularity 3.8.7 from CentOS EPEL, so there are no modules to load.
Pull the Docker image you want, here ubuntu:18.04:
singularity pull docker://ubuntu:18.04
Write an sbatch file (here called ubuntu.slurm):
#!/bin/bash
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH --cpus-per-task=2 # singularity can use multiple cores
#SBATCH --mem-per-cpu=4000
singularity exec docker://ubuntu:18.04 cat /etc/issue
Submit it to the queueing system with
sbatch ubuntu.slurm
and when the resources become available, your job will be executed.
On amp nodes (not using GPU)
You need to load the modules, which come from the AI Lab:
module load amp
module load Singularity
Pull the Docker image you want, here ubuntu:20.04:
singularity pull docker://ubuntu:20.04
Write an sbatch file (here called ubuntu.slurm):
#!/bin/bash
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -p gpu
#SBATCH --mem-per-cpu=4000
module load amp
module load Singularity
singularity exec docker://ubuntu:20.04 cat /etc/issue
# or singularity exec ubuntu_20.04.sif cat /etc/issue
Submit it to the queueing system with
sbatch ubuntu.slurm
and when the resources become available, your job will be executed.
On amp nodes (using GPU)
When running Singularity through SLURM (srun, sbatch), only the GPUs reserved through SLURM are visible to Singularity.
Load the required modules:
module load amp
module load cuda
module load Singularity
Pull the Docker image you want, here ubuntu:20.04:
singularity pull docker://ubuntu:20.04
Write an sbatch file (here called ubuntu.slurm):
#!/bin/bash
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -p gpu
#SBATCH --gres=gpu:A100:1 #only use this if your job actually uses GPU
#SBATCH --mem-per-cpu=4000
module load amp
module load cuda
module load Singularity
singularity exec --nv docker://ubuntu:20.04 nvidia-smi
# or singularity exec --nv ubuntu_20.04.sif nvidia-smi
# the --nv option passes the GPU through to the container
Submit it to the queueing system with
sbatch ubuntu.slurm
and when the resources become available, your job will be executed.
For more on Singularity and GPUs, see https://sylabs.io/guides/3.9/user-guide/gpu.html.
Hints
By default there is no network isolation in Singularity, so there is no need to map ports (as with -p in Docker). If a process inside the container binds to an IP:port, it is immediately reachable on the host. Singularity also mounts $HOME and $TMP by default, so the directory you run the container from becomes the working directory inside the container (unless that directory is on a different filesystem than $HOME).
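If your data lives outside of $HOME, additional directories can be made visible inside the container with --bind. A minimal sketch, where /path/to/my/data is a placeholder for your own data directory:
singularity exec --bind /path/to/my/data:/data docker://ubuntu:20.04 ls /data
# /path/to/my/data on the host appears as /data inside the container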
Singularity will use all cores reserved with --cpus-per-task. If fewer cores should be used, the Singularity parameter --cpus can be given; similarly, the memory available to a container can be restricted with the Singularity parameter --memory. These parameters are useful if a single batch job starts several containers concurrently.
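A minimal sketch of such limits (the values are examples only, my_analysis.sh is a placeholder for your own workload, and depending on the installed Singularity version these flags may require cgroup support):
# restrict this container to 2 cores and 4 GB of memory,
# independent of what the batch job itself has reserved
singularity exec --cpus 2 --memory 4G ubuntu_20.04.sif ./my_analysis.sh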
Example: Interactive PyTorch job (with GPU)
Start an interactive session on amp, make the modules available and run the docker image in singularity:
module load amp
module load Singularity
srun -t 1:00:00 -p gpu --gres=gpu:1 --pty bash
singularity exec --nv docker://pytorch/pytorch python
Inside the container's Python session, run
import torch
torch.cuda.is_available()
torch.cuda.get_device_name()
You can also shorten it to a single command:
srun -t 1:00:00 -p gpu --mem 32G --gres=gpu:1 singularity exec --nv docker://pytorch/pytorch python -c "import torch;print(torch.cuda.is_available())"
which should give the same result (without the GPU name). If you remove the --nv flag, the result changes as Singularity no longer exposes the GPU.
Example: Interactive TensorFlow job (without GPU)
Start an interactive session on amp, make the modules available and run the docker image in singularity:
srun -t 1:00:00 -p gpu --pty bash
source /usr/share/lmod/lmod/init/bash
module load amp
module load Singularity/3.7.3
singularity run docker://tensorflow/tensorflow
Inside the container, run
python
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
The following is the “TensorFlow 2 quickstart for beginners” from https://www.tensorflow.org/tutorials/quickstart/beginner; continue inside the Python session:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10)
])
predictions = model(x_train[:1]).numpy()
predictions
tf.nn.softmax(predictions).numpy()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
probability_model = tf.keras.Sequential([
model,
tf.keras.layers.Softmax()
])
probability_model(x_test[:5])
Example job for OpenDroneMap (ODM)
OpenDroneMap needs a writable directory for the data. This directory needs to contain a subdirectory named images.
Assume you keep your ODM projects in the directory opendronemap:
opendronemap
|
|-Laagna-2021
| |
| |-images
|
|-Paldiski-2015
| |
| |-images
|
|-Paldiski-2018
| |
| |-images
|
|-TalTech-2015
| |
| |-images
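A layout like this can be created with ordinary shell commands, for example (the source path of the photos is a placeholder):
mkdir -p opendronemap/Laagna-2021/images
cp /path/to/drone/photos/*.JPG opendronemap/Laagna-2021/images/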
If you want to create a 3D model for Laagna-2021, you would run the following Singularity command:
singularity run --bind $(pwd)/opendronemap/Laagna-2021:/datasets/code docker://opendronemap/odm --project-path /datasets
For creating a DEM, you would need to add --dsm and potentially bind an additional output directory, e.g. --bind "$(pwd)/odm_dem:/code/odm_dem". GPU use in Singularity is enabled with the --nv switch; be aware that ODM uses the GPU only for the matching step, which is only a small percentage of the total computation time.
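Putting this together, a command-line DEM run might look like the following sketch (the odm_dem bind target follows the ODM documentation and should be checked against your ODM version):
mkdir -p odm_dem
singularity run --bind $(pwd)/opendronemap/Laagna-2021:/datasets/code --bind $(pwd)/odm_dem:/code/odm_dem docker://opendronemap/odm --project-path /datasets --dsm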
The SLURM job-script looks like this:
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task=10
#SBATCH --time 01:30:00
#SBATCH --partition gpu
#SBATCH --gres=gpu:A100:1
module load amp
module load Singularity
singularity run --nv --bind $(pwd)/opendronemap/Laagna-2021:/datasets/code docker://opendronemap/odm --project-path /datasets --dsm
Obtaining and Building Singularity Containers
When you want to use a container on the cluster, you need to get the image from somewhere, and you cannot build containers on the cluster for security reasons (even with --fakeroot). There are therefore two ways to get your containers onto the cluster.
From Container Registries
Singularity can pull Docker images from Docker container registries (most significantly Docker Hub) and convert them directly into Singularity images. This is the method used in the previous examples. You can read more here: https://docs.sylabs.io/guides/3.9/user-guide/singularity_and_docker.html
You can also use GitHub’s Container Registry or TalTech’s Software Science GitLab (you’ll need to sign in with an access token to pull containers from such a registry; more on that here: https://docs.sylabs.io/guides/3.9/user-guide/endpoint.html).
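As a sketch, pulling from such a registry typically looks like this (registry address, image path, and user name are placeholders; the login step asks for the access token):
singularity remote login --username <username> docker://ghcr.io
singularity pull docker://ghcr.io/<owner>/<image>:<tag>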
Building images locally then moving to cluster
Since Singularity images are single files, you can transfer them quite easily with any tool used to sync data with the cluster (scp, rsync, etc.). You can build locally with either just the singularity tool, or with singularity and docker:
- Building images from a Singularity definition file, then transferring them to the cluster.
- Building images with Docker from a Dockerfile, then saving the image with docker save to an archive, e.g.
docker build -t pytorch .
docker save pytorch | gzip > pytorch.tar.gz
creates a file pytorch.tar.gz, which you can either convert to a Singularity image locally with singularity build pytorch.sif docker-archive://pytorch.tar.gz, or move to the cluster and build there. Building from a Docker archive is the only form of image building allowed on the cluster.
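As a sketch of the definition-file route, a minimal definition file (here called pytorch.def; the base image and extra packages are placeholders) could look like this:
Bootstrap: docker
From: pytorch/pytorch

%post
    # extra packages installed into the image at build time
    pip install scikit-learn

%runscript
    exec python "$@"
Build it locally and copy the resulting image to the cluster, e.g.:
sudo singularity build pytorch.sif pytorch.def
scp pytorch.sif <username>@<cluster-login-node>: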