Quickstart: Cluster

Accessing the cluster

NB! To access the cluster, the user must have an active Uni-ID account; to obtain access, please contact us by email (hpcsupport@taltech.ee) or via the webpage. To use licensed programs, the user must also be added to the appropriate group. More about available programs and licenses can be found here.

The login node of the cluster can be reached by SSH. SSH (the Secure SHell) is available through the command ssh on Linux/Unix, Mac and Windows 10. A guide for Windows users using PuTTY (an alternative SSH client with a graphical user interface (GUI)) is here.

For accessing the cluster base.hpc.taltech.ee:

ssh uni-ID@base.hpc.taltech.ee

The cluster is accessible from inside the university and from major Estonian network providers. If you are traveling (or not on one of the major networks), access requires either EduVPN/OpenVPN/FortiVPN (a config for manual configuration of OpenVPN can be generated at https://eduvpn.taltech.ee) or a two-step login through a jump host:

ssh -l uni-ID@intra.ttu.ee proksi.intra.ttu.ee
ssh uni-ID@base.hpc.taltech.ee
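
The two-step login can also be written as an OpenSSH client configuration, so that a single command performs both hops. The following is a minimal sketch, assuming an OpenSSH client of version 7.3 or newer (which provides ProxyJump). The host aliases taltech-jump and taltech-base and the file name taltech_ssh_config are hypothetical, and uni-ID has to be replaced with your own account name.

```shell
# Write the client configuration into a separate file (hypothetical name),
# so that an existing ~/.ssh/config stays untouched.
config_file="${TALTECH_SSH_CONFIG:-./taltech_ssh_config}"

cat > "$config_file" <<'EOF'
# Jump host, reachable from outside the university network
Host taltech-jump
    HostName proksi.intra.ttu.ee
    User uni-ID@intra.ttu.ee

# Login node of the cluster, reached through the jump host
Host taltech-base
    HostName base.hpc.taltech.ee
    User uni-ID
    ProxyJump taltech-jump
EOF

echo "wrote $config_file"
```

With this file in place, `ssh -F ./taltech_ssh_config taltech-base` should perform both logins in one step. Alternatively, the two Host blocks can be appended to ~/.ssh/config.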

To use graphical applications, add the -X switch to the SSH command; for GLX (X Window System) forwarding, additionally add the -Y switch. To start a GUI program that uses GLX, the connection command would be:

ssh -X -Y uni-ID@base.hpc.taltech.ee

NB! The login node is for light interactive analysis only; do not run heavy computations there. For heavy computations, request an (interactive) session on a compute node from the resource manager/scheduler SLURM!

SSH fingerprints of host-keys

An SSH host-key fingerprint is a security feature for easy identification/verification of the host the user is connecting to. It identifies the server only; logging in without a password relies on the user's own SSH keys instead. On first connect, the user is shown the fingerprint of a host key and asked whether it should be added to the list of known hosts.

Please compare the fingerprint to the ones below. If one matches, the host can be added; if the fingerprint does not match, there is a problem (e.g. a man-in-the-middle attack).

SSH host keys of our servers

base.hpc.taltech.ee

ECDSA SHA256:OEfQiOB/eIG8hYoQ25sQk9T5tx9EtQbhi6sNM4C8mME
ED25519 SHA256:t0CSTU0AnSsJThzuM68tucrcfnn2wLKabjSnuRKX8Yc
RSA SHA256:qYrmOw/YN7wf640yBHADX3wnAOPu0OOXlcu4LKBxzG8

amp.hpc.taltech.ee

ECDSA SHA256:yl6+VaKow6qDZAXL3rQY8+3d3pcH0kYg7MjGgNVTWZs
ED25519 SHA256:YOjtpcEL2+AWm6vDFjVl0znYuQPMSVCkyFGvdO5fm8o
RSA SHA256:4aaOxumH1ATNfiIA4mZSNMefvxfdFm5zZoUj6VR7TYo

viz.hpc.taltech.ee

ECDSA SHA256:z2/bxleZ3T3vErkg4C7kvDPKKEU0qaoR8bL29EgMfGA
ED25519 SHA256:9zRBmS3dxD7BNISZKwg6l/2+6p4HeqlOhA4OMBjD9mk
RSA SHA256:Q6NDm88foRVTKtEAEexcRqPqMQNGUzf3rQdetBympPg
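
The comparison can be done by eye, but it can also be scripted. The following is a small sketch that checks a fingerprint against the three values published above for base.hpc.taltech.ee. The function name is_known_fingerprint and the variable names are hypothetical.

```shell
# Fingerprints published above for base.hpc.taltech.ee (ECDSA, ED25519, RSA)
known_fingerprints="SHA256:OEfQiOB/eIG8hYoQ25sQk9T5tx9EtQbhi6sNM4C8mME
SHA256:t0CSTU0AnSsJThzuM68tucrcfnn2wLKabjSnuRKX8Yc
SHA256:qYrmOw/YN7wf640yBHADX3wnAOPu0OOXlcu4LKBxzG8"

# Return 0 if the fingerprint given as $1 is one of the published ones.
# -F: fixed string, -x: whole line must match, -q: no output
is_known_fingerprint() {
    printf '%s\n' "$known_fingerprints" | grep -Fqx -- "$1"
}

# Example: the fingerprint shown by ssh on first connect (here the ED25519 one)
shown="SHA256:t0CSTU0AnSsJThzuM68tucrcfnn2wLKabjSnuRKX8Yc"
if is_known_fingerprint "$shown"; then
    echo "fingerprint matches, host can be added"
else
    echo "fingerprint does NOT match, do not connect"
fi
```

Assuming a reasonably recent OpenSSH, the fingerprints a host currently offers can be listed without logging in by piping the output of `ssh-keyscan base.hpc.taltech.ee` into `ssh-keygen -lf -`.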

How to get SSH keys.

Structure and file tree

After logging in to the cluster, the user lands in their home directory, $HOME (/gpfs/mariana/home/$USER/).

In the home directory, the user can create, delete, and overwrite files and perform calculations (unless the SLURM script forces the program to use the $SCRATCH directory). The home directory is limited to 2 TB.

NB! HPC is not intended for data storage and does not make regular backups of user’s data.

The home directory can be accessed from the console or by GUI programs, but it cannot be mounted. For mounting, special smbhome and smbgroup folders were created (/gpfs/mariana/smbhome/$USER/ and /gpfs/mariana/smbgroup//, respectively). More about smb folders can be found here.

Some programs and scripts assume that files will be transferred to the $SCRATCH directory. In this case, the user needs to know on which node the job was running in order to connect to exactly that node (green11 in the example below). The $SCRATCH directory is located in /state/partition1/.

srun -w green11 --pty bash
cd /state/partition1/

Please note that the scratch is not shared between nodes, so parallel MPI jobs that span multiple nodes cannot access each other’s scratch files.
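
To find out which node holds the scratch files of a job, the node name can be read from the squeue output. The following is a sketch under the assumption that the job is still listed by squeue (queued or running). The function name node_of_job and the job IDs are made up for illustration.

```shell
# Print the node name for a given job ID.
# Input (stdin): lines of "<jobid> <nodelist>", e.g. produced by
#   squeue -u "$USER" -h -o "%i %N"
node_of_job() {
    awk -v id="$1" '$1 == id { print $2 }'
}

# Demonstration with sample squeue output (job IDs are made up)
printf '%s\n' "1234567 green11" "1234568 green12" | node_of_job 1234567
```

On the cluster, the sample input would be replaced by the real squeue call, and the printed node name passed to the -w option of srun as shown above.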

Running jobs with SLURM

SLURM is a resource management and job scheduling system for Linux clusters. A SLURM quick reference can be found here.

Examples of slurm scripts are usually given on the program’s page with some recommendations for optimal use of resources for this particular program.

The most often used SLURM commands are:

srun - to start a session or an application (in real time)
sbatch - to start a computation using a batch file (submit for later execution)
squeue - to check the load of the cluster and status of own jobs
sinfo - to check the state of the cluster and partitions
scancel - to delete a submitted job (or stop a running job)

For more parameters see the man-pages (manual) of the commands srun, sbatch, sinfo and squeue. For this use the command man followed by the program-name whose manual you want to see, e.g.:

man srun

Requesting resources with SLURM can be done either with parameters to srun or in a batch script invoked by sbatch.

The following defaults are used if not otherwise specified:

default memory is 1 GB/thread (for larger jobs request more memory)
short partition: default time limit is 10 min and max time limit is 4 hours (longer jobs need to be submitted to the common partition or one of the InfiniBand partitions)
common partition: default time limit is 10 min and max time limit is 15 days (for longer jobs implement a restart feature and submit dependent jobs)
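
The time limits above translate into a simple rule for choosing a partition. The following sketch encodes only the two limits listed here (4 hours for short, 15 days for common); the InfiniBand partitions are not covered, and the function name pick_partition is hypothetical.

```shell
# Suggest a partition for a requested walltime given in whole hours,
# using the limits listed above (short: 4 hours, common: 15 days).
pick_partition() {
    hours=$1
    if [ "$hours" -le 4 ]; then
        echo short
    elif [ "$hours" -le $((15 * 24)) ]; then
        echo common
    else
        echo "too long: split the run into dependent restart jobs" >&2
        return 1
    fi
}

pick_partition 1      # prints: short
pick_partition 5      # prints: common
```

The result can then be passed to srun or sbatch through the --partition option.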

Requesting an interactive session (longer than 10 min, here 1 hour):

srun -t 01:00:00 --pty bash 

This logs you into one of the compute nodes, where you can load modules, run interactive applications, compile your code, etc.

With srun it is recommended to use CLI (command-line interface) programs instead of GUI (graphical user interface) programs where possible. For example, use octave-cli or octave instead of octave-gui.

Running a simple non-interactive single process job that lasts longer than 4 hours:

srun --partition=common -t 05:00:00 -n 1 ./a.out

NB! Environment variables for OpenMP are not set automatically, e.g.

srun  -N 1 --cpus-per-task=28 ./a.out

would not set OMP_NUM_THREADS to 28; this has to be done manually. For parallel jobs it is therefore usually recommended to use sbatch scripts.
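
Setting the variable by hand can be done in one line. The sketch below takes the thread count from SLURM_CPUS_PER_TASK, which SLURM sets inside a job when --cpus-per-task is given; the fallback to 1 thread outside of a job is an assumption chosen for this example.

```shell
# Use the number of CPUs per task requested from SLURM as the OpenMP
# thread count; fall back to 1 thread when the variable is unset
# (e.g. when the line is executed outside of a SLURM job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OpenMP threads: $OMP_NUM_THREADS"
```

Inside an interactive session or a batch script, this line would go before the program is started. For a direct srun call from the login node, the value can instead be exported by hand before the call (e.g. to 28 in the example above), since srun by default passes the calling environment on to the job.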

Below is an example of a SLURM batch script (filename: myjob.slurm) with explanations of the commands.

#!/bin/bash
#SBATCH --partition=common    ### Partition
#SBATCH --job-name=HelloOMP   ### Job Name           -J
#SBATCH --time=00:10:00       ### WallTime           -t
#SBATCH --nodes=4             ### Number of Nodes    -N 
#SBATCH --ntasks-per-node=7   ### Number of tasks (MPI processes)
#SBATCH --cpus-per-task=4     ### Number of threads per task (OMP threads)
#SBATCH --account=hpcrcf      ### In case of several accounts, specifies account used for job submission
#SBATCH --mem-per-cpu=100     ### Min RAM required in MB
#SBATCH --array=13-18         ### Array tasks for parameter sweep

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK		### setup environment
module load gcc					### setup environment
./hello_omp $SLURM_ARRAY_TASK_ID			### only for arrays, setup output files with system information
mpirun -n 28 ./hello_mpi 				### run program

This example lists some of the more common submission parameters. Many more job-submission options exist; moreover, some of the options listed above are not useful in combination. An explanation of the variables used inside SLURM/SBATCH can be found here. In contrast to e.g. GridEngine, SLURM allows fine-grained resource requests, using parameters like --ntasks-per-core or --ntasks-per-node.
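
The numbers in the example script are related: 4 nodes with 7 tasks each give the 28 processes passed to mpirun, and with 4 threads per task the job requests 112 CPUs in total. The following sketch spells out this arithmetic; the function name job_size is hypothetical.

```shell
# Print "<MPI processes> <CPUs in total>" for a given job geometry.
# Arguments: nodes, tasks per node, CPUs per task.
job_size() {
    tasks=$(( $1 * $2 ))
    cpus=$(( tasks * $3 ))
    echo "$tasks $cpus"
}

# Geometry of the example script: 4 nodes, 7 tasks per node, 4 threads per task
job_size 4 7 4        # prints: 28 112
```

Inside a running job, the same function could be fed with the variables SLURM_JOB_NUM_NODES, SLURM_NTASKS_PER_NODE and SLURM_CPUS_PER_TASK instead of literal numbers, assuming the corresponding options were given at submission.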

Example scripts are available for submitting:

a single process job
an OpenMP parallel job
an MPI parallel job (OpenFOAM)
an array (parameter sweep) job
a GPU job
a job using the scratch partition (sequential or OpenMP parallel)

The job is then submitted to SLURM by

sbatch myjob.slurm

and will be executed when the requested resources become available.
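
For runs that exceed the time limit of a partition, dependent jobs can be chained at submission time. The following is a sketch assuming the sbatch options --parsable (prints the job ID, optionally followed by a semicolon and the cluster name) and --dependency=afterok. The function names and the sample value "1234567;base" are made up for illustration, and submit_chain is only defined here, not executed.

```shell
# Extract the job ID from the output of "sbatch --parsable",
# which has the form "jobid" or "jobid;clustername".
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

# Submit the script $1, then the script $2 so that it starts only after
# the first job has finished successfully. Needs a cluster with sbatch.
submit_chain() {
    out=$(sbatch --parsable "$1") || return 1
    first=$(job_id_from_parsable "$out")
    sbatch --dependency=afterok:"$first" "$2"
}

job_id_from_parsable "1234567;base"   # prints: 1234567
```

On the cluster, a call such as `submit_chain myjob.slurm myjob_restart.slurm` would queue both jobs at once; the second file name is a placeholder for a restart script.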

Output that applications would normally write to STDOUT is written to a file; error messages are written to a file as well (the default output file is slurm-$job_id.log). More about SLURM finished job statistics can be found here.

Some useful online resources:

SLURM scheduler workload manager
Victor Eijkhout: Introduction to High-Performance Scientific Computing
Charles Severance, Kevin Dowd: High Performance Computing
OpenMP standard
MPI standard
SLURM Quick Reference (Cheat Sheet)

Monitoring jobs & resources

Monitoring a job on the node

Status of jobs, whether they are running or not, and on which node can be seen with the command:
```
squeue -u <username>
```
You can check the load of the node your job runs on, its status and configuration by using
```
scontrol show node <nodename>
```
the load should not exceed the number of hyperthreads (CPUs in SLURM notation) of the node.
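
This check can be automated by reading the two values from the scontrol output. The following is a sketch that assumes the output contains fields of the form CPUTot=<n> and CPULoad=<x>, as printed by common SLURM versions. The function name check_load and the sample lines are made up for illustration.

```shell
# Read "scontrol show node <nodename>" output on stdin and report whether
# CPULoad exceeds CPUTot (the number of CPUs in SLURM notation).
# Exit status: 0 = ok, 1 = overloaded, 2 = fields not found.
check_load() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^CPUTot=/)  { split($i, a, "="); tot = a[2] }
            if ($i ~ /^CPULoad=/) { split($i, a, "="); load = a[2] }
        }
    }
    END {
        if (tot == "" || load == "") { print "fields not found"; exit 2 }
        if (load + 0 > tot + 0) { print "overloaded: " load " > " tot; exit 1 }
        print "ok: " load " <= " tot
    }'
}

# Demonstration with a made-up line in the assumed output format
printf '%s\n' "NodeName=green11 CPUAlloc=40 CPUTot=80 CPULoad=38.12" | check_load
```

On the cluster, the sample line would be replaced by piping the real scontrol output into check_load.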

In the case of MPI parallel runs, statistics of several nodes can be monitored by specifying the node names. For example:
```
scontrol show node=green[25-26]
```
It is possible to submit a second, this time interactive, job to the node where the main job is running. Check with squeue where your job is running, then submit
```
	srun -w <nodename> --pty htop
```
Note that there must be free slots on the machine, so you cannot use -n 80 or --exclusive for your main job (use -n 79 instead).

To monitor GPUs, you can use
```
	srun -w <nodename> --pty watch nvidia-smi
```

An alternative method on Linux computers, if you have X11: log in to base/amp with the -X switch:

ssh -X UniID@base.hpc.taltech.ee

then submit your main interactive job

srun --x11 -n <numtasks> --cpus-per-task=<numthreads> --pty bash

and start an xterm -e htop & in the session.

In sbatch the option --x11=batch can be used; note that the ssh session to base needs to stay open!

Monitoring resource usage

Default disk quotas for both home and smbhome are 2 TB per user. For smbgroup there are no limits. You can monitor your resource usage with the mmlsquota and sreport commands.

Current disk usage:

/usr/lpp/mmfs/bin/mmlsquota --block-size=auto

CPU usage during the last day:

sreport -t Hours cluster UserUtilizationByAccount Users=$USER

CPU usage in a specific period:

sreport -t Hours cluster UserUtilizationByAccount Users=$USER start=2021-01-01T00:00:00 end=2022-02-01T00:00:00

Where start= and end= can be changed depending on the desired period of time.
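
Instead of typing the timestamps by hand, they can be generated with the date command. The following is a sketch that assumes GNU date (as found on Linux systems) for the -d option. The 7-day window and the variable names are chosen for illustration only.

```shell
# Build start/end timestamps covering the last 7 full days
# (from midnight 7 days ago until midnight today).
days=7
start=$(date -d "$days days ago" +%Y-%m-%dT00:00:00)
end=$(date +%Y-%m-%dT00:00:00)
user="${USER:-$(id -un)}"

# Print the command instead of running it, so it can be inspected first.
echo "sreport -t Hours cluster UserUtilizationByAccount Users=$user start=$start end=$end"
```

The printed line can be copied to the prompt on the cluster. Since the end timestamp is today's midnight, usage from the current day is not included.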

Copying data to/from the clusters

You can copy your data to the cluster by using scp, sftp, sshfs or rsync.

scp is available on all Linux systems, Mac and Windows10 PowerShell. There are also GUI versions available for different OS (like PuTTY).

Copying to the cluster with scp:
```
scp local_path_from_where_to_copy_file/file_name uni-id@base.hpc.taltech.ee:path_to_where_to_save
```
Copying from the cluster with scp:
```
scp uni-id@base.hpc.taltech.ee:path_from_where_to_copy/file_name local_path_to_where_to_save 
```
The path to a file on the HPC can be checked with the pwd command.

sftp is the secure version of the ftp protocol. This command starts a session in which files can be transmitted in both directions using the get and put commands. File transfer can be done in “binary” or “ascii” mode; conversion of line endings (see below) is automatic in “ascii” mode. There are also GUI versions available for different OS (FileZilla, gFTP and WinSCP (Windows)).
```
sftp uni-id@base.hpc.taltech.ee
```
sshfs can be used to temporarily mount remote filesystems for data transfer or analysis. The data is tunneled through an ssh connection. Be aware that this is usually not performant and can create a high load on the login node due to ssh encryption.
```
sshfs uni-id@base.hpc.taltech.ee:remote_dir/ /path_to_local_mount_point/
```
rsync can update files if previous versions exist without having to transfer the whole file. However, its use is recommended for the advanced user only, since one has to be careful with the syntax.
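
One point of the rsync syntax that needs care is the trailing slash on the source: with it, rsync copies the contents of the directory; without it, rsync creates the directory itself inside the destination. The following sketch only prints a command line, using the common options -a (archive mode), -v (verbose) and --dry-run (show what would be transferred without copying). The function name rsync_cmd and the paths are hypothetical.

```shell
# Print an rsync command that copies the *contents* of a local directory
# into a directory on the cluster.
rsync_cmd() {
    src="${1%/}/"      # make sure the source ends with a slash
    dest="$2"
    printf 'rsync -av --dry-run %s uni-id@base.hpc.taltech.ee:%s\n' "$src" "$dest"
}

rsync_cmd results project/results/
# prints: rsync -av --dry-run results/ uni-id@base.hpc.taltech.ee:project/results/
```

After the output of the dry run has been checked, the same command without --dry-run performs the actual transfer.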

SMB/CIFS exported filesystems

One simple and convenient way to manage and process data stored on the HPC is mounting. Mounting means that the user attaches a directory located on the HPC to a directory on their own computer and can process the files as if they were on that computer. The shares can be accessed from within the university or through EduVPN.

Each user automatically has a directory within smbhome. It is not the same as the $HOME directory, so either calculations should be done in the smbhome directory from the start to avoid copying, or the files needed should be copied from the home directory to the smbhome directory with the commands:

pwd	### show the path to the current directory
cp path_to_your_file/your_file /gpfs/mariana/smbhome/$USER/	### copy the file to smbhome

To get a directory for group access, please contact us (a group and a directory need to be created).

The HPC center exports two filesystems as Windows network shares:

local path on cluster	Linux network URL	Windows network URL
/gpfs/mariana/smbhome/$USER	smb://smb.hpc.taltech.ee/smbhome	\\smb.hpc.taltech.ee\smbhome
/gpfs/mariana/smbgroup	smb://smb.hpc.taltech.ee/smbgroup	\\smb.hpc.taltech.ee\smbgroup
/gpfs/mariana/home/$USER	not exported	not exported

This is the quick-access guide; for more details, see here.

Windows access

The shares can be found using the Explorer “Map Network Drive”.

server >>> \\smb.hpc.taltech.ee\smbhome
username >>> INTRA\<uni-id>

From PowerShell:

 net use \\smb.hpc.taltech.ee\smbhome /user:INTRA\uni-id
 get-smbconnection

Linux access

On Linux with GUI Desktop, the shares can be accessed with the nautilus browser.

From the command line, the shares can be mounted as follows:

dbus-run-session bash
gio mount smb://smb.hpc.taltech.ee/smbhome/

you will be asked for “User” (which is your UniID), “Domain” (which is “INTRA”), and your password.

To disconnect from the share, unmount with

gio mount -u smb://smb.hpc.taltech.ee/smbhome/

Special considerations for copying Windows - Linux

Microsoft Windows uses a different line ending in text files (ASCII/UTF8 files) than Linux/Unix/Mac: CRLF vs. LF. When copying files between Windows and Linux, this needs to be taken into account. The FTP (File Transfer Protocol) has ASCII and BINARY modes; in ASCII mode the line-end conversion is automatic.

There are tools for converting line endings in case the file was copied without line conversion: dos2unix, unix2dos, todos, fromdos; the stream editor sed can also be used.
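
As an illustration of the sed variant, the following sketch creates a small file with Windows line endings and converts it to Unix line endings. It assumes GNU sed, which understands \r as a carriage return; the file names are made up for this example.

```shell
# Create a small test file with Windows (CRLF) line endings
printf 'line one\r\nline two\r\n' > crlf_example.txt

# Remove the carriage return at the end of every line (CRLF -> LF)
sed 's/\r$//' crlf_example.txt > lf_example.txt

# Count the lines that still contain a carriage return (0 after conversion)
grep -c "$(printf '\r')" lf_example.txt
```

dos2unix performs the same conversion in place and is usually the more convenient choice where it is installed.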