HPC Center user guides


Using the resources of the TalTech HPC Centre requires an active Uni-ID account; a procedure for non-employees/non-students is available (in Estonian). In addition, the user needs to be added to the HPC-USERS group: please write to hpcsupport@taltech.ee from your Uni-ID e-mail account to have HPC access activated. To use licensed programs, the user must also be added to the appropriate group. See also: available programs and licenses.

The cluster runs a Linux operating system (based on CentOS; Debian or Ubuntu on special-purpose nodes) and uses SLURM as its batch scheduler and resource manager. Linux is the dominant operating system in scientific computing and is currently the only operating system present in the Top500 list (a list of the 500 most powerful computers in the world).

Linux command-line knowledge is essential for using the cluster. Learning Linux and using the TalTech clusters also builds the skills needed to access international supercomputing centers (e.g. LUMI or any of the PRACE centers).





Hardware Specification


TalTech ETAIS Cloud: 4 node OpenStack cloud
  • 5 compute (nova) nodes with 768GB of RAM and 80 threads each

  • 65 TB CephFS storage (net capacity)

  • accessible through the ETAIS website: https://etais.ee/using/

base.hpc.taltech.ee is the new cluster environment; all nodes from HPC1 and HPC2 will be migrated here
  • SLURM v20 scheduler, a live load diagram

  • home directory file system has 1.5 PB storage, with a 2 TB/user quota

  • 32 green nodes (former hpc2.ttu.ee nodes), 2 x Intel Xeon Gold 6148 20C 2.40 GHz, 96 GB DDR4-2666 R ECC RAM (green[1-32]), 25 Gbit Ethernet, 18 of these FDR InfiniBand (green-ib partition)

  • 48 gray nodes (former hpc.ttu.ee nodes, migration in progress), 2 x Intel Xeon E5-2630L 6C with 64 GB RAM and 1 TB local drive, 1 Gbit Ethernet, QDR InfiniBand (gray-ib partition)

  • 1 mem1tb large memory node, 1TB RAM, 4x Intel Xeon CPU E5-4640 (together 32 cores, 64 threads)

  • amp GPU nodes (see the specific guide for amp): amp1 has 8x Nvidia A100/40GB, 2x 64-core AMD EPYC 7742 (together 128 cores, 256 threads), 1 TB RAM; amp2 has 8x Nvidia A100/80GB, 2x 64-core AMD EPYC 7742 (together 128 cores, 256 threads), 2 TB RAM

  • viz.hpc.taltech.ee visualization node (accessible from within the University network and via FortiVPN), 2x Nvidia Tesla K20Xm graphics cards (on displays :0.0 and :0.1)
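
The per-node figures in the list above can be combined into aggregate core counts per node type. The following is a minimal sketch in Python; the dictionary and function names are illustrative, and the totals are nominal figures derived only from the numbers listed here (they do not account for nodes that are still being migrated or are temporarily offline).

```python
# Per-node figures copied from the hardware list above:
# (number of nodes, CPU sockets per node, physical cores per socket)
NODE_TYPES = {
    "green": (32, 2, 20),  # 2 x Intel Xeon Gold 6148 20C
    "gray": (48, 2, 6),    # 2 x Intel Xeon E5-2630L 6C
    "amp": (2, 2, 64),     # amp1 and amp2, 2 x 64-core AMD EPYC 7742 each
}


def total_cores(count, sockets, cores_per_socket):
    """Return the nominal number of physical cores for one node type."""
    return count * sockets * cores_per_socket


totals = {name: total_cores(*spec) for name, spec in NODE_TYPES.items()}

for name, cores in totals.items():
    print(f"{name}: {cores} physical cores")
```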




SLURM partitions


partition   default time   time limit   default memory   nodes
short       10 min         2 hours      1 GB/thread      green
common      10 min         8 days       1 GB/thread      green
green-ib    10 min         8 days       1 GB/thread      green
long        10 min         15 days      1 GB/thread      green
gray-ib     10 min         8 days       1 GB/thread      gray
gpu         10 min         5 days       1 GB/thread      amp
mem1tb                                                   mem1tb
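
Because the default time is only 10 minutes, a job that needs longer presumably has to request its run time explicitly and stay within the limit of the chosen partition. The sketch below, in Python, checks a requested run time against the time limits listed above and renders a small batch script. The helper names (`fits`, `slurm_time`, `render_job`) and the program `./my_program` are hypothetical; `--partition`, `--time`, `--ntasks` and `--mem-per-cpu` are standard `sbatch` options. The mem1tb partition is left out because no limits are listed for it.

```python
from datetime import timedelta

# Time limits copied from the partition list above.
PARTITION_LIMITS = {
    "short": timedelta(hours=2),
    "common": timedelta(days=8),
    "green-ib": timedelta(days=8),
    "long": timedelta(days=15),
    "gray-ib": timedelta(days=8),
    "gpu": timedelta(days=5),
}


def fits(partition, walltime):
    """Return True if the requested run time is within the partition limit."""
    return walltime <= PARTITION_LIMITS[partition]


def slurm_time(walltime):
    """Format a timedelta as D-HH:MM:SS, one of the formats sbatch accepts."""
    total = int(walltime.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def render_job(partition, walltime, ntasks, mem_per_cpu_gb, command):
    """Build the text of a batch script, refusing run times over the limit."""
    if not fits(partition, walltime):
        raise ValueError(f"{walltime} exceeds the limit of partition {partition}")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={slurm_time(walltime)}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --mem-per-cpu={mem_per_cpu_gb}G",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 4 tasks for 1.5 days with 2 GB per CPU on "common".
script = render_job("common", timedelta(days=1, hours=12), 4, 2, "srun ./my_program")
print(script)
```

The generated text could be saved to a file and submitted with `sbatch`; a request such as 3 hours on the short partition raises a `ValueError` instead of producing a script that the scheduler would be expected to reject.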




Contents: