HPC Center user guides

Using the resources of the TalTech HPC Centre requires an active Uni-ID account (a procedure for non-employees/non-students can be found here (in Estonian)). In addition, the user must be added to the HPC-USERS group; to have HPC access activated, write to hpcsupport@taltech.ee from your Uni-ID e-mail account.

The cluster runs Linux (based on CentOS; Debian or Ubuntu on special-purpose nodes) and uses SLURM as its batch scheduler and resource manager. Linux is the dominant operating system in scientific computing and is currently the only operating system present in the Top500 list (a list of the 500 most powerful computers in the world).

Linux command-line knowledge is essential for using the cluster. Learning Linux and using the TalTech clusters also builds the skills needed to access international supercomputing centers (e.g. LUMI or any of the PRACE centers).

Hardware Specification

  • base.hpc.taltech.ee is the new cluster environment; all nodes from HPC1 and HPC2 will be migrated here
    • SLURM v20 scheduler, a live load diagram

    • home directory file system has 1.5 PB storage, with a 2 TB/user quota

    • 32 green nodes (former hpc2.ttu.ee nodes), 2 x Intel Xeon Gold 6148 20C 2.40 GHz, 96 GB DDR4-2666 R ECC RAM (green[1-32]), 25 Gbit Ethernet, 18 of these FDR InfiniBand (green-ib partition)

    • 48 gray nodes (former hpc.ttu.ee nodes, migration in progress), 2 x Intel Xeon E5-2630L 6C with 64 GB RAM and 1 TB local drive, 1 Gbit Ethernet, QDR InfiniBand (gray-ib partition)

    • 1 mem1tb large-memory node, 1 TB RAM, 4x Intel Xeon CPU E5-4640 (32 cores, 64 threads in total)

    • amp GPU nodes (amp.hpc.taltech.ee; see the specific guide for amp): amp1: 8x Nvidia A100/40GB, 2x 64-core AMD EPYC 7742 (128 cores, 256 threads in total), 1 TB RAM; amp2: 8x Nvidia A100/80GB, 2x 64-core AMD EPYC 7742 (128 cores, 256 threads in total), 2 TB RAM

    • viz.hpc.taltech.ee visualization node (accessible from within the university network or via FortiVPN), 2x Nvidia Tesla K20Xm graphics cards (on displays :0.0 and :0.1)

    • SLURM partitions:

      • short: (default) time limit 2 hours, default time 10 min, default mem 1 GB/thread, green nodes

      • common: time limit 15 days, default time 10 min, default mem 1 GB/thread, green nodes

      • green-ib: time limit 15 days, default time 10 min, default mem 1 GB/thread, green InfiniBand nodes

      • gray-ib: time limit 8 days, default time 10 min, default mem 1 GB/thread, gray InfiniBand nodes

      • gpu: amp GPU node, time limit 5 days, default time 10 min, default mem 1 GB/thread

      • mem1tb: mem1tb node

  • TalTech ETAIS Cloud: 4 node OpenStack cloud
    • 5 compute (nova) nodes with 768 GB of RAM and 80 threads each

    • 65 TB CephFS storage (net capacity)

    • accessible through the ETAIS website: https://etais.ee/using/
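
To illustrate how the SLURM partitions and defaults listed above for base.hpc.taltech.ee might be used, here is a minimal batch-script sketch. The partition name short and its limits are taken from the list above. The requested resources, the job name and the helper function describe_job are hypothetical choices made for illustration only, not site recommendations.

```shell
#!/bin/bash
#SBATCH --partition=short      # default partition in the list above (2 hour limit)
#SBATCH --time=00:30:00        # overrides the 10 min default time
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4      # illustrative value, adjust to your program
#SBATCH --mem-per-cpu=2G       # overrides the 1 GB/thread default
#SBATCH --job-name=example     # hypothetical job name

# describe_job is a hypothetical helper. SLURM sets these variables inside
# a job; the fallbacks allow a dry run outside of SLURM.
describe_job() {
    echo "partition=${SLURM_JOB_PARTITION:-none} threads=${SLURM_CPUS_PER_TASK:-1}"
}

describe_job
# The actual workload would follow here.
```

A script like this would typically be submitted with sbatch (for example sbatch job.sh, where job.sh is a hypothetical file name). For runs longer than 2 hours, a partition with a longer time limit, such as common, would be needed instead of short.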

Contents: