Florian Kurpicz
Tutorial

Working with the LiDO3 Cluster

LiDO3 is the high-performance cluster located at TU Dortmund. This short tutorial introduces how to run shared memory experiments on the cluster. A part on distributed memory computation (and even hybrid approaches) will be added later. Since LiDO3 uses the workload manager Slurm, some information in this tutorial also applies to other clusters (if they are configured similarly).

Preliminaries

The first thing we need is a user account. To this end, we need an SSH key pair. We assume that the private key is called lido3_key and the corresponding user account is named smmamust.

Log Into the Cluster

There are two gateway nodes, gw01 and gw02, that can be used as entry points. In this tutorial, we are going to use gw02, but the choice doesn't matter, as the only reason for two gateways is redundancy; sometimes one of the gateways is down for service. We can log into the cluster with the following command.

ssh -i .ssh/lido3_key smmamust@gw02.lido.tu-dortmund.de

Note that you have to be in the university network (use a VPN or SSH if you are not physically at the TU Dortmund).

Since the command is quite long, we should configure an SSH Alias by extending the ~/.ssh/config file with

Host lido3gw02
     HostName gw02.lido.tu-dortmund.de
     User smmamust
     IdentityFile ~/.ssh/lido3_key

Obviously, the same can be done for gateway node gw01. Now, we can log into the gateway node gw02 with ssh lido3gw02, which prompts us for the passphrase for the lido3_key key.

After the Login

We are greeted with a lot of information about the cluster: the last login date, the logo, the current version of the operating system, some notes, a change log, which shows installed and removed software, and finally and most importantly our disk quotas:

Disk quotas for user smmamust (uid XXXXX):
                 --    disk space     --
Filesystem       limit  used avail  used
/home/smmamust     32G  3.7G   29G   12%
/work/smmamust      1T  500G  500G   50%

/home and /work

There are two directories that we can use: home and work. The first thing we notice is that home is significantly smaller than work. This is because home is backed up and can be restored in case of a system failure, whereas there are no backups for work. Hence, important data should always be written to home. However, when we run a job on the cluster, home is mounted read-only, so the job cannot write to it directly. More on this topic later, when we start our first jobs.
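One way to combine the two directories is to let jobs write to work and to copy the results worth keeping to home afterwards. The following is only a minimal sketch of that idea, to be run on the gateway node and not inside a job; the function name archive_results and the directory names in the usage comment are hypothetical.

```shell
# Copy finished results from the work directory (no backups)
# to the home directory (backed up).
# Hypothetical helper; run it on the gateway node, where home is writable.
archive_results() {
    local src=$1 dst=$2
    # Create the target directory if needed, then copy the contents of src.
    mkdir -p "$dst" && cp -r "$src"/. "$dst"/
}

# Example call with hypothetical directory names:
#   archive_results /work/smmamust/results /home/smmamust/results
```

Since home is limited to 32G in the quota listing above, it is worth copying only the data that actually needs a backup.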

Adding Software

A lot of commonly used software, e.g., compilers, version control systems, build tools, and much more, is available as modules in many different versions. To see a list of all available software, we can use the command module avail. Note that the module names consist of a name and a version number. When we want to load a module, we can do so with module add [module name]. Removing a module can be done with module rm [module name]. If we want to exchange a module for another version, we do not have to remove the old version first and then add the new one; we can simply use module swap [module name old] [module name new].

Since added modules are only valid for the current shell environment, we would have to load them each time we log into the cluster and start jobs. An easy alternative is to load them via the .bashrc. Let's assume we want to use git to manage our source code, compile it using GCC, and use cmake as our build tool. Then we would add the following lines to .bashrc:

module add git/2.22.0
module add gcc/7.3.0
module add cmake/3.8.2
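If the same .bashrc is also read in a shell where the module command is not defined (an assumption; this depends on how the environment is set up), the lines above produce an error at every login. A minimal sketch of a guarded variant, where load_lido_modules is a hypothetical helper name:

```shell
# Hypothetical helper: load the modules used in this tutorial,
# but only if the `module` command exists in the current shell.
load_lido_modules() {
    if ! command -v module >/dev/null 2>&1; then
        return 0   # no module system here, nothing to load
    fi
    module add git/2.22.0
    module add gcc/7.3.0
    module add cmake/3.8.2
}

load_lido_modules
```

Afterwards, module list shows which modules are loaded in the current shell.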

Starting our First Jobs

Before we run our first job on the cluster, we have a look at the different nodes and queues the cluster has to offer.

Nodes and Queues

The cluster contains five different types of nodes:

  1. 244 cstd01 are the standard nodes with 20 cores and 64 GB RAM,
  2. 72 cstd02 have the same configuration as cstd01, but they are connected differently (1:1 instead of 1:3, which is only relevant for distributed experiments),
  3. 28 cquad01 are bigger nodes with 48 cores and 256 GB RAM,
  4. 2 cquad02 are even bigger with 48 cores and 1024 GB RAM, and finally
  5. 20 tesla_k40 are the same as cstd01, but have two NVIDIA K40 graphics cards included.

In addition to the different nodes, there are different queues, also called Slurm partitions, which allow us to request nodes for different amounts of time. There are short, med, long, and ultralong queues that allow for a maximum wall time of 2 hours, 8 hours, 2 days, and 28 days, respectively. Note that the cstd02 and tesla_k40 nodes cannot be used in the ultralong queue. In general, the more nodes of a type are available and the shorter the running time of the job is, the sooner it gets executed. We can get an overview of all nodes and queues with sinfo.
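Because shorter queues tend to start sooner, it makes sense to pick the shortest queue whose wall time limit still covers the job. The following sketch encodes the four limits listed above (2 days are 48 hours, 28 days are 672 hours); the function name pick_partition is hypothetical, and the sketch ignores the node type restrictions of the ultralong queue.

```shell
# Hypothetical helper: print the shortest queue that allows a job
# with the given wall time in whole hours.
pick_partition() {
    local hours=$1
    if   (( hours <= 2 ));   then echo short
    elif (( hours <= 8 ));   then echo med
    elif (( hours <= 48 ));  then echo long       # 2 days
    elif (( hours <= 672 )); then echo ultralong  # 28 days
    else
        echo "no queue allows a wall time of ${hours} hours" >&2
        return 1
    fi
}

pick_partition 6   # prints "med"
```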

Slurm Script for Shared Memory Programs

Now, we are ready to start our first job on the cluster. Since this is only a small example, we will run the following small C++ program shared.cpp, which we store in /home/smmamust/example/.

#include <omp.h>
#include <cstdint>
#include <iostream>

int32_t main() {

  #pragma omp parallel
  {
    int32_t const rank { omp_get_thread_num() };
    #pragma omp critical
    std::cout << "Hello world from rank " << rank << std::endl;
  }
  return int32_t { 0 };
}

We compile the program using g++ version 7.3.0 with g++ -fopenmp shared.cpp -o shared. Next, we want to run the program. To this end, we write a Slurm script that the Slurm workload manager will schedule. So, let us write example.sh and start with the following lines.

#!/bin/bash -l
#SBATCH --time=02:00:00      # The maximum wall time of this job is two hours.
#SBATCH --partition=short    # Hence, we can use the short queue.
#SBATCH --nodes=1            # We want exactly one node ...
#SBATCH --constraint=cquad01 # ... with 48 cores and 256 GB RAM ...
#SBATCH --mem=250000         # ... of which we use 250 GB.
#SBATCH --exclusive          # In addition, we want to be the only user on the node.
#SBATCH --ntasks-per-node=48 # We do not want to have more than 48 tasks running on the node,
#SBATCH --cpus-per-task=1    # and we want each task running on its own (exclusive) core.
#SBATCH --output=/work/smmamust/lido_example_%j.dat # Location, where output is collected.
                                                    # %j will be replaced with the job id.

This is all the configuration that we use for the job in our example. More parameters are described in a LiDO3 example and the Slurm documentation. To run our program, we have to call it in our Slurm script by extending the script with the following lines.

cd /home/smmamust/example # Go to the directory, where the executable is stored.
./shared                  # Run it.

That's all we need for this small example. To schedule it, we simply run sbatch example.sh.
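When we submit jobs from another script, it is handy to capture the job id, since it is also the value that replaces %j in the name of the output file. A minimal sketch, assuming we only need the numeric id; the function name submit_job is hypothetical, while --parsable is a regular sbatch option.

```shell
# Hypothetical helper: submit a Slurm script and print only the job id.
# `sbatch --parsable` prints "<job id>" or "<job id>;<cluster name>".
submit_job() {
    local script=$1 reply
    reply=$(sbatch --parsable "$script") || return 1
    echo "${reply%%;*}"   # drop the optional ";<cluster name>" part
}

# Usage on the cluster:
#   job_id=$(submit_job example.sh)
#   echo "Output goes to /work/smmamust/lido_example_${job_id}.dat"
```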

Viewing Scheduled Jobs

We can get a list of all our scheduled and currently running jobs using squeue -u smmamust. If we want to know when the scheduled jobs are going to start (latest possible date) we use squeue -u smmamust --start. To cancel a job, we use scancel [job id], or scancel -u smmamust to cancel all our scheduled and running jobs.
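Instead of calling squeue by hand again and again, we can poll it in a loop. The following is a sketch under the assumption that a job is finished as soon as squeue no longer lists it; wait_for_job is a hypothetical name, and the default interval of 60 seconds is an arbitrary choice.

```shell
# Hypothetical helper: block until the given job is no longer in the queue.
# -h suppresses the header line, -j restricts the listing to one job id.
wait_for_job() {
    local job_id=$1 interval=${2:-60}
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage on the cluster (job id 123456 is a made-up example):
#   wait_for_job 123456 && echo "job 123456 has left the queue"
```

A long polling interval keeps the load on the scheduler low.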

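When our job has terminated (it is no longer listed among our jobs, or we receive a mail notification, which we need to configure first), we can see the result in our output file. To configure the mail notification, we add the following lines to the #SBATCH block at the top of example.sh; Slurm ignores #SBATCH lines that appear after the first command of the script. The mail type is one or more of the listed values, separated by commas, for example END,FAIL.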

#SBATCH --mail-user=[mail@example.org]
#SBATCH --mail-type=[ALL, BEGIN, END, FAIL, NONE]