WDF Cluster Guide



Cybersecurity awareness at the U of C

Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.


Need Help or have other WDF Cluster Resource Related Questions?

For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.

This guide gives an overview of the Western Diversification Fund (WDF) compute clusters at the University of Calgary and is intended to be read by new account holders getting started on WDF systems. This guide covers topics such as the hardware and performance characteristics, available software, usage policies and how to log in and run jobs.

Introduction

WDF compute systems, which are provisioned from nodes in the ARC compute cluster, can be used for running large numbers (hundreds) of concurrent serial (one core) jobs, OpenMP or other thread-based, shared-memory parallel code using up to 48 threads per job (depending on the partition), distributed-memory (MPI-based) parallel code using up to hundreds of cores, or jobs that take advantage of Graphics Processing Units (GPUs). Each WDF system also gets its own login node and partition. Almost all work on WDF systems is done through a command line interface. This computational resource is available for approved projects based at the University of Calgary.

How to Get Started

WDF systems and accounts are created by RCS after a contract is signed.

Hardware

WDF systems are generally composed of nodes that belong to the cpu2021 partition when they are not needed for WDF workloads. These nodes have 48 cores and 187 GB of memory. They have access to the /home, /scratch, and /work/projectname filesystems, but not to /global/software.

Partition Hardware Specs

Jobs submitted on WDF systems have access only to partitions named wdf-projectname.

Cluster Storage

You will have access to only 3 filesystems:

  1. /home - where job scripts and software only used by one person go
  2. /scratch - for temporary files only; data here may be cleaned up 5 days after the job completes
  3. /work/projectname - a "projectname" directory will be created that reflects the name of the WDF project/partition/login node.

No Backup Policy! You are responsible for your own backups. Please contact support@hpc.ucalgary.ca for guidance.

Use the arc.quota command on ARC to determine the available space on your various volumes and home directory.

File system   Description                                                       Capacity
/home         User home directories                                             500 GB (per user)
/work         Research project storage                                          Up to hundreds of TB
/scratch      Scratch space for temporary files                                 Up to 30 TB
/tmp          Temporary space local to the compute node                         Dependent on nodes, use df -h.
/dev/shm      Small temporary in-memory disk space local to the compute node    Dependent on nodes, use df -h.

/home: Home file system

Each user has a directory under /home, which is the default working directory when logging in to ARC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased. Researchers who need more storage than their home directory provides may use /work and /scratch.

Note on file sharing: Due to security concerns, permissions set using chmod on your home directory to allow other users to read/write to your home directory will be automatically reverted by an automated system process unless an explicit exception is made. If you need to share files with other researchers on the ARC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.

/scratch: Scratch file system for large job-oriented storage

Associated with each job, under the /scratch directory, a subdirectory is created that can be referenced in job scripts as /scratch/${SLURM_JOB_ID}. You can use that directory for temporary files needed during the course of a job. Up to 15 TB of storage may be used (up to 1M files), per user (total for all your jobs) in the /scratch file system.

Data in /scratch associated with a given job will be deleted automatically, without exception, five days after the job finishes.
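As a minimal sketch of how the per-job scratch directory might be used, the job script below writes a temporary file to /scratch/${SLURM_JOB_ID} and copies it out before the job ends. The partition name wdf-projectname is a placeholder, the results directory and the file name output.dat are hypothetical, and the mktemp fallback is an assumption added only so the script can be tried outside of Slurm.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
#SBATCH --mem=1G
# Placeholder partition name; substitute your own wdf-projectname partition.
#SBATCH --partition=wdf-projectname

# Under Slurm, use the per-job scratch directory.
# Outside Slurm (for a dry run), fall back to a throwaway directory.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    scratch_dir="/scratch/${SLURM_JOB_ID}"
else
    scratch_dir=$(mktemp -d)
fi

# Hypothetical location for results that must outlive the job.
results_dir="${SLURM_SUBMIT_DIR:-$PWD}/results"
mkdir -p "$results_dir"

# Stand-in for a real computation: write a temporary file to scratch.
echo "example output" > "$scratch_dir/output.dat"

# Copy anything worth keeping out of scratch before the job ends.
cp "$scratch_dir/output.dat" "$results_dir/"
```

Because scratch data is removed after the job finishes, copying results to /home or /work/projectname as the last step of the script is the safer habit.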

/work: Work file system for larger projects

If you need more space than provided in /home and the /scratch job-oriented space is not appropriate for your case, please write to support@hpc.ucalgary.ca with an explanation, including an indication of how much storage you expect to need and for how long. If approved, you will then be assigned a directory under /work with an appropriately large quota.

/tmp, /dev/shm: Temporary files

You may use /tmp for temporary files generated by your job. The /tmp directory is stored on a disk local to the compute node and is not shared across the cluster. Files stored here may be removed immediately after your job terminates.

/dev/shm is similar to /tmp, but the storage is backed by virtual memory for higher IOPS. This is ideal for workloads that require many small reads/writes to share data between processes, or that need a fast cache. Files stored here may be removed immediately after your job terminates.
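One possible way to work with node-local temporary space is sketched below: the script creates a private directory under /tmp, reports the free space there with df -h, and removes the directory when the script exits. The directory prefix myjob and the file step1.txt are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Create a private working directory on node-local disk.
# TMPDIR is honoured if set; otherwise /tmp is used.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX")

# Remove the directory when the script exits, whether it succeeds or fails.
trap 'rm -rf "$workdir"' EXIT

# Report how much space is available on the filesystem holding the directory.
df -h "$workdir"

# Stand-in for a real workload: write an intermediate file to local disk.
echo "intermediate data" > "$workdir/step1.txt"
```

The same pattern should work for /dev/shm by setting TMPDIR=/dev/shm before running the script, keeping in mind that files there consume memory.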

Using The System

Logging in

To log in to your WDF system, connect using SSH to projectname.rcs.ucalgary.ca on port 22. Connections are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).

See Connecting to RCS HPC Systems for more information.
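As a small illustration of the connection details above, the snippet below assembles the SSH host name from a project name and prints the resulting command. Both your.username and projectname are placeholders, not real accounts or hosts; the actual ssh call is left commented out because it only succeeds from the campus network or the General VPN.

```shell
#!/bin/bash
# Placeholders: replace with your UCalgary username and WDF project name.
user="your.username"
project="projectname"

# WDF login nodes are reached at <projectname>.rcs.ucalgary.ca on port 22.
host="${project}.rcs.ucalgary.ca"
echo "Connecting with: ssh -p 22 ${user}@${host}"

# Uncomment to open the session (requires campus network or VPN).
# ssh -p 22 "${user}@${host}"
```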

How to interact with the system

On your WDF system, computations get submitted as jobs. Once submitted, the jobs are then assigned to compute nodes by the job scheduler as resources become available.

You can access the system with your UCalgary IT user credentials. Once connected, you are placed on the "projectname" login node, which is intended for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. It is a shared resource to which multiple users are connected at the same time. Intensive tasks are therefore not allowed on the login node, as they may prevent other users from connecting or submitting their computations.

The job scheduling system is called SLURM. There are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. 

‘salloc’ is used to launch an interactive session, typically for tasks under 5 hours. Once an interactive job session is created, you can do things like explore research datasets, start R or Python sessions to test your code, compile software applications, etc.

a. Example 1: The following command requests 1 CPU on 1 node for 1 task, along with 1 GB of RAM, for one hour.

         [tannistha.nandi@wdf ~]$ salloc --mem=1G -c 1 -N 1 -n 1  -t 01:00:00
         salloc: Granted job allocation 6758015
         salloc: Waiting for resource configuration
         salloc: Nodes fc4 are ready for job
         [tannistha.nandi@fc4 ~]$ 


Once you finish the work, type 'exit' at the command prompt to end the interactive session:

        [tannistha.nandi@fg3 ~]$ exit
        [tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460

This ensures that the resources allocated to your job are released and become available to other users.

‘sbatch’ is used to submit computations as jobs to run on the cluster. You can submit a job script such as job-script.slurm via 'sbatch' for execution.

        [tannistha.nandi@wdf ~]$ sbatch job-script.slurm

When resources become available, they get allocated to this task. Batch jobs are suited for tasks that run for long periods of time without any user supervision. When the job-script terminates, the allocation is released. Please review the section on how to prepare job scripts for more information.
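The sketch below shows one way to submit a batch job and then check on it. It relies on the standard Slurm commands sbatch (with its --parsable option), squeue, and scancel; the helper function parse_job_id is a hypothetical name introduced here, and the check for sbatch is an assumption added so that the snippet does nothing harmful when run away from the cluster.

```shell
#!/bin/bash
# Extract the numeric job ID from "sbatch --parsable" output, which is
# either "jobid" or "jobid;clustername".
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    # Submit the script and remember the job ID.
    job_id=$(parse_job_id "$(sbatch --parsable job-script.slurm)")
    echo "Submitted job ${job_id}"

    # Show the job's current state in the queue.
    squeue -j "$job_id"

    # To cancel the job, uncomment the next line.
    # scancel "$job_id"
else
    echo "sbatch not found; run this on the login node"
fi
```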

Prepare job scripts

Job scripts are text files saved with an extension '.slurm', for example, 'job-script.slurm'. A job script looks something like this:

   #!/bin/bash
   ####### Reserve computing resources #############
   #SBATCH --nodes=1
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=1
   #SBATCH --time=01:00:00
   #SBATCH --mem=1G
   #SBATCH --partition=cpu2019
   ####### Set environment variables ###############
   module load python/anaconda3-2018.12
   ####### Run your script #########################
   python myscript.py

The first line contains the text "#!/bin/bash" so that the file is interpreted as a bash script.

It is followed by lines that start with '#SBATCH' to communicate with 'SLURM'. You may add as many #SBATCH directives as needed to reserve computing resources for your task. The above example requests one CPU on a single node for 1 task, along with 1 GB of RAM, for one hour on the cpu2019 partition.

Next, you have to set up environment variables, either by loading centrally installed modules or by exporting the path to software in your home directory. The above example loads an available Python module.

Finally, include the Linux command to execute the local script.

Note that failing to specify part of a resource allocation request (most notably time and memory) will result in bad resource requests as the defaults are not appropriate to most cases. Please refer to the section 'Running non-interactive jobs' for more examples.

Software

All nodes run the same version of Linux with the same set of base software packages. To maintain the stability and consistency of all nodes, any additional dependencies that your software requires must be installed under your account or /work/projectname.

Please contact us at support@hpc.ucalgary.ca if you need help installing software.
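As a rough sketch of how software installed under /work/projectname might be made available in a shell session or job script, the lines below add a hypothetical install prefix to the search paths. The directory /work/projectname/software/mytool is an assumed location for illustration, and it presumes the tool was installed with the usual bin and lib subdirectories.

```shell
#!/bin/bash
# Hypothetical install location under the project directory.
prefix="/work/projectname/software/mytool"

# Prepend the tool's bin directory so it is found before system versions.
export PATH="${prefix}/bin:${PATH}"

# Let the dynamic linker find shared libraries installed with the tool.
# The ${VAR:+...} form avoids a trailing colon when the variable is unset.
export LD_LIBRARY_PATH="${prefix}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

Placing these lines in the environment section of a job script keeps the setup with the job rather than in a login profile.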

Storage

Please review the Storage section above for important policies and advice regarding file storage and file sharing.

Interactive Jobs

The login node may be used for such tasks as editing files, compiling programs and running short tests while developing programs. We suggest CPU intensive workloads on the login node be restricted to under 15 minutes as per our cluster guidelines. For interactive workloads exceeding 15 minutes, use the salloc command to allocate an interactive session on a compute node.

The default salloc allocation is 1 CPU and 1 GB of memory. Adjust this by specifying -n CPU# and --mem Megabytes. You may request up to 5 hours of CPU time for interactive jobs.

salloc --time 5:00:00 --partition cpu2019

Always use salloc or srun to start an interactive job. Do not SSH directly to a compute node as SSH sessions will be refused without an active job running.

Running non-interactive jobs (batch processing)

Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which commands to be executed are listed in a script (text file). Batch job scripts are submitted using the sbatch command, part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script are used to specify the resources needed for the job (cores, memory, run time limit and any specialized hardware needed).

Most of the information on the Running Jobs (external link) page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on ARC. One major difference between running jobs on the ARC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On ARC, you choose the hardware to use primarily by specifying a partition, as described below.

Selecting a Partition

There are some aspects to consider when selecting a partition including:

  • Resource requirements in terms of memory and CPU cores
  • Hardware specific requirements, such as GPU or CPU Instruction Set Extensions
  • Partition resource limits and potential wait time
  • Software support for parallel processing using Message Passing Interface (MPI), OpenMP, etc.
    • E.g., MPI code can distribute memory across multiple nodes, so per-node memory requirements could be lower, whereas OpenMP or single-process code that is restricted to one node would require a node with more memory.
    • Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the openmpi/2.1.3-opa or openmpi/3.1.2-opa modules prior to compiling.

Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency. If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at support@hpc.ucalgary.ca and we would be happy to work with you.
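To illustrate requesting only what a threaded job needs, here is a minimal sketch of a job script for an OpenMP program on a single node. The partition name wdf-projectname is a placeholder, the 8 cores, 16 GB and two hours are arbitrary example values, and my_openmp_program is a hypothetical executable that is left commented out.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=02:00:00
# Placeholder partition name; substitute your own wdf-projectname partition.
#SBATCH --partition=wdf-projectname

# Match the thread count to the cores Slurm allocated.
# Outside Slurm the variable is unset, so default to a single thread.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# Hypothetical threaded program; uncomment and replace with your own.
# ./my_openmp_program
```

Tying OMP_NUM_THREADS to SLURM_CPUS_PER_TASK means the core request only has to be changed in one place.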

Support


Need Help or have other ARC Related Questions?

For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.

Please don't hesitate to contact us directly by email if you need help or require guidance on migrating and running your workflows.