ARC Cluster Guide
{{ARC Cluster Status}}


{{Message Box
|title=[[Support|Need Help or have other ARC Related Questions?]]
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.
|icon=Support Icon.png}}


This guide gives an overview of the Advanced Research Computing (ARC) cluster at the University of Calgary and is intended to be read by new account holders getting started on ARC. This guide covers topics such as the hardware and performance characteristics, available software, usage policies and how to log in and run jobs. ARC can be used with data that a researcher has classified as Lv1 or Lv2, as described in the UCalgary [https://www.ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard].


== Introduction ==
ARC is a high-performance computing (HPC) cluster that is available for research projects based at the University of Calgary. The cluster comprises hundreds of servers connected by a high-bandwidth interconnect. Special resources within the cluster include large-memory nodes and nodes with GPUs. You may learn more about ARC's hardware in the [[ARC Cluster Guide#Hardware|hardware section below]]. ARC can be accessed through a [[Linux Introduction|command line interface]] or via a web interface called Open OnDemand.


This cluster can be used for running large numbers (hundreds) of concurrent serial (one core) jobs, OpenMP or other thread-based jobs, shared-memory parallel code using up to 40 or 80 threads per job (depending on the partition), distributed-memory (MPI-based) parallel code using up to hundreds of cores, or jobs that take advantage of Graphics Processing Units (GPUs).
 


Historically, ARC was primarily composed of older, disparate Linux-based clusters that were formerly offered to researchers from across Canada, such as Breezy, Lattice, and Parallel. In addition, a large-memory compute node (Bigbyte) was salvaged from the now-retired local Storm cluster. In January 2019, a major addition of modern hardware was purchased for ARC. In 2020, compute clusters from CHGI were migrated into ARC.


=== How to Get Started ===
If you have a project you think would be appropriate for ARC, please email support@hpc.ucalgary.ca and mention the intended research and software you plan to use. You must have a University of Calgary IT account in order to use ARC.
 
* Users who do not have a University of Calgary IT account or email address can register for one at https://itregport.ucalgary.ca/.
* Users external to the University, such as collaborators on a research project at the University of Calgary, should contact us and mention the project leader they are collaborating with.


Once your access to ARC has been granted, you will be able to immediately make use of the cluster using your University of Calgary IT account by following the [[ARC_Cluster_Guide#Using_ARC|usage guide outlined below]].
 
== Using ARC ==
 
{{Message Box
|icon=Security Icon.png
|title=Cybersecurity awareness at the U of C
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}
 
=== Logging in ===
To log in to ARC, connect using SSH to <code>arc.ucalgary.ca</code> on port <code>22</code>. Connections to ARC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).
 
See [[Connecting to RCS HPC Systems]] for more information.
=== How to interact with ARC ===
 
The ARC cluster is a collection of compute nodes connected by a high-speed network. On ARC, computations are submitted as jobs. Once submitted, jobs are assigned to compute nodes by the job scheduler as resources become available.
 
[[File:Cluster.png]]
 
You can access ARC with your UCalgary IT credentials. Once connected, you will be placed on the ARC login node, which is meant for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. The login node is a shared resource that many users are connected to at the same time, so intensive tasks are not allowed there, as they may prevent other users from connecting or submitting their computations.
        [tannistha.nandi@arc ~]$ 
The job scheduling system on ARC is called Slurm. On ARC, two Slurm commands can allocate resources to a job: <code>salloc</code> and <code>sbatch</code>. Both accept the same set of command-line options for resource allocation.
 
'''<code>salloc</code>''' launches an interactive session, for tasks of up to 5 hours.
Once an interactive session has started, you can explore research datasets, start R or Python sessions to test your code, compile software applications, and so on.
 
a. Example 1: The following command requests 1 CPU core on 1 node for 1 task, along with 1 GB of RAM, for one hour.
          [tannistha.nandi@arc ~]$ salloc --mem=1G -c 1 -N 1 -n 1  -t 01:00:00
          salloc: Granted job allocation 6758015
          salloc: Waiting for resource configuration
          salloc: Nodes fc4 are ready for job
          [tannistha.nandi@fc4 ~]$
 
 
b. Example 2: The following command requests 1 GPU on 1 node in the gpu-v100 partition, along with 1 GB of RAM, for one hour. Generic resource scheduling (<code>--gres</code>) is used to request GPU resources.
        [tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1
        salloc: Granted job allocation 6760460
        salloc: Waiting for resource configuration
        salloc: Nodes fg3 are ready for job
        [tannistha.nandi@fg3 ~]$
 
Once you have finished your work, type <code>exit</code> at the command prompt to end the interactive session:
        [tannistha.nandi@fg3 ~]$ exit
        [tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460
This ensures that the allocated resources are released from your job and become available to other users.
 
'''<code>sbatch</code>''' submits a job script to run on the cluster. For example, to submit a script named <code>job-script.slurm</code>:
        [tannistha.nandi@arc ~]$ sbatch job-script.slurm
The job waits in the queue until the requested resources become available and are allocated to it. Batch jobs are suited for tasks that run for long periods of time without any user supervision. When the job script terminates, the allocation is released.
Please review the section on [[#Prepare job scripts|preparing job scripts]] for more information.
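When submitting jobs from your own scripts, it can be handy to capture the job ID. The option <code>sbatch --parsable</code> prints only the job ID (followed by <code>;clustername</code> on multi-cluster setups) instead of the usual "Submitted batch job" message. The helper below is a minimal sketch; <code>submit_job</code> is a hypothetical function name, not an ARC-provided command.

<syntaxhighlight lang="bash">
#!/bin/bash
# Minimal sketch: submit a job script and print only its job ID.
# submit_job is a hypothetical helper name, not an ARC-provided command.
submit_job() {
    local out
    # --parsable makes sbatch print "jobid" or "jobid;cluster" instead of a sentence.
    out=$(sbatch --parsable "$@") || return 1
    # Strip the optional ";cluster" suffix so only the numeric ID remains.
    echo "${out%%;*}"
}

# Example usage on the ARC login node:
#   jobid=$(submit_job job-script.slurm)
#   squeue -j "$jobid"    # show the job while it is pending or running
</syntaxhighlight>

Stripping the optional cluster suffix keeps the result usable wherever a plain job ID is expected, such as <code>squeue -j</code> or <code>scancel</code>.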
 
=== Prepare job scripts  ===
Job scripts are plain text files, by convention saved with a <code>.slurm</code> extension, for example <code>job-script.slurm</code>.
A job script looks something like this:
<syntaxhighlight lang="bash">
#!/bin/bash
####### Reserve computing resources #############
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
#SBATCH --mem=1G
#SBATCH --partition=cpu2019

####### Set environment variables ###############
module load python/anaconda3-2018.12

####### Run your script #########################
python myscript.py
</syntaxhighlight>
 
The first line, <code>#!/bin/bash</code>, tells the system to run the file as a bash script.

It is followed by lines starting with <code>#SBATCH</code>, which pass resource requests to Slurm. You may add as many <code>#SBATCH</code> directives as needed to reserve computing resources for your task, but they must all appear before the first executable command in the script, because Slurm stops reading directives at that point. The above example requests one CPU core on a single node for one task, along with 1 GB of RAM, for one hour on the cpu2019 partition.

Next, set up the software environment, either by loading modules installed centrally on ARC or by exporting the path to software installed in your home directory. The above example loads a Python module.

Finally, include the command that runs your program, which in the above example is <code>python myscript.py</code>.

Note that leaving out part of a resource request (most notably '''time''' and '''memory''') will result in poor resource allocations, as the defaults are not appropriate for most jobs. Please refer to the section [[#Running non-interactive jobs (batch processing)|Running non-interactive jobs]] for more examples.
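For thread-based programs (for example, OpenMP), the number of threads should match the number of cores reserved with <code>--cpus-per-task</code>. Slurm exports that value to the job as <code>SLURM_CPUS_PER_TASK</code>, which is only set when <code>--cpus-per-task</code> is given. The script below is a sketch following the same layout as the example above; the core count, memory, time, and the program name <code>./my_threaded_program</code> are hypothetical placeholders to adapt to your own work.

<syntaxhighlight lang="bash">
#!/bin/bash
####### Reserve computing resources #############
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --mem=8G
#SBATCH --partition=cpu2019

####### Set environment variables ###############
# Use as many threads as cores Slurm allocated; fall back to 1 when the
# script is run outside a Slurm job (for example, when testing it by hand).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

####### Run your program #########################
echo "Job ${SLURM_JOB_ID:-none}: ${OMP_NUM_THREADS} thread(s) on ${HOSTNAME}"
# ./my_threaded_program   # hypothetical program name; replace with your own command
</syntaxhighlight>

Deriving the thread count from the allocation, rather than hard-coding it, keeps the program from starting more threads than the cores Slurm reserved if you later change <code>--cpus-per-task</code>.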


== Hardware ==
Since the ARC cluster is a conglomeration of many different compute clusters, its hardware varies widely in performance and capabilities. To mitigate compatibility issues, similar hardware is grouped into its own Slurm partition so that your workload runs as consistently as possible within a partition. Please carefully review the hardware specifications for each partition below to avoid surprises.


=== Partition Hardware Specs ===
When submitting jobs to ARC, you may specify a partition that your job will run on. Please choose a partition that is most appropriate for your work.


* See also [[How to find available partitions on ARC]].


A few things to keep in mind when choosing a partition:
* Workloads that require specific Intel instruction set extensions (for example, AVX-512) may only work on newer Intel CPUs.
* If you are running multi-node parallel jobs, ensure your software and libraries support the partition's interconnect network.
* While older partitions may be slower, they may be less busy and have little to no wait times.


If you are unsure which partition to use or need assistance on selecting an appropriate partition, please see [[#Selecting_a_Partition|the Selecting a Partition Section]] below.  


{| class="wikitable"
! Partition
! Description
! Nodes
! CPU Cores, Model, and Year
! Memory
! GPU
! Network
|-
| -
| ARC Login Node
| 1
| 16 cores, 2x Intel(R) Xeon(R) CPU E5620  @ 2.40GHz (Westmere, 2010)
| 48 GB
| N/A
| 40 Gbit/s InfiniBand
|-
| gpu-v100
| GPU Partition
| 13
| 80 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)
| 754 GB
| 2x Tesla V100-PCIE-16GB
| 100 Gbit/s Omni-Path
|-
| gpu-a100
| GPU Partition
| 5
| 40 cores, 1x Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz (Ice Lake, 2021)
| 512 GB
| 2x GA100 A100 PCIe 80GB
| 100 Gbit/s Mellanox InfiniBand
|-
| cpu2023
| General Purpose Compute
| 48
| 64 cores, 2x Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake, 2021)
| 512 GB
| N/A
| 40 Gbit/s Mellanox InfiniBand (temporarily)
|-
| cpu2022
| General Purpose Compute
| 52
| 52 cores, 2x Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz (Ice Lake)
| 256 GB
| N/A
| 40 Gbit/s InfiniBand
|-
| cpu2021
| General Purpose Compute
| 48
| 48 cores, 2x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz (Cascade Lake, 2021)
| 185 GB
| N/A
| 100 Gbit/s Mellanox InfiniBand
|-
| cpu2019
| General Purpose Compute
| 14
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)
| 190 GB
| N/A
| 100 Gbit/s Omni-Path
|-
| apophis
| General Purpose Compute
| 21
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)
| 190 GB
| N/A
| 100 Gbit/s Omni-Path
|-
| razi
| General Purpose Compute
| 41
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)
| 190 GB
| N/A
| 100 Gbit/s Omni-Path
|-
| bigmem
| Big Memory Nodes
| 2
| 80 cores, 4x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)
| 3022 GB
| N/A
| 100 Gbit/s Omni-Path
|-
| pawson
| General Purpose Compute
| 13
| 40 cores, 2x Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (Cascade Lake, 2019)
| 190 GB
| N/A
| 100 Gbit/s Omni-Path
|-
| cpu2017
| General Purpose Compute
| 14
| 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell)
| 256 GB
| N/A
| 40 Gbit/s InfiniBand
|-
| theia
| Former Theia cluster
| 20
| 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell)
| 188 GB
| N/A
| 40 Gbit/s InfiniBand
|-
| cpu2013
| Former Hyperion cluster
| 12
| 32 cores, 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (Sandy Bridge, 2012)
| 126 GB
| N/A
| 40 Gbit/s InfiniBand
|-
| lattice
| Former Lattice cluster
| 307
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520  @ 2.27GHz (Nehalem, 2009)
| 12 GB
| N/A
| 40 Gbit/s InfiniBand
|-
| single
| Former Lattice cluster
| 168
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520  @ 2.27GHz (Nehalem, 2009)
| 12 GB
| N/A
| 40 Gbit/s InfiniBand
|-
| parallel
| Former Parallel Cluster
| 576
| 12 cores, 2x Intel(R) Xeon(R) CPU E5649  @ 2.53GHz (Westmere, 2011)
| 24 GB
| N/A
| 40 Gbit/s InfiniBand
|}


=== ARC Cluster Storage ===
Usage of ARC cluster storage is outlined by our [[ARC Storage Terms of Use]] page.
 
{{Warning Box
| title=Data Storage
| message=ARC storage is not suitable for long-term or archival storage. It is not backed up and does not have sufficient redundancy to be used as a primary storage system, and it is not guaranteed to be available for the time periods that are typical of archiving.
 
Please ensure that the only data you keep on ARC is used for active computations.
 
For information on available campus storage options, please see [[Storage Options]].
}}
 
{{Message Box
| title=No Backup Policy!
| message=You are responsible for your own backups.  Many researchers will have accounts with Compute Canada and may choose to back up their data there (the Project file system accessible through the Cedar cluster would often be used).
 
Please contact us at support@hpc.ucalgary.ca if you want more information about this option.


You can also back up data to your UCalgary OneDrive for Business allocation, which starts at 5 TB. See https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage for instructions, and contact the support center with questions regarding OneDrive for Business.
}}


The ARC cluster has around 2 petabytes of shared disk storage available across the entire cluster, as well as temporary storage local to each of the compute nodes. Please refer to the individual sections below for capacity limits and usage policies.


Use the <code>arc.quota</code> command on ARC to determine the available space on your home directory and other volumes.


{| class="wikitable"
!File system
!Description
!Capacity
|-
|<code>/home</code>
|User home directories
|500 GB (per user)
|-
|<code>/work</code>
|Research project storage
|Up to 100's of TB
|-
|<code>/scratch</code>
|Scratch space for temporary files
|Up to 15 TB
|-
|<code>/tmp</code>
|Temporary space local to the compute cluster
|Dependent on available storage on nodes. Verify with <code>df -h</code>.
|-
|<code>/dev/shm</code>
|Small temporary in-memory disk space local to the compute cluster
|Dependent on memory size set in your Slurm job.
|}
====<code>/home</code>: Home file system====
Each user has a directory under /home, which is the default working directory when logging in to ARC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased. Researchers requiring more storage than is available in their home directory may use <code>/work</code> and <code>/scratch</code>.
 
Note on file sharing: due to security concerns, permissions set with <code>chmod</code> that allow other users to read or write to your home directory will be automatically reverted by a system process unless an explicit exception is made. If you need to share files with other researchers on the ARC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.
 
====<code>/scratch</code>: Scratch file system for large job-oriented storage====
For each job, a subdirectory is created under <code>/scratch</code> that can be referenced in job scripts as <code>/scratch/${SLURM_JOB_ID}</code>. You can use that directory for temporary files needed during the course of a job. Each user may use up to 15 TB in total, across all of their jobs, in the <code>/scratch</code> file system.


Data in <code>/scratch</code> associated with a given job will be deleted automatically, without exception, five days after the job finishes.
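A common pattern is to copy inputs into the job's scratch directory, run there, and copy the results back to <code>/home</code> or <code>/work</code> before the scratch data is deleted. The script below is a minimal sketch of that pattern, assuming hypothetical project directories under <code>$HOME/myproject</code>; the <code>ls</code> step is only a placeholder for your real command. The work only runs when <code>SLURM_JOB_ID</code> is set, so running the script by hand on the login node does nothing.

<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --mem=8G

# Hypothetical project locations; replace with your own directories.
INPUT_DIR="$HOME/myproject/input"
RESULTS_DIR="$HOME/myproject/results"

# Print the per-job scratch directory ARC creates for a given job ID.
job_scratch_dir() {
    echo "/scratch/${1:?job ID required}"
}

# Stage inputs into scratch, do the work there, then copy results out.
run_in_scratch() {
    local scratch="$1"
    mkdir -p "$scratch/input" || return 1
    cp -r "$INPUT_DIR"/. "$scratch/input/" || return 1
    (
        cd "$scratch" || exit 1
        # Placeholder work step: replace with your real command, for example
        # ./my_program input/ > output.txt   (hypothetical program)
        ls input > output.txt
    ) || return 1
    # Copy results to permanent storage: scratch data is deleted
    # automatically five days after the job finishes.
    mkdir -p "$RESULTS_DIR" || return 1
    cp "$scratch/output.txt" "$RESULTS_DIR/"
}

# Only do the work inside a Slurm job, never on the shared login node.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    run_in_scratch "$(job_scratch_dir "$SLURM_JOB_ID")"
fi
</syntaxhighlight>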


====<code>/work</code>: Work file system for larger projects====
If you need more space than is provided in <code>/home</code>, and the job-oriented <code>/scratch</code> space is not appropriate for your case, please write to support@hpc.ucalgary.ca with an explanation, including an estimate of how much storage you expect to need and for how long. If approved, you will be assigned a directory under <code>/work</code> with an appropriately large quota.


====<code>/tmp</code>, <code>/var/tmp</code>: Temporary files====
You may use <code>/tmp</code> or <code>/var/tmp</code> to store temporary files generated by your job. <code>/tmp</code> is stored on a disk local to the compute node and is not shared across the cluster. Files stored here are removed immediately after your job terminates.
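The sketch below creates a uniquely named directory on the node-local disk with <code>mktemp -d</code>, so that several jobs sharing a node do not overwrite each other's files, and removes it with a <code>trap</code> when the script exits. It assumes <code>/tmp</code> unless <code>TMPDIR</code> is set. ARC already removes these files when the job ends, so the cleanup is only a safeguard for interactive sessions in which several scripts run within one allocation. The <code>sort</code> line and its file names are hypothetical usage examples.

<syntaxhighlight lang="bash">
#!/bin/bash
# Create a uniquely named working directory on the node-local disk.
# TMPDIR is honoured if set; otherwise /tmp is used (an assumption for ARC).
make_local_tmp() {
    mktemp -d "${TMPDIR:-/tmp}/job.${SLURM_JOB_ID:-manual}.XXXXXX"
}

LOCAL_TMP=$(make_local_tmp) || exit 1
# Remove the directory when this script exits, even if a later step fails.
trap 'rm -rf "$LOCAL_TMP"' EXIT

echo "Node-local temporary directory: $LOCAL_TMP"
# Hypothetical example: let sort spill its temporary files to the local disk.
# sort -T "$LOCAL_TMP" big_input.txt > sorted.txt
</syntaxhighlight>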


==== <code>/dev/shm</code>, <code>/run/user/$UID</code>: In-memory temporary files ====
<code>/dev/shm</code> and <code>/run/user/$UID</code> are writable locations for temporary files backed by virtual memory. They can be used when faster I/O is required, which makes them well suited to workloads that perform many small reads and writes, share data between processes, or need a fast cache. The amount of data you can write here depends on the amount of free memory available to your job. Files stored at these locations are removed immediately after your job terminates.
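The helper below sketches one way to use <code>/dev/shm</code> only when it reports enough free space, falling back to the node-local <code>/tmp</code> disk otherwise. The free space reported by <code>df</code> is only an upper bound: as noted above, what you can actually write depends on the free memory available to your job, so request enough memory with <code>--mem</code> to cover both your program and its in-memory files. The function names and the 512 MiB figure are illustrative assumptions.

<syntaxhighlight lang="bash">
#!/bin/bash
# Print the free space, in KiB, of the file system that holds directory $1.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print /dev/shm if it is writable and reports at least $1 KiB free,
# otherwise fall back to the node-local /tmp disk.
pick_fast_tmp() {
    local needed_kib="$1" avail=-1
    if [[ -d /dev/shm && -w /dev/shm ]]; then
        avail=$(free_kib /dev/shm 2>/dev/null)
        avail=${avail:-0}
    fi
    if (( avail >= needed_kib )); then
        echo /dev/shm
    else
        echo /tmp
    fi
}

# Example: about 512 MiB of small cache files.
CACHE_BASE=$(pick_fast_tmp $((512 * 1024)))
echo "Caching under $CACHE_BASE"
</syntaxhighlight>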


== Software ==
All ARC nodes run the latest version of Rocky Linux 8 with the same set of base software packages. To maintain the stability and consistency of all nodes, any additional dependencies that your software requires must be installed under your account. For your convenience, we have packaged commonly used software and dependencies as modules available under <code>/global/software</code>. If your software package is not available as a module, you may also try Anaconda, which allows users to manage and install custom packages in an isolated environment.


For a list of software packages that have been made available, please see [[ARC Software pages]].


Please contact us at support@hpc.ucalgary.ca if you need additional software installed.


==== Modules ====
The setup of the environment for using some of the installed software is through the <code>module</code> command. An overview of [https://www.westgrid.ca//support/modules modules on WestGrid (external link)] is largely applicable to ARC.


Software packages bundled as a module will be available under <code>/global/software</code> and can be listed with the <code>module avail</code> command.
<syntaxhighlight lang="bash">
$ module avail
</syntaxhighlight>


To enable Python, load the Python module by running:
<syntaxhighlight lang="bash">
$ module load python/anaconda-3.6-5.1.0
</syntaxhighlight>


To unload the Python module, run:
<syntaxhighlight lang="bash">
$ module remove python/anaconda-3.6-5.1.0
</syntaxhighlight>


To see currently loaded modules, run:
<syntaxhighlight lang="bash">
$ module list
</syntaxhighlight>


By default, no modules are loaded on ARC. If you wish to use a specific module, such as the Intel compilers or the Open MPI parallel programming packages, you must load the appropriate module.


== Job submission ==


=== Interactive Jobs ===


The default <code>salloc</code> allocation is 1 CPU and 1 GB of memory. Adjust this by specifying the number of CPUs, with <code>-c</code> (CPUs per task) or <code>-n</code> (number of tasks), and the amount of memory with <code>--mem</code> (in megabytes by default, or with a unit suffix such as <code>--mem=4G</code>). You may request up to 5 hours of wall-clock time for interactive jobs, for example:
  salloc --time=5:00:00 --partition=cpu2019


Always use salloc or srun to start an interactive job. Do not SSH directly to a compute node as SSH sessions will be refused without an active job running.
<!-- This information doesn't seem that useful or relevant to running interactive jobs. Move to getting started section?
ARC uses the Linux operating system. The program that responds to your typed commands and allows you to run other programs is called the Linux shell. There are several different shells available, but, by default you will use one called bash. It is useful to have some knowledge of the shell and a variety of other command-line programs that you can use to manipulate files. If you are new to Linux systems, we recommend that you work through one of the many online tutorials that are available, such as the [http://www.ee.surrey.ac.uk/Teaching/Unix/index.html UNIX Tutorial for Beginners (external link)] provided by the University of Surrey. The tutorial covers such fundamental topics, among others, as creating, renaming and deleting files and directories, how to produce a listing of your files and how to tell how much disk space you are using.  For a more comprehensive introduction to Linux, see [http://linuxcommand.sourceforge.net/tlcl.php The Linux Command Line (external link)].
-->


=== Running non-interactive jobs (batch processing) ===
Most of the information on the [https://docs.computecanada.ca/wiki/Running_jobs Running Jobs (external link)] page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on ARC.  One major difference between running jobs on the ARC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On ARC, you choose the hardware to use primarily by specifying a partition, as described below.


=== Selecting a Partition ===
Some aspects to consider when selecting a partition include:
* Resource requirements in terms of memory and CPU cores
* Hardware-specific requirements, such as GPUs or CPU instruction set extensions
* Partition resource limits and potential wait time
* Whether your software supports parallel processing using the Message Passing Interface (MPI), OpenMP, etc.
** For example, MPI code can distribute its memory across multiple nodes, so its per-node memory requirements may be lower. OpenMP or single-process code, on the other hand, is restricted to one node and may need a node with more memory.
** Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the <code>openmpi/2.1.3-opa</code> or <code>openmpi/3.1.2-opa</code> modules prior to compiling.
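As a sketch of the MPI case, the job script below spreads 80 MPI tasks over two cpu2019 nodes, so each node only has to hold 40 × 4000 MB = 160,000 MB, which is within the 185,000 MB cpu2019 memory request limit. The task counts, memory and time values, and the program name <code>my_mpi_program</code> are illustrative assumptions. The program is assumed to have been compiled after loading the same <code>openmpi/3.1.2-opa</code> module; it is launched here with <code>mpirun</code>, although <code>srun</code> may also work depending on how Open MPI was built.

```bash
#!/bin/bash
#SBATCH --partition=cpu2019
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=12:00:00

# Per node: 40 tasks x 4000 MB = 160,000 MB, under the 185,000 MB cpu2019 limit.
# Load the same Omni-Path-enabled Open MPI module that was used to compile the code.
module load openmpi/3.1.2-opa

# my_mpi_program is a hypothetical executable built with mpicc from that module.
mpirun ./my_mpi_program
```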


Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency.  If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] and we would be happy to work with you.


{| class="wikitable" style="width: 100%;"
|+ style="caption-side: bottom; text-align: left; font-weight: normal;" | &dagger; These partitions contain hardware contributed to ARC by particular researchers and should only be used by members of their research groups. However, they have generously allowed their compute nodes to be shared with others outside their research groups for short jobs through the back-fill (<code>-bf</code>) partitions, which are available to all ARC users.<br />‡ As time limits may be changed by administrators to adjust to maintenance schedules or system load, the values given in the table are not definitive.  See the Time limits section below for commands you can use on ARC itself to determine current limits.
!Partition
!Description
!Cores/node
!Memory Request Limit
!Time Limit
!GPUs/node
!Networking
|-
|cpu2021
|General Purpose Compute
|48
|185,000 MB
|7 days ‡
|
|100 Gbit/s Omni-Path
|-
|cpu2019
|General Purpose Compute
|40
|185,000 MB
|7 days ‡
|
|100 Gbit/s Omni-Path
|-
|bigmem
|Big Memory Compute
|80
|3,000,000 MB
|24 hours ‡
|
|100 Gbit/s Omni-Path
|-
|gpu-v100
|GPU Compute
|80
|753,000 MB
|24 hours ‡
|2
|100 Gbit/s Omni-Path
|-
|apophis&dagger;
|Private Research Partition
|40
|185,000 MB
|7 days ‡
|
|100 Gbit/s Omni-Path
|-
|razi&dagger;
|Private Research Partition
|40
|185,000 MB
|7 days ‡
|
|100 Gbit/s Omni-Path
|-
|pawson&dagger;
|Private Research Partition
|40
|185,000 MB
|7 days ‡
|
|100 Gbit/s Omni-Path
|-
|sherlock&dagger;
|Private Research Partition
|7
|185,000 MB
|7 days ‡
|
|100 Gbit/s Omni-Path
|-
|theia&dagger;
|Private Research Partition
|28
|188,000 MB
|7 days ‡
|
|40 Gbit/s InfiniBand
|-
|synergy&dagger;
|Private Research Partition
|14
|245,000 MB
|7 days ‡
|
|40 Gbit/s InfiniBand
|-
|cpu2013
|Legacy General Purpose Compute
|16
|120,000 MB
|7 days ‡
|
|40 Gbit/s InfiniBand
|-
|lattice
|Legacy General Purpose Compute
|8
|12,000 MB
|7 days ‡
|
|40 Gbit/s InfiniBand
|-
|parallel
|Legacy General Purpose Compute
|12
|23,000 MB
|7 days ‡
|
|40 Gbit/s InfiniBand
|-
|single
|Legacy Single-Node Job Compute
|8
|12,000 MB
|7 days ‡
|
|40 Gbit/s InfiniBand
|-
|cpu2021-bf24
|Back-fill Compute (2021-era hardware, 24h)
|48
|185,000 MB
|24 hours ‡
|
|100 Gbit/s Omni-Path
|-
|cpu2019-bf05
|Back-fill Compute (2019-era hardware, 5h)
|40
|185,000 MB
|5 hours ‡
|
|100 Gbit/s Omni-Path
|-
|cpu2017-bf05
|Back-fill Compute (2017-era hardware, 5h)
|14
|245,000 MB
|5 hours ‡
|
|40 Gbit/s InfiniBand
|}


==== Backfill partitions ====
Backfill partitions can be used by all users on ARC for short-term jobs. The hardware backing these partitions is generously contributed by researchers.  We recommend including the backfill partitions for short-term jobs, as doing so may reduce your job's wait time and increase overall cluster throughput.
 
Previously, each contributing research group had their own backfill partition. Since June 2021, we have merged:
 
* apophis-bf, pawson-bf, and razi-bf into cpu2019-bf05
* theia-bf and synergy-bf into cpu2017-bf05
 
The naming scheme of the backfill partitions is the CPU generation year, followed by -bf and the time limit in hours.  For example, cpu2017-bf05 would represent a backfill partition containing processors from 2017 with a time limit of 5 hours.
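For instance, a short job could list both 5-hour backfill partitions so that it can start on whichever one has free nodes. This is a sketch: the core count, memory and time values are illustrative assumptions, chosen to fit on a cpu2017-bf05 node (14 cores, 245,000 MB) and to stay under the 5-hour limit.

```bash
#!/bin/bash
# Either 5-hour backfill partition is acceptable for this job.
#SBATCH --partition=cpu2019-bf05,cpu2017-bf05
# Keep the requested time under the 5-hour backfill limit.
#SBATCH --time=04:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
```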
 
==== Hardware resource and job policy limits ====
In addition to the hardware limitations, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time, as well as the number of jobs you can have pending or running at once (MaxSubmitJobs, shown as MaxSubmit in the output below). Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the resources available to you in another partition.
 
These limits can be listed by running:
<syntaxhighlight lang="bash">
$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs
      Name    MaxWall            MaxTRESPU MaxSubmit
---------- ----------- -------------------- ---------
    normal  7-00:00:00                          2000
    breezy  3-00:00:00              cpu=384      2000
      gpu  7-00:00:00                          13000
  cpu2019  7-00:00:00              cpu=240      2000
  gpu-v100  1-00:00:00    cpu=80,gres/gpu=4      2000
    single  7-00:00:00      cpu=408,node=75      2000
      razi  7-00:00:00                          2000
</syntaxhighlight>
 
==== Specifying a partition in a job ====
Once you have decided which partitions best suit your computation, you can select one or more partitions on a job-by-job basis by including the <code>partition</code> keyword in an <code>SBATCH</code> directive in your batch job. Multiple partitions should be comma-separated.  If you omit the partition specification, the system will try to assign your job to appropriate hardware based on other aspects of your request.
 
In some cases, you really should specify the partition explicitly.  For example, if you are running single-node jobs with thread-based parallel processing requesting 8 cores you could use:
<syntaxhighlight lang="bash">
#SBATCH --mem=0
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --partition=single,lattice
</syntaxhighlight>
 
A few notes on this example:
# <code>--mem=0</code> allocates all available memory on the compute node for the job. This effectively allocates the entire node for your job.
# <code>--nodes=1</code> allocates 1 node for the job.
# <code>--ntasks=1</code> specifies that your job runs a single task.
# <code>--cpus-per-task=8</code> asks for 8 CPUs per task, so the job requests 1 × 8 = 8 CPUs in total.
# <code>--partition=single,lattice</code> specifies that the job can run on either the single or the lattice partition.

Suppose that your job requires at most 8 CPU cores and 10 GB of memory. The above Slurm request would be valid and optimal, since your job fits neatly on a single node in the single or lattice partition.  However, if you did not specify the partition, Slurm may try to schedule your job on a partition with larger nodes, such as cpu2019, where each node has 40 cores and 190 GB of memory. If your job is scheduled on such a node, it will effectively waste 32 cores and 180 GB of memory, because <code>--mem=0</code> not only requests all 190 GB on the node but also prevents other jobs from being scheduled on it.
 
If you don't specify a partition, please give greater thought to the memory specification to make sure that the scheduler will not assign your job more resources than are needed.
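A sketch of the same 8-core threaded job written without a partition, requesting memory explicitly instead of using <code>--mem=0</code>. It assumes the job needs about 10 GB, as in the scenario above. Setting <code>OMP_NUM_THREADS</code> only matters for OpenMP programs, and <code>my_threaded_program</code> is a hypothetical placeholder.

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=10G
#SBATCH --time=02:00:00

# Use exactly as many OpenMP threads as cores Slurm allocated to the task.
# Outside a Slurm job SLURM_CPUS_PER_TASK is unset, so fall back to 1.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Using ${OMP_NUM_THREADS} OpenMP threads"

# ./my_threaded_program    # hypothetical program name
```

Because the memory is stated explicitly, a scheduler placing this job on a 40-core node would still leave the remaining cores and memory free for other jobs.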
 
Parameters such as '''--ntasks-per-node''', '''--cpus-per-task''', '''--mem''' and '''--mem-per-cpu''' also have to be adjusted to the capabilities of the hardware. The product of --ntasks-per-node and --cpus-per-task should be less than or equal to the number given in the "Cores/node" column.  The '''--mem''' parameter (or the product of '''--mem-per-cpu''', '''--cpus-per-task''' and '''--ntasks-per-node''') should not exceed the "Memory Request Limit" shown. If using whole nodes, you can specify '''--mem=0''' to request the maximum amount of memory per node.
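The per-node arithmetic described above can be sketched as a small bash function. <code>fits_on_node</code> is a hypothetical helper, not an ARC tool. It assumes <code>--mem-per-cpu</code> is given in MB and applies to every CPU allocated on the node, and it takes the cores/node and memory limit from the partition table.

```bash
#!/bin/bash
# Hypothetical helper: does a per-node request fit a partition's per-node limits?
# Arguments: ntasks-per-node, cpus-per-task, mem-per-cpu (MB),
#            cores/node, memory request limit (MB)
fits_on_node() {
    local ntasks_per_node=$1 cpus_per_task=$2 mem_per_cpu_mb=$3
    local cores_per_node=$4 mem_limit_mb=$5

    # Cores used on one node, and the memory those cores request.
    local cores=$(( ntasks_per_node * cpus_per_task ))
    local mem_mb=$(( mem_per_cpu_mb * cores ))

    if (( cores <= cores_per_node && mem_mb <= mem_limit_mb )); then
        echo "fits: ${cores} cores, ${mem_mb} MB"
        return 0
    fi
    echo "too large: ${cores} cores, ${mem_mb} MB"
    return 1
}

# Using the cpu2019 row of the partition table (40 cores, 185,000 MB):
fits_on_node 4 10 4000 40 185000
fits_on_node 4 10 5000 40 185000 || echo "-> reduce --mem-per-cpu or choose a partition with more memory"
```

The first request uses all 40 cores and 160,000 MB, so it fits; the second asks for 200,000 MB per node and does not.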
 
===== Examples =====
Here are some examples of specifying the various partitions.


  #SBATCH --partition=bigmem


If the more modern computers are too busy or you have a job well-suited to run on the older compute nodes listed as legacy hardware above, choose the cpu2013, lattice or parallel partition by specifying the corresponding partition keyword:


  #SBATCH --partition=cpu2013
or
  #SBATCH --partition=lattice
or
  #SBATCH --partition=parallel


  #SBATCH --partition=single


=== Time limits ===
Use the <code>--time</code> directive to tell the job scheduler the maximum time that your job might run, in the form:

  #SBATCH --time=hh:mm:ss

For jobs longer than a day, the time can also be given as <code>days-hh:mm:ss</code>, for example <code>--time=2-00:00:00</code>.

You can use <code>scontrol show partitions</code> or <code>sinfo</code> to see the current maximum time that a job can run. In the <code>scontrol</code> output, this is given by the <code>MaxTime</code> field:
<syntaxhighlight lang="bash" highlight="6">
$ scontrol show partitions
PartitionName=single                                                               
  AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL                                   
  AllocNodes=ALL Default=NO QoS=single                                             
  DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO       
  MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED 
  Nodes=cn[001-168]                                                               
  PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO       
  OverTimeLimit=NONE PreemptMode=OFF                                               
  State=UP TotalCPUs=1344 TotalNodes=168 SelectTypeParameters=NONE                 
  DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED                                 
</syntaxhighlight>


Alternatively, check the <code>TIMELIMIT</code> column of the <code>sinfo</code> output:
<syntaxhighlight lang="bash">
$ sinfo                                                   
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST             
single        up 7-00:00:00      1 drain* cn097                 
single        up 7-00:00:00      1  maint cn002                 
single        up 7-00:00:00      4 drain* cn[001,061,133,154]   
...
</syntaxhighlight>
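To compare a planned <code>--time</code> value with the <code>TIMELIMIT</code> or <code>MaxTime</code> reported above, it can help to convert both to seconds. <code>slurm_time_to_seconds</code> is a hypothetical helper sketch; it handles only the <code>hh:mm:ss</code> and <code>days-hh:mm:ss</code> forms, although Slurm accepts several other time formats.

```bash
#!/bin/bash
# Hypothetical helper: convert [days-]hh:mm:ss to seconds.
slurm_time_to_seconds() {
    local t=$1 days=0
    if [[ $t == *-* ]]; then
        days=${t%%-*}     # part before the dash
        t=${t#*-}         # hh:mm:ss after the dash
    fi
    local h m s
    IFS=: read -r h m s <<< "$t"
    # Force base 10 so values such as "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

requested="36:00:00"    # hypothetical --time value
limit="1-00:00:00"      # e.g. a 24-hour partition limit as printed by sinfo

if (( $(slurm_time_to_seconds "$requested") > $(slurm_time_to_seconds "$limit") )); then
    echo "--time=$requested exceeds the partition limit of $limit"
fi
```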


== Support ==
{{Support
|title=[[Support|Need Help or have other ARC Related Questions?]]
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.
}}
Please don't hesitate to [[Support|contact us]] directly by email if you need help using ARC or require guidance on migrating your workflows to ARC and running them there.


{{Navbox ARC}}
[[Category:ARC]]
[[Category:Guides]]

Latest revision as of 21:31, 19 August 2024

ARC status: Cluster operational


System is operational. No updates are planned.

See the ARC Cluster Status page for system notices.


Need Help or have other ARC Related Questions?

For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.

This guide gives an overview of the Advanced Research Computing (ARC) cluster at the University of Calgary and is intended to be read by new account holders getting started on ARC. This guide covers topics such as the hardware and performance characteristics, available software, usage policies and how to log in and run jobs. ARC can be used with data that a Researcher has classified as Lv1 and Lv2 as described in the UCalgary Information Security Classification Standard.

Introduction

The ARC is a high performance compute (HPC) cluster that is available for research projects based at the University of Calgary. This compute cluster comprises hundreds of servers connected by a high-bandwidth interconnect. Special resources within the cluster include large-memory nodes and GPU nodes. You may learn more about ARC's hardware in the hardware section below. ARC can be accessed through a command line interface or via a web interface called Open OnDemand.

This cluster can be used for running large numbers (hundreds) of concurrent serial (one core) jobs, OpenMP or other thread-based jobs, shared-memory parallel code using up to 40 or 80 threads per job (depending on the partition), distributed-memory (MPI-based) parallel code using up to hundreds of cores, or jobs that take advantage of Graphics Processing Units (GPUs).

Historically, ARC was primarily composed of older, disparate Linux-based clusters, such as Breezy, Lattice, and Parallel, that were formerly offered to researchers from across Canada. In addition, a large-memory compute node (Bigbyte) was salvaged from the now-retired local Storm cluster. In January 2019, a major addition of modern hardware was purchased for ARC. In 2020, compute clusters from CHGI were migrated into ARC.

How to Get Started

If you have a project you think would be appropriate for ARC, please email support@hpc.ucalgary.ca and mention the intended research and software you plan to use. You must have a University of Calgary IT account in order to use ARC.

  • For users who do not have a University of Calgary IT account or email address, please register for one at https://itregport.ucalgary.ca/.
  • For users external to the University, such as for users collaborating on a research project at the University of Calgary, please contact us and mention the project leader you are collaborating with.

Once your access to ARC has been granted, you will be able to immediately make use of the cluster using your University of Calgary IT account by following the usage guide outlined below.

Using ARC


Cybersecurity awareness at the U of C

Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.

Logging in

To log in to ARC, connect using SSH to arc.ucalgary.ca on port 22. Connections to ARC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).

See Connecting to RCS HPC Systems for more information.

How to interact with ARC

The ARC cluster is a collection of compute nodes connected by a high-speed network. On ARC, computations are submitted as jobs. Once submitted, jobs are assigned to compute nodes by the job scheduler as resources become available.


You can access ARC with your UCalgary IT user credentials. Once connected, you will be placed on the ARC login node, which is meant for basic tasks such as submitting jobs, monitoring job status, managing files and editing text. The login node is a shared resource to which many users are connected at the same time, so intensive tasks are not allowed on it, as they may prevent other users from connecting or submitting their computations.

        [tannistha.nandi@arc ~]$ 

The job scheduling system on ARC is called SLURM. On ARC, there are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. 

‘salloc’ launches an interactive session, typically for tasks under 5 hours. Once an interactive job session is created, you can do things like explore research datasets, start R or Python sessions to test your code, compile software applications, etc.

a. Example 1: The following command requests 1 CPU on 1 node for 1 task, along with 1 GB of RAM, for one hour.

         [tannistha.nandi@arc ~]$ salloc --mem=1G -c 1 -N 1 -n 1  -t 01:00:00
         salloc: Granted job allocation 6758015
         salloc: Waiting for resource configuration
         salloc: Nodes fc4 are ready for job
         [tannistha.nandi@fc4 ~]$ 


b. Example 2: The following command requests 1 GPU on 1 node in the gpu-v100 partition, along with 1 GB of RAM, for one hour. Generic resource scheduling (--gres) is used to request GPU resources.

        [tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1
        salloc: Granted job allocation 6760460
        salloc: Waiting for resource configuration
        salloc: Nodes fg3 are ready for job
        [tannistha.nandi@fg3 ~]$

Once you have finished your work, type 'exit' at the command prompt to end the interactive session:

        [tannistha.nandi@fg3 ~]$ exit
        [tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460

This ensures that the allocated resources are released from your job and become available to other users.

‘sbatch’ submits computations as batch jobs to run on the cluster. For example, to submit the job script job-script.slurm for execution:

        [tannistha.nandi@arc ~]$ sbatch job-script.slurm

When resources become available, they are allocated to the job. Batch jobs are suited to tasks that run for long periods of time without any user supervision. When the job script terminates, the allocation is released. Please review the section on how to prepare job scripts for more information.

Prepare job scripts

Job scripts are text files saved with an extension '.slurm', for example, 'job-script.slurm'. A job script looks something like this:

   #!/bin/bash
   ####### Reserve computing resources #############
   #SBATCH --nodes=1
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=1
   #SBATCH --time=01:00:00
   #SBATCH --mem=1G
   #SBATCH --partition=cpu2019
   ####### Set environment variables ###############
   module load python/anaconda3-2018.12
   ####### Run your script #########################
   python myscript.py

The first line contains the text "#!/bin/bash" so that the file is interpreted as a bash script.

It is followed by lines that start with '#SBATCH', which pass resource requests to SLURM. You may add as many #SBATCH directives as needed to reserve computing resources for your task. The above example requests one CPU on a single node for 1 task, along with 1 GB of RAM, for one hour on the cpu2019 partition.

Next, set up the environment, either by loading modules centrally installed on ARC or by exporting the path to software installed in your home directory. The above example loads an available Python module.

Finally, include the Linux command to execute the local script.

Note that failing to specify part of a resource allocation request (most notably time and memory) will result in poor resource requests, as the defaults are not appropriate for most cases. Please refer to the section 'Running non-interactive jobs' for more examples.

Hardware

Since the ARC cluster is a conglomeration of many different compute clusters, the hardware within ARC can vary widely in terms of performance and capabilities. To mitigate compatibility issues between different hardware, we group similar hardware into its own Slurm partition so that your workload runs as consistently as possible within one partition. Please carefully review the hardware specs for each of the partitions below to avoid any surprises.

Partition Hardware Specs

When submitting jobs to ARC, you may specify a partition that your job will run on. Please choose a partition that is most appropriate for your work.

A few things to keep in mind when choosing a partition:

  • Specific workloads requiring special Intel Instruction Set Extensions may only work on newer Intel CPUs.
  • If working with multi-node parallel processing, ensure your software and libraries support the partition's interconnect networking.
  • While older partitions may be slower, they may be less busy and have little to no wait times.

If you are unsure which partition to use or need assistance on selecting an appropriate partition, please see the Selecting a Partition Section below.

{| class="wikitable"
!Partition !! Description !! Nodes !! CPU Cores, Model, and Year !! Memory !! GPU !! Network
|-
| - || ARC Login Node || 1 || 16 cores, 2x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (Westmere, 2010) || 48 GB || N/A || 40 Gbit/s InfiniBand
|-
| gpu-v100 || GPU Partition || 13 || 80 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019) || 754 GB || 2x Tesla V100-PCIE-16GB || 100 Gbit/s Omni-Path
|-
| gpu-a100 || GPU Partition || 5 || 40 cores, 1x Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz (Ice Lake, 2021) || 512 GB || 2x GA100 A100 PCIe 80GB || 100 Gbit/s Mellanox InfiniBand
|-
| cpu2023 || General Purpose Compute || 48 || 64 cores, 2x Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake, 2021) || 512 GB || N/A || 40 Gbit/s Mellanox InfiniBand (temporarily)
|-
| cpu2022 || General Purpose Compute || 52 || 52 cores, 2x Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz (Ice Lake) || 256 GB || N/A || 40 Gbit/s InfiniBand
|-
| cpu2021 || General Purpose Compute || 48 || 48 cores, 2x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz (Cascade Lake, 2021) || 185 GB || N/A || 100 Gbit/s Mellanox InfiniBand
|-
| cpu2019 || General Purpose Compute || 14 || 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019) || 190 GB || N/A || 100 Gbit/s Omni-Path
|-
| apophis || General Purpose Compute || 21 || 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019) || 190 GB || N/A || 100 Gbit/s Omni-Path
|-
| razi || General Purpose Compute || 41 || 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019) || 190 GB || N/A || 100 Gbit/s Omni-Path
|-
| bigmem || Big Memory Nodes || 2 || 80 cores, 4x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019) || 3022 GB || N/A || 100 Gbit/s Omni-Path
|-
| pawson || General Purpose Compute || 13 || 40 cores, 2x Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (Cascade Lake, 2019) || 190 GB || N/A || 100 Gbit/s Omni-Path
|-
| cpu2017 || General Purpose Compute || 14 || 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell) || 256 GB || N/A || 40 Gbit/s InfiniBand
|-
| theia || Former Theia cluster || 20 || 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell) || 188 GB || N/A || 40 Gbit/s InfiniBand
|-
| cpu2013 || Former hyperion cluster || 12 || 32 cores, 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (Sandy Bridge, 2012) || 126 GB || N/A || 40 Gbit/s InfiniBand
|-
| lattice || Former Lattice cluster || 307 || 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (Nehalem, 2009) || 12 GB || N/A || 40 Gbit/s InfiniBand
|-
| single || Former Lattice cluster || 168 || 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (Nehalem, 2009) || 12 GB || N/A || 40 Gbit/s InfiniBand
|-
| parallel || Former Parallel Cluster || 576 || 12 cores, 2x Intel(R) Xeon(R) CPU E5649 @ 2.53GHz (Westmere, 2011) || 24 GB || N/A || 40 Gbit/s InfiniBand
|}

ARC Cluster Storage

Usage of ARC cluster storage is outlined by our ARC Storage Terms of Use page.


Data Storage

ARC storage is not suitable for long-term or archival storage. It is not backed-up and does not have sufficient redundancy to be used as a primary storage system. It is not guaranteed to be available for the time periods that are typical of archiving.

Please ensure that the only data you keep on ARC is used for active computations.

For information on available campus storage options, please see Storage Options.


No Backup Policy!

You are responsible for your own backups. Many researchers will have accounts with Compute Canada and may choose to back up their data there (the Project file system accessible through the Cedar cluster would often be used).

Please contact us at support@hpc.ucalgary.ca if you want more information about this option.

You can also back up data to your UofC OneDrive for Business allocation; see https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage. This allocation starts at 5 TB. Contact the support center for questions regarding OneDrive for Business.

The ARC cluster has around 2 petabytes of shared disk storage available across the entire cluster, as well as temporary storage local to each of the compute nodes. Please refer to the individual sections below on the capacity limitations and usage policies.

Use the arc.quota command on ARC to determine the available space on your various volumes and home directory.

{| class="wikitable"
!File system !! Description !! Capacity
|-
| /home || User home directories || 500 GB (per user)
|-
| /work || Research project storage || Up to hundreds of TB
|-
| /scratch || Scratch space for temporary files || Up to 15 TB
|-
| /tmp || Temporary space local to the compute node || Dependent on available storage on nodes. Verify with <code>df -h</code>.
|-
| /dev/shm || Small temporary in-memory disk space local to the compute node || Dependent on memory size set in your Slurm job.
|}

/home: Home file system

Each user has a directory under /home, which is the default working directory when logging in to ARC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased. Researchers requiring more storage than is available in their home directory may use /work and /scratch.

Note on file sharing: Due to security concerns, permissions set using chmod on your home directory to allow other users to read or write to it will be automatically reverted by a system process unless an explicit exception is made. If you need to share files with other researchers on the ARC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.

/scratch: Scratch file system for large job-oriented storage

Associated with each job, under the /scratch directory, a subdirectory is created that can be referenced in job scripts as /scratch/${SLURM_JOB_ID}. You can use that directory for temporary files needed during the course of a job. Up to 15 TB of storage may be used, per user (total for all your jobs) in the /scratch file system.

Data in /scratch associated with a given job will be deleted automatically, without exception, five days after the job finishes.
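Because /scratch data is deleted five days after the job finishes, anything worth keeping should be copied out. The following is a minimal sketch of a batch script that works in the per-job scratch directory and then saves its contents. The partition, time, memory, program name and the destination $HOME/results/ are placeholders (for large outputs, a /work directory may be more appropriate), and save_results is a hypothetical helper.

```shell
#!/bin/bash
#SBATCH --partition=cpu2019
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --mem=4000

# Copy the contents of SRC (including hidden files) into DEST, creating DEST.
save_results() {
  local src="$1" dest="$2"
  mkdir -p -- "$dest"
  cp -a -- "$src"/. "$dest"/
}

# The real work only runs inside a Slurm job, where SLURM_JOB_ID is set.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  SCRATCH_DIR="/scratch/${SLURM_JOB_ID}"
  cd "$SCRATCH_DIR" || exit 1
  # Hypothetical program writing its output into the scratch directory:
  # ./my_program > output.log
  # /scratch data is deleted five days after the job ends, so keep a copy.
  save_results "$SCRATCH_DIR" "$HOME/results/${SLURM_JOB_ID}"
fi
```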

/work: Work file system for larger projects

If you need more space than /home provides and the job-oriented /scratch space is not appropriate for your case, please write to support@hpc.ucalgary.ca with an explanation, including how much storage you expect to need and for how long. If approved, you will be assigned a directory under /work with an appropriately large quota.

/tmp, /var/tmp: Temporary files

You may use /tmp or /var/tmp for storing temporary files generated by your job. /tmp is stored on a disk local to the compute node and is not shared across the cluster. Files stored there are removed immediately after your job terminates.
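A common pattern is to give each job its own directory on the node-local disk and point TMPDIR at it, since many programs honour TMPDIR when creating temporary files. This is a minimal sketch; the directory name pattern is only a convention, not something ARC requires.

```shell
#!/bin/bash
# Create a private directory on the node-local /tmp disk for this job.
# The name pattern (user, job ID, random suffix) is just a convention.
LOCAL_TMP=$(mktemp -d "/tmp/${USER:-user}.${SLURM_JOB_ID:-nojob}.XXXXXX") || exit 1

# Many programs honour TMPDIR when choosing where to write temporary files.
export TMPDIR="$LOCAL_TMP"

# Clean up when the script exits; ARC also removes /tmp files after the job.
trap 'rm -rf -- "$LOCAL_TMP"' EXIT

echo "Temporary files will go to $TMPDIR"
```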

/dev/shm, /run/user/$UID: In-memory temporary files

/dev/shm and /run/user/$UID are writable locations for temporary files backed by virtual memory, and can be used when faster I/O is required. They are ideal for workloads that perform many small reads and writes to share data between processes, or as a fast cache. The amount of data you can write here depends on the amount of free memory available to your job. Files stored at these locations are removed immediately after your job terminates.
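To choose between in-memory and node-local disk storage, a script can compare the space each reports. This is a minimal sketch; avail_kb and pick_tmp_dir are hypothetical helpers, and the size you pass is your own estimate. Note that df reports the size of the /dev/shm file system on the whole node, which may be more than your job can actually use, since data written there is drawn from your job's memory.

```shell
#!/bin/bash
# Print the space available (in KiB) on the file system holding PATH,
# using the POSIX output format of df.
avail_kb() {
  df -Pk -- "$1" | awk 'NR == 2 { print $4 }'
}

# Print /dev/shm if it reports at least NEEDED_MB free, otherwise /tmp.
# Usage: pick_tmp_dir NEEDED_MB
pick_tmp_dir() {
  local needed_kb=$(( $1 * 1024 ))
  if [[ -d /dev/shm && $(avail_kb /dev/shm) -ge $needed_kb ]]; then
    echo /dev/shm
  else
    echo /tmp
  fi
}
```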

Software

All ARC nodes run the latest version of Rocky Linux 8 with the same set of base software packages. To maintain the stability and consistency of all nodes, any additional dependencies that your software requires must be installed under your account. For your convenience, we have packaged commonly used software packages and dependencies as modules available under /global/software. If your software package is not available as a module, you may also try Anaconda, which allows users to install and manage custom packages in an isolated environment.

For a list of the packages that have been made available, please see the ARC Software pages.

Please contact us at support@hpc.ucalgary.ca if you need additional software installed.

Modules

The environment for some of the installed software is set up using the module command. An overview of modules on WestGrid (external link) is largely applicable to ARC.

Software packages bundled as a module will be available under /global/software and can be listed with the module avail command.

$ module avail

To enable Python, load the Python module by running:

$ module load python/anaconda-3.6-5.1.0

To unload the Python module, run:

$ module unload python/anaconda-3.6-5.1.0

To see currently loaded modules, run:

$ module list

By default, no modules are loaded on ARC. If you wish to use a specific module, such as the Intel compilers or the Open MPI parallel programming packages, you must load the appropriate module.
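Since nothing is loaded by default, a batch script must load its own modules. A minimal sketch of a helper that stops early instead of running with a missing dependency follows. load_modules is a hypothetical name, and the sketch assumes that module load returns a non-zero exit status when a module cannot be loaded, which is worth checking on ARC.

```shell
#!/bin/bash
# Load each named module, stopping at the first one that fails.
# Assumes "module load" exits non-zero on failure (check this on ARC).
load_modules() {
  if ! command -v module >/dev/null 2>&1; then
    echo "error: the 'module' command is not available in this shell" >&2
    return 1
  fi
  local m
  for m in "$@"; do
    if ! module load "$m"; then
      echo "error: could not load module $m" >&2
      return 1
    fi
  done
}

# Example with the Python module mentioned above:
# load_modules python/anaconda-3.6-5.1.0 || exit 1
```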

Job submission

Interactive Jobs

The ARC login node may be used for tasks such as editing files, compiling programs and running short tests while developing programs. As per our cluster guidelines, CPU-intensive workloads on the login node should be restricted to under 15 minutes. For interactive workloads exceeding 15 minutes, use the salloc command to allocate an interactive session on a compute node.

The default salloc allocation is 1 CPU and 1 GB of memory. Adjust this by specifying -n followed by the number of CPUs and --mem followed by the amount of memory in megabytes. You may request up to 5 hours for interactive jobs, for example:

salloc --time=5:00:00 --partition=cpu2019

Always use salloc or srun to start an interactive job. Do not SSH directly to a compute node as SSH sessions will be refused without an active job running.


Running non-interactive jobs (batch processing)

Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which the commands to be executed are listed in a script (text file). Batch job scripts are submitted using the sbatch command, which is part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script specify the resources needed for the job (cores, memory, run time limit and any specialized hardware).

Most of the information on the Running Jobs (external link) page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on ARC. One major difference between running jobs on the ARC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On ARC, you choose the hardware to use primarily by specifying a partition, as described below.

Selecting a Partition

Aspects to consider when selecting a partition include:

  • Resource requirements in terms of memory and CPU cores
  • Hardware specific requirements, such as GPU or CPU Instruction Set Extensions
  • Partition resource limits and potential wait time
  • Whether your software supports parallel processing using Message Passing Interface (MPI), OpenMP, etc.
    • E.g. since MPI can distribute memory across multiple nodes, per-node memory requirements could be lower, whereas OpenMP or single-process code that is restricted to one node would require a higher-memory node.
    • Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the openmpi/2.1.3-opa or openmpi/3.1.2-opa modules prior to compiling.

Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency. If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at support@hpc.ucalgary.ca and we would be happy to work with you.

| Partition | Description | Cores/node | Memory Request Limit | Time Limit | GPUs/node | Networking |
|---|---|---|---|---|---|---|
| cpu2021 | General Purpose Compute | 48 | 185,000 MB | 7 days ‡ | | 100 Gbit/s Omni-Path |
| cpu2019 | General Purpose Compute | 40 | 185,000 MB | 7 days ‡ | | 100 Gbit/s Omni-Path |
| bigmem | Big Memory Compute | 80 | 3,000,000 MB | 24 hours ‡ | | 100 Gbit/s Omni-Path |
| gpu-v100 | GPU Compute | 80 | 753,000 MB | 24 hours ‡ | 2 | 100 Gbit/s Omni-Path |
| apophis† | Private Research Partition | 40 | 185,000 MB | 7 days ‡ | | 100 Gbit/s Omni-Path |
| razi† | Private Research Partition | 40 | 185,000 MB | 7 days ‡ | | 100 Gbit/s Omni-Path |
| pawson† | Private Research Partition | 40 | 185,000 MB | 7 days ‡ | | 100 Gbit/s Omni-Path |
| sherlock† | Private Research Partition | 7 | 185,000 MB | 7 days ‡ | | 100 Gbit/s Omni-Path |
| theia† | Private Research Partition | 28 | 188,000 MB | 7 days ‡ | | 40 Gbit/s InfiniBand |
| synergy† | Private Research Partition | 14 | 245,000 MB | 7 days ‡ | | 40 Gbit/s InfiniBand |
| cpu2013 | Legacy General Purpose Compute | 16 | 120,000 MB | 7 days ‡ | | 40 Gbit/s InfiniBand |
| lattice | Legacy General Purpose Compute | 8 | 12,000 MB | 7 days ‡ | | 40 Gbit/s InfiniBand |
| parallel | Legacy General Purpose Compute | 12 | 23,000 MB | 7 days ‡ | | 40 Gbit/s InfiniBand |
| single | Legacy Single-Node Job Compute | 8 | 12,000 MB | 7 days ‡ | | 40 Gbit/s InfiniBand |
| cpu2021-bf24 | Back-fill Compute (2021-era hardware, 24h) | 48 | 185,000 MB | 24 hours ‡ | | 100 Gbit/s Omni-Path |
| cpu2019-bf05 | Back-fill Compute (2019-era hardware, 5h) | 40 | 185,000 MB | 5 hours ‡ | | 100 Gbit/s Omni-Path |
| cpu2017-bf05 | Back-fill Compute (2017-era hardware, 5h) | 14 | 245,000 MB | 5 hours ‡ | | 40 Gbit/s InfiniBand |
† These partitions contain hardware contributed to ARC by particular researchers and should only be used by members of their research groups. However, these researchers have generously allowed their compute nodes to be shared with others outside their research groups for short jobs. Special back-fill (-bf) partitions are available to all ARC users for such jobs; the time limit of each -bf partition is given in its name and in the table above.
‡ As time limits may be changed by administrators to adjust to maintenance schedules or system load, the values given in the tables are not definitive. See the Time limits section below for commands you can use on ARC itself to determine current limits.

Backfill partitions

Backfill partitions can be used by all users on ARC for short-term jobs. The hardware backing these partitions is generously contributed by researchers. We recommend including the backfill partitions for short-term jobs, as doing so may reduce your job's wait time and increase overall cluster throughput.

Previously, each contributing research group had their own backfill partition. Since June 2021, we have merged:

  • apophis-bf, pawson-bf, and razi-bf into cpu2019-bf05
  • theia-bf and synergy-bf into cpu2017-bf05

The naming scheme of the backfill partitions is the CPU generation year, followed by -bf and the time limit in hours. For example, cpu2017-bf05 would represent a backfill partition containing processors from 2017 with a time limit of 5 hours.
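Because the time limit is encoded in the partition name, a script can pick the backfill partitions that are long enough for a job. This is a minimal sketch: backfill_candidates is a hypothetical helper, and it trusts the naming scheme rather than querying Slurm, so it will not notice if administrators temporarily change a partition's actual limit.

```shell
#!/bin/bash
# Print each backfill partition (named cpuYYYY-bfHH) whose time limit,
# as encoded in its name, is at least REQUESTED_HOURS.
# Usage: backfill_candidates REQUESTED_HOURS PARTITION...
backfill_candidates() {
  local requested="$1" p
  local re='^cpu[0-9]{4}-bf([0-9]+)$'
  shift
  for p in "$@"; do
    # 10# forces base 10 so names such as bf08 are not read as octal.
    if [[ $p =~ $re ]] && (( 10#${BASH_REMATCH[1]} >= requested )); then
      echo "$p"
    fi
  done
}
```

For example, backfill_candidates 4 cpu2021-bf24 cpu2019-bf05 cpu2017-bf05 | paste -sd, - prints cpu2021-bf24,cpu2019-bf05,cpu2017-bf05, a comma-separated list that could be added to a regular partition in a --partition directive.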

Hardware resource and job policy limits

In addition to the hardware limitations, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time. Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the available resources you can use in another partition.

These limits can be listed by running:

$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs
      Name     MaxWall            MaxTRESPU MaxSubmit
---------- ----------- -------------------- ---------
    normal  7-00:00:00                           2000
    breezy  3-00:00:00              cpu=384      2000
       gpu  7-00:00:00                          13000
   cpu2019  7-00:00:00              cpu=240      2000
  gpu-v100  1-00:00:00    cpu=80,gres/gpu=4      2000
    single  7-00:00:00      cpu=408,node=75      2000
      razi  7-00:00:00                           2000

Specifying a partition in a job

Once you have decided which partitions best suit your computation, you can select one or more partitions on a job-by-job basis by including the partition keyword in an SBATCH directive in your batch job. Multiple partitions should be comma-separated. If you omit the partition specification, the system will try to assign your job to appropriate hardware based on other aspects of your request.

In some cases, you really should specify the partition explicitly. For example, if you are running single-node jobs with thread-based parallel processing that request 8 cores, you could use:

#SBATCH --mem=0              ❶
#SBATCH --nodes=1            ❷
#SBATCH --ntasks=1           ❸
#SBATCH --cpus-per-task=8    ❹
#SBATCH --partition=single,lattice   ❺

A few things to mention in this example:

  1. --mem=0 allocates all available memory on the compute node for the job. This effectively allocates the entire node for your job.
  2. --nodes=1 allocates 1 node for the job
  3. --ntasks=1 specifies that your job has a single task
  4. --cpus-per-task=8 asks for 8 CPUs per task. This job in total will request 8 * 1, or 8 CPUs.
  5. --partition=single,lattice specifies that this job can run on either single or lattice.

Suppose that your job requires at most 8 CPU cores and 10 GB of memory. The above Slurm request would be valid and optimal, since your job fits neatly on a single node in the single or lattice partition. However, if you did not specify the partition, Slurm might schedule your job on a partition with larger nodes, such as cpu2019, where each node has 40 cores and about 185 GB of memory. If your job were scheduled on such a node, it would effectively waste 32 cores and roughly 175 GB of memory, because --mem=0 not only requests all of the memory on that node but also prevents other jobs from being scheduled on the same node.

If you don't specify a partition, please give greater thought to the memory specification to make sure that the scheduler will not assign your job more resources than are needed.
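Putting this together for the 8-core, 10 GB threaded job described above, a complete batch script might look like the following sketch. Here --mem=10000 requests about 10 GB instead of the whole node, so even if the job landed on a larger node, the remaining memory would stay available to other jobs. The job name, time limit and program name are placeholders, and the script assumes your program uses OpenMP and reads OMP_NUM_THREADS.

```shell
#!/bin/bash
#SBATCH --job-name=threaded-8
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=10000
#SBATCH --time=02:00:00
#SBATCH --partition=single,lattice

# Use as many OpenMP threads as cores were allocated to the task.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# default to 1 so the script also runs outside a job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# Replace with your own program (hypothetical name):
# ./my_threaded_program
```

Save it to a file (for example threaded.slurm) and submit it with sbatch threaded.slurm.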

Parameters such as --ntasks-per-node, --cpus-per-task, --mem and --mem-per-cpu also have to be adjusted to the capabilities of the hardware. The product of --ntasks-per-node and --cpus-per-task should be less than or equal to the number given in the "Cores/node" column. The --mem parameter (or, if you use --mem-per-cpu, the product of --mem-per-cpu and the number of CPUs used on each node) should not exceed the "Memory Request Limit" shown. If using whole nodes, you can specify --mem=0 to request the maximum amount of memory per node.
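These two checks are easy to script before submitting a job. This is a minimal sketch; fits_partition is a hypothetical helper, the limits must be copied by hand from the partition table, and memory is compared per node in MB.

```shell
#!/bin/bash
# Check a per-node request against a partition's limits.
# Usage: fits_partition CORES_PER_NODE MEM_LIMIT_MB NTASKS_PER_NODE CPUS_PER_TASK MEM_MB
fits_partition() {
  local cores=$1 mem_limit=$2 ntasks=$3 cpus=$4 mem=$5
  local needed_cores=$(( ntasks * cpus ))
  if (( needed_cores > cores )); then
    echo "needs ${needed_cores} cores/node, partition has ${cores}" >&2
    return 1
  fi
  if (( mem > mem_limit )); then
    echo "needs ${mem} MB/node, partition limit is ${mem_limit} MB" >&2
    return 1
  fi
  return 0
}
```

For example, fits_partition 8 12000 1 8 10000 checks the 8-core, 10 GB job against the lattice limits and succeeds, while fits_partition 8 12000 2 8 10000 fails because 16 cores would be needed.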

Examples

Here are some examples of specifying the various partitions.

As mentioned in the Hardware section above, the ARC cluster was expanded in January 2019. To select the 40-core general purpose nodes specify:

#SBATCH --partition=cpu2019

To run on the Tesla V100 GPU-enabled nodes, use the gpu-v100 partition. You will also need to include an SBATCH directive in the form --gres=gpu:n to specify the number of GPUs, n, that you need. For example, if the software you are running can make use of both GPUs on a gpu-v100 partition compute node, use:

#SBATCH --partition=gpu-v100 --gres=gpu:2
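A complete GPU job script might look like the following sketch. The CPU count, memory and time are placeholder values within the gpu-v100 limits listed above, the program name is hypothetical, and the GPU check assumes that Slurm sets CUDA_VISIBLE_DEVICES for jobs that request GPUs with --gres, which is common Slurm behaviour but worth verifying on ARC.

```shell
#!/bin/bash
#SBATCH --partition=gpu-v100
#SBATCH --gres=gpu:2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16000
#SBATCH --time=04:00:00

# Count the GPUs listed in CUDA_VISIBLE_DEVICES (a comma-separated list).
count_gpus() {
  local -a ids=()
  if [[ -n "${CUDA_VISIBLE_DEVICES:-}" ]]; then
    IFS=, read -r -a ids <<< "$CUDA_VISIBLE_DEVICES"
  fi
  echo "${#ids[@]}"
}

echo "This job can see $(count_gpus) GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"

# Replace with your own GPU program (hypothetical name):
# ./my_gpu_program
```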

For very large memory jobs (more than 185,000 MB), specify the bigmem partition:

#SBATCH --partition=bigmem

If the more modern computers are too busy or you have a job well-suited to run on the compute nodes described in the legacy hardware section above, choose the cpu2013, Lattice or Parallel compute nodes by specifying the corresponding partition keyword:

#SBATCH --partition=cpu2013
#SBATCH --partition=lattice
#SBATCH --partition=parallel

There is an additional partition called single that provides nodes similar to those in the lattice partition but is intended for single-node jobs. Select the single partition with:

#SBATCH --partition=single

Time limits

Use the --time directive to tell the job scheduler the maximum time that your job might run. For example:

#SBATCH --time=hh:mm:ss

You can use scontrol show partitions or sinfo to see the current maximum time that a job can run.

$ scontrol show partitions
PartitionName=single                                                                 
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL                                    
   AllocNodes=ALL Default=NO QoS=single                                              
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO        
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED  
   Nodes=cn[001-168]                                                                 
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO        
   OverTimeLimit=NONE PreemptMode=OFF                                                
   State=UP TotalCPUs=1344 TotalNodes=168 SelectTypeParameters=NONE                  
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Alternatively, sinfo reports the limit in the TIMELIMIT column:

$ sinfo                                                     
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST               
single        up 7-00:00:00      1 drain* cn097                  
single        up 7-00:00:00      1  maint cn002                  
single        up 7-00:00:00      4 drain* cn[001,061,133,154]    
...
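To check a planned --time value against these limits in a script, the [D-]HH:MM:SS values shown in the TIMELIMIT column can be converted to seconds. This is a minimal sketch; slurm_time_to_seconds and fits_time_limit are hypothetical helpers, and other Slurm time formats (plain minutes, MM:SS, or "infinite") are not handled.

```shell
#!/bin/bash
# Convert a Slurm time string in [D-]HH:MM:SS form (as printed by sinfo
# in the TIMELIMIT column) to seconds.
slurm_time_to_seconds() {
  local t="$1" days=0 h m s
  if [[ "$t" == *-* ]]; then
    days="${t%%-*}"
    t="${t#*-}"
  fi
  IFS=: read -r h m s <<< "$t"
  # 10# forces base 10 so fields such as "08" are not read as octal.
  echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# Succeed if REQUESTED (the value you pass to --time) fits within LIMIT.
fits_time_limit() {
  (( $(slurm_time_to_seconds "$1") <= $(slurm_time_to_seconds "$2") ))
}
```

On ARC, the limit for one partition could be obtained with something like sinfo -h -p cpu2019 -o %l | head -n 1 (head guards against the value being printed on more than one line) and passed as the second argument to fits_time_limit.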

Support


Please don't hesitate to contact us at support@hpc.ucalgary.ca if you need help using ARC or require guidance on migrating your workflows to ARC and running them there.