MARC Cluster Guide
This guide gives an overview of the MARC (Medical Advanced Research Computing) cluster at the University of Calgary.
It is intended for new account holders getting started on MARC, covering topics such as hardware and performance characteristics, available software, usage policies, and how to log in.
For MARC-related questions not answered here, please write to support@hpc.ucalgary.ca.
Cybersecurity awareness at the U of C
Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. See [1] for more information, such as tips for secure computing and how to report suspected security problems.
Introduction
MARC is a cluster of Linux-based computers purchased in 2019.
The MARC cluster has been designed to enable analyses and computations on data requiring Level 3 and Level 4 safeguards as described in the University of Calgary Information Security Classification Standard here: https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf
Due to the security requirements for Level 3/4 data, some necessary restrictions have been placed on MARC to prevent accidental or deliberate data exfiltration:
- Compute nodes and login nodes have no access to the internet
- All data must be ingested to MARC by first copying it to SCDS (Secure Compute Data Store) and then fetching it from SCDS to MARC.
- Resulting data (outputs of analyses) must be copied to SCDS and then transferred from SCDS to wherever it needs to go using established means.
- All file accesses are recorded for auditing purposes.
- ssh connections to MARC must be made through the IT Citrix system (the Admin VPN is neither sufficient nor necessary).
- All accounts must be IT accounts
- A project ID is required to use MARC. This project ID is the same number that is used on SCDS
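Since MARC's login and compute nodes have no internet access, all data moves through SCDS in two hops. As a rough sketch of the ingest step, assuming SCDS is reachable from MARC over ssh (the hostname `scds.ucalgary.ca` and the paths below are hypothetical placeholders, not confirmed values; check with support@hpc.ucalgary.ca for the actual procedure):

```shell
# Step 1 (outside MARC): copy the dataset to your project area on SCDS
# using whatever established means your project has been assigned.

# Step 2 (from a MARC login node): fetch the data from SCDS.
# Hypothetical hostname and paths -- substitute your project's real ones.
scp -r your_username@scds.ucalgary.ca:/projects/PROJECT_ID/input_data ~/input_data

# After analysis, results go back the same way, in reverse:
scp -r ~/results your_username@scds.ucalgary.ca:/projects/PROJECT_ID/results
```

Remember that all file accesses on MARC are recorded for auditing, so transfers should stay within your assigned project area.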
Accounts
If you have a project you think would be appropriate for MARC, please make a request in the IT Service Catalogue. TODO: LINK REQUIRED. To help us get you started, please mention what software you plan to use.
Hardware
MARC has compute nodes of two different varieties:
- 8 GPU (Graphics Processing Unit)-enabled nodes, each with:
- 40 cores: 2 sockets per node, each socket holding an Intel Xeon Gold 6148 20-core processor running at 2.4 GHz.
- About 750 GB of RAM (memory) shared by the 40 cores; jobs should request no more than 753000 MB.
- Two Tesla V100-PCIE-16GB GPUs.
- 1 bigmem node with:
- 80 cores: 4 sockets, each socket holding an Intel Xeon Gold 6148 20-core processor running at 2.4 GHz.
- About 3 TB of RAM (memory) shared by the 80 cores; jobs should request no more than 3000000 MB.
Partitions
Jobs on MARC are submitted to one of three Slurm partitions, described below.
cpu2019
Allows non-GPU jobs to use:
- Up to 38 cpus per node
- No gpus.
- Up to 500GB memory
- These are the same physical nodes used by the gpu2019 partition (2 CPUs per node are reserved for GPU jobs).
gpu2019
Allows jobs requiring NVIDIA V100 GPUs to use:
- 1 or 2 gpus per node
- Up to 40 cpus per node.
- Up to 750GB memory
bigmem
For very large memory jobs:
- Up to 80 cpus
- Up to 3TB memory
- No gpus
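The partition limits above map directly onto Slurm job-script directives. A minimal sketch of a batch script requesting one GPU on the gpu2019 partition (the job name, module, and program below are hypothetical placeholders; the `--partition` and `--gres` values follow standard Slurm syntax and the partition names listed in this guide):

```shell
#!/bin/bash
#SBATCH --job-name=example-gpu-job   # hypothetical name
#SBATCH --partition=gpu2019          # one of: cpu2019, gpu2019, bigmem
#SBATCH --gres=gpu:1                 # 1 or 2 GPUs per node on gpu2019
#SBATCH --cpus-per-task=10           # up to 40 on gpu2019
#SBATCH --mem=100000M                # up to ~750 GB on gpu2019 nodes
#SBATCH --time=02:00:00

# Hypothetical workload -- replace with your own software.
./my_gpu_program
```

For a non-GPU job, you would use `--partition=cpu2019` with at most 38 CPUs, or `--partition=bigmem` with up to 80 CPUs and `--mem` up to 3000000M for very large memory jobs.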
Storage
About a petabyte of raw disk storage is available to the MARC cluster, but for error checking and performance reasons, the amount of usable storage for researchers' projects is considerably less than that. From a user's perspective, the total amount of storage is less important than the individual storage limits. As described below, there are three storage areas: home, scratch and work, with different limits and usage policies.
Home file system: /home
There is a per-user quota of 50 GB under /home. This limit is fixed and cannot be increased. Each user has a directory under /home, which is the default working directory when logging in to MARC. It is expected that most researchers will work from /work and use home only for software and similar items.
Scratch file system for large job-oriented storage: /scratch
Associated with each job, under the /scratch directory, a subdirectory is created that can be referenced in job scripts as /scratch/${SLURM_JOB_ID}. You can use that directory for temporary files needed during the course of a job. Up to 30 TB of storage may be used, per user (total for all your jobs) in the /scratch file system. Deletion policy: data in /scratch associated with a given job will be deleted automatically, without exception, five days after the job finishes.
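A typical pattern is to stage files into the per-job scratch directory, run there, and copy results back before the five-day cleanup removes them. A sketch, assuming hypothetical input and output paths (replace them with your project's real locations):

```shell
#!/bin/bash
#SBATCH --partition=cpu2019
#SBATCH --mem=16000M
#SBATCH --time=01:00:00

# Per-job scratch directory created automatically by the cluster.
TMPDIR=/scratch/${SLURM_JOB_ID}

# Hypothetical paths -- substitute your own.
cp /work/myproject/input.dat "$TMPDIR"/
cd "$TMPDIR"
./analysis input.dat > results.out

# Copy results somewhere permanent: anything left in /scratch
# is deleted five days after the job finishes.
cp results.out /work/myproject/results/
```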
Work file system for larger projects: /work
If you need more space than provided in /home and the /scratch job-oriented space is not appropriate for your case, please write to support@hpc.ucalgary.ca with an explanation, including an indication of how much storage you expect to need and for how long. If approved, you will then be assigned a directory under /work with an appropriately large quota.