MARC Cluster Guide
MARC status: Cluster operational. No upgrades planned. Please contact us if you experience system issues. See the MARC Cluster Status page for system notices.
Need help or have other MARC-related questions? For all general RCS-related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.
This guide gives an overview of the MARC (Medical Advanced Research Computing) cluster at the University of Calgary and is intended for new account holders getting started on MARC. It covers MARC's access restrictions, hardware, performance characteristics, and storage.
Introduction
MARC is a cluster of Linux-based computers that were purchased in 2019. It has been specifically designed to meet the security requirements for handling Level 3 and Level 4 classified data, as defined by the University of Calgary Information Security Classification Standard, which you can find here: https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf
To ensure the security of Level 3/4 data, MARC has implemented several restrictions:
- Account Requirement: All users must have IT accounts.
- Project ID Requirement: To use MARC, a project ID is required. This project ID is the same number used on SCDS.
- SSH Access via Citrix: Access to MARC via SSH must be done through the IT Citrix system. Admin VPN access is neither sufficient nor necessary.
- Internet Access: Neither compute nodes nor login nodes have internet access.
- Data Ingestion via SCDS: All data must be transferred to MARC by first copying it to SCDS (Secure Compute Data Store) and then fetching it from SCDS to MARC.
- Data Retrieval via SCDS: Resulting data, such as analysis outputs, must be copied to SCDS and then fetched from SCDS to its intended destination using established methods.
- Data Auditing: All file accesses are logged for auditing purposes.
These measures have been put in place to ensure the security and controlled handling of sensitive data on the MARC cluster and to prevent intentional or accidental data exfiltration.
Obtaining an account
Please refer to the how to get a MARC account page for more information on how to obtain an account and log in to MARC.
Hardware
MARC has compute nodes of two different varieties:
- 4 GPU (Graphics Processing Unit)-enabled nodes, each containing:
  - 40 cores: 2 sockets per node, each socket with an Intel Xeon Gold 6148 20-core processor running at 2.4 GHz.
  - About 750 GB of RAM (memory) shared by the 40 cores, but jobs should request no more than 753000 MB.
  - Two Tesla V100-PCIE-16GB GPUs.
- 1 big-memory (bigmem) node containing:
  - 80 cores: 4 sockets, each with an Intel Xeon Gold 6148 20-core processor running at 2.4 GHz.
  - About 3 TB of RAM (memory) shared by the 80 cores, but jobs should request no more than 3000000 MB.
The hardware is broken out into three distinct partitions with the following restrictions.
| Partition | Purpose and description | CPUs per node | GPUs per node | Memory per node |
|---|---|---|---|---|
| cpu2019 | For non-GPU jobs; homogeneous nodes | Up to 38 | No GPUs | Up to 500 GB |
| gpu2019 | For jobs requiring NVIDIA V100 GPUs | Up to 40 | 1 or 2 GPUs | Up to 750 GB |
| bigmem | For very large memory jobs | Up to 80 | No GPUs | Up to 3 TB |
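As a point of reference, the sketch below shows a minimal batch job script targeting one of these partitions. It assumes MARC uses the Slurm scheduler (as on other RCS clusters such as ARC); the partition names and per-node limits come from the table above, while the specific resource values and the script name are illustrative placeholders.

```bash
#!/bin/bash
#SBATCH --partition=gpu2019     # one of cpu2019, gpu2019, or bigmem
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8       # gpu2019 nodes have up to 40 CPUs
#SBATCH --gres=gpu:1            # 1 or 2 V100 GPUs per node; omit for cpu2019/bigmem
#SBATCH --mem=64000M            # stay within the per-node memory limits above
#SBATCH --time=04:00:00

# Placeholder workload: replace with your own analysis command.
python my_analysis.py
```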
Storage
About a petabyte of raw disk storage is available to the MARC cluster, but for error checking and performance reasons, the amount of usable storage for researchers' projects is considerably less than that. From a user's perspective, the total amount of storage is less important than the individual storage limits.
MARC storage is not accessible outside of MARC.
Home file system: /home
Each account on MARC has a home directory under `/home` with a 25 GB per-user quota. This limit is fixed and cannot be increased. The home directory is intended for software, scripts, and configuration files and must contain Level 1/2 data only. Do not store patient-identifiable files (Level 3/4) in your home directory; Level 3/4 data is only appropriate under `/project`.
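To see how much of the 25 GB quota you are currently using, standard Linux tools are enough (a sketch; no MARC-specific commands are assumed):

```bash
# Total size of your home directory, to compare against the 25 GB quota
du -sh ~

# Largest top-level items (excluding hidden files), to find candidates for cleanup
du -sh ~/* 2>/dev/null | sort -h | tail
```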
Project file system: /project
All projects have a directory under `/project` named after your project ID, which is the same as your SCDS share name. All files related to a project should be stored only within its assigned `/project` directory. Quotas in `/project` are somewhat flexible and are assigned based on project requirements. Please contact support@hpc.ucalgary.ca if you require additional space.
The SCDS share and the project directory on MARC are two separate storage systems: data transferred to the `/project` directory is a second copy of the data on the SCDS share. Note that deleting files on an SCDS share will not free space. The NetApp device that hosts the SCDS share keeps a copy of deleted files in ‘snapshot space’, so the quota is consumed by the files in the SCDS share plus the files in ‘snapshot space’.
Temporary file system: /tmp
Each compute node has a temporary storage location available under `/tmp` that is only accessible from within that node and only for the duration of the running job. Data stored in `/tmp` will be deleted immediately after the job terminates. It is suitable for all levels of data.
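For example, a short sketch of staging data through node-local `/tmp` inside a job script. The SLURM_JOB_ID variable assumes the Slurm scheduler, and the project path and file names are placeholders:

```bash
# Stage input to node-local /tmp, work there, and copy results back before the
# job ends, since /tmp is wiped when the job terminates.
PROJECT_DIR=/project/your_project_id      # placeholder: use your own project directory
WORKDIR=/tmp/$USER-$SLURM_JOB_ID

mkdir -p "$WORKDIR"
cp "$PROJECT_DIR"/input.dat "$WORKDIR"/

# ... run the analysis in $WORKDIR ...

# Copy results back to the project directory and clean up
cp "$WORKDIR"/results.out "$PROJECT_DIR"/
rm -rf "$WORKDIR"
```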
Because the login node is a shared system, you must take care that data stored within `/tmp` on the login node is restricted to only your account using the appropriate file modes (`chmod` and `chown`/`chgrp` commands) and deleted after use.
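For example, a minimal sketch of keeping a private working directory under `/tmp` on the login node (the directory name is illustrative):

```bash
# Create a working directory under /tmp that only your account can access
mkdir /tmp/$USER-work
chmod 700 /tmp/$USER-work

# ... work with files in /tmp/$USER-work ...

# Delete the directory when you are finished
rm -rf /tmp/$USER-work
```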
Software installations
Python
There are some complications in using Python on MARC relative to using ARC.
Normally, we would recommend installing Conda in the user's home directory. On MARC, however, security requirements for working with Level 4 data require that we block outgoing and incoming internet connections. As a result, new packages cannot be downloaded with conda.

Depending on what you need, the two recommendations we can make are:
- Download the standard Anaconda distribution from the Anaconda website to a personal computer: https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
  - Transfer the script to MARC via SCDS.
  - Copy it to your /home directory.
  - Install it in your home directory with `bash Anaconda3-2020.07-Linux-x86_64.sh`.
  - You will be asked to agree to a license agreement and to confirm that you wish to create a folder named anaconda3. Once the installation completes, you will have a new directory, ~/anaconda3, under your home directory. To use the local conda instance, add your local Python directories to your system path with `export PATH=~/anaconda3/bin:$PATH` (see the consolidated sketch after this list).
- Download a Docker container with the software that you need, including Python (e.g. tensorflow-gpu).
  - Transfer the Docker container to MARC via SCDS.
  - Copy it to your /home directory.
  - Run it with Singularity (see the example after this list).
- Non-open-source software that requires a connection to a license server may require admin assistance to set up. Contact support@hpc.ucalgary.ca for support.
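As a consolidated sketch of the Anaconda route above, once the installer has arrived on MARC via SCDS and been copied to your home directory (the installer filename is the one listed above; prompts may differ between Anaconda versions):

```bash
cd ~
bash Anaconda3-2020.07-Linux-x86_64.sh   # accept the license and the default ~/anaconda3 location

# Make the local conda instance take precedence in the current shell
export PATH=~/anaconda3/bin:$PATH

# Confirm that the locally installed conda is the one being used
which conda
conda --version
```

And a minimal sketch of the container route, assuming Singularity is available on MARC and that the container has already been converted to a Singularity image file; the image and script names are placeholders:

```bash
# Run a Python script inside the container; --nv exposes the node's NVIDIA GPUs to the container
singularity exec --nv ~/tensorflow-gpu.sif python my_training_script.py
```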
Further Reading
See Running jobs for information on how to submit jobs on the HPC cluster.