MARC Cluster Guide


Cybersecurity awareness at the U of C

Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.


Need help or have other MARC-related questions?

For all general RCS-related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.


This guide gives an overview of the MARC (Medical Advanced Research Computing) cluster at the University of Calgary.

It is intended to be read by new account holders getting started on MARC, covering such topics as the hardware and performance characteristics, available software, usage policies and how to log in.

If you are looking for how to log in to MARC or how to get an account, please see MARC_accounts.

Introduction

MARC is a cluster of Linux-based computers purchased in 2019.

The MARC cluster has been designed with controls appropriate for Level 3 and Level 4 classified data. The University of Calgary Information Security Classification Standard is published here: https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf

Due to security requirements for Level 3/4 data, some necessary restrictions have been placed on MARC to prevent accidental (or otherwise) data exfiltration.

  • Compute nodes and login nodes have no access to the internet.
  • All data must be ingested to MARC by first copying it to SCDS (Secure Compute Data Store) and then fetching it from SCDS to MARC.
  • Resulting data (outputs of analyses) must be copied to SCDS and then fetched from SCDS to wherever it needs to go using established means (both directions are sketched after this list).
  • All file accesses are recorded for auditing purposes.
  • ssh connections to MARC must go through the IT Citrix system (the Admin VPN is neither sufficient nor necessary).
  • All accounts must be IT accounts.
  • A project ID is required to use MARC. This project ID is the same number that is used on SCDS.
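
The two SCDS bullets above describe a copy-in/copy-out pattern. As a rough sketch only: the commands below assume the SCDS share appears as a path such as /scds/projectid and use rsync; the actual transfer mechanism and paths are described in the SCDS documentation, so treat every path here as a hypothetical placeholder.

  # Hypothetical ingest: fetch input data staged on SCDS into the project tree
  rsync -av /scds/projectid/inputs/ /project/projectid/inputs/

  # Hypothetical egress: stage analysis outputs back on SCDS for onward transfer
  rsync -av /project/projectid/outputs/ /scds/projectid/outputs/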

Hardware

MARC has compute nodes of two different varieties:

  • 4 GPU (Graphics Processing Unit)-enabled nodes, each containing:
    • 40 cores: each node has 2 sockets, each holding a 20-core Intel Xeon Gold 6148 processor running at 2.4 GHz.
    • The 40 cores on a compute node share about 750 GB of RAM (memory), but jobs should request no more than 753000 MB (see the sketch after this list).
    • Two Tesla V100-PCIE-16GB GPUs.
  • 1 big-memory (bigmem) node containing:
    • 80 cores: the node has 4 sockets, each holding a 20-core Intel Xeon Gold 6148 processor running at 2.4 GHz.
    • The 80 cores share about 3 TB of RAM (memory), but jobs should request no more than 3000000 MB.
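
As a concrete illustration of these limits, a batch job on a GPU node might request memory like this (a minimal sketch, assuming MARC uses the Slurm scheduler described in Running_jobs; the program name is hypothetical):

  #!/bin/bash
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=40    # all 40 cores of one GPU node
  #SBATCH --mem=753000M         # per-node maximum stated above
  #SBATCH --time=01:00:00
  ./my_analysis                 # hypothetical program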

Jobs are submitted to one of three partitions; example job script fragments follow the three lists below.

cpu2019

Allows non-GPU jobs to use:

  • Up to 38 cpus per node
  • No gpus.
  • Up to 500GB memory
  • These are the same physical nodes as the gpu2019 partition, with some cores and memory set aside for GPU jobs

gpu2019

Allows jobs requiring NVIDIA V100 GPUs to use:

  • 1 or 2 gpus per node
  • Up to 40 cpus per node.
  • Up to 750GB memory

bigmem

For very large memory jobs:

  • Up to 80 cpus
  • Up to 3TB memory
  • No gpus
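
To make the partition choices concrete, the fragments below show how a job script might target each partition (a sketch, again assuming Slurm; --gres is the standard Slurm syntax for requesting GPUs, but the exact resource name configured on MARC is an assumption):

  # Non-GPU job on the shared nodes
  #SBATCH --partition=cpu2019
  #SBATCH --cpus-per-task=38
  #SBATCH --mem=500000M

  # GPU job requesting one V100
  #SBATCH --partition=gpu2019
  #SBATCH --gres=gpu:1

  # Very large memory job
  #SBATCH --partition=bigmem
  #SBATCH --mem=3000000M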

Storage

About a petabyte of raw disk storage is available to the MARC cluster, but for error checking and performance reasons, the amount of usable storage for researchers' projects is considerably less than that. From a user's perspective, the total amount of storage is less important than the individual storage limits. As described below, there are two storage areas: home and project.

Home file system: /home

There is a per-user quota of 25 GB under /home. This limit is fixed and cannot be increased. Each user has a directory under /home, which is the default working directory when logging in to MARC. It is expected that most researchers will work from /project and use /home only for software and the like. /home should hold only L1/L2 data, not patient-identifiable files; those belong in the appropriate directory under /project.

Project file system for larger projects: /project

Directories will be created in /project named after your project ID. This name will be the same as your SCDS share name. The expectation is that all files related to a project will be stored in /project/projectid. Quotas in /project are somewhat flexible. Please write to support@hpc.ucalgary.ca with an estimate of how much space you will require.
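
Because the /home quota is fixed and /project quotas are negotiated, it is worth checking your usage periodically. The commands below use only standard Linux tools (nothing MARC-specific is assumed):

  # Total size of your home directory (25 GB quota)
  du -sh ~

  # Total size of a project tree (replace projectid with your own)
  du -sh /project/projectid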

Software installations

Python

There are some complications in using Python on MARC relative to using ARC. Normally, we would recommend installing conda in the user's home directory. On MARC, however, security requirements for working with L4 data require that we block outgoing and incoming internet connections. As a result, new packages cannot be downloaded with conda.

Depending on what you need, we can make two recommendations:

  • Download the standard Anaconda distribution from the Anaconda website to a personal computer: https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
    • Transfer the script to MARC via SCDS
    • Copy it to your /home directory
    • Install it in your home directory with: bash Anaconda3-2020.07-Linux-x86_64.sh
    • You will be asked to agree to a license agreement and to confirm that you wish to create an anaconda3 folder. Once the installation completes, you will have a new directory ~/anaconda3 under your home directory. To use the local conda instance, include it on your path: export PATH=~/anaconda3/bin:$PATH
  • Download a Docker container with the software that you need, including Python (e.g. tensorflow-gpu)
    • Transfer the container to MARC via SCDS
    • Copy it to your /home directory
    • Run it with Singularity (see the sketch after this list)
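
Putting the two recommendations together, the session below sketches both workflows end to end. The installer filename matches the link above; the saved image name (tensorflow-gpu.tar) and the script name are hypothetical:

  # Option 1: offline Anaconda install (after the installer arrives via SCDS)
  bash ~/Anaconda3-2020.07-Linux-x86_64.sh   # accept the license; default target ~/anaconda3
  export PATH=~/anaconda3/bin:$PATH          # put the local conda first on PATH
  conda list                                 # bundled packages only; downloads will fail (no internet)

  # Option 2: run a pre-built Docker image with Singularity
  # (after the saved image has arrived via SCDS)
  singularity build tf.sif docker-archive://$HOME/tensorflow-gpu.tar
  singularity exec --nv tf.sif python my_script.py   # --nv exposes the node's GPUs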


  • Non-open-source software which requires a connection to a license server may require admin assistance to set up. Contact support@hpc.ucalgary.ca for support.

Further Reading

See Running_jobs for information on starting a job.