Types of Computational Resources

On RCS infrastructure, we use the idea of partitions and partition lists to specify the CPU architecture, number of cores, CPU memory, and GPU architecture. Although partitions serve many purposes, from a resource-request perspective they serve to distinguish groups of identical hardware. As a result, the idea of a partition plays a role in identifying almost every type of resource that can be requested. You will usually start by identifying what you need in terms of the other resources and then identifying a partition that is compatible with those needs. This article describes the kinds of computational resources that a calculation may need in order to run on a compute node. For information about the storage used to hold the input and output files associated with your calculation, see the Cluster Guide for the cluster that you will be working on.

CPU Cores

The distinction between a package (a collection of processing units that work together) and the individual processing units within it can be a source of considerable confusion. In SLURM terminology, the resource that gets scheduled is cpus (as in the option --cpus-per-task), but the term CPU cores is more appropriate in some cases, so we refer to cpus, cpu cores, and cores interchangeably in our documentation and correspondence. When allocating resources, there is rarely a need to make finer architectural distinctions than cores and the nodes they are assigned to.

On RCS systems, we require the specification of a number of CPU cores as part of the resource request. From a software perspective, you are requesting exclusive access to an execution unit for an instruction sequence, so it is common to think of this as the number of processes, and threads per process, that your code will use. If you don't know these numbers, consult the documentation for your software to see whether it has any multithreading or multiprocessing options. Successfully utilizing a multi-core request depends on the software design and the use of suitably parallelized code (e.g. using OpenMP, MPI, or libraries built on them). Since a single core can have drastically different performance on different hardware, it is important to know what kind of architecture you are requesting. For example, the individual CPU cores on a cpu2019 node are effectively 2.3 times faster than those on a lattice node for many calculations, so a job that takes 10 hours on cpu2019 would need roughly 23 hours on lattice. These differences should inform the number of cores and the time requested on a given partition.

Common SLURM parameters: --nodes, --ntasks, --ntasks-per-node, --cpus-per-task, --partition

N.B. We usually recommend specifying the full triple --nodes, --ntasks-per-node, --cpus-per-task when making a SLURM request. Although there are many other equivalent ways of expressing this information, this is the simplest approach that is also explicit about the resources that will be allocated.
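
As a minimal sketch of the recommended triple, the following job script header requests 8 cores as 4 tasks of 2 cores each (the sizes are illustrative, not a recommendation; cpu2019 is one of the partitions mentioned above):

 #!/bin/bash
 # 1 node, 4 tasks on that node, 2 cores per task: 8 cores in total
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=4
 #SBATCH --cpus-per-task=2
 #SBATCH --partition=cpu2019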

CPU Memory

CPU memory refers to the high-speed memory that is available to be allocated by any processes that you run on a compute node as part of your job. It must be large enough to hold the instructions of the software that you are using, any data that will be read into the program during its operation, and any memory allocated for further computations while the software runs. For an image processing algorithm, for example, this would likely include the full image to be processed, the representation of the transformation being applied (say, a convolution kernel), the final processed image before it is written to storage, and some overhead associated with the software itself (e.g. a Python interpreter or the compiled C++ code for the process). Although memory needs can be estimated from first principles for some software (e.g. C code that you wrote yourself), it is usually necessary to determine them empirically by running jobs and applying statistics to the observed resource utilization.

Relevant SLURM parameters: --mem, --mem-per-cpu, --partition

N.B. Only specify one of --mem or --mem-per-cpu.
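
As an illustrative sketch, this header requests 16 GB of memory for the whole job (the amount is a placeholder; substitute your own measured requirements):

 #!/bin/bash
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=1
 #SBATCH --cpus-per-task=4
 # Total memory for the job; use --mem-per-cpu instead if you prefer, but not both
 #SBATCH --mem=16G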

Wall Time

Wall time is the maximum real-world (wall clock) time, not CPU time, for which your job can run. The clock starts when the job reaches the front of the queue and is assigned to run on specific hardware. During this time window, you are guaranteed exclusive access to the hardware that you requested. If the end of the window is reached before your script completes, the job will be killed by the job scheduler.

SLURM parameter: --time=dd-hh:mm:ss

N.B. The days field can be harmlessly left off, and any field can take a value of 0. Thus, two days could be written as --time=48:00:00 or, equivalently, --time=2-00:00:00.
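
For example, a header line requesting a two-day window using the full dd-hh:mm:ss format:

 # Two days of wall time: days-hours:minutes:seconds
 #SBATCH --time=2-00:00:00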

GPUs

A general-purpose GPU consists of a parallel processing resource that can accelerate special classes of calculations (e.g. linear algebra or ray tracing) and an associated collection of GPU memory. Unlike with CPUs, the full memory associated with a GPU is always reserved along with access to the GPU itself. As with CPU cores, a GPU merely being present on a computer when code runs is not enough to utilize it: your software must make explicit use of the GPU. This generally means writing code in CUDA or using software that links to a CUDA toolkit (which must be loaded via a module or from a conda virtual environment at runtime).

SLURM parameters: --gres=gpu:N (where N is the number of GPUs requested per node), --partition
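
As a hedged sketch, a job script requesting a single GPU might look like the following; the partition name gpu-partition and the program name are hypothetical placeholders, and the exact CUDA module name varies by cluster (see the relevant Cluster Guide):

 #!/bin/bash
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=1
 #SBATCH --cpus-per-task=1
 # One GPU per node; its full memory is reserved along with it
 #SBATCH --gres=gpu:1
 # Placeholder partition name: pick a GPU partition from the cluster guide
 #SBATCH --partition=gpu-partition

 # Load a CUDA toolkit at runtime (module name is an assumption)
 module load cuda
 ./my_gpu_program   # hypothetical CUDA-enabled executable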