How to find available partitions on ARC

ARC is a relatively large and very heterogeneous cluster. It has many compute nodes, and some of these nodes differ substantially in their hardware specifications and performance capabilities. On ARC, nodes with similar specifications are grouped into SLURM partitions.

To use ARC effectively and to its full potential, researchers need to be able to find the partitions available to them and to see the important features of the nodes in those partitions.

For this purpose, ARC provides a dedicated command, arc.hardware:

$ arc.hardware

Node specifications per partition:
      ================================================================================
           Partition |   Node    CPUs    Memory        GPUs  Node list
                     |  count   /node      (MB)       /node  
      --------------------------------------------------------------------------------
              bigmem |      2      80   3000000              fm[1-2]
                     |      1      40   4127000      a100:4  mmg1
                     |      1      40   8256000      a100:2  mmg2
             cpu2013 |     14      16    120000              h[1-14]
        cpu2017-bf05 |     16      28    245000              s[1-16]
                     |     20      28    188000              th[1-20]
             cpu2019 |     40      40    185000              fc[22-61]
        cpu2019-bf05 |     87      40    185000              fc[1-21,62-127]
             cpu2021 |     17      48    185000              mc[1-11,14-19]
        cpu2021-bf05 |     21      48    185000              mc[23-43]
        cpu2021-bf24 |      7      48    381000              mc[49-55]
             cpu2022 |     52      52    256000              mc[73-124]
        cpu2022-bf24 |     16      52    256000              mc[57-72]
            gpu-a100 |      6      40    515000      a100:2  mg[1-6]
            gpu-v100 |     13      40    753000      v100:2  fg[1-13]
             lattice |    196       8     11800              cn[169-364]
            parallel |    572      12     23000              cn[0513-0544,0557-1096]
                     |      4      12     23000     m2070:2  cn[0553-0556]
              single |    168       8     11800              cn[001-168]
      ================================================================================

The output table lists the partitions available to the specific user running the command (the holder of the account).

The left column shows partition names, and the rest of the table shows information about the nodes in each partition. If a partition contains nodes with different hardware configurations, the specifications for those nodes are shown on additional lines without a partition name (see the bigmem partition, for example).
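
On a system where the arc.hardware command is not available, a similar, if less tidy, per-partition summary can be obtained with the standard SLURM sinfo command. This is only a sketch: the field widths and the choice of fields (partition, node count, CPUs per node, memory per node in MB, GRES, node list) are assumptions, not the exact format produced by arc.hardware:

$ sinfo --format="%15P %6D %5c %9m %12G %N"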

Example 1: cpu2019

The cpu2019 partition, for example, has 40 compute nodes (second column) in total.

Each node in that partition (out of those 40) has 40 CPUs and 185000 MB of RAM (about 180 GB of RAM). Note that this means these nodes have about 4.5 GB of RAM per CPU.

There are no GPUs in the nodes in this partition.

The last column shows the node name pattern: the names of the nodes in the partition go from fc22 to fc61.
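
To illustrate how this information translates into a job request, here is a minimal sketch of a SLURM job script that asks for one full cpu2019 node. The resource values follow the table above; the job script file name and the program being run are placeholders, not actual ARC software:

#!/bin/bash
# ====================================================
# cpu2019-job.slurm -- hypothetical example job script
# ====================================================
#SBATCH --partition=cpu2019
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40       # all 40 CPUs of one cpu2019 node
#SBATCH --mem=180G               # stays under the 185000 MB per-node limit
#SBATCH --time=01:00:00

./my_program                     # placeholder for the actual application

The script would be submitted with sbatch in the usual way:

$ sbatch cpu2019-job.slurm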

Example 2: gpu-v100

The gpu-v100 partition has 13 nodes in total.

Each of these nodes has 40 CPUs and 753000 MB of RAM (about 735 GB of RAM), which comes out to approximately 18 GB per CPU.

Each node also has two NVIDIA V100 GPUs, as shown in the "GPUs/node" column.

The name pattern shows that the names for the nodes go from fg1 to fg13.
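
A job targeting this partition has to request the GPUs explicitly. The following is a minimal sketch rather than an exact recipe: the file name and the program are placeholders, and whether the GRES type (v100) must be included in the request depends on the cluster configuration:

#!/bin/bash
# =====================================================
# gpu-v100-job.slurm -- hypothetical example job script
# =====================================================
#SBATCH --partition=gpu-v100
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40       # all 40 CPUs of one gpu-v100 node
#SBATCH --gres=gpu:2             # both V100 GPUs on the node (or gpu:v100:2)
#SBATCH --mem=700G               # stays under the 753000 MB per-node limit
#SBATCH --time=01:00:00

./my_gpu_program                 # placeholder for the actual GPU application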