How to check states of ARC partitions and nodes

From RCSWiki

Background

Node states

To see the current states of the nodes in ARC's partitions, use the arc.nodes command:

$ arc.nodes

          Partitions: 16 (bigmem, cpu2017-bf05, cpu2019, cpu2019-bf05, cpu2021, cpu2021-bf24, cpu2022, 
                          cpu2022-bf24, cpu2023, disa, gpu-a100, gpu-v100, ood-vis, parallel, single, wdf-altis)

      ==========================================================================
                     | Total  Allocated  Down  Drained  Idle  Maint  Mixed
      --------------------------------------------------------------------------
              bigmem |     5          1     0        0     3      0      1 
        cpu2017-bf05 |    41          7     1        0    11      0     22 
             cpu2019 |    40         13     0        0     0      0     27 
        cpu2019-bf05 |    87         12     3        1    70      0      1 
             cpu2021 |    17          3     0        0     0      0     14 
        cpu2021-bf24 |    28          0     0        0    27      0      1 
             cpu2022 |    52          5     0        5     0      0     42 
        cpu2022-bf24 |    16          0     0        2    13      0      1 
             cpu2023 |    20          1     1        0     0      0     18 
                disa |     1          0     0        0     1      0      0 
            gpu-a100 |     5          2     0        0     1      0      2 
            gpu-v100 |     7          0     2        2     1      0      2 
             ood-vis |     1          0     0        0     1      0      0 
            parallel |   576        129    92      108     0      4    243 
              single |    14          0     1        1    10      0      2 
           wdf-altis |    12          0     0        0    12      0      0 
      --------------------------------------------------------------------------
       logical total |   922        173   100      119   150      4    376 
                     |
      physical total |   922        173   100      119   150      4    376 

  • The first column on the left shows the partition name. Each row presents information about that partition.
  • The second column, Total, shows the total number of nodes in that partition.
  • The remaining columns show node states. The table is built dynamically, so it shows only the states that are currently present in ARC.
These are standard SLURM states; their descriptions can be found in the SLURM documentation:
https://slurm.schedmd.com/sinfo.html#SECTION_NODE-STATE-CODES
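
If the table needs to be processed by a script, its text can be parsed into a dictionary. The following is a minimal Python sketch, assuming the output keeps the layout shown above (a header row of column names after a "|", then one row per partition). The function name parse_arc_nodes and the sample variable are illustrative and are not part of arc.nodes itself.

```python
def parse_arc_nodes(text):
    """Parse arc.nodes table text into {row name: {column name: count}}.

    Assumes the layout shown on this page: every data row contains a "|",
    and the first such row carries the column names.
    """
    header = None
    rows = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # partition list, blank lines, and ===/--- rulers
        name, _, rest = line.partition("|")
        fields = rest.split()
        if not fields:
            continue  # the empty "|" spacer between the two total rows
        if header is None:
            header = fields  # e.g. Total, Allocated, Down, ...
            continue
        rows[name.strip()] = dict(zip(header, map(int, fields)))
    return rows


# A few rows copied from the example output above.
sample = """\
                     | Total  Allocated  Down  Drained  Idle  Maint  Mixed
      --------------------------------------------------------------------------
              bigmem |     5          1     0        0     3      0      1
            gpu-a100 |     5          2     0        0     1      0      2
            gpu-v100 |     7          0     2        2     1      0      2
"""

table = parse_arc_nodes(sample)

# Partitions that currently have at least one idle node.
idle_partitions = [name for name, counts in table.items() if counts.get("Idle", 0) > 0]
print(idle_partitions)
```

Because the state columns are built dynamically, the sketch reads the column names from the header row instead of hard-coding them, and uses counts.get("Idle", 0) in case a state column is absent.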


At the bottom of the table there are two summary lines: the logical total and the physical total. A node can be assigned to more than one partition in the cluster. In that case it is counted once in each partition it belongs to, so the logical total overcounts the actual number of nodes in the cluster. The physical total corrects for this and reports the actual number of distinct nodes. If the two totals are the same, as in the output above, each reported node is assigned to only one partition.
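
The difference between the two totals can be illustrated with a small Python sketch. The node and partition names below are hypothetical and chosen only for illustration; they do not describe ARC's actual configuration, and this is not how arc.nodes is implemented.

```python
# Hypothetical assignment of nodes to partitions (illustrative names only).
# node2 belongs to two partitions, so it is counted twice logically.
node_partitions = {
    "node1": ["partA"],
    "node2": ["partA", "partB"],
    "node3": ["partB"],
}


def logical_total(mapping):
    """Count each node once for every partition it is assigned to."""
    return sum(len(partitions) for partitions in mapping.values())


def physical_total(mapping):
    """Count each distinct node exactly once."""
    return len(mapping)


logical = logical_total(node_partitions)
physical = physical_total(node_partitions)

# The totals differ only when some node sits in more than one partition.
has_shared_nodes = logical != physical
print(logical, physical, has_shared_nodes)
```

With this mapping the logical total is 4 and the physical total is 3, because node2 is shared between two partitions.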

Links

How-Tos