How to check states of ARC partitions and nodes
Background
Node states
To see the current states of the nodes in ARC's partitions, you can use the arc.nodes command:
$ arc.nodes

Partitions: 16 (bigmem, cpu2017-bf05, cpu2019, cpu2019-bf05, cpu2021, cpu2021-bf24, cpu2022, cpu2022-bf24, cpu2023, disa, gpu-a100, gpu-v100, ood-vis, parallel, single, wdf-altis)

==========================================================================
               | Total   Allocated   Down   Drained   Idle   Maint   Mixed
--------------------------------------------------------------------------
bigmem         |     5           1      0         0      3       0       1
cpu2017-bf05   |    41           7      1         0     11       0      22
cpu2019        |    40          13      0         0      0       0      27
cpu2019-bf05   |    87          12      3         1     70       0       1
cpu2021        |    17           3      0         0      0       0      14
cpu2021-bf24   |    28           0      0         0     27       0       1
cpu2022        |    52           5      0         5      0       0      42
cpu2022-bf24   |    16           0      0         2     13       0       1
cpu2023        |    20           1      1         0      0       0      18
disa           |     1           0      0         0      1       0       0
gpu-a100       |     5           2      0         0      1       0       2
gpu-v100       |     7           0      2         2      1       0       2
ood-vis        |     1           0      0         0      1       0       0
parallel       |   576         129     92       108      0       4     243
single         |    14           0      1         1     10       0       2
wdf-altis      |    12           0      0         0     12       0       0
--------------------------------------------------------------------------
logical total  |   922         173    100       119    150       4     376
               |
physical total |   922         173    100       119    150       4     376
- The first column on the left shows the partition names. Each row presents information about one partition.
- The second column, Total, indicates the total number of nodes in that partition.
- The remaining columns show node states. The table is built dynamically, so it only shows the states that are currently present on ARC.
- These are standard SLURM states, and their descriptions can be found in the SLURM documentation:
- https://slurm.schedmd.com/sinfo.html#SECTION_NODE-STATE-CODES
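The idea of a dynamically built table can be sketched in Python. This is only an illustration and not the actual implementation of arc.nodes: the sample data, the function names tally_states and format_table, and the suggestion that similar input could be obtained from `sinfo -h -N -o "%P %T"` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical "partition state" pairs, one line per node. On a SLURM cluster,
# similar lines could come from `sinfo -h -N -o "%P %T"` (an assumption for
# illustration; this is not how arc.nodes is known to work).
SAMPLE = """\
bigmem allocated
bigmem idle
bigmem idle
bigmem mixed
gpu-v100 down
gpu-v100 drained
gpu-v100 mixed
"""


def tally_states(text):
    """Count nodes per partition and per state."""
    counts = defaultdict(Counter)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        partition, state = line.split()
        counts[partition][state] += 1
    return counts


def format_table(counts):
    """Render a table whose columns are only the states that actually occur."""
    states = sorted({s for c in counts.values() for s in c})
    header = f"{'':<12} | {'total':>5}" + "".join(f" {s:>9}" for s in states)
    rows = [header]
    for partition in sorted(counts):
        c = counts[partition]
        row = f"{partition:<12} | {sum(c.values()):>5}"
        row += "".join(f" {c[s]:>9}" for s in states)  # missing states count as 0
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(format_table(tally_states(SAMPLE)))
```

Because no node in the sample is in the maint state, the resulting table has no maint column, which mirrors how a state column is absent when no node is currently in that state.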
The last two lines of the table show the logical total and the physical total.
Sometimes, the same node is assigned to more than one partition in the cluster.
Such a node is counted once in every partition it belongs to.
In this case, the logical count overcounts the actual number of nodes in the cluster.
The physical count corrects for this by counting each node only once, so it reports the actual number of nodes in the cluster.
Thus, if the two numbers are the same, each reported node is assigned to exactly one partition.
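The difference between the two totals can be illustrated with a small Python sketch. The node names and partition memberships below are made up for the example and do not describe ARC's real layout; the function names are likewise hypothetical.

```python
# Hypothetical node-to-partition membership (not ARC's real configuration).
MEMBERSHIP = {
    "node001": ["cpu2019"],
    "node002": ["cpu2019", "single"],  # this node is shared by two partitions
    "node003": ["single"],
}


def logical_total(membership):
    """Count each node once per partition it belongs to."""
    return sum(len(partitions) for partitions in membership.values())


def physical_total(membership):
    """Count each node exactly once, regardless of partition membership."""
    return len(membership)


if __name__ == "__main__":
    print("logical total: ", logical_total(MEMBERSHIP))   # 4
    print("physical total:", physical_total(MEMBERSHIP))  # 3
```

Here node002 is counted twice in the logical total, so the logical total (4) exceeds the physical total (3). If every node belonged to exactly one partition, the two totals would be equal.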