How to find current limits on ARC
Revision as of 22:20, 21 December 2022
Checking the current limits on ARC is recommended before planning a new series of computations, especially if GPUs are required. The current limits can be shown by the arc.limits command:
 $ arc.limits
    PartitionName        Flags       MaxTRES       MaxWall     MaxTRESPU        MaxSubmitPU  MinTRES
 0         normal                                  7-00:00:00                          4000
 1        cpu2019               cpu=240            7-00:00:00  cpu=240                 4000
 2       gpu-v100  DenyOnLimit  cpu=80,gpu=4      1-00:00:00  cpu=160,gpu=8           4000    gpu=1
 3         single               cpu=200            7-00:00:00  cpu=200,node=30         4000
 4           razi                                  7-00:00:00                          4000
 5        apophis                                  7-00:00:00                          4000
 6        razi-bf               cpu=546              05:00:00  cpu=546                 4000
 7     apophis-bf               cpu=280              05:00:00  cpu=280                 4000
 8        lattice               cpu=408            7-00:00:00  cpu=408                 4000
 9       parallel               cpu=624            7-00:00:00  cpu=624                 4000
10         bigmem               cpu=80             1-00:00:00  cpu=80,gpu=1            4000
11        cpu2013                                  7-00:00:00                          4000
12         pawson                                  7-00:00:00                          4000
13      pawson-bf               cpu=480              05:00:00  cpu=480                 4000
14          theia                                  7-00:00:00                          4000
15       theia-bf               cpu=280              05:00:00                          4000
16           demo                                  7-00:00:00                          4000
17        synergy                                  7-00:00:00                          4000
18     synergy-bf               cpu=448              05:00:00  cpu=448                 4000
19     backfill05               cpu=1000             05:00:00  cpu=1000                4000
20        cpu2021               cpu=576            7-00:00:00  cpu=576                 4000
21     backfill24               cpu=208            1-00:00:00  cpu=208                 4000
22       sherlock                                  7-00:00:00                          4000
23       wdf-zach                                  7-00:00:00                          4000
24      wdf-think                                  7-00:00:00                          4000
25           mtst                                  7-00:00:00
26        cpu2022               cpu=520            7-00:00:00  cpu=520                 4000
27       gpu-a100  DenyOnLimit  cpu=80             1-00:00:00  cpu=160

 TRES = Trackable RESources;  PU = Per User
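The arc.limits command is a convenience tool on ARC. On SLURM clusters without it, much of the same information can be pulled from standard SLURM commands; the sketch below assumes the gpu-v100 partition name from the table above:

```shell
# Show the configuration of one partition, including MaxTime,
# MaxNodes, and the TRES limits (standard SLURM command)
scontrol show partition gpu-v100

# Per-user limits such as MaxTRESPU and MaxSubmitPU are typically
# attached to a QOS; list all QOS definitions with sacctmgr
sacctmgr show qos
```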
The table lists the cluster's partitions and the limits set for each of them.
- Flags -- settings that determine how SLURM treats a job whose resource request exceeds a limit. On partitions flagged DenyOnLimit, such jobs are rejected at submission rather than left pending.
- MaxTRES -- maximum trackable resources, the largest amount of resources a single job may request on this partition.
- MaxWall -- maximum wall time, the longest a job is allowed to run on this partition.
- MaxTRESPU -- maximum trackable resources per user, the total amount of resources all of a single user's jobs may hold on the partition at once. Once this limit is reached, further resource requests wait in the queue until currently running jobs free some resources.
- MaxSubmitPU -- maximum number of jobs a single user may have submitted to this partition at any time. SLURM rejects any jobs above this limit. Please note that there is also a global limit of 4000 jobs per user for the entire cluster.
- MinTRES -- minimum amount of resources a job must request to be accepted. Relevant on GPU partitions, where jobs are expected to request at least one GPU to qualify to run there.
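To see how close you are to the per-user submission limit (MaxSubmitPU), a quick check with standard SLURM tools might look like this; the 4000-job figure comes from the table above:

```shell
# Count all of your queued jobs (pending + running) across the cluster;
# -h suppresses the header line so wc -l counts only jobs.
# Compare the result against the 4000-job MaxSubmitPU limit.
squeue -u "$USER" -h | wc -l

# Restrict the count to a single partition, e.g. gpu-v100:
squeue -u "$USER" -p gpu-v100 -h | wc -l
```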
For example, the gpu-v100 partition is limited to 4000 submitted jobs per user, and the maximum run time is 24 hours. A job must request at least 1 GPU but no more than 4 GPUs, and the total number of GPUs working for a single user's jobs at once is limited to 8. Because the partition carries the DenyOnLimit flag, a job whose resource request is over a per-job limit is rejected at submission.
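A job script that stays within the gpu-v100 limits described above might look like the following sketch; the job name and workload are placeholders, while the partition, GPU count, and time limit come from the limits table:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-example   # placeholder job name
#SBATCH --partition=gpu-v100
#SBATCH --gres=gpu:1             # MinTRES requires at least 1 GPU; MaxTRES allows up to 4
#SBATCH --time=24:00:00          # MaxWall on gpu-v100 is 1-00:00:00 (24 hours)

# Placeholder workload: report the GPUs allocated to this job
nvidia-smi
```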