How to run an interactive job on ARC
Latest revision as of 17:53, 27 September 2024
Background
Submitting and running jobs is the only way to run computations on ARC. Normally, a user creates a job script and submits it to SLURM on ARC with the sbatch command to run a computation. Jobs of this type are non-interactive batch jobs. This workflow can be difficult or unsuitable for work that requires interactive input from the user, such as applying data transformations, collecting and reviewing results from large computations, or testing new methods or software. For this kind of work, the best approach is to request an interactive job on ARC.
Please note that interactive jobs on ARC should be limited to 5 hours or less of requested run time.
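For comparison with the interactive workflow, a non-interactive batch job script could look like the sketch below. The resource values mirror the salloc examples later in this article, the file name job.slurm is hypothetical, and the computation itself is only a placeholder.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16gb
#SBATCH --time=1:00:00

# Hypothetical batch script; save it as e.g. job.slurm and submit with:
#   sbatch job.slurm
# SLURM sets SLURM_JOB_ID inside a job; the fallback covers a run outside SLURM.
msg="Job ${SLURM_JOB_ID:-none} running on $(uname -n)"
echo "$msg"

# Placeholder for the actual computation, which must not need user input.
echo "computation goes here"
```

Once submitted, the script runs without any further input, which is what distinguishes it from an interactive job.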
Why not the login node?
The login node is designed for users to interactively use ARC. However, this usage is intended for only management work, such as:
- data management,
- software management, and
- job management.
The login node can also be used for software development, for example when you write scripts or work on new code for your research. The login node, however, cannot be used to run anything that is CPU heavy and relates to research rather than management. See our guidelines and policies.
The login node also has a limit on memory (RAM) of 5GB for all the processes of a single user.
Sometimes this may be a limiting factor for interactive work.
In such cases, even if the work fits the login node's design purpose, one has to request an interactive job to complete the work.
Submitting an interactive job
To request an interactive job on ARC via SLURM, use the salloc command and specify the resources that your interactive work will need, similar to the resource request in a normal job script. SLURM will then try to allocate resources for your interactive job and, once they are available, transfer your command line session from ARC's login node to a new command line session on the allocated compute node.
In this interactive job you can run any command on the command line without interrupting any other user's work. The resources you are using are allocated to you only, and nobody else can use them.
- On ARC any interactive job should be limited to 5 hours or less. Please do not request more than 5 hours for your interactive jobs.
For example:
$ salloc -N1 -n1 -c8 --mem=16gb -t 1:00:00
This command requests 8 CPUs and 16 GB of RAM for 1 hour on one of the compute nodes in the default partitions.
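Once the session has moved to the compute node, it can be useful to confirm what was actually allocated. The sketch below prints a few environment variables that SLURM normally sets inside a job. The function name show_allocation is made up for this example, and which variables are present may depend on the options given to salloc (SLURM_CPUS_PER_TASK, for instance, is expected only when -c is used).

```shell
# Print the resources SLURM reports for the current session.
# Outside a job (e.g. on the login node) SLURM_JOB_ID is not set.
show_allocation() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a SLURM job"
        return 0
    fi
    echo "Job ID:        ${SLURM_JOB_ID}"
    echo "Node(s):       ${SLURM_JOB_NODELIST:-unknown}"
    echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-unknown}"
    echo "Memory (MB):   ${SLURM_MEM_PER_NODE:-unknown}"
}

show_allocation
```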
For a partition that is not a default partition, such as the bigmem partition or the GPU partitions, you have to specify it on the command line:
$ salloc -N1 -n1 -c20 --mem=600gb -t 1:00:00 -p bigmem
This command requests 20 CPUs and 600 GB of RAM for 1 hour on a node from the bigmem partition.
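The two requests above differ only in their resource values and the optional -p flag. As an illustration, a small helper can assemble the command line from those values. The function salloc_cmd is hypothetical and only prints the command rather than running it, so it can be tried safely anywhere.

```shell
# Hypothetical helper: print (not run) a salloc command for a request.
# Usage: salloc_cmd CPUS MEM WALLTIME [PARTITION]
salloc_cmd() {
    local cpus=$1
    local mem=$2
    local walltime=$3
    local partition=${4:-}
    local cmd="salloc -N1 -n1 -c${cpus} --mem=${mem} -t ${walltime}"
    # Add -p only when a non-default partition is requested.
    if [ -n "$partition" ]; then
        cmd="$cmd -p ${partition}"
    fi
    echo "$cmd"
}

salloc_cmd 8 16gb 1:00:00           # default partitions
salloc_cmd 20 600gb 1:00:00 bigmem  # bigmem partition
```

The printed lines match the two example commands shown above.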
For GPU partitions and jobs, please see the separate How-To article, "How to request an interactive GPU on ARC".
Using the "single" partition
The single partition does not offer much computing power, but it is designed to be almost always available for interactive work.
==========================================================================================
Partition    |  Nodes  CPUs   CPUs   Memory (MB)  GPUs   Node list
             |  count  total  /node  /node        /node
------------------------------------------------------------------------------------------
single       |  14     224    16     120000              h[1-14]
The nodes there have 16 CPUs and about 117GB of RAM each. There are 14 such nodes in the partition.
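The "about 117GB" figure is consistent with dividing the 120000 MB from the table by 1024. That this is how the number was derived is an assumption; the arithmetic itself can be checked in the shell:

```shell
# Memory per node from the partition table, in MB.
mem_mb=120000
# Integer division by 1024 gives the whole number of GB.
mem_gb=$((mem_mb / 1024))
echo "${mem_gb} GB per node"
```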
Jobs are limited to 4 CPUs and 32 GB of RAM each, and a user can run up to 2 such jobs at the same time. (You can run more jobs, but the total limit on resources is 8 CPUs and 64 GB of RAM per user.) Because of these limits, it is difficult to keep all the nodes busy with work, which is why there are almost always some nodes available for interactive work.
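As a quick sanity check before submitting, a request can be compared against the per-job limits quoted above. This is only a sketch: the helper fits_single is hypothetical, it takes the total CPU count and the memory in whole GB, and the scheduler remains the authority on what is accepted.

```shell
# Per-job limits for the single partition, as stated in the text.
MAX_CPUS=4
MAX_MEM_GB=32

# Hypothetical helper: succeed if CPUS and MEM_GB fit within one job.
fits_single() {
    local cpus=$1
    local mem_gb=$2
    [ "$cpus" -le "$MAX_CPUS" ] && [ "$mem_gb" -le "$MAX_MEM_GB" ]
}

if fits_single 4 28; then
    echo "4 CPUs, 28 GB: fits"
fi
if ! fits_single 8 28; then
    echo "8 CPUs, 28 GB: exceeds the per-job limit"
fi
```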
To request an interactive job on the single partition, you can use this command:
$ salloc -N1 -n4 -c1 --mem=28GB -t 5:00:00 -p single
This gives you interactive resources for 5 hours. Please do not forget to end the job when you are done, either with the <Ctrl-D> keystroke or with the exit command.
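After leaving the session, you may want to confirm that no job is still holding resources. A minimal sketch, assuming the standard SLURM squeue and scancel commands are available on the login node:

```shell
# Determine the current user name; fall back to id if USER is unset.
me=${USER:-$(id -un)}

# List your own jobs. Only the header line means nothing is still running.
# The guard keeps the snippet harmless on machines without SLURM.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$me"
else
    echo "squeue not found: this does not look like a SLURM cluster"
fi

# To cancel a job that is still listed, replace JOBID with its number:
#   scancel JOBID
```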
Downsides to interactive jobs
However, one has to remember that interactive jobs have some downsides:
- Resource utilization is typically quite low when compared with non-interactive jobs.
- The resources are not used while you are reading the outputs, typing new commands, or thinking about what to do next. This is inefficient.
- There is no fixed amount of work after which the job finishes and the resources are freed for other users.
- The only way an interactive job can end is if it runs out of time, or you manually end it.
- Typically, you can only interact with one job at a time.
- In contrast, you can run multiple non-interactive jobs when running normal batch jobs.
- If the resources you want for your interactive work are not available, you may be waiting for a long time before your interactive job starts.
For these reasons, interactive jobs on ARC are limited to at most 5 hours of run time.
Please remember that interactive jobs are not intended to be the primary way of using the ARC cluster.