General Cluster Guidelines and Policies

From RCSWiki

Revision as of 21:26, 23 July 2020

General Rules

  • Never run anything related to your research on the Login Node.
  • Make sure that the resources you request for a job are actually used by the job.
    When a job script requests resources from SLURM, those resources are reserved for the job and are not allocated to other users. Jobs that do not make full use of their allocated resources reduce overall cluster efficiency. It is essential that users make resource requests that fit the requirements of their jobs so that resources are properly used.
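
One way to judge whether a request fits a job is to compare the CPU time the job consumed with the CPU time that was reserved for it. The sketch below is only an illustration: the function name `cpu_efficiency` and its parameters are hypothetical, and the input numbers would have to come from the cluster's job accounting records.

```python
def cpu_efficiency(cpu_seconds_used: float,
                   cores_requested: int,
                   elapsed_seconds: float) -> float:
    """Return the fraction of reserved CPU time that a job consumed.

    Reserved CPU time is (cores requested) x (wall-clock run time).
    All three inputs are assumed to come from job accounting data.
    """
    if cores_requested <= 0 or elapsed_seconds <= 0:
        raise ValueError("cores_requested and elapsed_seconds must be positive")
    reserved_cpu_seconds = cores_requested * elapsed_seconds
    return cpu_seconds_used / reserved_cpu_seconds


# A job that reserved 8 cores for one hour but kept only one core busy
# consumed 3600 of the 28800 reserved CPU-seconds.
print(cpu_efficiency(cpu_seconds_used=3600, cores_requested=8, elapsed_seconds=3600))
```

A value far below 1.0 suggests that the job requested more cores than its code can use.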

Guidelines

Please review the guidelines set out below when using our cluster.

Login Node

The login node should be used only for:

  • Data management: file management, compression / decompression, and, possibly, data transfer.
  • Job management: job script creation / submission / monitoring.
  • Software development: Source editing / compilation.
  • Short data analysis computations that take 100% of 1 CPU for up to 15 minutes.

Everything else should be run on compute nodes, either via the sbatch command or in an interactive job via the salloc command. These restrictions ensure that the login node remains available to other users and is not unnecessarily overburdened.
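
As a rough sketch of what a batch job submitted with sbatch can look like, the helper below assembles a minimal job script as text. The helper name `build_job_script`, its default values, and the example command are hypothetical and are not taken from this page; `--cpus-per-task`, `--mem`, and `--time` are standard sbatch options, but appropriate values depend on the job.

```python
def build_job_script(command: str,
                     cpus: int = 1,
                     mem_gb: int = 4,
                     time_limit: str = "01:00:00") -> str:
    """Return the text of a minimal Slurm batch script (illustrative only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --cpus-per-task={cpus}",  # CPU cores reserved for the job
        f"#SBATCH --mem={mem_gb}G",         # memory reserved for the job
        f"#SBATCH --time={time_limit}",     # wall-clock limit
        "",
        command,                            # the actual computation
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: a 4-core analysis that runs on a compute node.
print(build_job_script("python analysis.py", cpus=4, mem_gb=16))
```

The resulting text would be saved to a file and submitted with sbatch, so that the computation runs on a compute node rather than on the login node.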

Interactive Jobs

Interactive jobs can be started using the salloc command and are limited to a maximum of 5 hours.

The reasons for the time restriction on interactive jobs are:

  • If an interactive job asks for more than 5 hours of run time, it is hardly interactive. Who can stare at the screen for more than 5 hours straight?
  • Interactive jobs tend to waste resources, because the job does not finish when the computation is done but keeps running until it times out.
  • The partition setup allows for much quicker resource allocation for jobs that are 5 hours or less, so it is significantly easier to get resources in the default partitions for shorter jobs.
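
To check a planned time request against the 5-hour limit before calling salloc, the request can be converted to seconds. The sketch below assumes the time formats accepted by Slurm's `--time` option (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds); the function names are hypothetical.

```python
INTERACTIVE_LIMIT_SECONDS = 5 * 60 * 60  # 5-hour limit stated in these guidelines


def slurm_time_to_seconds(spec: str) -> int:
    """Convert a Slurm-style time string to seconds."""
    days = 0
    if "-" in spec:
        # days-hours, days-hours:minutes, or days-hours:minutes:seconds
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognized time format: {spec!r}")
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:      # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognized time format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def fits_interactive_limit(spec: str) -> bool:
    """True if the requested time is within the 5-hour interactive limit."""
    return slurm_time_to_seconds(spec) <= INTERACTIVE_LIMIT_SECONDS


print(fits_interactive_limit("05:00:00"))  # exactly 5 hours
print(fits_interactive_limit("0-6"))       # 6 hours
```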

Bigmem Partition

The bigmem partitions can be used for general shorter jobs, but they are intended for computations that need lots of memory.

Please avoid running low-memory computations on the bigmem partitions.

gpu-v100 Partition

The gpu-v100 partition is strictly for computations that utilize GPUs.

Please do not run CPU-only computations on the gpu-v100 partition.