General Cluster Guidelines and Policies: Difference between revisions

From RCSWiki

Revision as of 20:20, 21 July 2020

General Rules

  • Never run anything related to your research on the Login Node.
The login node is for:
=> Data management: file management, compression / decompression and, possibly, data transfer.
=> Job management: job script creation / submission / monitoring.
=> Software development: Source editing / compilation.
=> Short data analysis computations that use at most one CPU (100% of 1 CPU) for up to 15 minutes.
Everything else should be run on compute nodes, either via the sbatch command or in an interactive job via the salloc command.
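
As a rough illustration of moving work off the login node, the sketch below writes a minimal job script and notes how it might be submitted. The file name, the program name my_analysis, the input file and all resource values are hypothetical placeholders, not site recommendations.

```shell
# Create a minimal job script on the login node (job script creation is allowed there).
# All resource values, the file name and the program name are illustrative placeholders.
cat > example_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=02:00:00

# The actual computation runs on a compute node, not on the login node.
./my_analysis input.dat
EOF

# Submit it to the scheduler (run this on the cluster):
#   sbatch example_job.slurm
# Or request an interactive session with similar resources:
#   salloc --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=4G --time=02:00:00
```

Only the script creation happens on the login node here; the computation itself starts once the scheduler places the job on a compute node.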


  • Make sure that the resources you request for a job are actually used by the job.
When a job script requests resources from SLURM, they are allocated to the job, but that does not mean that the code run as part of the job knows how to use them. It is essential that the user verifies that the requested resources are properly used.
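
One common way requested CPUs go unused is a multithreaded program that is given several CPUs but starts only one thread. The lines below are a minimal sketch of guarding against that inside a job script, assuming the program is OpenMP-based and therefore reads OMP_NUM_THREADS; other codes take their thread or process count in other ways.

```shell
# Inside a job script: match the thread count to the CPUs SLURM allocated.
# SLURM_CPUS_PER_TASK is set by SLURM when --cpus-per-task is requested;
# fall back to 1 when it is unset (for example, outside a job).
# Assumption: the program honours OMP_NUM_THREADS (OpenMP codes do).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# After the job ends, compare what was requested with what was used, e.g.:
#   sacct -j <jobid> --format=JobID,AllocCPUS,Elapsed,TotalCPU,ReqMem,MaxRSS
```

If TotalCPU is far below Elapsed multiplied by AllocCPUS, or MaxRSS is far below ReqMem, the job probably requested more than it used.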

Guidelines

  • The bigmem partition, while it can be used for general shorter jobs, is intended for computations that need lots of memory.
Please avoid running low-memory computations on the bigmem partition.


  • The gpu-v100 partition is strictly for computations that utilize GPUs.
Please do not run CPU-only computations on the gpu-v100 partition.
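
A job aimed at the gpu-v100 partition would normally request a GPU explicitly. The sketch below assumes the GPU generic resource is named gpu, which is common but site-specific; the file name, the program name my_gpu_program and the resource values are hypothetical placeholders.

```shell
# Write a job script that targets the gpu-v100 partition and asks for one GPU.
# Assumption: the GPU resource is exposed under the name "gpu" (site-specific).
cat > example_gpu_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu-v100
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=04:00:00

# Record which GPU the job received before starting the computation.
nvidia-smi
./my_gpu_program
EOF

# Submit on the cluster with:
#   sbatch example_gpu_job.slurm
```

A job script without a GPU request would be a sign that the computation belongs on a CPU partition instead.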


  • Interactive jobs should be limited to less than 5 hours.
=> If an interactive job asks for more than 5 hours of run time, it is hardly interactive. Who can stare at the screen for more than 5 hours straight?
=> Interactive jobs tend to waste resources, as the job does not finish when the computation is done but keeps running until it times out.
=> The partition setup allows for much quicker resource allocation for jobs that are 5 hours or less, so it is significantly easier to get resources in the default partitions for shorter jobs.
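
The 5-hour guideline can be checked before an interactive request is made. The helper below is a small sketch, not a site-provided tool: the function name within_interactive_limit is hypothetical, it treats "5 hours or less" as acceptable, and it only understands the plain HH:MM:SS form, although SLURM accepts other time formats.

```shell
# Return success if a HH:MM:SS time limit is at most 5 hours.
# Assumes the plain HH:MM:SS form only; other SLURM time formats are not parsed.
within_interactive_limit() {
    # Split HH:MM:SS into its three fields using parameter expansion.
    h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
    # Strip one leading zero so values such as "08" are not read as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    total=$(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
    [ "$total" -le $(( 5 * 3600 )) ]
}

limit="04:00:00"
if within_interactive_limit "$limit"; then
    echo "OK: request an interactive job with: salloc --time=$limit"
else
    echo "Too long for an interactive job; consider sbatch instead"
fi
```

Exiting the interactive shell as soon as the work is done releases the allocation instead of leaving it idle until the time limit.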