General Cluster Guidelines and Policies
Revision as of 20:20, 21 July 2020
General Rules
- Never run anything related to your research on the Login Node.
- The login node is for:
- => Data management, that is file management, compression / decompression, and, possibly, data transfer.
- => Job management: job script creation / submission / monitoring.
- => Software development: Source editing / compilation.
- => Short data analysis computations that use at most one full CPU (100% of 1 CPU) for up to 15 minutes.
- Everything else should be run on compute nodes, either via the sbatch command or in an interactive job via the salloc command.
- You have to make sure that the resources you request for the job are used by the job.
- When a job script requests resources from SLURM, they are allocated to the job, but that does not mean the code run as part of the job knows how to use them. It is essential that the user verifies that the requested resources are actually used.
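One way to check whether a finished job used the CPUs it requested is to compare the CPU time it consumed with the CPU time it was allocated. The sketch below is only an illustration: it assumes the three inputs were copied by hand from the Elapsed, TotalCPU and AllocCPUS fields that SLURM accounting reports, and the function names are hypothetical helpers, not part of SLURM.

```python
def slurm_time_to_seconds(text: str) -> float:
    """Convert a SLURM duration such as '1-02:03:04', '02:03:04' or '03:04.500' to seconds."""
    days = 0
    if "-" in text:
        # Optional leading 'days-' component.
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields (for example 'MM:SS' has no hours).
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Fraction of the allocated CPU time that the job actually consumed."""
    allocated = slurm_time_to_seconds(elapsed) * alloc_cpus
    if allocated == 0:
        return 0.0
    return slurm_time_to_seconds(total_cpu) / allocated


# Hypothetical example: a serial program that was given 8 CPUs for one hour.
print(f"{cpu_efficiency('01:00:00', '01:00:00', 8):.1%}")
```

A value far below 100% suggests that the code did not use most of the CPUs that were requested, so the next job should either request fewer CPUs or run the code in a way that uses them.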
Guidelines
- The bigmem partition, while it can be used for general shorter jobs, is intended for computations that need a lot of memory.
- Please avoid running low memory computations on the bigmem partition.
- The gpu-v100 partition is strictly for computations that utilize GPUs.
- Please do not run CPU-only computations on the gpu-v100 partition.
- Interactive jobs should be limited to less than 5 hours.
- => If an interactive job asks for more than 5 hours of run time, it is hardly interactive. Who can stare at the screen for more than 5 hours straight?
- => Interactive jobs tend to waste resources, as the job does not finish when the computation is done, but keeps running until it times out.
- => The partition setup allows for much quicker resource allocation for jobs that are 5 hours or less, so it is significantly easier to get resources in the default partitions for shorter jobs.
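The guidelines above can be read as a short checklist to run through before submitting a job. The following sketch encodes them as a hypothetical pre-submission check. The partition names and the 5 hour limit come from this page; the JobRequest class, the function name and the LOW_MEMORY_GB threshold are assumptions made for illustration, since the page does not define what counts as "low memory".

```python
from dataclasses import dataclass
from typing import List

INTERACTIVE_LIMIT_HOURS = 5.0  # taken from the interactive-job guideline
LOW_MEMORY_GB = 64.0           # assumed threshold; not defined on this page


@dataclass
class JobRequest:
    """Hypothetical description of what a job is about to request."""
    partition: str
    hours: float
    memory_gb: float
    gpus: int = 0
    interactive: bool = False


def guideline_warnings(job: JobRequest) -> List[str]:
    """Return one warning for each guideline the request appears to violate."""
    warnings = []
    if job.partition == "gpu-v100" and job.gpus == 0:
        warnings.append("gpu-v100 is for GPU computations; no GPU was requested")
    if job.partition == "bigmem" and job.memory_gb < LOW_MEMORY_GB:
        warnings.append("bigmem is intended for computations that need a lot of memory")
    # The page says both "less than 5 hours" and "5 hours or less";
    # this sketch only flags requests above 5 hours.
    if job.interactive and job.hours > INTERACTIVE_LIMIT_HOURS:
        warnings.append("interactive jobs should not run longer than 5 hours")
    return warnings


# Hypothetical request: a 6 hour interactive session on gpu-v100 without a GPU.
request = JobRequest(partition="gpu-v100", hours=6, memory_gb=16, interactive=True)
for message in guideline_warnings(request):
    print(message)
```

A check like this does not replace the scheduler's own limits; it only makes the reasoning behind the guidelines explicit before the job is submitted.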