User:Tannistha.nandi: Difference between revisions

From RCSWiki

Revision as of 22:48, 4 September 2020

Running Jobs on ARC

The ARC cluster is a collection of compute nodes connected by a high-speed network. On ARC, computations have to be submitted as jobs. Once submitted, jobs are assigned to appropriate compute nodes by the job scheduler as resources become available.

Run3.png

A user connects to ARC from a personal device (laptop/desktop) using UCalgary IT user credentials. Once connected, the user is placed on the ARC login node, which is meant for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. The login node is a shared resource with many users connected at the same time, so any intensive task run there may prevent other users from connecting or submitting their computations.

         [tannistha.nandi@arc ~]$ 

The job scheduling system on ARC is called SLURM. On ARC, there are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. 
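As a rough illustration of the shared option set, the sketch below builds one resource request and prints how it would be passed to each command. It only prints the command lines and does not contact the scheduler. The variable name `resources` is an assumption made for this example; `job-script.sh` is the placeholder script name used on this page.

```shell
#!/bin/bash
# Long-form SLURM options; -N, -n, -c and -t are the short forms of
# --nodes, --ntasks, --cpus-per-task and --time.
resources="--nodes=1 --ntasks=1 --cpus-per-task=1 --mem=1G --time=01:00:00"

# The same request can be given to either command.
echo "salloc $resources"
echo "sbatch $resources job-script.sh"
```

With sbatch, the same options may instead be written inside the job script as `#SBATCH` lines.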

1. ‘salloc’ is used to launch an interactive session. Resources are requested with SLURM options given on the command line.

a. Example 1: Request 1 node, 1 task, 1 CPU per task and 1 GB of RAM for one hour.

         [tannistha.nandi@arc ~]$ salloc --mem=1G -N1 -n 1 -c 1 -t 01:00:00
         salloc: Granted job allocation 6758015
         salloc: Waiting for resource configuration
         salloc: Nodes fc4 are ready for job
         [tannistha.nandi@fc4 ~]$ 


b. Example 2: Request 1 GPU on a node in the gpu-v100 partition, along with 1 GB of RAM, for one hour. Generic resource scheduling (--gres) is used to request GPU resources.

        [tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1
        salloc: Granted job allocation 6760460
        salloc: Waiting for resource configuration
        salloc: Nodes fg3 are ready for job
        [tannistha.nandi@fg3 ~]$

2. ‘sbatch’ is used to submit computations as jobs to run on the cluster. The user submits a job script (here, job-script.sh) via ‘sbatch’ for execution.

        [tannistha.nandi@arc ~]$ sbatch job-script.sh

When resources become available, they are allocated to the job. Batch jobs are suited to tasks that run for long periods of time without user supervision. When the job script terminates, the allocation is released. Please review the section on how to prepare job scripts for more information.
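For orientation, a minimal job script might look like the sketch below. It is a hypothetical example, not an ARC-provided template: the `#SBATCH` lines mirror the salloc options of Example 1, and the variable names `job_id` and `cpus` are made up for illustration. SLURM reads `#SBATCH` lines only when they appear before the first executable command in the script.

```shell
#!/bin/bash
# Hypothetical job-script.sh; the directives mirror Example 1
# (1 node, 1 task, 1 CPU per task, 1 GB of RAM, one hour).
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=01:00:00

# SLURM sets these variables inside a job; the fallback values are
# used only if the script is run by hand outside the scheduler.
job_id="${SLURM_JOB_ID:-none}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

# Replace this line with the real computation.
echo "Job ${job_id} running on $(hostname) with ${cpus} CPU(s) per task"
```

After submitting with `sbatch job-script.sh`, the state of the job can be checked from the login node with `squeue -u $USER`.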