User:Tannistha.nandi

From RCSWiki
Revision as of 21:26, 6 September 2020 by Tannistha.nandi (talk | contribs)

Run jobs on ARC

The ARC cluster is a collection of compute nodes connected by a high-speed network. On ARC, computations are submitted as jobs. Once submitted, jobs are assigned to compute nodes by the job scheduler as resources become available.

You can access ARC with your UCalgary IT user credentials. Once connected, you are placed on the ARC login node, which is intended for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. The login node is a shared resource that many users connect to at the same time, so intensive tasks are not allowed on it: they may prevent other users from connecting or submitting their computations.

        [tannistha.nandi@arc ~]$ 

The job scheduling system on ARC is called SLURM. On ARC, there are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. 

1. ‘salloc’ launches an interactive session, typically for tasks under 5 hours.

Once an interactive job session is created, you can explore research datasets, start R or Python sessions to test your code, compile software applications, and so on.

a. Example 1: The following command requests 1 node, 1 task, 1 CPU per task, and 1 GB of RAM for one hour.

         [tannistha.nandi@arc ~]$ salloc --mem=1G -N1 -n 1 -c 1 -t 01:00:00
         salloc: Granted job allocation 6758015
         salloc: Waiting for resource configuration
         salloc: Nodes fc4 are ready for job
         [tannistha.nandi@fc4 ~]$ 
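For readability, the short options in Example 1 can also be written in their long form. The sketch below assumes the standard SLURM long option names (--nodes, --ntasks, --cpus-per-task, --time); the SALLOC_ARGS variable is only an illustrative name, and the fallback branch exists so the snippet can be tried on a machine without SLURM.

```shell
# Long-form equivalents of: salloc --mem=1G -N1 -n 1 -c 1 -t 01:00:00
# SALLOC_ARGS is a hypothetical helper variable, not part of SLURM.
SALLOC_ARGS="--mem=1G --nodes=1 --ntasks=1 --cpus-per-task=1 --time=01:00:00"

if command -v salloc >/dev/null 2>&1; then
    # On the ARC login node: request the interactive allocation.
    # $SALLOC_ARGS is left unquoted on purpose so it splits into separate options.
    salloc $SALLOC_ARGS
else
    # Anywhere else: only show the command that would be run.
    echo "salloc not found; would run: salloc $SALLOC_ARGS"
fi
```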


b. Example 2: The following command requests 1 GPU on 1 node in the gpu-v100 partition, along with 1 GB of RAM, for 1 hour. Generic resource scheduling (--gres) is used to request GPU resources.

        [tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1
        salloc: Granted job allocation 6760460
        salloc: Waiting for resource configuration
        salloc: Nodes fg3 are ready for job
        [tannistha.nandi@fg3 ~]$

Once you finish your work, type 'exit' at the command prompt to end the interactive session:

        [tannistha.nandi@fg3 ~]$ exit
        [tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460

This ensures that the allocated resources are released from your job and become available to other users.

2. ‘sbatch’ submits computations as jobs to run on the cluster. You can submit a job script, such as job-script.sh, for execution via 'sbatch':

        [tannistha.nandi@arc ~]$ sbatch job-script.sh

When resources become available, they are allocated to the job. Batch jobs are suited to tasks that run for long periods without user supervision. When the job script terminates, the allocation is released. For more information, see the section on how to prepare job scripts.
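Because 'salloc' and 'sbatch' accept the same resource options, the request from Example 1 can be carried over into a batch script as #SBATCH directives. The following is a minimal sketch of what job-script.sh might contain, not a recommended template: the resource values are copied from Example 1, the JOB_ID variable and the echo line are illustrative placeholders for a real workload, and SLURM_JOB_ID is the environment variable SLURM sets inside a running job.

```shell
#!/bin/bash
# Resource request, using the same options as the salloc call in Example 1.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=01:00:00

# SLURM_JOB_ID is set by SLURM inside a job; the fallback value lets the
# script also run outside the scheduler for a quick syntax check.
JOB_ID="${SLURM_JOB_ID:-not-under-slurm}"

# Placeholder workload: replace this line with your actual computation.
echo "Job ${JOB_ID} running on $(hostname)"
```

The #SBATCH lines are ordinary comments to the shell, so they only take effect when the script is submitted with 'sbatch job-script.sh'.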

Prepare job scripts