How to request an interactive GPU on ARC

From RCSWiki
Revision as of 21:56, 20 September 2023 by Lleung (talk | contribs) (Added categories)

GPUs on ARC

Currently, three partitions on ARC have usable CUDA GPUs: gpu-v100, gpu-a100, and bigmem.

The arc.hardware tool shows the GPU type and count per node for each partition:

$ arc.hardware

Node specifications per partition:
      ================================================================================
           Partition |   Node    CPUs    Memory        GPUs  Node list
                     |  count   /node      (MB)       /node  
      --------------------------------------------------------------------------------
              bigmem |      2      80   3000000              fm[1-2]
                     |      1      40   4127000      a100:4  mmg1
                     |      1      40   8256000      a100:2  mmg2
      .........
            gpu-v100 |     13      40    753000      v100:2  fg[1-13]
      .........

For more information about finding hardware on ARC, see How to find available partitions on ARC.
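
The same kind of overview can usually be obtained with the standard Slurm sinfo command. The following is a minimal sketch, assuming sinfo is available on the login node; the function name list_gpu_partitions is hypothetical and not part of ARC's tooling.

```shell
#!/bin/bash
# Hypothetical helper: list partitions whose nodes advertise GPUs.
# Assumes the standard Slurm `sinfo` command is available.
list_gpu_partitions() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a Slurm login node" >&2
        return 1
    fi
    # %P = partition, %D = node count, %c = CPUs per node,
    # %m = memory per node (MB), %G = generic resources (GPUs), %N = node list
    # Keep the header line plus any line that mentions "gpu".
    sinfo -o "%12P %6D %6c %10m %14G %N" | awk 'NR == 1 || /gpu/'
}
```

After defining the function, run list_gpu_partitions. Nodes without GPUs normally show (null) in the GRES column and are filtered out unless the partition name itself contains "gpu".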

Example 1

Request 1 GPU, 4 CPUs, and 16 GB of RAM on the gpu-v100 partition for 1 hour:

$ salloc -N1 -n1 -c4 --mem=16GB --gres=gpu:1 -p gpu-v100 -t 1:00:00

Example 2

Request 1 GPU, 4 CPUs, and 256 GB of RAM on the bigmem partition for 1 hour:

$ salloc -N1 -n1 -c4 --mem=256gb --gres=gpu:1 -p bigmem -t 1:00:00
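
The short options in both examples have long-form equivalents that are easier to read. The following sketch builds the same requests using the long options; the helper name gpu_salloc_cmd and its argument order are hypothetical. It only prints the command, so it can be inspected before being run.

```shell
#!/bin/bash
# Hypothetical helper: print the salloc command for an interactive GPU session.
# Arguments (all optional): partition, GPU count, CPU count, memory, time limit.
gpu_salloc_cmd() {
    local partition="${1:-gpu-v100}"
    local gpus="${2:-1}"
    local cpus="${3:-4}"
    local mem="${4:-16GB}"
    local time_limit="${5:-1:00:00}"
    # Long options equivalent to -N1 -n1 -c<cpus> -p <partition> -t <time>.
    echo "salloc --nodes=1 --ntasks=1 --cpus-per-task=${cpus}" \
         "--mem=${mem} --gres=gpu:${gpus}" \
         "--partition=${partition} --time=${time_limit}"
}

gpu_salloc_cmd                            # same request as Example 1
gpu_salloc_cmd bigmem 1 4 256gb 1:00:00   # same request as Example 2
```

Copy the printed line into the terminal to submit the request.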

Checking the allocated GPU(s)

Inside the allocation, use the nvidia-smi command to check the GPU(s) assigned to the job:

$ nvidia-smi
Fri Jun  3 11:35:14 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:17:00.0 Off |                    0 |
| N/A   39C    P0    42W / 250W |      0MiB / 40536MiB |     32%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
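
For a more compact check, nvidia-smi can print selected fields as CSV. The following is a minimal sketch, assuming Slurm sets SLURM_JOB_ID and CUDA_VISIBLE_DEVICES inside the job (this depends on the cluster configuration); the function name show_allocated_gpus is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize the GPU(s) visible in the current session.
show_allocated_gpus() {
    # Slurm normally sets these inside a job; they are empty on a login node.
    echo "Job ID: ${SLURM_JOB_ID:-none}"
    echo "CUDA_VISIBLE_DEVICES: ${CUDA_VISIBLE_DEVICES:-unset}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One CSV line per GPU: index, model name, total memory, utilization.
        nvidia-smi --query-gpu=index,name,memory.total,utilization.gpu --format=csv
    else
        echo "nvidia-smi not found; this node may not have a GPU" >&2
        return 1
    fi
}
```

After defining the function, run show_allocated_gpus inside the interactive session. If the job ID shows as none, the shell is probably not running inside an allocation.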

Links

How-Tos