How to check GPU utilization

From RCSWiki
Revision as of 20:28, 21 September 2023 by Lleung (talk | contribs) (Replaced links section with navbox)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

General

For Running Jobs

using SLURM

If you have a job that is running on a GPU node and that is expected to use a GPU on that node, you can check the GPU use by your code by running the following command on ARC's login node:

$ srun -s --jobid 12345678 --pty nvidia-smi

The number here is the job ID of the running job.

The output should look similar to the following:

Mon Aug 22 09:27:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   33C    P0    36W / 250W |    848MiB / 16160MiB |     30%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2232533      C   .../Programs/OpenDBA/openDBA      338MiB |
+-----------------------------------------------------------------------------+

In this case there was 1 GPU allocated and its usage was 30%. The code openDBA also uses 338 MB of the GPU memory.

For Past Jobs

Using Sampled Metrics

On the OOD portal, here https://ood-arc.rcs.ucalgary.ca login using your UofC credentials (it may require your full UofC email address instead of just user name).

Once you log in,

  • Select Help --> View my job metrics,
this will open an interface to your past jobs that are available in the database.
  • Find the job you want to check and there may be useful graphs for GPU usage.