How to check GPU utilization

Revision as of 20:20, 14 October 2022

General

For Running Jobs

Using SLURM

If you have a job running on a GPU node that is expected to use a GPU on that node, you can check how heavily your code is using the GPU by running the following command on ARC's login node:

$ srun -s --jobid 12345678 --pty nvidia-smi

The number here, 12345678, is the job ID of the running job; replace it with the ID of your own job.

The output should look similar to the following:

Mon Aug 22 09:27:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   33C    P0    36W / 250W |    848MiB / 16160MiB |     30%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2232533      C   .../Programs/OpenDBA/openDBA      338MiB |
+-----------------------------------------------------------------------------+

In this case one GPU was allocated to the job and its utilization was 30%. The openDBA process was also using 338 MiB of GPU memory.
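If you only need the headline numbers rather than the full table, nvidia-smi can print selected fields as CSV. The following is a minimal sketch, not an official ARC tool: the helper name summarize_gpu is hypothetical, and the usage line assumes that the srun options shown above work the same way without --pty (which is dropped here so that the output can be piped).

```shell
# Hypothetical helper (not part of ARC or SLURM): turns CSV lines produced by
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits
# into one readable line per GPU. Input lines look like: "0, 30, 848, 16160"
summarize_gpu() {
    awk -F', *' '{ printf "GPU %s: %s%% utilization, %s of %s MiB memory used\n", $1, $2, $3, $4 }'
}

# Usage (assumed invocation; replace 12345678 with the ID of your running job):
#   srun -s --jobid 12345678 nvidia-smi \
#       --query-gpu=index,utilization.gpu,memory.used,memory.total \
#       --format=csv,noheader,nounits | summarize_gpu
```

For the job shown above, the values in the table correspond to an input line of "0, 30, 848, 16160". A single reading is only a snapshot, so repeating the command a few times gives a better picture of how steadily the GPU is being used.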

For Past Jobs

Using Sampled Metrics

Log in to the Open OnDemand (OOD) portal at https://ood-arc.rcs.ucalgary.ca using your UofC credentials.

Once you are logged in,

  • Select Help --> View my job metrics. This opens an interface to those of your past jobs for which metrics have been kept.
  • Find the job you are interested in; it may have useful graphs of GPU usage.

Links

How-Tos