How to check GPU utilization

From RCSWiki
Latest revision as of 20:28, 21 September 2023

General

For Running Jobs

Using SLURM

If you have a job running on a GPU node that is expected to use a GPU on that node, you can check how much of the GPU your code is actually using by running the following command on ARC's login node:

$ srun -s --jobid 12345678 --pty nvidia-smi

Here, 12345678 is the job ID of the running job; replace it with the ID of your own job.

The output should look similar to the following:

Mon Aug 22 09:27:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   33C    P0    36W / 250W |    848MiB / 16160MiB |     30%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2232533      C   .../Programs/OpenDBA/openDBA      338MiB |
+-----------------------------------------------------------------------------+

In this case, one GPU was allocated and its utilization was 30%. The openDBA process was also using 338 MiB of the GPU memory.
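If you only need the utilization and memory figures, rather than the full table, nvidia-smi can print selected fields as CSV through its --query-gpu and --format options. The following is a minimal sketch, not an ARC-specific recipe: the helper name summarize_gpu_csv and the JOBID variable are hypothetical, and it assumes that the srun invocation shown above also works in a pipeline without --pty.

```shell
# Hypothetical helper: turn CSV lines produced by
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
# into a one-line summary per GPU.
# Expected input, one line per GPU, for example: "0, 30, 848, 16160"
summarize_gpu_csv() {
    awk -F', *' '{
        printf "GPU %s: %s%% utilization, %s/%s MiB memory\n", $1, $2, $3, $4
    }'
}

# Possible use on the login node (JOBID is a placeholder for your running job;
# dropping --pty for the pipeline is an assumption):
#
#   srun -s --jobid "$JOBID" nvidia-smi \
#       --query-gpu=index,utilization.gpu,memory.used,memory.total \
#       --format=csv,noheader,nounits | summarize_gpu_csv
```

A single reading is only a snapshot, so it can be worth repeating the command a few times while the job runs before drawing conclusions about GPU usage.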

For Past Jobs

Using Sampled Metrics

Log in to the Open OnDemand (OOD) portal at https://ood-arc.rcs.ucalgary.ca using your UofC credentials (it may require your full UofC email address instead of just your user name).

Once you are logged in:

  • Select Help --> View my job metrics. This opens an interface to your past jobs that are available in the database.
  • Find the job you want to check; it may have useful graphs of its GPU usage.