PyTorch on ARC

General

PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.
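
As a minimal illustration of these two features (not specific to ARC), the following sketch builds a tensor from a NumPy array and back-propagates through a small computation:

import numpy as np
import torch

# Tensor computation: build a tensor from a NumPy array.
a = torch.from_numpy(np.arange(6, dtype=np.float32).reshape(2, 3))

# Autograd: record the operations on x and back-propagate through them.
x = torch.ones(2, 3, requires_grad=True)
y = ((a * x) ** 2).sum()
y.backward()

print(y.item())   # value of the scalar result
print(x.grad)     # gradient of y with respect to x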

Installing PyTorch

You will need a working local Conda install in your home directory first. If you do not have it yet, please follow these instructions to have it installed.

Once you have your own Conda, activate it with

$ source ~/software/init-conda

We will install PyTorch into its own conda environment.

It is very important to create the environment with python and pytorch in the same command. This way conda can select the best pytorch and python combination.

$ conda create -n pytorch python pytorch-gpu torchvision

Once it is done, activate your pytorch environment:

$ conda activate pytorch
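
With the environment active, a quick one-line check (before running the full test script) confirms that the torch package imports and reports its version:

$ python -c "import torch; print(torch.__version__)"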

You can test with the torch-gpu-test.py script shown below. Copy and paste the text into a file and run it from the command line:

$ python torch-gpu-test.py

If you run this on the login node, it will tell you that GPUs are not available. This is normal, as the login node does not have any; you will need a GPU node to test GPU support.

Once you know that your pytorch environment is working properly, you can add more packages to the environment using conda.
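
For example (the package names here are only illustrative), with the pytorch environment active:

$ conda install numpy scipy matplotlib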

To deactivate the environment, use the command

$ conda deactivate

Test script

torch-gpu-test.py:

#! /usr/bin/env python 
# -------------------------------------------------------
import torch
# -------------------------------------------------------
print("Defining torch tensors:")
x = torch.Tensor(5, 3)
print(x)
y = torch.rand(5, 3)
print(y)

# -------------------------------------------------------
# let us run the following only if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available.")
    x = x.cuda()
    y = y.cuda()
    print(x + y)
else:
    print("CUDA is NOT available.")

# -------------------------------------------------------

Test script 2

torch-gpu-test2.py:

#! /usr/bin/env python 
# -------------------------------------------------------
import os
import sys
import socket
import torch
# -------------------------------------------------------

dev = os.environ.get('CUDA_VISIBLE_DEVICES', 'not set')   # set by Slurm when a GPU is allocated

host = socket.gethostname()
tdev = torch.cuda.current_device()
tavail = torch.cuda.is_available()
tcount = torch.cuda.device_count()
tname = torch.cuda.get_device_name()

print("Host: %s\nENV Devices: %s\nCudaDev: %s\nCUDA is available: %s\nDevice count: %d\nDevice: %s" % \
        (host, dev, tdev, tavail, tcount, tname))

print(os.popen("/usr/bin/nvidia-smi -L").read().strip())
print(os.popen("env | grep CUDA").read().strip())
print("")
# -------------------------------------------------------
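
The second script prints more detail about the GPU assigned to the job; inside a GPU job it is run the same way as the first one:

$ python torch-gpu-test2.py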

Using PyTorch on ARC

Requesting GPU Resources for PyTorch Jobs

For interactive use see this How-To: How to request an interactive GPU on ARC.
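
As a rough sketch only (the exact options are covered in the linked How-To), an interactive session with the same GPU resources as the batch script below could be requested with salloc:

$ salloc --partition=gpu-v100 --gres=gpu:1 --mem=16GB --time=01:00:00
$ source ~/software/init-conda
$ conda activate pytorch
$ python torch-gpu-test.py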

An example of the job script torch_job.slurm:

#! /bin/bash
# ====================================
#SBATCH --job-name=torch-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=0-04:00:00
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu-v100
# ====================================

source ~/software/init-conda
conda activate pytorch

python torch-gpu-test.py
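
The script is submitted with sbatch, and the job can be monitored with squeue (standard Slurm commands):

$ sbatch torch_job.slurm
$ squeue -u $USER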

Checkpointing

Refer to the checkpointing tutorial at https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html.
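
A minimal, self-contained sketch of the save/load pattern from that tutorial; the tiny model and optimizer here are stand-ins for your own training objects:

import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model and optimizer; substitute your own.
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)
epoch, loss = 5, 0.42

# Save a general checkpoint.
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}, 'checkpoint.pt')

# Later: rebuild the objects and restore their state.
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
model.train()   # or model.eval() for inference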

See also

  • https://github.com/pytorch/pytorch - PyTorch project page on GitHub.
  • https://docs.computecanada.ca/wiki/PyTorch - Compute Canada article; not directly applicable on ARC, but it contains a lot of good information.
  • https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html - PyTorch checkpointing tutorial.