Tensorflow on ARC
Revision as of 15:20, 8 June 2022
Introduction to Tensorflow
Tensorflow is a tool for evaluating dataflow graphs that represent both the computations and the model state in a machine learning algorithm. It enables distributed evaluation and explicit communication across a large number of computing devices (e.g. numerous CPUs or GPUs). The core tools of Tensorflow consist of shared C libraries for constructing dataflow graphs and executing computations (typically linear algebra on tensors). This constitutes a fairly low-level language for setting up data preprocessing, model training, and inference infrastructure for online machine learning applications. For an overview of Tensorflow's design principles and its relationship with other popular machine learning technologies (like Caffe, Torch, and MXNet), Abadi's 2016 article is a good starting point.
However, in its most common usage, Tensorflow's implementations of common deep neural network operations are exposed through the Python API. Direct access to all of this functionality from a C++ API is under development, but most of the core deep neural network functionality that users are familiar with is still only available through specific client languages, with Python being the most developed. The high-level Tensorflow API is consistent with the modelling standards established by Keras (which supports multiple alternate backend evaluation engines). The higher-level abstractions do not necessarily offer the same flexibility of parallel computation as the core Tensorflow tool. However, they are generally more accessible to users who are primarily familiar with the modelling techniques that are common to artificial neural networks. Consequently, Keras has been integrated directly into the core utility libraries of the Tensorflow Python API.
Tensorflow documentation and tutorials
- The Compute Canada article is not directly applicable to ARC but contains a lot of good information:
Installing Tensorflow
You will need a working local Conda install in your home directory first. If you do not have it yet, please follow these instructions to have it installed.
Once you have your own Conda, activate it with
$ ~/software/init-conda
We will install Tensorflow-GPU into its own conda environment.
It is very important to create the environment with python and tensorflow-gpu in the same command. This way conda can select the best tensorflow-gpu and python combination.
$ conda create -n tensorflow python tensorflow-gpu
Once it is done, activate your tensorflow environment:
$ conda activate tensorflow
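To confirm from inside Python that the interpreter really comes from the activated environment, a minimal sketch such as the following may help. It assumes the environment is named tensorflow, as in the commands above, and it relies on the CONDA_DEFAULT_ENV variable that conda activate exports. The function name active_conda_env is only illustrative.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment name.
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    name = active_conda_env()
    print("Active conda environment:", name)
    print("Python interpreter:", sys.executable)
    if name != "tensorflow":
        # Assumption: the environment was created with `-n tensorflow`.
        print("Warning: expected the 'tensorflow' environment to be active.")
```

If the printed interpreter path does not point into the tensorflow environment, the environment is probably not activated in the current shell.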
You can test the installation with the tensorflow-test.py script shown below. Copy and paste the text into a file and run it from the command line:
$ python tensorflow-test.py
If you try this on the login node, it should tell you that GPUs are not available. This is normal, as the login node does not have any. You will need a GPU node to test the GPUs.
Once you know that your tensorflow environment is working properly, you can add more packages to the environment using conda.
To deactivate the environment, use the command:
$ conda deactivate
Test script
tensorflow-test.py:
#! /usr/bin/env python
# ------------------------------------------
import os
# Set the XLA flag before importing tensorflow so it is in place when the library starts.
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'
import tensorflow as tf
# ------------------------------------------
# Create two constant tensors.
print("Define data:")
node1 = tf.constant(3.0)
node2 = tf.constant(4.0)
print(node1, node2)
# ------------------------------------------
# Add the tensors and print the result as a plain float.
print("Compute:")
node3 = node1 + node2
print(node3, "=", float(node3))
# ------------------------------------------
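The test script reports GPU availability only through Tensorflow's own log messages. For an explicit answer, a small check like the one below can be run in the same environment. It is a sketch that assumes Tensorflow 2.1 or later, where tf.config.list_physical_devices is available. The function name describe_tensorflow is hypothetical.

```python
import importlib.util


def describe_tensorflow():
    """Report the Tensorflow version and the GPUs it can see.

    Returns a dict with "version" (None if Tensorflow is not installed)
    and "gpus" (a list of device names, empty on a node without GPUs).
    """
    if importlib.util.find_spec("tensorflow") is None:
        return {"version": None, "gpus": []}

    import tensorflow as tf  # imported lazily so the check above runs first

    # Assumption: Tensorflow >= 2.1, which provides list_physical_devices.
    gpus = tf.config.list_physical_devices("GPU")
    return {"version": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


if __name__ == "__main__":
    info = describe_tensorflow()
    print("Tensorflow version:", info["version"])
    print("Visible GPUs:", info["gpus"] or "none")
```

On the login node the GPU list should be empty. On a GPU node it should contain one entry per allocated device.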
Using Tensorflow on ARC
Requesting GPU Resources for Tensorflow Jobs
For interactive use see this How-To: How to request an interactive GPU on ARC.
An example of the job script tensorflow_job.slurm:
#! /bin/bash
# ====================================
#SBATCH --job-name=tf-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=0-04:00:00
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu-v100
# ====================================
source ~/software/init-conda
conda activate tensorflow
python tensorflow-test.py
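Inside a running job it can be useful to print what Slurm actually allocated before starting a long computation. The sketch below reads a few environment variables for that purpose. SLURM_JOB_ID and SLURM_CPUS_PER_TASK are set by Slurm (the latter when --cpus-per-task is requested). Whether CUDA_VISIBLE_DEVICES is set for GPU jobs depends on the cluster configuration and is an assumption here, as is the function name slurm_allocation.

```python
import os


def slurm_allocation(environ=None):
    """Summarise the resources Slurm granted to the current job."""
    environ = os.environ if environ is None else environ
    # Assumption: the scheduler exports CUDA_VISIBLE_DEVICES for GPU jobs.
    gpu_ids = environ.get("CUDA_VISIBLE_DEVICES", "")
    return {
        "job_id": environ.get("SLURM_JOB_ID"),
        # Set when --cpus-per-task is given; fall back to 1 otherwise.
        "cpus": int(environ.get("SLURM_CPUS_PER_TASK", "1")),
        # Comma-separated device indices; empty outside a GPU allocation.
        "gpus": [g for g in gpu_ids.split(",") if g],
    }


if __name__ == "__main__":
    print(slurm_allocation())
```

With the job script above, one would expect 4 CPUs and a single GPU index to be reported, provided the cluster exports these variables.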