Conda on ARC

From RCSWiki
Revision as of 22:01, 16 October 2024 by Dmitri (talk | contribs) (→‎Installing Conda)

Background

Conda is a tool for managing and deploying applications, environments and packages.


Miniforge is a free alternative Conda installer maintained by the conda-forge community. The Miniforge version of Conda sets the conda-forge channel as the default source of packages.

https://conda-forge.org/miniforge/


While the Conda package manager itself is open source and free, downloading and installing software from the defaults and anaconda channels is not free and may require payment.

Please use the Miniforge installer over the Miniconda one.

Some points on using Conda

  • Conda is a package manager and installer.
It has to be installed once, and then it can be used for managing the software the user wants.


  • Python is not a part of Conda.


  • Conda uses environments to separate software installations to prevent possible conflicts and incompatibilities.


  • Different software packages are to be installed into different environments.


  • Before one can use a package installed into an environment, the environment has to be activated.
Only the software installed into that environment will be available after the activation.


  • If a different environment needs to be activated, please make sure that the current environment is deactivated first.


  • If multiple environments need a specific module or library, this module or library has to be installed multiple times.
Each environment is independent and separate from other environments, thus a module installed in one environment will not be available in a different environment.


  • Environments can be organized based on activities, rather than software.
If you are sure that several software packages you want do not interfere with each other, you can have them installed in the same environment, if this fits your usage pattern.
For example, you may want to have both Python and R installed in the same environment, if you typically use them both in the same activity.
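
As a small illustration of the last point, the sketch below creates one environment for an activity that needs both Python and R. The environment name analysis is hypothetical, r-base is assumed to be the package that provides R on the conda-forge channel, and the command is wrapped in a function so that nothing is installed until you call it.

```shell
# Sketch: one environment per activity rather than per program.
# "analysis" is a made-up environment name; "r-base" is assumed to be the
# conda-forge package that provides R.
make_analysis_env() {
    conda create -n analysis python r-base
}

# Call it from an initialized Conda session, then activate the result:
#   make_analysis_env
#   conda activate analysis
```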

Brief help message

$ conda --help
usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    clean        Remove unused packages and caches.
    config       Modify configuration values in .condarc. This is modeled
                 after the git config command. Writes to the user .condarc
                 file (/home/username/.condarc) by default.
    create       Create a new conda environment from a list of specified
                 packages.
    help         Displays a list of available conda commands and their help
                 strings.
    info         Display information about current conda install.
    init         Initialize conda for shell interaction. [Experimental]
    install      Installs a list of packages into a specified conda
                 environment.
    list         List linked packages in a conda environment.
    package      Low-level conda package utility. (EXPERIMENTAL)
    remove       Remove a list of packages from a specified conda environment.
    uninstall    Alias for conda remove.
    run          Run an executable in a conda environment. [Experimental]
    search       Search for packages and display associated information. The
                 input is a MatchSpec, a query language for conda packages.
                 See examples below.
    update       Updates conda packages to the latest compatible version.
    upgrade      Alias for conda update.

optional arguments:
  -h, --help     Show this help message and exit.
  -V, --version  Show the conda version number and exit.

conda commands available from other packages:
  build
  convert
  debug
  develop
  env
  index
  inspect
  metapackage
  render
  server
  skeleton
  verify

Conda on ARC

Installing Conda

You can install a local copy of miniforge (miniconda) in your home directory on our clusters. This gives you the flexibility to install the packages your workflow needs.

Before installing Conda, please review the article about installing software in your personal home directory. It may help you to plan your installations better.


IMPORTANT! When you follow these steps, it is VERY IMPORTANT to DECLINE the installer's offer to modify your account so that conda is activated automatically. Automatic activation can lead to multiple problems later in your work. DECLINE the offer by the conda installer.


Here are the steps to follow:

Once connected to the login node, in your SSH session, make sure you are in your home directory:

$ cd 

Create a "software" subdirectory for all custom software you are going to have:

$ mkdir software
$ cd software 

Create a directory for installation sources (if you do not have it yet) and download the latest Miniforge (Miniconda) distribution file.

You can find recent releases on this page: https://conda-forge.org/miniforge/

Copy the link to the Miniforge3 installer whose name matches the pattern Miniforge3-....-Linux-x86_64.sh and paste it on the wget command line:

$ mkdir src
$ cd src
$ wget https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh
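
If you prefer to build the link from a release tag instead of copying it, the following sketch shows one way to do it. The function names are hypothetical, and the URL layout is simply copied from the example link above; other releases are assumed, not guaranteed, to follow the same pattern.

```shell
# Sketch: build the installer file name and download URL for a given release.
# The layout is copied from the example link above; other releases are
# assumed (not guaranteed) to follow the same pattern.
miniforge_installer() {
    # $1 is a release tag such as 24.3.0-0
    printf 'Miniforge3-%s-Linux-x86_64.sh\n' "$1"
}

miniforge_url() {
    printf 'https://github.com/conda-forge/miniforge/releases/download/%s/%s\n' \
        "$1" "$(miniforge_installer "$1")"
}

# Print the URL for the release used in this article.
miniforge_url 24.3.0-0
```

The result can be passed straight to wget, for example wget "$(miniforge_url 24.3.0-0)".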

Execute the installer, the .sh file, and install miniforge (miniconda):

$ bash Miniforge3-24.3.0-0-Linux-x86_64.sh

Follow the instructions: choose ~/software/miniforge3 as the directory to create, agree to the license, and decline the offer to initialize.
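
Before running the installer it can help to confirm that the downloaded file is there and that the target directory is still free, since the installer is expected to create that directory itself. The helper below is only a sketch; the name preflight is hypothetical.

```shell
# Sketch: checks worth making before running the installer.
# Returns 0 only if the installer file is readable and the target
# directory does not exist yet.
preflight() {
    installer=$1
    prefix=$2
    if [ ! -r "$installer" ]; then
        echo "installer not found: $installer" >&2
        return 1
    fi
    if [ -e "$prefix" ]; then
        echo "target already exists: $prefix" >&2
        return 1
    fi
    return 0
}

# Example (paths follow the steps above):
#   preflight Miniforge3-24.3.0-0-Linux-x86_64.sh "$HOME/software/miniforge3" \
#       && bash Miniforge3-24.3.0-0-Linux-x86_64.sh
```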

Every time you launch a new terminal session and want to use this Conda install, you have to initialize it. To make this easier, you can create a short script, init-conda, in the ~/software directory:

#! /bin/bash
eval "$(~/software/miniforge3/bin/conda shell.bash hook)"

You can use your favourite text editor, nano for example, to create it:

$ nano ~/software/init-conda
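
As an alternative to an editor, the same two-line script can be written with a here-document. This is a minimal sketch; the function name write_init_conda is hypothetical, and the script content is the one shown above.

```shell
# Sketch: write the init-conda script without opening an editor.
# The quoted 'EOF' delimiter keeps the shell from expanding $( ... ) now,
# so the line is stored literally and only evaluated when sourced.
write_init_conda() {
    # $1 is the directory to write into, e.g. "$HOME/software"
    cat > "$1/init-conda" <<'EOF'
#! /bin/bash
eval "$(~/software/miniforge3/bin/conda shell.bash hook)"
EOF
}

# Usage: write_init_conda "$HOME/software"
```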

Once you have the init script ready, you can activate your Conda with the following command:

$ source ~/software/init-conda
(base) $

You can check that it works by confirming that it uses the python installed inside your home directory:

(base) $ which python 
~/software/miniforge3/bin/python

(base) $ python --version
Python 3.9.18

The version of python depends on when you downloaded the conda installation file.
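
The same check can be scripted. The sketch below tests whether a path lies inside a given installation prefix; the function name path_under_prefix is hypothetical, and the prefix in the usage comment assumes the install location used in this article.

```shell
# Sketch: succeed only if a path lies inside the given installation prefix.
path_under_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage, once Conda is initialized:
#   path_under_prefix "$(command -v python)" "$HOME/software/miniforge3" \
#       && echo "python comes from the Conda install"
```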

To deactivate conda, use the following command:

$ conda deactivate

Using Conda Environments in SLURM jobs

Any new software you install using Conda has to go into a dedicated Conda environment. You should never install anything in the (base) environment: it is the system environment that provides Conda's functionality and is required for Conda to work properly. New software packages for the programs you want to use have to be installed in derived Conda environments. A derived environment can hold just one software package, or a set of programs that can be installed together (without conflicts) and are used for the same purpose or activity.

If something goes wrong, a derived environment can be removed and recreated relatively easily. If something goes wrong with the base environment, the entire Conda setup, likely including the other Conda environments, will have to be reinstalled.
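
One way to guard against accidental installs into base is to check which environment is active before installing. The sketch below assumes that Conda exports the CONDA_DEFAULT_ENV variable with the name of the active environment; the function name safe_to_install is hypothetical.

```shell
# Sketch: refuse to install while the base environment (or none) is active.
# Conda is assumed to export CONDA_DEFAULT_ENV with the active environment's name.
safe_to_install() {
    case "${CONDA_DEFAULT_ENV:-}" in
        ""|base) echo "activate a derived environment first" >&2; return 1 ;;
        *)       return 0 ;;
    esac
}

# Usage: safe_to_install && conda install <package>
```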


This is an example job script that initializes Conda and activates a custom environment. The init-conda script is used to activate Conda itself, then Conda is used to activate the environment, custom_env.

conda-job.slurm:

#! /bin/bash
# ====================================
#SBATCH --job-name=conda-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4GB
#SBATCH --time=0-01:00:00
# ====================================
# Activate Conda and then the environment.
source ~/software/init-conda
conda activate custom_env

# Use the software here.
....
....
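
A job that cannot find its environment is better stopped early than left to run with the wrong software. The sketch below checks for a named environment before activation; it assumes that named environments live in the envs sub-directory of the Conda install, and the function name env_exists is hypothetical.

```shell
# Sketch: check for a named environment before a job tries to activate it.
# Named environments are assumed to live in <prefix>/envs/<name>.
env_exists() {
    # $1 = Conda install prefix, $2 = environment name
    [ -d "$1/envs/$2" ]
}

# In a job script, before "conda activate custom_env":
#   env_exists "$HOME/software/miniforge3" custom_env || { echo "custom_env is missing" >&2; exit 1; }
```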

Using Conda

Activating Base Conda

This is a more detailed walk-through of the example shown in the installation procedure.

Once you have installed your own Miniforge3 in the directory of your choice, it has to be activated before you can use it. This has to be done in every session in which you want to use it on ARC's login node, and in every job script you submit to ARC, if the job relies on your Conda environments.


Let us assume that Conda is installed in the ~/software/miniforge3 sub-directory (~ indicates your home directory).

Then to activate it you can use the following command on the command line:

[username@arc ~]$ eval "$(~/software/miniforge3/bin/conda shell.bash hook)"
(base) [username@arc ~]$ 

If you do not need it any more, you can deactivate your Conda with the following command:

(base) [username@arc ~]$ conda deactivate
[username@arc ~]$

To avoid typing this cryptic command every time you need your Conda, you can save it into a shell script file, init-conda, and place it in some handy location in your home directory, ~/software, for example. ~/software/init-conda:

#! /bin/bash
eval "$(~/software/miniforge3/bin/conda shell.bash hook)"

Once you have the init script, you can use it instead to activate your Conda install:

[username@arc ~]$ source ~/software/init-conda 
(base) [username@arc ~]$ 

Updating Conda

You can update your Conda with the conda update conda command. Below is an example session that updates conda from version 23.3.1 to version 23.9.0.

You have to activate your base conda first.

(base) [username@arc ~]$ conda --version
conda 23.3.1

(base) [username@arc ~]$ conda update conda

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##
  environment location: /home/username/software/miniforge3

  added / updated specs: 
    - conda

The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    brotli-python-1.0.9        |   py39h6a678d5_7         330 KB
    ca-certificates-2023.08.22 |       h06a4308_0         123 KB
    ....
    ....
    wheel-0.41.2               |   py39h06a4308_0         108 KB
    ------------------------------------------------------------
                                           Total:        38.5 MB

The following NEW packages will be INSTALLED:
  brotli-python      pkgs/main/linux-64::brotli-python-1.0.9-py39h6a678d5_7

The following packages will be REMOVED:

  brotlipy-0.7.0-py39h27cfd23_1003
  ....
  yaml-0.2.5-h7b6447c_0  

The following packages will be UPDATED:
  ca-certificates                     2023.01.10-h06a4308_0 --> 2023.08.22-h06a4308_0
  ....
  ....
  wheel                               0.38.4-py39h06a4308_0 --> 0.41.2-py39h06a4308_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(base) [username@arc ~]$ conda --version
conda 23.9.0

Creating Conda environments

Create a virtual environment for your project

$ conda create -n <yourenvname>

Install additional Python packages to the virtual environment

$ conda install -n <yourenvname> [package]

Activate the virtual environment

$ conda activate <yourenvname>

At this point you should be able to use your own python with the modules you added to it.

Example

After you login to ARC:

# Activate Conda using your own activation script
$ source ~/software/init-conda

(base) $ conda info
     active environment : base
    active env location : /home/username/software/miniforge3
.....

# Create a new environment based on python and tensorflow module.
(base) $ conda create -n tensorflow python tensorflow-gpu
....

# Once installed, activate the new environment for testing and work.
(base) $ conda activate tensorflow

# Test the installed software.
(tensorflow) $ python tensorflow-test.py
.... 

# Deactivate the environment.
(tensorflow) $ conda deactivate 

# Deactivate Conda
(base) $ conda deactivate 

$

Managing environments

Get help

$ conda env --help
$ conda env list --help
$ conda env remove --help

List Conda environments

$ conda env list

# conda environments:
#
base                  *  /home/username/my_software/miniforge3
pytorch                  /home/username/my_software/miniforge3/envs/pytorch
tensorflow               /home/username/my_software/miniforge3/envs/tensorflow
                         /home/username/opt/my_env
                         /home/username/opt/my_env2

In the example, 5 environments are listed.

The first 3 are named environments: base, pytorch and tensorflow.

The last two can only be referenced by the paths they are installed in: ~/opt/my_env and ~/opt/my_env2.
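
To pick out the path-only environments automatically, the listing can be filtered with awk. This is a minimal sketch that assumes the layout shown above, where comment lines start with # and an unnamed environment's line contains only its path; the function name unnamed_env_paths is hypothetical, and paths containing spaces are not handled.

```shell
# Sketch: print the paths of environments that have no name.
# Reads the output of "conda env list" on standard input; comment lines
# start with "#", and an unnamed environment's line holds only its path.
unnamed_env_paths() {
    awk '!/^#/ && NF == 1 { print $1 }'
}

# Usage: conda env list | unnamed_env_paths
```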

Remove an environment

Remove a named environment:

$ conda env remove -n pytorch

or, using an environment path:

$ conda env remove -p ~/opt/my_env

Getting info about environments

List packages installed in the current environment:

$ conda list

# packages in environment at /home/username/my_software/miniforge3/envs/tensorflow:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             4.5                       1_gnu  
_tflow_select             2.1.0                       gpu  
....
....

List packages matching a pattern:

$ conda list tensorflow

# packages in environment at /home/username/my_software/miniforge3/envs/tensorflow:
#
# Name                    Version                   Build  Channel
tensorflow                2.4.1           gpu_py39h8236f22_0  
tensorflow-base           2.4.1           gpu_py39h29c2da4_0  
tensorflow-estimator      2.6.0              pyh7b7c402_0  
tensorflow-gpu            2.4.1                h30adc30_0  
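
Because the pattern matches every package whose name contains it, a script that needs the version of one specific package has to filter the listing further. The sketch below does this with awk, assuming the column layout shown above (name first, version second); the function name pkg_version is hypothetical.

```shell
# Sketch: pull the version of one exactly named package out of "conda list" output.
pkg_version() {
    # $1 = exact package name; the listing is read from standard input
    awk -v name="$1" '!/^#/ && $1 == name { print $2 }'
}

# Usage: conda list | pkg_version tensorflow
```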

Links

ARC Software pages