Draft: Workflow Management Systems

From RCSWiki
Revision as of 15:11, 17 October 2024 by Pemartin

Overview

This guide gives an overview of running Workflow Management Systems (WfMS) like Prefect and Toil-CWL.

Toil-CWL

Installation

Running Toil-CWL requires two environments: a standard Python virtual environment and a Node.js environment (cwltool uses Node.js to evaluate JavaScript expressions in CWL documents). First we create the Python environment.

$ python3 -m venv /path/to/new/virtual/environment

We activate the environment with

$ source /path/to/new/virtual/environment/bin/activate

Then we install Toil with CWL support using the following command

$ pip install 'toil[cwl]'

This installs the toil-cwl-runner executable.

Next we install nodeenv, which creates an isolated Node.js environment, with the following command

$ pip install nodeenv


We then create a Node.js environment

$ nodeenv node_env

or

$ nodeenv -n 16.15.0 node_env #for a specific version


This environment can then be activated with

$ . node_env/bin/activate
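
Once both environments are active, it can be worth confirming that the executables the workflow relies on are visible on PATH. The following is a minimal sketch; the function name check_tools is illustrative and not part of Toil or nodeenv.

```shell
# Report any command that cannot be found on PATH.
# Returns 0 when every command is available, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the setup described here, the check could be run as

$ check_tools toil-cwl-runner cwltool node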


The two activation commands are usually grouped in a single bash script, which can be sourced manually or from the script that launches Toil.
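
As an illustration, such an activation script might look like the sketch below. The file name activate_envs.sh matches the file sourced in the workflow script in the next section, but its contents are an assumption; the function name activate_envs is hypothetical.

```shell
# activate_envs.sh: source this file rather than executing it,
# so that the environments stay active in the calling shell.

# Source bin/activate from each environment directory given as an argument.
# Stops with status 1 at the first directory that has no activate script.
activate_envs() {
    local env_dir
    for env_dir in "$@"; do
        if [ ! -f "$env_dir/bin/activate" ]; then
            echo "no activate script in $env_dir" >&2
            return 1
        fi
        # Both venv and nodeenv place their activation script in bin/activate.
        . "$env_dir/bin/activate"
    done
}
```

The last line of the script would then call the function with the two environment directories created above:

$ activate_envs /path/to/new/virtual/environment node_env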

Running a workflow

There are several steps to run a CWL workflow on ARC. It is advised that all these steps be written in a single shell script like the one below:

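 #!/bin/zsh
 # Activate the Python and Node.js environments (see Installation).
 source activate_envs.sh
 
 # Start from a clean module environment, then load R.
 module purge
 module load R/3.6.2
 
 # Put the locally installed samtools and bwa on PATH for the workflow steps.
 PATH=/home/pemartin/scripts/bioinformatics-tools/samtools-1.16.1:/home/pemartin/scripts/bioinformatics-tools/bwa:$PATH
 
 # Extra sbatch arguments for the jobs Toil submits: 72 h time limit, cpu2023 partition.
 export TOIL_SLURM_ARGS="-t 72:00:00 --partition cpu2023"
 
 # Delete the job store left by a previous run. Note that --restart below resumes from an
 # existing job store: for a fresh run drop --restart, and to resume a failed run skip this line.
 toil clean /bulk/rsws/my-job-store-gridss/
 # Validate the workflow description before submitting anything.
 cwltool --validate src/pipeline_gridss.cwl src/inputs.yml
 # Run the workflow through Slurm, with tools executed in Singularity containers.
 toil-cwl-runner --restart --singularity --batchSystem=slurm --workDir=/bulk/rsws/work_dir --coordinationDir=/bulk/rsws/coord_dir --logFile=cwltoil-gridss.log --writeLogs=/bulk/rsws/logs --jobStore file:/bulk/rsws/my-job-store-gridss --stats --retryCount=0 --cleanWorkDir=onSuccess --bypass-file-store --outdir=/home/pemartin/scripts/bioinformatics-data/OUT/gridss src/pipeline_gridss.cwl src/inputs_gridss.yml
 
 # Leave the Python virtual environment.
 deactivate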
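
Because toil clean deletes the job store while --restart expects an existing one, a launch script may want to choose between a fresh run and a resumed run. The sketch below is one hypothetical way to do that; the helper name restart_flag is not part of Toil, and it assumes a file job store, which is a directory on disk.

```shell
# Print "--restart" when the job store directory already exists,
# and nothing otherwise, so that a fresh run starts without the flag.
restart_flag() {
    local job_store_dir="$1"
    if [ -d "$job_store_dir" ]; then
        echo "--restart"
    fi
}
```

In the script above, the hard-coded --restart could then become $(restart_flag /bulk/rsws/my-job-store-gridss), with the toil clean line run only when a fresh start is wanted. An existing directory does not by itself mean there is unfinished work, since Toil may keep the job store after a successful run when --stats is set.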


Resources

https://www.commonwl.org/v1.2/Workflow.html
https://www.commonwl.org/v1.2/CommandLineTool.html