Draft: Workflow Management Systems
Overview
This guide gives an overview of running Workflow Management Systems (WfMS) like Prefect and Toil-CWL.
Toil-CWL
Installation
Running Toil-CWL requires two environments: a standard Python virtual environment and a Node.js environment (Node.js is used to evaluate JavaScript expressions in CWL documents). First we create the Python environment:
$ python3 -m venv /path/to/new/virtual/environment
We activate the environment with:
$ source /path/to/new/virtual/environment/bin/activate
Then we install Toil with CWL support using the following command:
$ pip install 'toil[cwl]'
This installs the toil-cwl-runner executable.
Next we install nodeenv, which creates an isolated Node.js environment:
$ pip install nodeenv
We then create a Node.js environment:
$ nodeenv node_env
or
$ nodeenv -n 16.15.0 node_env #for a specific version
This environment can then be activated with:
$ . node_env/bin/activate
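Before going further it can help to confirm that both environments put the expected executables on the PATH. The following is a minimal sketch; the helper name require_cmds is hypothetical and not part of Toil or nodeenv.

```shell
# Hypothetical helper: report every command in the argument list
# that cannot be found on PATH, and return 1 if any is missing.
require_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With both environments active, calling require_cmds toil-cwl-runner cwltool node should return 0 and print nothing.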
Usually the two activation commands are grouped in a single bash script, which can be sourced manually or from the script that launches Toil.
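One possible shape for such a script is sketched below. The file name activate_envs.sh matches the one sourced in the run script later in this guide; the variable names VENV_DIR and NODE_ENV_DIR and the function name activate_envs are assumptions made for illustration, and the default paths are the placeholders used in the commands above.

```shell
# activate_envs.sh: source this file, do not execute it.
# VENV_DIR and NODE_ENV_DIR are hypothetical names; adjust the defaults
# to wherever the two environments were actually created.
VENV_DIR="${VENV_DIR:-/path/to/new/virtual/environment}"
NODE_ENV_DIR="${NODE_ENV_DIR:-node_env}"

# Activate the Python environment first, then the Node.js one.
# Stops with an error if either activate script is missing.
activate_envs() {
    local dir
    for dir in "$VENV_DIR" "$NODE_ENV_DIR"; do
        if [ ! -f "$dir/bin/activate" ]; then
            echo "activate_envs: no activate script in $dir" >&2
            return 1
        fi
        . "$dir/bin/activate"
    done
}
```

After sourcing the file, calling activate_envs activates both environments. To make sourcing alone sufficient, a call to activate_envs can be appended as the last line of the file.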
Running a workflow
Running a CWL workflow on ARC takes several steps. We advise writing all of these steps in a single shell script like the one below:
#!/bin/zsh
# Activate the Python and Node.js environments
source activate_envs.sh
# Start from a clean module environment, then load R
module purge
module load R/3.6.2
# Put samtools and bwa on the PATH
PATH=/home/pemartin/scripts/bioinformatics-tools/samtools-1.16.1:/home/pemartin/scripts/bioinformatics-tools/bwa:$PATH
# Slurm options that Toil adds to the jobs it submits
export TOIL_SLURM_ARGS="-t 72:00:00 --partition cpu2023"
# Remove the job store left over from a previous run
toil clean /bulk/rsws/my-job-store-gridss/
# Validate the workflow and its inputs before submitting anything
cwltool --validate src/pipeline_gridss.cwl src/inputs.yml
# Run the workflow on Slurm, executing containers with Singularity
toil-cwl-runner --restart --singularity --batchSystem=slurm --workDir=/bulk/rsws/work_dir --coordinationDir=/bulk/rsws/coord_dir --logFile=cwltoil-gridss.log --writeLogs=/bulk/rsws/logs --jobStore file:/bulk/rsws/my-job-store-gridss --stats --retryCount=0 --cleanWorkDir=onSuccess --bypass-file-store --outdir=/home/pemartin/scripts/bioinformatics-data/OUT/gridss src/pipeline_gridss.cwl src/inputs_gridss.yml
deactivate
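The run script passes several directories to toil-cwl-runner (--workDir, --coordinationDir, --writeLogs). Toil generally expects at least the work directory to exist already, so creating them up front is a cheap safeguard. The sketch below assumes a hypothetical helper named prepare_dirs; it is not part of Toil.

```shell
# Hypothetical helper: create each directory given as an argument
# (including parents) and fail if one cannot be created or written to.
prepare_dirs() {
    local dir
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
        if [ ! -w "$dir" ]; then
            echo "prepare_dirs: $dir is not writable" >&2
            return 1
        fi
    done
}
```

In the script above this would be called as prepare_dirs /bulk/rsws/work_dir /bulk/rsws/coord_dir /bulk/rsws/logs on a line before toil-cwl-runner.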
Resources
https://www.commonwl.org/v1.2/Workflow.html
https://www.commonwl.org/v1.2/CommandLineTool.html