Bioinformatics applications

Revision as of 05:41, 8 June 2021

Strategies to write efficient bioinformatics workflows for the ARC high performance computing (HPC) cluster

One of the challenges of working with large genomics data sets is their long runtimes. Using the computing resources on the Advanced Research Computing (ARC) cluster effectively and efficiently can shorten these runtimes.

The ARC cluster offers a variety of hardware to suit the requirements of a wide range of workflows. As a first step, review the compute resources on the ARC cluster (weblink). This makes it easier to choose the appropriate partition for the nature of the workflow. For example, choose the gpu-v100 partition for GPU acceleration or the bigmem partition for a memory-intensive workflow. Each partition has multiple compute nodes with similar hardware specifications. The illustration below shows that the gpu-v100 partition has 13 compute nodes, the cpu2019 partition has 40 compute nodes, the cpu2013 partition has 14 compute nodes, and so on.
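As a sketch of how this choice is expressed in practice, a SLURM job script requests its partition through #SBATCH directives at the top of the script. The job name, memory and time values below are placeholders rather than ARC limits, and the GPU request (--gres=gpu:1) is an assumption about how GPUs are allocated on gpu-v100; check the ARC compute resources page before relying on either.

```bash
#!/bin/bash
# Sketch of a job-script header for a memory-intensive step.
# Values marked "placeholder" are assumptions, not ARC limits.
#SBATCH --job-name=bigmem-step
#SBATCH --partition=bigmem
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=256G          # placeholder: size this to the data set
#SBATCH --time=24:00:00     # placeholder: hh:mm:ss wall-clock limit

# For a GPU-accelerated step, the partition lines would instead read
# (assumption: GPUs are requested with --gres on gpu-v100):
#   #SBATCH --partition=gpu-v100
#   #SBATCH --gres=gpu:1

# To list partitions and their node counts from a login node:
#   sinfo --summarize
```

sbatch only reads #SBATCH lines that appear before the first executable command in the script, so keep the whole header at the top.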

The ARC cluster is a diverse collection of hardware

This section reviews SLURM job submission scripts for the following bioinformatics applications:

FastQC - A high throughput sequence QC analysis tool
Burrows-Wheeler Aligner (BWA)
Samtools

FastQC

FastQC assesses the quality of sequencing data. It is available as a module on the ARC cluster. To run the installed version of fastqc, load the biobuilds/2017.11 module as shown below:

         [tannistha.nandi@arc ~]$ module load biobuilds/2017.11
         Loading biobuilds/2017.11
          Loading requirement: java/1.8.0 biobuilds/conda
         [tannistha.nandi@arc ~]$ fastqc --version
         FastQC v0.11.5
         [tannistha.nandi@arc ~]$ fastqc --help

           FastQC - A high throughput sequence QC analysis tool

         SYNOPSIS

         fastqc seqfile1 seqfile2 .. seqfileN

         fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] 
          [-c contaminant file] seqfile1 .. seqfileN
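The same module can be used non-interactively inside a batch job. The script below is a minimal sketch that runs FastQC on every FASTQ file in one directory, assuming the biobuilds/2017.11 module shown above. The directory names (raw_reads, fastqc_results), the INPUT_DIR and OUTPUT_DIR variables, the helper function run_fastqc and all resource values are hypothetical placeholders; cpu2019 is used only as an example of a CPU partition.

```bash
#!/bin/bash
#SBATCH --job-name=fastqc
#SBATCH --partition=cpu2019       # example CPU partition (assumption)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # placeholder; FastQC analyses one file per thread
#SBATCH --mem=8G                  # placeholder value
#SBATCH --time=01:00:00           # placeholder value
#SBATCH --output=fastqc_%j.out    # %j is replaced by the job ID

# Hypothetical input/output locations; override them at submission, e.g.
#   sbatch --export=ALL,INPUT_DIR=/path/to/reads fastqc_job.sh
INPUT_DIR="${INPUT_DIR:-raw_reads}"
OUTPUT_DIR="${OUTPUT_DIR:-fastqc_results}"

# Make unmatched globs expand to nothing instead of the literal pattern.
shopt -s nullglob

# Load the FastQC installation provided by the biobuilds module.
module load biobuilds/2017.11

# Hypothetical helper: run FastQC on all FASTQ files in $1, writing to $2.
run_fastqc() {
    local in_dir="$1" out_dir="$2"

    # fastqc -o does not create the output directory, so create it first.
    mkdir -p "$out_dir"

    # Gather plain and gzip-compressed FASTQ files.
    local -a reads=("$in_dir"/*.fastq "$in_dir"/*.fastq.gz)
    if (( ${#reads[@]} == 0 )); then
        echo "No FASTQ files found in $in_dir" >&2
        return 1
    fi

    # One thread per allocated CPU; default to 1 outside a SLURM job.
    fastqc -t "${SLURM_CPUS_PER_TASK:-1}" -o "$out_dir" "${reads[@]}"
}

# The job's exit status is that of this last command, so SLURM
# reports the job as failed if FastQC fails or finds no input.
run_fastqc "$INPUT_DIR" "$OUTPUT_DIR"
```

Saved under a name such as fastqc_job.sh (hypothetical), the script is submitted with sbatch fastqc_job.sh and can be monitored with squeue -u $USER. Because FastQC's -t option sets how many files are processed simultaneously, requesting more CPUs than there are input files gives no further speed-up.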
