Bioinformatics applications: Difference between revisions
Revision as of 05:41, 8 June 2021
Strategies to write efficient bioinformatics workflows for the ARC high performance computing (HPC) cluster
One of the challenges of working with large genomics data sets is their long runtimes. Effective and efficient use of the computing resources on the Advanced Research Computing (ARC) cluster can shorten these runtimes.
The ARC cluster offers a variety of hardware to suit the requirements of different workflows. As a first step, review the compute resources on the ARC cluster (weblink); this makes it easier to choose the appropriate partition for the workflow. For example, choose the gpu-v100 partition for GPU accelerators or the bigmem partition for a memory-intensive workflow. Each partition has multiple compute nodes with similar hardware specifications. The illustration below shows that the gpu-v100 partition has 13 compute nodes, the cpu2019 partition has 40 compute nodes, the cpu2013 partition has 14 compute nodes, and so on.
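The partition layout can also be checked from the command line. The sketch below assumes you are logged in to a cluster node where the standard SLURM `sinfo` command is available; the helper name `list_partitions` is hypothetical and only wraps that call.

```shell
# List each partition with its node count, CPUs per node,
# memory per node (in MB), and generic resources such as GPUs.
# list_partitions is a hypothetical helper name, not an ARC-provided command.
list_partitions() {
    if command -v sinfo >/dev/null 2>&1; then
        # %P = partition, %D = node count, %c = CPUs per node,
        # %m = memory per node (MB), %G = generic resources (e.g. GPUs)
        sinfo --format="%P %D %c %m %G"
    else
        echo "sinfo not found; run this on a cluster login node" >&2
        return 1
    fi
}

list_partitions || true
```

To look at a single partition, `sinfo` also accepts `-p`, for example `sinfo -p gpu-v100`.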
This section will review the SLURM job submission scripts for the bioinformatics applications below.
- FastQC - A high throughput sequence QC analysis tool
- Burrows-Wheeler Aligner (BWA)
- Samtools
FastQC
FastQC assesses the quality of your sequencing data. It is available as a module on the ARC cluster. To run the installed version of FastQC, load the biobuilds/2017.11 module as shown below:
[tannistha.nandi@arc ~]$ module load biobuilds/2017.11
Loading biobuilds/2017.11
Loading requirement: java/1.8.0 biobuilds/conda
[tannistha.nandi@arc ~]$ fastqc --version
FastQC v0.11.5
[tannistha.nandi@arc ~]$ fastqc --help
FastQC - A high throughput sequence QC analysis tool
SYNOPSIS
fastqc seqfile1 seqfile2 .. seqfileN
fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam]
[-c contaminant file] seqfile1 .. seqfileN
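With the module loaded, FastQC can be run as a batch job. The block below is a minimal sketch, not an ARC-recommended configuration: it writes a SLURM job script to `fastqc_job.slurm`. The partition (cpu2019) is one of those named above; the CPU count, memory, time limit, output directory `fastqc_out`, and input file names `sample_R1.fastq.gz` and `sample_R2.fastq.gz` are assumptions to adjust for your data.

```shell
# Write a FastQC job script to fastqc_job.slurm.
# The quoted 'EOF' keeps $SLURM_CPUS_PER_TASK unexpanded until the job runs.
cat > fastqc_job.slurm <<'EOF'
#!/bin/bash
# All resource values below are illustrative assumptions.
#SBATCH --job-name=fastqc
#SBATCH --partition=cpu2019
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --time=01:00:00

module load biobuilds/2017.11

# FastQC expects the output directory to exist already.
mkdir -p fastqc_out

# Hypothetical input files; -t lets FastQC process several files in parallel.
fastqc -o fastqc_out -t "$SLURM_CPUS_PER_TASK" sample_R1.fastq.gz sample_R2.fastq.gz
EOF

echo "Submit with: sbatch fastqc_job.slurm"
```

FastQC uses one thread per input file, so requesting more CPUs than there are files is unlikely to shorten the run.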