GROMACS: Difference between revisions

From RCSWiki
m (Updated arc cluster category)
(34 intermediate revisions by 4 users not shown)
Line 1: Line 1:
= General =
= General =


Line 17: Line 18:


Researchers using '''GROMACS''' on ARC are expected to be generally familiar with its capabilities,  
Researchers using '''GROMACS''' on ARC are expected to be generally familiar with its capabilities,  
input file types and their formats and the use of checkpoint files to restart symulations.
input file types and their formats and the use of checkpoint files to restart simulations.


Like other calculations on ARC systems, '''GROMACS''' is run by submitting an appropriate script for batch scheduling using the '''sbatch''' command.  
Like other calculations on ARC systems, '''GROMACS''' is run by submitting an appropriate script for batch scheduling using the '''sbatch''' command.  
Line 25: Line 26:


Currently there are several software modules on ARC that provide different versions of '''GROMACS'''.  
Currently there are several software modules on ARC that provide different versions of '''GROMACS'''.  
The versions differ in the release date as well as in the kind of the CPU architecture the software is compiled for.
The versions differ in the release date as well as the CPU architecture the software is compiled for.


You can see them using the <code>module</code> command:
You can see them using the <code>module</code> command:
Line 90: Line 91:
(2) a SLURM job script '''.slurm'''.
(2) a SLURM job script '''.slurm'''.


Place you input files for a simulation into a separate directory and  
Place your input files for a simulation into a separate directory and  
prepare an appropriate '''job script''' for it.
prepare an appropriate '''job script''' for it.
<pre>
<pre>
Line 178: Line 179:
The <code>md.log</code> is the '''next''' thing to look into.
The <code>md.log</code> is the '''next''' thing to look into.


If everything as expected, then you '''computation is done''' and you can use the results.  
If everything is as expected, then your '''computation is done''' and you can use the results.  
'''Success!'''
'''Success!'''


Line 193: Line 194:
=== The job script ===
=== The job script ===


If you have input files as in the example above, and you use you own computer,  
If you have input files as in the example above, and you use your own computer,  
then to run the simulation,
then to run the simulation,
you have to generate a binary input file for the <code>mdrun</code> '''GROMACS''' command first.
you have to generate a binary input file for the <code>mdrun</code> '''GROMACS''' command first.
Line 482: Line 483:
==== Tip ====
==== Tip ====


Sometimes it is useful to run preprocessor to generate the binary input before submitting a job.  
Sometimes it is useful to run the preprocessor to generate the binary input before submitting a job.  
In the example above there had been two notes about the setup and you have to make sure that  
In the example above there had been two notes about the setup and you have to make sure that  
you are ready to continue with the simulation despite them.  
you are ready to continue with the simulation despite them.  
Line 496: Line 497:


The '''bilayer''' system of ~105000 atoms was simulated for 100000 steps.
The '''bilayer''' system of ~105000 atoms was simulated for 100000 steps.
=== Skylake partitions ===
<pre>
<pre>
----------------------------------------------------------------------
-------------------------------------------------------------------------------
#CPUs   Node   Processes    Threads    Wall Time    Performance
#CPUs  #Nodes  Processes    Threads    Wall Time    Performance Efficiency
                 per node  per proc          (s)      (ns/day)
                 per node  per proc          (s)      (ns/day)        (%)
----------------------------------------------------------------------
-------------------------------------------------------------------------------
     1      1          1          1      1031.6          0.84
     1      1          1          1      1031.6          0.84       100.0


   10      1          1        10        119.0          7.26
   10      1          1        10        119.0          7.26       86.4
   20      1          1        20        65.9          13.11
   20      1          1        20        65.9          13.11       78.0
   40      1          1        40        37.6          22.97
   40      1          1        40        37.6          22.97       68.4


   10      1          10          1        126.4          6.83
   10      1          10          1        126.4          6.83       81.3 
   20      1          20          1        78.0          11.08
   20      1          20          1        78.0          11.08       66.0
   40      1          40          1        45.3          19.07
   40      1          40          1        45.3          19.07       56.8


   40      1          20          2        48.3          17.90
   40      1          20          2        48.3          17.90       53.3       
   40      1          10          4        43.9          19.68
   40      1          10          4        43.9          19.68       58.6
   36      1          6          6        49.4          17.50
   36      1          6          6        49.4          17.50       57.9


   80      2          10          4        30.5          28.34
   80      2          10          4        30.5          28.34       42.2 
   80      2          20          2        32.6          26.53
   80      2          20          2        32.6          26.53       39.4
   80      2          40          1        30.3          28.49
   80      2          40          1        30.3          28.49       42.4


   120      3          40          1        20.8          41.51
   120      3          40          1        20.8          41.51       41.2
   160      4          40          1        19.2          44.95
   160      4          40          1        19.2          44.95       33.4
----------------------------------------------------------------------
-------------------------------------------------------------------------------
</pre>
</pre>


The observations:
'''Observations''':
* If you want to run the job on a '''single node''', then use '''1 process''' with as many threads as the number of CPUs you request.
* If you want to run the job on a '''single node''', then use '''1 process''' with as many threads as the number of CPUs you request.


Line 530: Line 533:


* Going '''beyond 3 nodes''' may not be computationally efficient on ARC.
* Going '''beyond 3 nodes''' may not be computationally efficient on ARC.
'''gpu-v100''' partition:
<pre>
-------------------------------------------------------------------------------------
#CPUs  #GPUs  #Nodes  Processes    Threads    Wall Time    Performance  Efficiency
                        per node  per proc          (s)      (ns/day)        (%)
-------------------------------------------------------------------------------------
    1      0      1          1          1      1031.6          0.84      100.0
    4      1      1          1          4        73.8          11.71
  10      1      1          1        10        39.4          21.92
  20      1      1          1        20        123.3          7.01      ??
  20      1      1          1        20        33.0          26.22      ??
  40      1      1          1        40        35.0          24.68
  40      2      1          1        40        34.4          25.10
  16      2      1          2          8        164.1          5.27
  40      2      1        10          4        37.2          23.23
  80      2      2        10          2        34.1          25.34
-------------------------------------------------------------------------------
</pre>
=== Legacy partitions ===
The '''parallel''' partition:
<pre>
-------------------------------------------------------------------------------
#CPUs  #Nodes  Processes    Threads    Wall Time    Performance  Efficiency
                per node  per proc          (s)      (ns/day)        (%)
-------------------------------------------------------------------------------
    1      1          1          1      2732.2          0.32      100.0
    6      1          1          6        518.4          1.67
  12      1          1        12        270.0          3.20
  12      1          12          1        283.8          3.04
  24      2          12          1        139.5          6.20
  36      3          12          1        93.5          9.25
  48      4          12          1        71.0          12.17
  72      6          12          1        55.9          15.46
  96      8          12          1        45.1          19.16
  120      10          12          1        40.8          21.16
  144      12          12          1        35.0          24.70
  196      16          12          1        31.1          27.75
-------------------------------------------------------------------------------
</pre>
The '''lattice''' partition:
<pre>
-------------------------------------------------------------------------------
#CPUs  #Nodes  Processes    Threads    Wall Time    Performance  Efficiency
                per node  per proc          (s)      (ns/day)        (%)
-------------------------------------------------------------------------------
    1      1          1          1      2732.2          0.32      100.0
    8      1          8          1       
-------------------------------------------------------------------------------
</pre>


== Selected GROMACS commands ==
== Selected GROMACS commands ==
Line 826: Line 890:
           Seed for replica exchange, -1 is generate a seed
           Seed for replica exchange, -1 is generate a seed
</pre>
</pre>
= Support =
Please send any questions regarding using GROMACS on ARC to support@hpc.ucalgary.ca.
[[Category:GROMACS]]
[[Category:ARC]]
[[Category:Software]]

Revision as of 15:55, 24 July 2020

General


GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

Using GROMACS on ARC

Researchers using GROMACS on ARC are expected to be generally familiar with its capabilities, input file types and their formats and the use of checkpoint files to restart simulations.

Like other calculations on ARC systems, GROMACS is run by submitting an appropriate script for batch scheduling using the sbatch command. For more information about submitting jobs, see the Running jobs article.

GROMACS modules

Currently there are several software modules on ARC that provide different versions of GROMACS. The versions differ in the release date as well as the CPU architecture the software is compiled for.

You can see them using the module command:

$ module avail gromacs

----------- /global/software/Modules/3.2.10/modulefiles ---------
gromacs/2016.3-gnu
gromacs/2018.0-gnu
gromacs/2019.6-legacy 
gromacs/2019.6-skylake 
gromacs/5.0.7-gnu

The names of the modules give hints on the specific version of GROMACS they provide access to.

  • The gnu suffix indicates that those versions were compiled with the GNU GCC compiler,
in these specific cases GCC 4.8.5.
  • GROMACS 2019.6 was compiled using GCC 7.3.0 for two different CPU kinds, the old kind, legacy, and the new kind, skylake.

The legacy module should be used on compute nodes installed before 2019, and the skylake module is for nodes from 2019 onward.

  • All GROMACS versions provided by all the modules have support for GPU computations, even though it may not be practical to run it on GPU nodes due to limited resources.
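
The choice between the two 2019.6 modules can be made from the partition a job runs on. The sketch below is only an illustration: the function name pick_gromacs_module is hypothetical, and the list of legacy partitions (parallel, cpu2013, lattice) is the one named in the legacy job script sections of this article, which may not be complete for your cluster.

```shell
#!/bin/bash
# Hypothetical helper: map a partition name to a GROMACS 2019.6 module.
# Assumption: parallel, cpu2013 and lattice are the legacy partitions;
# every other partition is treated as a 2019+ (skylake) partition.
pick_gromacs_module() {
    case "$1" in
        parallel|cpu2013|lattice) echo "gromacs/2019.6-legacy" ;;
        *)                        echo "gromacs/2019.6-skylake" ;;
    esac
}

# Example use in a job script:
#   module load "$(pick_gromacs_module "$SLURM_JOB_PARTITION")"
```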

A module has to be loaded before GROMACS can be used on ARC, like this:

$ gmx --version
bash: gmx: command not found...

$ module load gromacs/2019.6-legacy  

$ gmx --version
                         :-) GROMACS - gmx, 2019.6 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
.....
.....
GROMACS version:    2019.6
Precision:          single
Memory model:       64 bit
MPI library:        none
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  SSE4.1
FFT library:        fftw-3.3.7-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.8
Tracing support:    disabled
C compiler:         /global/software/gcc/gcc-7.3.0/bin/gcc GNU 7.3.0
C compiler flags:    -msse4.1     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  
C++ compiler:       /global/software/gcc/gcc-7.3.0/bin/g++ GNU 7.3.0
C++ compiler flags:  -msse4.1    -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  
CUDA compiler:      /global/software/cuda/cuda-10.0.130/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-msse4.1;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        9.10
CUDA runtime:       N/A

Running a GROMACS Job

To run your simulation on the ARC cluster you need: (1) a set of GROMACS input files and (2) a SLURM job script .slurm.

Place your input files for a simulation into a separate directory and prepare an appropriate job script for it.

$ ls -l

-rw-r--r-- 1 drozmano drozmano 7224622 Jan 29  2014 bilayer.gro
-rw-r--r-- 1 drozmano drozmano    2567 May 21 12:14 bilayer.mdp
-rw-r--r-- 1 drozmano drozmano      87 Apr 29  2016 bilayer.top
drwxr-xr-x 1 drozmano drozmano     200 May 20 16:27 ff
-rw-r--r-- 1 drozmano drozmano     504 May 20 16:30 ff.top
-rwxr-xr-x 1 drozmano drozmano    1171 May 21 13:45 job.slurm

Here:

  • bilayer.top -- contains the topology of the system.
  • bilayer.gro -- contains initial configuration (positions of atoms) of the system.
  • bilayer.mdp -- contains parameters of the simulation run (GROMACS settings).
  • ff -- a directory containing external custom force field (models). Not required if only standard models are used.
  • ff.top -- a topology file that includes required models from available force fields.
This file is included by the bilayer.top file.
  • job.slurm -- a SLURM job script that is used to submit this calculation to the cluster.


On ARC, you can get an example with the files shown above with the commands:

$ tar xvf /global/software/gromacs/tests-2019/bilayer.tar.bz2

$ cd bilayer
$ ls -l
....

At this point you can submit your job to ARC's scheduler (SLURM).

$ sbatch job.slurm 

Submitted batch job 5570681

You can check the status of the job using the job ID from the confirmation message above.

$ squeue -j 5570681

JOBID     USER     STATE   TIME_LIMIT  TIME  NODES TASKS CPUS  MIN_MEMORY  NODELIST            
5570681   drozmano RUNNING    5:00:00  0:15      1     1   40          8G       fc1                 

The squeue command output may look different in your case depending on your settings.
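
If you would rather not re-run squeue by hand, the check can be wrapped in a small polling loop. This is a minimal sketch, not an ARC-provided tool: the function name wait_for_job and the 60 second default interval are assumptions. It relies only on the squeue -h (no header) and -j (job ID) options.

```shell
#!/bin/bash
# Hypothetical helper: block until a job no longer shows up in squeue.
# Usage: wait_for_job <jobid> [poll interval in seconds, default 60]
wait_for_job() {
    local jobid=$1 interval=${2:-60}
    # With -h the header is suppressed, so any output at all means
    # the job is still pending or running.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job $jobid is no longer in the queue."
}

# Example:
#   wait_for_job 5570681
```

Please keep the polling interval generous, as frequent squeue calls put load on the scheduler.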


Once the job has finished and no longer appears in the squeue output, you can check the results.

$ ls -l

-rw-r--r-- 1 drozmano drozmano 7224622 Jan 29  2014 bilayer.gro
-rw-r--r-- 1 drozmano drozmano    2567 May 21 12:14 bilayer.mdp
-rw-r--r-- 1 drozmano drozmano      87 Apr 29  2016 bilayer.top
-rw-r--r-- 1 drozmano drozmano 2594772 May 21 14:37 bilayer.tpr
-rw-r--r-- 1 drozmano drozmano 7224622 May 21 14:37 confout.gro
-rw-r--r-- 1 drozmano drozmano    1772 May 21 14:37 ener.edr
drwxr-xr-x 1 drozmano drozmano     200 May 20 16:27 ff
-rw-r--r-- 1 drozmano drozmano     504 May 20 16:30 ff.top
-rwxr-xr-x 1 drozmano drozmano    1172 May 21 14:36 job.slurm
-rw-r--r-- 1 drozmano drozmano   82051 May 21 14:37 md.log
-rw-r--r-- 1 drozmano drozmano   10949 May 21 14:37 mdout.mdp
-rw-r--r-- 1 drozmano drozmano   13162 May 21 14:37 slurm-5570681.out
-rw-r--r-- 1 drozmano drozmano 2514832 May 21 14:37 state.cpt
-rw-r--r-- 1 drozmano drozmano 4309444 May 21 14:37 traj_comp.xtc

The new files here are:

  • bilayer.tpr -- a binary input file that is generated by gmx grompp before the actual simulation.
This file can be generated before submitting the job, if needed. Here it is created in the job script.
  • confout.gro -- a configuration file containing the final atomic positions at the end of the simulation.
  • ener.edr -- a file with energy data. Can be used for later analysis.
  • md.log -- the main log file for the simulation.
  • mdout.mdp -- a file containing all simulation parameters as used by GROMACS. Based on bilayer.mdp.
  • state.cpt -- a binary checkpoint file containing the system state at the end of the simulation.
This file can be used to continue simulations further.
  • traj_comp.xtc -- a trajectory file containing atomic positions at some time points during the simulation.
  • slurm-5570681.out -- a copy of the output that would otherwise be printed on screen during the simulation.
It is captured by SLURM for you. The number in the name is the job ID of the job.
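
The state.cpt checkpoint listed above can be handed back to mdrun through its -cpi option to resume a run. The following is a minimal sketch under the assumption that the default file names from the listing are used; the helper name mdrun_command is hypothetical, and it only prints the command line rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: print an mdrun command line that resumes from
# a checkpoint file when one exists, and starts fresh otherwise.
# Usage: mdrun_command <tpr file> [checkpoint file, default state.cpt]
mdrun_command() {
    local tpr=$1 cpt=${2:-state.cpt}
    if [ -e "$cpt" ]; then
        echo "gmx mdrun -v -s $tpr -cpi $cpt"
    else
        echo "gmx mdrun -v -s $tpr"
    fi
}

# Example use in a job script:
#   $(mdrun_command bilayer.tpr)
```

Note that a run which has already completed all of its steps has nothing left to do; its length would have to be extended first (for example with gmx convert-tpr).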


If something is not working the way you expected, then the slurm-5570681.out file is the first place you should examine.

The md.log is the next thing to look into.

If everything is as expected, then your computation is done and you can use the results. Success!

You may want to check the output file and the main log anyway:

# Press "q" to exit the text viewer.
$ less slurm-5570681.out
....

$ less md.log
....
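
Instead of paging through the whole file, the speed of a finished run can be pulled out directly. GROMACS prints a summary line starting with "Performance:" at the end of a run, with the speed in ns/day as the first number. The sketch below assumes that layout; the function name ns_per_day is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the ns/day value from the last
# "Performance:" line found in a GROMACS output or log file.
ns_per_day() {
    awk '$1 == "Performance:" { v = $2 } END { if (v != "") print v }' "$1"
}

# Example:
#   ns_per_day slurm-5570681.out
```

An empty result suggests that the run did not reach its normal end, in which case the files should be inspected by hand.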

The job script

If you have input files as in the example above and you are working on your own computer, you first have to generate a binary input file for the mdrun GROMACS command before you can run the simulation.

$ gmx grompp -v -f bilayer.mdp -c bilayer.gro -p bilayer.top -o bilayer.tpr

                      :-) GROMACS - gmx grompp, 2019.6 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
.....
.....
Checking consistency between energy and charge groups...
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 112x112x52, spacing 0.116 0.116 0.112
Estimate for the relative computational load of the PME mesh part: 0.13
This run will generate roughly 14 Mb of data
writing run input file...

There were 2 notes

$ ls -l
-rw-r--r-- 1 drozmano drozmano 7224622 Jan 29  2014 bilayer.gro
-rw-r--r-- 1 drozmano drozmano    2567 May 21 12:14 bilayer.mdp
-rw-r--r-- 1 drozmano drozmano      87 Apr 29  2016 bilayer.top
-rw-r--r-- 1 drozmano drozmano 2594772 May 21 15:06 bilayer.tpr
drwxr-xr-x 1 drozmano drozmano     200 May 20 16:27 ff
-rw-r--r-- 1 drozmano drozmano     504 May 20 16:30 ff.top

At this point you can run the simulation like this:

$ gmx mdrun -v -s bilayer.tpr

                      :-) GROMACS - gmx mdrun, 2019.6 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
......
... lots-of-output ...
......
Writing final coordinates.
^Mstep 10000, remaining wall clock time:     0 s          
               Core t (s)   Wall t (s)        (%)
       Time:     1557.342       38.939     3999.4
                 (ns/day)    (hour/ns)
Performance:       22.191        1.082

GROMACS reminds you: "In ..."

To run this calculation on a cluster via the SLURM scheduling system you have to provide a job script that does two things:

  1. Provides steps that run the simulation, and
  2. Requests all necessary computational resources that are needed for this simulation.


The steps to run the simulation you already know:

  1. Load a desired GROMACS module to activate the software.
  2. Generate the binary input, the .tpr file.
  3. Run the mdrun command on it.


The computational resources for the run include:

  • the Number of CPUs,
  • the Amount of memory (RAM), and
  • the Time sufficient to complete the computation.


Below, several examples of job scripts are given.

These scripts are suitable for use on ARC depending on

  • what part of the cluster the job should run on and
  • the number of CPUs the job is going to use.

Single node (modern) job script

This script is for jobs that use up to one full modern node (2019 and later). These nodes are in the list of default partitions and have 40 CPUs each.


This specific example requests 40 CPUs on 1 node for 5 hours. It also requests 16GB of RAM on the node. The simulation runs as 1 process of 40 threads.

job-single.slurm:

#!/bin/bash
# ================================================================
#SBATCH --job-name=gro_test

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=16GB
#SBATCH --time=0-05:00:00
# ================================================================
module purge
module load gromacs/2019.6-skylake
# ================================================================
echo "Starting at `date`."
echo "========================="
# Input files.
MDP=bilayer.mdp
TOP=bilayer.top
GRO=bilayer.gro

# Binary input file to generate.
TPR=bilayer.tpr

# Preprocess the input files.
gmx grompp -v -f $MDP -c $GRO -p $TOP -o $TPR

# Check if preprocessing has gone well.
if ! test -e $TPR; then
	echo "ERROR: Could not create a TPR file for the run. Aborting."
	exit 1
fi

# Run the simulation.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

gmx mdrun -v -s $TPR
echo "========================="
echo "Done at `date`."
# ================================================================

Multi node (modern) job script

This script is for jobs that use several modern nodes (2019 and later). These nodes are in the list of default partitions and have 40 CPUs each.


This specific example requests 80 CPUs on 2 nodes (40 CPUs each) for 5 hours. It also requests 16GB of RAM on each node. The simulation runs as 80 one-threaded processes.

job-multi.slurm:

#!/bin/bash
# ================================================================
#SBATCH --job-name=gro_test

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=0-05:00:00
# ================================================================
module purge
module load gromacs/2019.6-skylake
# ================================================================
echo "Starting at `date`."
echo "========================="
# Input files.
MDP=bilayer.mdp
TOP=bilayer.top
GRO=bilayer.gro

# Binary input file to generate.
TPR=bilayer.tpr

# Preprocess the input files.
gmx grompp -v -f $MDP -c $GRO -p $TOP -o $TPR

# Check if preprocessing has gone well.
if ! test -e $TPR; then
	echo "ERROR: Could not create a TPR file for the run. Aborting."
	exit 1
fi

# Run the simulation.
export OMP_NUM_THREADS=1
mpiexec gmx_mpi mdrun -v -s $TPR

echo "========================="
echo "Done at `date`."
# ================================================================

Single node (legacy) job script

This script is for jobs that use up to one full legacy node (parallel, cpu2013, lattice partitions). The partition has to be specified in the resource request.


This specific example requests 12 CPUs on 1 node in the parallel legacy partition for 12 hours. It also requests 23GB of RAM on the node (all available memory on this kind of node). The simulation runs as 1 process of 12 threads.

job-single-legacy.slurm:

#!/bin/bash
# ================================================================
#SBATCH --job-name=gro_test

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=23GB
#SBATCH --time=0-12:00:00

#SBATCH --partition=parallel
# ================================================================
module purge
module load gromacs/2019.6-legacy
# ================================================================
echo "Starting at `date`."
echo "========================="
# Input files.
MDP=bilayer.mdp
TOP=bilayer.top
GRO=bilayer.gro

# Binary input file to generate.
TPR=bilayer.tpr

# Preprocess the input files.
gmx grompp -v -f $MDP -c $GRO -p $TOP -o $TPR

# Check if preprocessing has gone well.
if ! test -e $TPR; then
	echo "ERROR: Could not create a TPR file for the run. Aborting."
	exit 1
fi

# Run the simulation.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

gmx mdrun -v -s $TPR
echo "========================="
echo "Done at `date`."
# ================================================================

Multi node (legacy) job script

This script is for jobs that use several legacy nodes (parallel, cpu2013, lattice partitions). The partition has to be specified in the resource request.


This specific example requests 48 CPUs on 4 nodes in the parallel legacy partition (12 CPUs each) for 5 hours. It also requests 23GB of RAM on each node. The simulation runs as 48 one-threaded processes.

job-multi-legacy.slurm:

#!/bin/bash
# ================================================================
#SBATCH --job-name=gro_test

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1
#SBATCH --mem=23GB
#SBATCH --time=0-05:00:00

#SBATCH --partition=parallel
# ================================================================
module purge
module load gromacs/2019.6-legacy
# ================================================================
echo "Starting at `date`."
echo "========================="
# Input files.
MDP=bilayer.mdp
TOP=bilayer.top
GRO=bilayer.gro

# Binary input file to generate.
TPR=bilayer.tpr

# Preprocess the input files.
gmx grompp -v -f $MDP -c $GRO -p $TOP -o $TPR

# Check if preprocessing has gone well.
if ! test -e $TPR; then
	echo "ERROR: Could not create a TPR file for the run. Aborting."
	exit 1
fi

# Run the simulation.
export OMP_NUM_THREADS=1
mpiexec gmx_mpi mdrun -v -s $TPR

echo "========================="
echo "Done at `date`."
# ================================================================

Tip

Sometimes it is useful to run the preprocessor to generate the binary input before submitting a job. In the example above there were two notes about the setup, and you have to make sure that you are ready to continue with the simulation despite them.

Also, if there are problems with your input you will be able to see them right away without going through job submission.
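
Such a check before submission could look like the sketch below. It is an illustration only: the function name check_inputs and the log file name grompp.log are assumptions. It uses the same gmx grompp options as the job scripts above and looks for the "There were N notes" style summary lines that grompp prints.

```shell
#!/bin/bash
# Hypothetical helper: run the preprocessor and summarize the outcome.
# Usage: check_inputs <mdp> <gro> <top> <tpr>
check_inputs() {
    local mdp=$1 gro=$2 top=$3 tpr=$4
    local log=grompp.log

    # Remove a stale TPR file so that an old one cannot mask a failure.
    rm -f "$tpr"

    # Keep a full copy of the grompp messages for later reading.
    gmx grompp -f "$mdp" -c "$gro" -p "$top" -o "$tpr" > "$log" 2>&1

    # Show the summary lines about notes and warnings, if there are any.
    grep -E "^There (was|were) [0-9]+ (note|warning)" "$log"

    if [ ! -e "$tpr" ]; then
        echo "ERROR: $tpr was not created, see $log." >&2
        return 1
    fi
    echo "OK: $tpr is ready, see $log for details."
}

# Example:
#   check_inputs bilayer.mdp bilayer.gro bilayer.top bilayer.tpr
```

The notes themselves still have to be read in the log file; the helper only tells you that they exist.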

Misc

Performance

Performance measurements for GROMACS 2019.6 on the 2019 compute nodes using different parallelization options and number of CPUs.

The bilayer system of ~105000 atoms was simulated for 100000 steps.

Skylake partitions

-------------------------------------------------------------------------------
#CPUs   #Nodes  Processes    Threads    Wall Time    Performance  Efficiency
                 per node   per proc          (s)       (ns/day)         (%)
-------------------------------------------------------------------------------
    1       1           1          1       1031.6           0.84       100.0

   10       1           1         10        119.0           7.26        86.4
   20       1           1         20         65.9          13.11        78.0
   40       1           1         40         37.6          22.97        68.4

   10       1          10          1        126.4           6.83        81.3  
   20       1          20          1         78.0          11.08        66.0
   40       1          40          1         45.3          19.07        56.8

   40       1          20          2         48.3          17.90        53.3        
   40       1          10          4         43.9          19.68        58.6
   36       1           6          6         49.4          17.50        57.9

   80       2          10          4         30.5          28.34        42.2  
   80       2          20          2         32.6          26.53        39.4
   80       2          40          1         30.3          28.49        42.4

  120       3          40          1         20.8          41.51        41.2
  160       4          40          1         19.2          44.95        33.4
-------------------------------------------------------------------------------
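
The Efficiency column is consistent with the measured performance divided by the ideal performance, that is, the single-CPU performance multiplied by the number of CPUs. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```shell
#!/bin/bash
# Hypothetical helper: parallel efficiency in percent.
# Usage: efficiency <ns/day on 1 CPU> <number of CPUs> <ns/day measured>
efficiency() {
    local base=$1 ncpus=$2 perf=$3
    awk -v b="$base" -v n="$ncpus" -v p="$perf" \
        'BEGIN { printf "%.1f\n", 100 * p / (n * b) }'
}

# Rows from the table above:
#   efficiency 0.84 10 7.26     # prints 86.4
#   efficiency 0.84 40 22.97    # prints 68.4
```

The same function can be used to fill in the efficiency for your own benchmark runs.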

Observations:

  • If you want to run the job on a single node, then use 1 process with as many threads as the number of CPUs you request.
  • If you need more than 1 node, then run 1-threaded MPI processes for each CPU you request.
  • Going beyond 3 nodes may not be computationally efficient on ARC.
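
The first two observations can be turned into a rule for choosing the resource request. The sketch below prints the matching sbatch options; the function name suggest_layout is hypothetical, and the rule only restates the observations above rather than any ARC policy.

```shell
#!/bin/bash
# Hypothetical helper: suggest sbatch options for a GROMACS job.
# Usage: suggest_layout <number of nodes> <CPUs per node>
suggest_layout() {
    local nodes=$1 cpus_per_node=$2
    if [ "$nodes" -le 1 ]; then
        # Single node: 1 process with one thread per requested CPU.
        echo "--nodes=1 --ntasks-per-node=1 --cpus-per-task=$cpus_per_node"
    else
        # Several nodes: one 1-threaded MPI process per requested CPU.
        echo "--nodes=$nodes --ntasks-per-node=$cpus_per_node --cpus-per-task=1"
    fi
}

# Examples:
#   suggest_layout 1 40
#   suggest_layout 2 40
```

The two examples correspond to the resource requests used in the single node and multi node (modern) job scripts above.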

gpu-v100 partition:

-------------------------------------------------------------------------------------
#CPUs   #GPUs  #Nodes  Processes    Threads    Wall Time    Performance  Efficiency
                        per node   per proc          (s)       (ns/day)         (%)
-------------------------------------------------------------------------------------
    1       0       1          1          1       1031.6           0.84       100.0

    4       1       1          1          4         73.8          11.71
   10       1       1          1         10         39.4          21.92

   20       1       1          1         20        123.3           7.01      ??
   20       1       1          1         20         33.0          26.22      ??

   40       1       1          1         40         35.0          24.68
   40       2       1          1         40         34.4          25.10

   16       2       1          2          8        164.1           5.27
   40       2       1         10          4         37.2          23.23

   80       2       2         10          2         34.1          25.34
-------------------------------------------------------------------------------------

Legacy partitions

The parallel partition:

-------------------------------------------------------------------------------
#CPUs   #Nodes  Processes    Threads    Wall Time    Performance  Efficiency
                 per node   per proc          (s)       (ns/day)         (%)
-------------------------------------------------------------------------------
    1       1           1          1       2732.2           0.32       100.0

    6       1           1          6        518.4           1.67
   12       1           1         12        270.0           3.20
   12       1          12          1        283.8           3.04

   24       2          12          1        139.5           6.20
   36       3          12          1         93.5           9.25
   48       4          12          1         71.0          12.17
   72       6          12          1         55.9          15.46
   96       8          12          1         45.1          19.16
  120      10          12          1         40.8          21.16
  144      12          12          1         35.0          24.70
  192      16          12          1         31.1          27.75
-------------------------------------------------------------------------------

The lattice partition:

-------------------------------------------------------------------------------
#CPUs   #Nodes  Processes    Threads    Wall Time    Performance  Efficiency
                 per node   per proc          (s)       (ns/day)         (%)
-------------------------------------------------------------------------------
    1       1           1          1       2732.2           0.32       100.0

    8       1           8          1        
-------------------------------------------------------------------------------

Selected GROMACS commands

gmx

SYNOPSIS

gmx [-[no]h] [-[no]quiet] [-[no]version] [-[no]copyright] [-nice <int>]
    [-[no]backup]

OPTIONS

Other options:

 -[no]h                     (no)
           Print help and quit
 -[no]quiet                 (no)
           Do not print common startup info or quotes
 -[no]version               (no)
           Print extended version information and quit
 -[no]copyright             (yes)
           Print copyright information on startup
 -nice   <int>              (19)
           Set the nicelevel (default depends on command)
 -[no]backup                (yes)
           Write backups if output files exist

Additional help is available on the following topics:
    commands    List of available commands
    selections  Selection syntax and usage
To access the help, use 'gmx help <topic>'.
For help on a command, use 'gmx help <command>'.

gmx grompp

Preprocess input files.

$ gmx help grompp

SYNOPSIS

gmx grompp [-f [<.mdp>]] [-c [<.gro/.g96/...>]] [-r [<.gro/.g96/...>]]
           [-rb [<.gro/.g96/...>]] [-n [<.ndx>]] [-p [<.top>]]
           [-t [<.trr/.cpt/...>]] [-e [<.edr>]] [-ref [<.trr/.cpt/...>]]
           [-po [<.mdp>]] [-pp [<.top>]] [-o [<.tpr>]] [-imd [<.gro>]]
           [-[no]v] [-time <real>] [-[no]rmvsbds] [-maxwarn <int>]
           [-[no]zero] [-[no]renum]

OPTIONS

Options to specify input files:

 -f      [<.mdp>]           (grompp.mdp)
           grompp input file with MD parameters
 -c      [<.gro/.g96/...>]  (conf.gro)
           Structure file: gro g96 pdb brk ent esp tpr
 -r      [<.gro/.g96/...>]  (restraint.gro)  (Opt.)
           Structure file: gro g96 pdb brk ent esp tpr
 -rb     [<.gro/.g96/...>]  (restraint.gro)  (Opt.)
           Structure file: gro g96 pdb brk ent esp tpr
 -n      [<.ndx>]           (index.ndx)      (Opt.)
           Index file
 -p      [<.top>]           (topol.top)
           Topology file
 -t      [<.trr/.cpt/...>]  (traj.trr)       (Opt.)
           Full precision trajectory: trr cpt tng
 -e      [<.edr>]           (ener.edr)       (Opt.)
           Energy file

Options to specify input/output files:

 -ref    [<.trr/.cpt/...>]  (rotref.trr)     (Opt.)
           Full precision trajectory: trr cpt tng

Options to specify output files:

 -po     [<.mdp>]           (mdout.mdp)
           grompp input file with MD parameters
 -pp     [<.top>]           (processed.top)  (Opt.)
           Topology file
 -o      [<.tpr>]           (topol.tpr)
           Portable xdr run input file
 -imd    [<.gro>]           (imdgroup.gro)   (Opt.)
           Coordinate file in Gromos-87 format

Other options:

 -[no]v                     (no)
           Be loud and noisy
 -time   <real>             (-1)
           Take frame at or first after this time.
 -[no]rmvsbds               (yes)
           Remove constant bonded interactions with virtual sites
 -maxwarn <int>             (0)
           Number of allowed warnings during input processing. Not for normal
           use and may generate unstable systems
 -[no]zero                  (no)
           Set parameters for bonded interactions without defaults to zero
           instead of generating an error
 -[no]renum                 (yes)
           Renumber atomtypes and minimize number of atomtypes
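
As an illustration of the options listed above, a typical preprocessing call might combine `-f`, `-c`, `-p` and `-o`. The file names below (`md.mdp`, `system.gro`, `md.tpr`) are hypothetical placeholders, not files provided on ARC. The sketch prints the command so it can be checked, and only runs it if `gmx` is on the path and the parameter file exists.

```shell
# Hypothetical input names; replace them with your own files.
MDP=md.mdp        # MD parameters (-f)
GRO=system.gro    # starting structure (-c)
TOP=topol.top     # topology (-p)
TPR=md.tpr        # run input file written by grompp (-o)

# Build the command and print it for inspection.
GROMPP_CMD="gmx grompp -f $MDP -c $GRO -p $TOP -o $TPR"
echo "$GROMPP_CMD"

# Run it only when GROMACS is loaded and the parameter file is present.
if command -v gmx >/dev/null 2>&1 && [ -f "$MDP" ]; then
    $GROMPP_CMD
fi
```

The resulting `.tpr` file is what `gmx mdrun` takes through its `-s` option.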

gmx mdrun

gmx mdrun is the main computational chemistry engine within GROMACS. It performs Molecular Dynamics simulations, but it can also perform Stochastic Dynamics, Energy Minimization, test particle insertion or (re)calculation of energies. Normal mode analysis is another option.

SYNOPSIS

gmx mdrun [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]] [-tablep [<.xvg>]]
          [-tableb [<.xvg> [...]]] [-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]]
          [-multidir [<dir> [...]]] [-awh [<.xvg>]] [-membed [<.dat>]]
          [-mp [<.top>]] [-mn [<.ndx>]] [-o [<.trr/.cpt/...>]]
          [-x [<.xtc/.tng>]] [-cpo [<.cpt>]] [-c [<.gro/.g96/...>]]
          [-e [<.edr>]] [-g [<.log>]] [-dhdl [<.xvg>]] [-field [<.xvg>]]
          [-tpi [<.xvg>]] [-tpid [<.xvg>]] [-eo [<.xvg>]] [-devout [<.xvg>]]
          [-runav [<.xvg>]] [-px [<.xvg>]] [-pf [<.xvg>]] [-ro [<.xvg>]]
          [-ra [<.log>]] [-rs [<.log>]] [-rt [<.log>]] [-mtx [<.mtx>]]
          [-if [<.xvg>]] [-swap [<.xvg>]] [-deffnm <string>] [-xvg <enum>]
          [-dd <vector>] [-ddorder <enum>] [-npme <int>] [-nt <int>]
          [-ntmpi <int>] [-ntomp <int>] [-ntomp_pme <int>] [-pin <enum>]
          [-pinoffset <int>] [-pinstride <int>] [-gpu_id <string>]
          [-gputasks <string>] [-[no]ddcheck] [-rdd <real>] [-rcon <real>]
          [-dlb <enum>] [-dds <real>] [-gcom <int>] [-nb <enum>]
          [-nstlist <int>] [-[no]tunepme] [-pme <enum>] [-pmefft <enum>]
          [-bonded <enum>] [-[no]v] [-pforce <real>] [-[no]reprod]
          [-cpt <real>] [-[no]cpnum] [-[no]append] [-nsteps <int>]
          [-maxh <real>] [-replex <int>] [-nex <int>] [-reseed <int>]

OPTIONS

Options to specify input files:

 -s      [<.tpr>]           (topol.tpr)
           Portable xdr run input file
 -cpi    [<.cpt>]           (state.cpt)      (Opt.)
           Checkpoint file
 -table  [<.xvg>]           (table.xvg)      (Opt.)
           xvgr/xmgr file
 -tablep [<.xvg>]           (tablep.xvg)     (Opt.)
           xvgr/xmgr file
 -tableb [<.xvg> [...]]     (table.xvg)      (Opt.)
           xvgr/xmgr file
 -rerun  [<.xtc/.trr/...>]  (rerun.xtc)      (Opt.)
           Trajectory: xtc trr cpt gro g96 pdb tng
 -ei     [<.edi>]           (sam.edi)        (Opt.)
           ED sampling input
 -multidir [<dir> [...]]    (rundir)         (Opt.)
           Run directory
 -awh    [<.xvg>]           (awhinit.xvg)    (Opt.)
           xvgr/xmgr file
 -membed [<.dat>]           (membed.dat)     (Opt.)
           Generic data file
 -mp     [<.top>]           (membed.top)     (Opt.)
           Topology file
 -mn     [<.ndx>]           (membed.ndx)     (Opt.)
           Index file

Options to specify output files:

 -o      [<.trr/.cpt/...>]  (traj.trr)
           Full precision trajectory: trr cpt tng
 -x      [<.xtc/.tng>]      (traj_comp.xtc)  (Opt.)
           Compressed trajectory (tng format or portable xdr format)
 -cpo    [<.cpt>]           (state.cpt)      (Opt.)
           Checkpoint file
 -c      [<.gro/.g96/...>]  (confout.gro)
           Structure file: gro g96 pdb brk ent esp
 -e      [<.edr>]           (ener.edr)
           Energy file
 -g      [<.log>]           (md.log)
           Log file
 -dhdl   [<.xvg>]           (dhdl.xvg)       (Opt.)
           xvgr/xmgr file
 -field  [<.xvg>]           (field.xvg)      (Opt.)
           xvgr/xmgr file
 -tpi    [<.xvg>]           (tpi.xvg)        (Opt.)
           xvgr/xmgr file
 -tpid   [<.xvg>]           (tpidist.xvg)    (Opt.)
           xvgr/xmgr file
 -eo     [<.xvg>]           (edsam.xvg)      (Opt.)
           xvgr/xmgr file
 -devout [<.xvg>]           (deviatie.xvg)   (Opt.)
           xvgr/xmgr file
 -runav  [<.xvg>]           (runaver.xvg)    (Opt.)
           xvgr/xmgr file
 -px     [<.xvg>]           (pullx.xvg)      (Opt.)
           xvgr/xmgr file
 -pf     [<.xvg>]           (pullf.xvg)      (Opt.)
           xvgr/xmgr file
 -ro     [<.xvg>]           (rotation.xvg)   (Opt.)
           xvgr/xmgr file
 -ra     [<.log>]           (rotangles.log)  (Opt.)
           Log file
 -rs     [<.log>]           (rotslabs.log)   (Opt.)
           Log file
 -rt     [<.log>]           (rottorque.log)  (Opt.)
           Log file
 -mtx    [<.mtx>]           (nm.mtx)         (Opt.)
           Hessian matrix
 -if     [<.xvg>]           (imdforces.xvg)  (Opt.)
           xvgr/xmgr file
 -swap   [<.xvg>]           (swapions.xvg)   (Opt.)
           xvgr/xmgr file

Other options:

 -deffnm <string>
           Set the default filename for all file options
 -xvg    <enum>             (xmgrace)
           xvg plot formatting: xmgrace, xmgr, none
 -dd     <vector>           (0 0 0)
           Domain decomposition grid, 0 is optimize
 -ddorder <enum>            (interleave)
           DD rank order: interleave, pp_pme, cartesian
 -npme   <int>              (-1)
           Number of separate ranks to be used for PME, -1 is guess
 -nt     <int>              (0)
           Total number of threads to start (0 is guess)
 -ntmpi  <int>              (0)
           Number of thread-MPI ranks to start (0 is guess)
 -ntomp  <int>              (0)
           Number of OpenMP threads per MPI rank to start (0 is guess)
 -ntomp_pme <int>           (0)
           Number of OpenMP threads per MPI rank to start (0 is -ntomp)
 -pin    <enum>             (auto)
           Whether mdrun should try to set thread affinities: auto, on, off
 -pinoffset <int>           (0)
           The lowest logical core number to which mdrun should pin the first
           thread
 -pinstride <int>           (0)
           Pinning distance in logical cores for threads, use 0 to minimize
           the number of threads per physical core
 -gpu_id <string>
           List of unique GPU device IDs available to use
 -gputasks <string>
           List of GPU device IDs, mapping each PP task on each node to a
           device
 -[no]ddcheck               (yes)
           Check for all bonded interactions with DD
 -rdd    <real>             (0)
           The maximum distance for bonded interactions with DD (nm), 0 is
           determine from initial coordinates
 -rcon   <real>             (0)
           Maximum distance for P-LINCS (nm), 0 is estimate
 -dlb    <enum>             (auto)
           Dynamic load balancing (with DD): auto, no, yes
 -dds    <real>             (0.8)
           Fraction in (0,1) by whose reciprocal the initial DD cell size will
           be increased in order to provide a margin in which dynamic load
           balancing can act while preserving the minimum cell size.
 -gcom   <int>              (-1)
           Global communication frequency
 -nb     <enum>             (auto)
           Calculate non-bonded interactions on: auto, cpu, gpu
 -nstlist <int>             (0)
           Set nstlist when using a Verlet buffer tolerance (0 is guess)
 -[no]tunepme               (yes)
           Optimize PME load between PP/PME ranks or GPU/CPU (only with the
           Verlet cut-off scheme)
 -pme    <enum>             (auto)
           Perform PME calculations on: auto, cpu, gpu
 -pmefft <enum>             (auto)
           Perform PME FFT calculations on: auto, cpu, gpu
 -bonded <enum>             (auto)
           Perform bonded calculations on: auto, cpu, gpu
 -[no]v                     (no)
           Be loud and noisy
 -pforce <real>             (-1)
           Print all forces larger than this (kJ/mol nm)
 -[no]reprod                (no)
           Try to avoid optimizations that affect binary reproducibility
 -cpt    <real>             (15)
           Checkpoint interval (minutes)
 -[no]cpnum                 (no)
           Keep and number checkpoint files
 -[no]append                (yes)
           Append to previous output files when continuing from checkpoint
           instead of adding the simulation part number to all file names
 -nsteps <int>              (-2)
           Run this number of steps (-1 means infinite, -2 means use mdp
           option, smaller is invalid)
 -maxh   <real>             (-1)
           Terminate after 0.99 times this time (hours)
 -replex <int>              (0)
           Attempt replica exchange periodically with this period (steps)
 -nex    <int>              (0)
           Number of random exchanges to carry out each exchange interval (N^3
           is one suggestion).  -nex zero or not specified gives neighbor
           replica exchange.
 -reseed <int>              (-1)
           Seed for replica exchange, -1 is generate a seed
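
The `-maxh`, `-cpi` and `-deffnm` options listed above are often used together so that a run stops cleanly before the job's time limit and can be continued later. The following is a minimal sketch under stated assumptions: the helper `hms_to_hours`, the base name `md` and the 24-hour limit are all illustrative, and the limit is assumed to match the time requested from SLURM.

```shell
# Convert a wall-time limit written as HH:MM:SS into hours for -maxh.
hms_to_hours() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 + $2 / 60 + $3 / 3600 }'
}

TIME_LIMIT=24:00:00                  # assumed to equal the job's requested time
MAXH=$(hms_to_hours "$TIME_LIMIT")

# -deffnm md makes all file names start with "md" (md.log, md.cpt, ...).
MDRUN_CMD="gmx mdrun -deffnm md -maxh $MAXH"

# Continue from the checkpoint only if an earlier run left one behind.
if [ -f md.cpt ]; then
    MDRUN_CMD="$MDRUN_CMD -cpi md.cpt"
fi

echo "$MDRUN_CMD"
```

Because `mdrun` stops at 0.99 times the `-maxh` value, a limit somewhat below the requested job time may be a safer choice, since it leaves room for job start-up and for writing the final files.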

Support

Please send any questions about using GROMACS on ARC to support@hpc.ucalgary.ca.