GROMACS


General


GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

Using GROMACS on ARC

Researchers using GROMACS on ARC are expected to be generally familiar with its capabilities, its input file types and their formats, and the use of checkpoint files to restart simulations.

Like other calculations on ARC systems, GROMACS is run by submitting an appropriate script for batch scheduling using the sbatch command. For more information about submitting jobs, see the Running jobs article.

GROMACS modules

Currently there are several software modules on ARC that provide different versions of GROMACS. The versions differ in release date and in the CPU architecture the software is compiled for.

You can see them using the module command:

$ module avail gromacs

----------- /global/software/Modules/3.2.10/modulefiles ---------
gromacs/2016.3-gnu
gromacs/2018.0-gnu
gromacs/2019.6-nehalem 
gromacs/2019.6-skylake 
gromacs/5.0.7-gnu

The names of the modules give hints on the specific version of GROMACS they provide access to.

  • The gnu suffix indicates that those versions have been compiled with the GNU GCC compiler.
In these specific cases, GCC 4.8.5.
  • GROMACS 2019.6 was compiled using GCC 7.3.0 for two different kinds of CPU: the older kind, nehalem, and the newer kind, skylake.
The nehalem module should be used on compute nodes from before 2019, and the skylake module is for nodes from 2019 onward.
  • All GROMACS versions provided by these modules support GPU computation, although running on GPU nodes may not be practical because those resources are limited.
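
The choice between the two 2019.6 modules can be scripted. The sketch below is only an illustration: pick_gromacs_module is a hypothetical helper, and it assumes that the 2019 (skylake) nodes can be recognized by the avx512f CPU flag while older nodes lack it. Check that assumption on the node you are actually using.

```shell
# Hypothetical helper: map a string of CPU flags to a module name.
# Assumption: only the newer (skylake) nodes report the avx512f flag.
pick_gromacs_module() {
    case " $1 " in
        *" avx512f "*) echo "gromacs/2019.6-skylake" ;;
        *)             echo "gromacs/2019.6-nehalem" ;;
    esac
}

# Read the flags of the first CPU on the current node (Linux only).
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2 || true)
module_name=$(pick_gromacs_module "$flags")
echo "Selected module: $module_name"

# On ARC, load the selected module:
# module load "$module_name"
```

The module load line is left commented out so that the snippet can be tried anywhere; the module names are the ones listed by module avail above.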

A module has to be loaded before GROMACS can be used on ARC, for example:

$ gmx --version
bash: gmx: command not found...

$ module load gromacs/2019.6-nehalem  

$ gmx --version
                         :-) GROMACS - gmx, 2019.6 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
.....
.....
GROMACS version:    2019.6
Precision:          single
Memory model:       64 bit
MPI library:        none
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  SSE4.1
FFT library:        fftw-3.3.7-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.8
Tracing support:    disabled
C compiler:         /global/software/gcc/gcc-7.3.0/bin/gcc GNU 7.3.0
C compiler flags:    -msse4.1     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  
C++ compiler:       /global/software/gcc/gcc-7.3.0/bin/g++ GNU 7.3.0
C++ compiler flags:  -msse4.1    -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  
CUDA compiler:      /global/software/cuda/cuda-10.0.130/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-msse4.1;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        9.10
CUDA runtime:       N/A

Running a GROMACS Job

To run your simulation on the ARC cluster you need (1) a set of GROMACS input files and (2) a SLURM job script (.slurm).

Place your input files for a simulation into a separate directory and prepare an appropriate job script for it.

$ ls -l

-rw-r--r-- 1 drozmano drozmano 7224622 Jan 29  2014 bilayer.gro
-rw-r--r-- 1 drozmano drozmano    2567 May 21 12:14 bilayer.mdp
-rw-r--r-- 1 drozmano drozmano      87 Apr 29  2016 bilayer.top
drwxr-xr-x 1 drozmano drozmano     200 May 20 16:27 ff
-rw-r--r-- 1 drozmano drozmano     504 May 20 16:30 ff.top
-rwxr-xr-x 1 drozmano drozmano    1171 May 21 13:45 job.slurm

Here:

  • bilayer.top -- contains the topology of the system.
  • bilayer.gro -- contains initial configuration (positions of atoms) of the system.
  • bilayer.mdp -- contains parameters of the simulation run (GROMACS settings).
  • ff -- a directory containing external custom force field (models). Not required if only standard models are used.
  • ff.top -- a topology file that includes required models from available force fields.
This file is included by the bilayer.top file.
  • job.slurm -- a SLURM job script that is used to submit this calculation to the cluster.
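
The contents of job.slurm are not reproduced on this page. As a rough guide, the commands below write a minimal sketch of such a script to a hypothetical file, job_sketch.slurm. The resource requests are taken from the squeue output shown further down (1 node, 1 task, 40 CPUs, 8G of memory, a 5 hour limit); the job name, the choice of the skylake module and the exact gmx options are assumptions and should be adapted to your own simulation.

```shell
# Write an example job script; the quoted 'EOF' keeps $SLURM_* variables literal.
cat > job_sketch.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=bilayer
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=8G
#SBATCH --time=05:00:00

# Assumption: the job runs on a 2019 (skylake) node.
module load gromacs/2019.6-skylake

# Preprocess the inputs into a binary run input file.
gmx grompp -f bilayer.mdp -c bilayer.gro -p bilayer.top -o bilayer.tpr

# Run the simulation with one OpenMP thread per allocated CPU.
gmx mdrun -ntmpi 1 -ntomp "$SLURM_CPUS_PER_TASK" -s bilayer.tpr
EOF

echo "wrote job_sketch.slurm"
```

With these options mdrun writes its output under the default file names (md.log, ener.edr, state.cpt and so on).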


At this point you can submit your job to ARC's scheduler (SLURM).

$ sbatch job.slurm 

Submitted batch job 5570681

You can check the status of the job using the job ID from the confirmation message above.

$ squeue -j 5570681

JOBID     USER     STATE   TIME_LIMIT  TIME  NODES TASKS CPUS  MIN_MEMORY  NODELIST            
5570681   drozmano RUNNING    5:00:00  0:15      1     1   40          8G       fc1                 

The squeue command output may look different in your case depending on your settings.


After the job has finished and no longer shows in the squeue output, we can check the results.

$ ls -l

-rw-r--r-- 1 drozmano drozmano 7224622 Jan 29  2014 bilayer.gro
-rw-r--r-- 1 drozmano drozmano    2567 May 21 12:14 bilayer.mdp
-rw-r--r-- 1 drozmano drozmano      87 Apr 29  2016 bilayer.top
-rw-r--r-- 1 drozmano drozmano 2594772 May 21 14:37 bilayer.tpr
-rw-r--r-- 1 drozmano drozmano 7224622 May 21 14:37 confout.gro
-rw-r--r-- 1 drozmano drozmano    1772 May 21 14:37 ener.edr
drwxr-xr-x 1 drozmano drozmano     200 May 20 16:27 ff
-rw-r--r-- 1 drozmano drozmano     504 May 20 16:30 ff.top
-rwxr-xr-x 1 drozmano drozmano    1172 May 21 14:36 job.slurm
-rw-r--r-- 1 drozmano drozmano   82051 May 21 14:37 md.log
-rw-r--r-- 1 drozmano drozmano   10949 May 21 14:37 mdout.mdp
-rw-r--r-- 1 drozmano drozmano   13162 May 21 14:37 slurm-5570681.out
-rw-r--r-- 1 drozmano drozmano 2514832 May 21 14:37 state.cpt
-rw-r--r-- 1 drozmano drozmano 4309444 May 21 14:37 traj_comp.xtc

The new files here are:

  • bilayer.tpr -- a binary run input file generated by gmx grompp before the actual simulation.
This file can be generated before submitting the job, if needed. Here it is created in the job script.
  • confout.gro -- a configuration file containing the final atomic positions at the end of the simulation.
  • ener.edr -- a file with energy data. Can be used for later analysis.
  • md.log -- the main log file for the simulation.
  • mdout.mdp -- a file containing all simulation parameters as used by GROMACS. Based on bilayer.mdp.
  • state.cpt -- a binary checkpoint file containing the system state at the end of the simulation.
This file can be used to continue simulations further.
  • traj_comp.xtc -- a trajectory file containing atomic positions at some time points during the simulation.
  • slurm-5570681.out -- the output that would otherwise be printed on screen during the simulation.
It is captured by SLURM for you. The number in the name is the job ID of the job.
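
To resume a run that stopped before finishing (for example, at the time limit), the checkpoint file can be passed to gmx mdrun with the -cpi option. The following is a small sketch rather than a complete recipe: restart_args is a hypothetical helper, and the -maxh value of 4.5 hours is an assumption chosen to fit under the 5 hour limit of the example job.

```shell
# Hypothetical helper: print "-cpi <file>" only if the checkpoint exists,
# so the same command works for a fresh start and for a restart.
restart_args() {
    if [ -f "$1" ]; then
        echo "-cpi $1"
    fi
}

# Build the command line; -maxh asks mdrun to stop cleanly before the limit.
mdrun_cmd="gmx mdrun -s bilayer.tpr $(restart_args state.cpt) -maxh 4.5"
echo "$mdrun_cmd"
```

The command is only printed here; in a job script it would be executed after loading a GROMACS module.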


If something is not working the way you expected, then the slurm-5570681.out file is the first place you should examine.

The md.log is the next thing to look into.

If everything is as expected, then your computation is done and you can use the results. Success!

You may want to check the output file and the main log anyway:

# Press "q" to exit the text viewer.
$ less slurm-5570681.out

$ less md.log
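
For a quicker check than paging through both files, the two logs can be searched for the lines that matter. The sketch below defines check_run, a hypothetical helper: it reports a problem if the SLURM output mentions an error (GROMACS reports failures with a "Fatal error:" message) and otherwise prints the "Performance:" line that mdrun writes near the end of md.log after a completed run.

```shell
# Hypothetical helper: $1 is the SLURM output file, $2 is the mdrun log.
check_run() {
    if grep -qi 'error' "$1"; then
        echo "errors found in $1"
        return 1
    fi
    # A finished run ends with a performance summary in the log.
    grep 'Performance:' "$2" || echo "no performance line in $2 (run may be incomplete)"
}
```

For the job above it would be called as check_run slurm-5570681.out md.log.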

Misc

Performance

Performance measurements for GROMACS 2019.6 on the 2019 compute nodes using different parallelization options and numbers of CPUs.

The bilayer512 system of ~105000 atoms was simulated for 100000 steps.

----------------------------------------------------------------------
#CPUs    Node   Processes    Threads    Wall Time    Performance
                 per node   per proc          (s)       (ns/day)
----------------------------------------------------------------------
    1       1           1          1       1031.6           0.84

   10       1           1         10        119.0           7.26
   20       1           1         20         65.9          13.11
   40       1           1         40         37.6          22.97

   10       1          10          1        126.4           6.83
   20       1          20          1         78.0          11.08
   40       1          40          1         45.3          19.07

   40       1          20          2         48.3          17.90
   40       1          10          4         43.9          19.68
   36       1           6          6         49.4          17.50

   80       2          10          4         30.5          28.34
   80       2          20          2         32.6          26.53
   80       2          40          1         30.3          28.49

  120       3          40          1         20.8          41.51
  160       4          40          1         19.2          44.95
----------------------------------------------------------------------
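
The table is easier to interpret as speedup and parallel efficiency relative to the single-CPU run. The snippet below is a small worked example that uses four rows copied from the table (1, 40, 80 and 160 CPUs, taking the 1 thread per process layout for the multi-node rows); speedup is the performance divided by the 1-CPU performance, and efficiency is the speedup divided by the number of CPUs.

```shell
# Input columns: number of CPUs, performance in ns/day (from the table above).
report=$(awk 'NR == 1 { base = $2 }
{ printf "%4d CPUs: speedup %6.2f, efficiency %3.0f%%\n", $1, $2 / base, 100 * $2 / (base * $1) }' <<'EOF'
1 0.84
40 22.97
80 28.49
160 44.95
EOF
)
echo "$report"
```

For this system the efficiency falls from about 68% on one full node to about 33% on four nodes, so adding nodes shortens the wall time but uses each CPU less effectively.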

Selected GROMACS commands

gmx

SYNOPSIS

gmx [-[no]h] [-[no]quiet] [-[no]version] [-[no]copyright] [-nice <int>]
    [-[no]backup]

OPTIONS

Other options:

 -[no]h                     (no)
           Print help and quit
 -[no]quiet                 (no)
           Do not print common startup info or quotes
 -[no]version               (no)
           Print extended version information and quit
 -[no]copyright             (yes)
           Print copyright information on startup
 -nice   <int>              (19)
           Set the nicelevel (default depends on command)
 -[no]backup                (yes)
           Write backups if output files exist

Additional help is available on the following topics:
    commands    List of available commands
    selections  Selection syntax and usage
To access the help, use 'gmx help <topic>'.
For help on a command, use 'gmx help <command>'.

gmx grompp

Preprocess input files.

$ gmx help grompp

SYNOPSIS

gmx grompp [-f [<.mdp>]] [-c [<.gro/.g96/...>]] [-r [<.gro/.g96/...>]]
           [-rb [<.gro/.g96/...>]] [-n [<.ndx>]] [-p [<.top>]]
           [-t [<.trr/.cpt/...>]] [-e [<.edr>]] [-ref [<.trr/.cpt/...>]]
           [-po [<.mdp>]] [-pp [<.top>]] [-o [<.tpr>]] [-imd [<.gro>]]
           [-[no]v] [-time <real>] [-[no]rmvsbds] [-maxwarn <int>]
           [-[no]zero] [-[no]renum]

OPTIONS

Options to specify input files:

 -f      [<.mdp>]           (grompp.mdp)
           grompp input file with MD parameters
 -c      [<.gro/.g96/...>]  (conf.gro)
           Structure file: gro g96 pdb brk ent esp tpr
 -r      [<.gro/.g96/...>]  (restraint.gro)  (Opt.)
           Structure file: gro g96 pdb brk ent esp tpr
 -rb     [<.gro/.g96/...>]  (restraint.gro)  (Opt.)
           Structure file: gro g96 pdb brk ent esp tpr
 -n      [<.ndx>]           (index.ndx)      (Opt.)
           Index file
 -p      [<.top>]           (topol.top)
           Topology file
 -t      [<.trr/.cpt/...>]  (traj.trr)       (Opt.)
           Full precision trajectory: trr cpt tng
 -e      [<.edr>]           (ener.edr)       (Opt.)
           Energy file

Options to specify input/output files:

 -ref    [<.trr/.cpt/...>]  (rotref.trr)     (Opt.)
           Full precision trajectory: trr cpt tng

Options to specify output files:

 -po     [<.mdp>]           (mdout.mdp)
           grompp input file with MD parameters
 -pp     [<.top>]           (processed.top)  (Opt.)
           Topology file
 -o      [<.tpr>]           (topol.tpr)
           Portable xdr run input file
 -imd    [<.gro>]           (imdgroup.gro)   (Opt.)
           Coordinate file in Gromos-87 format

Other options:

 -[no]v                     (no)
           Be loud and noisy
 -time   <real>             (-1)
           Take frame at or first after this time.
 -[no]rmvsbds               (yes)
           Remove constant bonded interactions with virtual sites
 -maxwarn <int>             (0)
           Number of allowed warnings during input processing. Not for normal
           use and may generate unstable systems
 -[no]zero                  (no)
           Set parameters for bonded interactions without defaults to zero
           instead of generating an error
 -[no]renum                 (yes)
           Renumber atomtypes and minimize number of atomtypes
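
As an illustration of the options above, the bilayer example from this page could be preprocessed as follows. The file names are the ones from the example directory; missing_inputs is a hypothetical helper added here so that a missing file is reported before grompp is called, and the check for gmx guards against running without a loaded module.

```shell
# Hypothetical helper: print every argument that is not an existing file.
missing_inputs() {
    for f in "$@"; do
        [ -f "$f" ] || echo "$f"
    done
}

missing=$(missing_inputs bilayer.mdp bilayer.gro bilayer.top)
if [ -n "$missing" ]; then
    echo "missing input files:" $missing >&2
elif ! command -v gmx >/dev/null 2>&1; then
    echo "gmx not found: load a GROMACS module first" >&2
else
    # -f parameters, -c structure, -p topology, -o run input file to write
    gmx grompp -f bilayer.mdp -c bilayer.gro -p bilayer.top -o bilayer.tpr
fi
```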

gmx mdrun

gmx mdrun is the main computational chemistry engine within GROMACS. It performs Molecular Dynamics simulations, but it can also perform Stochastic Dynamics, Energy Minimization, test particle insertion or (re)calculation of energies. Normal mode analysis is another option.

SYNOPSIS

gmx mdrun [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]] [-tablep [<.xvg>]]
          [-tableb [<.xvg> [...]]] [-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]]
          [-multidir [<dir> [...]]] [-awh [<.xvg>]] [-membed [<.dat>]]
          [-mp [<.top>]] [-mn [<.ndx>]] [-o [<.trr/.cpt/...>]]
          [-x [<.xtc/.tng>]] [-cpo [<.cpt>]] [-c [<.gro/.g96/...>]]
          [-e [<.edr>]] [-g [<.log>]] [-dhdl [<.xvg>]] [-field [<.xvg>]]
          [-tpi [<.xvg>]] [-tpid [<.xvg>]] [-eo [<.xvg>]] [-devout [<.xvg>]]
          [-runav [<.xvg>]] [-px [<.xvg>]] [-pf [<.xvg>]] [-ro [<.xvg>]]
          [-ra [<.log>]] [-rs [<.log>]] [-rt [<.log>]] [-mtx [<.mtx>]]
          [-if [<.xvg>]] [-swap [<.xvg>]] [-deffnm <string>] [-xvg <enum>]
          [-dd <vector>] [-ddorder <enum>] [-npme <int>] [-nt <int>]
          [-ntmpi <int>] [-ntomp <int>] [-ntomp_pme <int>] [-pin <enum>]
          [-pinoffset <int>] [-pinstride <int>] [-gpu_id <string>]
          [-gputasks <string>] [-[no]ddcheck] [-rdd <real>] [-rcon <real>]
          [-dlb <enum>] [-dds <real>] [-gcom <int>] [-nb <enum>]
          [-nstlist <int>] [-[no]tunepme] [-pme <enum>] [-pmefft <enum>]
          [-bonded <enum>] [-[no]v] [-pforce <real>] [-[no]reprod]
          [-cpt <real>] [-[no]cpnum] [-[no]append] [-nsteps <int>]
          [-maxh <real>] [-replex <int>] [-nex <int>] [-reseed <int>]

OPTIONS

Options to specify input files:

 -s      [<.tpr>]           (topol.tpr)
           Portable xdr run input file
 -cpi    [<.cpt>]           (state.cpt)      (Opt.)
           Checkpoint file
 -table  [<.xvg>]           (table.xvg)      (Opt.)
           xvgr/xmgr file
 -tablep [<.xvg>]           (tablep.xvg)     (Opt.)
           xvgr/xmgr file
 -tableb [<.xvg> [...]]     (table.xvg)      (Opt.)
           xvgr/xmgr file
 -rerun  [<.xtc/.trr/...>]  (rerun.xtc)      (Opt.)
           Trajectory: xtc trr cpt gro g96 pdb tng
 -ei     [<.edi>]           (sam.edi)        (Opt.)
           ED sampling input
 -multidir [<dir> [...]]    (rundir)         (Opt.)
           Run directory
 -awh    [<.xvg>]           (awhinit.xvg)    (Opt.)
           xvgr/xmgr file
 -membed [<.dat>]           (membed.dat)     (Opt.)
           Generic data file
 -mp     [<.top>]           (membed.top)     (Opt.)
           Topology file
 -mn     [<.ndx>]           (membed.ndx)     (Opt.)
           Index file

Options to specify output files:

 -o      [<.trr/.cpt/...>]  (traj.trr)
           Full precision trajectory: trr cpt tng
 -x      [<.xtc/.tng>]      (traj_comp.xtc)  (Opt.)
           Compressed trajectory (tng format or portable xdr format)
 -cpo    [<.cpt>]           (state.cpt)      (Opt.)
           Checkpoint file
 -c      [<.gro/.g96/...>]  (confout.gro)
           Structure file: gro g96 pdb brk ent esp
 -e      [<.edr>]           (ener.edr)
           Energy file
 -g      [<.log>]           (md.log)
           Log file
 -dhdl   [<.xvg>]           (dhdl.xvg)       (Opt.)
           xvgr/xmgr file
 -field  [<.xvg>]           (field.xvg)      (Opt.)
           xvgr/xmgr file
 -tpi    [<.xvg>]           (tpi.xvg)        (Opt.)
           xvgr/xmgr file
 -tpid   [<.xvg>]           (tpidist.xvg)    (Opt.)
           xvgr/xmgr file
 -eo     [<.xvg>]           (edsam.xvg)      (Opt.)
           xvgr/xmgr file
 -devout [<.xvg>]           (deviatie.xvg)   (Opt.)
           xvgr/xmgr file
 -runav  [<.xvg>]           (runaver.xvg)    (Opt.)
           xvgr/xmgr file
 -px     [<.xvg>]           (pullx.xvg)      (Opt.)
           xvgr/xmgr file
 -pf     [<.xvg>]           (pullf.xvg)      (Opt.)
           xvgr/xmgr file
 -ro     [<.xvg>]           (rotation.xvg)   (Opt.)
           xvgr/xmgr file
 -ra     [<.log>]           (rotangles.log)  (Opt.)
           Log file
 -rs     [<.log>]           (rotslabs.log)   (Opt.)
           Log file
 -rt     [<.log>]           (rottorque.log)  (Opt.)
           Log file
 -mtx    [<.mtx>]           (nm.mtx)         (Opt.)
           Hessian matrix
 -if     [<.xvg>]           (imdforces.xvg)  (Opt.)
           xvgr/xmgr file
 -swap   [<.xvg>]           (swapions.xvg)   (Opt.)
           xvgr/xmgr file

Other options:

 -deffnm <string>
           Set the default filename for all file options
 -xvg    <enum>             (xmgrace)
           xvg plot formatting: xmgrace, xmgr, none
 -dd     <vector>           (0 0 0)
           Domain decomposition grid, 0 is optimize
 -ddorder <enum>            (interleave)
           DD rank order: interleave, pp_pme, cartesian
 -npme   <int>              (-1)
           Number of separate ranks to be used for PME, -1 is guess
 -nt     <int>              (0)
           Total number of threads to start (0 is guess)
 -ntmpi  <int>              (0)
           Number of thread-MPI ranks to start (0 is guess)
 -ntomp  <int>              (0)
           Number of OpenMP threads per MPI rank to start (0 is guess)
 -ntomp_pme <int>           (0)
           Number of OpenMP threads per MPI rank to start (0 is -ntomp)
 -pin    <enum>             (auto)
           Whether mdrun should try to set thread affinities: auto, on, off
 -pinoffset <int>           (0)
           The lowest logical core number to which mdrun should pin the first
           thread
 -pinstride <int>           (0)
           Pinning distance in logical cores for threads, use 0 to minimize
           the number of threads per physical core
 -gpu_id <string>
           List of unique GPU device IDs available to use
 -gputasks <string>
           List of GPU device IDs, mapping each PP task on each node to a
           device
 -[no]ddcheck               (yes)
           Check for all bonded interactions with DD
 -rdd    <real>             (0)
           The maximum distance for bonded interactions with DD (nm), 0 is
           determine from initial coordinates
 -rcon   <real>             (0)
           Maximum distance for P-LINCS (nm), 0 is estimate
 -dlb    <enum>             (auto)
           Dynamic load balancing (with DD): auto, no, yes
 -dds    <real>             (0.8)
           Fraction in (0,1) by whose reciprocal the initial DD cell size will
           be increased in order to provide a margin in which dynamic load
           balancing can act while preserving the minimum cell size.
 -gcom   <int>              (-1)
           Global communication frequency
 -nb     <enum>             (auto)
           Calculate non-bonded interactions on: auto, cpu, gpu
 -nstlist <int>             (0)
           Set nstlist when using a Verlet buffer tolerance (0 is guess)
 -[no]tunepme               (yes)
           Optimize PME load between PP/PME ranks or GPU/CPU (only with the
           Verlet cut-off scheme)
 -pme    <enum>             (auto)
           Perform PME calculations on: auto, cpu, gpu
 -pmefft <enum>             (auto)
           Perform PME FFT calculations on: auto, cpu, gpu
 -bonded <enum>             (auto)
           Perform bonded calculations on: auto, cpu, gpu
 -[no]v                     (no)
           Be loud and noisy
 -pforce <real>             (-1)
           Print all forces larger than this (kJ/mol nm)
 -[no]reprod                (no)
           Try to avoid optimizations that affect binary reproducibility
 -cpt    <real>             (15)
           Checkpoint interval (minutes)
 -[no]cpnum                 (no)
           Keep and number checkpoint files
 -[no]append                (yes)
           Append to previous output files when continuing from checkpoint
           instead of adding the simulation part number to all file names
 -nsteps <int>              (-2)
           Run this number of steps (-1 means infinite, -2 means use mdp
           option, smaller is invalid)
 -maxh   <real>             (-1)
           Terminate after 0.99 times this time (hours)
 -replex <int>              (0)
           Attempt replica exchange periodically with this period (steps)
 -nex    <int>              (0)
           Number of random exchanges to carry out each exchange interval (N^3
           is one suggestion).  -nex zero or not specified gives neighbor
           replica exchange.
 -reseed <int>              (-1)
           Seed for replica exchange, -1 is generate a seed