Gaussian on ARC

Introduction

Gaussian is a commercial software package for electronic structure modelling. The University of Calgary has acquired a site license for the Linux source code for Gaussian 16 and the TCP Linda 9 software that allows for parallel execution of Gaussian 16 on multiple compute nodes.

We are also licensed for the Microsoft Windows version of the graphical pre- and post-processing program GaussView 6. Note, however, that we do not have a Linux version of the software, so GaussView cannot be run on ARC. If you use a Microsoft Windows desktop or laptop computer and have been granted access to the software after agreeing to the license conditions, GaussView 6 can be installed as described below.

Here we concentrate on using Gaussian 16 on ARC, but the software can also be installed on other Linux-based machines at the University of Calgary.

General information

  • g16 command line options:
http://gaussian.com/options/
  • Google search on Gaussian input file format:
https://www.google.com/search?client=firefox-b-e&q=gaussian+input+file+format


  • Compute Canada Gaussian Errors article:
https://docs.computecanada.ca/wiki/Gaussian_error_messages

Licensing and access

Although the University of Calgary has a Gaussian 16 site license, access to the software is only made available to researchers who confirm that they will abide by the conditions of the license agreement. The license agreement can be downloaded from

/global/software/gaussian/20190311_Gaussian_License_Updated-Calgary-G16_GVW6_Linda.pdf 

on ARC.

If you would like access to run Gaussian 16 on ARC or to use GaussView 6 on a Microsoft Windows computer located at the University of Calgary, please send an email to support@hpc.ucalgary.ca with a subject line of the form: Gaussian access request (your_ARC_user_name). The body of the email should include a copy of the following statement:

    ------------------------------------------
I have read the license agreement
20190311_Gaussian_License_Updated-Calgary-G16_GVW6_Linda.pdf
in its entirety and agree to abide by the conditions set forth in that document.
These include, in part, that:

  - I will not use the Gaussian software to compete with Gaussian Inc. or
    provide assistance to its competitors.

  - I will not copy the Gaussian 16 or Linda software, nor make it
    available to anyone else.

  - I will only copy the GaussViewW Version 6 software to a computer
    under my control and will remove it when I leave the University of
    Calgary.  I will not make the GaussView software available to anyone
    else.

  - I will acknowledge Gaussian Inc., as described in section 10 of the
    agreement, in publications based on results obtained from using the
    Gaussian software.

  - I will notify Research Computing Services if there is any change
    that would void the agreement, such as leaving the University of
    Calgary or collaborating with a Gaussian competitor.

Signed,
   Your typed signature
--------------------------------------------------

After your email has been received and approved, your user name will be added to the g16 group on ARC, which is used to control access to the directory containing the software.

The software is installed under /global/software/gaussian .
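Once your request has been approved, a quick check of your group membership can confirm that access is in place. The following is a minimal sketch: the group name g16 comes from the text above, while the helper name in_group is hypothetical and used only for illustration.

```shell
#!/bin/bash
# Sketch: report whether a user belongs to a given Unix group.
# "in_group" is a hypothetical helper name, not an ARC command.
in_group() {
    local group="$1"
    local user="${2:-$(id -un)}"
    # id -nG prints the user's group names separated by spaces.
    id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

if in_group g16; then
    echo "You are in the g16 group."
else
    echo "You are not in the g16 group (yet)."
fi
```

On most Linux systems a newly added group only takes effect in a new login session, so you may need to log out and back in before the check succeeds.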

Installing GaussView 6.0 for Windows

The licensing terms for the GaussView 6.0 software require that it be installed only on computers owned and controlled by the University of Calgary. If you have a Windows laptop or workstation that is centrally managed by the UofC IT department, you can install GaussView on it yourself using the Software Centre on the computer. Look for GaussView 6.0 in the Software Centre.

Using Gaussian 16 on ARC

Running Gaussian batch jobs

Researchers using Gaussian on ARC are expected to be generally familiar with Gaussian capabilities, input file format and the use of checkpoint files.

Like other calculations on ARC systems, Gaussian is run by submitting an appropriate script for batch scheduling using the sbatch command. For more information about submitting jobs, see the Running jobs article.

Sample scripts for running Gaussian 16 on ARC will be placed under /global/software/gaussian once testing of the software is complete.

Gaussian 16 modules

Currently there are two software modules on ARC that provide Gaussian 16. You can see them using the module avail command:

$ module avail
...

$ module avail Gaussian

-------------- /global/software/Modules/3.2.10/modulefiles ----------------
Gaussian16/b01-nehalem 
Gaussian16/b01-skylake

There are two kinds of compute nodes in ARC. The newer nodes, with Intel Skylake CPUs, are in the cpu2019, apophis-bf, razi-bf, and pawson-bf partitions. The older legacy nodes have Intel Nehalem CPUs and are in the lattice and parallel partitions.

Gaussian on ARC was compiled separately for each of these two types of Intel CPUs to provide maximum performance on both. So, the Gaussian16/b01-nehalem module has to be loaded when submitting a job to the legacy partitions, and the Gaussian16/b01-skylake module should be used when the job is sent to the newer partitions. When no partition is specified, the job goes to the default partitions, which have the newer CPUs.


Once the module is loaded it provides access to the g16 executable program.

$ module load Gaussian16/b01-nehalem

The module, however, does not need to be loaded on the login node. It has to be loaded on the compute node that is going to work on your computation, so the best place to load the module is the job script for your computation.
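The partition-to-module rule described above can be sketched as a small shell function. This is an illustration only: the function name gaussian_module_for_partition is hypothetical, the partition lists are copied from the text above, and partitions not named there are deliberately rejected rather than guessed. SLURM_JOB_PARTITION is the variable SLURM sets inside a running job; the cpu2019 fallback is just a stand-in so the snippet can be tried outside a job.

```shell
#!/bin/bash
# Sketch: choose the Gaussian module from the partition name.
# "gaussian_module_for_partition" is a hypothetical helper name.
gaussian_module_for_partition() {
    case "$1" in
        cpu2019|apophis-bf|razi-bf|pawson-bf)
            echo "Gaussian16/b01-skylake" ;;
        lattice|parallel)
            echo "Gaussian16/b01-nehalem" ;;
        *)
            echo "unknown partition: $1" >&2
            return 1 ;;
    esac
}

# Inside a job SLURM sets SLURM_JOB_PARTITION; cpu2019 is only a stand-in here.
partition="${SLURM_JOB_PARTITION:-cpu2019}"
if module_name="$(gaussian_module_for_partition "$partition")"; then
    # Print the command instead of running it, so the sketch works anywhere.
    echo "module load $module_name"
fi
```

In a real job script the last step would be the module load command itself rather than an echo.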

Running a Gaussian Job

To run your computation on the ARC cluster you need two files: (1) a Gaussian input file (.com) and (2) a SLURM job script (.slurm). Put or prepare the files in a directory dedicated to the computation, dimer for example:

$ cd dimer

$ ls -l
-rw-r--r-- 1 drozmano drozmano    429 May  5 13:29 dimer.com
-rw-r--r-- 1 drozmano drozmano   1452 May  5 13:24 dimer.slurm

To submit the job, simply run

$ sbatch dimer.slurm
Submitted batch job 5527893

The number printed out during submission is the job ID of your job. It can be used to monitor the state of the job, like this:

$ squeue -j 5527893
JOBID      USER        STATE     PARTITION  TIME_LIMIT  TIME   NODES TASKS CPUS  MIN_MEMORY REASON      NODELIST
5527893    drozmano    RUNNING   apophis-bf 1:00:00     0:03   1     1     40    180G       None        fc6

The output of the squeue command may look different for you, depending on your settings.

You can also get more information about the state of the job with

$ arc.job-info 5527893
....

You will have to replace the number with the actual job ID of your job.
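If you prefer not to copy the job ID by hand, it can be extracted from the message that sbatch prints. The sketch below works on a stand-in copy of that message so it can be tried anywhere; the helper name job_id_from_sbatch_output is hypothetical.

```shell
#!/bin/bash
# Sketch: pull the numeric job ID out of sbatch's confirmation message.
# "job_id_from_sbatch_output" is a hypothetical helper name.
job_id_from_sbatch_output() {
    # The message has the form "Submitted batch job <id>"; the ID is field 4.
    awk '/^Submitted batch job/ { print $4 }'
}

# Stand-in for the real submission:  sbatch dimer.slurm
sbatch_message="Submitted batch job 5527893"
jobid="$(printf '%s\n' "$sbatch_message" | job_id_from_sbatch_output)"

# Print the monitoring commands that would be run with this ID.
echo "squeue -j $jobid"
echo "arc.job-info $jobid"
```

On the cluster, the stand-in line would be replaced by piping the output of the actual sbatch command into the function. sbatch also has a --parsable option that prints the job ID without the surrounding text.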

Input files

The Gaussian input file is a text file describing the geometry of your system as well as specifications of the computation you are going to perform. You have to consult the Gaussian manual and tutorials to create it.

Below is a geometry optimization run for a water dimer, dimer.com:

%Chk=dimer.chk
#p b3lyp/6-31+G(d,p) opt=(Z-Matrix) iop(1/7=30) int=ultrafine EmpiricalDispersion=GD3

water dimer, B3LYP-D3/6-31+G(d,p) opt tight, Cs, int=ultrafine

0 1
O1
H2  1  r2
H3  1  r3  2  a3
X4  2  1.0  1  90.0  3  180.0
O5  2  r5  4  a5  1  180.0
H6  5  r6  2  a6  4  d6
H7  5  r6  2  a6  4  -d6

r2=0.9732       
r3=0.9641      
r5=1.9128    
r6=0.9659   
a3=105.9    
a5=83.1         
a6=112.1       
d6=59.6        

The dimer.com input file provides instructions for the Gaussian16 software.
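Gaussian expects the input file to end with a blank line, and a missing one is easy to overlook when editing the file by hand. The following sketch checks for it before you submit; the function name ends_with_blank_line is hypothetical, and the demonstration uses a temporary file rather than your real input.

```shell
#!/bin/bash
# Sketch: verify that a Gaussian input file ends with a blank line.
# "ends_with_blank_line" is a hypothetical helper name.
ends_with_blank_line() {
    # The file must be non-empty and its last line must contain only whitespace.
    [ -s "$1" ] && [ -z "$(tail -n 1 "$1" | tr -d '[:space:]')" ]
}

# Demonstration on a temporary file that mimics the end of dimer.com.
demo="$(mktemp)"
printf 'a6=112.1\nd6=59.6\n\n' > "$demo"
if ends_with_blank_line "$demo"; then
    echo "input ends with a blank line"
else
    echo "input is missing the final blank line"
fi
rm -f "$demo"
```

To check a real input file, call the function with its name, for example dimer.com.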

Next, you need to create a job script for SLURM, the ARC cluster's job manager. The script specifies

  1. what resources the computation needs and
  2. what commands to run to carry it out.

An example job script that can be used as a template is shown below.

The SLURM job script, dimer.slurm:

#! /bin/bash
# ========================================================================
#SBATCH --job-name=g16_test

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=180gb
#SBATCH --time=0-01:00:00

# You only need to specify the partition when you want to run the job
# on a non-default partition, such as
# a legacy one: lattice, parallel, cpu2013
# or the big memory partition: bigmem,
# with
#    #SBATCH --partition=parallel

# Note that you have to adjust
# the memory request, #CPUs, and time accordingly.

# ========================================================================
# Load the Gaussian build that matches the CPU type of the compute node:
# nodes whose CPUs report avx512 flags get the Skylake build.
if grep -q "avx512" /proc/cpuinfo; then
        module load Gaussian16/b01-skylake
else
        module load Gaussian16/b01-nehalem
fi

NCPUS=$SLURM_CPUS_PER_TASK

# Pass the CPU list, memory limit and scratch directory to Gaussian.
export GAUSS_CDEF="0-$((NCPUS - 1))"
export GAUSS_MDEF="${SLURM_MEM_PER_NODE}MB"
export GAUSS_SCRDIR="/scratch/$SLURM_JOBID"

echo "=================================================="
echo "       g16 binary: `which g16`"
echo "Memory definition: $GAUSS_MDEF"
echo "   Number of CPUs: $NCPUS"
echo "  CPUs definition: $GAUSS_CDEF"
echo "Scratch directory: $GAUSS_SCRDIR"
echo "=================================================="

# ========================================================================
# Run Gaussian here

g16 dimer.com
# ========================================================================

Generally, job scripts do not have to be that long and elaborate. This specific script does extra work for you:

  1. Determines whether the job runs on a newer or an older node and loads the proper version of Gaussian for it.
  2. Sets the number of CPUs, amount of memory and the scratch directory for Gaussian based on the resource request from SLURM.
  3. Prints out the accepted settings for the run.

and then it runs Gaussian as the last command.
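To make the second step concrete, here is a small sketch of how the Gaussian settings follow from the resource request. The function names gauss_cdef and gauss_mdef are hypothetical. The value 184320 assumes that SLURM reports the 180gb request of the example as 180 x 1024 megabytes in SLURM_MEM_PER_NODE; check the output of your own job if the exact number matters.

```shell
#!/bin/bash
# Sketch: derive the Gaussian settings from the SLURM resource request.
# "gauss_cdef" and "gauss_mdef" are hypothetical helper names.

# CPU list for GAUSS_CDEF: cores 0 .. N-1 for a request of N CPUs.
gauss_cdef() {
    echo "0-$(( $1 - 1 ))"
}

# Memory limit for GAUSS_MDEF: SLURM reports memory in MB, so append the unit.
gauss_mdef() {
    echo "${1}MB"
}

# Fallbacks mirror the example request (--cpus-per-task=40, --mem=180gb)
# so that the sketch also runs outside of a SLURM job.
NCPUS="${SLURM_CPUS_PER_TASK:-40}"
MEM_MB="${SLURM_MEM_PER_NODE:-184320}"

echo "GAUSS_CDEF=$(gauss_cdef "$NCPUS")"
echo "GAUSS_MDEF=$(gauss_mdef "$MEM_MB")"
```

If you change --cpus-per-task or --mem in the job script, the values passed to Gaussian change with them, so the input file does not need to be edited.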

Output files

The run will produce three more files: dimer.chk, dimer.log, and slurm-5527893.out.

$ ls -l
-rw-r--r-- 1 drozmano drozmano 983040 May  6 10:35 dimer.chk
-rw-r--r-- 1 drozmano drozmano    429 May  5 13:29 dimer.com
-rw-r--r-- 1 drozmano drozmano 119739 May  6 10:35 dimer.log
-rw-r--r-- 1 drozmano drozmano   1452 May  5 13:24 dimer.slurm
-rw-r--r-- 1 drozmano drozmano    280 May  6 10:34 slurm-5527893.out

The .chk file is the Gaussian checkpoint file that we requested in the .com input file. It can be used to restart or continue the computation. The Gaussian output goes into the .log file. Look into the .log file for your results.

The slurm-...out file captures what would otherwise be printed on the screen while the job runs. A batch job has no screen, so this output is saved into the file instead. The number in the file name is the job ID, which is printed when you submit the job with the sbatch command.
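A quick way to see whether a run completed is to look for the "Normal termination of Gaussian" line that Gaussian writes at the end of a successful job. The sketch below wraps that check in a function; the name gaussian_finished_ok is hypothetical, and dimer.log is the log file from the example above.

```shell
#!/bin/bash
# Sketch: check a Gaussian log file for the normal termination message.
# "gaussian_finished_ok" is a hypothetical helper name.
gaussian_finished_ok() {
    # A completed run ends with a "Normal termination of Gaussian ..." line,
    # so it is enough to look at the last few lines of the log.
    tail -n 5 "$1" | grep -q "Normal termination of Gaussian"
}

log="dimer.log"
if [ ! -f "$log" ]; then
    echo "$log not found"
elif gaussian_finished_ok "$log"; then
    echo "$log: normal termination"
else
    echo "$log: no normal termination line, inspect the end of the file"
fi
```

A missing termination line does not say what went wrong; the last lines of the .log file and the slurm-...out file are the places to look next.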

Defining your own system

You can start by using these two example files on ARC to run your first job. If everything goes smoothly you can try defining your own system and changing the resource requests to match your specific case.

Gaussian 16 and GPUs

The number of GPUs and GPU nodes on ARC is relatively small and the benefit of using GPUs for Gaussian 16 installed on ARC does not seem significant enough to justify the use of the limited resources. Thus, we neither recommend nor support Gaussian jobs using GPUs at this moment.

Support

Please send any questions regarding using Gaussian on ARC to support@hpc.ucalgary.ca.