Gaussian on ARC
Introduction
Gaussian is a commercial software package for electronic structure modelling. The University of Calgary has acquired a site license for the Linux source code for Gaussian 16 and the TCP Linda 9 software that allows for parallel execution of Gaussian 16 on multiple compute nodes.
We are also licensed for the Microsoft Windows version of the graphical pre- and post-processing program GaussView 6. Note, however, that we do not have a Linux version of the software, so GaussView cannot be run on ARC. If you use a Microsoft Windows desktop or laptop computer and have been granted access to the software after agreeing to the license conditions, GaussView 6 can be downloaded from ARC, as mentioned below.
Here we concentrate on using Gaussian 16 on ARC, but the software can also be installed on other Linux-based machines at the University of Calgary.
General information
- g16 command line options:
- Linda:
- Google search on Gaussian input file format:
- Compute Canada Gaussian Errors article:
Licensing and access
Although the University of Calgary has a Gaussian 16 site license, access to the software is only made available to those researchers who confirm that they will abide by the conditions of a license agreement. The license agreement can be downloaded from
/global/software/gaussian/20190311_Gaussian_License_Updated-Calgary-G16_GVW6_Linda.pdf
on ARC.
If you would like access to run Gaussian 16 on ARC or to download GaussView 6 for use on a Microsoft Windows computer located at the University of Calgary, please send an email to support@hpc.ucalgary.ca with a subject line of the form: Gaussian access request (your_ARC_user_name). The body of the email should include a copy of the following statement:
------------------------------------------
I have read the license agreement 20190311_Gaussian_License_Updated-Calgary-G16_GVW6_Linda.pdf in its entirety and agree to abide by the conditions set forth in that document. These include, in part, that:
- I will not use the Gaussian software to compete with Gaussian Inc. or provide assistance to its competitors.
- I will not copy the Gaussian 16 or Linda software, nor make it available to anyone else.
- I will only copy the GaussViewW Version 6 software to a computer under my control and will remove it when I leave the University of Calgary. I will not make the GaussView software available to anyone else.
- I will acknowledge Gaussian Inc., as described in section 10 of the agreement, in publications based on results obtained from using the Gaussian software.
- I will notify Research Computing Services if there is any change that would void the agreement, such as leaving the University of Calgary or collaborating with a Gaussian competitor.
Signed,
Your typed signature
--------------------------------------------------
After your email has been received and approved, your user name will be added to the g16 group on ARC, which is used to control access to the directory containing the software.
Look under /global/software/gaussian .
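Once your request has been approved, you can confirm from a login node that your account is in the g16 group. The following is a minimal sketch, assuming the standard Linux id command; the helper name in_group is made up for this illustration and is not an ARC command. Note that a new group membership usually only shows up after you log out and back in.

```shell
#!/bin/bash
# in_group GROUP: succeed if the current user belongs to GROUP.
# (in_group is a hypothetical helper name used only in this sketch.)
in_group() {
    local group=$1 g
    for g in $(id -nG); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Report whether the Gaussian directory should be accessible to you.
if in_group g16; then
    echo "g16 group membership found"
else
    echo "not in the g16 group (yet)"
fi
```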
Installing GaussView 6.0 for Windows
The licensing terms for the GaussView 6.0 software require that it be installed only on computers owned and controlled by the University of Calgary. If you have a Windows laptop or workstation that is centrally managed by the UofC IT department, you can install GaussView on it yourself using the Software Centre on the computer. Look for GaussView 6.0 in the Software Centre.
Using Gaussian 16 on ARC
Running Gaussian batch jobs
Researchers using Gaussian on ARC are expected to be generally familiar with Gaussian capabilities, input file format and the use of checkpoint files.
Like other calculations on ARC systems, Gaussian is run by submitting an appropriate script for batch scheduling using the sbatch command. For more information about submitting jobs, see the Running jobs article.
Sample scripts for running Gaussian 16 on ARC will be supplied under /global/software/gaussian once testing of the software installed on ARC is complete.
Gaussian 16 modules
Currently there are two software modules on ARC that provide Gaussian16.
You can see them using the module avail command:
$ module avail
...

$ module avail Gaussian
-------------- /global/software/Modules/3.2.10/modulefiles ----------------
Gaussian16/b01-nehalem  Gaussian16/b01-skylake
There are two kinds of compute nodes in ARC. The newer nodes, with Intel Skylake CPUs, are in the cpu2019, apophis-bf, razi-bf, and pawson-bf partitions. The older legacy nodes have Intel Nehalem CPUs and are in the lattice and parallel partitions.
Gaussian on ARC was compiled separately for these two types of Intel CPUs to provide maximum performance on each. The Gaussian16/b01-nehalem module therefore has to be loaded when submitting a job to the legacy partitions, and the Gaussian16/b01-skylake module should be used when the job is sent to one of the newer partitions. When the partition is not specified, the job goes to the default partitions with newer CPUs.
Once the module is loaded it provides access to the g16 executable program.
$ module load Gaussian16/b01-nehalem
The module does not need to be loaded on the login node, however; it has to be loaded on the compute node that is going to work on your computation. The best place to load the module is the job script for your computation.
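One way to pick the matching module inside a job script is to map the partition name to the module name. The sketch below uses only the partition lists given above; the function name gaussian_module_for_partition is hypothetical, and any partition not named in this section is rejected rather than guessed.

```shell
#!/bin/bash
# gaussian_module_for_partition NAME: print the Gaussian module that matches
# the CPU type of the partition NAME, following the lists in this section.
# (Hypothetical helper; unknown partitions produce an error.)
gaussian_module_for_partition() {
    case "$1" in
        cpu2019|apophis-bf|razi-bf|pawson-bf) echo "Gaussian16/b01-skylake" ;;
        lattice|parallel)                     echo "Gaussian16/b01-nehalem" ;;
        *) echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

# Inside a job script, SLURM sets SLURM_JOB_PARTITION for the running job:
#   module load "$(gaussian_module_for_partition "$SLURM_JOB_PARTITION")"
```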
Running a Gaussian Job
To run your computation on the ARC cluster you need two files: (1) a Gaussian input file, .com, and (2) a SLURM job script, .slurm.
Put or prepare the files in a directory dedicated to the computation, dimer for example:
$ cd dimer
$ ls -l
-rw-r--r-- 1 drozmano drozmano  429 May  5 13:29 dimer.com
-rw-r--r-- 1 drozmano drozmano 1452 May  5 13:24 dimer.slurm
To submit the job, simply run:
$ sbatch dimer.slurm
Submitted batch job 5527893
The number printed during submission is the job ID of your job, which can be used to monitor its state, like this:
$ squeue -j 5527893
  JOBID     USER    STATE  PARTITION  TIME_LIMIT  TIME  NODES  TASKS  CPUS  MIN_MEMORY  REASON  NODELIST
5527893 drozmano  RUNNING apophis-bf     1:00:00  0:03      1      1    40        180G    None       fc6
The output of the squeue command may look different for you, depending on your settings.
You can also get more information about the state of the job with
$ arc.job-info 5527893
....
You will have to replace the number with the actual job ID of your job.
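If you prefer not to copy the job ID by hand, the submission message can be parsed and the state polled from a script. This is only a sketch: the function names extract_job_id and job_state are hypothetical, and it assumes that squeue prints nothing for a job that is no longer in the queue.

```shell
#!/bin/bash
# extract_job_id TEXT: print the last word of sbatch's
# "Submitted batch job <id>" message, which is the job ID.
extract_job_id() {
    local msg=$1
    echo "${msg##* }"
}

# job_state JOBID: print the job state reported by squeue, or NOT_IN_QUEUE
# when squeue prints nothing (for example, after the job has finished).
job_state() {
    local state
    state=$(squeue -h -j "$1" -o "%T" 2>/dev/null)
    echo "${state:-NOT_IN_QUEUE}"
}

# Usage on ARC (commented out so that the sketch can be sourced anywhere):
#   jobid=$(extract_job_id "$(sbatch dimer.slurm)")
#   while [ "$(job_state "$jobid")" != "NOT_IN_QUEUE" ]; do sleep 60; done
```

The -h option suppresses the squeue header and -o "%T" restricts the output to the job state, so the result is a single word that is easy to compare.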
Input files
The Gaussian input file is a text file describing the geometry of your system as well as the specifications of the computation you are going to perform. Consult the Gaussian manual and tutorials to create it.
Below is a geometry optimization run for a water dimer, dimer.com:
%Chk=dimer.chk
#p b3lyp/6-31+G(d,p) opt=(Z-Matrix) iop(1/7=30) int=ultrafine EmpiricalDispersion=GD3

water dimer, B3LYP-D3/6-31+G(d,p) opt tight, Cs, int=ultrafine

0 1
O1
H2  1  r2
H3  1  r3  2  a3
X4  2  1.0  1  90.0  3  180.0
O5  2  r5  4  a5  1  180.0
H6  5  r6  2  a6  4  d6
H7  5  r6  2  a6  4  -d6

r2=0.9732
r3=0.9641
r5=1.9128
r6=0.9659
a3=105.9
a5=83.1
a6=112.1
d6=59.6

The dimer.com input file provides instructions for the Gaussian16 software.
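Before submitting, it can help to run a quick structural check on the input file. The sketch below tests only two things: that there is a route line starting with # and that the file ends with a blank line, since blank lines terminate the sections of a Gaussian input. The helper name check_gaussian_input is hypothetical, and the check says nothing about whether the keywords or the geometry are valid.

```shell
#!/bin/bash
# check_gaussian_input FILE: basic structural checks on a Gaussian input file.
# (Hypothetical helper; it does not validate keywords or chemistry.)
check_gaussian_input() {
    local file=$1
    if [ ! -r "$file" ]; then
        echo "$file: cannot read file" >&2
        return 1
    fi
    # The route section starts with a line beginning with '#'.
    if ! grep -q '^[[:space:]]*#' "$file"; then
        echo "$file: no route line starting with '#'" >&2
        return 1
    fi
    # The last line should be empty or contain only whitespace.
    if [ -n "$(tail -n 1 "$file" | tr -d '[:space:]')" ]; then
        echo "$file: missing blank line at the end" >&2
        return 1
    fi
    echo "$file: looks structurally OK"
}

# Usage:
#   check_gaussian_input dimer.com && sbatch dimer.slurm
```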
Now you need to create a job script for SLURM, the ARC cluster's job manager, that specifies
- what resources the computation needs and
- how to run the computation.
An example job script that can be used as a template is shown below.
The SLURM job script, dimer.slurm:
#! /bin/bash
# ========================================================================
#SBATCH --job-name=g16_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=180gb
#SBATCH --time=0-01:00:00

# You only need to specify the partition when you want to run the job
# on a non-default partition, such as
#    a legacy one: lattice, parallel, cpu2013
#    or a big memory partition: bigmem.
# with
# #SBATCH --partition=parallel
# Note, that you have to adjust
# the memory request, #CPUs, and time accordingly.
# ========================================================================
skylake=true
if [[ `grep "avx512" /proc/cpuinfo` == "" ]]; then skylake=false; fi

if $skylake; then
    module load Gaussian16/b01-skylake
else
    module load Gaussian16/b01-nehalem
fi

NCPUS=$SLURM_CPUS_PER_TASK

export GAUSS_CDEF="0-`expr $NCPUS - 1`"
export GAUSS_MDEF="${SLURM_MEM_PER_NODE}MB"
export GAUSS_SCRDIR="/scratch/$SLURM_JOBID"

echo "=================================================="
echo "       g16 binary: `which g16`"
echo "Memory definition: $GAUSS_MDEF"
echo "   Number of CPUs: $NCPUS"
echo "  CPUs definition: $GAUSS_CDEF"
echo "Scratch directory: $GAUSS_SCRDIR"
echo "=================================================="
# ========================================================================
# Run Gaussian here

g16 dimer.com

# ========================================================================
Generally, job scripts do not have to be that long and elaborate. This specific script does extra work for you:
- Determines if the job runs on a newer or older node and loads the proper version of Gaussian for it.
- Sets the number of CPUs, amount of memory and the scratch directory for Gaussian based on the resource request from SLURM.
- Prints out the accepted settings for the run.
and then it runs Gaussian as the last command.
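The way the CPU list and the memory setting follow from the SLURM request can be sketched as two small functions. The names gauss_cpu_list and gauss_mem_mb are hypothetical. The optional percentage argument, which would leave some of the requested memory unused by Gaussian, is an assumption of this sketch and is not part of the job script above, which passes the full SLURM request.

```shell
#!/bin/bash
# gauss_cpu_list N: print the CPU list "0-(N-1)" in the form used for GAUSS_CDEF.
gauss_cpu_list() {
    local n=$1
    echo "0-$(( n - 1 ))"
}

# gauss_mem_mb TOTAL_MB [PERCENT]: print "<n>MB", where n is PERCENT
# (default 100) of TOTAL_MB, using integer arithmetic.
# A PERCENT below 100 is an illustrative option, not an ARC recommendation.
gauss_mem_mb() {
    local total=$1 percent=${2:-100}
    echo "$(( total * percent / 100 ))MB"
}

# In a job script (SLURM reports SLURM_MEM_PER_NODE in megabytes):
#   export GAUSS_CDEF=$(gauss_cpu_list "$SLURM_CPUS_PER_TASK")
#   export GAUSS_MDEF=$(gauss_mem_mb "$SLURM_MEM_PER_NODE")
```

For the request in the example script, 40 CPUs give the list 0-39, and 180gb corresponds to 184320 megabytes.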
Output files
The run will produce three more files: dimer.chk, dimer.log, and slurm-5527893.out.
$ ls -l
-rw-r--r-- 1 drozmano drozmano 983040 May  6 10:35 dimer.chk
-rw-r--r-- 1 drozmano drozmano    429 May  5 13:29 dimer.com
-rw-r--r-- 1 drozmano drozmano 119739 May  6 10:35 dimer.log
-rw-r--r-- 1 drozmano drozmano   1452 May  5 13:24 dimer.slurm
-rw-r--r-- 1 drozmano drozmano    280 May  6 10:34 slurm-5527893.out
The .chk file is the Gaussian checkpoint file that we requested in the input .com file; it can be used to restart or continue the computation. The Gaussian output goes into the .log file. Look into the .log file for your results.
The slurm-...out file captures what would have been printed on screen while the job was running. When the job runs there is no screen, so the output is captured and saved into this file. The number in the file name is the job ID of the job, which is printed out when you submit the job with the sbatch command.
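A quick way to see whether a run completed is to look at the end of the .log file, where Gaussian normally prints a line beginning with "Normal termination of Gaussian". The sketch below checks for that line and prints the last "SCF Done" line, which carries the most recent SCF energy. The function names g16_finished_ok and last_scf_energy are hypothetical, and checking only the last five lines is an assumption of this sketch.

```shell
#!/bin/bash
# g16_finished_ok LOG: succeed if one of the last five lines of LOG
# reports normal termination.
g16_finished_ok() {
    tail -n 5 "$1" 2>/dev/null | grep -q "Normal termination of Gaussian"
}

# last_scf_energy LOG: print the last "SCF Done" line of LOG. For a geometry
# optimization this is the energy of the last geometry that was computed.
last_scf_energy() {
    grep "SCF Done" "$1" | tail -n 1
}

# Usage:
#   g16_finished_ok dimer.log && last_scf_energy dimer.log
```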
Defining your own system
You can start by using these two example files on ARC to run your first job. If everything goes smoothly you can try defining your own system and changing the resource requests to match your specific case.
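One possible way to start a new computation from the two example files is to copy them into a fresh directory under a new name. In the sketch below, the function name new_job_from_template is hypothetical; it renames only the dimer.chk and dimer.com references, so the title line, the route section, and the geometry in the new .com file still have to be edited by hand.

```shell
#!/bin/bash
# new_job_from_template SRC_DIR NAME: create directory NAME containing
# NAME.com and NAME.slurm, copied from SRC_DIR/dimer.com and
# SRC_DIR/dimer.slurm with the file name references updated.
# (Hypothetical helper; NAME must be a plain word without '/' or spaces.)
new_job_from_template() {
    local src=$1 name=$2 ext
    mkdir -p "$name" || return 1
    for ext in com slurm; do
        sed -e "s/dimer\.chk/$name.chk/" -e "s/dimer\.com/$name.com/" \
            "$src/dimer.$ext" > "$name/$name.$ext" || return 1
    done
}

# Usage, run from the directory that contains the dimer directory:
#   new_job_from_template dimer trimer
```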
Running Gaussian across multiple nodes
Some types of computations can be spread over several compute nodes when using Gaussian16. To do this, Gaussian uses a custom add-on to the main Gaussian16 program, called Linda. Linda is a separate product that has to be purchased and licensed separately. The ARC cluster does have the Linda add-on installed and available for use.
To spread a computation across several nodes, Linda logs into the additional nodes using SSH and starts additional instances of Gaussian16 on them. The main instance of Gaussian16 on the first node assumes control of the additional instances. While the additional instances provide extra computational capacity, the time the main instance needs to send out the work and receive the results back can be so long that the multi-node computation ends up slower than a single-node computation. Before you commit to running a large number of multi-node jobs, make sure that the method you are going to use can run well on several nodes and test the performance benefit.
Gaussian computational options can be specified in any of 4 ways (in order of precedence): as a Link 0 command in the input .com file, as a command line option for the g16 program, as an environment variable, and as a directive in the Default.Route file. The examples below show how to control Linda runs using the command line option, so please make sure that you do not try to pass conflicting control information using the other methods at the same time. This may lead to unpredictable behaviour.
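As a rough illustration of the command line approach, the list of nodes allocated by SLURM can be turned into a comma-separated worker list. This is a sketch under stated assumptions: the helper name linda_worker_list is hypothetical, and the -w option for naming Linda workers should be confirmed against the g16 command line options documentation before use, since its exact syntax is not given in this article.

```shell
#!/bin/bash
# linda_worker_list: read one host name per line on standard input and
# print them as a single comma-separated list.
# (Hypothetical helper name used only in this sketch.)
linda_worker_list() {
    paste -sd, -
}

# In a multi-node job, scontrol can expand SLURM's compact node list into
# one host name per line (commented out, since it needs a SLURM allocation):
#   workers=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | linda_worker_list)
#   g16 -w="$workers" dimer.com   # assumed -w syntax, check the g16 documentation
```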
Gaussian 16 and GPUs
The number of GPUs and GPU nodes on ARC is relatively small and the benefit of using GPUs for Gaussian 16 installed on ARC does not seem significant enough to justify the use of the limited resources. Thus, we neither recommend nor support Gaussian jobs using GPUs at this moment.
Support
Please send any questions regarding using Gaussian on ARC to support@hpc.ucalgary.ca.