CESM
Latest revision as of 19:33, 18 October 2023
General
- National Center for Atmospheric Research (NCAR):
- Project site: https://ncar.ucar.edu/what-we-offer/models/community-earth-system-model-cesm
- QuickStart Manual for CESM2.1: https://escomp.github.io/CESM/release-cesm2/index.html
The Community Earth System Model (CESM) is a fully coupled global climate model developed in collaboration with colleagues in the research community. CESM provides state-of-the-art computer simulations of Earth's past, present, and future climate states.
CESM2 is built on the CIME framework.
The majority of the CESM2 User’s Guide is contained in the CIME documentation.
- CIME: https://github.com/ESMCI/cime
- CIME Manual: https://esmci.github.io/cime/versions/master/html/index.html
The Common Infrastructure for Modeling the Earth (CIME, pronounced “SEAM”) provides a Case Control System for configuring, compiling and executing Earth system models, data and stub model components, a driver and associated tools and libraries.
CESM on ARC
Currently, two versions of CESM are installed on ARC, but only one of them works and is supported: version 2.1.3.
CESM is installed and set up to be used through environment modules, via the module command.
    $ module avail cesm
    ------------------- /global/software/Modules/4.6.0/modulefiles -------------------
    cesm/2.1.1  cesm/2.1.3
To activate CESM, load its module:
    $ module load cesm/2.1.3
    Loading cesm/2.1.3
      Loading requirement: gcc/9.4.0 cmake/3.17.3 git/2.25.0 svn/1.10.6 openmpi/4.1.1-gnu lib/openblas/0.3.13-gnu
This installation of CESM comes with its own dedicated installations of Python and Perl. To verify that the software has been activated properly, check the locations of some of the commands it provides:
    $ which python
    alias python='python3'
            /global/software/cesm/python/3.10.4/bin/python3
    $ which perl
    /global/software/cesm/perl/5.34.1/bin/perl
    $ which create_newcase
    /global/software/cesm/cesm-2.1.3/cime/scripts/create_newcase
Other software modules loaded in the same session may interfere with CESM, so keep the number of additional modules loaded alongside it to a minimum. The module list command shows which modules are currently loaded.
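The location check above can also be scripted, for instance at the start of a setup or job script, where it also catches tools shadowed by other loaded modules. Below is a minimal POSIX shell sketch; the function name check_cesm_env and its default prefix /global/software/cesm (taken from the paths printed above) are assumptions for illustration, not part of the ARC installation. It checks python3 rather than python, because the python alias shown above is not expanded in non-interactive scripts.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the cesm module): check that the tools
# used by CESM resolve to the dedicated CESM installation under a given prefix.
check_cesm_env() {
    local prefix status cmd path
    prefix="${1:-/global/software/cesm}"   # assumed default, see paths above
    status=0
    for cmd in python3 perl create_newcase; do
        # command -v prints the resolved path, or fails if cmd is not on PATH
        path="$(command -v "$cmd")" || path=""
        if [ -z "$path" ]; then
            echo "MISSING: $cmd is not on PATH" >&2
            status=1
        else
            case "$path" in
                "$prefix"/*) echo "OK:      $cmd -> $path" ;;
                *)           echo "WRONG:   $cmd -> $path (expected under $prefix)" >&2
                             status=1 ;;
            esac
        fi
    done
    return "$status"
}
```

For example, run module load cesm/2.1.3 followed by check_cesm_env; a non-zero exit status means at least one of the tools is missing or comes from outside the CESM installation.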
CESM input data sets are kept in a shared data directory, pointed to by the DIN_LOC_ROOT environment variable.
Sharing this storage directory reduces the amount of data that needs to be downloaded and saves storage space in users' home directories.
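A quick way to confirm that the shared input-data directory is visible in your session is to check the variable before creating a case. This minimal sketch assumes, as stated above, that loading the module exports DIN_LOC_ROOT into the environment; the function name check_din_loc_root is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm that DIN_LOC_ROOT is set and points to a
# readable directory (assumes the cesm module exports it, as described above).
check_din_loc_root() {
    if [ -z "${DIN_LOC_ROOT:-}" ]; then
        echo "DIN_LOC_ROOT is not set; is the cesm module loaded?" >&2
        return 1
    fi
    if [ ! -d "$DIN_LOC_ROOT" ] || [ ! -r "$DIN_LOC_ROOT" ]; then
        echo "DIN_LOC_ROOT=$DIN_LOC_ROOT is not a readable directory" >&2
        return 1
    fi
    echo "Shared CESM input data: $DIN_LOC_ROOT"
}
```

Inside an existing case directory, the value the case will actually use can also be queried with CIME's ./xmlquery DIN_LOC_ROOT.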
Using CESM on ARC
Machines and Queues
The ARC cluster uses the SLURM scheduling system to manage and control jobs. SLURM assumes that there is one main queue for jobs waiting to be executed, and that the cluster consists of several partitions. Partitions are collections of compute nodes grouped by some common property; on ARC, most partitions are grouped by hardware similarity, scheduling limits, and ownership. CESM, in contrast, has its own model of a compute cluster, which is based on multiple queues and machine types.
In practice, on ARC, CESM is set up to use the arc40, arc48, and arc52 machine types, whose compute nodes have 40, 48, and 52 CPU cores, respectively. Each machine type, however, is available in several SLURM partitions. These partitions contain nodes of the same kind but have different run-time limits, and CESM treats them as queues. Therefore, to create a new case with CESM, both the machine type and the target queue should be specified.
Queues of the arc40 machine type:
    Queue          #Nodes   #CPUs   Max#nodes  MaxRuntime  Comment
    name           total    /node   /user      hours
    ------------------------------------------------------------------
    cpu2019        40       40      6          168
    cpu2019-bf05   87       40      20         5           default
Queues of the arc48 machine type:
    Queue          #Nodes   #CPUs   Max#nodes  MaxRuntime  Comment
    name           total    /node   /user      hours
    ------------------------------------------------------------------
    cpu2021        34       48      12         168         default
    cpu2021-bf24   7        48      4          24
Queues of the arc52 machine type:
    Queue          #Nodes   #CPUs   Max#nodes  MaxRuntime  Comment
    name           total    /node   /user      hours
    ------------------------------------------------------------------
    cpu2022        52       52      10         168         default
    cpu2022-bf24   16       52      4          24
Please note that this information may change, as the cluster is constantly updated with new hardware being added and old hardware being removed.
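When choosing a queue for create_newcase, the main constraint is the expected wall-clock time of a run. The sketch below encodes the run-time limits from the queue tables for the three machine types (which, as noted, may change) and returns the short queue when the requested time fits its limit, otherwise the 168-hour queue. The function name pick_queue and the preference for the shorter queue are assumptions made for illustration, not ARC policy.

```shell
#!/bin/sh
# Hypothetical helper: suggest an ARC queue (SLURM partition) for a machine
# type and a requested run time in whole hours. The limits are copied from
# the queue tables and may be out of date; preferring the shorter queue when
# the run fits is an illustrative choice, not ARC policy.
pick_queue() {
    local machine hours short short_max long
    machine="$1"
    hours="$2"
    case "$machine" in
        arc40) short=cpu2019-bf05; short_max=5;  long=cpu2019 ;;
        arc48) short=cpu2021-bf24; short_max=24; long=cpu2021 ;;
        arc52) short=cpu2022-bf24; short_max=24; long=cpu2022 ;;
        *) echo "unknown machine type: $machine" >&2; return 1 ;;
    esac
    if [ "$hours" -le "$short_max" ]; then
        echo "$short"
    elif [ "$hours" -le 168 ]; then
        echo "$long"        # all three long queues allow 168 hours
    else
        echo "no queue allows $hours hours; split the run into shorter segments" >&2
        return 1
    fi
}
```

For example, create_newcase --case mycase --compset X --res f19_g16 --machine arc48 --queue "$(pick_queue arc48 12)" would select cpu2021-bf24 (mycase is a hypothetical case name).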
Creating and running a case
The default machine type is arc40, and its default queue is cpu2019-bf05. These are used if the machine type and queue are not specified at the create_newcase step.
Note that the module puts the CIME scripts directory on your PATH, so create_newcase can be called without its full path. You can therefore run it from your work directory and refer to cases by their relative paths, with no need for full absolute paths to the case directories.
The routine is as follows:
- Create a new case using the create_newcase command on the login node.
- Set up and build the executable code for the case on the login node.
- Submit the case to the SLURM scheduler using the ./case.submit command on the login node.
- Monitor the status of the jobs using the squeue SLURM command, as well as the CESM output in the CaseStatus file.
Here is the pattern:
    $ cd ~/cases/
    $ create_newcase --case casename --compset ... --res ... --machine arc40 --queue cpu2019
    $ cd casename
    $ ./case.setup
    $ ./preview_run
    $ ./case.build
    $ ./case.submit
and an example:
    $ create_newcase --case testX3 --compset X --res f19_g16 --machine arc40
    Compset longname is 2000_XATM_XLND_XICE_XOCN_XROF_XGLC_XWAV
    Compset specification file is /global/software/cesm/cesm-2.1.3/cime/src/drivers/mct/cime_config/config_compsets.xml
    Compset forcing is 1972-2004
    ....
    ....
    Creating Case directory /work/dmitri.rozmanov/tests/cesm/testX3
    $ cd testX3
    $ ./case.setup
    ....
    $ ./preview_run
    CASE INFO:
      nodes: 1
      total tasks: 40
      tasks per node: 40
      thread count: 1

    BATCH INFO:
      FOR JOB: case.run
        ENV:
          module command is /global/software/Modules/3.2.10/bin/modulecmd python load cesm/dev
          Setting Environment OMP_NUM_THREADS=1
        SUBMIT CMD:
          sbatch --time 05:00:00 --partition cpu2019-bf05 .case.run --resubmit
        MPIRUN (job=case.run):
          mpiexec -n 40 /home/drozmano/cesm/scratch/testX3/bld/cesm.exe >> cesm.log.$LID 2>&1
      FOR JOB: case.st_archive
        ENV:
          module command is /global/software/Modules/3.2.10/bin/modulecmd python load cesm/dev
          Setting Environment OMP_NUM_THREADS=1
        SUBMIT CMD:
          sbatch --time 0:20:00 --partition cpu2019-bf05 --dependency=afterok:0 case.st_archive --resubmit
    $ ./case.build
    ......
    Building cesm with output to /home/drozmano/cesm/scratch/testX3/bld/cesm.bldlog.220805-155036
    Time spent not building: 0.852010 sec
    Time spent building: 69.274885 sec
    MODEL BUILD HAS FINISHED SUCCESSFULLY
    $ ./case.submit
    .....
Once the case is submitted, one can check the jobs running from the user's account:
    $ squeue-long -u drozmano
    JOBID     USER      STATE    PARTITION   TIME_LIMIT  TIME  NODES  TASKS  CPUS  MIN_MEMORY  TRES_PER_N  REASON
    15397675  drozmano  PENDING  cpu2019-bf  20:00       0:00  1      1      1     1G          N/A         Dependency
    15397674  drozmano  RUNNING  cpu2019     168:00:00   0:04  1      40     40    1G          N/A         None
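Here, CESM submitted two jobs: one that actually runs the simulation (case.run, job 15397674), and a short-term archiving job (case.st_archive, job 15397675) that archives the output and, if further run segments have been requested through the case's RESUBMIT setting, resubmits the computation. The second job depends on the first (note the --dependency=afterok option in the preview_run output) and runs only after the simulation job has finished successfully.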
While the simulation is running, the CaseStatus file in the case directory can be checked to see its status:
    $ cat CaseStatus
    ......
    ......
    2022-08-05 15:51:46: case.build success
    ---------------------------------------------------
    2022-08-05 15:53:10: case.submit starting
    ---------------------------------------------------
    2022-08-05 15:53:12: case.submit success case.run:15397674, case.st_archive:15397675
    ---------------------------------------------------
    2022-08-05 15:53:19: case.run starting
    ---------------------------------------------------
    2022-08-05 15:53:24: model execution starting
    ---------------------------------------------------
    2022-08-05 15:54:17: model execution success
    ---------------------------------------------------
    2022-08-05 15:54:17: case.run success
    ---------------------------------------------------
    2022-08-05 15:54:19: st_archive starting
    ---------------------------------------------------
    2022-08-05 15:54:33: st_archive success
    ---------------------------------------------------
Success.
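For a quick look at where a case stands, usually only the most recent CaseStatus entry is of interest. A minimal sketch, assuming the timestamped line format shown in the CaseStatus output above (YYYY-MM-DD HH:MM:SS: message); the function name case_last_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent timestamped entry of a CaseStatus
# file (default: ./CaseStatus). The dashed separator lines are skipped because
# only lines starting with a date and time are selected.
case_last_status() {
    local file
    file="${1:-CaseStatus}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}:' "$file" | tail -n 1
}
```

Run it from the case directory, alongside squeue -u "$USER", to see both the scheduler state and the latest CESM step.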
This example is based on the official CIME manual: