How to use local storage on ARC's compute nodes
Background
On the ARC cluster's compute nodes,
the local file systems /tmp (disk-based) and /dev/shm (memory-based)
offer fast, node-local storage that avoids network latency.
They are ideal for workloads needing quick access to temporary data, such as
intermediate files, caches, or scratch space.
Access to data in these storage locations is much faster than to shared network storage, such as /home or /scratch.
They are limited in total size but have no file count quotas.
Since they are private to each node and cleared after the job ends,
they’re best for short-lived data that doesn’t need to persist.
These file systems are not shared between compute nodes, so if you want to process a very large number of small files, your job may need to decompress them onto the local storage at the beginning, as in the sketch below.
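For example, a batch job that processes many small files can stage them to /tmp first and copy the results back afterwards.
The script below is a minimal sketch: the archive input_files.tar.gz, the results directory, and the /scratch paths are hypothetical placeholders, while the #SBATCH options mirror the interactive job shown in the next section.

#!/bin/bash
#SBATCH -N 1 -n 1 -c 4 --mem=16gb -t 1:00:00 -p cpu2019-bf05

# Job-specific working directory on the node-local /tmp
# (SLURM_JOB_ID is set by the scheduler).
WORKDIR=/tmp/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"

# Stage the input: unpack the many small files from shared storage
# onto the fast local file system (hypothetical archive name).
tar -xzf /scratch/$USER/input_files.tar.gz -C "$WORKDIR"

# ... process the files in $WORKDIR here ...

# Copy the results back to shared storage before the job ends,
# because /tmp is cleared once the job finishes.
cp -r "$WORKDIR/results" /scratch/$USER/
rm -rf "$WORKDIR"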
Local /tmp file system
The local temporary directory /tmp on all of ARC's compute nodes is located on a local storage drive, either a spinning hard disk drive (HDD) or a solid-state drive (SSD).
One can start an interactive job on the cpu2019-bf05 partition:
$ salloc -N1 -n1 -c4 --mem=16gb -t 1:00:00 -p cpu2019-bf05
and then check the current storage usage with the arc.quota command:
$ arc.quota
Filesystem        Available        Used / Total             Files Used / Total
----------        ---------    -------- / ----------    ---------- / ----------
Home Directory      423.5GB     76.4GB / 500.0GB (15%)    1.1 Mil / 1.5 Mil (76%)
/scratch             15.0TB      0.0TB /  15.0TB  (0%)      0.0 K / 1000.0 K  (0%)
/tmp (on fc106)      99.9GB      0.0GB / 100.0GB  (0%)      0.0 K / Unlimited
The third record shows the local storage available to our job: 99.9 GB in the /tmp directory, with an unlimited file count.
The output also indicates that this storage is local to the fc106 compute node.
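Within the interactive job, one can then confirm the available space and set up a private working directory under /tmp (a sketch; the per-user, per-job directory layout is only a convention, not a requirement):

$ df -h /tmp
$ mkdir -p /tmp/$USER/$SLURM_JOB_ID
$ cd /tmp/$USER/$SLURM_JOB_ID

Here df -h reports the size and usage of the local file system itself, and the job-specific subdirectory keeps concurrent jobs on the same node from colliding.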