{
    "batchcomplete": "",
    "continue": {
        "gapcontinue": "Sample_Job_Scripts",
        "continue": "gapcontinue||"
    },
    "warnings": {
        "main": {
            "*": "Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
        },
        "revisions": {
            "*": "Because \"rvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."
        }
    },
    "query": {
        "pages": {
            "118": {
                "pageid": 118,
                "ns": 0,
                "title": "Research Computing Services Courses",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "Research Computing Services offers various training courses throughout the year to help users work effectively in our HPC environment. You may see our upcoming courses below.\n\n= Winter 2021 =\n\n=== Introduction to the Linux command line ===\n\nResearch Computer Services provides support for users of the Advanced Research Computing (ARC) infrastructure at the University of Calgary.\nOne of the critical skills required to fully utilize this resource is to have an understanding of the Linux command line.\n\nPrerequisites:\n* You should have basic computer skills\n* You need a UCalgary IT account (if you have a UCalgary email account you have an IT account)\n* A computer with an up-to-date web browser\n\nRecommended:\n* A reasonably sized display or two displays, this is a hands on class and you will need to work through examples\n\n[https://www.eventbrite.com/e/intro-to-linux-command-line-registration-141887377967| Register Here for Intro To Linux Command Line]\n\n=== Introduction to scheduling and parallel models of computation ===\n\nIn this workshop, we will introduce basic ideas from parallel computing such as task graphs and resource scheduling. The first half of the talk will explain the difference between serial computations, shared memory parallelism, and distributed memory parallelism as they relate to understanding the resources that a computation is capable of using efficiently. The second half of the talk will introduce a workflow for computing that takes advantage of the many benefits of using a job scheduler. This workflow will emphasize methods for tracking resource utilization and modeling job performance to improve job requests. Throughout the workshop, attendees will apply these techniques to a simple (but scalable) example problem to develop their own scaling study and optimize a job script.\n\nPrerequisites:\n* You should have basic computer skills\n* You need a UCalgary IT account (if you have a UCalgary email account you have an IT account)\n* A VPN for connecting to the UCalgary network\n* A terminal (on a Mac) or a terminal emulator like PuTTy (on a PC)\n\nRecommended:\n* A basic knowledge of the Linux command line to work through the problems\n* A basic knowledge of Python (functions and for-loops) to understand the examples\n\n[https://www.eventbrite.com/e/introduction-to-parallel-computing-and-job-scheduling-registration-142027998567| Register Here for Introduction to Scheduling and Parallel Models of Computation]\n\n=== Compiling software ===\n\nIn this workshop, we will introduce some of the basic ideas of compiling software from source code. The workshop will begin with a high-level overview of the ideas used in compiling software. Throughout the talk, we will discuss specific examples from compiling C libraries on Linux using GCC, Make, and Autotools. An understanding of these technologies is required for handling the most common custom software builds on High-Performance Computing clusters. 
We will introduce strategies for build process debugging and test these skills by building R from source to enable integration with some common high-performance statistical software.\n\nPrerequisites:\n* You should have basic computer skills\n* You need a UCalgary IT account (if you have a UCalgary email account you have an IT account)\n* A VPN for connecting to the UCalgary network\n* A terminal (on a Mac) or a terminal emulator like PuTTy (on a PC)\n\nRecommended:\n* A basic knowledge of the Linux command line to work through the problems\n* A very small amount of exposure to the C programming language will make the main example easier to follow\n\n[https://www.eventbrite.com/e/introduction-to-compiling-software-from-source-for-hpc-tickets-142022650571| Register Here for Introduction to Compiling Software From Source for HPC]\n\n= RCS course materials and references =\n\nThe Intro to Linux Command Line is based on the excellent course from the Software Carpentry group.\n\nhttp://swcarpentry.github.io/shell-novice/\n\nThe Introduction to Compiling for HPC presentation can be downloaded here:\n\n[https://uofc-my.sharepoint.com/:b:/g/personal/ian_percel_ucalgary_ca/Ea9KVACseHtJk1QTd8KOY9ABGNPPGXteGO-XE2jzMClPKQ?e=CvtJMd| Compiling Presentation]\n\nThe Introduction to Scheduling and Parallel Models of Computation presentation can be downloaded here:\n\n[https://uofc-my.sharepoint.com/:b:/g/personal/ian_percel_ucalgary_ca/EYDZ5v7cPVFItiClJPDTkL0B9fxjM2YRXlMH7a2_uDDnWQ?e=eqZmT4| Scheduling Presentation]\n\n[[Category:Training]]\n[[Category:Administration]]\n{{Navbox Administration}}\n__NOTOC__"
                    }
                ]
            },
            "21": {
                "pageid": 21,
                "ns": 0,
                "title": "Running jobs",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "This page is intended for users who are looking for guidance on working effectively with the [[wikipedia:Slurm_Workload_Manager|SLURM]] job scheduler on the compute clusters that are operated by Research Computing Services (RCS). The content on this page assumes that you already familiar with the basic concepts of job scheduling and job scripts. If you have not worked on a large shared computer cluster before, please read [[What is a scheduler?]] first. To familiarize yourself with how we use computer architecture terminology in relation to SLURM, you may also want to read about the [[Types of Computational Resources]], as well as the cluster guide for the cluster that you are working on. Users familiar with other job schedulers including PBS/Torque, SGE, LSF, or LoadLeveler, may [https://slurm.schedmd.com/rosetta.pdf find this table of corresponding commands] useful. Finally, if you have never done any kind of parallel computation before, you may find some of the content on [[Parallel Models]] useful as an introduction.\n== Basic HPC Workflow ==\n\nThe most basic HPC computation workflow (excluding software setup and data transfer) involves 5 steps. \n\n<!--[[File:SvgTemplatePNG.png|thumb|left|a test for PNG inclusion]] -->\n<!--[[File:JobWorkflow.png|border|center|400px|The standard process for executing and optimizing a job script ]] -->\n   [[File:JobWorkflowv2.svg|thumb|400px|The standard process for executing and optimizing a job script]]\n\nThis process begins with the creation of a working preliminary job script that includes an approximate resource request and all of the commands required to complete a computational task of interest. This job script is used as the basis of a job submission to SLURM. The submitted job is then monitored for issues and analyzed for resource utilization. Depending on how well the job utilized resources, modifications to the job script can be made to improve computational efficiency and the accuracy of the resource request. In this way, the job script can be iteratively improved until a reliable job script is developed. As your computational goals and methods change, your job scripts will need to evolve to reflect these changes and this workflow can be used again to make appropriate modifications.   \n\n== Job Design and Resource Estimation ==\n\nThe goal of this first step is the development of a slurm job script for the calculations that you wish to run that reflects real resource usage in a job typical of your work. To review some example jobs scripts for different models of parallel computation, please consult the page [[Sample Job Scripts]]. We will look at the specific example of using the NumPy library in python to multiply two matrices that are read in from two files. We assume that the files have been transferred to a subdirectory of the users home directory <code>~/projects/matmul</code> and that a custom python installation has been setup under <code>~/anaconda3</code>.\n\n<source lang=\"console\">\n$ ls -lh ~/project/matmul\ntotal 4.7G\n-rw-rw-r-- 1 username username 2.4G Feb  3 10:13 A.csv\n-rw-rw-r-- 1 username username 2.4G Feb  3 10:16 B.csv\n$ ls ~/anaconda3/bin/python\n/home/username/anaconda3/bin/python\n</source>   \n\nThe computational plan for this project is to load the data from files and then use the multithreaded version of the NumPy <code>dot</code> function to perform the matrix multiplication and write it back out to a file. 
To test out our proposed steps and software environment, we will request a small ''Interactive Job''. This is done with the <code>salloc</code> command. The salloc command accepts all of the standard resource request parameters for SLURM. (The key system resources that can be requested can be reviewed on the [[Types of Computational Resources]] page) In our case, we will need to specify a partition, a number of cores, a memory allocation, and a small time interval in which we will do our testing work (on the order of 1-4 hours). We begin by checking which partitions are currently not busy using the <code>arc.nodes</code> command     \n\n<source lang=\"console\">\n$ arc.nodes\n\n          Partitions: 16 (apophis, apophis-bf, bigmem, cpu2013, cpu2017, cpu2019, gpu-v100, lattice, parallel, pawson, pawson-bf, razi, razi-bf, single, theia, theia-bf)\n\n\n      ====================================================================================\n                     | Total  Allocated  Down  Drained  Draining  Idle  Maint  Mixed\n      ------------------------------------------------------------------------------------\n             apophis |    21          1     1        0         0     7      1     11 \n          apophis-bf |    21          1     1        0         0     7      1     11 \n              bigmem |     2          0     0        0         0     1      0      1 \n             cpu2013 |    14          1     0        0         0     0      0     13 \n             cpu2017 |    16          0     0        0         0     0     16      0 \n             cpu2019 |    40          3     1        2         1     0      4     29 \n            gpu-v100 |    13          2     0        1         0     5      0      5 \n             lattice |   279         34     3       72         0    47      0    123 \n            parallel |   576        244    26       64         1    20      3    218 \n              pawson |    13          1     1        2         0     9      0      0 \n           pawson-bf |    13          1     1        2         0     9      0      0 \n                razi |    41         26     4        1         0     8      0      2 \n             razi-bf |    41         26     4        1         0     8      0      2 \n              single |   168          0    14       31         0    43      0     80 \n               theia |    20          0     0        0         1     0      0     19 \n            theia-bf |    20          0     0        0         1     0      0     19 \n      ------------------------------------------------------------------------------------\n       logical total |  1298        340    56      176         4   164     25    533 \n                     |\n      physical total |  1203        312    50      173         3   140     24    501 \n\n</source>\n\nThe [[ARC Cluster Guide]] tells us that the single partition is an old partition with 12GB of Memory (our data set is only 5GB and matrix multiplication is usually pretty memory efficient) and 8 cores so it is probably a good fit. Based on the <code>arc.nodes</code> output, we can choose the single partition (which at the time that this ran had 43 idle nodes) for our test. The preliminary resource request that we will use for the interactive job is\n<pre>\npartition=single\ntime=2:0:0 \nnodes=1 \nntasks=1 \ncpus-per-task=8 \nmem=0\n</pre>\n\nWe are requesting 2 hours and 0 minutes and 0 seconds, a whole single node (i.e. all 8 cores), and <code>mem=0</code> means that all memory on the node is being requested. 
<u>Please note that mem=0 can only be used in a memory request when you are already requesting all of the CPUs on a node.</u> On the single partition, 8 CPUs is the full number of CPUs per node. Please check the CPU count for the partition that you plan to run on (using <code>arc.hardware</code>) before using <code>mem=0</code>. From the login node, we will request this job using <code>salloc</code>:\n\n<source lang=\"console\">\n[username@arc ~]$ salloc --partition=single --time=2:0:0 --nodes=1 --ntasks=1 --cpus-per-task=8 --mem=0\nsalloc: Pending job allocation 8430460\nsalloc: job 8430460 queued and waiting for resources\nsalloc: job 8430460 has been allocated resources\nsalloc: Granted job allocation 8430460\nsalloc: Waiting for resource configuration\nsalloc: Nodes cn056 are ready for job\n[username@cn056 ~]$ export PATH=~/anaconda3/bin:$PATH\n[username@cn056 ~]$ which python\n~/anaconda3/bin/python\n[username@cn056 ~]$ export OMP_NUM_THREADS=4\n[username@cn056 ~]$ export OPENBLAS_NUM_THREADS=4\n[username@cn056 ~]$ export MKL_NUM_THREADS=4\n[username@cn056 ~]$ export VECLIB_MAXIMUM_THREADS=4\n[username@cn056 ~]$ export NUMEXPR_NUM_THREADS=4\n[username@cn056 ~]$ python\nPython 3.7.4 (default, Aug 13 2019, 20:35:49) \n[GCC 7.3.0] :: Anaconda, Inc. on linux\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>>\n</source> \nThe steps that were taken at the command line included: \n<ol> \n  <li>submitting an interactive job request and waiting for the scheduler to allocate the node (cn056) and move the terminal to that node</li>\n  <li>setting up our software environment using environment variables</li>\n  <li>starting a Python interpreter</li>\n</ol>\n\nAt this point, we are ready to test our simple Python code at the Python interpreter command line.\n\n<source lang=\"console\">\n>>> import numpy as np\n>>> A=np.loadtxt(\"/home/username/project/matmul/A.csv\",delimiter=\",\")\n>>> B=np.loadtxt(\"/home/username/project/matmul/B.csv\",delimiter=\",\")\n>>> C=np.dot(A,B)\n>>> np.savetxt(\"/home/username/project/matmul/C.csv\",C,delimiter=\",\")\n>>> quit()\n[username@cn056 ~]$ exit\nexit\nsalloc: Relinquishing job allocation 8430460\n</source>   \n\nIn a typical testing/debugging/prototyping session like this, some problems would be identified and fixed along the way. Here we have only shown successful steps of a simple job. However, normally this would be an opportunity to identify syntax errors, data type mismatches, out-of-memory exceptions, and other issues in an interactive environment. The data used for a test should be of a manageable size. In the sample data, I am working with randomly generated 10000x10000 matrices and discussing them as though they were the actual data for the job that would run. However, often the real problem is too large for an interactive job, and it is better to test the commands in an interactive mode using a small data subset. For example, randomly generated 10000 x 10000 matrices might be a good starting point for planning a real job that was going to take the product of 1 million x 1 million matrices.  \n\nWhile this interactive job was running, I used a separate terminal on ARC to check the resource utilization of the interactive job by running the command <code>arc.job-info 8430460</code>, and found that the CPU utilization hovered around 50% and memory utilization around 30%. We will come back to this point in later steps of the workflow, but this would be a warning sign that the resource request is a bit high, especially in CPUs. 
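\n\nIf you want to cross-check these utilization numbers with standard SLURM tools, you can do so from a second terminal on the login node while the interactive job is running. A minimal sketch (using the job ID reported by <code>salloc</code> above; some <code>sacct</code> fields are only filled in once a job step has completed):\n\n<source lang=\"console\">\n$ squeue -j 8430460\n$ sacct -j 8430460 -o JobID,Elapsed,MaxRSS,TotalCPU\n</source>\n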
It is worth noting that if you have run your code before on a different computer, you can base your initial SLURM script on that past experience or the resources that are recommended in the software documentation instead of testing it out explicitly with salloc. However, as the hardware and software installation are likely different, you may run into new challenges that need to be resolved. In the next step, we will write a SLURM script that includes all of the commands we ran manually in the above example.\n\n== Job Submission ==\n\nIn order to use the full resources of a computing cluster, it is important to do your large-scale work with ''Batch Jobs''. This amounts to replacing sequentially typed commands (or successively run cells in a notebook) with a shell script that automates the entire process, and using the <code>sbatch</code> command to submit the shell script as a job to the job scheduler. Please note, '''you must not run this script directly on the login node'''. It must be submitted to the job scheduler. \n\nSimply putting all of our commands in a single shell script file (with proper <code>#SBATCH</code> directives for the resource request) yields the following unsatisfactory result:\n<source lang=\"bash\">\n#!/bin/bash\n#SBATCH --nodes=1 \n#SBATCH --ntasks=1 \n#SBATCH --cpus-per-task=4 \n#SBATCH --mem=8gb\n#SBATCH --time=2:0:0 \n\nexport PATH=~/anaconda3/bin:$PATH\necho $(which python)\nexport OMP_NUM_THREADS=4\nexport OPENBLAS_NUM_THREADS=4\nexport MKL_NUM_THREADS=4\nexport VECLIB_MAXIMUM_THREADS=4\nexport NUMEXPR_NUM_THREADS=4\n\npython matmul_test.py\n</source>\n\nwhere '''matmul_test.py''' is:\n<source lang=\"python\">\nimport numpy as np\n\nA=np.loadtxt(\"/home/username/project/matmul/A.csv\",delimiter=\",\")\nB=np.loadtxt(\"/home/username/project/matmul/B.csv\",delimiter=\",\")\nC=np.dot(A,B)\nnp.savetxt(\"/home/username/project/matmul/C.csv\",C,delimiter=\",\")\n</source>   \n\nThere are a number of things to improve here when moving to a batch job. \nWe should consider generalizing our script a bit to improve reusability between jobs, instead of hard-coding paths. This yields:\n\n'''matmul_test02032021.slurm:'''\n<source lang=\"bash\">\n#!/bin/bash\n#SBATCH --nodes=1 \n#SBATCH --ntasks=1 \n#SBATCH --cpus-per-task=4 \n#SBATCH --mem=10000M\n#SBATCH --time=2:0:0 \n\nexport PATH=~/anaconda3/bin:$PATH\necho $(which python)\nexport OMP_NUM_THREADS=4\nexport OPENBLAS_NUM_THREADS=4\nexport MKL_NUM_THREADS=4\nexport VECLIB_MAXIMUM_THREADS=4\nexport NUMEXPR_NUM_THREADS=4\n\nAMAT=\"/home/username/project/matmul/A.csv\"\nBMAT=\"/home/username/project/matmul/B.csv\"\nOUT=\"/home/username/project/matmul/C.csv\"\n\npython matmul_test.py $AMAT $BMAT $OUT \n</source>\n\nwhere '''matmul_test.py''' is:\n<source lang=\"python\">\nimport numpy as np\nimport sys\n\n#parse arguments: A matrix path, B matrix path, output path\nif len(sys.argv)<4:\n    raise Exception(\"expected three arguments: lmatrixpath rmatrixpath outpath\")\n\nlmatrixpath=sys.argv[1]\nrmatrixpath=sys.argv[2]\noutpath=sys.argv[3]\n\nA=np.loadtxt(lmatrixpath,delimiter=\",\")\nB=np.loadtxt(rmatrixpath,delimiter=\",\")\nC=np.dot(A,B)\nnp.savetxt(outpath,C,delimiter=\",\")\n</source>\n\nWe have generalized the way that data sources and output locations are specified in the job. \nIn the future, this will help us reuse this script without proliferating Python scripts. \nSimilar considerations of reusability apply to any language and any job script. \nIn general, due to the complexity of using them, HPC systems are not ideal for running a couple of small calculations. 
\nThey are at their best when you have to solve many variations on a problem at a large scale. \nAs such, it is valuable to always be thinking about how to make your workflows simpler and easier to modify.\n\nNow that we have a job script, we can simply submit the script to sbatch:\n\n<source lang=\"console\">\n[username@arc matmul]$ sbatch matmul_test02032021.slurm\nSubmitted batch job 8436364\n</source>\n\nThe number reported after the sbatch job submission is the JobID and will be used in all further analyses and monitoring. \nWith this number, we can begin the process of checking on how our job progresses through the scheduling system.\n\n== Job Monitoring ==\n\nThe third step in this workflow is checking the progress of the job. We will emphasize three tools: \n<ol> \n  <li><code>--mail-user</code>, <code>--mail-type</code> \n    <ol>\n      <li>sbatch options for sending email about job progress</li>\n      <li>no performance information</li>\n    </ol>\n  </li>\n  <li><code>squeue</code>\n    <ol>\n      <li>SLURM tool for monitoring job start, running, and end</li>\n      <li>no performance information</li>\n    </ol>\n  </li> \n  <li><code>arc.job-info</code>\n    <ol>\n      <li>RCS tool for monitoring job performance once it is running</li>\n      <li>provides detailed snapshot of key performance information</li>\n    </ol>\n  </li>\n</ol>\n\n\nThe motives for using the different monitoring tools are very different, but they all help you track the progress of your job from start to finish. The details of these tools applied to the above example are discussed in the article [[Job Monitoring]]. If a job is determined to be incapable of running or to have a serious error in it, you can cancel it at any time using the command <code>scancel JobID</code>, for example <code>scancel 8442969</code>.  \n\nFurther analysis of aggregate and maximum resource utilization is needed to determine changes that need to be made to the job script and the resource request. This is the focus of the Job Performance Analysis step that is the subject of the next section.\n\n== Job Performance Analysis ==\n\nThere are a tremendous number of performance analysis tools, from complex instruction-analysis tools like gprof, valgrind, and nvprof to extremely simple tools that only provide average utilization. The minimal tools that are required for characterizing a job are\n\n<ol> \n  <li><code>\\time</code> for estimating CPU utilization on a single node</li>\n  <li><code>sacct -j JobID -o JobID,MaxRSS,Elapsed</code> for analyzing maximum memory utilization and wall time</li>\n  <li><code>seff JobID</code> for analyzing general resource utilization</li>\n</ol>\n\nWhen utilizing these tools, it is important to consider the different sources of variability in job performance. Specifically, it is important to perform a detailed analysis of \n\n<ol> \n  <li>Variability Among Datasets</li>\n  <li>Variability Across Hardware</li>\n  <li>Request Parameter Estimation from Sample Sets - Scaling Analysis</li>\n</ol>\n\nand how these can be addressed through thoughtful job design. This is the subject of the article [[Job Performance Analysis and Resource Planning]].\n\n== Revisiting the Resource Request ==\n\nHaving completed a careful analysis of performance for our problem and the hardware that we plan to run on, we are now in a position to revise the resource request and submit more jobs. If you have 10000 such jobs, then it is important to think about how busy the resources that you are planning to use are likely to be. 
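\n\nBesides <code>arc.nodes</code>, the standard SLURM <code>sinfo</code> command gives a quick summary of how busy a partition is before you submit a large batch of jobs; a minimal sketch (the partition names here are the ones shown earlier and may differ on other clusters):\n\n<source lang=\"console\">\n$ sinfo -p single,cpu2019 -o \"%P %a %D %t\"\n</source>\n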
It is also important to think about the fact that every job has some finite scheduling overhead (time to set up the environment to run in, time to figure out where to run to optimally fit jobs, etc.). In order to make this scheduling overhead worthwhile, it is generally important to have jobs run for more than 10 minutes. So, we may want to pack batches of 10-20 matrix pairs into one job just to make it a sensible amount of work. This also will have some benefit in making job duration more predictable when there is variability due to data sets. New hardware also tends to be busier than old hardware, and cpu2019 will often have 24-hour wait times when the cluster is busy. However, for jobs that can stay under 5 hours, the backfill partitions are often not busy, and so the best approach might be to bundle 20 tasks together for a duration of about 100 minutes and then have the job request be:\n\n<source lang=\"bash\">\n#!/bin/bash\n#SBATCH --nodes=1 \n#SBATCH --ntasks=1 \n#SBATCH --cpus-per-task=4 \n#SBATCH --mem=5000M\n#SBATCH --time=1:40:0 \n#SBATCH --array=1-500\n\nexport PATH=~/anaconda3/bin:$PATH\necho $(which python)\nexport OMP_NUM_THREADS=4\nexport OPENBLAS_NUM_THREADS=4\nexport MKL_NUM_THREADS=4\nexport VECLIB_MAXIMUM_THREADS=4\nexport NUMEXPR_NUM_THREADS=4\n\nexport TOP_DIR=\"/home/username/project/matmul/pairs\"$SLURM_ARRAY_TASK_ID\n\nDIR_LIST=$(find $TOP_DIR -mindepth 1 -type d)\nfor DIR in $DIR_LIST\ndo \npython matmul_test.py $DIR/A.csv $DIR/B.csv $DIR/C.csv; \ndone\n \n</source>\n\nHere the directory <code>TOP_DIR</code> will be one of 500 directories, each named <code>pairsN</code> where N is an integer from 1 to 500, with each of these holding 20 subdirectories with a pair of A and B matrix files in each. The script will iterate through the 20 subdirectories and run the Python script once for each pair of matrices. This is one approach to organizing work into bundles. You could then submit the whole collection as a job array, as in the script above. The choice depends on a lot of assumptions that we have made, but it is a fairly good plan. The result would be 500 jobs, each about 2 hours long, running mostly on the backfill partitions on 4 cores apiece. If there is usually at least 1 node free, that will be about 38 jobs running at the same time for a total of roughly 13 sequences that are 2 hours long, or 26 hours. In this way, all of the work could be done in a day. Typically, we would iteratively improve the script until it was something that could scale up. This is why the workflow is a loop.\n\n== Tips and tricks ==\n\n=== Attaching to a running job === \n\nIt is possible to connect to the node running a job and execute new processes there. You might want to do this for troubleshooting or to monitor the progress of a job.\n\nSuppose you want to run the utility <code>nvidia-smi</code> to monitor GPU usage on a node where you have a job running. \nThe following command runs <code>watch</code> on the node assigned to the given job, which in turn runs <code>nvidia-smi</code> every 30 seconds, \ndisplaying the output on your terminal.\n $ srun --jobid 123456 --pty watch -n 30 nvidia-smi\n\nIt is possible to launch multiple monitoring commands using [https://en.wikipedia.org/wiki/Tmux <code>tmux</code>]. 
\nThe following command launches <code>htop</code> and <code>nvidia-smi</code> in separate panes to monitor the activity on a node assigned to the given job.\n $ srun --jobid 123456 --pty tmux new-session -d 'htop -u $USER' \\; split-window -h 'watch nvidia-smi' \\; attach\n\nProcesses launched with <code>srun</code> share the resources with the job specified. You should therefore be careful not to launch processes that would use a significant portion of the resources allocated for the job. Using too much memory, for example, might result in the job being killed; using too many CPU cycles will slow down the job.\n\n'''Note:''' The <code>srun</code> commands shown above work only to monitor a job submitted with <code>sbatch</code>. To monitor an interactive job, create multiple panes with <code>tmux</code> and start each process in its own pane.\n\nIt is possible to '''attach a second terminal''' to a running interactive job with the command\n $ srun -s --pty --jobid=12345678 /bin/bash\n\nThe '''-s''' option specifies that the resources can be '''oversubscribed'''. \nIt is needed, as all the resources have already been given to the first shell. \nSo, to run the second shell, we either have to wait until the first shell is over, or we have to oversubscribe.\n\n== Further reading ==\n* [https://slurm.schedmd.com/ Slurm Manual - https://slurm.schedmd.com/]\n** Additional [https://slurm.schedmd.com/tutorials.html tutorials].\n** [https://slurm.schedmd.com/sbatch.html sbatch] command options.\n** A [https://slurm.schedmd.com/rosetta.pdf \"Rosetta stone\"] mapping commands and directives from PBS/Torque, SGE, LSF, and LoadLeveler, to SLURM.\n* http://www.ceci-hpc.be/slurm_tutorial.html - Text tutorial from C\u00c9CI, Belgium\n* http://www.brightcomputing.com/blog/bid/174099/slurm-101-basic-slurm-usage-for-linux-clusters - Minimal text tutorial from Bright Computing\n[[Category:Slurm]]\n[[Category:Guides]]\n{{Navbox Guides}}"
                    }
                ]
            }
        }
    }
}