Managing x86 jobs with SLURM
To do work on the Melbourne Bioinformatics (formerly VLSCI) computers, you need to submit jobs to a queue. If resources are available, your job should run shortly afterwards. While your job is running, the processors and memory you requested are dedicated to you for the amount of time you requested.
Job scheduling, resource management and accounting are handled by SLURM (Simple Linux Utility for Resource Management).
Caveat for parallel jobs: for all parallel jobs please use `srun` rather than `mpirun`. These days `mpirun` will perform far worse than `srun`.
Job Management with SLURM
A typical workflow for managing a job is to:
- create a script that specifies the settings and commands to run the job and the resources required
- submit the job to the scheduler and resource manager
- monitor the job's progress
- modify the running job
- review the state of a finished job
- refine the script to run the job more efficiently
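The workflow above maps roughly onto the SLURM commands covered in the rest of this page (a sketch; `job-script`, `username` and `jobID` are placeholders):

```shell
sbatch job-script        # submit the job to the scheduler
squeue -u username       # monitor its progress in the queue
scontrol show job jobID  # inspect a queued or running job in detail
sacct -j jobID           # review accounting details after it finishes
scancel jobID            # cancel the job if needed
```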
Job script generator
SLURM scripts: SLURM reads a text file that acts as a script of comments and commands. SLURM looks for special comments on lines that start with #SBATCH. Anything on the same line after #SBATCH is interpreted as a SLURM option.
NOTE: SLURM only looks for #SBATCH lines in the top comment section of a script. Once a non-comment line (a command) has been found, any #SBATCH line after it will NOT be interpreted. If your job seems to ignore your SLURM settings, check that there are no commands before the #SBATCH lines.
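To illustrate the pitfall (a minimal sketch; the job name and resource values are placeholders):

```shell
#!/bin/bash
# Directives in the top comment block ARE read by SLURM:
#SBATCH --job-name=example
#SBATCH --time=00:10:00

echo "job starting"    # first real command: SLURM stops scanning here

#SBATCH --mem=4096     # IGNORED: this appears after a command
```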
To simplify the task of writing job submission scripts we provide an interactive job script generator.
You can modify the script (or make your own) by referring to the Job types section (below).
To submit the resource requests to the queue, use the `sbatch` command. In conjunction with the job script, the command is of the following form:
sbatch [command line options] job-script
This command will return a number for the job id of the job, e.g.:
Submitted batch job 94402
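If you submit jobs from other scripts, the job id can be extracted from this output for later use (a sketch; in practice you would capture the real output, e.g. `msg=$(sbatch job-script)`):

```shell
# Sample sbatch output, as shown above:
msg="Submitted batch job 94402"

# The job id is the last word of the message; strip everything up to
# the final space using parameter expansion.
jobid=${msg##* }
echo "$jobid"    # 94402
```

Newer SLURM versions also provide `sbatch --parsable`, which prints only the job id, making this extraction unnecessary.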
It is possible to run a job as an interactive session using `sinteractive`. For example:
sinteractive --x11
This will wait until your job runs, then give you a new prompt on the node your job is on, in the directory you launched the job from. The
--x11 option will forward any X11 windows to your machine (assuming you have X11 and forwarding set up).
All SLURM options can be passed as options to the `sinteractive` command.
To view the state of the queue use:
squeue
To limit the output to your jobs only, use the form:
squeue -u username
where username is your Melbourne Bioinformatics username.
It is also possible to get more detailed information about a specific job using the `scontrol` command; see the SLURM documentation for scontrol for more information.
For example, to see the details of a job with job id jobID use:
scontrol show job jobID
To view SLURM accounting details use the `sacct` command; see the SLURM documentation for sacct for more information. For example:
sacct -j jobID
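The fields reported can be selected with `sacct`'s `--format` option (a sketch; the job id and field list here are just an illustration):

```shell
# Report job id, state, elapsed walltime and peak memory for a job.
sacct -j 94402 --format=JobID,State,Elapsed,MaxRSS
```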
Modifying a Job
If you notice that a job is taking longer than expected, please contact the Help Desk with the machine, job id and estimate of the extra time needed.
If you need to cancel your job, use:
scancel jobID
Reviewing a Job
The best way to review a job is to view any output files it generates, together with the output and error files generated by SLURM. By default SLURM creates a single file containing both the standard output and standard error, named slurm-jobID.out (where jobID is the job id number).
To monitor and review SLURM's accounting information for the job use:
sacct -j jobID
This is useful for queued, running and finished jobs.
Optimising your script
An important step in job management is to customise the script to the job. For example, it is always a good idea to do small test runs to estimate the total run time, the best number of cores, and the memory requirements. Underestimating can lead to jobs failing, while overestimating will cause the job to remain in the queue longer than necessary.
It is important that you don't request unnecessarily large amounts of memory, as this will delay your job's start while the required resources are freed up. Please see the page on Managing Memory for more information.
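Since the memory options used in the scripts below take values in MB, a quick conversion from a gigabyte figure can be useful (a sketch; 22 GB is the whole-node figure used in the SMP example below):

```shell
# Convert a gigabyte figure to the MB value expected by --mem / --mem-per-cpu.
gb=22
mb=$((gb * 1024))
echo "$mb"    # 22528
```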
Job size limits
Due to the technology available at the time each cluster was built, different clusters have different capabilities in terms of the number of CPUs available on each machine and the maximum amount of memory available.
You can find out these limits at any time using the `sinfo` command.
Jobs can be classed as one of three types: single CPU, SMP (or multithreaded), and MPI parallel. Here are some minimal examples for each type.
Single CPU

```
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2048
module load my-app-compiler/version
my-app
```
Bundling single CPU jobs
For a large number of single jobs, it is better to bundle them with a wrapper. For example:
```
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --time=01:10:00
#SBATCH --mem-per-cpu=4096
for i in `seq 1 $SLURM_NTASKS`
do
  srun --nodes=1 --ntasks=1 --cpus-per-task=1 sh SINGLEJOB.slurm &
done
# IMPORTANT: must wait for all to finish, or all get killed
wait
```
SMP jobs (also called multithreaded, OpenMP)
```
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --mem=22528
module load my-app-compiler/version
my-app
```
MPI Parallel Job
```
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=1024
module load my-app-compiler/version
srun my-MPI-app
```
Software and Modules
Melbourne Bioinformatics makes available a range of open source and commercial software on its systems.
The range of available software will depend on the host machine. To see a list of software available on a given machine, log into the machine and type:
module avail
Common SLURM options
|Option|Description|
|---|---|
|`--job-name=name`|Give the job a name.|
|`--account=account`|Account under which to run the job.|
|`--partition=partition`|Resources on a machine can be grouped into a partition. At Melbourne Bioinformatics the default partition is called main.|
|`--ntasks=n`|The number of cores.|
|`--mem-per-cpu=MB`|Per-core memory. Must be in MB.|
|`--time=days-hours:minutes:seconds`|Walltime. Note the format.|
|`--mail-type=FAIL`|Send email notification when the job fails.|
|`--mail-type=BEGIN`|Send email notification when the job starts running.|
|`--mail-user=address`|E-mail address to send information to. Best not to set this; the system will use your known e-mail address.|
|`--output=file`|Redirect output to this file on the path (optional). The name and extension can be anything you like. If you use %j in the name it is replaced by the job id.|
|`--error=file`|Redirect error to this file on the path (optional). The name and extension can be anything you like. If you use %j in the name it is replaced by the job id.|
|`--workdir=path`|Directory to run the job in: 1) the launch directory, 2) the home directory, or 3) a specified path. The default is to run the job in the same directory that the job is launched from (via `sbatch`). This is usually the best option (i.e. don't specify a path and don't `cd`).|
If your job is an SMP job, you will need to use the following options in addition to any relevant options from above. To get the number of CPUs that the node has from within your job, use the `$SLURM_CPUS_ON_NODE` environment variable.

|Option|Description|
|---|---|
|`--nodes=1`, `--ntasks=1`|Request that all cores are for 1 task on 1 node. Cannot use more cores than are on 1 node.|
|`--mem=MB`|Specify the whole-node memory. Must be in MB. Do not use `--mem-per-cpu`.|
If you use Job Arrays, please see the SLURM documentation for job arrays.