Job Scheduling and CPU Usage

This section gives an overview of how jobs are scheduled to run on a machine and how the usage accrued is accounted.

Jobs and resources are managed using SLURM.

Job scheduling is based on the notion of 'fairshare'. In particular, we use a 'Fair Tree' system.

All jobs submitted to a machine are queued and prioritised. Several factors determine the priority of a queued job. The main factors used by the scheduler are:

Other limits can also apply to job scheduling, including:

Note that all of these factors apply only to queued jobs.

The advantage of fairshare is that project members do not have to accurately estimate how many CPU hours they will need for a year, and that all eligible jobs will always run as long as there is capacity. This greatly simplifies usage management and promotes optimal usage of the machines.

Every project has a fairshare target that the scheduler uses as a guideline when calculating priorities for waiting jobs. If your project is under its fairshare target, its jobs get a boost in priority; if it is over the target, they get a penalty. However, being over target never stops your jobs from running when resources are free (you simply move further over your target).
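The boost-or-penalty behaviour can be illustrated with a small Python sketch. This is an assumption-laden toy model, not the scheduler's actual calculation: the function name `fairshare_factor` and the example fractions are hypothetical, and the formula 2^(-usage/target) is only one commonly described shape for a fairshare factor. Fair Tree itself ranks projects rather than applying this formula directly.

```python
def fairshare_factor(usage_fraction: float, target_fraction: float) -> float:
    """Toy fairshare factor in (0, 1].

    usage_fraction:  the project's share of recent machine usage (hypothetical input)
    target_fraction: the project's fairshare target (hypothetical input)

    A result of 0.5 means usage equals the target; higher values mean the
    project is under target (priority boost), lower values mean it is over
    target (priority penalty). The factor never reaches zero, so an
    over-target project can still run when resources are free.
    """
    if target_fraction <= 0:
        raise ValueError("target_fraction must be positive")
    return 2.0 ** (-(usage_fraction / target_fraction))


# Illustrative project with a 10% target at three different usage levels.
for label, usage in [("under target", 0.05), ("on target", 0.10), ("over target", 0.30)]:
    print(f"{label}: {fairshare_factor(usage, 0.10):.3f}")
```

The factor only influences the ordering of queued jobs; it does not impose a hard cap on usage.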

Fair Tree uses a hierarchical arrangement to balance target utilisation between different allocation methods. The current hierarchy is based on the different mechanisms through which projects can get access to Melbourne Bioinformatics (formerly VLSCI) resources. The tree is of the form:

This means that each project's target share depends on its own accumulated usage and on the accumulated usage of all projects that share the same parent. This prevents any one project, or collection of projects, from dominating the resources.
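The effect of sharing a parent can be sketched in Python. This is a simplified illustration of the Fair Tree idea, not the scheduler's implementation: the tree, the names (`scheme_a`, `proj_a1`, and so on), the share values, and the usage figures are all hypothetical, and tie handling is omitted. At each level, siblings are compared by the ratio of their normalised shares to their normalised usage, and the tree is walked depth-first in that order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    shares: float                  # shares relative to siblings
    usage: float = 0.0             # accumulated usage (leaf nodes only)
    children: list["Node"] = field(default_factory=list)

    def total_usage(self) -> float:
        # An internal node's usage is the sum of its descendants' usage.
        if not self.children:
            return self.usage
        return sum(child.total_usage() for child in self.children)


def level_fairshare(node: Node, siblings: list[Node]) -> float:
    """Normalised shares divided by normalised usage, among siblings only."""
    total_shares = sum(s.shares for s in siblings)
    total_usage = sum(s.total_usage() for s in siblings)
    share_norm = node.shares / total_shares
    usage_norm = node.total_usage() / total_usage if total_usage else 0.0
    return float("inf") if usage_norm == 0 else share_norm / usage_norm


def rank_projects(root: Node) -> list[str]:
    """Return leaf names from highest to lowest priority."""
    ordered: list[str] = []

    def visit(node: Node) -> None:
        if not node.children:
            ordered.append(node.name)
            return
        by_fairshare = sorted(
            node.children,
            key=lambda c: level_fairshare(c, node.children),
            reverse=True,
        )
        for child in by_fairshare:
            visit(child)

    visit(root)
    return ordered


# Hypothetical tree: two allocation schemes with equal shares.
root = Node("root", 1, children=[
    Node("scheme_a", 50, children=[
        Node("proj_a1", 1, usage=900),
        Node("proj_a2", 1, usage=100),
    ]),
    Node("scheme_b", 50, children=[
        Node("proj_b1", 1, usage=200),
    ]),
])

print(rank_projects(root))
```

In this example `proj_a2` has used less than `proj_b1`, yet it ranks below it, because its parent `scheme_a` as a whole is well over its share. This is how the hierarchy keeps a collection of projects from dominating the machine.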

For more information on Fair Tree please see this PDF.