Shifter

What is Shifter?

Shifter is a tool to enable use of Docker containers on a Linux cluster.

What is Docker?

Docker is a platform that uses features of the Linux kernel to run software in a container. The software housed in a Docker container is not just a single program but an entire OS distribution, or at least enough of the OS for the program to work.

Docker can be thought of as a software distribution mechanism, somewhat like yum or apt. It can also be thought of as an expanded version of a chroot jail, or a reduced version of a virtual machine.

Read more about Docker on the official web site.

Docker vs Shifter

There are some important differences between Docker and Shifter.

Docker Hub

Docker Hub is a service run by Docker to distribute and share Docker images.

Many scientific software packages and languages are available on Docker Hub, for example:

Creating a Docker container

This guide does not cover how to build your own Docker containers. If you wish to do this, you could start with the "Build your own image" guide on the Docker web site. That guide shows how to extend an existing Ubuntu-based image to add extra software.

You can use the same process to extend a container found on Docker Hub with your extra requirements, such as InfiniBand libraries for MPI (see below). Once your container is complete, you can upload it to Docker Hub so that you can pull it on Melbourne Bioinformatics systems.

Shifter at Melbourne Bioinformatics

Shifter has been installed on Melbourne Bioinformatics's Intel-based clusters.

To use it, you will need to load the Shifter module:

module load shifter

Once it is loaded, two extra commands are added to your $PATH: shifterimg and shifter.

Using shifterimg

shifterimg has three modes. The images mode lists the Docker images that have already been "pulled" (i.e. downloaded):

shifterimg images

This will show entries like the following:

$ shifterimg images
VLSCI      docker     READY    50475a1caf   2016-09-05T18:44:54 perl:latest                   
VLSCI      docker     READY    95b04ce633   2016-09-05T19:15:10 r-base:latest                 
VLSCI      docker     READY    65e1e9d1a1   2016-08-22T18:03:28 ubuntu:latest                 

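The columns are, left to right: the system (VLSCI), the image type (docker), the image status (READY), the first ten characters of the image ID, a timestamp, and the image name and tag.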

The lookup mode gives the full 64-character sha256 hash for an image or, when used with the -v option, further details about the image.

The pull mode downloads an image from Docker Hub and stores it in the VLSCI cache, ready to use with shifter. For example, to pull the latest version of R:
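Because the images listing is plain whitespace-separated text, a script can check whether an image is usable before relying on it. The function below is a hypothetical helper (the name image_ready is not part of Shifter) and assumes the six-column layout shown in the sample listing, with the status in column 3 and the image name and tag in column 6.

```bash
# Hypothetical helper, not part of Shifter: read a `shifterimg images`
# listing on stdin and succeed (exit status 0) only if the tag given as $1
# is listed with status READY.
# Assumes the column layout shown above: system, type, status, ID, timestamp, tag.
image_ready() {
    local tag="$1"
    awk -v tag="$tag" '
        $6 == tag && $3 == "READY" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example (on the cluster, after `module load shifter`):
#   if shifterimg images | image_ready r-base:latest; then
#       echo "r-base:latest is ready to use"
#   fi
```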

shifterimg pull docker:r-base:latest

The 'docker:' prefix means the image is pulled from Docker Hub (https://hub.docker.com/), so you can pull any container that has been published there. Other sources may be added in future.

When pulling an image you will see output like the following:

$ shifterimg pull docker:r-base:latest
2016-09-06T13:44:40 Pulling Image: docker:r-base:latest, status: PULLING

The status at the end of the line changes from PULLING to CONVERSION and finally to READY.

An advantage of Shifter for Melbourne Bioinformatics users is that you do not need to request the installation of software that has been published on Docker Hub: you are free to pull it yourself.
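If you run shifterimg pull from a script and keep its output, the status word can be picked out of each progress line. This is a minimal sketch with a hypothetical function name (pull_status); it assumes the line format shown in the example above, ending in "status: " followed by the status word.

```bash
# Hypothetical helper, not part of Shifter: print the status word
# (PULLING, CONVERSION or READY) from `shifterimg pull` progress lines on stdin,
# assuming lines of the form:
#   2016-09-06T13:44:40 Pulling Image: docker:r-base:latest, status: PULLING
# Lines without a "status:" field are ignored.
pull_status() {
    sed -n 's/.*status: *\([A-Z]*\).*/\1/p'
}
```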

Using shifter

If you want to run a container for inspection, you can start it interactively:

shifter --image=<imagename>:<version> <command>

For example:

shifter --image=r-base:latest R

This will run the R binary from inside the image containing the latest version of R. You can then use R as normal:

$ module load shifter
$ shifter --image=r-base:latest R

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> cat("Hello world\n")
Hello world
> quit()
Save workspace image? [y/n/c]: n
$

However, Shifter is not intended for use on the login nodes. Just as you run regular jobs via SLURM, Shifter jobs need to be run through SLURM to make use of the resources of Melbourne Bioinformatics's clusters.
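If you write your own scripts around shifter, a simple guard can stop them from starting a container on a login node by accident. This sketch relies on the fact that SLURM sets SLURM_JOB_ID inside a job; the function name shifter_in_job is hypothetical, and the check is only a rough heuristic (an interactive salloc shell, for example, also has SLURM_JOB_ID set).

```bash
# Hypothetical guard, not part of Shifter: only call shifter when running
# inside a SLURM job, detected by the SLURM_JOB_ID variable that SLURM sets.
# All arguments are passed through to shifter unchanged.
shifter_in_job() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "shifter_in_job: not inside a SLURM job; submit with sbatch instead" >&2
        return 1
    fi
    shifter "$@"
}

# Example, inside an sbatch script:
#   shifter_in_job --image=r-base:latest R --version
```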

Shifter and SLURM

Shifter adds an extra option that you can pass to SLURM, --image. It tells SLURM which image you want to use, so that SLURM sets it up on the compute node(s) that execute your job.

Here is an example sbatch script, which we will call Rshifter.sbatch:

#!/bin/bash
#SBATCH --image=docker:r-base:latest
#SBATCH --nodes=1
#SBATCH --partition=main

module purge
module load shifter
echo 'cat("Hello world\n")' | shifter R --no-save

When submitted, we get:

$ sbatch Rshifter.sbatch 
Submitted batch job 729897
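When no output file is specified, sbatch writes the job's output to slurm-<jobid>.out in the submission directory, so a script can work out the file name from the confirmation line. The helper name slurm_outfile is hypothetical, and it assumes you have not set --output (-o) in the job script.

```bash
# Hypothetical helper: read sbatch's confirmation line on stdin
# ("Submitted batch job 729897") and print the default output file name.
# Assumes no --output/-o option was given, so SLURM uses slurm-<jobid>.out.
slurm_outfile() {
    awk '/^Submitted batch job/ { print "slurm-" $4 ".out" }'
}

# Example:
#   outfile=$(sbatch Rshifter.sbatch | slurm_outfile)
#   echo "output will appear in $outfile"
```

sbatch also has a --parsable option that prints just the job ID, which avoids parsing the sentence at all.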

Once the job is complete, we can see the output:

$ cat slurm-729897.out 

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> cat("Hello world\n")
Hello world
> 

So we have just successfully pulled R from Docker Hub and used it inside a SLURM job.
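The Rshifter.sbatch script above can be generalised so that the same template runs any command in any pulled image. This is a minimal sketch with a hypothetical function name (make_shifter_sbatch); the node count and partition are simply copied from the example and may need changing for your jobs. The command is passed as a single string and written into the script verbatim.

```bash
# Hypothetical helper: print an sbatch script modelled on Rshifter.sbatch.
#   $1  image reference, e.g. docker:r-base:latest
#   $2  command line to run inside the container, as one string
# Because --image is given to SLURM, the script calls shifter without --image.
make_shifter_sbatch() {
    local image="$1" cmd="$2"
    cat <<EOF
#!/bin/bash
#SBATCH --image=$image
#SBATCH --nodes=1
#SBATCH --partition=main

module purge
module load shifter
shifter $cmd
EOF
}

# Example (analysis.R is a hypothetical script of your own):
#   make_shifter_sbatch docker:r-base:latest 'Rscript analysis.R' > analysis.sbatch
#   sbatch analysis.sbatch
```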

MPI jobs in Shifter containers

To use MPI inside a Shifter container, the container must have InfiniBand libraries installed.

If you are building a Debian- or Ubuntu-based container, you can use the libmlx4-1 package, which also requires the libibverbs1 package.

For a Red Hat-based container, you can use the libmlx4 package.
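When one build script has to serve several base images, the package choice above can be made from the distribution ID in /etc/os-release. This is a sketch under assumptions: the function name ib_packages is hypothetical, and treating the IDs rhel and centos as "Red Hat-based" is our own mapping, not something stated in this guide.

```bash
# Hypothetical helper for a container build script: print the InfiniBand
# packages listed above for a distribution ID (the ID field of /etc/os-release).
# The rhel/centos mapping to "Red Hat-based" is an assumption.
ib_packages() {
    case "$1" in
        debian|ubuntu) echo "libmlx4-1 libibverbs1" ;;
        rhel|centos)   echo "libmlx4" ;;
        *) echo "ib_packages: unknown distribution '$1'" >&2; return 1 ;;
    esac
}

# Example, in a Debian/Ubuntu image build step:
#   . /etc/os-release
#   apt-get update && apt-get install -y $(ib_packages "$ID")
```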

Interaction with Melbourne Bioinformatics resources

Shifter containers run at Melbourne Bioinformatics are supplied with all Melbourne Bioinformatics user and group accounts. Processes started inside a container will be running as your user and that user will be a member of all your current groups.

Shifter containers mount the three Melbourne Bioinformatics filesystems, /scratch, /vlsci and /hsm, so all of them are available for any tasks you perform inside a container. When you start an executable inside a container, its working directory will be your home directory (/vlsci//).

Executables running inside your container are able to read from and write to any files in those three filesystems for which you have read or write permission.
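To confirm from inside a container which directories you can use, a small check can be run with shifter. This is a hypothetical sketch (the function name check_mounts is ours): it reports whether each directory given is present and writable by the current user. Note that you may well not be able to write to the top level of /vlsci or /scratch themselves, only to your own directories within them, so pass the paths you actually intend to use.

```bash
# Hypothetical check, not part of Shifter: for each directory argument,
# report whether it exists and whether the current user can write to it.
check_mounts() {
    local dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "$dir: missing"
        elif [ -w "$dir" ]; then
            echo "$dir: writable"
        else
            echo "$dir: not writable by $(id -un)"
        fi
    done
}
```

Saved in a script (say $HOME/check_mounts.sh, a hypothetical name) that ends with the line check_mounts "$@", it could be run in a container with, for example, shifter --image=ubuntu:latest bash "$HOME/check_mounts.sh" "$HOME".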

Questions

If you have any questions about Shifter, the Melbourne Bioinformatics team is always able to help.