Extending Docker containers for use with Singularity


This document describes how to take an existing Docker image, modify it and import it for running via Singularity on the Melbourne Bioinformatics systems.

Obtaining a container from Docker Hub

Go to Docker Hub and search for the image you want. In this example we'll be using tensorflow.

Use docker pull on your local system to obtain the image.

$ docker pull tensorflow/tensorflow
Using default tag: latest
latest: Pulling from tensorflow/tensorflow
c62795f78da9: Pull complete
d4fceeeb758e: Pull complete
5c9125a401ae: Pull complete
0062f774e994: Pull complete
6b33fd031fac: Pull complete
52e18a0f2ca7: Pull complete
cf26e7f79a1f: Pull complete
f1d0b6192b60: Pull complete
d3cca787fa7c: Pull complete
24b58a5e905f: Pull complete
4ed0083b7815: Pull complete
f181e59dac06: Pull complete
Digest: sha256:51755c628e1a853f91b0574555efa70f327ffdcd7366449f87fed0066c8ef1f3
Status: Downloaded newer image for tensorflow/tensorflow:latest
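
The latest tag moves over time, so a later pull may fetch a different image. One option, sketched here, is to pull by the digest that docker pull printed. The helper name image_ref is hypothetical, and the digest is the one from the output above; it may no longer be available on Docker Hub.

```shell
# Hypothetical helper: join a repository and a digest into an image reference.
image_ref() {
    printf '%s@%s\n' "$1" "$2"
}

# Digest copied from the "Digest:" line of the pull output above.
DIGEST="sha256:51755c628e1a853f91b0574555efa70f327ffdcd7366449f87fed0066c8ef1f3"
REF="$(image_ref tensorflow/tensorflow "$DIGEST")"
echo "$REF"

# To pull exactly that image (requires Docker and network access):
#   docker pull "$REF"
```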

You can see a list of images you have available by using docker images.

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
tensorflow/tensorflow          latest              2c520a260ba9        6 days ago          1.13GB
ubuntu                         16.04               0ef2e08ed3fa        7 weeks ago         130MB

Running Docker containers

Using docker run, you can start a tensorflow container.

$ docker run -it tensorflow/tensorflow bash

docker ps will list the currently running containers.

$ docker ps
CONTAINER ID        IMAGE                   COMMAND             CREATED             STATUS              PORTS                NAMES
e2177a597a86        tensorflow/tensorflow   "bash"              54 seconds ago      Up 52 seconds       6006/tcp, 8888/tcp   sharp_archimedes
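
When several containers are running, the full docker ps table can be hard to scan. A minimal sketch, assuming you only want the names of running containers started from one image; the function name is hypothetical, while the ancestor filter and the format template are standard docker ps options.

```shell
# Hypothetical helper: print the names of running containers started from an image.
running_names_for_image() {
    docker ps --filter "ancestor=$1" --format '{{.Names}}'
}

# Example call (requires Docker):
#   running_names_for_image tensorflow/tensorflow
```

For the container listed above, the example call would be expected to print its name from the NAMES column.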

Modifying a Docker container

To add software to a Docker image, start from the Dockerfile it was built from.

The Dockerfile for tensorflow is kept in the tensorflow repository on GitHub, under tensorflow/tools/docker.

You can use the following command to clone the tensorflow repository:

$ git clone https://github.com/tensorflow/tensorflow.git
Cloning into 'tensorflow'...
remote: Counting objects: 176622, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 176622 (delta 0), reused 0 (delta 0), pack-reused 176619
Receiving objects: 100% (176622/176622), 94.13 MiB | 5.64 MiB/s, done.
Resolving deltas: 100% (136073/136073), done.
Checking connectivity... done.

Modify the Dockerfile as needed, keeping a copy of the original for comparison. In this example, we add the h5py package for python3.

$ cd tensorflow/tensorflow/tools/docker
$ cp Dockerfile Dockerfile.orig
$ vim Dockerfile

Comparing the edited file against the original copy, Dockerfile.orig, shows the single added line, which requests h5py version 2.6.0:

$ diff Dockerfile Dockerfile.orig 
<h5py==2.6.0 \

Once you've made edits to the Dockerfile, you can use docker build to rebuild the image.

$ docker build --pull -t $USER/tensorflow/tensorflow -f Dockerfile .

Once the build is complete, you should see an image called <you>/tensorflow/tensorflow, where <you> is your username.

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
tensorflow/tensorflow          latest              2c520a260ba9        6 days ago          1.13GB
<you>/tensorflow/tensorflow    latest              9710bc4c0841        10 days ago         1.14GB
ubuntu                         16.04               0ef2e08ed3fa        7 weeks ago         130MB
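
Docker repository names must be lowercase, so a build tagged with $USER can fail if your username contains capital letters. A small sketch of one way to guard against that; the helper name is hypothetical, and it only handles letter case, not other characters that Docker rejects in names.

```shell
# Hypothetical helper: build a local image name from a username.
# Docker repository names must be lowercase, so the username is lowercased.
local_image_name() {
    user="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    printf '%s/tensorflow/tensorflow\n' "$user"
}

IMAGE="$(local_image_name "${USER:-you}")"
echo "$IMAGE"

# Then build with:
#   docker build --pull -t "$IMAGE" -f Dockerfile .
```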

Next, let's test if your changes have taken effect. In this example, we added the h5py package in python3, so let's see if it's now present.

$ docker run -it <you>/tensorflow/tensorflow python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py

This package wasn't present in the originally downloaded image:

$ docker run -it tensorflow/tensorflow python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'h5py'
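
The same check can be run without an interactive Python session, which is handy for scripting. A minimal sketch, assuming python3 is on the PATH inside the image and that the image does not define an entrypoint that intercepts the command; the function name is hypothetical.

```shell
# Hypothetical helper: succeed if MODULE imports under python3 inside IMAGE.
# --rm removes the temporary container once the check has finished.
image_has_module() {
    docker run --rm "$1" python3 -c "import $2"
}

# Example (requires Docker):
#   if image_has_module "$USER/tensorflow/tensorflow" h5py; then
#       echo "h5py is installed"
#   fi
```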

Exporting a Docker container to Singularity format

Now that the Docker image is updated, we can export it to Singularity format. First, make sure the /hsm, /vlsci and /scratch directories exist in the container we are about to export. These three filesystems are present on all Melbourne Bioinformatics clusters, and the directories serve as mount points so that the filesystems can be mounted inside your Singularity container when it is running.

$ docker run -it <you>/tensorflow/tensorflow bash
root@ddf2c817a9d3:/notebooks# cd /
root@ddf2c817a9d3:/# mkdir hsm scratch vlsci
root@ddf2c817a9d3:/#

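Next, find the name of the running Docker container using docker ps. Run it from a second terminal and leave the container's shell open, because docker ps lists only running containers.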

$ docker ps
CONTAINER ID        IMAGE                         COMMAND             CREATED             STATUS              PORTS                NAMES
ddf2c817a9d3        <you>/tensorflow/tensorflow   "bash"              6 minutes ago       Up 6 minutes        6006/tcp, 8888/tcp   priceless_bell

In this example the name of the running container is priceless_bell. Export this container's filesystem to a tarball:

$ docker export priceless_bell > tensorflow.tar
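
The mkdir and export steps can also be done without an interactive shell, since docker export works on a stopped container and keeps its filesystem changes. This is a sketch of that alternative, not the procedure described above; both function names and the temporary container name are hypothetical, and it assumes the image has no entrypoint that would intercept the mkdir command.

```shell
# Hypothetical helper: create the mount points in a fresh container, export
# its filesystem to a tarball, then remove the container.
export_with_mount_points() {
    image="$1"
    tarball="$2"
    name="singularity_export_$$"   # arbitrary temporary container name
    docker run --name "$name" "$image" mkdir -p /hsm /scratch /vlsci &&
        docker export "$name" > "$tarball"
    status=$?
    docker rm "$name" >/dev/null 2>&1
    return "$status"
}

# Hypothetical helper: succeed if the tarball lists all three directories.
tarball_has_mount_points() {
    for dir in hsm scratch vlsci; do
        tar -tf "$1" | grep -E "^(\./)?${dir}/?\$" >/dev/null || return 1
    done
}

# Example (requires Docker):
#   export_with_mount_points "$USER/tensorflow/tensorflow" tensorflow.tar
#   tarball_has_mount_points tensorflow.tar && echo "mount points present"
```

The second helper is also useful on its own for checking a tarball produced by the interactive steps.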

Next, create an empty Singularity image. In this instance we'll create a 2048 MiB image. You may need a larger or smaller image depending on what you are putting in it.

$ /path/to/singularity create -s 2048 tensorflow.img
Creating a new image with a maximum size of 2048MiB...
Executing image create helper
Formatting image with ext3 file system
$ /path/to/singularity import tensorflow.img tensorflow.tar
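
Choosing the image size can be guided by the size of the exported tarball. The sketch below rounds the tarball size up to whole MiB and adds a percentage of headroom. The function name and the default of 25 percent are arbitrary assumptions, and filesystem overhead means the result is only a rough starting point.

```shell
# Hypothetical helper: suggest an image size in MiB for a tarball, rounding
# up and adding a percentage of headroom (default 25, an arbitrary choice).
suggest_image_mib() {
    bytes=$(( $(wc -c < "$1") ))
    headroom="${2:-25}"
    mib=$(( (bytes + 1048575) / 1048576 ))
    echo $(( (mib * (100 + headroom) + 99) / 100 ))
}

# Example:
#   /path/to/singularity create -s "$(suggest_image_mib tensorflow.tar)" tensorflow.img
```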

At this stage you should have a Singularity-format container, tensorflow.img, which may be uploaded to a Melbourne Bioinformatics cluster and used for real jobs.
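
Before submitting real jobs, it may be worth repeating the h5py check inside the Singularity image. A minimal sketch, assuming the singularity exec subcommand is available where you run it; the SINGULARITY variable and the function name are hypothetical conveniences standing in for /path/to/singularity.

```shell
# Path to the singularity binary; override SINGULARITY if it is not on PATH.
SINGULARITY="${SINGULARITY:-singularity}"

# Hypothetical helper: run the h5py import check inside a Singularity image.
check_image() {
    "$SINGULARITY" exec "$1" python3 -c 'import h5py'
}

# Example (requires Singularity):
#   check_image tensorflow.img && echo "h5py imports inside the image"
```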


If you have any questions about Singularity, the Melbourne Bioinformatics team is always able to help.