title: "Getting Started with Docker"
date: 2020-08-25
- frere
layout: blogpost
title_image: headway-5QgIuuBxKwM-unsplash.jpg
- tutorial
- consulting
- docker
excerpt: >
# What is Docker?
One of the key worries in modern research is how to achieve reproducibility.
Interestingly, this is also a major concern in software development.
If I write some code, it should work on my machine
(I mean, I hope it does!)
but how do I guarantee that it will work on anyone else's?
Similarly, when writing code to analyse data, it is important that it produces the correct result,
not just when you run the code multiple times with the same input data,
but _also_ when someone else runs the code on a different computer.
![eight different python environments](
One common way that software developers have traditionally tried to solve this problem is by using virtual machines (VMs).
The idea is that on your computer, you've probably got different pieces of code that will all interact in different messy ways,
not to mention [eight different Python environments](
However, if you have a VM, you can standardise things a bit more easily.
You can specify which packages are installed, and what versions, and what operating system everything is running on in the first place.
Everyone in your group can reproduce each other's work, because you're all running it in the same place.
The problem arises when a reviewer comes along, since they probably won't have access to your specific VM.
You either need to give them the exact instructions about how to set up your VM correctly
(and can you remember the precise instructions you used then, and what versions all your dependencies were at?)
_or_ you need to copy the whole operating system (and all of the files in it) out of your VM, into a new VM for the reviewer.
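Part of the answer to "what versions were all my dependencies at?" is simply to write them down as you go.
For Python, `pip freeze` is the usual tool for this; as a rough sketch of the same idea, the snippet below uses only the standard library (`importlib.metadata`, available from Python 3.8) to list every package installed in the current environment as a pinned `name==version` line.
The function name `pinned_requirements` is made up for illustration, and it is not a full replacement for `pip freeze`, which also handles cases like editable installs.

```python
from importlib import metadata
from typing import List


def pinned_requirements() -> List[str]:
    """Return every installed distribution as a pinned "name==version" line.

    Illustrative helper only; `pip freeze` is the usual tool for this job.
    """
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip installs whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the output is stable between runs.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output into a file (e.g. requirements.txt) to keep a record.
    print("\n".join(pinned_requirements()))
```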
Docker is both of those solutions at the same time.
Docker thinks of computers in terms of layers.
The bottom layer is a computer with almost nothing on it[^1].
The top layer is a computer with an operating system, all your dependencies, and your code, compiled and ready to run.
All the layers between those two points are the individual steps that you need to perform to get your computer in the right state to run your code.
Each step defines the changes between it and the next layer,
with each of these steps being written down in a file called a _Dockerfile_.
Moreover, once all of these layers have been built on one computer, they can be shared with other people,
meaning that you can always share your exact setup with anyone else who needs to run and review the code.
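To make the idea of layers a bit more concrete, here is a toy model in Python.
It is _not_ how Docker actually stores images; it just assumes that each layer is a dictionary recording what one build step changed (a path mapped to new file contents, or to `None` if the step deleted the file), and that the final computer is what you get by applying those layers in order, oldest first.
All of the function names and file paths here are hypothetical.

```python
from typing import Dict, List, Optional

# Toy model: a layer maps a file path to its new contents,
# or to None if that build step deleted the file.
Layer = Dict[str, Optional[str]]


def read_file(layers: List[Layer], path: str) -> Optional[str]:
    """Look up a file as seen from the top of the layer stack.

    Layers are searched newest first; the first layer that mentions
    the path decides the answer (None means "deleted" or "never created").
    """
    for layer in reversed(layers):
        if path in layer:
            return layer[path]
    return None


def flatten(layers: List[Layer]) -> Dict[str, str]:
    """Build the final filesystem by applying each layer's changes in order."""
    files: Dict[str, str] = {}
    for layer in layers:
        for path, contents in layer.items():
            if contents is None:
                files.pop(path, None)
            else:
                files[path] = contents
    return files


# Hypothetical build steps for a small project.
image: List[Layer] = [
    {},                                          # empty base layer
    {"/usr/bin/python3": "<python 3.8.5>"},      # install Python
    {"/opt/my-project/main.py": "print('hi')",
     "/opt/my-project/notes.txt": "scratch notes"},  # copy in the code
    {"/opt/my-project/notes.txt": None},         # delete a file
]

print(read_file(image, "/opt/my-project/main.py"))    # print('hi')
print(read_file(image, "/opt/my-project/notes.txt"))  # None (deleted in the last layer)
print(sorted(flatten(image)))  # ['/opt/my-project/main.py', '/usr/bin/python3']
```

One consequence of this design does carry over to real Docker images: deleting a file in a later layer only hides it.
The earlier layer still contains the file, so removing a large file in a later step does not make the image any smaller.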
[^1]: This is a bit of a simplification.
The canonical base image ("scratch") is completely empty,
_but_, if you were able to explore inside it,
you'd find that there is still enough of an operating system for things like files to exist, and to run certain programs.
This is because Docker images aren't separate VMs --
the operating system that you can see is actually the operating system of the computer that's running Docker.
This is a concept called _containerisation_ or _OS-level virtualisation_, and how it works is very much beyond the scope of this blog post!
# Getting Ready with Docker
# An Example Dockerfile
Here's what an example Dockerfile for a simple Python project might look like.
(The numbered comments are only there to make each step easier to refer to later in this post.)
```dockerfile
# (1)
FROM python:3.8.5
# (2)
WORKDIR /opt/my-project
# (3)
COPY . /opt/my-project
# (4)
RUN pip install -r requirements.txt
# (5)
ENTRYPOINT [ "python3", "" ]
```
The first thing **(1)** a Dockerfile needs is a parent image.
In our case, we're using one of the pre-built Python images.
This is an official image provided by Docker that starts with a basic Debian Linux installation
and installs Python on top of it[^2].
We can also specify the exact version that we want (here we use 3.8.5).
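Pinning the full version matters because shorter tags move: `python:3.8`, for instance, follows the newest 3.8.x release, so building the same Dockerfile on another machine, or months later, can give you a different patch release.
If you want your code to notice that, one option is to compare the running interpreter's version against the tag at startup.
The sketch below assumes the tag is a bare version number (no variant suffix such as `-slim`), and `version_matches_tag` is a hypothetical helper, not part of any library.

```python
import platform
import sys
from typing import Optional


def version_matches_tag(tag: str, actual: Optional[str] = None) -> bool:
    """Check whether a Python version is consistent with an image tag.

    A full tag such as "3.8.5" must match exactly, while a shorter tag
    such as "3.8" accepts any 3.8.x patch release. Versions are compared
    component by component, so "3.8.50" does not match "3.8.5".
    """
    if actual is None:
        actual = platform.python_version()  # e.g. "3.8.5"
    wanted = tag.split(".")
    return actual.split(".")[: len(wanted)] == wanted


if __name__ == "__main__":
    # "3.8.5" is the tag from `FROM python:3.8.5` in the example Dockerfile.
    if not version_matches_tag("3.8.5"):
        print(f"Warning: expected Python 3.8.5, found {platform.python_version()}",
              file=sys.stderr)
```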
There are a large number of these pre-built official images available,
for tools such as [Python](, [R](, and [Julia](
There are also unofficial images that often bring together a variety of scientific computing tools for convenience.
For example, the Jupyter Notebooks team have a [wide selection]( of different images with support for different setups.
Once we've got our base image, we want to make this image our own.
Each of the commands in this file adds a new layer on top of the previous one.
The first command we use **(2)** is fairly simple -- it just sets the current working directory.
It's a bit like running `cd` to get to the directory you want to start working in.
Here, we set it to `/opt/my-project`.
It doesn't really matter what we use here,
but I recommend `/opt/<project-name>` as a reasonable default.
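The `cd` comparison also holds when you chain several `WORKDIR` instructions: according to the Dockerfile reference, an absolute path replaces the current directory, while a relative path is resolved against the previous `WORKDIR`.
The sketch below models just that rule with Python's `pathlib`.
It assumes the build starts in `/` (the default when the base image sets no working directory), it does not resolve `..` components, and the helper name `resolve_workdir` is made up for this example.

```python
from pathlib import PurePosixPath
from typing import Iterable


def resolve_workdir(workdirs: Iterable[str], start: str = "/") -> PurePosixPath:
    """Return the working directory after a sequence of WORKDIR values.

    Joining with `/` discards the current path when the new one is
    absolute, and appends to it when the new one is relative, much like
    running `cd` repeatedly. ".." components are kept, not resolved.
    """
    current = PurePosixPath(start)
    for path in workdirs:
        current = current / path
    return current


# The example Dockerfile uses a single absolute WORKDIR.
workdir = resolve_workdir(["/opt/my-project"])
print(workdir)                       # /opt/my-project

# Relative paths in later steps, such as `requirements.txt` in step (4),
# are looked up inside that directory.
print(workdir / "requirements.txt")  # /opt/my-project/requirements.txt

# Chained WORKDIRs: an absolute path, then two relative ones.
print(resolve_workdir(["/a", "b", "c"]))  # /a/b/c
```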
The next step **(3)** is to add our own code to the image.