Docker Build

Notes on Dockerfile and building containers

Although building images for Docker is pretty simple, it is easy to end up creating extremely large ones. This is easily fixed by structuring the builds correctly, which limits the final image sizes.

Each section below covers different methods that can be used together to build better images.

1 - Docker Ignore

Limiting what is visible to a build

One feature that is widely ignored is .dockerignore, yet implementing it can speed up your builds by limiting what is sent to docker before the build starts.

The format of .dockerignore is pretty simple and is documented here, though in most cases only a few lines are needed.

For example, this site is built by a docker container, but we only need two directories and a single file to be passed to docker to perform the build, so our .dockerignore is pretty basic:

*
!bin
!tools
!go.mod

Here the first line excludes everything with *, then the exception rules (lines starting with !) add back the bin and tools directories and the go.mod file.

The speed improvement is noticeable. For this site, the full source tree is currently around 300MB, but with those four simple lines only 207KB is sent to docker instead of everything.

You'll note that the Dockerfile isn't passed to the build context, as it's not needed there. It's generally advised to never send it to the build context, something many developers don't realise.
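
If you want to check what actually reaches the build context, one trick is a throwaway build that copies the context into an image and lists it. A minimal sketch, where test-context is just an arbitrary tag:

# Read the Dockerfile from stdin (-f-) so it never enters the context,
# then copy the context in and report its contents and total size.
# --progress=plain needs BuildKit, the default on recent docker versions.
docker build --no-cache --progress=plain -t test-context -f- . <<'EOF'
FROM busybox
COPY . /ctx
RUN find /ctx -type f | sort && du -sh /ctx
EOF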

2 - Use Build Stages

Break up your builds into stages

One common pattern developers use for a Dockerfile is to do everything at once. This might be fine for simple images, but if you are compiling the application it is a bad idea, as you can easily bloat the final image with the source code, or even with the compilers needed to build the application but not to run it.

The solution to this is to use stages, where you break your build up into separate steps. The earlier stages compile your application whilst the last one contains just your application.

The last stage always becomes the final built image.

For example:

 1  FROM golang:alpine AS build
 2  RUN apk add --no-cache tzdata
 3
 4  WORKDIR /work
 5
 6  RUN go env -w GOFLAGS=-mod=mod
 7  COPY go.mod .
 8  RUN go mod download
 9
10  COPY src/ src/
11  RUN CGO_ENABLED=0 go build -o /dest/exampleapp src/bin/main.go
12
13  FROM debian:11-slim AS final
14  COPY --from=build /dest/* /usr/local/bin/
15  WORKDIR /work

Here we have two stages:

  • Lines 1…11 are the first stage. It uses the golang:alpine image to compile an example application, storing the built binary under the /dest/ directory.
  • Lines 13…15 are the second and last stage. It uses the debian:11-slim image and copies the built application from the first stage into /usr/local/bin

The final image is now a lot smaller as it doesn't have any of the compilers installed.

This is a simple example of build staging. There are other techniques you can combine with it, such as stage ordering and intermediate stages, to reduce the number of layers.
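
As an aside, individual stages can be built on their own with docker's --target flag, which is handy when debugging one stage in isolation. A quick sketch, assuming the Dockerfile above and an arbitrary exampleapp tag:

# Build and tag only the first stage, e.g. to inspect the compiled binary:
docker build --target build -t exampleapp:build-stage .

# A normal build runs through to the last stage, producing the final image:
docker build -t exampleapp .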

3 - Stage Ordering

Ordering stages to be more efficient

One common issue with complex Dockerfiles is that your final image might require additional packages to be installed. If there are many of them, this can slow down every build.

A technique here is to break up your final image: create a stage early on in your Dockerfile which prepares the final image by installing any required packages, and only then define your compilation stages.

The last stage is then based on that first stage and simply adds your build artefacts.

The benefit of this is that whilst you are developing your image the first stages are cached by docker; unless something changes they don't get run again, and later stages simply use the cache.

For example:

 1  FROM debian:11-slim AS base
 2  WORKDIR /root
 3  RUN apt-get update &&\
 4      apt-get install -y ca-certificates chromium nodejs npm &&\
 5      npm install npm@latest -g &&\
 6      npm install -g postcss postcss-cli &&\
 7      npm install autoprefixer &&\
 8      chmod -R +rx /root &&\
 9      rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* ~/.npmrc
10
11  FROM golang:alpine AS build
12  RUN apk add --no-cache tzdata
13
14  WORKDIR /work
15
16  RUN go env -w GOFLAGS=-mod=mod
17  COPY go.mod .
18  RUN go mod download
19
20  COPY src/ src/
21  RUN CGO_ENABLED=0 go build -o /dest/exampleapp src/bin/main.go
22
23  FROM base AS final
24  COPY --from=build /dest/* /usr/local/bin/
25  WORKDIR /work

Here we now have three stages:

  • Lines 1…9 are the first stage, based on debian:11-slim, where we install the packages the final image needs, such as chromium and nodejs.
  • Lines 11…21 are the second stage, which compiles our application using go.
  • Lines 23…25 are the third and last stage, forming our final image. It uses the first stage as its base and just copies the built application from the second stage into /usr/local/bin

The main benefit here is that if you change something in the source, only the steps after the COPY src/ src/ line are run; everything else comes from the cache.
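
When you do need to step around that cache, docker provides flags for it. A brief sketch, again assuming an arbitrary exampleapp tag:

# Re-run every stage, ignoring the cache entirely:
docker build --no-cache -t exampleapp .

# Keep the cache but check for newer versions of the FROM images:
docker build --pull -t exampleapp .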

4 - Installing Packages

Installing Packages efficiently

One cause of bloated images is where additional packages are required on top of a base image. The developer installs them as they would on a real system, not realising that the downloaded packages are cached and end up included in the final image.

The solution to this is pretty simple: either tell the package manager not to cache them, or ensure the package installation files are removed once installation has completed.

Alpine

When using the alpine base images, or images based on them, you can simply tell the apk command not to cache by passing the --no-cache parameter:

FROM golang:alpine AS build

RUN apk add --no-cache curl git tzdata zip

Debian/Ubuntu

For images that use apt or apt-get it's a little more complicated. Here you need to run an update first, then install, and finally remove the files apt leaves behind, all within the same command:

FROM debian:11-slim
RUN apt-get update &&\
    apt-get install -y ca-certificates chromium nodejs npm &&\
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

Here it runs an update first, as the image does not include the current state of the remote repositories. It then installs the required packages and finally removes all traces of the files written by apt.

That last step is the important part, and keeping all three on the same RUN command ensures that just one layer is generated for the whole step, so the cached package files never appear in any layer.
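
For contrast, this is a sketch of the layout to avoid. Splitting the commands into separate RUN instructions bakes the package lists into their own layers; the final rm still runs, but the files remain in the earlier layers and so in the image:

# Anti-pattern: each RUN creates its own layer, so the apt cache written
# by the first two instructions stays in the image even though the last
# instruction deletes the files.
FROM debian:11-slim
RUN apt-get update
RUN apt-get install -y ca-certificates chromium nodejs npm
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*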

Don't try to do too much in this combined step, as the result is cached and subsequent builds will use the same layer until something changes earlier in the Dockerfile. This saves a lot of repeated downloading.

See Stage Ordering for another example of this and why it's better to do package installation early on in a Dockerfile.

When using apt or apt-get in Dockerfiles I'd advise you to always use apt-get, as it can handle being run from a script. The apt command doesn't like running without a tty and will write a warning to the output saying so.

Also, with apt-get install and related commands, always include the -y parameter so that it doesn't prompt asking if you want to continue. This applies to any kind of script, not just Dockerfiles.
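
On a related note, a few packages prompt for configuration during installation regardless of -y. A minimal sketch, assuming the same debian:11-slim base, that sets DEBIAN_FRONTEND for just the install command to suppress those prompts:

FROM debian:11-slim
# DEBIAN_FRONTEND=noninteractive stops packages such as tzdata from
# prompting for configuration while they install.
RUN apt-get update &&\
    DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates tzdata &&\
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*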

5 - Cleaning up the build host

Cleaning up after builds

When docker performs a build it caches each generated layer so that subsequent builds run faster: if a layer has not changed, and all earlier layers in the Dockerfile were taken from the cache, it will reuse that layer.

This is brilliant during development, but it can end up filling your disk due to old layers occupying space.

To solve this you need to periodically clean up old unused layers.

Manual process

The easiest way is to run the following commands every so often. These remove stopped containers and any images not in use by a running container.

docker container prune -f
docker image prune --all -f
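
Before pruning, you can check what there is to reclaim, and on hosts using BuildKit the build cache can be pruned separately. A short sketch (the 4h value is just an example):

# Report disk usage for images, containers, volumes and the build cache:
docker system df

# Remove BuildKit build cache entries unused for more than 4 hours:
docker builder prune -f --filter "until=4h"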

Automatic process

The better way, more so if you run dedicated build hosts, is to set up a crontab to perform the cleanup periodically.

The following crontab is what I use:

50 * * * * docker container prune -f --filter "until=4h"
55 * * * * docker image prune --all -f --filter "until=4h"
59 * * * * docker volume prune -f

It runs every hour, removing any container or image that is unused and more than four hours old, then pruning unused volumes. This works well on both dedicated build servers and my local machine.