Why You Should Use Multi-Stage Docker Builds in Production

This is a cross-posting from https://itnext.io/using-multi-stage-docker-builds-for-speed-and-security-9d3a1cd9cd8c

It’s not too often that speed and security combine forces, but when they do, it’s a surefire way to know that a pattern is worth adopting. It’s like a car part that looks good and makes your car faster — how can you go wrong? In this post (with copious amounts of examples), I’m going to show you what multi-stage Docker builds are and how they are faster and more secure!

For the source code in this article, please refer to this GitHub repository.

How Dockerfiles Work

Docker containers are usually built with a Dockerfile, a set of instructions that help you package your source code, install dependencies, and build your application (if it compiles a binary). However, a lot of the time, the things you need to build your application aren’t the things you need to run your application. Let’s consider a standard Node Dockerfile from the Node.js website.

// Node Sample Dockerfile - Single Stage
FROM node:12
ADD . /app
WORKDIR /app
RUN npm install
EXPOSE 8080
CMD [ "node", "server.js" ]

You’ll notice that before we start working with directories and copying files into the image, we start with FROM node:12. You see, Dockerfiles are like a giant onion and the first FROM is the core of your onion. It gives you the binaries and Linux file structure that you need to keep adding more layers, which will eventually be your final application. However, what’s inside the core? Let’s run a bash shell inside of the node:12 image to find out!

A Look Inside Our First Layer

$ docker run -it node:12 /bin/bash
# uname -a
Linux c415d0a3fb27 4.19.76-linuxkit #1 SMP Tue May 26 11:42:35 UTC

// Ok so we're running Linux

# ls
bin  boot  dev etc  home  lib lib64  media  mnt  opt proc  root  run  sbin  srv  sys  tmp  usr  var
// Standard file system in Linux

# ls /bin
ps  su rm kill ping sh sed stty chmod chown chgrp bash date pwd  ls which mv

# ls /usr/local/bin
docker-entrypoint.sh  node  nodejs  npm  npx  yarn  yarnpkg

As you can see, node:12 is a great place to start building our application. It’s got everything we need to navigate the file system, change ownership of directories, and of course it contains npm so we can install our node_modules dependencies. But why do we need rm, kill, mv, or ping? More to the point, our application only calls node to run server.js, so why do we still need our package manager npm, which can install anything under the sun?

The answer is simple: having all of these tools and the feel of a full Linux operating system is great for getting started, but shipping all of those binaries is insecure, and lugging such a large image around is slow. From here we can do one of two things: (1) rip out everything we don’t need, or (2) copy only the things we do need into a second, fresh stage. Enter… multi-stage Dockerfiles!

Multi-Stage Dockerfiles

In the last example we saw how convenient it was to have a huge set of tools when we were building and installing our application and its dependencies, but we knew we didn’t want all of that bloat when the container gets delivered to its final destination (most likely, Kubernetes). Below is a multi-stage Dockerfile:

// Node Sample Dockerfile - Multi-stage

FROM node:12 AS stage1
ADD . /app
WORKDIR /app
RUN npm install

# Second Stage

FROM gcr.io/distroless/nodejs
COPY --from=stage1 /app /app
WORKDIR /app
EXPOSE 8080
CMD ["server.js"]

In the first stage (stage1) we are using the node:12 image to start. This gives us a great base to build our application. However, after we have copied our code to /app and run npm install, we move on to a second stage (the second FROM) and pull the Node Distroless image (gcr.io/distroless/nodejs). Distroless is an amazing minimal Docker image; here’s a great description from their GitHub:

"Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would expect to find in a standard Linux distribution.

After we pull the Distroless image, we COPY the contents of /app from our first stage, which should be our node_modules as well as our source code, server.js. The important thing to note here is that the new image doesn’t contain bash or any other tools with which you might want to exec. Whilst inconvenient for debugging, the attack surface has been dramatically reduced, which is perfect for a production deployment of your container. From the outside, the application is doing the same thing as before (in this example, serving up a webserver on port 8080).
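
You can verify the missing shell for yourself. A quick check like the one below should fail, since the image ships no /bin/sh (note that --entrypoint is needed because the Distroless Node image’s entrypoint is the node binary itself):

// Attempting to get a shell in the Distroless image: this should fail,
// because there is no /bin/sh (or bash) inside the image
$ docker run -it --entrypoint /bin/sh gcr.io/distroless/nodejs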

Debugging the Container

To avoid the inconvenience of not being able to debug your container, Distroless provides an alternative :debug tag of their images which contains shell access (via a BusyBox shell). If your application is running into trouble, you should keep a debug version of your Dockerfile that you can deploy in case you need to kubectl exec or docker exec into your container. Here’s an example of that below for reference (also in the GitHub repository):

// Node Sample Dockerfile - Multi-stage with Shell for Debugging

FROM node:12 AS stage1
ADD . /app
WORKDIR /app
RUN npm install

# Second Stage

FROM gcr.io/distroless/nodejs:debug
COPY --from=stage1 /app /app
WORKDIR /app
EXPOSE 8080
CMD ["server.js"]

What About Size?

Alright, up until this point we’ve just gone over the technical difference between the two Dockerfiles through a security lens. But as promised in the title, what about the size? Since the multi-stage build cuts out all of the unnecessary clutter by copying only the things we need, let’s inspect the size of the respective images to verify that claim:
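
If you’d like to verify this yourself, a comparison along these lines should do it (the tags and the Dockerfile.multistage filename are names I’ve chosen for illustration):

// Building both images and comparing their sizes (names are illustrative)
$ docker build -t node-single -f Dockerfile .
$ docker build -t node-multi -f Dockerfile.multistage .
$ docker images | grep node-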

Our multi-stage image is 92% smaller!

Wow! Our new image is 74.3MB compared to 921MB. Having a smaller image size speeds up all of our build and deployment steps, and if you’re in Kubernetes, smaller images are one of the quickest ways to speed up performance all around.

Summary

As you’ve seen, a multi-stage Dockerfile separates your build into two parts: (1) the setup of your application code (such as dependencies) and (2) the setup of your application runtime. Dockerfiles are extremely easy to use out of the box, but when we are optimizing for security and speed, multi-stage builds are what you need to deploy production-ready applications. We also used the Distroless images to make sure our final image contained only what we need to run our application, and nothing more! Please reach out to me at bryant.hagadorn@gmail.com if you wish to discuss further and follow me on Medium if you like reading about these types of things! Thank you.

References

https://itnext.io/using-multi-stage-docker-builds-for-speed-and-security-9d3a1cd9cd8c

https://github.com/docker-slim/examples/tree/master/3rdparty/node12_express_official