Smaller Docker Images
If your executables are 5MB in size, why would you push an image of more than 900MB into production? Our Docker images are too big, way too big. Here’s a guide to reducing the size of your Docker images!
Docker images can get big - very big:
microsoft/dotnet 2.1-runtime 180MB
mongo latest 394MB
microsoft/mssql-server-linux latest 1.35GB
confluentinc/cp-enterprise-kafka 5.1.0 619MB
node latest 900MB
Many times, developers choose Ubuntu as the base image for their containers. This is understandable, since it makes sense to base your work on an OS and libraries that you’re already familiar with. But the size of Docker images can be a huge problem at runtime when these images are started in orchestration systems like Kubernetes. These systems start multiple instances of a container to ensure availability when nodes fail. That means the images have to be pulled onto many nodes simultaneously, which puts an immense strain on the container registry, your network, storage and deployment times.
And large containers are not only a problem on production systems but also on developer machines. Fetching multiple base images of many gigabytes in size can use up precious, high-priced SSD space. There are a lot of reasons to reduce your image size.
Our industry is quickly adopting the pattern of building small self-contained units of functionality (the purest form of which we call serverless). When employing these patterns, there is no reason to accept that our binary is a few megabytes in size while our containers come in at hundreds of megabytes, if not gigabytes! Why would we package a complete OS like Ubuntu or a complete toolchain like Node.js into a container, just to use a tiny subset of what is in there? An example:
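The listing is not reproduced here, so the following is only a sketch of what a typical tutorial-style Dockerfile for a Node application tends to look like. The entry point index.js and the /app directory are assumed names for illustration, not taken from a specific project:

```dockerfile
# Hypothetical tutorial-style Dockerfile; index.js and /app are assumed names.
# The full node base image ships the complete toolchain and OS userland.
FROM node

WORKDIR /app

# Copy the whole project into the image, sources and all.
COPY . .

# Install dependencies inside the image.
RUN npm install

# Start the application with the full Node.js runtime.
CMD ["node", "index.js"]
```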
This is how most tutorials teach that node applications should be shipped. It’s a really clumsy approach chosen mostly by coders without insight into the technology they are using. As the output of docker images | grep node shows, the node base image (FYI: its usage is strongly discouraged by its makers) is a whopping 900MB in size! This is what should be rolled into production? We have not even touched on the performance implications of compiling your node code every time the image is started!
The good news is: there is generally no need to do this! Most languages (or more precisely their compilers) offer the possibility to compile self-contained binaries that need nothing but a kernel to run. There are three main approaches to make use of this possibility:
- Use smaller base images.
- Use the builder pattern in docker.
- Zero-Waste Images - ship only what’s necessary!
Now the first approach can be a quick win because the node base image might be 900MB in size, but the node:alpine image weighs in at only 70MB and is functionally almost identical:
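As a sketch, assuming the same hypothetical application layout as before (index.js and /app are illustrative names), switching to the smaller base image is a one-line change in the FROM instruction:

```dockerfile
# Same hypothetical app as before; only the base image changes.
# node:alpine is built on Alpine Linux instead of a full distribution.
FROM node:alpine

WORKDIR /app
COPY . .
RUN npm install

CMD ["node", "index.js"]
```

One caveat behind the "almost": Alpine uses musl instead of glibc, so dependencies with native add-ons may need extra build packages installed in the image.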
This is already a massive improvement, a good quick win! But our images could still be much smaller by using the builder pattern.
The Builder Pattern
Docker has a feature that enables multi-stage builds of containers. One or more base containers serve as ephemeral containers that exist only to build the resources needed at runtime. These resources can then be packaged into another container, and the base containers are dropped once the build of the next stage has succeeded.
For this example we’ll switch from Node.js to some Go code for a change (don’t worry, it works for Node too):
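A minimal sketch of such a multi-stage Dockerfile could look like the following. The source path /go/src/app and the binary path /bin/app are assumed names, and installing git in the builder is an assumption needed so that go get can fetch remote dependencies on Alpine:

```dockerfile
# Stage 1: ephemeral builder container with the full Go toolchain.
FROM golang:1.11.5-alpine3.7 as builder

# Assumption: git is needed by "go get" and is not part of the Alpine Go image.
RUN apk add --no-cache git

# Hypothetical project location inside the builder's GOPATH.
WORKDIR /go/src/app
COPY . .

# Fetch missing dependencies, then build a statically linked binary.
RUN go get -d -v ./...
RUN CGO_ENABLED=0 GOOS=linux go build -o /bin/app .

# Stage 2: the image that actually ships.
FROM alpine

# Root certificates so the binary can make TLS connections.
RUN apk add --no-cache ca-certificates

# Copy only the compiled binary out of the builder stage.
COPY --from=builder /bin/app /bin/app

ENTRYPOINT ["/bin/app"]
```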
In this example there are two containers. The first is the builder container (indicated conveniently by as builder). This container is based on the golang:1.11.5-alpine3.7 image, which is itself based on alpine and a little more than 100MB in size. This container contains the whole golang toolchain: compilers, dependency management tools etc. Its sole purpose is to fetch missing dependencies and build a self-contained runnable binary for execution. The contents of this container will not even make it into our container registry and live only while we build our code.
The second container is what will actually be running in production and is based directly on alpine. It contains only the runnable binary, nothing else (OK - some certificates, but what would you do without them…)!
Now this reduces the size of our containers to around 10MB, which is pretty decent considering that we have a fully managed language running on nothing but our Docker host’s kernel!
This approach is very similar to the third and last one:
Zero-Waste Images
This is really just the conclusion of the second solution, except we don’t use builder containers. Whenever we’re working with technologies like Docker, we have to focus on the things these technologies do well. The thing that Docker does really well is its container image format. It does a decent job at running containers too, but for this it employs other, much older and established technologies that other container runtimes like rkt use as well.
Yet the image format is broadly used and quite intuitive. That is what Docker should be used for, not as a build system, which is what applying the builder pattern actually amounts to: a makeshift CI build system. (And not a particularly ‘good’ one…)
Most decent source control systems like GitLab, GitHub, Azure DevOps etc. have already established repeatable and reliable mechanisms for building code. Most developers run CI/CD builds that produce exactly the binaries running in production. Why not put those into our Docker images?
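One way this could look, as a sketch under assumptions: the CI pipeline has already produced a statically linked Linux binary named app in the build context, and the empty scratch base is used so that nothing but that binary ends up in the image. The choice of scratch is an illustration of "ship only what’s necessary", and swapping in alpine is a one-line change:

```dockerfile
# Assumption: "app" is a statically linked binary already built by CI
# and present in the build context. Nothing is compiled here.
FROM scratch

# Ship only the prebuilt binary.
COPY app /app

# scratch has no shell, so the exec form is required.
ENTRYPOINT ["/app"]
```

If the binary makes TLS connections, a certificate bundle would have to be copied in as well, since scratch contains no files at all.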
This solution improves on the builder pattern mostly in build time and only works if the app binary has already been prebuilt! The delay of using the builder pattern may or may not be critical to you, but building outside of Docker gives you more headroom for things like running tests, code analysis and release processes. These things are hard to get using the builder pattern but are actually good practices for any professional software engineer.
So now that you know how:
Do yourself a favor and put your images on a diet!