Understanding Docker's layer caching mechanism to write better Dockerfiles

The Issue

Docker has recently become popular in the IT community, with more and more people using it. The frequency of Docker appearing in job descriptions has also increased. My company uses Docker, my projects use Docker, projects that I work on for others also use Docker... Docker has emerged as a very convenient "packaging" solution for the automation revolution.

However, a long-standing issue with Docker has been how long it takes to build images. A Docker image can sometimes reach several gigabytes, turning Docker into a "hard drive killer". People joke about whether you have enough disk space to use Docker for CI/CD. In this article, though, I will not discuss how much disk space Docker consumes, but rather how to reduce the time it takes to build Docker images.

There are several ways to speed up the build and reduce image size, such as installing only the packages you need, using lightweight base images (alpine), and using as few layers as possible. They all come down to minimizing what Docker has to download and keeping the result as lightweight as possible.

Additionally, another way to speed up the process is by leveraging Docker's layer caching. So let's continue reading this article to find out more.

Utilizing the order of image layers to your advantage

A Docker image is formed by stacking layers on top of each other. Each layer represents an instruction in the image's Dockerfile. For example, consider a Dockerfile like this:

FROM ubuntu:18.04
LABEL org.opencontainers.image.authors="[email protected]"
COPY . /app
RUN make /app
RUN rm -r $HOME/.cache
CMD python /app/app.py

Each instruction becomes an entry in the image's history, and the entries differ in size depending on how much work they do; instructions that only set metadata, such as LABEL and CMD, add 0 B. The sizes of the layers add up to the overall size of the image. To see this for yourself, use the docker history <image> command, which lists the layers that make up an image.

Here's an example of the layers and size of a redislabs/redisearch image:

$ docker history redislabs/redisearch

The result would look something like this:

[Screenshot: output of docker history redislabs/redisearch]

Each time you run docker build, Docker goes through the instructions in order. With layer caching, it reuses every layer up to the first instruction whose inputs have changed, and rebuilds only from that point on. The unchanged layers before that point complete almost instantly, but once one layer is rebuilt, every layer after it is rebuilt as well. This way, you pay the full cost only on the first build, and subsequent builds are much faster because Docker reuses the cache.
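To make the invalidation rule concrete, here is a toy model in Python. It is not Docker's actual implementation. It assumes that each layer's cache key is a hash of the parent layer's key plus the instruction text, and the names layer_keys and cached_steps are made up for this illustration. Real Docker also looks at the contents of the files for COPY and ADD, which this sketch ignores.

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction, each chained to its parent's key."""
    keys = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(parent)
    return keys


def cached_steps(previous, current):
    """Count the leading instructions of `current` that match cached layers."""
    count = 0
    for old_key, new_key in zip(layer_keys(previous), layer_keys(current)):
        if old_key != new_key:
            break  # first miss: this step and everything after it is rebuilt
        count += 1
    return count


previous = ["FROM ubuntu:18.04", "COPY . /app", "RUN make /app", "CMD python /app/app.py"]
# Hypothetical edit: only the third instruction changes.
current = ["FROM ubuntu:18.04", "COPY . /app", "RUN make -j4 /app", "CMD python /app/app.py"]

reused = cached_steps(previous, current)
print(f"{reused} of {len(current)} steps reused from cache")
```

In this model the CMD line is unchanged, yet it is not reused, because its key depends on the changed layer before it.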

Here's an example of a Dockerfile for a Node.js application:

FROM node:18-alpine

WORKDIR /app

COPY . .  

RUN npm install

Dependencies rarely change, so npm install should rarely need to run again. In this example, however, COPY . . copies the entire project, so any change to the source code changes that layer, and every layer after it has to be rebuilt without the cache. That means spending time and network bandwidth on npm install after every code edit. This is truly a nightmare in this era of severed internet cables.

To make use of the layer caching mechanism, you can modify and rearrange the order of the layers as follows:

FROM node:18-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY . .  

As you can see, npm ci runs again only when package.json or package-lock.json changes. Otherwise, the COPY package*.json and RUN npm ci steps are served from the cache almost instantly, and only the final COPY . . has to run again.
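The effect of the reordering can be checked with a small Python simulation. This is a sketch under simplifying assumptions, not how Docker is implemented: a COPY step's cache key is derived from the contents of the copied files, a RUN step's key from its command string, and every key also depends on the previous step. The file contents, the simulate_build helper and the CACHED/REBUILT labels are invented for the illustration.

```python
import hashlib


def digest(*parts):
    """Hash a sequence of strings into one hex key."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def simulate_build(steps, files, cache):
    """Walk the steps in order and report CACHED or REBUILT for each one."""
    statuses = []
    parent = "base-image"
    for kind, arg in steps:
        if kind == "COPY":
            # Assumption: the key covers the contents of every copied file.
            contents = [f"{name}={files[name]}" for name in sorted(arg)]
            key = digest(parent, kind, *contents)
        else:
            # Assumption: a RUN key covers only the command string.
            key = digest(parent, kind, arg)
        statuses.append("CACHED" if key in cache else "REBUILT")
        cache.add(key)
        parent = key
    return statuses


# Made-up project: the second version changes application code only.
v1 = {"package.json": "deps-1", "package-lock.json": "lock-1", "index.js": "v1"}
v2 = {**v1, "index.js": "v2"}

manifests = ["package.json", "package-lock.json"]
original = [("COPY", list(v1)), ("RUN", "npm install")]
reordered = [("COPY", manifests), ("RUN", "npm ci"), ("COPY", list(v1))]

results = {}
for name, steps in [("original", original), ("reordered", reordered)]:
    cache = set()
    simulate_build(steps, v1, cache)  # first build fills the cache
    results[name] = simulate_build(steps, v2, cache)  # build after the code edit
    print(name, results[name])
```

After a change to index.js alone, the original order reports REBUILT for both steps, while the reordered version reports CACHED for the first two steps and REBUILT only for the final COPY.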

Conclusion

There are several ways to speed up the build process and reduce the size of Docker images. One of them is to take advantage of Docker's layer caching by ordering the instructions so that the ones that change least often come first, which lets more of each build come straight from the cache.
