Overview

Container technology is a popular packaging method for developers and system administrators to build, ship and run distributed applications. Production use of image-based container technology requires a disciplined approach to development. This document provides guidance and recommendations for creating and managing images to control the application lifecycle.

In Application Planning we discuss how to deconstruct applications into microservices, describe common types of images, and explain how planning must take the target deployment platforms into account.

Creating Images discusses the details about how to work with Dockerfiles, best practices, tips and tricks, and tools available to developers.

The Build section discusses the importance of automation in building, testing and maintaining images. We discuss ideal workflows, how to plan a test environment, types of testing and image certification.

Finally, Delivery covers how to get images and updates to the end-users, whether that’s inside an enterprise or public infrastructure. A key consideration is access control.

Application Planning

Careful planning is important when working with container technology.

Application Classes

There are many different kinds of applications that may be candidates for porting to container technology.

System Services

System services are a special kind of application. These are drivers or system agents that extend the functionality of the host system. They are typically packaged as single-container images and are typically run using automation, such as a cloud-init start-up script or a configuration management system. System service containers require special runtime configuration to enable the appropriate level of privilege to modify the host system. They are commonly referred to as Super Privileged Containers (SPCs). They utilize the benefits of container packaging, but not of separation.

Client Tools

Client Tools applications are another special kind of application. These are used by end-users who do not wish to install a client using traditional packaging such as RPM. Container technology enables an end-user to add a client to their workstation without modifying the operating system.

There are an endless number of potential clients for this class of applications. A few examples include remote clients such as the OpenStack or Amazon Web Services (AWS) client tools. An extension of the model is a Vagrant-like development environment in which the tool set for a given project is packaged. For example, Red Hat's rhel-tools image is a toolkit for interacting with an immutable Atomic host system; it includes tools such as git, strace and sosreport.

Two important architectural decisions should be considered for client container images.

  1. How does the end-user interact with the container? Will they use it like a traditional command line interface (CLI) tool? Or will they enter the container environment as a shell, perform some commands, then exit? The entrypoint command chosen for the container will determine the default behavior.

  2. Will end-users need access to files on the host system? If so, will the default behavior be to bind mount the current working directory or a user-specified directory? The sketch below illustrates both decisions.
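
As a minimal sketch of both decisions, assume a hypothetical client image named example/awscli whose entrypoint is the aws binary; the image name and paths are illustrative only.

# CLI-style usage: the container behaves like a local binary, with the
# current working directory bind mounted so the tool can read host files.
docker run --rm -v $(pwd):/workdir -w /workdir example/awscli s3 ls

# Shell-style usage: override the entrypoint to enter the container,
# run some commands interactively, then exit.
docker run --rm -it --entrypoint /bin/bash example/awscli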

Service Components

Service components are applications that an application developer integrates with. Databases are common examples. A database is typically a component of a larger application.

The challenge of porting service components to container technology is optimizing integration. Will the application developer be able to configure the service to their needs? Will the application developer have sufficient documentation to install, configure and secure the service being integrated?

Microservice Applications

The microservice architecture is particularly well-suited to container technology. Martin Fowler describes microservice applications as "suites of independently deployable services."[1] Well-designed microservice applications have a clean separation of code, configuration and data.

Planning is especially critical for microservice applications because they are particularly challenging to port to container technology. The planning phase must include experts who understand the application's architecture and service topology, performance characteristics, configuration, and dependencies such as networking and storage. While many applications can be ported to container technology without modification, there are sometimes optimizations that should be made regarding configuration, storage, installation or service architecture.

Deconstructing Microservice Applications

The process of deconstructing applications varies widely depending on the complexity and architecture of the application. Consider the following steps as a guide to a generalized process; a minimal sketch of the resulting topology follows the list.

  1. Identify the components that will be broken down into microservices. These typically map to container images.

  2. Identify how the services will communicate. How are REST or message bus interfaces authenticated?

  3. Identify how data will be accessed by the services. Which services need read/write access to the storage?

  4. Create a service topology drawing to represent the components, lines of communication and storage. This will guide the development work, configuration discussions, debugging and potentially become part of the end-user documentation.

  5. Identify how the services will be configured, which services need to share configuration and how these services might be deployed in a highly available configuration.
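
As a minimal illustration of steps 1 through 3, consider a two-service application consisting of a database and a web front end. The container names, host storage path, and the example/app-web image are hypothetical.

# Service 1: the database, with read/write access to host storage.
docker run -d --name app-db -v /srv/app-data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD=changeme mysql

# Service 2: the web front end, which communicates with the database
# over a link (or a user-defined network on newer platforms).
docker run -d --name app-web --link app-db:db -p 8080:8080 example/app-web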

Deployment Platform Considerations

When preparing applications for production distribution and deployment, carefully consider the supported deployment platforms. Production services require high uptime, injection of private or sensitive data, storage integration and configuration control. The deployment platform determines the methods available for load balancing, scheduling and upgrading. A platform that does not provide these services requires additional work when developing the container packaging.

Creating Images

Dockerfiles

Location

Upstream Dockerfiles should be hosted in a public Git repository, for example GitHub. Ideally, the repository should be created under the organization relevant to a particular project. For example, the Software Collections Dockerfiles are available under the GitHub sclorg organization.

Images

Upstream Docker images, such as CentOS and Fedora base images and layered images based on these, should be publicly available on Docker Hub.

For details on using the Docker Hub registry, see the Docker User Guide.

Content

Docker is a platform that enables applications to be quickly assembled from components. When creating Docker images, think about the added value you can provide to potential users. The intention should always be to bring some added functionality on top of plain package installation.

As an example, take this WordPress Dockerfile. After running the image and linking it with a database image such as mysql, you will get a fully operational WordPress instance. In addition, you can also specify an external database.

This is exactly the purpose of using Docker images: instead of laboriously installing and configuring separate components, you simply pull an image from a registry and acquire a set of tools ready to be used right out of the box.
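
A minimal sketch of that workflow, using the mysql and wordpress images from Docker Hub; the container names and password are illustrative.

# Start the database component.
docker run -d --name some-mysql -e MYSQL_ROOT_PASSWORD=changeme mysql

# Start WordPress linked against it; the fully configured site is then
# reachable on port 8080 of the host.
docker run -d --name some-wordpress --link some-mysql:mysql -p 8080:80 wordpress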

Enabling Necessary Repositories

TBD

Clearing the yum Caches

To keep images as small as possible, it is beneficial to clear the temporary yum cache files after installing or updating packages. To remove unnecessary cache files, use the yum clean command. The command is described in detail in the yum(8) man page.

The cache needs to be cleared in every layer that installs or updates software. That means that if you perform multiple install operations within one layer, you need to run yum clean at the end of that layer. For example:

RUN yum install -y epel-release && \
    rpmkeys --import file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 && \
    yum install -y --setopt=tsflags=nodocs bind-utils gettext iproute \
    v8314 mongodb24-mongodb mongodb24 && \
    yum clean all

On the other hand, if you perform install or update operations across multiple layers, yum clean must be run in each of those layers. For example:

RUN yum install -y epel-release && \
    rpmkeys --import file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 && \
    yum clean all

RUN yum install -y --setopt=tsflags=nodocs bind-utils gettext iproute \
    v8314 mongodb24-mongodb mongodb24 && \
    yum clean all

Installing Documentation

Because you want to keep the image as small as possible, you can avoid installing documentation files when installing or updating software by specifying the nodocs flag. For example:

RUN yum install -y mysql --setopt=tsflags=nodocs

Updating Software Supplied by the Base Image

Avoid updating software supplied by the base image unless necessary. Base images are meant to be updated on a regular basis by the supplier and provide software that has been tested for a particular environment.

Also, updating base-image software in layered images can introduce unexpected problems, bring in unwanted dependencies, and in certain cases significantly expand the image size.

In other words, avoid using instructions similar to this one:

RUN yum -y update
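
If a specific base-image package does need an update, for example to pick up a security fix before a rebuilt base image is available, update only that package and clean the cache in the same layer. A minimal sketch; the package name is illustrative:

RUN yum -y update openssl && \
    yum clean all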

Users

TBD

Working Directory

TBD

Exposing Ports

The EXPOSE instruction declares the ports on which a container will listen for incoming connections. You should specify the ports your application commonly uses, as seen in this mysql example:

EXPOSE 3306
Important
The TCP/IP port numbers below 1024 are special in that normal users are not allowed to bind to them.

Therefore, for the Apache server, for example, ports 8080 or 8443 (HTTP or HTTPS) should be exposed. Otherwise, only the root user will be allowed to run the Apache server inside a container.
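
A minimal sketch of the corresponding Dockerfile lines, assuming an httpd configuration that has been changed to listen on these unprivileged ports:

# Unprivileged ports, so a non-root user can run the server.
EXPOSE 8080
EXPOSE 8443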

Logging

TBD

…

Dockerfile Instructions

This chapter provides a list of Dockerfile instructions with a short explanation and preferred usage.

General Usage

MAINTAINER

Use the MAINTAINER instruction to set the Author field of the generated images. As most projects are maintained by more than one person, it is preferable to use a universal contact, such as a mailing list address, a bug tracking mechanism URL or the URL of the project, rather than a real person's name. A generic contact ensures consistency, allows problems to be addressed in an appropriate manner, and does not discourage potential contributors from collaborating by being too specific.
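
For example (the project name and address are illustrative):

MAINTAINER My Project Developers <my-project-devel@lists.example.com>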

LABEL

Supported in Docker 1.6 and later, LABEL is meant to store metadata about images and containers in key-value pairs. LABELs should provide additional information about images and containers, aid indexing and searching, and should be used to annotate Docker images and containers. LABEL can also be used to provide useful information to projects that catalog or utilize Docker images, such as Satellite, OpenShift or Atomic.

Note
Do not confuse LABEL with ENV. Some projects, for example OpenShift, might use ENV to provide metadata temporarily, until LABEL is fully supported across operating systems.

The following snippet shows usage of LABEL:

LABEL MYSQL_VERSION           5.5
LABEL IMAGE_DESCRIPTION       MySQL 5.5
LABEL IMAGE_TAGS              mysql,mysql55
LABEL IMAGE_EXPOSE_SERVICES   3306:mysql
Mandatory LABELs

The following LABELs should always be part of your Dockerfile. TBD

ENTRYPOINT vs CMD

ENTRYPOINT defines the default binary with which the Docker container will start. In other words, it makes the container behave like a binary. The default ENTRYPOINT for Docker is /bin/sh -c. Consider the following example:

docker run -i -t fedora /bin/bash

Here /bin/bash is passed as an argument to the ENTRYPOINT, which is /bin/sh -c. Docker also provides a way to override the entrypoint with the --entrypoint flag.

docker run --entrypoint /bin/cat -i -t fedora /etc/redhat-release

In the example above, the default ENTRYPOINT is overridden by the flag and /etc/redhat-release is passed as a parameter to /bin/cat.

CMD is used to execute the supplied command as a parameter to the ENTRYPOINT. It is advisable to use CMD unless you are absolutely sure about changing the ENTRYPOINT, since all execution will run as a parameter to the ENTRYPOINT. Using ENTRYPOINT can easily confuse users who are not familiar with the image, and can make debugging, or even obtaining a shell, difficult, since everything becomes a parameter to the ENTRYPOINT.

CMD Example:

CMD ["python","myscript.py"]

ENTRYPOINT Example:

ENTRYPOINT ["/usr/bin/python"]
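
The two instructions can also be combined, with CMD supplying default arguments that the user may override at run time. A minimal sketch; myscript.py stands in for a hypothetical application script:

# The container always runs Python; CMD provides a default argument
# that "docker run <image> otherscript.py" would replace.
ENTRYPOINT ["/usr/bin/python"]
CMD ["myscript.py"]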

systemd

tbd

non-systemd

tbd

Layering

This chapter provides guidelines on creating layers.

Minimizing the Number of Layers

In general, having fewer layers improves readability. Commands that are chained together become part of the same layer. To reduce the number of layers, chain commands together. Find a balance, though, between a large number of layers (and a great many commands) and a small number of layers (and obscurity caused by brevity).

A new layer is created for every new instruction defined. This does not necessarily mean that one instruction should be associated with only one command or definition.

Ensure transparency and provide a good overview of the content of each layer by grouping related operations together so that together they constitute a single layer. Consider this snippet from the OpenShift Python 3.3 Dockerfile:

RUN yum install -y \
    https://www.softwarecollections.org/en/scls/rhscl/python33/epel-7-x86_64/download/rhscl-python33-epel-7-x86_64.noarch.rpm && \
    yum install -y --setopt=tsflags=nodocs --enablerepo=centosplus \
    python33 python33-python-devel python33-python-setuptools \
    epel-release && \
    yum install -y --setopt=tsflags=nodocs nss_wrapper && \
    yum clean all && \
    scl enable python33 "easy_install pip" && \
    chown -R default:default /opt/openshift && \
    chmod -R og+rwx /opt/openshift

Each command related to the installation and configuration of sti-python is grouped together as part of the same layer. This meaningful grouping of operations keeps the number of layers low while keeping the legibility of each layer high.

Squashing Layers

tbd

References

Please see the following resources for more information on the Docker container technology and project-specific guidelines.

Docker Documentation — Detailed information about the Docker platform.

OpenShift Guidelines — Guidelines for creating images specific to the OpenShift project.

Building Applications

Building a single Docker image once is a simple matter.

sudo docker build -t <registry_URL>/some/image .

This builds an image which can then be pushed to a registry. Done. However, this immutable image will need to be updated. And the image depends on other images which will themselves be updated, which means this image will need to be rebuilt. If this image is part of a microservice application, it is just one of several images that work together as integrated services comprising the application. Do you really want a developer building production services from their laptop?
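
For reference, a minimal sketch of the manual workflow being described; the registry URL and image name are placeholders:

# Build the image and tag it for the target registry...
sudo docker build -t registry.example.com:5000/some/image .

# ...then push it so it can be pulled elsewhere.
sudo docker push registry.example.com:5000/some/image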

Serious work with container technology should automate builds. While there are some unique challenges specific to container automation, generally following continuous integration and delivery best practices is recommended.

Build Environment

A build environment should have the following characteristics:

  • limits direct access to the build environment

  • limits access to configure and trigger builds

  • limits access to build sources

  • limits access to base images, those images referenced in the FROM line of a Dockerfile

  • provides access to build logs

  • provides some type of a pipeline or workflow, integrating with external services to trigger builds, report results, etc.

  • provides a way to test built images

  • provides a way to reproduce builds

  • provides a secure registry to store builds

  • provides a mechanism to promote tested builds

  • shares the same kernel as the target production runtime environment

A build environment that meets these requirements is difficult to create from scratch. An automation engine like Jenkins is essential to managing a complex pipeline. While a virtual machine-based solution could be created, it is recommended that a dedicated, purpose-built platform such as OpenShift be used.
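
To make the reproducibility, testing and promotion requirements concrete, here is a minimal sketch of the kind of scripted build step an automation engine such as Jenkins would run. The registry URL, image names and test script path are all placeholders:

#!/bin/bash
set -e
# Pull the exact base image referenced in the FROM line so the build is reproducible.
docker pull registry.example.com:5000/base/image:1.0

# Build a candidate image from a known source revision.
docker build -t registry.example.com:5000/cool/app:candidate .

# Run the image's test suite; a failure aborts the pipeline.
docker run --rm registry.example.com:5000/cool/app:candidate /usr/share/app/run-tests.sh

# Promote the tested build by tagging and pushing it.
docker tag registry.example.com:5000/cool/app:candidate registry.example.com:5000/cool/app:latest
docker push registry.example.com:5000/cool/app:latest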

Triggering Builds

Build triggers are events that start a build. They are critical integration points, so they must be flexible enough to handle different system and event types. Rudimentary build systems focus on changes to the Dockerfile source repository. However, after initial development is complete, a typical Dockerfile repository changes infrequently.

So what other events should trigger builds? Use the following list as a guide to evaluating build triggers for a given project.

  • a change in the base image, or FROM image in a Dockerfile

  • a security vulnerability identified by a scan using a Common Vulnerabilities and Exposures (CVE)-backed system such as Red Hat's errata system

  • a source code change

  • a software package update

  • an event from an external system (REST API call)

  • an event from an external system (message bus)

To capture these triggers, a system like Jenkins is recommended. It provides the plugin architecture to handle the job chaining and external events required to build a pipeline workflow.

In the following section we evaluate each of the trigger events in more depth.

Base Image Change

A base image change may occur in two different ways.

  1. If the FROM line in the Dockerfile references the :latest image, explicitly or implicitly, the base image may be rebuilt without a tag change. In this scenario the triggering event is the image ID change in the registry.

  2. If the FROM line in the Dockerfile references a specific tagged release of an image the Dockerfile must be changed to reference the new release image. In this scenario the triggering event is a change to the Dockerfile source repository.

To accommodate the first scenario, OpenShift implemented a concept called Image Streams, which can watch for a change of the image ID associated with a particular tag.

To accommodate the second scenario, a standard source control change trigger is sufficient.

An alternative approach is to dynamically manage the Dockerfile so that it points to a specific image ID; the Atomic Reactor project has a plugin to accomplish this, ensuring the build is controlled and reproducible. Docker images may be referenced by their hash value in the form REGISTRY/REPO/IMAGE@HASH_VALUE, for example, registry.example.com:5000/cool/app@sha256:4d3a646b58685449179a0c61ad4baa19a8df8ba668e0f0704b9ad16f5e16e642
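
In a Dockerfile, this takes the form of a digest-pinned FROM line; the reference below reuses the example above:

# Pin the base image to an exact, immutable image ID.
FROM registry.example.com:5000/cool/app@sha256:4d3a646b58685449179a0c61ad4baa19a8df8ba668e0f0704b9ad16f5e16e642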

Vulnerability Scan

Container images are treated as immutable: updating a package or source code should be done by rebuilding the image. Image scanning should be performed on a timed basis against a CVE database. If a vulnerability is found, a rebuild should be triggered.

Scanning may be performed both on images in a registry and on running containers. Tools to perform such scanning are emerging.
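
As a sketch of how a timed scan might feed the build pipeline, the following cron entry runs a hypothetical scan-image.sh script nightly (assumed to exit non-zero when a vulnerability is found) and, on failure, triggers a rebuild through Jenkins' remote build trigger URL. The script, job name and token are placeholders:

# Nightly CVE scan; trigger a rebuild when a vulnerability is found.
0 2 * * * scan-image.sh registry.example.com:5000/cool/app:latest || curl -X POST "http://jenkins.example.com:8080/job/cool-app/build?token=TOKEN"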

Source Code Repository Change

There is frequently more than one source repository involved in


1. Martin Fowler, http://martinfowler.com/articles/microservices.html