Red Hat Container Tools
In this chapter we're going to cover a plethora of container tools available in Red Hat Enterprise Linux (RHEL), including Podman, Buildah, Skopeo, CRIU, and Udica. Before we get into the specific tools, it's important to understand how these tools are provided to the end user in the Red Hat ecosystem.
The RHEL kernel, systemd, and the container tools, centered around Podman and CRI-O, serve as the foundation for both RHEL Server and RHEL CoreOS. RHEL Server is a flexible, general-purpose operating system which can be customized for many different use cases. RHEL CoreOS, on the other hand, is a minimal, purpose-built operating system intended to be consumed within automated environments like OpenShift. This lab will specifically cover the tools available in RHEL Server, but much of what you learn applies to RHEL CoreOS, which is built from the same bits but packaged specifically for OpenShift and Edge use cases.
Here's a quick overview of how to think about RHEL Server versus RHEL CoreOS:
- General Purpose: User -> Podman -> RHEL Server
- OpenShift: User -> Kubernetes API -> Kubelet -> CRI-O -> RHEL CoreOS
In a RHEL Server environment, the end user will create containers directly on the container host with Podman. In an OpenShift environment, the end user will create containers through the Kubernetes API - users generally do not interact directly with CRI-O on individual hosts in the cluster. Stated another way, Podman is the primary container interface in RHEL, while Kubernetes is the primary interface in OpenShift.
For the rest of this lab, we will focus on the container tools provided in RHEL Server. The launch of RHEL 8 introduced the concept of Application Streams, which provide users with access to the latest versions of software like Python, Ruby, and Podman. These Application Streams have different, and often shorter, life cycles than RHEL itself (10+ years). Specifically, RHEL 8 Server provides users with two types of Application Streams for container tools:
- Fast: Rolling stream which is updated with new versions of Podman and other tools up to every 12 weeks, and only supported until the next version is released. This stream is for users looking for the latest features in Podman.
- Stable: Traditional streams released once per year and supported for 24 months. Once released, these streams do not update to new versions of Podman and other tools; they only receive security fixes. This stream is for users who want to put Podman into production and depend on stability.
With either stream, the underlying RHEL kernel, systemd, and other packages are treated as a rolling stream. The only choice is whether to use the fast stream or one of the stable streams. Since RHEL provides a very stable ABI/API policy, the vast majority of container users will not notice, and should not be concerned with, kernel, systemd, glibc, etc. updates on the container host. If the user selects one of the stable streams, the API to Podman remains stable and is updated for security.
For a deeper dive, check out RHEL 8 enables containers with the tools of software craftsmanship. Now, let's move on to installing and using these different streams of software.
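On RHEL 8, these streams are delivered as modules, so choosing one is a dnf operation. Here's a minimal sketch of listing and enabling a stream; the exact stream names available on your system (such as rhel8 for the fast stream, or numbered stable streams) may differ, and switching to a different stream later requires a dnf module reset first:
sudo dnf module list container-tools
sudo dnf module enable -y container-tools:rhel8
sudo dnf module install -y container-tools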
The goal of this lab is to introduce you to Podman and some of the features that make it interesting. If you have ever used Docker, the basics should be pretty familiar. Let's start with some simple commands.
Pull an image:
podman pull ubi8
List locally cached images:
podman images
Start a container and run bash interactively in the local terminal. When ready, exit:
podman run -it ubi8 bash
exit
List running containers:
podman ps -a
Now, let's move on to some features that differentiate Podman from Docker. Specifically, let's cover the two most popular reasons: Podman runs without a daemon (daemonless) and without root (rootless). Podman is an interactive command more like bash, and like bash it can be run as a regular user (aka rootless).
Now, fire up a simple container in the background:
podman run -id ubi8 bash
Now, let's analyze a couple of interesting things that make Podman different from Docker. It doesn't use a client-server model, which is useful when wiring it into CI/CD systems and other schedulers like Yarn.
Inspect the process tree on the system:
pstree -Slnc
You should see something similar to:
└─conmon─┬─{conmon}
└─bash(ipc,mnt,net,pid,uts)
There's no Podman process, which might be confusing. Let's explain this a bit. What many people don't know is that containers disconnect from Podman after they are started. Podman keeps track of metadata in ~/.local/share/containers (/var/lib/containers is only used for containers started by root), which records which containers are created, running, and stopped (killed). This metadata is what enables the podman ps command to work.
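If you're curious, you can peek at that rootless storage directly. This is just exploratory; the exact subdirectories you see depend on the storage driver in use:
ls -alh ~/.local/share/containers/storage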
In the case of Podman, containers disconnect from their parent process so that they don't die when Podman exits. In the case of Docker and CRI-O, which are daemons, containers disconnect from the parent process so that they don't die when the daemon is restarted. For Podman and CRI-O, there is a utility which runs before runc called conmon (Container Monitor). The conmon utility disconnects the container from the engine by forking twice (called a double fork). That means the execution chain looks something like this with Podman:
bash -> podman -> conmon -> conmon -> runc -> bash
Or like this with CRI-O:
systemd -> crio -> conmon -> conmon -> runc -> bash
Or like this with Docker engine:
systemd -> dockerd -> containerd -> docker-shim -> runc -> bash
Conmon is a very small C program that monitors the standard input, standard output, and standard error of the containerized process. The conmon utility and docker-shim both serve the same purpose. When the first conmon finishes calling the second, it exits. This disconnects the second conmon and all of its child processes from the container engine. The second conmon is then reparented to the init system (systemd). This daemonless and simplified model which Podman uses can be quite useful when wiring it into other, larger systems like CI/CD and scripts.
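You can verify this reparenting yourself while the container from above is still running. The newest conmon process should have systemd (your user session's systemd instance, or PID 1) as its parent, not a podman process:
CONMON_PID=$(pgrep -n conmon)
ps -o pid,ppid,comm -p $CONMON_PID
ps -o comm= -p $(ps -o ppid= -p $CONMON_PID)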
Podman doesn't require a daemon and it doesn't require root. These two features really set Podman apart from Docker. Even when you use the Docker CLI as a regular user, it connects to a daemon running as root, so the user always has the ability to escalate a process to root and do whatever they want on the system. Worse, this bypasses sudo rules, so it's not easy to track down who did it.
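To make this concrete, here's the classic illustration of why docker group membership is effectively root. This is a hypothetical example for a host running a Docker daemon (don't run it here); it bind mounts the host's root filesystem into the container and chroots into it, yielding a full root shell on the host without sudo ever being involved:
docker run -it --rm -v /:/host busybox chroot /host /bin/bash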
Now, let's move on to some other really interesting features. Rootless containers use a kernel feature called User Namespaces. This maps one or more user IDs in the container to one or more user IDs outside of the container, including the root user ID in the container as well as any others which might be used by programs like nginx or Apache.
Podman makes it super easy to see this mapping. Start an nginx container to see the user and group mapping in action:
podman run -id registry.access.redhat.com/rhscl/nginx-114-rhel7 nginx -g 'daemon off;'
Now, inspect the user and group mapping with podman top:
podman top -l args huser hgroup hpid user group pid seccomp label
Notice that the host user, group, and process ID in the container all map to different, real IDs on the host system. The container thinks that nginx is running as the user default and the group root, but really it's running as an arbitrary user and group. This user and group are selected from a range configured for the student user account on this system. This range can easily be inspected with the following command:
cat /etc/subuid
You will see something similar to this:
student:165536:65536
The first number represents the starting user ID, and the second number represents the number of user IDs which can be used, starting from the first. So, in this example, our student user can use 65,536 user IDs starting with user ID 165536. The podman top command above should show you that nginx is running in this range of UIDs.
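You can also view the active mapping directly. Running a command under podman unshare puts it in the same user namespace that rootless containers use, and the kernel exposes the mapping in /proc: the first column is the UID inside the namespace, the second is the host UID it maps to, and the third is the length of the range:
podman unshare cat /proc/self/uid_map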
The user ID mappings on your system might be different, because the shadow utilities (useradd, userdel, usermod, groupadd, etc.) automatically create these mappings when a user is added. As a side note, if you've upgraded from an older version of RHEL, you might need to add entries to /etc/subuid and /etc/subgid manually.
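If the entries are missing, something like the following should create them. This is only a sketch; the range here is an example, and you should pick one that doesn't overlap any existing entries in /etc/subuid and /etc/subgid:
sudo usermod --add-subuids 165536-231071 --add-subgids 165536-231071 student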
OK, now stop all of the running containers. No more fragile one-liners like with Docker; it's just built into Podman:
podman kill --all
Remove all of the defined containers. Note that this deletes the copy-on-write layer, the config.json (commonly referred to as the Config Bundle), and any state data (whether the container is defined, running, etc.):
podman rm --all
We can even delete all of the locally cached images with a single command:
podman rmi --all
The above commands show how easy and elegant Podman is to use. Podman is like a chef's knife: it can be used for pretty much anything you used Docker for. But now, let's move on to Buildah and show some advanced use cases when building container images.
Now let's introduce you to Buildah and the flexibility it provides when you need to build container images your way. There are a lot of different use cases that just "feel natural" when building container images, but you often can't quite wire together an elegant solution with the client-server model of existing container engines. Enter Buildah. To get started, let's introduce some basic decisions you need to think through when building a new container image.
- Image vs. Scratch: Do you want to start with an existing container image as the source for your new container image, or would you prefer to build completely from scratch? Source images are the most common route, but it can be nice to build from scratch if you have small, statically linked binaries.
- Inside vs. Outside: Do you want to execute the commands to build the next container image layer inside the container, or would you prefer to use the tools on the host to build the image? This is a completely new concept with Buildah; with existing container engines, you always build from within the container. Building outside the container image can be useful when you want to build a smaller container image, or an image that will always be run read-only and never built upon. Things like Java would normally be built in the container because they typically need a JVM running, but installing RPMs might happen from outside because you don't want the RPM database in the container.
- External vs. Internal Data: Do you have everything you need to build the image from within the image, or do you need to access cached data outside of the build process? For example, it might be convenient to mount a large RPM cache inside the container during the build, but you would never want to carry that around in the production image. The use cases for build-time mounts range from SSH keys to Java build artifacts - for more ideas, see this GitHub issue.
Alright, let's walk through some common scenarios with Buildah.
Just like Podman, Buildah can execute in rootless mode, but since you will use tools on the container host to interact with files in the container image, you need to make Buildah think it's running as root. Buildah comes with a cool sub-command called unshare which does just this: it puts your shell into a user namespace, just like when you have a root shell in a container. The difference is, this shell has access to tools installed on the container host instead of those in the container image. Before we complete the rest of this lab, execute the "buildah unshare" command. Think of this as making yourself root, without actually making yourself root:
sudo dnf install buildah
buildah unshare
Now, look at who your shell thinks you are:
whoami
It looks like you are root, but you really aren't. Let's prove it:
touch /etc/shadow
The touch command fails because you're not actually root. Really, the touch command executed as an arbitrary user ID in your /etc/subuid range. Let that sink in. Linux containers are mind-bending. OK, let's do something useful.
First declare what image you want to start with as a source. In this case, we will start with Red Hat Universal Base Image:
buildah from ubi8
This will create a "reference" to what Buildah calls a "working container" - think of it as a starting point to attach mounts and commands. Check it out here:
buildah containers
Now, we can mount the image source. In effect, this will trigger the graph driver to do its magic, pull the image layers together, add a working copy-on-write layer, and mount it so that we can access it just like any directory on the system:
buildah mount ubi8-working-container
Now, let's add a single file to the new container image layer. The buildah mount command can be run again to get access to the right directory:
echo "hello world" > $(buildah mount ubi8-working-container)/etc/hello.conf
Let's analyze what we just did. It's super simple, but kind of mind-bending if you come from using other container engines. First, list the directory in the copy-on-write layer:
ls -alh $(buildah mount ubi8-working-container)/etc/
You should see hello.conf right there. Now, cat the file:
cat $(buildah mount ubi8-working-container)/etc/hello.conf
You should see the text you expect. Now, let's commit this copy-on-write layer as a new image layer:
buildah commit ubi8-working-container ubi8-hello
Now, we can see the new image layer in our local cache. We can view it with either Podman or Buildah (or CRI-O for that matter, they all use the same image store):
buildah images
podman images
When we are done, we can clean up our environment quite nicely. The following command will delete references to "working containers" and completely remove their mounts:
buildah delete -a
But, we still have the new image layer just how we want it. This could be pushed to a registry server to be shared with others if we like:
buildah images
podman images
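For example, pushing it might look something like this, where quay.io/youruser is a placeholder for a repository you can actually write to:
buildah login quay.io
buildah push ubi8-hello docker://quay.io/youruser/ubi8-hello:latest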
Next, create a new working container from scratch, mount it, and grab the working copy-on-write layer in one shot:
WORKING_MOUNT=$(buildah mount $(buildah from scratch))
echo $WORKING_MOUNT
Verify that there is nothing in the directory:
ls -alh $WORKING_MOUNT
Now, let's install some basic tools (don't worry about the entitlement errors):
dnf install --installroot $WORKING_MOUNT bash coreutils --releasever 8 --setopt install_weak_deps=false -y
dnf clean all -y --installroot $WORKING_MOUNT --releasever 8
Verify that some files have been added:
ls -alh $WORKING_MOUNT
Now, commit the copy-on-write layer as a new container image layer:
buildah commit working-container minimal
Now, test the new image layer, by creating a container:
podman run -it minimal bash
exit
Clean things up for our next experiment:
buildah delete -a
We have just created a container image layer from scratch, without ever installing rpm or dnf into it. This same pattern can be used to solve countless problems; Makefiles, for example, often have an option to specify the output directory. It could even be used to build a C program without ever installing the C toolchain in a container image layer. This is best for production security, where we don't want build tools lying around in the container.
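As a sketch of that C workflow, assuming gcc and glibc-static are installed on the container host and that you're still inside buildah unshare, it might look something like this:
ctr=$(buildah from scratch)
mnt=$(buildah mount $ctr)
echo 'int main() { return 0; }' > hello.c
gcc -static -o $mnt/hello hello.c    # the toolchain only ever exists on the host, never in the image
buildah config --entrypoint '["/hello"]' $ctr
buildah commit $ctr hello-static
buildah delete -a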
As a final example, let's use a build-time mount to show how we can pull data in. This will represent some sort of cached data that we use from outside of the container. This could be a repository of Ansible Playbooks, or even database test data:
mkdir ~/data
dd if=/dev/zero of=~/data/test.bin bs=1MB count=100
ls -alh ~/data/test.bin
Now, let's fire up a working container:
buildah from ubi8
buildah mount ubi8-working-container
To consume the data within the container, we use the buildah run subcommand. Notice that it takes the -v option just like "run" in Podman. We also use the Z option to relabel the data for SELinux. The dd command simply represents consuming some smaller portion of the data during the build process:
buildah run -v ~/data:/data:Z ubi8-working-container dd if=/data/test.bin of=/etc/small-test.bin bs=100 count=2
Commit the new image layer and clean things up:
buildah commit ubi8-working-container ubi8-data
buildah delete -a
Test it and note that we only kept the pieces of the data that we wanted. This is just an example, but imagine using this with a Makefile cache, Ansible playbooks, or even a copy of production database data which must be accessed during the image build to test the build or do a schema upgrade. There are tons of places where you need to access data at build time but don't want it in the production deployment:
podman run -it ubi8-data ls -alh /etc/small-test.bin
Exit the user namespace:
exit
Now, you have a pretty good understanding of the cases where Buildah really shines. You can start from scratch or from an existing image, use tools installed on the container host (not in the container image), and move data around as needed. This is a very flexible tool that should fit quite nicely in your tool belt. Buildah lets you script builds with any language you want, and build tiny images with only the bare minimum of utilities needed inside the image.
Now, let's move on to sharing containers with Skopeo...
In this step, we are going to do a couple of simple exercises with Skopeo to give you a feel for what it can do. Skopeo doesn't need to interact with the local container storage (~/.local/share/containers); it can move images directly between registries, between container engine storage, or even to and from directories.
First, let's start with the use case that kicked off the Skopeo project. Sometimes, it's really convenient to inspect an image remotely before pulling it down to the local cache. This allows us to inspect the metadata of the image and see if we really want to use it, without synchronizing it to the local image cache:
sudo dnf install skopeo
skopeo inspect docker://registry.fedoraproject.org/fedora
We can easily see the "Architecture" and "Os" metadata which tells us a lot about the image. We can also see the labels, which are consumed by most container engines, and passed to the runtime to be constructed as environment variables. By comparison, here's how to see this metadata in a running container:
podman run --name metadata-container -id registry.fedoraproject.org/fedora bash
podman inspect metadata-container
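If you only care about a few fields, piping either command through jq works nicely (assuming jq is installed on the host):
skopeo inspect docker://registry.fedoraproject.org/fedora | jq '{Architecture, Os, Labels}'
podman inspect metadata-container | jq '.[0].Config.Labels'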
Like Podman, Skopeo can be used to pull images down into the local container storage:
skopeo copy docker://registry.fedoraproject.org/fedora containers-storage:fedora
But, it can also be used to pull them into a local directory:
skopeo copy docker://registry.fedoraproject.org/fedora dir:$HOME/fedora-skopeo
This has the advantage of not being mapped into our container storage. This can be convenient for security analysis:
ls -alh ~/fedora-skopeo
The Config and Image Layers are there, but remember we need to rely on a Graph Driver in a Container Engine to map them into a RootFS.
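For instance, the image's manifest is sitting right there as plain JSON; pretty-printing it is one quick way to see the list of layer digests (python3 is just one convenient tool for this):
python3 -m json.tool ~/fedora-skopeo/manifest.json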
Finally, let's copy from one registry to another. I have set up a writeable repository under my username (fatherlinux) on quay.io. To do this, you have to use the credentials provided below. Notice that we use the "--dest-creds" option to authenticate. There's also a "--src-creds" option for pulling from a registry which requires authentication. This tool is very flexible - designed by engineers, for engineers.
skopeo copy docker://registry.fedoraproject.org/fedora docker://quay.io/fatherlinux/fedora --dest-creds fatherlinux+fedora:5R4YX2LHHVB682OX232TMFSBGFT350IV70SBLDKU46LAFIY6HEGN4OYGJ2SCD4HI
This command just synchronized the fedora repository from the Fedora registry to Quay.io without ever caching it in the local container storage. Very cool, right?
You have a new tool in your tool belt for sharing and moving containers. Hopefully, you find other uses for Skopeo.
Congratulations! You just completed the second module of today's workshop. Please continue to Developing on OpenShift.