
Containers 101

Updated: at 07:02 PM(13 min read)


Introduction


I’m sure all of you are well aware of Docker and how it allows us to design and implement applications based on microservices rather than as a giant monolithic monster. Docker as a company dates from 2013, but the technology behind it is a native Linux technology that was implemented a few years earlier, around 2008. Linux Containers, or LXC, is an OS-level virtualization method that allows users to run multiple isolated Linux systems on a control host using a single Linux kernel. Before jumping into the quick tutorial, let’s cover a little theoretical background so you have some context on what containers are and which technologies they use.

Quoting the official LXC documentation:

“LXC is a userspace interface for the Linux kernel containment features. Through a powerful API and simple tools, it lets Linux users easily create and manage system or application containers.”

The Linux kernel, which is shared between the host machine and the LXC containers, provides a functionality known as control groups, or cgroups, which allows resources to be limited and prioritized without the need to start a virtual machine. (To be clear about the difference between LXC containers and VMs: containers are significantly lighter than VMs, mainly because a container shares the kernel of the host machine, while each VM needs its own separate kernel.) The Linux kernel also provides another isolation functionality known as namespaces, which isolates an application’s view of the host machine, including process trees, networking, user IDs and mounted file systems.

Docker can be introduced technically as an extension of LXC’s capabilities. It is written in Go and consists of a high-level API that provides lightweight virtualization. Like LXC, Docker relies on cgroups and namespaces (early versions of Docker used LXC itself as the execution backend). Docker acts as a portable container engine, packaging the application and all its dependencies in a virtual container that can run on any Linux server.

The main difference between Docker and LXC is that the former is designed to isolate one application per container, which allows users to isolate multiple applications on a server, whereas the latter is designed to isolate one OS per container, which allows users to isolate multiple OSes on a server.

cgroups


The main purpose of cgroups is to allow the user to allocate resources - such as CPU time, system memory, network bandwidth or combinations of those resources - among user-defined groups of tasks (that is, processes) running on a system. The user can configure and monitor the cgroups, and limit the resources they provide.

The way cgroups are organized is hierarchical, like processes, and their attributes are inherited by their child cgroups. However, there are some differences between the Linux process model and the cgroup model. As we already know, the Linux process model consists of a single tree of processes in which the root is the init process, executed by the kernel at boot time, which starts other processes. In contrast, many different hierarchies of cgroups can exist simultaneously on a system, so instead of a single tree, the cgroup model is one or more separate, unconnected trees of tasks.
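To make the idea of several hierarchies more tangible, the sketch below reads /proc/self/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. On a cgroup v1 system there is typically one line per hierarchy, while a pure cgroup v2 system shows a single line with hierarchy ID 0. The type and function names are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup.
type cgroupEntry struct {
	HierarchyID string
	Controllers string
	Path        string
}

// parseCgroupLine splits "hierarchy-ID:controller-list:cgroup-path".
// It reports false when the line does not have three fields.
func parseCgroupLine(line string) (cgroupEntry, bool) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return cgroupEntry{}, false
	}
	return cgroupEntry{HierarchyID: parts[0], Controllers: parts[1], Path: parts[2]}, true
}

func main() {
	f, err := os.Open("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cannot read cgroup membership:", err)
		return
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if e, ok := parseCgroupLine(scanner.Text()); ok {
			fmt.Printf("hierarchy %s controllers=%q path=%s\n", e.HierarchyID, e.Controllers, e.Path)
		}
	}
}
```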

namespaces


In Linux there are mainly 7 different namespaces, which provide isolation of an application’s view of the host machine, as introduced earlier. Those namespaces are: mount (mnt), UTS, IPC, PID, network (net), user and cgroup.
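On a running system, the namespaces a process belongs to are exposed as symbolic links under /proc/self/ns, with targets such as uts:[4026531838]. The following sketch lists them for the current process; two processes that show the same inode number for a given type share that namespace. The helper name parseNSLink is an assumption of this example, and newer kernels may list more entries than the seven discussed here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseNSLink splits a namespace link target such as "uts:[4026531838]"
// into the namespace type and its inode number.
func parseNSLink(target string) (string, uint64, error) {
	kind, rest, found := strings.Cut(target, ":[")
	if !found || !strings.HasSuffix(rest, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(strings.TrimSuffix(rest, "]"), 10, 64)
	if err != nil {
		return "", 0, err
	}
	return kind, inode, nil
}

func main() {
	const nsDir = "/proc/self/ns"
	entries, err := os.ReadDir(nsDir)
	if err != nil {
		fmt.Println("cannot list namespaces:", err)
		return
	}
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(nsDir, e.Name()))
		if err != nil {
			continue // skip entries that cannot be resolved
		}
		kind, inode, err := parseNSLink(target)
		if err != nil {
			continue
		}
		fmt.Printf("%-20s type=%s inode=%d\n", e.Name(), kind, inode)
	}
}
```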

Write a container in Go


Alright, here comes the fun part. The previous introduction may sound a little too theoretical, so let’s dive into the practical section, where you will see that all of this is not that complex. These kinds of abstract topics require some time to digest, but having a working example can help immensely to internalize the concepts.

First of all, this whole tutorial will be using Go. It is a statically typed, compiled high-level programming language that produces a statically linked binary that you just have to send to your server, and it works (this is one of the reasons Docker is developed in Go - it is multi-platform).

Quick disclaimer: this tutorial is not intended to be a Golang tutorial, but I’m going to give a brief introduction so everyone can understand what we are doing here. Go programs are organized into packages; a package is a collection of source files in the same directory that are compiled together. The package that holds the program’s entry point is named main, and the basic Hello, world! example in Go would be something like:

package main

import (
	"fmt"
)

func main() {
	fmt.Println("Hello, world!")
}

This can be run by executing go run . in the path where the file with the previous code is present.

If you want to learn more about Go, or just feel like you are not fully ready to comprehend the code presented below, you can follow A tour of Go, which offers a guided tutorial with a mix of theoretical and practical resources to get to know what Go is capable of.

Since Go is a compiled, statically typed language, a snippet that uses, for example, a function that has not been defined yet will not compile. So please don’t expect the following code snippets to work if you just copy and paste them into your code editor. At the end of the post, you will find the complete version of the code.

First, let’s go through the Go packages we will be using: fmt, os, os/exec and syscall.

So, our code should include an import section with those packages. Most code editors can be configured to write Go and the import section is usually automatically completed when you use a function from those packages.

Secondly, you should know that on Linux every running process, including a compiled Go program, has an entry at /proc/self/exe. This is a special symbolic link that points to the executable file that started the currently running process. It can be used to call a program from within itself. Does it sound like a container?
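If you want to check this on your own machine, a small sketch like the one below resolves the link and compares it with what the standard library reports through os.Executable(). It assumes a Linux system with /proc mounted; the helper name selfExe is just for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// selfExe resolves the /proc/self/exe symlink to the path of the binary
// backing the current process (Linux only).
func selfExe() (string, error) {
	return os.Readlink("/proc/self/exe")
}

func main() {
	viaProc, err := selfExe()
	if err != nil {
		fmt.Println("readlink failed:", err)
		return
	}
	viaStdlib, err := os.Executable()
	if err != nil {
		fmt.Println("os.Executable failed:", err)
		return
	}
	fmt.Println("/proc/self/exe ->", viaProc)
	fmt.Println("os.Executable  ->", viaStdlib)
}
```

When started with go run, both lines should point at a temporary binary, since go run compiles the program to a temporary location before executing it.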

With the os/exec package we can execute /proc/self/exe. exec.Command() returns a Cmd struct ready to execute the named program with the given arguments, as seen in the parent function below:

func parent() {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Println("ERROR", err)
		os.Exit(1)
	}
}
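The argument handling in that first line is compact, so here is a small sketch that isolates it. It assumes the program was started as ./container run /bin/sh -c ls, where ./container is a hypothetical binary name, and shows the argument list that the re-executed process receives.

```go
package main

import "fmt"

// childArgs mirrors the expression used in parent(): it drops the program
// name and the "run" verb, and puts "child" in front of what is left.
func childArgs(args []string) []string {
	return append([]string{"child"}, args[2:]...)
}

func main() {
	// Stand-in for os.Args; "./container" is a hypothetical binary name.
	args := []string{"./container", "run", "/bin/sh", "-c", "ls"}
	fmt.Println(childArgs(args)) // prints [child /bin/sh -c ls]
}
```

Note that args[2:] assumes at least two arguments; the tutorial code makes the same assumption about os.Args.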

The key concept here is that the program re-executes itself with different arguments, creating a sort of isolation. When executed with run, it starts a new instance of itself with child, effectively creating two levels of processes. This is a basic form of isolation, a core feature of containerization. However, this code does not implement other aspects of containers such as filesystem isolation, resource limiting or complete process isolation, as it is intended to emulate a simplified version of a container. The complete program at this stage looks like this:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	switch os.Args[1] {
	case "run":
		parent()
	case "child":
		child()
	default:
		panic("IDK what to do!")
	}
}

func parent() {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Println("[ PARENT ERROR ]:", err)
		os.Exit(1)
	}
}

func child() {
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Println("[ CHILD ERROR ]:", err)
		os.Exit(1)
	}
}

As we said, this can be considered a very naive container, since it lacks the key features that make containers what they are. Therefore, let’s add some more functionality to our simple “isolator”.

Adding namespaces

As we already know, Linux namespaces are a feature of the Linux kernel that provides a form of lightweight process isolation. To start our process in new namespaces, we can set cmd.SysProcAttr:

cmd.SysProcAttr = &syscall.SysProcAttr{
	Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUSER,
}

This allows us to run our program inside new UTS, PID, MNT and USER namespaces. You can add this right after the first line of the parent function; you will find the complete version of the code at the end of the post.

Custom root FS

Right now, our container process is in isolated UTS, PID, MNT and USER namespaces, but its filesystem is the same as the host’s, since the new mount namespace starts out as a copy of the parent’s mounts. Therefore, we need to choose a root FS. To do this, we first need a very simple helper function that makes sure each step completes without errors and stops the program otherwise; we will call it must():

func must(err error) {
	if err != nil {
		panic(err)
	}
}

Then, in order to use a root FS, we can add the following few lines of code right at the start of the child() function:

	must(syscall.Mount("rootfs", "rootfs", "", syscall.MS_BIND, ""))
	must(os.MkdirAll("rootfs/oldrootfs", 0700))
	must(syscall.PivotRoot("rootfs", "rootfs/oldrootfs"))
	must(os.Chdir("/"))

In these lines of code, what’s happening is explained below, line by line:
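The same four calls are repeated here as a commented sketch, wrapped in a hypothetical enterRoot function that returns errors instead of calling must(), so each step can be annotated on its own. It assumes a directory named rootfs containing a minimal Linux filesystem in the working directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// enterRoot makes rootfs the root directory of the calling process.
func enterRoot(rootfs string) error {
	// 1. Bind-mount rootfs onto itself: pivot_root requires the new root
	//    to be a mount point.
	if err := syscall.Mount(rootfs, rootfs, "", syscall.MS_BIND, ""); err != nil {
		return fmt.Errorf("bind mount: %w", err)
	}
	// 2. Create the directory that will receive the old root.
	oldRoot := filepath.Join(rootfs, "oldrootfs")
	if err := os.MkdirAll(oldRoot, 0o700); err != nil {
		return fmt.Errorf("create old root dir: %w", err)
	}
	// 3. Swap roots: rootfs becomes "/", and the previous root is moved
	//    under /oldrootfs.
	if err := syscall.PivotRoot(rootfs, oldRoot); err != nil {
		return fmt.Errorf("pivot_root: %w", err)
	}
	// 4. Move the working directory into the new root.
	if err := os.Chdir("/"); err != nil {
		return fmt.Errorf("chdir: %w", err)
	}
	return nil
}

func main() {
	// Assumption: this runs inside a new mount namespace with enough
	// privileges to mount; otherwise the first step reports an error.
	if err := enterRoot("rootfs"); err != nil {
		fmt.Println("could not enter root FS:", err)
		return
	}
	fmt.Println("now running with rootfs as /")
}
```

After these steps the old root is still reachable under /oldrootfs; a more complete container runtime would usually unmount it, which this tutorial leaves out.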

Final code

And there we have it, the final version of the code with everything glued together is presented below:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	switch os.Args[1] {
	case "run":
		parent()
	case "child":
		child()
	default:
		panic("IDK what to do!")
	}
}

func parent() {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUSER,
	}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Println("[ PARENT ERROR ]:", err)
		os.Exit(1)
	}
}

func child() {
	// root FS operations
	must(syscall.Mount("rootfs", "rootfs", "", syscall.MS_BIND, ""))
	must(os.MkdirAll("rootfs/oldrootfs", 0700))
	must(syscall.PivotRoot("rootfs", "rootfs/oldrootfs"))
	must(os.Chdir("/"))

	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Println("[ CHILD ERROR ]:", err)
		os.Exit(1)
	}
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}

And that is pretty much all for this time! We went through the theoretical background needed to grasp the concept of containers in Linux environments, as well as a very simple yet very useful practical example that lets you create your own container in fewer than 60 lines of code. I hope you enjoyed this post and learned something. Be sure to follow me on X/Twitter to find out when the next post is uploaded, or just to share your thoughts about this one.

Sources