mohamed alaaeldin

Posted on Feb 21

Creating a Minimal Container in Go: A Step-by-Step Guide ( part 1 )

#webdev #programming #go #linux

What Are Containers, Anyway?
Containers are lightweight, portable, and efficient, making them a popular choice for deploying and running applications. In this tutorial, we’ll guide you through the process of creating a minimal container using Go. The example code provided focuses on essential containerization concepts, including namespaces, chroot, and control groups (cgroups).

Before getting started, ensure you have the following installed:

Go programming language: Install Go
Basic understanding of Linux namespaces and control groups

Introduction
So what are Linux namespaces and control groups?

Namespaces have been part of the Linux kernel since about 2002, and over time more tooling and namespace types have been added. Real container support was added to the Linux kernel only in 2013, however. This is what made namespaces really useful and brought them to the masses.

But what are namespaces exactly? Here’s a wordy definition from Wikipedia:

“Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources.”

In other words, the key feature of namespaces is that they isolate processes from each other. On a server where you are running many different services, isolating each service and its associated processes from other services means that there is a smaller blast radius for changes, as well as a smaller footprint for security‑related concerns. Mostly though, isolating services meets the architectural style of microservices as described by Martin Fowler.
Types of Namespaces

Within the Linux kernel, there are different types of namespaces. Each namespace has its own unique properties:

A user namespace has its own set of user IDs and group IDs for assignment to processes. In particular, this means that a process can have root privilege within its user namespace without having it in other user namespaces.
A process ID (PID) namespace assigns a set of PIDs to processes that are independent from the set of PIDs in other namespaces. The first process created in a new namespace has PID 1 and child processes are assigned subsequent PIDs. If a child process is created with its own PID namespace, it has PID 1 in that namespace as well as its PID in the parent process’ namespace. See below for an example.
A network namespace has an independent network stack: its own private routing table, set of IP addresses, socket listing, connection tracking table, firewall, and other network‑related resources.
A mount namespace has an independent list of mount points seen by the processes in the namespace. This means that you can mount and unmount filesystems in a mount namespace without affecting the host filesystem.
An interprocess communication (IPC) namespace has its own IPC resources, for example POSIX message queues.
A UNIX Time‑Sharing (UTS) namespace allows a single system to appear to have different host and domain names to different processes.
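
To make the namespace types above more concrete, here is a minimal sketch that lists the namespaces the current process belongs to by reading the symlinks under /proc/self/ns. Two processes that report the same inode number for a type share that namespace. The helper name parseNamespaceLink is my own, not part of any library.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNamespaceLink splits a /proc/<pid>/ns symlink target such as
// "pid:[4026531836]" into the namespace type and its inode number.
// (Hypothetical helper written for this illustration.)
func parseNamespaceLink(target string) (string, uint64, error) {
	kind, rest, ok := strings.Cut(target, ":[")
	if !ok || !strings.HasSuffix(rest, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(strings.TrimSuffix(rest, "]"), 10, 64)
	if err != nil {
		return "", 0, err
	}
	return kind, inode, nil
}

func main() {
	// The six namespace types described above, as named under /proc/self/ns.
	for _, name := range []string{"user", "pid", "net", "mnt", "ipc", "uts"} {
		target, err := os.Readlink("/proc/self/ns/" + name)
		if err != nil {
			fmt.Println(name, "unavailable:", err)
			continue
		}
		kind, inode, err := parseNamespaceLink(target)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-4s namespace inode %d\n", kind, inode)
	}
}
```

Running it once on the host and once inside an unshared shell should show different inode numbers for the namespaces that were unshared.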

Containers are fast, isolated environments. Many things are involved in making them work, and my main goal in this part is to demystify them.

Assuming you are on a Linux machine (try an Ubuntu image from PowerShell if you are on Windows :-)

Run the id command:

host-machine $ id

uid=1000(mohamed) gid=1000(mohamed) groups=1000(mohamed) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

unshare command

Now I run the following unshare command to start a new shell with its own user and PID namespaces. I map the root user to the new namespace (in other words, I have root privilege within the new namespace), mount a new proc filesystem, and fork my process (in this case, bash) in the newly created namespace.

unshare --user --pid --map-root-user --mount-proc --fork bash

Congratulations, you are now in an isolated namespace: you have your own PID namespace,
but you still share the host's filesystem and network. Your entry point is /bin/bash.

The ps -ef command shows there are two processes running – bash and the ps command itself – and the id command confirms that I’m root in the new namespace (which is also indicated by the changed command prompt):

root # ps -ef
UID         PID     PPID  C STIME TTY        TIME CMD
root          1        0  0 14:46 pts/0  00:00:00 bash
root         15        1  0 14:46 pts/0  00:00:00 ps -ef
root # id
uid=0(root) gid=0(root) groups=0(root) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
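
The same idea can be sketched in Go. This is a rough equivalent of the --user, --pid and --map-root-user flags only: it does not mount a new proc filesystem, and it runs id instead of bash so that it exits immediately. The function name rootMappedAttr is my own, and the sketch assumes a Linux host where unprivileged user namespaces are enabled.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// rootMappedAttr builds process attributes that request new user and PID
// namespaces and map the given host uid/gid to root (0) inside them.
// (Hypothetical helper written for this illustration.)
func rootMappedAttr(uid, gid int) *syscall.SysProcAttr {
	return &syscall.SysProcAttr{
		Cloneflags:  syscall.CLONE_NEWUSER | syscall.CLONE_NEWPID,
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: uid, Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: gid, Size: 1}},
	}
}

func main() {
	// Run `id` in the new namespaces; it should report uid 0 there.
	cmd := exec.Command("id")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.SysProcAttr = rootMappedAttr(os.Getuid(), os.Getgid())
	if err := cmd.Run(); err != nil {
		fmt.Println("could not start in new namespaces:", err)
	}
}
```

No sudo is needed for this sketch because the user namespace is what grants the mapped root identity.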

Namespaces and Containers

Namespaces are one of the technologies that containers are built on, used to enforce segregation of resources. We’ve shown how to create namespaces manually, but container runtimes like Docker, rkt, Podman, runC, containerd, and many other container technologies create namespaces on your behalf.
One of the most unusual projects is Kata Containers (https://katacontainers.io/), which claims to be a mix between containers and VMs.
What Are cgroups?

cgroups, or control groups, are a Linux kernel feature that enables the management and limitation of system resources like CPU, memory, and network bandwidth, among others. We can use cgroups to set limits on these resources and distribute them among different groups of processes.

cgroups have a hierarchical structure with root and child, each with resource limits set by controllers — for example, a CPU controller for CPU time or a memory controller for memory.

We can use cgroups for various purposes, such as controlling resource usage in a multi-tenant environment, providing Quality of Service (QoS) guarantees, and running containers.

Cgroups provide the following features:

Resource limits — You can configure a cgroup to limit how much of a particular resource (memory or CPU, for example) a process can use.
Prioritization — You can control how much of a resource (CPU, disk, or network) a process can use compared to processes in another cgroup when there is resource contention.
Accounting — Resource limits are monitored and reported at the cgroup level.
Control — You can change the status (frozen, stopped, or restarted) of all processes in a cgroup with a single command.

Creating a cgroup

The following commands create a v1 cgroup (you can tell by the pathname format) called foo and set its memory limit to 50,000,000 bytes (50 MB).

root # mkdir -p /sys/fs/cgroup/memory/foo
root # echo 50000000 > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes

Now I can assign a process to the cgroup, thus imposing the cgroup’s memory limit on it. I’ve written a shell script called test.sh, which prints cgroup testing tool to the screen, and then waits doing nothing. For my purposes, it is a process that continues to run until I stop it.

I start test.sh in the background and its PID is reported as 2428. The script produces its output and then I assign the process to the cgroup by writing its PID to the cgroup file /sys/fs/cgroup/memory/foo/cgroup.procs.

root # ./test.sh &
[1] 2428
root # cgroup testing tool
root # echo 2428 > /sys/fs/cgroup/memory/foo/cgroup.procs
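
The shell steps above can be sketched in Go as plain file writes. This assumes the same cgroup v1 layout as the commands above; the type and function names (write, memoryCgroupWrites, apply) are my own. By default the program only prints what it would write; pass apply as the first argument (as root) to actually perform the writes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// write is one file write needed to configure the cgroup.
type write struct{ Path, Content string }

// memoryCgroupWrites returns, in order, the writes that set a memory limit on
// a cgroup v1 group and then move a process into it.
// (Hypothetical helper written for this illustration.)
func memoryCgroupWrites(root, name string, limitBytes int64, pid int) []write {
	dir := filepath.Join(root, "memory", name)
	return []write{
		{filepath.Join(dir, "memory.limit_in_bytes"), strconv.FormatInt(limitBytes, 10)},
		{filepath.Join(dir, "cgroup.procs"), strconv.Itoa(pid)},
	}
}

// apply creates the cgroup directory if needed and performs each write.
func apply(writes []write) error {
	for _, w := range writes {
		if err := os.MkdirAll(filepath.Dir(w.Path), 0755); err != nil {
			return err
		}
		if err := os.WriteFile(w.Path, []byte(w.Content), 0644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	writes := memoryCgroupWrites("/sys/fs/cgroup", "foo", 50000000, os.Getpid())
	for _, w := range writes {
		fmt.Printf("%s <- %s\n", w.Path, w.Content)
	}
	if len(os.Args) > 1 && os.Args[1] == "apply" {
		if err := apply(writes); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```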

To validate that my process is in fact subject to the memory limits that I defined for cgroup foo, I run the following ps command. The -o cgroup flag displays the cgroups to which the specified process (2428) belongs. The output confirms that its memory cgroup is foo.

root # ps -o cgroup 2428
CGROUP
12:pids:/user.slice/user-0.slice/\
session-13.scope,10:devices:/user.slice,6:memory:/foo,...
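
The information that ps -o cgroup prints comes from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:path. As a small sketch, the program below reads that file for the current process and splits each line; the membership type and parseCgroupLine function are names I made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// membership is one line of /proc/<pid>/cgroup.
type membership struct {
	HierarchyID string
	Controllers []string
	Path        string
}

// parseCgroupLine parses "hierarchy-ID:controller-list:path".
// The controller list is empty on a cgroup v2 line such as "0::/user.slice".
// (Hypothetical helper written for this illustration.)
func parseCgroupLine(line string) (membership, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return membership{}, fmt.Errorf("malformed cgroup line %q", line)
	}
	var controllers []string
	if parts[1] != "" {
		controllers = strings.Split(parts[1], ",")
	}
	return membership{HierarchyID: parts[0], Controllers: controllers, Path: parts[2]}, nil
}

func main() {
	f, err := os.Open("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cannot read cgroup membership:", err)
		return
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		m, err := parseCgroupLine(scanner.Text())
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("controllers=%v path=%s\n", m.Controllers, m.Path)
	}
}
```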

By default, the operating system terminates a process when it exceeds a resource limit defined by its cgroup.

That is a fair amount of information about namespaces and cgroups.
You can read the full write-up by Scott van Kalken of F5
at this link, as well as the post Demystifying Containers 101 and, with a focus on the Docker ecosystem, “A Beginner-Friendly Introduction to Containers, VMs and Docker”.
Part 1: Chroot
I will not use namespaces in this section.

This may come as a surprise, but I will still achieve isolation: we will use chroot, a simple UNIX tool.

chroot, short for "change root," is a Unix system call that changes the root directory of a process to a specified path, effectively creating a new root filesystem for the process and its children. This can be a powerful tool for creating isolated environments or "chroot jails."
How Chroot Works:

Setting a New Root Directory: When you execute the chroot system call or the chroot command in the shell, it changes the root directory for the process and its children. The new root directory becomes the / (root) directory for that process, isolating it from the actual root directory of the host system.
Isolation: After the chroot operation, the process and its children can only access files and directories within the new root directory. They cannot access files outside this new root, providing a level of isolation and containment.

Use Cases:

System Recovery: chroot is commonly used in system recovery scenarios. If your system becomes unbootable or experiences issues, you can boot from a live CD/USB, chroot into the broken system, and make necessary repairs without affecting the rest of the host system.
Environment Isolation: Developers and system administrators may use chroot to create isolated environments for testing or building software. This is especially common in scenarios where different versions of libraries or dependencies are required.
Security: Although chroot provides some level of isolation, it's not foolproof in terms of security. It was not designed as a security feature and should not be solely relied upon for containing malicious processes. Modern containerization technologies, like Docker, utilize more advanced mechanisms, such as Linux namespaces and cgroups, to provide stronger isolation.

Example:

Consider the following example:


mkdir /mychroot
cp -r /bin /lib /lib64 /usr /mychroot
chroot /mychroot /bin/bash

In this example:

We create a directory called mychroot and copy essential binaries and libraries into it.
We use chroot to change the root directory to /mychroot.
After the chroot command, executing /bin/bash will run a Bash shell within the isolated environment.

Keep in mind that chroot by itself does not provide complete isolation; it is often used in conjunction with other tools and techniques to create more secure and robust containerized environments.
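
The same steps can be sketched in Go with syscall.Chroot. This is a minimal illustration, not a hardened jail: it needs root, it changes the root of the whole calling process, and the function names hostPath and runInJail are my own.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"syscall"
)

// hostPath reports which host path a path inside the jail refers to.
func hostPath(root, inside string) string {
	return filepath.Join(root, inside)
}

// runInJail checks that the program exists under root, then chroots into
// root and runs the program there. (Hypothetical helper for illustration.)
func runInJail(root, program string, args ...string) error {
	if _, err := os.Stat(hostPath(root, program)); err != nil {
		return fmt.Errorf("program missing from jail: %w", err)
	}
	if err := syscall.Chroot(root); err != nil {
		return fmt.Errorf("chroot: %w", err)
	}
	// Without this, the working directory would still point outside the jail.
	if err := os.Chdir("/"); err != nil {
		return fmt.Errorf("chdir: %w", err)
	}
	cmd := exec.Command(program, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: jail <new-root> <program> [args...]")
		return
	}
	if err := runInJail(os.Args[1], os.Args[2], os.Args[3:]...); err != nil {
		fmt.Println("error:", err)
	}
}
```

For example, after building it as jail, sudo ./jail /mychroot /bin/bash would mirror the chroot command above.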
Prepare the Ubuntu Root Filesystem

The last thing you need before you start is a root filesystem.
We will use Docker to download an Ubuntu filesystem.

You only need Docker for this download step. In your project root, run:

$ docker run -d --rm --name ubuntu_fs ubuntu:20.04 sleep 1000
$ mkdir -p ./ubuntu_fs
$ docker cp ubuntu_fs:/ ./ubuntu_fs
$ docker stop ubuntu_fs
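
Before moving on, it can be worth checking that the copy produced something that looks like a root filesystem. The sketch below is one way to do that; the list of expected entries is my own assumption about what the program later needs (a shell under bin and a proc directory to mount on), and missingEntries is a made-up helper name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// missingEntries returns the entries from want that do not exist under root.
// Lstat is used so that symlinks (bin is a symlink in recent Ubuntu images)
// count as present. (Hypothetical helper written for this illustration.)
func missingEntries(root string, want []string) []string {
	var missing []string
	for _, rel := range want {
		if _, err := os.Lstat(filepath.Join(root, rel)); err != nil {
			missing = append(missing, rel)
		}
	}
	return missing
}

func main() {
	// Assumed minimum for this tutorial, not an official list.
	want := []string{"bin", "etc", "proc", "usr"}
	if missing := missingEntries("./ubuntu_fs", want); len(missing) > 0 {
		fmt.Println("ubuntu_fs looks incomplete, missing:", missing)
		return
	}
	fmt.Println("ubuntu_fs looks usable")
}
```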

Now we have ubuntu_fs inside our project. Put the following in your main package:

package main

import (
 "io/ioutil"
 "log"
 "os"
 "os/exec"
 "path/filepath"
 "strconv"
 "syscall"
)



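func main() {
 // Guard against a missing subcommand or command so os.Args is never indexed out of range
 if len(os.Args) < 3 {
  log.Fatal("Usage: run <command_name>, like `run /bin/bash` or `run echo hello`")
 }
 switch os.Args[1] {
 case "run":
  run(os.Args[2:]...)
 case "child":
  child(os.Args[2:]...)
 default:
  log.Fatal("Unknown command. Use run <command_name>, like `run /bin/bash` or `run echo hello`")
 }
}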



func run(command ...string) {
 log.Println("Executing", command, "from run")
 // Re-execute this same binary with "child" prepended, so child() runs inside the new namespaces
 cmd := exec.Command("/proc/self/exe", append([]string{"child"}, command...)...)
 cmd.Stdin = os.Stdin
 cmd.Stdout = os.Stdout
 cmd.Stderr = os.Stderr

 // Cloneflags is only available in Linux
 // CLONE_NEWUTS namespace isolates hostname
 // CLONE_NEWPID namespace isolates processes
 // CLONE_NEWNS namespace isolates mounts
 // CLONE_NEWNET (in Unshareflags) isolates the network stack
 cmd.SysProcAttr = &syscall.SysProcAttr{
  Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
  Unshareflags: syscall.CLONE_NEWNS | syscall.CLONE_NEWNET,
 }

 // Run child using namespaces. The command provided will be executed inside that.
 must(cmd.Run())
}




func child(command ...string) {
 // Create cgroup
 cg()

 cmd := exec.Command(command[0], command[1:]...)
 cmd.Stdin = os.Stdin
 cmd.Stdout = os.Stdout
 cmd.Stderr = os.Stderr

 // Set the hostname; only this UTS namespace sees the change
 must(syscall.Sethostname([]byte("container")))

 // Use the Ubuntu root filesystem as the new root
 must(syscall.Chroot("./ubuntu_fs"))
 // Change directory after chroot
 must(os.Chdir("/"))
 // Mount /proc inside container so that `ps` command works
 must(syscall.Mount("proc", "proc", "proc", 0, ""))
 // Mount a temporary filesystem
 if _, err := os.Stat("mytemp"); os.IsNotExist(err) {
  must(os.Mkdir("mytemp", os.ModePerm))
 }
 must(syscall.Mount("something", "mytemp", "tmpfs", 0, ""))

 // Run the requested command inside the container
 must(cmd.Run())

 // Cleanup mount
 must(syscall.Unmount("proc", 0))
 must(syscall.Unmount("mytemp", 0))
}




func cg() {
 // cgroup location in Ubuntu (cgroup v1 layout, with the pids controller under /sys/fs/cgroup/pids)
 cgroups := "/sys/fs/cgroup/"

 pids := filepath.Join(cgroups, "pids")
 containers_mini := filepath.Join(pids, "containers_mini")
 os.Mkdir(containers_mini, 0755)
 // Limit to max 20 pids
 must(ioutil.WriteFile(filepath.Join(containers_mini, "pids.max"), []byte("20"), 0700))
 // Cleanup cgroup when it is not being used
 must(ioutil.WriteFile(filepath.Join(containers_mini, "notify_on_release"), []byte("1"), 0700))

 pid := strconv.Itoa(os.Getpid())
 // Apply this and any child process in this cgroup
 must(ioutil.WriteFile(filepath.Join(containers_mini, "cgroup.procs"), []byte(pid), 0700))
}

func must(err error) {
 if err != nil {
  log.Printf("Error: %v\n", err)
  panic(err)
 }
}

This code was introduced by Liz Rice in the following talk:

https://youtu.be/Utf-A4rODH8?si=ULuzE8E5N7N17dH9


Understanding the Code
1. Main Function

The main function serves as the entry point of the program. It uses command-line arguments to determine whether to run a new container or act as a child process within an existing container.

func main() {
    switch os.Args[1] {
    case "run":
        run(os.Args[2:]...)
    case "child":
        child(os.Args[2:]...)
    default:
        log.Fatal("Unknown command. Use run <command_name>, like `run /bin/bash` or `run echo hello`")
    }
}

Run Function

The run function sets up the container environment and executes a specified command inside it.

func run(command ...string) {
    log.Println("Executing", command, "from run")
    cmd := exec.Command("/proc/self/exe", append([]string{"child"}, command[0:]...)...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        Unshareflags: syscall.CLONE_NEWNS | syscall.CLONE_NEWNET,
    }
    must(cmd.Run())
}

The call exec.Command("/proc/self/exe", append([]string{"child"}, command[0:]...)...) re-executes the current binary
with child prepended to the arguments, so the child function runs as a new process inside the new namespaces.
The Cloneflags specify the namespaces to be isolated (UTS, PID, and mount namespaces).
The Unshareflags additionally unshare the mount namespace and isolate the network namespace.
The cmd.Run() method starts that child process and waits for it to finish; the child then runs the provided command inside the container.
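
The re-exec trick described above can be tried on its own, without any namespaces or root privileges. The sketch below keeps the same run/child split but only prints what each side sees; childArgs is a helper name I introduced for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// childArgs builds the argument list for the re-executed binary: the literal
// "child" subcommand followed by the user's command.
// (Hypothetical helper written for this illustration.)
func childArgs(command []string) []string {
	return append([]string{"child"}, command...)
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: reexec run <command> [args...]")
		return
	}
	switch os.Args[1] {
	case "run":
		// /proc/self/exe is a symlink to the binary of the current process.
		cmd := exec.Command("/proc/self/exe", childArgs(os.Args[2:])...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		fmt.Println("parent pid", os.Getpid(), "re-executing itself")
		if err := cmd.Run(); err != nil {
			fmt.Println("re-exec failed:", err)
		}
	case "child":
		fmt.Println("child pid", os.Getpid(), "would now run", os.Args[2:])
	default:
		fmt.Println("unknown subcommand", os.Args[1])
	}
}
```

The point of the split is that the namespace flags apply to the newly started process, so the setup work (hostname, chroot, mounts) has to happen in that second copy of the program rather than in the parent.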

Child Function

The child function is responsible for setting up the container filesystem and executing the specified command inside it.

func child(command ...string) {
    // ...
    cg()
    must(syscall.Sethostname([]byte("container")))
    must(syscall.Chroot("./ubuntu_fs"))
    must(os.Chdir("/"))
    must(syscall.Mount("proc", "proc", "proc", 0, ""))
    must(syscall.Mount("something", "mytemp", "tmpfs", 0, ""))
    must(cmd.Run())
    must(syscall.Unmount("proc", 0))
    must(syscall.Unmount("mytemp", 0))
}

The cg function sets up a control group (cgroup) to limit resource usage for the container.
Sethostname sets the hostname inside the container.
Chroot changes the root directory for the container.
Mount is used to mount essential filesystems like /proc and a temporary filesystem.
Finally, the command is executed within the container.

Control Groups (Cgroups)

The cg function creates and configures a cgroup for the container, limiting the number of processes.

func cg() {
 // cgroup location in Ubuntu
 cgroups := "/sys/fs/cgroup/"

 pids := filepath.Join(cgroups, "pids")
 containers_mini := filepath.Join(pids, "containers_mini")
 os.Mkdir(containers_mini, 0755)
 // Limit to max 20 pids
 must(ioutil.WriteFile(filepath.Join(containers_mini, "pids.max"), []byte("20"), 0700))
 // Cleanup cgroup when it is not being used
 must(ioutil.WriteFile(filepath.Join(containers_mini, "notify_on_release"), []byte("1"), 0700))

 pid := strconv.Itoa(os.Getpid())
 // Apply this and any child process in this cgroup
 must(ioutil.WriteFile(filepath.Join(containers_mini, "cgroup.procs"), []byte(pid), 0700))
}

Cgroups are used to control and limit resource usage for processes.
In this example, the cgroup limits the maximum number of processes to 20.
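
One caveat: the paths in cg assume the cgroup v1 layout. On hosts that use the unified cgroup v2 hierarchy there is no /sys/fs/cgroup/pids directory, and pids.max lives directly in the group's own directory. The sketch below shows one way to tell the two layouts apart, by checking for the cgroup.controllers file that exists at the root of a v2 mount. The function names are mine, and on v2 the pids controller may also need to be enabled for child groups and notify_on_release is not available, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isUnified reports whether root looks like a cgroup v2 (unified) mount.
func isUnified(root string) bool {
	_, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil
}

// pidsCgroupDir returns the directory in which pids.max lives for a group.
// (Hypothetical helper written for this illustration.)
func pidsCgroupDir(root, name string, unified bool) string {
	if unified {
		return filepath.Join(root, name)
	}
	return filepath.Join(root, "pids", name)
}

func main() {
	root := "/sys/fs/cgroup"
	unified := isUnified(root)
	dir := pidsCgroupDir(root, "containers_mini", unified)
	fmt.Println("cgroup v2:", unified)
	fmt.Println("pids.max path:", filepath.Join(dir, "pids.max"))
}
```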

Error Handling

The must function is a simple utility function for handling errors.

func must(err error) {
    if err != nil {
        log.Printf("Error: %v\n", err)
        panic(err)
    }
}

If an error occurs, it is logged, and the program is terminated.
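
One consequence worth knowing: cmd.Run() also returns an error when the command starts fine but exits with a non-zero status, so must(cmd.Run()) panics in that case too, and in child that skips the unmount calls. As a sketch of an alternative, the helper below (runAndExitCode, a name I made up) separates an exit status from a real failure to start the command.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndExitCode runs a command and separates "ran but exited non-zero"
// from "could not be started at all".
// (Hypothetical helper written for this illustration.)
func runAndExitCode(name string, args ...string) (int, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The command ran; report its exit status instead of failing.
		return exitErr.ExitCode(), nil
	}
	return -1, err
}

func main() {
	code, err := runAndExitCode("sh", "-c", "exit 3")
	fmt.Println("exit code:", code, "start error:", err)
}
```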
Building and Running the Container

To run the minimal container, follow these steps:

Build the executable: go build -o mycontainer main.go
Create a filesystem directory with an Ubuntu root filesystem, e.g., ubuntu_fs.
Run the container: sudo ./mycontainer run /bin/bash
Remember that you need to run it with sudo.
Your entry point is /bin/bash.

Now you are in your own minimal container, and you have a deeper understanding of how it works. If I have more time in the future I may add a network isolation layer, or you can do it yourself. Thank you for your time; I hope this helped.

These will help you go further:

namespace & golang, a series of articles that explains namespaces with Go examples

“Creating Network Stacks and Connecting with the Internet” by Shrikanta Mazumder

https://songrgg.github.io/programming/linux-namespace-part01-uts-pid/

In the next part we will create a network layer that gives our container a virtual Ethernet interface in an isolated subnet that uses a host bridge as its gateway. See you soon.

part 2

you can find me on LinkedIn
https://www.linkedin.com/in/mohamed-elkerwash/
