mohamed alaaeldin

Beyond Basics: Building a More Powerful Container in Go — Network Isolation & Advanced Features

Containers Uncovered: More Than Just Lightweight Virtual Machines!

If you’re like me — always wondering how things work and eager to build them with your own mind and hands — you’re in the right place!
In the first part of this article (Part 1), I attempted to build a minimal container system using only Go, relying on Linux’s unshare and namespaces. It was purely a demonstration, and I wasn’t aiming to develop a fully functional container runtime tool. I intentionally left out many critical aspects, such as security, networking, and image management.
I initially thought it would be simple, but I quickly realized that even a basic container system involves thousands of concepts and implementations. However, my passion for understanding and building things kept me going.
Now, a year after my first article on Building a Minimal Container in Go, I’ve realized that both my code and my original article need a fresh perspective. So, it’s time for a revisit!

System Architecture

Core Components

  1. User CLI

    Responsibilities:
    Parse user commands (run, exec, ps, rm)
    Communicate with the daemon via RPC or another IPC mechanism
    Format and display output

Key Features:

Command completion
Output formatting (JSON/YAML)
Log streaming
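
To make the CLI's first responsibility concrete, here is a minimal sketch of subcommand parsing. The Command type and the parseCommand function are hypothetical names chosen for illustration; they are not part of the listing later in this article, and a real CLI would go on to send the parsed command to the daemon.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Command is a hypothetical parsed CLI invocation (illustrative only).
type Command struct {
	Name string   // one of: run, exec, ps, rm
	Args []string // everything after the subcommand
}

// parseCommand validates the subcommand and splits off its arguments.
func parseCommand(argv []string) (Command, error) {
	if len(argv) == 0 {
		return Command{}, errors.New("usage: <run|exec|ps|rm> [args...]")
	}
	switch argv[0] {
	case "run", "exec", "ps", "rm":
		return Command{Name: argv[0], Args: argv[1:]}, nil
	default:
		return Command{}, fmt.Errorf("unknown command: %q", argv[0])
	}
}

func main() {
	cmd, err := parseCommand(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real CLI would now forward cmd to the daemon.
	fmt.Printf("command=%s args=%v\n", cmd.Name, cmd.Args)
}
```

Keeping the parsing in a pure function makes it easy to unit test without starting a daemon.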
  2. Container Daemon

Responsibilities:

Manage container lifecycle
Maintain container state database
Coordinate between components

Key Features:

REST/gRPC API
Event logging
Resource tracking
  3. Container Runtime

Components:

Namespace Manager: Handles the CLONE_NEW* flags (real-world runtimes deal with more flags than this demo uses)
Cgroups Manager: Resource constraints
Filesystem Setup: RootFS preparation
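
As a rough sketch of what a namespace manager does with the CLONE_NEW* flags, the snippet below maps short namespace names to the constants in Go's syscall package and ORs them together. The nsFlags table and the cloneFlags helper are hypothetical names for illustration; the syscall constants and the SysProcAttr.Cloneflags field are real and Linux-only.

```go
package main

import (
	"fmt"
	"syscall"
)

// nsFlags maps short namespace names to the Linux clone(2) flags.
var nsFlags = map[string]uintptr{
	"mnt":  syscall.CLONE_NEWNS,
	"uts":  syscall.CLONE_NEWUTS,
	"ipc":  syscall.CLONE_NEWIPC,
	"pid":  syscall.CLONE_NEWPID,
	"net":  syscall.CLONE_NEWNET,
	"user": syscall.CLONE_NEWUSER,
}

// cloneFlags ORs together the flags for the requested namespaces.
func cloneFlags(names ...string) (uintptr, error) {
	var flags uintptr
	for _, n := range names {
		f, ok := nsFlags[n]
		if !ok {
			return 0, fmt.Errorf("unknown namespace %q", n)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	flags, err := cloneFlags("uts", "pid", "mnt", "net", "ipc")
	if err != nil {
		panic(err)
	}
	// The result can be assigned to SysProcAttr.Cloneflags before starting
	// the child process with os/exec.
	attr := &syscall.SysProcAttr{Cloneflags: flags}
	fmt.Printf("Cloneflags = %#x\n", attr.Cloneflags)
}
```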

Features:

OCI runtime spec compliance
User namespace remapping
Seccomp/AppArmor profiles
  4. Image Service

Components:

Registry Client: Docker Hub integration, or your own image service if you want to go wild
Layer Manager: OverlayFS/BTRFS
Snapshotter: Copy-on-write layers
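
For the OverlayFS case, a layer manager ultimately has to produce the option string for an overlay mount. Here is a minimal sketch, assuming the layers already exist as directories; overlayOptions and the /var/lib/demo paths are hypothetical, while the lowerdir/upperdir/workdir option names are the ones the kernel's overlay filesystem expects.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the data string for an overlay mount.
// lowers are read-only image layers with the topmost layer first, upper
// receives all writes, and work is an empty scratch directory that must be
// on the same filesystem as upper.
func overlayOptions(lowers []string, upper, work string) (string, error) {
	if len(lowers) == 0 {
		return "", fmt.Errorf("at least one lower directory is required")
	}
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lowers, ":"), upper, work), nil
}

func main() {
	opts, err := overlayOptions(
		[]string{"/var/lib/demo/layers/app", "/var/lib/demo/layers/base"},
		"/var/lib/demo/c1/upper", "/var/lib/demo/c1/work")
	if err != nil {
		panic(err)
	}
	fmt.Println(opts)
	// With root privileges the string would be passed as the last argument:
	//   syscall.Mount("overlay", mergedDir, "overlay", 0, opts)
}
```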

Features:

Image caching
Signature verification
Garbage collection
  5. Network Manager

Components:

CNI Plugins: Bridge, MACVLAN, IPVLAN
IPAM: DHCP/Static allocation
Service Mesh: DNS, service discovery
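
Static allocation in an IPAM component can be sketched with the standard net/netip package. The Allocator type below is a hypothetical, in-memory illustration: it reserves the network address and the gateway, then hands out the lowest free address. It does not exclude the broadcast address or persist its state, which a real IPAM would need to do.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Allocator hands out addresses from a subnet in ascending order.
type Allocator struct {
	prefix netip.Prefix
	used   map[netip.Addr]bool
}

// NewAllocator reserves the network address and the gateway up front.
func NewAllocator(cidr, gateway string) (*Allocator, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	gw, err := netip.ParseAddr(gateway)
	if err != nil {
		return nil, err
	}
	p = p.Masked() // normalize to the network address
	if !p.Contains(gw) {
		return nil, fmt.Errorf("gateway %s is outside %s", gw, p)
	}
	used := map[netip.Addr]bool{p.Addr(): true, gw: true}
	return &Allocator{prefix: p, used: used}, nil
}

// Next returns the lowest free address, or an error when the subnet is full.
func (a *Allocator) Next() (netip.Addr, error) {
	for ip := a.prefix.Addr().Next(); a.prefix.Contains(ip); ip = ip.Next() {
		if !a.used[ip] {
			a.used[ip] = true
			return ip, nil
		}
	}
	return netip.Addr{}, fmt.Errorf("no free addresses in %s", a.prefix)
}

func main() {
	a, err := NewAllocator("10.0.0.0/24", "10.0.0.1")
	if err != nil {
		panic(err)
	}
	ip, err := a.Next()
	if err != nil {
		panic(err)
	}
	fmt.Println("first container address:", ip)
}
```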

Features:

Multi-host networking
Network policies
Port mapping
  6. Storage Driver

Components:

Volume Manager: Bind mounts
Snapshot Manager: Incremental backups
Quota Enforcer: Disk limits

Features:

Persistent storage
Temporary filesystems
Encryption support

This diagram gives you the bigger picture:


                     +---------------------+
                     |      User CLI       |
                     | (run, exec, ps, rm) |
                     +----------+----------+
                                |
                                | (gRPC/HTTP)
                                v
                     +---------------------+
                     |  Container Daemon   |
                     | (State Management)  |
                     +----------+----------+
                                |
           +--------------------+------------------+
           |                    |                  |
+----------+----------+ +-------+-------+ +--------+---------+
|  Container Runtime  | | Image Service | | Network Manager  |
| (namespace/cgroups) | | (OCI Images)  | | (CNI Plugins)    |
+----------+----------+ +-------+-------+ +--------+---------+
           |                    |                  |
+----------v----------+ +-------v-------+ +--------v---------+
| Linux Kernel        | | Storage Driver| | Host Networking  |
| - namespaces        | | (OverlayFS)   | | (iptables/bridge)|
| - cgroups v2        | +---------------+ +------------------+
| - capabilities      |
+---------------------+

It has been a long journey for me to learn and think through every component. I encountered many challenges, especially with aspects like OverlayFS and networking.

My biggest issue in my first implementation was networking. It was really difficult to isolate the child container and set up its own bridged network.

To solve network isolation, you need to think clearly 🤔 at this stage.

First, you need to create a virtual Ethernet (veth) pair on the host. It behaves like a cable with two ends (in this proof of concept the host end gets an IP address directly instead of being attached to a bridge device):

The first interface remains on the host.
The second interface is moved to the child container 🫙.

The real challenge here is managing command signaling between the host and the child container.
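
One common way to handle this signaling is a pipe: the child blocks on the read end until the parent reports that host-side setup is finished. The sketch below simulates the child with a goroutine so it runs without privileges; handshake, waitForParent and errAborted are hypothetical names. In a real runtime the read end would be handed to the child through exec.Cmd.ExtraFiles, whose first entry becomes file descriptor 3 in the child.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
)

var errAborted = errors.New("parent closed the pipe before setup finished")

// waitForParent blocks until the parent sends one byte.
// EOF without data means the parent gave up.
func waitForParent(r io.Reader) error {
	var buf [1]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return fmt.Errorf("%w: %v", errAborted, err)
	}
	return nil
}

// handshake runs hostSetup, then releases the simulated child, which only
// calls childWork after it has been released.
func handshake(hostSetup func() error, childWork func()) error {
	r, w, err := os.Pipe()
	if err != nil {
		return err
	}
	defer r.Close()

	done := make(chan error, 1)
	go func() { // stands in for the container process
		err := waitForParent(r)
		if err == nil {
			childWork()
		}
		done <- err
	}()

	setupErr := hostSetup()
	if setupErr == nil {
		_, setupErr = w.Write([]byte{1}) // one byte means "go ahead"
	}
	w.Close() // on failure the child sees EOF and gives up
	childErr := <-done
	if setupErr != nil {
		return setupErr
	}
	return childErr
}

func main() {
	err := handshake(
		func() error { fmt.Println("host: veth created and moved"); return nil },
		func() { fmt.Println("child: interface is present, configuring it") },
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, "handshake failed:", err)
	}
}
```

With this ordering the child never tries to configure an interface that has not yet arrived in its namespace.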

In my approach, I will attempt to create a proof of concept implementation.

Understanding Container Networking

When we create containers, one of the most crucial aspects is network isolation. Think of it like giving each container its own private network environment, complete with its own network interfaces, IP addresses, and routing rules. Let’s break down how we achieve this in our container implementation.

The Network Setup Process

  1. Creating the Network Namespace

First, we create a separate network namespace for our container. This is like giving the container its own private networking room:

const ContainerName = "mycontainer"

func createNetworkNamespace(name string) error {
    // Create directory for network namespaces
    if err := os.MkdirAll("/var/run/netns", 0755); err != nil {
        return err
    }

    // Create the namespace file
    nsFile := filepath.Join("/var/run/netns", name)
    fd, err := os.Create(nsFile)
    if err != nil {
        return err
    }
    fd.Close()

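    // Bind mount a namespace handle onto the file so `ip netns` can see it.
    // Note: /proc/self/ns/net is the namespace of the calling process. Called
    // from the host before the child starts, this pins the host's namespace;
    // to expose the container's namespace, bind /proc/<pid>/ns/net instead
    // once the child is running.
    return syscall.Mount("/proc/self/ns/net", nsFile, "bind", syscall.MS_BIND, "")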
}
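
If you want the file under /var/run/netns to refer to the container's namespace rather than the caller's, the bind mount source has to be the child's entry in /proc. A small sketch of how the two paths could be built; netnsBindPaths is a hypothetical helper and 12345 is a placeholder PID:

```go
package main

import (
	"fmt"
	"path/filepath"
)

const netnsDir = "/var/run/netns"

// netnsBindPaths returns the source and target for exposing a process's
// network namespace under the name that `ip netns` expects.
func netnsBindPaths(pid int, name string) (src, dst string) {
	src = fmt.Sprintf("/proc/%d/ns/net", pid)
	dst = filepath.Join(netnsDir, name)
	return src, dst
}

func main() {
	src, dst := netnsBindPaths(12345, "mycontainer")
	fmt.Printf("bind mount %s -> %s\n", src, dst)
	// With root privileges, after the child has started:
	//   syscall.Mount(src, dst, "bind", syscall.MS_BIND, "")
}
```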
  2. Setting Up Virtual Network Interfaces

We create a virtual network cable (veth pair) to connect our container to the host system:

const (
    VethHost      = "veth0"  // Host end of the cable
    VethContainer = "veth1"  // Container end of the cable
    ContainerIP   = "10.0.0.2/24"
    HostIP        = "10.0.0.1/24"
    Gateway       = "10.0.0.1"
)

The setup happens in two parts:
1. On the host side:

func setupHostNetwork(pid int) error {
    // Create the virtual network cable (veth pair)
    if err := exec.Command("ip", "link", "add", VethHost, "type", "veth", 
        "peer", "name", VethContainer).Run(); err != nil {
        return fmt.Errorf("failed to create veth pair: %v", err)
    }

    // Move one end to the container
    if err := exec.Command("ip", "link", "set", VethContainer, 
        "netns", fmt.Sprintf("%d", pid)).Run(); err != nil {
        return fmt.Errorf("failed to move veth to container: %v", err)
    }

    // Configure the host end
    if err := exec.Command("ip", "link", "set", VethHost, "up").Run(); err != nil {
        return fmt.Errorf("failed to bring up host interface: %v", err)
    }
    if err := exec.Command("ip", "addr", "add", HostIP, "dev", VethHost).Run(); err != nil {
        return fmt.Errorf("failed to assign IP to host interface: %v", err)
    }
    return nil
}

2. Inside the container:

func setupContainerNetwork() error {
    // Enable the loopback interface
    if err := exec.Command("ip", "link", "set", "lo", "up").Run(); err != nil {
        return fmt.Errorf("failed to bring up lo: %v", err)
    }

    // Configure the container's network interface
    if err := exec.Command("ip", "link", "set", VethContainer, "up").Run(); err != nil {
        return fmt.Errorf("failed to bring up veth: %v", err)
    }
    if err := exec.Command("ip", "addr", "add", ContainerIP, 
        "dev", VethContainer).Run(); err != nil {
        return fmt.Errorf("failed to assign IP to veth: %v", err)
    }
    if err := exec.Command("ip", "route", "add", "default", 
        "via", Gateway).Run(); err != nil {
        return fmt.Errorf("failed to add default route: %v", err)
    }
    return nil
}

Internet Connectivity

To allow our container to access the internet, we need to set up NAT (Network Address Translation) rules. This is like setting up a router for our container:

// NAT portion of setupHostNetwork; the veth setup shown earlier is omitted here.
func setupHostNetwork(pid int) error {
    // Get the host's internet-connected interface
    iface, err := getDefaultInterface()
    if err != nil {
        return fmt.Errorf("failed to get default interface: %v", err)
    }

    // Set up NAT and forwarding rules
    cmds := [][]string{
        {"sysctl", "-w", "net.ipv4.ip_forward=1"},
        {"iptables", "-t", "nat", "-A", "POSTROUTING", 
            "-s", "10.0.0.0/24", "-o", iface, "-j", "MASQUERADE"},
        {"iptables", "-A", "FORWARD", "-i", iface, 
            "-o", VethHost, "-j", "ACCEPT"},
        {"iptables", "-A", "FORWARD", "-i", VethHost, 
            "-o", iface, "-j", "ACCEPT"},
    }

    for _, cmd := range cmds {
        if out, err := exec.Command(cmd[0], cmd[1:]...).CombinedOutput(); err != nil {
            return fmt.Errorf("failed %v: %s\n%s", cmd, err, out)
        }
    }
    return nil
}
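
Every rule appended with -A needs a matching -D at cleanup time, and keeping two hand-written lists in sync is easy to get wrong. A minimal sketch of deriving the cleanup command from the setup command; natRules and deleteRule are hypothetical helpers, and "eth0" is a placeholder for the interface that getDefaultInterface would return:

```go
package main

import (
	"fmt"
	"strings"
)

// natRules returns the iptables append commands for NAT and forwarding.
func natRules(iface, veth, subnet string) [][]string {
	return [][]string{
		{"iptables", "-t", "nat", "-A", "POSTROUTING", "-s", subnet, "-o", iface, "-j", "MASQUERADE"},
		{"iptables", "-A", "FORWARD", "-i", iface, "-o", veth, "-j", "ACCEPT"},
		{"iptables", "-A", "FORWARD", "-i", veth, "-o", iface, "-j", "ACCEPT"},
	}
}

// deleteRule returns a copy of an append command with -A replaced by -D.
// The input slice is left untouched.
func deleteRule(add []string) ([]string, error) {
	del := make([]string, len(add))
	copy(del, add)
	for i, arg := range del {
		if arg == "-A" {
			del[i] = "-D"
			return del, nil
		}
	}
	return nil, fmt.Errorf("no -A flag in %v", add)
}

func main() {
	for _, add := range natRules("eth0", "veth0", "10.0.0.0/24") {
		del, err := deleteRule(add)
		if err != nil {
			panic(err)
		}
		fmt.Println("setup:  ", strings.Join(add, " "))
		fmt.Println("cleanup:", strings.Join(del, " "))
	}
}
```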

Resource Cleanup

One often overlooked but crucial aspect is cleaning up network resources when the container stops. Our implementation handles this through a ResourceManager:


type ResourceManager struct {
    containerName string
    vethHost      string
    mounts        []string
    namespaces    []string
}

func (rm *ResourceManager) cleanupNetwork() error {
    // Clean up iptables rules
    if err := rm.cleanupIptablesRules(); err != nil {
        log.Printf("Warning: iptables cleanup failed: %v", err)
    }

    // Remove the virtual network interface
    if out, err := exec.Command("ip", "link", "delete", 
        rm.vethHost).CombinedOutput(); err != nil {
        log.Printf("Warning: failed to delete veth interface: %v (%s)", err, out)
    }

    return nil
}
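
Cleanup that is registered with defer only runs if the function returns normally; log.Fatalf and os.Exit terminate the process without running deferred calls. A common pattern, sketched here with hypothetical names (realMain, runContainer), is to return errors up to a small wrapper that owns the defer and reports an exit code:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// runContainer stands in for the real work. It returns errors instead of
// calling log.Fatalf so that the caller's deferred cleanup still runs.
func runContainer(fail bool) error {
	if fail {
		return errors.New("simulated setup failure")
	}
	return nil
}

// realMain owns the deferred cleanup and reports an exit code.
func realMain(fail bool, cleanup func()) int {
	defer cleanup()
	if err := runContainer(fail); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return 1
	}
	return 0
}

func main() {
	code := realMain(false, func() { fmt.Println("cleanup ran") })
	if code != 0 {
		os.Exit(code) // only reached after deferred functions have finished
	}
}
```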

How It All Works Together

1. When starting a container:

Create a new network namespace
Create virtual network interfaces (veth pair)
Configure IP addresses and routing
Set up NAT for internet access
Mount necessary filesystems and set up devices

2. During container runtime:

Container uses its virtual network interface for all network communication
Outgoing traffic goes through NAT to reach the internet
Incoming traffic is routed back to the container

3. When stopping a container:

Clean up iptables rules
Remove virtual interfaces
Unmount network namespace
Remove namespace files

Common Issues and Debugging

When implementing container networking, you might encounter these common issues:

1. DNS Resolution Problems

Our implementation includes DNS setup:

// In most cases this returns an error, and I am still trying to solve it.
// One likely cause: when it runs after chroot (as in setupContainer), the
// host path built from RootFS no longer exists, so the mount target is missing.
func setupDNS() error {
    resolvHost := "/etc/resolv.conf"
    resolvContainer := filepath.Join(RootFS, "etc/resolv.conf")
    return syscall.Mount(resolvHost, resolvContainer, "bind", 
        syscall.MS_BIND|syscall.MS_RDONLY, "")
}
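
An alternative to bind mounting is to write a resolv.conf into the root filesystem from the host side, before the chroot happens. This is only a sketch under stated assumptions: writeResolvConf and demoResolvConf are hypothetical helpers, the temporary directory stands in for RootFS, and 1.1.1.1 and 8.8.8.8 are example public resolvers rather than values used elsewhere in this article.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// writeResolvConf writes <rootfs>/etc/resolv.conf with the given nameservers.
// It must run before chroot, while rootfs is still a valid host path.
func writeResolvConf(rootfs string, nameservers []string) error {
	if len(nameservers) == 0 {
		return fmt.Errorf("no nameservers given")
	}
	etc := filepath.Join(rootfs, "etc")
	if err := os.MkdirAll(etc, 0755); err != nil {
		return err
	}
	var b strings.Builder
	for _, ns := range nameservers {
		fmt.Fprintf(&b, "nameserver %s\n", ns)
	}
	return os.WriteFile(filepath.Join(etc, "resolv.conf"), []byte(b.String()), 0644)
}

// demoResolvConf writes into a temporary directory and returns the file content.
func demoResolvConf() (string, error) {
	rootfs, err := os.MkdirTemp("", "rootfs-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(rootfs)
	if err := writeResolvConf(rootfs, []string{"1.1.1.1", "8.8.8.8"}); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(rootfs, "etc", "resolv.conf"))
	return string(data), err
}

func main() {
	content, err := demoResolvConf()
	if err != nil {
		panic(err)
	}
	fmt.Print(content)
}
```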

2. Network Interface Issues

Always check interface status with ip link show
Verify IP assignments with ip addr show
Check routing with ip route show

3. Connection Problems

Verify iptables rules are correctly set
Check IP forwarding is enabled
Ensure the host interface is up and working
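
The IP forwarding check can also be done from Go by reading the kernel's sysctl file directly. A small sketch; forwardingEnabled is a hypothetical helper, while /proc/sys/net/ipv4/ip_forward is the file behind the net.ipv4.ip_forward setting:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// forwardingEnabled interprets the content of /proc/sys/net/ipv4/ip_forward.
func forwardingEnabled(content string) bool {
	return strings.TrimSpace(content) == "1"
}

func main() {
	data, err := os.ReadFile("/proc/sys/net/ipv4/ip_forward")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read ip_forward:", err)
		return
	}
	if forwardingEnabled(string(data)) {
		fmt.Println("IPv4 forwarding is enabled")
	} else {
		fmt.Println("IPv4 forwarding is disabled; NAT for the container will not work")
	}
}
```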

Security Considerations

Our implementation includes several security features:

1. Network Isolation

Each container gets its own network namespace
Network traffic is isolated between containers

2. Resource Cleanup

Proper cleanup of network resources prevents resource leaks
Automatic cleanup on container exit

This networking implementation provides a solid foundation for container isolation while maintaining internet connectivity. While it’s simpler than production container runtimes, it demonstrates the core concepts of container networking.

This was the hard part for me, and I tried many implementations to get it working. You have to keep in mind what each command does and where it is executed. Sometimes you find yourself trying to create the veths inside the container, or you cannot connect, or you cannot move the container interface from the host to the child.

You should read my previous article to understand what we are doing. I cleaned up my code and added everything again to test network isolation.

Do not forget to change RootFS to your own root filesystem, such as Ubuntu or whatever image you want to run.

package main

import (
 "fmt"
 "log"
 "os"
 "os/exec"
 "os/signal"
 "path/filepath"
 "strings"
 "syscall"
)

const (
 ContainerName = "mycontainer"
 VethHost      = "veth0"
 VethContainer = "veth1"
 ContainerIP   = "10.0.0.2/24"
 HostIP        = "10.0.0.1/24"
 Gateway       = "10.0.0.1"
 RootFS        = "/mnt/drive/go-projects/lc-images-regs/ubuntu_fs"
)



type ResourceManager struct {
    containerName string
    vethHost      string
    mounts        []string
    namespaces    []string
}
func NewResourceManager(containerName string) *ResourceManager {
    return &ResourceManager{
        containerName: containerName,
        vethHost:     VethHost,
        mounts: []string{
            "/proc",
            "/dev/pts",
            "/dev",
        },
        namespaces: []string{
            "net",
            "uts",
            "pid",
            "ipc",
        },
    }
}

func (rm *ResourceManager) Setup() {
    // Set up signal handling
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

    go func() {
        sig := <-sigChan
        log.Printf("Received signal %v, cleaning up...", sig)
        rm.Cleanup()
        os.Exit(1)
    }()
}

func (rm *ResourceManager) Cleanup() error {
    var errors []string

    // 1. Clean up network resources
    if err := rm.cleanupNetwork(); err != nil {
        errors = append(errors, fmt.Sprintf("network cleanup error: %v", err))
    }

    // 2. Clean up mounts
    if err := rm.cleanupMounts(); err != nil {
        errors = append(errors, fmt.Sprintf("mount cleanup error: %v", err))
    }

    // 3. Clean up namespaces
    if err := rm.cleanupNamespaces(); err != nil {
        errors = append(errors, fmt.Sprintf("namespace cleanup error: %v", err))
    }

    if len(errors) > 0 {
        return fmt.Errorf("cleanup errors: %s", strings.Join(errors, "; "))
    }
    return nil
}

func (rm *ResourceManager) cleanupNetwork() error {
    // Clean up iptables rules first
    if err := rm.cleanupIptablesRules(); err != nil {
        log.Printf("Warning: iptables cleanup failed: %v", err)
    }

    // Clean up veth interfaces
    if out, err := exec.Command("ip", "link", "delete", rm.vethHost).CombinedOutput(); err != nil {
        log.Printf("Warning: failed to delete veth interface: %v (%s)", err, out)
    }

    return nil
}

func (rm *ResourceManager) cleanupIptablesRules() error {
    iface, err := getDefaultInterface()
    if err != nil {
        return fmt.Errorf("failed to get default interface: %v", err)
    }

    rules := [][]string{
        {"iptables", "-D", "FORWARD", "-i", iface, "-o", rm.vethHost, "-j", "ACCEPT"},
        {"iptables", "-D", "FORWARD", "-i", rm.vethHost, "-o", iface, "-j", "ACCEPT"},
        {"iptables", "-t", "nat", "-D", "POSTROUTING", "-s", "10.0.0.0/24", "-o", iface, "-j", "MASQUERADE"},
    }

    for _, rule := range rules {
        if out, err := exec.Command(rule[0], rule[1:]...).CombinedOutput(); err != nil {
            log.Printf("Warning: failed to remove iptables rule: %v (%s)", err, out)
        }
    }

    return nil
}

func (rm *ResourceManager) cleanupMounts() error {
    for _, mount := range rm.mounts {
        mountPath := filepath.Join(RootFS, mount)
        if err := syscall.Unmount(mountPath, syscall.MNT_DETACH); err != nil {
            log.Printf("Warning: failed to unmount %s: %v", mountPath, err)
        }
    }
    return nil
}

func (rm *ResourceManager) cleanupNamespaces() error {
    // Only the network namespace has a bind-mounted file under /var/run/netns;
    // the uts, pid and ipc namespaces disappear when the container process exits.
    nsPath := filepath.Join("/var/run/netns", rm.containerName)
    if _, err := os.Stat(nsPath); os.IsNotExist(err) {
        return nil // already cleaned up
    }
    if err := syscall.Unmount(nsPath, syscall.MNT_DETACH); err != nil {
        log.Printf("Warning: failed to unmount namespace %s: %v", nsPath, err)
    }
    if err := os.Remove(nsPath); err != nil {
        log.Printf("Warning: failed to remove namespace file %s: %v", nsPath, err)
    }
    return nil
}

func cleanupExistingResources() error {
 // Cleanup network namespace
 if _, err := os.Stat("/var/run/netns/" + ContainerName); err == nil {
  if err := cleanupNetworkNamespace(ContainerName); err != nil {
   return fmt.Errorf("failed to cleanup existing network namespace: %v", err)
  }
 }

 // Cleanup veth interfaces
 if _, err := exec.Command("ip", "link", "show", VethHost).CombinedOutput(); err == nil {
  if err := exec.Command("ip", "link", "delete", VethHost).Run(); err != nil {
   return fmt.Errorf("failed to delete existing veth interface: %v", err)
  }
 }

 // Cleanup iptables rules
 if err := cleanupIptablesRules(); err != nil {
  return fmt.Errorf("failed to cleanup iptables rules: %v", err)
 }

 return nil
}

func cleanupIptablesRules() error {
 iface, err := getDefaultInterface()
 if err != nil {
  return fmt.Errorf("failed to get default interface: %v", err)
 }

 cmds := [][]string{
  {"iptables", "-D", "FORWARD", "-i", iface, "-o", VethHost, "-j", "ACCEPT"},
  {"iptables", "-D", "FORWARD", "-i", VethHost, "-o", iface, "-j", "ACCEPT"},
  {"iptables", "-t", "nat", "-D", "POSTROUTING", "-s", "10.0.0.0/24", "-o", iface, "-j", "MASQUERADE"},
 }

 for _, cmd := range cmds {
  // Ignore errors since rules might not exist
  exec.Command(cmd[0], cmd[1:]...).Run()
 }

 return nil
}
func getDefaultInterface() (string, error) {
 out, err := exec.Command("ip", "route", "show", "default").CombinedOutput()
 if err != nil {
  return "", err
 }

 fields := strings.Fields(string(out))
 for i, field := range fields {
  if field == "dev" && i+1 < len(fields) {
   return fields[i+1], nil
  }
 }

 return "", fmt.Errorf("no default interface found")
}

func main() {
 if len(os.Args) < 2 {
  log.Fatal("Usage: [run|child] command [args...]")
 }

 switch os.Args[1] {
 case "run":
  run()
 case "child":
  child()
 default:
  log.Fatalf("unknown command: %s", os.Args[1])
 }
}
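
// setupCgroups applies CPU and memory limits to pid.
// Note: these paths and file names belong to cgroup v1. On a cgroup v2
// (unified) host the equivalents are cpu.weight, memory.max and cgroup.procs.
func setupCgroups(containerName string, pid int, cpuShares, memoryLimitMB int) error {
    cgroupBase := "/sys/fs/cgroup"
    pidStr := fmt.Sprintf("%d", pid)

    // write keeps the original best-effort behaviour but logs failures
    // instead of silently ignoring them.
    write := func(path, value string) {
        if err := os.WriteFile(path, []byte(value), 0644); err != nil {
            log.Printf("Warning: failed to write %s: %v", path, err)
        }
    }

    // Create CPU cgroup
    cpuPath := filepath.Join(cgroupBase, "cpu", containerName)
    if err := os.MkdirAll(cpuPath, 0755); err != nil {
        log.Printf("Warning: failed to create %s: %v", cpuPath, err)
    }
    write(filepath.Join(cpuPath, "cpu.shares"), fmt.Sprintf("%d", cpuShares))
    write(filepath.Join(cpuPath, "tasks"), pidStr)

    // Create memory cgroup
    memoryPath := filepath.Join(cgroupBase, "memory", containerName)
    if err := os.MkdirAll(memoryPath, 0755); err != nil {
        log.Printf("Warning: failed to create %s: %v", memoryPath, err)
    }
    write(filepath.Join(memoryPath, "memory.limit_in_bytes"), fmt.Sprintf("%d", memoryLimitMB*1024*1024))
    write(filepath.Join(memoryPath, "tasks"), pidStr)

    // Map root inside the container to the invoking user. These writes only
    // have an effect when the child is started with CLONE_NEWUSER, which
    // run() does not set, so here they normally just log a warning.
    write(fmt.Sprintf("/proc/%d/uid_map", pid), fmt.Sprintf("0 %d 1", os.Getuid()))
    write(fmt.Sprintf("/proc/%d/gid_map", pid), fmt.Sprintf("0 %d 1", os.Getgid()))
    return nil
}
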
func run() {
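 rm := NewResourceManager(ContainerName)
 rm.Setup()
 // Note: log.Fatalf exits via os.Exit, so this deferred cleanup is skipped
 // whenever one of the fatal errors below is hit.
 defer rm.Cleanup()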

 if err := cleanupExistingResources(); err != nil {
  log.Printf("Cleanup warning: %v", err)
 }

 // Create network namespace
 if err := createNetworkNamespace(ContainerName); err != nil {
  log.Fatalf("Failed to create network namespace: %v", err)
 }

 // Start container process
 cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
 cmd.Stdin = os.Stdin
 cmd.Stdout = os.Stdout
 cmd.Stderr = os.Stderr
 cmd.SysProcAttr = &syscall.SysProcAttr{
  Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS |
   syscall.CLONE_NEWNET | syscall.CLONE_NEWIPC,
 }

 if err := cmd.Start(); err != nil {
  log.Fatalf("Failed to start container: %v", err)
 }
 pid := cmd.Process.Pid
 // Example: 512 CPU shares, 256 MB memory limit
 if err := setupCgroups(ContainerName, pid, 512, 256); err != nil {
  log.Fatalf("Failed to setup cgroups: %v", err)
 }
 // Configure host-side networking
 if err := setupHostNetwork(cmd.Process.Pid); err != nil {
  log.Fatalf("Failed to setup host network: %v", err)
 }

 // Wait for container to exit
 if err := cmd.Wait(); err != nil {
  log.Fatalf("Container failed: %v", err)
 }

 // Cleanup
 if err := cleanupNetworkNamespace(ContainerName); err != nil {
  log.Printf("Failed to cleanup network namespace: %v", err)
 }
}

func child() {
 // Setup container environment
 if err := setupContainer(); err != nil {
  log.Fatalf("Failed to setup container: %v", err)
 }

 // Execute command
 if len(os.Args) < 3 {
  log.Fatal("No command specified")
 }
 cmd := exec.Command(os.Args[2], os.Args[3:]...)
 cmd.Stdin = os.Stdin
 cmd.Stdout = os.Stdout
 cmd.Stderr = os.Stderr
 if err := cmd.Run(); err != nil {
  log.Fatalf("Command failed: %v", err)
 }
}

func createNetworkNamespace(name string) error {
 // Create bind mount for ip netns compatibility
 if err := os.MkdirAll("/var/run/netns", 0755); err != nil {
  return err
 }

 // Create namespace file
 nsFile := filepath.Join("/var/run/netns", name)
 fd, err := os.Create(nsFile)
 if err != nil {
  return err
 }
 fd.Close()

 // Bind mount
 return syscall.Mount("/proc/self/ns/net", nsFile, "bind", syscall.MS_BIND, "")
}

func cleanupNetworkNamespace(name string) error {
 nsFile := filepath.Join("/var/run/netns", name)
 if err := syscall.Unmount(nsFile, 0); err != nil {
  return fmt.Errorf("failed to unmount network namespace: %v", err)
 }
 // Remove the file to complete cleanup.
 if err := os.Remove(nsFile); err != nil {
  return fmt.Errorf("failed to remove namespace file %s: %v", nsFile, err)
 }
 return nil
}


func setupHostNetwork(pid int) error {
 // Get host's default interface
 iface, err := getDefaultInterface()
 if err != nil {
  return fmt.Errorf("failed to get default interface: %v", err)
 }

 // Create veth pair
 if err := exec.Command("ip", "link", "add", VethHost, "type", "veth", "peer", "name", VethContainer).Run(); err != nil {
  return fmt.Errorf("failed to create veth pair: %v", err)
 }

 // Move container end to namespace
 if err := exec.Command("ip", "link", "set", VethContainer, "netns", fmt.Sprintf("%d", pid)).Run(); err != nil {
  return fmt.Errorf("failed to move veth to container: %v", err)
 }

 // Configure host interface
 if err := exec.Command("ip", "link", "set", VethHost, "up").Run(); err != nil {
  return fmt.Errorf("failed to bring up host interface: %v", err)
 }
 if err := exec.Command("ip", "addr", "add", HostIP, "dev", VethHost).Run(); err != nil {
  return fmt.Errorf("failed to assign IP to host interface: %v", err)
 }

 cmds := [][]string{

  {"sysctl", "-w", "net.ipv4.ip_forward=1"},
  {"iptables", "-t", "nat", "-A", "POSTROUTING", "-s", "10.0.0.0/24", "-o", iface, "-j", "MASQUERADE"},
  {"iptables", "-A", "FORWARD", "-i", iface, "-o", VethHost, "-j", "ACCEPT"},
  {"iptables", "-A", "FORWARD", "-i", VethHost, "-o", iface, "-j", "ACCEPT"},
 }

 for _, cmd := range cmds {
  if out, err := exec.Command(cmd[0], cmd[1:]...).CombinedOutput(); err != nil {
   return fmt.Errorf("failed %v: %s\n%s", cmd, err, out)
  }
 }

 return nil
}

func setupContainer() error {
 // Setup root filesystem
 if err := syscall.Chroot(RootFS); err != nil {
  return fmt.Errorf("chroot failed: %v", err)
 }
 if err := os.Chdir("/"); err != nil {
  return fmt.Errorf("chdir failed: %v", err)
 }

 // Mount proc
 if err := syscall.Mount("proc", "/proc", "proc", 0, ""); err != nil {
  return fmt.Errorf("failed to mount proc: %v", err)
 }

 // Setup devices
 if err := setupDevices(); err != nil {
  return fmt.Errorf("failed to setup devices: %v", err)
 }

 // Configure network
 if err := setupContainerNetwork(); err != nil {
  return fmt.Errorf("failed to setup network: %v", err)
 }

 //if err := setupDNS(); err != nil {
 // return fmt.Errorf("DNS setup failed: %v", err)
 //}

 return nil
}

func setupDNS() error {
 // Copy host's resolv.conf
 resolvHost := "/etc/resolv.conf"
 resolvContainer := filepath.Join(RootFS, "etc/resolv.conf")

 // Create container's /etc if missing
 if err := os.MkdirAll(filepath.Join(RootFS, "etc"), 0755); err != nil {
  return err
 }

 // Bind mount host's resolv.conf
 return syscall.Mount(resolvHost, resolvContainer, "bind", syscall.MS_BIND|syscall.MS_RDONLY, "")
}

func setupDevices() error {
 // Mount tmpfs for /dev
 if err := syscall.Mount("tmpfs", "/dev", "tmpfs", 0, "size=64k,mode=755"); err != nil {
  return err
 }

 // Create /dev/pts directory if missing
 devPts := "/dev/pts"
 if err := os.MkdirAll(devPts, 0755); err != nil {
  return fmt.Errorf("mkdir %s failed: %v", devPts, err)
 }

 // Mount devpts on /dev/pts for pty support
 if err := syscall.Mount("devpts", devPts, "devpts", 0, "mode=0620,ptmxmode=0666"); err != nil {
  return fmt.Errorf("failed to mount devpts on %s: %v", devPts, err)
 }
 // Create basic devices
 devices := []struct {
  name  string
  major uint32
  minor uint32
 }{
  {"null", 1, 3},
  {"zero", 1, 5},
  {"random", 1, 8},
  {"urandom", 1, 9},
 }

 for _, dev := range devices {
  path := filepath.Join("/dev", dev.name)
  if err := syscall.Mknod(path, syscall.S_IFCHR|0666, int(makedev(dev.major, dev.minor))); err != nil {
   return err
  }
 }

 return nil
}

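// makedev uses the legacy 8-bit major/minor encoding, which is enough for
// the small device numbers created above.
func makedev(major, minor uint32) uint64 {
 return (uint64(major) << 8) | uint64(minor)
}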

func setupContainerNetwork() error {
 // Bring up loopback
 if err := exec.Command("ip", "link", "set", "lo", "up").Run(); err != nil {
  return fmt.Errorf("failed to bring up lo: %v", err)
 }

 // Configure veth interface
 if err := exec.Command("ip", "link", "set", VethContainer, "up").Run(); err != nil {
  return fmt.Errorf("failed to bring up veth: %v", err)
 }
 if err := exec.Command("ip", "addr", "add", ContainerIP, "dev", VethContainer).Run(); err != nil {
  return fmt.Errorf("failed to assign IP to veth: %v", err)
 }
 if err := exec.Command("ip", "route", "add", "default", "via", Gateway).Run(); err != nil {
  return fmt.Errorf("failed to add default route: %v", err)
 }

 return nil
}

Important point: You must mount and create essential virtual devices and establish communication (such as pipes or signals) between the host and child container.

func setupDevices() error {
 // Mount tmpfs for /dev
 if err := syscall.Mount("tmpfs", "/dev", "tmpfs", 0, "size=64k,mode=755"); err != nil {
  return err
 }

 // Create /dev/pts directory if missing
 devPts := "/dev/pts"
 if err := os.MkdirAll(devPts, 0755); err != nil {
  return fmt.Errorf("mkdir %s failed: %v", devPts, err)
 }

 // Mount devpts on /dev/pts for pty support
 if err := syscall.Mount("devpts", devPts, "devpts", 0, "mode=0620,ptmxmode=0666"); err != nil {
  return fmt.Errorf("failed to mount devpts on %s: %v", devPts, err)
 }
 // Create basic devices
 devices := []struct {
  name  string
  major uint32
  minor uint32
 }{
  {"null", 1, 3},
  {"zero", 1, 5},
  {"random", 1, 8},
  {"urandom", 1, 9},
 }

 for _, dev := range devices {
  path := filepath.Join("/dev", dev.name)
  if err := syscall.Mknod(path, syscall.S_IFCHR|0666, int(makedev(dev.major, dev.minor))); err != nil {
   return err
  }
 }

 return nil
}
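
The makedev helper in the listing shifts the major number by 8 bits, which is fine for the four devices created here. If you ever need devices with a major or minor number above 255, the full Linux encoding spreads the bits differently. The sketch below follows the layout used by glibc; mkdev is a hypothetical name for illustration.

```go
package main

import "fmt"

// mkdev encodes a device number using the full Linux layout, so major and
// minor numbers above 255 are handled as well.
func mkdev(major, minor uint32) uint64 {
	dev := (uint64(major) & 0x00000fff) << 8
	dev |= (uint64(major) & 0xfffff000) << 32
	dev |= uint64(minor) & 0x000000ff
	dev |= (uint64(minor) & 0xffffff00) << 12
	return dev
}

func main() {
	// For small numbers the result matches the simple (major << 8) | minor form.
	fmt.Printf("null:    %d\n", mkdev(1, 3))
	fmt.Printf("urandom: %d\n", mkdev(1, 9))
	// A minor above 255 no longer fits the simple form.
	fmt.Printf("8:256:   %d\n", mkdev(8, 256))
}
```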
func NewResourceManager(containerName string) *ResourceManager {
    return &ResourceManager{
        containerName: containerName,
        vethHost:     VethHost,
        mounts: []string{
            "/proc",
            "/dev/pts",
            "/dev",
        },
        namespaces: []string{
            "net",
            "uts",
            "pid",
            "ipc",
        },
    }
}

func (rm *ResourceManager) Setup() {
    // Set up signal handling
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

    go func() {
        sig := <-sigChan
        log.Printf("Received signal %v, cleaning up...", sig)
        rm.Cleanup()
        os.Exit(1)
    }()
}

Now you have a broad overview, but you still have a long journey ahead to achieve what production-ready container runtime systems offer.

If you need a root filesystem image to test your code, you can use Docker to export one:

$ docker run -d --rm --name ubuntu_fs ubuntu:20.04 sleep 1000
$ mkdir -p ./ubuntu_fs
$ docker cp ubuntu_fs:/ ./ubuntu_fs
$ docker stop ubuntu_fs

Or use a tool like debootstrap:

sudo apt-get update
sudo apt-get install debootstrap
sudo mkdir -p /path/to/rootfs
sudo debootstrap stable /path/to/rootfs http://deb.debian.org/debian

Sometimes, while testing, you may need to install software in your container image from the host if your child container struggles to access the internet. For example, with an Alpine-based root filesystem (apk is Alpine's package manager):

sudo chroot /path/to/rootfs /bin/sh -c "apk add --no-cache iproute2"

Or, for ubuntu_fs:

sudo chroot /mnt/drive/go-projects/lc-images-regs/ubuntu_fs /bin/sh -c "apt-get update && apt-get install -y iproute2"

Note: Sometimes, when you try to start the container by running the following command to start Bash as the entry point, you may encounter a bug:

sudo go run Main.go run sudo /bin/bash

The output looks like this:

2025/02/21 00:26:28 Failed to setup container: failed to setup network: failed to bring up veth: exit status 1
2025/02/21 01:26:28 Failed to setup host network: failed to assign IP to host interface: exit status 1
exit status 1

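This happens due to resource cleanup errors. Two details of the listing above can contribute: log.Fatalf exits through os.Exit, so the deferred rm.Cleanup() never runs after a fatal error, and the child can try to bring up veth1 before the host has moved it into the container's namespace. You can either ignore it and retry the command up to three times, or fix the issue.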

You still need to implement DNS to align with the original system design. What we built is just a proof of concept application.

My next step is to ensure resource limitations and create an image composer like Docker while utilizing OverlayFS. Until then, if you need any help, feel free to DM me.

I have a Discord channel dedicated to this topic; join me:

https://discord.gg/GX4JuVtD

Happy coding, everybody!
