Mikhail [azalio] Petrov
Complete Guide: Cilium L2 Announcements for LoadBalancer Services in Bare-Metal Kubernetes

Table of Contents

  1. Introduction
  2. Environment Setup
  3. Deploy Nginx
  4. Configure Ingress
  5. Configure LoadBalancer IP Pool
  6. ARP Issue
  7. Enable L2 Announcements
  8. How It Works
  9. Packet Path
  10. Additional Resources
  11. Conclusion

Introduction: Why L2 Announcements in Kubernetes?

Target Audience

This material is intended for:

  • Kubernetes cluster administrators
  • Network engineers working with Cilium technologies (eBPF, CNI)
  • Infrastructure specialists familiar with L2/L3 networking basics
  • Professionals interested in Kubernetes networking

📚 Terminology (recommended background)

Gratuitous ARP - An ARP packet a node broadcasts unprompted, typically after its MAC or IP address changes, to update the ARP caches of its neighbors.

Kubernetes Lease - A coordination.k8s.io API object that grants a named holder temporary, exclusive ownership of a resource (here, the right to answer ARP for an IP address), renewed periodically by the holder.

Bare-Metal Kubernetes - A cluster deployed on physical servers, without a cloud provider (and therefore without cloud load balancers).
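For reference, a Lease is an ordinary Kubernetes object. A minimal sketch of the lease that appears later in this guide (durations and counters are illustrative):

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: cilium-l2announce-default-cilium-ingress-basic-ingress
  namespace: kube-system
spec:
  holderIdentity: node-1     # the node currently announcing the IP
  leaseDurationSeconds: 15   # claim expires if not renewed in time
  leaseTransitions: 0        # increments on every failover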

Problem Statement

When working with Kubernetes in bare-metal environments, accessing LoadBalancer services from external networks often becomes challenging. Traditional solutions like MetalLB require additional components and can be overkill for simple scenarios.

Cilium provides native L2 announcement capabilities with:

  • ARP response handling for LoadBalancer IPs
  • High availability through lease mechanism
  • Integration with existing network infrastructure

This guide covers:

  1. Configuring L2 announcements in Cilium
  2. ARP response mechanics through BPF
  3. Practical tips for working with Cilium

The primary goal is to demonstrate Kubernetes service accessibility in bare-metal environments using Cilium's native capabilities.

Environment Setup

Environment setup is described in the repository's README.md.

Network diagram:

[Image: network scheme]

  • Pod Subnets:
    • 10.200.0.0/24 for server
    • 10.200.1.0/24 for node-0
    • 10.200.2.0/24 for node-1

Kubernetes and Cilium Configuration

The VMs are created on an ARM Mac using Vagrant.

jumpbox - 192.168.56.10 - a client outside the Kubernetes cluster.

[Image: jumpbox networking]

server - 192.168.56.20 - control plane

[Image: server networking]

node-0 - 192.168.56.50 - k8s node

[Image: node-0 networking]

node-1 - 192.168.56.60 - k8s node

[Image: node-1 networking]

Cilium is installed in native routing mode (no tunnels), version v1.16.5:

helm upgrade --install cilium cilium/cilium --version 1.16.5 --namespace kube-system \
  --set l2announcements.enabled=true \
  --set externalIPs.enabled=true \
  --set kubeProxyReplacement=true \
  --set ipam.mode=kubernetes \
  --set k8sServiceHost=192.168.56.20 \
  --set k8sServicePort=6443 \
  --set operator.replicas=1 \
  --set routingMode=native \
  --set ipv4NativeRoutingCIDR=10.200.0.0/22 \
  --set endpointRoutes.enabled=true \
  --set ingressController.enabled=true \
  --set ingressController.loadbalancerMode=dedicated 
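Before moving on, it is worth confirming these settings landed in the agent configuration. A quick check against the cilium-config ConfigMap (the key names below match recent Cilium releases):

# kubectl -n kube-system get configmap cilium-config -o yaml | grep -E 'l2-announcements|routing-mode'

Expect enable-l2-announcements: "true" and routing-mode: native.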

Kernel info:

# uname -a
Linux node-1 6.1.0-20-arm64 #1 SMP Debian 6.1.85-1 (2024-04-11) aarch64 aarch64 aarch64 GNU/Linux

Deploy Nginx

In this section we'll deploy a simple Nginx web server.

Step 1. Create Deployment

# vagrant ssh server
# sudo bash

cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
EOF

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
EOF

Step 2. Verify Deployment

Check pod and service status:

# kubectl get pod -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
nginx-96b9d695-25swg   1/1     Running   0          55s   10.200.2.189   node-1   <none>           <none>

# kubectl get svc nginx
NAME    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.96.79.80   <none>        80/TCP    89s

# curl 10.96.79.80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

# curl 10.200.2.189
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
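As an extra sanity check, confirm the Service actually resolves to the pod IP:

# kubectl get endpoints nginx

The ENDPOINTS column should list the pod address, e.g. 10.200.2.189:80.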

↑ Table of Contents | ← Back

Configure Ingress

See the Cilium ingress controller documentation for background.

Apply ingress manifest:

cat << EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: basic-ingress
  namespace: default
spec:
  ingressClassName: cilium
  rules:
  - http:
      paths:
      - backend:
          service:
            name: nginx
            port:
              number: 80
        path: /
        pathType: Prefix
EOF

# kubectl get ingress
NAME            CLASS    HOSTS   ADDRESS   PORTS   AGE
basic-ingress   cilium   *                 80      40s

Because the ingress controller runs in dedicated load-balancer mode, Cilium creates a dedicated LoadBalancer service for the ingress:

# kubectl get svc cilium-ingress-basic-ingress
NAME                           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
cilium-ingress-basic-ingress   LoadBalancer   10.96.156.194   <pending>     80:31017/TCP,443:32600/TCP   115s

The EXTERNAL-IP remains <pending> because no LoadBalancer IP pool has been configured yet.

↑ Table of Contents | ← Back

Configure LoadBalancer IP Pool

Cilium can assign IPs to LoadBalancer services natively through the CiliumLoadBalancerIPPool resource (see the official documentation).

Create a CiliumLoadBalancerIPPool:

cat << EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "blue-pool"
spec:
  blocks:
  - cidr: "10.0.10.0/24"
EOF
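A pool can also be scoped to particular services via a label selector. A minimal sketch, assuming a hypothetical label on the target service (serviceSelector and allowFirstLastIPs are fields of the CiliumLoadBalancerIPPool spec):

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "ingress-pool"
spec:
  blocks:
  - cidr: "10.0.10.0/24"
  allowFirstLastIPs: "No"        # would exclude 10.0.10.0 and 10.0.10.255
  serviceSelector:
    matchLabels:
      example.io/pool: ingress   # hypothetical label

Note that in this guide the service received 10.0.10.0, the first address of the block, so first/last addresses were evidently allowed in this setup.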

Now the service gets an IP:

# kubectl get svc cilium-ingress-basic-ingress
NAME                           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
cilium-ingress-basic-ingress   LoadBalancer   10.96.156.194   10.0.10.0     80:31017/TCP,443:32600/TCP   4m9s

The service is now reachable from a cluster node:

# curl 10.0.10.0
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

↑ Table of Contents | ← Back

ARP Issue

Testing from jumpbox client:

# vagrant ssh jumpbox
root@jumpbox:/home/vagrant# curl 10.0.10.0
curl: (7) Failed to connect to 10.0.10.0 port 80 after 3074 ms: Couldn't connect to server

Why? Nothing on the network answers ARP requests for 10.0.10.0 yet:

root@server:/home/vagrant# tcpdump -n -i any arp host 10.0.10.0
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

21:15:05.927064 eth1  B   ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:06.948513 eth1  B   ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:07.973210 eth1  B   ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:08.998950 eth1  B   ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:10.024080 eth1  B   ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:11.050053 eth1  B   ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46

root@jumpbox:/home/vagrant# arp -n 10.0.10.0
Address                  HWtype  HWaddress           Flags Mask            Iface
10.0.10.0                        (incomplete)                              eth1

↑ Table of Contents | ← Back

Enable L2 Announcements

Now enable L2 announcements so that a node starts answering ARP requests for the LoadBalancer IPs (see the Cilium L2 Announcements documentation).

root@server:/home/vagrant# tcpdump -n -i any arp host 10.0.10.0 & # background tcpdump
[1] 17207

root@server:/home/vagrant# cat << EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  interfaces:
  - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true
EOF
ciliuml2announcementpolicy.cilium.io/policy1 created

21:18:52.093372 eth1  B   ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:76, length 46
21:18:52.102795 eth0  B   ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:6c, length 46

root@jumpbox:/home/vagrant# tcpdump -n -i any arp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

21:18:52.113122 eth1  B   ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:76, length 46
21:18:52.113211 eth1  B   ARP, Reply 10.0.10.1 is-at 00:0c:29:e3:b1:b2, length 46
21:18:52.122245 eth0  B   ARP, Reply 10.0.10.1 is-at 00:0c:29:e3:b1:a8, length 46
21:18:52.122495 eth0  B   ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:6c, length 46

root@jumpbox:/home/vagrant# arp -n 10.0.10.0
Address                  HWtype  HWaddress           Flags Mask            Iface
10.0.10.0                ether   00:0c:29:0d:b7:76   C                     eth1

As soon as the policy is applied, the new lease holder broadcasts gratuitous ARP replies for the announced IPs; since the policy's ^eth[0-9]+ regex matches both interfaces, replies go out on eth0 and eth1. The MAC 00:0c:29:0d:b7:76 belongs to node-1's eth1 interface:

root@node-1:/home/vagrant# ip addr sh | grep -1 00:0c:29:0d:b7:76
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:0d:b7:76 brd ff:ff:ff:ff:ff:ff
    altname enp18s0

Verify lease acquisition:

# kubectl get lease -n kube-system cilium-l2announce-default-cilium-ingress-basic-ingress
NAME                                                     HOLDER   AGE
cilium-l2announce-default-cilium-ingress-basic-ingress   node-1   8m49s
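To see the full claim (holder, duration, last renewal), inspect the Lease object directly; these are standard coordination.k8s.io/v1 fields:

# kubectl get lease -n kube-system cilium-l2announce-default-cilium-ingress-basic-ingress -o yaml

In the output, spec.holderIdentity names the announcing node, spec.leaseDurationSeconds bounds how long a claim survives without renewal, and spec.leaseTransitions counts failovers.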

↑ Table of Contents | ← Back

How It Works

[Image: ARP announcement path]

  1. L2 Policy Configuration
  2. Lease Acquisition
  3. BPF Map for ARP

For detailed documentation see Cilium L2 Announcements.

L2 Policy Configuration

  1. Configure the L2 announcement policy:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  interfaces:
  - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true
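Failover speed is governed by lease timing, which can be tuned at install time. A sketch using the l2announcements Helm values (the figures below mirror the example in the Cilium docs; the defaults are more conservative):

helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set l2announcements.leaseDuration=3s \
  --set l2announcements.leaseRenewDeadline=1s \
  --set l2announcements.leaseRetryPeriod=200ms

Shorter intervals mean faster failover but more API server traffic, since every lease renewal is an API request; the k8sClientRateLimit.qps and k8sClientRateLimit.burst values may need raising to match.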

Lease Acquisition

  2. Cilium agents compete for a per-service lease; the winner announces the IP:
root@server:/home/vagrant# kubectl get lease -n kube-system | grep l2announce
cilium-l2announce-default-cilium-ingress-basic-ingress   node-1                                                                      16m
cilium-l2announce-kube-system-cilium-ingress             node-0                                                                      16m

BPF Map for ARP

  3. The lease holder programs a BPF map that answers ARP requests for the service IP:
# kubectl get svc cilium-ingress-basic-ingress
NAME                           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
cilium-ingress-basic-ingress   LoadBalancer   10.96.156.194   10.0.10.0     80:31017/TCP,443:32600/TCP   24h
root@node-1:/home/cilium# bpftool map show pinned /sys/fs/bpf/tc/globals/cilium_l2_responder_v4
72: hash  name cilium_l2_respo  flags 0x1
    key 8B  value 8B  max_entries 4096  memlock 65536B
    btf_id 125
root@node-1:/home/cilium# bpftool map dump pinned /sys/fs/bpf/tc/globals/cilium_l2_responder_v4
[{
        "key": {
            "ip4": 655370, # IP
            "ifindex": 2   # Interface index
        },
        "value": {
            "responses_sent": 0
        }
    },{
        "key": {
            "ip4": 655370,
            "ifindex": 3
        },
        "value": {
            "responses_sent": 3 # ARP responses count
        }
    }
]

The number 655370 is IP 10.0.10.0 stored as a little-endian 32-bit integer: reading the bytes 10.0.10.0 least-significant-first gives 10 + 0*256 + 10*65536 + 0*16777216 = 655370.
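You can reproduce the conversion with a one-liner (plain Python, nothing Cilium-specific):

# python3 -c "import socket,struct; print(struct.unpack('<I', socket.inet_aton('10.0.10.0'))[0])"
655370

inet_aton yields the four address bytes in network order; unpacking them as a little-endian unsigned int ('<I') gives the value stored in the map key.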

Packet flow through ARP response system:

[Image: ARP request/reply flow]

↑ Table of Contents | ← Back

Packet Path

ARP request processing on node-1:

 +-------------------+
 |   External Host   |
 |    (jumpbox)      |
 +--------+----------+
          | ARP Request "Who has 10.0.10.0?"
          v
 +-------------------+
 |   Interface eth1  |
 |     node-1        |
 +--------+----------+
          | TC ingress
          v
 +-------------------+
 |  BPF Program      |
 | cil_from_netdev   |
 +--------+----------+
          | handle_netdev
          v
 +-------------------+
 |  do_netdev()      |
 +--------+----------+
          | ARP check
          v
 +-------------------+
 | handle_l2_announcement() |
 +--------+----------+
          | Checks:
          | 1. Agent liveness
          | 2. Valid ARP
          | 3. L2_RESPONDER_MAP4 entry
          v
 +-------------------+
 | arp_respond()     |
 +--------+----------+
          | Prepare ARP reply
          v
 +-------------------+
 | ctx_redirect()    |
 +--------+----------+
          | Egress redirect
          v
 +-------------------+
 |   Interface eth1  |
 |     node-1        |
 +--------+----------+
          | ARP Reply
          v
 +-------------------+
 |   External Host   |
 |    (jumpbox)      |
 +-------------------+

↑ Table of Contents | ← Back

Additional Resources

Cilium L2 Announcements vs Proxy ARP

Q: How do Cilium L2 announcements differ from enabling proxy_arp?

A: Proxy ARP is a kernel feature: it answers ARP requests for any address the kernel can route, with no notion of services or failover. Cilium's L2 announcements answer only for IPs present in the BPF responder map, coordinate which node responds through Kubernetes Leases (giving high availability), and generate replies directly in eBPF.

First, disable L2 announcements by deleting the policy:

root@server:/home/vagrant# kubectl delete -f workshop/l2.yaml
ciliuml2announcementpolicy.cilium.io "policy1" deleted

Enable proxy_arp on node-1:

root@node-1:/home/vagrant# sysctl -w net.ipv4.conf.eth1.proxy_arp=1
net.ipv4.conf.eth1.proxy_arp = 1

With proxy_arp enabled, the ARP replies now come from the kernel. Note the timeouts and the ~450 ms response time: the kernel deliberately delays proxy ARP replies (governed by the net.ipv4.neigh.<iface>.proxy_delay sysctl), whereas Cilium's BPF responder answers immediately:

root@jumpbox:/home/vagrant# arping -I eth1 10.0.10.0
ARPING 10.0.10.0
Timeout
Timeout
Timeout
Timeout
60 bytes from 00:0c:29:0d:b7:76 (10.0.10.0): index=0 time=449.655 msec
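To undo the experiment, switch proxy ARP back off:

root@node-1:/home/vagrant# sysctl -w net.ipv4.conf.eth1.proxy_arp=0
net.ipv4.conf.eth1.proxy_arp = 0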

Cached vs Non-Cached BPF Maps

cilium-dbg map list reports entry counts only for maps with caching enabled; cilium_l2_responder_v4 is uncached, so it shows zero entries even when populated:

root@node-1:/home/cilium# cilium-dbg map list
Name                       Num entries   Num errors   Cache enabled
cilium_policy_00215        3             0            true
# ...
cilium_l2_responder_v4     0             0            false

Actual map contents via bpftool:

root@node-1:/home/cilium# bpftool map dump pinned /sys/fs/bpf/tc/globals/cilium_l2_responder_v4
[{
        "key": {
            "ip4": 655370,
            "ifindex": 2
        },
        "value": {
            "responses_sent": 0
        }
    },{
        "key": {
            "ip4": 655370,
            "ifindex": 3
        },
        "value": {
            "responses_sent": 40
        }
    }
]
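With the L2 announcement policy active, the responses_sent counters double as an end-to-end check: fire a few ARP requests from the jumpbox and re-dump the map; the counter for the announcing interface (ifindex 3, eth1 here) should grow accordingly:

root@jumpbox:/home/vagrant# arping -I eth1 -c 5 10.0.10.0
root@node-1:/home/cilium# bpftool map dump pinned /sys/fs/bpf/tc/globals/cilium_l2_responder_v4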

↑ Table of Contents | ← Back

Conclusion: Benefits of Cilium L2 Announcements

This guide demonstrated Cilium's L2 announcement implementation for bare-metal Kubernetes LoadBalancer services. Key aspects covered:

  • ARP request handling via BPF programs
  • High availability through lease system
  • Diagnostic techniques

Key advantages:

  1. Native Kubernetes integration
  2. eBPF-based packet processing performance
  3. Automatic distribution of announced services across nodes (one lease holder per service)
  4. No external dependencies
  5. Standard L2 protocol (ARP) support

This approach is particularly valuable for hybrid and on-premise environments that need cloud-like service accessibility while retaining bare-metal flexibility.

↑ Table of Contents
