Table of Contents
- Introduction
- Environment Setup
- Deploy Nginx
- Configure Ingress
- Configure LoadBalancer IP Pool
- ARP Issue
- Enable L2 Announcements
- How It Works
- Packet Path
- Additional Resources
Introduction: Why L2 Announcements in Kubernetes?
Target Audience
This material is intended for:
- Kubernetes cluster administrators
- Network engineers working with Cilium technologies (eBPF, CNI)
- Infrastructure specialists familiar with L2/L3 networking basics
- Professionals interested in Kubernetes networking
Terminology (recommended reading)
Gratuitous ARP - an ARP packet a host broadcasts without being asked, typically after its IP or MAC address changes (or after a failover), so that neighbors update their ARP caches.
Kubernetes Lease - a coordination.k8s.io API object used for leader election; here it grants exactly one node the right to announce a given IP address.
Bare-Metal Kubernetes - a cluster deployed on physical servers, without a cloud provider's load balancers.
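For illustration only (not part of the original setup): a gratuitous ARP can be sent by hand with the iputils arping tool, which is essentially what Cilium does when a node starts announcing an IP. Assuming eth1 is the interface carrying the address (10.0.10.0 is the LoadBalancer IP used later in this guide):
# send one unsolicited (gratuitous) ARP announcing 10.0.10.0 on eth1
arping -U -I eth1 -c 1 10.0.10.0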
Problem Statement
When working with Kubernetes in bare-metal environments, accessing LoadBalancer services from external networks often becomes challenging. Traditional solutions like MetalLB require additional components and can be overkill for simple scenarios.
Cilium provides native L2 announcement capabilities with:
- ARP response handling for LoadBalancer IPs
- High availability through lease mechanism
- Integration with existing network infrastructure
This guide covers:
- Configuring L2 announcements in Cilium
- ARP response mechanics through BPF
- Practical tips for working with Cilium
The primary goal is to demonstrate Kubernetes service accessibility in bare-metal environments using Cilium's native capabilities.
Environment Setup
Environment setup is described in the README.md
Network diagram:
Pod subnets:
- 10.200.0.0/24 for server
- 10.200.1.0/24 for node-0
- 10.200.2.0/24 for node-1
Kubernetes and Cilium Configuration
VMs created on ARM Mac using Vagrant.
jumpbox - 192.168.56.10 - client outside Kubernetes cluster.
server - 192.168.56.20 - control plane
node-0 - 192.168.56.50 - k8s node
node-1 - 192.168.56.60 - k8s node
Cilium v1.16.5 is installed with native routing (no tunnels):
helm upgrade --install cilium cilium/cilium --version 1.16.5 --namespace kube-system \
--set l2announcements.enabled=true \
--set externalIPs.enabled=true \
--set kubeProxyReplacement=true \
--set ipam.mode=kubernetes \
--set k8sServiceHost=192.168.56.20 \
--set k8sServicePort=6443 \
--set operator.replicas=1 \
--set routingMode=native \
--set ipv4NativeRoutingCIDR=10.200.0.0/22 \
--set endpointRoutes.enabled=true \
--set ingressController.enabled=true \
--set ingressController.loadbalancerMode=dedicated
Kernel info:
# uname -a
Linux node-1 6.1.0-20-arm64 #1 SMP Debian 6.1.85-1 (2024-04-11) aarch64 aarch64 aarch64 GNU/Linux
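As a quick sanity check (an extra step, not in the original write-up), the agent's status can be queried from any Cilium pod to confirm that kube-proxy replacement and native routing are active; the exact wording of the output differs between versions:
# kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep -iE 'kubeproxyreplacement|routing'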
Deploy Nginx
In this section we'll deploy a simple Nginx web server.
Step 1. Create Deployment
# vagrant ssh server
# sudo bash
cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
EOF
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
EOF
Step 2. Verify Deployment
Check pod and service status:
# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-96b9d695-25swg 1/1 Running 0 55s 10.200.2.189 node-1 <none> <none>
# kubectl get svc nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx ClusterIP 10.96.79.80 <none> 80/TCP 89s
# curl 10.96.79.80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
# curl 10.200.2.189
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
↑ Table of Contents | ← Back
Configure Ingress
Apply ingress manifest:
cat << EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: basic-ingress
  namespace: default
spec:
  ingressClassName: cilium
  rules:
  - http:
      paths:
      - backend:
          service:
            name: nginx
            port:
              number: 80
        path: /
        pathType: Prefix
EOF
# kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
basic-ingress cilium * 80 40s
Because the ingress controller runs in dedicated load balancer mode, Cilium creates a dedicated LoadBalancer service for this Ingress:
# kubectl get svc cilium-ingress-basic-ingress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cilium-ingress-basic-ingress LoadBalancer 10.96.156.194 <pending> 80:31017/TCP,443:32600/TCP 115s
The EXTERNAL-IP remains in the pending state due to the missing IP pool configuration.
↑ Table of Contents | ← Back
Configure LoadBalancer IP Pool
Cilium can assign external IPs to LoadBalancer services from dedicated address pools.
CiliumLoadBalancerIPPool
Create a CiliumLoadBalancerIPPool:
cat << EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "blue-pool"
spec:
  blocks:
  - cidr: "10.0.10.0/24"
EOF
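Two optional extras, added here as suggestions rather than steps from the original walkthrough: the pool can be listed to see how many addresses remain available, and a pool can also be declared with an explicit start/stop range (hypothetical name green-pool) if you prefer not to hand out the .0 network address, which is what blue-pool ends up doing below:
# kubectl get ciliumloadbalancerippools.cilium.io
cat << EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "green-pool"
spec:
  blocks:
  - start: "10.0.10.1"
    stop: "10.0.10.254"
EOF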
Now the service gets an IP:
# kubectl get svc cilium-ingress-basic-ingress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cilium-ingress-basic-ingress LoadBalancer 10.96.156.194 10.0.10.0 80:31017/TCP,443:32600/TCP 4m9s
The service is now reachable from inside the cluster (here, from the server node):
# curl 10.0.10.0
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
↑ Table of Contents | ← Back
ARP Issue
Testing from jumpbox client:
# vagrant ssh jumpbox
root@jumpbox:/home/vagrant# curl 10.0.10.0
curl: (7) Failed to connect to 10.0.10.0 port 80 after 3074 ms: Couldn't connect to server
Why? Nobody answers ARP requests for the LoadBalancer IP yet - on the server we see only unanswered requests from the jumpbox:
root@server:/home/vagrant# tcpdump -n -i any arp host 10.0.10.0
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:15:05.927064 eth1 B ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:06.948513 eth1 B ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:07.973210 eth1 B ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:08.998950 eth1 B ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:10.024080 eth1 B ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
21:15:11.050053 eth1 B ARP, Request who-has 10.0.10.0 tell 192.168.56.10, length 46
root@jumpbox:/home/vagrant# arp -n 10.0.10.0
Address HWtype HWaddress Flags Mask Iface
10.0.10.0 (incomplete) eth1
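Optional tip (added here, not from the original): if you want to retry immediately later, delete the stale entry on the jumpbox instead of waiting for it to age out:
root@jumpbox:/home/vagrant# ip neigh del 10.0.10.0 dev eth1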
↑ Table of Contents | ← Back
Enable L2 Announcements
To make a node answer those ARP requests, start a background tcpdump on the server and apply a CiliumL2AnnouncementPolicy:
root@server:/home/vagrant# tcpdump -n -i any arp host 10.0.10.0 & # background tcpdump
[1] 17207
root@server:/home/vagrant# cat << EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: DoesNotExist
  interfaces:
  - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true
EOF
ciliuml2announcementpolicy.cilium.io/policy1 created
The background tcpdump on the server immediately captures gratuitous ARP replies for the LoadBalancer IP:
21:18:52.093372 eth1 B ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:76, length 46
21:18:52.102795 eth0 B ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:6c, length 46
The jumpbox receives the same broadcasts (10.0.10.1 belongs to another LoadBalancer service from the pool, announced by node-0):
root@jumpbox:/home/vagrant# tcpdump -n -i any arp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:18:52.113122 eth1 B ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:76, length 46
21:18:52.113211 eth1 B ARP, Reply 10.0.10.1 is-at 00:0c:29:e3:b1:b2, length 46
21:18:52.122245 eth0 B ARP, Reply 10.0.10.1 is-at 00:0c:29:e3:b1:a8, length 46
21:18:52.122495 eth0 B ARP, Reply 10.0.10.0 is-at 00:0c:29:0d:b7:6c, length 46
root@jumpbox:/home/vagrant# arp -n 10.0.10.0
Address HWtype HWaddress Flags Mask Iface
10.0.10.0 ether 00:0c:29:0d:b7:76 C eth1
The MAC address 00:0c:29:0d:b7:76 belongs to node-1's eth1 interface:
root@node-1:/home/vagrant# ip addr sh | grep -1 00:0c:29:0d:b7:76
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:0d:b7:76 brd ff:ff:ff:ff:ff:ff
altname enp18s0
Verify lease acquisition:
# kubectl get lease -n kube-system cilium-l2announce-default-cilium-ingress-basic-ingress
NAME HOLDER AGE
cilium-l2announce-default-cilium-ingress-basic-ingress node-1 8m49s
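For a closer look (an extra diagnostic, not in the original), the Lease object itself carries the holder identity and the renewal timings that drive failover; if node-1 goes away, another eligible node takes over the lease and starts answering ARP for this IP:
# kubectl -n kube-system get lease cilium-l2announce-default-cilium-ingress-basic-ingress -o yaml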
↑ Table of Contents | ← Back
How It Works
For detailed documentation see Cilium L2 Announcements.
L2 Policy Configuration
- Configure an L2 announcement policy selecting which nodes and interfaces may announce:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: DoesNotExist
  interfaces:
  - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true
Lease Acquisition
- Cilium agents on the selected nodes compete for a per-service lease; the holder is the node that answers ARP for that service's IP:
root@server:/home/vagrant# kubectl get lease -n kube-system | grep l2announce
cilium-l2announce-default-cilium-ingress-basic-ingress node-1 16m
cilium-l2announce-kube-system-cilium-ingress node-0 16m
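The lease timing can be tuned at install time. A sketch, assuming the helm values of the Cilium chart used above (verify the names against your chart version): shorter durations give faster failover at the cost of more API traffic, and with many announced services the Kubernetes client rate limit is usually raised as well.
helm upgrade cilium cilium/cilium --version 1.16.5 --namespace kube-system --reuse-values \
  --set l2announcements.leaseDuration=15s \
  --set l2announcements.leaseRenewDeadline=5s \
  --set l2announcements.leaseRetryPeriod=2s \
  --set k8sClientRateLimit.qps=10 \
  --set k8sClientRateLimit.burst=25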
BPF Map for ARP
- On the lease-holding node, the IP/interface pairs to answer for are written into a BPF map:
# kubectl get svc cilium-ingress-basic-ingress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cilium-ingress-basic-ingress LoadBalancer 10.96.156.194 10.0.10.0 80:31017/TCP,443:32600/TCP 24h
root@node-1:/home/cilium# bpftool map show pinned /sys/fs/bpf/tc/globals/cilium_l2_responder_v4
72: hash name cilium_l2_respo flags 0x1
key 8B value 8B max_entries 4096 memlock 65536B
btf_id 125
root@node-1:/home/cilium# bpftool map dump pinned /sys/fs/bpf/tc/globals/cilium_l2_responder_v4
[{
"key": {
"ip4": 655370, # IP
"ifindex": 2 # Interface index
},
"value": {
"responses_sent": 0
}
},{
"key": {
"ip4": 655370,
"ifindex": 3
},
"value": {
"responses_sent": 3 # ARP responses count
}
}
]
The number 655370 is IP 10.0.10.0 read as a little-endian 32-bit integer.
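A quick arithmetic check of that conversion (plain shell, nothing Cilium-specific) - the four octets 10.0.10.0 are stored least-significant byte first:
# echo $(( 10 + (0 << 8) + (10 << 16) + (0 << 24) ))
655370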
The full packet flow through the ARP response path is traced in the next section.
↑ Table of Contents | ← Back
Packet Path
ARP request processing on node-1:
+-------------------+
| External Host |
| (jumpbox) |
+--------+----------+
| ARP Request "Who has 10.0.10.0?"
v
+-------------------+
| Interface eth1 |
| node-1 |
+--------+----------+
| TC ingress
v
+-------------------+
| BPF Program |
| cil_from_netdev |
+--------+----------+
| handle_netdev
v
+-------------------+
| do_netdev() |
+--------+----------+
| ARP check
v
+-------------------+
| handle_l2_announcement() |
+--------+----------+
| Checks:
| 1. Agent liveness
| 2. Valid ARP
| 3. L2_RESPONDER_MAP4 entry
v
+-------------------+
| arp_respond() |
+--------+----------+
| Prepare ARP reply
v
+-------------------+
| ctx_redirect() |
+--------+----------+
| Egress redirect
v
+-------------------+
| Interface eth1 |
| node-1 |
+--------+----------+
| ARP Reply
v
+-------------------+
| External Host |
| (jumpbox) |
+-------------------+
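To see where cil_from_netdev is attached on the node, the loaded TC programs can be listed (an added diagnostic; output layout differs across kernel and bpftool versions):
root@node-1:/home/cilium# bpftool net show dev eth1
root@node-1:/home/cilium# tc filter show dev eth1 ingress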
↑ Table of Contents | ← Back
Additional Resources
Cilium L2 Announcements vs Proxy ARP
Q: How do Cilium L2 announcements differ from enabling proxy_arp?
A: Proxy ARP is a kernel feature that answers ARP requests for any address the node can route toward, with no awareness of Kubernetes services. Cilium's L2 announcements answer only for IPs selected by policy, coordinate which node responds through Kubernetes leases, and generate the replies in a BPF program.
Disable L2 announcements:
root@server:/home/vagrant# kubectl delete -f workshop/l2.yaml
ciliuml2announcementpolicy.cilium.io "policy1" deleted
Enable proxy_arp on node-1:
root@node-1:/home/vagrant# sysctl -w net.ipv4.conf.eth1.proxy_arp=1
net.ipv4.conf.eth1.proxy_arp = 1
Proxy ARP responses now come from the kernel:
root@jumpbox:/home/vagrant# arping -I eth1 10.0.10.0
ARPING 10.0.10.0
Timeout
Timeout
Timeout
Timeout
60 bytes from 00:0c:29:0d:b7:76 (10.0.10.0): index=0 time=449.655 msec
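To return to the Cilium-based setup afterwards (cleanup step added here), switch proxy ARP off again and re-apply the L2 announcement policy:
root@node-1:/home/vagrant# sysctl -w net.ipv4.conf.eth1.proxy_arp=0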
Cached vs Non-Cached BPF Maps
cilium-dbg map list reports entry counts only for maps the agent caches; cilium_l2_responder_v4 is not cached, so it shows 0 entries even when populated:
root@node-1:/home/cilium# cilium-dbg map list
Name Num entries Num errors Cache enabled
cilium_policy_00215 3 0 true
# ...
cilium_l2_responder_v4 0 0 false
Actual map contents via bpftool:
root@node-1:/home/cilium# bpftool map dump pinned /sys/fs/bpf/tc/globals/cilium_l2_responder_v4
[{
"key": {
"ip4": 655370,
"ifindex": 2
},
"value": {
"responses_sent": 0
}
},{
"key": {
"ip4": 655370,
"ifindex": 3
},
"value": {
"responses_sent": 40
}
}
]
↑ Table of Contents | ← Back
Conclusion: Benefits of Cilium L2 Announcements
This guide demonstrated Cilium's L2 announcement implementation for bare-metal Kubernetes LoadBalancer services. Key aspects covered:
- ARP request handling via BPF programs
- High availability through lease system
- Diagnostic techniques
Key advantages:
- Native Kubernetes integration
- eBPF-based packet processing performance
- Automatic spreading of announced service IPs across eligible nodes (one lease holder per IP)
- No external dependencies
- Standard L2 protocol support
Particularly valuable for hybrid and on-premise environments requiring cloud-like service accessibility while maintaining bare-metal flexibility.