Cluster networking
Where this page picks up
Ingress, TLS & Cloudflare traced the three ways a request reaches the edge of the cluster. All three end the same way: a pod receives a connection. This page is about what happens between pods — how they find each other by name, how their packets travel from one node to another, and how you can watch any of it live.
The short version: there's a CNI called Cilium running on every node, a CoreDNS resolver that answers *.svc.cluster.local, a Hubble UI that lets you watch flows in real time, and HCCM gluing it all to the Hetzner Cloud network.
Three addressing layers
Every pod and Service has an IP, and it pays to know which kind you're looking at. There are three distinct ranges:
| Layer | CIDR (this cluster) | What it identifies | Lifetime |
|---|---|---|---|
| Node | 10.1.0.0/24 (Hetzner private subnet) | A Hetzner VM's private NIC. | Weeks to months. Stable. |
| Pod | 10.42.0.0/16 (cluster-wide, subdivided per node) | A single pod's interface inside the node. | Minutes to days. Churn with restarts. |
| Service (ClusterIP) | Kubernetes-assigned, small internal range | A stable virtual IP for a Service. | Tied to the Service resource. |
Practical rules:

- If you see `10.1.0.X`, that's a node. `kubectl get nodes -o wide` lists them.
- If you see `10.42.X.Y`, that's a pod on the node whose pod-CIDR contains it. `kubectl get pods -o wide` shows pod IPs alongside their node.
- If you see something outside both ranges and it resolves via cluster DNS (e.g. `app-svc.ecommercen-clients-wecare.svc.cluster.local`), that's a Service — a virtual IP that never actually appears on a wire; Cilium redirects traffic to it at the socket level. You can't `ping` a ClusterIP and expect anything meaningful back.
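Those rules can be mechanised into a quick triage helper. A minimal sketch in shell (the CIDR prefixes are this cluster's ranges from the table above; the helper name is made up):

```shell
# classify_ip: rough IP triage by prefix. Prefix matching is an
# approximation of a real CIDR check, but it's enough for these ranges.
classify_ip() {
  case "$1" in
    10.1.0.*) echo "node" ;;                 # Hetzner private subnet 10.1.0.0/24
    10.42.*)  echo "pod" ;;                  # cluster-wide pod CIDR 10.42.0.0/16
    *)        echo "service-or-external" ;;  # ClusterIP range or outside world
  esac
}

classify_ip 10.1.0.4     # node
classify_ip 10.42.3.17   # pod
classify_ip 10.43.0.10   # service-or-external
```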
The node range is configured in Hetzner's private network network-k8s. The pod range 10.42.0.0/16 is HCCM's clusterCIDR — HCCM carves out one /24 per node and writes a route in the Hetzner network definition ("send 10.42.X.0/24 to node Y's private IP") so cross-node pod traffic finds its way.
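To make the carve-up concrete, here's a hypothetical sketch of the route table HCCM maintains (the node IPs are invented for the example; the real pod-CIDR assignments are visible with `kubectl get nodes -o custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR`):

```shell
# Hypothetical illustration of HCCM's route reconciliation: node i holds
# pod-CIDR 10.42.i.0/24, and one Hetzner route points that /24 at the
# node's private IP.
routes_for() {  # args: node private IPs, in pod-CIDR order
  i=0
  for node_ip in "$@"; do
    echo "10.42.$i.0/24 -> $node_ip"
    i=$((i + 1))
  done
}

routes_for 10.1.0.3 10.1.0.4 10.1.0.5
```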
Cilium — the CNI
A CNI (Container Network Interface) is the component that actually wires pod networking together — assigning pod IPs, giving each pod a network interface, and moving packets. Kubernetes itself doesn't do networking; it delegates to whichever CNI you've installed. We run Cilium.
Cilium runs as a DaemonSet in the kube-system namespace — one cilium-agent pod per node. Every pod on a node talks through that node's local agent. The install is managed by RKE2's HelmChartConfig (not ArgoCD); the template lives at ansible/playbooks/templates/rke2-cilium-config.yaml.j2.
Key settings from that template:
- `kubeProxyReplacement: true` — no `kube-proxy` on this cluster. Cilium's eBPF programs translate Service → Pod in the kernel, skipping the `iptables` layer. If Cilium is unhealthy on a node, Services don't work there — there's no fallback.
- `routingMode: native` with `ipv4NativeRoutingCIDR: 10.0.0.0/8` — no VXLAN tunnel. Pod-to-pod traffic rides directly on the Hetzner private network using the routes HCCM provisions.
- `bpf.datapathMode: netkit` — the modern eBPF datapath. Side-effect worth remembering: `tcpdump` on the host won't see pod traffic (no `veth` pairs to sniff). Use Hubble instead.
- `mtu: 1450` — the Hetzner private network adds overlay overhead that caps the usable MTU; Cilium is told to match. Wrong MTU shows up as "small requests work, large responses hang".
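Pulled together, those settings land in the HelmChartConfig roughly like this. This is a trimmed sketch, not the full template; the authoritative version is the `rke2-cilium-config.yaml.j2` Jinja template:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: true
    routingMode: native
    ipv4NativeRoutingCIDR: 10.0.0.0/8
    bpf:
      datapathMode: netkit
    mtu: 1450
```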
Beyond wiring packets, Cilium also supplies identity-aware network policy (see NetworkPolicy), in-kernel load balancing for Service type=LoadBalancer, and Hubble.
CoreDNS — in-cluster name resolution
When app-web wants to reach app-db-maxscale:3306, it doesn't know the Service's ClusterIP. It asks DNS.
CoreDNS is the cluster's DNS resolver. In RKE2 it's installed under kube-system as rke2-coredns-rke2-coredns (a Service) with pods labelled k8s-app=kube-dns. Every pod the kubelet starts gets a /etc/resolv.conf that points at the kube-dns Service IP.
resolv.conf also has a search list that lets short names resolve without fully qualified domains. Inside a pod in ecommercen-clients-wecare, you can say app-svc and DNS tries, in order:
1. `app-svc.ecommercen-clients-wecare.svc.cluster.local` (same namespace)
2. `app-svc.svc.cluster.local`
3. `app-svc.cluster.local`
4. `app-svc` (as-is, external)
That's why code inside pods just uses short names. Cross-namespace calls use <svc>.<ns> (e.g. grafana.observability), and FQDNs with the .svc.cluster.local suffix always work from anywhere.
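Concretely, the `/etc/resolv.conf` the kubelet writes into a pod in that namespace looks roughly like this (the nameserver IP is illustrative; it's whatever ClusterIP the kube-dns Service actually holds):

```
search ecommercen-clients-wecare.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
```

The `ndots:5` option is what makes short names try all the cluster suffixes before going external.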
Cilium transparently proxies DNS traffic at the cilium-agent level before it reaches CoreDNS — that's how Hubble can log DNS queries and how identity-aware policies that reference hostnames (rather than IPs) work. Most of the time you won't notice.
When DNS is broken, the symptoms are loud: pods time out on name lookups, nslookup <svc> returns SERVFAIL, or external lookups are slow because ndots:5 tries the cluster suffixes first. Delegate to the network-expert agent — it runs the right dig/nslookup/Hubble checks from a debug pod.
Hubble — flow observability
If tcpdump is packet-level, Hubble is flow-level — and identity-aware. Instead of "IP 10.42.3.17 talked to 10.42.7.4 on port 3306", Hubble shows "pod app-web in ecommercen-clients-wecare talked to Service app-db-maxscale on port 3306 and got verdict ALLOWED".
The Hubble UI is wired through the cloudflared tunnel at hubble-ecnv4-mgmt.ecommercen.com. Pick a namespace, and you get a live service graph of pod-to-pod communications with colour-coded verdicts (green allowed, red dropped, amber forwarded by policy). It's the fastest way to answer "does traffic actually flow between these two things".
For text, exec into the hubble-relay pod:

```shell
HUBBLE_POD=$(kubectl --context ecnv4 -n kube-system \
  get pods -l k8s-app=hubble-relay -o jsonpath='{.items[0].metadata.name}')
kubectl --context ecnv4 -n kube-system exec $HUBBLE_POD -- \
  hubble observe --namespace ecommercen-clients-wecare --last 50
```

Useful filters: `--verdict DROPPED` (what's being blocked?), `--protocol DNS` (where are DNS queries going?), `--from-pod <ns>/<pod>` or `--to-service <ns>/<svc>`. This is routine work for the network-expert agent — in practice you'll ask Claude "show me Hubble drops from wecare in the last hour" and it'll run the right command.
HCCM & the Hetzner LB
A Kubernetes Service of type: LoadBalancer asks the cluster for an external LB. Someone has to actually create that LB; in a cloud cluster, that someone is the cloud controller manager. For us it's HCCM — the Hetzner Cloud Controller Manager — which runs in kube-system (values in manifests_v1/app-constructs/kube-system/hcloud.values.yaml).
HCCM watches LoadBalancer Services and does three jobs:
- Provision a Hetzner Cloud Load Balancer to match. Our Traefik Service is annotated `load-balancer.hetzner.cloud/name: "ecnv4-lb"` — HCCM sees that and creates/updates the LB called `ecnv4-lb` in Hetzner's UI.
- Keep the target set accurate. The LB forwards to every healthy node on the right NodePort. When the cluster autoscaler adds or removes a node, HCCM reconciles the LB targets.
- Populate cloud routes for the `10.42.0.0/16` pod CIDR. Every ~30 seconds it looks at the node list and ensures the Hetzner private network definition has a route per node (`10.42.X.0/24 → <node private IP>`). Without that, pod-to-pod cross-node traffic drops on the wire.
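The annotation hook on the Traefik Service looks roughly like this. Only the `name` annotation is confirmed on this page; the other two are examples of what the HCCM annotations reference offers and may not match our actual values:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik
  annotations:
    load-balancer.hetzner.cloud/name: "ecnv4-lb"
    # Illustrative extras — see the HCCM annotations reference for the full set:
    load-balancer.hetzner.cloud/location: "nbg1"
    load-balancer.hetzner.cloud/use-private-ip: "true"
spec:
  type: LoadBalancer
```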
You may notice MetalLB manifests in manifests_v1/app-constructs/metallb/. MetalLB is a popular LB implementation for bare-metal clusters that don't have a cloud provider. On Hetzner Cloud we don't need it — HCCM already provides LoadBalancer support — so MetalLB's app-*.yaml isn't in apps-enabled/ and the controller isn't running. The manifests are kept for reference and for a potential future bare-metal path.
A practical consequence: the Hetzner LB is the only public ingress point. Its public IP is in Cloudflare's origin pool, so losing the LB means losing all public traffic — LB health alerts are high-priority.
Pod-to-pod in one picture
Walk through app-web → app-db-maxscale:3306 when the two pods happen to sit on different nodes (say, app-web on wecare-web-1):

1. app-web resolves app-db-maxscale via CoreDNS and gets back the Service's ClusterIP.
2. At connect time, Cilium's eBPF service map on the source node swaps the ClusterIP for a real backend pod IP (10.42.X.Y).
3. The packet leaves the node carrying that pod IP and follows the HCCM-provisioned Hetzner route (10.42.X.0/24 → destination node's private IP) across the private network.
4. On the destination node, the local cilium-agent delivers the packet into the target pod's network namespace.

The packet never sees a VXLAN header (native routing), never passes through a kube-proxy (eBPF replaces it), and the ClusterIP is translated to the real pod IP before the packet leaves wecare-web-1. Same-node pod-to-pod is the same picture minus the Hetzner hop — the cilium-agent shortcut copies the packet straight into the target pod's namespace. Measured in practice: ~0.8ms pod-to-pod, ~5ms when going through a ClusterIP (the eBPF service routing overhead). If you ever see significantly worse, look at Cilium health or HCCM route freshness before anything else.
NetworkPolicy
Kubernetes ships NetworkPolicy, and Cilium also offers a richer CiliumNetworkPolicy with Layer-7 rules and identity selectors. Both give you micro-segmentation — "only these pods can talk to these other pods on these ports".
We don't use them extensively yet. The most visible example is manifests_v1/app-constructs/kube-system/networkpolicies/cloudflared-hubble-access.yaml, which restricts the Hubble UI to only accept traffic from the cloudflared namespace. Clusters without any NetworkPolicy are "flat" — every pod can talk to every other pod. That's our current default.
When we do tighten, CiliumNetworkPolicy is the preferred shape: it can reference DNS names, HTTP methods, and Cilium identities rather than raw CIDRs, which plays much better with ephemeral pod IPs.
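As an illustration only (hypothetical labels; we don't ship this policy), a CiliumNetworkPolicy restricting MaxScale ingress to the web pods might look like:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-web-to-db
  namespace: ecommercen-clients-wecare
spec:
  endpointSelector:
    matchLabels:
      app: app-db-maxscale   # hypothetical label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: app-web     # hypothetical label
      toPorts:
        - ports:
            - port: "3306"
              protocol: TCP
```

Because `fromEndpoints` matches on Cilium identities rather than IPs, the rule survives pod restarts and rescheduling untouched.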
Delegating to the network-expert agent
Anything non-trivial about packets, DNS, routes, or Cilium internals is owned by the network-expert Claude agent. Just describe the symptom — "pods in ecommercen-clients-wecare can't reach the MariaDB Service", "DNS lookups time out", "502 from outside but pod is healthy", "check Cilium health on wecare-web-1" — and it runs cilium-dbg, hubble observe, dig, traceroute, or whatever else the situation calls for. It knows the gotchas (proxy-protocol trusted CIDRs, netkit-vs-veth, MTU 1450 fragmentation, HCCM route lag for new autoscaler nodes) and will delegate out to k8s-manager or hcloud-operator when the answer is in one of their domains.
For a narrative walk-through when an alert fires, see Incident response — it sends you down the right delegation path based on the alert name.
Further reading
- Cilium documentation — deep technical reference; the eBPF datapath primer is a good starting point.
- Hubble documentation and the Hubble CLI reference.
- CoreDNS documentation — plugin chain, zone config, the `kubernetes` plugin we rely on.
- HCCM repository — annotations reference for Service type=LoadBalancer is invaluable.
- Hetzner Cloud Networks architecture — the L3 overlay, gateway model, and MTU implications.
- Our Ingress, TLS & Cloudflare page — the path traffic takes before it lands on a pod.
- Our Incident response runbook — where network alerts actually get debugged.
- Next: Observability.