DNS & Cloudflare
DNS records live in the Cloudflare dashboard; tunnel → service mappings live in this repo. Certificates are issued automatically by cert-manager. This page is the operator's view of where each moving piece lives and how the pieces fit together.
Where everything lives
| Thing | Location | Who edits it |
|---|---|---|
| DNS records (A / CNAME) | Cloudflare dashboard, per-zone | Humans in the dashboard |
| Tunnel → service mapping | manifests_v1/app-constructs/cloudflared/configmap.yaml | Git commit (GitOps) |
| Tunnel credentials | manifests_v1/app-constructs/cloudflared/sealed-tunnel-credentials.yaml | secrets-manager agent |
| Public TLS (browser ↔ Cloudflare) | Cloudflare (Universal SSL) | Cloudflare, automatic |
| Internal TLS (tunnel ↔ Traefik ↔ pod) | cert-manager, Let's Encrypt, DNS01 via Cloudflare | cert-manager, automatic |
| Ingress / IngressRoute | manifests_v1/app-constructs/<client>/.../ingress*.yaml | Git commit (GitOps) |
| Cloudflare Access policies (Zero Trust) | Cloudflare Zero Trust dashboard | Humans in the dashboard |
We own these zones in Cloudflare: wecare.gr, advisable.com, ecommercen.com. DNS for everything else lives somewhere else (client-owned registrars, usually).
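When it is unclear whether a record can be edited in our Cloudflare account, a small lookup can answer it. The sketch below hard-codes the three zones listed above; `owned_zone` and `OWNED_ZONES` are hypothetical names for illustration, not existing repo scripts.

```shell
# Zones listed on this page as managed in our Cloudflare account.
OWNED_ZONES="wecare.gr advisable.com ecommercen.com"

# Print the owning zone for a hostname, or return 1 if DNS lives elsewhere.
# (owned_zone is a hypothetical helper, not an existing repo script.)
owned_zone() {
  local host="$1" zone
  for zone in $OWNED_ZONES; do
    case "$host" in
      # Match the zone apex itself or any name ending in ".<zone>".
      "$zone" | *".$zone")
        echo "$zone"
        return 0
        ;;
    esac
  done
  return 1
}
```

For example, `owned_zone shop.wecare.gr` prints `wecare.gr`, while a client-owned domain makes the function return non-zero.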
The traffic path, in one diagram
See Ingress, TLS & Cloudflare for the full path. Short version: browser → Cloudflare edge → outbound tunnel → cloudflared pod in cluster → Traefik → Service → app pod. No inbound ports open on Hetzner; everything comes through the tunnel.
Adding a new hostname for an existing client
Scenario: shop.wecare.gr needs to route to the existing app-svc in ecommercen-clients-wecare.
1. Cloudflare dashboard → DNS for `wecare.gr` zone: add a `CNAME` record for `shop` pointing at the tunnel hostname (Cloudflare shows the target under Zero Trust → Networks → Tunnels). Proxied (orange cloud) on.
2. Repo: add the hostname to the client's Ingress, typically alongside the existing entries in `manifests_v1/app-constructs/ecommercen-clients/wecare/adveshop4/prod/ingress.yaml`:

   ```yaml
   spec:
     tls:
       - hosts:
           - www.wecare.gr
           - shop.wecare.gr
         secretName: new-wecare-gr-tls
     rules:
       - host: shop.wecare.gr
         http:
           paths:
             - path: /
               pathType: Prefix
               backend:
                 service:
                   name: app-svc
                   port: { number: 80 }
   ```
3. If the tunnel doesn't already route that hostname (most don't; the shared tunnel is mostly `*-ecnv4-mgmt.ecommercen.com` for management tools), add an ingress rule to `cloudflared/configmap.yaml` before the catch-all 404 entry:

   ```yaml
   - hostname: shop.wecare.gr
     service: http://traefik.traefik.svc.cluster.local
   ```
4. Commit + push. ArgoCD syncs Ingress + Cloudflared ConfigMap. cert-manager picks up the new TLS host and issues a Let's Encrypt cert via DNS01 (creates a transient TXT record in Cloudflare via our API token, validates, removes it).
5. DNS propagation is usually near-instant with Cloudflare; certificate issuance takes ~30s–2 min.
For routing traffic that goes directly to Traefik from the tunnel (the usual pattern for tenant apps) the Ingress is enough — the catch-all path in configmap.yaml that points at Traefik handles the routing. Explicit tunnel entries are mostly for management UIs that bypass Traefik.
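Once ArgoCD has synced, the result of the steps above can be checked from a workstation. This is a minimal sketch, assuming `dig` and `kubectl` are on the PATH and the current kube context points at the cluster; `verify_hostname` is a hypothetical helper name. It queries for any DNS answer rather than the CNAME itself, because a proxied record resolves to Cloudflare edge addresses.

```shell
# verify_hostname is a hypothetical helper; host and namespace are arguments.
verify_hostname() {
  local host="$1" ns="$2"

  # 1. DNS: a proxied record resolves to Cloudflare edge IPs, so any answer counts.
  if [ -z "$(dig +short "$host")" ]; then
    echo "dns: no record for $host"
    return 1
  fi

  # 2. Ingress: the host must appear in some Ingress rule in the namespace.
  if ! kubectl get ingress -n "$ns" \
      -o jsonpath='{.items[*].spec.rules[*].host}' | tr ' ' '\n' | grep -qxF "$host"; then
    echo "ingress: $host not found in namespace $ns"
    return 1
  fi

  # 3. Certificate: list Certificates so the READY column can be checked by eye.
  kubectl get certificate -n "$ns"
  echo "ok: $host resolves and is routed"
}
```

For the scenario on this page the call would be `verify_hostname shop.wecare.gr ecommercen-clients-wecare`.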
Cloudflare Access — Zero Trust for internal UIs
Cloudflare Access sits in front of the cluster's management URLs — ArgoCD, Grafana, Longhorn, Hubble, Keycloak admin, this docs site, etc. Anyone hitting *.ecnv4-mgmt.ecommercen.com without a valid Access session gets the Cloudflare login page (Google SSO, @advisable.com required).
Current Access applications mirror the tunnel entries in cloudflared/configmap.yaml — add a new one whenever you expose a new admin UI via the tunnel:
1. Cloudflare Zero Trust dashboard → Access → Applications → Add → Self-hosted.
2. Hostname: the same hostname you added to `configmap.yaml`.
3. Policies: usually one Allow policy for emails ending in `@advisable.com`.
4. Save. Access is enforced immediately at the edge.
Service Auth tokens (for machine callers — CI, LLM tooling, integration tests) are generated under Service Auth in the Zero Trust dashboard and bound to an Application. The caller sends CF-Access-Client-Id + CF-Access-Client-Secret headers. These tokens sit outside the repo; rotate them in the dashboard and reseal the consuming client's secret if it stores them.
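A machine caller can wrap `curl` so that both headers are always sent. The sketch below is illustrative only: `access_curl` is a hypothetical wrapper, and `CF_ACCESS_CLIENT_ID` / `CF_ACCESS_CLIENT_SECRET` are assumed environment variable names that the caller would populate from its own secret store.

```shell
# access_curl is a hypothetical wrapper. CF_ACCESS_CLIENT_ID and
# CF_ACCESS_CLIENT_SECRET are assumed environment variable names.
access_curl() {
  # Refuse to call out without credentials instead of sending empty headers.
  if [ -z "${CF_ACCESS_CLIENT_ID:-}" ] || [ -z "${CF_ACCESS_CLIENT_SECRET:-}" ]; then
    echo "CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET must be set" >&2
    return 1
  fi
  # Pass every remaining argument (flags and URL) through to curl unchanged.
  curl -sS \
    -H "CF-Access-Client-Id: ${CF_ACCESS_CLIENT_ID}" \
    -H "CF-Access-Client-Secret: ${CF_ACCESS_CLIENT_SECRET}" \
    "$@"
}
```

Usage would look like `access_curl -I https://<hostname>/`. Keeping the secret in the environment rather than on the command line avoids leaving it in shell history.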
Certificates — cert-manager does the work
- Public TLS (browser ↔ Cloudflare edge) is handled by Cloudflare's Universal SSL. We don't touch that.
- Cluster-internal TLS uses cert-manager + Let's Encrypt + the DNS01 challenge. The Cloudflare API token lives in the sealed secret `certmanager-cloudflare-token`; cert-manager uses it to create and destroy the `_acme-challenge` TXT records Let's Encrypt asks for.
- Renewals are automatic at ~30 days before expiry. The `ExternalCertExpiringSoon` Prometheus alert is the safety net.
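To see how close a cluster-internal certificate is to expiry, the remaining days can be read straight from its TLS secret. This is a rough sketch that assumes GNU `date`, `openssl`, and `kubectl` are available; `days_left` and `secret_not_after` are hypothetical helper names.

```shell
# days_left and secret_not_after are hypothetical helpers. Requires GNU date.

# Whole days between now (or an epoch passed as $2) and an openssl-style date.
days_left() {
  local end_epoch now_epoch
  end_epoch="$(date -u -d "$1" +%s)" || return 1
  now_epoch="${2:-$(date -u +%s)}"
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Print the notAfter date of the certificate stored in a TLS secret.
secret_not_after() {
  local secret="$1" ns="$2"
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -enddate \
    | cut -d= -f2
}
```

For example, `days_left "$(secret_not_after new-wecare-gr-tls ecommercen-clients-wecare)"`. Given the renewal window described above, a value well under 30 would suggest that renewal is not going through.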
You almost never interact with cert-manager directly. If a Certificate isn't issuing, check:
```
kubectl get certificate -A | grep -v True
kubectl describe certificate <name> -n <namespace>   # events show the ACME state
kubectl -n cert-manager logs deploy/cert-manager --tail=100
```

The Cloudflare Tunnel health itself is visible on the tunnel's page in the Zero Trust dashboard (green connectors + recent activity) and via:

```
kubectl -n cloudflared get pods
kubectl -n cloudflared logs deploy/cloudflared --tail=100
```

Troubleshooting a down hostname
Walk the path from outside in:
1. External: Cloudflare status page. Then `curl -I https://<hostname>` from outside the cluster. 5xx → continue. DNS NXDOMAIN → the DNS record was never created (check that it exists in the Cloudflare dashboard).
2. Tunnel: `kubectl -n cloudflared logs deploy/cloudflared --tail=100`. Look for connection failures, route errors, hostname-not-found.
3. Traefik: `kubectl get ingressroute,ingress -A | grep <hostname>`. Missing → the Ingress didn't sync. Present → `kubectl -n traefik logs deploy/traefik --tail=100` for routing errors.
4. Backend: once Traefik checks out, inspect the status of the pods behind the target Service.
- 🟢 Most hostname outages are actually backend outages. Start at step 4 if the alert implicates a specific app.
- 🟠 Cloudflare proxy mode matters — if you toggle a DNS record from "Proxied" to "DNS only", the request bypasses the tunnel and we have no listener on 443. Leave proxy mode on unless you explicitly want to expose a direct origin IP.
- 🔴 Don't disable Cloudflare Access on an admin UI "temporarily" to debug. It's easy to forget and the UI is exposed publicly until you remember.
For network-level issues (tunnel flapping, proxy protocol mismatch, CoreDNS) delegate to the network-expert agent — it runs Cilium/Hubble diagnostics directly.
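The first step of the walk can be scripted so that the probe result points at the next place to look. The sketch below is one possible mapping, not an existing runbook script; `next_step` and `probe` are hypothetical names. It relies on `curl` exit code 6 meaning the host could not be resolved.

```shell
# next_step is a hypothetical helper that maps a probe result to the steps above.
# $1 = curl exit code, $2 = HTTP status code (000 when no response was received).
next_step() {
  local curl_exit="$1" http_code="$2"
  if [ "$curl_exit" -eq 6 ]; then
    # curl exit code 6: could not resolve host.
    echo "dns: check the record exists in the Cloudflare dashboard"
  elif [ "$curl_exit" -ne 0 ]; then
    echo "edge: check the Cloudflare status page and connectivity"
  elif [ "$http_code" -ge 500 ]; then
    echo "tunnel: continue with cloudflared, then Traefik, then backend"
  else
    echo "ok: edge answered with $http_code"
  fi
}

# Send a HEAD request from outside the cluster and classify the result.
probe() {
  local host="$1" code rc
  code="$(curl -sS -o /dev/null -I -w '%{http_code}' "https://$host")"
  rc=$?
  next_step "$rc" "$code"
}
```

Run it as `probe <hostname>` from a machine outside the cluster. A hostname behind Cloudflare Access will typically answer with a redirect to the login page, which this sketch reports as the edge being reachable.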
Further reading
- Ingress, TLS & Cloudflare: the path, in full
- Rotate a secret: for the tunnel credentials or Cloudflare API token
- Incident response: `ExternalEndpointDown` runbook
- Claude agents: network-expert, secrets-manager