DNS & Cloudflare

DNS records live in the Cloudflare dashboard; tunnel → service mappings live in this repo. Certificates are issued automatically by cert-manager. This page is the operator's view of where each moving piece lives and how the pieces fit together.

Where everything lives

| Thing | Location | Who edits it |
| --- | --- | --- |
| DNS records (A / CNAME) | Cloudflare dashboard, per-zone | Humans in the dashboard |
| Tunnel → service mapping | manifests_v1/app-constructs/cloudflared/configmap.yaml | Git commit (GitOps) |
| Tunnel credentials | manifests_v1/app-constructs/cloudflared/sealed-tunnel-credentials.yaml | secrets-manager agent |
| Public TLS (browser ↔ Cloudflare) | Cloudflare (Universal SSL) | Cloudflare, automatic |
| Internal TLS (tunnel ↔ Traefik pod) | cert-manager, Let's Encrypt, DNS01 via Cloudflare | cert-manager, automatic |
| Ingress / IngressRoute | manifests_v1/app-constructs/<client>/.../ingress*.yaml | Git commit (GitOps) |
| Cloudflare Access policies (Zero Trust) | Cloudflare Zero Trust dashboard | Humans in the dashboard |

We own these zones in Cloudflare: wecare.gr, advisable.com, ecommercen.com. DNS for every other domain is hosted elsewhere, usually at a client-owned registrar.

The traffic path, in one diagram

See Ingress, TLS & Cloudflare for the full path. Short version: browser → Cloudflare edge → outbound tunnel → cloudflared pod in cluster → Traefik → Service → app pod. No inbound ports open on Hetzner; everything comes through the tunnel.

Adding a new hostname for an existing client

Scenario: shop.wecare.gr needs to route to the existing app-svc in ecommercen-clients-wecare.

  1. Cloudflare dashboard → DNS for wecare.gr zone: add a CNAME record for shop pointing at the tunnel hostname (Cloudflare shows the target under Zero Trust → Networks → Tunnels). Proxied (orange cloud) on.
  2. Repo: add the hostname to the client's Ingress — typically alongside the existing entries in manifests_v1/app-constructs/ecommercen-clients/wecare/adveshop4/prod/ingress.yaml:
    yaml
    spec:
      tls:
        - hosts:
            - www.wecare.gr
            - shop.wecare.gr
          secretName: new-wecare-gr-tls
      rules:
        - host: shop.wecare.gr
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: app-svc
                    port: { number: 80 }
  3. If the tunnel doesn't already route that hostname (most don't — the shared tunnel is mostly *-ecnv4-mgmt.ecommercen.com for management tools), add an ingress rule to cloudflared/configmap.yaml before the catch-all 404 entry:
    yaml
    - hostname: shop.wecare.gr
      service: http://traefik.traefik.svc.cluster.local
  4. Commit + push. ArgoCD syncs Ingress + Cloudflared ConfigMap. cert-manager picks up the new TLS host and issues a Let's Encrypt cert via DNS01 (creates a transient TXT record in Cloudflare via our API token, validates, removes it).
  5. DNS propagation is usually near-instant with Cloudflare; certificate issuance takes ~30s–2 min.
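
Once the sync has finished, the result of the steps above can be checked from a workstation. The following is a minimal sketch, assuming dig, curl, and kubectl are installed and the current kube context points at the cluster. The function name verify_hostname is illustrative and not part of the repo.

```shell
# verify_hostname HOST NAMESPACE
# Read-only checks for a newly added hostname (illustrative helper, not in the repo).
verify_hostname() {
  host="$1"
  ns="$2"

  # 1. DNS: a proxied record resolves to Cloudflare edge IPs, never to the cluster.
  if [ -z "$(dig +short "$host")" ]; then
    echo "dns: no answer for $host"
    return 1
  fi

  # 2. Certificate: list Certificates in the client's namespace.
  #    READY should turn True once issuance completes.
  kubectl -n "$ns" get certificate

  # 3. HTTP: print only the status line of the response.
  curl -sI "https://$host" | head -n 1
}

# Usage: verify_hostname shop.wecare.gr ecommercen-clients-wecare
```

An empty DNS answer stops the check early, since neither the certificate nor the HTTP probe says anything useful until the record exists.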

For tenant apps, where the tunnel hands traffic straight to Traefik (the usual pattern), the Ingress alone is enough: the catch-all path in configmap.yaml that points at Traefik handles the routing. Explicit tunnel entries are mostly for management UIs that bypass Traefik.

Cloudflare Access — Zero Trust for internal UIs

Cloudflare Access sits in front of the cluster's management URLs — ArgoCD, Grafana, Longhorn, Hubble, Keycloak admin, this docs site, etc. Anyone hitting *.ecnv4-mgmt.ecommercen.com without a valid Access session gets the Cloudflare login page (Google SSO, @advisable.com required).

Current Access applications mirror the tunnel entries in cloudflared/configmap.yaml — add a new one whenever you expose a new admin UI via the tunnel:

  1. Cloudflare Zero Trust dashboard → Access → Applications → Add → Self-hosted.
  2. Hostname: the same hostname you added to configmap.yaml.
  3. Policies: usually one Allow policy for emails ending in @advisable.com.
  4. Save. Access is enforced immediately at the edge.

Service Auth tokens (for machine callers — CI, LLM tooling, integration tests) are generated under Service Auth in the Zero Trust dashboard and bound to an Application. The caller sends CF-Access-Client-Id + CF-Access-Client-Secret headers. These tokens sit outside the repo; rotate them in the dashboard and reseal the consuming client's secret if it stores them.
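
For a machine caller, the two headers can be attached with plain curl. This is a minimal sketch, assuming the token pair is exported as CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET (variable names chosen here for illustration) and that the target hostname sits behind an Access application that accepts the token. The helper name access_curl is hypothetical.

```shell
# access_curl URL [extra curl args...]
# Calls an Access-protected URL with a service token (illustrative helper).
access_curl() {
  url="$1"
  shift
  # Fail early rather than send an unauthenticated request.
  if [ -z "${CF_ACCESS_CLIENT_ID:-}" ] || [ -z "${CF_ACCESS_CLIENT_SECRET:-}" ]; then
    echo "CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET must be set" >&2
    return 2
  fi
  curl -sS \
    -H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" \
    -H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
    "$@" "$url"
}

# Usage: access_curl "https://<hostname>/" -o /dev/null -w '%{http_code}\n'
```

Reading the pair from the environment keeps the secret out of shell history and out of the repo, which matches how the tokens are handled elsewhere on this page.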

Certificates — cert-manager does the work

  • Public TLS (browser ↔ Cloudflare edge) is handled by Cloudflare's Universal SSL. We don't touch that.
  • Cluster-internal TLS uses cert-manager + Let's Encrypt + the DNS01 challenge. The Cloudflare API token lives in the sealed secret certmanager-cloudflare-token; cert-manager uses it to create and destroy the _acme-challenge TXT records Let's Encrypt asks for.
  • Renewal is automatic and starts about 30 days before expiry. The ExternalCertExpiringSoon Prometheus alert is the safety net.

You almost never interact with cert-manager directly. If a Certificate isn't issuing, check:

bash
kubectl get certificate -A | grep -v True
kubectl describe certificate <name> -n <namespace>     # events show the ACME state
kubectl -n cert-manager logs deploy/cert-manager --tail=100
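
When the events point at the ACME flow itself, the intermediate resources are worth a look. This is a sketch, assuming the standard cert-manager CRDs (CertificateRequest, Order, Challenge) are installed and dig is available. The helper name acme_debug is hypothetical.

```shell
# acme_debug HOST NAMESPACE
# Shows the in-flight ACME resources and the public DNS01 TXT record (illustrative helper).
acme_debug() {
  host="$1"
  ns="$2"
  # A Certificate spawns a CertificateRequest, then an ACME Order,
  # then a Challenge for each DNS name being validated.
  kubectl -n "$ns" get certificaterequest,order,challenge
  # While a DNS01 challenge is pending, the TXT record should be visible publicly.
  dig +short TXT "_acme-challenge.$host"
}

# Usage: acme_debug shop.wecare.gr ecommercen-clients-wecare
```

A Challenge that stays pending while the TXT lookup returns nothing suggests the Cloudflare API token cannot write to the zone.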

Tunnel health is visible on the tunnel's page in the Zero Trust dashboard (green connectors + recent activity) and via:

bash
kubectl -n cloudflared get pods
kubectl -n cloudflared logs deploy/cloudflared --tail=100

Troubleshooting a down hostname

Walk the path from outside in:

  1. External: check the Cloudflare status page, then run curl -I https://<hostname> from outside the cluster. 5xx → continue. NXDOMAIN → the request never reaches the tunnel; check that the DNS record exists.
  2. Tunnel: kubectl -n cloudflared logs deploy/cloudflared --tail=100. Look for connection failures, route errors, hostname-not-found.
  3. Traefik: kubectl get ingressroute,ingress -A | grep <hostname>. Missing → the Ingress didn't sync. Present → kubectl -n traefik logs deploy/traefik --tail=100 for routing errors.
  4. Backend: once Traefik looks healthy, check the status of the pods behind the target Service.
  • 🟢 Most hostname outages are actually backend outages. Start at step 4 if the alert implicates a specific app.
  • 🟠 Cloudflare proxy mode matters — if you toggle a DNS record from "Proxied" to "DNS only", the request bypasses the tunnel and we have no listener on 443. Leave proxy mode on unless you explicitly want to expose a direct origin IP.
  • 🔴 Don't disable Cloudflare Access on an admin UI "temporarily" to debug. It's easy to forget and the UI is exposed publicly until you remember.
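
The first step of the walk above can be scripted so that the outcome points at the next place to look. This is a minimal sketch, assuming curl is available on a machine outside the cluster. The helper names probe and next_step are illustrative, and the mapping simply restates the steps and notes in this section.

```shell
# probe HOST: prints the HTTP status code, or "dns" when the name does not resolve.
probe() {
  rc=0
  code="$(curl -s -o /dev/null -I -w '%{http_code}' --max-time 10 "https://$1")" || rc=$?
  # curl exits with 6 when it cannot resolve the host.
  if [ "$rc" -eq 6 ]; then
    echo dns
  else
    echo "$code"
  fi
}

# next_step RESULT: maps the probe result to where to look next.
next_step() {
  case "$1" in
    dns) echo "DNS record missing: check the zone in the Cloudflare dashboard" ;;
    5??) echo "continue inward: cloudflared logs, then Traefik, then the backend pods" ;;
    000) echo "no HTTP response: check Cloudflare status and the record's proxy mode" ;;
    *)   echo "responded with $1: not a 5xx, so the tunnel path is likely up" ;;
  esac
}

# Usage: next_step "$(probe shop.wecare.gr)"
```

curl reports 000 when it gets no HTTP response at all, which is what a record switched to "DNS only" would produce, since nothing listens on 443 at the origin.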

For network-level issues (tunnel flapping, proxy protocol mismatch, CoreDNS) delegate to the network-expert agent — it runs Cilium/Hubble diagnostics directly.

Internal documentation — Advisable only