DNS & Cloudflare

DNS records live in the Cloudflare dashboard; tunnel → service mappings live in this repo. Certificates are issued automatically by cert-manager. This page is the operator's view of where each moving piece lives and how they compose.

Where everything lives

| Thing | Location | Who edits it |
|---|---|---|
| DNS records (A / CNAME) | Cloudflare dashboard, per-zone | Humans in the dashboard |
| Tunnel → service mapping | `manifests_v1/app-constructs/cloudflared/configmap.yaml` | Git commit (GitOps) |
| Tunnel credentials | `manifests_v1/app-constructs/cloudflared/sealed-tunnel-credentials.yaml` | `secrets-manager` agent |
| Public TLS (browser ↔ Cloudflare) | Cloudflare (Universal SSL) | Cloudflare, automatic |
| Internal TLS (tunnel ↔ Traefik ↔ pod) | cert-manager, Let's Encrypt, DNS01 via Cloudflare | cert-manager, automatic |
| Ingress / IngressRoute | `manifests_v1/app-constructs/<client>/.../ingress*.yaml` | Git commit (GitOps) |
| Cloudflare Access policies (Zero Trust) | Cloudflare Zero Trust dashboard | Humans in the dashboard |

We own these zones in Cloudflare: wecare.gr, advisable.com, ecommercen.com. DNS for any other domain is managed elsewhere, usually at a client-owned registrar.
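Whether a hostname's DNS is ours to edit comes down to a suffix match against that zone list. A minimal sketch; `OUR_ZONES` and `zone_for_host` are hypothetical names, and the array has to be kept in sync with the zone list by hand:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Cloudflare zone we run for a hostname, if any.
OUR_ZONES=(wecare.gr advisable.com ecommercen.com)   # copied from the docs by hand

zone_for_host() {
  local host="${1%.}" zone   # tolerate a trailing dot (FQDN form)
  for zone in "${OUR_ZONES[@]}"; do
    # Match the apex itself or any name ending in ".<zone>", never a bare
    # suffix, so "notwecare.gr" does not match "wecare.gr".
    if [[ "$host" == "$zone" || "$host" == *".$zone" ]]; then
      printf '%s\n' "$zone"
      return 0
    fi
  done
  return 1   # not one of ours: DNS is managed elsewhere (usually the client's registrar)
}
```

For example, `zone_for_host shop.wecare.gr` prints `wecare.gr`, so the record goes into that zone in the dashboard.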

The traffic path, in one diagram

See Ingress, TLS & Cloudflare for the full path. Short version: browser → Cloudflare edge → outbound tunnel → cloudflared pod in cluster → Traefik → Service → app pod. No inbound ports open on Hetzner; everything comes through the tunnel.

Adding a new hostname for an existing client

Scenario: shop.wecare.gr needs to route to the existing app-svc Service in the ecommercen-clients-wecare namespace.

Cloudflare dashboard → DNS for the wecare.gr zone: add a CNAME record for shop pointing at the tunnel hostname (Cloudflare shows the target under Zero Trust → Networks → Tunnels). Leave Proxied (orange cloud) on.
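A Cloudflare Tunnel's CNAME target has the form `<tunnel-UUID>.cfargotunnel.com`. A minimal sketch that turns the tunnel ID from the Tunnels page into the record to enter in the dashboard; `tunnel_cname_record` is a hypothetical helper, and it deliberately does not call the Cloudflare API, since DNS edits stay manual here:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the DNS record to create for a tunnel-backed hostname.
# Usage: tunnel_cname_record <record-name> <tunnel-uuid>
tunnel_cname_record() {
  local name="$1" uuid="$2"
  # Tunnel IDs are UUIDs; rejecting anything else catches copy/paste mistakes.
  local re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  if [[ -z "$name" || ! "$uuid" =~ $re ]]; then
    echo "usage: tunnel_cname_record <record-name> <tunnel-uuid>" >&2
    return 1
  fi
  printf 'CNAME %s -> %s.cfargotunnel.com (Proxied: on)\n' "$name" "$uuid"
}
```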

Repo: add the hostname to the client's Ingress, typically alongside the existing entries in `manifests_v1/app-constructs/ecommercen-clients/wecare/adveshop4/prod/ingress.yaml`. It goes in both `tls.hosts` (so cert-manager covers it) and `rules` (so Traefik routes it):

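```yaml
spec:
  tls:
    - hosts:
        - www.wecare.gr
        - shop.wecare.gr          # new: cert-manager adds it to the certificate
      secretName: new-wecare-gr-tls
  rules:
    # ...existing rule(s) for www.wecare.gr stay as they are
    - host: shop.wecare.gr        # new
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-svc
                port: { number: 80 }
```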
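A host listed in `rules` but missing from `tls.hosts` is still routed by Traefik, but cert-manager won't include it in the certificate. A minimal sketch that compares the two lists; the helper name is hypothetical, `<ingress-name>` is a placeholder, and it only checks exact matches (a wildcard TLS host would be reported as missing):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every rule host that no tls[].hosts entry lists.
# Both arguments are space-separated host lists.
missing_tls_hosts() {
  local tls=" $1 " host
  local -a rule_hosts
  read -ra rule_hosts <<< "$2"   # split into an array without glob-expanding "*.x" hosts
  for host in "${rule_hosts[@]}"; do
    # Space padding keeps "shop.wecare.gr" from matching inside a longer name.
    [[ "$tls" == *" $host "* ]] || printf '%s\n' "$host"
  done
}

# Example against the live object:
#   ns=ecommercen-clients-wecare
#   tls=$(kubectl -n "$ns" get ingress <ingress-name> -o jsonpath='{.spec.tls[*].hosts[*]}')
#   rules=$(kubectl -n "$ns" get ingress <ingress-name> -o jsonpath='{.spec.rules[*].host}')
#   missing_tls_hosts "$tls" "$rules"
```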

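If the tunnel doesn't already route that hostname (most don't: the shared tunnel mostly carries `*-ecnv4-mgmt.ecommercen.com` management tools), add an ingress rule to `cloudflared/configmap.yaml` above the catch-all 404 entry. cloudflared evaluates the rules top to bottom and the first match wins, so anything placed below the catch-all is never reached: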
```yaml
- hostname: shop.wecare.gr
  service: http://traefik.traefik.svc.cluster.local
# ...the existing catch-all stays last:
- service: http_status:404
```
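The first-match ordering can be sketched in a few lines of bash. This is an illustration, not cloudflared's implementation: the one-pair-per-line rule format and the function name are assumptions, and bash glob matching only approximates cloudflared's hostname wildcards:

```bash
#!/usr/bin/env bash
# Sketch: emulate cloudflared's top-to-bottom, first-match ingress evaluation.
# Simplified, hypothetical input: one "<hostname> <service>" pair per line,
# with "*" standing in for the catch-all rule that has no hostname.
# Real cloudflared rules can also match on path; that is ignored here.
first_matching_service() {
  local host="$1" rules_file="$2" pattern service
  while read -r pattern service || [[ -n "$pattern" ]]; do
    [[ -z "$pattern" || "$pattern" == \#* ]] && continue   # skip blanks and comments
    # $pattern is deliberately unquoted so [[ ]] treats it as a glob.
    if [[ "$host" == $pattern ]]; then
      printf '%s\n' "$service"
      return 0
    fi
  done < "$rules_file"
  return 1   # no rule matched (a real config always ends in a catch-all)
}
```

With `shop.wecare.gr` listed below the `*` line, `first_matching_service shop.wecare.gr <file>` returns `http_status:404`; moved above it, it returns the Traefik service. If you have the cloudflared CLI locally, `cloudflared tunnel ingress rule https://shop.wecare.gr` reports which rule a URL hits and `cloudflared tunnel ingress validate` checks the file, assuming you first extract `config.yaml` from the ConfigMap.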
Commit and push. ArgoCD syncs the Ingress and the cloudflared ConfigMap. cert-manager picks up the new TLS host and issues a Let's Encrypt certificate via DNS01: it creates a transient `_acme-challenge` TXT record in Cloudflare with our API token, waits for validation, then removes the record.
DNS propagation is usually near-instant with Cloudflare; certificate issuance takes ~30s–2 min.
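The transient record lives at `_acme-challenge.<hostname>`; for a wildcard name the leading `*.` label is dropped (RFC 8555). A small sketch for working out which name to watch with `dig` while a Certificate is pending; `acme_challenge_name` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: TXT record name an ACME DNS01 challenge uses for a host.
acme_challenge_name() {
  local host="$1"
  if [[ "$host" == '*.'* ]]; then
    host="${host:2}"   # a wildcard "*.example.com" is validated at "example.com"
  fi
  printf '_acme-challenge.%s\n' "$host"
}

# While issuance is in progress the record should briefly appear:
#   dig +short TXT "$(acme_challenge_name shop.wecare.gr)"
```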

For traffic the tunnel sends straight to Traefik (the usual pattern for tenant apps), the Ingress is enough: the catch-all rule in configmap.yaml that points at Traefik handles the routing. Explicit tunnel entries are mostly for management UIs that bypass Traefik.

Cloudflare Access — Zero Trust for internal UIs

Cloudflare Access sits in front of the cluster's management URLs: ArgoCD, Grafana, Longhorn, Hubble, Keycloak admin, this docs site, etc. Anyone hitting a `*-ecnv4-mgmt.ecommercen.com` hostname without a valid Access session gets the Cloudflare login page (Google SSO, an @advisable.com account required).

The existing Access applications mirror the tunnel entries in cloudflared/configmap.yaml. Add a new one whenever you expose a new admin UI through the tunnel:

1. Cloudflare Zero Trust dashboard → Access → Applications → Add → Self-hosted.
2. Hostname: the same hostname you added to configmap.yaml.
3. Policies: usually one Allow policy for emails ending in @advisable.com.
4. Save. Access is enforced immediately at the edge.

Service Auth tokens (for machine callers such as CI, LLM tooling and integration tests) are generated under Service Auth in the Zero Trust dashboard and bound to an Application through a Service Auth policy. The caller sends the CF-Access-Client-Id and CF-Access-Client-Secret headers. These tokens live outside the repo; rotate them in the dashboard and reseal the consuming client's secret if it stores them.
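A minimal sketch of a machine caller, assuming the token's ID and secret are already in environment variables; `cf_access_curl` and both variable names are hypothetical, only the two header names come from Access itself:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: curl an Access-protected URL with a Service Auth token.
# CF_ACCESS_CLIENT_ID / CF_ACCESS_CLIENT_SECRET are assumed variable names.
cf_access_curl() {
  if [[ -z "${CF_ACCESS_CLIENT_ID:-}" || -z "${CF_ACCESS_CLIENT_SECRET:-}" ]]; then
    echo "cf_access_curl: CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET must be set" >&2
    return 1
  fi
  # Access checks these headers at the edge, before the request enters the tunnel.
  curl --silent --show-error \
    --header "CF-Access-Client-Id: ${CF_ACCESS_CLIENT_ID}" \
    --header "CF-Access-Client-Secret: ${CF_ACCESS_CLIENT_SECRET}" \
    "$@"
}

# Example (hostname is a placeholder):
#   cf_access_curl -I "https://<admin-hostname>/"
```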

Certificates — cert-manager does the work

Public TLS (browser ↔ Cloudflare edge) is handled by Cloudflare's Universal SSL. We don't touch that.
Cluster-internal TLS uses cert-manager + Let's Encrypt + the DNS01 challenge. The Cloudflare API token lives in the sealed secret certmanager-cloudflare-token; cert-manager uses it to create and destroy the _acme-challenge TXT records Let's Encrypt asks for.
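Renewals are automatic, about 30 days before expiry (by default cert-manager renews two-thirds of the way through a certificate's lifetime, which for a 90-day Let's Encrypt certificate is day 60). The ExternalCertExpiringSoon Prometheus alert is the safety net.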

You almost never interact with cert-manager directly. If a Certificate isn't issuing, check:

```bash
kubectl get certificate -A | grep -v True
kubectl describe certificate <name> -n <namespace>     # events show the ACME state
kubectl -n cert-manager logs deploy/cert-manager --tail=100
```
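`grep -v True` also prints the header row and matches on any column. A slightly stricter sketch compares only the READY column; it assumes the default column layout of `kubectl get certificate -A` (NAMESPACE, NAME, READY, SECRET, AGE), which is worth confirming against our cert-manager version, and the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical filter: print namespace/name of every Certificate whose READY
# column is not "True". Reads `kubectl get certificate -A` output on stdin.
not_ready_certs() {
  # NR > 1 skips the header; $3 is READY in the assumed default layout.
  awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }'
}

# kubectl get certificate -A | not_ready_certs
```

To block until a single Certificate is issued instead, stock kubectl can wait on the condition: `kubectl wait --for=condition=Ready certificate/<name> -n <namespace> --timeout=5m`.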

The Cloudflare Tunnel health itself is visible on the tunnel's page in the Zero Trust dashboard (green connectors + recent activity) and via:

```bash
kubectl -n cloudflared get pods
kubectl -n cloudflared logs deploy/cloudflared --tail=100
```

Troubleshooting a down hostname

Walk the path from outside in:

1. External: check the Cloudflare status page, then run `curl -I https://<hostname>` from outside the cluster. 5xx → continue. DNS NXDOMAIN → the DNS record was never created (check that it exists in the zone).
2. Tunnel: `kubectl -n cloudflared logs deploy/cloudflared --tail=100`. Look for connection failures, route errors, hostname-not-found.
3. Traefik: `kubectl get ingress -A | grep <hostname>` (for IngressRoutes, whose host lives in the match rule: `kubectl get ingressroute -A -o yaml | grep <hostname>`). Missing → the Ingress didn't sync. Present → `kubectl -n traefik logs deploy/traefik --tail=100` for routing errors.
4. Backend: once Traefik is routing, check the status of the target Service's pods.
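The outside-in walk can start from the status code of step 1's `curl`. This is a heuristic sketch: the status-to-step mapping is an assumption built from the path described here (530 is the status Cloudflare typically returns with its tunnel error pages; a 404 can come from either the tunnel's catch-all or Traefik), and `triage_hint` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest where to continue the walk for a status code.
triage_hint() {
  case "$1" in
    000)     echo "step 1: no HTTP response at all; check that the DNS record exists" ;;
    530)     echo "step 2: Cloudflare could not reach the tunnel; check cloudflared" ;;
    404)     echo "steps 2-3: tunnel catch-all 404 or no Traefik route; check both" ;;
    5??)     echo "steps 2-4: 5xx; walk tunnel, then Traefik, then the backend" ;;
    2??|3??) echo "step 4: the edge answers; look at the app itself" ;;
    *)       echo "unexpected status '$1'; start at step 1" ;;
  esac
}

# curl prints 000 for http_code when it gets no response (e.g. NXDOMAIN):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -I "https://<hostname>")
#   triage_hint "$code"
```

On Access-protected hostnames an unauthenticated request is redirected by Access itself, so a 3xx there says nothing about the app; repeat the check with a Service Auth token.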

🟢 Most hostname outages are actually backend outages. Start at step 4 if the alert implicates a specific app.
🟠 Cloudflare proxy mode matters — if you toggle a DNS record from "Proxied" to "DNS only", the request bypasses the tunnel and we have no listener on 443. Leave proxy mode on unless you explicitly want to expose a direct origin IP.
🔴 Don't disable Cloudflare Access on an admin UI "temporarily" to debug. It's easy to forget and the UI is exposed publicly until you remember.

For network-level issues (tunnel flapping, proxy protocol mismatch, CoreDNS) delegate to the network-expert agent — it runs Cilium/Hubble diagnostics directly.
