# Add a client
Onboarding a new tenant (its own namespace, MariaDB, Redis, MaxScale, dedicated nodes, autoscaling, dashboards, TLS) is roughly 25 new files and 6 touch-ups to shared infra. The `client-onboarder` Claude agent automates the generation step — your job is mostly filling in a config file and reviewing diffs.
## The short version
- Copy the template: `cp untracked/client-onboarding/config-template.yaml untracked/client-onboarding/<client>.yaml`
- Fill in the YAML (identity, domains, DB sizing, node types — see table below).
- Ask Claude: "Onboard client `<client>` using `untracked/client-onboarding/<client>.yaml`". The `client-onboarder` agent does the rest.
- Review the diff, walk through the delegation checklist the agent produces, commit.
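A filled-in config might look like the sketch below. All values are illustrative, and the section and field names follow the operator table on this page; `config-template.yaml` is the authoritative schema, so check it before trusting any field shown here.

```yaml
# untracked/client-onboarding/acme.yaml -- illustrative example only.
# "acme" and every value below are hypothetical; the template's
# comments are the source of truth for the exact structure.
client:
  name: acme                     # lowercase alphanumeric + hyphens only
  display_name: "ACME GmbH"
domain:
  production: shop.acme.example
  staging: stg.shop.acme.example
database:
  replicas: 3
  storage_size: 100Gi
  innodb_buffer_pool_size: 6G    # rule of thumb: ~80% of the memory limit
  resources:
    limits:
      memory: 8Gi
scaling:
  min_replicas: 4
  max_replicas: 16
  request_rate_threshold: 15     # req/s per pod
nodes:
  db:  { count: 3, type: ccx33 }
  web: { count: 2, type: ccx13, autoscaler_min: 2, autoscaler_max: 6 }
```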
## The workflow

### The config YAML — what you fill in
The template is commented, but here's the operator-view map:
| Section | Fields | Notes |
|---|---|---|
| `client` | `name`, `display_name` | `name` is used everywhere (directory, namespace suffix, node label/taint, Redis prefix, S3 prefix). Lowercase alphanumeric + hyphens only. |
| `domain` | `production`, `staging` | Used in Ingress TLS hosts and the app base URL. Cert-manager issues a Let's Encrypt cert per domain via the DNS01 challenge. |
| `database` | `name`, `replicas`, `image`, `storage_size`, `innodb_buffer_pool_size`, `resources` | Rule of thumb: `innodb_buffer_pool_size` ≈ 80% of the container memory limit. `storage_class: local-path` because DB nodes use NVMe local storage, not Longhorn. |
| `maxscale` | `replicas`, `external_access`, `ip_allowlist` | Turn `external_access` on only if an ERP connector needs to reach the DB from outside the cluster — it opens a TCP IngressRoute with an IP allowlist. |
| `redis.cache` / `redis.session` | `cluster_size`, `sentinel_size`, `eviction_policy`, `resources` | Two independent Redis replication sets: `allkeys-lru` for the application cache, `volatile-lru` for PHP sessions. |
| `app` | `image_tag`, `php_cli_tag`, `php_fpm_tag`, `s3_prefix` | If `s3_prefix` is blank, the agent generates a random hash. This isolates uploaded files per client in the shared S3 bucket. |
| `scaling` | `min_replicas`, `max_replicas`, `cpu_limit`, `request_rate_threshold`, `cpu_threshold` | Becomes the KEDA ScaledObject in `<client>/adveshop4/prod/`. Defaults to 4–16 pods, 15 req/s per pod. |
| `nodes.db` / `nodes.web` | `count`, `type`, `autoscaler_min`, `autoscaler_max` | Hetzner server types: `ccx33` for DB (dedicated vCPU + more RAM), `ccx13` for web. Autoscaler limits apply to the web pool only. |
| `features` | `erp_access`, `phpmyadmin`, `redisinsight`, `physicalbackup`, `grafana_dashboards`, `staging_env` | Per-feature toggles — set `false` to skip generating that component. |
| `cloudflared` | `maxscale_gui`, `redisinsight`, `phpmyadmin` | Hostnames for management tools. Leave blank to auto-generate `<tool>-<client>-ecnv4-mgmt.ecommercen.com`. |
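The `scaling` section ends up as a KEDA ScaledObject. The sketch below shows what such a rendering could look like, assuming a Prometheus trigger for the per-pod request rate and a CPU utilization trigger; the deployment name, Prometheus address, and query are assumptions, since the real manifest is produced by the agent:

```yaml
# Hypothetical rendering of the scaling section into
# <client>/adveshop4/prod/ -- names and query are assumed, not copied
# from the generated manifest.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: adveshop4
spec:
  scaleTargetRef:
    name: adveshop4                # assumed Deployment name
  minReplicaCount: 4               # scaling.min_replicas
  maxReplicaCount: 16              # scaling.max_replicas
  triggers:
    - type: prometheus             # scaling.request_rate_threshold
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed
        query: sum(rate(traefik_service_requests_total{namespace="ecommercen-clients-acme"}[2m]))
        threshold: "15"            # req/s per pod
    - type: cpu                    # scaling.cpu_threshold
      metricType: Utilization
      metadata:
        value: "70"                # assumed percentage
```

KEDA divides the Prometheus query result by the threshold to decide the replica count, which is why the threshold is expressed per pod.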
### What the agent generates
New files (~25): the full `manifests_v1/app-constructs/ecommercen-clients/<client>/` tree — `infrastructure/prod/` (MariaDB, MaxScale, Redis cache + session, phpMyAdmin, RedisInsight, backups), `adveshop4/base/` (the big `app.yaml` + kustomization), `adveshop4/prod/` and `adveshop4/stg/` (overlays with ConfigMap patches, Ingress, KEDA scheduling). Plus two Grafana dashboards and a Kyverno policy.
Modified shared files (~6): Longhorn `values.yaml` (add tolerations for the new taints), cluster-autoscaler `values.yaml` (register the new web pool), Cloudflared `configmap.yaml` (add tunnel entries for the mgmt tools, before the catch-all 404), Kyverno setup kustomization, kube-prometheus-stack dashboards kustomization, Ansible `inventory/hosts.yml`.
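For the Cloudflared change, the entries the agent appends presumably follow the standard cloudflared `ingress` rule format. A sketch, with hostnames following the auto-generated pattern from the config table and the in-cluster service targets assumed:

```yaml
# Sketch of tunnel entries added to the Cloudflared configmap.yaml.
# Service URLs and ports are assumptions; only the hostname pattern
# (<tool>-<client>-ecnv4-mgmt.ecommercen.com) comes from this page.
ingress:
  - hostname: phpmyadmin-acme-ecnv4-mgmt.ecommercen.com
    service: http://phpmyadmin.ecommercen-clients-acme-infrastructure:80
  - hostname: redisinsight-acme-ecnv4-mgmt.ecommercen.com
    service: http://redisinsight.ecommercen-clients-acme-infrastructure:5540
  # ...existing rules for other clients...
  - service: http_status:404     # catch-all must remain the last rule
```

Order matters here: cloudflared evaluates ingress rules top to bottom, so new hostnames must land before the catch-all 404.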
Skeleton secrets: `untracked/secrets/<client>/` — empty-value stubs for `app-secrets`, `app-db-secrets`, `app-keycloak-secrets`, and (if ERP is enabled) `erp-db-secrets`. Also adds commented-out `cluster_seal` lines to `seal.sh` for you to uncomment after filling in real values.
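An empty-value stub plausibly looks like a plain Secret manifest waiting for values before sealing. A sketch, with all key names hypothetical (the agent generates the real set):

```yaml
# Hypothetical shape of untracked/secrets/<client>/app-db-secrets.yaml.
# Key names and namespace are assumptions for illustration only.
apiVersion: v1
kind: Secret
metadata:
  name: app-db-secrets
  namespace: ecommercen-clients-acme
stringData:
  DB_USER: ""          # fill in before uncommenting the seal.sh line
  DB_PASSWORD: ""
```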
The `client-onboarder` agent pauses after generation for you to eyeball the diff before moving on — use `git status` + `git diff`.
## Post-generation checklist
The agent produces a delegation report with four follow-up tasks:
- 🟢 `terraform-manager` — adds the DB and web `hcloud_server` resources to `terraform/main.tf` (with matching `null_resource` provisioners for labels and taints). You then run `./tf.sh plan -out=plans/<name>.tfplan` and `./tf.sh apply`.
- 🟢 `secrets-manager` (client secrets) — you fill in the plaintext values in `untracked/secrets/<client>/`, uncomment the `seal.sh` lines, and it runs the seal + copies everything into place (don't forget the cluster-wide shared secrets: `regcred`, `redis-auth-secret`, `ecn-bucket-access`).
- 🟢 `secrets-manager` (autoscaler config) — registers the new web pool in `build-autoscaler-config.sh` and re-seals the autoscaler config.
- 🟢 `gitops-commit-pusher` — stages everything atomically and commits with a `[App: <client>] Add client …` message. Push once, ArgoCD picks it up.
The ApplicationSet (`appset-ecommercen-clients`) auto-discovers the new `config.json` files — no manual `app-<client>.yaml` to add.
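The auto-discovery presumably relies on an ArgoCD ApplicationSet with a Git `files` generator, which stamps out one Application per matched file. A sketch of that mechanism; the repo URL, paths, and template fields are assumptions based on the layout described above, not the actual manifest:

```yaml
# Hypothetical sketch of appset-ecommercen-clients. Only the
# ApplicationSet name and the config.json convention come from this
# page; everything else is an assumed illustration of the pattern.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: appset-ecommercen-clients
spec:
  generators:
    - git:
        repoURL: https://git.example.com/infra.git     # assumed
        revision: main
        files:
          - path: manifests_v1/app-constructs/ecommercen-clients/*/config.json
  template:
    metadata:
      name: "{{path.basename}}"    # one Application per client directory
    spec:
      project: default
      source:
        repoURL: https://git.example.com/infra.git     # assumed
        targetRevision: main
        path: "{{path.path}}"
      destination:
        server: https://kubernetes.default.svc
```

This is why pushing the commit is the last manual step: committing a new `config.json` is what makes the client appear in ArgoCD.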
### Verification once it's pushed
```sh
# 1. ArgoCD generated the apps?
argocd app list | grep <client>

# 2. Pods in the infrastructure namespace
kubectl -n ecommercen-clients-<client>-infrastructure get pods

# 3. Pods in the app namespace (after the infrastructure pods are Ready)
kubectl -n ecommercen-clients-<client> get pods

# 4. External reachability (once DNS is wired)
curl -sI https://<production-domain> | head -5
```

- 🟠 DNS for the client's domain is not in the repo. Create the CNAME in the Cloudflare dashboard pointing at the tunnel — see DNS & Cloudflare.
- 🟠 Seed data (initial DB dump, S3 media) is outside the onboarder's scope. Coordinate with the dev team before the first external smoke test.
- 🔴 Don't cut Cloudflare DNS over before the app actually reaches `Healthy` in ArgoCD — a dangling CNAME to a 502 is noisier than an NXDOMAIN.
## Further reading
- GitOps & ArgoCD — why the ApplicationSet auto-discovery works
- Scale the cluster — the node pool side of client onboarding
- DNS & Cloudflare — the DNS / tunnel wiring
- Tenants index — auto-generated per-tenant pages (you'll see your new client here after the next `npm run build`)