# Add a client
Onboarding a new tenant (its own namespace, MariaDB, Redis, MaxScale, dedicated nodes, autoscaling, dashboards, TLS) is roughly 25 new files and 6 touch-ups to shared infra. The `client-onboarder` Claude agent automates the generation step — your job is mostly filling in a config file and reviewing diffs.
## The short version
- Copy the template: `cp untracked/client-onboarding/config-template.yaml untracked/client-onboarding/<client>.yaml`
- Fill in the YAML (identity, domains, DB sizing, node types — see table below).
- Ask Claude: "Onboard client `<client>` using `untracked/client-onboarding/<client>.yaml`". The `client-onboarder` agent does the rest.
- Review the diff, walk through the delegation checklist the agent produces, commit.
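A filled-in config might look like the sketch below. All values are illustrative, and the section and field names follow the operator table on this page; `config-template.yaml` is the authoritative schema, so check it before trusting any field shown here.

```yaml
# untracked/client-onboarding/acme.yaml -- illustrative example only.
# "acme" and every value below are hypothetical; the template's
# comments are the source of truth for the exact structure.
client:
  name: acme                     # lowercase alphanumeric + hyphens only
  display_name: "ACME GmbH"
domain:
  production: shop.acme.example
  staging: stg.shop.acme.example
database:
  replicas: 3
  storage_size: 100Gi
  innodb_buffer_pool_size: 6G    # rule of thumb: ~80% of the memory limit
  resources:
    limits:
      memory: 8Gi
scaling:
  min_replicas: 4
  max_replicas: 16
  request_rate_threshold: 15     # req/s per pod
nodes:
  db:  { count: 3, type: ccx33 }
  web: { count: 2, type: ccx13, autoscaler_min: 2, autoscaler_max: 6 }
```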
## The workflow

### The config YAML — what you fill in
The template is commented, but here's the operator-view map:
| Section | Fields | Notes |
|---|---|---|
| `client` | `name`, `display_name` | `name` is used everywhere (directory, namespace suffix, node label/taint, Redis prefix, S3 prefix). Lowercase alphanumeric + hyphens only. |
| `domain` | `production`, `staging` | Used in Ingress TLS hosts and the app base URL. Cert-manager issues a Let's Encrypt cert per domain via the DNS01 challenge. |
| `database` | `name`, `replicas`, `image`, `storage_size`, `innodb_buffer_pool_size`, `resources` | Rule of thumb: `innodb_buffer_pool_size` ≈ 80% of the container memory limit. `storage_class: local-path` because DB nodes use NVMe local storage, not Longhorn. |
| `maxscale` | `replicas`, `external_access`, `ip_allowlist` | Turn `external_access` on only if an ERP connector needs to reach the DB from outside the cluster — it opens a TCP IngressRoute with an IP allowlist. |
| `redis.cache` / `redis.session` | `cluster_size`, `sentinel_size`, `eviction_policy`, `resources` | Two independent Redis replication sets: `allkeys-lru` for the application cache, `volatile-lru` for PHP sessions. |
| `app` | `image_tag`, `php_cli_tag`, `php_fpm_tag`, `s3_prefix` | If `s3_prefix` is blank, the agent generates a random hash. This isolates uploaded files per client in the shared S3 bucket. |
| `scaling` | `min_replicas`, `max_replicas`, `cpu_limit`, `request_rate_threshold`, `cpu_threshold` | Becomes the KEDA ScaledObject in `<client>/adveshop4/prod/`. Defaults to 4–16 pods, 15 req/s per pod. |
| `nodes.db` / `nodes.web` | `count`, `type`, `autoscaler_min`, `autoscaler_max` | Hetzner server types: `ccx33` for DB (dedicated vCPU + more RAM), `ccx13` for web. Autoscaler limits apply to the web pool only. |
| `features` | `erp_access`, `phpmyadmin`, `redisinsight`, `physicalbackup`, `grafana_dashboards`, `staging_env` | Per-feature toggles — set `false` to skip generating that component. |
| `cloudflared` | `maxscale_gui`, `redisinsight`, `phpmyadmin` | Hostnames for management tools. Leave blank to auto-generate `<tool>-<client>-ecnv4-mgmt.ecommercen.com`. |
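The `scaling` section ends up as a KEDA ScaledObject. The sketch below shows what such a rendering could look like, assuming a Prometheus trigger for the per-pod request rate and a CPU utilization trigger; the deployment name, Prometheus address, and query are assumptions, since the real manifest is produced by the agent:

```yaml
# Hypothetical rendering of the scaling section into
# <client>/adveshop4/prod/ -- names and query are assumed, not copied
# from the generated manifest.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: adveshop4
spec:
  scaleTargetRef:
    name: adveshop4                # assumed Deployment name
  minReplicaCount: 4               # scaling.min_replicas
  maxReplicaCount: 16              # scaling.max_replicas
  triggers:
    - type: prometheus             # scaling.request_rate_threshold
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed
        query: sum(rate(traefik_service_requests_total{namespace="ecommercen-clients-acme"}[2m]))
        threshold: "15"            # req/s per pod
    - type: cpu                    # scaling.cpu_threshold
      metricType: Utilization
      metadata:
        value: "70"                # assumed percentage
```

KEDA divides the Prometheus query result by the threshold to decide the replica count, which is why the threshold is expressed per pod.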
### What the agent generates
New files (~25): the full `manifests_v1/app-constructs/ecommercen-clients/<client>/` tree — `infrastructure/prod/` (MariaDB, MaxScale, Redis cache + session, phpMyAdmin, RedisInsight, backups), `adveshop4/base/` (the big `app.yaml` + kustomization), `adveshop4/prod/` and `adveshop4/stg/` (overlays with ConfigMap patches, Ingress, KEDA scheduling). Plus two Grafana dashboards and a Kyverno policy.
Modified shared files (~6): Longhorn `values.yaml` (add tolerations for the new taints), cluster-autoscaler `values.yaml` (register the new web pool), Cloudflared `configmap.yaml` (add tunnel entries for the mgmt tools, before the catch-all 404), Kyverno setup kustomization, kube-prometheus-stack dashboards kustomization, Ansible `inventory/hosts.yml`.
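For the Cloudflared change, the entries the agent appends presumably follow the standard cloudflared `ingress` rule format. A sketch, with hostnames following the auto-generated pattern from the config table and the in-cluster service targets assumed:

```yaml
# Sketch of tunnel entries added to the Cloudflared configmap.yaml.
# Service URLs and ports are assumptions; only the hostname pattern
# (<tool>-<client>-ecnv4-mgmt.ecommercen.com) comes from this page.
ingress:
  - hostname: phpmyadmin-acme-ecnv4-mgmt.ecommercen.com
    service: http://phpmyadmin.ecommercen-clients-acme-infrastructure:80
  - hostname: redisinsight-acme-ecnv4-mgmt.ecommercen.com
    service: http://redisinsight.ecommercen-clients-acme-infrastructure:5540
  # ...existing rules for other clients...
  - service: http_status:404     # catch-all must remain the last rule
```

Order matters here: cloudflared evaluates ingress rules top to bottom, so new hostnames must land before the catch-all 404.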
Skeleton secrets: `untracked/secrets/<client>/` — empty-value stubs for `app-secrets`, `app-db-secrets`, `app-keycloak-secrets`, and (if ERP is enabled) `erp-db-secrets`. Also adds commented-out `cluster_seal` lines to `seal.sh` for you to uncomment after filling in real values.
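An empty-value stub plausibly looks like a plain Secret manifest waiting for values before sealing. A sketch, with all key names hypothetical (the agent generates the real set):

```yaml
# Hypothetical shape of untracked/secrets/<client>/app-db-secrets.yaml.
# Key names and namespace are assumptions for illustration only.
apiVersion: v1
kind: Secret
metadata:
  name: app-db-secrets
  namespace: ecommercen-clients-acme
stringData:
  DB_USER: ""          # fill in before uncommenting the seal.sh line
  DB_PASSWORD: ""
```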
The `client-onboarder` agent pauses after generation for you to eyeball the diff before moving on — use `git status` + `git diff`.
## Post-generation checklist
The agent produces a delegation report with four follow-up tasks:
- 🟢 `terraform-manager` — adds the DB and web `hcloud_server` resources to `terraform/main.tf` (with matching `null_resource` provisioners for labels and taints). You then run `./tf.sh plan -out=plans/<name>.tfplan` and `./tf.sh apply`.
- 🟢 `secrets-manager` (client secrets) — you fill in the plaintext values in `untracked/secrets/<client>/`, uncomment the `seal.sh` lines, and it runs the seal + copies everything into place (don't forget the cluster-wide shared secrets: `regcred`, `redis-auth-secret`, `ecn-bucket-access`).
- 🟢 `secrets-manager` (autoscaler config) — registers the new web pool in `build-autoscaler-config.sh` and re-seals the autoscaler config.
- 🟢 `gitops-commit-pusher` — stages everything atomically and commits with a `[App: <client>] Add client …` message. Push once, ArgoCD picks it up.
The ApplicationSet (`appset-ecommercen-clients`) auto-discovers the new `config.json` files — no manual `app-<client>.yaml` to add.
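The auto-discovery presumably relies on an ArgoCD ApplicationSet with a Git `files` generator, which stamps out one Application per matched file. A sketch of that mechanism; the repo URL, paths, and template fields are assumptions based on the layout described above, not the actual manifest:

```yaml
# Hypothetical sketch of appset-ecommercen-clients. Only the
# ApplicationSet name and the config.json convention come from this
# page; everything else is an assumed illustration of the pattern.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: appset-ecommercen-clients
spec:
  generators:
    - git:
        repoURL: https://git.example.com/infra.git     # assumed
        revision: main
        files:
          - path: manifests_v1/app-constructs/ecommercen-clients/*/config.json
  template:
    metadata:
      name: "{{path.basename}}"    # one Application per client directory
    spec:
      project: default
      source:
        repoURL: https://git.example.com/infra.git     # assumed
        targetRevision: main
        path: "{{path.path}}"
      destination:
        server: https://kubernetes.default.svc
```

This is why pushing the commit is the last manual step: committing a new `config.json` is what makes the client appear in ArgoCD.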
### Verification once it's pushed
```sh
# 1. ArgoCD generated the apps?
argocd app list | grep <client>

# 2. Pods in the infrastructure namespace
kubectl -n ecommercen-clients-<client>-infrastructure get pods

# 3. Pods in the app namespace (after the infrastructure pods are Ready)
kubectl -n ecommercen-clients-<client> get pods

# 4. External reachability (once DNS is wired)
curl -sI https://<production-domain> | head -5
```

- 🟠 DNS for the client's domain is not in the repo. Create the CNAME in the Cloudflare dashboard pointing at the tunnel — see DNS & Cloudflare.
- 🟠 Seed data (initial DB dump, S3 media) is outside the onboarder's scope. Coordinate with the dev team before the first external smoke test.
- 🔴 Don't cut Cloudflare DNS over before the app actually reaches `Healthy` in ArgoCD — a dangling CNAME to a 502 is noisier than an NXDOMAIN.
## Further reading
- GitOps & ArgoCD — why the ApplicationSet auto-discovery works
- Scale the cluster — the node pool side of client onboarding
- DNS & Cloudflare — the DNS / tunnel wiring
- Tenants index — auto-generated per-tenant pages (you'll see your new client here after the next `npm run build`)