# Node Autoscaling
## Overview
The cluster-autoscaler with the Hetzner Cloud provider dynamically provisions and decommissions worker nodes based on pod scheduling demand. Each client gets a dedicated node pool — their `app-web` pods run only on nodes in their pool, and no other workloads can land on those nodes.
## How scaling happens

Scale-up: when a pod is Pending because no existing node satisfies its nodeSelector and tolerations, the autoscaler matches it to a pool, calls the Hetzner API to create a VM, and the pool's cloud-init joins the new node to the cluster.

Scale-down follows the reverse path: when a node has been under-utilised (< 0.5 utilization) for 10 minutes, the autoscaler cordons it, drains its pods, then calls the Hetzner API to destroy the VM.
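These thresholds map to standard upstream cluster-autoscaler flags. A sketch of how they would appear under the Helm chart's `extraArgs` (the flag names are upstream; their exact placement and values in this repo's values.yaml are assumptions):

```yaml
# Sketch only — standard cluster-autoscaler flags via the Helm chart's extraArgs.
extraArgs:
  scale-down-unneeded-time: 10m          # node idle this long before removal
  scale-down-utilization-threshold: "0.5" # below this, a node counts as unneeded
  max-node-provision-time: 45m            # give slow cloud-init time to finish
```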
## Architecture: Dedicated Pool Per Client
Each client's node pool is isolated via a label + taint pair:
| Client | Pool name | Node label | Node taint | Instance type |
|---|---|---|---|---|
| wecare | `wecare-web` | `wecare=""` | `wecare:NoSchedule` | CCX13 |
The client's `app-web` Deployment targets the pool with:

```yaml
nodeSelector:
  <client>: ""
tolerations:
  - key: <client>
    operator: Exists
    effect: NoSchedule
```

### Why dedicated pools?
- Performance isolation — one client's traffic spike never affects another
- Independent scaling — each pool has its own `minSize`/`maxSize`
- Cost attribution — easy to track per-client infra spend
### Why not `node-role.kubernetes.io/app`?
The kubelet forbids nodes from self-assigning labels in the `kubernetes.io` namespace (security restriction since K8s 1.24). Only a specific allowlist is permitted (`node.kubernetes.io/*`, `kubelet.kubernetes.io/*`, `topology.kubernetes.io/*`, etc.). Custom labels like `wecare=""` have no such restriction.
Terraform-managed nodes can use `node-role.kubernetes.io/app` because it's applied post-join via `kubectl label` from the control plane, bypassing kubelet validation. Autoscaler nodes set labels at kubelet startup via `--node-labels`, which is subject to the restriction.
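Concretely, the pool's cloud-init writes the label and taint into the RKE2 agent config, which RKE2 forwards to the kubelet at registration. A sketch (`node-label`/`node-taint` are standard RKE2 config keys; the endpoint, token, and exact template used in this repo are placeholders/assumptions):

```yaml
# Sketch: /etc/rancher/rke2/config.yaml written by the pool's cloud-init.
# RKE2 passes node-label through to kubelet --node-labels at startup.
server: https://<control-plane>:9345   # placeholder join endpoint
token: <join-token>                    # placeholder
node-label:
  - "wecare="                          # becomes the wecare="" label
node-taint:
  - "wecare=:NoSchedule"               # key=value:Effect syntax
```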
## Components
### 1. Autoscaler Helm values — `app-constructs/cluster-autoscaler/values.yaml`

Defines the autoscaling groups (one per client pool):

```yaml
autoscalingGroups:
  - name: wecare-web     # Must match nodeConfigs key in the secret
    minSize: 0
    maxSize: 4
    instanceType: CCX13  # 2 dedicated AMD vCPUs, 8 GB RAM
    region: hel1
```

### 2. Cluster config secret — `untracked/secrets/build-autoscaler-config.sh`
Builds the `cluster-autoscaler-config` Secret containing `HCLOUD_CLUSTER_CONFIG`: a JSON blob with per-pool cloud-init, labels, and taints.
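The resulting Secret has roughly this shape (the `data.config` key holding the doubly-encoded JSON is stated elsewhere in this doc; the rest of the manifest sketch is an assumption):

```yaml
# Sketch of the Secret that build-autoscaler-config.sh produces.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-autoscaler-config
type: Opaque
data:
  config: <base64(base64(JSON))>   # doubly-encoded cluster config blob
```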
```bash
# Rebuild after any cloud-init or label/taint change:
source ../../scripts/bw-unlock.sh && ./build-autoscaler-config.sh
```

The JSON structure:
```json
{
  "imagesForArch": {"amd64": "ubuntu-24.04"},
  "nodeConfigs": {
    "wecare-web": {
      "cloudInit": "<raw cloud-init YAML>",
      "labels": {"wecare": "", "topology.kubernetes.io/region": "hel1", ...},
      "taints": [{"key": "wecare", "value": "", "effect": "NoSchedule"}]
    }
  }
}
```

Encoding chain: `data.config = base64(base64(JSON))` — K8s decodes the outer base64 when exposing the Secret as an env var; the Hetzner provider decodes the inner base64 before JSON-parsing.
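The encoding chain can be verified with a quick round-trip (a minimal stand-in JSON, not the real config; `base64 -w0` is GNU coreutils — plain `base64` on macOS already omits line wrapping):

```shell
# Minimal stand-in for the cluster config JSON
json='{"nodeConfigs":{"wecare-web":{"labels":{"wecare":""}}}}'

# What would land in data.config: base64(base64(JSON))
config=$(printf '%s' "$json" | base64 -w0 | base64 -w0)

# K8s strips the outer layer, the Hetzner provider the inner one:
printf '%s' "$config" | base64 -d | base64 -d   # prints the original JSON
```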
`cloudInit` must be raw YAML, not base64. The provider passes it directly to the Hetzner API `UserData` field.
### 3. Scheduling patch — per-client overlay

Example: `app-constructs/ecommercen-clients/wecare/adveshop4/prod/app-web-scheduling-patch.yaml`
```yaml
spec:
  template:
    spec:
      nodeSelector:
        wecare: ""
      tolerations:
        - key: wecare
          operator: Exists
          effect: NoSchedule
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
```

## Adding a New Client Pool
1. `values.yaml` — add an autoscaling group:

   ```yaml
   autoscalingGroups:
     - name: wecare-web
       ...
     - name: clientb-web
       minSize: 0
       maxSize: 2
       instanceType: CCX13
       region: hel1
   ```

2. `build-autoscaler-config.sh` — add a `nodeConfigs` entry with the client's cloud-init (same template, different label/taint in the RKE2 config), labels, and taints. The `nodeConfigs` key must match the pool name from step 1.

3. Client scheduling patch — set `nodeSelector: {clientb: ""}` and tolerate `clientb:NoSchedule`.

4. Rebuild and seal the secret, commit, push.
## Operational Notes
- Cloud-init takes ~20 min on CCX13 (package update + RKE2 download + image pull). Set `max-node-provision-time` accordingly (currently 45m for safety; reduce to 25m once stable).
- Scale-down is enabled with a 10m unneeded time and a 0.5 utilization threshold.
- Hetzner API quirk: `user_data` and `ssh_keys` are write-only fields — they're set at server creation but not returned by `GET /servers/{id}`.
- To debug a stuck node: `hcloud server enable-rescue <name>` + reboot, then SSH in. The node's filesystem is mounted at `/mnt`.
- The autoscaler itself runs on control plane nodes (nodeSelector + toleration).