Node Autoscaling

Overview

The cluster-autoscaler with Hetzner Cloud provider dynamically provisions and decommissions worker nodes based on pod scheduling demand. Each client gets a dedicated node pool — their app-web pods only run on nodes in their pool, and no other workloads can land on those nodes.

How scale-up happens

When a pod is unschedulable (for example, an app-web pod stays Pending because no existing node matches its nodeSelector and tolerations), the autoscaler simulates scheduling against each pool's node template. If the pod fits a pool, the autoscaler calls the Hetzner API to create a VM in that pool; cloud-init installs RKE2 and joins the node with the pool's labels and taints, and the pending pod is scheduled onto it.

Scale-down follows the reverse path: when a node's utilisation has stayed below the 0.5 threshold for 10 minutes, the autoscaler cordons the node, drains its pods, then calls the Hetzner API to destroy the VM.

Architecture: Dedicated Pool Per Client

Each client's node pool is isolated via a label + taint pair:

| Client | Pool name  | Node label | Node taint        | Instance type |
|--------|------------|------------|-------------------|---------------|
| wecare | wecare-web | wecare=""  | wecare:NoSchedule | CCX13         |

The client's app-web Deployment targets the pool with:

yaml
nodeSelector:
  <client>: ""
tolerations:
  - key: <client>
    operator: Exists
    effect: NoSchedule

Why dedicated pools?

  • Performance isolation — one client's traffic spike never affects another
  • Independent scaling — each pool has its own minSize/maxSize
  • Cost attribution — easy to track per-client infra spend

Why not node-role.kubernetes.io/app?

Kubelet forbids nodes from self-assigning labels in the kubernetes.io namespace (security restriction since K8s 1.24). Only a specific allowlist is permitted (node.kubernetes.io/*, kubelet.kubernetes.io/*, topology.kubernetes.io/*, etc.). Custom labels like wecare="" have no such restriction.

Terraform-managed nodes can use node-role.kubernetes.io/app because it's applied post-join via kubectl label from the control plane, bypassing kubelet validation. Autoscaler nodes set labels at kubelet startup via --node-labels, which is subject to the restriction.
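Concretely, the cloud-init for an autoscaler node can set the custom label and taint in the RKE2 agent config, which RKE2 forwards to the kubelet at startup. A sketch (the file path is standard RKE2; the values mirror the wecare pool):

```yaml
# /etc/rancher/rke2/config.yaml (written by cloud-init)
# Custom label outside the kubernetes.io namespace — allowed at kubelet startup.
node-label:
  - "wecare="
# key[=value]:effect syntax; empty value matches the taint in the pool table.
node-taint:
  - "wecare=:NoSchedule"
```

Trying to put node-role.kubernetes.io/app in node-label here is exactly what the kubelet restriction rejects.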

Components

1. Autoscaler Helm values — app-constructs/cluster-autoscaler/values.yaml

Defines the autoscaling groups (one per client pool):

yaml
autoscalingGroups:
  - name: wecare-web      # Must match nodeConfigs key in the secret
    minSize: 0
    maxSize: 4
    instanceType: CCX13   # 2 dedicated AMD vCPU, 8 GB RAM
    region: hel1

2. Cluster config secret — untracked/secrets/build-autoscaler-config.sh

Builds the cluster-autoscaler-config Secret containing HCLOUD_CLUSTER_CONFIG: a JSON blob with per-pool cloud-init, labels, and taints.

bash
# Rebuild after any cloud-init or label/taint change:
source ../../scripts/bw-unlock.sh && ./build-autoscaler-config.sh

The JSON structure:

json
{
  "imagesForArch": {"amd64": "ubuntu-24.04"},
  "nodeConfigs": {
    "wecare-web": {
      "cloudInit": "<raw cloud-init YAML>",
      "labels": {"wecare": "", "topology.kubernetes.io/region": "hel1", ...},
      "taints": [{"key": "wecare", "value": "", "effect": "NoSchedule"}]
    }
  }
}

Encoding chain: data.config = base64(base64(JSON)) — K8s decodes the outer base64 when exposing the Secret as an env var, the Hetzner provider decodes the inner base64 before JSON-parsing.
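The chain can be verified with a minimal shell round-trip (a sketch, not the real build script; GNU base64's -w0 flag is assumed):

```shell
# Minimal JSON standing in for the real cluster config.
CONFIG_JSON='{"imagesForArch":{"amd64":"ubuntu-24.04"},"nodeConfigs":{}}'

# Inner base64: the layer the Hetzner provider decodes before JSON-parsing.
INNER=$(printf '%s' "$CONFIG_JSON" | base64 -w0)

# Outer base64: the layer stored in data.config; Kubernetes strips it when
# exposing the Secret as the HCLOUD_CLUSTER_CONFIG env var.
OUTER=$(printf '%s' "$INNER" | base64 -w0)

# Two decodes recover the original JSON.
printf '%s' "$OUTER" | base64 -d | base64 -d
```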

cloudInit must be raw YAML, not base64. The provider passes it directly to the Hetzner API UserData field.

3. Scheduling patch — per-client overlay

Example: app-constructs/ecommercen-clients/wecare/adveshop4/prod/app-web-scheduling-patch.yaml

yaml
spec:
  template:
    spec:
      nodeSelector:
        wecare: ""
      tolerations:
        - key: wecare
          operator: Exists
          effect: NoSchedule
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule

Adding a New Client Pool

  1. values.yaml — add an autoscaling group:

    yaml
    autoscalingGroups:
      - name: wecare-web
        ...
      - name: clientb-web
        minSize: 0
        maxSize: 2
        instanceType: CCX13
        region: hel1
  2. build-autoscaler-config.sh — add a nodeConfigs entry with the client's cloud-init (same template, different label/taint in the RKE2 config), labels, and taints. The nodeConfigs key must match the pool name from step 1.

  3. Client scheduling patch — set nodeSelector: {clientb: ""} and tolerate clientb:NoSchedule.

  4. Rebuild and seal the secret, commit, push.
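For step 2, the new nodeConfigs entry mirrors the existing wecare-web one. An illustrative sketch (clientb and its values are placeholders):

```json
"clientb-web": {
  "cloudInit": "<raw cloud-init YAML — same template, clientb label/taint in the RKE2 config>",
  "labels": {"clientb": "", "topology.kubernetes.io/region": "hel1"},
  "taints": [{"key": "clientb", "value": "", "effect": "NoSchedule"}]
}
```

The "clientb-web" key must match the autoscaling group name from step 1, or the autoscaler will provision nodes without the pool's labels and taints.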

Operational Notes

  • Cloud-init takes ~20 min on CCX13 (package update + RKE2 download + image pull). Set max-node-provision-time accordingly (currently 45m for safety, reduce to 25m once stable).
  • Scale-down is enabled with 10m unneeded time and 0.5 utilization threshold.
  • Hetzner API quirk: user_data and ssh_keys are write-only fields — they're set at server creation but not returned by GET /servers/{id}.
  • To debug a stuck node: hcloud server enable-rescue <name> + reboot, then SSH in. Filesystem is at /mnt.
  • Autoscaler runs on control plane nodes (nodeSelector + toleration).
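The timing and threshold values above map to standard cluster-autoscaler flags. A sketch of how they might appear in the Helm values (the extraArgs key follows the upstream cluster-autoscaler chart's convention, which is an assumption here):

```yaml
extraArgs:
  scale-down-unneeded-time: 10m
  scale-down-utilization-threshold: "0.5"
  max-node-provision-time: 45m   # reduce to 25m once cloud-init timing is stable
```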

Internal documentation — Advisable only