Databases

In Plesk, databases were a checkbox in a panel — you picked MySQL or PostgreSQL, got a hostname, a user, a password, and phpMyAdmin. The cluster is the same idea, just with more moving parts exposed: operators watch custom resources that describe the database you want, and they spin up the pods, volumes, users, and backups to match. You write YAML instead of clicking — but you still end up with a MySQL server, a user, a password, and (yes) phpMyAdmin.

This page explains what's running, how the pieces fit together, and why we run two Redis clusters per tenant instead of one.

Four database layers

Four operators manage four database families. Each one is installed once at the cluster level; tenants reference it by creating CRs in their own namespace.

| Layer | Operator | Namespace | CRDs you'll see |
|---|---|---|---|
| MariaDB + MaxScale | mariadb-operator | op-mariadb | MariaDB, MaxScale, User, Grant, Database, Backup, PhysicalBackup, Restore, SqlJob |
| Redis | redis-operator (opstree) | op-redis | RedisReplication, RedisSentinel |
| PostgreSQL | Zalando postgres-operator | postgres-operator | postgresql |
| RabbitMQ | rabbitmq-cluster-operator | rabbitmq-system | RabbitmqCluster |

MariaDB is the primary Ecommercen database — this is where customer data actually lives. Redis is the cache and session store in front of it. PostgreSQL exists only for Keycloak, and RabbitMQ backs a couple of internal queues — you'll rarely touch either.

MariaDB + MaxScale

The whole MariaDB stack is generated from a handful of CRs in the wecare namespace: a MariaDB, a MaxScale, and (for the ERP user) a User + Grant pair. The operator watches those and keeps the pods, volumes, replication topology, users, and backup CronJobs in sync with what you wrote.

Topology — primary + replica

We run spec.replicas: 2 with spec.replication.enabled: true. That produces two StatefulSet pods — app-db-0 and app-db-1 — connected by MariaDB GTID-based async replication. One of them is the current primary (writes go there), the other tails the binlog.

  • gtidStrictMode: true — replicas refuse to diverge silently; if a write lands on the wrong node, replication breaks loudly rather than drifting.
  • maxScaleRef — the MariaDB CR hands off failover decisions to MaxScale (see below). Without this, the operator would choose which pod is primary; with it, MaxScale chooses and the operator defers.
  • Pod anti-affinity — the two pods are forced onto different nodes (topologyKey: kubernetes.io/hostname). Losing one node never takes out both replicas.
  • updateStrategy: ReplicasFirstPrimaryLast — rolling upgrades touch the replica first, wait for it to rejoin, then switch over and upgrade the old primary. The only write-unavailable window is the switchover itself (a few seconds).
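Condensed, the topology settings above map onto the MariaDB CR roughly like this. This is a sketch, not the full mariadb.yaml: the size value and label selector are illustrative, and exact field paths should be checked against the mariadb-operator CRD reference before applying.

```yaml
# Sketch only — illustrative values marked; verify field paths
# against the mariadb-operator MariaDB CRD reference.
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: app-db
  namespace: wecare
spec:
  replicas: 2                    # app-db-0 and app-db-1
  replication:
    enabled: true                # GTID-based async replication
  maxScaleRef:
    name: app-db-maxscale        # MaxScale decides who is primary
  updateStrategy:
    type: ReplicasFirstPrimaryLast
  storage:
    storageClassName: local-path
    size: 50Gi                   # illustrative — not the real value
  affinity:
    podAntiAffinity:             # never co-locate the two replicas
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: app-db   # assumed label
```

gtidStrictMode: true also lives under the replication settings; its exact path is in the CRD reference.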

MaxScale — why it's in the path

MaxScale is MariaDB's official database proxy. Every database client — the PHP app, the ERP connector, phpMyAdmin — talks to app-db-maxscale:3306, never directly to a MariaDB pod. MaxScale does three jobs we care about:

  1. Read-write split. The readwritesplit router (rw-router) parses each query and sends writes to the primary, reads to whichever replica has capacity. The app doesn't need to know which pod is primary.
  2. Automatic failover. The monitor section polls both MariaDB pods every 2 seconds. If the primary dies, MaxScale promotes the replica (auto_failover: "true"), rewires routing, and rejoins the old primary as a replica when it comes back (auto_rejoin: "true"). Because we set maxScaleRef on the MariaDB CR, the operator respects MaxScale's decision — no tug-of-war.
  3. Connection keepalive. MaxScale pings idle backend connections every 300s so PHP-FPM persistent connections don't get killed by MariaDB's wait_timeout.
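The three jobs above correspond to sections of the MaxScale CR. A sketch, assuming the standard mariadb-operator MaxScale schema (the real maxscale.yaml carries more settings, including the 300s keepalive):

```yaml
# Sketch — verify field paths against the mariadb-operator
# MaxScale CRD before applying.
apiVersion: k8s.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: app-db-maxscale
  namespace: wecare
spec:
  replicas: 2
  mariaDbRef:
    name: app-db
  services:
    - name: rw-router
      router: readwritesplit     # writes -> primary, reads -> replicas
      listener:
        port: 3306
  monitor:
    module: mariadbmon
    interval: 2s                 # poll both MariaDB pods every 2 seconds
    params:
      auto_failover: "true"      # promote the replica if the primary dies
      auto_rejoin: "true"        # re-attach the old primary as a replica
```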

MaxScale runs with 2 replicas of its own, also on the dedicated DB nodes. There's a web GUI at maxscale-wecare-ecnv4-mgmt.ecommercen.com — you can see replication lag, routing stats, and trigger a switchover from there.

Where the data lives

spec.storage.storageClassName: local-path is a deliberate choice. local-path puts the PVC on the node's own disk — in our case the NVMe SSD of a dedicated DB node. We do not use Longhorn for MariaDB because:

  • Latency. InnoDB fsyncs every commit. Network-replicated storage adds milliseconds to every write; NVMe is measured in microseconds.
  • Redundancy elsewhere. Replication already gives us two copies of the data on two nodes. Double-replicating via Longhorn would be belt-and-braces with a performance tax.

The downside of local-path is that a pod is pinned to the node where its PVC was created. That's why the CR has node affinity on the wecare-mariadb label and a matching toleration for the wecare-mariadb:NoSchedule taint. The pool is provisioned separately — see Autoscaling / dedicated pools for how that works.
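The pinning pieces, sketched together. The label key and taint come from the text; the label value and the use of a plain nodeSelector (rather than full node affinity) are simplifying assumptions — check the CR itself for the exact form.

```yaml
# Scheduling sketch — label value assumed, nodeSelector used as a
# stand-in for the CR's node affinity block.
spec:
  storage:
    storageClassName: local-path   # node-local NVMe, not network storage
  nodeSelector:
    wecare-mariadb: "true"         # assumed value — only the key is documented
  tolerations:
    - key: wecare-mariadb
      operator: Exists
      effect: NoSchedule           # matches the dedicated pool's taint
```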

Declarative users, grants, databases

Creating a database user in Plesk was a form; here it's a User + Grant CR pair. Example from erp-persistence.yaml (the ERP connector's database access):

```yaml
# Abridged — apiVersion and metadata omitted.
kind: User
spec:
  mariaDbRef: { name: app-db }
  passwordSecretKeyRef: { name: erp-db-secrets, key: ERP_DB_PWD }
  maxUserConnections: 20
---
kind: Grant
spec:
  mariaDbRef: { name: app-db }
  username: erp-db-user
  database: wecare_production
  privileges: ["ALL PRIVILEGES"]
```

The operator executes the equivalent CREATE USER + GRANT statements against MariaDB and keeps them in sync if you change the CR. The password itself is in a Sealed Secret. There's also a Database CRD for creating schemas, and a SqlJob CRD for one-shot SQL (migrations, one-off maintenance) — we don't lean on those for day-to-day work.
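For completeness, a one-shot SqlJob looks roughly like this. The name, user, and SQL here are hypothetical — we don't run SqlJobs day-to-day — and the shape follows the mariadb-operator SqlJob CRD:

```yaml
# Hypothetical example — name, credentials, and SQL are invented
# for illustration.
apiVersion: k8s.mariadb.com/v1alpha1
kind: SqlJob
metadata:
  name: maintenance-analyze        # hypothetical name
spec:
  mariaDbRef: { name: app-db }
  username: erp-db-user
  passwordSecretKeyRef: { name: erp-db-secrets, key: ERP_DB_PWD }
  database: wecare_production
  sql: |
    -- illustrative one-shot maintenance statement
    ANALYZE TABLE orders;
```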

The same mechanism is how we provisioned the ERP's direct MySQL connection — cross-referenced in Ingress / Path C.

Backups

MariaDB runs two backup tracks in parallel, both writing to the same Hetzner Object Storage bucket (S3-compatible, hel1). Different mechanisms, different cadences, different best-fit scenarios — we keep both.

| Track | CRD | Mechanism | Schedule | Retention | Best for |
|---|---|---|---|---|---|
| Logical | Backup | mysqldump of the app schema, gzip-compressed | daily at 04:00 | 30 days | point-in-time recovery, portable across MariaDB versions, table-level restore |
| Physical | PhysicalBackup | volume-level snapshot via mariadb-backup, gzip-compressed | hourly (`0 * * * *`) | 24 hours | fast full-cluster restore (no SQL replay), bootstrapping fresh replicas |

Both CRs share the same ecn-bucket-access Sealed Secret for S3 credentials.

Logical — kind: Backup

A scheduled mysqldump. Defined in backup-sql.yaml:

  • Scope: the wecare_production database only (not MariaDB system tables).
  • Target: bucket ecn-private, prefix wecare/mariadb-sql-backup, endpoint hel1.your-objectstorage.com.

Logical backups are portable — you can restore a Backup into a completely different MariaDB version or even a different operator. They're the safety net you actually want when someone deletes a table at 3 AM.
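Put together, backup-sql.yaml looks roughly like this. Bucket, prefix, endpoint, schedule, and retention come from the values above; the secret key names and exact field paths are assumptions to check against the mariadb-operator Backup CRD:

```yaml
# Sketch of backup-sql.yaml — secret key names assumed.
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: mariadb-sql-backup         # assumed name
spec:
  mariaDbRef: { name: app-db }
  schedule:
    cron: "0 4 * * *"              # daily at 04:00
  maxRetention: 720h               # 30 days
  databases:
    - wecare_production            # app schema only, no system tables
  storage:
    s3:
      bucket: ecn-private
      prefix: wecare/mariadb-sql-backup
      endpoint: hel1.your-objectstorage.com
      accessKeyIdSecretKeyRef: { name: ecn-bucket-access, key: access-key-id }       # key assumed
      secretAccessKeySecretKeyRef: { name: ecn-bucket-access, key: secret-access-key } # key assumed
```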

Physical — kind: PhysicalBackup

A volume-level snapshot (via mariadb-backup), much faster to restore than replaying a mysqldump because it's a raw file copy, not SQL statements. Defined in physicalbackup-template.yaml:

  • Target pod: PreferReplica — takes the snapshot from the replica so the primary isn't stalled.
  • S3 prefix: wecare/mariadb-backup (distinct from the logical-backup prefix).
  • Why hourly: physical backups are meant to be the "fast recover" path. The shorter the window since the last snapshot, the less catching up the replay has to do.
  • Original purpose: to bootstrap new replicas — when a fresh pod joins the cluster, it pulls the latest physical backup from S3 instead of doing a full GTID stream from the primary (which can take hours on a large dataset).
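The physical track, sketched the same way. PreferReplica and the S3 prefix come from the text; the field name carrying the pod preference is an assumption to verify against the mariadb-operator PhysicalBackup CRD:

```yaml
# Sketch of physicalbackup-template.yaml — verify field names
# (especially the replica-preference field) against the CRD.
apiVersion: k8s.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: mariadb-physical-backup    # assumed name
spec:
  mariaDbRef: { name: app-db }
  schedule:
    cron: "0 * * * *"              # hourly
  maxRetention: 24h
  podSelection: PreferReplica      # snapshot the replica, don't stall the primary
  storage:
    s3:
      bucket: ecn-private
      prefix: wecare/mariadb-backup   # distinct from the logical prefix
      endpoint: hel1.your-objectstorage.com
```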

bootstrapFrom currently disabled — the backups themselves run fine

The hourly PhysicalBackup CronJob is running and snapshots land in S3 as expected. What's disabled is the MariaDB CR's bootstrapFrom: + recovery: blocks (commented out in mariadb.yaml:15-23) because the bootstrap pods that consume these snapshots fail on start. A fresh replica currently falls back to the native GTID streaming clone from the primary — slower, but correct. The snapshots still exist in S3 and can be restored manually via a Restore CR. Cause still under investigation; tracked in project memory.

S3 credentials

Both backup CRs reference a shared ecn-bucket-access Sealed Secret that holds the S3 access key and secret key. Rotating those is the standard rotate-a-secret flow.

Restoring

There's a Restore CRD that takes either a backupRef (points at a past Backup run) or a raw S3 location. Writing one kicks off a Job that runs the restore. Actually doing it in anger — which backup to pick, downtime window, GTID considerations — is an incident scenario; delegate to the db-manager agent.
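A minimal Restore, sketched with a hypothetical backup name — the real decision (which run, downtime, GTID state) belongs to the db-manager flow above:

```yaml
# Minimal sketch — backupRef name is hypothetical. Use either
# backupRef or an explicit S3 location, not both.
apiVersion: k8s.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore-after-incident     # hypothetical name
spec:
  mariaDbRef: { name: app-db }
  backupRef:
    name: mariadb-sql-backup       # points at a past Backup run
```

Writing this CR kicks off the restore Job immediately, so only apply it once the target and window are agreed.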

Redis — cache and session

Every tenant runs two Redis clusters, for different reasons:

| Cluster | What's in it | Eviction policy | Persistence | Size |
|---|---|---|---|---|
| redis-cache | L2 application cache (query results, rendered fragments) | allkeys-lru — any key can be evicted | disabled (save "" + appendonly no) | 14Gi pod, 9Gi maxmemory |
| redis-session | PHP session data | volatile-lru — only keys with TTL are evictable | default (AOF) | 4Gi pod, 3Gi maxmemory |

Why split them? They have very different durability requirements. Losing the cache means a few minutes of cache misses and higher DB load — annoying but harmless. Losing sessions means every logged-in user gets kicked out. Separating them lets us turn persistence off for the cache (no AOF fork stalls, no BGSAVE pressure, fast restarts) while keeping it on for sessions.

Topology — replication + Sentinel

Each cluster is a RedisReplication with clusterSize: 2 (one primary, one replica) plus a matching RedisSentinel with clusterSize: 2. Sentinel is Redis's own failover coordinator: it watches the primary, promotes the replica if the primary disappears, and tells clients the new primary's address. The opstree redis-operator wires all of this together.
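One cluster's CR pair, sketched against the opstree redis-operator schema. The Sentinel name and secret key are assumptions; field paths are abbreviated and should be checked against the operator's CRD reference:

```yaml
# Sketch — Sentinel name and secret key assumed; verify field
# paths against the opstree redis-operator CRDs.
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisReplication
metadata:
  name: redis-session
spec:
  clusterSize: 2                   # one primary + one replica
  kubernetesConfig:
    redisSecret:
      name: redis-auth             # shared password Sealed Secret
      key: password                # assumed key name
---
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisSentinel
metadata:
  name: redis-session-sentinel     # assumed name
spec:
  clusterSize: 2
  redisSentinelConfig:
    redisReplicationName: redis-session
    redisReplicationPassword:      # Sentinel auths to the data nodes
      secretKeyRef:
        name: redis-auth
        key: password
```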

The redis-cache pods live on a dedicated wecare-cache node pool (separate from the DB pool) so a Redis restart doesn't compete with MariaDB for page cache. The redis-session pods currently land on the wecare-mariadb pool via the node-role.kubernetes.io/db label — this is fine given their small footprint. A RedisInsight dashboard is available for peeking at keys, TTLs, and memory stats.

The redis-auth secret

Both clusters share a single Sealed Secret (redis-auth) for the password. Sentinel also needs it to authenticate to the data nodes during failover — that's the redisReplicationPassword block on each RedisSentinel CR. Rotating it follows the standard secret-rotation flow, but all four StatefulSets need to come back up cleanly after Reloader restarts them, so do it during a quiet window.

PostgreSQL and RabbitMQ

Both run via their own dedicated operators and are minor compared to MariaDB + Redis.

  • PostgreSQL (Zalando operator) is used by Keycloak for user/realm storage. One postgresql CR per environment. Managed by the Keycloak app, not by tenants directly.
  • RabbitMQ (cluster-operator) provides a small queue for internal messaging (a RabbitmqCluster CR per consumer). Low traffic; unlikely to demand operator attention.

If either one does misbehave: check the app's ArgoCD sync state first, then the operator pod logs in postgres-operator / rabbitmq-system.

Credentials

Every database password in the cluster is a Sealed Secret referenced by a CR (passwordSecretKeyRef, redisSecret, accessKeyIdSecretKeyRef). Plaintext never lands in the repo. The secrets-manager agent handles the seal/rotate/re-seal flow end-to-end — see rotate a secret for the runbook. MariaDB system users (root, the operator's own service accounts) are generated by the operator into Secrets at install time and are reset automatically if you delete them.

When to use which CR

| Need | Use |
|---|---|
| New Ecommercen tenant database (app data) | MariaDB + MaxScale (copy wecare as template) |
| App user for an existing tenant DB | User + Grant |
| Create a named schema | Database |
| One-shot SQL migration | SqlJob |
| Scheduled mysqldump to S3 | Backup |
| Scheduled volume snapshot to S3 | PhysicalBackup |
| Restore from either of the above | Restore |
| Cache or session store | RedisReplication + RedisSentinel |
| Keycloak backing store (already configured) | Zalando postgresql |
| Internal queue | RabbitmqCluster |

When to delegate to the db-manager agent

Anything non-trivial touching MariaDB or MaxScale — version bumps, myCnf tuning, compatibility checks, debugging a weird replication lag, planning a downgrade — should go through the db-manager Claude agent. It knows the operator CRDs cold, queries container registries for available tags, validates myCnf against the target MariaDB version, and flags misconfigurations (memory overcommit, timeout mismatches, missing anti-affinity, removed parameters) before you apply them. See the agents reference for how to invoke it.

Day-to-day reading of pod status, logs, and describe output stays with the k8s-manager agent. Drawing the line: if the question is "is this pod up?" it's k8s-manager; if the question is "should I change innodb_buffer_pool_size before or after this upgrade?" it's db-manager.

Further reading

Internal documentation — Advisable only