GitOps & ArgoCD

The repo is the source of truth. Everything else — the live cluster, Grafana, Google Chat alerts, Cloudflare DNS — is a downstream consequence of what's committed. If you remember only that, you'll be fine.

The one-sentence definition

GitOps is an operating model where the desired state of your system is stored in a Git repository, and a controller runs continuously to make the real system match what the repo says.

  • You don't press a "deploy" button.
  • You don't run kubectl apply by hand.
  • You write a change, commit it, push it. Within a minute the cluster has applied it.

This is very different from how Plesk worked. In Plesk, you made a change by clicking a button — state lived in Plesk's database and the running system was modified directly. There was no "ground truth" outside the running system itself.

ArgoCD is the controller

ArgoCD is the process that does the matching. It:

  1. Watches this repo (git@github.com:Advisable-com/ecnv4_manifests) for changes.
  2. Compares what the repo says should exist vs. what's actually in the cluster.
  3. Applies the difference — creates missing resources, updates changed ones, deletes removed ones.
  4. Flags any drift (changes made to the cluster outside the repo) and reverts it if selfHeal: true is set, which it is for us.
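As a rough illustration of the four steps above, a minimal Application manifest with automated sync might look like the sketch below. Only the repo URL and the selfHeal / prune settings come from this page; the app name, project, path, target revision and namespaces are hypothetical placeholders, not our real values.

```yaml
# Sketch only: names, path and namespaces are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app                 # hypothetical app name
  namespace: argocd                 # assumed ArgoCD install namespace
spec:
  project: default                  # assumed project
  source:
    repoURL: git@github.com:Advisable-com/ecnv4_manifests
    targetRevision: HEAD            # assumed: track the default branch
    path: path/to/component         # hypothetical path inside the repo
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD runs in
    namespace: example              # hypothetical target namespace
  syncPolicy:
    automated:
      prune: true                   # delete resources removed from the repo
      selfHeal: true                # revert changes made outside the repo
```

With both flags set, steps 3 and 4 happen without anyone clicking "Sync".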

Two cluster-admin-visible UIs:

The argo.sh wrapper (local tooling note)

Throughout this manual you'll see commands like argocd app list, argocd app sync <name>, argocd app diff <name>. Those are the standard ArgoCD CLI — they work, but they assume you've logged in and that Cloudflare Access is happy with your session.

For day-to-day work we have a tiny shell wrapper at untracked/scripts/argo.sh that removes both of those friction points. It:

  1. Starts a kubectl port-forward to the argocd-server Service so traffic never touches the Cloudflare Zero Trust edge — CF Access can't block what it can't see.
  2. Reads the in-cluster admin secret and authenticates for you — no password prompt, no argocd login dance.
  3. Cleans up the port-forward on exit.

How to use it: it's a drop-in replacement. Any argocd <...> command in this manual can be invoked as ./untracked/scripts/argo.sh <...> and it "just works". To target a different kubectl context: KUBECTL_CONTEXT=<ctx> ./untracked/scripts/argo.sh app list.

When to use raw argocd instead: if you've already run argocd login argocd.ecnv4-mgmt.ecommercen.com --sso in your shell and your browser session has a valid CF Access cookie, raw argocd works too. For one-off commands the wrapper is simply less hassle.

The app-of-apps pattern

If we created and tracked each of our 25+ Application resources by hand, you'd be constantly clicking around. Instead we use one entry point that fans out:

To enable/disable a component, you move its app-*.yaml file between apps-enabled/ and apps-disabled/. That's it — ArgoCD notices and applies (or prunes, with automated.prune: true).
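A root ("parent") Application for this pattern could look roughly like the following. This is a sketch under assumptions: the root app's name, the project, and the exact location of apps-enabled/ inside the repo are hypothetical, and the include filter is just one way to pick up the app-*.yaml files.

```yaml
# Sketch only: the root app name and the apps-enabled path are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-apps                   # hypothetical name for the entry point
  namespace: argocd                 # assumed ArgoCD install namespace
spec:
  project: default                  # assumed project
  source:
    repoURL: git@github.com:Advisable-com/ecnv4_manifests
    targetRevision: HEAD            # assumed
    path: apps-enabled              # hypothetical: real parent directory not shown here
    directory:
      include: 'app-*.yaml'         # only render the child Application files
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd               # child Applications live next to ArgoCD
  syncPolicy:
    automated:
      prune: true                   # a file moved to apps-disabled/ gets its app pruned
      selfHeal: true
```

Because the root app only looks at apps-enabled/, moving a file out of that directory is indistinguishable from deleting it, which is why prune removes the child app.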

ApplicationSets — one template, many apps

When we onboard a new client, we don't want to create 5 Application manifests by hand. appset-ecommercen-clients.yaml does the generating:

  • Scans manifests_v1/app-constructs/ecommercen-clients/**/config.json for files.
  • For each file found, generates a full Application resource using the config.json's fields.
  • Adds/removes apps as config.json files appear/disappear.
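The behaviour in the list above matches ArgoCD's Git file generator. A hedged sketch of what such an ApplicationSet could look like is below; the glob and repo URL are from this page, while the template fields and the config.json keys (clientName, namespace) are hypothetical and will differ from the real appset-ecommercen-clients.yaml.

```yaml
# Sketch only: template values and config.json keys are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: ecommercen-clients          # assumed to mirror the file name
  namespace: argocd                 # assumed ArgoCD install namespace
spec:
  generators:
    - git:
        repoURL: git@github.com:Advisable-com/ecnv4_manifests
        revision: HEAD              # assumed
        files:
          - path: "manifests_v1/app-constructs/ecommercen-clients/**/config.json"
  template:
    metadata:
      name: 'client-{{clientName}}' # hypothetical key read from config.json
    spec:
      project: default              # assumed project
      source:
        repoURL: git@github.com:Advisable-com/ecnv4_manifests
        targetRevision: HEAD
        path: '{{path}}'            # directory that contains the matched config.json
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{namespace}}'  # hypothetical key read from config.json
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Each matched config.json yields one parameter set, so one file in means one generated Application out.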

This is why onboarding a client is "copy a directory + commit" rather than "write 15 YAML files by hand".

Key concepts you'll see

| Word | What it means |
| --- | --- |
| Sync | Apply the state in the repo to the cluster (manual or automatic). |
| Health | The applied resources' own health (pods Ready, Deployment replica count matches, etc.). |
| OutOfSync | Cluster has something different from the repo (for example, someone ran kubectl edit). |
| Degraded | Resources are applied but unhealthy (crashloop, missing secret, etc.). |
| Progressing | Rollout in flight; transient, usually resolves in under a minute. |
| selfHeal: true | ArgoCD reverts out-of-sync cluster changes automatically. |
| automated.prune: true | Resources removed from the repo get deleted from the cluster. |
| Sync wave | Ordering hint; lower numbers sync first (useful for "install CRD before anything that uses it"). |
| ignoreDifferences | "Ignore these specific fields on this resource"; used for things that are patched in-cluster by other controllers. |
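To make the last two rows concrete, here is a small sketch showing where a sync wave and an ignoreDifferences rule sit in an Application. All names and paths are hypothetical; the annotation key and the ignoreDifferences fields are standard ArgoCD, but the values are only examples.

```yaml
# Sketch only: app name, path and the ignored Secret are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-crds                # hypothetical
  namespace: argocd                 # assumed ArgoCD install namespace
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # lower wave, so it syncs before wave 0 apps
spec:
  project: default                  # assumed project
  source:
    repoURL: git@github.com:Advisable-com/ecnv4_manifests
    targetRevision: HEAD            # assumed
    path: path/to/crds              # hypothetical
  destination:
    server: https://kubernetes.default.svc
    namespace: example              # hypothetical
  ignoreDifferences:
    - group: ""                     # core API group
      kind: Secret
      name: example-secret          # hypothetical Secret filled in by a controller
      jsonPointers:
        - /data                     # do not report or revert changes to the payload
```

The wave value must be a quoted string because annotation values are always strings.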

What you should (and shouldn't) do

  • 🟢 Do: edit YAML in the repo, commit, push, let ArgoCD sync.
  • 🟢 Do: click "Sync" in the UI if you're impatient and don't want to wait 60s for the auto-poll.
  • 🟠 Careful with: kubectl edit on anything managed by ArgoCD. Your change will be reverted within a minute by selfHeal.
  • 🔴 Don't: kubectl apply a file that contradicts what the repo says. You'll confuse yourself and any teammate who later has to reconcile the cluster with the repo.

When to break the rules

There are legitimate times to make an in-cluster change that isn't in the repo:

  1. ignoreDifferences resources: argocd-secret, app-kube-prometheus-stack-grafana (admin password), a handful of controller-initialised fields. These are documented case-by-case in our repo with a comment.
  2. Debugging in the moment: kubectl scale deploy foo --replicas=0 to stop a bleeding fire. But you must update the repo immediately after, or selfHeal reverts you.
  3. Secret rotation via Bitwarden (see Secrets & Bitwarden) — plaintext never lands in the repo.

Internal documentation — Advisable only