A full-stack GitOps promotion flow architecture, designed for traceability, maintainability, and multi-service scalability.
🚀 This is Part 6 — the final chapter — of my “Building a Real and Reliable GitOps Architecture” series.
👉 Part 1: Why Argo CD Wasn't Enough
👉 Part 2: From Kro RGD to Full GitOps: How I Built a Clean Deployment Flow with Argo CD
👉 Part 3: Designing a Maintainable GitOps Repo Structure: Managing Multi-Service and Multi-Env with Argo CD + Kro
👉 Part 4: GitOps Promotion with Kargo: Image Tag → Git Commit → Argo Sync
🔔 Since this is the final article, I’ll also wrap up with some reflections — and a glimpse into what’s coming next.
📖 Introduction: From Promotion Flow to GitOps System Architecture
In Part 4, I introduced a basic promotion flow:
image → commit → sync
In Part 5, I implemented it with real components:
Warehouse → Stage → Task
But in Part 6, I want to explore what happens when that flow gets dropped into a real environment—multiple services, long-term operation, growing complexity.
I wasn't just trying to make it work once.
I wanted to design a system—
A system that's traceable, adaptable, resilient, and doesn't break when things change.
In this article, I'll focus on the architectural mindset behind it—how I layered my GitOps design and modularized my promotion logic to build something that doesn't just work, but lasts.
🧱 Architecture Layering: How I Used root-app + ApplicationSet to Manage Multiple Services
Just having a working promotion flow isn't enough.
If you want your system to survive long-term, across services and teams, you need a structure that scales, decouples, and supports independent debugging.
I considered the common approach early on: creating a dev-app / prod-app as middle-layer Applications to manage the services underneath.
Looks neat in theory. But in practice? It caused two major issues:
- Hard to decouple promotion strategies: When services are bundled into one App, a single change (like a hotfix) can affect the whole package.
- Overcoupled behavior and risk: One sync failure could block multiple services. Permission scopes get blurry. Separation of concerns becomes a nightmare.
So I went in the opposite direction—
I adopted a 3-layer GitOps architecture:
root-app
└── application-set (by namespace: develop / production)
└── application (one per service: frontend-app / backend-app ...)
- root-app: Manages overall structure; each namespace maps to an ApplicationSet
- ApplicationSets: Scan Git folders to generate Argo CD Applications
- Application: Deploys and promotes a single service
This structure addresses the three pain points I care about most:
- Separation of promotion logic: Each service can define its own release strategy — fully decoupled.
- Error isolation: If one app sync fails, others continue working.
- Minimal expansion cost: To add a service, just push a new folder. The ApplicationSet auto-generates its Application (see the sketch below).
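To make the ApplicationSet layer concrete, here is a minimal sketch using Argo CD's Git directory generator. The repository URL, folder layout, and names are placeholders for illustration, not my exact production values:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: develop-apps
  namespace: argocd
spec:
  generators:
    # One Application is generated per matching folder in the GitOps repo
    - git:
        repoURL: https://github.com/yourname/gitops-repo.git
        revision: main
        directories:
          - path: develop/*
  template:
    metadata:
      name: '{{path.basename}}-dev-app'
    spec:
      project: default
      source:
        repoURL: https://github.com/yourname/gitops-repo.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: develop
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
Pushing a new folder under develop/ is all it takes for a new service to appear as its own Application, which is exactly the minimal expansion cost described above.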
📌 This isn't about having tidy YAML.
It's about building a system that remains stable, transparent, and maintainable as it grows.
And this clear layering enabled me to cleanly separate tool responsibilities and modularize my upgrade logic later on.
⚙️ Responsibility Decoupling: What Kro, Kargo, and Argo CD Actually Do
In this architecture, each tool serves a clear purpose—not because they can't do more, but because separating responsibilities helps boundaries stay stable.
Kro: Declarative Templates + DAG-Defined Infrastructure
Kro isn't just a YAML generator.
It defines DAG-based dependency graphs for each service.
Each ResourceGraphDefinition (RGD) is a complete service blueprint, modeled as a graph of interdependent components: Deployment, Service, ConfigMap, etc.
Kro handles:
- Validating the DAG structure to ensure there are no cycles or misconfigurations.
- Generating a CRD (like WebApplication) so services can be managed as Kubernetes-native resources.
- Applying and reconciling manifests in DAG order whenever instance.yaml changes.
📌 Each service’s infra becomes a versionable, testable graph.
Changes to subcomponents or dependency ordering become safe and composable — no more hand-editing YAML everywhere.
More importantly, this DAG modeling defines the service architecture itself — making refactoring, extending, or validating services much easier and safer.
Kro is not just my templating layer — it’s my system’s graph-based modeling layer, scaling logic without duplicating boilerplate.
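As a rough illustration of that graph, here is an abbreviated ResourceGraphDefinition and a matching instance, assuming the kro.run v1alpha1 API; the schema fields and templates are simplified placeholders rather than my full production RGD:
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapplication
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApplication
    spec:
      name: string
      image: string
      replicas: integer | default=1
  resources:
    # Each resource is a node in the DAG; cross-references form the edges
    - id: deployment
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: ${schema.spec.replicas}
          selector:
            matchLabels:
              app: ${schema.spec.name}
          template:
            metadata:
              labels:
                app: ${schema.spec.name}
            spec:
              containers:
                - name: app
                  image: ${schema.spec.image}
    - id: service
      template:
        apiVersion: v1
        kind: Service
        metadata:
          name: ${schema.spec.name}
        spec:
          selector:
            app: ${schema.spec.name}
          ports:
            - port: 80
Each service then ships a small instance file, the one Kargo later rewrites during promotion:
# develop/your-service/instance.yaml
apiVersion: kro.run/v1alpha1
kind: WebApplication
metadata:
  name: your-service
  namespace: develop
spec:
  name: your-service
  image: docker.io/yourname/your-image:1.2.3
  replicas: 2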
Kargo: Promotion Orchestration + Traceability
Kargo orchestrates my upgrade process. Key components:
- Warehouse: Tracks image tags and filters them by SemVer or lexical rules.
- Stage: Decides whether to promote, based on tag comparison or validations.
- PromotionTask: Executes the upgrade (update → commit → push → sync).
📌 Kargo makes promotion stateful, observable, and pluggable.
It turns what would be a static pipeline into a logical graph of upgrade checkpoints.
With Kargo, I can easily insert manual approvals, CI checks, or security scans — without losing traceability or rollback control.
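For reference, the Warehouse side of this is small. A sketch with placeholder repo and constraint values (field names per the kargo.akuity.io/v1alpha1 API; double-check against the Kargo version you run):
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: your-service
  namespace: develop
spec:
  subscriptions:
    # Watch the registry and only admit tags that satisfy the constraint
    - image:
        repoURL: docker.io/yourname/your-image
        semverConstraint: ">=1.0.0"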
Argo CD: Git → Cluster Synchronization Only
Argo CD’s role is the simplest — and purest.
It syncs manifests (produced by Kro, updated by Kargo) from Git to the cluster.
No decisions.
No tag handling.
No manifest mutation.
📌 Argo CD acts purely as the executor.
It ensures state consistency while remaining easy to debug, scale, and replace.
Why separate them this way?
- ✅ Decoupling: Each logic layer can evolve independently.
- ✅ Observability: Every part of the flow is traceable.
- ✅ Flexibility: Any tool can be swapped (Flux, GitLab CI, etc.) without a full rewrite.
This isn't about stacking tools. It’s about building a system that’s composable, maintainable, and future-proof.
♻️ Promotion Flow Design: From One Line to a Traceable System
Full promotion flow:
Image
↓
Warehouse
↓
Stage (Decision)
↓
PromotionTask (Execution)
↓
Git Commit
↓
Argo CD Sync
Why each step matters:
- Image → Warehouse: I don’t let CI trigger upgrades. Warehouse watches the repo, giving full visibility into “what was published” and “what changed.”
- Warehouse → Stage: Stage only decides if something should be promoted, based on SemVer comparison, webhook checks, or YAML diffs. Approval gates or CI validations can plug in here (see the sketch after this list).
- Stage → PromotionTask: PromotionTask contains the how, not just the what. Fully reusable and parameterized.
- Task → Git commit: No direct applies. Every promotion creates a Git commit, for rollback, history, and reviewability.
- Git → Argo CD: Argo CD syncs the commit. It’s the final executor, not part of the promotion decision.
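Here is a hedged sketch of the Stage side, requesting freight from the Warehouse above, delegating execution to the promote-image task, and running a verification before the result counts as healthy. The smoke-test AnalysisTemplate is hypothetical, and the task-reference syntax assumes a Kargo version with PromotionTask support:
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: dev
  namespace: develop
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: your-service
      sources:
        direct: true            # take freight straight from the Warehouse
  promotionTemplate:
    spec:
      steps:
        - task:
            name: promote-image # the reusable PromotionTask shown later
  verification:
    analysisTemplates:
      - name: smoke-test        # hypothetical post-promotion check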
📌 This isn’t just a working flow—it’s a traceable, modular, and extensible promotion lifecycle.
🧩 Modular & Maintainable Design: Making Promotion Logic Reusable
This is the part of the architecture I care most about.
I don’t just want upgrade flows to work.
I want them to be:
- Reusable: No need to reimplement for each service.
- Composable: Easy to plug in validations, notifications, and more.
- Safe to evolve: Changes don’t break everything else.
1️⃣ Stage ≠ Task: Split the Responsibilities
- Stage: should we promote?
- PromotionTask: how do we promote?
Why this matters: If you pack everything into Stage, you’ll copy-paste logic across every service. Change one line? Touch ten YAML files.
2️⃣ PromotionTask as Template: Parameterized, Reusable Logic
I defined a standard PromotionTask and used variables to customize it per service:
apiVersion: kargo.akuity.io/v1alpha1
kind: PromotionTask
metadata:
  name: promote-image
  namespace: develop
spec:
  vars:
    - name: imageRepo
      value: docker.io/yourname/your-image
    - name: instancePath
      value: develop/your-service/instance.yaml
    - name: appName
      value: your-service-dev-app
  steps:
    - uses: git-clone
    - uses: yaml-parse
    - uses: yaml-update
    - uses: git-commit
    - uses: git-push
    - uses: argocd-update
With this setup:
- frontend and backend share the same promotion task, just with different vars (see the sketch below).
- Want to add validation? Copy the task, add a yaml-assert step.
📌 This isn’t just templating.
It’s a modular framework for defining, evolving, and scaling promotion logic.
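One way to wire that up, assuming your Kargo version lets a step that references a PromotionTask supply its vars, is to keep a single promote-image task and feed it per-service values from each Stage's promotion template. The frontend values here are placeholders:
# Fragment of the frontend Stage, under promotionTemplate.spec
steps:
  - task:
      name: promote-image
    vars:
      - name: imageRepo
        value: docker.io/yourname/frontend
      - name: instancePath
        value: develop/frontend-app/instance.yaml
      - name: appName
        value: frontend-app-dev-app
The backend Stage references the same task with its own values, so the promotion logic itself lives in exactly one place.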
🧨 What I Broke (and How I Fixed It)
- Freight image tag parsing failed → Switched to SemVer + quote().
- Git push failed (forgot the token) → Moved the token into vars, fully automated.
- Argo CD sync error (App missing authorized-stage) → Fixed by adding the annotation (see the snippet below).
- Kro instance.yaml missing (wrong git-clone path) → Reorganized the repo structure.
- Sync chaos (multiple services under one App) → Split into independent Apps.
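For the authorized-stage issue specifically, the fix was an annotation on the Argo CD Application that tells Kargo which Stage is allowed to act on it. A sketch with placeholder names, where the value takes the form <kargo-project>:<stage>:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: your-service-dev-app
  namespace: argocd
  annotations:
    # allow the "dev" Stage in the "develop" Kargo project to update this app
    kargo.akuity.io/authorized-stage: develop:dev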
📌 These mistakes weren’t bugs to patch.
They were lessons to harden the system’s foundations.
🧭 Why This System Was Worth Building
Not built for a demo.
Built to survive.
Real systems need:
- Traceability
- Rollback
- Validation
- Modularity
📌 These are the price of long-term maintenance, not bonuses.
🔚 Final Thoughts: This Isn’t a Toolchain—It’s a System That Survives
It doesn’t just run.
It can be maintained.
It’s not rigid.
It evolves.
It’s not hardcoded.
It’s a modular, auditable system.
It wasn’t enough to make the flow work once.
It had to survive growth, adapt to change, and stay traceable across services and teams.
If you’re designing your own GitOps setup, don’t just pick tools — design for resilience. Build a system that grows with you.
📚 Wrapping Up This Series—See You in the Next One
This marks the final post in my “Building a Real and Reliable GitOps Architecture” series.
Next up:
A brand-new series on building an MLOps architecture with MicroK8s, MLflow, Kubeflow, and vLLM.
If you're into model versioning, serving, and full-lifecycle MLOps—stay tuned.
👋 See you in the next series.
If you’re also building out your own GitOps system, I’d love to hear what approaches you’re trying.