Level 400
Hi clouders! In the previous blog you explored the considerations for deploying GitOps and IaC at scale. In this post you will learn how to deploy spoke clusters using the GitOps Bridge framework, with specific use cases built on AWS enterprise capabilities and security best practices.
Architecture Overview
Spoke clusters and Scenarios
According to best practices, there are many considerations and scenarios, for example:
You can have one stack or IaC repository with the definitions for each account, team, or squad; it depends on your internal organization and shared responsibility model. For this scenario, suppose the model is a decentralized DevOps operational model and each squad has its own IaC for each project. Keep clear separation and isolation between environments and match it to your platform strategy: for example, some organizations have a single hub for managing the dev, QA, and prod clusters, others have one hub per environment, and others have one hub for pre-production environments and another for production.
Another key point is capacity planning and how applications are assigned to teams: some organizations share the same cluster across an organizational unit and group applications by namespace in each environment, while others prefer one cluster and environment per workload or application. Networking and security bring the main pain points and challenges in both scenarios. In this series the main assumption is that each workload has a dedicated cluster and account per environment, with a transversal platform team that manages the cluster configuration and the control plane. The following table describes the relationship between scenarios and AWS accounts:
Scenario | AWS Accounts | Clusters
---|---|---
Single hub – N spokes | 1 account for the hub – N accounts per environment | 1 hub cluster, N spoke clusters per environment
M hubs – N spokes | M accounts for the hub environments – N accounts per environment | M hub clusters, N environment clusters
Figure 1 depicts the architecture for this scenario.
Figure 1. GitOps bridge Deployment architecture in AWS
FinOps practices are a key point: regardless of how your resources are distributed, you must consider the best strategy for tracking costs and shared resources.
Hands On
First, modify the hub infrastructure stacks: add a stack that manages the cross-account credentials and allows the CI infrastructure agents to retrieve them to register the cluster in the control plane. Also, create the role for Argo CD to enable authentication between the Argo CD hub and the spoke clusters.
Updating Control Plane Infrastructure
So, let's set up the credentials according to best practices and the scenario described in the previous section: the cluster credentials are stored in Parameter Store and shared with the organizational units for each team or business unit.
You must enable RAM as a trusted service in your organization, from the Organizations management account, using the RAM console.
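If you prefer to keep this step in code, the AWS provider exposes a resource for it; a minimal sketch, assuming it is applied with credentials for the Organizations management account:
#ram_sharing.tf (illustrative sketch)
# Enables resource sharing with AWS Organizations, the same effect as enabling RAM
# as a trusted service from the console. Must run against the management account.
resource "aws_ram_sharing_with_organization" "this" {}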
For this task, a local module was created:
└── terraform-aws-ssm-parameter-sotre
├── README.md
├── data.tf
├── main.tf
├── outputs.tf
└── variables.tf
The module creates the parameter and shares it with organization, organizational unit, or account ID principals.
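For reference, here is a minimal sketch of what such a module's main.tf could contain, combining aws_ssm_parameter with the RAM sharing resources. Variable names and defaults are illustrative, not the exact repository content:
#main.tf (illustrative sketch)
resource "aws_kms_key" "this" {
  count       = var.create_kms ? 1 : 0
  description = "KMS key for ${var.parameter_name}"
  # Note: for cross-account SecureString access the key policy must also grant
  # kms:Decrypt to the consuming principals.
}

resource "aws_ssm_parameter" "this" {
  name        = var.parameter_name
  description = var.parameter_description
  type        = var.parameter_type
  tier        = var.parameter_tier # Advanced tier is required to share parameters through RAM
  value       = var.parameter_value
  key_id      = var.create_kms ? aws_kms_key.this[0].arn : null
  tags        = var.tags
}

resource "aws_ram_resource_share" "this" {
  count                     = var.enable_sharing ? 1 : 0
  name                      = replace(trimprefix(var.parameter_name, "/"), "/", "-")
  allow_external_principals = false
  tags                      = var.tags
}

resource "aws_ram_resource_association" "this" {
  count              = var.enable_sharing ? 1 : 0
  resource_arn       = aws_ssm_parameter.this.arn
  resource_share_arn = aws_ram_resource_share.this[0].arn
}

resource "aws_ram_principal_association" "this" {
  for_each = var.enable_sharing ? toset(var.sharing_principals) : toset([])
  # Organization, OU, or account principals; RAM expects OU and organization
  # principals as ARNs, so the real module may translate plain IDs into ARNs.
  principal          = each.value
  resource_share_arn = aws_ram_resource_share.this[0].arn
}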
Now, using Terragrunt, the module is called to create a new stack (Terragrunt unit).
#parameter_store-terragrunt.hcl
include "root" {
path = find_in_parent_folders("root.hcl")
expose = true
}
dependency "eks" {
config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_control_plane"
mock_outputs = {
cluster_name = "dummy-cluster-name"
cluster_endpoint = "dummy_cluster_endpoint"
cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
cluster_version = "1.31"
cluster_platform_version = "1.31"
oidc_provider_arn = "dummy_arn"
cluster_arn = "arn:aws:eks:us-east-2:105171185823:cluster/gitops-scale-dev-hub"
}
mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
# Define parameters for each workspace
env = {
default = {
parameter_name = "/control_plane/${include.root.locals.environment.locals.workspace}/credentials"
sharing_principals = ["ou-w3ow-k24p2opx"]
tags = {
Environment = "control-plane"
Layer = "Operations"
}
}
"dev" = {
create = true
}
"prod" = {
create = true
}
}
# Merge parameters
environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
workspace = merge(local.env["default"], local.env[local.environment_vars])
}
terraform {
source = "../../../modules/terraform-aws-ssm-parameter-sotre"
}
inputs = {
parameter_name = "${local.workspace["parameter_name"]}"
parameter_description = "Control plane credentials"
parameter_type = "SecureString"
parameter_tier = "Advanced"
create_kms = true
enable_sharing = true
sharing_principals= local.workspace["sharing_principals"]
parameter_value = jsonencode({
cluster_name = dependency.eks.outputs.cluster_name,
cluster_endpoint = dependency.eks.outputs.cluster_endpoint,
cluster_certificate_authority_data = dependency.eks.outputs.cluster_certificate_authority_data,
cluster_version = dependency.eks.outputs.cluster_version,
cluster_platform_version = dependency.eks.outputs.cluster_platform_version,
oidc_provider_arn = dependency.eks.outputs.oidc_provider_arn,
hub_account_id = split(":", dependency.eks.outputs.cluster_arn)[4]
} )
tags = local.workspace["tags"]
}
Now another stack is necessary: the IAM role that lets the Argo CD service accounts use IAM authentication against the spoke clusters. The terraform-aws-iam/iam-eks-role module creates the IRSA role, but a custom policy is also required to allow assuming roles in the spoke accounts. Figure 2 depicts this setup in depth.
You can create a simple stack to manage the role, or a module that combines the EKS definition and the IAM role.
Figure 2. GitOps authentication summary.
So, the module is in modules/terraform-aws-irsa-eks-hub
modules/
├── terraform-aws-irsa-eks-hub
│ ├── README.md
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
└── terraform-aws-ssm-parameter-sotre
├── README.md
├── data.tf
├── main.tf
├── outputs.tf
└── variables.tf
3 directories, 9 files
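Before looking at the stack, here is a minimal, hypothetical sketch of what this local module's main.tf could contain, assuming it wraps the iam-eks-role submodule from terraform-aws-modules/iam/aws and adds the assume-role policy mentioned above (the spoke_role_arns variable is an assumption for illustration):
#main.tf (illustrative sketch)
module "argocd_hub_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-eks-role"
  version = "~> 5.0"

  role_name = var.role_name
  # Map of cluster name => list of "namespace:serviceaccount" allowed to use the role via IRSA
  cluster_service_accounts = var.cluster_service_accounts
  tags                     = var.tags
}

# Custom policy so the hub Argo CD role can assume the Argo CD roles in the spoke accounts
resource "aws_iam_role_policy" "assume_spoke_roles" {
  name = "${var.role_name}-assume-spoke"
  role = module.argocd_hub_role.iam_role_name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sts:AssumeRole"
      Resource = var.spoke_role_arns # hypothetical input, e.g. ["arn:aws:iam::*:role/argocd-spoke-*"]
    }]
  })
}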
The stack that calls the module is:
#eks_role-terragrunt.hcl
include "root" {
path = find_in_parent_folders("root.hcl")
expose = true
}
dependency "eks" {
config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_control_plane"
mock_outputs = {
cluster_name = "dummy-cluster-name"
cluster_endpoint = "dummy_cluster_endpoint"
cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
cluster_version = "1.31"
cluster_platform_version = "1.31"
oidc_provider_arn = "dummy_arn"
}
mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
# Define parameters for each workspace
env = {
default = {
environment = "control-plane"
role_name = "eks-role-hub"
tags = {
Environment = "control-plane"
Layer = "Networking"
}
}
"dev" = {
create = true
}
"prod" = {
create = true
}
}
# Merge parameters
environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
workspace = merge(local.env["default"], local.env[local.environment_vars])
}
terraform {
source = "../../../modules/terraform-aws-irsa-eks-hub"
}
inputs = {
role_name = "${local.workspace["role_name"]}-${local.workspace["environment"]}"
cluster_service_accounts = {
"${dependency.eks.outputs.cluster_name}" = [
"argocd:argocd-application-controller",
"argocd:argo-cd-argocd-repo-server",
"argocd:argocd-server",
]
}
tags = local.workspace["tags"]
}
Finally, the gitops_bridge stack should look like this:
#gitops_bridge-terragrunt.hcl
include "root" {
path = find_in_parent_folders("root.hcl")
expose = true
}
include "k8s_helm_provider" {
path = find_in_parent_folders("/common/additional_providers/provider_k8s_helm.hcl")
}
dependency "eks" {
config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_control_plane"
mock_outputs = {
cluster_name = "dummy-cluster-name"
cluster_endpoint = "dummy_cluster_endpoint"
cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
cluster_version = "1.31"
cluster_platform_version = "1.31"
oidc_provider_arn = "dummy_arn"
}
mock_outputs_merge_strategy_with_state = "shallow"
}
dependency "eks_role" {
config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/iam/eks_role"
mock_outputs = {
iam_role_arn = "arn::..."
}
mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
# Define parameters for each workspace
env = {
default = {
environment = "control-plane"
oss_addons = {
enable_argo_workflows = true
#enable_foo = true
# you can add any addon here, make sure to update the gitops repo with the corresponding application set
}
addons_metadata = merge(
{
addons_repo_url = "https://github.com/gitops-bridge-dev/gitops-bridge-argocd-control-plane-template"
addons_repo_basepath = ""
addons_repo_path ="bootstrap/control-plane/addons"
addons_repo_revision = "HEAD"
}
)
argocd_apps = {
addons = file("./bootstrap/addons.yaml")
#workloads = file("./bootstrap/workloads.yaml")
}
tags = {
Environment = "control-plane"
Layer = "Networking"
}
}
"dev" = {
create = true
}
"prod" = {
create = true
}
}
# Merge parameters
environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
workspace = merge(local.env["default"], local.env[local.environment_vars])
}
terraform {
source = "tfr:///gitops-bridge-dev/gitops-bridge/helm?version=0.1.0"
}
inputs = {
cluster_name = dependency.eks.outputs.cluster_name
cluster_endpoint = dependency.eks.outputs.cluster_endpoint
cluster_platform_version = dependency.eks.outputs.cluster_platform_version
oidc_provider_arn = dependency.eks.outputs.oidc_provider_arn
cluster_certificate_authority_data = dependency.eks.outputs.cluster_certificate_authority_data
cluster = {
cluster_name = dependency.eks.outputs.cluster_name
environment = local.workspace["environment"]
metadata = local.workspace["addons_metadata"]
addons = merge(local.workspace["oss_addons"], { kubernetes_version = dependency.eks.outputs.cluster_version })
}
apps = local.workspace["argocd_apps"]
argocd = {
namespace = "argocd"
#set = [
# {
# name = "server.service.type"
# value = "LoadBalancer"
# }
#]
values = [
yamlencode(
{
configs = {
params = {
"server.insecure" = true
}
}
server = {
"serviceAccount" = {
annotations = {
"eks.amazonaws.com/role-arn" = dependency.eks_role.outputs.iam_role_arn
}
}
service = {
type = "NodePort"
}
ingress = {
enabled = false
controller = "aws"
ingressClassName : "alb"
aws = {
serviceType : "NodePort"
}
annotations = {
#"alb.ingress.kubernetes.io/backend-protocol" = "HTTPS"
#"alb.ingress.kubernetes.io/ssl-redirect" = "443"
#"service.beta.kubernetes.io/aws-load-balancer-type" = "external"
#"service.beta.kubernetes.io/aws-load-balancer-nlb-target-type" = "ip"
#"alb.ingress.kubernetes.io/listen-ports" : "[{\"HTTPS\":443}]"
}
}
}
controller = {
"serviceAccount" = {
annotations = {
"eks.amazonaws.com/role-arn" = dependency.eks_role.outputs.iam_role_arn
}
}
}
repoServer = {
"serviceAccount" = {
annotations = {
"eks.amazonaws.com/role-arn" = dependency.eks_role.outputs.iam_role_arn
}
}
}
}
)
]
}
tags = local.workspace["tags"]
}
Basically, the main change is the introduction of the argocd map, which sets the Helm chart values so the deployment can use the IRSA role.
When you use cross-account deployment, the profile that creates the cluster secrets in the hub cluster must have the required access and permissions; for example, in the repository the eks_control_plane stack introduces a new access entry:
#eks_control_plane-terragrunt.hcl
include "root" {
path = find_in_parent_folders("root.hcl")
expose = true
}
dependency "vpc" {
config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/network/vpc"
mock_outputs = {
vpc_id = "vpc-04e3e1e302f8c8f06"
public_subnets = [
"subnet-0e4c5aedfc2101502",
"subnet-0d5061f70b69eda14",
]
private_subnets = [
"subnet-0e4c5aedfc2101502",
"subnet-0d5061f70b69eda14",
"subnet-0d5061f70b69eda15",
]
}
mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
# Define parameters for each workspace
env = {
default = {
create = false
cluster_name = "${include.root.locals.common_vars.locals.project}-${include.root.locals.environment.locals.workspace}-hub"
cluster_version = "1.32"
# Optional
cluster_endpoint_public_access = true
# Optional: Adds the current caller identity as an administrator via cluster access entry
enable_cluster_creator_admin_permissions = true
access_entries = {
#####################################################################################################################
# Admin installation and setup for spoke accounts - Demo purpose- must be the ci Agent Role
####################################################################################################################
admins_sso = {
kubernetes_groups = []
principal_arn = "arn:aws:sts::123456781234:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_AWSAdministratorAccess_877fe9e4127a368d"
user_name = "arn:aws:sts::123456781234:assumed-role/AWSReservedSSO_AWSAdministratorAccess_877fe9e4127a368d/{{SessionName}}"
policy_associations = {
single = {
policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
access_scope = {
type = "cluster"
}
}
}
}
}
cluster_compute_config = {
enabled = true
node_pools = ["general-purpose"]
}
tags = {
Environment = "control-plane"
Layer = "Networking"
}
}
"dev" = {
create = true
}
"prod" = {
create = true
}
}
# Merge parameters
environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
workspace = merge(local.env["default"], local.env[local.environment_vars])
}
terraform {
source = "tfr:///terraform-aws-modules/eks/aws?version=20.33.1"
}
inputs = {
create = local.workspace["create"]
cluster_name = local.workspace["cluster_name"]
cluster_version = local.workspace["cluster_version"]
# Optional
cluster_endpoint_public_access = local.workspace["cluster_endpoint_public_access"]
# Optional: Adds the current caller identity as an administrator via cluster access entry
enable_cluster_creator_admin_permissions = local.workspace["enable_cluster_creator_admin_permissions"]
cluster_compute_config = local.workspace["cluster_compute_config"]
vpc_id = dependency.vpc.outputs.vpc_id
subnet_ids = dependency.vpc.outputs.private_subnets
access_entries = local.workspace["access_entries"]
tags = merge(
{
Environment = include.root.locals.environment.locals.workspace
Terraform = "true"
},
local.workspace["tags"]
)
}
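The SSO role above is for demo purposes; in a real pipeline the access entry should reference the CI agent role instead. A hypothetical extra entry for the access_entries map could look like this (account ID and role name are placeholders):
ci_agent = {
  kubernetes_groups = []
  principal_arn     = "arn:aws:iam::111122223333:role/ci-infra-agent" # placeholder CI agent role
  policy_associations = {
    admin = {
      policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
      access_scope = {
        type = "cluster"
      }
    }
  }
}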
The final Hub infrastructure is:
Figure 3. Control Plane Infrastructure.
👇 Get the code here.
velez94/terragrunt_aws_gitops_blueprint: Public demo for GitOps bridge using Terragrunt, OpenTofu and EKS
Creating Spoke Cluster Infrastructure
The second step is creating the spoke cluster infrastructure. 🧑🏾💻
To manage a spoke cluster, a separate Terragrunt/OpenTofu project is created from a template for each team. This way the infrastructure has its own pipeline, each team can use custom CI/CD agents, and there is more flexibility to add features and components to the infrastructure stacks. In some cases you can instead have a single pipeline that manages the infrastructure setup and works with environments or parameters handled by the orchestration CI/CD tool; this approach is used when central governance and control are required, but consider the rate of change, common tasks, environment setup, and the CI/CD worker capacity assigned to the central operations team.
The code is similar to the hub repository; the main difference is in the GitOps bridge stack.
Let's look at it in depth. 🕵️♀️
First, a new provider configuration is necessary, the hub cluster provider, in terragrunt_aws_gitops_spoke_blueprint/common/additional_providers:
locals {
workspace = get_env("TF_VAR_env", "dev")
pipeline = "false"
hub_account_id = "105171185823"
}
generate "k8s_helm_provider" {
path = "k8s_helm_provider.tf"
if_exists = "overwrite"
contents = <<EOF
################################################################################
# Kubernetes Access for Spoke Cluster
################################################################################
# First, define the parameter store data source
data "aws_ssm_parameter" "hub_cluster_config" {
count = 1
with_decryption = true
name = "arn:aws:ssm:us-east-2:${local.hub_account_id}:parameter/control_plane/${local.workspace}/credentials"
#"/control_plane/${local.workspace}/credentials" # Adjust the parameter path as needed
}
provider "kubernetes" {
host = try(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_endpoint, var.cluster_endpoint)
cluster_ca_certificate = try(base64decode(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_certificate_authority_data), var.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = [
"eks",
"get-token",
"--cluster-name",
try(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_name, var.cluster_name),
"--region",
try(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_region, data.aws_region.current.name),
"--profile",
var.profile["${local.workspace}"]["profile"]
]
}
alias = "hub"
}
EOF
}
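Note that the generated provider relies on a few fallback variables and the current region data source; these are expected to exist elsewhere in the project (for example in a shared generate block or common variables). A hedged sketch of what those declarations might look like:
# Illustrative declarations assumed by the generated hub provider
variable "cluster_name" {
  type    = string
  default = ""
}

variable "cluster_endpoint" {
  type    = string
  default = ""
}

variable "cluster_certificate_authority_data" {
  type    = string
  default = ""
}

# Map of workspace => AWS CLI profile used by "aws eks get-token"
variable "profile" {
  type = map(map(string))
  default = {
    dev = { profile = "spoke-dev" } # hypothetical profile name
  }
}

data "aws_region" "current" {}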
The gitops_bridge stack is:
#gitops_bridge-terragrunt.hcl
include "root" {
path = find_in_parent_folders("root.hcl")
expose = true
}
include "k8s_helm_provider" {
path = find_in_parent_folders("/common/additional_providers/provider_k8s_hub.hcl")
}
dependency "eks" {
config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_spoke"
mock_outputs = {
cluster_name = "dummy-cluster-name"
cluster_endpoint = "dummy_cluster_endpoint"
cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
cluster_version = "1.31"
cluster_platform_version = "1.31"
oidc_provider_arn = "dummy_arn"
}
mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
# Define parameters for each workspace
env = {
default = {
environment = "control-plane"
oss_addons = {
enable_argo_workflows = true
#enable_foo = true
# you can add any addon here, make sure to update the gitops repo with the corresponding application set
}
addons_metadata = merge(
{
addons_repo_url = "https://github.com/gitops-bridge-dev/gitops-bridge-argocd-control-plane-template"
addons_repo_basepath = ""
addons_repo_path ="bootstrap/control-plane/addons"
addons_repo_revision = "HEAD"
}
)
tags = {
Environment = "control-plane"
Layer = "Networking"
}
}
"dev" = {
create = true
}
"prod" = {
create = true
}
}
# Merge parameters
environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
workspace = merge(local.env["default"], local.env[local.environment_vars])
}
terraform {
source = "../../../modules/terraform-aws-gitops-bridge-spoke"
}
inputs = {
cluster_name = dependency.eks.outputs.cluster_name
cluster_endpoint = dependency.eks.outputs.cluster_endpoint
cluster_platform_version = dependency.eks.outputs.cluster_platform_version
oidc_provider_arn = dependency.eks.outputs.oidc_provider_arn
cluster_certificate_authority_data = dependency.eks.outputs.cluster_certificate_authority_data
create_kubernetes_resources = false
cluster = {
cluster_name = dependency.eks.outputs.cluster_name
environment = local.workspace["environment"]
metadata = local.workspace["addons_metadata"]
addons = merge(local.workspace["oss_addons"], { kubernetes_version = dependency.eks.outputs.cluster_version })
}
hub_account_id = include.root.locals.common_vars.locals.hub_account_id
tags = local.workspace["tags"]
}
This stack defines the data for the cluster secret in the hub cluster and the metadata for the addons. The local module terraform-aws-gitops-bridge-spoke creates the access entry that enables hub access through the spoke role, as depicted in Figure 2, and reuses gitops-bridge-dev/gitops-bridge/helm with parameters that deploy the cluster secret but not the Argo CD installation in the spoke clusters (a hypothetical sketch of this module follows Figure 4). The infrastructure composition for the spoke IaC is:
Figure 4. Infrastructure composition.
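For reference, a minimal, hypothetical sketch of the access part of the terraform-aws-gitops-bridge-spoke module; resource names, variables, and the hub role naming convention are assumptions, not the exact repository content:
#main.tf (illustrative sketch)
# Spoke-side role that the hub Argo CD IRSA role is allowed to assume (see Figure 2)
resource "aws_iam_role" "argocd_spoke" {
  name = "argocd-spoke-${var.cluster_name}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      Principal = {
        # Assumption: the hub role created earlier in the series
        AWS = "arn:aws:iam::${var.hub_account_id}:role/eks-role-hub-control-plane"
      }
    }]
  })

  tags = var.tags
}

# EKS access entry so the spoke role can manage this cluster
resource "aws_eks_access_entry" "argocd" {
  cluster_name  = var.cluster_name
  principal_arn = aws_iam_role.argocd_spoke.arn
  type          = "STANDARD"
  tags          = var.tags
}

resource "aws_eks_access_policy_association" "argocd" {
  cluster_name  = var.cluster_name
  principal_arn = aws_eks_access_entry.argocd.principal_arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}

# The Argo CD cluster secret is then created in the hub (through the kubernetes.hub
# provider alias), without installing Argo CD in the spoke cluster.
This keeps the cross-account trust explicit: the hub role only needs sts:AssumeRole on the spoke roles, and each spoke account controls which cluster access policies the assumed role receives.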
An alternative approach is to use the External Secrets Operator with Secrets Manager; leave a comment if you want to learn how to do it.
Finally, after running the spoke stacks, you can see the clusters and their metadata information in the Argo CD hub server:
Figure 5. Cluster and metadata Information.
For example, some addons were deployed in the spoke clusters using ApplicationSets.
Figure 6. Applications Set in spoke clusters.
In the next post, you will learn how to customize addons and add advanced setups. 🦸🦸
👇 Get the code here.
velez94/terragrunt_aws_gitops_spoke_blueprint: Infrastructure for spoke clusters blueprint for GitOps Bridge
Thanks for reading and sharing! 🫶