Level 400

Hi clouders! In the previous post you explored the considerations for deploying GitOps and IaC at scale. In this post you will learn how to deploy spoke clusters using the GitOps Bridge framework, covering specific use cases with AWS enterprise capabilities and security best practices.

Architecture Overview

Spoke clusters and Scenarios

According to best practices, there are several considerations and scenarios, for example:

  • You can have one stack or IaC repository with the definitions for each account, team, or squad; it depends on your internal organization and shared responsibility model. For this scenario, suppose the model is a decentralized DevOps operating model and each squad has its own IaC for each project. Keep a clear separation and isolation between environments and match it to your platform strategy: for example, some organizations have a single hub for managing the dev, QA, and prod clusters, others have one hub per environment, and others have one hub for pre-production environments and another for production.

  • Another key point is capacity planning and the applications each team owns. Some organizations share the same cluster across an organizational unit and keep the applications grouped by namespaces in each environment; others prefer one environment and cluster per workload or application. Networking and security are the main pain points and challenges in both scenarios. In this series the main assumption is that each workload has a dedicated cluster and account per environment, while a transversal platform team manages the cluster configuration and the control plane. The following table describes the relationship between scenarios and AWS accounts:

| Scenario | AWS Accounts | Clusters |
| --- | --- | --- |
| Single hub – N spokes | 1 account for the hub – N accounts per environment | 1 hub cluster, N spoke clusters per environment |
| M hubs – N spokes | M accounts for the hub environments – N accounts per environment | M hub clusters, N environment clusters |

Figure 1 depicts the architecture for this scenario.

GitOps Final AWS EKS
Figure 1. GitOps bridge Deployment architecture in AWS

FinOps practices are a key point: independent of how you distribute your resources, you must consider the best strategy for tracking costs and shared resources.

Hands On

First, modify the hub infrastructure stacks: add a stack that manages the cluster credentials cross-account so the CI infrastructure agents can retrieve them and register each cluster in the control plane. Also, create the IAM role for Argo CD to enable authentication between the Argo CD hub and the spoke clusters.

Updating Control Plane Infrastructure

So, let’s set up the credentials according to best practices and the scenario described in the previous section: the cluster credentials are stored in Parameter Store and shared with the organizational units for each team or business unit.

You must enable AWS Resource Access Manager (RAM) as a trusted service in your organization; this is done from the Organizations management account through the RAM console.
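
If you prefer to manage this setting with IaC, a minimal sketch, assuming it is applied once with credentials for the management account (the resource label is illustrative), is:

# Enables resource sharing with AWS Organizations so RAM can share
# resources with OUs and accounts without individual invitations.
# Apply once from the Organizations management account.
resource "aws_ram_sharing_with_organization" "this" {}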

For this task, a local module terraform-aws-parameter-store was created:

└── terraform-aws-ssm-parameter-sotre
    ├── README.md
    ├── data.tf
    ├── main.tf
    ├── outputs.tf
    └── variables.tf

The module creates the parameter and shares it with organization, organizational unit, or account ID principals through AWS RAM.
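
The module internals are not shown in the post. As a minimal sketch, assuming it simply wraps aws_ssm_parameter plus the RAM sharing resources (the variable and resource names below are illustrative), its main.tf could look like this:

# modules/terraform-aws-ssm-parameter-sotre/main.tf (sketch)

# KMS key used to encrypt the SecureString parameter when create_kms is true
resource "aws_kms_key" "this" {
  count       = var.create_kms ? 1 : 0
  description = "CMK for ${var.parameter_name}"
}

resource "aws_ssm_parameter" "this" {
  name        = var.parameter_name
  description = var.parameter_description
  type        = var.parameter_type # SecureString
  tier        = var.parameter_tier # Advanced tier is required to share parameters through RAM
  value       = var.parameter_value
  key_id      = var.create_kms ? aws_kms_key.this[0].arn : null
  tags        = var.tags
}

# Share the parameter with the organization, OU, or account principals.
# Note: OU and organization principals must be passed as their ARNs;
# the module's data.tf is a good place to resolve them.
resource "aws_ram_resource_share" "this" {
  count                     = var.enable_sharing ? 1 : 0
  name                      = replace(trimprefix(var.parameter_name, "/"), "/", "-")
  allow_external_principals = false
  tags                      = var.tags
}

resource "aws_ram_resource_association" "this" {
  count              = var.enable_sharing ? 1 : 0
  resource_arn       = aws_ssm_parameter.this.arn
  resource_share_arn = aws_ram_resource_share.this[0].arn
}

resource "aws_ram_principal_association" "this" {
  for_each           = toset(var.enable_sharing ? var.sharing_principals : [])
  principal          = each.value
  resource_share_arn = aws_ram_resource_share.this[0].arn
}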

Now, using Terragrunt, the module is called to create a new stack (Terragrunt unit):

#parameter_store-terragrunt.hcl

include "root" {
  path = find_in_parent_folders("root.hcl")
  expose = true
}

dependency "eks" {
  config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_control_plane"
  mock_outputs = {
    cluster_name = "dummy-cluster-name"
    cluster_endpoint = "dummy_cluster_endpoint"
    cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
    cluster_version = "1.31"
    cluster_platform_version = "1.31"
    oidc_provider_arn =  "dummy_arn"
    cluster_arn = "arn:aws:eks:us-east-2:105171185823:cluster/gitops-scale-dev-hub"
  }
  mock_outputs_merge_strategy_with_state = "shallow"
}

locals {
  # Define parameters for each workspace
  env = {
    default = {

      parameter_name = "/control_plane/${include.root.locals.environment.locals.workspace}/credentials"
      sharing_principals = ["ou-w3ow-k24p2opx"]
      tags = {
        Environment = "control-plane"
        Layer       = "Operations"
      }
    }
    "dev" = {

      create = true
    }
    "prod" = {

      create = true
    }
  }
  # Merge parameters
  environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
  workspace = merge(local.env["default"], local.env[local.environment_vars])
}


terraform {
  source = "../../../modules/terraform-aws-ssm-parameter-sotre"

}

inputs = {
  parameter_name = "${local.workspace["parameter_name"]}"
  parameter_description = "Control plane credentials"
  parameter_type = "SecureString"
  parameter_tier = "Advanced"
  create_kms = true
  enable_sharing = true
  sharing_principals= local.workspace["sharing_principals"]
  parameter_value = jsonencode({
    cluster_name                       = dependency.eks.outputs.cluster_name,
    cluster_endpoint                   = dependency.eks.outputs.cluster_endpoint,
    cluster_certificate_authority_data = dependency.eks.outputs.cluster_certificate_authority_data,
    cluster_version                    = dependency.eks.outputs.cluster_version,
    cluster_platform_version           = dependency.eks.outputs.cluster_platform_version,
    oidc_provider_arn                  = dependency.eks.outputs.oidc_provider_arn,
    hub_account_id = split(":", dependency.eks.outputs.cluster_arn)[4]
  }  )
  tags = local.workspace["tags"]

}

Now another stack is necessary: the IAM role that enables the Argo CD service accounts to use IAM authentication with the spoke clusters. The terraform-aws-iam/iam-eks-role module creates the IRSA role, but a custom policy is also required to allow assuming roles in the spoke accounts. Figure 2 depicts this setup in depth.

You can create a simple stack to manage the role, or a module that supports both the EKS definition and the IAM role.

GitOps Authentication
Figure 2. GitOps authentication summary.

So, the module is in modules/terraform-aws-irsa-eks-hub

modules/
├── terraform-aws-irsa-eks-hub
│   ├── README.md
│   ├── main.tf
│   ├── outputs.tf
│   └── variables.tf
└── terraform-aws-ssm-parameter-sotre
    ├── README.md
    ├── data.tf
    ├── main.tf
    ├── outputs.tf
    └── variables.tf

3 directories, 9 files
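
The module internals are not listed in the post. As a minimal sketch, assuming the module wraps the terraform-aws-modules iam-eks-role submodule and attaches a custom policy that lets the hub role assume the Argo CD roles in the spoke accounts (the ARN pattern, names, and variables below are illustrative), its main.tf could look like this:

# modules/terraform-aws-irsa-eks-hub/main.tf (sketch)

# IRSA role trusted by the Argo CD service accounts running in the hub cluster
module "argocd_hub_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-eks-role"
  version = "~> 5.0"

  role_name                = var.role_name
  cluster_service_accounts = var.cluster_service_accounts
  tags                     = var.tags
}

# Custom policy allowing the hub role to assume the spoke roles
# created in each workload account (ARN pattern is illustrative)
resource "aws_iam_policy" "assume_spoke_roles" {
  name = "${var.role_name}-assume-spoke"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sts:AssumeRole"
      Resource = "arn:aws:iam::*:role/argocd-spoke-*"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "assume_spoke_roles" {
  role       = module.argocd_hub_role.iam_role_name
  policy_arn = aws_iam_policy.assume_spoke_roles.arn
}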

and the stack is:

#eks_role-terragrunt.hcl

include "root" {
  path = find_in_parent_folders("root.hcl")
  expose = true
}


dependency "eks" {
  config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_control_plane"
  mock_outputs = {
    cluster_name = "dummy-cluster-name"
    cluster_endpoint = "dummy_cluster_endpoint"
    cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
    cluster_version = "1.31"
    cluster_platform_version = "1.31"
    oidc_provider_arn =  "dummy_arn"
  }
  mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
  # Define parameters for each workspace
  env = {
    default = {

      environment  = "control-plane"
      role_name    = "eks-role-hub"
      tags = {
        Environment = "control-plane"
        Layer       = "Networking"
      }
    }
    "dev" = {

      create = true
    }
    "prod" = {

      create = true
    }
  }
  # Merge parameters
  environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
  workspace = merge(local.env["default"], local.env[local.environment_vars])
}


terraform {
  source = "../../../modules/terraform-aws-irsa-eks-hub"

}

inputs = {
  role_name = "${local.workspace["role_name"]}-${local.workspace["environment"]}"

  cluster_service_accounts = {
    "${dependency.eks.outputs.cluster_name}" = [
      "argocd:argocd-application-controller",
      "argocd:argo-cd-argocd-repo-server",
      "argocd:argocd-server",
    ]
  }
  tags = local.workspace["tags"]

}

Finally, the gitops_bridge stack must look like:

#gitops_bridge-terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
  expose = true
}

include "k8s_helm_provider" {
  path = find_in_parent_folders("/common/additional_providers/provider_k8s_helm.hcl")
}

dependency "eks" {
  config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_control_plane"
  mock_outputs = {
    cluster_name = "dummy-cluster-name"
    cluster_endpoint = "dummy_cluster_endpoint"
    cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
    cluster_version = "1.31"
    cluster_platform_version = "1.31"
    oidc_provider_arn =  "dummy_arn"
  }
  mock_outputs_merge_strategy_with_state = "shallow"
}
dependency "eks_role" {
  config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/iam/eks_role"
  mock_outputs = {
    iam_role_arn = "arn::..."
  }
  mock_outputs_merge_strategy_with_state = "shallow"

}
locals {
  # Define parameters for each workspace
  env = {
    default = {

      environment  = "control-plane"
      oss_addons = {
        enable_argo_workflows = true
        #enable_foo            = true
        # you can add any addon here, make sure to update the gitops repo with the corresponding application set
      }

      addons_metadata = merge(
        {
          addons_repo_url      = "https://github.com/gitops-bridge-dev/gitops-bridge-argocd-control-plane-template"
          addons_repo_basepath = ""
          addons_repo_path     ="bootstrap/control-plane/addons"
          addons_repo_revision = "HEAD"
        }
      )
      argocd_apps = {
        addons = file("./bootstrap/addons.yaml")
        #workloads = file("./bootstrap/workloads.yaml")
      }

      tags = {
        Environment = "control-plane"
        Layer       = "Networking"
      }
    }
    "dev" = {

      create = true
    }
    "prod" = {

      create = true
    }
  }
  # Merge parameters
  environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
  workspace = merge(local.env["default"], local.env[local.environment_vars])
}


terraform {
  source = "tfr:///gitops-bridge-dev/gitops-bridge/helm?version=0.1.0"

}

inputs = {
  cluster_name                       = dependency.eks.outputs.cluster_name
  cluster_endpoint                   = dependency.eks.outputs.cluster_endpoint
  cluster_platform_version           = dependency.eks.outputs.cluster_platform_version
  oidc_provider_arn                  = dependency.eks.outputs.oidc_provider_arn
  cluster_certificate_authority_data = dependency.eks.outputs.cluster_certificate_authority_data

  cluster = {
    cluster_name =   dependency.eks.outputs.cluster_name
    environment  = local.workspace["environment"]
    metadata     = local.workspace["addons_metadata"]
    addons = merge(local.workspace["oss_addons"], { kubernetes_version = dependency.eks.outputs.cluster_version })

  }
  apps = local.workspace["argocd_apps"]
  argocd = {
    namespace = "argocd"
    #set = [
    #  {
    #    name  = "server.service.type"
    #    value = "LoadBalancer"
    #  }
    #]
    values = [
      yamlencode(
        {
          configs = {
            params = {
              "server.insecure" = true
            }
          }
          server = {
            "serviceAccount" = {
              annotations = {
                "eks.amazonaws.com/role-arn" = dependency.eks_role.outputs.iam_role_arn
              }

            }
            service = {
              type = "NodePort"
            }

            ingress = {
              enabled    = false
              controller = "aws"
              ingressClassName : "alb"


              aws = {
                serviceType : "NodePort"
              }

              annotations = {
                #"alb.ingress.kubernetes.io/backend-protocol" = "HTTPS"
                #"alb.ingress.kubernetes.io/ssl-redirect"                       = "443"
                #"service.beta.kubernetes.io/aws-load-balancer-type"            = "external"
                #"service.beta.kubernetes.io/aws-load-balancer-nlb-target-type" = "ip"
                #"alb.ingress.kubernetes.io/listen-ports" : "[{\"HTTPS\":443}]"
              }
            }
          }
          controller = {
            "serviceAccount" = {
              annotations = {
                "eks.amazonaws.com/role-arn" =  dependency.eks_role.outputs.iam_role_arn
              }

            }
          }
          repoServer = {
            "serviceAccount" = {
              annotations = {
                "eks.amazonaws.com/role-arn" =  dependency.eks_role.outputs.iam_role_arn
              }

            }
          }
        }
      )
    ]

  }

  tags = local.workspace["tags"]

}

Basically, the main change is the introduction of the argocd map to set the Helm chart values and enable the use of the IRSA role.

When you use cross-account deployment, the profile that creates the secrets in the hub cluster must have the required access and permissions. For example, in the repository the eks_control_plane stack introduces a new access entry:

#eks_control_plane-terragrunt.hcl
include "root" {
  path   = find_in_parent_folders("root.hcl")
  expose = true
}
dependency "vpc" {
  config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/network/vpc"
  mock_outputs = {
    vpc_id = "vpc-04e3e1e302f8c8f06"
    public_subnets = [
      "subnet-0e4c5aedfc2101502",
      "subnet-0d5061f70b69eda14",
    ]
    private_subnets = [
      "subnet-0e4c5aedfc2101502",
      "subnet-0d5061f70b69eda14",
      "subnet-0d5061f70b69eda15",
    ]
  }
  mock_outputs_merge_strategy_with_state = "shallow"
}

locals {
  # Define parameters for each workspace
  env = {
    default = {
      create          = false
      cluster_name    = "${include.root.locals.common_vars.locals.project}-${include.root.locals.environment.locals.workspace}-hub"
      cluster_version = "1.32"

      # Optional
      cluster_endpoint_public_access = true

      # Optional: Adds the current caller identity as an administrator via cluster access entry
      enable_cluster_creator_admin_permissions = true
      access_entries = {
        #####################################################################################################################
        # Admin installation and setup for spoke accounts - Demo purpose- must be the ci Agent Role
        ####################################################################################################################
        admins_sso = {
          kubernetes_groups = []
          principal_arn     = "arn:aws:sts::123456781234:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_AWSAdministratorAccess_877fe9e4127a368d"
          user_name         = "arn:aws:sts::123456781234:assumed-role/AWSReservedSSO_AWSAdministratorAccess_877fe9e4127a368d/{{SessionName}}"

          policy_associations = {
            single = {
              policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
              access_scope = {
                type = "cluster"
              }
            }
          }
        }
      }

      cluster_compute_config = {
        enabled    = true
        node_pools = ["general-purpose"]
      }

      tags = {
        Environment = "control-plane"
        Layer       = "Networking"
      }
    }
    "dev" = {

      create = true
    }
    "prod" = {

      create = true
    }
  }
  # Merge parameters
  environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
  workspace        = merge(local.env["default"], local.env[local.environment_vars])
}


terraform {
  source = "tfr:///terraform-aws-modules/eks/aws?version=20.33.1"

}

inputs = {
  create          = local.workspace["create"]
  cluster_name    = local.workspace["cluster_name"]
  cluster_version = local.workspace["cluster_version"]

  # Optional
  cluster_endpoint_public_access = local.workspace["cluster_endpoint_public_access"]

  # Optional: Adds the current caller identity as an administrator via cluster access entry
  enable_cluster_creator_admin_permissions = local.workspace["enable_cluster_creator_admin_permissions"]

  cluster_compute_config = local.workspace["cluster_compute_config"]

  vpc_id         = dependency.vpc.outputs.vpc_id
  subnet_ids     = dependency.vpc.outputs.private_subnets
  access_entries = local.workspace["access_entries"]


  tags = merge(
    {
      Environment = include.root.locals.environment.locals.workspace
      Terraform   = "true"
    },
    local.workspace["tags"]
  )

}

The final Hub infrastructure is:

Hub infrastructure
Figure 3. Control Plane Infrastructure.

👇 Get the code here.

GitHub: velez94 / terragrunt_aws_gitops_blueprint

Public demo for Gitops bridge using terragrunt, OpenTofu and EKS

AWS GitOps Blueprint with Terragrunt

This project provides a blueprint for implementing GitOps on AWS using Terragrunt and Argo CD. It offers a structured approach to managing infrastructure as code and deploying applications across multiple environments.

The blueprint is designed to streamline the process of setting up a GitOps workflow on AWS, leveraging Terragrunt for managing Terraform configurations and Argo CD for continuous deployment. It includes configurations for essential AWS services such as EKS (Elastic Kubernetes Service) and VPC (Virtual Private Cloud), as well as GitOps components for managing cluster addons and platform-level resources.

Key features of this blueprint include:

  • Modular infrastructure setup using Terragrunt
  • EKS cluster configuration for container orchestration
  • VPC network setup for secure and isolated environments
  • GitOps bridge for seamless integration between infrastructure and application deployments
  • Argo CD ApplicationSets for managing cluster addons and platform resources
  • Environment-specific configurations for multi-environment deployments


Creating Spoke Cluster Infrastructure

The second step is creating the spoke cluster infrastructure. 🧑🏾‍💻

To manage a spoke cluster, a separate Terragrunt or OpenTofu project is created for each team from a template. This way the infrastructure has its own pipeline, each team can use custom CI/CD agents, and there is more flexibility to add features and components to the infrastructure stacks. In some cases you can have a single pipeline that manages the infrastructure setup and works with environments or parameters handled by the orchestration CI/CD tool; this approach is used when central governance and control are required, but consider the rate of change, the common tasks and environment setup, and the CI/CD worker capacity assigned to the central operations team.

The code is similar to the hub repository; the main difference is in the GitOps bridge stack.

Let’s look at it in depth. 🕵️‍♀️

First, a new provider configuration is necessary, the hub cluster provider, in terragrunt_aws_gitops_spoke_blueprint/common/additional_providers:

locals {
  workspace = get_env("TF_VAR_env", "dev")
  pipeline = "false"
  hub_account_id    = "105171185823"
}
generate "k8s_helm_provider" {
  path      = "k8s_helm_provider.tf"
  if_exists = "overwrite"
  contents = <<EOF
################################################################################
# Kubernetes Access for Spoke Cluster
################################################################################

# First, define the parameter store data source
data "aws_ssm_parameter" "hub_cluster_config" {
  count = 1
  with_decryption = true
  name  = "arn:aws:ssm:us-east-2:${local.hub_account_id}:parameter/control_plane/${local.workspace}/credentials"
 #"/control_plane/${local.workspace}/credentials"  # Adjust the parameter path as needed
}

provider "kubernetes" {
  host = try(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_endpoint, var.cluster_endpoint)
  cluster_ca_certificate = try(base64decode(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_certificate_authority_data), var.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = [
      "eks",
      "get-token",
      "--cluster-name",
      try(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_name, var.cluster_name),
      "--region",
      try(jsondecode(data.aws_ssm_parameter.hub_cluster_config[0].value).cluster_region, data.aws_region.current.name),
      "--profile",
      var.profile["${local.workspace}"]["profile"]
    ]
  }
  alias = "hub"
}

EOF
}

The gitops_bridge stack is:

#gitops_bridge-terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
  expose = true
}

include "k8s_helm_provider" {
  path = find_in_parent_folders("/common/additional_providers/provider_k8s_hub.hcl")
}

dependency "eks" {
  config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_spoke"
  mock_outputs = {
    cluster_name = "dummy-cluster-name"
    cluster_endpoint = "dummy_cluster_endpoint"
    cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
    cluster_version = "1.31"
    cluster_platform_version = "1.31"
    oidc_provider_arn =  "dummy_arn"
  }
  mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
  # Define parameters for each workspace
  env = {
    default = {

      environment  = "control-plane"
      oss_addons = {
        enable_argo_workflows = true
        #enable_foo            = true
        # you can add any addon here, make sure to update the gitops repo with the corresponding application set
      }

      addons_metadata = merge(
        {
          addons_repo_url      = "https://github.com/gitops-bridge-dev/gitops-bridge-argocd-control-plane-template"
          addons_repo_basepath = ""
          addons_repo_path     ="bootstrap/control-plane/addons"
          addons_repo_revision = "HEAD"
        }
      )


      tags = {
        Environment = "control-plane"
        Layer       = "Networking"
      }
    }
    "dev" = {

      create = true
    }
    "prod" = {

      create = true
    }
  }
  # Merge parameters
  environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
  workspace = merge(local.env["default"], local.env[local.environment_vars])
}


terraform {
  source = "../../../modules/terraform-aws-gitops-bridge-spoke"

}

inputs = {
  cluster_name                       = dependency.eks.outputs.cluster_name
  cluster_endpoint                   = dependency.eks.outputs.cluster_endpoint
  cluster_platform_version           = dependency.eks.outputs.cluster_platform_version
  oidc_provider_arn                  = dependency.eks.outputs.oidc_provider_arn
  cluster_certificate_authority_data = dependency.eks.outputs.cluster_certificate_authority_data
  create_kubernetes_resources = false
  cluster = {
    cluster_name =   dependency.eks.outputs.cluster_name
    environment  = local.workspace["environment"]
    metadata     = local.workspace["addons_metadata"]
    addons = merge(local.workspace["oss_addons"], { kubernetes_version = dependency.eks.outputs.cluster_version })

  }
  hub_account_id =  include.root.locals.common_vars.locals.hub_account_id

  tags = local.workspace["tags"]

}

This stack defines the data for the cluster secret in the hub cluster and the metadata for the addons. The local module terraform-aws-gitops-bridge-spoke creates the access entry that enables hub access through the spoke role, as in Figure 2, and reuses gitops-bridge-dev/gitops-bridge/helm with parameters that create the cluster secret (in the hub) but skip the Argo CD installation on the spoke clusters. So, the infrastructure composition for the spoke IaC is:

Infrastructure Composition
Figure 4. Infrastructure composition.
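
The internals of terraform-aws-gitops-bridge-spoke are not shown here. As a minimal sketch, assuming it creates an access entry for a spoke role that the hub can assume and then calls the upstream gitops-bridge module (the names, trust policy, and the upstream module's exact interface below are illustrative assumptions), it could look like this:

# modules/terraform-aws-gitops-bridge-spoke/main.tf (sketch)

# Role in the spoke account that the Argo CD hub role is allowed to assume
resource "aws_iam_role" "argocd_spoke" {
  name = "argocd-spoke-${var.cluster_name}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = "arn:aws:iam::${var.hub_account_id}:root" }
    }]
  })
  tags = var.tags
}

# EKS access entry so that role can manage this spoke cluster
resource "aws_eks_access_entry" "argocd_hub" {
  cluster_name  = var.cluster_name
  principal_arn = aws_iam_role.argocd_spoke.arn
}

resource "aws_eks_access_policy_association" "argocd_hub" {
  cluster_name  = var.cluster_name
  principal_arn = aws_iam_role.argocd_spoke.arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}

# Reuse the upstream gitops-bridge module only to register the spoke:
# the cluster secret is created through the hub kubernetes provider and
# Argo CD itself is not installed on the spoke.
module "gitops_bridge_bootstrap" {
  source  = "gitops-bridge-dev/gitops-bridge/helm"
  version = "0.1.0"

  install = false # skip the Argo CD Helm release on the spoke
  cluster = var.cluster

  providers = {
    kubernetes = kubernetes.hub
  }
}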

An alternative approach is to use the External Secrets Operator with Secrets Manager; leave a comment if you want to learn how to do it.

Finally, after running the spoke stacks, you can see the clusters and their metadata in the Argo CD hub server:

Cluster and metadata Information
Figure 5. Cluster and metadata Information.

For example, some addons were deployed to the spoke cluster using ApplicationSets.

Applications Set in spoke clusters
Figure 6. Applications Set in spoke clusters.

In the next post, you will learn how to customize addons and add advanced setups. 🦸🦸

👇 Get the code here.

GitHub: velez94 / terragrunt_aws_gitops_spoke_blueprint

Infrastructure for Spoke clusters blueprint for GitOps Bridge

AWS GitOps Scale Infrastructure with EKS and VPC

A comprehensive Infrastructure as Code (IaC) solution that enables scalable GitOps deployments on AWS using EKS clusters in a hub-spoke architecture with automated infrastructure provisioning and configuration management.

This project provides a complete infrastructure setup using Terragrunt and Terraform to create and manage AWS resources including VPC networking, EKS clusters, and GitOps tooling. It implements a hub-spoke architecture where a central hub cluster manages multiple spoke clusters through GitOps practices, enabling consistent and automated application deployments at scale.

The solution includes automated VPC creation with proper network segmentation, EKS cluster provisioning with secure configurations, and integration with GitOps tools through a bridge component that enables declarative infrastructure and application management.

Repository Structure


├── common/                          # Common configuration and variable definitions
│   ├── additional_providers/        # Provider configurations for Kubernetes, Helm, etc.
│   ├── common.hcl                  # Common Terragrunt configuration
│   ├── common.tfvars               #

Thanks for reading and sharing! 🫶