Level 400
Welcome, builders! In this blog series, we will explore the exciting world of GitOps and Infrastructure as Code (IaC) at scale, focusing on the powerful combination of ArgoCD and OpenTofu and how these tools and frameworks can work together to improve your deployments and reduce toil. The main purpose is to learn patterns and considerations for increasing efficiency and automating infrastructure provisioning without losing sight of compliance and security requirements.
Scenario
Suppose that you are a Cloud Engineer or DevOps Engineer, and you have a mission: deploy cloud-native infrastructure to support a modern e-commerce application that lets anyone buy and sell gems. You must implement a system that allows operations at scale, keeps governance, uses IaC, and minimizes manual tasks for setup and developer onboarding. The cloud provider for the organization is AWS and the main infrastructure services are EKS, RDS, and VPC. Don't forget the availability, security, and cost-efficiency requirements.
In the previous blogs about IaC at scale, the main questions and base guidelines were answered to apply the best practices. However, when talking about Kubernetes and cloud-native apps, many questions emerge from the shadows and pose a challenge for your technical skills, for example:
- What are the best practices to manage the cluster configuration at scale?
- How can you connect both layers, IaC and GitOps?
- What are the GitOps topologies?
- What are the best tools?
- How can you improve developer experience?
Of course, with an AI assistant you can find nice answers for those questions and some guides. But we are builders, so here we walk through a real-life example based on experience and some challenges from a production environment.
First, some theory and good practices 😎
Deployment patterns and GitOps topology
If you want to know more about GitOps, patterns, courses, and more, please visit OpenGitOps.
Key points based on the requirements:
• You must have multiple environments.
• Each environment must be isolated from the others.
• You must manage the cluster configuration at scale.
There are three main GitOps topologies:
• Standalone topology
• Hub-Spoke Push Model
• Hub-Spoke Push and Pull model
For this scenario, the hub-spoke push model is the best solution. This setup involves deploying GitOps tools on a central "Hub" cluster. The Hub cluster acts as the control plane for GitOps, managing deployments to various clusters dedicated to different environments or workloads.
So, how can you deploy this topology in AWS using EKS?
First, let's make some assumptions:
• Your accounts are in the same AWS organization.
• There is already a Networking layer configured.
• ArgoCD is the GitOps tool.
• GitHub is the VCS.
• There is no predefined CI tool; in this blog series CodePipeline will be used, as in previous blog posts.
Security considerations:
• The IAM roles and policies for your agents must be clearly defined and follow the least-privilege principle. Sometimes the infrastructure deployment role is different from the orchestration deployment role (see the sketch after this list).
• Use Secrets Manager to share credentials between accounts with a solid resource policy, or use Parameter Store SecureString parameters shared through RAM. Both must be encrypted using KMS.
• Don't expose your cluster endpoints to the internet; if you must, restrict access by IP range.
• The security of the pipeline must be a priority: the deployment agents can assume many privileged roles, so keep this in mind to avoid security breaches.
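A minimal sketch of a dedicated deployment role, assuming a hypothetical agent role and account ID; the idea is to scope the trust and the permissions to what this layer actually manages:

# Only the CI/CD agent may assume the infrastructure deployment role
data "aws_iam_policy_document" "infra_deploy_trust" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::111111111111:role/codepipeline-iac-agent"] # hypothetical agent role
    }
  }
}

resource "aws_iam_role" "infra_deploy" {
  name               = "infra-deploy-role" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.infra_deploy_trust.json
}

# Grant only the actions this layer needs instead of AdministratorAccess
data "aws_iam_policy_document" "infra_deploy_permissions" {
  statement {
    actions   = ["eks:*", "ec2:Describe*", "ssm:GetParameter", "kms:Decrypt"]
    resources = ["*"] # narrow to specific ARNs in a real setup
  }
}

resource "aws_iam_role_policy" "infra_deploy" {
  name   = "infra-deploy-permissions"
  role   = aws_iam_role.infra_deploy.id
  policy = data.aws_iam_policy_document.infra_deploy_permissions.json
}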
Solution overview
Open-source projects and tools:
- OpenTofu
- GitOps Bridge
- ArgoCD
AWS Services:
- EKS
- KMS
- RAM
- Parameter Store
- ECR
- IAM
The following diagram shows the solution:
1- The code is versioned in a Git system. Usually you have three main kinds of repositories: Infrastructure as Code repositories, ArgoCD ApplicationSet definitions, and app or microservices repositories. Many teams keep the Kubernetes manifests separate from the application code. This is a good practice because application code and infrastructure definitions don't follow the same lifecycle, but other factors, such as using a monorepo or multiple repositories, also matter. At a minimum, keep them in different folders 😊 (see the illustrative layout below).
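For instance, a simple layout (repository names are hypothetical) could look like this:

iac-infrastructure/     # IaC repository: OpenTofu/Terragrunt stacks (VPC, EKS, GitOps Bridge)
gitops-platform/        # ArgoCD ApplicationSet definitions and addons bootstrap
ecommerce-gems-app/     # application source code
└── deploy/             # Kubernetes manifests or Helm chart, kept apart from the app code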
2- The CI/CD service deploys the infrastructure components: the CI/CD agents deploy the control-plane cluster, create shared resources for the spoke AWS accounts, and create the ArgoCD secret for the hub cluster with the metadata that enables operations and bootstrapping configurations.
The agent’s role must be part of an access entry in EKS to allow this operation.
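With the terraform-aws-modules/eks module (v20+), this can be expressed through the access_entries input; a minimal sketch, assuming a hypothetical CodePipeline agent role ARN:

# Added to the inputs of the EKS stack (sketch; principal ARN is hypothetical)
access_entries = {
  cicd_agent = {
    principal_arn = "arn:aws:iam::111111111111:role/codepipeline-iac-agent"
    policy_associations = {
      admin = {
        policy_arn   = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
        access_scope = { type = "cluster" }
      }
    }
  }
}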
3- The shared resources are created: a common practice is to use SecureString parameters in Parameter Store and share them with RAM, which is more cost effective and practical for this setup; however, you could also use Secrets Manager with resource policies.
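A sketch of this step with OpenTofu resources (parameter name, value, and account ID are hypothetical; sharing SSM parameters through RAM requires the advanced tier):

# KMS key used to encrypt the shared parameter
resource "aws_kms_key" "shared" {
  description = "Key for cross-account shared parameters"
}

# Hub cluster metadata stored as an encrypted SecureString parameter
resource "aws_ssm_parameter" "hub_cluster_metadata" {
  name   = "/platform/hub-cluster/metadata" # hypothetical name
  type   = "SecureString"
  tier   = "Advanced" # required for RAM sharing
  key_id = aws_kms_key.shared.arn
  value = jsonencode({
    cluster_name = "gitops-scale-dev-hub"
    argocd_role  = "arn:aws:iam::111111111111:role/argocd-hub-role" # hypothetical
  })
}

# Share the parameter with the spoke account through RAM
resource "aws_ram_resource_share" "parameters" {
  name                      = "hub-cluster-parameters"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "hub_cluster_metadata" {
  resource_arn       = aws_ssm_parameter.hub_cluster_metadata.arn
  resource_share_arn = aws_ram_resource_share.parameters.arn
}

resource "aws_ram_principal_association" "spoke_account" {
  principal          = "222222222222" # hypothetical spoke account ID
  resource_share_arn = aws_ram_resource_share.parameters.arn
}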
4- The CI/CD service deploys the spoke resources: the agent deploys the spoke cluster and other resources. Here the agent can access the hub Kubernetes cluster to create a secret for the spoke cluster with the metadata that enables operations and bootstrapping configuration.
5- ArgoCD detects the new cluster metadata and prepares to deploy the addons: once the metadata is loaded, the applications are deployed in the spoke or hub cluster. To accomplish this, the ArgoCD application controller must assume a role in the spoke account.
Keep in mind that the cluster metadata is stored as annotations, and the total size limit for annotations is 262,144 bytes (256 KB).
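This metadata travels in an ArgoCD cluster secret. A minimal sketch of what is created for a spoke cluster (names, endpoint, and role ARN are hypothetical; the GitOps Bridge module generates the real one for you):

apiVersion: v1
kind: Secret
metadata:
  name: spoke-dev
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: dev
  annotations:
    addons_repo_url: https://github.com/gitops-bridge-dev/gitops-bridge-argocd-control-plane-template
type: Opaque
stringData:
  name: spoke-dev
  server: https://XXXXXXXX.gr7.us-east-2.eks.amazonaws.com
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "spoke-dev",
        "roleARN": "arn:aws:iam::222222222222:role/argocd-spoke-access"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca>"
      }
    }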
6- The applications are created: the deployments are created in the spoke cluster and ArgoCD syncs them to the desired state.
7- The clusters are ready for workloads.
It sounds nice, but how can you integrate both layers without overhead or manual operations?
The answer is to use the GitOps Bridge framework.
GitOps Bridge
The GitOps Bridge is a community project that aims to showcase best practices and patterns for bridging the process of creating a Kubernetes cluster to subsequently managing everything through GitOps. It focuses on using ArgoCD or FluxCD, both of which are CNCF-graduated projects.
For an example template on bootstrapping ArgoCD, see the GitHub repository GitOps Control Plane.
There are many tools available for creating Kubernetes clusters. These include "roll-your-own" solutions like kubeadm, minikube, and kind, as well as cloud-managed services like Amazon EKS. The method of cluster creation should not impact GitOps compatibility; GitOps engines should work with any tool that the user chooses for cluster creation. This includes scenarios where Kubernetes is used to create other Kubernetes clusters, such as with CAPI/CAPA, Crossplane, ACK, or any tool running inside Kubernetes to deploy Kubernetes.
The GitOps Bridge becomes extremely important in the context…
Hands On
Creating a control plane
So, let's get hands-on with the code for building a control plane.
1- The Terragrunt project is created with the following structure:
The project can also be created with plain OpenTofu, but here we use Terragrunt, a wrapper for OpenTofu and a powerful tool to keep your code DRY and manage IaC at scale.
In the infrastructure folder you can find two major stacks: network and containers. In the containers stack, the control-plane cluster is deployed using the public AWS modules, and the GitOps Bridge is deployed into the account using the official Terraform module.
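The stacks below include a root.hcl that centralizes the remote state and provider configuration. A minimal sketch, purely illustrative (the actual file in the repository may differ, and the bucket name is a placeholder):

# root.hcl (illustrative sketch)
locals {
  common_vars = read_terragrunt_config(find_in_parent_folders("common/common.hcl"))
  environment = read_terragrunt_config(find_in_parent_folders("environment.hcl"))
}

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket  = "my-tofu-state-bucket" # placeholder
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-2"
    encrypt = true
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-2"
}
EOF
}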
For example, the content of the eks_control_plane stack:
# infrastructure/containers/eks_control_plane/terragrunt.hcl
include "root" {
  path   = find_in_parent_folders("root.hcl")
  expose = true
}

dependency "vpc" {
  config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/network/vpc"
  mock_outputs = {
    vpc_id = "vpc-04e3e1e302f8c8f06"
    public_subnets = [
      "subnet-0e4c5aedfc2101502",
      "subnet-0d5061f70b69eda14",
    ]
    private_subnets = [
      "subnet-0e4c5aedfc2101502",
      "subnet-0d5061f70b69eda14",
      "subnet-0d5061f70b69eda15",
    ]
  }
  mock_outputs_merge_strategy_with_state = "shallow"
}

locals {
  # Define parameters for each workspace
  env = {
    default = {
      create          = false
      cluster_name    = "${include.root.locals.common_vars.locals.project}-${include.root.locals.environment.locals.workspace}-hub"
      cluster_version = "1.31"
      # Optional
      cluster_endpoint_public_access = true
      # Optional: Adds the current caller identity as an administrator via cluster access entry
      enable_cluster_creator_admin_permissions = true
      cluster_compute_config = {
        enabled    = true
        node_pools = ["general-purpose"]
      }
      tags = {
        Environment = "control-plane"
        Layer       = "Networking"
      }
    }
    "dev" = {
      create = true
    }
    "prod" = {
      create = true
    }
  }
  # Merge parameters
  environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
  workspace        = merge(local.env["default"], local.env[local.environment_vars])
}

terraform {
  source = "tfr:///terraform-aws-modules/eks/aws?version=20.33.1"
}

inputs = {
  create          = local.workspace["create"]
  cluster_name    = local.workspace["cluster_name"]
  cluster_version = local.workspace["cluster_version"]
  # Optional
  cluster_endpoint_public_access = local.workspace["cluster_endpoint_public_access"]
  # Optional: Adds the current caller identity as an administrator via cluster access entry
  enable_cluster_creator_admin_permissions = local.workspace["enable_cluster_creator_admin_permissions"]
  cluster_compute_config                   = local.workspace["cluster_compute_config"]
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnets
  # Merge the common tags with the workspace-specific tags in a single tags attribute
  tags = merge(
    {
      Environment = include.root.locals.environment.locals.workspace
      Terraform   = "true"
    },
    local.workspace["tags"]
  )
}
Note that the main properties for this stack are the cluster name and version, and in this case EKS Auto Mode is enabled through cluster_compute_config.
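To double-check that Auto Mode is active after the apply, you can query the cluster's compute configuration with the AWS CLI (assuming a recent CLI version that returns the computeConfig field; profile and names follow the earlier examples):

# Should return the Auto Mode compute configuration (enabled flag and node pools) for the hub cluster
aws eks describe-cluster \
  --name gitops-scale-dev-hub \
  --region us-east-2 \
  --profile labvel-dev \
  --query 'cluster.computeConfig'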
In the next blogs you can learn more about this!
On the other hand, for the gitops_bridge stack:
include "root" {
path = find_in_parent_folders("root.hcl")
expose = true
}
include "k8s_helm_provider" {
path = find_in_parent_folders("/common/additional_providers/provider_k8s_helm.hcl")
}
dependency "eks" {
config_path = "${get_parent_terragrunt_dir("root")}/infrastructure/containers/eks_control_plane"
mock_outputs = {
cluster_name = "dummy-cluster-name"
cluster_endpoint = "dummy_cluster_endpoint"
cluster_certificate_authority_data = "dummy_cluster_certificate_authority_data"
cluster_version = "1.31"
}
mock_outputs_merge_strategy_with_state = "shallow"
}
locals {
# Define parameters for each workspace
env = {
default = {
environment = "control-plane"
oss_addons = {
enable_argo_workflows = true
#enable_foo = true
# you can add any addon here, make sure to update the gitops repo with the corresponding application set
}
addons_metadata = merge(
{
addons_repo_url = "https://github.com/gitops-bridge-dev/gitops-bridge-argocd-control-plane-template"
addons_repo_basepath = ""
addons_repo_path ="bootstrap/control-plane/addons"
addons_repo_revision = "HEAD"
}
)
argocd_apps = {
addons = file("./bootstrap/addons.yaml")
#workloads = file("./bootstrap/workloads.yaml")
}
tags = {
Environment = "control-plane"
Layer = "Networking"
}
}
"dev" = {
create = true
}
"prod" = {
create = true
}
}
# Merge parameters
environment_vars = contains(keys(local.env), include.root.locals.environment.locals.workspace) ? include.root.locals.environment.locals.workspace : "default"
workspace = merge(local.env["default"], local.env[local.environment_vars])
}
terraform {
source = "tfr:///gitops-bridge-dev/gitops-bridge/helm?version=0.1.0"
}
inputs = {
cluster_name = dependency.eks.outputs.cluster_name
cluster_endpoint = dependency.eks.outputs.cluster_endpoint
cluster_platform_version = dependency.eks.outputs.cluster_platform_version
oidc_provider_arn = dependency.eks.outputs.oidc_provider_arn
cluster_certificate_authority_data = dependency.eks.outputs.cluster_certificate_authority_data
cluster = {
cluster_name = dependency.eks.outputs.cluster_name
environment = local.workspace["environment"]
metadata = local.workspace["addons_metadata"]
addons = merge(local.workspace["oss_addons"], { kubernetes_version = dependency.eks.outputs.cluster_version })
}
apps = local.workspace["argocd_apps"]
tags = local.workspace["tags"]
}
Here the addons are enabled or disabled using flags in the oss_addons map; you can also add the ArgoCD ApplicationSets that bootstrap the cluster through the argocd_apps input.
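For example, assuming your addons repository defines matching ApplicationSets (the extra addon names below are only illustrative), the map could be extended like this:

oss_addons = {
  enable_argo_workflows = true
  enable_metrics_server = true  # illustrative: requires a matching ApplicationSet in the addons repo
  enable_kyverno        = false # disabled: the ApplicationSet will not target this cluster
}

Broadly speaking, the GitOps Bridge exposes these flags as labels on the cluster secret, which the addon ApplicationSets can use in their cluster selectors.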
2- Create the ApplicationSet for addons:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  syncPolicy:
    preserveResourcesOnDeletion: true
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: control-plane
  template:
    metadata:
      name: 'cluster-addons'
    spec:
      project: default
      source:
        repoURL: '{{metadata.annotations.addons_repo_url}}'
        path: '{{metadata.annotations.addons_repo_basepath}}{{metadata.annotations.addons_repo_path}}'
        targetRevision: '{{metadata.annotations.addons_repo_revision}}'
        directory:
          recurse: true
      destination:
        namespace: 'argocd'
        name: '{{name}}'
      syncPolicy:
        automated:
          allowEmpty: true
The key point here is the generators section, where the cluster selector deploys the application only in the clusters that have the label environment equal to control-plane.
The source for the applications is the addons_repo_url defined in the GitOps Bridge stack: terragrunt_aws_gitops_blueprint/infrastructure/containers/gitops_bridge/terragrunt.hcl.
3- Connect to the cluster using a k8s client.
Set up the kube context using the AWS CLI:
aws eks update-kubeconfig --name my-cluster --region us-east-1 --alias my-cluster-alias
For example:
aws eks update-kubeconfig --name gitops-scale-dev-hub --region us-east-2 --alias labvel-devsecops-hub --profile labvel-dev
Run k9s --context labvel-devsecops-hub for interactive mode, or just get the default ArgoCD admin password and create a port-forward to connect to the Argo CD server:
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" --context labvel-devsecops-hub | base64 -d; echo
kubectl port-forward svc/argo-cd-argocd-server -n argocd 8080:443 --context labvel-devsecops-hub
Finally, through the web browser you can find the Argo apps:
The ApplicationSet:
And the cluster metadata:
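You can also inspect the same cluster secrets and their metadata from the CLI; for example (the secret name is illustrative):

# List the ArgoCD cluster secrets and the labels the ApplicationSet generator matches on
kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=cluster --show-labels --context labvel-devsecops-hub
# Show the annotations (addons repo metadata) of a specific cluster secret
kubectl get secret <cluster-secret-name> -n argocd -o jsonpath='{.metadata.annotations}' --context labvel-devsecops-hub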
In the next part you can learn more about enabling and disabling addons and apps, and go deeper using AWS controllers.
You can find the public code here: velez94/terragrunt_aws_gitops_blueprint (public demo for GitOps Bridge using Terragrunt, OpenTofu, and EKS).
AWS GitOps Blueprint with Terragrunt
This project provides a blueprint for implementing GitOps on AWS using Terragrunt and Argo CD. It offers a structured approach to managing infrastructure as code and deploying applications across multiple environments.
The blueprint is designed to streamline the process of setting up a GitOps workflow on AWS, leveraging Terragrunt for managing Terraform configurations and Argo CD for continuous deployment. It includes configurations for essential AWS services such as EKS (Elastic Kubernetes Service) and VPC (Virtual Private Cloud), as well as GitOps components for managing cluster addons and platform-level resources.
Key features of this blueprint include:
- Modular infrastructure setup using Terragrunt
- EKS cluster configuration for container orchestration
- VPC network setup for secure and isolated environments
- GitOps bridge for seamless integration between infrastructure and application deployments
- Argo CD ApplicationSets for managing cluster addons and platform resources
- Environment-specific configurations for multi-environment deployments
Repository Structure
terragrunt_aws_gitops_blueprint/
├── common/
│   …
Thanks for reading and sharing! ☺️⭐