DEV Community

Amr Saafan for Nile Bits

Posted on • Originally published at nilebits.com

Understanding Terraform Drift Detection and Remediation

Introduction to Terraform and Infrastructure as Code (IaC)

Infrastructure as Code (IaC) has changed how we deploy and manage infrastructure. By describing infrastructure in configuration files, IaC makes deployments consistent and repeatable. Terraform, created by HashiCorp, is one of the most widely used IaC tools. It lets teams version, automate, and collaborate on infrastructure in the same way they do with application code.

However, managing infrastructure with Terraform is not without its challenges, and one of the main ones is infrastructure drift: the actual state of your infrastructure no longer matches the state defined in your Terraform configuration. This article covers Terraform drift detection and remediation, with code samples, explanations, and recommended practices for managing drift effectively.

What is Infrastructure Drift?

Infrastructure drift happens when changes are made to your infrastructure outside of Terraform's control. These changes can be intentional or accidental and may occur due to:

Manual changes made by administrators directly in the cloud console.

Changes made by other automation tools or scripts.

Modifications resulting from cloud provider updates or changes in service behavior.

Drift can lead to inconsistencies, unexpected behavior, and security vulnerabilities. Therefore, detecting and remediating drift is crucial to maintaining the desired state of your infrastructure.

How Terraform Manages State

Before diving into drift detection, it's essential to understand how Terraform manages state. Terraform uses a state file to keep track of the infrastructure it manages. This state file is a critical component, as it maps the configuration files to the real-world resources.

The state file is stored either locally (the default) or in a remote backend such as AWS S3, HashiCorp Consul, or Terraform Cloud. Terraform reads and updates this state file whenever it plans or applies changes to your infrastructure.

Here's an example of a simple Terraform configuration and the corresponding state file:

# main.tf
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}

After running terraform apply, Terraform creates a state file (terraform.tfstate) that looks something like this:

{
  "version": 4,
  "terraform_version": "1.0.0",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "example",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "ami": "ami-0c55b159cbfafe1f0",
            "instance_type": "t2.micro",
            "id": "i-1234567890abcdef0",
            "tags": null
          }
        }
      ]
    }
  ]
}

Because the state file records the last known attributes of each resource, any change made outside Terraform leaves the state and the configuration out of step with the real infrastructure. That gap is drift.
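Rather than reading terraform.tfstate by hand, the recorded resources can be inspected through the CLI. The sketch below relies on the standard terraform state list and terraform state show subcommands; the function name dump_state is hypothetical and only for illustration.

```shell
# Hypothetical helper: print every resource address recorded in the state,
# followed by the attributes Terraform last recorded for it.
dump_state() {
  terraform state list | while read -r address; do
    echo "== $address"
    # </dev/null keeps the inner command from consuming the loop's input
    terraform state show "$address" </dev/null
  done
}
```

Run from an initialized working directory, dump_state would print the aws_instance.example address and its recorded attributes for the configuration above.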

Detecting Drift in Terraform

Terraform's built-in terraform plan command can be used to identify drift. When you run terraform plan, Terraform first refreshes its view of each managed resource by querying the provider, then compares the desired state in your configuration files with the current state of your infrastructure and reports any differences it finds.

Here's how you can use terraform plan to detect drift:

terraform plan


The output will show any differences between the actual state and the desired state. If there's no drift, the output will indicate that no changes are needed. If there is drift, the output will show the necessary changes to reconcile the state.

For example:

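# terraform plan output (abbreviated)
...
  # aws_instance.example will be updated in-place
  ~ resource "aws_instance" "example" {
      ~ instance_type = "t2.small" -> "t2.micro"
    }
...

In this example, the running instance is a t2.small while the configuration specifies t2.micro. The instance type was changed outside Terraform, which is drift, and the plan proposes changing it back.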
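A regular plan can mix drift with pending configuration changes, so it can help to look at drift on its own. A minimal sketch, assuming Terraform 0.15.4 or later, where the -refresh-only planning mode is available; the function name report_drift is hypothetical.

```shell
# Hypothetical helper: show only what changed outside Terraform.
# A refresh-only plan compares the state file with the real infrastructure
# and proposes no changes to the infrastructure itself.
report_drift() {
  terraform plan -refresh-only -input=false "$@"
}
```

For example, report_drift -no-color prints the detected changes without color codes. Running terraform apply -refresh-only afterwards records the detected values in the state file without modifying any resources.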

Automating Drift Detection

Manually running terraform plan to detect drift is not always practical, especially in large or complex environments. Automating drift detection can help ensure that drift is identified and remediated promptly.

One approach to automate drift detection is to use CI/CD pipelines. Tools like Jenkins, GitHub Actions, GitLab CI, or CircleCI can be used to run terraform plan on a scheduled basis or whenever a change is made to the configuration files.

Here's an example of how you can set up a drift detection pipeline using GitHub Actions:

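# .github/workflows/terraform-drift-detection.yml
name: Terraform Drift Detection

on:
  schedule:
    - cron: '0 0 * * *' # Run daily at midnight (UTC)

jobs:
  drift-detection:
    runs-on: ubuntu-latest
    # Assumption: AWS credentials are stored as repository secrets with these names
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.0.0
          # Run the terraform binary directly so the step sees its real exit code
          terraform_wrapper: false

      - name: Initialize Terraform
        run: terraform init -input=false

      - name: Run Terraform Plan
        # Exit code 2 (changes present) fails this step and flags drift
        run: terraform plan -detailed-exitcode -input=false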

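In this example, the GitHub Actions workflow runs terraform plan daily at midnight UTC. With the -detailed-exitcode flag, terraform plan exits with 0 when there are no changes, 1 on an error, and 2 when changes are present. Because any non-zero exit code fails the step, the workflow fails when drift is detected, which can then trigger notifications or further actions.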
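A failed step alone does not say whether the plan found drift or simply crashed. The following sketch separates the two cases using the documented -detailed-exitcode values; the function names classify_plan_exit and check_drift are hypothetical.

```shell
# Map the exit status of `terraform plan -detailed-exitcode` to a label.
# 0 = plan succeeded, no changes; 1 = error; 2 = plan succeeded, changes present.
classify_plan_exit() {
  case "$1" in
    0) echo "no-drift" ;;
    1) echo "error" ;;
    2) echo "drift" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical wrapper: run the plan quietly and print the label.
check_drift() {
  rc=0
  terraform plan -detailed-exitcode -input=false >/dev/null 2>&1 || rc=$?
  classify_plan_exit "$rc"
}
```

A pipeline step could branch on the output of check_drift, for instance sending an alert on "drift" while treating "error" as a broken pipeline rather than a finding.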

Remediating Drift in Terraform

Once drift is detected, the next step is remediation. Remediation means either updating the Terraform configuration to match the actual infrastructure, or applying changes to the infrastructure to bring it back in line with the configuration.

There are two primary approaches to remediation:

Update Configuration Files: If the drift represents a desired change, update the Terraform configuration files to reflect the new state. After updating the configuration, run terraform apply to update the state file.

   # Update main.tf
   resource "aws_instance" "example" {
     ami           = "ami-0c55b159cbfafe1f0"
     instance_type = "t2.small" # Updated instance type
   }

   # Apply changes
   terraform apply

Revert Changes: If the drift represents an unintended change, run terraform apply to revert the changes and bring the infrastructure back to the desired state.

   terraform apply


In both cases, a successful apply leaves the configuration, the state file, and the real infrastructure in agreement.
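Whichever path applies, it helps if the plan that was reviewed is exactly the plan that gets applied. A sketch using Terraform's saved plan files (terraform plan -out, terraform show, and terraform apply with a plan file); the function name apply_reviewed_plan and the default file name drift.tfplan are hypothetical.

```shell
# Hypothetical helper: save a plan, display it, and apply it only after
# an explicit confirmation.
apply_reviewed_plan() {
  plan_file="${1:-drift.tfplan}"

  # Write the plan to a file so the reviewed plan is the one applied
  terraform plan -input=false -out="$plan_file" || return 1

  # Render the saved plan in human-readable form
  terraform show "$plan_file" || return 1

  # Prompt on stderr so stdout contains only Terraform's output
  printf 'Apply this plan? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y|Y) ;;
    *) echo "Aborted." >&2; return 1 ;;
  esac

  # Applying a saved plan file does not ask for interactive approval
  terraform apply -input=false "$plan_file"
}
```

Calling apply_reviewed_plan with no argument uses drift.tfplan; passing a path chooses a different plan file.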

Best Practices for Managing Drift

Managing drift effectively requires a combination of best practices and tooling. Here are some best practices to consider:

Use Remote State: Store your Terraform state file in a remote backend to ensure consistency and accessibility across your team.

Implement Version Control: Use version control systems like Git to track changes to your Terraform configuration files.

Automate Testing and Validation: Use CI/CD pipelines to automate testing, validation, and drift detection.

Restrict Manual Changes: Minimize manual changes to your infrastructure by enforcing the use of Terraform for all changes.

Regular Audits: Perform regular audits of your infrastructure to detect and remediate drift promptly.

Leverage Infrastructure Monitoring: Use infrastructure monitoring tools to detect changes in real-time and alert you to potential drift.
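The regular audit mentioned above can be sketched as a loop over several Terraform root directories. This assumes each directory has already been initialized with terraform init and that Terraform 0.14 or later is installed (for the -chdir option); the function name audit_drift and the directory names in the usage note are hypothetical.

```shell
# Hypothetical audit helper: check several Terraform root directories for drift.
# Prints one line per directory and returns non-zero if any directory
# has drift or fails to plan.
audit_drift() {
  failed=0
  for dir in "$@"; do
    rc=0
    terraform -chdir="$dir" plan -detailed-exitcode -input=false >/dev/null 2>&1 || rc=$?
    case "$rc" in
      0) echo "$dir: in sync" ;;
      2) echo "$dir: drift detected"; failed=1 ;;
      *) echo "$dir: plan failed (exit $rc)"; failed=1 ;;
    esac
  done
  return "$failed"
}
```

For example, audit_drift envs/staging envs/production reports each directory on its own line, so one drifting environment does not hide the status of the others.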

Code Example: Full Workflow

Let's walk through a full workflow example of managing drift with Terraform. This example will include a Terraform configuration, automation of drift detection, and remediation.

Terraform Configuration:

# main.tf
   provider "aws" {
     region = "us-west-2"
   }

   resource "aws_instance" "example" {
     ami           = "ami-0c55b159cbfafe1f0"
     instance_type = "t2.micro"
   }

Initialize Terraform:

   terraform init


Apply Configuration:

   terraform apply


Automate Drift Detection: Create a GitHub Actions workflow:

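   # .github/workflows/terraform-drift-detection.yml
   name: Terraform Drift Detection

   on:
     schedule:
       - cron: '0 0 * * *' # Run daily at midnight (UTC)

   jobs:
     drift-detection:
       runs-on: ubuntu-latest
       # Assumption: AWS credentials are stored as repository secrets with these names
       env:
         AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
         AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
       steps:
         - name: Checkout repository
           uses: actions/checkout@v4

         - name: Set up Terraform
           uses: hashicorp/setup-terraform@v3
           with:
             terraform_version: 1.0.0
             # Run the terraform binary directly so the step sees its real exit code
             terraform_wrapper: false

         - name: Initialize Terraform
           run: terraform init -input=false

         - name: Run Terraform Plan
           # Exit code 2 (changes present) fails this step and flags drift
           run: terraform plan -detailed-exitcode -input=false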

Remediation: If drift is detected (e.g., instance type changed), update the configuration and apply changes:

   # Update main.tf
   resource "aws_instance" "example" {
     ami           = "ami-0c55b159cbfafe1f0"
     instance_type = "t2.small" # Updated instance type
   }

   # Apply changes
   terraform apply
