Alexey Ryazhskikh

Posted on Nov 23, 2024

Multi-environment infrastructure with terraform variables files

#terraform

In our company we have thousands of resources managed by Terraform, deployed to multiple environments (dev, staging, production) and to different regions.

The key principles we have for our Terraform codebase are:

Use the same Terraform codebase (.tf files) for all environments (dev, stage, prod).
All environment specific settings should be managed via Terraform variable files (.tfvars).

Below is our typical Terraform codebase structure:

src/
├── environments/
│   ├── dev.tfvars
│   ├── stage.tfvars
│   └── prod.tfvars
├── variables.tf
├── db_server.tf
├── main.tf
├── terraform.tf
├── providers.tf
└── ...

The variables.tf file contains the variable definitions:

variable "resource_group_name" {
  description = "The resource group name to deploy db server"
  type = string
}

variable "location" {
  description = "The db server location"
  type = string
}

variable "enable_replication" {
  description = "Enable DB replication of resources to other region"
  type = bool
}

src/environments/dev.tfvars contains the environment-specific settings:

resource_group_name = "rg-dev"
location = "eastus"
enable_replication = false
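
For comparison, here is a sketch of what the stage file could look like. Only the values change, while the variable names stay the same across environments. The values below are hypothetical and are not taken from the article repository.

```hcl
# src/environments/stage.tfvars (hypothetical values for illustration)
resource_group_name = "rg-stage"
location            = "eastus"
enable_replication  = true
```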

Each environment has its own Terraform state, so to run terraform apply for a specific environment we need to specify both the state file and the .tfvars file:

terraform apply -var-file="src/environments/dev.tfvars" -state="dev.tfstate"
terraform apply -var-file="src/environments/stage.tfvars" -state="stage.tfstate"
terraform apply -var-file="src/environments/prod.tfvars" -state="prod.tfstate"

The -state option applies only to the local backend. If you use Terraform Cloud, you will probably need to select the workspace with the TF_WORKSPACE environment variable instead of passing a state file.
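
A minimal sketch of a terraform.tf that would work with TF_WORKSPACE, assuming one Terraform Cloud workspace per environment. The organization name and the tag are hypothetical. Because the workspaces block selects by tag instead of a fixed name, the workspace for a given run can be chosen through the environment variable.

```hcl
# terraform.tf (sketch; organization and tag names are assumptions)
terraform {
  cloud {
    organization = "example-org"

    workspaces {
      # No fixed workspace name here: TF_WORKSPACE picks one of the
      # workspaces that carry this tag, e.g. TF_WORKSPACE=dev.
      tags = ["db-server"]
    }
  }
}
```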

Feature flags

Terraform variable files can also hold feature flags that switch resources and modules on or off.

This approach lets us test Terraform code in the dev and stage environments before it is applied to other environments. If the staging and production environments have the same settings, the code paths exercised in staging are the same ones that will run in production.

Terraform has no if-else statement for resource blocks, so feature flags are implemented with the count and for_each meta-arguments, usually combined with a conditional expression.

In the following example we create an Azure resource group and a role assignment only if the enable_replication variable is true:

variable "enable_replication" {
  type = bool
}

resource "azurerm_resource_group" "replica_rg" {
  # Created only when the feature flag is on
  count    = var.enable_replication ? 1 : 0
  name     = "replica-rg"
  location = var.location
}

resource "azurerm_role_assignment" "role_assignments" {
  # The same condition has to be repeated on every dependent resource
  count                = var.enable_replication ? 1 : 0
  scope                = azurerm_resource_group.replica_rg[0].id
  role_definition_name = "Contributor"
  principal_id         = "12345678-1234-1234-1234-123456789012" # placeholder principal ID
}

This approach is not perfect, because the count condition has to be added to each dependent resource. It is better to group dependent resources into a local module, so that a single count condition covers the entire module. For example:

variable "enable_replication" {
  type = bool
}

module "replica_rg" {
  source = "./modules/replica_rg"
  count  = var.enable_replication ? 1 : 0

  rg_name        = "replica-rg"
  contributor_id = "12345678-1234-1234-1234-123456789012" # placeholder principal ID
}
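
The contents of the module are not shown above, so here is a minimal sketch of what ./modules/replica_rg could look like. The resource names, the location variable with its default, and the rg_id output are all assumptions made for illustration.

```hcl
# modules/replica_rg/main.tf (hypothetical contents)
variable "rg_name" {
  type = string
}

variable "contributor_id" {
  type = string
}

variable "location" {
  type    = string
  default = "eastus" # assumed default; the module call above passes no location
}

resource "azurerm_resource_group" "this" {
  name     = var.rg_name
  location = var.location
}

# No count here: the single count on the module call covers both resources.
resource "azurerm_role_assignment" "contributor" {
  scope                = azurerm_resource_group.this.id
  role_definition_name = "Contributor"
  principal_id         = var.contributor_id
}

output "rg_id" {
  value = azurerm_resource_group.this.id
}
```

Because the module call uses count, the root module sees a list of module instances. One way to read the output safely when the flag may be off:

```hcl
# Root module: null when enable_replication is false, the ID otherwise.
output "replica_rg_id" {
  value = one(module.replica_rg[*].rg_id)
}
```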

Terraform variables as DSL

The Terraform variable definitions become another layer of abstraction: instead of defining particular resources, we define business entities and feature settings.
In effect, managing .tfvars files becomes programming in a DSL defined by the Terraform variable blocks.
For example, here is the definition for CI/CD build agents:

variables.tf:

variable "build_agents" {
  description = "Build agents settings"
  type = map(object({
    number_of_vms = number
    vm_size = optional(string, "Standard_N2_v2")
    private_network_access = optional(bool, false) # limit access to agent vms
  }))
}

dev.tfvars:

build_agents = {    
    build_pool = {
        number_of_vms = 10
        vm_size = "Standard_D2_v2"
        private_network_access = false
    }
    deployment_pool = {
        number_of_vms = 5
        vm_size = "Standard_D2_v2"
        private_network_access = true
    }
}

Here we are not defining Azure resources but our infrastructure assets, which could in fact be implemented in different ways.
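
One possible way to turn the build_agents map into infrastructure is a for_each over a local module, so that each key in .tfvars becomes one pool. This is only a sketch: the module path and its input names are hypothetical, and the article does not say how the pools are actually implemented.

```hcl
# Sketch: one module instance per entry in var.build_agents.
module "build_agent_pool" {
  source   = "./modules/build_agent_pool" # hypothetical local module
  for_each = var.build_agents

  pool_name              = each.key # "build_pool", "deployment_pool"
  number_of_vms          = each.value.number_of_vms
  vm_size                = each.value.vm_size
  private_network_access = each.value.private_network_access
}
```

With this shape, adding a pool is a .tfvars change only, and the module body can be swapped for a different implementation without touching the environment files.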

Control interface

The .tfvars approach lets us define a control interface for infrastructure operators.
The settings in .tfvars work like the control panel in a pilot's cockpit, hiding the underlying resources and their dependencies.
For example, if a CI/CD admin needs to increase the number of agents, they don't need to search the Terraform codebase for the resources to reconfigure. They just change number_of_vms in the .tfvars file.
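
If .tfvars is the operator's control panel, a validation block on the variable can act as a guard rail. The sketch below repeats the build_agents definition and adds one rule. The limit of 50 VMs per pool is an assumption chosen for illustration, not a value from the article.

```hcl
variable "build_agents" {
  description = "Build agents settings"
  type = map(object({
    number_of_vms          = number
    vm_size                = optional(string, "Standard_N2_v2")
    private_network_access = optional(bool, false) # limit access to agent vms
  }))

  # Reject out-of-range values at plan time, before any resource is touched.
  validation {
    condition = alltrue([
      for pool in values(var.build_agents) :
      pool.number_of_vms >= 0 && pool.number_of_vms <= 50
    ])
    error_message = "Each pool's number_of_vms must be between 0 and 50."
  }
}
```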

Refactoring

Naming Terraform variables and object properties is a challenge.
From time to time we need to refactor variables to change the structure of objects or to introduce new properties for all objects. This also leads to changes in all .tfvars files and states.

Normally, such refactoring has the following steps:

Modifying variables definitions in variables.tf file.
Modifying all .tfvars files.
Modifying terraform code to support the changes.
Generating moved blocks in terraform code.
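
The last step can be sketched with a moved block, which tells Terraform that an existing object has a new address so it is not destroyed and recreated. The example assumes the resource group from the feature flag example was moved into a local module. The resource name this inside the module is hypothetical.

```hcl
# Sketch: the resource group now lives inside module.replica_rg.
moved {
  from = azurerm_resource_group.replica_rg[0]
  to   = module.replica_rg[0].azurerm_resource_group.this
}
```

After the change has been applied to every environment's state, the moved block can usually be removed.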

For .tfvars modification and code generation you can use Python libraries such as python-hcl2.
Unfortunately, HCL2 parsers are not available for many other languages, so I previously converted .tfvars files to JSON and used JSON as an intermediate format.
For that I used this Go application, a wrapper over the official HashiCorp HCL2 Go library: https://github.com/musukvl/tfvars-parser

Recently I created my own C# .NET library for working with .tfvars files: amba-tfvars.
The library is focused on .tfvars file refactoring.
It extracts not only the Terraform variable data but also the code comments from .tfvars files, which can be very important to preserve during a .tfvars transformation.
Sometimes it is also important to keep the original formatting, so the library records the code style of the original maps and lists: whether they were one-liners or had each property on its own line.

Conclusion

I think the .tfvars files approach is a good way to manage a multi-environment Terraform codebase for huge projects. It is a natural fit for feature flags and trunk-based development in Infrastructure as Code.

The article repository: https://github.com/musukvl/article-terraform-tfvars-infro

Top comments (3)

LH8PPL • Nov 24 '24

That is a great idea. I have about 50 AWS accounts to manage, so I created a CI process in GitLab with scripts that automatically create a folder for each account. A central folder holds template files, which generate the Terraform files in each account for the resources I need in all accounts, and I manually create Terraform files in a specific account when I need to.
Do you think your approach can make it better? Do you think I can incorporate your idea?

Alexey Ryazhskikh • Nov 25 '24 • Edited

@lh8ppl I assume you use templates because you need to have 50 providers configured, and that is the key limitation you have.
The tfvars approach will suit you only if you agree to have 50 Terraform states and 50 separate terraform apply runs.
On the one hand, if one provider is broken you will still be able to run the other 49 applies.
On the other hand, one state for all accounts generates only one plan, which is easy to review.

If I had very few resources in each account, I would continue with one state and templates. If the total number of resources grew beyond 1000, I would prefer to have 50 states.

LH8PPL • Nov 25 '24

Thanks, need to think about it
