Matt Bacchi for AWS Community Builders

Posted on Jan 8 • Originally published at bacchi.org

Switching to the Terraform S3 Backend with Native State File Locks


Terraform is a flexible, cloud-agnostic infrastructure as code (IaC) tool. As it creates infrastructure resources, it builds a ledger that tracks which resources have been successfully created, along with additional metadata (such as each resource's ID). Terraform stores this state in a JSON-formatted file with the .tfstate extension.

What is the Terraform S3 Backend

By default, the Terraform state file described above is stored in the same directory as the Terraform configuration files you wrote. But while this state lives only on your local computer, it is vulnerable to being lost or overwritten, and it cannot be shared with or managed by other team members. Terraform makes it straightforward to keep the state file in shared, remote storage instead, and it offers many backend options for doing so. For AWS users, the Terraform S3 Backend stores this state file in AWS S3.

What is State File Locking

So now the state file lives in distributed object storage (AWS S3), and more than one user can manage the resources it tracks. But to manage this state file safely, we need a locking mechanism (often called a mutex in computing) that prevents multiple users from writing to it at once. If more than one user could write to it at the same time, we would have a mess: one user's changes could overwrite another's, and resources that were created could be lost from the state file that is supposed to track them.
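To make the mutex idea concrete, here is a minimal local sketch. It is not how Terraform implements locking. It uses the shell's noclobber option, which makes `>` fail when the target file already exists, as a create-if-absent lock. The helper names, lock path, and user names are all hypothetical. The point is that when several processes race for the same lock, only one of them can create the file.

```shell
# Illustrative only: a local create-if-absent lock, not Terraform's implementation.
# try_lock and release_lock are hypothetical helper names.

# try_lock LOCKFILE OWNER
# With noclobber (set -C), ">" refuses to overwrite an existing file. bash and
# dash create the file with O_EXCL, so only one racing process can succeed.
try_lock() {
  lockfile="$1"
  owner="$2"
  # Run in a subshell so noclobber doesn't leak into the caller,
  # and silence the "cannot overwrite existing file" message.
  ( set -C; printf '%s\n' "$owner" > "$lockfile" ) 2>/dev/null
}

# release_lock LOCKFILE: removing the file frees the lock for the next caller.
release_lock() {
  rm -f "$1"
}

# Demo: five "users" race for the same lock at the same moment.
lockdir=$(mktemp -d)
lock="$lockdir/example.lock"
for user in alice bob carol dave erin; do
  ( try_lock "$lock" "$user" && echo "$user acquired the lock" ) &
done
wait
echo "lock holder: $(cat "$lock")"
release_lock "$lock"
```

Whichever process wins, the others fail immediately instead of overwriting the holder's data. That is the property a state lock needs.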

Terraform has supported state locking for the S3 backend for quite some time. Unfortunately, it has required creating an additional DynamoDB table to track the state file's lock status.

Until now.

This DynamoDB table is an extra resource that feels tangential to the Terraform state backend, and it complicates configuring your backend. That requirement has now been made obsolete by a recent AWS S3 feature: conditional writes.

AWS S3 Conditional Writes

In August 2024, AWS announced the S3 Conditional Writes feature. It lets an S3 client attach a precondition to a write request (the If-None-Match header) so that S3 writes the object only if no object with that key already exists. If the object already exists, S3 rejects the write and returns a 412 Precondition Failed error response.
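Here's a rough local emulation of those semantics. It assumes a directory stands in for the bucket and a file stands in for the object. On real S3 the precondition is the `If-None-Match: *` header on the PutObject request, and S3 evaluates it server-side. This sketch only mimics the "200 or 412" outcome and never calls AWS. The function name and object key are hypothetical.

```shell
# Local emulation of S3's "If-None-Match: *" conditional PUT (no AWS calls).
# Assumption: BUCKET_DIR is a directory standing in for an S3 bucket.

# put_if_none_match BUCKET_DIR KEY BODY
# Prints 200 and returns 0 if the object was created.
# Prints 412 and returns 1 if an object with KEY already exists, mirroring
# S3's "412 Precondition Failed". In this simplified sketch, any other write
# failure (e.g. permissions) is also reported as 412.
put_if_none_match() {
  bucket="$1"
  key="$2"
  body="$3"
  mkdir -p "$(dirname "$bucket/$key")"
  if ( set -C; printf '%s' "$body" > "$bucket/$key" ) 2>/dev/null; then
    echo 200
  else
    echo 412
    return 1
  fi
}

bucket=$(mktemp -d)
put_if_none_match "$bucket" "example/state.tflock" "lock-1"           # prints 200
put_if_none_match "$bucket" "example/state.tflock" "lock-2" || true   # prints 412
```

The first writer wins and every later writer gets a 412 until the object is deleted. That "create only if absent" behavior is exactly what a lock needs.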

S3 Conditional Write Support Added in Terraform v1.10

Support for S3 Conditional Writes was added in Terraform v1.10. (For great background and architecture detail on the implementation from its developer, Bruno Schaatsbergen, look here.)

Thankfully, this is completely transparent to the Terraform user (unless Terraform returns an error while trying to lock the state file).

Configuring the S3 Backend to Use Native State File Locking

The Terraform documentation describes the new configuration parameter use_lockfile, which enables S3 state locking. It also still describes the old DynamoDB method as available. (It's common for software to support both the old and the new approach for a while, until all users have had a chance to migrate.)

This means you can actually use both locking mechanisms at the same time. But that is unnecessary overkill and could lead to confusion and problems. I recommend replacing your old DynamoDB locking configuration with S3 state locking right away. It will be cheaper (no extra DynamoDB table, or reads and writes to it, to pay for) and less error prone.

Here's how to change your Terraform backend configuration.

Terraform Configuration Specifics

The old DynamoDB method used a configuration parameter named dynamodb_table.

The new S3 state locking method uses a configuration parameter named use_lockfile.

Both are covered in the current Terraform documentation.

Version Constraints

I also recommend that, when you switch to S3 native state locking, you set the Terraform required_version setting to the minimum version that supports S3 native state file locking. Users running an earlier Terraform version can't use this feature and will see errors if the use_lockfile parameter is enabled. Requiring v1.10 at a minimum makes your configuration more resilient and prevents anyone from creating or updating resources with an older Terraform version. Think of it as a prerequisite.

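This version constraint is documented here and looks like the line below. The ~> operator with "1.10" accepts v1.10 and any later 1.x release, but not v2.0. If you only want a lower bound, ">= 1.10" works too: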

  required_version = "~> 1.10"
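To spell out what `~> 1.10` accepts, here's a minimal sketch that checks a version string against the same range: 1.10.0 up to, but not including, 2.0.0. The helper names are hypothetical. The sketch relies on `sort -V` for version ordering and ignores pre-release versions. Terraform itself enforces required_version, so this is only useful for something like picking a binary in a CI script.

```shell
# Hypothetical helpers mirroring what required_version = "~> 1.10" accepts.
# Pre-release versions (e.g. 1.10.0-beta1) are NOT handled correctly here.

# version_ge A B: succeeds if version A >= version B (uses sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# satisfies_tilde_1_10 VERSION: succeeds for 1.10.0 <= VERSION < 2.0.0.
satisfies_tilde_1_10() {
  v="${1#v}"   # tolerate a leading "v", e.g. v1.10.3
  version_ge "$v" "1.10.0" && ! version_ge "$v" "2.0.0"
}

for v in 1.9.8 1.10.0 1.10.3 1.11.2 2.0.0; do
  if satisfies_tilde_1_10 "$v"; then
    echo "$v: allowed by ~> 1.10"
  else
    echo "$v: rejected by ~> 1.10"
  fi
done
```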

Sample Configuration

With all this background information about the configuration parameters, here's a sample Terraform configuration with both the old and new parameters present:

provider "aws" {
  region = "us-west-2"
}

terraform {
  backend "s3" {
    encrypt        = true
    bucket         = "tfstate-lock-test-0bhfxn8x1"
    region         = "us-west-2"
    key            = "example/terraform-state-lock-test.tfstate"
    dynamodb_table = "tfstate-lock-test"
    use_lockfile   = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.82.2"
    }
  }

  # Requires Terraform 1.10 or later (but below 2.0) for native S3 state file locking
  required_version = "~> 1.10"
}

To switch away from the old DynamoDB locking method, remove the dynamodb_table parameter from the backend block.
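If you have several configurations to update, a small helper like the one below can strip the parameter for you. It's a hypothetical sketch that assumes `dynamodb_table` sits on its own line, as in the sample above. Review the resulting diff before committing. Because the backend configuration changes, Terraform will ask you to re-initialize. Since the bucket and key stay the same, `terraform init -reconfigure` should be enough, but verify that against your own setup.

```shell
# remove_dynamodb_table FILE (hypothetical helper)
# Deletes any "dynamodb_table = ..." line from a Terraform file.
remove_dynamodb_table() {
  file="$1"
  tmp=$(mktemp) || return 1
  # Filter into a temp file, then copy back over the original. This avoids
  # GNU/BSD "sed -i" differences and keeps the original file's permissions.
  sed '/^[[:space:]]*dynamodb_table[[:space:]]*=/d' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a throwaway copy of the backend block from the sample configuration.
demo="$(mktemp -d)/backend.tf"
cat > "$demo" <<'EOF'
terraform {
  backend "s3" {
    bucket         = "tfstate-lock-test-0bhfxn8x1"
    key            = "example/terraform-state-lock-test.tfstate"
    region         = "us-west-2"
    dynamodb_table = "tfstate-lock-test"
    use_lockfile   = true
  }
}
EOF
remove_dynamodb_table "$demo"
cat "$demo"

# After editing a real configuration, re-initialize the backend, e.g.:
#   terraform init -reconfigure
```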

State File Locking in Action

We now know how to configure Terraform S3 native state file locking, but how does it behave, and what will you see if Terraform cannot acquire the lock?

I've tested both methods and will show you the output from each when state file locking fails.

Error from DynamoDB State File Locking

The old DynamoDB state file locking method returns an error like the following:

$ terraform apply plan.out 
Acquiring state lock. This may take a few moments...
╷
│ Error: Error acquiring the state lock
│ 
│ Error message: operation error DynamoDB: PutItem, https response error StatusCode: 400, RequestID: CP9U4IRC04OBONKIQHM4LUOLLJVV4KQNSO5AEMVJF66Q9ASUAAJG, ConditionalCheckFailedException: The conditional request failed
│ Lock Info:
│   ID:        39f64263-4ad8-a563-faf7-f28f8a042a00
│   Path:      tfstate-lock-test-0bhfxn8x1/example/terraform-state-lock-test.tfstate
│   Operation: OperationTypeApply
│   Who:       user@hostname
│   Version:   1.10.3
│   Created:   2025-01-08 03:38:13.121564614 +0000 UTC
│   Info:      
│ 
│ 
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.

Error from S3 Backend Native State File Locking

The new S3 backend native state file locking method will return an error that looks like this:

$ terraform apply plan2.out
╷
│ Error: Error acquiring the state lock
│ 
│ Error message: operation error S3: PutObject, https response error StatusCode: 412, RequestID: N0BGGAFN8V1N2WCQ, HostID: 4G32Xus/86u8CehvNbvzv8NoqiyTvsBWGYBXYK6E8Vn0E4+wom+6Jm6WFVUFSaCE7C1TBP5Vauo=, api error PreconditionFailed: At least one of the pre-conditions you specified did not hold
│ Lock Info:
│   ID:        837482e8-441e-9ff6-d30b-333ee83d8fc4
│   Path:      tfstate-lock-test-0bhfxn8x1/example/terraform-state-lock-test.tfstate
│   Operation: OperationTypeApply
│   Who:       user@hostname
│   Version:   1.10.3
│   Created:   2025-01-08 03:43:11.691479913 +0000 UTC
│   Info:      
│ 
│ 
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
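The two failures differ mainly in the underlying API call. DynamoDB locking fails on a PutItem with a 400 ConditionalCheckFailedException, while S3 native locking fails on a PutObject with a 412 PreconditionFailed. If you're mid-migration and want to tell at a glance which mechanism blocked a run, the hypothetical helper below matches on those strings from the outputs above. Those messages come from Terraform and the AWS SDK and could change between versions.

```shell
# lock_backend_from_error (hypothetical helper)
# Reads captured Terraform error output on stdin and prints which locking
# mechanism produced the failure: "dynamodb", "s3-native", or "unknown".
lock_backend_from_error() {
  out=$(cat)
  case "$out" in
    *"DynamoDB: PutItem"*ConditionalCheckFailedException*) echo "dynamodb" ;;
    *"S3: PutObject"*"StatusCode: 412"*)                    echo "s3-native" ;;
    *)                                                       echo "unknown" ;;
  esac
}

# Usage with a saved copy of the error output (example file name):
#   lock_backend_from_error < apply-error.log
```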

Dealing with Stale State File Locks

NOTE: After publishing this blog post, I was asked whether the terraform force-unlock command still works. I tested it, and it works as expected with the new S3 native state file locking mechanism. Here's an example session:

$ terraform apply plan2.out
╷
│ Error: Error acquiring the state lock
│ 
│ Error message: operation error S3: PutObject, https response error StatusCode: 412, RequestID: NGQM2VGSTSDWPCZF, HostID: dzPZArnTy31oVeuVLI8Dm61HXnuL6M3R2tlWFe2suztP0zkh4Bwv/eJFBLqVfitAI40I5BvIeds=, api error PreconditionFailed: At least one of the pre-conditions you specified did not hold
│ Lock Info:
│   ID:        bde40e3b-2bfb-f577-fea5-44923c9d5275
│   Path:      tfstate-lock-test-0bhfxn8x1/example/terraform-state-lock-test.tfstate
│   Operation: OperationTypeApply
│   Who:       user@hostname
│   Version:   1.10.3
│   Created:   2025-01-08 16:55:35.808464751 +0000 UTC
│   Info:      
│ 
│ 
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
╵

You can remove the lock, but only do this if you know the lock is stale. To do this, first note the lock ID above, then run the force-unlock command:

$ terraform force-unlock bde40e3b-2bfb-f577-fea5-44923c9d5275
Do you really want to force-unlock?
  Terraform will remove the lock on the remote state.
  This will allow local Terraform commands to modify this state, even though it
  may still be in use. Only 'yes' will be accepted to confirm.

  Enter a value: yes

Terraform state has been successfully unlocked!

The state has been unlocked, and Terraform commands should now be able to
obtain a new lock on the remote state.
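Rather than copying the lock ID by hand, you could pull it from a saved copy of the error output. The helper below is hypothetical and assumes the `ID:` line format shown in the Lock Info section above. The log file name in the usage comment is also just an example. The same rule applies as before: only force-unlock a lock you know is stale.

```shell
# lock_id_from_error (hypothetical helper)
# Reads captured Terraform error output on stdin and prints the lock ID from
# the "Lock Info" section. The pattern requires only non-letters (the box
# drawing character and spaces) before "ID:", so "RequestID:" and "HostID:"
# in the error message line are not matched.
lock_id_from_error() {
  awk '/^[^[:alpha:]]*ID:[[:space:]]/ { print $NF; exit }'
}

# Usage, with the failed command's output saved to a file (example name):
#   lock_id=$(lock_id_from_error < apply-error.log)
#   terraform force-unlock "$lock_id"   # still asks you to type 'yes'
```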

Delete Your Old DynamoDB Tables

Now that you've switched from using the old Terraform DynamoDB locking to the new S3 native state file locking, you can remove the old DynamoDB table used to track these locks!

Yay, one less resource to manage and be charged by AWS for.
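Before deleting the table, it's worth confirming that no other configuration (another repository, workspace, or teammate's project) still points at it. Here's a minimal sketch that lists Terraform files under a directory that still set dynamodb_table. The helper name and the file extensions it searches are assumptions. If the table itself was created by Terraform, removing it from that configuration is cleaner than deleting it out-of-band with `aws dynamodb delete-table --table-name tfstate-lock-test`.

```shell
# find_dynamodb_table_refs DIR (hypothetical helper)
# Lists .tf, .tfbackend, and .hcl files under DIR that still set dynamodb_table.
# Succeeds (exit 0) if any reference is found, fails (exit 1) if none are.
find_dynamodb_table_refs() {
  grep -rlE --include='*.tf' --include='*.tfbackend' --include='*.hcl' \
    '^[[:space:]]*dynamodb_table[[:space:]]*=' "$1"
}

# Usage (from the root of a repository):
#   if find_dynamodb_table_refs . ; then
#     echo "dynamodb_table is still referenced; don't delete the table yet"
#   fi
```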

Summary

Hopefully you see the advantage of using the new Terraform S3 backend native state file locking mechanism, and how to configure it for your environment.

Happy Terraforming!