AWS CodeDeploy Blue/green deployments can automatically copy the previous Auto Scaling Group configuration. One thing that we discovered the hard way was: after a deployment the newly created Auto Scaling Group, that had a Target Group attached, no longer has it.
It's is a known issue that hasn't been fixed yet. You could have noticed it incidentally or you could have noticed, as in our case, due a major outage.
How could that lead to an outage?
Target Groups are responsible for checking the health of instances and remove them from the Load Balancer, when they are unhealthy. An Auto Scaling Group configured to maintain a fixed number of instances takes care of launching new instances to replace unhealthy ones and terminate the unhealthy instances after the replacements report as healthy.
Since the Auto Scaling Group created by CodeDeploy is not attached to the existing Target Group used by the original Auto Scaling Group from which CodeDeploy copied information, the unhealthy instances will be removed from the Load Balancer by the Target Group but no new instances will be launched by the new Auto Scaling Group.
If your unhealthy instances can't recover by themselves that means your application will be completely down. In our situation it turned out that the we were using a version of the Erlang VM that would suddenly crash and make our instances unhealthy.
How could we fix it?
We imagined that we could attach the Target Group into to newly created Auto Scaling Group as part of the deployment process. As seen in the documentation the last hook for a Blue/green deployment is the AfterAllowTraffic
hook, so we decided to attach them in it.
We've created a simple bash script that uses the AWS Command Line Interface (CLI) to find the necessary information and then attach the Target Group to the Auto Scaling Group, again using CLI. You can see the Environment Variables available for hooks to fetch information in the official documentation.
targetGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.loadBalancerInfo.targetGroupInfoList[*].name" --output text)
targetGroupArn=$(aws elbv2 describe-target-groups --names $targetGroupName --query "TargetGroups[*].TargetGroupArn" --output text)
autoscalingGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.targetInstances.autoScalingGroups" --output text)
aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name "$autoscalingGroupName" --target-group-arns "$targetGroupArn"
To attach the Target Group to the Auto Scaling Group we need the Auto Scaling Group's name and the Target Group's Amazon Resource Name (ARN). We find the Target Group's name using the $DEPLOYMENT_ID
of the current deploy and then use that to find the Target Group's ARN. We find the Auto Scaling Group's name with the same call we used to find the Target Group's name but with a different query. Finally we attach them using the attach-load-balancer-target-groups
command.
It's worth mentioning that to be able to run those commands your instances would need permission to the following actions:
codedeploy:GetDeployment
elasticloadbalancing:DescribeTargetGroups
autoscaling:AttachLoadBalancerTargetGroups
That fixes the problem of having the Auto Scaling Group not being attached to the Target Group after a deploy. But what happens when the Auto Scaling Group decides to launch a new instance?
Auto Scaling Group decides to launch a new instance
When the Auto Scaling Group decides to launch a new instance, a new deploy will be triggered for the that instance after it is started. But that deploy has a different Initiating Event. A normal Blue/green deployment would have a Initiating Event as user
while a deploy created by an Auto Scaling Group action would have a Initiating Event as autoScaling
.
Important to us, though, is that the information used by our naive script in the last section would be different. More specifically, since the deployment is not copying the Auto Scaling Group configuration this time, there's no information about the Auto Scaling Group. So our command to find the autoscalingGroupName
would return "None"
. That leads to a infinite loop of instances failing the deployment and the Auto Scaling Group launching new instances and trying to deploy to them again.
Since the Target Group would already been attached to the Auto Scaling Group when the script ran in the AfterAllowTraffic
during the Blue/green deployment, we can change our script to only attach them if the autoscalingGroupName
is different from "None"
.
targetGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.loadBalancerInfo.targetGroupInfoList[*].name" --output text)
targetGroupArn=$(aws elbv2 describe-target-groups --names $targetGroupName --query "TargetGroups[*].TargetGroupArn" --output text)
autoscalingGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.targetInstances.autoScalingGroups" --output text)
if [ "$autoscalingGroupName" != "None" ]
then
aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name "$autoscalingGroupName" --target-group-arns "$targetGroupArn"
fi
Conclusion
Running this script as part of our deployment process helped us to mitigate an outage. Bear in mind that we see the use of the aforementioned bash script as a temporary fix while we wait for the issue to be fixed in the AWS CodeDeploy side.
After further testing and simulation of the case that caused the outage, we're now confident that the replacement of unhealthy instances are working as expected. That brought us immense peace of mind.
Top comments (6)
I am having this issue but the problem is that I get the error just after the pipeline triggers the deploy therefore I can't follow/use this approach, any ideas? I am using EC2 instances.
It works only after creating the deployment for the first time, but when the pipeline triggers the deployment, the autoscaling groups get copied (without the target groups) and therefore It can proceed with the deployment.
It works only when using classic ELB.
Sorry, I've just saw your question now. Unfortunately I don't work at that company anymore to be able to troubleshoot with you. Did you get around the problem?
Hi, I was missing this in the role policy:
Are you sure about that? I got the policies right, but still getting the same error just after triggering the pipeline.
Indeed yesterday my pipeline returned the same error, after a couple of hours it worked again, I don't know exactly if is something from the AWS side or if they are working on it seems still an open issue: forums.aws.amazon.com/thread.jspa?...
we were able to solve this for Windows instances as well by implementing a couple powershell scripts as part of your CodeDeploy package and referencing them in the appspec.yml
appspec.yml
version: 0.0
os: windows
files:
- source: \
destination: c:\your\application\path
hooks:
BeforeInstall:
- location: \add-target-group.ps1
add-target-group.ps1
# Set script directory as variable
$scriptDir = Split-Path -Path $MyInvocation.MyCommand.Definition -Parent
# Source Script to install required PS Modules
. $scriptDir\codedeploy-prerequisites.ps1
# Source Script to set ARN values
. $scriptDir\target-group-values.ps1
# Set New AutoScaling Group Name
$new_asg = "CodeDeploy_" + $env:DEPLOYMENT_GROUP_NAME + "_" + $env:DEPLOYMENT_ID
Add-ASLoadBalancerTargetGroup -AutoScalingGroupName $new_asg -TargetGroupARNs $TARGET_GROUP_ARNS
codedeploy-prerequisites.ps1
# Install AWSTools modules for PowerShell for adding Target Group
Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force
Install-Module -Name AWS.Tools.Installer -Force
Install-AWSToolsModule -Name AWS.Tools.Autoscaling -Force -AllowClobber
target-group-values.ps1
# If you have multiple deployments, you can set this file up with If statements that check the current $env:DEPLOYMENT_GROUP_NAME against specific values
If ($env:DEPLOYMENT_GROUP_NAME -eq 'My_Staging_DepGroup') {
$TARGET_GROUP_ARNS = "arn:aws:elasticloadbalancing:...YOUR_STAGING_ARN"
} ElseIf ($env:DEPLOYMENT_GROUP_NAME -eq 'My_Production_Devgroup') {
$TARGET_GROUP_ARNS = "arn:aws:elasticloadbalancing:...YOUR_PRODUCTION_ARN"
} Else {
# Write error to file on instance
Write-Host "Unknown Deployment Group. Target Group ARN Not Set" | Out-File c:\codedeploy_Add-ASLoadBalancer
}