TL;DR:
While deploying an OpenSearch domain with logging via CDK, I hit a “Resource limit exceeded” error because the account already had the maximum of 10 CloudWatch Logs resource policies. To fix this, I created the CloudWatch log groups in CDK and passed their ARNs to a Lambda function. In the Lambda, I used the AWS SDK’s putResourcePolicy to update an existing policy, adding a statement with OpenSearch as the principal and the log group ARNs as its resources. I also set suppressLogsResourcePolicy: true on the OpenSearch domain to stop CDK from creating resource policies automatically. This bypassed the limit and gave me full control over the policies.
Introduction
While deploying my CDK stack for OpenSearch with logging enabled, I encountered this error:
Received response status [FAILED] from custom resource. Message returned: Resource limit exceeded.
This error occurs when a deployment tries to create a CloudWatch Logs resource policy after the quota of 10 resource policies per account per Region has already been reached. Finding a solution wasn't straightforward, since few resources address this specific issue. Let me share how I resolved it.
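Before changing anything, it helps to see how close the account is to the quota. Here is a minimal sketch that lists the names of the existing resource policies. The function name is my own, and it takes the client as a parameter so it can be run against a stub; in a real account you would pass boto3.client("logs").

```python
def list_resource_policy_names(logs_client):
    """Return the names of all CloudWatch Logs resource policies visible to the client."""
    names = []
    kwargs = {}
    while True:
        page = logs_client.describe_resource_policies(**kwargs)
        names.extend(p["policyName"] for p in page.get("resourcePolicies", []))
        token = page.get("nextToken")
        if not token:
            return names
        # Follow pagination until the service stops returning a token
        kwargs["nextToken"] = token
```

If the returned list already has 10 entries, any stack that tries to create another policy in that Region will fail with the error above.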
P.S. This works for updating existing resource policies for any service and is NOT specific to OpenSearch — just skip the last step.
The Solution
Step 1: Create the Required Log Groups
First, I created the necessary CloudWatch log groups with appropriate names and removal policies:
this.opensearchAppLogGroup = new logs.LogGroup(this, `${props.id}-opensearch-app-loggroup`, {
  logGroupName: `/aws/opensearch/app`,
  removalPolicy: props.domainRemovalPolicy,
});

this.opensearchSlowIndexLogGroup = new logs.LogGroup(this, `${props.id}-opensearch-slowIndex-loggroup`, {
  logGroupName: `/aws/opensearch/slow-index`,
  removalPolicy: props.domainRemovalPolicy,
});

this.opensearchSlowSearchLogGroup = new logs.LogGroup(this, `${props.id}-opensearch-slowSearch-loggroup`, {
  logGroupName: `/aws/opensearch/slow-search`,
  removalPolicy: props.domainRemovalPolicy,
});
Step 2: Set Up Lambda to Update the Existing Resource Policy
Calling the AWS SDK from a Lambda deployed by CDK was a game changer for me. I wrote a Lambda function that looks up the existing CloudWatch Logs resource policy by name, works out which log group ARNs are missing from it, and updates or creates a statement allowing OpenSearch (es.amazonaws.com) to access the log groups. It then applies the updated policy using put_resource_policy to ensure OpenSearch has the correct permissions.
import json
import os

import boto3

cloudwatch_logs = boto3.client('logs')


def handler(event, context):
    policy_name = os.environ.get('POLICY_NAME')
    # LOG_GROUP_ARN holds a comma-separated list of log group ARNs
    raw_arns = os.environ.get('LOG_GROUP_ARN', '')
    new_resources = [arn.strip() for arn in raw_arns.split(',') if arn.strip()]

    try:
        response = cloudwatch_logs.describe_resource_policies()

        # Check if the specified policy exists
        existing_policy = next(
            (policy for policy in response.get('resourcePolicies', [])
             if policy.get('policyName') == policy_name),
            None
        )

        if existing_policy:
            policy_document = json.loads(existing_policy['policyDocument'])
        else:
            print('Policy not found. Creating a new policy.')
            policy_document = {
                "Version": "2012-10-17",
                "Statement": []
            }

        # Collect the ARNs that do not yet appear anywhere in the policy
        existing_policy_document_str = json.dumps(policy_document)
        resources_to_add = []
        for new_resource in new_resources:
            if new_resource not in existing_policy_document_str:
                print(f"New resource {new_resource} not found, adding it.")
                resources_to_add.append(new_resource)

        if resources_to_add:
            # Find the statement that already grants access to OpenSearch.
            # Principal can be a plain string such as "*", so check its type first.
            es_statement = next(
                (stmt for stmt in policy_document['Statement']
                 if isinstance(stmt.get('Principal'), dict)
                 and stmt['Principal'].get('Service') == 'es.amazonaws.com'),
                None
            )

            if es_statement:
                # Resource may be missing or a single string; normalize to a list
                resources = es_statement.get('Resource', [])
                if isinstance(resources, str):
                    resources = [resources]
                es_statement['Resource'] = resources + resources_to_add
            else:
                es_statement = {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "es.amazonaws.com"
                    },
                    "Action": [
                        "logs:CreateLogStream",
                        "logs:PutLogEvents"
                    ],
                    "Resource": resources_to_add
                }
                policy_document['Statement'].append(es_statement)
        else:
            print('No new resources to add.')

        update_params = {
            'policyName': policy_name,
            'policyDocument': json.dumps(policy_document),
        }
        cloudwatch_logs.put_resource_policy(**update_params)
        return {'status': 'SUCCESS'}
    except Exception as e:
        print('Error occurred while updating policy:', str(e))
        return {'status': 'FAILURE', 'error': str(e)}
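One thing to be aware of in the handler above: it decides whether an ARN is already present by searching the whole serialized policy for it, so an ARN that appears in a statement for a different principal counts as "already there". Below is a sketch of the same merge step as a pure function that compares ARNs exactly and only looks at the OpenSearch statement. The function and helper names are my own, and it assumes the policy's Statement is a list.

```python
import json

OPENSEARCH_PRINCIPAL = "es.amazonaws.com"


def _as_list(value):
    """Normalize a policy field that may be missing, a single string, or a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def merge_log_group_arns(policy_document, new_arns, principal=OPENSEARCH_PRINCIPAL):
    """Return (updated_document, changed) without mutating the input document."""
    document = json.loads(json.dumps(policy_document))  # cheap deep copy
    statements = document.setdefault("Statement", [])

    # Look for an existing Allow statement for the OpenSearch service principal
    target = None
    for stmt in statements:
        princ = stmt.get("Principal")
        if (stmt.get("Effect") == "Allow" and isinstance(princ, dict)
                and principal in _as_list(princ.get("Service"))):
            target = stmt
            break

    if target is None:
        if not new_arns:
            return document, False
        statements.append({
            "Effect": "Allow",
            "Principal": {"Service": principal},
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": list(dict.fromkeys(new_arns)),  # de-duplicate, keep order
        })
        return document, True

    resources = _as_list(target.get("Resource"))
    changed = False
    for arn in new_arns:
        if arn not in resources:  # exact comparison, not substring search
            resources.append(arn)
            changed = True
    target["Resource"] = resources
    return document, changed
```

With a function like this, the handler only needs to call put_resource_policy when changed is True, and the merge logic can be unit-tested without touching AWS.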
Step 3: Create a Custom Resource
After creating my Lambda function, I needed to trigger it at the right time during stack deployment.
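This post does not include the custom resource wiring itself, so the following is only a sketch under one assumption: that the Lambda is invoked through CDK's custom_resources Provider framework as its on-event handler. In that setup the event carries a RequestType of Create, Update, or Delete, a failure is reported by raising an exception, and the returned dictionary may include a PhysicalResourceId. The names make_on_event and apply_policy and the default physical ID are hypothetical.

```python
def make_on_event(apply_policy, physical_id="opensearch-log-resource-policy"):
    """Build a custom-resource handler around a zero-argument policy update function."""

    def on_event(event, context):
        request_type = event["RequestType"]
        if request_type in ("Create", "Update"):
            # Let exceptions propagate so the deployment fails visibly
            apply_policy()
        # Delete is a no-op here: the policy is shared, so it is left in place
        return {"PhysicalResourceId": event.get("PhysicalResourceId", physical_id)}

    return on_event
```

Under this assumption, a handler that catches every exception and returns a status dictionary would look successful to CloudFormation even when the policy update failed, so letting the error propagate is the safer choice.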
Step 4: Update OpenSearch Domain Configuration
Finally, I configured OpenSearch to use the log groups and disabled automatic resource policy creation:
logging: {
  appLogEnabled: true,
  appLogGroup: opensearchAppLogGroup,
  slowIndexLogEnabled: true,
  slowIndexLogGroup: opensearchSlowIndexLogGroup,
  slowSearchLogEnabled: true,
  slowSearchLogGroup: opensearchSlowSearchLogGroup,
},
suppressLogsResourcePolicy: true,
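With suppressLogsResourcePolicy set, CDK no longer manages the policy, so it is worth confirming that the Lambda really granted access to every log group. Here is a small sketch of such a check. The function name is my own; it takes the resourcePolicies list from a describe_resource_policies response and only matches ARNs exactly, so wildcard resources in a policy are not expanded.

```python
import json


def uncovered_log_groups(resource_policies, required_arns, principal="es.amazonaws.com"):
    """Return the required ARNs that no Allow statement grants to the principal."""
    granted = set()
    for policy in resource_policies:
        document = json.loads(policy["policyDocument"])
        statements = document.get("Statement", [])
        if isinstance(statements, dict):  # a single statement may not be wrapped in a list
            statements = [statements]
        for stmt in statements:
            princ = stmt.get("Principal")
            services = princ.get("Service") if isinstance(princ, dict) else None
            if isinstance(services, str):
                services = [services]
            if stmt.get("Effect") != "Allow" or principal not in (services or []):
                continue
            resources = stmt.get("Resource", [])
            granted.update([resources] if isinstance(resources, str) else resources)
    return [arn for arn in required_arns if arn not in granted]
```

An empty result means all three log groups are covered; anything else lists the ARNs that still need to be added.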
Gotchas and Lessons Learned
What Didn't Work
I tried using logs.LogGroup.fromLogGroupName() with its addToResourcePolicy method, but that gave me the same resource limit error. Apparently, you can't modify resources outside your stack this way (thanks, GitHub issue #6548!).
Mistakes Along the Way
Initially, I created both the log groups and the Lambda function within the same construct, while OpenSearch was placed in a different construct that received the log groups as props. This setup caused issues: inside the first construct the log groups were referenced by the correct log group name, but in the OpenSearch construct they were referenced by a different one, so the Lambda failed to update the policy correctly. The fix was to make sure the log groups were referenced by their concrete names everywhere. This insight came from reviewing the CDK implementation (the log-group.ts file linked in the References), whose docs say:
"Returns an environment-sensitive token that should be used for the resource's 'name' attribute (e.g., bucket.bucketName). Normally, this token will resolve to nameAttr, but if the resource is referenced across environments, it will be resolved to this.physicalName, which will be a concrete name."
Final Thoughts
While this solution worked for me, I’m still relatively new to CDK, so there might be better approaches out there. I just wanted to document my findings in one place, hoping that it might help someone, even in the smallest way. The GitHub issues I referenced were incredibly helpful in providing context and guiding me toward a solution. I honestly wouldn’t have been able to find a resolution without those discussions. If you have a more elegant solution or if I’ve made any mistakes in my understanding, I’d truly appreciate hearing your thoughts!
References
https://github.com/aws/aws-cdk/pull/28707
https://github.com/aws/aws-cdk/issues/23637
https://github.com/aws/aws-cdk/issues/6548
https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-logs/lib/log-group.ts