Introduction
In the recent blog post Building a Data Ingestion Solution for Amazon Bedrock Knowledge Bases, we created a data ingestion solution that includes job completion notifications based on a pull mechanism: a Lambda function periodically checks job statuses. Not satisfied with how frequently that function must run, I looked into whether a push mechanism is available.
From my research, I found that Bedrock Knowledge Bases supports observability logging and emits log events related to content ingestion. Because these logs can be delivered to CloudWatch Logs, we can use a subscription filter to push ingestion job completion log events instead of polling for them. Consequently, I dedicated this blog post to reviewing this feature and determining how to enable it efficiently using Terraform.
With this context, let’s first look at how CloudWatch log delivery works in general and how it applies to Bedrock Knowledge Bases.
Creating a Delivery Source for the Knowledge Base
Bedrock Knowledge Bases is one of the AWS services that uses the log delivery feature in CloudWatch Logs to write vended logs. This is a framework that provides a standard interface to configure logging, which typically involves a delivery source, a delivery destination, and a delivery that enables logging by linking the two.
As per Monitor knowledge bases using CloudWatch Logs, Bedrock Knowledge Bases currently only supports application logs. Thus, we can create the delivery source in Terraform using the `aws_cloudwatch_log_delivery_source` resource as follows:
resource "aws_cloudwatch_log_delivery_source" "kb_logs" {
count = var.enable_kb_log_delivery_cloudwatch_logs || var.enable_kb_log_delivery_s3 || var.enable_kb_log_delivery_data_firehose ? 1 : 0
name = "bedrock-kb-${var.kb_id}"
log_type = "APPLICATION_LOGS"
resource_arn = "arn:${local.partition}:bedrock:${local.region}:${local.account_id}:knowledge-base/${var.kb_id}"
}
Notice the condition that creates the resource only if at least one of the log delivery options is enabled via variables; the same variables are also used in the destination-specific configurations explained in subsequent sections. This keeps the configuration generic and makes it easy to plug into the Terraform configuration that manages your Bedrock Agents and Knowledge Bases.
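For reference, here is a minimal sketch of the supporting variable and local value definitions that the snippets in this post rely on. The exact shapes are my assumptions (in particular how `region_short` is derived), so adapt them to your own conventions:

variable "kb_id" {
  description = "ID of the target Bedrock knowledge base"
  type        = string
}

variable "enable_kb_log_delivery_cloudwatch_logs" {
  description = "Deliver knowledge base logs to a CloudWatch Logs log group"
  type        = bool
  default     = true
}

variable "enable_kb_log_delivery_s3" {
  description = "Deliver knowledge base logs to an S3 bucket"
  type        = bool
  default     = false
}

variable "enable_kb_log_delivery_data_firehose" {
  description = "Deliver knowledge base logs to a Data Firehose delivery stream"
  type        = bool
  default     = false
}

# Look up the account, partition, and region instead of hardcoding them.
data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}
data "aws_region" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
  partition  = data.aws_partition.current.partition
  region     = data.aws_region.current.name
  # Assumed shortening used only to keep bucket names compact, e.g. "us-east-1" -> "useast1".
  region_short = replace(data.aws_region.current.name, "-", "")
}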
Sending Logs to CloudWatch Logs
To send logs to CloudWatch Logs, we need to create a log group and configure it as the destination for delivery. Creating a log group is simple enough using the `aws_cloudwatch_log_group` resource. The log group name should follow the default format provided by AWS, which is `/aws/vendedlogs/bedrock/knowledge-base/APPLICATION_LOGS/<KB_ID>`, where `<KB_ID>` is the Bedrock knowledge base ID.
Using the log group for log delivery requires a log group resource policy. As per the AWS documentation, CloudWatch Logs can automatically add the appropriate policy if the log group does not have a resource policy and the user setting up logging has the appropriate permissions. Nevertheless, for the sake of completeness, we manually create the resource policy as described in the aforementioned documentation using the `aws_cloudwatch_log_resource_policy` resource.
Lastly, we need to create a delivery destination for it using the `aws_cloudwatch_log_delivery_destination` resource, and then establish the delivery from the source (i.e., the knowledge base) to the destination (i.e., the log group) using the `aws_cloudwatch_log_delivery` resource. The resulting Terraform configuration should look like the following:
resource "aws_cloudwatch_log_group" "kb_logs" {
count = var.enable_kb_log_delivery_cloudwatch_logs ? 1 : 0
name = "/aws/vendedlogs/bedrock/knowledge-base/APPLICATION_LOGS/${var.kb_id}"
}
resource "aws_cloudwatch_log_resource_policy" "kb_logs" {
count = var.enable_kb_log_delivery_cloudwatch_logs ? 1 : 0
policy_name = "bedrock-kb-${var.kb_id}-policy"
policy_document = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "AWSLogDeliveryWrite20150319"
Effect = "Allow"
Principal = {
Service = ["delivery.logs.amazonaws.com"]
}
Action = [
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = ["${aws_cloudwatch_log_group.kb_logs[0].arn}:log-stream:*"]
Condition = {
StringEquals = {
"aws:SourceAccount" = ["${local.account_id}"]
},
ArnLike = {
"aws:SourceArn" = ["arn:${local.partition}:logs:${local.region}:${local.account_id}:*"]
}
}
}
]
})
}
resource "aws_cloudwatch_log_delivery_destination" "kb_logs_cloudwatch_logs" {
count = var.enable_kb_log_delivery_cloudwatch_logs ? 1 : 0
name = "bedrock-kb-${var.kb_id}-cloudwatch-logs"
delivery_destination_configuration {
destination_resource_arn = aws_cloudwatch_log_group.kb_logs[0].arn
}
depends_on = [aws_cloudwatch_log_resource_policy.kb_logs]
}
resource "aws_cloudwatch_log_delivery" "kb_logs_cloudwatch_logs" {
count = var.enable_kb_log_delivery_cloudwatch_logs ? 1 : 0
delivery_destination_arn = aws_cloudwatch_log_delivery_destination.kb_logs_cloudwatch_logs[0].arn
delivery_source_name = aws_cloudwatch_log_delivery_source.kb_logs[0].name
}
Sending Logs to S3
The process to enable S3 as a delivery destination follows a similar pattern. The first step is to create the S3 bucket using the `aws_s3_bucket` resource with a bucket policy that provides the appropriate permissions for log delivery, as described in the AWS documentation. Note that if you are using SSE-KMS for server-side encryption, you'll also need to add the appropriate permissions to the key policy for the CMK (see the sketch after this paragraph). For completeness, we again choose not to rely on CloudWatch Logs to set the bucket policy and instead use the `aws_s3_bucket_policy` resource to manage it.
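As a side note, for an SSE-KMS encrypted bucket the key policy would need a statement along the lines of the sketch below. This is only an illustration based on my reading of the AWS documentation: the `aws_kms_key.kb_logs_s3` key is a hypothetical CMK, and you should verify the required KMS actions against the documentation for your setup.

# Hypothetical: only needed if the log bucket uses SSE-KMS with a customer managed key.
resource "aws_kms_key_policy" "kb_logs_s3" {
  key_id = aws_kms_key.kb_logs_s3.id # assumed CMK used for the bucket's encryption
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Retain administrative access for the account so the key remains manageable.
        Sid       = "EnableIAMUserPermissions"
        Effect    = "Allow"
        Principal = { AWS = "arn:${local.partition}:iam::${local.account_id}:root" }
        Action    = "kms:*"
        Resource  = "*"
      },
      {
        # Let the log delivery service encrypt the objects it writes to the bucket.
        Sid       = "AllowLogDeliveryUseOfTheKey"
        Effect    = "Allow"
        Principal = { Service = "delivery.logs.amazonaws.com" }
        Action = [
          "kms:GenerateDataKey*",
          "kms:Decrypt"
        ]
        Resource = "*"
        Condition = {
          StringEquals = {
            "aws:SourceAccount" = local.account_id
          }
        }
      }
    ]
  })
}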
We also need to create a delivery destination for it using the `aws_cloudwatch_log_delivery_destination` resource, then establish the delivery from the source (i.e., the knowledge base) to the destination (i.e., the S3 bucket) using the `aws_cloudwatch_log_delivery` resource. Note that updating multiple `aws_cloudwatch_log_delivery` resources in parallel will cause concurrency issues, so we must ensure that they are created sequentially using the `depends_on` meta-argument. In this case, the delivery resource for S3 depends on that of CloudWatch Logs.
The resulting Terraform configuration should look like the following:
resource "aws_s3_bucket" "kb_logs_s3" {
count = var.enable_kb_log_delivery_s3 ? 1 : 0
bucket = "bedrock-kb-logs-${lower(var.kb_id)}-${local.region_short}-${local.account_id}"
force_destroy = true
}
resource "aws_s3_bucket_policy" "kb_logs_s3" {
count = var.enable_kb_log_delivery_s3 ? 1 : 0
bucket = aws_s3_bucket.kb_logs_s3[0].id
policy = jsonencode({
Version = "2012-10-17"
Id = "AWSLogDeliveryWrite20150319"
"Statement" : [
{
Sid = "AWSLogDeliveryWrite171157658"
Effect = "Allow"
Principal = {
Service = "delivery.logs.amazonaws.com"
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.kb_logs_s3[0].arn}/AWSLogs/${local.account_id}/bedrock/knowledgebases/*"
Condition = {
StringEquals = {
"aws:SourceAccount" = "${local.account_id}"
"s3:x-amz-acl" = "bucket-owner-full-control"
}
ArnLike = {
"aws:SourceArn" = "${aws_cloudwatch_log_delivery_source.kb_logs[0].arn}"
}
}
}
]
})
}
resource "aws_cloudwatch_log_delivery_destination" "kb_logs_s3" {
count = var.enable_kb_log_delivery_s3 ? 1 : 0
name = "bedrock-kb-${var.kb_id}-s3"
delivery_destination_configuration {
destination_resource_arn = aws_s3_bucket.kb_logs_s3[0].arn
}
depends_on = [aws_s3_bucket_policy.kb_logs_s3[0]]
}
resource "aws_cloudwatch_log_delivery" "kb_logs_s3" {
count = var.enable_kb_log_delivery_s3 ? 1 : 0
delivery_destination_arn = aws_cloudwatch_log_delivery_destination.kb_logs_s3[0].arn
delivery_source_name = aws_cloudwatch_log_delivery_source.kb_logs[0].name
depends_on = [aws_cloudwatch_log_delivery.kb_logs_cloudwatch_logs]
}
Sending Logs to Data Firehose
Sending logs to Data Firehose is slightly more involved because of the Firehose delivery stream's configuration. Since this blog post does not focus on the downstream destination at the Firehose level, we will use an S3 bucket with a basic configuration. To set up a Firehose delivery stream, we first need to create an IAM role that the delivery stream uses to send data to its destination (that is, the S3 bucket). Controlling access with Amazon Data Firehose provides IAM policy examples for different configurations, including one for an S3 destination. To create the Firehose delivery stream, we use the `aws_kinesis_firehose_delivery_stream` resource. One thing to know is that the delivery stream must have the tag `LogDeliveryEnabled` set to `true`; the service-linked role that CloudWatch Logs creates relies on this tag for permission to write to Firehose delivery streams.
We also need to create a delivery destination for it using the `aws_cloudwatch_log_delivery_destination` resource, then establish the delivery from the source (i.e., the knowledge base) to the destination (i.e., the Firehose delivery stream) using the `aws_cloudwatch_log_delivery` resource. To ensure that the delivery resources are created sequentially and thus avoid concurrent modification issues, this delivery resource depends on that of S3.
The resulting Terraform configuration should look like the following:
resource "aws_s3_bucket" "kb_logs_data_firehose" {
count = var.enable_kb_log_delivery_data_firehose ? 1 : 0
bucket = "bedrock-kb-logs-data-firehose-${lower(var.kb_id)}-${local.region_short}-${local.account_id}"
force_destroy = true
}
resource "aws_iam_role" "kb_logs_data_firehose" {
count = var.enable_kb_log_delivery_data_firehose ? 1 : 0
name = "S3RoleForDataFirehose-bedrock-kb-logs-${var.kb_id}"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "firehose.amazonaws.com"
}
Condition = {
StringEquals = {
"sts:ExternalId" = "${local.account_id}"
}
}
}
]
})
}
resource "aws_iam_role_policy" "kb_logs_data_firehose" {
count = var.enable_kb_log_delivery_data_firehose ? 1 : 0
name = "S3PolicyForDataFirehose-bedrock-kb-logs-${var.kb_id}"
role = aws_iam_role.kb_logs_data_firehose[0].name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject"
]
Effect = "Allow"
Resource = [
aws_s3_bucket.kb_logs_data_firehose[0].arn,
"${aws_s3_bucket.kb_logs_data_firehose[0].arn}/*"
]
}
]
})
}
resource "aws_kinesis_firehose_delivery_stream" "kb_logs" {
count = var.enable_kb_log_delivery_data_firehose ? 1 : 0
name = "bedrock-kb-logs-${var.kb_id}"
destination = "extended_s3"
extended_s3_configuration {
role_arn = aws_iam_role.kb_logs_data_firehose[0].arn
bucket_arn = aws_s3_bucket.kb_logs_data_firehose[0].arn
}
tags = {
"LogDeliveryEnabled" = "true"
}
depends_on = [aws_iam_role_policy.kb_logs_data_firehose]
}
resource "aws_cloudwatch_log_delivery_destination" "kb_logs_data_firehose" {
count = var.enable_kb_log_delivery_data_firehose ? 1 : 0
name = "bedrock-kb-${var.kb_id}-data-firehose"
delivery_destination_configuration {
destination_resource_arn = aws_kinesis_firehose_delivery_stream.kb_logs[0].arn
}
}
resource "aws_cloudwatch_log_delivery" "kb_logs_data_firehose" {
count = var.enable_kb_log_delivery_data_firehose ? 1 : 0
delivery_destination_arn = aws_cloudwatch_log_delivery_destination.kb_logs_data_firehose[0].arn
delivery_source_name = aws_cloudwatch_log_delivery_source.kb_logs[0].name
depends_on = [aws_cloudwatch_log_delivery.kb_logs_s3]
}
Testing the Configuration
✅ You can find the complete Terraform configuration and source code in the `4_kb_logging` directory in this GitHub repository.
To deploy and test the configuration, you need a knowledge base with at least one data source that has content to ingest, either from an S3 bucket or from a crawlable website. You can set this up in the Bedrock console using the vector database quick start options. Alternatively, deploy a sample knowledge base using the Terraform configuration from my blog post How To Manage an Amazon Bedrock Knowledge Base Using Terraform. This configuration is also available in the same GitHub repository under the `2_knowledge_base` directory.
With the prerequisites in place, deploy the solution as follows:
1. From the root of the cloned GitHub repository, navigate to `4_kb_logging`.
2. Copy `terraform.tfvars.example` as `terraform.tfvars` and update the variables to match your configuration (see the example after this list).
   * All log delivery destinations are enabled in `terraform.tfvars.example`; however, only delivery to CloudWatch Logs is enabled by default in the variable definitions.
3. Configure your AWS credentials.
4. Run `terraform init` and `terraform apply -var-file terraform.tfvars`.
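For illustration, a `terraform.tfvars` along these lines enables all three destinations (the knowledge base ID is a placeholder):

kb_id                                  = "ABCDEFGHIJ" # replace with your knowledge base ID
enable_kb_log_delivery_cloudwatch_logs = true
enable_kb_log_delivery_s3              = true
enable_kb_log_delivery_data_firehose   = true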
Once the configuration is applied, you can open the target knowledge base in the Amazon Bedrock console and click Edit in the Knowledge Base overview section to review the logging configuration:
Assuming all three log destinations are enabled, it should look something like the following:
⚠ While working on this blog post, I encountered an issue where the log deliveries section does not load and shows a spinner indefinitely if multi-session support is enabled in the AWS Management Console. Disabling the feature works around the problem. I have opened an AWS support case for this issue, which I hope will be fixed soon.
For good measure, we can perform a task with the knowledge base that generates application logs and confirm that the logs are being delivered. At the time of writing, Bedrock Knowledge Bases only generates logs for ingestion job events, so we can trigger a sync of a data source in the knowledge base. The log group should have logs similar to the following:
Next, the S3 bucket should have logs similar to the following:
Lastly, the destination of the Firehose delivery stream, which in our case is another S3 bucket, should have logs similar to the following:
If you don’t need the resources after testing, be sure to delete them to avoid unexpected costs.
Summary
In this blog post, we examined how logging works for Amazon Bedrock Knowledge Bases, which uses the log delivery feature in CloudWatch Logs. We created and tested a Terraform configuration that demonstrates knowledge base log delivery to all three supported destinations: CloudWatch Logs, S3, and Data Firehose. You can also repurpose the Terraform configuration for other AWS services that use the log delivery mechanism with minimal changes, should you have a need.
At this point, we have the know-how to write ingestion logs to CloudWatch Logs, so we can update the data ingestion solution I previously wrote about to improve how ingestion job notifications are triggered. Please stay tuned for my next blog post on this topic. Thanks for reading, as always, and be sure to check out the Avangards Blog for more AWS and Terraform content.