Sidra Saleem for SUDO Consultants

Posted on • Originally published at sudoconsultants.com

Redshift Serverless: Cost Optimization with Workload Management (WLM)

Amazon Redshift Serverless is a fully managed, petabyte-scale data warehouse service that lets you run analytics without managing infrastructure: AWS provisions and scales the underlying compute while you focus on your data. One of the key levers for optimizing its cost is Workload Management (WLM). On provisioned Redshift clusters you tune WLM queues directly; Redshift Serverless applies the same queueing principles automatically and instead exposes capacity settings measured in Redshift Processing Units (RPUs). In this article, we will explore how to tune query queues, route and monitor workloads, and manage scaling for BI dashboards.

Understanding Redshift Serverless and Workload Management (WLM)

What is Redshift Serverless?

Redshift Serverless is a deployment option for Amazon Redshift that automatically provisions and scales compute resources based on the workload. It eliminates cluster management, and you pay only for the compute your queries actually consume, metered per second in RPU-hours. This makes it an ideal choice for organizations with variable workloads or those that want to reduce the operational overhead of running a data warehouse.
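
As a concrete example of the billing model: if a burst of dashboard queries holds a workgroup at 64 RPUs for 90 seconds, that is 64 × 90 = 5,760 RPU-seconds, or 1.6 RPU-hours, billed at the per-RPU-hour rate for your AWS Region. Idle time between queries costs nothing, which is where the savings over an always-on cluster come from.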

What is Workload Management (WLM)?

Workload Management (WLM) in Amazon Redshift lets you manage and prioritize queries based on their importance and resource requirements. WLM enables you to create query queues, route queries to those queues, and set concurrency limits for each queue, so that critical queries get the resources they need while less important queries cannot monopolize the system. On provisioned clusters you configure WLM yourself; Redshift Serverless manages queueing and prioritization automatically, but the same routing concepts (query groups and labels) still apply and underpin the tuning steps below.

Tuning Query Queues for Cost Optimization

Step 1: Analyze Your Workload

Before you can tune your query queues, you need to understand your workload: which types of queries run, how often, and what resources they consume. The sys_query_history monitoring view, available on both Redshift Serverless and provisioned clusters, is the place to start:

SELECT
    query_id,
    user_id,
    query_type,
    start_time,
    end_time,
    elapsed_time,
    queue_time,
    execution_time,
    returned_rows
FROM
    sys_query_history
ORDER BY
    start_time DESC
LIMIT 100;

This query returns one row per recent query, including when it started and ended, how long it spent queued versus executing (in microseconds), and how many rows it returned.
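
To see the workload mix at a glance, aggregate the same view by query type; a minimal sketch (the seven-day window is an arbitrary choice):

SELECT
    query_type,
    COUNT(*) AS num_queries,
    AVG(execution_time) / 1000000.0 AS avg_exec_seconds,
    SUM(elapsed_time) / 1000000.0 AS total_elapsed_seconds
FROM
    sys_query_history
WHERE
    start_time > DATEADD(day, -7, GETDATE())
GROUP BY
    query_type
ORDER BY
    total_elapsed_seconds DESC;

The query types that dominate total elapsed time are the first candidates for their own queue.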

Step 2: Create Query Queues

Once you have analyzed your workload, you can create query queues based on the types of queries you want to prioritize. For example, you might want to create separate queues for ETL jobs, BI dashboards, and ad-hoc queries.

Queues are defined not with SQL but in the WLM JSON configuration (the wlm_json_configuration parameter) of the cluster's parameter group on provisioned Redshift; Redshift Serverless manages its queues automatically, so treat this as the manual-WLM counterpart of what Serverless does for you. A configuration with three user-defined queues keyed on query groups looks like this:

[
    { "query_group": ["etl"] },
    { "query_group": ["bi"] },
    { "query_group": ["adhoc"] },
    {}
]

This defines queues for the etl, bi, and adhoc query groups. The final empty entry is the required default queue, which catches any query that does not set a query group.

Step 3: Assign Queries to Queues

After creating the queues, you route queries to them by setting the query_group session parameter before the query runs. For example:

SET query_group TO 'etl';
-- Your ETL query here

SET query_group TO 'bi';
-- Your BI dashboard query here

SET query_group TO 'adhoc';
-- Your ad-hoc query here

By setting the query_group parameter, you ensure that the query lands in the correct queue. The setting applies to every query that follows in the session, so reset it when a job step finishes, as in the sketch below.
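
In practice you set the group once per session or job step and reset it afterward. A minimal end-to-end sketch for an ETL session, with a hypothetical table, bucket, and IAM role:

-- Route this session's queries to the ETL queue
SET query_group TO 'etl';

-- Hypothetical load step, for illustration only
COPY sales_staging
FROM 's3://example-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT AS PARQUET;

-- Return subsequent queries to the default queue
RESET query_group;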

Step 4: Set Concurrency Limits

To prevent one queue from monopolizing resources, you set a concurrency level for each queue, which caps how many of its queries run at once. Concurrency is the query_concurrency field in the same WLM JSON:

[
    { "query_group": ["etl"],   "query_concurrency": 5 },
    { "query_group": ["bi"],    "query_concurrency": 10 },
    { "query_group": ["adhoc"], "query_concurrency": 2 },
    { "query_concurrency": 5 }
]

With these settings, no more than 5 ETL queries, 10 BI dashboard queries, and 2 ad-hoc queries can run concurrently; excess queries wait in their queue. Because the slots share the cluster's memory, higher concurrency also means less memory per query, so resist the urge to simply raise every limit.
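
On a provisioned cluster, you apply the configuration through the parameter group; a sketch, assuming a parameter group named my-wlm-params and the JSON above saved (in the CLI's ParameterName/ParameterValue format) as wlm-params.json:

# wlm-params.json wraps the WLM JSON shown above, e.g.:
# [{"ParameterName": "wlm_json_configuration",
#   "ParameterValue": "[{\"query_group\":[\"etl\"],\"query_concurrency\":5}, ...]"}]
aws redshift modify-cluster-parameter-group \
    --parameter-group-name my-wlm-params \
    --parameters file://wlm-params.json

Dynamic WLM properties such as concurrency take effect without a reboot; structural changes such as adding queues require one.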

Step 5: Monitor and Adjust

After setting up your query queues and concurrency limits, monitor query performance and adjust the settings as needed. On a provisioned cluster, the stl_wlm_query view shows per-queue behavior (user-defined queues appear as service_class 6 and above):

SELECT
    service_class,
    COUNT(*) AS num_queries,
    AVG(total_queue_time) / 1000000.0 AS avg_queue_seconds,
    AVG(total_exec_time) / 1000000.0 AS avg_exec_seconds,
    MAX(total_exec_time) / 1000000.0 AS max_exec_seconds
FROM
    stl_wlm_query
WHERE
    service_class > 5
GROUP BY
    service_class
ORDER BY
    service_class;

This query shows how many queries ran in each queue and how long they spent queued versus executing. Long queue times with short executions suggest a queue needs more slots; the reverse suggests queries are starved for resources. Based on this information, adjust the concurrency limits or create additional queues as needed.
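
Redshift Serverless does not expose the STL views, but sys_query_history gives the same picture; a minimal sketch grouping by query_label, which records the label set via SET query_group:

SELECT
    query_label,
    COUNT(*) AS num_queries,
    AVG(queue_time) / 1000000.0 AS avg_queue_seconds,
    AVG(execution_time) / 1000000.0 AS avg_exec_seconds
FROM
    sys_query_history
GROUP BY
    query_label
ORDER BY
    num_queries DESC;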

Automating Scaling for BI Dashboards

Step 1: Understand How Serverless Scaling Works

Redshift Serverless scales compute automatically, adding and removing RPUs in response to the workload; there is no separate scaling policy to create. What you control is the range it scales within: the base capacity (the RPUs a workload starts with) and the maximum capacity (the ceiling it can scale out to). Tuning that range for your BI dashboards is how you trade performance against cost.
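
Before changing anything, check the workgroup's current capacity settings; a sketch, assuming a workgroup named my-workgroup:

aws redshift-serverless get-workgroup \
    --workgroup-name my-workgroup

The response includes the workgroup's baseCapacity and maxCapacity (both in RPUs) alongside its status.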

Step 2: Configure Base and Maximum Capacity

You can set the capacity range using the AWS Management Console or the AWS CLI. With the CLI:

aws redshift-serverless update-workgroup \
    --workgroup-name my-workgroup \
    --base-capacity 32 \
    --max-capacity 128

In this example, the workgroup starts workloads at 32 RPUs and may scale out to 128 RPUs under load. A lower base keeps quiet periods cheap, while the maximum caps how much a burst of dashboard traffic can spend. For a hard cost guardrail on top of the capacity range, add a usage limit in RPU-hours; a sketch, with the account and workgroup ID in the ARN as placeholders:

aws redshift-serverless create-usage-limit \
    --resource-arn arn:aws:redshift-serverless:us-east-1:123456789012:workgroup/<your-workgroup-id> \
    --usage-type serverless-compute \
    --amount 200 \
    --period daily \
    --breach-action log

This caps compute at 200 RPU-hours per day. With breach-action log, exceeding the limit only records an event; stricter actions are available if you would rather have the workgroup stop accepting new user queries once the budget is spent.

Step 3: Monitor Scaling Activity

After adjusting the capacity range, monitor how the workgroup actually scales. Redshift Serverless publishes CloudWatch metrics, including ComputeCapacity (the RPUs in use); a sketch, assuming the AWS/Redshift-Serverless namespace and a Workgroup dimension:

aws cloudwatch get-metric-statistics \
    --namespace AWS/Redshift-Serverless \
    --metric-name ComputeCapacity \
    --dimensions Name=Workgroup,Value=my-workgroup \
    --start-time 2025-06-01T00:00:00Z \
    --end-time 2025-06-02T00:00:00Z \
    --period 300 \
    --statistics Average Maximum

This returns the average and peak RPUs in five-minute windows. A workgroup pinned near its maximum suggests dashboards are being throttled; one that never leaves its base suggests you can lower it.

Step 4: Adjust Capacity as Needed

Based on the observed scaling behavior, you may need to adjust the range: if dashboards are slow at peak, raise the maximum; if the workgroup rarely leaves its base, lower the base. Apply changes by re-running update-workgroup:

aws redshift-serverless update-workgroup \
    --workgroup-name my-workgroup \
    --base-capacity 16 \
    --max-capacity 128

In this example, the base capacity drops to 16 RPUs while the 128 RPU ceiling is kept, so quiet periods cost less without capping peak performance.
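
Usage limits can be tuned the same way; a sketch, with the usage limit ID as a placeholder:

aws redshift-serverless update-usage-limit \
    --usage-limit-id <your-usage-limit-id> \
    --amount 300

This raises the daily compute cap created earlier from 200 to 300 RPU-hours.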

Real-Life Implementation and Case Studies

Case Study 1: Optimizing BI Dashboard Performance for a Retail Company

A retail company was using Amazon Redshift Serverless to power their BI dashboards but experienced performance issues during peak hours, as the dashboards competed for resources with other workloads. By routing BI dashboards, ETL jobs, and ad-hoc queries through separate query groups, they could see exactly which workload was queueing and schedule the heavy ETL jobs away from dashboard peaks. They also widened the workgroup's RPU capacity range, letting Redshift Serverless scale up during peak hours and fall back to a small base capacity off-peak. As a result, the company saw a significant improvement in dashboard performance and a reduction in costs.

Case Study 2: Automating Scaling for a SaaS Company

A SaaS company was using Amazon Redshift Serverless to analyze customer usage data. Their workload was highly variable, with spikes in query activity at certain times of day. To optimize costs, they tuned the workgroup's base and maximum RPU capacity so that Redshift Serverless scaled up automatically during peak hours and dropped back to a low base off-peak, and they added a daily RPU-hour usage limit as a guardrail. This resulted in a 30% reduction in costs while maintaining performance during the spikes.

Conclusion

Amazon Redshift Serverless offers a powerful and cost-effective way to run analytics workloads. By combining WLM-style workload routing with a well-chosen RPU capacity range and usage limits, you can optimize costs while ensuring that your queries get the resources they need. In this article, we explored how to analyze a workload, tune query queues and concurrency limits, and manage scaling for BI dashboards, and we looked at case studies that illustrate the benefits of these techniques. Following these steps will help you reach a good balance of performance and cost for your Redshift Serverless workloads.
