When it comes to autoscaling in Kubernetes, there are several methods available, but one that stands out for handling HTTP-based workloads is the KEDA HTTP Add-on. Unlike event sources such as queues or message brokers, HTTP traffic poses unique autoscaling challenges. In this article, we'll dive into the KEDA HTTP Add-on, its architecture, and how it differs from traditional autoscaling methods.
What Makes KEDA HTTP Add-on Different?
The KEDA HTTP Add-on is less conventional than KEDA's other event sources. Here's why:
Unpredictable Traffic: With Kafka or other event-based sources, we can simply query the broker for queue length or message rate. HTTP offers no such API; there is nothing to call to find out how much traffic is coming, so the scaling criteria aren't clear in advance.
Synchronous Nature: HTTP traffic is synchronous by nature: a client is waiting on every request, so it must be handled in real time. That makes scaling to zero (running no pods at all) more complex. To address this, an intermediate routing layer is required to hold incoming HTTP requests until new pods are scaled up and ready to serve them.
The core Kubernetes autoscalers, such as the Horizontal Pod Autoscaler (HPA), rely on readily observable signals like CPU, memory, or custom metrics, and by default they cannot scale a workload down to zero. For scale-to-zero HTTP applications, the KEDA HTTP Add-on takes a different approach.
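For contrast, here is a minimal sketch of a conventional CPU-based HPA (the my-app names are placeholders). Note the floor of one replica:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                  # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1                # a plain HPA cannot idle a workload at zero replicas
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```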
Architecture of the KEDA HTTP Add-on.
The Key Components of KEDA HTTP Add-on Architecture
The KEDA HTTP Add-on architecture involves several components that work together to ensure your HTTP service can scale efficiently. Here’s a breakdown of the key components:
Interceptor:
The Interceptor is the first stop for incoming HTTP requests. It accepts each request and places it in a pending-request queue while checking whether the backend service has replicas ready to handle the load.
Scaling to Zero
If the service is scaled down to zero replicas, the Interceptor will hold the incoming HTTP requests until new instances of the backend service are ready.
Once the backend service scales up, the Interceptor forwards the requests to the appropriate service.
External Scaler:
The External Scaler is a component that regularly polls the Interceptor for the number of pending HTTP requests.
It then pushes this data to KEDA, which evaluates it and triggers the autoscaling actions. Essentially, the External Scaler is the push mechanism that tells KEDA how much HTTP traffic is waiting.
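Under the hood, this uses KEDA's external-push trigger type. The sketch below approximates the ScaledObject that the add-on manages on your behalf; you don't write this yourself, and the name and scaler address shown are illustrative assumptions based on the Helm chart defaults:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-http            # illustrative; the operator generates this object
spec:
  scaleTargetRef:
    name: my-app               # the Deployment to scale
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: external-push      # KEDA's push-based external scaler protocol
      metadata:
        scalerAddress: keda-add-ons-http-external-scaler.keda:9090  # assumed default address
```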
Operator:
The HTTP Operator is responsible for managing HTTPScaledObject resources (instances of a Custom Resource Definition, or CRD). It watches for their creation and configures the necessary components (such as routing through the Interceptor and registration with the External Scaler) to enable autoscaling based on HTTP request traffic.
The Operator makes the whole autoscaling process easier for the user by automating the setup and configuration of the necessary components.
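In practice, the HTTPScaledObject is usually the only resource you write yourself. Below is a minimal sketch; the exact spec fields (scaleTargetRef in particular, and targetPendingRequests) have changed across add-on versions, so treat this as illustrative and check the docs for the version you install:

```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: my-app                 # placeholder name
spec:
  hosts:
    - myapp.example.com        # Host header(s) the Interceptor routes on
  scaleTargetRef:
    deployment: my-app         # field names here vary between add-on versions
    service: my-app            # the Service fronting the pods
    port: 8080
  replicas:
    min: 0                     # enables scale-to-zero
    max: 10
  targetPendingRequests: 100   # queued requests per replica before scaling out
```

Once this object exists, the Operator wires up the Interceptor routing and the generated ScaledObject shown earlier automatically.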
The Flow of HTTP Requests in KEDA HTTP Add-on
Here’s a simplified flow of how the components interact with each other:
Load Balancer:
The Load Balancer makes the application available to the outside world by routing incoming HTTP requests to the appropriate service.
Request Handling:
The Load Balancer sends incoming requests not directly to the application's own Kubernetes Service but to the Interceptor's proxy Service, which sits in front of it (see the Ingress sketch after this list).
Interceptor:
The Interceptor temporarily holds requests when no backend pods are running to handle them. It also exposes pending-request metrics that the External Scaler reads.
Scaling Decision:
The External Scaler polls the Interceptor for pending-queue metrics and pushes the data to KEDA, which then evaluates whether to scale the backend service up or down.
Scaling Action:
If the traffic is high, KEDA triggers the creation of new pods for the service to handle the load, and once scaled up, the Interceptor forwards the requests to the new pods.
If the traffic reduces, KEDA scales the service down accordingly, even scaling it to zero if necessary.
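To make the routing concrete, here is a hedged sketch of an Ingress that sends external traffic through the Interceptor. The Service name, namespace, and port shown are the official Helm chart defaults as I understand them; verify them against your installation:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app                 # placeholder name
  namespace: keda              # an Ingress can only target Services in its own namespace
spec:
  rules:
    - host: myapp.example.com  # must match a host listed in the HTTPScaledObject
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keda-add-ons-http-interceptor-proxy  # assumed Helm chart default
                port:
                  number: 8080
```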
Key Benefits of the KEDA HTTP Add-on
Scale-to-Zero Support: One of the biggest benefits of using the KEDA HTTP Add-on is the ability to scale applications to zero. This means that during periods of low or no traffic, you can save resources by having no running pods while still being able to handle new traffic as soon as it arrives.
Autoscaling Based on HTTP Requests: With the KEDA HTTP Add-on, scaling decisions are based on HTTP request traffic, ensuring that your backend service is dynamically scaled to meet real-time demand without any manual intervention.
Efficient Traffic Handling: The Interceptor holds requests during scale-up instead of rejecting them, so users see a brief delay rather than dropped requests.
Challenges and Limitations
While the KEDA HTTP Add-on is a powerful tool, there are a few challenges you might encounter:
Complex Setup: Setting up the entire KEDA HTTP Add-on system can be more complex than standard Kubernetes autoscaling. It requires configuring multiple components like the Interceptor, External Scaler, and HTTPScaledObject, which might be tricky, especially for beginners.
Cold Starts at Scale: While the system scales effectively, you may see delays during sharp traffic spikes, especially when scaling from zero, because held requests must wait for new pods to become ready. Proper tuning of scaling parameters and careful management of the request queue are crucial (see the sketch after this list).
Compatibility: The KEDA HTTP Add-on works best when combined with KEDA’s existing autoscaling functionality, but when used alongside other scaling mechanisms (e.g., HPA), it may require extra configuration to avoid conflicts.
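As an example of the knobs involved, an HTTPScaledObject exposes fields along these lines for tuning spike behavior (names and defaults vary by add-on version, so treat this excerpt as a sketch and confirm against your CRD):

```yaml
# Excerpt from an HTTPScaledObject spec (illustrative; verify against your add-on version)
spec:
  replicas:
    min: 0
    max: 20
  targetPendingRequests: 100   # queue depth per replica that triggers scale-out
  scaledownPeriod: 300         # seconds of quiet before scaling back down
```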
Conclusion
The KEDA HTTP Add-on provides a smart solution for scaling HTTP-based applications in Kubernetes, addressing challenges like unpredictable traffic and scale-to-zero requirements. By introducing components like the Interceptor, External Scaler, and Operator, KEDA makes it easier to autoscale services dynamically based on HTTP traffic, ensuring that your application is always responsive to user demand.
By leveraging KEDA’s unique autoscaling capabilities, you can efficiently manage your Kubernetes workloads, reduce resource wastage during idle periods, and scale your services seamlessly without manual intervention.