Introduction
- Kubernetes scaling is an art that requires precision, especially when working with event-driven workloads. I recently fine-tuned an AWS EKS workload using KEDA (Kubernetes Event-Driven Autoscaling). KEDA provides two primary scaling mechanisms: ScaledObject and ScaledJob.
- ScaledObject dynamically adjusts pod replicas for Deployments or StatefulSets based on metrics like SQS queue length.
- ScaledJob is ideal for short-lived tasks, spinning up Jobs that process a single message and then terminate.
What situation did we encounter?
- Initially, we faced challenges in achieving precise autoscaling for an event-driven SQS message processing workload on AWS EKS. The key requirements were:
- A minimum of one pod should always be running.
- If there were five or fewer messages, they should be processed by a single pod.
- Scaling should start when messages exceeded five.
- We explored KEDA’s ScaledJob, but its Jobs terminated after processing individual messages. Next, we tried KEDA’s built-in SQS trigger, which lacked precise control, specifically:
- It scaled down too soon, ignoring in-flight messages.
- It didn’t allow fine-tuned scaling logic based on both visible and in-flight messages.
- For my use case—an SQS message processing application—ScaledObject was the right choice, keeping pods alive instead of terminating them.
Why did we take this approach?
- To overcome these limitations, we implemented a custom scaler using Flask and Prometheus, allowing us to:
- Continuously monitor both visible and in-flight SQS messages (a small query sketch follows this list).
- Maintain pod count until all messages were processed.
- Dynamically scale based on real-time queue depth.
- Prevent premature scale-downs using a stateful tracking mechanism.
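- As a rough sketch of that monitoring step, the snippet below reads both counts from SQS with boto3; the queue URL and region are placeholders rather than our actual configuration.

```python
import boto3

# Placeholder queue URL and region, for illustration only.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
sqs = boto3.client("sqs", region_name="us-east-1")

def get_queue_depth():
    """Return (visible, in_flight) message counts for the queue."""
    resp = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # visible messages
            "ApproximateNumberOfMessagesNotVisible",  # in-flight messages
        ],
    )
    attrs = resp["Attributes"]
    return int(attrs["ApproximateNumberOfMessages"]), int(attrs["ApproximateNumberOfMessagesNotVisible"])
```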
Scaling Requirement: Precise Autoscaling with SQS Messages
- Target Scaling Behavior (a small code sketch of this mapping follows the list)
- Scale-up Logic:
- 1 pod for 0–5 messages
- 2 pods for 6 messages
- 3 pods for 7 messages
- 10 pods for 14+ messages
- Scale-down Logic:
- Maintain current pod count until both visible + in-flight messages hit zero.
- After 120 seconds of inactivity, scale down to 1 pod.
- To reduce costs, these pods were deployed on AWS Spot Instances, ensuring resilience while taking advantage of lower pricing.
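- To make the scale-up mapping concrete, here is a minimal sketch of the rule as described above (1 pod up to 5 messages, then one extra pod per additional message, capped at 10), with a few sample values:

```python
def target_replicas(total_messages: int) -> int:
    """Target pod count for a given total of visible + in-flight messages."""
    if total_messages <= 5:
        return 1
    return min(total_messages - 4, 10)

# 0-5 messages -> 1 pod, 6 -> 2, 7 -> 3, 14+ -> 10
for n in (0, 5, 6, 7, 14, 50):
    print(n, target_replicas(n))
```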
Initial Approach: Using KEDA's Built-in SQS Trigger
- KEDA provides an aws-sqs-queue trigger to auto-scale based on queue length; a small sketch of how these parameters combine follows this list:
- queueLength: Defines how many messages in the queue correspond to one pod.
- Formula: Desired Pods ≈ ceil(Total Messages / queueLength)
- targetValue: The threshold metric value at which scaling occurs (e.g., desired messages per pod).
- Formula: Desired Pods ≈ ceil(Total Messages / targetValue)
- activationQueueLength: Minimum number of messages required in the queue before scaling starts.
- Formula: Scaling triggers if Total Messages ≥ activationQueueLength
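- A rough sketch of how these parameters combine, assuming the usual HPA-style ceiling division; the parameter values here are illustrative, not our production settings:

```python
import math

def desired_pods(total_messages: int, queue_length: int,
                 activation_queue_length: int = 0, max_replicas: int = 10) -> int:
    """Illustrative replica count for a queueLength-style trigger."""
    if total_messages < activation_queue_length:
        # Below the activation threshold the workload stays at its minimum;
        # modelled here simply as 1 (minReplicaCount).
        return 1
    # Roughly one pod per `queue_length` messages, rounded up and capped.
    return min(math.ceil(total_messages / queue_length), max_replicas)

# e.g. queueLength=5: 12 messages -> ceil(12 / 5) = 3 pods
print(desired_pods(12, queue_length=5, activation_queue_length=5))
```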
| Aspect | ScaledObject | ScaledJob |
|---|---|---|
| Workload | Long-running (e.g., Deployments) | Short-lived (e.g., Jobs) |
| Scaling | Adjusts pod replicas (0 to N) | Creates new Job instances |
| Use Case | Continuous services (e.g., web apps) | One-off tasks (e.g., batch processing) |
| Execution | Pods stay active, process multiple events | Jobs run once per event, then terminate |
| Concurrency | Multiple pods run in parallel | Multiple Jobs run independently |
Custom Scaler Solution: Achieving Precision
- To overcome these limitations, I built a Flask-based custom scaler that exposes a Prometheus-compatible query endpoint at /api/v1/query. The logic ensures pods remain active until both visible and in-flight messages reach zero; a sketch of the query endpoint itself follows the key-features list below.
- Python Code:
```python
from flask import Flask

app = Flask(__name__)

# Last replica count we reported; used to prevent premature scale-down.
last_replicas = 1

def calculate_replicas(visible, not_visible):
    global last_replicas
    total_messages = visible + not_visible
    app.logger.info(f"Visible: {visible}, NotVisible: {not_visible}, Total: {total_messages}")

    if total_messages == 0:
        # Queue fully drained: reset to the single baseline pod.
        app.logger.info("Queue empty, returning: 1")
        last_replicas = 1
        return 1

    if total_messages <= 5:
        desired_replicas = 1
        app.logger.info(f"Total <= 5, desired replicas: {desired_replicas}")
    else:
        # One extra pod per message beyond 5, capped at 10.
        desired_replicas = min(total_messages - 4, 10)
        app.logger.info(f"Total > 5, calculated desired replicas: {desired_replicas}")

    if desired_replicas > last_replicas:
        last_replicas = desired_replicas
        app.logger.info(f"Scaling up, updating last_replicas to: {last_replicas}")
    else:
        app.logger.info(f"Maintaining last_replicas at: {last_replicas} (no scale-down until queue is empty)")
    return last_replicas
```
- Key Features:
- Maintains pod count until the queue is fully processed.
- Prevents premature scale-down using last_replicas.
- Scales dynamically based on visible + in-flight messages.
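- For context, here is a minimal sketch of how the query endpoint can be wired up, continuing the block above. It assumes the Flask `app` and `calculate_replicas` from that block, plus the hypothetical `get_queue_depth` helper sketched earlier; the real service may differ in its details.

```python
import time
from flask import jsonify

@app.route("/api/v1/query")
def query():
    # Read the current queue depth and run it through calculate_replicas,
    # so the metric KEDA sees is the desired replica count itself.
    visible, in_flight = get_queue_depth()
    replicas = calculate_replicas(visible, in_flight)

    # Minimal Prometheus HTTP API response shape, which is what KEDA's
    # prometheus trigger expects back from /api/v1/query.
    return jsonify({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{
                "metric": {"__name__": "queue_messages"},
                "value": [time.time(), str(replicas)],
            }],
        },
    })
```

- Because the value returned is already the desired replica count, the threshold of "1" in the ScaledObject below maps the metric one-to-one onto pods.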
Deployment: Integrating with KEDA ScaledObject
- To use this custom scaler in KEDA, we configured a ScaledObject pointing to the Flask service:
- KEDA Configuration:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: custom-sqs-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: app
  pollingInterval: 10   # Every 10 seconds
  cooldownPeriod: 120   # Wait 120s before scaling down
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: "http://custom-sqs-scaler.non-prod.svc.cluster.local:8080"
        metricName: queue_messages
        query: "queue_messages"
        threshold: "1"
```
- Why This Works (a quick verification sketch follows this list):
- The Prometheus trigger queries the Flask app, which computes the metric from the real-time SQS message counts.
- The custom logic ensures pods remain active until all messages are processed.
- Scaling down is prevented until the queue is empty.
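- One way to sanity-check the wiring is to query the scaler endpoint the same way the prometheus trigger does and read the value back; a rough sketch, using the in-cluster address from the ScaledObject above:

```python
import requests

# Same address the ScaledObject's prometheus trigger points at.
SCALER_URL = "http://custom-sqs-scaler.non-prod.svc.cluster.local:8080/api/v1/query"

resp = requests.get(SCALER_URL, params={"query": "queue_messages"}, timeout=5)
resp.raise_for_status()

# Prometheus vector results carry the value as [timestamp, "value"].
value = resp.json()["data"]["result"][0]["value"][1]
print(f"Metric currently reported to KEDA: {value}")
```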
Results: Efficient, Cost-Effective Scaling
- Here’s a log excerpt showing the scaler correctly handling a 12-message workload:
- Log Output:
```
INFO:scaler:Visible: 2, In-flight: 10, Total: 12
INFO:scaler:Total > 5, calculated desired replicas: 8
INFO:scaler:Maintaining last_replicas at: 10 (no scale-down until queue is empty)
```
- Key Takeaways:
- Maintained optimal pod count dynamically.
- Ensured cost-effective scaling with spot instances.
- Eliminated premature scale-down issues.
How has this helped?
- By using a Prometheus-based custom scaler, we achieved:
- Full control over scaling behavior: Pods scale up accurately based on total messages (visible + in-flight).
- No premature scale-down: The last known pod count is maintained until the queue is fully processed.
- Cost-effective scaling: By deploying on AWS Spot Instances, we optimized costs while ensuring workload resilience.
- Seamless workload management: The system efficiently handled varying message loads without delays or bottlenecks.
Final Thoughts
- KEDA’s built-in triggers provide a great starting point but can fall short in handling complex scaling scenarios. By implementing a custom scaler, we achieved:
- Precision scaling based on visible + in-flight messages.
- No premature scale-down, ensuring messages don’t pile up.
- Cost-optimized scaling with AWS Spot Instances.
- For workloads requiring precise autoscaling, implementing a Prometheus-based custom scaler significantly enhances efficiency and cost optimization.