April 25, 2025

Precision Scaling in Kubernetes with KEDA: ScaledObject vs. ScaledJob

Introduction

  • Kubernetes scaling is an art that requires precision, especially when working with event-driven workloads. I recently fine-tuned an AWS EKS workload using KEDA (Kubernetes Event-Driven Autoscaling). KEDA provides two primary scaling mechanisms: ScaledObject and ScaledJob.
  • ScaledObject dynamically adjusts pod replicas for Deployments or StatefulSets based on metrics like SQS queue length.
  • ScaledJob is ideal for short-lived tasks, spinning up Jobs to process a single message before terminating.

What situation did we encounter?

  • Initially, we faced challenges in achieving precise autoscaling for an event-driven SQS message processing workload on AWS EKS. The key requirements were:
    • A minimum of one pod should always be running.
    • If there were five or fewer messages, they should be processed by a single pod.
    • Scaling should start when messages exceeded five.
  • We explored KEDA’s ScaledJob, but it terminated pods after processing individual messages. Next, we tried KEDA’s built-in SQS trigger, which lacked the precise control we needed. Specifically:
  • It scaled down too soon, ignoring in-flight messages.
  • It didn’t allow fine-tuned scaling logic based on both visible and in-flight messages.
  • For our use case, an SQS message-processing application, ScaledObject was the right choice, keeping pods alive instead of terminating them.

Why did we take this approach?

  • To overcome these limitations, we implemented a custom scaler using Flask and Prometheus, allowing us to:
  • Continuously monitor both visible and in-flight SQS messages.
  • Maintain pod count until all messages were processed.
  • Dynamically scale based on real-time queue depth.
  • Prevent premature scale-downs using a stateful tracking mechanism.

Scaling Requirement: Precise Autoscaling with SQS Messages

  • Target Scaling Behavior
    • Scale-up Logic:
      • 1 pod for 0–5 messages
      • 2 pods for 6 messages
      • 3 pods for 7 messages
      • 10 pods for 14+ messages
    • Scale-down Logic:
      • Maintain current pod count until both visible + in-flight messages hit zero.
      • After 120 seconds of inactivity, scale down to 1 pod.
      • To reduce costs, these pods were deployed on AWS Spot Instances, ensuring resilience while taking advantage of lower pricing.
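    • Put differently, the target behavior above boils down to a single rule (derived from the list, with total = visible + in-flight messages): desired pods = min(max(1, total − 4), 10), resetting to 1 pod once the queue has stayed empty for the cooldown period.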

Initial Approach: Using KEDA's Built-in SQS Trigger

  • KEDA provides an aws-sqs-queue trigger to auto-scale based on queue length:
  • queueLength: Defines how many messages in the queue correspond to one pod.
    • Formula: Desired Pods = Total Messages / queueLength
  • targetValue: The threshold metric value at which scaling occurs (e.g., desired messages per pod).
    • Formula: Desired Pods = Total Messages / targetValue
  • activationQueueLength: Minimum number of messages required in the queue before scaling starts.
    • Formula: Scaling Triggers If Total Messages ≥ activationQueueLength
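  • For reference, a minimal ScaledObject using this built-in trigger might look like the sketch below; the queue URL, region, and names are placeholders rather than values from our setup, and authentication (e.g., a TriggerAuthentication) is omitted for brevity:
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: sqs-builtin-scaler             # illustrative name
    spec:
      scaleTargetRef:
        name: app                          # Deployment to scale
      minReplicaCount: 1
      maxReplicaCount: 10
      triggers:
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue   # placeholder
            queueLength: "5"               # target messages per pod
            activationQueueLength: "5"     # stay inactive below this count
            awsRegion: "us-east-1"         # placeholder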
  Aspect      | ScaledObject                                | ScaledJob
  ------------|---------------------------------------------|------------------------------------------
  Workload    | Long-running (e.g., Deployments)            | Short-lived (e.g., Jobs)
  Scaling     | Adjusts pod replicas (0 to N)               | Creates new Job instances
  Use Case    | Continuous services (e.g., web apps)        | One-off tasks (e.g., batch processing)
  Execution   | Pods stay active, process multiple events   | Jobs run once per event, then terminate
  Concurrency | Multiple pods run in parallel               | Multiple Jobs run independently
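  • For contrast, a ScaledJob for the same queue would launch one Job per message and let it exit. A minimal sketch, assuming a hypothetical worker image and a placeholder queue URL, might look like:
    apiVersion: keda.sh/v1alpha1
    kind: ScaledJob
    metadata:
      name: sqs-worker-job                 # illustrative name
    spec:
      jobTargetRef:
        template:
          spec:
            containers:
              - name: worker
                image: example/worker:latest   # hypothetical image
            restartPolicy: Never
      pollingInterval: 10
      maxReplicaCount: 10
      triggers:
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue   # placeholder
            queueLength: "1"               # one Job per message
            awsRegion: "us-east-1"         # placeholder
  • This is exactly the behavior that did not fit our workload: each Job terminates after its work, whereas we needed pods to stay alive until the queue drained.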

Custom Scaler Solution: Achieving Precision

  • To overcome this, we built a Flask-based custom scaler that exposes a Prometheus-compatible metric via an /api/v1/query endpoint. The logic ensures pods remain active until both visible and in-flight messages reach zero.
  • Python Code:
    from flask import Flask

    app = Flask(__name__)

    # Last replica count we reported; never lowered until the queue is empty.
    last_replicas = 1

    def calculate_replicas(visible, not_visible):
        global last_replicas
        total_messages = visible + not_visible
        app.logger.info(f"Visible: {visible}, NotVisible: {not_visible}, Total: {total_messages}")
        if total_messages == 0:
            app.logger.info("Queue empty, returning: 1")
            last_replicas = 1
            return 1
        else:
            if total_messages <= 5:
                desired_replicas = 1
                app.logger.info(f"Total <= 5, desired replicas: {desired_replicas}")
            else:
                # One pod per message above 5, capped at 10 pods.
                desired_replicas = min(total_messages - 4, 10)
                app.logger.info(f"Total > 5, calculated desired replicas: {desired_replicas}")
            if desired_replicas > last_replicas:
                last_replicas = desired_replicas
                app.logger.info(f"Scaling up, updating last_replicas to: {last_replicas}")
            else:
                app.logger.info(f"Maintaining last_replicas at: {last_replicas} (no scale-down until queue is empty)")
            return last_replicas
  • Key Features:
    • Maintains pod count until the queue is fully processed.
    • Prevents premature scale-down using last_replicas.
    • Scales dynamically based on visible + in-flight messages.
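  • To make this consumable by KEDA, the function is served behind a Prometheus-style instant-query endpoint. The sketch below, continuing the same module as the snippet above, shows one way to wire it up, assuming boto3 is used to read the queue attributes; the queue URL, region, and response shape shown here are illustrative, not the original code:
    import time
    import boto3
    from flask import jsonify

    # `app` and `calculate_replicas` are defined in the snippet above.
    sqs = boto3.client("sqs", region_name="us-east-1")  # assumed region
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

    @app.route("/api/v1/query")
    def query():
        # Read visible and in-flight counts from SQS.
        attrs = sqs.get_queue_attributes(
            QueueUrl=QUEUE_URL,
            AttributeNames=[
                "ApproximateNumberOfMessages",
                "ApproximateNumberOfMessagesNotVisible",
            ],
        )["Attributes"]
        visible = int(attrs["ApproximateNumberOfMessages"])
        not_visible = int(attrs["ApproximateNumberOfMessagesNotVisible"])
        replicas = calculate_replicas(visible, not_visible)
        # Respond in the Prometheus instant-query format that KEDA's
        # prometheus trigger expects to parse.
        return jsonify({
            "status": "success",
            "data": {
                "resultType": "vector",
                "result": [
                    {"metric": {"__name__": "queue_messages"},
                     "value": [time.time(), str(replicas)]},
                ],
            },
        })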

Deployment: Integrating with KEDA ScaledObject

  • To use this custom scaler in KEDA, we configured a ScaledObject pointing to the Flask service:
  • KEDA Configuration:
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: custom-sqs-scaler
    spec:
      scaleTargetRef:
        kind: Deployment
        name: app
      pollingInterval: 10       # Every 10 seconds
      cooldownPeriod: 120       # Wait 120s before scaling down
      minReplicaCount: 1
      maxReplicaCount: 10
      triggers:
        - type: prometheus
          metadata:
            serverAddress: "http://custom-sqs-scaler.non-prod.svc.cluster.local:8080"
            metricName: queue_messages
            query: "queue_messages"
            threshold: "1"
  • Why This Works:
    • The Prometheus trigger queries the Flask app for real-time SQS message count.
    • The custom logic ensures pods remain active until all messages are processed.
    • Scaling down is prevented until the queue is empty.
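  • A note on the numbers: KEDA hands the queried value to the Horizontal Pod Autoscaler, which roughly computes desired replicas = ceil(metric value / threshold). With threshold: "1", the value the Flask app returns maps one-to-one to pod count; for example, a returned value of 8 drives the Deployment to 8 pods, bounded by minReplicaCount and maxReplicaCount.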

Results: Efficient, Cost-Effective Scaling

  • Here’s proof that the scaler correctly handled a 12-message workload:
  • Log Output:
    INFO:scaler:Visible: 2, In-flight: 10, Total: 12
    INFO:scaler:Total > 5, calculated desired replicas: 8
    INFO:scaler:Maintaining last_replicas at: 10 (no scale-down until queue is empty)
  • Key Takeaways:
    • Maintained optimal pod count dynamically.
    • Ensured cost-effective scaling with spot instances.
    • Eliminated premature scale-down issues.

How has this helped?

  • By using a Prometheus-based custom scaler, we achieved:
  • Full control over scaling behavior: Pods scale up accurately based on total messages (visible + in-flight).
  • No premature scale-down: The last known pod count is maintained until the queue is fully processed.
  • Cost-effective scaling: By deploying on AWS Spot Instances, we optimized costs while ensuring workload resilience.
  • Seamless workload management: The system efficiently handled varying message loads without delays or bottlenecks.

Final Thoughts

  • KEDA’s built-in triggers provide a great starting point but can fall short in handling complex scaling scenarios. By implementing a custom scaler, we achieved:
  • Precision scaling based on visible + in-flight messages.
  • No premature scale-down, ensuring messages don’t pile up.
  • Cost-optimized scaling with AWS Spot Instances.
  • For workloads requiring precise autoscaling, implementing a Prometheus-based custom scaler significantly enhances efficiency and cost optimization.
