Introduction
It's 2 a.m. Your e-commerce platform just hit the front page of a major news site. Your SQS order queue skyrockets from 200 to 200,000 messages in minutes. Pods are overwhelmed. Customers see errors. Your on-call engineer is manually scaling deployments.
This is exactly the failure mode that event-driven scaling prevents. By combining KEDA (Kubernetes Event-Driven Autoscaler) and Karpenter, you can build a platform that reacts to demand automatically - scaling pods and nodes in seconds, then returning to zero when load disappears - all without human intervention.
Why Kubernetes HPA Falls Short for Event-Driven Workloads
Kubernetes' built-in Horizontal Pod Autoscaler (HPA) is often configured with CPU and memory metrics. While HPA does technically support custom and external metrics through the Kubernetes metrics API, wiring it up to event sources like SQS queue depth requires additional metrics adapters and careful configuration. Even then, a deeper problem remains: HPA reacts to metrics after the fact. For event-driven workloads, this creates a dangerous lag:
- An SQS queue floods with messages; pods start struggling
- CPU climbs - HPA detects the spike after a 15s scrape and sync cycle
- New pods are scheduled, but nodes are full - they sit Pending
- Cluster Autoscaler requests EC2 nodes - taking 3–5 minutes to arrive
- By then, the queue has grown by tens of thousands of messages
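For contrast, here is a minimal sketch of the CPU-driven HPA described above. The Deployment name and thresholds are illustrative; note that it only reacts after pods are already busy, and that minReplicas cannot be set to zero.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor          # Illustrative target Deployment
  minReplicas: 2                   # HPA cannot scale a workload to zero
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scales only after pods are already ~70% busy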
KEDA: Scale Pods on Real Events
KEDA is a CNCF Graduated project that extends Kubernetes to scale workloads based on external event sources - SQS, Kafka, Prometheus, DynamoDB Streams, and 70+ built-in scalers. It installs as a lightweight operator and works alongside your existing setup, connecting workloads directly to event sources without requiring custom metrics adapters.
KEDA introduces two core resources:
- ScaledObject - scales long-running Deployments/StatefulSets (APIs, background workers)
- ScaledJob - spawns individual Kubernetes Jobs per event (ETL, ML inference, video transcoding); a minimal sketch follows below
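To make the distinction concrete, here is a minimal ScaledJob sketch that launches one Kubernetes Job per queued message. The image and queue URL are placeholders, and the SQS trigger and TriggerAuthentication it references are covered later in this post.

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: video-transcode-scaledjob
  namespace: order-processing
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: transcoder
            image: registry.example.com/transcoder:latest   # Placeholder image
        restartPolicy: Never
    backoffLimit: 2
  pollingInterval: 30              # Check the event source every 30s
  maxReplicaCount: 20              # Never run more than 20 Jobs at once
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-auth        # TriggerAuthentication shown later in this post
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/transcode-jobs   # Placeholder queue
        queueLength: '1'           # One Job per message
        awsRegion: us-east-1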
Installing KEDA via Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
--namespace keda --create-namespace
Configuring AWS Authentication via TriggerAuthentication
The recommended approach for AWS authentication is a TriggerAuthentication resource using EKS Pod Identity or IRSA. The older identityOwner field on the scaler itself was deprecated in KEDA v2.13 and will be removed in v3, so avoid it in new deployments.
First, create a TriggerAuthentication that references your KEDA operator's IAM role via pod identity:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-auth
  namespace: order-processing
spec:
  podIdentity:
    provider: aws          # Uses EKS Pod Identity (recommended)
    # provider: aws-eks    # Use this if still on IRSA
ScaledObject Example: SQS Queue Depth
This scales an order-processing Deployment to maintain 10 messages per pod. With 500 messages, KEDA targets 50 pods - capped at maxReplicaCount.
Production note on in-flight messages: By default, KEDA's SQS scaler counts both ApproximateNumberOfMessages (queued) and ApproximateNumberOfMessagesNotVisible (in-flight / being processed). This means pods processing messages are included in the scaling calculation, which is usually the right behaviour. However, if your workers have long processing times or you see unexpected scale-down events mid-processing, tune scaleOnInFlight, your SQS visibility timeout, and your worker shutdown handling carefully - and ensure a Dead Letter Queue is configured to catch messages that fail repeatedly. A sketch of the worker-shutdown side follows the manifest below.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
  namespace: order-processing    # Must live in the same namespace as the TriggerAuthentication
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15            # Check queue every 15s
  cooldownPeriod: 60             # Wait 60s before scaling down
  minReplicaCount: 0             # Scale to zero when idle
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-auth      # References TriggerAuthentication above
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/orders
        queueLength: '10'        # Target messages-per-pod ratio
        awsRegion: us-east-1
        scaleOnInFlight: 'true'  # Default: true. Set false to exclude in-flight messages
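As the production note above mentions, the worker's shutdown behaviour matters once KEDA starts scaling pods down mid-drain. Here is a minimal sketch of the target Deployment. The image, grace period, preStop delay, and resource requests are assumptions to adapt to your per-message processing time and SQS visibility timeout, and the worker itself must stop receiving new messages and finish in-flight ones on SIGTERM.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  namespace: order-processing
spec:
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      terminationGracePeriodSeconds: 120     # Assumed: longer than one message's processing time
      containers:
        - name: worker
          image: registry.example.com/order-processor:latest   # Placeholder image
          lifecycle:
            preStop:
              exec:
                command: ['sh', '-c', 'sleep 10']   # Assumed: brief pause so in-flight receives complete
          resources:
            requests:
              cpu: 500m
              memory: 512Mi                  # Accurate requests help Karpenter right-size nodes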
Karpenter: Right-Sized Nodes, Right Now
When KEDA scales your pods, those pods need nodes to land on. Karpenter watches for Pending pods, then automatically provisions the optimal EC2 instance type to satisfy them - typically in under 60 seconds. It also continuously bin-packs workloads and terminates underutilized nodes.
Karpenter vs. Cluster Autoscaler
| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning Speed | 3–5+ minutes | Typically 30–60 seconds |
| Instance Selection | Pre-configured ASG groups | Dynamic - picks optimal type per workload |
| Spot Support | Manual node group setup | Native, single NodePool |
| Node Consolidation | Limited | Automatic bin-packing |
NodePool Configuration
The NodePool resource defines what Karpenter is allowed to provision. The example below targets the stable karpenter.sh/v1 API (available from Karpenter v1.0+) and configures a mixed Spot/On-Demand pool for batch workloads. Note that in v1 the consolidation policy is named WhenEmptyOrUnderutilized (renamed from WhenUnderutilized in v1beta1), and consolidateAfter must now be set alongside it.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch-workers
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: batch-workers
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot', 'on-demand']     # Spot-first, On-Demand fallback
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm', 'r']           # Compute, general-purpose, memory-optimized families
        - key: kubernetes.io/arch
          operator: In
          values: ['amd64', 'arm64']        # Support Graviton for savings
  limits:
    cpu: 1000                               # Safety cap on total cluster cost
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # v1 name (was WhenUnderutilized in v1beta1)
    consolidateAfter: 30s                   # Required in v1; set to 0s for immediate consolidation
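The nodeClassRef above points at an EC2NodeClass, which tells Karpenter which AMI, node IAM role, subnets, and security groups to use. A minimal sketch follows - the discovery tags, role name, and cost-allocation tag values are assumptions to replace with your own.

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: batch-workers
spec:
  amiSelectorTerms:
    - alias: al2023@latest                   # Latest Amazon Linux 2023 AMI for Karpenter nodes
  role: KarpenterNodeRole-my-cluster         # Assumed node IAM role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # Assumed discovery tag on your private subnets
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # Assumed discovery tag on your node security group
  tags:
    environment: production                  # Cost-allocation tags (see best practices below)
    team: order-platform
    cost-center: commerce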
End-to-End Architecture Flow
When an SQS burst hits, the full scale-up sequence - from event arrival to active pod processing - completes in roughly one to two minutes in a well-tuned cluster. Actual time depends on image pull speed, node bootstrap, daemonset startup, and workload readiness. Here is the sequence:
1. Amazon SQS queue depth spikes (e.g., 200,000 messages)
2. KEDA polls the queue every 15s, calculates the required pod count, and updates the Deployment replica target
3. New pods are created - many land in Pending state (no capacity yet)
4. Karpenter detects the Pending pods, selects optimal EC2 instance types, and launches Spot nodes - typically in under 60s
5. Nodes join the cluster; pods are scheduled and begin processing messages
6. Queue drains → KEDA scales pods to 0 → Karpenter terminates idle nodes → worker compute cost drops to zero
Production Best Practices
KEDA
- Always set maxReplicaCount to guard against runaway scaling from a misconfigured scaler
- Use cooldownPeriod: 60–120s to prevent scale-down thrashing near zero
- Authenticate via TriggerAuthentication with podIdentity.provider: aws - the identityOwner field on the scaler is deprecated since v2.13 and will be removed in KEDA v3
- Set scaleDown.stabilizationWindowSeconds to smooth out spiky workloads (see the sketch after this list)
- For SQS workers, configure visibility timeout, scaleOnInFlight, and graceful shutdown carefully - and always attach a Dead Letter Queue to catch failed messages
- Test scale-to-zero in staging - some apps have cold-start latency that affects first-message SLA
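For the stabilization-window recommendation above, KEDA passes scaling behaviour straight through to the HPA it manages. The fragment below slots into the order-processor ScaledObject shown earlier; the five-minute window and 50% policy are assumptions to tune for your workload.

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300    # Assumed: 5 minutes of low load before scaling in
          policies:
            - type: Percent
              value: 50                      # Remove at most half the pods per period
              periodSeconds: 60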
Karpenter
- Use the stable karpenter.sh/v1 API - v1beta1 is supported but planned for deprecation
- Use consolidationPolicy: WhenEmptyOrUnderutilized (the v1 name; renamed from WhenUnderutilized in v1beta1)
- Specify multiple instance families (c, m, r) so Karpenter can find available Spot capacity
- Set consolidateAfter explicitly - it is required in v1 when using WhenEmptyOrUnderutilized; use 0s for the same behaviour as v1beta1
- Include arm64 (Graviton) in your NodePool - AWS Graviton instances cost up to 20% less per hour than comparable x86 instances, with equal or better performance for most cloud-native workloads
- Set cpu and memory limits on the NodePool as a hard cost cap
- Tag all EC2NodeClass nodes with environment, team, and cost-center for AWS Cost Explorer analysis
Observing the Stack in Production
With two autoscalers operating in tandem, visibility across KEDA, Karpenter, SQS, and EC2 simultaneously is what separates a smooth on-call experience from a painful one. When something goes wrong - pods not scaling, nodes not terminating, queue backing up - you need correlated signals from all layers at once.
- Expose KEDA's /metrics endpoint to Prometheus - scaler values, replica counts, and error rates are all there (an example alert rule follows this list)
- Use CloudWatch Container Insights for correlated node + pod metrics
- Alert on SQS ApproximateAgeOfOldestMessage to catch backlogs before they compound
- Dashboard pod count (KEDA) and node count (Karpenter) together - a node spike without pods often means a misconfigured NodePool
- Monitor SQS NumberOfMessagesMoved on your Dead Letter Queue - a rising DLQ count signals worker failures that scaling alone cannot fix
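As one way to wire the first bullet into alerting, here is a sketch of a Prometheus Operator rule that fires on KEDA scaler errors. The metric name and thresholds are assumptions - confirm them against the /metrics output of your KEDA version and adjust the namespace to wherever KEDA is installed.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-scaler-alerts
  namespace: keda
spec:
  groups:
    - name: keda-scaling
      rules:
        - alert: KedaScalerErrors
          expr: sum(rate(keda_scaler_errors_total[5m])) > 0   # Assumed metric name
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: KEDA scaler is reporting errors
            description: A KEDA scaler has been failing for 10 minutes; pods may not be scaling with the queue.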
Conclusion
KEDA and Karpenter together eliminate the manual scaling work that falls on your on-call engineer at the worst possible moment - scaling pods from real event signals, provisioning the right nodes in seconds, and returning to zero when load clears. Getting the details right (authentication, API versions, SQS in-flight behaviour, consolidation policy) is what makes this stack hold up under pressure in production.
If you have any questions or need help implementing this on your platform, you can reach out to our DevOps & Cloud Engineering team here.
