May 7, 2026

Event-Driven Autoscaling on EKS with KEDA and Karpenter

Introduction

It's 2 a.m. Your e-commerce platform just hit the front page of a major news site. Your SQS order queue skyrockets from 200 to 200,000 messages in minutes. Pods are overwhelmed. Customers see errors. Your on-call engineer is manually scaling deployments.

This is exactly the failure mode that event-driven scaling prevents. By combining KEDA (Kubernetes Event-Driven Autoscaler) and Karpenter, you can build a platform that reacts to demand automatically - scaling pods and nodes in seconds, then returning to zero when load disappears - all without human intervention.

Why Kubernetes HPA Falls Short for Event-Driven Workloads

Kubernetes' built-in Horizontal Pod Autoscaler (HPA) is often configured with CPU and memory metrics. While HPA does technically support custom and external metrics through the Kubernetes metrics API, wiring it up to event sources like SQS queue depth requires additional metrics adapters and careful configuration. Even then, a deeper problem remains: HPA reacts to metrics after the fact. For event-driven workloads, this creates a dangerous lag:

  • An SQS queue floods with messages; pods start struggling
  • CPU climbs - HPA detects the spike after a 15s scrape and sync cycle
  • New pods are scheduled, but nodes are full - they sit Pending
  • Cluster Autoscaler requests EC2 nodes - taking 3–5 minutes to arrive
  • By then, the queue has grown by tens of thousands of messages
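For reference, the conventional CPU-driven setup looks like the sketch below (the HPA name is hypothetical; the target Deployment matches the later examples). Nothing in it knows the queue exists, which is exactly the gap KEDA closes.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 2                     # cannot scale to zero
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70       # reacts only after CPU has already climbed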

KEDA: Scale Pods on Real Events

KEDA is a CNCF Graduated project that extends Kubernetes to scale workloads based on external event sources - SQS, Kafka, Prometheus, DynamoDB Streams, and 70+ built-in scalers. It installs as a lightweight operator and works alongside your existing setup, connecting workloads directly to event sources without requiring custom metrics adapters.

KEDA introduces two core resources:

  • ScaledObject - scales long-running Deployments/StatefulSets (APIs, background workers)
  • ScaledJob - spawns individual Kubernetes Jobs per event (ETL, ML inference, video transcoding) - a sketch follows the ScaledObject example below

Installing KEDA via Helm

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda --create-namespace
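Once the chart installs, confirm the components are running (names below reflect the chart's defaults):

kubectl get pods -n keda
# Expect keda-operator, keda-operator-metrics-apiserver, and
# keda-admission-webhooks pods in Running state
kubectl get crd | grep keda.sh
# Confirms the ScaledObject, ScaledJob, and TriggerAuthentication CRDs exist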

Configuring AWS Authentication via TriggerAuthentication

The recommended approach for AWS authentication is a TriggerAuthentication resource using EKS Pod Identity or IRSA. The older identityOwner field on the scaler itself was deprecated in KEDA v2.13 and will be removed in v3, so avoid it in new deployments.

First, create a TriggerAuthentication that references your KEDA operator's IAM role via pod identity:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-auth
  namespace: order-processing
spec:
  podIdentity:
    provider: aws               # Uses the AWS SDK credential chain - covers EKS Pod Identity and IRSA (recommended)
    # provider: aws-eks         # Legacy IRSA-only provider; prefer 'aws' for new setups
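The provider: aws option relies on AWS credentials being available to the KEDA operator pod. With EKS Pod Identity, that means associating an IAM role (with SQS read permissions) to the keda-operator service account. A sketch of that association is below - it assumes the eks-pod-identity-agent add-on is installed on the cluster, and the cluster name, account ID, and role name are placeholders:

aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace keda \
  --service-account keda-operator \
  --role-arn arn:aws:iam::123456789012:role/keda-sqs-reader   # hypothetical IAM role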

ScaledObject Example: SQS Queue Depth

This scales an order-processing Deployment to maintain 10 messages per pod. With 500 messages, KEDA targets 50 pods - capped at maxReplicaCount.

Production note on in-flight messages: By default, KEDA's SQS scaler counts both ApproximateNumberOfMessages (queued) and ApproximateNumberOfMessagesNotVisible (in-flight / being processed). This means pods processing messages are included in the scaling calculation, which is usually the right behaviour. However, if your workers have long processing times or you see unexpected scale-down events mid-processing, tune scaleOnInFlight, your SQS visibility timeout, and your worker shutdown handling carefully - and ensure a Dead Letter Queue is configured to catch messages that fail repeatedly.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
  namespace: order-processing   # Same namespace as the TriggerAuthentication
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15       # Check queue every 15s
  cooldownPeriod: 60        # Wait 60s before scaling down
  minReplicaCount: 0        # Scale to zero when idle
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-aws-auth   # References TriggerAuthentication above
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456/orders
      queueLength: '10'     # Target messages-per-pod ratio
      awsRegion: us-east-1
      scaleOnInFlight: 'true'   # Default: true. Set false to exclude in-flight messages
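The ScaledObject above fits long-running workers. For the run-to-completion pattern mentioned earlier (ETL, ML inference, video transcoding), a ScaledJob launches one Kubernetes Job per chunk of work instead. A minimal sketch, with a hypothetical container image and queue:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: transcode-scaledjob
  namespace: order-processing
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: transcoder
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/transcoder:latest   # hypothetical image
        restartPolicy: Never
  pollingInterval: 15
  maxReplicaCount: 50              # Max concurrent Jobs
  successfulJobsHistoryLimit: 5
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-aws-auth          # Same TriggerAuthentication as above
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456/transcode-jobs   # hypothetical queue
      queueLength: '1'             # One Job per message
      awsRegion: us-east-1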

Karpenter: Right-Sized Nodes, Right Now

When KEDA scales your pods, those pods need nodes to land on. Karpenter watches for Pending pods, then automatically provisions the optimal EC2 instance type to satisfy them - typically in under 60 seconds. It also continuously bin-packs workloads and terminates underutilized nodes.

Karpenter vs. Cluster Autoscaler

Feature              | Cluster Autoscaler             | Karpenter
Provisioning Speed   | 3–5+ minutes                   | Typically 30–60 seconds
Instance Selection   | Pre-configured Auto Scaling Groups | Dynamic - picks optimal type per workload
Spot Support         | Manual node group setup        | Native, single NodePool
Node Consolidation   | Limited                        | Automatic bin-packing

NodePool Configuration

The NodePool resource defines what Karpenter is allowed to provision. The example below targets the stable karpenter.sh/v1 API (available from Karpenter v1.0+) and configures a mixed Spot/On-Demand pool for batch workloads. Note that in v1 the consolidation policy is named WhenEmptyOrUnderutilized (renamed from WhenUnderutilized in v1beta1), and consolidateAfter must now be set alongside it - it is a required field in v1.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch-workers
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: batch-workers
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ['spot', 'on-demand']    # Spot-first, On-Demand fallback
      - key: node.kubernetes.io/instance-category
        operator: In
        values: ['c', 'm', 'r']          # General, compute, memory families
      - key: kubernetes.io/arch
        operator: In
        values: ['amd64', 'arm64']       # Support Graviton for savings
  limits:
    cpu: 1000                            # Safety cap on total cluster cost
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # v1 name (was WhenUnderutilized in v1beta1)
    consolidateAfter: 30s                            # Required in v1; set to 0s for immediate consolidation
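The NodePool points at an EC2NodeClass named batch-workers, which carries the AWS-specific settings (AMI selection, node IAM role, networking, tags). A sketch under assumed names - the discovery tags, role, and tag values are placeholders to replace with your own:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: batch-workers
spec:
  amiSelectorTerms:
  - alias: al2023@latest                       # Latest Amazon Linux 2023 AMIs
  role: KarpenterNodeRole-my-cluster           # hypothetical node IAM role
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster       # hypothetical discovery tag
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster
  tags:
    environment: production                    # Cost-allocation tags (see best practices below)
    team: order-processing
    cost-center: ecommerce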

End-to-End Architecture Flow

When an SQS burst hits, the full scale-up sequence - from event arrival to active pod processing - completes in roughly one to two minutes in a well-tuned cluster. Actual time depends on image pull speed, node bootstrap, daemonset startup, and workload readiness. Here is the sequence:

01 Amazon SQS queue depth spikes (e.g., 200,000 messages)
02 KEDA polls queue every 15s, calculates required pod count, updates Deployment replica target
03 New pods are created - many land in Pending state (no capacity yet)
04 Karpenter detects Pending pods, selects optimal EC2 instance types, launches Spot nodes - typically in under 60s
05 Nodes join the cluster; pods are scheduled and begin processing messages
06 Queue drains → KEDA scales pods to 0 → Karpenter terminates idle nodes → worker compute cost drops to zero
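You can watch each step of this sequence from the command line; the queue URL and namespace below match the earlier examples:

# Queue depth driving the whole chain
aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456/orders \
  --attribute-names ApproximateNumberOfMessages

# KEDA's scaling decisions (it manages an HPA named keda-hpa-<scaledobject-name>)
kubectl get scaledobject,hpa -n order-processing

# Pods being created and scheduled
kubectl get pods -n order-processing -w

# Karpenter launching and registering capacity
kubectl get nodeclaims,nodes -w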

Production Best Practices

KEDA

  • Always set maxReplicaCount to guard against runaway scaling from a misconfigured scaler
  • Use cooldownPeriod: 60–120s to prevent scale-down thrashing near zero
  • Authenticate via TriggerAuthentication with podIdentity.provider: aws - the identityOwner field on the scaler is deprecated since v2.13 and will be removed in KEDA v3
  • Set scaleDown.stabilizationWindowSeconds via the ScaledObject's advanced.horizontalPodAutoscalerConfig block to smooth out spiky workloads (see the snippet after this list)
  • For SQS workers, configure visibility timeout, scaleOnInFlight, and graceful shutdown carefully - and always attach a Dead Letter Queue to catch failed messages
  • Test scale-to-zero in staging - some apps have cold-start latency that affects first-message SLA
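The stabilization window mentioned above lives under the ScaledObject's advanced block, which passes HPA scaling behaviour through unchanged. A fragment to merge into the ScaledObject spec from earlier:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # Hold the highest recommendation for 5 minutes before scaling down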

Karpenter

  • Use the stable karpenter.sh/v1 API - v1beta1 is supported but planned for deprecation
  • Use consolidationPolicy: WhenEmptyOrUnderutilized (the v1 name; WhenUnderutilized from v1beta1 is renamed)
  • Specify multiple instance families (c, m, r) so Karpenter can find available Spot capacity
  • Set consolidateAfter explicitly - it is required in v1 when using WhenEmptyOrUnderutilized; use 0s for the same behaviour as v1beta1
  • Include arm64 (Graviton) in your NodePool - AWS Graviton instances cost up to 20% less per hour than comparable x86 instances, with equal or better performance for most cloud-native workloads
  • Set cpu and memory limits on the NodePool as a hard cost cap
  • Tag all EC2NodeClass nodes with environment, team, and cost-center for AWS Cost Explorer analysis

Observing the Stack in Production

With two autoscalers operating in tandem, being able to see KEDA, Karpenter, SQS, and EC2 at the same time is what separates a smooth on-call experience from a painful one. When something goes wrong - pods not scaling, nodes not terminating, queue backing up - you need correlated signals from all layers at once.

  • Expose KEDA's /metrics endpoint to Prometheus - scaler values, replica counts, and error rates are all there
  • Use CloudWatch Container Insights for correlated node + pod metrics
  • Alert on SQS ApproximateAgeOfOldestMessage to catch backlogs before they compound (an example alarm follows this list)
  • Dashboard pod count (KEDA) and node count (Karpenter) together - a node spike without pods often means a misconfigured NodePool
  • Monitor ApproximateNumberOfMessagesVisible on your Dead Letter Queue - a rising DLQ count signals worker failures that scaling alone cannot fix
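As an example of the backlog-age alert, here is a CloudWatch alarm on the orders queue from the earlier examples; the threshold, evaluation window, and SNS topic are placeholders to tune against your own SLA:

aws cloudwatch put-metric-alarm \
  --alarm-name orders-queue-backlog-age \
  --namespace AWS/SQS \
  --metric-name ApproximateAgeOfOldestMessage \
  --dimensions Name=QueueName,Value=orders \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 300 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:oncall-alerts   # hypothetical SNS topic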

Conclusion

KEDA and Karpenter together eliminate the manual scaling work that falls on your on-call engineer at the worst possible moment - scaling pods from real event signals, provisioning the right nodes in seconds, and returning to zero when load clears. Getting the details right (authentication, API versions, SQS in-flight behaviour, consolidation policy) is what makes this stack hold up under pressure in production.

If you have any questions or need help implementing this on your platform, you can reach out to our DevOps & Cloud Engineering team here.
