Introduction
It's 2 a.m. Your e-commerce platform just hit the front page of a major news site. Your SQS order queue skyrockets from 200 to 200,000 messages in minutes. Pods are overwhelmed. Customers see errors. Your on-call engineer is manually scaling deployments.
This is exactly the failure mode that event-driven scaling prevents. By combining KEDA (Kubernetes Event-Driven Autoscaler) and Karpenter, you can build a platform that reacts to demand automatically - scaling pods and nodes in seconds, then returning to zero when load disappears - all without human intervention.
Why Kubernetes HPA Falls Short for Event-Driven Workloads
Kubernetes' built-in Horizontal Pod Autoscaler (HPA) is often configured with CPU and memory metrics. While HPA does technically support custom and external metrics through the Kubernetes metrics API, wiring it up to event sources like SQS queue depth requires additional metrics adapters and careful configuration. Even then, a deeper problem remains: HPA reacts to metrics after the fact. For event-driven workloads, this creates a dangerous lag:
- An SQS queue floods with messages; pods start struggling
- CPU climbs - HPA detects the spike after a 15s scrape and sync cycle
- New pods are scheduled, but nodes are full - they sit Pending
- Cluster Autoscaler requests EC2 nodes - taking 3–5 minutes to arrive
- By then, the queue has grown by tens of thousands of messages
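For contrast, here is a minimal sketch of the CPU-driven HPA described above. The Deployment name and thresholds are illustrative; note that it only reacts after pods are already busy, and that minReplicas cannot be set to zero.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor          # Illustrative target Deployment
  minReplicas: 2                   # HPA cannot scale a workload to zero
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scales only after pods are already ~70% busy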
KEDA: Scale Pods on Real Events
KEDA is a CNCF Graduated project that extends Kubernetes to scale workloads based on external event sources - SQS, Kafka, Prometheus, DynamoDB Streams, and 70+ built-in scalers. It installs as a lightweight operator and works alongside your existing setup, connecting workloads directly to event sources without requiring custom metrics adapters.
KEDA introduces two core resources:
- ScaledObject - scales long-running Deployments/StatefulSets (APIs, background workers)
- ScaledJob - spawns individual Kubernetes Jobs per event (ETL, ML inference, video transcoding); a minimal sketch follows below
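To make the distinction concrete, here is a minimal ScaledJob sketch that launches one Kubernetes Job per queued message. The image and queue URL are placeholders, and the SQS trigger and TriggerAuthentication it references are covered later in this post.

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: video-transcode-scaledjob
  namespace: order-processing
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: transcoder
            image: registry.example.com/transcoder:latest   # Placeholder image
        restartPolicy: Never
    backoffLimit: 2
  pollingInterval: 30              # Check the event source every 30s
  maxReplicaCount: 20              # Never run more than 20 Jobs at once
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-auth        # TriggerAuthentication shown later in this post
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/transcode-jobs   # Placeholder queue
        queueLength: '1'           # One Job per message
        awsRegion: us-east-1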
Installing KEDA via Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
--namespace keda --create-namespace
Configuring AWS Authentication via TriggerAuthentication
The recommended approach for AWS authentication is a TriggerAuthentication resource using EKS Pod Identity or IRSA. The older identityOwner field on the scaler itself was deprecated in KEDA v2.13 and will be removed in v3, so avoid it in new deployments.
First, create a TriggerAuthentication that references your KEDA operator's IAM role via pod identity:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-auth
  namespace: order-processing
spec:
  podIdentity:
    provider: aws          # Uses EKS Pod Identity (recommended)
    # provider: aws-eks    # Use this if still on IRSA
ScaledObject Example: SQS Queue Depth
This scales an order-processing Deployment to maintain 10 messages per pod. With 500 messages, KEDA targets 50 pods - capped at maxReplicaCount.
Production note on in-flight messages: By default, KEDA's SQS scaler counts both ApproximateNumberOfMessages (queued) and ApproximateNumberOfMessagesNotVisible (in-flight / being processed). This means pods processing messages are included in the scaling calculation, which is usually the right behaviour. However, if your workers have long processing times or you see unexpected scale-down events mid-processing, tune scaleOnInFlight, your SQS visibility timeout, and your worker shutdown handling carefully - and ensure a Dead Letter Queue is configured to catch messages that fail repeatedly. A sketch of the worker-shutdown side follows the manifest below.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
  namespace: order-processing    # Must live in the same namespace as the TriggerAuthentication
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15            # Check queue every 15s
  cooldownPeriod: 60             # Wait 60s before scaling down
  minReplicaCount: 0             # Scale to zero when idle
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-auth      # References TriggerAuthentication above
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/orders
        queueLength: '10'        # Target messages-per-pod ratio
        awsRegion: us-east-1
        scaleOnInFlight: 'true'  # Default: true. Set false to exclude in-flight messages
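As the production note above mentions, the worker's shutdown behaviour matters once KEDA starts scaling pods down mid-drain. Here is a minimal sketch of the target Deployment. The image, grace period, preStop delay, and resource requests are assumptions to adapt to your per-message processing time and SQS visibility timeout, and the worker itself must stop receiving new messages and finish in-flight ones on SIGTERM.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  namespace: order-processing
spec:
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      terminationGracePeriodSeconds: 120     # Assumed: longer than one message's processing time
      containers:
        - name: worker
          image: registry.example.com/order-processor:latest   # Placeholder image
          lifecycle:
            preStop:
              exec:
                command: ['sh', '-c', 'sleep 10']   # Assumed: brief pause so in-flight receives complete
          resources:
            requests:
              cpu: 500m
              memory: 512Mi                  # Accurate requests help Karpenter right-size nodes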
Karpenter: Right-Sized Nodes, Right Now
When KEDA scales your pods, those pods need nodes to land on. Karpenter watches for Pending pods, then automatically provisions the optimal EC2 instance type to satisfy them - typically in under 60 seconds. It also continuously bin-packs workloads and terminates underutilized nodes.
Karpenter vs. Cluster Autoscaler
| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning Speed | 3–5+ minutes | Typically 30–60 seconds |
| Instance Selection | Pre-configured ASG groups | Dynamic - picks optimal type per workload |
| Spot Support | Manual node group setup | Native, single NodePool |
| Node Consolidation | Limited | Automatic bin-packing |
NodePool Configuration
The NodePool resource defines what Karpenter is allowed to provision. The example below targets the stable karpenter.sh/v1 API (available from Karpenter v1.0+) and configures a mixed Spot/On-Demand pool for batch workloads. Note that in v1 the consolidation policy is named WhenEmptyOrUnderutilized (renamed from WhenUnderutilized in v1beta1), and consolidateAfter must now be set alongside it.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch-workers
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: batch-workers
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot', 'on-demand']     # Spot-first, On-Demand fallback
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm', 'r']           # Compute, general-purpose, memory-optimized families
        - key: kubernetes.io/arch
          operator: In
          values: ['amd64', 'arm64']        # Support Graviton for savings
  limits:
    cpu: 1000                               # Safety cap on total cluster cost
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # v1 name (was WhenUnderutilized in v1beta1)
    consolidateAfter: 30s                   # Required in v1; set to 0s for immediate consolidation
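The nodeClassRef above points at an EC2NodeClass, which tells Karpenter which AMI, node IAM role, subnets, and security groups to use. A minimal sketch follows - the discovery tags, role name, and cost-allocation tag values are assumptions to replace with your own.

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: batch-workers
spec:
  amiSelectorTerms:
    - alias: al2023@latest                   # Latest Amazon Linux 2023 AMI for Karpenter nodes
  role: KarpenterNodeRole-my-cluster         # Assumed node IAM role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # Assumed discovery tag on your private subnets
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # Assumed discovery tag on your node security group
  tags:
    environment: production                  # Cost-allocation tags (see best practices below)
    team: order-platform
    cost-center: commerce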
End-to-End Architecture Flow
When an SQS burst hits, the full scale-up sequence - from event arrival to active pod processing - completes in roughly one to two minutes in a well-tuned cluster. Actual time depends on image pull speed, node bootstrap, daemonset startup, and workload readiness. Here is the sequence:
1. Amazon SQS queue depth spikes (e.g., 200,000 messages)
2. KEDA polls the queue every 15s, calculates the required pod count, and updates the Deployment replica target
3. New pods are created - many land in Pending state (no capacity yet)
4. Karpenter detects the Pending pods, selects optimal EC2 instance types, and launches Spot nodes - typically in under 60s
5. Nodes join the cluster; pods are scheduled and begin processing messages
6. Queue drains → KEDA scales pods to 0 → Karpenter terminates idle nodes → worker compute cost drops to zero
Production Best Practices
KEDA
- Always set maxReplicaCount to guard against runaway scaling from a misconfigured scaler
- Use cooldownPeriod: 60–120s to prevent scale-down thrashing near zero
- Authenticate via TriggerAuthentication with podIdentity.provider: aws - the identityOwner field on the scaler is deprecated since v2.13 and will be removed in KEDA v3
- Set scaleDown.stabilizationWindowSeconds to smooth out spiky workloads (see the sketch after this list)
- For SQS workers, configure visibility timeout, scaleOnInFlight, and graceful shutdown carefully - and always attach a Dead Letter Queue to catch failed messages
- Test scale-to-zero in staging - some apps have cold-start latency that affects first-message SLA
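For the stabilization-window recommendation above, KEDA passes scaling behaviour straight through to the HPA it manages. The fragment below slots into the order-processor ScaledObject shown earlier; the five-minute window and 50% policy are assumptions to tune for your workload.

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300    # Assumed: 5 minutes of low load before scaling in
          policies:
            - type: Percent
              value: 50                      # Remove at most half the pods per period
              periodSeconds: 60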
Karpenter
- Use the stable karpenter.sh/v1 API - v1beta1 is supported but planned for deprecation
- Use consolidationPolicy: WhenEmptyOrUnderutilized (the v1 name; renamed from WhenUnderutilized in v1beta1)
- Specify multiple instance families (c, m, r) so Karpenter can find available Spot capacity
- Set consolidateAfter explicitly - it is required in v1 when using WhenEmptyOrUnderutilized; use 0s for the same behaviour as v1beta1
- Include arm64 (Graviton) in your NodePool - AWS Graviton instances cost up to 20% less per hour than comparable x86 instances, with equal or better performance for most cloud-native workloads
- Set cpu and memory limits on the NodePool as a hard cost cap
- Tag all EC2NodeClass nodes with environment, team, and cost-center for AWS Cost Explorer analysis
Observing the Stack in Production
With two autoscalers operating in tandem, visibility across KEDA, Karpenter, SQS, and EC2 simultaneously is what separates a smooth on-call experience from a painful one. When something goes wrong - pods not scaling, nodes not terminating, queue backing up - you need correlated signals from all layers at once.
- Expose KEDA's /metrics endpoint to Prometheus - scaler values, replica counts, and error rates are all there (an example alert rule follows this list)
- Use CloudWatch Container Insights for correlated node + pod metrics
- Alert on SQS ApproximateAgeOfOldestMessage to catch backlogs before they compound
- Dashboard pod count (KEDA) and node count (Karpenter) together - a node spike without pods often means a misconfigured NodePool
- Monitor SQS NumberOfMessagesMoved on your Dead Letter Queue - a rising DLQ count signals worker failures that scaling alone cannot fix
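As one way to wire the first bullet into alerting, here is a sketch of a Prometheus Operator rule that fires on KEDA scaler errors. The metric name and thresholds are assumptions - confirm them against the /metrics output of your KEDA version and adjust the namespace to wherever KEDA is installed.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-scaler-alerts
  namespace: keda
spec:
  groups:
    - name: keda-scaling
      rules:
        - alert: KedaScalerErrors
          expr: sum(rate(keda_scaler_errors_total[5m])) > 0   # Assumed metric name
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: KEDA scaler is reporting errors
            description: A KEDA scaler has been failing for 10 minutes; pods may not be scaling with the queue.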
Conclusion
KEDA and Karpenter together eliminate the manual scaling work that falls on your on-call engineer at the worst possible moment - scaling pods from real event signals, provisioning the right nodes in seconds, and returning to zero when load clears. Getting the details right (authentication, API versions, SQS in-flight behaviour, consolidation policy) is what makes this stack hold up under pressure in production.
If you have any questions or need help implementing this on your platform, you can reach out to our DevOps & Cloud Engineering team here.
