mongosh (MongoDB Shell) is an interactive JavaScript-based shell used to interact with MongoDB databases. It replaces the legacy mongo shell and provides a modern, developer-friendly experience with better usability and scripting capabilities.
Whether you're debugging data issues, running quick queries, or executing scripts—mongosh is your most powerful direct interface to MongoDB.
When to Use mongosh vs MongoDB Compass
Both mongosh and MongoDB Compass are powerful tools for interacting with MongoDB, but they serve different purposes depending on the task.
MongoDB Compass — Best for Visualization & Exploration
MongoDB Compass is ideal when you want a visual interface to explore your database without writing commands manually.
Use Compass when you need to:
Browse collections visually
Inspect document structures
Quickly test simple queries
View indexes and schema information
Analyze aggregation pipelines visually
Work comfortably with smaller datasets
Compass is especially useful for beginners or during early-stage debugging where seeing the data structure helps more than scripting.
However, Compass has limitations when operations become more complex or repetitive.
mongosh — Best for Power Operations & Automation
mongosh gives developers direct control over MongoDB using JavaScript-based commands and scripts.
Use mongosh when you need to:
Perform bulk updates or deletions
Run loops and conditional logic
Execute migration or cleanup scripts
Handle duplicate data removal
Automate repetitive database tasks
Debug production-level data issues
Run advanced aggregation workflows
Execute commands faster than GUI interactions
Unlike Compass, mongosh allows scripting and automation, making it extremely valuable for backend developers and DevOps workflows.
For example:
Removing thousands of duplicate records
Updating fields across multiple collections
Writing one-time migration scripts
Performing production hotfixes
These tasks are significantly easier and more efficient in mongosh.
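As a quick illustration of the kind of script this unlocks, here is a minimal sketch that removes duplicate documents sharing the same field value. The collection name (users) and field (email) are placeholders for your own schema:

// Find groups of documents with the same email, keep the first, delete the rest
db.users.aggregate([
  { $group: { _id: "$email", ids: { $push: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
]).forEach(group => {
  const [keep, ...toDelete] = group.ids;        // keep the first _id in each group
  db.users.deleteMany({ _id: { $in: toDelete } });
});

Running this directly in mongosh takes seconds; doing the same cleanup through a GUI would mean hunting down each duplicate by hand.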
Which One Should You Choose?
In practice, developers often use both tools together:
Use Compass for exploration and visualization
Use mongosh for execution, automation, and production fixes
Think of Compass as the visual dashboard, while mongosh is the developer power tool for serious database operations.
An enterprise running seven department-specific SharePoint intranet sites needed AI that could actually operate within their systems - not just answer questions about them. Here is what a single task looked like before we got involved - and after.
Before - Sales proposal workflow
1. Open Pipeline Tracker on SharePoint - find the deal
2. Open Word - hunt for the proposal template on the shared drive
3. Manually fill in deal value, scope, and terms
4. Save, switch back to SharePoint, upload to Sales Collateral library
5. Open Outlook - type the CTO's email, write the body, attach, send
40 minutes · 6 applications · 1 task

After - same task, one sentence
1. Type: "Draft a proposal for Meridian Corp's cloud migration deal and send it to their CTO."
2. Agent looks up CTO in Client Contacts - name, email, title
3. Branded proposal generated from python-docx template and uploaded to Sales Collateral
4. Email compose panel opens - pre-filled with CTO's address, subject, and body
5. Review, tweak one line, hit Send
One sentence · Five systems · Done
We assessed the organisation's requirements against Copilot Cowork and Claude Cowork. Both are genuinely capable products - but neither could query custom SharePoint list schemas, connect to a SQL employee database, generate documents from branded templates, or switch to a different specialist persona depending on which department site the user was on. They needed something those products are structurally not built to do.
Same panel. Same backend. Same LLM. Completely different specialist per site, loaded at runtime from a JSON manifest. This is what we built. Here is how:
The Architecture in One Sentence
A single SPFx floating panel on every SharePoint page sends messages to a FastAPI backend, which enters a Claude agentic loop with a filtered set of MCP tools - filtered by which site the user is on, what they are allowed to do, and which department's data model applies. Each site loads its own agent persona, slash commands, and skill files - deep knowledge modules that teach the agent how a specific department's data is structured, what business rules apply, and which workflows exist.
System Architecture - SPFx panel → FastAPI backend → Claude agentic loop → MCP server → enterprise data sources
That sentence hides a lot of machinery. Let me unpack it through the folder structure, because the folder structure is the architecture.
The Folder Structure That Makes It Work
Backend/
├── api/
│ ├── main.py ← FastAPI app, startup, CORS
│ └── routers/
│ └── agent_chat.py ← The agentic loop: SSE streaming, tool orchestration
│
├── core/
│ ├── mcp/
│ │ ├── server.py ← In-process MCP server (58 tool routes)
│ │ ├── builder.py ← Tool definitions with input schemas
│ │ └── tools/ ← Tool implementations by connector type
│ ├── auth/ ← OBO token exchange + certificate auth
│ ├── session/ ← Session resolver (who, where, what permissions)
│ ├── registry/ ← Plugin auto-discovery at startup
│ ├── audit/ ← Write-operation audit logger
│ ├── alerts/handlers/ ← Proactive alert checks (5 built-in)
│ └── notifications/ ← Custom notification rules engine
│
├── connectors/ ← 9 connector modules
│ ├── sharepoint/ ← List CRUD, library browse, search, upload
│ ├── msgraph/ ← Users, calendar, mail, Teams channels
│ ├── ems/ ← Direct SQL: employee lookup, org chart, leave
│ ├── email/ ← Draft + send via Graph
│ ├── docgen/ ← python-docx reports, offer letters, proposals
│ ├── azdevops/ ← Work items, sprints
│ ├── analytics/ ← NL reports, anomaly detection
│ ├── knowledge/ ← Federated search, expert finder
│ └── automation/ ← Notification rule CRUD
│
├── plugins/ ← THIS IS THE KEY DIRECTORY
│ ├── SALES/
│ │ ├── .claude-plugin/
│ │ │ └── plugin.json ← Manifest: persona, tools, commands, skills, schemas
│ │ ├── agents/ ← Persona markdown files
│ │ ├── commands/ ← Slash command definitions
│ │ └── skills/ ← Domain knowledge modules
│ ├── FINANCE/ ← Same structure, different specialist
│ ├── PEOPLE/ ← Same structure, different specialist
│ ├── DELIVERY/ ← Same structure, different specialist
│ ├── TECHNOLOGY/ ← Same structure, different specialist
│ ├── OPERATIONS/ ← Same structure, different specialist
│ └── CORE/ ← Same structure, different specialist
│
└── db/migrations/ ← 6 SQL migrations (sessions, audit, alerts)
Every architectural decision is visible in this tree. Let me walk through the three that matter most.
Decision 1: The Plugin Manifest
When a user opens the Sales site and the panel initialises, the backend resolves their session: who are they (from the OBO token), which site are they on (from the site_url passed by the frontend), and what can they do (from the plugin manifest). The manifest is a single JSON file:
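(Illustrative sketch - the field names mirror the blocks described next, but the exact schema of our manifest differs in detail, and the ClientName column and draft-proposal command are placeholders.)

{
  "id": "SALES",
  "entry_agent": "agents/sales-specialist.md",
  "connectors": ["sharepoint", "msgraph", "email", "docgen", "ems", "knowledge", "analytics", "automation"],
  "commands": ["commands/pipeline-report.md", "commands/draft-proposal.md"],
  "skills": ["skills/pipeline-management.md", "skills/client-engagement.md", "skills/competitive-intelligence.md"],
  "sharepoint_lists": {
    "Pipeline Tracker": {
      "columns": {
        "PipelineStage": "Current stage of the deal",
        "DealValue": "Deal revenue",
        "ClientName": "Account the deal belongs to"
      }
    }
  }
}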
It sets the persona.
The entry_agent points to a markdown file that defines the agent's personality, role, behaviour rules, and domain expertise. The Sales agent is "friendly, professional, and results-driven." The Finance agent is precise and compliance-aware. The People agent is warm and policy-oriented. Same LLM, different specialist.
It filters the tool palette.
The connectors array controls which of the 58 MCP tools appear in Claude's tool list for this session. Sales gets SharePoint, Graph, Email, DocGen, EMS, Knowledge, Analytics, and Automation - but not Azure DevOps. Delivery gets Azure DevOps too. Technology gets everything. The LLM only sees tools it is allowed to use.
It loads skill files.
This is where the deep domain knowledge lives. Each skill is a markdown file that describes how a specific aspect of the department works. The Sales plugin has skills for pipeline management, client engagement, and competitive intelligence. A skill might describe how the Pipeline Tracker list is structured, what each column means, which OData filters produce useful results, what "stalled deal" means in this organisation's context, and what steps the agent should follow for a pipeline review. Skills are loaded into the system prompt based on the site and the user's query - they are the agent's domain training, delivered at runtime through text, not fine-tuning.
It injects list schemas.
The sharepoint_lists block is injected into the system prompt. This is how the agent knows the column is called PipelineStage, not Stage. It knows DealValue is the field for revenue, not Value or Amount. It never guesses. If a column is not in the manifest, the agent cannot reference it.
It registers slash commands.
Each command is a markdown file with trigger phrases, required parameters, the agent to hand off to, and step-by-step instructions. When a user types /pipeline-report, the command's markdown gets injected into the system prompt for that turn.
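To make that concrete, a command definition might look roughly like the sketch below. The frontmatter format shown here is an assumption for illustration; only the /pipeline-report name, the agent persona, and the Pipeline Tracker columns come from the system described above.

---
command: /pipeline-report
agent: sales-specialist
triggers: ["pipeline report", "pipeline summary"]
parameters:
  - quarter (optional)
---
1. Query the Pipeline Tracker list, filtered by PipelineStage.
2. Group open deals by stage and sum DealValue.
3. Flag stalled deals according to the rule defined in the pipeline-management skill.
4. Return a summary and offer to draft follow-up emails.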
The consequence of this design is that adding a new department - an eighth site, a regional hub, a project workspace - requires creating one folder with one JSON manifest and a few markdown files. No code changes. No backend redeployment. The registry discovers it on next restart.
The key insight: The plugin folder is the architecture. Not the LLM, not the framework, not the cloud infrastructure. The entire system's behaviour - which specialist responds, which tools are available, which data is accessible, which commands exist, which domain knowledge the agent draws on - is determined by which plugin.json file gets loaded and which skill files sit alongside it. Everything else is shared plumbing. This is what makes the system maintainable at scale: changing a department's AI behaviour is editing a JSON file and a few markdown documents, not shipping code.
Decision 2: Dual-Auth and the Invisible Session
Authentication is where most custom AI implementations get it wrong. Either they use the user's token for everything (writes are unauditable and tied to individual permissions) or they use an app-only token for everything (losing the identity context of who asked for the action).
We split it:
Reads use the user's delegated OBO token.
When the agent queries a SharePoint list or fetches calendar events, it does so as the user. If the user cannot see a site, neither can the agent. The existing M365 permission model is preserved - no data leakage, no privilege escalation.
Writes use an app-only certificate token.
All write operations - list item creation, document upload, email send - execute through a controlled service identity, not the user's token.
Every write is routed through an audit logger: who requested it, what changed, tool name, payload, status, and duration.
site_url, OBO token, app-only token, and the user's permission set are never parameters the LLM sees - they are injected from the session object before the agentic loop begins.
In the initial version, SharePoint tools accepted site_url as a tool parameter. During early integration testing, Claude hallucinated a URL - it substituted the Sales site URL when a user on the Finance site asked about budget items. The query returned nothing. The agent confidently reported "no budget items found." It was a silent, plausible failure - the worst kind. We refactored site_url out of every tool interface within the day and moved it to session-level injection. The LLM never sees it, never chooses it, never hallucinates it.
If the LLM does not need a value to reason about the task, do not put it in the tool interface. Every parameter visible to the LLM is a surface for hallucination. Session-level injection eliminates that surface.
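As an illustration of that rule, here is a minimal sketch of session-level injection, assuming a simple in-process tool registry. The names (Session, dispatch_tool, get_list_items) are placeholders for illustration, not the production interface.

from dataclasses import dataclass

@dataclass
class Session:
    user_id: str
    site_url: str            # resolved from the frontend and validated against the plugin registry
    obo_token: str           # delegated token - used for reads
    allowed_connectors: set  # filtered tool palette for this site and user

# What the LLM sees: only task-relevant parameters - no site_url, no tokens.
GET_LIST_ITEMS_SCHEMA = {
    "name": "sharepoint_get_list_items",
    "description": "Read items from a SharePoint list using an optional OData filter.",
    "input_schema": {
        "type": "object",
        "properties": {
            "list_title": {"type": "string"},
            "odata_filter": {"type": "string"},
        },
        "required": ["list_title"],
    },
}

def get_list_items(site_url, token, list_title, odata_filter=None):
    """Placeholder for the real SharePoint REST call."""
    return {"site": site_url, "list": list_title, "filter": odata_filter}

def dispatch_tool(name, llm_args, session):
    """Merge the LLM's arguments with session-scoped values before execution.
    site_url and the token come from the resolved session, never from the model."""
    if name == "sharepoint_get_list_items":
        return get_list_items(session.site_url, session.obo_token, **llm_args)
    raise ValueError(f"Tool not available in this session: {name}")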
Decision 3: The Agentic Loop
When a user sends a message, the backend does not call Claude once and return the answer. It enters a loop:
Agentic Loop - Request Lifecycle with tool execution feedback loop
Multi-step tool chains. A single message like "find stalled deals, draft follow-ups, and notify the sales lead" triggers five sequential tool calls - SharePoint OData query, EMS employee lookup, three email draft preparations - each feeding the next decision.
Real-time status streaming. Each tool call emits an SSE event to the frontend ("Fetching list data: Pipeline Tracker...", "Looking up employee...") so the user sees the agent working, not a loading spinner.
Model is a config variable. Claude Sonnet is the default - fast enough for real-time streaming, capable enough for multi-step orchestration. Swapping models is a single environment variable change; the architecture is not coupled to any provider.
We tested Azure OpenAI + Semantic Kernel. The loop runs and tools get called. But in head-to-head testing, Claude produced fewer hallucinated tool parameters, followed complex multi-step prompts more faithfully, and handled branching logic - where a tool result determines whether to call the next tool or skip to a different action - more consistently. That is why it is our default.
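For concreteness, this is roughly what such a loop looks like against the Anthropic Messages API - a minimal sketch, not the production code, which additionally streams SSE status events and enforces the audit and auth rules described earlier. execute_tool stands in for the MCP dispatch.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def execute_tool(name, args, session):
    """Placeholder: route the call to the MCP tool implementation and return its result."""
    return f"(result of {name})"

def run_agent_turn(system_prompt, messages, tools, session):
    """Call the model, execute any requested tools, feed results back, repeat."""
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # model is a config variable in practice
            max_tokens=2048,
            system=system_prompt,              # persona + skills + schemas + slash command
            tools=tools,                       # only the tools this site and user may use
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response                    # final answer for this turn

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input, session)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        # Feed the assistant turn and the tool results back into the conversation
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})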
What the Agent Can Actually Do
58 tools across 9 connectors. Here are the ones that matter most to daily operations:
SharePoint (all sites): Read list items with OData filters, create and update items, browse document libraries, upload files, search across the site. Primary data layer.
EMS (all sites): Direct SQL via pyodbc against the HR database: employee lookup, org chart traversal, leave balances, skills search, project assignments, team allocation, capacity planning, budget tracking. No REST wrapper.
Document generation (Sales, People, Finance, Operations): Branded offer letters, proposals, and reports via python-docx. Budget workbooks via openpyxl. Generated on the backend, uploaded to SharePoint, download card in chat.
Email (all sites): Two-step workflow: agent drafts, human reviews in inline compose panel, then confirms Send. Never auto-sends.
Azure DevOps (Delivery and Technology only): Query and create work items, get sprint status.
Proactive alerts (all sites, handler-specific): Five APScheduler handlers: stalled deals, budget thresholds, expiring contracts, onboarding gaps, morning brief digest. Push to Teams channels and notification bell before anyone opens a browser.
MS Graph (all sites): Users, calendar, Teams channels, mail integration.
Analytics (all sites): Natural language reports, anomaly detection.
Knowledge (all sites): Federated search, expert finder.
Lessons from the Build
Every implementation surfaces lessons that inform the next one.
SharePoint field name complexity runs deeper than expected.
Display names and internal names diverge in non-obvious ways. "Status" might be stored as BudgetStatus. "Details" might be AnnouncementDetails. The initial deployment surfaced field name mismatches that cost debugging cycles. Our fix - a PowerShell schema export script that generates verified field mappings per site - is now a standard first step in our deployment methodology.
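The script itself is tailored per deployment, but a minimal sketch of the idea using PnP PowerShell looks like this (the site URL and output path are placeholders):

# Export display name -> internal name mappings for every visible list on a site
Connect-PnPOnline -Url "https://contoso.sharepoint.com/sites/sales" -Interactive

Get-PnPList | Where-Object { -not $_.Hidden } | ForEach-Object {
    $list = $_
    Get-PnPField -List $list |
        Select-Object @{n = "List"; e = { $list.Title } }, Title, InternalName, TypeAsString
} | Export-Csv -Path "sales-field-map.csv" -NoTypeInformation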
Not every data source needs an API layer.
Our first design had a full REST API wrapper around the SQL employee database. We replaced it with direct pyodbc queries wrapped in the MCP tool contract. Simpler, faster, easier to maintain.
The in-process MCP server will need extraction.
Running the tool server in-process was the right call for initial development - no serialisation overhead, easy debugging. But for production scale, tool execution needs to run independently so long-running operations (document generation, cross-site searches) do not block the API layer. The architecture was designed for this extraction - MCP's HTTP transport makes it a configuration change, not a rewrite - which is planned for the next phase.
Retry logic is now day-one infrastructure.
Claude's API occasionally returns transient errors under load. Without exponential backoff (1s, 2s, 4s), these surface as user-facing failures. Retry logic is now part of our standard agentic infrastructure from the start, based on this experience.
Production hardening - monitoring, observability, error recovery, and scale testing - deserves its own post. We will publish that next.
A Note on Portability
This case study uses SharePoint as the enterprise platform and Claude as the LLM, but the pattern is platform-agnostic. The same plugin-per-site architecture has been applied with Confluence, custom intranets, and internal portals. The frontend can be any web surface that supports a JavaScript embed - the SPFx panel is one implementation, not a requirement. The connectors, the plugin manifest pattern, the dual-auth model, and the session-scoped tool filtering are all transferable. If your organisation runs on a different stack, the architecture adapts to it.
Conclusion
The plugin-per-site pattern, dual-auth model, session-scoped tool filtering, and in-process MCP server described here form a reusable enterprise architecture - not a one-off implementation. Adding a new department means creating a folder with a JSON manifest and a few markdown skill files. No code changes, no redeployment. The system's behaviour is entirely determined by configuration, not by the codebase.
This architecture was designed and deployed by Binary Republik. It is adaptable to any organisation's departmental structure, data model, and governance requirements - and transferable to any enterprise platform with programmatic APIs.
If you have any questions you can reach out to our AI Consulting team here.
Choosing the right AWS compute service isn't just a technical decision - it directly affects your team's velocity, your operational costs, and how quickly your business can respond to change. Pick the wrong one and you'll either pay for complexity you don't need, or find yourself locked into a setup that can't scale with you.
EC2, ECS, and EKS are not competing services. They solve fundamentally different problems at different levels of abstraction. EC2 gives you a raw virtual machine with full infrastructure control. ECS is a managed container platform built natively on AWS - no Kubernetes required. EKS brings the full power of Kubernetes to AWS for teams that need portability and advanced orchestration at scale.
Here is what we cover:
What EC2, ECS, and EKS actually are - in plain terms
Key differences across abstraction, scaling, cost, and operational overhead
What each service costs, with real numbers from AWS's official pricing pages
Real-world use cases and business implications for each
A decision guide to help you pick the right one
Understanding the Basics
Amazon EC2 - Virtual Machines
EC2 is AWS's virtual machine service. You choose the operating system, CPU, memory, and storage - AWS provisions the server. Think of it as renting a physical server in the cloud. Your team owns everything from that point on: patching, scaling, monitoring, and security. Maximum control, maximum responsibility.
Business implication: EC2 requires engineering time to manage and maintain. That time has a cost. It is best suited for workloads where your team needs deep infrastructure control, or for migrating existing applications that weren't built for containers.
Amazon ECS - Managed Containers
ECS is AWS's native container orchestration service. You define your Docker image, CPU and memory requirements, and scaling rules. AWS handles the rest - scheduling, cluster management, health checks, and integrations with AWS services like Application Load Balancer, IAM, and CloudWatch. No Kubernetes knowledge required.
Business implication: ECS reduces the operational surface area significantly. Smaller teams can run production container workloads without dedicated DevOps headcount. It is AWS's recommended starting point for teams new to containers.
Amazon EKS - Managed Kubernetes
EKS runs Kubernetes on AWS. AWS manages the control plane - the brain of the cluster - while you manage the worker nodes, or offload them to AWS Fargate for a fully serverless setup. Kubernetes is the industry-standard container orchestration platform, and EKS makes it available as a managed service.
Business implication: EKS is the most powerful and flexible option, but it brings the highest operational complexity and cost. It pays off at scale, for teams running complex microservices architectures, or for organisations that want the option to run workloads across multiple cloud providers.
Key Differences at a Glance
Abstraction Level - EC2: Low (Infrastructure) · ECS: Medium (Platform) · EKS: High (Ecosystem)
Primary Unit - EC2: VM / Server · ECS: Container Task · EKS: Pod
Setup Complexity - EC2: High · ECS: Low · EKS: Very High
Scaling Speed - EC2: Slow (VM boot) · ECS: Fast · EKS: Fast
OS / Host Access - EC2: Full · ECS: None · EKS: Limited
Kubernetes Support - EC2: No · ECS: No · EKS: Yes
Multi-cloud Portability - EC2: Low · ECS: Low · EKS: High
Operational Overhead - EC2: High · ECS: Medium · EKS: Very High
Learning Curve - EC2: Low–Medium · ECS: Low · EKS: High
Relative Cost (entry) - EC2: Low–Medium · ECS: Medium · EKS: High
Best for - EC2: Legacy apps, full control · ECS: AWS-native container apps · EKS: Complex microservices, multi-cloud
Cost Breakdown - With Real Numbers
Cost is one of the most common decision factors, and it is also one of the most misunderstood. The sticker price of compute is only part of the picture. Operational overhead - the engineering time spent managing infrastructure - is a real cost that doesn't show up on your AWS bill but does show up on your payroll.
EC2
With EC2, you pay for instance uptime. AWS offers four main pricing models: On-Demand (pay by the second, no commitment), Reserved Instances (commit to 1 or 3 years for up to 72% off On-Demand rates, per official AWS pricing), Savings Plans (flexible commitment-based discounts, AWS's currently recommended approach over Reserved Instances), and Spot Instances (spare capacity at up to 90% off, but AWS can reclaim with two minutes notice). Most production workloads that run around the clock should be on Reserved Instances or a Savings Plan - running purely On-Demand for steady-state workloads leaves significant money on the table.
Business implication: EC2 can be the most cost-efficient option for stable, predictable workloads - but only if your team actively manages instance sizing and pricing commitments. Teams that over-provision and forget tend to pay more than they should.
ECS
ECS has no additional management fee. You pay for the underlying compute - either EC2 instances you manage yourself, or AWS Fargate (fully serverless, where AWS manages the infrastructure). With Fargate on ECS, billing is per second based on the vCPU and memory your containers actually use. As a reference, in the US East (N. Virginia) region, Fargate charges approximately $0.04048 per vCPU-hour and $0.004445 per GB-hour for memory, per AWS's official Fargate pricing page. A container running 2 vCPUs and 4 GB of RAM costs roughly $0.099 per hour, or around $72 per month running continuously.
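As a quick sanity check on those numbers (using the US East rates quoted above and approximating a month as 730 hours):

vcpu_rate = 0.04048   # USD per vCPU-hour (Fargate, us-east-1)
mem_rate = 0.004445   # USD per GB-hour
hourly = 2 * vcpu_rate + 4 * mem_rate   # 2 vCPU, 4 GB task
monthly = hourly * 730                  # running continuously
print(f"{hourly:.4f} USD/hour, {monthly:.0f} USD/month")  # ~0.0987 USD/hour, ~72 USD/month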
Business implication: Fargate on ECS is often the most cost-effective choice for bursty, unpredictable, or variable traffic patterns - you pay only for what you use, and there is no idle compute cost. For high-volume, steady-state workloads, EC2-backed ECS can be cheaper.
EKS
EKS has a fixed control plane fee of $0.10 per cluster per hour - approximately $73 per month per cluster - regardless of cluster size or workload, per AWS's official EKS pricing page. This is just the management fee; worker node compute (EC2 or Fargate), storage, load balancers, and data transfer are all billed separately. Teams running dev, staging, and production environments on separate clusters pay this fee three times minimum. Additionally, if a Kubernetes version ages past its 14-month standard support window without being upgraded, the fee jumps to $0.60 per hour - a 6x increase that catches many teams by surprise.
Business implication: EKS is expensive at small scale and cost-efficient at large scale, where advanced resource scheduling and bin-packing justify the overhead. For small to medium workloads, ECS on Fargate will almost always be the cheaper option. The $73/month control plane fee is a fixed entry cost per cluster - for multi-cluster strategies, it adds up quickly.
Real-World Use Cases
Use EC2 when:
You are migrating a legacy or monolithic application that was not built for containers
Your application requires custom OS-level configuration, kernel parameters, or specific hardware access
Your team needs full visibility and control over the underlying server environment
You have predictable, steady-state workloads and want to maximise savings through Reserved Instances or Savings Plans
Business implication: EC2 is the right lift-and-shift vehicle. It minimises application changes during migration and gives your team time to modernise at their own pace. The trade-off is that it demands the most ongoing engineering attention.
Use ECS when:
You want to run containers on AWS without learning Kubernetes
You need fast, reliable deployments with minimal DevOps overhead
Your workloads are bursty or variable and Fargate's pay-per-use model suits your traffic patterns
Your team is AWS-first and wants tight, native integration with IAM, CloudWatch, and ALB
Business implication: ECS on Fargate is the fastest path to production for container workloads. It requires the least infrastructure expertise, no cluster maintenance, and scales automatically. It is a strong default choice for startups, product teams, and organisations without dedicated platform engineering.
Use EKS when:
Your team already uses Kubernetes and has the expertise to operate it
You need multi-cloud portability - the ability to run the same workloads on AWS, GCP, or Azure
You are running complex microservices that benefit from Kubernetes-native features like Horizontal Pod Autoscaler, custom scheduling, or service meshes
You are operating at a scale where advanced resource optimisation (bin-packing, Spot node groups, Karpenter) delivers meaningful cost savings
Business implication: EKS is a long-term platform investment. It requires Kubernetes expertise to operate well - either in-house or through a managed services partner. The payoff is flexibility, portability, and the ability to run sophisticated workloads that outgrow what ECS can offer.
Quick Decision Guide
Two questions cut through most of the noise:
Do you need Kubernetes?
Yes - your team uses it already, or you need multi-cloud portability → EKS
No → Move to question 2
Do you want containers?
Yes → ECS
No, or you have a legacy app to migrate → EC2
Small team, new to containers → ECS (Fargate) - lowest ops overhead, no cluster to manage
Legacy app migration → EC2 - minimal app changes, full control
Startup scaling fast on AWS → ECS (Fargate) - fast deployments, pay-as-you-go, AWS-native
Enterprise microservices → EKS - advanced orchestration, multi-team platform
Existing Kubernetes users → EKS - familiar tooling, avoid re-platforming cost
Multi-cloud strategy → EKS - Kubernetes runs anywhere, avoids AWS lock-in
Predictable, high-volume compute → EC2 (Reserved / Savings Plan) - up to 72% savings over On-Demand with commitment
Conclusion
EC2, ECS, and EKS are tools for different jobs at different stages of growth. The right choice depends on your team's skills, your application's architecture, your traffic patterns, and your budget - both the AWS bill and the engineering time to manage it.
Want simplicity and speed? → ECS on Fargate
Have a legacy app to migrate? → EC2
Need Kubernetes power and portability? → EKS
ECS on Fargate handles a huge range of production workloads reliably and cost-effectively - and it is far easier to migrate to EKS later than to unwind unnecessary Kubernetes complexity from day one.
Azure Key Vault is a cloud service that provides a secure and centralized way to store and manage secrets, keys, and certificates used by applications and services. It helps teams avoid hardcoding sensitive values like API keys, connection strings, or passwords directly into code or configuration files.
In this guide, you will learn how to update and retrieve secrets from Azure Key Vault using the REST API - a useful approach for automation scripts, CI/CD pipelines, and external integrations where using an SDK is not preferred or available.
Prerequisites
An Azure Key Vault - if you don't already have one, create it from the Azure portal.
At least one secret inside the Key Vault - click Generate/Import inside the vault to create your first secret.
Enable Azure RBAC on the Key Vault (Required)
Azure Key Vault supports two permission models: Vault Access Policy (legacy) and Azure RBAC. To use IAM role assignments (like Key Vault Secrets Officer), your Key Vault must have Azure RBAC enabled. Without this, role assignments won't grant access to secrets.
For a new Key Vault: During creation, go to the Access configuration tab and under Permission model, select Azure role-based access control (RBAC).
For an existing Key Vault:
Open your Key Vault in the Azure Portal.
Go to Settings → Access configuration.
Under the Permission model, select Azure role-based access control.
Click Save.
Important: If you switch an existing Key Vault from Vault Access Policy to Azure RBAC, all previously configured access policies will stop working. Make sure you reassign equivalent Azure roles before or immediately after switching.
Create an Azure AD App Registration (Required)
To access Key Vault through the REST API, you must authenticate with an Azure AD application. If you don't have one yet, create it in the Azure Portal under Microsoft Entra ID → App registrations → New registration.
Assign API Permissions
Go to: API Permissions → Add Permission → Azure Key Vault → Delegated Permissions.
Select: user_impersonation
Then click Grant Admin consent.
Create a Client Secret
In the App Registration:
Go to Certificates & Secrets
Click New client secret
Copy the generated secret value (you will need it in API calls)
Copy the Client ID and Tenant ID
From the Overview page of your App Registration, copy:
Client ID (Application ID)
Tenant ID (Directory ID)
Assign IAM Role on the Key Vault
To allow the App Registration to get or update secrets, assign it one of the following roles:
Key Vault Secrets Officer OR Key Vault Administrator
Path: Key Vault → Access control (IAM) → Add Role Assignment
Select the role and assign it to your App Registration.
Generate an Access Token
Before calling the Key Vault REST API, you must generate an OAuth 2.0 access token.
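As a minimal sketch, the token request and the two Key Vault calls look like this in Python. Tenant, client, vault, and secret names are placeholders, and 7.4 is one current Secrets API version; this example uses the client credentials flow, which suits unattended automation once the app has an RBAC role on the vault.

import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
VAULT = "https://<your-vault-name>.vault.azure.net"   # placeholder vault URL

# 1. Get an OAuth 2.0 token scoped to Key Vault
token_resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://vault.azure.net/.default",
    },
)
headers = {"Authorization": f"Bearer {token_resp.json()['access_token']}"}

# 2. Retrieve the latest version of a secret
get_resp = requests.get(f"{VAULT}/secrets/MySecret?api-version=7.4", headers=headers)
print(get_resp.json().get("value"))

# 3. Update (set) the secret - Key Vault creates a new version
put_resp = requests.put(
    f"{VAULT}/secrets/MySecret?api-version=7.4",
    headers=headers,
    json={"value": "new-secret-value"},
)
print(put_resp.json().get("id"))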
With these steps, you can easily authenticate through Azure AD, retrieve secrets, and update values in Azure Key Vault using REST API calls. This approach is beneficial for automation, CI/CD pipelines, and external integrations where SDKs are not preferred.
It's 2 a.m. Your e-commerce platform just hit the front page of a major news site. Your SQS order queue skyrockets from 200 to 200,000 messages in minutes. Pods are overwhelmed. Customers see errors. Your on-call engineer is manually scaling deployments.
This is exactly the failure mode that event-driven scaling prevents. By combining KEDA (Kubernetes Event-Driven Autoscaler) and Karpenter, you can build a platform that reacts to demand automatically - scaling pods and nodes in seconds, then returning to zero when load disappears - all without human intervention.
Why Kubernetes HPA Falls Short for Event-Driven Workloads
Kubernetes' built-in Horizontal Pod Autoscaler (HPA) is often configured with CPU and memory metrics. While HPA does technically support custom and external metrics through the Kubernetes metrics API, wiring it up to event sources like SQS queue depth requires additional metrics adapters and careful configuration. Even then, a deeper problem remains: HPA reacts to metrics after the fact. For event-driven workloads, this creates a dangerous lag:
An SQS queue floods with messages; pods start struggling
CPU climbs - HPA detects the spike after a 15s scrape and sync cycle
New pods are scheduled, but nodes are full - they sit Pending
By then, the queue has grown by tens of thousands of messages
KEDA: Scale Pods on Real Events
KEDA is a CNCF Graduated project that extends Kubernetes to scale workloads based on external event sources - SQS, Kafka, Prometheus, DynamoDB Streams, and 70+ built-in scalers. It installs as a lightweight operator and works alongside your existing setup, connecting workloads directly to event sources without requiring custom metrics adapters.
ScaledObject - scales an existing Deployment or StatefulSet up and down based on event metrics (used in the example below)
ScaledJob - spawns individual Kubernetes Jobs per event (ETL, ML inference, video transcoding)
Installing KEDA via Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
--namespace keda --create-namespace
Configuring AWS Authentication via TriggerAuthentication
The recommended approach for AWS authentication is a TriggerAuthentication resource using EKS Pod Identity or IRSA. The older identityOwner field on the scaler itself was deprecated in KEDA v2.13 and will be removed in v3 - avoid teaching or using it in new deployments.
First, create a TriggerAuthentication that references your KEDA operator's IAM role via pod identity:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-auth
  namespace: order-processing
spec:
  podIdentity:
    provider: aws        # Uses EKS Pod Identity (recommended)
    # provider: aws-eks  # Use this if still on IRSA
ScaledObject Example: SQS Queue Depth
This scales an order-processing Deployment to maintain 10 messages per pod. With 500 messages, KEDA targets 50 pods - capped at maxReplicaCount.
Production note on in-flight messages: By default, KEDA's SQS scaler counts both ApproximateNumberOfMessages (queued) and ApproximateNumberOfMessagesNotVisible (in-flight / being processed). This means pods processing messages are included in the scaling calculation, which is usually the right behaviour. However, if your workers have long processing times or you see unexpected scale-down events mid-processing, tune scaleOnInFlight, your SQS visibility timeout, and your worker shutdown handling carefully - and ensure a Dead Letter Queue is configured to catch messages that fail repeatedly.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
  namespace: order-processing       # Same namespace as the Deployment and TriggerAuthentication
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15               # Check queue every 15s
  cooldownPeriod: 60                # Wait 60s before scaling down
  minReplicaCount: 0                # Scale to zero when idle
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-auth         # References TriggerAuthentication above
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/orders
        queueLength: '10'           # Target messages-per-pod ratio
        awsRegion: us-east-1
        scaleOnInFlight: 'true'     # Default: true. Set false to exclude in-flight messages
Karpenter: Right-Sized Nodes, Right Now
When KEDA scales your pods, those pods need nodes to land on. Karpenter watches for Pending pods, then automatically provisions the optimal EC2 instance type to satisfy them - typically in under 60 seconds. It also continuously bin-packs workloads and terminates underutilized nodes.
Karpenter vs. Cluster Autoscaler
Provisioning Speed - Cluster Autoscaler: 3–5+ minutes · Karpenter: typically 30–60 seconds
Instance Selection - Cluster Autoscaler: pre-configured ASG groups · Karpenter: dynamic, picks optimal type per workload
Spot Support - Cluster Autoscaler: manual node group setup · Karpenter: native, single NodePool
Node Consolidation - Cluster Autoscaler: limited · Karpenter: automatic bin-packing
NodePool Configuration
The NodePool resource defines what Karpenter is allowed to provision. The example below targets the stable karpenter.sh/v1 API (available from Karpenter v1.0+) and configures a mixed Spot/On-Demand pool for batch workloads. Note that in v1, the consolidation policy is named WhenEmptyOrUnderutilized (renamed from WhenUnderutilized in v1beta1), and consolidateAfter is now supported alongside it and is required.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch-workers
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: batch-workers
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot', 'on-demand']     # Spot-first, On-Demand fallback
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm', 'r']           # Compute, general, memory families
        - key: kubernetes.io/arch
          operator: In
          values: ['amd64', 'arm64']        # Support Graviton for savings
  limits:
    cpu: 1000                               # Safety cap on total cluster cost
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # v1 name (was WhenUnderutilized in v1beta1)
    consolidateAfter: 30s                          # Required in v1; set to 0s for immediate
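The NodePool above references an EC2NodeClass named batch-workers, which holds the AWS-specific settings - AMI selection, node IAM role, subnet and security group discovery, and the cost-allocation tags recommended later. A minimal sketch against the karpenter.k8s.aws/v1 API; the cluster name, role, and discovery tags are placeholders:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: batch-workers
spec:
  amiSelectorTerms:
    - alias: al2023@latest                 # Amazon Linux 2023, latest AMI
  role: KarpenterNodeRole-my-cluster       # IAM role for provisioned nodes (placeholder)
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster # Placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  tags:
    environment: production
    team: platform
    cost-center: batch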
End-to-End Architecture Flow
When an SQS burst hits, the full scale-up sequence - from event arrival to active pod processing - completes in roughly one to two minutes in a well-tuned cluster. Actual time depends on image pull speed, node bootstrap, daemonset startup, and workload readiness. Here is the sequence:
1. KEDA polls the queue every 15s, calculates the required pod count, and updates the Deployment replica target
2. New pods are created - many land in Pending state (no capacity yet)
3. Karpenter detects the Pending pods, selects optimal EC2 instance types, and launches Spot nodes - typically in under 60s
4. Nodes join the cluster; pods are scheduled and begin processing messages
5. Queue drains → KEDA scales pods to 0 → Karpenter terminates idle nodes → worker compute cost drops to zero
Production Best Practices
KEDA
Always set maxReplicaCount to guard against runaway scaling from a misconfigured scaler
Use cooldownPeriod: 60–120s to prevent scale-down thrashing near zero
Authenticate via TriggerAuthentication with podIdentity.provider: aws - the identityOwner field on the scaler is deprecated since v2.13 and will be removed in KEDA v3
Set scaleDown.stabilizationWindowSeconds to smooth out spiky workloads
For SQS workers, configure visibility timeout, scaleOnInFlight, and graceful shutdown carefully - and always attach a Dead Letter Queue to catch failed messages
Test scale-to-zero in staging - some apps have cold-start latency that affects first-message SLA
Karpenter
Use the stable karpenter.sh/v1 API - v1beta1 is supported but planned for deprecation
Use consolidationPolicy: WhenEmptyOrUnderutilized (the v1 name; WhenUnderutilized from v1beta1 is renamed)
Specify multiple instance families (c, m, r) so Karpenter can find available Spot capacity
Set consolidateAfter explicitly - it is required in v1 when using WhenEmptyOrUnderutilized; use 0s for the same behaviour as v1beta1
Include arm64 (Graviton) in your NodePool - AWS Graviton instances cost up to 20% less per hour than comparable x86 instances, with equal or better performance for most cloud-native workloads
Set cpu and memory limits on the NodePool as a hard cost cap
Tag all EC2NodeClass nodes with environment, team, and cost-center for AWS Cost Explorer analysis
Observing the Stack in Production
With two autoscalers operating in tandem, visibility across KEDA, Karpenter, SQS, and EC2 simultaneously is what separates a smooth on-call experience from a painful one. When something goes wrong - pods not scaling, nodes not terminating, queue backing up - you need correlated signals from all layers at once.
Expose KEDA's /metrics endpoint to Prometheus - scaler values, replica counts, and error rates are all there
Use CloudWatch Container Insights for correlated node + pod metrics
Alert on SQS ApproximateAgeOfOldestMessage to catch backlogs before they compound
Dashboard pod count (KEDA) and node count (Karpenter) together - a node spike without pods often means a misconfigured NodePool
Monitor SQS NumberOfMessagesMoved on your Dead Letter Queue - a rising DLQ count signals worker failures that scaling alone cannot fix
Conclusion
KEDA and Karpenter together eliminate the manual scaling work that falls on your on-call engineer at the worst possible moment - scaling pods from real event signals, provisioning the right nodes in seconds, and returning to zero when load clears. Getting the details right (authentication, API versions, SQS in-flight behaviour, consolidation policy) is what makes this stack hold up under pressure in production.
If you have any questions or need help implementing this on your platform, you can reach out to our DevOps & Cloud Engineering team here.
Binary Republik is a global technology consulting and development company, specializing in Microsoft technologies and AI. With deep expertise in SharePoint, Office 365, Power Platform, Azure, and artificial intelligence, we help organizations worldwide transform productivity, collaboration, and business outcomes through custom-built solutions. Our expert team delivers end-to-end consulting, AI integration, migration, and application development to meet diverse business needs across Microsoft 365, SharePoint, and emerging AI technologies.