SharePoint Consulting Blog

May 7, 2026

Building Site-Aware Enterprise AI Agents on Microsoft 365 Using Claude Agent SDK

Introduction

An enterprise running seven department-specific SharePoint intranet sites needed AI that could actually operate within their systems - not just answer questions about them. Here is what a single task looked like before we got involved - and after.

Before - Sales proposal workflow	After - same task, one sentence
Open Pipeline Tracker on SharePoint - find the deal	Type: "Draft a proposal for XYZ Corp's cloud migration deal and send it to their CTO."
Switch to Client Contacts - find the CTO's email	Agent queries Pipeline Tracker - pulls deal value, stage, scope notes
Open Word - hunt for the proposal template on the shared drive	Agent looks up CTO in Client Contacts - name, email, title
Manually fill in deal value, scope, and terms	Branded proposal generated from python-docx template and uploaded to Sales Collateral
Save, switch back to SharePoint, upload to Sales Collateral library	Email compose panel opens - pre-filled with CTO's address, subject, and body
Open Outlook - type the CTO's email, write the body, attach, send	Review, tweak one line, hit Send
40 minutes · 6 applications · 1 task	One sentence · Five systems · Done

We assessed the organisation's requirements against Copilot Cowork and Claude Cowork. Both are genuinely capable products - but neither could query custom SharePoint list schemas, connect to a SQL employee database, generate documents from branded templates, or switch to a different specialist persona depending on which department site the user was on. They needed something those products are structurally not built to do.

Same panel. Same backend. Same LLM. Completely different specialist per site, loaded at runtime from a JSON manifest. This is what we built. Here is how:

The Architecture in One Sentence

A single SPFx floating panel on every SharePoint page sends messages to a FastAPI backend, which enters a Claude agentic loop with a filtered set of MCP tools - filtered by which site the user is on, what they are allowed to do, and which department's data model applies. Each site loads its own agent persona, slash commands, and skill files - deep knowledge modules that teach the agent how a specific department's data is structured, what business rules apply, and which workflows exist.

System Architecture - SPFx panel → FastAPI backend → Claude agentic loop → MCP server → enterprise data sources

That sentence hides a lot of machinery. Let me unpack it through the folder structure, because the folder structure is the architecture.

The Folder Structure That Makes It Work

Backend/
├── api/
│ ├── main.py                    ← FastAPI app, startup, CORS
│ └── routers/
│ └── agent_chat.py          ← The agentic loop: SSE streaming, tool orchestration
│
├── core/
│ ├── mcp/
│ │ ├── server.py              ← In-process MCP server (58 tool routes)
│ │ ├── builder.py             ← Tool definitions with input schemas
│ │ └── tools/                 ← Tool implementations by connector type
│ ├── auth/                      ← OBO token exchange + certificate auth
│ ├── session/                   ← Session resolver (who, where, what permissions)
│ ├── registry/                  ← Plugin auto-discovery at startup
│ ├── audit/                     ← Write-operation audit logger
│ ├── alerts/handlers/           ← Proactive alert checks (5 built-in)
│ └── notifications/             ← Custom notification rules engine
│
├── connectors/                    ← 9 connector modules
│ ├── sharepoint/                ← List CRUD, library browse, search, upload
│ ├── msgraph/                   ← Users, calendar, mail, Teams channels
│ ├── ems/                       ← Direct SQL: employee lookup, org chart, leave
│ ├── email/                     ← Draft + send via Graph
│ ├── docgen/                    ← python-docx reports, offer letters, proposals
│ ├── azdevops/                  ← Work items, sprints
│ ├── analytics/                 ← NL reports, anomaly detection
│ ├── knowledge/                 ← Federated search, expert finder
│ └── automation/                ← Notification rule CRUD
│
├── plugins/                       ← THIS IS THE KEY DIRECTORY
│ ├── SALES/
│ │ ├── .claude-plugin/
│ │ │ └── plugin.json        ← Manifest: persona, tools, commands, skills, schemas
│ │ ├── agents/                ← Persona markdown files
│ │ ├── commands/              ← Slash command definitions
│ │ └── skills/                ← Domain knowledge modules
│ ├── FINANCE/                   ← Same structure, different specialist
│ ├── PEOPLE/                    ← Same structure, different specialist
│ ├── DELIVERY/                  ← Same structure, different specialist
│ ├── TECHNOLOGY/                ← Same structure, different specialist
│ ├── OPERATIONS/                ← Same structure, different specialist
│ └── CORE/                      ← Same structure, different specialist
│
└── db/migrations/                 ← 6 SQL migrations (sessions, audit, alerts)

Every architectural decision is visible in this tree. Let me walk through the three that matter most.

Decision 1: The Plugin Manifest

When a user opens the Sales site and the panel initialises, the backend resolves their session: who are they (from the OBO token), which site are they on (from the site_url passed by the frontend), and what can they do (from the plugin manifest). The manifest is a single JSON file:

{
"name": "sales",
"display_name": "Sales Assistant",
"entry_agent": "agents/sales-assistant.md",
"connectors": ["sharepoint", "msgraph", "email", "docgen", "ems",
"knowledge", "analytics", "automation"],
"commands": ["commands/pipeline-report.md", "commands/add-lead.md", ...],
"skills": ["skills/pipeline-management/SKILL.md",
"skills/client-engagement/SKILL.md", ...],
"sharepoint_lists": {
"lists": {
"Pipeline Tracker": "columns: OpportunityName, DealValue, PipelineStage, ..."
}
}
}

This manifest does five things at session start:

It loads a persona.

The entry_agent points to a markdown file that defines the agent's personality, role, behaviour rules, and domain expertise. The Sales agent is "friendly, professional, and results-driven." The Finance agent is precise and compliance-aware. The People agent is warm and policy-oriented. Same LLM, different specialist.

It filters the tool palette.

The connectors array controls which of the 58 MCP tools appear in Claude's tool list for this session. Sales gets SharePoint, Graph, Email, DocGen, EMS, Knowledge, Analytics, and Automation - but not Azure DevOps. Delivery gets Azure DevOps too. Technology gets everything. The LLM only sees tools it is allowed to use.

It loads skill files.

This is where the deep domain knowledge lives. Each skill is a markdown file that describes how a specific aspect of the department works. The Sales plugin has skills for pipeline management, client engagement, and competitive intelligence. A skill might describe how the Pipeline Tracker list is structured, what each column means, which OData filters produce useful results, what "stalled deal" means in this organisation's context, and what steps the agent should follow for a pipeline review. Skills are loaded into the system prompt based on the site and the user's query - they are the agent's domain training, delivered at runtime through text, not fine-tuning.

It injects list schemas.

The sharepoint_lists block is injected into the system prompt. This is how the agent knows the column is called PipelineStage, not Stage. It knows DealValue is the field for revenue, not Value or Amount. It never guesses. If a column is not in the manifest, the agent cannot reference it.

It registers slash commands.

Each command is a markdown file with trigger phrases, required parameters, the agent to hand off to, and step-by-step instructions. When a user types /pipeline-report, the command's markdown gets injected into the system prompt for that turn.

Plugin-Per-Site Architecture - 7 sites, auto-discovered, manifest-driven, session-filtered

The consequence of this design is that adding a new department - an eighth site, a regional hub, a project workspace - requires creating one folder with one JSON manifest and a few markdown files. No code changes. No backend redeployment. The registry discovers it on next restart.

The key insight: The plugin folder is the architecture. Not the LLM, not the framework, not the cloud infrastructure. The entire system's behaviour - which specialist responds, which tools are available, which data is accessible, which commands exist, which domain knowledge the agent draws on - is determined by which plugin.json file gets loaded and which skill files sit alongside it. Everything else is shared plumbing. This is what makes the system maintainable at scale: changing a department's AI behaviour is editing a JSON file and a few markdown documents, not shipping code.

Decision 2: Dual-Auth and the Invisible Session

Authentication is where most custom AI implementations get it wrong. Either they use the user's token for everything (writes are unauditable and tied to individual permissions) or they use an app-only token for everything (losing the identity context of who asked for the action).

We split it:

Reads use the user's delegated OBO token.

When the agent queries a SharePoint list or fetches calendar events, it does so as the user. If the user cannot see a site, neither can the agent. The existing M365 permission model is preserved - no data leakage, no privilege escalation.

Writes use an app-only certificate token.

All write operations - list item creation, document upload, email send - execute through a controlled service identity, not the user's token.
Every write is routed through an audit logger: who requested it, what changed, tool name, payload, status, and duration.
site_url, OBO token, app-only token, and the user's permission set are never parameters the LLM sees - they are injected from the session object before the agentic loop begins.

In the initial version, SharePoint tools accepted site_url as a tool parameter. During early integration testing, Claude hallucinated a URL - it substituted the Sales site URL when a user on the Finance site asked about budget items. The query returned nothing. The agent confidently reported "no budget items found." It was a silent, plausible failure - the worst kind. We refactored site_url out of every tool interface within the day and moved it to session-level injection. The LLM never sees it, never chooses it, never hallucinates it.

If the LLM does not need a value to reason about the task, do not put it in the tool interface. Every parameter visible to the LLM is a surface for hallucination. Session-level injection eliminates that surface.

Decision 3: The Agentic Loop

When a user sends a message, the backend does not call Claude once and return the answer. It enters a loop:

Agentic Loop - Request Lifecycle with tool execution feedback loop

Multi-step tool chains. A single message like "find stalled deals, draft follow-ups, and notify the sales lead" triggers five sequential tool calls - SharePoint OData query, EMS employee lookup, three email draft preparations - each feeding the next decision.
Real-time status streaming. Each tool call emits an SSE event to the frontend ("Fetching list data: Pipeline Tracker...", "Looking up employee...") so the user sees the agent working, not a loading spinner.
Model is a config variable. Claude Sonnet is the default - fast enough for real-time streaming, capable enough for multi-step orchestration. Swapping models is a single environment variable change; the architecture is not coupled to any provider.
We tested Azure OpenAI + Semantic Kernel. The loop runs and tools get called. But in head-to-head testing, Claude produced fewer hallucinated tool parameters, followed complex multi-step prompts more faithfully, and handled branching logic - where a tool result determines whether to call the next tool or skip to a different action - more consistently. That is why it is our default.

What the Agent Can Actually Do

58 tools across 9 connectors. Here are the ones that matter most to daily operations:

Connector	What it does	Available on
SharePoint	Read list items with OData filters, create and update items, browse document libraries, upload files, search across the site. Primary data layer.	All sites
EMS	Direct SQL via pyodbc against the HR database: employee lookup, org chart traversal, leave balances, skills search, project assignments, team allocation, capacity planning, budget tracking. No REST wrapper.	All sites
Document generation	Branded offer letters, proposals, and reports via python-docx. Budget workbooks via openpyxl. Generated on the backend, uploaded to SharePoint, download card in chat.	Sales, People, Finance, Operations
Email	Two-step workflow: agent drafts, human reviews in inline compose panel, then confirms Send. Never auto-sends.	All sites
Azure DevOps	Query and create work items, get sprint status.	Delivery, Technology only
Proactive alerts	Five APScheduler handlers: stalled deals, budget thresholds, expiring contracts, onboarding gaps, morning brief digest. Push to Teams channels and notification bell before anyone opens a browser.	All sites (handler-specific)
MS Graph	Users, calendar, Teams channels, mail integration.	All sites
Analytics	Natural language reports, anomaly detection.	All sites
Knowledge	Federated search, expert finder.	All sites

Lessons from the Build

Every implementation surfaces lessons that inform the next one.

SharePoint field name complexity runs deeper than expected.

Display names and internal names diverge in non-obvious ways. "Status" might be stored as BudgetStatus. "Details" might be AnnouncementDetails. The initial deployment surfaced field name mismatches that cost debugging cycles. Our fix - a PowerShell schema export script that generates verified field mappings per site - is now a standard first step in our deployment methodology.

Not every data source needs an API layer.

Our first design had a full REST API wrapper around the SQL employee database. We replaced it with direct pyodbc queries wrapped in the MCP tool contract. Simpler, faster, easier to maintain.

The in-process MCP server will need extraction.

Running the tool server in-process was the right call for initial development - no serialisation overhead, easy debugging. But for production scale, tool execution needs to run independently so long-running operations (document generation, cross-site searches) do not block the API layer. The architecture was designed for this extraction - MCP's HTTP transport makes it a configuration change, not a rewrite - which is planned for the next phase.

Retry logic is now day-one infrastructure.

Claude's API occasionally returns transient errors under load. Without exponential backoff (1s, 2s, 4s), these surface as user-facing failures. Retry logic is now part of our standard agentic infrastructure from the start, based on this experience.

Production hardening - monitoring, observability, error recovery, and scale testing - deserves its own post. We will publish that next.

A Note on Portability

This case study uses SharePoint as the enterprise platform and Claude as the LLM, but the pattern is platform-agnostic. The same plugin-per-site architecture has been applied with Confluence, custom intranets, and internal portals. The frontend can be any web surface that supports a JavaScript embed - the SPFx panel is one implementation, not a requirement. The connectors, the plugin manifest pattern, the dual-auth model, and the session-scoped tool filtering are all transferable. If your organisation runs on a different stack, the architecture adapts to it.

Conclusion

The plugin-per-site pattern, dual-auth model, session-scoped tool filtering, and in-process MCP server described here form a reusable enterprise architecture - not a one-off implementation. Adding a new department means creating a folder with a JSON manifest and a few markdown skill files. No code changes, no redeployment. The system's behaviour is entirely determined by configuration, not by the codebase.

This architecture was designed and deployed by Binary Republik. It is adaptable to any organisation's departmental structure, data model, and governance requirements - and transferable to any enterprise platform with programmatic APIs.

If you have any questions you can reach out to our AI Consulting team here.

AWS Compute Services Explained: EC2 vs ECS vs EKS

Introduction

Choosing the right AWS compute service isn't just a technical decision - it directly affects your team's velocity, your operational costs, and how quickly your business can respond to change. Pick the wrong one and you'll either pay for complexity you don't need, or find yourself locked into a setup that can't scale with you.

EC2, ECS, and EKS are not competing services. They solve fundamentally different problems at different levels of abstraction. EC2 gives you a raw virtual machine with full infrastructure control. ECS is a managed container platform built natively on AWS - no Kubernetes required. EKS brings the full power of Kubernetes to AWS for teams that need portability and advanced orchestration at scale.

Here is what we cover:

What EC2, ECS, and EKS actually are - in plain terms
Key differences across abstraction, scaling, cost, and operational overhead
What each service costs, with real numbers from AWS's official pricing pages
Real-world use cases and business implications for each
A decision guide to help you pick the right one

Understanding the Basics

Amazon EC2 - Virtual Machines

EC2 is AWS's virtual machine service. You choose the operating system, CPU, memory, and storage - AWS provisions the server. Think of it as renting a physical server in the cloud. Your team owns everything from that point on: patching, scaling, monitoring, and security. Maximum control, maximum responsibility.

Business implication: EC2 requires engineering time to manage and maintain. That time has a cost. It is best suited for workloads where your team needs deep infrastructure control, or for migrating existing applications that weren't built for containers.

Amazon ECS - Managed Containers

ECS is AWS's native container orchestration service. You define your Docker image, CPU and memory requirements, and scaling rules. AWS handles the rest - scheduling, cluster management, health checks, and integrations with AWS services like Application Load Balancer, IAM, and CloudWatch. No Kubernetes knowledge required.

Business implication: ECS reduces the operational surface area significantly. Smaller teams can run production container workloads without dedicated DevOps headcount. It is AWS's recommended starting point for teams new to containers.

Amazon EKS - Managed Kubernetes

EKS runs Kubernetes on AWS. AWS manages the control plane - the brain of the cluster - while you manage the worker nodes, or offload them to AWS Fargate for a fully serverless setup. Kubernetes is the industry-standard container orchestration platform, and EKS makes it available as a managed service.

Business implication: EKS is the most powerful and flexible option, but it brings the highest operational complexity and cost. It pays off at scale, for teams running complex microservices architectures, or for organisations that want the option to run workloads across multiple cloud providers.

Key Differences at a Glance

Feature	EC2	ECS	EKS
Abstraction Level	Low (Infrastructure)	Medium (Platform)	High (Ecosystem)
Primary Unit	VM / Server	Container Task	Pod
Setup Complexity	High	Low	Very High
Scaling Speed	Slow (VM boot)	Fast	Fast
OS / Host Access	Full	None	Limited
Kubernetes Support	No	No	Yes
Multi-cloud Portability	Low	Low	High
Operational Overhead	High	Medium	Very High
Learning Curve	Low–Medium	Low	High
Relative Cost (entry)	Low–Medium	Medium	High
Best for	Legacy apps, full control	AWS-native container apps	Complex microservices, multi-cloud

Cost Breakdown - With Real Numbers

Cost is one of the most common decision factors, and it is also one of the most misunderstood. The sticker price of compute is only part of the picture. Operational overhead - the engineering time spent managing infrastructure - is a real cost that doesn't show up on your AWS bill but does show up on your payroll.

EC2

With EC2, you pay for instance uptime. AWS offers four main pricing models: On-Demand (pay by the second, no commitment), Reserved Instances (commit to 1 or 3 years for up to 72% off On-Demand rates, per official AWS pricing), Savings Plans (flexible commitment-based discounts, AWS's currently recommended approach over Reserved Instances), and Spot Instances (spare capacity at up to 90% off, but AWS can reclaim with two minutes notice). Most production workloads that run around the clock should be on Reserved Instances or a Savings Plan - running purely On-Demand for steady-state workloads leaves significant money on the table.

Business implication: EC2 can be the most cost-efficient option for stable, predictable workloads - but only if your team actively manages instance sizing and pricing commitments. Teams that over-provision and forget tend to pay more than they should.

ECS

ECS has no additional management fee. You pay for the underlying compute - either EC2 instances you manage yourself, or AWS Fargate (fully serverless, where AWS manages the infrastructure). With Fargate on ECS, billing is per second based on the vCPU and memory your containers actually use. As a reference, in the US East (N. Virginia) region, Fargate charges approximately $0.04048 per vCPU-hour and $0.004445 per GB-hour for memory, per AWS's official Fargate pricing page. A container running 2 vCPUs and 4 GB of RAM costs roughly $0.099 per hour, or around $72 per month running continuously.

Business implication: Fargate on ECS is often the most cost-effective choice for bursty, unpredictable, or variable traffic patterns - you pay only for what you use, and there is no idle compute cost. For high-volume, steady-state workloads, EC2-backed ECS can be cheaper.

EKS

EKS has a fixed control plane fee of $0.10 per cluster per hour - approximately $73 per month per cluster - regardless of cluster size or workload, per AWS's official EKS pricing page. This is just the management fee; worker node compute (EC2 or Fargate), storage, load balancers, and data transfer are all billed separately. Teams running dev, staging, and production environments on separate clusters pay this fee three times minimum. Additionally, if a Kubernetes version ages past its 14-month standard support window without being upgraded, the fee jumps to $0.60 per hour - a 6x increase that catches many teams by surprise.

Business implication: EKS is expensive at small scale and cost-efficient at large scale, where advanced resource scheduling and bin-packing justify the overhead. For small to medium workloads, ECS on Fargate will almost always be the cheaper option. The $73/month control plane fee is a fixed entry cost per cluster - for multi-cluster strategies, it adds up quickly.

Real-World Use Cases

Use EC2 when:

You are migrating a legacy or monolithic application that was not built for containers
Your application requires custom OS-level configuration, kernel parameters, or specific hardware access
Your team needs full visibility and control over the underlying server environment
You have predictable, steady-state workloads and want to maximise savings through Reserved Instances or Savings Plans

Business implication: EC2 is the right lift-and-shift vehicle. It minimises application changes during migration and gives your team time to modernise at their own pace. The trade-off is that it demands the most ongoing engineering attention.

Use ECS when:

You want to run containers on AWS without learning Kubernetes
You need fast, reliable deployments with minimal DevOps overhead
Your workloads are bursty or variable and Fargate's pay-per-use model suits your traffic patterns
Your team is AWS-first and wants tight, native integration with IAM, CloudWatch, and ALB

Business implication: ECS on Fargate is the fastest path to production for container workloads. It requires the least infrastructure expertise, no cluster maintenance, and scales automatically. It is a strong default choice for startups, product teams, and organisations without dedicated platform engineering.

Use EKS when:

Your team already uses Kubernetes and has the expertise to operate it
You need multi-cloud portability - the ability to run the same workloads on AWS, GCP, or Azure
You are running complex microservices that benefit from Kubernetes-native features like Horizontal Pod Autoscaler, custom scheduling, or service meshes
You are operating at a scale where advanced resource optimisation (bin-packing, Spot node groups, Karpenter) delivers meaningful cost savings

Business implication: EKS is a long-term platform investment. It requires Kubernetes expertise to operate well - either in-house or through a managed services partner. The payoff is flexibility, portability, and the ability to run sophisticated workloads that outgrow what ECS can offer.

Quick Decision Guide

Two questions cut through most of the noise:

Do you need Kubernetes?
- Yes - your team uses it already, or you need multi-cloud portability → EKS
- No → Move to question 2
Do you want containers?
- Yes → ECS
- No, or you have a legacy app to migrate → EC2

Situation	Recommended Service	Why
Small team, new to containers	ECS (Fargate)	Lowest ops overhead, no cluster to manage
Legacy app migration	EC2	Minimal app changes, full control
Startup scaling fast on AWS	ECS (Fargate)	Fast deployments, pay-as-you-go, AWS-native
Enterprise microservices	EKS	Advanced orchestration, multi-team platform
Existing Kubernetes users	EKS	Familiar tooling, avoid re-platforming cost
Multi-cloud strategy	EKS	Kubernetes runs anywhere - avoids AWS lock-in
Predictable, high-volume compute	EC2 (Reserved / Savings Plan)	Up to 72% savings over On-Demand with commitment

Conclusion

EC2, ECS, and EKS are tools for different jobs at different stages of growth. The right choice depends on your team's skills, your application's architecture, your traffic patterns, and your budget - both the AWS bill and the engineering time to manage it.

Want simplicity and speed? → ECS on Fargate
Have a legacy app to migrate? → EC2
Need Kubernetes power and portability? → EKS

ECS on Fargate handles a huge range of production workloads reliably and cost-effectively - and it is far easier to migrate to EKS later than to unwind unnecessary Kubernetes complexity from day one.

If you have any questions, you can reach out to our AWS Cloud Consulting team here.

How to Update and Retrieve Secrets from Azure Key Vault Using the REST API

Introduction

Azure Key Vault is a cloud service that provides a secure and centralized way to store and manage secrets, keys, and certificates used by applications and services. It helps teams avoid hardcoding sensitive values like API keys, connection strings, or passwords directly into code or configuration files.

In this guide, you will learn how to update and retrieve secrets from Azure Key Vault using the REST API - a useful approach for automation scripts, CI/CD pipelines, and external integrations where using an SDK is not preferred or available.

Prerequisites

An Azure Key Vault - if you don't already have one, create it from the Azure portal.
At least one secret inside the Key Vault - click Generate/Import inside the vault to create your first secret.

Enable Azure RBAC on the Key Vault (Required)

Azure Key Vault supports two permission models: Vault Access Policy (legacy) and Azure RBAC. To use IAM role assignments (like Key Vault Secrets Officer), your Key Vault must have Azure RBAC enabled. Without this, role assignments won't grant access to secrets.
For a new Key Vault: During creation, go to the Access configuration tab and under Permission model, select Azure role-based access control (RBAC).

For an existing Key Vault:

Open your Key Vault in the Azure Portal.
Go to Settings → Access configuration.
Under the Permission model, select Azure role-based access control.
Click Save.

Important: If you switch an existing Key Vault from Vault Access Policy to Azure RBAC, all previously configured access policies will stop working. Make sure you reassign equivalent Azure roles before or immediately after switching.

Create an Azure AD App Registration (Required)

To access Key Vault through the REST API, you must authenticate with an Azure AD application.

Assign API Permissions

Go to: API Permissions → Add Permission → Azure Key Vault → Delegated Permissions.
Select: user_impersonation
Then click Grant Admin consent.

Create a Client Secret

In the App Registration:
Go to Certificates & Secrets
Click New client secret
Copy the generated secret value (you will need it in API calls)

Copy the Client ID and Tenant ID

From the Overview page of your App Registration, copy:
Client ID (Application ID)
Tenant ID (Directory ID)

Assign IAM Role on the Key Vault

To allow the App Registration to get or update secrets, assign it one of the following roles:
Key Vault Secrets Officer OR Key Vault Administrator
Path: Key Vault → Access control (IAM) → Add Role Assignment
Select the role and assign it to your App Registration.

Generate an Access Token

Before calling the Key Vault REST API, you must generate an OAuth 2.0 access token.
Method: POST
URL: https://login.microsoftonline.com/{TenantId}/oauth2/v2.0/token
Headers: Content-Type: application/x-www-form-urlencoded
Body: client_id={ClientId}&scope=https://vault.azure.net/.default&client_secret={ClientSecret}&grant_type=client_credentials
This returns an access_token used in all Key Vault requests.

Get Secret Value from Azure Key Vault

Use this API to retrieve a secret.
Method: GET
URL: https://{Key_Vault_Name}.vault.azure.net/secrets/{Secret_Name}?api-version=2026-02-01
Headers: Authorization: Bearer {Access_Token}
Content-Type: application/json

Set or Update a Secret in Azure Key Vault

Use the PUT Request to create or update a secret.
Method: PUT
URL: https://{Key_Vault_Name}.vault.azure.net/secrets/{Secret_Name}?api-version=2026-02-01
Headers: Authorization: Bearer {Access_Token}
Content-Type: application/json
Body:
- { "value": "{Value}", "tags": { "source": "Postman" }, "contentType": "text/plain" }

Conclusion

With these steps, you can easily authenticate through Azure AD, retrieve secrets, and update values in Azure Key Vault using REST API calls. This approach is beneficial for automation, CI/CD pipelines, and external integrations where SDKs are not preferred.

If you have any questions, you can reach out to our Azure Cloud Consulting team here.

Event-Driven Autoscaling on EKS with KEDA and Karpenter

Introduction

It's 2 a.m. Your e-commerce platform just hit the front page of a major news site. Your SQS order queue skyrockets from 200 to 200,000 messages in minutes. Pods are overwhelmed. Customers see errors. Your on-call engineer is manually scaling deployments.

This is exactly the failure mode that event-driven scaling prevents. By combining KEDA (Kubernetes Event-Driven Autoscaler) and Karpenter, you can build a platform that reacts to demand automatically - scaling pods and nodes in seconds, then returning to zero when load disappears - all without human intervention.

Why Kubernetes HPA Falls Short for Event-Driven Workloads

Kubernetes' built-in Horizontal Pod Autoscaler (HPA) is often configured with CPU and memory metrics. While HPA does technically support custom and external metrics through the Kubernetes metrics API, wiring it up to event sources like SQS queue depth requires additional metrics adapters and careful configuration. Even then, a deeper problem remains: HPA reacts to metrics after the fact. For event-driven workloads, this creates a dangerous lag:

An SQS queue floods with messages; pods start struggling
CPU climbs - HPA detects the spike after a 15s scrape and sync cycle
New pods are scheduled, but nodes are full - they sit Pending
Cluster Autoscaler requests EC2 nodes - taking 3–5 minutes to arrive
By then, the queue has grown by tens of thousands of messages

KEDA: Scale Pods on Real Events

KEDA is a CNCF Graduated project that extends Kubernetes to scale workloads based on external event sources - SQS, Kafka, Prometheus, DynamoDB Streams, and 70+ built-in scalers. It installs as a lightweight operator and works alongside your existing setup, connecting workloads directly to event sources without requiring custom metrics adapters.

KEDA introduces two core resources:

ScaledObject - scales long-running Deployments/StatefulSets (APIs, background workers)
ScaledJob - spawns individual Kubernetes Jobs per event (ETL, ML inference, video transcoding)

Installing KEDA via Helm

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda --create-namespace

Configuring AWS Authentication via TriggerAuthentication

The recommended approach for AWS authentication is a TriggerAuthentication resource using EKS Pod Identity or IRSA. The older identityOwner field on the scaler itself was deprecated in KEDA v2.13 and will be removed in v3 - avoid teaching or using it in new deployments.

First, create a TriggerAuthentication that references your KEDA operator's IAM role via pod identity:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-auth
  namespace: order-processing
spec:
  podIdentity:
    provider: aws               # Uses EKS Pod Identity (recommended)
    # provider: aws-eks         # Use this if still on IRSA

ScaledObject Example: SQS Queue Depth

This scales an order-processing Deployment to maintain 10 messages per pod. With 500 messages, KEDA targets 50 pods - capped at maxReplicaCount.

Production note on in-flight messages: By default, KEDA's SQS scaler counts both ApproximateNumberOfMessages (queued) and ApproximateNumberOfMessagesNotVisible (in-flight / being processed). This means pods processing messages are included in the scaling calculation, which is usually the right behaviour. However, if your workers have long processing times or you see unexpected scale-down events mid-processing, tune scaleOnInFlight, your SQS visibility timeout, and your worker shutdown handling carefully - and ensure a Dead Letter Queue is configured to catch messages that fail repeatedly.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15       # Check queue every 15s
  cooldownPeriod: 60        # Wait 60s before scaling down
  minReplicaCount: 0        # Scale to zero when idle
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-aws-auth   # References TriggerAuthentication above
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456/orders
      queueLength: '10'     # Target messages-per-pod ratio
      awsRegion: us-east-1
      scaleOnInFlight: 'true'   # Default: true. Set false to exclude in-flight messages

Karpenter: Right-Sized Nodes, Right Now

When KEDA scales your pods, those pods need nodes to land on. Karpenter watches for Pending pods, then automatically provisions the optimal EC2 instance type to satisfy them - typically in under 60 seconds. It also continuously bin-packs workloads and terminates underutilized nodes.

Karpenter vs. Cluster Autoscaler

Feature	Cluster Autoscaler	Karpenter
Provisioning Speed	3–5+ minutes	Typically 30–60 seconds
Instance Selection	Pre-configured ASG groups	Dynamic - picks optimal type per workload
Spot Support	Manual node group setup	Native, single NodePool
Node Consolidation	Limited	Automatic bin-packing

NodePool Configuration

The NodePool resource defines what Karpenter is allowed to provision. The example below targets the stable karpenter.sh/v1 API (available from Karpenter v1.0+) and configures a mixed Spot/On-Demand pool for batch workloads. Note that in v1, the consolidation policy is named WhenEmptyOrUnderutilized (renamed from WhenUnderutilized in v1beta1), and consolidateAfter is now supported alongside it and is required.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch-workers
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: batch-workers
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ['spot', 'on-demand']    # Spot-first, On-Demand fallback
      - key: node.kubernetes.io/instance-category
        operator: In
        values: ['c', 'm', 'r']          # General, compute, memory families
      - key: kubernetes.io/arch
        operator: In
        values: ['amd64', 'arm64']       # Support Graviton for savings
  limits:
    cpu: 1000                            # Safety cap on total cluster cost
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # v1 name (was WhenUnderutilized in v1beta1)
    consolidateAfter: 30s                            # Required in v1; set to 0s for immediate

End-to-End Architecture Flow

When an SQS burst hits, the full scale-up sequence - from event arrival to active pod processing - completes in roughly one to two minutes in a well-tuned cluster. Actual time depends on image pull speed, node bootstrap, daemonset startup, and workload readiness. Here is the sequence:

01	Amazon SQS queue depth spikes (e.g., 200,000 messages)
02	KEDA polls queue every 15s, calculates required pod count, updates Deployment replica target
03	New pods are created - many land in Pending state (no capacity yet)
04	Karpenter detects Pending pods, selects optimal EC2 instance types, launches Spot nodes - typically in under 60s
05	Nodes join the cluster; pods are scheduled and begin processing messages
06	Queue drains → KEDA scales pods to 0 → Karpenter terminates idle nodes → worker compute cost drops to zero

Production Best Practices

KEDA

Always set maxReplicaCount to guard against runaway scaling from a misconfigured scaler
Use cooldownPeriod: 60–120s to prevent scale-down thrashing near zero
Authenticate via TriggerAuthentication with podIdentity.provider: aws - the identityOwner field on the scaler is deprecated since v2.13 and will be removed in KEDA v3
Set scaleDown.stabilizationWindowSeconds to smooth out spiky workloads
For SQS workers, configure visibility timeout, scaleOnInFlight, and graceful shutdown carefully - and always attach a Dead Letter Queue to catch failed messages
Test scale-to-zero in staging - some apps have cold-start latency that affects first-message SLA

Karpenter

Use the stable karpenter.sh/v1 API - v1beta1 is supported but planned for deprecation
Use consolidationPolicy: WhenEmptyOrUnderutilized (the v1 name; WhenUnderutilized from v1beta1 is renamed)
Specify multiple instance families (c, m, r) so Karpenter can find available Spot capacity
Set consolidateAfter explicitly - it is required in v1 when using WhenEmptyOrUnderutilized; use 0s for the same behaviour as v1beta1
Include arm64 (Graviton) in your NodePool - AWS Graviton instances cost up to 20% less per hour than comparable x86 instances, with equal or better performance for most cloud-native workloads
Set cpu and memory limits on the NodePool as a hard cost cap
Tag all EC2NodeClass nodes with environment, team, and cost-center for AWS Cost Explorer analysis

Observing the Stack in Production

With two autoscalers operating in tandem, visibility across KEDA, Karpenter, SQS, and EC2 simultaneously is what separates a smooth on-call experience from a painful one. When something goes wrong - pods not scaling, nodes not terminating, queue backing up - you need correlated signals from all layers at once.

Expose KEDA's /metrics endpoint to Prometheus - scaler values, replica counts, and error rates are all there
Use CloudWatch Container Insights for correlated node + pod metrics
Alert on SQS ApproximateAgeOfOldestMessage to catch backlogs before they compound
Dashboard pod count (KEDA) and node count (Karpenter) together - a node spike without pods often means a misconfigured NodePool
Monitor SQS NumberOfMessagesMoved on your Dead Letter Queue - a rising DLQ count signals worker failures that scaling alone cannot fix

Conclusion

KEDA and Karpenter together eliminate the manual scaling work that falls on your on-call engineer at the worst possible moment - scaling pods from real event signals, provisioning the right nodes in seconds, and returning to zero when load clears. Getting the details right (authentication, API versions, SQS in-flight behaviour, consolidation policy) is what makes this stack hold up under pressure in production.

If you have any questions or need help implementing this on your platform, you can reach out to our DevOps & Cloud Engineering team here.

Blog

May 7, 2026

Building Site-Aware Enterprise AI Agents on Microsoft 365 Using Claude Agent SDK

Introduction

The Architecture in One Sentence

The Folder Structure That Makes It Work

Decision 1: The Plugin Manifest

It loads a persona.

It filters the tool palette.

It loads skill files.

It injects list schemas.

It registers slash commands.

Decision 2: Dual-Auth and the Invisible Session

Reads use the user's delegated OBO token.

Writes use an app-only certificate token.

Decision 3: The Agentic Loop

What the Agent Can Actually Do

Lessons from the Build

SharePoint field name complexity runs deeper than expected.

Not every data source needs an API layer.

The in-process MCP server will need extraction.

Retry logic is now day-one infrastructure.

A Note on Portability

Conclusion

AWS Compute Services Explained: EC2 vs ECS vs EKS

Introduction

Understanding the Basics

Amazon EC2 - Virtual Machines

Amazon ECS - Managed Containers

Amazon EKS - Managed Kubernetes

Key Differences at a Glance

Cost Breakdown - With Real Numbers

EC2

ECS

EKS

Real-World Use Cases

Use EC2 when:

Use ECS when:

Use EKS when:

Quick Decision Guide

Conclusion

How to Update and Retrieve Secrets from Azure Key Vault Using the REST API

Introduction

Prerequisites

Enable Azure RBAC on the Key Vault (Required)

Create an Azure AD App Registration (Required)

Assign API Permissions

Create a Client Secret

Copy the Client ID and Tenant ID

Assign IAM Role on the Key Vault

Generate an Access Token

Get Secret Value from Azure Key Vault

Set or Update a Secret in Azure Key Vault

Conclusion

Event-Driven Autoscaling on EKS with KEDA and Karpenter

Introduction

Why Kubernetes HPA Falls Short for Event-Driven Workloads

KEDA: Scale Pods on Real Events

KEDA introduces two core resources:

Installing KEDA via Helm

Configuring AWS Authentication via TriggerAuthentication

ScaledObject Example: SQS Queue Depth

Karpenter: Right-Sized Nodes, Right Now

Karpenter vs. Cluster Autoscaler

NodePool Configuration

End-to-End Architecture Flow

Production Best Practices

KEDA

Karpenter

Observing the Stack in Production

Conclusion

About

Blog Archive