Introduction
An enterprise running seven department-specific SharePoint intranet sites needed AI that could actually operate within their systems - not just answer questions about them. Here is what a single task looked like before we got involved - and after.
| Before - Sales proposal workflow | After - same task, one sentence |
|---|---|
| Open Pipeline Tracker on SharePoint - find the deal | Type: "Draft a proposal for Meridian Corp's cloud migration deal and send it to their CTO." |
| Switch to Client Contacts - find the CTO's email | Agent queries Pipeline Tracker - pulls deal value, stage, scope notes |
| Open Word - hunt for the proposal template on the shared drive | Agent looks up CTO in Client Contacts - name, email, title |
| Manually fill in deal value, scope, and terms | Branded proposal generated from python-docx template and uploaded to Sales Collateral |
| Save, switch back to SharePoint, upload to Sales Collateral library | Email compose panel opens - pre-filled with CTO's address, subject, and body |
| Open Outlook - type the CTO's email, write the body, attach, send | Review, tweak one line, hit Send |
| 40 minutes · 6 applications · 1 task | One sentence · Five systems · Done |
We assessed the organisation's requirements against Copilot Cowork and Claude Cowork. Both are genuinely capable products - but neither could query custom SharePoint list schemas, connect to a SQL employee database, generate documents from branded templates, or switch to a different specialist persona depending on which department site the user was on. The organisation needed capabilities those products are not structurally built to provide.
Same panel. Same backend. Same LLM. Completely different specialist per site, loaded at runtime from a JSON manifest. This is what we built. Here is how:
The Architecture in One Sentence
A single SPFx floating panel on every SharePoint page sends messages to a FastAPI backend, which enters a Claude agentic loop with a filtered set of MCP tools - filtered by which site the user is on, what they are allowed to do, and which department's data model applies. Each site loads its own agent persona, slash commands, and skill files - deep knowledge modules that teach the agent how a specific department's data is structured, what business rules apply, and which workflows exist.
System Architecture - SPFx panel → FastAPI backend → Claude agentic loop → MCP server → enterprise data sources
That sentence hides a lot of machinery. Let me unpack it through the folder structure, because the folder structure is the architecture.
The Folder Structure That Makes It Work
Backend/
├── api/
│ ├── main.py ← FastAPI app, startup, CORS
│ └── routers/
│ └── agent_chat.py ← The agentic loop: SSE streaming, tool orchestration
│
├── core/
│ ├── mcp/
│ │ ├── server.py ← In-process MCP server (58 tool routes)
│ │ ├── builder.py ← Tool definitions with input schemas
│ │ └── tools/ ← Tool implementations by connector type
│ ├── auth/ ← OBO token exchange + certificate auth
│ ├── session/ ← Session resolver (who, where, what permissions)
│ ├── registry/ ← Plugin auto-discovery at startup
│ ├── audit/ ← Write-operation audit logger
│ ├── alerts/handlers/ ← Proactive alert checks (5 built-in)
│ └── notifications/ ← Custom notification rules engine
│
├── connectors/ ← 9 connector modules
│ ├── sharepoint/ ← List CRUD, library browse, search, upload
│ ├── msgraph/ ← Users, calendar, mail, Teams channels
│ ├── ems/ ← Direct SQL: employee lookup, org chart, leave
│ ├── email/ ← Draft + send via Graph
│ ├── docgen/ ← python-docx reports, offer letters, proposals
│ ├── azdevops/ ← Work items, sprints
│ ├── analytics/ ← NL reports, anomaly detection
│ ├── knowledge/ ← Federated search, expert finder
│ └── automation/ ← Notification rule CRUD
│
├── plugins/ ← THIS IS THE KEY DIRECTORY
│ ├── SALES/
│ │ ├── .claude-plugin/
│ │ │ └── plugin.json ← Manifest: persona, tools, commands, skills, schemas
│ │ ├── agents/ ← Persona markdown files
│ │ ├── commands/ ← Slash command definitions
│ │ └── skills/ ← Domain knowledge modules
│ ├── FINANCE/ ← Same structure, different specialist
│ ├── PEOPLE/ ← Same structure, different specialist
│ ├── DELIVERY/ ← Same structure, different specialist
│ ├── TECHNOLOGY/ ← Same structure, different specialist
│ ├── OPERATIONS/ ← Same structure, different specialist
│ └── CORE/ ← Same structure, different specialist
│
└── db/migrations/ ← 6 SQL migrations (sessions, audit, alerts)
Every architectural decision is visible in this tree. Let me walk through the three that matter most.
Decision 1: The Plugin Manifest
When a user opens the Sales site and the panel initialises, the backend resolves their session: who they are (from the OBO token), which site they are on (from the site_url passed by the frontend), and what they can do (from the plugin manifest). The manifest is a single JSON file:
{
"name": "sales",
"display_name": "Sales Assistant",
"entry_agent": "agents/sales-assistant.md",
"connectors": ["sharepoint", "msgraph", "email", "docgen", "ems",
"knowledge", "analytics", "automation"],
"commands": ["commands/pipeline-report.md", "commands/add-lead.md", ...],
"skills": ["skills/pipeline-management/SKILL.md",
"skills/client-engagement/SKILL.md", ...],
"sharepoint_lists": {
"lists": {
"Pipeline Tracker": "columns: OpportunityName, DealValue, PipelineStage, ..."
}
}
}
This manifest does five things at session start:
It loads a persona.
The entry_agent points to a markdown file that defines the agent's personality, role, behaviour rules, and domain expertise. The Sales agent is "friendly, professional, and results-driven." The Finance agent is precise and compliance-aware. The People agent is warm and policy-oriented. Same LLM, different specialist.
It filters the tool palette.
The connectors array controls which of the 58 MCP tools appear in Claude's tool list for this session. Sales gets SharePoint, Graph, Email, DocGen, EMS, Knowledge, Analytics, and Automation - but not Azure DevOps. Delivery gets Azure DevOps too. Technology gets everything. The LLM only sees tools it is allowed to use.
It loads skill files.
This is where the deep domain knowledge lives. Each skill is a markdown file that describes how a specific aspect of the department works. The Sales plugin has skills for pipeline management, client engagement, and competitive intelligence. A skill might describe how the Pipeline Tracker list is structured, what each column means, which OData filters produce useful results, what "stalled deal" means in this organisation's context, and what steps the agent should follow for a pipeline review. Skills are loaded into the system prompt based on the site and the user's query - they are the agent's domain training, delivered at runtime through text, not fine-tuning.
It injects list schemas.
The sharepoint_lists block is injected into the system prompt. This is how the agent knows the column is called PipelineStage, not Stage. It knows DealValue is the field for revenue, not Value or Amount. It never guesses. If a column is not in the manifest, the agent cannot reference it.
It registers slash commands.
Each command is a markdown file with trigger phrases, required parameters, the agent to hand off to, and step-by-step instructions. When a user types /pipeline-report, the command's markdown gets injected into the system prompt for that turn.
Plugin-Per-Site Architecture - 7 sites, auto-discovered, manifest-driven, session-filtered
The consequence of this design is that adding a new department - an eighth site, a regional hub, a project workspace - requires creating one folder with one JSON manifest and a few markdown files. No code changes. No backend redeployment. The registry discovers it on next restart.
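The auto-discovery itself can be a few lines: scan plugins/ for manifests at startup and register each one by name. A hedged sketch, not the production registry:

```python
# Hypothetical sketch of startup plugin discovery: any folder under plugins/
# with a .claude-plugin/plugin.json becomes a registered site specialist.
from pathlib import Path
import json

def discover_plugins(plugins_root: Path) -> dict[str, dict]:
    registry: dict[str, dict] = {}
    for manifest_path in sorted(plugins_root.glob("*/.claude-plugin/plugin.json")):
        manifest = json.loads(manifest_path.read_text())
        registry[manifest["name"]] = {
            "dir": manifest_path.parent.parent,   # e.g. plugins/SALES
            "manifest": manifest,
        }
    return registry
```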
The key insight: The plugin folder is the architecture. Not the LLM, not the framework, not the cloud infrastructure. The entire system's behaviour - which specialist responds, which tools are available, which data is accessible, which commands exist, which domain knowledge the agent draws on - is determined by which plugin.json file gets loaded and which skill files sit alongside it. Everything else is shared plumbing. This is what makes the system maintainable at scale: changing a department's AI behaviour is editing a JSON file and a few markdown documents, not shipping code.
Decision 2: Dual-Auth and the Invisible Session
Authentication is where most custom AI implementations get it wrong. Either they use the user's token for everything (writes are unauditable and tied to individual permissions) or they use an app-only token for everything (losing the identity context of who asked for the action).
We split it:
Reads use the user's delegated OBO token.
When the agent queries a SharePoint list or fetches calendar events, it does so as the user. If the user cannot see a site, neither can the agent. The existing M365 permission model is preserved - no data leakage, no privilege escalation.
Writes use an app-only certificate token.
- All write operations - list item creation, document upload, email send - execute through a controlled service identity, not the user's token.
- Every write is routed through an audit logger: who requested it, what changed, tool name, payload, status, and duration.
- site_url, OBO token, app-only token, and the user's permission set are never parameters the LLM sees - they are injected from the session object before the agentic loop begins.
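A minimal sketch of that split using MSAL - the tenant, client id, and certificate values are placeholders, and the real auth module also handles token caching and failure paths:

```python
# Hypothetical sketch of dual-auth: on-behalf-of (OBO) exchange for reads,
# certificate-based client credentials for writes.
import msal

AUTHORITY = "https://login.microsoftonline.com/<tenant-id>"
SCOPES = ["https://graph.microsoft.com/.default"]

app = msal.ConfidentialClientApplication(
    client_id="<app-client-id>",
    authority=AUTHORITY,
    client_credential={"private_key": "<pem-contents>", "thumbprint": "<cert-thumbprint>"},
)

def user_read_token(incoming_user_token: str) -> str:
    """Delegated token for reads: the agent sees only what the user can see."""
    result = app.acquire_token_on_behalf_of(user_assertion=incoming_user_token,
                                            scopes=SCOPES)
    return result["access_token"]

def app_write_token() -> str:
    """App-only token for writes: one auditable service identity."""
    return app.acquire_token_for_client(scopes=SCOPES)["access_token"]
```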
In the initial version, SharePoint tools accepted site_url as a tool parameter. During early integration testing, Claude hallucinated a URL - it substituted the Sales site URL when a user on the Finance site asked about budget items. The query returned nothing. The agent confidently reported "no budget items found." It was a silent, plausible failure - the worst kind. We refactored site_url out of every tool interface within the day and moved it to session-level injection. The LLM never sees it, never chooses it, never hallucinates it.
If the LLM does not need a value to reason about the task, do not put it in the tool interface. Every parameter visible to the LLM is a surface for hallucination. Session-level injection eliminates that surface.
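In code, session-level injection is a thin execution wrapper: the model's arguments are merged with session-owned values it never sees. A sketch with illustrative names:

```python
# Hypothetical sketch: the tool schema exposed to the LLM contains no site_url;
# the executor supplies it from the resolved session just before the call.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Session:
    user_id: str
    site_url: str        # resolved from the frontend, never chosen by the model
    read_token: str
    write_token: str

def execute_tool(session: Session, tool_impl: Callable[..., Any],
                 llm_args: dict[str, Any]) -> Any:
    llm_args.pop("site_url", None)   # whatever the model sent, it cannot set this
    return tool_impl(site_url=session.site_url, token=session.read_token, **llm_args)
```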
Decision 3: The Agentic Loop
When a user sends a message, the backend does not call Claude once and return the answer. It enters a loop:
Agentic Loop - Request Lifecycle with tool execution feedback loop
- Multi-step tool chains. A single message like "find stalled deals, draft follow-ups, and notify the sales lead" triggers five sequential tool calls - SharePoint OData query, EMS employee lookup, three email draft preparations - each feeding the next decision.
- Real-time status streaming. Each tool call emits an SSE event to the frontend ("Fetching list data: Pipeline Tracker...", "Looking up employee...") so the user sees the agent working, not a loading spinner.
- Model is a config variable. Claude Sonnet is the default - fast enough for real-time streaming, capable enough for multi-step orchestration. Swapping models is a single environment variable change; the architecture is not coupled to any provider.
- We tested Azure OpenAI + Semantic Kernel. The loop runs and tools get called. But in head-to-head testing, Claude produced fewer hallucinated tool parameters, followed complex multi-step prompts more faithfully, and handled branching logic - where a tool result determines whether to call the next tool or skip to a different action - more consistently. That is why it is our default.
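Stripped of streaming, status events, and retries, the loop itself is compact. A minimal sketch using the Anthropic Python SDK - dispatch_tool and the AGENT_MODEL environment variable are stand-ins for the real MCP dispatch and model configuration:

```python
# Hypothetical sketch of the agentic loop: call the model, execute any requested
# tools, feed the results back, and repeat until no more tools are requested.
import json
import os
import anthropic

client = anthropic.Anthropic()

def run_agent_turn(system_prompt: str, tools: list[dict], user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model=os.environ["AGENT_MODEL"],   # model is a config variable
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = dispatch_tool(block.name, block.input)   # MCP dispatch (not shown)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": json.dumps(output)})
        messages.append({"role": "user", "content": results})
```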
What the Agent Can Actually Do
58 tools across 9 connectors. Here are the ones that matter most to daily operations:
| Connector | What it does | Available on |
|---|---|---|
| SharePoint | Read list items with OData filters, create and update items, browse document libraries, upload files, search across the site. Primary data layer. | All sites |
| EMS | Direct SQL via pyodbc against the HR database: employee lookup, org chart traversal, leave balances, skills search, project assignments, team allocation, capacity planning, budget tracking. No REST wrapper. | All sites |
| Document generation | Branded offer letters, proposals, and reports via python-docx. Budget workbooks via openpyxl. Generated on the backend, uploaded to SharePoint, download card in chat. | Sales, People, Finance, Operations |
| Email | Two-step workflow: agent drafts, human reviews in inline compose panel, then confirms Send. Never auto-sends. | All sites |
| Azure DevOps | Query and create work items, get sprint status. | Delivery, Technology only |
| Proactive alerts | Five APScheduler handlers: stalled deals, budget thresholds, expiring contracts, onboarding gaps, morning brief digest. Push to Teams channels and notification bell before anyone opens a browser. | All sites (handler-specific) |
| MS Graph | Users, calendar, Teams channels, mail integration. | All sites |
| Analytics | Natural language reports, anomaly detection. | All sites |
| Knowledge | Federated search, expert finder. | All sites |
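To make the document-generation row concrete, here is a hedged sketch of a template-fill tool with python-docx. The placeholder tokens, file names, and field values are illustrative; the production tool also preserves run-level styling and uploads the result to SharePoint:

```python
# Hypothetical sketch: fill a branded proposal template and return the path
# for upload to the Sales Collateral library.
from docx import Document

def generate_proposal(template_path: str, output_path: str, fields: dict[str, str]) -> str:
    doc = Document(template_path)
    for paragraph in doc.paragraphs:
        for token, value in fields.items():
            if token in paragraph.text:
                # Simple text replace; a real implementation keeps run formatting.
                paragraph.text = paragraph.text.replace(token, value)
    doc.save(output_path)
    return output_path

# Illustrative usage:
# generate_proposal("proposal_template.docx", "meridian_proposal.docx",
#                   {"{{ClientName}}": "Meridian Corp", "{{DealValue}}": "250,000"})
```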
Lessons from the Build
Every implementation surfaces lessons that inform the next one.
SharePoint field name complexity runs deeper than expected.
Display names and internal names diverge in non-obvious ways. "Status" might be stored as BudgetStatus. "Details" might be AnnouncementDetails. The initial deployment surfaced field name mismatches that cost debugging cycles. Our fix - a PowerShell schema export script that generates verified field mappings per site - is now a standard first step in our deployment methodology.
Not every data source needs an API layer.
Our first design had a full REST API wrapper around the SQL employee database. We replaced it with direct pyodbc queries wrapped in the MCP tool contract. Simpler, faster, easier to maintain.
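A sketch of what "direct pyodbc wrapped in the MCP tool contract" means in practice - the connection string, table, and column names below are placeholders:

```python
# Hypothetical sketch of an EMS tool: a parameterised SQL query, returned as a
# plain dict so the MCP layer can serialise it straight into a tool result.
import pyodbc

CONN_STR = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=<host>;DATABASE=<ems-db>;..."

def lookup_employee(email: str) -> dict | None:
    conn = pyodbc.connect(CONN_STR)
    try:
        row = conn.execute(
            "SELECT FullName, Title, Department, ManagerEmail "
            "FROM Employees WHERE Email = ?",   # parameterised, never string-built
            email,
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return None
    return dict(zip(["FullName", "Title", "Department", "ManagerEmail"], row))
```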
The in-process MCP server will need extraction.
Running the tool server in-process was the right call for initial development - no serialisation overhead, easy debugging. But for production scale, tool execution needs to run independently so long-running operations (document generation, cross-site searches) do not block the API layer. The architecture was designed for this extraction - MCP's HTTP transport makes it a configuration change, not a rewrite - which is planned for the next phase.
Retry logic is now day-one infrastructure.
Claude's API occasionally returns transient errors under load. Without exponential backoff (1s, 2s, 4s), these surface as user-facing failures. Based on that experience, retry with backoff is now part of our standard agentic infrastructure from the start.
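The wrapper itself is small. A sketch of the backoff pattern - in production the except clause should be narrowed to the transient error types the SDK actually raises:

```python
# Hypothetical sketch: exponential backoff (1s, 2s, 4s) around a flaky call.
import time

def call_with_retry(fn, *args, retries: int = 3, base_delay: float = 1.0, **kwargs):
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:                             # narrow to transient errors in practice
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))   # 1s, 2s, 4s, ...

# Usage: call_with_retry(client.messages.create, model=..., messages=..., max_tokens=...)
```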
Production hardening - monitoring, observability, error recovery, and scale testing - deserves its own post. We will publish that next.
A Note on Portability
This case study uses SharePoint as the enterprise platform and Claude as the LLM, but the pattern is platform-agnostic. The same plugin-per-site architecture has been applied with Confluence, custom intranets, and internal portals. The frontend can be any web surface that supports a JavaScript embed - the SPFx panel is one implementation, not a requirement. The connectors, the plugin manifest pattern, the dual-auth model, and the session-scoped tool filtering are all transferable. If your organisation runs on a different stack, the architecture adapts to it.
Conclusion
The plugin-per-site pattern, dual-auth model, session-scoped tool filtering, and in-process MCP server described here form a reusable enterprise architecture - not a one-off implementation. Adding a new department means creating a folder with a JSON manifest and a few markdown skill files. No code changes, no redeployment. The system's behaviour is entirely determined by configuration, not by the codebase.
This architecture was designed and deployed by Binary Republik. It is adaptable to any organisation's departmental structure, data model, and governance requirements - and transferable to any enterprise platform with programmatic APIs.
If you have any questions, you can reach out to our AI Consulting team here.
