December 18, 2025

Security Trimming in Azure AI Search for Safe and Compliant RAG Pipelines

In modern enterprises, access to the right information drives productivity — but controlling who can see what is just as important. From HR policies and payroll guidelines to financial reports and legal contracts, sensitive documents must only be available to authorized users. This is where security trimming in Azure AI Search plays a crucial role. It ensures that users can only access the data they are permitted to see, even when using Retrieval-Augmented Generation (RAG) AI pipelines, which pull together enterprise data to answer natural language questions.


What is Security Trimming?

Security trimming in Azure AI Search is the process of filtering search results at query time based on user identity, group membership, or other security principals. Instead of enforcing direct authentication or full access-control lists (ACLs) on the search service, Azure AI Search utilizes a filterable field in the search index, such as group_ids, to simulate document-level authorization by dynamically filtering search results.

For example, when a user queries the search index, a filter expression matches their group IDs to the document’s group_ids field, so only authorized documents appear in the results.

Example Filter Query:

{
  "filter": "group_ids/any(g:search.in(g, 'group_id1, group_id2'))"
}

Here, group_ids is a field in your Azure AI Search index that stores which groups a document belongs to.


Why Security Trimming Matters for RAG

Retrieval-Augmented Generation (RAG) pipelines are architectures combining document retrieval with generative AI models. These pipelines synthesize answers based strictly on enterprise data sources. Without security trimming, RAG pipelines risk exposing confidential or restricted content to unauthorized users, leading to compliance violations and privacy risks.

Without Security Trimming:

  • Sensitive documents might be exposed to unauthorized users.
  • Compliance violations can occur.
  • AI-generated answers may leak confidential information.

Key Benefits of Security Trimming:

  • Confidentiality: Ensures sensitive documents are only accessible to authorized users.
  • Compliance: Adheres to internal policies and regulatory requirements like GDPR.
  • Context-Aware Generation: Answers are produced only from documents the user can access, preventing accidental leaks.

Use Case

Consider an enterprise scenario with two distinct user groups: HR and Finance.

  • HR users should access documents like leave policies, working guidelines, and salary rules, but should never see finance records.
  • Finance users require access to budgets, audits, and financial statements, but are barred from HR files.

Step 1: Defining the Search Index With a Security Field

Create an index schema including a filterable security field (group_ids) that stores group or user IDs as a collection of strings. The field should be filterable but not retrievable.

POST https://[search-service].search.windows.net/indexes/securedfiles?api-version=2025-09-01
Content-Type: application/json
api-key: [ADMIN_API_KEY]

{
  "name": "securedfiles",
  "fields": [
    { "name": "file_id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "file_name", "type": "Edm.String", "searchable": true },
    { "name": "file_description", "type": "Edm.String", "searchable": true },
    { "name": "group_ids", "type": "Collection(Edm.String)", "filterable": true, "retrievable": false }
  ]
}

Key Points:

  • filterable: true → allows filtering by group IDs.
  • retrievable: false → prevents exposing group IDs in search responses.

With the index schema in place, your foundation for secure, scalable search is ready—each document will now respect access policies from the start.


Step 2: Upload Documents With Group IDs

Push documents to the index, including the groups authorized to access each document.

POST https://[search-service].search.windows.net/indexes/securedfiles/docs/index?api-version=2025-09-01
Content-Type: application/json
api-key: [ADMIN_API_KEY]

{
  "value": [
    {
      "@search.action": "upload",
      "file_id": "1",
      "file_name": "secured_file_a",
      "file_description": "File access restricted to Human Resources",
      "group_ids": ["group_id1"]
    },
    {
      "@search.action": "upload",
      "file_id": "2",
      "file_name": "secured_file_b",
      "file_description": "File access restricted to HR and Recruiting",
      "group_ids": ["group_id1", "group_id2"]
    },
    {
      "@search.action": "upload",
      "file_id": "3",
      "file_name": "secured_file_c",
      "file_description": "File access restricted to Operations and Logistics",
      "group_ids": ["group_id5", "group_id6"]
    }
  ]
}

If document groups need updating, use the merge or mergeOrUpload action:

POST https://[search-service].search.windows.net/indexes/securedfiles/docs/index?api-version=2025-09-01
Content-Type: application/json
api-key: [ADMIN_API_KEY]

{
  "value": [
    {
      "@search.action": "mergeOrUpload",
      "file_id": "3",
      "group_ids": ["group_id7", "group_id8", "group_id9"]
    }
  ]
}

By assigning group IDs at upload, you ensure that every document is automatically filtered for the right audience—security is built into your search pipeline.


Step 3: Perform Filterable Search Query

When a user searches, issue a search query with a filter that restricts results to documents containing the user’s authorized groups.

POST https://[search-service].search.windows.net/indexes/securedfiles/docs/search?api-version=2025-09-01
Content-Type: application/json
api-key: [QUERY_API_KEY]

{
  "search": "*",
  "filter": "group_ids/any(g:search.in(g, 'group_id1, group_id2'))"
}
This query returns only documents whose group_ids field contains either "group_id1" or "group_id2", matching the user’s groups.

Sample response:

[
  {
    "@search.score": 1.0,
    "file_id": "1",
    "file_name": "secured_file_a"
  },
  {
    "@search.score": 1.0,
    "file_id": "2",
    "file_name": "secured_file_b"
  }
]

Executing a filtered search now guarantees that users see only what they’re authorized to access—empowering secure, context-aware AI responses.


How Security Trimming Works Under the Hood

Azure AI Search uses OData filter expressions to simulate document-level authorization. It filters results purely based on string values stored in the security field (group_ids) without direct authentication or ACL enforcement. This approach provides simple, performant security filtering that scales to large enterprises and integrates seamlessly into RAG AI pipelines.
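
In practice, the application layer in front of the index (a RAG orchestrator, an API, a chat backend) resolves the signed-in user's group IDs from its own identity layer and builds this filter on every query. Here is a minimal Python sketch, assuming the securedfiles index from the steps above; the helper name and placeholder credentials are illustrative only:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholder endpoint and key; in practice these come from configuration
search_client = SearchClient(
    endpoint="https://[search-service].search.windows.net",
    index_name="securedfiles",
    credential=AzureKeyCredential("[QUERY_API_KEY]"),
)

def search_as_user(query: str, user_group_ids: list) -> list:
    # Build the same OData expression shown earlier from the caller's group memberships
    group_list = ", ".join(user_group_ids)
    security_filter = f"group_ids/any(g: search.in(g, '{group_list}'))"
    return list(search_client.search(search_text=query, filter=security_filter))

# Only documents tagged with group_id1 or group_id2 come back for this user
results = search_as_user("leave policy", ["group_id1", "group_id2"])

The key point is that the filter is applied server-side at query time, so a RAG pipeline can pass the trimmed results straight to the model without any extra post-filtering.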


Conclusion

Security trimming in Azure AI Search is essential for building enterprise-grade, compliant knowledge retrieval systems. Implementing group-based access filtering at the search layer empowers organizations to deliver personalized, secure AI experiences while safeguarding sensitive content and meeting regulatory requirements.

For AI-powered knowledge assistants leveraging RAG, security trimming should be the first priority—ensuring users receive answers strictly from content they are authorized to access.

By implementing security trimming in Azure AI Search, your enterprise ensures that AI-driven insights are both powerful and secure - delivering the right information to the right people, every time.

Extending C# MCP Server with GitHub Copilot and Custom Tools

Introduction

AI assistants are becoming more capable, but their real power emerges when they can tap into your own systems, logic, and data. The Model Context Protocol (MCP) makes this possible by providing a standardized way for tools and services to interact directly with assistants like GitHub Copilot Chat. By exposing your backend capabilities through an MCP server, you can extend Copilot far beyond code suggestions and turn it into a practical interface for your applications.

Key Topics Covered:

  • A breakdown of how the Model Context Protocol works and the components that make up its architecture.
  • Steps to create an MCP server in C# and implement your own custom tools.
  • How to link your .NET-based MCP server with GitHub Copilot Chat in VS Code so they can communicate seamlessly.

What is MCP?

MCP defines a standard protocol for AI clients to connect to external servers.

  • MCP Server → your app or API that provides tools.
  • MCP Client → AI assistant (like GitHub Copilot Chat) that calls those tools.

Think of it like plugins for Copilot, but built with simple attributes and a lightweight protocol.


Project Setup:

  1. Create a new ASP.NET Core web application to serve as the base for your MCP server.
  2. Add the required dependencies, including:
  • ModelContextProtocol.AspNetCore
  • Microsoft.Azure.Functions.Worker.Extensions.Mcp
  • System.Data.SqlClient (for database communication)

Defining a tool:

  • A tool is a simple class decorated with the [McpServerToolType] attribute. Each method marked with [McpServerTool] is automatically exposed to the MCP client.
  • Define tools with clear, detailed descriptions so the LLM can interpret them effectively and deliver more accurate responses.
  • Below is an example of the EmployeeTool.cs that has the tool defined:
[McpServerTool, Description("Get Employee details")]
public string GetEmployeeDetails(
    [McpToolTrigger("employee_tool", "MCP Tool that fetches employee records based on hiring dates.")]
      ToolInvocationContext trigger,
      [McpToolProperty("startDate", "string",
          "Start date of the provided date range."
      )]
      string startDate,
      [McpToolProperty("endDate", "string",
          "End date of the provided date range."
      )]
      string endDate
)
{
    // Write business logic to retrieve data
    return $"Fetching employees hired between {startDate} and {endDate}";
}

  • Register your tool in the Program.cs file as shown below.
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddMcpServer()
    .WithHttpTransport()
    .WithToolsFromAssembly();

builder.Services.AddSingleton<EmployeeTool>();

var app = builder.Build();

app.MapMcp();

app.Run();
  • Once you’ve defined your tools, run the project (for example, with dotnet run) so the MCP endpoint is up and listening before you connect Copilot.

Connecting with GitHub Copilot Chat:

Now that the tools are defined, let’s connect them to Copilot Chat. Before you start, make sure you have:

  • A GitHub account
  • The GitHub Copilot and GitHub Copilot Chat extensions installed in VS Code

Next, we’ll add the server using the steps below:

  1. Open the Command Palette.
  2. Search for “MCP: Add Server” and select it.
  3. Choose HTTP as the transport mode.
  4. Enter the server URL (for example: http://localhost:5000).
  5. Provide a name for your server and choose whether to save it as Global (user) or just for the current workspace.
  6. When asked, confirm that you trust this MCP server.
  7. Your MCP server is now registered and ready to be used through Copilot Chat.

Verify the Server:

  • Access the Command Palette and select “MCP: List Servers” to verify the server’s presence in the list.
  • Alternatively, navigate to the Extensions view and examine the section labeled MCP Servers => Installed.

Using MCP Tools Inside Copilot Chat:

Once the MCP server is added, you can start using the tools directly inside Copilot Chat:

  1. Open the Copilot Chat interface in VS Code.
  2. Switch to Agent mode from the drop-down beneath the chat box.
  3. Click the Tools icon to explore available MCP tools.
  4. Provide a prompt like: “Provide me with the employees hired in the last month.”
  5. To explicitly invoke a tool, type # and select it by name.
  6. When Copilot suggests a tool invocation, review it and click Continue to execute.

That’s it - Copilot will now call your MCP tools and return live data straight into chat.


Conclusion

In this guide, we explored how to build a custom MCP server using C# .NET, define powerful tools, and integrate them with GitHub Copilot Chat to extend its capabilities. With MCP, you can enable Copilot to access real-time data, execute business logic, and provide accurate, context-aware responses.

For more details and official documentation, check out the C# MCP SDK on GitHub.

November 20, 2025

Preventing Context Loss in RAG Pipelines with Azure AI Search: A Semantic Chunking and Retrieval Strategy


RAG, short for Retrieval-Augmented Generation, is the secret sauce behind many AI systems that actually know what they're talking about.
Instead of relying only on a language model’s memory, RAG lets the model search for relevant facts and use them as context when generating responses. It’s like giving your AI assistant a reading assignment before it answers your question.

Sounds great, right?

Well… almost.

Because there’s one sneaky issue that ruins the magic: context loss.
You ask a question like “Explain how AI evolved in the 1940s and 50s,” and the model gives you:

  • Just half the answer.
  • Or skips the definition of an important term.
  • Or mixes up two unrelated paragraphs.

This happens when the chunks of information fed into the model are either:

  • Too small to be meaningful, or
  • Too isolated to carry the full picture.

Today, we’re going to fix that.
We’ll build a smarter RAG pipeline using Azure AI Search, and along the way you’ll learn how to:

  • Chop up documents semantically (not just every 500 tokens)
  • Retrieve passages using both keywords and vector similarity
  • Stitch back the right context (even when your query didn’t know it needed it)

By the end, you’ll have a clean, modular setup that’s ready to power any LLM app that needs rich, relevant context without losing the thread.
Let’s start with what actually goes wrong and why it happens more often than you think.

The Problem: Context Loss in RAG Pipelines

On paper, a RAG setup sounds simple:
Break your documents into parts → Search through them → Provide the context to your model → Get a fact-based answer.

But in practice, there's a common issue that quietly sneaks in:
You lose the context right when it matters most.
Let’s say you’re indexing a long research doc. Somewhere in there, a paragraph says:


“This mechanism is a variation of Hebbian theory, which we introduced in the previous section.”

And now a user asks:
“What is Hebbian theory?”

Guess what?
Your retriever grabs the current chunk with that line but not the previous section that actually explains what Hebbian theory is.

Here’s why this happens so often:

Fixed-size chunking:

Most pipelines split documents every N tokens (say, 500–800). That’s easy for machines, but brutal for meaning:

  • Sentences get cut mid-way.
  • Tables get sliced in half.
  • References point to nowhere.

Shallow retrieval:

RAG systems often rely on:

  • Keyword matches (BM25)
  • Or a single vector field (semantic similarity)

Both are good, but not enough on their own:

  • Keywords might miss reworded passages.
  • Vectors might pull something conceptually close… but not specific enough. 

Context isolation:

Even when you retrieve the right chunk, it might need its neighbors:

  • The chunk before might define a term.
  • The chunk after might finish the logic.
  • And they’re often left out entirely.

Most RAG pipelines are good at fetching passages, but not great at reconstructing context.

Now let’s fix that without rewriting your whole stack. 

The Solution Strategy: Keep Your Context, Serve Better Answers

To solve the context-loss problem, we use a combination of semantic chunking, hybrid search, and smart indexing, all powered by Azure OpenAI and Azure AI Search.

Here’s the game plan broken down:

Step 1: Semantic Chunking (Not Just Slicing Text)

We split your documents by meaning, not just fixed size. That means paragraphs that “belong together” stay together, preserving the flow of thought.
This preserves semantic integrity, so the model sees the whole story.

Step 2: Index with Azure AI Search

Once we’ve chunked the content, we store it in a searchable index. Each chunk gets its own embedding and metadata (source URI, headings, position in doc, etc.).

Why this matters:

  • You get fast semantic search with vector support
  • Plus, keyword fallback when needed (hybrid search FTW!)

Step 3: Hybrid Retrieval = Vector + Keyword 

When the user asks a question, we combine:

  • Vector similarity: Find semantically close matches
  • BM25 keyword matching: Catch exact terms (e.g., "Turing Test")
  • Neighbor expansion: Fetch previous and next chunks for continuity

Together, this improves precision and recall: the model sees more relevant chunks, grounded in the user's intent.

Step 4: Feed to the Model as Context

We pass the top-k matching chunks to Azure OpenAI as context in your prompt.

This gives your model:

  • Enough signal to answer clearly
  • No noise from unrelated data
  • A better shot at staying grounded

Let's jump into the implementation.

Prerequisites & Setup: 

Before we dive into code, let’s make sure we’ve got all the tools and ingredients ready. Think of this as your RAG recipe checklist.

Python Packages to Install


pip install azure-search-documents openai langchain-openai langchain-experimental python-docx tiktoken tenacity python-dotenv


Environment Variables:

Create a .env file with your credentials (never hardcode in scripts!):
AZURE_OPENAI_API_KEY=""
AZURE_OPENAI_ENDPOINT=""
AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"
AZURE_OPENAI_API_VERSION=""

AZURE_SEARCH_ENDPOINT=""
AZURE_SEARCH_API_KEY=""
AZURE_SEARCH_INDEX_NAME="my-index-name"

Step 1: Semantic Chunking with Azure OpenAI

Before we send anything to a vector index, we need to split our text into smaller, meaningful chunks, not just by paragraph or sentence but by semantic boundaries (where the topic naturally shifts). That’s where SemanticChunker shines!

1. Setup Azure OpenAI Embeddings

from langchain_openai.embeddings import AzureOpenAIEmbeddings
import os

def get_azure_embeddings():
    """
    Creates an embedding client for Azure OpenAI
    Returns:
        AzureOpenAIEmbeddings: LangChain embedding object
    """
    return AzureOpenAIEmbeddings(
        azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    )

2. Semantic Chunking with LangChain

from langchain_experimental.text_splitter import SemanticChunker

def chunk_text_semantically(text: str, embeddings) -> list:
    """
    Splits long text into semantically meaningful chunks using Azure OpenAI embeddings.
    
    Args:
        text (str): Full document text
        embeddings: An AzureOpenAIEmbeddings object
    
    Returns:
        list: A list of Document chunks
    """
    splitter = SemanticChunker(
        embeddings=embeddings,
        breakpoint_threshold_type="percentile",   # How aggressive to split
        breakpoint_threshold_amount=95.0,         # Top 5% breakpoint
        min_chunk_size=120                        # Avoid tiny chunks
    )
    
    return splitter.create_documents([text])

Example Usage:

embeddings = get_azure_embeddings()
chunks = chunk_text_semantically(doc_text, embeddings)

print(f"Total chunks created: {len(chunks)}")
print("Sample Chunk:\n", chunks[0].page_content[:500])

Step 2: Indexing Chunks into Azure AI Search

Azure AI Search doesn’t just take your text and call it a day; you need to prepare it properly. Each chunk becomes a document with fields like id, content, and embedding.

Here’s how we do it, step by step:

1. Define Your Index Schema (if not already created)

from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchableField, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile
)

def build_search_index_schema(index_name: str) -> SearchIndex:
    return SearchIndex(
        name=index_name,
        fields=[
            # id is filterable so neighboring chunks can be fetched by key later
            SimpleField(name="id", type="Edm.String", key=True, filterable=True),
            SearchableField(name="content", type="Edm.String"),
            SimpleField(name="chunk_id", type="Edm.Int32"),
            SimpleField(name="doc_id", type="Edm.String"),
            SimpleField(name="source_uri", type="Edm.String"),
            SimpleField(name="prev_id", type="Edm.String"),
            SimpleField(name="next_id", type="Edm.String"),
            SimpleField(name="page_no", type="Edm.Int32", filterable=True),
            # Vector field: dimensions must match the embedding model (1536 for text-embedding-3-small)
            SearchField(
                name="embedding",
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True,
                vector_search_dimensions=1536,
                vector_search_profile_name="default-vector-profile",
            ),
        ],
        vector_search=VectorSearch(
            algorithms=[HnswAlgorithmConfiguration(name="default-hnsw")],
            profiles=[
                VectorSearchProfile(
                    name="default-vector-profile",
                    algorithm_configuration_name="default-hnsw",
                )
            ],
        ),
    )
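
The schema above only describes the index. If it doesn’t exist yet, you can create (or update) it once with the SearchIndexClient before uploading anything; a minimal sketch, reusing the environment variables from the setup step:

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

index_client = SearchIndexClient(
    endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    credential=AzureKeyCredential(os.getenv("AZURE_SEARCH_API_KEY")),
)

# Creates the index on first run, updates the schema on subsequent runs
index_client.create_or_update_index(
    build_search_index_schema(os.getenv("AZURE_SEARCH_INDEX_NAME"))
)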

2. Format Chunks with prev_id and next_id

import uuid

def format_chunks_for_indexing(chunks: list, doc_id: str, source_uri: str) -> list:
    # Sanitize the doc_id once so id, prev_id, and next_id all stay consistent
    safe_doc_id = doc_id.replace("#", "_")
    formatted = []
    for i, chunk in enumerate(chunks):
        formatted.append({
            "id": f"{safe_doc_id}_{i}",
            "doc_id": doc_id,
            "chunk_id": i,
            "source_uri": source_uri,
            "page_no": chunk.metadata.get("page", None),
            "content": chunk.page_content,
            "prev_id": f"{safe_doc_id}_{i-1}" if i > 0 else None,
            "next_id": f"{safe_doc_id}_{i+1}" if i < len(chunks) - 1 else None
        })
    return formatted

3. Embed and Upload to Azure AI Search

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

def index_chunks_to_azure(chunks: list, embedding_fn, search_client: SearchClient):
    for chunk in chunks:
        chunk["embedding"] = embedding_fn(chunk["content"])
    search_client.upload_documents(documents=chunks)
    print(f"Uploaded {len(chunks)} chunks to Azure Search")

Putting It All Together:

# 1. Setup SearchClient
search_client = SearchClient(
    endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    index_name=os.getenv("AZURE_SEARCH_INDEX_NAME"),
    credential=AzureKeyCredential(os.getenv("AZURE_SEARCH_API_KEY"))
)

# 2. Setup embeddings
embedding_model = get_azure_embeddings()
embedding_fn = lambda text: embedding_model.embed_query(text)

# 3. Format and push
doc_id = "ai_intro_doc"
formatted_chunks = format_chunks_for_indexing(chunks, doc_id, "document-path")
index_chunks_to_azure(formatted_chunks, embedding_fn, search_client)

Step 3: Semantic Retrieval with Context-Aware Expansion

Once your semantic chunks are indexed, it's time to make them useful. A great RAG system doesn’t just match keywords; it understands meaning and respects structure. That’s why we use Hybrid Search.
We'll:
  1. Embed the user query (for semantic search)
  2. Perform hybrid search: text + vector
  3. Pull neighboring chunks via `prev_id` and `next_id` to prevent context loss
  4. Format results for your model prompt

1. Embed the User Query

We’ll use the same embedding model to turn the query into a vector, so we can find the closest semantic matches in the index.

def get_query_embedding(query: str, embedding_model) -> list:
    return embedding_model.embed_query(query)

2. Perform Hybrid Search in Azure AI Search

Azure AI Search supports sending both keyword text and a query vector in the same request, which is exactly how we run a hybrid query:

from azure.search.documents.models import VectorizedQuery

def hybrid_search(query: str, query_vector: list, search_client, k: int = 5):
    # Vector half of the hybrid query (VectorizedQuery is available in azure-search-documents 11.4+)
    vector_query = VectorizedQuery(vector=query_vector, k_nearest_neighbors=k, fields="embedding")

    results = search_client.search(
        search_text=query,                # keyword (BM25) half
        vector_queries=[vector_query],    # vector (semantic) half
        select=["id", "content", "doc_id", "prev_id", "next_id"],
        top=k,
    )
    return list(results)

3. Expand Results with Prev/Next Context

def fetch_with_context(results, search_client):
    related_ids = set()
    for r in results:
        related_ids.add(r["id"])
        if r.get("prev_id"):
            related_ids.add(r["prev_id"])
        if r.get("next_id"):
            related_ids.add(r["next_id"])

    # Filter to fetch all related IDs
    filter_expr = " or ".join([f"id eq '{rid}'" for rid in related_ids])
    expanded_results = search_client.search(
        search_text="*", 
        filter=filter_expr,
        select=["id", "content", "doc_id"],
    )
    return list(expanded_results)

Putting It All Together

query = "What were the key milestones in early AI history?"
query_vector = get_query_embedding(query, embedding_model)

top_chunks = hybrid_search(query, query_vector, search_client, k=4)
contextual_chunks = fetch_with_context(top_chunks, search_client)

# Sort by doc_id/chunk_id to preserve flow
contextual_chunks = sorted(contextual_chunks, key=lambda c: c["id"])

# Display sample
for chunk in contextual_chunks:
    print(f"\n{chunk['id']}\n{chunk['content'][:300]}...")

Step 4: Stitch Chunks, Prompt the Model (The RAG Finale)

Once we’ve retrieved the best matching chunks, including their neighbors, it’s time to give them to the model.

But wait, it’s not just “Top 3 chunks → Dump into prompt.”
We make sure the chunks are:

  • Deduplicated (no repeats)
  • Sorted (in reading order)
  • Joined with separators (so the model can distinguish them)

def prepare_prompt_context(chunks: list) -> str:
    """
    Deduplicates the neighbor-expanded chunks, sorts them into reading order,
    and joins them into prompt-ready context.
    """
    seen = set()
    selected = []

    for chunk in chunks:
        if chunk["id"] not in seen:
            selected.append(chunk)
            seen.add(chunk["id"])

    # Sort by id (doc_id + chunk index) to maintain reading order
    selected.sort(key=lambda c: c["id"])

    # Join with clear separators so the model can tell chunks apart
    return "\n---\n".join(chunk["content"] for chunk in selected)

You can now take the returned string and plug it into your LLM prompt like so:

context = prepare_prompt_context(contextual_chunks)

prompt = f"""You are an expert assistant. Use the following context to answer clearly and accurately.

{context}

Question: {query}
Answer:"""

Result: Your model sees a coherent slice of the source doc, complete with the lead-in, answer, and follow-up. No more broken thoughts!
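
To close the loop, here’s a minimal sketch of sending that prompt to an Azure OpenAI chat deployment using the openai package installed earlier. The AZURE_OPENAI_CHAT_DEPLOYMENT variable is an assumption of this sketch (it isn’t part of the .env above); point it at whichever chat model deployment you use.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

response = client.chat.completions.create(
    model=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT"),  # assumed variable for your chat deployment name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)

print(response.choices[0].message.content)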

Wrapping Up: From Documents to Grounded Answers

Preventing context loss isn’t just a nice-to-have in Retrieval-Augmented Generation (RAG). It’s the difference between vague answers… and useful ones.

By combining:

  • Semantic chunking - keeps ideas together
  • Smart indexing - stores structure and meaning
  • Hybrid retrieval - balances precision and recall
  • Neighbor-aware context - completes the narrative

we make Azure AI Search and Azure OpenAI work together like a dream team.

This approach isn’t just scalable; it’s grounded, relevant, and ready for production RAG applications.

Whether you're building internal knowledge assistants, research bots, or customer-facing copilots, preserving context is your secret weapon.

If you have any questions, you can reach out to our SharePoint Consulting team here.

Step-by-Step Guide: Convert a SharePoint Site Page to PDF using Power Automate

Converting SharePoint site pages into PDFs can be useful for creating reports, archives, or offline documentation. In this step-by-step guide, we’ll walk through how to automate this process using Power Automate.

Step 1: Create a Power Automate Flow

Start by creating a new Power Automate flow.
You can trigger it manually or configure it to run on a schedule or in response to a specific event, depending on your requirements.

Step 2: Initialise Department Variable

Add an Initialise Variable action to store the department name.
This variable will be used later when creating folders inside your document library.

Step 3: Initialise PDF File Name Variable

Next, create another Initialise Variable to hold the PDF file name that will be generated for each site page.


Step 4: Get Site Pages

Add a Get Files (Properties Only) action and point it to your Site Pages library.
You can apply a Filter Query to limit the results, or leave it blank to fetch all site pages.


Step 5: Apply to Each Site Page

Insert an Apply to Each loop and select the value output from the previous “Get Files” action.


Step 6: Set Department Variable

Inside the loop, set the Department variable using the value from your DepartOwner (or equivalent) column from the “Get Files” action.


💡 Replace the column name if your field name differs.

Step 7: Set PDF File Name Variable

Now, set the PDF file name dynamically using the page title:
concat(items('Apply_to_each')?['Title'], '.pdf')

Step 8: Get Canvas Content from Site Page

Add a Send an HTTP Request to SharePoint action.
Use it to retrieve the canvas content of each site page.
Pass the ID of the page from the “Get Files” action to get its content.


Step 9: Parse Canvas Content

Add a Parse JSON action to interpret the response from the previous HTTP request.
Use the Body output from the “Send an HTTP Request to SharePoint” step.


Step 10: Create a Temporary HTML File in OneDrive

Next, add a Create File action (in OneDrive).
This will temporarily store the HTML version of the site page.


File Name: concat(items('Apply_to_each')?['Title'], '.html')

Step 11: Convert HTML to PDF

Use the Convert File action (OneDrive) to convert the HTML file into a PDF.
Pass the File ID from the previous “Create File” step.

Step 12: Create a Folder in SharePoint

Add a Create New Folder action in your SharePoint Document Library.
Set the Folder Path using your Department variable to organise PDFs by department.


Step 13: Upload the PDF to SharePoint

Add a Create File (SharePoint) action.
This will create the final PDF inside the folder created in the previous step.


Step 14: Delete Temporary HTML File

Finally, clean up the temporary HTML file created in OneDrive.
Add a Delete File (OneDrive) action and pass the File ID from the earlier “Create File” step.



Once your flow is complete, run it manually (or trigger it automatically as configured). Your SharePoint site pages will now be converted into well-organised PDF files stored neatly in your document library.

If you have any questions, you can reach out to our SharePoint Consulting team here.

August 28, 2025

Building a Reusable React Component Library with TypeScript and Rollup - A Step-by-Step Guide

Thinking of building your own reusable React component library? Whether it’s to keep your projects consistent or to make collaboration with your team easier, you’re in the right place.

In this guide, I’ll walk you through exactly how I created a shareable React component library from setup to publishing, complete with real code examples, clear explanations, and practical tips. Everything you need is right here in one place.

Use Case

Maintaining multiple React projects with variations of the same UI components presented significant challenges for our team. We encountered frequent issues such as inconsistent styling, duplicated bug fixes, and difficulties in propagating enhancements across codebases. This approach led to inefficiencies, unnecessary overhead, and a lack of coherence in user experience.

To address these challenges, we developed a centralized Reusable Component Library, a standardized collection of UI components designed for use across all our React projects. By consolidating our shared components into a single, well-maintained package, we significantly reduced development redundancy and ensured visual and behavioral consistency throughout our applications. Updates or improvements made to the component library are seamlessly integrated wherever the library is used, streamlining maintenance and accelerating development cycles.


1. Set Up Your Project Folder

First, create a new folder for your component library and initialize it:


mkdir my-react-component-library
cd my-react-component-library
npm init -y

With your project folder in place, you have established a solid foundation for the steps ahead.


2. Install Essential Dependencies

Install React, TypeScript, and essential build tools for a robust library setup:


npm install react react-dom
npm install --save-dev typescript @types/react @types/react-dom
npm install --save-dev rollup rollup-plugin-peer-deps-external rollup-plugin-postcss @rollup/plugin-node-resolve @rollup/plugin-commonjs @rollup/plugin-typescript sass

 The right dependencies are now in place, ensuring your project is equipped for modern development and efficient bundling.


3. Organize Your Project Structure

Establish a clear and logical directory structure for your components and outputs:
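
A minimal layout consistent with the later steps might look like this (the type-declaration file name is just an example; the exact arrangement is up to you):

my-react-component-library/
├── src/
│   ├── HelloWorld.tsx
│   ├── HelloWorld.module.scss
│   ├── declarations.d.ts
│   └── index.ts
├── rollup.config.js
├── tsconfig.json
└── package.json

The dist/ folder is generated later by the build step.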


With your file structure organized, you are primed for scalable code and easy project navigation.

4. Write Your Component

Develop a simple reusable React component as a starting point for your library:


import React from 'react';
import styles from './HelloWorld.module.scss';
type HelloWorldProps = {
  name: string;
};
export const HelloWorld: React.FC<HelloWorldProps> = ({ name }) => (
  <div className={styles.centerScreen}>
    <div className={styles.card}>
      <span className={styles.waveEmoji}></span>
      <div className={styles.textBlock}>
        <span className={styles.helloSmall}>Hello,</span>
        <span className={styles.name}>{name}</span>
      </div>
    </div>
  </div>
);

Having your first component ready sets the stage for further expansion and consistent styling across your library.


5. Set Up TypeScript

Configure TypeScript in tsconfig.json for type safety and the generation of type declarations:

{
  "compilerOptions": {
    "declaration": true,
    "declarationDir": "dist/types",
    "emitDeclarationOnly": false,
    "jsx": "react",
    "module": "ESNext",
    "moduleResolution": "node",
    "outDir": "dist",
    "rootDir": "src",
    "target": "ES6",
    "strict": true,
    "esModuleInterop": true
  },
  "include": ["src"]
}

TypeScript is now fully configured, bringing type safety and easy downstream integration for consumers.


6. Create an Index Export

Make src/index.ts like this:

export { HelloWorld } from './HelloWorld';

Centralizing your exports prepares your library for seamless adoption in other projects.

7. Add a Type Declarations File

Add a declaration file (for example, src/declarations.d.ts) so TypeScript recognizes SCSS module imports and doesn’t raise type errors:

declare module '*.module.scss' {
  const classes: { [key: string]: string };
  export default classes;
}

With declaration files in place, your styling workflow integrates smoothly with TypeScript.


8. Configure Rollup

Set up Rollup in rollup.config.js for reliable library bundling and versatile output formats:

import peerDepsExternal from "rollup-plugin-peer-deps-external";
import postcss from "rollup-plugin-postcss";
import resolve from "@rollup/plugin-node-resolve";
import commonjs from "@rollup/plugin-commonjs";
import typescript from "@rollup/plugin-typescript";

export default {
  input: "src/index.ts",
  output: [
    {
      file: "dist/index.js",
      format: "cjs",
      sourcemap: true,
    },
    {
      file: "dist/index.esm.js",
      format: "esm",
      sourcemap: true,
    },
  ],
  plugins: [
    peerDepsExternal(),
    resolve(),
    commonjs(),
    typescript({ tsconfig: "./tsconfig.json" }),
    postcss({
      modules: true,
      use: ["sass"],
    }),
  ],
  external: ["react", "react-dom"],
};


An optimized bundling process now supports your library's compatibility with a variety of JavaScript environments.

9. Update package.json

Reference all build outputs and dependencies accurately in your package.json:

{
  "main": "dist/index.js",
  "module": "dist/index.esm.js",
  "types": "dist/types/index.d.ts",
  "files": [
    "dist"
  ],
  "scripts": {
    "build": "rollup -c"
  },
  "peerDependencies": {
    "react": "^17.0.0 || ^18.0.0",
    "react-dom": "^17.0.0 || ^18.0.0"
  }
}

Your package metadata is set, paving the way for effortless installation and use.


10. Build the Package

Trigger Rollup to bundle your components:

npm run build

With a completed build, your library files are now ready for distribution.


11. Publishing to Azure Artifacts npm Registry

a) Set up your Azure Artifacts Feed

Go to Azure DevOps > Artifacts and create (or use) an npm feed.


b) Configure npm for Azure Artifacts

In your project root, create or update a .npmrc file with:

@yourscope:registry=https://pkgs.dev.azure.com/yourorg/_packaging/yourfeed/npm/registry/
always-auth=true

Replace @yourscope, yourorg, and yourfeed with your actual values.

c) Authenticate Locally

Use Azure's instructions for authentication, such as:

npm login --registry=https://pkgs.dev.azure.com/yourorg/_packaging/yourfeed/npm/registry/

In some setups, especially on Windows, you might need to install and run vsts-npm-auth to complete authentication.

d) Build Your Package

Ensure your package is built and ready to publish (e.g., run npm run build if you have a build step).

e) Publish Your Package

From the project root, run:

npm publish

You do not need to specify the registry in the publish command if your .npmrc is set correctly. The registry is picked up from .npmrc.

And just like that, your component library is available in your Azure feed for your team or organization to install and use!

If you’d prefer to publish to the public npm registry, follow these steps:



12. Publishing to NPM

Prerequisites

  • You already built your library (dist/ exists, with all outputs, after running npm run build).
  • You have an npmjs.com account.

a) Log in to npm 

In your terminal, from the root of your project, type:

npm login

Enter your npm username, password, and email when prompted.

b)  Publish 

Publish the package:

npm publish

After publishing to npmjs.com, you’ll want to showcase your package’s availability directly from your npm dashboard.


Instructions:

  1. Go to npmjs.com and log in to your account.

  2. Click on your username (top-right) and select Packages from the dropdown.

  3. Find and click your newly published package.



Seeing your package live in npm’s dashboard is a proud milestone—your code is now out there, ready to make life easier for every developer who needs it!

Once published, your component library is available for installation in any compatible React project.


Install the library in any React project:


npm install your-package-name

Output:

After a successful publish, npm prints the package name and version, confirming that your component library can now be installed and used in any of your React projects.

Troubleshooting/Common Tips:

  • If the package name and version already exist on npm, bump the version in package.json.
  • Make sure your main, module, and types fields point to valid files in your dist/ directory (you’ve already done this!).
  • Check .npmignore or the "files" section in package.json so only necessary files are published.

Conclusion:

You've now created, bundled, and published your reusable React component library with TypeScript and Rollup.
This new workflow helps you:

  • Speed up development: No more duplicating code between projects.
  • Guarantee consistency: All your apps share the same reliable components.
  • Simplify updates: Bug fixes or enhancements are made once and shared everywhere.
  • Easily distribute privately or publicly: Works with both internal feeds (like Azure Artifacts) and public npm.

Now your custom components are ready to power future projects, speed up development, and ensure consistency across your apps.