- Just half the answer.
- Or skips the definition of an important term.
- Or mixes up two unrelated paragraphs.
- Too small to be meaningful, or
- Too isolated to carry the full picture.
- Chop up documents semantically (not just every 500 tokens)
- Retrieve passages using both keywords and vector similarity
- Stitch back the right context (even when your query didn’t know it needed it)
By the end, you’ll have a clean, modular setup that’s ready to power any LLM app that needs rich, relevant context without losing the thread.
Let’s start with what actually goes wrong and why it happens more often than you think.
The Problem: Context Loss in RAG Pipelines
But in practice, there's a common issue that quietly sneaks in:
You lose the context right when it matters most.
Let’s say you’re indexing a long research doc. A key explanation lives in one paragraph, while its setup and conclusion live in the paragraphs around it. Chunk it naively, and retrieval hands back half an idea with no beginning and no end.
Here’s why this happens so often:
- Sentences get cut mid-way.
- Tables get sliced in half.
- References point to nowhere.
Shallow retrieval:
- Keyword matches (BM25)
- Or a single vector field (semantic similarity)
Both are good, but not enough on their own:
- Keywords might miss reworded passages.
- Vectors might pull something conceptually close… but not specific enough.
Context isolation:
- The chunk before might define a term.
- The chunk after might finish the logic.
- And they’re often left out entirely.
Most RAG pipelines are good at fetching passages,
but not great at reconstructing context.
Now let’s fix that without rewriting your whole stack.
The Solution Strategy: Keep Your Context, Serve Better Answers
Step 1: Semantic Chunking (Not Just Slicing Text)
We split your documents by meaning, not just fixed size. That means paragraphs that “belong together” stay together, preserving the flow of thought.
This preserves semantic integrity, so the model sees the whole story.
Step 2: Index with Azure AI Search
- You get fast semantic search with vector support
- Plus, keyword fallback when needed (hybrid search FTW!)
Step 3: Hybrid Retrieval = Vector + Keyword
- Vector similarity: Find semantically close matches
- BM25 keyword matching: Catch exact terms (e.g., "Turing Test")
- Neighbor expansion: Fetch previous and next chunks for continuity
Step 4: Feed to the Model as Context
- Enough signal to answer clearly
- No noise from unrelated data
- A better shot at staying grounded
Let's jump into the implementation.
Prerequisites & Setup:
Python Packages to Install
pip install azure-search-documents openai langchain-openai langchain-experimental python-docx tiktoken tenacity python-dotenv
Environment Variables:
AZURE_OPENAI_API_KEY=""
AZURE_OPENAI_ENDPOINT=""
AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"
AZURE_OPENAI_API_VERSION=""
AZURE_SEARCH_ENDPOINT=""
AZURE_SEARCH_API_KEY=""
AZURE_SEARCH_INDEX_NAME="my-index-name"
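Since python-dotenv is in the install list, load these variables from a .env file once at startup:

from dotenv import load_dotenv

# Reads the .env file in the working directory into os.environ
load_dotenv()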
Step 1: Semantic Chunking with Azure OpenAI
1. Setup Azure OpenAI Embeddings
from langchain_openai.embeddings import AzureOpenAIEmbeddings
import os

def get_azure_embeddings():
    """
    Creates an embedding client for Azure OpenAI.

    Returns:
        AzureOpenAIEmbeddings: LangChain embedding object
    """
    return AzureOpenAIEmbeddings(
        azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    )
2. Semantic Chunking with LangChain
from langchain_experimental.text_splitter import SemanticChunker

def chunk_text_semantically(text: str, embeddings) -> list:
    """
    Splits long text into semantically meaningful chunks
    using Azure OpenAI embeddings.

    Args:
        text (str): Full document text
        embeddings: An AzureOpenAIEmbeddings object

    Returns:
        list: A list of Document chunks
    """
    splitter = SemanticChunker(
        embeddings=embeddings,
        breakpoint_threshold_type="percentile",  # how aggressively to split
        breakpoint_threshold_amount=95.0,        # split at the top 5% of breakpoints
        min_chunk_size=120,                      # avoid tiny chunks
    )
    return splitter.create_documents([text])
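One note before chunking: the usage snippet below assumes doc_text already holds your document's raw text. Here's a minimal sketch that pulls it from a Word file with python-docx (from the install list); the file name is hypothetical:

from docx import Document as DocxDocument

def load_docx_text(path: str) -> str:
    # Join every non-empty paragraph into one text blob
    doc = DocxDocument(path)
    return "\n".join(p.text for p in doc.paragraphs if p.text.strip())

doc_text = load_docx_text("ai_history.docx")  # hypothetical file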
embeddings = get_azure_embeddings()
chunks = chunk_text_semantically(doc_text, embeddings)

print(f"Total chunks created: {len(chunks)}")
print("Sample Chunk:\n", chunks[0].page_content[:500])
Step 2: Indexing Chunks into Azure AI Search
1. Define Your Index Schema (if not already created)
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchableField, SearchField,
    SearchFieldDataType, VectorSearch, HnswAlgorithmConfiguration,
    HnswParameters, VectorSearchProfile,
)

def build_search_index_schema(index_name: str) -> SearchIndex:
    return SearchIndex(
        name=index_name,
        fields=[
            # filterable=True lets us look up neighbors by id later
            SimpleField(name="id", type=SearchFieldDataType.String, key=True, filterable=True),
            SearchableField(name="content", type=SearchFieldDataType.String),
            SimpleField(name="chunk_id", type=SearchFieldDataType.Int32, sortable=True),
            SimpleField(name="doc_id", type=SearchFieldDataType.String),
            SimpleField(name="source_uri", type=SearchFieldDataType.String),
            SimpleField(name="prev_id", type=SearchFieldDataType.String),
            SimpleField(name="next_id", type=SearchFieldDataType.String),
            SimpleField(name="page_no", type=SearchFieldDataType.Int32, filterable=True),
            SearchField(
                name="embedding",
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True,
                vector_search_dimensions=1536,  # matches text-embedding-3-small
                vector_search_profile_name="default-vector-profile",
            ),
        ],
        vector_search=VectorSearch(
            algorithms=[
                HnswAlgorithmConfiguration(
                    name="default-vector-config",
                    parameters=HnswParameters(m=4, ef_construction=400, metric="cosine"),
                )
            ],
            profiles=[
                VectorSearchProfile(
                    name="default-vector-profile",
                    algorithm_configuration_name="default-vector-config",
                )
            ],
        ),
    )
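The schema only describes the index; you still have to create it once. With the SDK's SearchIndexClient:

from azure.search.documents.indexes import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
import os

index_client = SearchIndexClient(
    endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    credential=AzureKeyCredential(os.getenv("AZURE_SEARCH_API_KEY")),
)
index_client.create_or_update_index(
    build_search_index_schema(os.getenv("AZURE_SEARCH_INDEX_NAME"))
)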
2. Format Chunks with prev_id and next_id
def format_chunks_for_indexing(chunks: list, doc_id: str, source_uri: str) -> list:
    formatted = []
    for i, chunk in enumerate(chunks):
        formatted.append({
            "id": f"{doc_id}_{i}".replace("#", "_"),
            "doc_id": doc_id,
            "chunk_id": i,
            "source_uri": source_uri,
            "page_no": chunk.metadata.get("page", None),
            "content": chunk.page_content,
            # Link each chunk to its neighbors for context expansion at query time
            "prev_id": f"{doc_id}_{i-1}" if i > 0 else None,
            "next_id": f"{doc_id}_{i+1}" if i < len(chunks) - 1 else None,
        })
    return formatted
3. Embed and Upload to Azure AI Search
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

def index_chunks_to_azure(chunks: list, embedding_fn, search_client: SearchClient):
    # Embed each chunk's content, then push the whole batch
    for chunk in chunks:
        chunk["embedding"] = embedding_fn(chunk["content"])
    search_client.upload_documents(documents=chunks)
    print(f"Uploaded {len(chunks)} chunks to Azure Search")
# 1. Set up the SearchClient
search_client = SearchClient(
    endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    index_name=os.getenv("AZURE_SEARCH_INDEX_NAME"),
    credential=AzureKeyCredential(os.getenv("AZURE_SEARCH_API_KEY")),
)

# 2. Set up embeddings
embedding_model = get_azure_embeddings()
embedding_fn = lambda text: embedding_model.embed_query(text)

# 3. Format and push
doc_id = "ai_intro_doc"
formatted_chunks = format_chunks_for_indexing(chunks, doc_id, "document-path")
index_chunks_to_azure(formatted_chunks, embedding_fn, search_client)
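Embedding a long document means many API calls, and transient rate limits are common. A small sketch with tenacity (also in the install list) that you could swap in as the embedding_fn above:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(min=1, max=30))
def embed_with_retry(text: str) -> list:
    # Retries transient failures (e.g., 429 rate limits) with exponential backoff
    return embedding_model.embed_query(text)

# embedding_fn = embed_with_retry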
Step 3: Semantic Retrieval with Context-Aware Expansion
- Embed the user query (for semantic search)
- Perform hybrid search: text + vector
- Pull neighboring chunks via `prev_id` and `next_id` to prevent context loss
- Format results for your model prompt
1. Embed the User Query
def get_query_embedding(query: str, embedding_model) -> list:
    return embedding_model.embed_query(query)
2. Perform Hybrid Search in Azure AI Search
from azure.search.documents.models import VectorizedQuery

def hybrid_search(query: str, query_vector: list, search_client, k: int = 5):
    # Keyword (BM25) and vector scoring run in one request;
    # Azure AI Search fuses the two rankings for you
    vector_query = VectorizedQuery(
        vector=query_vector,
        k_nearest_neighbors=k,
        fields="embedding",
    )
    results = search_client.search(
        search_text=query,
        vector_queries=[vector_query],
        select=["id", "content", "doc_id", "prev_id", "next_id"],
        top=k,
    )
    return list(results)
3. Expand Results with Prev/Next Context
def fetch_with_context(results, search_client):
    # Collect the hit IDs plus their prev/next neighbor IDs
    related_ids = set()
    for r in results:
        related_ids.add(r["id"])
        if r.get("prev_id"):
            related_ids.add(r["prev_id"])
        if r.get("next_id"):
            related_ids.add(r["next_id"])

    # One filtered lookup fetches all related chunks
    # (this is why the id field is filterable in the schema)
    filter_expr = " or ".join(f"id eq '{rid}'" for rid in related_ids)
    expanded_results = search_client.search(
        search_text="*",
        filter=filter_expr,
        select=["id", "content", "doc_id", "chunk_id"],
    )
    return list(expanded_results)
query = "What were the key milestones in early AI history?"
query_vector = get_query_embedding(query, embedding_model)

top_chunks = hybrid_search(query, query_vector, search_client, k=4)
contextual_chunks = fetch_with_context(top_chunks, search_client)

# Sort by (doc_id, chunk_id) to preserve reading order;
# sorting the id string would place chunk 10 before chunk 2
contextual_chunks = sorted(contextual_chunks, key=lambda c: (c["doc_id"], c["chunk_id"]))

# Display sample
for chunk in contextual_chunks:
    print(f"\n{chunk['id']}\n{chunk['content'][:300]}...")
Step 4: Stitch Chunks, Prompt the Model (The RAG Finale)
- Deduplicated (no repeats)
- Sorted (in reading order)
- Joined with separators (so the model can distinguish them)
def prepare_prompt_context(chunks: list) -> str:
    """
    Deduplicates the expanded chunks, sorts them into reading
    order, and joins them into prompt-ready context.
    """
    seen = set()
    selected = []
    for chunk in chunks:
        if chunk["id"] not in seen:
            selected.append(chunk)
            seen.add(chunk["id"])

    # Sort by (doc_id, chunk_id) to maintain reading order
    selected.sort(key=lambda c: (c["doc_id"], c["chunk_id"]))

    # Join with clear separators so the model can tell chunks apart
    return "\n---\n".join(chunk["content"] for chunk in selected)
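Context windows are finite, so it helps to sanity-check the stitched context against your token budget. A quick sketch with tiktoken (from the install list); cl100k_base is the encoding used by the GPT-4-era models:

import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    return len(tiktoken.get_encoding(encoding_name).encode(text))

print("Context tokens:", count_tokens(prepare_prompt_context(contextual_chunks)))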
context = prepare_prompt_context(contextual_chunks)

prompt = f"""You are an expert assistant. Use the following context to answer clearly and accurately.

{context}

Question: {query}

Answer:"""
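All that's left is to send the prompt to a chat model. A minimal sketch using the openai SDK's AzureOpenAI client; the gpt-4o deployment name is an assumption, so substitute whatever chat deployment you've created:

from openai import AzureOpenAI
import os

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical: use your own chat deployment name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)
print(response.choices[0].message.content)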
Wrapping Up: From Documents to Grounded Answers
- Semantic chunking - keeps ideas together
- Smart indexing - stores structure and meaning
- Hybrid retrieval - balances precision and recall
- Neighbor-aware context - completes the narrative
