
Overview

Intelligent document retrieval with grounded, citation-aware summarization


What is Document Search & Summarization?

A production-ready service that combines:

  • Hybrid Search: BM25 keyword scoring plus vector embeddings, so both exact-term and semantic matches are retrieved
  • Grounded Summarization: Extractive and abstractive summaries with citations
  • Profile-Based Architecture: Different SLOs for different use cases

Quick Comparison

Profile        | Latency | Quality               | Cost (per 1K queries) | Use Case
Balanced       | 500ms   | Good (0.7-0.8)        | $0.60                 | General Q&A
Latency-First  | 250ms   | Acceptable (0.7)      | $0.35                 | Interactive
Quality-First  | 5s      | Excellent (0.85-0.95) | $52                   | Research

Quick Start

from packages.rag.document_search import DocumentSearchPipeline, ProfileType

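# Initialize with a profile.
# `store` (a document store) and `embed_fn` (an embedding function)
# are not defined here and must be created beforehand.
pipeline = DocumentSearchPipeline.create_profile(ProfileType.BALANCED, store, embed_fn)

# Execute a query
result = pipeline.execute("Your query")
print(result.summary.text)  # Grounded summary with citations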

Full Example: Document Search Demo


Documentation Structure

This guide is organized into focused topics:

  1. Overview (you are here) - Introduction and quick start
  2. Architecture - Profile-based design and system architecture
  3. Storage & Indexing - Document storage and indexing strategies
  4. API Integration - API endpoints and integration guide
  5. Complete Guide - Comprehensive documentation
  6. Quick Reference - Cheat sheet and quick start

Key Features

Hybrid Retrieval

  • BM25: Exact keyword matching (fast, explainable)
  • Vector Search: Semantic similarity (handles synonyms)
  • RRF Fusion: Combines both with reciprocal rank fusion
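
The fusion step above can be sketched in a few lines. This is a minimal illustration of reciprocal rank fusion in general, not the code in retriever.py: the function name `reciprocal_rank_fusion`, the sample document IDs, and the constant k=60 (a commonly used default for RRF) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed default, not a value taken
    from this service's configuration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Because RRF uses only ranks, the BM25 and vector scores never need to be normalized onto a common scale, and documents found by both retrievers rise above those found by only one.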

Grounded Summarization

  • Extractive (TextRank): Fast, faithful, no LLM cost
  • Abstractive (LLM): Fluent, comprehensive
  • Citations: Sentence-level citation tracking
  • Faithfulness: Verification with fail-closed design
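
To make "sentence-level citations" and "fail-closed" concrete, here is a hedged sketch. It assumes each summary sentence carries the ID of the passage it cites, and it uses token overlap as a crude stand-in for a real faithfulness check. `CitedSentence`, `support_score`, `verify_fail_closed`, and the 0.8 threshold are all hypothetical and are not taken from summarizer.py.

```python
import re
from dataclasses import dataclass


@dataclass
class CitedSentence:
    text: str
    source_id: str  # ID of the passage this sentence cites


def _tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(sentence, passage):
    """Fraction of the sentence's tokens that also occur in the passage."""
    sent = _tokens(sentence)
    if not sent:
        return 0.0
    return len(sent & _tokens(passage)) / len(sent)


def verify_fail_closed(sentences, passages, threshold=0.8):
    """Return the sentences only if every one is supported by its citation.

    Fail-closed: a missing source or a low score rejects the whole summary
    (returns None) instead of letting an unverified sentence through.
    """
    for s in sentences:
        passage = passages.get(s.source_id)
        if passage is None or support_score(s.text, passage) < threshold:
            return None
    return sentences


passages = {"p1": "Passwords can be reset from the account settings page."}
good = [CitedSentence("Passwords can be reset from the settings page.", "p1")]
bad = [CitedSentence("Passwords expire every thirty days.", "p1")]

print(verify_fail_closed(good, passages) is not None)
print(verify_fail_closed(bad, passages) is None)
```

The design point is the return value: when verification cannot confirm a sentence, the caller gets nothing rather than a partially verified summary.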

Profile-Based Configuration

  • Balanced: General-purpose (500ms, good quality)
  • Latency-First: Interactive speed (250ms)
  • Quality-First: Research-grade (5s, 95%+ faithfulness)
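
One way to reason about the three profiles is as plain data plus a selection rule. The sketch below copies the latency and cost figures from the comparison table; `ProfileSpec`, `pick_profile`, and `estimated_cost_usd` are hypothetical helpers for illustration and are not part of the package. The selection rule (take the slowest profile that still fits the latency budget) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileSpec:
    name: str
    latency_ms: int         # latency target from the comparison table
    cost_per_1k_usd: float  # cost per 1,000 queries


PROFILES = [
    ProfileSpec("Latency-First", 250, 0.35),
    ProfileSpec("Balanced", 500, 0.60),
    ProfileSpec("Quality-First", 5000, 52.0),
]


def pick_profile(latency_budget_ms):
    """Pick the slowest profile that fits the budget, or None if none fits.

    Assumption: a larger latency target buys higher quality.
    """
    fitting = [p for p in PROFILES if p.latency_ms <= latency_budget_ms]
    return max(fitting, key=lambda p: p.latency_ms) if fitting else None


def estimated_cost_usd(profile, queries):
    """Linear cost estimate from the per-1K-queries price."""
    return profile.cost_per_1k_usd * queries / 1000


choice = pick_profile(1000)
print(choice.name, estimated_cost_usd(choice, 50_000))
```

The cost column matters as much as latency here: at these prices the same query volume costs far more under Quality-First than under Balanced, so that profile is better reserved for workloads that need its faithfulness target.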

Implementation Files

Core Module: packages/rag/document_search/

File              | Purpose
pipeline.py       | Main orchestrator with profiles
store.py          | DocumentStore + OpenSearch
retriever.py      | Hybrid retrieval + query expansion
summarizer.py     | Grounded summarization
test_fixtures.py  | Test dataset (10 queries)

For implementation details, see the complete guide and architecture documentation.


Learning Paths

Beginners (60 minutes)

  1. This overview (10 min)
  2. Quick Reference (15 min)
  3. Run demo script (20 min)
  4. Architecture basics (15 min)

Practitioners (90 minutes)

  1. Quick Reference (10 min)
  2. Architecture (30 min)
  3. Storage & Indexing (25 min)
  4. API Integration (25 min)

ML Engineers (2 hours)

  1. Complete Guide (60 min)
  2. Architecture (30 min)
  3. Storage & Indexing (30 min)

Real-World Applications

Customer Support KB

Profile: Balanced
SLO: < 500ms, 85%+ faithfulness
Cost: $0.60 per 1K queries

result = pipeline.execute(
    "How do I reset my password?",
    filters={"category": "account"},
)

Legal/Compliance Research

Profile: Quality-First
SLO: < 5s, 95%+ faithfulness
Cost: $52 per 1K queries

result = quality_pipeline.execute(
    "What is our data retention policy?",
    filters={"document_type": "policy", "approved": True},
)
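
The `filters` argument in these examples reads as an exact-match constraint on document metadata. How the service actually evaluates filters is not documented here, so the following is only a sketch of that interpretation; `matches_filters` and the sample documents are hypothetical.

```python
def matches_filters(metadata, filters):
    """True when every filter key is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in filters.items()
    )


# Hypothetical document metadata records.
docs = [
    {"id": "d1", "document_type": "policy", "approved": True},
    {"id": "d2", "document_type": "policy", "approved": False},
    {"id": "d3", "document_type": "memo", "approved": True},
]

filters = {"document_type": "policy", "approved": True}
kept = [d["id"] for d in docs if matches_filters(d, filters)]
print(kept)
```

Under this reading, multiple filter keys combine with AND, so only approved policy documents would reach retrieval and summarization.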

Interactive Chat

Profile: Latency-First
SLO: < 250ms, 70%+ relevancy
Cost: $0.35 per 1K queries

result = fast_pipeline.execute(partial_query)
