Overview
Intelligent document retrieval with grounded, citation-aware summarization
What is Document Search & Summarization?
A production-ready service that combines:
- Hybrid Search: BM25 keyword scoring plus vector-embedding similarity, so documents match on exact terms and on meaning
- Grounded Summarization: Extractive and abstractive summaries with sentence-level citations
- Profile-Based Architecture: Separate latency, quality, and cost SLOs for different use cases
Quick Comparison
| Profile | Latency target | Quality | Cost (per 1K queries) | Use case |
|---|---|---|---|---|
| Balanced | 500ms | Good (0.7-0.8) | $0.60 | General Q&A |
| Latency-First | 250ms | Acceptable (0.7) | $0.35 | Interactive |
| Quality-First | 5s | Excellent (0.85-0.95) | $52 | Research |
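The per-1K prices in the table translate directly into projected spend. A minimal sketch that uses only those prices; the dictionary keys and the 1M-queries-per-month volume are illustrative, and `monthly_cost` is a hypothetical helper, not part of the package.

```python
from decimal import Decimal

# Prices per 1K queries, copied from the comparison table.
COST_PER_1K = {
    "Balanced": Decimal("0.60"),
    "Latency-First": Decimal("0.35"),
    "Quality-First": Decimal("52"),
}


def monthly_cost(profile: str, queries_per_month: int) -> Decimal:
    """Projected spend in dollars: (queries / 1000) * price per 1K queries."""
    return COST_PER_1K[profile] * queries_per_month / 1000


for name in COST_PER_1K:
    print(f"{name}: ${monthly_cost(name, 1_000_000):,.2f}")
```

At 1M queries a month this prints $600.00, $350.00 and $52,000.00, so Quality-First costs roughly 87 times as much as Balanced at the same volume. `Decimal` keeps the dollar amounts exact instead of accumulating float rounding.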
Quick Start
```python
from packages.rag.document_search import DocumentSearchPipeline, ProfileType

# `store` is a DocumentStore (see store.py) and `embed_fn` is the embedding
# function; both must be created before building the pipeline.
pipeline = DocumentSearchPipeline.create_profile(ProfileType.BALANCED, store, embed_fn)

# Execute a query
result = pipeline.execute("Your query")
print(result.summary.text)  # Grounded summary with citations
```
Full example: see the Document Search Demo.
Documentation Structure
This guide is organized into focused topics:
- Overview (you are here) - Introduction and quick start
- Architecture - Profile-based design and system architecture
- Storage & Indexing - Document storage and indexing strategies
- API Integration - API endpoints and integration guide
- Complete Guide - Comprehensive documentation
- Quick Reference - Cheat sheet and quick start
Key Features
Hybrid Retrieval
- BM25: Exact keyword matching (fast, explainable)
- Vector Search: Semantic similarity (handles synonyms)
- RRF: Merges the BM25 and vector rankings with reciprocal rank fusion
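The hybrid step can be sketched end to end on a toy corpus: Okapi BM25 ranks documents by keyword match, a vector ranking is taken as given, and reciprocal rank fusion merges them with score(d) = Σ 1/(k + rank). A minimal sketch, assuming pre-tokenized documents, typical parameter values (k1=1.5, b=0.75, k=60), and a hard-coded vector ranking standing in for an embedding index; it is not the retriever.py implementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(fused, key=fused.get, reverse=True)


doc_ids = ["d0", "d1", "d2"]
docs = [["reset", "password", "account"], ["billing", "invoice"], ["password", "policy"]]
query = ["reset", "password"]

scores = bm25_scores(query, docs)
bm25_ranking = [doc_ids[i] for i in sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)]
# Assumed output of a vector index (e.g. nearest neighbours by cosine similarity).
vector_ranking = ["d2", "d1", "d0"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(bm25_ranking)  # ['d0', 'd2', 'd1']
print(fused)         # ['d2', 'd0', 'd1']
```

`d2` comes out first after fusion even though BM25 ranked it second, because the vector ranking puts it first. RRF uses only ranks, so BM25 scores and cosine similarities never have to be put on a common scale.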
Grounded Summarization
- Extractive (TextRank): Fast and faithful, with no LLM cost
- Abstractive (LLM): Fluent, comprehensive
- Citations: Sentence-level citation tracking
- Faithfulness: Summaries are verified against their sources with a fail-closed design (output that fails verification is not passed through)
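The extractive path can be illustrated with a small TextRank: sentences become nodes, edge weights come from word overlap, a weighted PageRank scores each sentence, and the top sentences are returned in source order with their citation attached. A minimal sketch, assuming sentences are already split and tagged with a hypothetical `doc:sentence` citation ID; the log(1 + n) smoothing in the similarity is a choice made here (plain TextRank uses log n), and summarizer.py may differ.

```python
import math
import re


def _tokens(sentence):
    """Lower-cased word set of a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def _similarity(a, b):
    """Word overlap normalised by log sentence lengths (log(1 + n) avoids log(1) = 0)."""
    ta, tb = _tokens(a), _tokens(b)
    denom = math.log(1 + len(ta)) + math.log(1 + len(tb))
    return len(ta & tb) / denom if denom else 0.0


def textrank_summary(sentences, top_n=2, damping=0.85, iterations=50):
    """sentences: list of (citation_id, text). Returns top_n cited sentences in source order."""
    n = len(sentences)
    texts = [text for _, text in sentences]
    # Weighted, undirected sentence graph (no self-loops).
    w = [[_similarity(texts[i], texts[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update; sentences with no edges contribute nothing.
        scores = [
            (1 - damping)
            + damping * sum(w[j][i] / out_weight[j] * scores[j] for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    # Keep source order and attach each sentence's citation.
    return [f"{texts[i]} [{sentences[i][0]}]" for i in sorted(top)]


kb = [
    ("kb-12:1", "To reset your password, open account settings."),
    ("kb-12:2", "Choose reset password and follow the emailed link."),
    ("kb-40:3", "Invoices are sent on the first day of each month."),
    ("kb-12:4", "The password reset link expires after 24 hours."),
]
for line in textrank_summary(kb):
    print(line)
```

This prints the kb-12:2 and kb-12:4 sentences, which share the most vocabulary with the rest of the passage; the billing sentence (kb-40:3) scores lowest. Because every output sentence is copied verbatim from a cited source, each citation points at exactly the text it supports.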
Profile-Based Configuration
- Balanced: General-purpose (500ms, good quality)
- Latency-First: Interactive speed (250ms)
- Quality-First: Research-grade (5s, 95%+ faithfulness)
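One way to read "fail-closed" together with the profiles above: each profile carries a minimum faithfulness, and a summary whose share of verified sentences falls below it is withheld instead of returned. A hypothetical sketch; the thresholds come from the SLOs quoted in this overview (85%+ for Balanced, 95%+ for Quality-First), while `fail_closed_gate`, the string profile keys, and the per-sentence verdicts are illustrative assumptions, not the pipeline's API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical faithfulness floors taken from the SLOs quoted in this overview.
# Latency-First states a relevancy target instead, so it has no entry here.
MIN_FAITHFULNESS = {"balanced": 0.85, "quality_first": 0.95}


@dataclass
class GateResult:
    summary: Optional[str]  # None means the summary was withheld
    faithfulness: float     # share of summary sentences judged supported
    passed: bool


def fail_closed_gate(sentences: List[str], supported: List[bool], profile: str) -> GateResult:
    """Release a summary only if enough of its sentences are supported by their citations.

    `supported[i]` is assumed to come from a separate verifier (for example an
    NLI model or an LLM judge) that checks sentence i against the passage it cites.
    """
    if len(sentences) != len(supported):
        raise ValueError("need exactly one verdict per sentence")
    threshold = MIN_FAITHFULNESS[profile]  # unknown profile -> KeyError, never a silent pass
    if not sentences:
        return GateResult(None, 0.0, False)  # nothing verifiable: fail closed
    score = sum(supported) / len(sentences)
    if score >= threshold:
        return GateResult(" ".join(sentences), score, True)
    return GateResult(None, score, False)


draft = [f"Sentence {i}." for i in range(10)]
verdicts = [True] * 9 + [False]  # 9 of 10 sentences supported -> 0.9
print(fail_closed_gate(draft, verdicts, "balanced").passed)       # True
print(fail_closed_gate(draft, verdicts, "quality_first").passed)  # False
```

The same draft passes under Balanced but is withheld under Quality-First. Raising `KeyError` for an unknown profile keeps a configuration mistake from silently disabling the check.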
Implementation Files
Core Module: packages/rag/document_search/
| File | Purpose |
|---|---|
| pipeline.py | Main orchestrator with profiles |
| store.py | DocumentStore and OpenSearch integration |
| retriever.py | Hybrid retrieval and query expansion |
| summarizer.py | Grounded summarization |
| test_fixtures.py | Test dataset (10 queries) |
For implementation details, see the complete guide and architecture documentation.
Learning Paths
Beginners (60 minutes)
- This overview (10 min)
- Quick Reference (15 min)
- Run demo script (20 min)
- Architecture basics (15 min)
Practitioners (90 minutes)
- Quick Reference (10 min)
- Architecture (30 min)
- Storage & Indexing (25 min)
- API Integration (25 min)
ML Engineers (2 hours)
- Complete Guide (60 min)
- Architecture (30 min)
- Storage & Indexing (30 min)
Real-World Applications
Customer Support KB
Profile: Balanced
SLO: < 500ms, 85%+ faithfulness
Cost: $0.60 per 1K queries
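```python
# pipeline: the Balanced pipeline from Quick Start.
# Filters limit retrieval to documents in the "account" category.
result = pipeline.execute(
    "How do I reset my password?",
    filters={"category": "account"},
)
```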
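Since store.py pairs the DocumentStore with OpenSearch, one plausible way such filters reach the index is a `bool` query whose `filter` clause holds one `term` per metadata field. A sketch under assumptions: the `text` content field, keyword/boolean mappings for metadata, and the helper name `build_filtered_bm25_query` are all hypothetical, and the pipeline may apply filters differently. A body like this could be passed as the `body` argument of `search()` in the opensearch-py client.

```python
from typing import Any, Dict


def build_filtered_bm25_query(query: str, filters: Dict[str, Any], size: int = 10) -> Dict[str, Any]:
    """Build an OpenSearch search body: BM25 `match` on text plus exact-match metadata filters.

    Assumes a `text` field for content and keyword/boolean mappings for the
    metadata fields (hypothetical names).
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored clause: BM25 relevance of the query against document text.
                "must": [{"match": {"text": query}}],
                # Filter context: restricts candidates, does not affect scores.
                "filter": [{"term": {field: value}} for field, value in filters.items()],
            }
        },
    }


body = build_filtered_bm25_query("How do I reset my password?", {"category": "account"})
print(body["query"]["bool"]["filter"])  # [{'term': {'category': 'account'}}]
```

Clauses in filter context do not contribute to scoring, so the filter narrows the candidate set without changing the BM25 ranking. The Legal/Compliance example's `{"approved": True}` would become a `term` clause on a boolean field in the same way.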
Legal/Compliance Research
Profile: Quality-First
SLO: < 5s, 95%+ faithfulness
Cost: $52 per 1K queries
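```python
# quality_pipeline: created as in Quick Start, but with the Quality-First profile.
# Filters limit retrieval to approved policy documents.
result = quality_pipeline.execute(
    "What is our data retention policy?",
    filters={"document_type": "policy", "approved": True},
)
```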
Interactive Chat
Profile: Latency-First
SLO: < 250ms, 70%+ relevancy
Cost: $0.35 per 1K queries
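```python
# fast_pipeline uses the Latency-First profile; partial_query is the text typed so far
result = fast_pipeline.execute(partial_query)
```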
Next Steps
- Understand Design: Read the Architecture guide
- Storage Setup: Learn about Storage & Indexing
- API Integration: See the API Integration guide
- Deep Dive: Explore the complete guide
- Build: Run the demo
Quick Links:
- Quick Reference - Cheat sheet
- Examples - Code examples
- Architecture - System design
- API Integration - Integration guide