Document Search & Summarization - Quick Reference
One-page cheat sheet for quick lookup
Three Profiles at a Glance
Profile | Latency | Quality | Cost | When to Use |
---|---|---|---|---|
Balanced | 500ms | 0.7-0.8 | $0.70/1K | General Q&A, customer support |
Latency-First | 250ms | 0.7 | $0.35/1K | Interactive search, auto-complete |
Quality-First | 5s | 0.85-0.95 | $52/1K | Research, compliance, legal |
Quick Start (3 Steps)
# 1. Setup
from packages.rag.document_search import *
store = OpenSearchDocumentStore("localhost", 9200, "documents")
pipeline = DocumentSearchPipeline.create_profile(
    ProfileType.BALANCED, store, embedding_function
)
# 2. Execute
result = pipeline.execute("Your query here")
# 3. Use Results
print(result.summary.text) # Summary
print(result.summary.citations) # Citations
print(result.timing) # Performance metrics
Architecture
Query → Retriever → Reranker → Summarizer → Result
            ↓           ↓            ↓
         (Hybrid)   (Optional)   (Grounded)
         BM25+Vec    CrossEnc    Extract/Abstract
Key Concepts
Hybrid Search = BM25 + Vector
- BM25: Exact keyword matching (fast, explainable)
- Vector: Semantic similarity (handles synonyms)
- RRF: Combines both with reciprocal rank fusion
- α: Weight parameter (0=BM25 only, 1=Vector only, 0.5=equal)
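Minimal weighted-RRF sketch (helper name and the α weighting are illustrative, not the pipeline API; k matches the RRF constant further down):
# Hypothetical helper: fuse a BM25 ranking and a vector ranking with weighted RRF
def rrf_fuse(bm25_ids, vector_ids, k=60, alpha=0.5):
    scores = {}
    for weight, ranking in ((1 - alpha, bm25_ids), (alpha, vector_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

rrf_fuse(["d3", "d1", "d7"], ["d1", "d9", "d3"])  # d1 and d3 appear in both lists, so they rank first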
Query Expansion
- PRF (Pseudo-Relevance Feedback): Use top results to expand query
- HyDE (Hypothetical Doc Embeddings): Generate ideal doc, search for similar
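Minimal HyDE sketch, assuming an llm callable plus the store and embedding_function from Quick Start; vector_search is a stand-in for whatever vector query method the store exposes:
# Hypothetical HyDE helper: retrieve by similarity to a generated answer, not the raw query
def hyde_retrieve(query, llm, embedding_function, store, top_k=20):
    hypothetical_doc = llm(f"Write a short passage that answers: {query}")
    return store.vector_search(embedding_function(hypothetical_doc), top_k=top_k)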
Reranking
- Bi-Encoder (retrieval): Fast, approximate similarity
- Cross-Encoder (reranking): Slow, accurate relevance scoring
- Two-Stage: Bi-encoder gets 50 candidates → Cross-encoder ranks → Top 5
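Two-stage sketch with sentence-transformers, assuming query and a candidates list (~50 retrieved results with a .text attribute):
from sentence_transformers import CrossEncoder

# Stage 2: score every (query, passage) pair with the cross-encoder, keep the 5 best
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc.text) for doc in candidates])
top5 = [doc for _, doc in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)[:5]]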
Summarization
- Extractive: Select important sentences (fast, faithful, free)
- Abstractive: Generate new text (fluent, expensive, risk of hallucination)
- Grounded: Verify claims against sources, cite everything
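Rough sketch of the grounded check: a summary sentence counts as supported only if it sits close to some source chunk in embedding space (threshold and helper are assumptions, not the pipeline's actual verifier):
import numpy as np

def faithfulness(summary_sentences, source_chunks, embedding_function, threshold=0.8):
    # A sentence is "supported" if its embedding is close to at least one source chunk
    chunks = np.array([embedding_function(c) for c in source_chunks], dtype=float)
    chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)
    supported = 0
    for sentence in summary_sentences:
        vec = np.asarray(embedding_function(sentence), dtype=float)
        vec /= np.linalg.norm(vec)
        supported += float(np.max(chunks @ vec)) >= threshold
    return supported / max(len(summary_sentences), 1)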
Configuration Patterns
Pattern 1: Customer Support KB
config = PipelineConfig(
    profile=ProfileType.BALANCED,
    topK=20,
    alpha=0.5,
    enable_reranking=True,
    summarization_mode=SummarizationMode.EXTRACTIVE,
    max_summary_length=250
)
Pattern 2: Legal/Compliance
config = PipelineConfig(
    profile=ProfileType.QUALITY_FIRST,
    topK=50,
    alpha=0.5,
    enable_reranking=True,
    query_expansion_method="hyde",
    summarization_mode=SummarizationMode.ABSTRACTIVE,
    faithfulness_threshold=0.95
)
Pattern 3: Real-Time Chat
config = PipelineConfig(
    profile=ProfileType.LATENCY_FIRST,
    topK=10,
    alpha=0.3,  # Favor BM25 for speed (α closer to 0 weights BM25 more)
    enable_reranking=False,
    enable_query_expansion=False,
    summarization_mode=SummarizationMode.EXTRACTIVE,
    max_summary_length=150
)
RAGAS Metrics
Metric | Formula | Target (Balanced) | Target (Quality) |
---|---|---|---|
Context Precision | Relevant retrieved / Total retrieved | > 0.70 | > 0.85 |
Context Recall | Relevant retrieved / Total relevant | > 0.70 | > 0.85 |
Faithfulness | Supported Claims/Total Claims | > 0.85 | > 0.95 |
Answer Relevancy | cosine(query, answer) | > 0.75 | > 0.80 |
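The first two formulas as toy functions over document IDs (RAGAS itself judges statements with an LLM, so treat these as illustrations only):
def context_precision(retrieved_ids, relevant_ids):
    # Fraction of retrieved documents that are actually relevant
    return sum(d in relevant_ids for d in retrieved_ids) / max(len(retrieved_ids), 1)

def context_recall(retrieved_ids, relevant_ids):
    # Fraction of all relevant documents that were retrieved
    return sum(d in retrieved_ids for d in relevant_ids) / max(len(relevant_ids), 1)

context_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})  # 0.50
context_recall(["d1", "d2", "d3", "d4"], {"d1", "d3"})     # 1.00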
Common Tuning Parameters
Retrieval
topK = 20 # Number of results to retrieve
alpha = 0.5 # BM25 vs vector weight (0-1)
BM25
k1 = 1.2 # Term frequency saturation (0.5-3.0)
b = 0.75 # Length normalization (0-1)
RRF
k = 60 # Rank constant (typically 60)
Reranking
model = "cross-encoder/ms-marco-MiniLM-L-6-v2" # Balanced
model = "cross-encoder/ms-marco-MiniLM-L-12-v2" # Quality
top_k = 10 # Rerank top N candidates
Summarization
max_length = 250 # Words
faithfulness_threshold = 0.85 # Minimum acceptance
Caching Strategy
# L1: Full result cache (1 hour TTL)
cache_key = f"query:{hash(query)}:{hash(filters)}"
# L2: Retrieval cache (1 hour TTL)
cache_key = f"retrieval:{hash(query)}"
# L3: Summary cache (24 hour TTL)
cache_key = f"summary:{doc_ids_hash}"
# L4: Embedding cache (7 day TTL)
cache_key = f"embedding:{content_hash}"
Expected Savings: 40-60% cost reduction
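Minimal in-process sketch of the L1 (full-result) layer; a shared store such as Redis is the usual production choice, and the key format here is illustrative:
import hashlib
import time

class TTLCache:
    def __init__(self):
        self._entries = {}

    def get(self, key):
        value, expires_at = self._entries.get(key, (None, 0))
        return value if expires_at > time.time() else None

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, time.time() + ttl_seconds)

cache = TTLCache()
key = f"query:{hashlib.sha256(query.encode()).hexdigest()}"  # stable L1 key
result = cache.get(key)
if result is None:
    result = pipeline.execute(query)
    cache.set(key, result, ttl_seconds=3600)  # 1 hour TTL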
Error Handling
try:
    result = pipeline.execute(query)
except TimeoutError:
    # Fall back to faster profile
    result = latency_pipeline.execute(query)
except Exception as e:
    # Log and return error
    log_error(f"Pipeline failed: {e}")
    return ErrorResponse("Service unavailable")
Monitoring Metrics
# Latency
emit("search.latency_ms", timing["total_ms"])
# Quality
emit("search.faithfulness", summary.faithfulness)
emit("search.slo_met", slo_met)
# Usage
emit("search.queries_total", 1)
emit("search.cache_hit", cache_hit)
# Cost
emit("search.cost_usd", cost)
API Quick Reference
# Create pipeline
pipeline = DocumentSearchPipeline.create_profile(profile, store, embed_fn)
# Execute search
result = pipeline.execute(query, filters={...})
# Access results
result.query # Original query
result.results # List[RetrievalResult]
result.summary # GroundedSummary
result.timing # Dict[str, float]
result.slo_met # bool
result.facets # Dict (if enabled)
# Summary details
result.summary.text # Summary text
result.summary.citations # Dict[int, Citation]
result.summary.faithfulness # float
result.summary.coverage # float
result.summary.key_points # List[str]
# Citation details
citation = result.summary.citations[1]
citation.document_id # str
citation.document_title # str
citation.chunk_id # str
citation.snippet # str (100-150 chars)
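Putting the attributes above together (purely illustrative formatting):
# Render the summary followed by a simple reference list
print(result.summary.text)
for idx, citation in sorted(result.summary.citations.items()):
    print(f"[{idx}] {citation.document_title} ({citation.document_id}): {citation.snippet}")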
Troubleshooting
Issue | Likely Cause | Solution |
---|---|---|
Slow searches | Profile misconfigured | Use latency-first, enable caching |
Low faithfulness | Poor source quality | Use extractive mode, verify sources |
Poor relevance | Wrong alpha weight | Tune α, enable query expansion |
High costs | Too many LLM calls | Use extractive, implement caching |
SLO violations | Too ambitious targets | Adjust profile or hardware |
Performance Targets
Profile | Latency P95 | Throughput | Concurrent Users |
---|---|---|---|
Balanced | < 500ms | 100 QPS | 1000+ |
Latency-First | < 250ms | 200 QPS | 2000+ |
Quality-First | < 5000ms | 20 QPS | 200+ |
Cost Breakdown (per 1K queries)
Component | Balanced | Latency-First | Quality-First |
---|---|---|---|
Retrieval | $0.40 | $0.25 | $1.50 |
Reranking | $0.20 | $0.00 | $0.50 |
Summarization | $0.00 | $0.00 | $50.00 |
Embedding | $0.10 | $0.10 | $0.50 |
Total | $0.70 | $0.35 | $52.50
Decision Tree
Start
↓
Need < 250ms latency? → Yes → Latency-First
↓ No
Need 95%+ faithfulness? → Yes → Quality-First
↓ No
General use case? → Yes → Balanced
↓ No
Custom requirements? → Create custom profile
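The same tree as a small helper (hypothetical function; ProfileType comes from Quick Start, thresholds from the profile tables):
def choose_profile(max_latency_ms, min_faithfulness):
    # Mirrors the decision tree above
    if max_latency_ms <= 250:
        return ProfileType.LATENCY_FIRST
    if min_faithfulness >= 0.95:
        return ProfileType.QUALITY_FIRST
    return ProfileType.BALANCED

choose_profile(max_latency_ms=500, min_faithfulness=0.85)  # ProfileType.BALANCED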
Further Reading
- Full Guide: Document Search & Summarization
- Examples: Document Search Examples
- API Reference: Document Search API
- Theory: Information Retrieval section in main guide
Last Updated: October 9, 2025
Version: 1.0