Document Search & Summarization - Multi-Page Guide
This documentation is organized into focused topics for easier navigation.
Table of Contents
Core Documentation
- Overview - Introduction, quick start, and learning paths
- Architecture - 80/20 reuse strategy, orchestration patterns
- Storage & Indexing - S3 patterns, indexing pipelines, theory
- Full Guide - Complete educational guide covering IR theory and deployment
Quick References
- Quick Reference Card - One-page cheat sheet
- Documentation Index - Complete navigation guide
Code & Examples
- Examples Overview - Practical code examples
- Demo Script - Complete working example
- Implementation Code - Source code
Reading Paths
Path 1: Quick Start (80 minutes)
Perfect for getting started quickly.
- Overview - 10 min
- Quick Reference - 15 min
- Run demo - 20 min
- Architecture basics - 35 min
Path 2: Implementation (2 hours)
For developers ready to build.
- Quick Reference - 10 min
- Architecture - 30 min
- Storage & Indexing - 40 min
- Full Guide - 40 min
Path 3: Deep Understanding (3 hours)
For ML engineers and researchers.
- Full Guide - Theory - 60 min
- Architecture Deep Dive - 40 min
- Storage & Indexing Theory - 40 min
- Full Guide - Evaluation - 40 min
Key Features Summary
Hybrid Retrieval
- BM25 keyword matching + vector semantic search
- Reciprocal Rank Fusion (RRF)
- Query expansion (PRF, HyDE)
- α-weighted combination
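The two fusion strategies listed above can be sketched as follows. This is a minimal illustration, not the package's actual code: the function names, the min-max normalization, and the convention that α weights the vector score are assumptions, and k = 60 is a commonly used RRF constant rather than a value taken from this guide.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def _min_max(scores):
    """Rescale raw scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: ((s - lo) / span if span else 1.0) for d, s in scores.items()}


def alpha_weighted(bm25_scores, vector_scores, alpha=0.5):
    """Combine as alpha * vector + (1 - alpha) * BM25 (assumed convention)."""
    bm25 = _min_max(bm25_scores)
    vec = _min_max(vector_scores)
    # A document missing from one retriever contributes 0.0 for that part.
    return {
        d: alpha * vec.get(d, 0.0) + (1 - alpha) * bm25.get(d, 0.0)
        for d in set(bm25) | set(vec)
    }


fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
combined = alpha_weighted(
    {"d1": 12.0, "d2": 8.0, "d3": 4.0},
    {"d3": 0.9, "d1": 0.7, "d4": 0.5},
    alpha=0.5,
)
```

RRF needs only rank positions, so it avoids the score normalization step that the α-weighted variant depends on. The α-weighted variant, in turn, lets the balance between keyword and semantic evidence be tuned explicitly.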
Grounded Summarization
- Extractive (TextRank) - fast, faithful
- Abstractive (LLM) - fluent, comprehensive
- Sentence-level citations
- Faithfulness verification
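To make sentence-level citations and faithfulness verification concrete, here is a rough sketch. It uses plain token overlap as a crude lexical stand-in for whatever verifier the project actually uses. The `CitedSentence` type, the `doc1#s3` ID format, and the 0.6 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class CitedSentence:
    text: str
    source_id: str  # hypothetical ID of the source passage backing the sentence


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(sentence, sources):
    """Fraction of the sentence's tokens that appear in its cited passage."""
    sent_tokens = tokens(sentence.text)
    if not sent_tokens:
        return 0.0
    source_tokens = tokens(sources.get(sentence.source_id, ""))
    return len(sent_tokens & source_tokens) / len(sent_tokens)


def unsupported(summary, sources, threshold=0.6):
    """Return summary sentences whose cited passage does not back them up."""
    return [s for s in summary if support_score(s, sources) < threshold]


sources = {"doc1#s3": "The index is rebuilt nightly from S3."}
summary = [
    CitedSentence("The index is rebuilt nightly.", "doc1#s3"),
    CitedSentence("Queries cost nothing.", "doc1#s3"),
]
flagged = unsupported(summary, sources)  # only the second sentence is flagged
```

Lexical overlap misses paraphrases and cannot detect negation, so a production check would more likely rely on an entailment model or an LLM judge. The data shape, one citation per summary sentence, is the part this sketch is meant to show.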
Profile-Based Architecture
- Balanced: 500ms, good quality, $0.60/1K
- Latency-First: 250ms, acceptable quality, $0.35/1K
- Quality-First: 5s, excellent quality, $52/1K
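The three profiles above could be represented as plain configuration objects, with a small helper that picks the best-quality profile fitting a latency and cost budget. This is a sketch under assumptions: the `Profile` class, its field names, and `pick_profile` are illustrative rather than the package's API, and the cost field simply carries the "per 1K" figures from the list without assuming what unit the 1K counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    latency_ms: int         # latency figure from the list above
    quality: str            # qualitative label from the list above
    cost_per_1k_usd: float  # "$X/1K" figure; the unit of "1K" is not specified


PROFILES = [
    Profile("balanced", 500, "good", 0.60),
    Profile("latency_first", 250, "acceptable", 0.35),
    Profile("quality_first", 5000, "excellent", 52.0),
]

# Ordering of the qualitative labels, lowest to highest.
QUALITY_RANK = {"acceptable": 0, "good": 1, "excellent": 2}


def pick_profile(max_latency_ms, max_cost_per_1k_usd):
    """Return the highest-quality profile within both budgets, or None."""
    candidates = [
        p
        for p in PROFILES
        if p.latency_ms <= max_latency_ms
        and p.cost_per_1k_usd <= max_cost_per_1k_usd
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: QUALITY_RANK[p.quality])


choice = pick_profile(max_latency_ms=1000, max_cost_per_1k_usd=1.0)
```

With a 1000 ms and $1.00 budget, the helper selects the balanced profile, because the quality-first profile exceeds both limits.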
Implementation Status
✅ Week 0 Complete - Foundation
- Core components implemented
- Profile architecture validated
- Test fixtures created
- Demo ready
🔜 Week 1 In Progress - Document loading
- PDF, DOCX, XLSX loaders
- S3 integration
- Baseline evaluation
Quick Links
- Code: packages/rag/document_search/
- Examples: examples/document_search_demo.py
- Planning: docs/docs/features/
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@recoagent.com