
Document Search & Summarization - Multi-Page Guide

This documentation is organized into focused topics for easier navigation.

Table of Contents

Core Documentation

Overview - Introduction, quick start, and learning paths
Architecture - 80/20 reuse strategy, orchestration patterns
Storage & Indexing - S3 patterns, indexing pipelines, theory
Full Guide - Complete educational guide with IR theory, deployment

Quick References

Quick Reference Card - One-page cheat sheet
Documentation Index - Complete navigation guide

Code & Examples

Examples Overview - Practical code examples
Demo Script - Complete working example
Implementation Code - Source code

Reading Paths

Path 1: Quick Start (80 minutes)

Perfect for getting started quickly.

Overview - 10 min
Quick Reference - 15 min
Run demo - 20 min
Architecture basics - 35 min

Path 2: Implementation (2 hours)

For developers ready to build.

Quick Reference - 10 min
Architecture - 30 min
Storage & Indexing - 40 min
Full Guide - 40 min

Path 3: Deep Understanding (4 hours)

For ML engineers and researchers.

Key Features Summary

Hybrid Retrieval

BM25 keyword matching + Vector semantic search
Reciprocal Rank Fusion (RRF)
Query expansion (PRF, HyDE)
α-weighted combination
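
The two fusion options listed above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function names, the RRF constant k=60 (a commonly used default), the min-max normalization, and the convention that α weights the vector side are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def alpha_weighted(bm25_scores, vector_scores, alpha=0.5):
    """Combine min-max normalized scores as alpha * vector + (1 - alpha) * bm25.

    Which side alpha weights is an assumption of this sketch.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        # If all scores are equal, treat every document as a full match.
        return {d: ((s - lo) / span if span else 1.0) for d, s in scores.items()}

    b, v = normalize(bm25_scores), normalize(vector_scores)
    return {
        d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0)
        for d in set(b) | set(v)
    }


bm25_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "d"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

RRF needs only ranks, so it sidesteps the problem that BM25 and cosine scores live on different scales; the α-weighted form keeps score magnitudes but depends on the normalization chosen.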

Grounded Summarization

Extractive (TextRank) - fast, faithful
Abstractive (LLM) - fluent, comprehensive
Sentence-level citations
Faithfulness verification
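
To make the extractive path concrete, here is a rough TextRank-style sketch with sentence-level citations. It is illustrative only: the function names, the (source_id, text) input shape, the word-overlap similarity, and the bracketed citation format are assumptions, not the project's API.

```python
import math
import re


def textrank_summary(sentences, top_n=2, damping=0.85, iterations=50):
    """sentences: list of (source_id, text) pairs.

    Returns the top_n highest-ranked pairs, kept in their original order.
    """
    tokens = [set(re.findall(r"\w+", text.lower())) for _, text in sentences]
    n = len(sentences)

    def similarity(i, j):
        # Word overlap normalized by log sentence lengths (TextRank-style).
        overlap = len(tokens[i] & tokens[j])
        if overlap == 0 or len(tokens[i]) < 2 or len(tokens[j]) < 2:
            return 0.0
        return overlap / (math.log(len(tokens[i])) + math.log(len(tokens[j])))

    weights = [[similarity(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank by power iteration over the sentence graph.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: (-scores[i], i))[:top_n]
    return [sentences[i] for i in sorted(top)]


def cite(summary):
    """Append the source id after each extracted sentence."""
    return " ".join(f"{text} [{source_id}]" for source_id, text in summary)


corpus = [
    ("doc1#0", "Hybrid retrieval combines keyword search and vector search."),
    ("doc1#1", "Keyword search uses BM25 scoring."),
    ("doc2#0", "Vector search uses embeddings for semantic search."),
    ("doc2#1", "The weather was pleasant yesterday."),
]
summary = textrank_summary(corpus, top_n=3)
cited = cite(summary)
```

Because every output sentence is copied verbatim from a source and carries its source id, an extractive summary like this is faithful by construction; an abstractive (LLM) summary would need a separate check that each generated sentence is supported by the cited passage.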

Profile-Based Architecture

Balanced: 500 ms, good quality, $0.60/1K
Latency-First: 250 ms, acceptable quality, $0.35/1K
Quality-First: 5 s, excellent quality, $52/1K
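
One way to picture the three profiles is as a small lookup table plus a selector. The sketch below copies the latency and cost figures from the list above; the class, field names, and the pick_profile helper are hypothetical and exist only for illustration, and the unit behind "/1K" is left as stated in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    target_latency_ms: int
    quality: str
    cost_per_1k_usd: float  # "per 1K" unit as given in the profile list


# Figures taken from the profile list; names are hypothetical.
PROFILES = {
    "balanced": Profile("balanced", 500, "good", 0.60),
    "latency_first": Profile("latency_first", 250, "acceptable", 0.35),
    "quality_first": Profile("quality_first", 5000, "excellent", 52.00),
}

QUALITY_ORDER = {"acceptable": 0, "good": 1, "excellent": 2}


def pick_profile(max_latency_ms, max_cost_per_1k_usd):
    """Return the highest-quality profile that fits both budgets, or None."""
    candidates = [
        p for p in PROFILES.values()
        if p.target_latency_ms <= max_latency_ms
        and p.cost_per_1k_usd <= max_cost_per_1k_usd
    ]
    return max(candidates, key=lambda p: QUALITY_ORDER[p.quality], default=None)


choice = pick_profile(max_latency_ms=1000, max_cost_per_1k_usd=1.0)
```

With a 1000 ms and $1.00 budget the selector lands on the balanced profile; tightening latency to 300 ms falls back to latency-first, and only a budget of at least 5 s and $52 admits quality-first.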

Implementation Status

✅ Week 0 Complete - Foundation

Core components implemented
Profile architecture validated
Test fixtures created
Demo ready

🔜 Week 1 In Progress - Document loading

PDF, DOCX, XLSX loaders
S3 integration
Baseline evaluation

Quick Links

Code: packages/rag/document_search/
Examples: examples/document_search_demo.py
Planning: docs/docs/features/

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: support@recoagent.com
