
RAG Document Search

Document search pipeline with advanced search capabilities and summarization features.

Overview

The RAG document search system indexes documents, searches them with full-text, semantic, or hybrid queries, and can summarize the matching documents.

Core Features

  • Document Indexing: Efficient document indexing and storage
  • Advanced Search: Full-text, semantic, and hybrid search
  • Summarization: Automatic document summarization
  • Caching: Smart caching for improved performance
  • Analytics: Search analytics and insights
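The feature list mentions hybrid search but does not say how full-text and semantic rankings are combined. One widely used option is reciprocal rank fusion (RRF); the sketch below illustrates that general technique and is not necessarily what DocumentSearchEngine does. The function name, the constant k=60, and the sample rankings are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Hypothetical helper: each document earns 1 / (k + rank) from every
    list it appears in, and the contributions are summed.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up rankings standing in for a full-text pass and a semantic pass.
full_text_ranking = ["doc1", "doc2", "doc3"]
semantic_ranking = ["doc2", "doc3", "doc1"]

fused = reciprocal_rank_fusion([full_text_ranking, semantic_ranking])
print(fused)
```

RRF needs only rank positions, so it avoids having to normalize full-text and embedding scores onto a common scale.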

Usage Examples

from recoagent.rag.document_search import DocumentSearchEngine

# Create document search engine
search_engine = DocumentSearchEngine()

# Index documents
search_engine.index_documents([
    {"id": "doc1", "content": "Document content...", "metadata": {"title": "Doc 1"}},
    {"id": "doc2", "content": "Another document...", "metadata": {"title": "Doc 2"}},
])

# Search documents
results = search_engine.search("query text", limit=10)

Advanced Search with Summarization

# Search with summarization
search_results = search_engine.search_with_summarization(
    query="machine learning algorithms",
    summarize_results=True,
    summary_length=200,
)

# Get summarized results
for result in search_results:
    print(f"Title: {result.title}")
    print(f"Summary: {result.summary}")
    print(f"Relevance: {result.relevance_score}")

API Reference

DocumentSearchEngine Methods

index_documents(documents: List[Dict]) -> None

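Adds documents to the search index.

Parameters:

  • documents (List[Dict]): List of documents to index. In the usage example above, each document is a dict with "id", "content", and "metadata" keys.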
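Because index_documents takes plain dicts, it can help to check a batch before indexing it. The sketch below is a hypothetical pre-flight check, not part of the recoagent API. It assumes "id" and "content" are required, based only on the usage example; the source does not state which keys are mandatory.

```python
from typing import Any, Dict, List

# Assumption: these keys are required. The source only shows them in an example.
REQUIRED_KEYS = ("id", "content")


def validate_documents(documents: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means the batch looks fine."""
    problems: List[str] = []
    seen_ids = set()
    for position, doc in enumerate(documents):
        # Report every required key that is absent from this document.
        for key in REQUIRED_KEYS:
            if key not in doc:
                problems.append(f"document {position}: missing key '{key}'")
        # Flag ids that already appeared earlier in the same batch.
        doc_id = doc.get("id")
        if doc_id is not None:
            if doc_id in seen_ids:
                problems.append(f"document {position}: duplicate id '{doc_id}'")
            seen_ids.add(doc_id)
    return problems


batch = [
    {"id": "doc1", "content": "Document content...", "metadata": {"title": "Doc 1"}},
    {"id": "doc1", "metadata": {"title": "Doc 2"}},
]
for problem in validate_documents(batch):
    print(problem)
```

Returning a list of messages instead of raising on the first error lets a caller report every issue in a batch at once.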

search(query: str, limit: int = 10) -> List[SearchResult]

Searches the indexed documents.

Parameters:

  • query (str): Search query
  • limit (int): Maximum number of results to return (default: 10)

Returns: List of SearchResult objects
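To show how the two documented methods fit together without the library installed, here is a toy in-memory stand-in with the same method names and signatures. ToySearchEngine and ToySearchResult are hypothetical, and the term-overlap scoring is a placeholder, not how DocumentSearchEngine ranks results. The title and relevance_score fields mirror the attributes printed in the summarization example; that plain search() results carry them is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToySearchResult:
    # Hypothetical result type; field names borrowed from the summarization example.
    doc_id: str
    title: str
    relevance_score: float


class ToySearchEngine:
    """In-memory stand-in that mimics index_documents() and search()."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def index_documents(self, documents: List[Dict[str, Any]]) -> None:
        # Later documents with the same id overwrite earlier ones.
        for doc in documents:
            self._docs[doc["id"]] = doc

    def search(self, query: str, limit: int = 10) -> List[ToySearchResult]:
        query_terms = set(query.lower().split())
        results: List[ToySearchResult] = []
        for doc in self._docs.values():
            doc_terms = set(doc["content"].lower().split())
            overlap = len(query_terms & doc_terms)
            if overlap:  # a non-zero overlap implies query_terms is non-empty
                results.append(ToySearchResult(
                    doc_id=doc["id"],
                    title=doc.get("metadata", {}).get("title", ""),
                    # Fraction of query terms found in the document.
                    relevance_score=overlap / len(query_terms),
                ))
        # Best score first; ties broken by id for stable output.
        results.sort(key=lambda r: (-r.relevance_score, r.doc_id))
        return results[:limit]


engine = ToySearchEngine()
engine.index_documents([
    {"id": "doc1", "content": "machine learning algorithms", "metadata": {"title": "Doc 1"}},
    {"id": "doc2", "content": "cooking with algorithms", "metadata": {"title": "Doc 2"}},
])
top = engine.search("learning algorithms", limit=1)
print(top[0].title, top[0].relevance_score)
```

The limit argument truncates the ranked list after sorting, so a small limit returns the best matches, not the first ones indexed.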
