RAG Chunkers
Document chunking strategies and implementations for RAG systems.
Overview
The RAG chunkers module splits documents into chunks for indexing and retrieval, with strategies suited to different content types and retrieval scenarios.
Core Features
- Multiple Chunking Strategies: Text, semantic, and hybrid chunking
- Content-Aware Chunking: Adapt to different document types
- Overlap Management: Configurable chunk overlap
- Metadata Preservation: Maintain document metadata
- Performance Optimization: Efficient chunking algorithms
Usage Examples
Basic Text Chunking
from recoagent.rag.chunkers import TextChunker
# Create text chunker
chunker = TextChunker(
    chunk_size=1000,
    chunk_overlap=200
)
# Chunk document
chunks = chunker.chunk_document(
    text="Long document text...",
    metadata={"source": "document.pdf", "page": 1}
)
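To make the roles of chunk_size and chunk_overlap concrete, here is a minimal, self-contained sketch of fixed-size chunking with overlap. It is not the TextChunker implementation: the function name split_with_overlap is hypothetical, and it assumes sizes are measured in characters, whereas the library may count tokens or respect word boundaries.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that share an overlap.

    Hypothetical helper for illustration; assumes character-based sizes.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(piece)
```

With these settings each window repeats the last chunk_overlap characters of the previous one, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.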
Advanced Chunking
from recoagent.rag.chunkers import SemanticChunker
# Create semantic chunker
semantic_chunker = SemanticChunker(
    embedding_model="text-embedding-ada-002",
    similarity_threshold=0.8
)
# Chunk with semantic awareness
semantic_chunks = semantic_chunker.chunk_document(
    text="Document with semantic structure...",
    preserve_semantics=True
)
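The following sketch shows one common way a similarity threshold can drive semantic chunking: adjacent sentences stay in the same chunk while their similarity is at or above the threshold, and a new chunk starts when it drops below. This is an assumption about the general technique, not a description of SemanticChunker internals. The names toy_embed, cosine, and semantic_split are hypothetical, and the bag-of-words vector stands in for a real embedding model so the example runs without network access.

```python
import math
import re
from collections import Counter
from typing import List


def toy_embed(sentence: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_split(text: str, similarity_threshold: float = 0.8) -> List[str]:
    """Group consecutive sentences whose adjacent similarity meets the threshold."""
    # Naive sentence splitting on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []

    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(toy_embed(prev), toy_embed(cur)) >= similarity_threshold:
            groups[-1].append(cur)   # similar enough: extend the current chunk
        else:
            groups.append([cur])     # similarity dropped: start a new chunk
    return [" ".join(group) for group in groups]


if __name__ == "__main__":
    sample = "Cats chase mice. Cats chase birds. Stocks fell sharply today."
    for chunk in semantic_split(sample, similarity_threshold=0.5):
        print(chunk)
```

A higher threshold produces more, smaller chunks because fewer adjacent sentences qualify as similar; a lower threshold merges more aggressively.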
API Reference
TextChunker Methods
chunk_document(text: str, metadata: Dict = None) -> List[Chunk]
Splits a document into text chunks.
Parameters:
- text (str): Document text
- metadata (Dict, optional): Document metadata
Returns: List of chunks
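The structure of Chunk is not documented here. As a rough illustration of metadata preservation, the sketch below copies document-level metadata onto each chunk and adds a positional index. SketchChunk, attach_metadata, and the chunk_index key are all hypothetical and may not match the library's actual Chunk type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchChunk:
    """Hypothetical stand-in for the library's Chunk type."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def attach_metadata(pieces: List[str], metadata: Optional[Dict[str, Any]] = None) -> List[SketchChunk]:
    """Wrap each text piece in a chunk carrying its own copy of the document metadata."""
    base = dict(metadata or {})
    # Build a fresh dict per chunk so later edits to one chunk do not affect the others.
    return [
        SketchChunk(text=piece, metadata={**base, "chunk_index": index})
        for index, piece in enumerate(pieces)
    ]


if __name__ == "__main__":
    for chunk in attach_metadata(["first part", "second part"], {"source": "document.pdf", "page": 1}):
        print(chunk.text, chunk.metadata)
```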
SemanticChunker Methods
chunk_document(text: str, preserve_semantics: bool = True) -> List[Chunk]
Splits a document into chunks along semantic boundaries.
Parameters:
- text (str): Document text
- preserve_semantics (bool): Preserve semantic boundaries
Returns: List of semantic chunks
See Also
- RAG Retrievers - Document retrieval
- RAG Stores - Vector stores