Understanding RAG
This tutorial will teach you the core concepts behind Retrieval-Augmented Generation (RAG), how it works, and why it's essential for building reliable AI applications. By the end, you'll understand the building blocks that make RecoAgent powerful.
What You'll Learn
- What RAG is and why it matters
- How retrieval and generation work together
- Different retrieval strategies and their trade-offs
- How evaluation helps improve RAG systems
- Common challenges and solutions
What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that combines two powerful AI capabilities:
- Retrieval - Finding relevant information from a knowledge base
- Generation - Creating natural language responses using that information
Why RAG Matters
Traditional language models have limitations:
- Knowledge cutoff - They can't access information after training
- Hallucination - They may generate plausible but incorrect information
- No source attribution - You can't verify where answers come from
RAG solves these problems by:
- Grounding answers in real, up-to-date information
- Providing sources so you can verify claims
- Reducing hallucination by constraining generation to retrieved content
How RAG Works
The RAG Pipeline
Step-by-Step Process
- Query Processing - Analyze the user's question
- Document Retrieval - Find relevant documents from knowledge base
- Context Assembly - Combine retrieved documents into context
- Answer Generation - Use LLM to generate answer based on context
- Response with Sources - Return answer with source citations
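Conceptually, these five steps fit in a few lines of code. The sketch below is illustrative only; retriever, llm, and the document fields are placeholder objects, not the actual RecoAgent API.
# Illustrative RAG pipeline sketch (placeholder objects, not the RecoAgent API)
def answer_question(question, retriever, llm):
    # 1-2. Process the query and retrieve relevant chunks
    docs = retriever.search(question, top_k=5)
    # 3. Assemble the retrieved chunks into a single context string
    context = "\n\n".join(doc.text for doc in docs)
    # 4. Generate an answer grounded in that context
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    answer = llm.generate(prompt)
    # 5. Return the answer together with its sources
    return answer, [doc.source for doc in docs]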
Example in Action
from recoagent import RecoAgent
# Initialize agent
agent = RecoAgent()
# Add knowledge base
agent.add_documents([
"RecoAgent is an enterprise RAG platform built with LangGraph.",
"It supports hybrid retrieval combining BM25 and vector search.",
"The platform includes built-in evaluation with RAGAS metrics."
])
# Ask a question
response = agent.ask("What is RecoAgent built with?")
print(f"Answer: {response.answer}")
print(f"Sources: {response.sources}")
Output:
Answer: RecoAgent is built with LangGraph, which provides agent orchestration capabilities.
Sources: ['RecoAgent is an enterprise RAG platform built with LangGraph.']
Retrieval Strategies
1. Vector Search (Semantic)
How it works:
- Convert documents and queries to vector embeddings
- Find documents with similar embeddings using cosine similarity
- Good for finding conceptually related content
# Vector search example
from recoagent.retrievers import VectorRetriever
retriever = VectorRetriever(
embedding_model="text-embedding-ada-002",
similarity_threshold=0.7
)
results = retriever.search(
query="machine learning algorithms",
top_k=5
)
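Under the hood, "similar embeddings" typically means high cosine similarity between vectors. Here is a minimal NumPy sketch with made-up three-dimensional vectors, just to show the computation:
# Cosine similarity between a query embedding and document embeddings
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.2, 0.8, 0.1])
doc_vecs = {
    "doc_1": np.array([0.25, 0.75, 0.05]),  # close in meaning -> high score
    "doc_2": np.array([0.9, 0.1, 0.4]),     # unrelated -> low score
}
scores = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))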
Pros:
- Finds semantically similar content
- Handles synonyms and paraphrasing well
- Good for conceptual questions
Cons:
- May miss exact keyword matches
- Requires good embedding models
- Can be computationally expensive
2. Keyword Search (BM25)
How it works:
- Uses traditional lexical scoring (BM25 is a refinement of TF-IDF)
- Matches exact keywords and phrases
- Good for finding specific facts and details
# BM25 search example
from recoagent.retrievers import BM25Retriever
retriever = BM25Retriever(
k1=1.2, # Term frequency normalization
b=0.75 # Length normalization
)
results = retriever.search(
query="RecoAgent LangGraph",
top_k=5
)
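The k1 and b parameters above come from the BM25 scoring formula. In its common simplified form, the score contributed by a single query term t to document d is:
score(d, t) = IDF(t) × tf(t, d) × (k1 + 1) / (tf(t, d) + k1 × (1 - b + b × |d| / avgdl))
Where:
- tf(t, d) = how many times term t appears in document d
- IDF(t) = inverse document frequency (rarer terms count for more)
- |d| = length of document d, and avgdl = average document length in the collection
- k1 controls term-frequency saturation (how quickly repeated terms stop adding score)
- b controls length normalization (how strongly long documents are penalized)
The document's total score is the sum of this quantity over all query terms.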
Pros:
- Excellent for exact matches
- Fast and efficient
- Good for factual queries
Cons:
- Misses semantic relationships
- Requires exact keyword matches
- Can't handle synonyms well
3. Hybrid Search (Best of Both)
How it works:
- Combines vector and keyword search results
- Uses Reciprocal Rank Fusion (RRF) to merge rankings
- Gets benefits of both approaches
The Mathematics Behind Hybrid Search
Reciprocal Rank Fusion (RRF) combines rankings from multiple retrieval methods using this formula:
RRF(d) = sum over all retrievers r of: 1 / (k + rank_r(d))
Where:
- d = a document
- r = a retrieval method (vector search, BM25, etc.)
- rank_r(d) = position of document d in ranking r (1st, 2nd, 3rd, etc.)
- k = smoothing constant (typically 60) that damps the influence of any single top-ranked result
Why This Works: Documents that rank highly across multiple methods get higher combined scores!
Example Calculation:
Let's say we have 3 documents and two retrieval methods:
Document | Vector Rank | BM25 Rank | RRF Calculation | RRF Score |
---|---|---|---|---|
Doc A | 1 | 3 | 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 | 0.0323 |
Doc B | 2 | 1 | 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164 | 0.0325 ✓ |
Doc C | 3 | 2 | 1/(60+3) + 1/(60+2) = 0.0159 + 0.0161 | 0.0320 |
Final Ranking: Doc B (0.0325) > Doc A (0.0323) > Doc C (0.0320)
Why Doc B wins: It ranks near the top in BOTH methods (2nd in vector search, 1st in BM25), while Doc A is 1st in one method but only 3rd in the other, so Doc B ends up with the highest combined score.
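RRF is simple enough to implement directly. The following minimal sketch reproduces the calculation above (it is illustrative, not the RecoAgent internals):
# Minimal Reciprocal Rank Fusion over two rankings
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:  # each ranking is a list of doc IDs, best first
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_ranking = ["Doc A", "Doc B", "Doc C"]
bm25_ranking = ["Doc B", "Doc C", "Doc A"]
print(rrf([vector_ranking, bm25_ranking]))
# [('Doc B', 0.0325...), ('Doc A', 0.0322...), ('Doc C', 0.0320...)]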
Weighted Linear Combination (Alternative)
Another fusion approach uses weighted scores directly:
score(d) = α × score_vector(d) + (1-α) × score_BM25(d)
Where α (between 0 and 1) controls the balance. For example:
- α = 0.7 means 70% weight to vector search, 30% to BM25
- α = 0.5 means equal weight to both
- α = 0.9 means heavy preference for semantic search
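Because raw vector and BM25 scores live on different scales, a common approach is to min-max normalize each score set before combining them. A minimal sketch of the weighted combination (illustrative, with made-up scores, not the RecoAgent internals):
# Weighted linear combination of normalized vector and BM25 scores
def normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in scores.items()}

def weighted_fusion(vector_scores, bm25_scores, alpha=0.7):
    v, b = normalize(vector_scores), normalize(bm25_scores)
    docs = set(v) | set(b)
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}

vector_scores = {"Doc A": 0.91, "Doc B": 0.85, "Doc C": 0.62}
bm25_scores = {"Doc A": 3.1, "Doc B": 7.4, "Doc C": 5.0}
print(weighted_fusion(vector_scores, bm25_scores, alpha=0.7))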
# Hybrid search example
from recoagent.retrievers import HybridRetriever
retriever = HybridRetriever(
vector_weight=0.7, # 70% vector, 30% BM25
bm25_weight=0.3,
fusion_method="rrf" # Reciprocal Rank Fusion
)
results = retriever.search(
query="How does RecoAgent handle retrieval?",
top_k=10
)
Pros:
- Combines strengths of both approaches
- More robust across different query types
- Better overall performance
- RRF is rank-based (no score normalization needed)
Cons:
- More complex to implement
- Requires tuning weights (for weighted combination)
- Higher computational cost
Document Processing
Chunking Strategies
Documents need to be broken into chunks for retrieval:
from recoagent.chunkers import DocumentChunker
# Fixed-size chunking
chunker = DocumentChunker(
chunk_size=500, # 500 tokens per chunk
chunk_overlap=50 # 50 token overlap
)
chunks = chunker.chunk_document(large_document)
Chunking Considerations:
- Size - Too small loses context, too large dilutes relevance
- Overlap - Prevents losing information at chunk boundaries
- Semantic boundaries - Respect sentence and paragraph breaks
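To make the overlap idea concrete, here is a minimal sliding-window chunker. It splits on words for simplicity; real chunkers typically count tokens and respect sentence and paragraph boundaries.
# Minimal sliding-window chunker with overlap (word-based for illustration)
def chunk_text(text, chunk_size=100, overlap=20):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks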
Metadata Preservation
# Add metadata to chunks
chunks = chunker.chunk_document(
document,
metadata={
"source": "user_manual.pdf",
"section": "installation",
"page": 15,
"author": "Technical Team"
}
)
Evaluation and Quality
Why Evaluate RAG Systems?
RAG systems can fail in subtle ways:
- Retrieval failures - Missing relevant documents
- Generation issues - Answers not grounded in retrieved content
- Context problems - Too much or too little context
RAGAS Metrics
RecoAgent uses RAGAS (Retrieval Augmented Generation Assessment) for evaluation:
from recoagent.evaluators import RAGASEvaluator
evaluator = RAGASEvaluator()
# Evaluate your system
results = evaluator.evaluate_rag_system(
agent=agent,
test_questions=[
"What is RecoAgent?",
"How does hybrid search work?",
"What evaluation metrics are used?"
]
)
print(f"Context Precision: {results.metrics.context_precision:.3f}")
print(f"Context Recall: {results.metrics.context_recall:.3f}")
print(f"Faithfulness: {results.metrics.faithfulness:.3f}")
print(f"Answer Relevancy: {results.metrics.answer_relevancy:.3f}")
Key Metrics Explained
RAGAS provides four key metrics to evaluate RAG system quality:
Metric | What It Measures | Target Score | Interpretation |
---|---|---|---|
🎯 Context Precision | Are the retrieved documents relevant? | > 0.7 | 0.9 = 90% of retrieved docs are relevant; 0.4 = only 40% are relevant (too much noise) |
📚 Context Recall | Did we retrieve all the relevant documents? | > 0.6 | 0.8 = found 80% of relevant docs; 0.3 = missed 70% of the important info |
✅ Faithfulness | Is the answer grounded in the retrieved context? | > 0.8 | 0.95 = answer stays true to the context; 0.5 = answer hallucinates or adds unsupported info |
💬 Answer Relevancy | Does the answer address the question? | > 0.7 | 0.9 = directly answers the question; 0.4 = answer is off-topic or vague |
Understanding the Trade-offs:
- High precision + low recall = too conservative (misses information). Example: only retrieves 2 perfect docs but misses 8 relevant ones.
- Low precision + high recall = too noisy (includes junk). Example: retrieves 50 docs but only 10 are actually relevant.
- ✓ Good balance: precision ~0.7-0.8, recall ~0.6-0.7. Example: retrieves 10 docs, 7-8 are relevant, and captures most of the key info.
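As a rough intuition (RAGAS actually scores these with LLM judgments against the question and ground truth, not simple counting), suppose the corpus contains 10 truly relevant chunks and the retriever returns 10 chunks, 7 of them relevant:
precision ≈ relevant retrieved / total retrieved = 7 / 10 = 0.7
recall ≈ relevant retrieved / total relevant in the corpus = 7 / 10 = 0.7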
Real-World Example:
Question: "What are the system requirements for RecoAgent?"
Scenario | Precision | Recall | Faithfulness | Relevancy | What's Wrong? |
---|---|---|---|---|---|
🔴 Poor | 0.3 | 0.4 | 0.5 | 0.6 | Retrieved mostly irrelevant docs, missed key info, answer made up details |
🟡 Okay | 0.6 | 0.5 | 0.7 | 0.7 | Some noise in results, missed some requirements, mostly accurate |
🟢 Good | 0.8 | 0.7 | 0.9 | 0.9 | Clean relevant results, found most requirements, accurate and on-topic |
Common Challenges and Solutions
Challenge 1: Poor Retrieval Quality
Symptoms:
- Irrelevant documents retrieved
- Missing important information
- Low context precision/recall
Solutions:
# Improve chunking
chunker = DocumentChunker(
chunk_size=300, # Smaller chunks
chunk_overlap=100 # More overlap
)
# Tune retrieval parameters
retriever = HybridRetriever(
vector_weight=0.8, # Adjust weights
bm25_weight=0.2,
top_k=15 # Retrieve more candidates
)
Challenge 2: Generation Not Using Retrieved Context
Symptoms:
- Answers ignore retrieved documents
- Low faithfulness scores
- Hallucinated information
Solutions:
# Improve prompting
agent = RecoAgent(
prompt_template="""
Use the following context to answer the question.
If you can't find the answer in the context, say so.
Context: {context}
Question: {question}
Answer:
"""
)
Challenge 3: Context Length Limits
Symptoms:
- Truncated context
- Missing important information
- Poor performance on complex queries
Solutions:
# Implement context compression
from recoagent.compressors import ContextCompressor
compressor = ContextCompressor(
max_tokens=4000,
compression_ratio=0.7
)
# Use summarization for long contexts
compressed_context = compressor.compress(retrieved_documents)
Advanced RAG Patterns
Multi-Step Retrieval
For complex questions requiring multiple pieces of information:
# First retrieval for the main topic
initial_results = retriever.search(query, top_k=5)
# Extract entities and concepts from the initial results
entities = extract_entities(initial_results)
# Second retrieval for each related concept, accumulating results
all_results = list(initial_results)
for entity in entities:
    all_results.extend(retriever.search(entity, top_k=3))
# Deduplicate and rerank the combined results before generation
Iterative Refinement
Improve answers through multiple rounds:
# Initial answer
response = agent.ask(question)
# Check if answer is complete
if response.confidence < 0.8:
# Refine with more specific retrieval
refined_query = f"{question} Specifically: {response.answer}"
refined_response = agent.ask(refined_query)
Best Practices
1. Document Quality
- Use clean, well-structured documents
- Remove duplicates and irrelevant content
- Maintain consistent formatting
2. Chunking Strategy
- Respect semantic boundaries (sentences, paragraphs)
- Use appropriate chunk sizes (300-800 tokens)
- Include sufficient overlap (10-20%)
3. Retrieval Configuration
- Start with hybrid search
- Tune weights based on your data
- Use reranking for better results
4. Evaluation
- Create comprehensive test datasets
- Monitor metrics continuously
- A/B test different configurations
5. User Experience
- Provide source citations
- Show confidence scores
- Handle "I don't know" gracefully
Next Steps
Now that you understand RAG fundamentals:
- 🎯 Your First Agent - Build a complete RAG agent
- 🔧 Hybrid Retrieval - Master advanced retrieval
- 📚 Browse Examples - See RAG in action
- 📖 API Reference - Deep dive into implementation
Summary
You've learned:
- ✅ What RAG is and why it's important
- ✅ How retrieval works with different strategies
- ✅ Document processing and chunking techniques
- ✅ Evaluation metrics and quality measurement
- ✅ Common challenges and their solutions
- ✅ Best practices for building RAG systems
You now have the foundational knowledge to understand how RecoAgent works and why it's designed the way it is. The next tutorial will show you how to build your first complete RAG agent!
Ready to build? Head to Your First Agent to create a complete RAG application!