Understanding RAG

This tutorial will teach you the core concepts behind Retrieval-Augmented Generation (RAG), how it works, and why it's essential for building reliable AI applications. By the end, you'll understand the building blocks that make RecoAgent powerful.

What You'll Learn

  • What RAG is and why it matters
  • How retrieval and generation work together
  • Different retrieval strategies and their trade-offs
  • How evaluation helps improve RAG systems
  • Common challenges and solutions

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines two powerful AI capabilities:

  1. Retrieval - Finding relevant information from a knowledge base
  2. Generation - Creating natural language responses using that information

Why RAG Matters

Traditional language models have limitations:

  • Knowledge cutoff - They can't access information after training
  • Hallucination - They may generate plausible but incorrect information
  • No source attribution - You can't verify where answers come from

RAG solves these problems by:

  • Grounding answers in real, up-to-date information
  • Providing sources so you can verify claims
  • Reducing hallucination by constraining generation to retrieved content

How RAG Works

The RAG Pipeline

Step-by-Step Process

  1. Query Processing - Analyze the user's question
  2. Document Retrieval - Find relevant documents from knowledge base
  3. Context Assembly - Combine retrieved documents into context
  4. Answer Generation - Use LLM to generate answer based on context
  5. Response with Sources - Return answer with source citations
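
To make the flow concrete, here is a minimal sketch of these five stages as plain Python. The names used (answer_question, retriever.search, llm.generate, doc.text) are illustrative placeholders, not RecoAgent's actual API.

# Hypothetical sketch of the five RAG stages (names are illustrative only)
def answer_question(question, retriever, llm):
    # 1. Query processing: normalize the user's question
    query = question.strip()

    # 2. Document retrieval: find relevant chunks in the knowledge base
    documents = retriever.search(query, top_k=5)

    # 3. Context assembly: combine the retrieved chunks into one context string
    context = "\n\n".join(doc.text for doc in documents)

    # 4. Answer generation: ask the LLM to answer using only that context
    prompt = f"Use the context to answer.\n\nContext: {context}\n\nQuestion: {query}\nAnswer:"
    answer = llm.generate(prompt)

    # 5. Response with sources: return the answer alongside its source documents
    return {"answer": answer, "sources": [doc.text for doc in documents]}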

Example in Action

from recoagent import RecoAgent

# Initialize agent
agent = RecoAgent()

# Add knowledge base
agent.add_documents([
    "RecoAgent is an enterprise RAG platform built with LangGraph.",
    "It supports hybrid retrieval combining BM25 and vector search.",
    "The platform includes built-in evaluation with RAGAS metrics."
])

# Ask a question
response = agent.ask("What is RecoAgent built with?")

print(f"Answer: {response.answer}")
print(f"Sources: {response.sources}")

Output:

Answer: RecoAgent is built with LangGraph, which provides agent orchestration capabilities.
Sources: ['RecoAgent is an enterprise RAG platform built with LangGraph.']

Retrieval Strategies

1. Vector Search (Semantic)

How it works:

  • Convert documents and queries to vector embeddings
  • Find documents with similar embeddings using cosine similarity
  • Good for finding conceptually related content

# Vector search example
from recoagent.retrievers import VectorRetriever

retriever = VectorRetriever(
    embedding_model="text-embedding-ada-002",
    similarity_threshold=0.7
)

results = retriever.search(
    query="machine learning algorithms",
    top_k=5
)
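
The similarity step above boils down to a cosine comparison between the query embedding and each document embedding. A minimal sketch with NumPy, independent of RecoAgent's API, using toy 3-dimensional "embeddings":

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity = dot product divided by the product of the vector norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration only; real embeddings have hundreds of dimensions
query_vec = np.array([0.2, 0.8, 0.1])
doc_vecs = {
    "doc_a": np.array([0.25, 0.75, 0.05]),
    "doc_b": np.array([0.9, 0.1, 0.4]),
}

# Rank documents by similarity to the query
ranked = sorted(doc_vecs.items(),
                key=lambda kv: cosine_similarity(query_vec, kv[1]),
                reverse=True)
print(ranked[0][0])  # doc_a points in nearly the same direction as the query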

Pros:

  • Finds semantically similar content
  • Handles synonyms and paraphrasing well
  • Good for conceptual questions

Cons:

  • May miss exact keyword matches
  • Requires good embedding models
  • Can be computationally expensive

2. Keyword Search (BM25)

How it works:

  • Scores documents with the BM25 ranking function, a refinement of classic TF-IDF weighting
  • Matches exact keywords and phrases
  • Good for finding specific facts and details

# BM25 search example
from recoagent.retrievers import BM25Retriever

retriever = BM25Retriever(
    k1=1.2,   # Term frequency saturation
    b=0.75    # Length normalization
)

results = retriever.search(
    query="RecoAgent LangGraph",
    top_k=5
)
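
To show what the k1 and b parameters actually control, here is a simplified sketch of the BM25 score for a single term in a single document (a standard Okapi-style formula, not RecoAgent's internal implementation):

import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, docs_with_term, k1=1.2, b=0.75):
    # Inverse document frequency: rarer terms contribute more to the score
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # b scales length normalization; k1 caps how much repeated terms help
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# The term appears 3 times in a 120-word doc; the corpus average is 100 words
print(bm25_term_score(tf=3, doc_len=120, avg_doc_len=100,
                      n_docs=1000, docs_with_term=50))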

Pros:

  • Excellent for exact matches
  • Fast and efficient
  • Good for factual queries

Cons:

  • Misses semantic relationships
  • Requires exact keyword matches
  • Can't handle synonyms well

3. Hybrid Search (Best of Both)

How it works:

  • Combines vector and keyword search results
  • Uses Reciprocal Rank Fusion (RRF) to merge rankings
  • Gets benefits of both approaches

Reciprocal Rank Fusion (RRF) combines rankings from multiple retrieval methods using this formula:

RRF(d) = sum over all retrievers r of: 1 / (k + rank_r(d))

Where:

  • d = a document
  • r = a retrieval method (vector search, BM25, etc.)
  • rank_r(d) = position of document d in ranking r (1st, 2nd, 3rd, etc.)
  • k = smoothing constant (typically 60) that dampens the influence of top-ranked documents

Why This Works: Documents that rank highly across multiple methods get higher combined scores!

Example Calculation:

Let's say we have 3 documents and two retrieval methods:

Document | Vector Rank | BM25 Rank | RRF Calculation                        | RRF Score
Doc A    | 1           | 3         | 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159  | 0.0323
Doc B    | 2           | 1         | 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164  | 0.0325
Doc C    | 3           | 2         | 1/(60+3) + 1/(60+2) = 0.0159 + 0.0161  | 0.0320

Final Ranking: Doc B (0.0325) > Doc A (0.0323) > Doc C (0.0320)

Why Doc B wins: Even though it's not #1 in either individual method, it ranks highly in BOTH, giving it the highest combined score!
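
The same calculation expressed as code. This is a generic RRF sketch over ranked lists, not RecoAgent's internal implementation:

def reciprocal_rank_fusion(rankings, k=60):
    # rankings: list of ranked document-ID lists, best first
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_ranking = ["Doc A", "Doc B", "Doc C"]
bm25_ranking = ["Doc B", "Doc C", "Doc A"]

for doc_id, score in reciprocal_rank_fusion([vector_ranking, bm25_ranking]):
    print(f"{doc_id}: {score:.4f}")
# Doc B: 0.0325, Doc A: 0.0323, Doc C: 0.0320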

Weighted Linear Combination (Alternative)

Another fusion approach uses weighted scores directly:

score(d) = α × score_vector(d) + (1-α) × score_BM25(d)

Where α (between 0 and 1) controls the balance. For example:

  • α = 0.7 means 70% weight to vector search, 30% to BM25
  • α = 0.5 means equal weight to both
  • α = 0.9 means heavy preference for semantic search
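
A minimal sketch of this weighted combination. Because raw BM25 scores and cosine similarities live on different scales, they are min-max normalized before mixing; the function below is a generic illustration, not RecoAgent's code:

def min_max_normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {doc: (s - lo) / (hi - lo) if hi > lo else 0.0 for doc, s in scores.items()}

def weighted_fusion(vector_scores, bm25_scores, alpha=0.7):
    v, b = min_max_normalize(vector_scores), min_max_normalize(bm25_scores)
    docs = set(v) | set(b)
    # alpha weights vector search; (1 - alpha) weights BM25
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}

combined = weighted_fusion(
    {"Doc A": 0.91, "Doc B": 0.85, "Doc C": 0.60},   # cosine similarities
    {"Doc A": 4.2, "Doc B": 11.8, "Doc C": 7.5},     # raw BM25 scores
)
print(max(combined, key=combined.get))  # Doc B ranks highest in this toy example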

# Hybrid search example
from recoagent.retrievers import HybridRetriever

retriever = HybridRetriever(
    vector_weight=0.7,   # 70% vector, 30% BM25
    bm25_weight=0.3,
    fusion_method="rrf"  # Reciprocal Rank Fusion
)

results = retriever.search(
    query="How does RecoAgent handle retrieval?",
    top_k=10
)

Pros:

  • Combines strengths of both approaches
  • More robust across different query types
  • Better overall performance
  • RRF is rank-based (no score normalization needed)

Cons:

  • More complex to implement
  • Requires tuning weights (for weighted combination)
  • Higher computational cost

Document Processing

Chunking Strategies

Documents need to be broken into chunks for retrieval:

from recoagent.chunkers import DocumentChunker

# Fixed-size chunking
chunker = DocumentChunker(
    chunk_size=500,    # 500 tokens per chunk
    chunk_overlap=50   # 50-token overlap
)

chunks = chunker.chunk_document(large_document)

Chunking Considerations:

  • Size - Too small loses context, too large dilutes relevance
  • Overlap - Prevents losing information at chunk boundaries
  • Semantic boundaries - Respect sentence and paragraph breaks
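
A minimal sketch of fixed-size chunking with overlap, using naive whitespace tokenization for brevity (production chunkers count model tokens and respect sentence boundaries; this is illustrative, not RecoAgent's DocumentChunker):

def chunk_text(text, chunk_size=500, chunk_overlap=50):
    tokens = text.split()  # crude whitespace tokenization for illustration
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

chunks = chunk_text("word " * 1200, chunk_size=500, chunk_overlap=50)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 chunks of 500, 500, 300 tokens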

Metadata Preservation

# Add metadata to chunks
chunks = chunker.chunk_document(
    document,
    metadata={
        "source": "user_manual.pdf",
        "section": "installation",
        "page": 15,
        "author": "Technical Team"
    }
)

Evaluation and Quality

Why Evaluate RAG Systems?

RAG systems can fail in subtle ways:

  • Retrieval failures - Missing relevant documents
  • Generation issues - Answers not grounded in retrieved content
  • Context problems - Too much or too little context

RAGAS Metrics

RecoAgent uses RAGAS (Retrieval Augmented Generation Assessment) for evaluation:

from recoagent.evaluators import RAGASEvaluator

evaluator = RAGASEvaluator()

# Evaluate your system
results = evaluator.evaluate_rag_system(
    agent=agent,
    test_questions=[
        "What is RecoAgent?",
        "How does hybrid search work?",
        "What evaluation metrics are used?"
    ]
)

print(f"Context Precision: {results.metrics.context_precision:.3f}")
print(f"Context Recall: {results.metrics.context_recall:.3f}")
print(f"Faithfulness: {results.metrics.faithfulness:.3f}")
print(f"Answer Relevancy: {results.metrics.answer_relevancy:.3f}")

Key Metrics Explained

RAGAS provides four key metrics to evaluate RAG system quality:

Metric | What It Measures | Target Score | Interpretation
🎯 Context Precision | Are the retrieved documents relevant? | > 0.7 | 0.9 = 90% of retrieved docs are relevant; 0.4 = only 40% are relevant (too much noise)
📚 Context Recall | Did we retrieve all relevant documents? | > 0.6 | 0.8 = found 80% of relevant docs; 0.3 = missed 70% of important info
Faithfulness | Is the answer grounded in the retrieved context? | > 0.8 | 0.95 = answer stays true to the context; 0.5 = answer hallucinates or adds unsupported details
💬 Answer Relevancy | Does the answer address the question? | > 0.7 | 0.9 = directly answers the question; 0.4 = answer is off-topic or vague

Understanding the Trade-offs:

High Precision + Low Recall = Too conservative (misses information)
Example: Only retrieves 2 perfect docs but misses 8 relevant ones

Low Precision + High Recall = Too noisy (includes junk)
Example: Retrieves 50 docs but only 10 are actually relevant

✓ Good Balance: Precision ~0.7-0.8, Recall ~0.6-0.7
Example: Retrieves 10 docs, 7-8 are relevant, captures most key info
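
For intuition, here is a toy precision/recall calculation over document IDs. RAGAS computes these with an LLM judge over statements rather than simple set overlap, so treat this only as an illustration of the definitions:

def precision_recall(retrieved_ids, relevant_ids):
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 10 docs retrieved, 7 of them relevant; 10 relevant docs exist in total
retrieved = [f"doc_{i}" for i in range(10)]
relevant = [f"doc_{i}" for i in range(7)] + ["doc_90", "doc_91", "doc_92"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.1f}, recall={r:.1f}")  # precision=0.7, recall=0.7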

Real-World Example:

Question: "What are the system requirements for RecoAgent?"

Scenario | Precision | Recall | Faithfulness | Relevancy | What's Wrong?
🔴 Poor  | 0.3 | 0.4 | 0.5 | 0.6 | Retrieved mostly irrelevant docs, missed key info, answer made up details
🟡 Okay  | 0.6 | 0.5 | 0.7 | 0.7 | Some noise in results, missed some requirements, mostly accurate
🟢 Good  | 0.8 | 0.7 | 0.9 | 0.9 | Clean, relevant results; found most requirements; accurate and on-topic

Common Challenges and Solutions

Challenge 1: Poor Retrieval Quality

Symptoms:

  • Irrelevant documents retrieved
  • Missing important information
  • Low context precision/recall

Solutions:

# Improve chunking
chunker = DocumentChunker(
    chunk_size=300,      # Smaller chunks
    chunk_overlap=100    # More overlap
)

# Tune retrieval parameters
retriever = HybridRetriever(
    vector_weight=0.8,   # Adjust weights
    bm25_weight=0.2,
    top_k=15             # Retrieve more candidates
)

Challenge 2: Generation Not Using Retrieved Context

Symptoms:

  • Answers ignore retrieved documents
  • Low faithfulness scores
  • Hallucinated information

Solutions:

# Improve prompting
agent = RecoAgent(
    prompt_template="""
Use the following context to answer the question.
If you can't find the answer in the context, say so.

Context: {context}
Question: {question}
Answer:
"""
)

Challenge 3: Context Length Limits

Symptoms:

  • Truncated context
  • Missing important information
  • Poor performance on complex queries

Solutions:

# Implement context compression
from recoagent.compressors import ContextCompressor

compressor = ContextCompressor(
    max_tokens=4000,
    compression_ratio=0.7
)

# Use summarization for long contexts
compressed_context = compressor.compress(retrieved_documents)

Advanced RAG Patterns

Multi-Step Retrieval

For complex questions requiring multiple pieces of information:

# First retrieval for main topic
initial_results = retriever.search(query, top_k=5)

# Extract entities and concepts
entities = extract_entities(initial_results)

# Second retrieval for related concepts
for entity in entities:
    related_results = retriever.search(entity, top_k=3)
    # Combine results...

Iterative Refinement

Improve answers through multiple rounds:

# Initial answer
response = agent.ask(question)

# Check if answer is complete
if response.confidence < 0.8:
    # Refine with more specific retrieval
    refined_query = f"{question} Specifically: {response.answer}"
    refined_response = agent.ask(refined_query)

Best Practices

1. Document Quality

  • Use clean, well-structured documents
  • Remove duplicates and irrelevant content
  • Maintain consistent formatting

2. Chunking Strategy

  • Respect semantic boundaries (sentences, paragraphs)
  • Use appropriate chunk sizes (300-800 tokens)
  • Include sufficient overlap (10-20%)

3. Retrieval Configuration

  • Start with hybrid search
  • Tune weights based on your data
  • Use reranking for better results
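
Reranking is not covered elsewhere in this tutorial, so here is a generic sketch using the sentence-transformers CrossEncoder rather than a RecoAgent API. A cross-encoder rescores the (query, document) pairs returned by the first-stage retriever and keeps only the best few:

from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, document) pair jointly, which is slower
# than comparing precomputed embeddings but usually more accurate
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does RecoAgent handle retrieval?"
candidates = [  # in practice, these would be the top hits from hybrid search
    "RecoAgent combines BM25 and vector search with RRF fusion.",
    "The platform ships with RAGAS-based evaluation.",
    "LangGraph handles agent orchestration.",
]

scores = reranker.predict([(query, text) for text in candidates])
reranked = [text for _, text in sorted(zip(scores, candidates), reverse=True)]
top_docs = reranked[:2]  # keep only the best candidates for the LLM context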

4. Evaluation

  • Create comprehensive test datasets
  • Monitor metrics continuously
  • A/B test different configurations

5. User Experience

  • Provide source citations
  • Show confidence scores
  • Handle "I don't know" gracefully

Next Steps

Now that you understand RAG fundamentals:

  1. 🎯 Your First Agent - Build a complete RAG agent
  2. 🔧 Hybrid Retrieval - Master advanced retrieval
  3. 📚 Browse Examples - See RAG in action
  4. 📖 API Reference - Deep dive into implementation

Summary

You've learned:

  • What RAG is and why it's important
  • How retrieval works with different strategies
  • Document processing and chunking techniques
  • Evaluation metrics and quality measurement
  • Common challenges and their solutions
  • Best practices for building RAG systems

You now have the foundational knowledge to understand how RecoAgent works and why it's designed the way it is. The next tutorial will show you how to build your first complete RAG agent!


Ready to build? Head to Your First Agent to create a complete RAG application!