Understanding RAG

This tutorial will teach you the core concepts behind Retrieval-Augmented Generation (RAG), how it works, and why it's essential for building reliable AI applications. By the end, you'll understand the building blocks that make RecoAgent powerful.

What You'll Learn

  • What RAG is and why it matters
  • How retrieval and generation work together
  • Different retrieval strategies and their trade-offs
  • How evaluation helps improve RAG systems
  • Common challenges and solutions

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines two powerful AI capabilities:

  1. Retrieval - Finding relevant information from a knowledge base
  2. Generation - Creating natural language responses using that information

Why RAG Matters

Traditional language models have limitations:

  • Knowledge cutoff - They can't access information after training
  • Hallucination - They may generate plausible but incorrect information
  • No source attribution - You can't verify where answers come from

RAG solves these problems by:

  • Grounding answers in real, up-to-date information
  • Providing sources so you can verify claims
  • Reducing hallucination by constraining generation to retrieved content

How RAG Works

The RAG Pipeline

Step-by-Step Process

  1. Query Processing - Analyze the user's question
  2. Document Retrieval - Find relevant documents from knowledge base
  3. Context Assembly - Combine retrieved documents into context
  4. Answer Generation - Use LLM to generate answer based on context
  5. Response with Sources - Return answer with source citations
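
To make the flow concrete, here is a minimal sketch of these five stages as plain Python. The names used (answer_question, retriever.search, llm.generate, doc.text) are illustrative placeholders, not RecoAgent's actual API.

# Hypothetical sketch of the five RAG stages (names are illustrative only)
def answer_question(question, retriever, llm):
    # 1. Query processing: normalize the user's question
    query = question.strip()

    # 2. Document retrieval: find relevant chunks in the knowledge base
    documents = retriever.search(query, top_k=5)

    # 3. Context assembly: combine the retrieved chunks into one context string
    context = "\n\n".join(doc.text for doc in documents)

    # 4. Answer generation: ask the LLM to answer using only that context
    prompt = f"Use the context to answer.\n\nContext: {context}\n\nQuestion: {query}\nAnswer:"
    answer = llm.generate(prompt)

    # 5. Response with sources: return the answer alongside its source documents
    return {"answer": answer, "sources": [doc.text for doc in documents]}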

Example in Action

from recoagent import RecoAgent

# Initialize agent
agent = RecoAgent()

# Add knowledge base
agent.add_documents([
    "RecoAgent is an enterprise RAG platform built with LangGraph.",
    "It supports hybrid retrieval combining BM25 and vector search.",
    "The platform includes built-in evaluation with RAGAS metrics."
])

# Ask a question
response = agent.ask("What is RecoAgent built with?")

print(f"Answer: {response.answer}")
print(f"Sources: {response.sources}")

Output:

Answer: RecoAgent is built with LangGraph, which provides agent orchestration capabilities.
Sources: ['RecoAgent is an enterprise RAG platform built with LangGraph.']

Retrieval Strategies

1. Vector Search (Semantic)

How it works:

  • Convert documents and queries to vector embeddings
  • Find documents with similar embeddings using cosine similarity
  • Good for finding conceptually related content

# Vector search example
from recoagent.retrievers import VectorRetriever

retriever = VectorRetriever(
    embedding_model="text-embedding-ada-002",
    similarity_threshold=0.7
)

results = retriever.search(
    query="machine learning algorithms",
    top_k=5
)
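
The similarity step above boils down to a cosine comparison between the query embedding and each document embedding. A minimal sketch with NumPy, independent of RecoAgent's API, using toy 3-dimensional "embeddings":

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity = dot product divided by the product of the vector norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration only; real embeddings have hundreds of dimensions
query_vec = np.array([0.2, 0.8, 0.1])
doc_vecs = {
    "doc_a": np.array([0.25, 0.75, 0.05]),
    "doc_b": np.array([0.9, 0.1, 0.4]),
}

# Rank documents by similarity to the query
ranked = sorted(doc_vecs.items(),
                key=lambda kv: cosine_similarity(query_vec, kv[1]),
                reverse=True)
print(ranked[0][0])  # doc_a points in nearly the same direction as the query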

Pros:

  • Finds semantically similar content
  • Handles synonyms and paraphrasing well
  • Good for conceptual questions

Cons:

  • May miss exact keyword matches
  • Requires good embedding models
  • Can be computationally expensive

2. Keyword Search (BM25)

How it works:

  • Scores documents with the BM25 ranking function, a refinement of classic TF-IDF weighting
  • Matches exact keywords and phrases
  • Good for finding specific facts and details

# BM25 search example
from recoagent.retrievers import BM25Retriever

retriever = BM25Retriever(
    k1=1.2,   # Term frequency saturation
    b=0.75    # Length normalization
)

results = retriever.search(
    query="RecoAgent LangGraph",
    top_k=5
)
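
To show what the k1 and b parameters actually control, here is a simplified sketch of the BM25 score for a single term in a single document (a standard Okapi-style formula, not RecoAgent's internal implementation):

import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, docs_with_term, k1=1.2, b=0.75):
    # Inverse document frequency: rarer terms contribute more to the score
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # b scales length normalization; k1 caps how much repeated terms help
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# The term appears 3 times in a 120-word doc; the corpus average is 100 words
print(bm25_term_score(tf=3, doc_len=120, avg_doc_len=100,
                      n_docs=1000, docs_with_term=50))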

Pros:

  • Excellent for exact matches
  • Fast and efficient
  • Good for factual queries

Cons:

  • Misses semantic relationships
  • Requires exact keyword matches
  • Can't handle synonyms well

3. Hybrid Search (Best of Both)

How it works:

  • Combines vector and keyword search results
  • Uses Reciprocal Rank Fusion (RRF) to merge rankings
  • Gets benefits of both approaches

Reciprocal Rank Fusion (RRF) combines rankings from multiple retrieval methods using this formula:

RRF(d) = sum over all retrievers r of: 1 / (k + rank_r(d))

Where:

  • d = a document
  • r = a retrieval method (vector search, BM25, etc.)
  • rank_r(d) = position of document d in ranking r (1st, 2nd, 3rd, etc.)
  • k = smoothing constant (typically 60) that dampens the influence of top-ranked documents

Why This Works: Documents that rank highly across multiple methods get higher combined scores!

Example Calculation:

Let's say we have 3 documents and two retrieval methods:

Document | Vector Rank | BM25 Rank | RRF Calculation                        | RRF Score
Doc A    | 1           | 3         | 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159  | 0.0323
Doc B    | 2           | 1         | 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164  | 0.0325
Doc C    | 3           | 2         | 1/(60+3) + 1/(60+2) = 0.0159 + 0.0161  | 0.0320

Final Ranking: Doc B (0.0325) > Doc A (0.0323) > Doc C (0.0320)

Why Doc B wins: Even though it's not #1 in either individual method, it ranks highly in BOTH, giving it the highest combined score!
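
The same calculation expressed as code. This is a generic RRF sketch over ranked lists, not RecoAgent's internal implementation:

def reciprocal_rank_fusion(rankings, k=60):
    # rankings: list of ranked document-ID lists, best first
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_ranking = ["Doc A", "Doc B", "Doc C"]
bm25_ranking = ["Doc B", "Doc C", "Doc A"]

for doc_id, score in reciprocal_rank_fusion([vector_ranking, bm25_ranking]):
    print(f"{doc_id}: {score:.4f}")
# Doc B: 0.0325, Doc A: 0.0323, Doc C: 0.0320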

Weighted Linear Combination (Alternative)

Another fusion approach uses weighted scores directly:

score(d) = α × score_vector(d) + (1-α) × score_BM25(d)

Where α (between 0 and 1) controls the balance. For example:

  • α = 0.7 means 70% weight to vector search, 30% to BM25
  • α = 0.5 means equal weight to both
  • α = 0.9 means heavy preference for semantic search
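
A minimal sketch of this weighted combination. Because raw BM25 scores and cosine similarities live on different scales, they are min-max normalized before mixing; the function below is a generic illustration, not RecoAgent's code:

def min_max_normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {doc: (s - lo) / (hi - lo) if hi > lo else 0.0 for doc, s in scores.items()}

def weighted_fusion(vector_scores, bm25_scores, alpha=0.7):
    v, b = min_max_normalize(vector_scores), min_max_normalize(bm25_scores)
    docs = set(v) | set(b)
    # alpha weights vector search; (1 - alpha) weights BM25
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}

combined = weighted_fusion(
    {"Doc A": 0.91, "Doc B": 0.85, "Doc C": 0.60},   # cosine similarities
    {"Doc A": 4.2, "Doc B": 11.8, "Doc C": 7.5},     # raw BM25 scores
)
print(max(combined, key=combined.get))  # Doc B ranks highest in this toy example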

# Hybrid search example
from recoagent.retrievers import HybridRetriever

retriever = HybridRetriever(
    vector_weight=0.7,   # 70% vector, 30% BM25
    bm25_weight=0.3,
    fusion_method="rrf"  # Reciprocal Rank Fusion
)

results = retriever.search(
    query="How does RecoAgent handle retrieval?",
    top_k=10
)

Pros:

  • Combines strengths of both approaches
  • More robust across different query types
  • Better overall performance
  • RRF is rank-based (no score normalization needed)

Cons:

  • More complex to implement
  • Requires tuning weights (for weighted combination)
  • Higher computational cost

Document Processing

Chunking Strategies

Documents need to be broken into chunks for retrieval:

from recoagent.chunkers import DocumentChunker

# Fixed-size chunking
chunker = DocumentChunker(
    chunk_size=500,    # 500 tokens per chunk
    chunk_overlap=50   # 50-token overlap
)

chunks = chunker.chunk_document(large_document)

Chunking Considerations:

  • Size - Too small loses context, too large dilutes relevance
  • Overlap - Prevents losing information at chunk boundaries
  • Semantic boundaries - Respect sentence and paragraph breaks
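
A minimal sketch of fixed-size chunking with overlap, using naive whitespace tokenization for brevity (production chunkers count model tokens and respect sentence boundaries; this is illustrative, not RecoAgent's DocumentChunker):

def chunk_text(text, chunk_size=500, chunk_overlap=50):
    tokens = text.split()  # crude whitespace tokenization for illustration
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

chunks = chunk_text("word " * 1200, chunk_size=500, chunk_overlap=50)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 chunks of 500, 500, 300 tokens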

Metadata Preservation

# Add metadata to chunks
chunks = chunker.chunk_document(
    document,
    metadata={
        "source": "user_manual.pdf",
        "section": "installation",
        "page": 15,
        "author": "Technical Team"
    }
)

Evaluation and Quality

Why Evaluate RAG Systems?

RAG systems can fail in subtle ways:

  • Retrieval failures - Missing relevant documents
  • Generation issues - Answers not grounded in retrieved content
  • Context problems - Too much or too little context

RAGAS Metrics

RecoAgent uses RAGAS (Retrieval Augmented Generation Assessment) for evaluation:

from recoagent.evaluators import RAGASEvaluator

evaluator = RAGASEvaluator()

# Evaluate your system
results = evaluator.evaluate_rag_system(
    agent=agent,
    test_questions=[
        "What is RecoAgent?",
        "How does hybrid search work?",
        "What evaluation metrics are used?"
    ]
)

print(f"Context Precision: {results.metrics.context_precision:.3f}")
print(f"Context Recall: {results.metrics.context_recall:.3f}")
print(f"Faithfulness: {results.metrics.faithfulness:.3f}")
print(f"Answer Relevancy: {results.metrics.answer_relevancy:.3f}")

Key Metrics Explained

RAGAS provides four key metrics to evaluate RAG system quality:

Metric | What It Measures | Target Score | Interpretation
🎯 Context Precision | Are the retrieved documents relevant? | > 0.7 | 0.9 = 90% of retrieved docs are relevant; 0.4 = only 40% are relevant (too much noise)
📚 Context Recall | Did we retrieve all relevant documents? | > 0.6 | 0.8 = found 80% of relevant docs; 0.3 = missed 70% of important info
Faithfulness | Is the answer grounded in the retrieved context? | > 0.8 | 0.95 = answer stays true to the context; 0.5 = answer hallucinates or adds unsupported details
💬 Answer Relevancy | Does the answer address the question? | > 0.7 | 0.9 = directly answers the question; 0.4 = answer is off-topic or vague

Understanding the Trade-offs:

High Precision + Low Recall = Too conservative (misses information)
Example: Only retrieves 2 perfect docs but misses 8 relevant ones

Low Precision + High Recall = Too noisy (includes junk)
Example: Retrieves 50 docs but only 10 are actually relevant

✓ Good Balance: Precision ~0.7-0.8, Recall ~0.6-0.7
Example: Retrieves 10 docs, 7-8 are relevant, captures most key info
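
For intuition, here is a toy precision/recall calculation over document IDs. RAGAS computes these with an LLM judge over statements rather than simple set overlap, so treat this only as an illustration of the definitions:

def precision_recall(retrieved_ids, relevant_ids):
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 10 docs retrieved, 7 of them relevant; 10 relevant docs exist in total
retrieved = [f"doc_{i}" for i in range(10)]
relevant = [f"doc_{i}" for i in range(7)] + ["doc_90", "doc_91", "doc_92"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.1f}, recall={r:.1f}")  # precision=0.7, recall=0.7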

Real-World Example:

Question: "What are the system requirements for RecoAgent?"

Scenario | Precision | Recall | Faithfulness | Relevancy | What's Wrong?
🔴 Poor  | 0.3 | 0.4 | 0.5 | 0.6 | Retrieved mostly irrelevant docs, missed key info, answer made up details
🟡 Okay  | 0.6 | 0.5 | 0.7 | 0.7 | Some noise in results, missed some requirements, mostly accurate
🟢 Good  | 0.8 | 0.7 | 0.9 | 0.9 | Clean, relevant results; found most requirements; accurate and on-topic

Common Challenges and Solutions

Challenge 1: Poor Retrieval Quality

Symptoms:

  • Irrelevant documents retrieved
  • Missing important information
  • Low context precision/recall

Solutions:

# Improve chunking
chunker = DocumentChunker(
    chunk_size=300,      # Smaller chunks
    chunk_overlap=100    # More overlap
)

# Tune retrieval parameters
retriever = HybridRetriever(
    vector_weight=0.8,   # Adjust weights
    bm25_weight=0.2,
    top_k=15             # Retrieve more candidates
)

Challenge 2: Generation Not Using Retrieved Context

Symptoms:

  • Answers ignore retrieved documents
  • Low faithfulness scores
  • Hallucinated information

Solutions:

# Improve prompting
agent = RecoAgent(
    prompt_template="""
Use the following context to answer the question.
If you can't find the answer in the context, say so.

Context: {context}
Question: {question}
Answer:
"""
)

Challenge 3: Context Length Limits

Symptoms:

  • Truncated context
  • Missing important information
  • Poor performance on complex queries

Solutions:

# Implement context compression
from recoagent.compressors import ContextCompressor

compressor = ContextCompressor(
    max_tokens=4000,
    compression_ratio=0.7
)

# Use summarization for long contexts
compressed_context = compressor.compress(retrieved_documents)

Advanced RAG Patterns

Multi-Step Retrieval

For complex questions requiring multiple pieces of information:

# First retrieval for main topic
initial_results = retriever.search(query, top_k=5)

# Extract entities and concepts
entities = extract_entities(initial_results)

# Second retrieval for related concepts
for entity in entities:
    related_results = retriever.search(entity, top_k=3)
    # Combine results...

Iterative Refinement

Improve answers through multiple rounds:

# Initial answer
response = agent.ask(question)

# Check if answer is complete
if response.confidence < 0.8:
    # Refine with more specific retrieval
    refined_query = f"{question} Specifically: {response.answer}"
    refined_response = agent.ask(refined_query)

Best Practices

1. Document Quality

  • Use clean, well-structured documents
  • Remove duplicates and irrelevant content
  • Maintain consistent formatting

2. Chunking Strategy

  • Respect semantic boundaries (sentences, paragraphs)
  • Use appropriate chunk sizes (300-800 tokens)
  • Include sufficient overlap (10-20%)

3. Retrieval Configuration

  • Start with hybrid search
  • Tune weights based on your data
  • Use reranking for better results
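
Reranking is not covered elsewhere in this tutorial, so here is a generic sketch using the sentence-transformers CrossEncoder rather than a RecoAgent API. A cross-encoder rescores the (query, document) pairs returned by the first-stage retriever and keeps only the best few:

from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, document) pair jointly, which is slower
# than comparing precomputed embeddings but usually more accurate
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does RecoAgent handle retrieval?"
candidates = [  # in practice, these would be the top hits from hybrid search
    "RecoAgent combines BM25 and vector search with RRF fusion.",
    "The platform ships with RAGAS-based evaluation.",
    "LangGraph handles agent orchestration.",
]

scores = reranker.predict([(query, text) for text in candidates])
reranked = [text for _, text in sorted(zip(scores, candidates), reverse=True)]
top_docs = reranked[:2]  # keep only the best candidates for the LLM context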

4. Evaluation

  • Create comprehensive test datasets
  • Monitor metrics continuously
  • A/B test different configurations

5. User Experience

  • Provide source citations
  • Show confidence scores
  • Handle "I don't know" gracefully

Next Steps

Now that you understand RAG fundamentals:

  1. 🎯 Your First Agent - Build a complete RAG agent
  2. 🔧 Hybrid Retrieval - Master advanced retrieval
  3. 📚 Browse Examples - See RAG in action
  4. 📖 API Reference - Deep dive into implementation

Summary

You've learned:

  • What RAG is and why it's important
  • How retrieval works with different strategies
  • Document processing and chunking techniques
  • Evaluation metrics and quality measurement
  • Common challenges and their solutions
  • Best practices for building RAG systems

You now have the foundational knowledge to understand how RecoAgent works and why it's designed the way it is. The next tutorial will show you how to build your first complete RAG agent!


Ready to build? Head to Your First Agent to create a complete RAG application!