Simple RAG
This example demonstrates how to build a simple retrieval-augmented generation (RAG) system that loads documents from text files and answers questions about their content.
Overview
This example shows:
- Loading documents from text files
- Basic question answering
- Source citation and confidence scoring
- Error handling and validation
Prerequisites
- Python 3.8+
- RecoAgent installed (pip install recoagent)
- OpenAI API key
- Sample text files for the knowledge base
Code Implementation
```python
import glob
import os
import textwrap

from dotenv import load_dotenv
from recoagent import RecoAgent

# Load environment variables (OPENAI_API_KEY) from the .env file
load_dotenv()


class SimpleRAG:
    def __init__(self):
        """Initialize the RAG system."""
        self.agent = RecoAgent(
            llm_provider="openai",
            llm_model="gpt-3.5-turbo",
            embedding_model="text-embedding-ada-002",
            chunk_size=500,
            chunk_overlap=50
        )

    def load_documents_from_files(self, directory: str):
        """Load all text files from a directory."""
        documents = []
        file_paths = []

        # Find all .txt files in the directory (sorted for a stable load order)
        txt_files = sorted(glob.glob(os.path.join(directory, "*.txt")))

        for file_path in txt_files:
            try:
                with open(file_path, 'r', encoding='utf-8') as file:
                    content = file.read().strip()
                    if content:  # empty files are skipped
                        documents.append(content)
                        file_paths.append(file_path)
                        print(f"✅ Loaded: {os.path.basename(file_path)}")
            except Exception as e:
                print(f"❌ Error loading {file_path}: {e}")

        # Add documents to the agent
        if documents:
            self.agent.add_documents(documents)
            print(f"\n📚 Total documents loaded: {len(documents)}")
            return file_paths

        print("❌ No documents found or loaded")
        return []

    def ask_question(self, question: str):
        """Ask a question and get a detailed response."""
        print(f"\n🤔 Question: {question}")

        try:
            response = self.agent.ask(question)

            print(f"💡 Answer: {response.answer}")
            print(f"🎯 Confidence: {response.confidence:.2f}")
            print(f"📚 Sources: {len(response.sources)} documents")

            # Show source details
            if response.sources:
                print("\n📖 Source Documents:")
                for i, source in enumerate(response.sources, 1):
                    # Truncate long sources to a 100-character preview
                    source_preview = source[:100] + "..." if len(source) > 100 else source
                    print(f" {i}. {source_preview}")

            return response

        except Exception as e:
            print(f"❌ Error processing question: {e}")
            return None

    def batch_questions(self, questions: list):
        """Ask multiple questions and collect the successful results."""
        print(f"\n📋 Processing {len(questions)} questions...")

        results = []
        for i, question in enumerate(questions, 1):
            print(f"\n--- Question {i}/{len(questions)} ---")
            response = self.ask_question(question)

            # Questions that raised an error return None and are left out
            if response is not None:
                results.append({
                    'question': question,
                    'answer': response.answer,
                    'confidence': response.confidence
                })

        return results


def create_sample_files():
    """Create sample text files for demonstration."""
    sample_dir = "sample_docs"
    os.makedirs(sample_dir, exist_ok=True)

    # Sample documents
    docs = {
        "company_info.txt": """
            RecoHut is a leading provider of enterprise AI solutions.
            Founded in 2023, we specialize in RAG (Retrieval-Augmented Generation) systems.
            Our flagship product, RecoAgent, helps organizations build intelligent knowledge management systems.
            We serve clients across healthcare, finance, legal, and technology sectors.
        """,
        "product_features.txt": """
            RecoAgent Key Features:
            - Hybrid retrieval combining BM25 and vector search
            - Built-in evaluation with RAGAS metrics
            - Safety guardrails using NVIDIA NeMo Guardrails
            - Multiple vector store support (OpenSearch, Azure AI Search, Vertex AI)
            - LangSmith integration for observability
            - FastAPI-based REST API for easy integration
        """,
        "technical_details.txt": """
            Technical Architecture:
            RecoAgent is built using Python and follows a modular architecture.
            Core components include:
            - packages.agents: Agent orchestration using LangGraph
            - packages.rag: Retrieval, reranking, and evaluation
            - packages.observability: Monitoring, tracing, and metrics
            The system supports both local and cloud deployments.
            Minimum requirements: Python 3.8+, 4GB RAM, 10GB storage.
        """
    }

    # Write sample files (dedent removes the source-code indentation)
    for filename, content in docs.items():
        filepath = os.path.join(sample_dir, filename)
        with open(filepath, 'w', encoding='utf-8') as file:
            file.write(textwrap.dedent(content).strip())

    print(f"✅ Created sample documents in {sample_dir}/")
    return sample_dir


def main():
    """Main function to run the simple RAG example."""
    print("🚀 Simple RAG Example")
    print("=" * 40)

    # Create (or overwrite) the sample files
    sample_dir = create_sample_files()

    # Initialize RAG system
    rag = SimpleRAG()

    # Load documents
    print(f"\n📁 Loading documents from {sample_dir}/...")
    file_paths = rag.load_documents_from_files(sample_dir)

    if not file_paths:
        print("❌ No documents loaded. Exiting.")
        return

    # Interactive Q&A session
    print("\n" + "=" * 40)
    print("💬 Interactive Q&A Session")
    print("=" * 40)
    print("Type 'quit' to exit, 'batch' for batch questions")

    while True:
        try:
            user_input = input("\n❓ Ask a question: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl+D / Ctrl+C end the session cleanly
            break

        if user_input.lower() == 'quit':
            break
        elif user_input.lower() == 'batch':
            # Run a fixed set of sample questions
            sample_questions = [
                "What is RecoHut?",
                "What are the key features of RecoAgent?",
                "What is the technical architecture?",
                "What are the minimum requirements?"
            ]
            rag.batch_questions(sample_questions)
        elif user_input:
            rag.ask_question(user_input)

    print("\n👋 Thanks for using Simple RAG!")


if __name__ == "__main__":
    main()
```
Running the Example
1. Setup
```bash
# Create project directory
mkdir simple-rag-example
cd simple-rag-example

# Install dependencies
pip install recoagent python-dotenv

# Create .env file
echo "OPENAI_API_KEY=your_api_key_here" > .env
```
2. Run the Example
Save the code above as simple_rag.py in the project directory, then run:
```bash
python simple_rag.py
```
3. Expected Output
The transcript below is abbreviated (source previews inside the batch run and questions 3 and 4 are omitted), and the answers and confidence scores you get will differ from run to run.
```
🚀 Simple RAG Example
========================================
✅ Created sample documents in sample_docs/

📁 Loading documents from sample_docs/...
✅ Loaded: company_info.txt
✅ Loaded: product_features.txt
✅ Loaded: technical_details.txt

📚 Total documents loaded: 3

========================================
💬 Interactive Q&A Session
========================================
Type 'quit' to exit, 'batch' for batch questions

❓ Ask a question: What is RecoHut?

🤔 Question: What is RecoHut?
💡 Answer: RecoHut is a leading provider of enterprise AI solutions, founded in 2023. We specialize in RAG (Retrieval-Augmented Generation) systems and serve clients across healthcare, finance, legal, and technology sectors.
🎯 Confidence: 0.92
📚 Sources: 1 documents

📖 Source Documents:
 1. RecoHut is a leading provider of enterprise AI solutions. Founded in 2023, we specialize in RAG...

❓ Ask a question: batch

📋 Processing 4 questions...

--- Question 1/4 ---

🤔 Question: What is RecoHut?
💡 Answer: RecoHut is a leading provider of enterprise AI solutions, founded in 2023...
🎯 Confidence: 0.92
📚 Sources: 1 documents

--- Question 2/4 ---

🤔 Question: What are the key features of RecoAgent?
💡 Answer: RecoAgent's key features include hybrid retrieval combining BM25 and vector search, built-in evaluation with RAGAS metrics, safety guardrails, multiple vector store support, LangSmith integration, and FastAPI-based REST API.
🎯 Confidence: 0.89
📚 Sources: 1 documents
...

❓ Ask a question: quit

👋 Thanks for using Simple RAG!
```
Understanding the Code
Key Components
- Document Loading: Loads all non-empty .txt files from a directory
- Error Handling: Reports and skips files that cannot be read, and catches errors raised while answering a question
- Batch Processing: Can process multiple questions at once
- Source Citation: Shows which documents were used for answers
- Confidence Scoring: Provides confidence levels for responses
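SimpleRAG only relies on two agent methods, add_documents() and ask(), and reads three attributes from the response: answer, confidence, and sources. That makes it possible to exercise the loading, citation, and batch logic offline with a stand-in agent. The sketch below is hypothetical and not part of the recoagent package: FakeAgent, FakeResponse, and the term-overlap scoring are illustrative assumptions. It does no embedding, no hybrid retrieval, and no LLM call; its "confidence" is simply the fraction of question words found in the best-matching document.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FakeResponse:
    """Hypothetical response carrying only the attributes SimpleRAG reads."""
    answer: str
    confidence: float
    sources: List[str] = field(default_factory=list)


def _terms(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens used for a crude term-overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class FakeAgent:
    """Hypothetical offline stand-in for RecoAgent (not a recoagent API)."""

    def __init__(self, top_k: int = 1):
        self.top_k = top_k
        self.documents: List[str] = []

    def add_documents(self, documents: List[str]) -> None:
        self.documents.extend(documents)

    def ask(self, question: str) -> FakeResponse:
        query = _terms(question)
        # Score = fraction of the question's terms that appear in each document
        scored = [
            (len(query & _terms(doc)) / len(query), doc)
            for doc in self.documents
        ] if query else []
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [(score, doc) for score, doc in scored[: self.top_k] if score > 0]
        if not top:
            return FakeResponse(answer="I don't know.", confidence=0.0)
        best_score, best_doc = top[0]
        # The "answer" is just the first line of the best match; no LLM involved
        answer = best_doc.strip().splitlines()[0]
        return FakeResponse(answer=answer, confidence=best_score,
                            sources=[doc for _, doc in top])
```

To try it, temporarily replace the RecoAgent(...) call in SimpleRAG.__init__ with self.agent = FakeAgent(); the rest of the script, including the interactive loop and batch mode, runs unchanged without an API key.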
Customization Options
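```python
from recoagent import RecoAgent
from simple_rag import SimpleRAG  # the script above, saved as simple_rag.py

# Swap in the new agent before calling load_documents_from_files(),
# so the documents are indexed by the agent you actually query.

# Custom chunking strategy
rag = SimpleRAG()
rag.agent = RecoAgent(
    chunk_size=300,               # Smaller chunks
    chunk_overlap=100,            # More overlap
    chunking_strategy="semantic"  # Semantic chunking
)

# Alternatively: a different LLM model
rag.agent = RecoAgent(
    llm_model="gpt-4",  # Use GPT-4
    temperature=0.1,    # Lower temperature for consistency
    max_tokens=500      # Limit response length
)
```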
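To see what chunk_size and chunk_overlap trade off, here is a minimal sliding-window chunker. It is an illustrative assumption, not RecoAgent's implementation: it counts characters (RecoAgent may count tokens), and it does not model the "semantic" strategy. Each chunk starts chunk_size - chunk_overlap characters after the previous one, so consecutive chunks share chunk_overlap characters.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults used in SimpleRAG (500/50), a 1,000-character document yields windows starting at 0, 450, and 900, so three chunks. Dropping to 300/100 gives a step of 200 and roughly twice as many, smaller chunks, which tends to make retrieval more precise at the cost of more embeddings.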
Troubleshooting
Common Issues
No documents loaded:
```bash
# Check file permissions and encoding
# (the loader only reads *.txt files, as UTF-8, and skips empty ones)
ls -la sample_docs/
file sample_docs/*.txt
```
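The loader prints an error for unreadable files but silently skips empty ones, and its *.txt pattern is case-sensitive on platforms such as Linux, so a file named NOTES.TXT is never picked up there. A small diagnostic that mirrors those rules can pinpoint which case you are hitting. diagnose_txt_files is a hypothetical helper written for this page, using only the standard library.

```python
from pathlib import Path
from typing import Dict


def diagnose_txt_files(directory: str) -> Dict[str, str]:
    """Report, per *.txt file, whether SimpleRAG's loader would accept it (hypothetical helper)."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"Not a directory: {directory}")

    report = {}
    for path in sorted(folder.glob("*.txt")):
        try:
            text = path.read_text(encoding="utf-8")  # same encoding the loader uses
        except UnicodeDecodeError:
            report[path.name] = "not valid UTF-8"
        except OSError as exc:
            report[path.name] = f"unreadable: {exc}"
        else:
            report[path.name] = "ok" if text.strip() else "empty (skipped by loader)"
    return report
```

An empty dictionary means no file matched *.txt at all; check the directory path and the file extensions.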
API key issues:
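```bash
# Verify the key is set in your shell
# (the script itself reads it from .env via load_dotenv())
echo $OPENAI_API_KEY

# Test the key with a real API call (openai>=1.0; loads .env from the current directory first)
python -c "from dotenv import load_dotenv; load_dotenv(); from openai import OpenAI; OpenAI().models.list(); print('API key valid')"
```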
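A common failure is that the .env file still contains the placeholder written during setup, or that load_dotenv() could not find the file. A fail-fast check at startup turns that into a clear message before RecoAgent is constructed. check_openai_key is a hypothetical helper, not part of RecoAgent; it only checks that a key is present and not the placeholder, not that OpenAI accepts it.

```python
import os
from typing import Mapping, Optional

# The placeholder value written by the setup step's echo command
PLACEHOLDER_KEY = "your_api_key_here"


def check_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY or raise a descriptive error (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set: check .env and that load_dotenv() ran")
    if key == PLACEHOLDER_KEY:
        raise RuntimeError("OPENAI_API_KEY still holds the placeholder from the setup step")
    return key
```

In simple_rag.py, calling check_openai_key() right after load_dotenv() makes a missing or placeholder key fail immediately instead of on the first question.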
Low confidence scores:
- Ensure documents contain relevant information
- Try rephrasing questions
- Check document quality and formatting
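batch_questions() returns a list of dictionaries with question, answer, and confidence keys, which makes it easy to spot which questions need rephrasing or better source documents. The summary below is a sketch; summarize_batch is hypothetical, and the 0.7 threshold is an arbitrary illustrative cut-off rather than a RecoAgent default. Note that questions that raised an error are already missing from the list, because batch_questions() only keeps successful responses.

```python
from typing import Any, Dict, List


def summarize_batch(results: List[Dict[str, Any]], threshold: float = 0.7) -> Dict[str, Any]:
    """Summarize the list returned by SimpleRAG.batch_questions() (hypothetical helper)."""
    if not results:
        return {"count": 0, "mean_confidence": 0.0, "low_confidence": []}

    mean = sum(r["confidence"] for r in results) / len(results)
    # Questions scoring below the threshold are candidates for rephrasing
    low = [r["question"] for r in results if r["confidence"] < threshold]
    return {"count": len(results), "mean_confidence": mean, "low_confidence": low}
```

For example, results = rag.batch_questions([...]) followed by print(summarize_batch(results)) lists the weak questions in one place instead of scanning the console output.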
Next Steps
This example provides a foundation for more advanced RAG systems. You can extend it by:
- Adding more file formats (PDF, Word, HTML)
- Implementing document preprocessing (cleaning, formatting)
- Adding metadata tracking (file names, timestamps)
- Creating a web interface for easier interaction
- Adding evaluation metrics to measure performance
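As a starting point for the metadata and file-format items, the loader can keep each document's file name and modification time alongside its text and accept a configurable set of extensions. The sketch below is hypothetical (LoadedDocument and load_with_metadata are not RecoAgent APIs) and handles only plain-text formats; PDF or Word files would need a separate parser library, which is not shown. Whether RecoAgent's add_documents() accepts metadata is not documented here, so the sketch keeps the metadata on your side.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, List


@dataclass
class LoadedDocument:
    """Hypothetical record pairing document text with simple metadata."""
    text: str
    source: str          # file name, usable for citations
    modified: datetime   # last modification time (UTC)


def load_with_metadata(directory: str,
                       suffixes: Iterable[str] = (".txt", ".md")) -> List[LoadedDocument]:
    """Load plain-text files with metadata; extension matching is case-insensitive."""
    wanted = {s.lower() for s in suffixes}
    docs = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        text = path.read_text(encoding="utf-8").strip()
        if not text:  # skip empty files, like SimpleRAG's loader
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        docs.append(LoadedDocument(text=text, source=path.name, modified=modified))
    return docs
```

One way to wire it in is rag.agent.add_documents([d.text for d in docs]) while keeping docs around for display. Unlike SimpleRAG's loader, this sketch lets a UnicodeDecodeError propagate instead of skipping the file.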
Related Examples
- Minimal Usage - Even simpler setup
- Vector Search - Focus on retrieval
- Basic Agent - Add agent capabilities
Ready for more? Check out the Vector Search Example to learn about different retrieval strategies!