LangGraph Agent Orchestration
Learn how to build sophisticated, stateful agents using RecoAgent's LangGraph integration. This tutorial covers the complete agent workflow, from retrieval to answer generation, including error handling and escalation.
What You'll Learn
- How to create a stateful agent with LangGraph
- Building multi-step workflows with conditional logic
- Implementing error handling and retry mechanisms
- Using tools and escalation in agent workflows
- Monitoring agent execution with observability
Prerequisites
- Basic understanding of LangGraph concepts
- Python 3.8+ installed
- RecoAgent installed:
pip install recoagent
LangGraph Architecture Overview
Step 1: Understanding the Agent State
RecoAgent uses a comprehensive state machine that tracks the entire conversation flow:
from packages.agents import AgentState, AgentConfig
from typing import Dict, Any
# The AgentState tracks everything during execution
state_example = {
"messages": [], # Chat history
"query": "How do I deploy to production?",
"retrieved_docs": [], # Documents from retrieval
"reranked_docs": [], # Reranked results
"plan": "I need to find deployment documentation", # Agent's plan
"action": "retrieve_docs", # Current action
"answer": None, # Final answer
"error": None, # Any errors
"metadata": {}, # Additional context
"step_count": 0, # Execution steps
"max_steps": 5, # Maximum allowed steps
"cost_tracker": {}, # Cost monitoring
"latency_tracker": {} # Performance tracking
}
Visual State Transitions
Let's see how state evolves through a real query:
Key Insight: Each node reads from and writes to the shared state, creating a traceable execution history!
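To make the shared-state idea concrete, here is a minimal plain-Python simulation. It is not the real LangGraph or RecoAgent API (the node and field names are illustrative); it only shows how nodes that read from and write to one state dict accumulate a traceable history.

```python
# Minimal sketch: each "node" is a function that reads from and writes to
# one mutable state dict, so the dict accumulates a traceable history.
# Plain Python, not the actual LangGraph API.

def retrieve(state):
    state["retrieved_docs"] = ["doc-a", "doc-b"]
    return state

def generate(state):
    state["answer"] = f"Answer based on {len(state['retrieved_docs'])} docs"
    return state

def run_pipeline(query):
    state = {"query": query, "step_count": 0, "history": []}
    for node in (retrieve, generate):
        state["step_count"] += 1
        state = node(state)
        state["history"].append(node.__name__)  # traceable execution history
    return state

final = run_pipeline("How do I deploy to production?")
print(final["history"])      # ['retrieve', 'generate']
print(final["step_count"])   # 2
```

In the real graph the loop is replaced by LangGraph's conditional edges, but the contract is the same: every node receives the full state and returns an updated version of it.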
Step 2: Creating Your First Agent
Let's build a simple RAG agent that can retrieve information and answer questions:
import os
from packages.agents import RAGAgentGraph, AgentConfig, ToolRegistry
from packages.rag import HybridRetriever, VectorRetriever, BM25Retriever
from packages.rag.stores import OpenSearchStore
# Configure the agent
config = AgentConfig(
model_name="gpt-4",
temperature=0.1,
max_tokens=1000,
max_steps=5,
cost_limit=0.10,
safety_enabled=True
)
# Set up vector store
vector_store = OpenSearchStore(
endpoint="http://localhost:9200",
index_name="knowledge_base"
)
# Create retrievers
vector_retriever = VectorRetriever(vector_store=vector_store)
bm25_retriever = BM25Retriever(vector_store=vector_store)
hybrid_retriever = HybridRetriever(
vector_retriever=vector_retriever,
bm25_retriever=bm25_retriever,
alpha=0.7 # 70% vector, 30% BM25
)
# Create tool registry
tool_registry = ToolRegistry()
tool_registry.register_retrieval_tool(hybrid_retriever)
# Create the agent
agent = RAGAgentGraph(
config=config,
tool_registry=tool_registry
)
print("Agent created successfully!")
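The `alpha=0.7` setting above blends the two retrievers' scores. Here is a sketch of one common fusion scheme (min-max normalize each retriever's scores, then take a weighted sum); the exact formula inside `HybridRetriever` may differ.

```python
# Alpha-weighted hybrid score fusion (a common scheme; RecoAgent's actual
# implementation may differ). Scores from each retriever are min-max
# normalized, then combined as:
#   combined = alpha * vector_score + (1 - alpha) * bm25_score

def minmax(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def hybrid_scores(vector, bm25, alpha=0.7):
    v, b = minmax(vector), minmax(bm25)
    docs = set(v) | set(b)  # union: a doc may appear in only one retriever
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}

vector = {"doc1": 0.9, "doc2": 0.4, "doc3": 0.1}   # cosine similarities
bm25 = {"doc1": 2.0, "doc2": 8.0, "doc4": 5.0}      # raw BM25 scores
ranked = sorted(hybrid_scores(vector, bm25).items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # doc1
```

Higher `alpha` favors semantic (vector) matches; lower `alpha` favors exact keyword (BM25) matches.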
Step 3: Running the Agent
Now let's run the agent with a sample query:
# Run the agent
query = "What are the best practices for deploying RecoAgent to production?"
result = await agent.run(query, user_id="tutorial_user")
print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print(f"Cost: ${result['cost']:.4f}")
print(f"Latency: {result['latency_ms']:.0f}ms")
print(f"Steps taken: {result['metadata']['step_count']}")
Step 4: Understanding the Workflow
The agent follows a state machine flow: retrieve → rerank → plan → use tools (when needed) → synthesize → generate the answer, with escalation available whenever an error or limit is hit.
Real-World Scenario: Multi-Step Research Query
Let's trace a complex query through the entire agent lifecycle:
Query: "What are the security best practices for deploying RecoAgent with sensitive healthcare data?"
Execution Timeline
| Step | Node | Action | State Changes | Time | Cost |
|---|---|---|---|---|---|
| 1 | Start | Receive query | query set, step_count=1 | 0ms | $0 |
| 2 | Retrieval | Search "RecoAgent security healthcare" | retrieved_docs=[10] | 45ms | $0.002 |
| 3 | Reranking | Score docs by relevance | reranked_docs=[5] | 120ms | $0.003 |
| 4 | Planning | Analyze: Need HIPAA + deployment info | plan="Multi-topic query" | 15ms | $0 |
| 5 | Tool Use | Call web_search for latest HIPAA | external_data added | 850ms | $0.005 |
| 6 | Synthesis | Combine docs + external data | context assembled | 30ms | $0 |
| 7 | Generation | Generate comprehensive answer | answer set | 1200ms | $0.015 |
| 8 | Complete | Return with sources | final_response ready | 10ms | $0 |
| **Total** | | | | 2.27s | $0.025 |
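The totals row follows directly from the per-step numbers; summing the timeline reproduces it.

```python
# Arithmetic check of the execution timeline: summing the per-step times
# and costs reproduces the totals row.

steps = [
    ("Start",      0,    0.000),
    ("Retrieval",  45,   0.002),
    ("Reranking",  120,  0.003),
    ("Planning",   15,   0.000),
    ("Tool Use",   850,  0.005),
    ("Synthesis",  30,   0.000),
    ("Generation", 1200, 0.015),
    ("Complete",   10,   0.000),
]
total_ms = sum(ms for _, ms, _ in steps)
total_cost = sum(c for _, _, c in steps)
print(f"{total_ms / 1000:.2f}s, ${total_cost:.3f}")  # 2.27s, $0.025
```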
State Inspector Output
# Step 2: After Retrieval
{
"query": "security best practices healthcare...",
"retrieved_docs": [
{"title": "RecoAgent Security Guide", "score": 0.89},
{"title": "Healthcare Data Protection", "score": 0.85},
{"title": "HIPAA Compliance Checklist", "score": 0.82},
# ... 7 more docs
],
"step_count": 2,
"cost_so_far": 0.002
}
# Step 5: After Tool Use
{
# ... previous state ...
"reranked_docs": [...],  # top 5 docs
"plan": "Need external HIPAA info - using web search",
"external_data": {
"source": "web_search",
"results": ["Latest HIPAA updates 2024..."]
},
"step_count": 5,
"cost_so_far": 0.010
}
# Step 8: Final State
{
# ... all previous state ...
"answer": "To deploy RecoAgent with healthcare data...",
"sources": [
"RecoAgent Security Guide (internal)",
"HIPAA Compliance 2024 (web)",
"Healthcare Data Protection (internal)"
],
"metadata": {
"total_steps": 8,
"tools_used": ["retrieval", "reranking", "web_search"],
"confidence": 0.92
},
"cost_so_far": 0.025,
"latency_ms": 2270
}
What Made This Complex:
- ✅ Multi-topic query (security + healthcare + deployment)
- ✅ Required external data (latest HIPAA rules)
- ✅ Multiple retrieval passes
- ✅ Synthesis of internal + external knowledge
Step 5: Adding Custom Tools
Let's add a custom tool for web search:
from packages.agents.tools import BaseTool
from typing import Dict, Any
class WebSearchTool(BaseTool):
name = "web_search"
description = "Search the web for current information"
def _run(self, query: str, **kwargs) -> Dict[str, Any]:
# Simulate web search
results = [
{
"title": f"Search result for: {query}",
"url": "https://example.com",
"snippet": f"Relevant information about {query}"
}
]
return {"results": results}
# Register the custom tool
tool_registry.register_tool(WebSearchTool())
# Update the agent with the new tool
agent = RAGAgentGraph(
config=config,
tool_registry=tool_registry
)
Step 6: Error Handling and Escalation
The agent includes built-in error handling and escalation:
# Configure escalation policies
from packages.agents.policies import EscalationPolicy
escalation_policy = EscalationPolicy(
max_cost=0.05, # Escalate if cost exceeds $0.05
max_steps=3, # Escalate after 3 steps
error_threshold=2, # Escalate after 2 errors
sensitive_topics=["financial", "medical", "legal"]
)
# Update agent config
config.escalation_policy = escalation_policy
# Run with a complex query that might trigger escalation
complex_query = """
I need help with a complex financial calculation involving
multiple currencies and tax implications for international business.
"""
result = await agent.run(complex_query, user_id="tutorial_user")
if result.get("escalated"):
print("Query was escalated to human agent")
print(f"Reason: {result['metadata']['escalation_reason']}")
else:
print("Agent handled the query successfully")
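To show how a policy like the one above could be evaluated, here is a hedged sketch. The field names mirror the `EscalationPolicy` parameters in this tutorial; the function name and return convention are hypothetical, and the real RecoAgent implementation may differ.

```python
# Sketch of escalation-policy evaluation. Returns an escalation reason
# string, or None if the agent may continue. (Illustrative only.)

def check_escalation(state, policy):
    if state.get("cost_so_far", 0.0) > policy["max_cost"]:
        return "cost_limit_exceeded"
    if state.get("step_count", 0) > policy["max_steps"]:
        return "step_limit_exceeded"
    if state.get("error_count", 0) >= policy["error_threshold"]:
        return "too_many_errors"
    query = state.get("query", "").lower()
    for topic in policy["sensitive_topics"]:
        if topic in query:
            return f"sensitive_topic:{topic}"
    return None

policy = {"max_cost": 0.05, "max_steps": 3, "error_threshold": 2,
          "sensitive_topics": ["financial", "medical", "legal"]}
state = {"query": "complex financial calculation...", "cost_so_far": 0.02,
         "step_count": 1, "error_count": 0}
print(check_escalation(state, policy))  # sensitive_topic:financial
```

Note that the sensitive-topic check fires even when cost and step limits are fine, which is why the financial query above would be routed to a human.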
Performance Comparison: Traditional vs LangGraph Agent
| Aspect | Traditional RAG | LangGraph Agent | Improvement |
|---|---|---|---|
| Simple Query | 0.8s, $0.008 | 1.2s, $0.012 | 🟡 Slightly slower |
| Complex Query | 1.5s, $0.015 Often incomplete | 2.3s, $0.025 Comprehensive | 🟢 Better quality |
| Multi-hop Research | Not supported | 3.5s, $0.040 Fully supported | 🟢 New capability |
| Error Recovery | Fails immediately | Retries, then escalates | 🟢 More reliable |
| Tool Integration | Manual coordination | Automatic orchestration | 🟢 Much easier |
| Observability | Limited | Full trace of decisions | 🟢 Better debugging |
| Cost Control | No built-in limits | Configurable limits | 🟢 Production-safe |
When to Use Each:
Use Traditional RAG when:
✓ Queries are simple and direct
✓ Speed is critical (< 1s response time)
✓ Cost needs to be minimal
✓ Single-pass retrieval is sufficient
Use LangGraph Agent when:
✓ Queries need multi-step reasoning
✓ Multiple tools/data sources required
✓ Quality matters more than speed
✓ Need error handling and escalation
✓ Want full observability
Common Agent Patterns
Pattern 1: Simple RAG Agent
# Best for: FAQ, documentation search
# Complexity: ⭐ Low
# Cost: $ Low
agent = RAGAgentGraph(
config=AgentConfig(max_steps=2), # Just retrieve and answer
tools=[retrieval_tool]
)
Flow: Query → Retrieve → Answer
Use Case: "What is RecoAgent?" - Direct factual questions
Pattern 2: Research Agent (Multi-Hop)
# Best for: Complex research, analysis
# Complexity: ⭐⭐⭐ High
# Cost: $$$ High
agent = RAGAgentGraph(
config=AgentConfig(max_steps=10),
tools=[retrieval_tool, web_search_tool, calculator_tool]
)
Flow: Query → Retrieve → Analyze → Search More → Synthesize → Answer
Use Case: "Compare deployment options considering cost, security, and scalability"
Pattern 3: Conversational Agent with Memory
# Best for: Chat, ongoing conversations
# Complexity: ⭐⭐ Medium
# Cost: $$ Medium
agent = RAGAgentGraph(
config=AgentConfig(
max_steps=5,
enable_memory=True,
memory_window=10 # Remember last 10 messages
),
tools=[retrieval_tool]
)
Flow: Query + History → Contextualize → Retrieve → Answer
Use Case: Follow-up questions: "Tell me more about that" or "What about X?"
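The `memory_window=10` setting is a sliding window over conversation history. A minimal sketch of the idea (the helper name is hypothetical; RecoAgent's memory handling may differ):

```python
# Sliding-window conversation memory: keep only the most recent N messages
# as context for the next turn. (Illustrative helper, not RecoAgent's API.)

def windowed_history(messages, window=10):
    return messages[-window:]

history = [f"msg-{i}" for i in range(25)]
context = windowed_history(history, window=10)
print(len(context), context[0], context[-1])  # 10 msg-15 msg-24
```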
Pattern 4: Tool-Heavy Agent
# Best for: Actions, integrations
# Complexity: ⭐⭐⭐ High
# Cost: $$$ High
agent = RAGAgentGraph(
config=AgentConfig(max_steps=8),
tools=[
retrieval_tool,
database_query_tool,
api_call_tool,
file_reader_tool
]
)
Flow: Query → Plan → Use Multiple Tools → Aggregate → Answer
Use Case: "Pull deployment stats from database and compare with documentation"
Error Handling Decision Tree
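The core of the decision tree is "retry transient errors up to a limit, then escalate." A hedged sketch of that logic (function name and defaults are illustrative, not RecoAgent's actual internals):

```python
# Retry-then-escalate error handling: transient (recoverable) errors are
# retried up to a limit; anything else goes to a human. (Illustrative only.)

def handle_error(error_count, max_retries=2, recoverable=True):
    if not recoverable:
        return "escalate"   # e.g. auth failure, policy violation
    if error_count <= max_retries:
        return "retry"      # e.g. timeout, rate limit
    return "escalate"       # retries exhausted

print(handle_error(1))                     # retry
print(handle_error(3))                     # escalate
print(handle_error(1, recoverable=False))  # escalate
```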
Step 7: Monitoring and Observability
Add comprehensive monitoring to your agent:
from packages.observability import LangSmithClient, LangSmithConfig
from packages.agents.callbacks import AgentCallbackHandler
# Set up LangSmith integration
langsmith_config = LangSmithConfig(
api_key=os.getenv("LANGSMITH_API_KEY"),
project="recoagent-tutorial"
)
langsmith_client = LangSmithClient(langsmith_config)
# Create callback handler
callback_handler = AgentCallbackHandler(langsmith_client)
# Add callbacks to agent
agent = RAGAgentGraph(
config=config,
tool_registry=tool_registry,
callback_handlers=[callback_handler]
)
# Run agent with monitoring
result = await agent.run(query, user_id="tutorial_user")
# View traces in LangSmith dashboard
print(f"Trace ID: {callback_handler.current_trace_id}")
Step 8: Advanced Configuration
Configure the agent for production use:
# Production-ready configuration
production_config = AgentConfig(
model_name="gpt-4",
temperature=0.0, # More deterministic for production
max_tokens=2000,
max_steps=10,
cost_limit=0.50, # Higher limit for production
timeout_seconds=60,
safety_enabled=True,
enable_escalation=True,
enable_web_search=True,
enable_citations=True
)
# Add safety middleware
from packages.agents.middleware import GuardrailsMiddleware
guardrails = GuardrailsMiddleware(
enable_pii_detection=True,
enable_content_filtering=True,
enable_rate_limiting=True,
max_requests_per_minute=60
)
production_agent = RAGAgentGraph(
config=production_config,
tool_registry=tool_registry,
callback_handlers=[callback_handler]
)
# Add middleware
production_agent.guardrails = guardrails
Cost & Performance Optimization
Where Costs Come From
Total Agent Cost Breakdown (typical complex query):
LLM API Calls $0.015 (60%) ████████████
├─ Planning $0.003
├─ Answer Generation $0.010
└─ Tool Decisions $0.002
Embedding API $0.005 (20%) ████
└─ Vector Search
Reranking $0.003 (12%) ██
└─ Cross-Encoder
Tools/External $0.002 (8%) █
└─ Web Search, etc.
TOTAL: $0.025 (100%)
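The percentages in the breakdown follow directly from the dollar amounts: each component's share is its cost divided by the total.

```python
# Reproducing the cost-breakdown percentages from the dollar amounts above.

costs = {"llm": 0.015, "embedding": 0.005, "reranking": 0.003, "tools": 0.002}
total = sum(costs.values())
shares = {k: round(100 * v / total) for k, v in costs.items()}
print(f"${total:.3f}", shares)
```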
Optimization Strategies
| Strategy | Impact | Trade-off | Recommended For |
|---|---|---|---|
| Use GPT-3.5 for simple queries | -70% cost | Slightly lower quality | FAQ, simple lookups |
| Cache embeddings | -20% cost | Storage needed | Repeated queries |
| Limit max_steps to 3-5 | -30% cost | May miss complex answers | Most use cases |
| Batch API calls | -50% latency | Slightly higher complexity | High volume |
| Pre-filter with BM25 | -40% cost | May miss semantic matches | Keyword-heavy queries |
| Skip reranking for high scores | -15% cost | Slightly lower precision | When top results are great |
| Use streaming responses | Better UX | More complex code | User-facing apps |
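The first strategy (cheaper model for simple queries) is usually implemented as a router in front of the agent. Here is a sketch with a deliberately crude heuristic; the marker list and thresholds are illustrative assumptions, not RecoAgent's logic.

```python
# Sketch of a model router: send short, single-topic queries to the cheaper
# model, everything else to GPT-4. The heuristic is illustrative only.

COMPLEX_MARKERS = ("compare", "analyze", "multi", "and", "versus")

def pick_model(query):
    words = query.lower().split()
    if len(words) <= 12 and not any(m in words for m in COMPLEX_MARKERS):
        return "gpt-3.5-turbo"
    return "gpt-4"

print(pick_model("What is RecoAgent?"))                                         # gpt-3.5-turbo
print(pick_model("Compare deployment options considering cost and security"))   # gpt-4
```

In production you would typically replace the keyword heuristic with a small classifier, but the routing structure stays the same.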
Before & After Optimization
Before:
- Average query: $0.035, 3.2s
- 1000 queries/day = $35/day = $1,050/month
After Optimization:
- Average query: $0.015, 1.8s
- 1000 queries/day = $15/day = $450/month
- Savings: $600/month (57% reduction)
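The monthly figures are just per-query cost times volume, assuming a 30-day month:

```python
# Checking the before/after numbers: monthly cost = per-query cost
# * queries per day * days (30-day month assumed).

def monthly_cost(per_query, queries_per_day, days=30):
    return per_query * queries_per_day * days

before = monthly_cost(0.035, 1000)   # $1,050/month
after = monthly_cost(0.015, 1000)    # $450/month
savings = before - after
print(f"${savings:.0f}/month ({savings / before:.0%} reduction)")
```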
Debugging Techniques
1. State Inspector Tool
def inspect_state(state: Dict, step: int):
"""Print formatted state for debugging"""
print(f"\n{'='*60}")
print(f"STEP {step}: {state.get('action', 'unknown')}")
print(f"{'='*60}")
# Key metrics
print(f"📊 Metrics:")
print(f" Steps: {state['step_count']}/{state['max_steps']}")
print(f" Cost: ${state.get('cost_so_far', 0):.4f}")
print(f" Time: {state.get('latency_ms', 0):.0f}ms")
# Documents
if state.get('retrieved_docs'):
print(f"\n📄 Retrieved: {len(state['retrieved_docs'])} docs")
for i, doc in enumerate(state['retrieved_docs'][:3], 1):
print(f" {i}. {doc.get('title', 'Untitled')[:50]} (score: {doc.get('score', 0):.2f})")
# Current plan
if state.get('plan'):
print(f"\n🎯 Plan: {state['plan']}")
# Errors
if state.get('error'):
print(f"\n⚠️ Error: {state['error']}")
# Use in your agent
agent = RAGAgentGraph(config=config, debug_callback=inspect_state)
Output Example:
============================================================
STEP 3: reranking
============================================================
📊 Metrics:
Steps: 3/5
Cost: $0.0050
Time: 165ms
📄 Retrieved: 10 docs
1. RecoAgent Deployment Guide (score: 0.89)
2. Production Best Practices (score: 0.85)
3. Security Checklist (score: 0.82)
🎯 Plan: Focus on deployment and security aspects
2. Replay Agent Execution
# Save execution trace
result = await agent.run(query, save_trace=True)
trace_id = result['trace_id']
# Later, replay the exact same execution
from packages.agents.replay import AgentReplayer
replayer = AgentReplayer(agent)
replay_result = replayer.replay(trace_id)
# Compare outputs
print("Original:", result['answer'])
print("Replayed:", replay_result['answer'])
print("Match:", result['answer'] == replay_result['answer'])
3. Unit Test Conditional Logic
import pytest
from packages.agents import should_continue, should_escalate
def test_escalation_logic():
"""Test when agent should escalate"""
# Should escalate: cost limit hit
state = {"cost_so_far": 0.06, "max_cost": 0.05}
assert should_escalate(state)
# Should NOT escalate: under limit
state = {"cost_so_far": 0.03, "max_cost": 0.05}
assert not should_escalate(state)
# Should escalate: too many errors
state = {"error_count": 3, "max_errors": 2}
assert should_escalate(state)
def test_continue_logic():
"""Test when agent should continue"""
# Should continue: has plan and under limits
state = {
"plan": "Need more info",
"step_count": 3,
"max_steps": 5,
"has_answer": False
}
assert should_continue(state)
# Should stop: has answer
state["has_answer"] = True
assert not should_continue(state)
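If you want to run those tests without the real package, here are minimal implementations consistent with their semantics. These are hypothetical stand-ins: RecoAgent's actual `should_escalate` and `should_continue` may check more conditions.

```python
# Minimal stand-ins consistent with the unit tests above. Illustrative
# only; the real package's functions may check more conditions.

def should_escalate(state):
    if state.get("cost_so_far", 0.0) > state.get("max_cost", float("inf")):
        return True
    if state.get("error_count", 0) > state.get("max_errors", float("inf")):
        return True
    return False

def should_continue(state):
    if state.get("has_answer"):
        return False  # done: stop iterating
    return bool(state.get("plan")) and state["step_count"] < state["max_steps"]

# Quick sanity checks mirroring the tests:
assert should_escalate({"cost_so_far": 0.06, "max_cost": 0.05})
assert not should_escalate({"cost_so_far": 0.03, "max_cost": 0.05})
assert should_continue({"plan": "Need more info", "step_count": 3,
                        "max_steps": 5, "has_answer": False})
print("all checks pass")
```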
Agent Configuration Cheat Sheet
Quick Reference
| Parameter | Type | Default | Recommended Values | Description |
|---|---|---|---|---|
| `model_name` | str | "gpt-4" | gpt-4, gpt-3.5-turbo, claude-3 | LLM to use |
| `temperature` | float | 0.1 | 0.0-0.3 (prod), 0.5-0.8 (creative) | Randomness in responses |
| `max_tokens` | int | 1000 | 500-2000 | Max response length |
| `max_steps` | int | 5 | 2-3 (simple), 5-8 (complex), 10+ (research) | Execution limit |
| `cost_limit` | float | 0.10 | 0.01-0.05 (prod), 0.50+ (research) | Budget per query |
| `timeout_seconds` | int | 60 | 30 (simple), 120 (complex) | Max execution time |
| `enable_escalation` | bool | True | True (prod), False (dev) | Human handoff |
| `enable_memory` | bool | False | True (chat), False (stateless) | Conversation history |
| `memory_window` | int | 10 | 5-10 (chat), 20+ (long context) | Messages to remember |
| `safety_enabled` | bool | True | True (always!) | Content filtering |
| `enable_web_search` | bool | False | True (research), False (internal only) | External data |
| `enable_citations` | bool | True | True (prod), False (speed test) | Source tracking |
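The defaults column can be captured as a dataclass for quick reference. This is a sketch of the shape of `AgentConfig` based on the table above, not its actual definition.

```python
# The cheat-sheet defaults as a dataclass. A sketch of AgentConfig's shape,
# not the real class definition.
from dataclasses import dataclass

@dataclass
class AgentConfigSketch:
    model_name: str = "gpt-4"
    temperature: float = 0.1
    max_tokens: int = 1000
    max_steps: int = 5
    cost_limit: float = 0.10
    timeout_seconds: int = 60
    enable_escalation: bool = True
    enable_memory: bool = False
    memory_window: int = 10
    safety_enabled: bool = True
    enable_web_search: bool = False
    enable_citations: bool = True

cfg = AgentConfigSketch(max_steps=2, cost_limit=0.01)  # FAQ-style overrides
print(cfg.model_name, cfg.max_steps, cfg.cost_limit)
```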
Configuration Presets
from packages.agents import AgentConfig
# Preset 1: Fast & Cheap (FAQ Bot)
faq_config = AgentConfig(
model_name="gpt-3.5-turbo",
temperature=0.0,
max_steps=2,
cost_limit=0.01,
timeout_seconds=15
)
# Preset 2: Balanced (General Purpose)
balanced_config = AgentConfig(
model_name="gpt-4",
temperature=0.1,
max_steps=5,
cost_limit=0.05,
timeout_seconds=60,
enable_escalation=True
)
# Preset 3: Deep Research (Complex Queries)
research_config = AgentConfig(
model_name="gpt-4",
temperature=0.2,
max_steps=15,
cost_limit=0.50,
timeout_seconds=180,
enable_web_search=True,
enable_memory=True
)
# Preset 4: Production Chat
chat_config = AgentConfig(
model_name="gpt-4",
temperature=0.3,
max_steps=5,
cost_limit=0.10,
enable_memory=True,
memory_window=10,
safety_enabled=True,
enable_citations=True
)
What You've Learned
You now understand how to:
Core Concepts
✅ Create stateful agents with LangGraph integration
✅ Build multi-step workflows with conditional logic
✅ Understand state transitions and how data flows through nodes
Advanced Capabilities
✅ Handle errors and escalation with decision tree logic
✅ Add custom tools to extend agent capabilities
✅ Choose the right agent pattern for your use case (Simple, Research, Chat, Tool-Heavy)
Production Readiness
✅ Monitor agent execution with LangSmith and callbacks
✅ Optimize costs and performance using proven strategies
✅ Debug agent behavior with state inspection and replay
✅ Configure agents using preset configurations
Real-World Skills
✅ Trace complex queries through multi-step execution
✅ Compare Traditional RAG vs LangGraph trade-offs
✅ Unit test agent logic for reliability
✅ Choose optimal configuration for different scenarios
Next Steps
- How-To Guides: Learn specific implementation patterns
- Examples: See working code for different scenarios
- Reference: Explore the complete API documentation
- Explanations: Understand the architecture and design decisions