LangGraph Agent Orchestration

Learn how to build sophisticated, stateful agents with RecoAgent's LangGraph integration. This tutorial covers the complete agent workflow, from retrieval to answer generation, including error handling and escalation.

What You'll Learn

  • How to create a stateful agent with LangGraph
  • Building multi-step workflows with conditional logic
  • Implementing error handling and retry mechanisms
  • Using tools and escalation in agent workflows
  • Monitoring agent execution with observability

Prerequisites

  • Basic understanding of LangGraph concepts
  • Python 3.8+ installed
  • RecoAgent installed: pip install recoagent

LangGraph Architecture Overview

Step 1: Understanding the Agent State

RecoAgent uses a comprehensive state machine that tracks the entire conversation flow:

from packages.agents import AgentState, AgentConfig
from typing import Dict, Any

# The AgentState tracks everything during execution
state_example = {
    "messages": [],            # Chat history
    "query": "How do I deploy to production?",
    "retrieved_docs": [],      # Documents from retrieval
    "reranked_docs": [],       # Reranked results
    "plan": "I need to find deployment documentation",  # Agent's plan
    "action": "retrieve_docs", # Current action
    "answer": None,            # Final answer
    "error": None,             # Any errors
    "metadata": {},            # Additional context
    "step_count": 0,           # Execution steps
    "max_steps": 5,            # Maximum allowed steps
    "cost_tracker": {},        # Cost monitoring
    "latency_tracker": {}      # Performance tracking
}

Visual State Transitions

Let's see how state evolves through a real query:

Key Insight: Each node reads from and writes to the shared state, creating a traceable execution history!
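
As a plain-Python illustration of that insight (not the actual RecoAgent API), a node can be modeled as a function that reads the shared state and returns a partial update, which is then merged back in:

```python
from typing import Any, Dict

def retrieval_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Read the query from shared state; write retrieved docs back."""
    docs = [{"title": f"Doc about {state['query']}", "score": 0.9}]  # stand-in for a real search
    return {
        "retrieved_docs": docs,
        "action": "retrieve_docs",
        "step_count": state["step_count"] + 1,
    }

state = {"query": "How do I deploy to production?", "step_count": 0}
state.update(retrieval_node(state))  # each merged update is one entry in the execution history
```

Because every node follows this read-then-write contract, replaying the sequence of updates reconstructs the agent's full decision history.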

Step 2: Creating Your First Agent

Let's build a simple RAG agent that can retrieve information and answer questions:

import os
from packages.agents import RAGAgentGraph, AgentConfig, ToolRegistry
from packages.rag import HybridRetriever, VectorRetriever, BM25Retriever
from packages.rag.stores import OpenSearchStore

# Configure the agent
config = AgentConfig(
    model_name="gpt-4",
    temperature=0.1,
    max_tokens=1000,
    max_steps=5,
    cost_limit=0.10,
    safety_enabled=True
)

# Set up vector store
vector_store = OpenSearchStore(
    endpoint="http://localhost:9200",
    index_name="knowledge_base"
)

# Create retrievers
vector_retriever = VectorRetriever(vector_store=vector_store)
bm25_retriever = BM25Retriever(vector_store=vector_store)
hybrid_retriever = HybridRetriever(
    vector_retriever=vector_retriever,
    bm25_retriever=bm25_retriever,
    alpha=0.7  # 70% vector, 30% BM25
)

# Create tool registry
tool_registry = ToolRegistry()
tool_registry.register_retrieval_tool(hybrid_retriever)

# Create the agent
agent = RAGAgentGraph(
    config=config,
    tool_registry=tool_registry
)

print("Agent created successfully!")

Step 3: Running the Agent

Now let's run the agent with a sample query:

# Run the agent
query = "What are the best practices for deploying RecoAgent to production?"

result = await agent.run(query, user_id="tutorial_user")

print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print(f"Cost: ${result['cost']:.4f}")
print(f"Latency: {result['latency_ms']:.0f}ms")
print(f"Steps taken: {result['metadata']['step_count']}")

Step 4: Understanding the Workflow

The agent follows this state machine flow:
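
The flow diagram doesn't reproduce in text, but the loop it describes can be sketched in plain Python (hypothetical node names; the real graph is built internally by `RAGAgentGraph`):

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Node = Callable[[State], State]

def run_state_machine(state: State, nodes: Dict[str, Node],
                      should_continue: Callable[[State], bool]) -> State:
    """Drive the loop: the node named by state['action'] updates state until a stop condition."""
    while should_continue(state):
        state.update(nodes[state["action"]](state))
    return state

def retrieve(s):
    return {"retrieved_docs": ["doc"], "action": "rerank", "step_count": s["step_count"] + 1}

def rerank(s):
    return {"reranked_docs": s["retrieved_docs"][:1], "action": "answer", "step_count": s["step_count"] + 1}

def answer(s):
    return {"answer": "done", "action": "end", "step_count": s["step_count"] + 1}

def should_continue(s):
    return s["action"] != "end" and s["step_count"] < s["max_steps"]

final = run_state_machine(
    {"action": "retrieve", "step_count": 0, "max_steps": 5},
    {"retrieve": retrieve, "rerank": rerank, "answer": answer},
    should_continue,
)
```

Note that the `max_steps` check in `should_continue` is what prevents a badly wired graph from looping forever — the same guard the troubleshooting section below recommends.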

Real-World Scenario: Multi-Step Research Query

Let's trace a complex query through the entire agent lifecycle:

Query: "What are the security best practices for deploying RecoAgent with sensitive healthcare data?"

Execution Timeline

| Step | Node | Action | State Changes | Time | Cost |
|------|------|--------|---------------|------|------|
| 1 | Start | Receive query | `query` set, `step_count=1` | 0ms | $0 |
| 2 | Retrieval | Search "RecoAgent security healthcare" | `retrieved_docs=[10]` | 45ms | $0.002 |
| 3 | Reranking | Score docs by relevance | `reranked_docs=[5]` | 120ms | $0.003 |
| 4 | Planning | Analyze: need HIPAA + deployment info | `plan="Multi-topic query"` | 15ms | $0 |
| 5 | Tool Use | Call web_search for latest HIPAA | `external_data` added | 850ms | $0.005 |
| 6 | Synthesis | Combine docs + external data | context assembled | 30ms | $0 |
| 7 | Generation | Generate comprehensive answer | `answer` set | 1200ms | $0.015 |
| 8 | Complete | Return with sources | `final_response` ready | 10ms | $0 |
| **Total** | | | | **2.27s** | **$0.025** |

State Inspector Output

# Step 2: After Retrieval
{
    "query": "security best practices healthcare...",
    "retrieved_docs": [
        {"title": "RecoAgent Security Guide", "score": 0.89},
        {"title": "Healthcare Data Protection", "score": 0.85},
        {"title": "HIPAA Compliance Checklist", "score": 0.82},
        # ... 7 more docs
    ],
    "step_count": 2,
    "cost_so_far": 0.002
}

# Step 5: After Tool Use
{
    # ... previous state ...
    "reranked_docs": ["..."],  # top 5 docs
    "plan": "Need external HIPAA info - using web search",
    "external_data": {
        "source": "web_search",
        "results": ["Latest HIPAA updates 2024..."]
    },
    "step_count": 5,
    "cost_so_far": 0.010
}

# Step 8: Final State
{
    # ... all previous state ...
    "answer": "To deploy RecoAgent with healthcare data...",
    "sources": [
        "RecoAgent Security Guide (internal)",
        "HIPAA Compliance 2024 (web)",
        "Healthcare Data Protection (internal)"
    ],
    "metadata": {
        "total_steps": 8,
        "tools_used": ["retrieval", "reranking", "web_search"],
        "confidence": 0.92
    },
    "cost_so_far": 0.025,
    "latency_ms": 2270
}

What Made This Complex:

  • ✅ Multi-topic query (security + healthcare + deployment)
  • ✅ Required external data (latest HIPAA rules)
  • ✅ Multiple retrieval passes
  • ✅ Synthesis of internal + external knowledge

Step 5: Adding Custom Tools

Let's add a custom tool for web search:

from packages.agents.tools import BaseTool
from typing import Dict, Any

class WebSearchTool(BaseTool):
    name = "web_search"
    description = "Search the web for current information"

    def _run(self, query: str, **kwargs) -> Dict[str, Any]:
        # Simulate web search
        results = [
            {
                "title": f"Search result for: {query}",
                "url": "https://example.com",
                "snippet": f"Relevant information about {query}"
            }
        ]
        return {"results": results}

# Register the custom tool
tool_registry.register_tool(WebSearchTool())

# Update the agent with the new tool
agent = RAGAgentGraph(
    config=config,
    tool_registry=tool_registry
)

Step 6: Error Handling and Escalation

The agent includes built-in error handling and escalation:

# Configure escalation policies
from packages.agents.policies import EscalationPolicy

escalation_policy = EscalationPolicy(
    max_cost=0.05,       # Escalate if cost exceeds $0.05
    max_steps=3,         # Escalate after 3 steps
    error_threshold=2,   # Escalate after 2 errors
    sensitive_topics=["financial", "medical", "legal"]
)

# Update agent config
config.escalation_policy = escalation_policy

# Run with a complex query that might trigger escalation
complex_query = """
I need help with a complex financial calculation involving
multiple currencies and tax implications for international business.
"""

result = await agent.run(complex_query, user_id="tutorial_user")

if result.get("escalated"):
    print("Query was escalated to human agent")
    print(f"Reason: {result['metadata']['escalation_reason']}")
else:
    print("Agent handled the query successfully")

Performance Comparison: Traditional vs LangGraph Agent

| Aspect | Traditional RAG | LangGraph Agent | Improvement |
|--------|-----------------|-----------------|-------------|
| Simple Query | 0.8s, $0.008 | 1.2s, $0.012 | 🟡 Slightly slower |
| Complex Query | 1.5s, $0.015 (often incomplete) | 2.3s, $0.025 (comprehensive) | 🟢 Better quality |
| Multi-hop Research | Not supported | 3.5s, $0.040 (fully supported) | 🟢 New capability |
| Error Recovery | Fails immediately | Retries, then escalates | 🟢 More reliable |
| Tool Integration | Manual coordination | Automatic orchestration | 🟢 Much easier |
| Observability | Limited | Full trace of decisions | 🟢 Better debugging |
| Cost Control | No built-in limits | Configurable limits | 🟢 Production-safe |

When to Use Each:

Use Traditional RAG when:
✓ Queries are simple and direct
✓ Speed is critical (< 1s response time)
✓ Cost needs to be minimal
✓ Single-pass retrieval is sufficient

Use LangGraph Agent when:
✓ Queries need multi-step reasoning
✓ Multiple tools/data sources required
✓ Quality matters more than speed
✓ Need error handling and escalation
✓ Want full observability
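
A toy router that applies these criteria (the marker phrases and function name are illustrative heuristics, not a shipped RecoAgent feature):

```python
def choose_pipeline(query: str, needs_tools: bool = False) -> str:
    """Route simple lookups to plain RAG; multi-step work to the agent."""
    multi_step_markers = ("compare", "versus", "step by step", "and then")
    if needs_tools or any(m in query.lower() for m in multi_step_markers):
        return "langgraph_agent"
    return "traditional_rag"
```

In practice a small classifier (or the LLM itself) usually makes this call, but routing cheap queries away from the agent captures the speed and cost advantages of both columns above.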

Common Agent Patterns

Pattern 1: Simple RAG Agent

# Best for: FAQ, documentation search
# Complexity: ⭐ Low
# Cost: $ Low

agent = RAGAgentGraph(
    config=AgentConfig(max_steps=2),  # Just retrieve and answer
    tools=[retrieval_tool]
)

Flow: Query → Retrieve → Answer
Use Case: "What is RecoAgent?" - Direct factual questions

Pattern 2: Research Agent (Multi-Hop)

# Best for: Complex research, analysis
# Complexity: ⭐⭐⭐ High
# Cost: $$$ High

agent = RAGAgentGraph(
    config=AgentConfig(max_steps=10),
    tools=[retrieval_tool, web_search_tool, calculator_tool]
)

Flow: Query → Retrieve → Analyze → Search More → Synthesize → Answer
Use Case: "Compare deployment options considering cost, security, and scalability"

Pattern 3: Conversational Agent with Memory

# Best for: Chat, ongoing conversations
# Complexity: ⭐⭐ Medium
# Cost: $$ Medium

agent = RAGAgentGraph(
    config=AgentConfig(
        max_steps=5,
        enable_memory=True,
        memory_window=10  # Remember last 10 messages
    ),
    tools=[retrieval_tool]
)

Flow: Query + History → Contextualize → Retrieve → Answer
Use Case: Follow-up questions: "Tell me more about that" or "What about X?"

Pattern 4: Tool-Heavy Agent

# Best for: Actions, integrations
# Complexity: ⭐⭐⭐ High
# Cost: $$$ High

agent = RAGAgentGraph(
    config=AgentConfig(max_steps=8),
    tools=[
        retrieval_tool,
        database_query_tool,
        api_call_tool,
        file_reader_tool
    ]
)

Flow: Query → Plan → Use Multiple Tools → Aggregate → Answer
Use Case: "Pull deployment stats from database and compare with documentation"

Error Handling Decision Tree
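
The tree's branches mirror the escalation rules from Step 6 and can be approximated as a simple decision chain (the thresholds below are illustrative defaults, not the shipped ones):

```python
def next_action_on_error(state: dict) -> str:
    """Decide what to do after a node fails: retry, or hand off to a human."""
    if state.get("cost_so_far", 0.0) > state.get("max_cost", 0.05):
        return "escalate"   # budget exhausted -- retrying would only cost more
    if state.get("error_count", 0) >= state.get("max_errors", 2):
        return "escalate"   # repeated failures -- likely not transient
    if state.get("step_count", 0) >= state.get("max_steps", 5):
        return "escalate"   # out of steps
    return "retry"          # transient error: try the node again
```

The ordering matters: budget and error limits are checked before retrying, so a failing node can never burn past the configured cost ceiling.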

Step 7: Monitoring and Observability

Add comprehensive monitoring to your agent:

from packages.observability import LangSmithClient, LangSmithConfig
from packages.agents.callbacks import AgentCallbackHandler

# Set up LangSmith integration
langsmith_config = LangSmithConfig(
    api_key=os.getenv("LANGSMITH_API_KEY"),
    project="recoagent-tutorial"
)

langsmith_client = LangSmithClient(langsmith_config)

# Create callback handler
callback_handler = AgentCallbackHandler(langsmith_client)

# Add callbacks to agent
agent = RAGAgentGraph(
    config=config,
    tool_registry=tool_registry,
    callback_handlers=[callback_handler]
)

# Run agent with monitoring
result = await agent.run(query, user_id="tutorial_user")

# View traces in LangSmith dashboard
print(f"Trace ID: {callback_handler.current_trace_id}")

Step 8: Advanced Configuration

Configure the agent for production use:

# Production-ready configuration
production_config = AgentConfig(
    model_name="gpt-4",
    temperature=0.0,     # More deterministic for production
    max_tokens=2000,
    max_steps=10,
    cost_limit=0.50,     # Higher limit for production
    timeout_seconds=60,
    safety_enabled=True,
    enable_escalation=True,
    enable_web_search=True,
    enable_citations=True
)

# Add safety middleware
from packages.agents.middleware import GuardrailsMiddleware

guardrails = GuardrailsMiddleware(
    enable_pii_detection=True,
    enable_content_filtering=True,
    enable_rate_limiting=True,
    max_requests_per_minute=60
)

production_agent = RAGAgentGraph(
    config=production_config,
    tool_registry=tool_registry,
    callback_handlers=[callback_handler]
)

# Add middleware
production_agent.guardrails = guardrails

Cost & Performance Optimization

Where Costs Come From

Total Agent Cost Breakdown (typical complex query):

LLM API Calls    $0.015  (60%)  ████████████
├─ Planning          $0.003
├─ Answer Generation $0.010
└─ Tool Decisions    $0.002

Embedding API    $0.005  (20%)  ████
└─ Vector Search

Reranking        $0.003  (12%)  ██
└─ Cross-Encoder

Tools/External   $0.002   (8%)  █
└─ Web Search, etc.

TOTAL:           $0.025 (100%)
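
A breakdown like this is straightforward to produce with a per-category tracker — roughly the role of the `cost_tracker` field in the agent state (this helper is a sketch, not the RecoAgent implementation):

```python
from typing import Dict

class CostTracker:
    """Accumulate per-category spend and report a percentage breakdown."""

    def __init__(self) -> None:
        self.by_category: Dict[str, float] = {}

    def add(self, category: str, usd: float) -> None:
        self.by_category[category] = self.by_category.get(category, 0.0) + usd

    def total(self) -> float:
        return sum(self.by_category.values())

    def breakdown(self) -> Dict[str, int]:
        total = self.total() or 1.0  # avoid division by zero before any spend
        return {k: round(v / total * 100) for k, v in self.by_category.items()}

# Reproduce the figures from the chart above
tracker = CostTracker()
tracker.add("llm", 0.015)
tracker.add("embedding", 0.005)
tracker.add("reranking", 0.003)
tracker.add("tools", 0.002)
```

Each node would call `tracker.add(...)` after its API calls, so the breakdown is always current when the run completes.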

Optimization Strategies

| Strategy | Impact | Trade-off | Recommended For |
|----------|--------|-----------|-----------------|
| Use GPT-3.5 for simple queries | -70% cost | Slightly lower quality | FAQ, simple lookups |
| Cache embeddings | -20% cost | Storage needed | Repeated queries |
| Limit `max_steps` to 3-5 | -30% cost | May miss complex answers | Most use cases |
| Batch API calls | -50% latency | Slightly higher complexity | High volume |
| Pre-filter with BM25 | -40% cost | May miss semantic matches | Keyword-heavy queries |
| Skip reranking for high scores | -15% cost | Slightly lower precision | When top results are great |
| Use streaming responses | Better UX | More complex code | User-facing apps |
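
The embedding-cache strategy, for instance, can start as simple memoization (a sketch using `functools.lru_cache`; a production system would use a persistent cache in front of the real embedding API):

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def embed_cached(text: str) -> tuple:
    # Stand-in for a paid embedding API call; returns a tuple so the result is hashable.
    return tuple(float(ord(c)) for c in text[:8])

embed_cached("deploy to production")  # miss: would hit the API
embed_cached("deploy to production")  # hit: served from cache at zero cost
```

`lru_cache` only helps within one process; for the -20% figure across a fleet, back the cache with something shared (e.g. Redis or a disk store).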

Before & After Optimization

Before:

  • Average query: $0.035, 3.2s
  • 1000 queries/day = $35/day = $1,050/month

After Optimization:

  • Average query: $0.015, 1.8s
  • 1000 queries/day = $15/day = $450/month
  • Savings: $600/month (57% reduction)

Debugging Techniques

1. State Inspector Tool

def inspect_state(state: Dict, step: int):
    """Print formatted state for debugging"""
    print(f"\n{'='*60}")
    print(f"STEP {step}: {state.get('action', 'unknown')}")
    print(f"{'='*60}")

    # Key metrics
    print("📊 Metrics:")
    print(f"  Steps: {state['step_count']}/{state['max_steps']}")
    print(f"  Cost: ${state.get('cost_so_far', 0):.4f}")
    print(f"  Time: {state.get('latency_ms', 0):.0f}ms")

    # Documents
    if state.get('retrieved_docs'):
        print(f"\n📄 Retrieved: {len(state['retrieved_docs'])} docs")
        for i, doc in enumerate(state['retrieved_docs'][:3], 1):
            print(f"  {i}. {doc.get('title', 'Untitled')[:50]} (score: {doc.get('score', 0):.2f})")

    # Current plan
    if state.get('plan'):
        print(f"\n🎯 Plan: {state['plan']}")

    # Errors
    if state.get('error'):
        print(f"\n⚠️ Error: {state['error']}")

# Use in your agent
agent = RAGAgentGraph(config=config, debug_callback=inspect_state)

Output Example:

============================================================
STEP 3: reranking
============================================================
📊 Metrics:
  Steps: 3/5
  Cost: $0.0050
  Time: 165ms

📄 Retrieved: 10 docs
  1. RecoAgent Deployment Guide (score: 0.89)
  2. Production Best Practices (score: 0.85)
  3. Security Checklist (score: 0.82)

🎯 Plan: Focus on deployment and security aspects

2. Replay Agent Execution

# Save execution trace
result = await agent.run(query, save_trace=True)
trace_id = result['trace_id']

# Later, replay the exact same execution
from packages.agents.replay import AgentReplayer

replayer = AgentReplayer(agent)
replay_result = replayer.replay(trace_id)

# Compare outputs
print("Original:", result['answer'])
print("Replayed:", replay_result['answer'])
print("Match:", result['answer'] == replay_result['answer'])

3. Unit Test Conditional Logic

import pytest
from packages.agents import should_continue, should_escalate

def test_escalation_logic():
    """Test when agent should escalate"""

    # Should escalate: cost limit hit
    state = {"cost_so_far": 0.06, "max_cost": 0.05}
    assert should_escalate(state)

    # Should NOT escalate: under limit
    state = {"cost_so_far": 0.03, "max_cost": 0.05}
    assert not should_escalate(state)

    # Should escalate: too many errors
    state = {"error_count": 3, "max_errors": 2}
    assert should_escalate(state)

def test_continue_logic():
    """Test when agent should continue"""

    # Should continue: has plan and under limits
    state = {
        "plan": "Need more info",
        "step_count": 3,
        "max_steps": 5,
        "has_answer": False
    }
    assert should_continue(state)

    # Should stop: has answer
    state["has_answer"] = True
    assert not should_continue(state)

Agent Configuration Cheat Sheet

Quick Reference

| Parameter | Type | Default | Recommended Values | Description |
|-----------|------|---------|--------------------|-------------|
| `model_name` | str | `"gpt-4"` | gpt-4, gpt-3.5-turbo, claude-3 | LLM to use |
| `temperature` | float | 0.1 | 0.0-0.3 (prod), 0.5-0.8 (creative) | Randomness in responses |
| `max_tokens` | int | 1000 | 500-2000 | Max response length |
| `max_steps` | int | 5 | 2-3 (simple), 5-8 (complex), 10+ (research) | Execution limit |
| `cost_limit` | float | 0.10 | 0.01-0.05 (prod), 0.50+ (research) | Budget per query |
| `timeout_seconds` | int | 60 | 30 (simple), 120 (complex) | Max execution time |
| `enable_escalation` | bool | True | True (prod), False (dev) | Human handoff |
| `enable_memory` | bool | False | True (chat), False (stateless) | Conversation history |
| `memory_window` | int | 10 | 5-10 (chat), 20+ (long context) | Messages to remember |
| `safety_enabled` | bool | True | True (always!) | Content filtering |
| `enable_web_search` | bool | False | True (research), False (internal only) | External data |
| `enable_citations` | bool | True | True (prod), False (speed test) | Source tracking |

Configuration Presets

from packages.agents import AgentConfig

# Preset 1: Fast & Cheap (FAQ Bot)
faq_config = AgentConfig(
    model_name="gpt-3.5-turbo",
    temperature=0.0,
    max_steps=2,
    cost_limit=0.01,
    timeout_seconds=15
)

# Preset 2: Balanced (General Purpose)
balanced_config = AgentConfig(
    model_name="gpt-4",
    temperature=0.1,
    max_steps=5,
    cost_limit=0.05,
    timeout_seconds=60,
    enable_escalation=True
)

# Preset 3: Deep Research (Complex Queries)
research_config = AgentConfig(
    model_name="gpt-4",
    temperature=0.2,
    max_steps=15,
    cost_limit=0.50,
    timeout_seconds=180,
    enable_web_search=True,
    enable_memory=True
)

# Preset 4: Production Chat
chat_config = AgentConfig(
    model_name="gpt-4",
    temperature=0.3,
    max_steps=5,
    cost_limit=0.10,
    enable_memory=True,
    memory_window=10,
    safety_enabled=True,
    enable_citations=True
)

What You've Learned

You now understand how to:

Core Concepts

  • Create stateful agents with LangGraph integration
  • Build multi-step workflows with conditional logic
  • Understand state transitions and how data flows through nodes

Advanced Capabilities

  • Handle errors and escalation with decision tree logic
  • Add custom tools to extend agent capabilities
  • Choose the right agent pattern for your use case (Simple, Research, Chat, Tool-Heavy)

Production Readiness

  • Monitor agent execution with LangSmith and callbacks
  • Optimize costs and performance using proven strategies
  • Debug agent behavior with state inspection and replay
  • Configure agents using preset configurations

Real-World Skills

  • Trace complex queries through multi-step execution
  • Compare Traditional RAG vs LangGraph trade-offs
  • Unit test agent logic for reliability
  • Choose optimal configuration for different scenarios

Next Steps

  • How-To Guides: Learn specific implementation patterns
  • Examples: See working code for different scenarios
  • Reference: Explore the complete API documentation
  • Explanations: Understand the architecture and design decisions

Troubleshooting Guide

Common Issues & Solutions

| Problem | Symptoms | Root Cause | Solution |
|---------|----------|------------|----------|
| 🔄 Agent loops | Same actions repeat; never terminates | Poor conditional logic; no clear goal | Add `max_steps` limit; improve termination conditions; check state transitions |
| 💰 High costs | Rising bills; $0.10+ per query | Too many LLM calls; no cost controls | Set `cost_limit`; use GPT-3.5 for planning; cache embeddings |
| 🐌 Slow responses | 5+ seconds per query; user complaints | Too many steps; sequential processing | Reduce `max_steps`; parallelize tool calls; skip reranking when not needed |
| ❌ Poor retrieval | Wrong documents; low precision/recall | Bad query expansion; wrong weights | Tune the `alpha` parameter; add domain terms; increase reranking threshold |
| 🔴 Frequent escalations | Many queries escalate; high human load | Limits too strict; knowledge gaps | Increase `max_steps`; add more documents; review escalation policy |
| 🤔 Inconsistent answers | Different answers each time | High temperature; randomness | Set temperature to 0.0; use deterministic config; pin LLM version |

Advanced Debugging

Enable Debug Mode:

agent = RAGAgentGraph(
    config=config,
    debug=True,    # Prints state at each step
    verbose=True   # Shows tool outputs
)

# Run with full logging
import logging
logging.basicConfig(level=logging.DEBUG)
result = await agent.run(query)

Check Specific State Fields:

# Add custom state validation
def validate_state(state):
    """Ensure state is consistent"""
    assert "query" in state, "Missing query"
    assert state["step_count"] <= state["max_steps"], "Exceeded max steps"
    if state.get("answer"):
        assert len(state["answer"]) > 10, "Answer too short"
    return True

agent.add_state_validator(validate_state)

Performance Troubleshooting

Use the built-in profiler:

from packages.observability import AgentProfiler

profiler = AgentProfiler()
result = await profiler.profile(agent, query)

# View breakdown
print(profiler.get_report())

Output:

Agent Execution Profile
═══════════════════════════════════════
Total Time: 2.45s
Total Cost: $0.028
Steps Executed: 6

Time Breakdown:
Retrieval: 0.45s (18%) ████
Reranking: 0.32s (13%) ███
Planning: 0.15s (6%) █
Generation: 1.35s (55%) ███████████
Other: 0.18s (8%) ██

Cost Breakdown:
Embeddings: $0.005 (18%)
Reranking: $0.003 (11%)
LLM Calls: $0.020 (71%)

Bottleneck: LLM Generation (1.35s)
Recommendation: Consider streaming responses or caching

Getting Help

  • 📚 Check the How-To Guides for specific patterns
  • 💡 Browse Examples for working implementations
  • 🔧 Review the API Reference for detailed documentation
  • 💬 Join community discussions for peer support