LangGraph Agent Orchestration
Learn how to build sophisticated, stateful agents using RecoAgent's LangGraph integration. This tutorial covers the complete agent workflow, from retrieval to answer generation, including error handling and escalation.
What You'll Learn
- How to create a stateful agent with LangGraph
- Building multi-step workflows with conditional logic
- Implementing error handling and retry mechanisms
- Using tools and escalation in agent workflows
- Monitoring agent execution with observability
Prerequisites
- Basic understanding of LangGraph concepts
- Python 3.8+ installed
- RecoAgent installed:
pip install recoagent
LangGraph Architecture Overview
Step 1: Understanding the Agent State
RecoAgent uses a comprehensive state machine that tracks the entire conversation flow:
from packages.agents import AgentState, AgentConfig
from typing import Dict, Any
# The AgentState tracks everything during execution
state_example = {
"messages": [], # Chat history
"query": "How do I deploy to production?",
"retrieved_docs": [], # Documents from retrieval
"reranked_docs": [], # Reranked results
"plan": "I need to find deployment documentation", # Agent's plan
"action": "retrieve_docs", # Current action
"answer": None, # Final answer
"error": None, # Any errors
"metadata": {}, # Additional context
"step_count": 0, # Execution steps
"max_steps": 5, # Maximum allowed steps
"cost_tracker": {}, # Cost monitoring
"latency_tracker": {} # Performance tracking
}
Visual State Transitions
Let's see how state evolves through a real query:
Key Insight: Each node reads from and writes to the shared state, creating a traceable execution history!
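To make the shared-state idea concrete, here is a minimal plain-Python simulation. It is not the real LangGraph or RecoAgent API (the node and field names are illustrative); it only shows how nodes that read from and write to one state dict accumulate a traceable history.

```python
# Minimal sketch: each "node" is a function that reads from and writes to
# one mutable state dict, so the dict accumulates a traceable history.
# Plain Python, not the actual LangGraph API.

def retrieve(state):
    state["retrieved_docs"] = ["doc-a", "doc-b"]
    return state

def generate(state):
    state["answer"] = f"Answer based on {len(state['retrieved_docs'])} docs"
    return state

def run_pipeline(query):
    state = {"query": query, "step_count": 0, "history": []}
    for node in (retrieve, generate):
        state["step_count"] += 1
        state = node(state)
        state["history"].append(node.__name__)  # traceable execution history
    return state

final = run_pipeline("How do I deploy to production?")
print(final["history"])      # ['retrieve', 'generate']
print(final["step_count"])   # 2
```

In the real graph the loop is replaced by LangGraph's conditional edges, but the contract is the same: every node receives the full state and returns an updated version of it.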
Step 2: Creating Your First Agent
Let's build a simple RAG agent that can retrieve information and answer questions:
import os
from packages.agents import RAGAgentGraph, AgentConfig, ToolRegistry
from packages.rag import HybridRetriever, VectorRetriever, BM25Retriever
from packages.rag.stores import OpenSearchStore
# Configure the agent
config = AgentConfig(
model_name="gpt-4",
temperature=0.1,
max_tokens=1000,
max_steps=5,
cost_limit=0.10,
safety_enabled=True
)
# Set up vector store
vector_store = OpenSearchStore(
endpoint="http://localhost:9200",
index_name="knowledge_base"
)
# Create retrievers
vector_retriever = VectorRetriever(vector_store=vector_store)
bm25_retriever = BM25Retriever(vector_store=vector_store)
hybrid_retriever = HybridRetriever(
vector_retriever=vector_retriever,
bm25_retriever=bm25_retriever,
alpha=0.7 # 70% vector, 30% BM25
)
# Create tool registry
tool_registry = ToolRegistry()
tool_registry.register_retrieval_tool(hybrid_retriever)
# Create the agent
agent = RAGAgentGraph(
config=config,
tool_registry=tool_registry
)
print("Agent created successfully!")
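The `alpha=0.7` setting above blends the two retrievers' scores. Here is a sketch of one common fusion scheme (min-max normalize each retriever's scores, then take a weighted sum); the exact formula inside `HybridRetriever` may differ.

```python
# Alpha-weighted hybrid score fusion (a common scheme; RecoAgent's actual
# implementation may differ). Scores from each retriever are min-max
# normalized, then combined as:
#   combined = alpha * vector_score + (1 - alpha) * bm25_score

def minmax(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def hybrid_scores(vector, bm25, alpha=0.7):
    v, b = minmax(vector), minmax(bm25)
    docs = set(v) | set(b)  # union: a doc may appear in only one retriever
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}

vector = {"doc1": 0.9, "doc2": 0.4, "doc3": 0.1}   # cosine similarities
bm25 = {"doc1": 2.0, "doc2": 8.0, "doc4": 5.0}      # raw BM25 scores
ranked = sorted(hybrid_scores(vector, bm25).items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # doc1
```

Higher `alpha` favors semantic (vector) matches; lower `alpha` favors exact keyword (BM25) matches.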
Step 3: Running the Agent
Now let's run the agent with a sample query:
# Run the agent
query = "What are the best practices for deploying RecoAgent to production?"
result = await agent.run(query, user_id="tutorial_user")
print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print(f"Cost: ${result['cost']:.4f}")
print(f"Latency: {result['latency_ms']:.0f}ms")
print(f"Steps taken: {result['metadata']['step_count']}")
Step 4: Understanding the Workflow
The agent follows a state machine flow: retrieve → rerank → plan → use tools (when needed) → synthesize → generate the answer, with escalation available whenever an error or limit is hit.
Real-World Scenario: Multi-Step Research Query
Let's trace a complex query through the entire agent lifecycle:
Query: "What are the security best practices for deploying RecoAgent with sensitive healthcare data?"
Execution Timeline
| Step | Node | Action | State Changes | Time | Cost |
|---|---|---|---|---|---|
| 1 | Start | Receive query | query set, step_count=1 | 0ms | $0 |
| 2 | Retrieval | Search "RecoAgent security healthcare" | retrieved_docs=[10] | 45ms | $0.002 |
| 3 | Reranking | Score docs by relevance | reranked_docs=[5] | 120ms | $0.003 |
| 4 | Planning | Analyze: Need HIPAA + deployment info | plan="Multi-topic query" | 15ms | $0 |
| 5 | Tool Use | Call web_search for latest HIPAA | external_data added | 850ms | $0.005 |
| 6 | Synthesis | Combine docs + external data | context assembled | 30ms | $0 |
| 7 | Generation | Generate comprehensive answer | answer set | 1200ms | $0.015 |
| 8 | Complete | Return with sources | final_response ready | 10ms | $0 |
| **Total** | | | | 2.27s | $0.025 |
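The totals row follows directly from the per-step numbers; summing the timeline reproduces it.

```python
# Arithmetic check of the execution timeline: summing the per-step times
# and costs reproduces the totals row.

steps = [
    ("Start",      0,    0.000),
    ("Retrieval",  45,   0.002),
    ("Reranking",  120,  0.003),
    ("Planning",   15,   0.000),
    ("Tool Use",   850,  0.005),
    ("Synthesis",  30,   0.000),
    ("Generation", 1200, 0.015),
    ("Complete",   10,   0.000),
]
total_ms = sum(ms for _, ms, _ in steps)
total_cost = sum(c for _, _, c in steps)
print(f"{total_ms / 1000:.2f}s, ${total_cost:.3f}")  # 2.27s, $0.025
```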
State Inspector Output
# Step 2: After Retrieval
{
"query": "security best practices healthcare...",
"retrieved_docs": [
{"title": "RecoAgent Security Guide", "score": 0.89},
{"title": "Healthcare Data Protection", "score": 0.85},
{"title": "HIPAA Compliance Checklist", "score": 0.82},
# ... 7 more docs
],
"step_count": 2,
"cost_so_far": 0.002
}
# Step 5: After Tool Use
{
# ... previous state ...
"reranked_docs": [...],  # top 5 docs
"plan": "Need external HIPAA info - using web search",
"external_data": {
"source": "web_search",
"results": ["Latest HIPAA updates 2024..."]
},
"step_count": 5,
"cost_so_far": 0.010
}
# Step 8: Final State
{
# ... all previous state ...
"answer": "To deploy RecoAgent with healthcare data...",
"sources": [
"RecoAgent Security Guide (internal)",
"HIPAA Compliance 2024 (web)",
"Healthcare Data Protection (internal)"
],
"metadata": {
"total_steps": 8,
"tools_used": ["retrieval", "reranking", "web_search"],
"confidence": 0.92
},
"cost_so_far": 0.025,
"latency_ms": 2270
}
What Made This Complex:
- ✅ Multi-topic query (security + healthcare + deployment)
- ✅ Required external data (latest HIPAA rules)
- ✅ Multiple retrieval passes
- ✅ Synthesis of internal + external knowledge
Step 5: Adding Custom Tools
Let's add a custom tool for web search:
from packages.agents.tools import BaseTool
from typing import Dict, Any
class WebSearchTool(BaseTool):
name = "web_search"
description = "Search the web for current information"
def _run(self, query: str, **kwargs) -> Dict[str, Any]:
# Simulate web search
results = [
{
"title": f"Search result for: {query}",
"url": "https://example.com",
"snippet": f"Relevant information about {query}"
}
]
return {"results": results}
# Register the custom tool
tool_registry.register_tool(WebSearchTool())
# Update the agent with the new tool
agent = RAGAgentGraph(
config=config,
tool_registry=tool_registry
)
Step 6: Error Handling and Escalation
The agent includes built-in error handling and escalation:
# Configure escalation policies
from packages.agents.policies import EscalationPolicy
escalation_policy = EscalationPolicy(
max_cost=0.05, # Escalate if cost exceeds $0.05
max_steps=3, # Escalate after 3 steps
error_threshold=2, # Escalate after 2 errors
sensitive_topics=["financial", "medical", "legal"]
)
# Update agent config
config.escalation_policy = escalation_policy
# Run with a complex query that might trigger escalation
complex_query = """
I need help with a complex financial calculation involving
multiple currencies and tax implications for international business.
"""
result = await agent.run(complex_query, user_id="tutorial_user")
if result.get("escalated"):
print("Query was escalated to human agent")
print(f"Reason: {result['metadata']['escalation_reason']}")
else:
print("Agent handled the query successfully")
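To show how a policy like the one above could be evaluated, here is a hedged sketch. The field names mirror the `EscalationPolicy` parameters in this tutorial; the function name and return convention are hypothetical, and the real RecoAgent implementation may differ.

```python
# Sketch of escalation-policy evaluation. Returns an escalation reason
# string, or None if the agent may continue. (Illustrative only.)

def check_escalation(state, policy):
    if state.get("cost_so_far", 0.0) > policy["max_cost"]:
        return "cost_limit_exceeded"
    if state.get("step_count", 0) > policy["max_steps"]:
        return "step_limit_exceeded"
    if state.get("error_count", 0) >= policy["error_threshold"]:
        return "too_many_errors"
    query = state.get("query", "").lower()
    for topic in policy["sensitive_topics"]:
        if topic in query:
            return f"sensitive_topic:{topic}"
    return None

policy = {"max_cost": 0.05, "max_steps": 3, "error_threshold": 2,
          "sensitive_topics": ["financial", "medical", "legal"]}
state = {"query": "complex financial calculation...", "cost_so_far": 0.02,
         "step_count": 1, "error_count": 0}
print(check_escalation(state, policy))  # sensitive_topic:financial
```

Note that the sensitive-topic check fires even when cost and step limits are fine, which is why the financial query above would be routed to a human.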
Performance Comparison: Traditional vs LangGraph Agent
| Aspect | Traditional RAG | LangGraph Agent | Improvement |
|---|---|---|---|
| Simple Query | 0.8s, $0.008 | 1.2s, $0.012 | 🟡 Slightly slower |
| Complex Query | 1.5s, $0.015 Often incomplete | 2.3s, $0.025 Comprehensive | 🟢 Better quality |
| Multi-hop Research | Not supported | 3.5s, $0.040 Fully supported | 🟢 New capability |
| Error Recovery | Fails immediately | Retries, then escalates | 🟢 More reliable |
| Tool Integration | Manual coordination | Automatic orchestration | 🟢 Much easier |
| Observability | Limited | Full trace of decisions | 🟢 Better debugging |
| Cost Control | No built-in limits | Configurable limits | 🟢 Production-safe |
When to Use Each:
Use Traditional RAG when:
✓ Queries are simple and direct
✓ Speed is critical (< 1s response time)
✓ Cost needs to be minimal
✓ Single-pass retrieval is sufficient
Use LangGraph Agent when:
✓ Queries need multi-step reasoning
✓ Multiple tools/data sources required
✓ Quality matters more than speed
✓ Need error handling and escalation
✓ Want full observability
Common Agent Patterns
Pattern 1: Simple RAG Agent
# Best for: FAQ, documentation search
# Complexity: ⭐ Low
# Cost: $ Low
agent = RAGAgentGraph(
config=AgentConfig(max_steps=2), # Just retrieve and answer
tools=[retrieval_tool]
)
Flow: Query → Retrieve → Answer
Use Case: "What is RecoAgent?" - Direct factual questions
Pattern 2: Research Agent (Multi-Hop)
# Best for: Complex research, analysis
# Complexity: ⭐⭐⭐ High
# Cost: $$$ High
agent = RAGAgentGraph(
config=AgentConfig(max_steps=10),
tools=[retrieval_tool, web_search_tool, calculator_tool]
)
Flow: Query → Retrieve → Analyze → Search More → Synthesize → Answer
Use Case: "Compare deployment options considering cost, security, and scalability"
Pattern 3: Conversational Agent with Memory
# Best for: Chat, ongoing conversations
# Complexity: ⭐⭐ Medium
# Cost: $$ Medium
agent = RAGAgentGraph(
config=AgentConfig(
max_steps=5,
enable_memory=True,
memory_window=10 # Remember last 10 messages
),
tools=[retrieval_tool]
)
Flow: Query + History → Contextualize → Retrieve → Answer
Use Case: Follow-up questions: "Tell me more about that" or "What about X?"
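The `memory_window=10` setting is a sliding window over conversation history. A minimal sketch of the idea (the helper name is hypothetical; RecoAgent's memory handling may differ):

```python
# Sliding-window conversation memory: keep only the most recent N messages
# as context for the next turn. (Illustrative helper, not RecoAgent's API.)

def windowed_history(messages, window=10):
    return messages[-window:]

history = [f"msg-{i}" for i in range(25)]
context = windowed_history(history, window=10)
print(len(context), context[0], context[-1])  # 10 msg-15 msg-24
```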
Pattern 4: Tool-Heavy Agent
# Best for: Actions, integrations
# Complexity: ⭐⭐⭐ High
# Cost: $$$ High
agent = RAGAgentGraph(
config=AgentConfig(max_steps=8),
tools=[
retrieval_tool,
database_query_tool,
api_call_tool,
file_reader_tool
]
)
Flow: Query → Plan → Use Multiple Tools → Aggregate → Answer
Use Case: "Pull deployment stats from database and compare with documentation"
Error Handling Decision Tree
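The core of the decision tree is "retry transient errors up to a limit, then escalate." A hedged sketch of that logic (function name and defaults are illustrative, not RecoAgent's actual internals):

```python
# Retry-then-escalate error handling: transient (recoverable) errors are
# retried up to a limit; anything else goes to a human. (Illustrative only.)

def handle_error(error_count, max_retries=2, recoverable=True):
    if not recoverable:
        return "escalate"   # e.g. auth failure, policy violation
    if error_count <= max_retries:
        return "retry"      # e.g. timeout, rate limit
    return "escalate"       # retries exhausted

print(handle_error(1))                     # retry
print(handle_error(3))                     # escalate
print(handle_error(1, recoverable=False))  # escalate
```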
Step 7: Monitoring and Observability
Add comprehensive monitoring to your agent:
from packages.observability import LangSmithClient, LangSmithConfig
from packages.agents.callbacks import AgentCallbackHandler
# Set up LangSmith integration
langsmith_config = LangSmithConfig(
api_key=os.getenv("LANGSMITH_API_KEY"),
project="recoagent-tutorial"
)
langsmith_client = LangSmithClient(langsmith_config)
# Create callback handler
callback_handler = AgentCallbackHandler(langsmith_client)
# Add callbacks to agent
agent = RAGAgentGraph(
config=config,
tool_registry=tool_registry,
callback_handlers=[callback_handler]
)
# Run agent with monitoring
result = await agent.run(query, user_id="tutorial_user")
# View traces in LangSmith dashboard
print(f"Trace ID: {callback_handler.current_trace_id}")
Step 8: Advanced Configuration
Configure the agent for production use:
# Production-ready configuration
production_config = AgentConfig(
model_name="gpt-4",
temperature=0.0, # More deterministic for production
max_tokens=2000,
max_steps=10,
cost_limit=0.50, # Higher limit for production
timeout_seconds=60,
safety_enabled=True,
enable_escalation=True,
enable_web_search=True,
enable_citations=True
)
# Add safety middleware
from packages.agents.middleware import GuardrailsMiddleware
guardrails = GuardrailsMiddleware(
enable_pii_detection=True,
enable_content_filtering=True,
enable_rate_limiting=True,
max_requests_per_minute=60
)
production_agent = RAGAgentGraph(
config=production_config,
tool_registry=tool_registry,
callback_handlers=[callback_handler]
)
# Add middleware
production_agent.guardrails = guardrails
Cost & Performance Optimization
Where Costs Come From
Total Agent Cost Breakdown (typical complex query):
LLM API Calls $0.015 (60%) ████████████
├─ Planning $0.003
├─ Answer Generation $0.010
└─ Tool Decisions $0.002
Embedding API $0.005 (20%) ████
└─ Vector Search
Reranking $0.003 (12%) ██
└─ Cross-Encoder
Tools/External $0.002 (8%) █
└─ Web Search, etc.
TOTAL: $0.025 (100%)
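The percentages in the breakdown follow directly from the dollar amounts: each component's share is its cost divided by the total.

```python
# Reproducing the cost-breakdown percentages from the dollar amounts above.

costs = {"llm": 0.015, "embedding": 0.005, "reranking": 0.003, "tools": 0.002}
total = sum(costs.values())
shares = {k: round(100 * v / total) for k, v in costs.items()}
print(f"${total:.3f}", shares)
```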
Optimization Strategies
| Strategy | Impact | Trade-off | Recommended For |
|---|---|---|---|
| Use GPT-3.5 for simple queries | -70% cost | Slightly lower quality | FAQ, simple lookups |
| Cache embeddings | -20% cost | Storage needed | Repeated queries |
| Limit max_steps to 3-5 | -30% cost | May miss complex answers | Most use cases |
| Batch API calls | -50% latency | Slightly higher complexity | High volume |
| Pre-filter with BM25 | -40% cost | May miss semantic matches | Keyword-heavy queries |
| Skip reranking for high scores | -15% cost | Slightly lower precision | When top results are great |
| Use streaming responses | Better UX | More complex code | User-facing apps |
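The first strategy (cheaper model for simple queries) is usually implemented as a router in front of the agent. Here is a sketch with a deliberately crude heuristic; the marker list and thresholds are illustrative assumptions, not RecoAgent's logic.

```python
# Sketch of a model router: send short, single-topic queries to the cheaper
# model, everything else to GPT-4. The heuristic is illustrative only.

COMPLEX_MARKERS = ("compare", "analyze", "multi", "and", "versus")

def pick_model(query):
    words = query.lower().split()
    if len(words) <= 12 and not any(m in words for m in COMPLEX_MARKERS):
        return "gpt-3.5-turbo"
    return "gpt-4"

print(pick_model("What is RecoAgent?"))                                         # gpt-3.5-turbo
print(pick_model("Compare deployment options considering cost and security"))   # gpt-4
```

In production you would typically replace the keyword heuristic with a small classifier, but the routing structure stays the same.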
Before & After Optimization
Before:
- Average query: $0.035, 3.2s
- 1000 queries/day = $35/day = $1,050/month
After Optimization:
- Average query: $0.015, 1.8s
- 1000 queries/day = $15/day = $450/month
- Savings: $600/month (57% reduction)
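The monthly figures are just per-query cost times volume, assuming a 30-day month:

```python
# Checking the before/after numbers: monthly cost = per-query cost
# * queries per day * days (30-day month assumed).

def monthly_cost(per_query, queries_per_day, days=30):
    return per_query * queries_per_day * days

before = monthly_cost(0.035, 1000)   # $1,050/month
after = monthly_cost(0.015, 1000)    # $450/month
savings = before - after
print(f"${savings:.0f}/month ({savings / before:.0%} reduction)")
```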
Debugging Techniques
1. State Inspector Tool
def inspect_state(state: Dict, step: int):
"""Print formatted state for debugging"""
print(f"\n{'='*60}")
print(f"STEP {step}: {state.get('action', 'unknown')}")
print(f"{'='*60}")
# Key metrics
print(f"📊 Metrics:")
print(f" Steps: {state['step_count']}/{state['max_steps']}")
print(f" Cost: ${state.get('cost_so_far', 0):.4f}")
print(f" Time: {state.get('latency_ms', 0):.0f}ms")
# Documents
if state.get('retrieved_docs'):
print(f"\n📄 Retrieved: {len(state['retrieved_docs'])} docs")
for i, doc in enumerate(state['retrieved_docs'][:3], 1):
print(f" {i}. {doc.get('title', 'Untitled')[:50]} (score: {doc.get('score', 0):.2f})")
# Current plan
if state.get('plan'):
print(f"\n🎯 Plan: {state['plan']}")
# Errors
if state.get('error'):
print(f"\n⚠️ Error: {state['error']}")
# Use in your agent
agent = RAGAgentGraph(config=config, debug_callback=inspect_state)
Output Example:
============================================================
STEP 3: reranking
============================================================
📊 Metrics:
Steps: 3/5
Cost: $0.0050
Time: 165ms
📄 Retrieved: 10 docs
1. RecoAgent Deployment Guide (score: 0.89)
2. Production Best Practices (score: 0.85)
3. Security Checklist (score: 0.82)
🎯 Plan: Focus on deployment and security aspects
2. Replay Agent Execution
# Save execution trace
result = await agent.run(query, save_trace=True)
trace_id = result['trace_id']
# Later, replay the exact same execution
from packages.agents.replay import AgentReplayer
replayer = AgentReplayer(agent)
replay_result = replayer.replay(trace_id)
# Compare outputs
print("Original:", result['answer'])
print("Replayed:", replay_result['answer'])
print("Match:", result['answer'] == replay_result['answer'])
3. Unit Test Conditional Logic
import pytest
from packages.agents import should_continue, should_escalate
def test_escalation_logic():
"""Test when agent should escalate"""
# Should escalate: cost limit hit
state = {"cost_so_far": 0.06, "max_cost": 0.05}
assert should_escalate(state)
# Should NOT escalate: under limit
state = {"cost_so_far": 0.03, "max_cost": 0.05}
assert not should_escalate(state)
# Should escalate: too many errors
state = {"error_count": 3, "max_errors": 2}
assert should_escalate(state)
def test_continue_logic():
"""Test when agent should continue"""
# Should continue: has plan and under limits
state = {
"plan": "Need more info",
"step_count": 3,
"max_steps": 5,
"has_answer": False
}
assert should_continue(state)
# Should stop: has answer
state["has_answer"] = True
assert not should_continue(state)
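If you want to run those tests without the real package, here are minimal implementations consistent with their semantics. These are hypothetical stand-ins: RecoAgent's actual `should_escalate` and `should_continue` may check more conditions.

```python
# Minimal stand-ins consistent with the unit tests above. Illustrative
# only; the real package's functions may check more conditions.

def should_escalate(state):
    if state.get("cost_so_far", 0.0) > state.get("max_cost", float("inf")):
        return True
    if state.get("error_count", 0) > state.get("max_errors", float("inf")):
        return True
    return False

def should_continue(state):
    if state.get("has_answer"):
        return False  # done: stop iterating
    return bool(state.get("plan")) and state["step_count"] < state["max_steps"]

# Quick sanity checks mirroring the tests:
assert should_escalate({"cost_so_far": 0.06, "max_cost": 0.05})
assert not should_escalate({"cost_so_far": 0.03, "max_cost": 0.05})
assert should_continue({"plan": "Need more info", "step_count": 3,
                        "max_steps": 5, "has_answer": False})
print("all checks pass")
```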
Agent Configuration Cheat Sheet
Quick Reference
| Parameter | Type | Default | Recommended Values | Description |
|---|---|---|---|---|
| `model_name` | str | "gpt-4" | gpt-4, gpt-3.5-turbo, claude-3 | LLM to use |
| `temperature` | float | 0.1 | 0.0-0.3 (prod), 0.5-0.8 (creative) | Randomness in responses |
| `max_tokens` | int | 1000 | 500-2000 | Max response length |
| `max_steps` | int | 5 | 2-3 (simple), 5-8 (complex), 10+ (research) | Execution limit |
| `cost_limit` | float | 0.10 | 0.01-0.05 (prod), 0.50+ (research) | Budget per query |
| `timeout_seconds` | int | 60 | 30 (simple), 120 (complex) | Max execution time |
| `enable_escalation` | bool | True | True (prod), False (dev) | Human handoff |
| `enable_memory` | bool | False | True (chat), False (stateless) | Conversation history |
| `memory_window` | int | 10 | 5-10 (chat), 20+ (long context) | Messages to remember |
| `safety_enabled` | bool | True | True (always!) | Content filtering |
| `enable_web_search` | bool | False | True (research), False (internal only) | External data |
| `enable_citations` | bool | True | True (prod), False (speed test) | Source tracking |
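The defaults column can be captured as a dataclass for quick reference. This is a sketch of the shape of `AgentConfig` based on the table above, not its actual definition.

```python
# The cheat-sheet defaults as a dataclass. A sketch of AgentConfig's shape,
# not the real class definition.
from dataclasses import dataclass

@dataclass
class AgentConfigSketch:
    model_name: str = "gpt-4"
    temperature: float = 0.1
    max_tokens: int = 1000
    max_steps: int = 5
    cost_limit: float = 0.10
    timeout_seconds: int = 60
    enable_escalation: bool = True
    enable_memory: bool = False
    memory_window: int = 10
    safety_enabled: bool = True
    enable_web_search: bool = False
    enable_citations: bool = True

cfg = AgentConfigSketch(max_steps=2, cost_limit=0.01)  # FAQ-style overrides
print(cfg.model_name, cfg.max_steps, cfg.cost_limit)
```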
Configuration Presets
from packages.agents import AgentConfig
# Preset 1: Fast & Cheap (FAQ Bot)
faq_config = AgentConfig(
model_name="gpt-3.5-turbo",
temperature=0.0,
max_steps=2,
cost_limit=0.01,
timeout_seconds=15
)
# Preset 2: Balanced (General Purpose)
balanced_config = AgentConfig(
model_name="gpt-4",
temperature=0.1,
max_steps=5,
cost_limit=0.05,
timeout_seconds=60,
enable_escalation=True
)
# Preset 3: Deep Research (Complex Queries)
research_config = AgentConfig(
model_name="gpt-4",
temperature=0.2,
max_steps=15,
cost_limit=0.50,
timeout_seconds=180,
enable_web_search=True,
enable_memory=True
)
# Preset 4: Production Chat
chat_config = AgentConfig(
model_name="gpt-4",
temperature=0.3,
max_steps=5,
cost_limit=0.10,
enable_memory=True,
memory_window=10,
safety_enabled=True,
enable_citations=True
)
What You've Learned
You now understand how to:
Core Concepts
✅ Create stateful agents with LangGraph integration
✅ Build multi-step workflows with conditional logic
✅ Understand state transitions and how data flows through nodes
Advanced Capabilities
✅ Handle errors and escalation with decision tree logic
✅ Add custom tools to extend agent capabilities
✅ Choose the right agent pattern for your use case (Simple, Research, Chat, Tool-Heavy)
Production Readiness
✅ Monitor agent execution with LangSmith and callbacks
✅ Optimize costs and performance using proven strategies
✅ Debug agent behavior with state inspection and replay
✅ Configure agents using preset configurations
Real-World Skills
✅ Trace complex queries through multi-step execution
✅ Compare Traditional RAG vs LangGraph trade-offs
✅ Unit test agent logic for reliability
✅ Choose optimal configuration for different scenarios
Next Steps
- How-To Guides: Learn specific implementation patterns
- Examples: See working code for different scenarios
- Reference: Explore the complete API documentation
- Explanations: Understand the architecture and design decisions