
Reasoning Cache

Intelligent caching system for reasoning results with Redis persistence, compression, and cost optimization.

Features

  • Redis Persistence: Persistent caching with Redis backend
  • Compression: LZ4 compression for efficient storage
  • TTL Support: Time-to-live for cache entries
  • Cost Tracking: Track cache hits and cost savings
  • Fallback Support: In-memory fallback when Redis unavailable
  • Statistics: Comprehensive cache performance metrics
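
To make the fallback behavior concrete, the sketch below shows the general shape of an in-memory TTL cache. This is an illustrative stand-in only, not the library's actual fallback implementation; `TinyTTLCache` and its lazy-eviction strategy are assumptions for the example.

```python
import time

class TinyTTLCache:
    """Illustrative in-memory TTL cache (not the library's real fallback)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (expires_at, value)

    def store(self, key, value):
        # Record an absolute expiry time alongside the value
        self._data[key] = (time.monotonic() + self.ttl, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value
```

The real `ReasoningCache` adds compression, size limits, and statistics on top of this idea, and switches to Redis when it is reachable.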

Quick Start

from recoagent.reasoning import ReasoningCache, CacheConfig

# Configure cache
config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    ttl_seconds=3600,
    max_size_mb=100,
    enable_compression=True,
    compression_level=9
)

# Initialize cache
cache = ReasoningCache(config)

# Store reasoning result
cache.store(
    query="What is machine learning?",
    result={
        "conclusion": "Machine learning is a subset of AI...",
        "confidence": 0.95,
        "reasoning_trace": ["Step 1", "Step 2", "Step 3"]
    },
    context={"subject": "ai", "difficulty": "medium"},
    cost=0.05
)

# Retrieve reasoning result
result = cache.get(
    query="What is machine learning?",
    context={"subject": "ai", "difficulty": "medium"}
)

if result:
    print(f"Answer: {result['conclusion']}")
    print(f"Confidence: {result['confidence']:.2f}")
    print(f"Cost Saved: ${result.get('cost', 0):.4f}")

Configuration

Basic Configuration

config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    ttl_seconds=3600,        # 1 hour TTL
    max_size_mb=100,         # 100 MB max size
    enable_compression=True,
    compression_level=9      # Highest compression
)

Advanced Configuration

config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    ttl_seconds=7200,        # 2 hours TTL
    max_size_mb=500,         # 500 MB max size
    enable_compression=True,
    compression_level=6,     # Balanced compression
    # Additional Redis options
    redis_options={
        "socket_timeout": 5,
        "socket_connect_timeout": 5,
        "retry_on_timeout": True
    }
)

Environment Variables

# Set via environment variables
export REASONING_CACHE_REDIS_URL="redis://localhost:6379/0"
export REASONING_CACHE_TTL_SECONDS="3600"
export REASONING_CACHE_MAX_SIZE_MB="100"
export REASONING_CACHE_ENABLE_COMPRESSION="true"
export REASONING_CACHE_COMPRESSION_LEVEL="9"
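
If your version of `CacheConfig` does not read these variables automatically, they can be parsed into constructor arguments by hand. The helper below is a hypothetical sketch (the function name, defaults, and boolean parsing are assumptions, not part of the library):

```python
import os

def cache_config_from_env():
    """Parse REASONING_CACHE_* environment variables into keyword
    arguments suitable for CacheConfig(**kwargs). Hypothetical helper --
    check whether your CacheConfig already reads the environment itself."""
    return {
        "redis_url": os.environ.get(
            "REASONING_CACHE_REDIS_URL", "redis://localhost:6379/0"),
        "ttl_seconds": int(
            os.environ.get("REASONING_CACHE_TTL_SECONDS", "3600")),
        "max_size_mb": int(
            os.environ.get("REASONING_CACHE_MAX_SIZE_MB", "100")),
        "enable_compression": os.environ.get(
            "REASONING_CACHE_ENABLE_COMPRESSION", "true").lower() == "true",
        "compression_level": int(
            os.environ.get("REASONING_CACHE_COMPRESSION_LEVEL", "9")),
    }

# Usage (assuming CacheConfig accepts these keyword arguments):
# config = CacheConfig(**cache_config_from_env())
```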

Cache Operations

Storing Results

# Store simple result
cache.store(
    query="What is 2+2?",
    result={"conclusion": "4", "confidence": 1.0},
    context={"subject": "math"},
    cost=0.01
)

# Store complex result
cache.store(
    query="Explain quantum computing",
    result={
        "conclusion": "Quantum computing uses quantum mechanical phenomena...",
        "confidence": 0.92,
        "reasoning_trace": [
            "Step 1: Define quantum mechanics",
            "Step 2: Explain superposition",
            "Step 3: Describe quantum gates"
        ],
        "metadata": {
            "sources": ["textbook", "research_paper"],
            "difficulty": "advanced"
        }
    },
    context={"subject": "physics", "level": "advanced"},
    cost=0.15
)

Retrieving Results

# Retrieve result
result = cache.get(
    query="What is 2+2?",
    context={"subject": "math"}
)

if result:
    print(f"Answer: {result['conclusion']}")
    print(f"Confidence: {result['confidence']:.2f}")
    print(f"Cost Saved: ${result.get('cost', 0):.4f}")
else:
    print("Cache miss - need to compute")

Cache Management

# Clear all cache
cache.clear()

# Get cache statistics
stats = cache.get_stats()
print(f"Cache Hits: {stats['hits']}")
print(f"Cache Misses: {stats['misses']}")
print(f"Hit Rate: {stats['hit_rate']:.2%}")
print(f"Total Stores: {stats['stores']}")
print(f"Evictions: {stats['evictions']}")
print(f"Cache Type: {stats['cache_type']}")
print(f"Current Size: {stats['current_size']}")

Integration Examples

With DSPy Reasoning

from recoagent.reasoning import DSPyReasoningEngine, ReasoningCache

# Initialize reasoning engine with cache
cache = ReasoningCache(config)
engine = DSPyReasoningEngine(
    enable_caching=True,
    cache=cache
)

# Use cached reasoning
result = engine.reason(
    query="What is machine learning?",
    use_cache=True  # Enable caching
)

With Cost Tracking

from packages.observability import CostCategory, get_cost_tracker

# Track cache cost savings
cost_tracker = get_cost_tracker()

# Store with cost tracking
cache.store(
    query="Expensive reasoning problem",
    result=reasoning_result,
    context=context,
    cost=0.10  # Original cost
)

# Track cost savings
cost_tracker.add_cost_entry(
    category=CostCategory.LLM_TOKENS,
    provider="cache",
    model="reasoning_cache",
    operation="cache_hit",
    cost_usd=-0.10,  # Negative cost (savings)
    metadata={"cache_type": "reasoning"}
)

With Workflows

from packages.observability import trace_workflow

@trace_workflow(name="cached_reasoning_workflow")
async def reasoning_workflow(problem):
    # Check cache first
    cached_result = cache.get(
        query=problem,
        context={"workflow": "reasoning_workflow"}
    )

    if cached_result:
        print("Using cached result")
        return cached_result

    # Compute if not cached
    result = await compute_reasoning(problem)

    # Store in cache
    cache.store(
        query=problem,
        result=result,
        context={"workflow": "reasoning_workflow"},
        cost=result.get('cost', 0)
    )

    return result

Performance Optimization

Compression Settings

# High compression (more CPU, less storage)
config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    enable_compression=True,
    compression_level=9  # Highest compression
)

# Balanced compression
config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    enable_compression=True,
    compression_level=6  # Balanced
)

# No compression (faster, more storage)
config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    enable_compression=False
)

TTL Optimization

# Short TTL for dynamic content
config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    ttl_seconds=300  # 5 minutes
)

# Long TTL for stable content
config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    ttl_seconds=86400  # 24 hours
)

# Variable TTL based on content
cache.store(
    query="Current weather",
    result=weather_result,
    context=context,
    cost=0.01,
    ttl_seconds=600  # 10 minutes for weather
)

cache.store(
    query="Historical fact",
    result=fact_result,
    context=context,
    cost=0.01,
    ttl_seconds=86400  # 24 hours for facts
)

Memory Management

# Configure memory limits
config = CacheConfig(
    redis_url="redis://localhost:6379/0",
    max_size_mb=1000,  # 1 GB limit
    enable_compression=True,
    compression_level=6
)

# Monitor memory usage
stats = cache.get_stats()
if stats['current_size'] > 800:  # 80% of limit
    print("Warning: Cache approaching size limit")
    # Consider clearing old entries

Monitoring and Analytics

Cache Statistics

# Get comprehensive statistics
stats = cache.get_stats()

print("=== Cache Statistics ===")
print(f"Cache Type: {stats['cache_type']}")
print(f"Total Hits: {stats['hits']}")
print(f"Total Misses: {stats['misses']}")
print(f"Hit Rate: {stats['hit_rate']:.2%}")
print(f"Total Stores: {stats['stores']}")
print(f"Evictions: {stats['evictions']}")
print(f"Current Size: {stats['current_size']}")

# Calculate cost savings
if stats['hits'] > 0:
    avg_cost_per_hit = 0.05  # Average cost per LLM call
    total_savings = stats['hits'] * avg_cost_per_hit
    print(f"Estimated Cost Savings: ${total_savings:.2f}")

Performance Metrics

import time

# Measure cache lookup time
start_time = time.time()
result = cache.get(query="test query", context={})
hit_time = time.time() - start_time

if result:
    print(f"Cache Hit Time: {hit_time:.4f} seconds")
else:
    print("Cache Miss")
    # Measure cache store time
    start_time = time.time()
    cache.store(query="test query", result={"answer": "test"}, context={})
    store_time = time.time() - start_time
    print(f"Cache Store Time: {store_time:.4f} seconds")

Error Handling

Connection Errors

try:
    cache = ReasoningCache(config)
except ConnectionError as e:
    print(f"Redis connection failed: {e}")
    # Fall back to the in-memory cache
    config.redis_url = None
    cache = ReasoningCache(config)

Cache Errors

# Handle cache errors gracefully
try:
    result = cache.get(query="test", context={})
except Exception as e:
    print(f"Cache error: {e}")
    # Fall back to computation
    result = compute_reasoning("test")

Fallback Strategy

# Implement fallback strategy
def get_reasoning_with_fallback(query, context):
    # Try cache first
    try:
        result = cache.get(query=query, context=context)
        if result:
            return result
    except Exception as e:
        print(f"Cache error: {e}")

    # Fall back to computation
    result = compute_reasoning(query)

    # Try to store in cache
    try:
        cache.store(query=query, result=result, context=context)
    except Exception as e:
        print(f"Cache store error: {e}")

    return result

Best Practices

  1. Set Appropriate TTL: Based on content volatility
  2. Use Compression: For large reasoning results
  3. Monitor Performance: Track hit rates and costs
  4. Handle Errors: Implement fallback strategies
  5. Optimize Keys: Use consistent query/context keys
  6. Clean Up: Regularly clear old entries
  7. Test Thoroughly: Test with various scenarios
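
For practice 5, the goal is that logically identical lookups produce identical cache keys. The library's internal key scheme is not documented here, so the helper below is a hypothetical sketch of the idea: normalize the query and canonicalize the context before hashing.

```python
import hashlib
import json

def make_cache_key(query: str, context: dict) -> str:
    """Derive a stable key from a query/context pair.
    Hypothetical helper: collapses whitespace, lowercases the query,
    and sorts context keys so equivalent lookups hash identically."""
    normalized = " ".join(query.lower().split())
    blob = json.dumps({"q": normalized, "ctx": context}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Without this kind of normalization, "What is ML?" and "what is  ML?" would occupy separate cache entries and halve your hit rate for that query.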

Troubleshooting

Common Issues

  1. Redis Connection: Check Redis server and connection string
  2. Memory Issues: Monitor cache size and Redis memory
  3. Compression Errors: Check LZ4 installation
  4. TTL Issues: Verify TTL settings and Redis configuration

Debug Mode

# Enable debug logging
import logging
logging.getLogger('recoagent.reasoning.reasoning_cache').setLevel(logging.DEBUG)

Health Check

# Check cache health
def check_cache_health():
    try:
        # Test basic operations
        test_key = "health_check"
        test_result = {"test": "value"}

        # Store
        cache.store(
            query=test_key,
            result=test_result,
            context={}
        )

        # Retrieve
        retrieved = cache.get(query=test_key, context={})

        # Clean up
        cache.clear()

        if retrieved == test_result:
            print("✅ Cache is healthy")
            return True
        else:
            print("❌ Cache data corruption")
            return False

    except Exception as e:
        print(f"❌ Cache health check failed: {e}")
        return False

API Reference

CacheConfig

| Parameter | Type | Description |
| --- | --- | --- |
| redis_url | str | Redis connection URL |
| ttl_seconds | int | Time-to-live in seconds |
| max_size_mb | int | Maximum cache size in MB |
| enable_compression | bool | Enable LZ4 compression |
| compression_level | int | Compression level (0-9) |

ReasoningCache

| Method | Description |
| --- | --- |
| store(query, result, context, cost) | Store reasoning result |
| get(query, context) | Retrieve reasoning result |
| clear() | Clear all cache entries |
| get_stats() | Get cache statistics |

Statistics

| Metric | Description |
| --- | --- |
| hits | Number of cache hits |
| misses | Number of cache misses |
| hit_rate | Cache hit rate (fraction of lookups served from cache) |
| stores | Number of cache stores |
| evictions | Number of evictions |
| cache_type | Cache backend type |
| current_size | Current cache size |