Rate Limiting System

This example demonstrates the production-grade rate limiting and cost management system implemented in RecoAgent.

Overview

The rate limiting system provides comprehensive API rate limiting with:

  • Token-based throttling with Redis
  • User tier management with quotas
  • Provider-specific pricing and cost tracking
  • Intelligent queuing with priority handling
  • Cost-based throttling and budget enforcement
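
To build intuition for the first bullet, token-based throttling can be sketched as a token bucket. The version below is a minimal in-memory sketch (the real system keeps bucket state in Redis so limits hold across processes; the class and names here are illustrative, not RecoAgent's API):

```python
import time

class TokenBucket:
    """Minimal token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# A bucket with capacity 5 absorbs a burst of 5 requests, then rejects until it refills
bucket = TokenBucket(rate=1.0, capacity=5.0)
results = [bucket.allow() for _ in range(6)]
print(results)  # first 5 allowed, 6th rejected
```

Storing the token count and last-refill timestamp in Redis (rather than in memory) is what makes the limit consistent across multiple API workers.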

Prerequisites

Before running this example, ensure you have:

  1. Redis server running on localhost:6379
  2. Python dependencies installed
  3. Environment variables configured

Start Redis

# Install Redis (Ubuntu/Debian)
sudo apt-get install redis-server

# Start Redis
redis-server

# Or using Docker
docker run -d -p 6379:6379 redis:alpine

Install Dependencies

pip install "redis>=5.0.0"

Running the Example

Basic Example

# Run the rate limiting demo
python examples/rate_limiting_example.py

Expected Output

🚀 RecoAgent Rate Limiting System Demo
==================================================
✅ Rate limiting service initialized

📊 Demo 1: User Tier Management
------------------------------
✅ Set user demo_user_123 to PREMIUM tier
📋 Tier quotas: 100 req/min
📋 Allowed models: {'gpt-4', 'gpt-4-turbo', 'text-embedding-3-large', ...}

🚦 Demo 2: Rate Limiting Check
------------------------------
✅ ALLOWED gpt-3.5-turbo: rate_limit_passed
✅ ALLOWED gpt-4: rate_limit_passed
✅ ALLOWED claude-3-sonnet: rate_limit_passed

💰 Demo 3: Cost Calculation
------------------------------
💵 gpt-3.5-turbo: $0.0020 for 1000 input + 500 output tokens
💵 gpt-4: $0.0600 for 1000 input + 500 output tokens
💵 claude-3-sonnet: $0.0075 for 1000 input + 500 output tokens

📝 Demo 4: Request Recording
------------------------------
✅ Request 1 recorded
✅ Request 2 recorded
✅ Request 3 recorded
✅ Request 4 recorded
✅ Request 5 recorded

📊 Demo 5: Usage Tracking
------------------------------
👤 User: demo_user_123
🏷️ Tier: premium
💵 Daily cost: $0.0050
📈 Total requests: 5

🔍 Demo 6: System Status
------------------------------
📊 Queue status: 5 queues
🏷️ Available tiers: 4
💰 Pricing models: 8

⚡ Demo 7: Burst Traffic Simulation
------------------------------
🚀 Burst test: 20/20 requests allowed

🔄 Demo 8: Tier Auto-Adjustment
------------------------------
🔄 Auto-adjustment: ❌ Not needed

🛡️ Demo 9: Error Handling
------------------------------
🔍 Invalid model test: rate_limit_passed
💰 High cost test: rate_limit_passed

🎉 Demo completed successfully!
==================================================

🔧 Admin Operations Demo
==============================
✅ Updated user admin_demo_user to ENTERPRISE tier
✅ Reset daily usage for user admin_demo_user
🔄 Auto-adjustment check: Not needed
📊 System status retrieved: 4 components

Code Walkthrough

1. Service Initialization

import redis
from packages.rate_limiting import (
    BudgetConfig,
    ProviderPricingManager,
    QueueConfig,
    RateLimitingService,
    UserTierManager,
)

# Initialize Redis client
redis_client = redis.from_url("redis://localhost:6379", decode_responses=True)

# Initialize components
tier_manager = UserTierManager()
pricing_manager = ProviderPricingManager()

# Configure budget
budget_config = BudgetConfig(
    daily_budget=100.0,
    monthly_budget=1000.0,
    per_request_budget=1.0
)

# Configure queuing (defaults shown; see Advanced Usage for custom settings)
queue_config = QueueConfig()

# Initialize rate limiting service
service = RateLimitingService(
    redis_client=redis_client,
    tier_manager=tier_manager,
    pricing_manager=pricing_manager,
    budget_config=budget_config,
    queue_config=queue_config
)

2. User Tier Management

# Set user tier
await service.update_user_tier(user_id, UserTier.PREMIUM)

# Check tier configuration
tier_config = tier_manager.get_user_config(user_id)
print(f"Requests per minute: {tier_config.quotas.requests_per_minute}")
print(f"Allowed models: {list(tier_config.allowed_models)}")

3. Rate Limiting Check

# Check rate limit for a request
result = await service.check_rate_limit(
    user_id=user_id,
    model_name="gpt-4",
    input_tokens=1000,
    output_tokens=500
)

if result.allowed:
    print("✅ Request allowed")
else:
    print(f"❌ Request blocked: {result.reason}")

4. Cost Calculation

# Calculate cost for different models
models = ["gpt-3.5-turbo", "gpt-4", "claude-3-sonnet"]

for model in models:
    cost = pricing_manager.calculate_cost(model, 1000, 500)
    print(f"{model}: ${cost:.4f}")
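
Under the hood, token-based pricing is a simple per-1K-token formula. A hand check with illustrative gpt-4 list prices ($0.03/1K input, $0.06/1K output; confirm against your provider's current price sheet) reproduces the $0.0600 figure from the demo output:

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost = tokens / 1000 * per-1K price, summed over input and output."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)

cost = token_cost(1000, 500, input_price_per_1k=0.03, output_price_per_1k=0.06)
print(f"gpt-4: ${cost:.4f}")  # → gpt-4: $0.0600
```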

5. Request Recording

# Record a completed request
await service.record_request(
    user_id=user_id,
    model_name="gpt-3.5-turbo",
    input_tokens=100,
    output_tokens=50,
    actual_cost=0.001
)

6. Usage Tracking

# Get user usage information
usage = await service.get_user_usage(user_id)
print(f"Daily cost: ${usage['cost_usage']['daily_cost']:.4f}")
print(f"Total requests: {usage['cost_usage']['total_requests']}")

7. Admin Operations

# Update user tier
await service.update_user_tier(user_id, UserTier.ENTERPRISE)

# Reset user usage
await service.reset_user_usage(user_id, "daily")

# Check auto-adjustment
adjusted = await service.check_auto_tier_adjustment(user_id)

Advanced Usage

Custom Configuration

# Custom budget configuration
# Custom budget configuration
budget_config = BudgetConfig(
    daily_budget=500.0,
    monthly_budget=5000.0,
    per_request_budget=5.0,
    soft_threshold=0.8,   # 80% of budget
    hard_threshold=0.95,  # 95% of budget
    enable_fallback=True,
    fallback_models=["gpt-3.5-turbo"]
)

# Custom queue configuration
queue_config = QueueConfig(
    max_queue_size=50000,
    max_processing_requests=500,
    processing_timeout=600.0,  # 10 minutes
    retry_delay_base=2.0,
    max_retry_delay=600.0
)

Custom User Tiers

# Create custom tier
from packages.rate_limiting import TierConfig, TierQuotas, UserTier

custom_quotas = TierQuotas(
    requests_per_minute=200,
    requests_per_hour=2000,
    requests_per_day=20000,
    max_tokens_per_request=4000,
    max_cost_per_request=0.50
)

custom_tier = TierConfig(
    tier=UserTier.ENTERPRISE,
    quotas=custom_quotas,
    allowed_models={"gpt-4", "gpt-4-turbo", "claude-3-opus"},
    allowed_features={"advanced_retrieval", "code_generation"},
    priority=5,
    cost_multiplier=0.8
)

# Update tier configuration
tier_manager.update_tier_config(UserTier.ENTERPRISE, custom_tier)
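
The per-minute/hour/day quotas in `TierQuotas` imply window-based request counting. For intuition, here is a minimal in-memory sliding-window counter (the production system keeps these counts in Redis; the class below is illustrative, not RecoAgent's implementation):

```python
import time
from collections import deque

class SlidingWindowCounter:
    """Allow at most `limit` events per `window_seconds`, evicting expired timestamps."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.events: deque[float] = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

counter = SlidingWindowCounter(limit=200, window_seconds=60)  # 200 req/min
allowed = sum(counter.allow() for _ in range(250))
print(allowed)  # → 200
```

A real deployment checks the minute, hour, and day windows together and rejects the request if any of them is exhausted.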

Custom Provider Pricing

# Add custom model pricing
from packages.rate_limiting import ModelPricing, Provider, ModelType

custom_pricing = ModelPricing(
    provider=Provider.OPENAI,
    model_name="gpt-4-custom",
    model_type=ModelType.CHAT,
    input_price_per_1k=0.02,
    output_price_per_1k=0.04,
    context_window=8192,
    requests_per_minute=1000,
    tokens_per_minute=200000
)

pricing_manager.update_pricing("gpt-4-custom", custom_pricing)

Integration with FastAPI

Basic Integration

from fastapi import FastAPI
from packages.rate_limiting import RateLimitingMiddleware

app = FastAPI()

# Add rate limiting middleware
app.add_middleware(RateLimitingMiddleware, rate_limiting_service=service)

@app.post("/query")
async def query(request: QueryRequest):
    # Rate limiting is handled automatically by the middleware
    return {"answer": "Your response"}

Admin Endpoints

from packages.rate_limiting import AdminRateLimitingMiddleware

admin_middleware = AdminRateLimitingMiddleware(service)

@app.get("/admin/rate-limiting/status")
async def get_status():
    return await admin_middleware.get_system_status()

@app.get("/admin/rate-limiting/users/{user_id}/usage")
async def get_user_usage(user_id: str):
    return await admin_middleware.get_user_usage(user_id)

Monitoring and Observability

Prometheus Metrics

The system exposes Prometheus metrics for monitoring:

# Access metrics endpoint
curl http://localhost:8000/metrics/prometheus

Grafana Dashboards

Create dashboards for:

  • Request rates by tier
  • Cost tracking and budgets
  • Queue performance
  • Rate limit violations

Logging

Structured logging is available:

import structlog

# Configure logging (20 is the numeric level for logging.INFO)
structlog.configure(
    processors=[
        structlog.dev.ConsoleRenderer()
    ],
    wrapper_class=structlog.make_filtering_bound_logger(20),
    logger_factory=structlog.PrintLoggerFactory(),
    cache_logger_on_first_use=True,
)

Troubleshooting

Common Issues

  1. Redis Connection Error

    # Check Redis status
    redis-cli ping

    # Start Redis if not running
    redis-server
  2. Rate Limit False Positives

    • Check tier configuration
    • Verify quota calculations
    • Review burst settings
  3. Cost Calculation Errors

    • Verify provider pricing data
    • Check token counting accuracy
    • Review tier cost multipliers

Debug Mode

Enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Production Deployment

Docker Compose

version: '3.8'
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru

  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - DAILY_BUDGET=1000.0
      - MONTHLY_BUDGET=10000.0
    depends_on:
      - redis

Environment Variables

# Rate Limiting Configuration
REDIS_URL=redis://localhost:6379
DAILY_BUDGET=100.0
MONTHLY_BUDGET=1000.0
PER_REQUEST_BUDGET=1.0
MAX_QUEUE_SIZE=10000
MAX_PROCESSING_REQUESTS=100
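
These variables can be read at startup with the stdlib `os` module. A sketch, assuming the variable names and defaults documented above (the helper function itself is illustrative):

```python
import os

def load_rate_limit_settings() -> dict:
    """Read rate-limiting settings from the environment, falling back to the documented defaults."""
    return {
        "redis_url": os.getenv("REDIS_URL", "redis://localhost:6379"),
        "daily_budget": float(os.getenv("DAILY_BUDGET", "100.0")),
        "monthly_budget": float(os.getenv("MONTHLY_BUDGET", "1000.0")),
        "per_request_budget": float(os.getenv("PER_REQUEST_BUDGET", "1.0")),
        "max_queue_size": int(os.getenv("MAX_QUEUE_SIZE", "10000")),
        "max_processing_requests": int(os.getenv("MAX_PROCESSING_REQUESTS", "100")),
    }

settings = load_rate_limit_settings()
print(settings["daily_budget"])
```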

Next Steps

  1. Customize Tiers: Adjust tier configurations for your use case
  2. Set Budgets: Configure appropriate budget limits
  3. Monitor Usage: Set up monitoring and alerting
  4. Scale: Deploy to production with proper Redis setup
  5. Optimize: Fine-tune based on usage patterns

For more detailed information, see the Rate Limiting Guide.