# Rate Limiting System
This example demonstrates the production-grade rate limiting and cost management system implemented in RecoAgent.
## Overview
The rate limiting system provides comprehensive API rate limiting with:
- Token-based throttling with Redis
- User tier management with quotas
- Provider-specific pricing and cost tracking
- Intelligent queuing with priority handling
- Cost-based throttling and budget enforcement
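The "token-based throttling" above is typically a token bucket: each user accrues tokens at a steady rate up to a burst capacity, and a request is allowed only if a token is available. The real system persists bucket state in Redis so it works across processes; the in-memory sketch below (all names illustrative, not RecoAgent's actual API) just shows the algorithm.

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket; the real system stores state in Redis."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# 100 requests/minute with a burst capacity of 10
bucket = TokenBucket(capacity=10, refill_rate=100 / 60)
results = [bucket.allow() for _ in range(15)]
print(results.count(True))  # → 10: the burst passes, the rest are throttled
```

In production the refill-and-check step is usually done atomically in Redis (for example with a Lua script) so concurrent API workers cannot double-spend tokens.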
## Prerequisites
Before running this example, ensure you have:
- Redis server running on localhost:6379
- Python dependencies installed
- Environment variables configured
### Start Redis

```bash
# Install Redis (Ubuntu/Debian)
sudo apt-get install redis-server

# Start Redis
redis-server

# Or using Docker
docker run -d -p 6379:6379 redis:alpine
```
### Install Dependencies

```bash
pip install "redis>=5.0.0"
```

(The quotes keep the shell from interpreting `>=` as a redirection.)
## Running the Example
### Basic Example

```bash
# Run the rate limiting demo
python examples/rate_limiting_example.py
```
### Expected Output
```text
🚀 RecoAgent Rate Limiting System Demo
==================================================
✅ Rate limiting service initialized

📊 Demo 1: User Tier Management
------------------------------
✅ Set user demo_user_123 to PREMIUM tier
📋 Tier quotas: 100 req/min
📋 Allowed models: {'gpt-4', 'gpt-4-turbo', 'text-embedding-3-large', ...}

🚦 Demo 2: Rate Limiting Check
------------------------------
✅ ALLOWED gpt-3.5-turbo: rate_limit_passed
✅ ALLOWED gpt-4: rate_limit_passed
✅ ALLOWED claude-3-sonnet: rate_limit_passed

💰 Demo 3: Cost Calculation
------------------------------
💵 gpt-3.5-turbo: $0.0020 for 1000 input + 500 output tokens
💵 gpt-4: $0.0600 for 1000 input + 500 output tokens
💵 claude-3-sonnet: $0.0075 for 1000 input + 500 output tokens

📝 Demo 4: Request Recording
------------------------------
✅ Request 1 recorded
✅ Request 2 recorded
✅ Request 3 recorded
✅ Request 4 recorded
✅ Request 5 recorded

📊 Demo 5: Usage Tracking
------------------------------
👤 User: demo_user_123
🏷️ Tier: premium
💵 Daily cost: $0.0050
📈 Total requests: 5

🔍 Demo 6: System Status
------------------------------
📊 Queue status: 5 queues
🏷️ Available tiers: 4
💰 Pricing models: 8

⚡ Demo 7: Burst Traffic Simulation
------------------------------
🚀 Burst test: 20/20 requests allowed

🔄 Demo 8: Tier Auto-Adjustment
------------------------------
🔄 Auto-adjustment: ❌ Not needed

🛡️ Demo 9: Error Handling
------------------------------
🔍 Invalid model test: rate_limit_passed
💰 High cost test: rate_limit_passed

🎉 Demo completed successfully!
==================================================

🔧 Admin Operations Demo
==============================
✅ Updated user admin_demo_user to ENTERPRISE tier
✅ Reset daily usage for user admin_demo_user
🔄 Auto-adjustment check: Not needed
📊 System status retrieved: 4 components
```
## Code Walkthrough
### 1. Service Initialization

```python
# Initialize Redis client
redis_client = redis.from_url("redis://localhost:6379", decode_responses=True)

# Initialize components
tier_manager = UserTierManager()
pricing_manager = ProviderPricingManager()

# Configure budget
budget_config = BudgetConfig(
    daily_budget=100.0,
    monthly_budget=1000.0,
    per_request_budget=1.0
)

# Initialize rate limiting service (queue_config is shown under Advanced Usage)
service = RateLimitingService(
    redis_client=redis_client,
    tier_manager=tier_manager,
    pricing_manager=pricing_manager,
    budget_config=budget_config,
    queue_config=queue_config
)
```
### 2. User Tier Management

```python
# Set user tier
await service.update_user_tier(user_id, UserTier.PREMIUM)

# Check tier configuration
tier_config = tier_manager.get_user_config(user_id)
print(f"Requests per minute: {tier_config.quotas.requests_per_minute}")
print(f"Allowed models: {list(tier_config.allowed_models)}")
```
### 3. Rate Limiting Check

```python
# Check rate limit for a request
result = await service.check_rate_limit(
    user_id=user_id,
    model_name="gpt-4",
    input_tokens=1000,
    output_tokens=500
)

if result.allowed:
    print("✅ Request allowed")
else:
    print(f"❌ Request blocked: {result.reason}")
```
### 4. Cost Calculation

```python
# Calculate cost for different models
models = ["gpt-3.5-turbo", "gpt-4", "claude-3-sonnet"]
for model in models:
    cost = pricing_manager.calculate_cost(model, 1000, 500)
    print(f"{model}: ${cost:.4f}")
```
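Under the hood this is linear per-token pricing: each side of the request is billed per 1,000 tokens at the provider's rate. A standalone sketch with a hypothetical price table (the real values live in `ProviderPricingManager`, and the figures below are illustrative, not current provider prices):

```python
# Hypothetical (input, output) USD prices per 1k tokens — illustrative only
PRICES = {
    "gpt-3.5-turbo": (0.001, 0.002),
    "gpt-4": (0.03, 0.06),
}

def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Linear pricing: each side is billed per 1,000 tokens."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

# 1000 input + 500 output tokens on gpt-4: 1.0*0.03 + 0.5*0.06
print(f"${calculate_cost('gpt-4', 1000, 500):.4f}")  # → $0.0600
```

This reproduces the `$0.0600` figure from the demo output above; a tier's `cost_multiplier` would then scale the result.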
### 5. Request Recording

```python
# Record a completed request
await service.record_request(
    user_id=user_id,
    model_name="gpt-3.5-turbo",
    input_tokens=100,
    output_tokens=50,
    actual_cost=0.001
)
```
### 6. Usage Tracking

```python
# Get user usage information
usage = await service.get_user_usage(user_id)
print(f"Daily cost: ${usage['cost_usage']['daily_cost']:.4f}")
print(f"Total requests: {usage['cost_usage']['total_requests']}")
```
### 7. Admin Operations

```python
# Update user tier
await service.update_user_tier(user_id, UserTier.ENTERPRISE)

# Reset user usage
await service.reset_user_usage(user_id, "daily")

# Check auto-adjustment
adjusted = await service.check_auto_tier_adjustment(user_id)
```
## Advanced Usage
### Custom Configuration

```python
# Custom budget configuration
budget_config = BudgetConfig(
    daily_budget=500.0,
    monthly_budget=5000.0,
    per_request_budget=5.0,
    soft_threshold=0.8,   # 80% of budget
    hard_threshold=0.95,  # 95% of budget
    enable_fallback=True,
    fallback_models=["gpt-3.5-turbo"]
)

# Custom queue configuration
queue_config = QueueConfig(
    max_queue_size=50000,
    max_processing_requests=500,
    processing_timeout=600.0,  # 10 minutes
    retry_delay_base=2.0,
    max_retry_delay=600.0
)
```
### Custom User Tiers

```python
from packages.rate_limiting import TierConfig, TierQuotas, UserTier

# Create custom tier
custom_quotas = TierQuotas(
    requests_per_minute=200,
    requests_per_hour=2000,
    requests_per_day=20000,
    max_tokens_per_request=4000,
    max_cost_per_request=0.50
)

custom_tier = TierConfig(
    tier=UserTier.ENTERPRISE,
    quotas=custom_quotas,
    allowed_models={"gpt-4", "gpt-4-turbo", "claude-3-opus"},
    allowed_features={"advanced_retrieval", "code_generation"},
    priority=5,
    cost_multiplier=0.8
)

# Update tier configuration
tier_manager.update_tier_config(UserTier.ENTERPRISE, custom_tier)
```
### Custom Provider Pricing

```python
from packages.rate_limiting import ModelPricing, Provider, ModelType

# Add custom model pricing
custom_pricing = ModelPricing(
    provider=Provider.OPENAI,
    model_name="gpt-4-custom",
    model_type=ModelType.CHAT,
    input_price_per_1k=0.02,
    output_price_per_1k=0.04,
    context_window=8192,
    requests_per_minute=1000,
    tokens_per_minute=200000
)

pricing_manager.update_pricing("gpt-4-custom", custom_pricing)
```
## Integration with FastAPI
### Basic Integration

```python
from fastapi import FastAPI
from packages.rate_limiting import RateLimitingMiddleware

app = FastAPI()

# Add rate limiting middleware
app.add_middleware(RateLimitingMiddleware, rate_limiting_service=service)

@app.post("/query")
async def query(request: QueryRequest):
    # Rate limiting is handled automatically by the middleware
    return {"answer": "Your response"}
```
### Admin Endpoints

```python
from packages.rate_limiting import AdminRateLimitingMiddleware

admin_middleware = AdminRateLimitingMiddleware(service)

@app.get("/admin/rate-limiting/status")
async def get_status():
    return await admin_middleware.get_system_status()

@app.get("/admin/rate-limiting/users/{user_id}/usage")
async def get_user_usage(user_id: str):
    return await admin_middleware.get_user_usage(user_id)
```
## Monitoring and Observability
### Prometheus Metrics

The system exposes Prometheus metrics for monitoring:

```bash
# Access the metrics endpoint
curl http://localhost:8000/metrics/prometheus
```
### Grafana Dashboards
Create dashboards for:
- Request rates by tier
- Cost tracking and budgets
- Queue performance
- Rate limit violations
### Logging

Structured logging is available:

```python
import logging
import structlog

# Configure logging (filter at INFO level and above)
structlog.configure(
    processors=[
        structlog.dev.ConsoleRenderer()
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    logger_factory=structlog.PrintLoggerFactory(),
    cache_logger_on_first_use=True,
)
```
## Troubleshooting
### Common Issues

1. **Redis Connection Error**

   ```bash
   # Check Redis status
   redis-cli ping

   # Start Redis if not running
   redis-server
   ```

2. **Rate Limit False Positives**
   - Check tier configuration
   - Verify quota calculations
   - Review burst settings

3. **Cost Calculation Errors**
   - Verify provider pricing data
   - Check token counting accuracy
   - Review tier cost multipliers
### Debug Mode

Enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## Production Deployment
Docker Compose
version: '3.8'
services:
redis:
image: redis:7-alpine
ports:
- "6379:6379"
command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru
api:
build: .
ports:
- "8000:8000"
environment:
- REDIS_URL=redis://redis:6379
- DAILY_BUDGET=1000.0
- MONTHLY_BUDGET=10000.0
depends_on:
- redis
### Environment Variables

```bash
# Rate Limiting Configuration
REDIS_URL=redis://localhost:6379
DAILY_BUDGET=100.0
MONTHLY_BUDGET=1000.0
PER_REQUEST_BUDGET=1.0
MAX_QUEUE_SIZE=10000
MAX_PROCESSING_REQUESTS=100
```
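One way to wire these variables into the service at startup is a small settings loader; this is a sketch (the helper name and dict shape are illustrative, not part of RecoAgent's API), with the defaults from the listing above:

```python
import os

def load_rate_limit_settings() -> dict:
    """Read rate-limiting settings from the environment, falling back
    to the documented defaults. Numeric values are parsed explicitly
    since environment variables are always strings."""
    return {
        "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379"),
        "daily_budget": float(os.environ.get("DAILY_BUDGET", "100.0")),
        "monthly_budget": float(os.environ.get("MONTHLY_BUDGET", "1000.0")),
        "per_request_budget": float(os.environ.get("PER_REQUEST_BUDGET", "1.0")),
        "max_queue_size": int(os.environ.get("MAX_QUEUE_SIZE", "10000")),
        "max_processing_requests": int(os.environ.get("MAX_PROCESSING_REQUESTS", "100")),
    }

os.environ["DAILY_BUDGET"] = "250.0"
settings = load_rate_limit_settings()
print(settings["daily_budget"])  # → 250.0
```

The parsed values can then be passed straight into `BudgetConfig` and `QueueConfig` during service initialization.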
## Next Steps
- Customize Tiers: Adjust tier configurations for your use case
- Set Budgets: Configure appropriate budget limits
- Monitor Usage: Set up monitoring and alerting
- Scale: Deploy to production with proper Redis setup
- Optimize: Fine-tune based on usage patterns
For more detailed information, see the Rate Limiting Guide.