Deploy RecoAgent to Production
Difficulty: ⭐⭐⭐ Advanced | Time: 3 hours
🎯 The Problem
Your RecoAgent prototype works great locally, but you need to deploy it to production. You're worried about:
- Container configuration and orchestration
- Scaling to handle real traffic
- Monitoring and logging
- Zero-downtime updates
- Security and secrets management
This guide covers deploying RecoAgent to production with Docker and Kubernetes, including monitoring, scaling, and reliability best practices.
⚡ TL;DR - Quick Deploy
# 1. Build Docker image
docker build -t recoagent-api:latest -f infra/Dockerfile.api .
# 2. Run with docker-compose (includes OpenSearch, Redis)
docker-compose -f infra/docker-compose.yml up -d
# 3. Verify it's running
curl http://localhost:8000/health
# Should return: {"status": "healthy"}
# 4. Deploy to K8s (if using Kubernetes)
kubectl apply -f infra/k8s/
kubectl get pods -l app=recoagent
# Expected: All pods running, health check passing
Result: Production API running with monitoring and auto-scaling!
Full Deployment Guide
Architecture Overview
Prerequisites
- Docker and Docker Compose installed
- Kubernetes cluster (for K8s deployment)
- kubectl configured
- Domain name and SSL certificates
- Cloud provider account (AWS/Azure/GCP)
- API keys for LLM providers
Step 1: Containerize Your Application
Create Dockerfile
The example Dockerfile is at `infra/Dockerfile.api`. Key features:
FROM python:3.11-slim

WORKDIR /app

# curl is needed for the HEALTHCHECK below (slim images don't include it)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY packages/ packages/
COPY apps/api/ apps/api/

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s \
    CMD curl -f http://localhost:8000/health || exit 1

# Run with gunicorn for production
CMD ["gunicorn", "apps.api.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]
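The slim base image keeps the image small; pairing it with a `.dockerignore` keeps the build context small and prevents local secrets from being copied into the image. A suggested sketch (entries are assumptions; adjust to your repo layout):

```
.git
.env*
__pycache__/
*.pyc
tests/
docs/
```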
Build and Test Locally
# Build image
docker build -t recoagent-api:latest -f infra/Dockerfile.api .
# Test locally
docker run -p 8000:8000 \
-e OPENAI_API_KEY="your-key" \
-e OPENSEARCH_ENDPOINT="http://host.docker.internal:9200" \
recoagent-api:latest
# Test health endpoint
curl http://localhost:8000/health
# Should return: {"status": "healthy", "version": "1.0.0"}
# Test query endpoint
curl -X POST http://localhost:8000/api/query \
-H "Content-Type: application/json" \
-d '{"query": "What is RecoAgent?"}'
Step 2: Docker Compose for Local Staging
Complete Stack Setup
The `infra/docker-compose.yml` includes:
version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: infra/Dockerfile.api
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENSEARCH_ENDPOINT=http://opensearch:9200
      - REDIS_URL=redis://redis:6379
    depends_on:
      - opensearch
      - redis
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment:
      - discovery.type=single-node
      - DISABLE_SECURITY_PLUGIN=true
    ports:
      - "9200:9200"
    volumes:
      - opensearch-data:/usr/share/opensearch/data

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  opensearch-data:
  redis-data:
  grafana-data:
Deploy with Docker Compose
# Start all services
docker-compose -f infra/docker-compose.yml up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f api
# Test the stack
curl http://localhost:8000/health
curl http://localhost:9090 # Prometheus
curl http://localhost:3001 # Grafana
Step 3: Kubernetes Deployment
Deploy to K8s
# Apply all manifests
kubectl apply -f infra/k8s/namespace.yaml
kubectl apply -f infra/k8s/secrets.yaml
kubectl apply -f infra/k8s/configmap.yaml
kubectl apply -f infra/k8s/deployment.yaml
kubectl apply -f infra/k8s/service.yaml
kubectl apply -f infra/k8s/ingress.yaml
kubectl apply -f infra/k8s/hpa.yaml # Horizontal Pod Autoscaler
# Check deployment
kubectl get pods -n recoagent
kubectl get svc -n recoagent
kubectl get hpa -n recoagent
# Check logs
kubectl logs -f deployment/recoagent-api -n recoagent
Key K8s Resources
Deployment with Auto-Scaling:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recoagent-api
  namespace: recoagent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: recoagent-api
  template:
    metadata:
      labels:
        app: recoagent-api   # must match the selector above
    spec:
      containers:
        - name: api
          image: recoagent-api:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
Horizontal Pod Autoscaler:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: recoagent-api-hpa
  namespace: recoagent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: recoagent-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Step 4: Configuration & Secrets
Environment Variables
Create `.env.production`:
# LLM Configuration
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key
# Vector Store
OPENSEARCH_ENDPOINT=https://your-cluster.amazonaws.com
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=your-secure-password
# Caching
REDIS_URL=redis://redis-cluster:6379
REDIS_PASSWORD=your-redis-password
# Observability
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=production
# Application
LOG_LEVEL=INFO
MAX_WORKERS=4
REQUEST_TIMEOUT=60
COST_LIMIT=0.10
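A missing variable here tends to surface only at request time, so it can help to fail fast at startup. A minimal sketch (the `REQUIRED_VARS` list mirrors the file above; `missing_env_vars` and `validate_env` are hypothetical helpers, not part of RecoAgent):

```python
import os

REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "OPENSEARCH_ENDPOINT",
    "REDIS_URL",
]

def missing_env_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

def validate_env(env=os.environ):
    """Raise at startup instead of failing on the first real request."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Call `validate_env()` before the app starts serving traffic so a misconfigured pod fails its readiness probe immediately.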
Kubernetes Secrets
# Create secrets from .env file
kubectl create secret generic recoagent-secrets \
--from-env-file=.env.production \
-n recoagent
# Or create individual secrets
kubectl create secret generic openai-key \
--from-literal=apikey='your-key' \
-n recoagent
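Either way, the Deployment can load the whole secret as environment variables via `envFrom`, so adding a key doesn't require a manifest change. A sketch (excerpt; field names follow the Deployment shown earlier):

```yaml
# infra/k8s/deployment.yaml (excerpt)
containers:
  - name: api
    image: recoagent-api:latest
    envFrom:
      - secretRef:
          name: recoagent-secrets
```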
Step 5: Monitoring & Logging
Prometheus Metrics
# Metrics are automatically exposed at /metrics endpoint
# Configure Prometheus to scrape:
# prometheus.yml
scrape_configs:
  - job_name: 'recoagent'
    static_configs:
      - targets: ['recoagent-api:8000']
    metrics_path: '/metrics'
    scrape_interval: 15s
Grafana Dashboards
Key metrics to track:
| Metric | Alert Threshold | Action |
|---|---|---|
| Request Latency (p95) | > 2s | Scale up or optimize |
| Error Rate | > 5% | Check logs, roll back if needed |
| Cost per Query | > $0.10 | Review cost optimization |
| Cache Hit Rate | < 30% | Check cache TTL settings |
| Vector Search Latency | > 200ms | Check OpenSearch health |
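The latency and error thresholds can also be encoded as Prometheus alerting rules. A sketch, assuming the API exposes `http_request_duration_seconds` and `http_requests_total` (the metric names are assumptions; check your actual `/metrics` output):

```yaml
# alerts.yml (sketch -- metric names are assumptions)
groups:
  - name: recoagent
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 5m
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
```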
Logging Configuration
# apps/api/main.py
from packages.observability import StructuredLogger

logger = StructuredLogger(
    service_name="recoagent-api",
    environment="production",
    log_level="INFO",
)
# Logs include:
# - Request ID for tracing
# - User ID for analytics
# - Latency, cost, model used
# - Retrieved documents count
# - Error details if any
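`StructuredLogger` is project-specific, but the underlying pattern is one JSON object per log line. A minimal stdlib sketch of that pattern (the class and field names are illustrative, not RecoAgent's actual implementation):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    # Extra fields attached via logging's `extra=` mechanism
    EXTRA_FIELDS = ("request_id", "user_id", "latency_ms", "cost_usd")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "service": "recoagent-api",
        }
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)
```

Usage: attach it to a handler, then log with `logger.info("query ok", extra={"request_id": rid, "latency_ms": 840})`, and each line is machine-parseable by your log pipeline.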
Step 6: Scaling & Performance
Horizontal Pod Autoscaler Settings
# Scale based on:
# - CPU: Target 70% utilization
# - Memory: Target 80% utilization
# - Custom: Requests per second
# Scaling behavior:
# - Scale up: When CPU > 70% for 30 seconds
# - Scale down: When CPU < 50% for 5 minutes
# - Min replicas: 3 (for high availability)
# - Max replicas: 10 (cost control)
Performance Tuning
| Setting | Development | Production | Why |
|---|---|---|---|
| Workers | 1 | 4-8 | Handle concurrent requests |
| Timeout | 120s | 60s | Prevent hanging requests |
| Cache TTL | 300s | 3600s | Balance freshness vs cost |
| Max tokens | 2000 | 1500 | Control costs |
| Retry attempts | 1 | 3 | Handle transient errors |
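The retry setting assumes transient errors are retried with backoff rather than immediately. A minimal sketch of that pattern (the `retry` helper is hypothetical, not a RecoAgent API):

```python
import time

def retry(fn, attempts=3, base_delay=0.5,
          retriable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn, retrying transient errors with exponential backoff.

    Non-retriable exceptions propagate immediately; the last retriable
    failure is re-raised once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

The `sleep` parameter is injectable so tests don't actually wait; in production, consider adding jitter so retries from many pods don't synchronize.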
Step 7: Security Checklist
- ✅ API keys stored in secrets (not committed to the repo or baked into images)
- ✅ TLS/SSL enabled on all endpoints
- ✅ Rate limiting configured (100 req/min per user)
- ✅ Input sanitization enabled
- ✅ Output filtering for PII
- ✅ Network policies restrict pod communication
- ✅ RBAC configured for K8s access
- ✅ Security scanning in CI/CD pipeline
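For the rate-limiting item, a per-user token bucket is a common starting point. A sketch only (in production you would typically back this with Redis so the limit holds across replicas):

```python
import time

class TokenBucket:
    """Simple token bucket: allows `rate` requests per `per` seconds."""

    def __init__(self, rate=100, per=60.0, clock=time.monotonic):
        self.capacity = float(rate)
        self.tokens = float(rate)
        self.fill_rate = rate / per  # tokens added per second
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Consume one token if available; refill based on elapsed time."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keep one bucket per user ID (e.g. in a dict or Redis hash) and return HTTP 429 when `allow()` is False.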
Step 8: CI/CD Pipeline
GitHub Actions Example
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Tests
        run: |
          pip install -r requirements.txt
          pytest tests/ --cov=packages
      - name: Build Docker Image
        run: |
          docker build -t recoagent-api:${{ github.sha }} -f infra/Dockerfile.api .
          docker tag recoagent-api:${{ github.sha }} recoagent-api:latest
      - name: Push to Registry
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker push recoagent-api:${{ github.sha }}
          docker push recoagent-api:latest
      - name: Deploy to K8s
        run: |
          kubectl set image deployment/recoagent-api api=recoagent-api:${{ github.sha }} -n recoagent
          kubectl rollout status deployment/recoagent-api -n recoagent
Step 9: Health Checks & Readiness
Health Check Endpoint
# apps/api/main.py
from datetime import datetime, timezone

from fastapi.responses import JSONResponse

@app.get("/health")
async def health_check():
    """Comprehensive health check"""
    checks = {
        "api": "healthy",
        "opensearch": await check_opensearch(),
        "redis": await check_redis(),
        "llm": await check_llm_api(),
    }
    all_healthy = all(v == "healthy" for v in checks.values())
    status_code = 200 if all_healthy else 503
    return JSONResponse(
        content={
            "status": "healthy" if all_healthy else "degraded",
            "checks": checks,
            "version": "1.0.0",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        status_code=status_code,
    )
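Each dependency probe should itself be time-bounded, so one slow backend cannot hang `/health` past the Kubernetes probe timeouts. A sketch (`guarded_check` is a hypothetical wrapper around probes like `check_redis`):

```python
import asyncio

async def guarded_check(probe, timeout=2.0):
    """Run a dependency probe with a deadline.

    Maps timeouts and errors to 'unhealthy' so the health endpoint
    always answers quickly instead of propagating a hang or exception.
    """
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        return "healthy"
    except Exception:
        return "unhealthy"
```

Usage: `"redis": await guarded_check(check_redis)` in the `checks` dict keeps the whole endpoint bounded by the slowest single probe timeout.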
Step 10: Rollback Strategy
Quick Rollback
# If deployment fails or has issues:
# Option 1: Rollback in K8s
kubectl rollout undo deployment/recoagent-api -n recoagent
# Option 2: Redeploy previous version
kubectl set image deployment/recoagent-api api=recoagent-api:previous-sha -n recoagent
# Option 3: Scale down bad deployment
kubectl scale deployment/recoagent-api --replicas=0 -n recoagent
# Then fix and redeploy
Deployment Best Practices
| Practice | Why | How |
|---|---|---|
| Blue-Green Deployment | Zero downtime | Deploy to "green", switch traffic, keep "blue" for rollback |
| Canary Releases | Test with small traffic | Route 5% to new version, monitor, then 100% |
| Database Migrations | Backward compatibility | Run migrations before code deploy, test rollback |
| Feature Flags | Safe rollout | Enable new features gradually per user segment |
Success Criteria
After deployment, verify:
- ✅ All pods are running and ready
- ✅ Health check returns 200 OK
- ✅ Can query the API successfully
- ✅ Metrics appearing in Prometheus
- ✅ Logs flowing to your logging system
- ✅ Auto-scaling works (test with load)
- ✅ SSL certificate valid
- ✅ Rate limiting enforced
Troubleshooting Production Issues
| Issue | Symptoms | Solution |
|---|---|---|
| Pods CrashLooping | Pod restarts repeatedly | Check logs (`kubectl logs <pod>`); common causes: missing secrets, OOM |
| 503 Errors | Service unavailable | Check health endpoint, verify backends are up |
| High Latency | Slow responses | Check OpenSearch performance, add caching, scale up |
| Out of Memory | Pods killed | Increase memory limits, reduce context size |
| ImagePullBackOff | Can't pull image | Check registry credentials, image name |
Monitoring Dashboard
Key metrics to display:
Production Dashboard (Grafana):
┌─────────────────────────────────────────────┐
│ Requests/min: 850 (▲ 12% from avg) │
│ Avg Latency: 1.2s (target: <2s) ✅ │
│ Error Rate: 0.8% (target: <5%) ✅ │
│ Cost/hour: $12 (budget: $15) ✅ │
└─────────────────────────────────────────────┘
┌─── Pods ───────────────────────────────────┐
│ Running: 5/5 ✅ │
│ CPU: 65% ✅ (target: <80%) │
│ Memory: 72% ✅ (target: <85%) │
│ Restarts: 0 ✅ │
└─────────────────────────────────────────────┘
┌─── Dependencies ───────────────────────────┐
│ OpenSearch: ✅ Healthy (45ms avg) │
│ Redis: ✅ Healthy (2ms avg) │
│ OpenAI API: ✅ Healthy │
└─────────────────────────────────────────────┘
What You've Accomplished
✅ Containerized your RecoAgent application
✅ Deployed with Docker Compose for local staging
✅ Orchestrated with Kubernetes for production
✅ Configured auto-scaling based on load
✅ Set up comprehensive monitoring and logging
✅ Implemented health checks and rollback strategy
✅ Secured with secrets management and network policies
Next Steps
- 🔒 Handle Authentication - Secure your API
- 🛡️ Implement Guardrails - Add safety controls
- 📊 Production Monitoring Best Practices - Advanced observability
- 🚀 Scale for High Traffic - Handle millions of requests