Deploy RecoAgent to Production
Difficulty: ⭐⭐⭐ Advanced | Time: 3 hours
🎯 The Problem
Your RecoAgent prototype works great locally, but you need to deploy it to production. You're worried about:
- Container configuration and orchestration
- Scaling to handle real traffic
- Monitoring and logging
- Zero-downtime updates
- Security and secrets management
This guide covers deploying RecoAgent to production with Docker and Kubernetes, including monitoring, scaling, and reliability best practices.
⚡ TL;DR - Quick Deploy
# 1. Build Docker image
docker build -t recoagent-api:latest -f infra/Dockerfile.api .
# 2. Run with docker-compose (includes OpenSearch, Redis)
docker-compose -f infra/docker-compose.yml up -d
# 3. Verify it's running
curl http://localhost:8000/health
# Should return: {"status": "healthy"}
# 4. Deploy to K8s (if using Kubernetes)
kubectl apply -f infra/k8s/
kubectl get pods -l app=recoagent
# Expected: All pods running, health check passing
Result: Production API running with monitoring and auto-scaling!
Full Deployment Guide
Architecture Overview
Prerequisites
- Docker and Docker Compose installed
- Kubernetes cluster (for K8s deployment)
- kubectl configured
- Domain name and SSL certificates
- Cloud provider account (AWS/Azure/GCP)
- API keys for LLM providers
Step 1: Containerize Your Application
Create Dockerfile
The example Dockerfile is at `infra/Dockerfile.api`. Key features:
FROM python:3.11-slim

WORKDIR /app

# curl is needed for the HEALTHCHECK below (slim images don't include it)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY packages/ packages/
COPY apps/api/ apps/api/

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s \
    CMD curl -f http://localhost:8000/health || exit 1

# Run with gunicorn for production
CMD ["gunicorn", "apps.api.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]
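The slim base image keeps the image small; pairing it with a `.dockerignore` keeps the build context small and prevents local secrets from being copied into the image. A suggested sketch (entries are assumptions; adjust to your repo layout):

```
.git
.env*
__pycache__/
*.pyc
tests/
docs/
```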
Build and Test Locally
# Build image
docker build -t recoagent-api:latest -f infra/Dockerfile.api .
# Test locally
docker run -p 8000:8000 \
-e OPENAI_API_KEY="your-key" \
-e OPENSEARCH_ENDPOINT="http://host.docker.internal:9200" \
recoagent-api:latest
# Test health endpoint
curl http://localhost:8000/health
# Should return: {"status": "healthy", "version": "1.0.0"}
# Test query endpoint
curl -X POST http://localhost:8000/api/query \
-H "Content-Type: application/json" \
-d '{"query": "What is RecoAgent?"}'
Step 2: Docker Compose for Local Staging
Complete Stack Setup
The `infra/docker-compose.yml` includes:
version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: infra/Dockerfile.api
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENSEARCH_ENDPOINT=http://opensearch:9200
      - REDIS_URL=redis://redis:6379
    depends_on:
      - opensearch
      - redis
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment:
      - discovery.type=single-node
      - DISABLE_SECURITY_PLUGIN=true
    ports:
      - "9200:9200"
    volumes:
      - opensearch-data:/usr/share/opensearch/data

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  opensearch-data:
  redis-data:
  grafana-data:
Deploy with Docker Compose
# Start all services
docker-compose -f infra/docker-compose.yml up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f api
# Test the stack
curl http://localhost:8000/health
curl http://localhost:9090 # Prometheus
curl http://localhost:3001 # Grafana
Step 3: Kubernetes Deployment
Deploy to K8s
# Apply all manifests
kubectl apply -f infra/k8s/namespace.yaml
kubectl apply -f infra/k8s/secrets.yaml
kubectl apply -f infra/k8s/configmap.yaml
kubectl apply -f infra/k8s/deployment.yaml
kubectl apply -f infra/k8s/service.yaml
kubectl apply -f infra/k8s/ingress.yaml
kubectl apply -f infra/k8s/hpa.yaml # Horizontal Pod Autoscaler
# Check deployment
kubectl get pods -n recoagent
kubectl get svc -n recoagent
kubectl get hpa -n recoagent
# Check logs
kubectl logs -f deployment/recoagent-api -n recoagent
Key K8s Resources
Deployment with Auto-Scaling:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recoagent-api
  namespace: recoagent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: recoagent-api
  template:
    metadata:
      labels:
        app: recoagent-api   # must match the selector above
    spec:
      containers:
        - name: api
          image: recoagent-api:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
Horizontal Pod Autoscaler:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: recoagent-api-hpa
  namespace: recoagent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: recoagent-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Step 4: Configuration & Secrets
Environment Variables
Create `.env.production`:
# LLM Configuration
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key
# Vector Store
OPENSEARCH_ENDPOINT=https://your-cluster.amazonaws.com
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=your-secure-password
# Caching
REDIS_URL=redis://redis-cluster:6379
REDIS_PASSWORD=your-redis-password
# Observability
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=production
# Application
LOG_LEVEL=INFO
MAX_WORKERS=4
REQUEST_TIMEOUT=60
COST_LIMIT=0.10
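A missing variable here tends to surface only at request time, so it can help to fail fast at startup. A minimal sketch (the `REQUIRED_VARS` list mirrors the file above; `missing_env_vars` and `validate_env` are hypothetical helpers, not part of RecoAgent):

```python
import os

REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "OPENSEARCH_ENDPOINT",
    "REDIS_URL",
]

def missing_env_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

def validate_env(env=os.environ):
    """Raise at startup instead of failing on the first real request."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Call `validate_env()` before the app starts serving traffic so a misconfigured pod fails its readiness probe immediately.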
Kubernetes Secrets
# Create secrets from .env file
kubectl create secret generic recoagent-secrets \
--from-env-file=.env.production \
-n recoagent
# Or create individual secrets
kubectl create secret generic openai-key \
--from-literal=apikey='your-key' \
-n recoagent
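Either way, the Deployment can load the whole secret as environment variables via `envFrom`, so adding a key doesn't require a manifest change. A sketch (excerpt; field names follow the Deployment shown earlier):

```yaml
# infra/k8s/deployment.yaml (excerpt)
containers:
  - name: api
    image: recoagent-api:latest
    envFrom:
      - secretRef:
          name: recoagent-secrets
```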
Step 5: Monitoring & Logging
Prometheus Metrics
# Metrics are automatically exposed at /metrics endpoint
# Configure Prometheus to scrape:
# prometheus.yml
scrape_configs:
  - job_name: 'recoagent'
    static_configs:
      - targets: ['recoagent-api:8000']
    metrics_path: '/metrics'
    scrape_interval: 15s
Grafana Dashboards
Key metrics to track:
| Metric | Alert Threshold | Action |
|---|---|---|
| Request Latency (p95) | > 2s | Scale up or optimize |
| Error Rate | > 5% | Check logs, roll back if needed |
| Cost per Query | > $0.10 | Review cost optimization |
| Cache Hit Rate | < 30% | Check cache TTL settings |
| Vector Search Latency | > 200ms | Check OpenSearch health |
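The latency and error thresholds can also be encoded as Prometheus alerting rules. A sketch, assuming the API exposes `http_request_duration_seconds` and `http_requests_total` (the metric names are assumptions; check your actual `/metrics` output):

```yaml
# alerts.yml (sketch -- metric names are assumptions)
groups:
  - name: recoagent
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 5m
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
```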
Logging Configuration
# apps/api/main.py
from packages.observability import StructuredLogger

logger = StructuredLogger(
    service_name="recoagent-api",
    environment="production",
    log_level="INFO",
)
# Logs include:
# - Request ID for tracing
# - User ID for analytics
# - Latency, cost, model used
# - Retrieved documents count
# - Error details if any
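`StructuredLogger` is project-specific, but the underlying pattern is one JSON object per log line. A minimal stdlib sketch of that pattern (the class and field names are illustrative, not RecoAgent's actual implementation):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    # Extra fields attached via logging's `extra=` mechanism
    EXTRA_FIELDS = ("request_id", "user_id", "latency_ms", "cost_usd")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "service": "recoagent-api",
        }
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)
```

Usage: attach it to a handler, then log with `logger.info("query ok", extra={"request_id": rid, "latency_ms": 840})`, and each line is machine-parseable by your log pipeline.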
Step 6: Scaling & Performance
Horizontal Pod Autoscaler Settings
# Scale based on:
# - CPU: Target 70% utilization
# - Memory: Target 80% utilization
# - Custom: Requests per second
# Scaling behavior:
# - Scale up: When CPU > 70% for 30 seconds
# - Scale down: When CPU < 50% for 5 minutes
# - Min replicas: 3 (for high availability)
# - Max replicas: 10 (cost control)
Performance Tuning
| Setting | Development | Production | Why |
|---|---|---|---|
| Workers | 1 | 4-8 | Handle concurrent requests |
| Timeout | 120s | 60s | Prevent hanging requests |
| Cache TTL | 300s | 3600s | Balance freshness vs cost |
| Max tokens | 2000 | 1500 | Control costs |
| Retry attempts | 1 | 3 | Handle transient errors |
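The retry setting assumes transient errors are retried with backoff rather than immediately. A minimal sketch of that pattern (the `retry` helper is hypothetical, not a RecoAgent API):

```python
import time

def retry(fn, attempts=3, base_delay=0.5,
          retriable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn, retrying transient errors with exponential backoff.

    Non-retriable exceptions propagate immediately; the last retriable
    failure is re-raised once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

The `sleep` parameter is injectable so tests don't actually wait; in production, consider adding jitter so retries from many pods don't synchronize.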
Step 7: Security Checklist
- ✅ API keys stored in secrets (not committed to the repo or baked into images)
- ✅ TLS/SSL enabled on all endpoints
- ✅ Rate limiting configured (100 req/min per user)
- ✅ Input sanitization enabled
- ✅ Output filtering for PII
- ✅ Network policies restrict pod communication
- ✅ RBAC configured for K8s access
- ✅ Security scanning in CI/CD pipeline
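For the rate-limiting item, a per-user token bucket is a common starting point. A sketch only (in production you would typically back this with Redis so the limit holds across replicas):

```python
import time

class TokenBucket:
    """Simple token bucket: allows `rate` requests per `per` seconds."""

    def __init__(self, rate=100, per=60.0, clock=time.monotonic):
        self.capacity = float(rate)
        self.tokens = float(rate)
        self.fill_rate = rate / per  # tokens added per second
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Consume one token if available; refill based on elapsed time."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keep one bucket per user ID (e.g. in a dict or Redis hash) and return HTTP 429 when `allow()` is False.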
Step 8: CI/CD Pipeline
GitHub Actions Example
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Tests
        run: |
          pip install -r requirements.txt
          pytest tests/ --cov=packages
      - name: Build Docker Image
        run: |
          docker build -t recoagent-api:${{ github.sha }} -f infra/Dockerfile.api .
          docker tag recoagent-api:${{ github.sha }} recoagent-api:latest
      - name: Push to Registry
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker push recoagent-api:${{ github.sha }}
          docker push recoagent-api:latest
      - name: Deploy to K8s
        run: |
          kubectl set image deployment/recoagent-api api=recoagent-api:${{ github.sha }} -n recoagent
          kubectl rollout status deployment/recoagent-api -n recoagent
Step 9: Health Checks & Readiness
Health Check Endpoint
# apps/api/main.py
from datetime import datetime, timezone

from fastapi.responses import JSONResponse

@app.get("/health")
async def health_check():
    """Comprehensive health check"""
    checks = {
        "api": "healthy",
        "opensearch": await check_opensearch(),
        "redis": await check_redis(),
        "llm": await check_llm_api(),
    }
    all_healthy = all(v == "healthy" for v in checks.values())
    status_code = 200 if all_healthy else 503
    return JSONResponse(
        content={
            "status": "healthy" if all_healthy else "degraded",
            "checks": checks,
            "version": "1.0.0",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        status_code=status_code,
    )
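Each dependency probe should itself be time-bounded, so one slow backend cannot hang `/health` past the Kubernetes probe timeouts. A sketch (`guarded_check` is a hypothetical wrapper around probes like `check_redis`):

```python
import asyncio

async def guarded_check(probe, timeout=2.0):
    """Run a dependency probe with a deadline.

    Maps timeouts and errors to 'unhealthy' so the health endpoint
    always answers quickly instead of propagating a hang or exception.
    """
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        return "healthy"
    except Exception:
        return "unhealthy"
```

Usage: `"redis": await guarded_check(check_redis)` in the `checks` dict keeps the whole endpoint bounded by the slowest single probe timeout.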
Step 10: Rollback Strategy
Quick Rollback
# If deployment fails or has issues:
# Option 1: Rollback in K8s
kubectl rollout undo deployment/recoagent-api -n recoagent
# Option 2: Redeploy previous version
kubectl set image deployment/recoagent-api api=recoagent-api:previous-sha -n recoagent
# Option 3: Scale down bad deployment
kubectl scale deployment/recoagent-api --replicas=0 -n recoagent
# Then fix and redeploy
Deployment Best Practices
| Practice | Why | How |
|---|---|---|
| Blue-Green Deployment | Zero downtime | Deploy to "green", switch traffic, keep "blue" for rollback |
| Canary Releases | Test with small traffic | Route 5% to new version, monitor, then 100% |
| Database Migrations | Backward compatibility | Run migrations before code deploy, test rollback |
| Feature Flags | Safe rollout | Enable new features gradually per user segment |
Success Criteria
After deployment, verify:
- ✅ All pods are running and ready
- ✅ Health check returns 200 OK
- ✅ Can query the API successfully
- ✅ Metrics appearing in Prometheus
- ✅ Logs flowing to your logging system
- ✅ Auto-scaling works (test with load)
- ✅ SSL certificate valid
- ✅ Rate limiting enforced
Troubleshooting Production Issues
| Issue | Symptoms | Solution |
|---|---|---|
| Pods CrashLooping | Pod restarts repeatedly | Check logs (`kubectl logs <pod>`); common causes: missing secrets, OOM |
| 503 Errors | Service unavailable | Check health endpoint, verify backends are up |
| High Latency | Slow responses | Check OpenSearch performance, add caching, scale up |
| Out of Memory | Pods killed | Increase memory limits, reduce context size |
| ImagePullBackOff | Can't pull image | Check registry credentials, image name |
Monitoring Dashboard
Key metrics to display:
Production Dashboard (Grafana):
┌─────────────────────────────────────────────┐
│ Requests/min: 850 (▲ 12% from avg) │
│ Avg Latency: 1.2s (target: <2s) ✅ │
│ Error Rate: 0.8% (target: <5%) ✅ │
│ Cost/hour: $12 (budget: $15) ✅ │
└─────────────────────────────────────────────┘
┌─── Pods ───────────────────────────────────┐
│ Running: 5/5 ✅ │
│ CPU: 65% ✅ (target: <80%) │
│ Memory: 72% ✅ (target: <85%) │
│ Restarts: 0 ✅ │
└─────────────────────────────────────────────┘
┌─── Dependencies ───────────────────────────┐
│ OpenSearch: ✅ Healthy (45ms avg) │
│ Redis: ✅ Healthy (2ms avg) │
│ OpenAI API: ✅ Healthy │
└─────────────────────────────────────────────┘
What You've Accomplished
✅ Containerized your RecoAgent application
✅ Deployed with Docker Compose for local staging
✅ Orchestrated with Kubernetes for production
✅ Configured auto-scaling based on load
✅ Set up comprehensive monitoring and logging
✅ Implemented health checks and rollback strategy
✅ Secured with secrets management and network policies
Next Steps
- 🔒 Handle Authentication - Secure your API
- 🛡️ Implement Guardrails - Add safety controls
- 📊 Production Monitoring Best Practices - Advanced observability
- 🚀 Scale for High Traffic - Handle millions of requests