
Deploy RecoAgent to Production

Difficulty: ⭐⭐⭐ Advanced | Time: 3 hours

🎯 The Problem

Your RecoAgent prototype works great locally, but you need to deploy it to production. You're worried about:

  • Container configuration and orchestration
  • Scaling to handle real traffic
  • Monitoring and logging
  • Zero-downtime updates
  • Security and secrets management

This guide solves: Deploying RecoAgent to production with Docker and Kubernetes, including monitoring, scaling, and best practices for reliability.

⚡ TL;DR - Quick Deploy

# 1. Build Docker image
docker build -t recoagent-api:latest -f infra/Dockerfile.api .

# 2. Run with docker-compose (includes OpenSearch, Redis)
docker-compose -f infra/docker-compose.yml up -d

# 3. Verify it's running
curl http://localhost:8000/health
# Should return: {"status": "healthy"}

# 4. Deploy to K8s (if using Kubernetes)
kubectl apply -f infra/k8s/
kubectl get pods -l app=recoagent

# Expected: All pods running, health check passing

Result: Production API running with monitoring and auto-scaling!


Full Deployment Guide

Architecture Overview

Prerequisites

  • Docker and Docker Compose installed
  • Kubernetes cluster (for K8s deployment)
  • kubectl configured
  • Domain name and SSL certificates
  • Cloud provider account (AWS/Azure/GCP)
  • API keys for LLM providers

Step 1: Containerize Your Application

Create Dockerfile

The example Dockerfile is at infra/Dockerfile.api. Key features:

FROM python:3.11-slim

WORKDIR /app

# curl is needed for the HEALTHCHECK below (not included in the slim image)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies (requirements.txt must include gunicorn and uvicorn)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY packages/ packages/
COPY apps/api/ apps/api/

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s \
    CMD curl -f http://localhost:8000/health || exit 1

# Run with gunicorn for production
CMD ["gunicorn", "apps.api.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]

Build and Test Locally

# Build image
docker build -t recoagent-api:latest -f infra/Dockerfile.api .

# Test locally
docker run -p 8000:8000 \
  -e OPENAI_API_KEY="your-key" \
  -e OPENSEARCH_ENDPOINT="http://host.docker.internal:9200" \
  recoagent-api:latest

# Test health endpoint
curl http://localhost:8000/health
# Should return: {"status": "healthy", "version": "1.0.0"}

# Test query endpoint
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RecoAgent?"}'

Step 2: Docker Compose for Local Staging

Complete Stack Setup

The infra/docker-compose.yml includes:

version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: infra/Dockerfile.api
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENSEARCH_ENDPOINT=http://opensearch:9200
      - REDIS_URL=redis://redis:6379
    depends_on:
      - opensearch
      - redis
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment:
      - discovery.type=single-node
      - DISABLE_SECURITY_PLUGIN=true
    ports:
      - "9200:9200"
    volumes:
      - opensearch-data:/usr/share/opensearch/data

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  opensearch-data:
  redis-data:
  grafana-data:

Deploy with Docker Compose

# Start all services
docker-compose -f infra/docker-compose.yml up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f api

# Test the stack
curl http://localhost:8000/health
curl http://localhost:9090 # Prometheus
curl http://localhost:3001 # Grafana

Step 3: Kubernetes Deployment

Deploy to K8s

# Apply all manifests
kubectl apply -f infra/k8s/namespace.yaml
kubectl apply -f infra/k8s/secrets.yaml
kubectl apply -f infra/k8s/configmap.yaml
kubectl apply -f infra/k8s/deployment.yaml
kubectl apply -f infra/k8s/service.yaml
kubectl apply -f infra/k8s/ingress.yaml
kubectl apply -f infra/k8s/hpa.yaml # Horizontal Pod Autoscaler

# Check deployment
kubectl get pods -n recoagent
kubectl get svc -n recoagent
kubectl get hpa -n recoagent

# Check logs
kubectl logs -f deployment/recoagent-api -n recoagent

Key K8s Resources

Deployment with Auto-Scaling:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: recoagent-api
  namespace: recoagent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: recoagent-api
  template:
    metadata:
      labels:
        app: recoagent-api   # must match the selector above
    spec:
      containers:
        - name: api
          image: recoagent-api:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5

Horizontal Pod Autoscaler:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: recoagent-api-hpa
  namespace: recoagent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: recoagent-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Step 4: Configuration & Secrets

Environment Variables

Create .env.production:

# LLM Configuration
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key

# Vector Store
OPENSEARCH_ENDPOINT=https://your-cluster.amazonaws.com
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=your-secure-password

# Caching
REDIS_URL=redis://redis-cluster:6379
REDIS_PASSWORD=your-redis-password

# Observability
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=production

# Application
LOG_LEVEL=INFO
MAX_WORKERS=4
REQUEST_TIMEOUT=60
COST_LIMIT=0.10

Kubernetes Secrets

# Create secrets from .env file
kubectl create secret generic recoagent-secrets \
--from-env-file=.env.production \
-n recoagent

# Or create individual secrets
kubectl create secret generic openai-key \
--from-literal=apikey='your-key' \
-n recoagent
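To make these secrets visible to the pods, the Deployment can load them all at once with `envFrom` (a sketch of the container spec; it assumes the `recoagent-secrets` name created above):

```yaml
# infra/k8s/deployment.yaml (container spec excerpt)
containers:
  - name: api
    image: recoagent-api:latest
    envFrom:
      - secretRef:
          name: recoagent-secrets   # every key in the secret becomes an env var
```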

Step 5: Monitoring & Logging

Prometheus Metrics

# Metrics are automatically exposed at the /metrics endpoint
# Configure Prometheus to scrape them:

# prometheus.yml
scrape_configs:
  - job_name: 'recoagent'
    static_configs:
      - targets: ['recoagent-api:8000']
    metrics_path: '/metrics'
    scrape_interval: 15s

Grafana Dashboards

Key metrics to track:

| Metric                | Alert Threshold | Action                          |
|-----------------------|-----------------|---------------------------------|
| Request Latency (p95) | > 2s            | Scale up or optimize            |
| Error Rate            | > 5%            | Check logs, roll back if needed |
| Cost per Query        | > $0.10         | Review cost optimization        |
| Cache Hit Rate        | < 30%           | Check cache TTL settings        |
| Vector Search Latency | > 200ms         | Check OpenSearch health         |
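These thresholds can be encoded as Prometheus alerting rules. A sketch, assuming metrics named `recoagent_request_duration_seconds` and `recoagent_requests_total` (adjust the expressions to whatever your `/metrics` endpoint actually exports):

```yaml
# alerts.yml (loaded via rule_files in prometheus.yml)
groups:
  - name: recoagent
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, rate(recoagent_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
      - alert: HighErrorRate
        expr: rate(recoagent_requests_total{status=~"5.."}[5m]) / rate(recoagent_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
```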

Logging Configuration

# apps/api/main.py
from packages.observability import StructuredLogger

logger = StructuredLogger(
    service_name="recoagent-api",
    environment="production",
    log_level="INFO",
)

# Logs include:
# - Request ID for tracing
# - User ID for analytics
# - Latency, cost, model used
# - Retrieved documents count
# - Error details if any
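If `StructuredLogger` isn't available in your environment, equivalent one-line JSON logs (which most log aggregators parse natively) can be produced with the standard library alone. A minimal sketch; the field names are illustrative:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line, merging extra context."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "service": "recoagent-api",
            "message": record.getMessage(),
        }
        # Merge structured fields passed via logger.info(..., extra={"context": {...}})
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)

def make_logger(name="recoagent-api"):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

logger = make_logger()
logger.info("query served", extra={"context": {"request_id": "abc123", "latency_ms": 820}})
```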

Step 6: Scaling & Performance

Horizontal Pod Autoscaler Settings

# Scale based on:
# - CPU: Target 70% utilization
# - Memory: Target 80% utilization
# - Custom: Requests per second

# Scaling behavior:
# - Scale up: When CPU > 70% for 30 seconds
# - Scale down: When CPU < 50% for 5 minutes
# - Min replicas: 3 (for high availability)
# - Max replicas: 10 (cost control)
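The scale-up/scale-down timing above maps onto the HPA `behavior` field (available in `autoscaling/v2`). A sketch that could be appended to the HPA spec from Step 3; the per-policy pod counts are illustrative:

```yaml
# Appended under spec: in infra/k8s/hpa.yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 30    # react to CPU > 70% within ~30s
    policies:
      - type: Pods
        value: 2                      # add at most 2 pods per minute
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 1                      # remove at most 1 pod per minute
        periodSeconds: 60
```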

Performance Tuning

| Setting        | Development | Production | Why                        |
|----------------|-------------|------------|----------------------------|
| Workers        | 1           | 4-8        | Handle concurrent requests |
| Timeout        | 120s        | 60s        | Prevent hanging requests   |
| Cache TTL      | 300s        | 3600s      | Balance freshness vs cost  |
| Max tokens     | 2000        | 1500       | Control costs              |
| Retry attempts | 1           | 3          | Handle transient errors    |
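The retry setting can be implemented with a small exponential-backoff helper. A stdlib-only sketch; the helper name and delay values are illustrative, not part of RecoAgent's API:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Example: a call that fails twice with a transient error, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)  # fast backoff for the demo
```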

Step 7: Security Checklist

  • ✅ API keys stored in secrets (not env vars)
  • ✅ TLS/SSL enabled on all endpoints
  • ✅ Rate limiting configured (100 req/min per user)
  • ✅ Input sanitization enabled
  • ✅ Output filtering for PII
  • ✅ Network policies restrict pod communication
  • ✅ RBAC configured for K8s access
  • ✅ Security scanning in CI/CD pipeline

Step 8: CI/CD Pipeline

GitHub Actions Example

name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run Tests
        run: |
          pip install -r requirements.txt
          pytest tests/ --cov=packages

      - name: Build Docker Image
        run: |
          docker build -t recoagent-api:${{ github.sha }} -f infra/Dockerfile.api .
          docker tag recoagent-api:${{ github.sha }} recoagent-api:latest

      - name: Push to Registry
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker push recoagent-api:${{ github.sha }}
          docker push recoagent-api:latest

      - name: Deploy to K8s
        run: |
          kubectl set image deployment/recoagent-api api=recoagent-api:${{ github.sha }} -n recoagent
          kubectl rollout status deployment/recoagent-api -n recoagent

Step 9: Health Checks & Readiness

Health Check Endpoint

# apps/api/main.py
from datetime import datetime, timezone

from fastapi.responses import JSONResponse

@app.get("/health")
async def health_check():
    """Comprehensive health check covering the API and its backends."""
    checks = {
        "api": "healthy",
        "opensearch": await check_opensearch(),
        "redis": await check_redis(),
        "llm": await check_llm_api(),
    }

    all_healthy = all(v == "healthy" for v in checks.values())
    status_code = 200 if all_healthy else 503

    return JSONResponse(
        content={
            "status": "healthy" if all_healthy else "degraded",
            "checks": checks,
            "version": "1.0.0",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        status_code=status_code,
    )

Step 10: Rollback Strategy

Quick Rollback

# If deployment fails or has issues:

# Option 1: Rollback in K8s
kubectl rollout undo deployment/recoagent-api -n recoagent

# Option 2: Redeploy previous version
kubectl set image deployment/recoagent-api api=recoagent-api:previous-sha -n recoagent

# Option 3: Scale down bad deployment
kubectl scale deployment/recoagent-api --replicas=0 -n recoagent
# Then fix and redeploy

Deployment Best Practices

| Practice              | Why                     | How                                                         |
|-----------------------|-------------------------|-------------------------------------------------------------|
| Blue-Green Deployment | Zero downtime           | Deploy to "green", switch traffic, keep "blue" for rollback |
| Canary Releases       | Test with small traffic | Route 5% to new version, monitor, then 100%                 |
| Database Migrations   | Backward compatibility  | Run migrations before code deploy, test rollback            |
| Feature Flags         | Safe rollout            | Enable new features gradually per user segment              |
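Without a service mesh, a rough 5% canary can be approximated by running a second Deployment with the same pod labels behind the existing Service, sized at about 5% of total replicas. A sketch (the filename and image tag are illustrative; traffic splitting by replica count is coarse compared to mesh-based routing):

```yaml
# infra/k8s/deployment-canary.yaml (assumed filename)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recoagent-api-canary
  namespace: recoagent
spec:
  replicas: 1              # 1 canary pod out of ~20 total ≈ 5% of requests
  selector:
    matchLabels:
      app: recoagent-api   # same label: the existing Service routes to both
  template:
    metadata:
      labels:
        app: recoagent-api
        track: canary      # lets you filter canary logs/metrics separately
    spec:
      containers:
        - name: api
          image: recoagent-api:candidate-sha   # the version under test
```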

Success Criteria

After deployment, verify:

  • ✅ All pods are running and ready
  • ✅ Health check returns 200 OK
  • ✅ Can query the API successfully
  • ✅ Metrics appearing in Prometheus
  • ✅ Logs flowing to your logging system
  • ✅ Auto-scaling works (test with load)
  • ✅ SSL certificate valid
  • ✅ Rate limiting enforced

Troubleshooting Production Issues

| Issue             | Symptoms                | Solution                                                                    |
|-------------------|-------------------------|-----------------------------------------------------------------------------|
| Pods CrashLooping | Pod restarts repeatedly | Check logs with `kubectl logs pod-name`; common causes: missing secrets, OOM |
| 503 Errors        | Service unavailable     | Check health endpoint, verify backends are up                               |
| High Latency      | Slow responses          | Check OpenSearch performance, add caching, scale up                         |
| Out of Memory     | Pods killed             | Increase memory limits, reduce context size                                 |
| ImagePullBackOff  | Can't pull image        | Check registry credentials, image name                                      |

Monitoring Dashboard

Key metrics to display:

Production Dashboard (Grafana):

┌─────────────────────────────────────────────┐
│ Requests/min: 850 (▲ 12% from avg) │
│ Avg Latency: 1.2s (target: <2s) ✅ │
│ Error Rate: 0.8% (target: <5%) ✅ │
│ Cost/hour: $12 (budget: $15) ✅ │
└─────────────────────────────────────────────┘

┌─── Pods ───────────────────────────────────┐
│ Running: 5/5 ✅ │
│ CPU: 65% ✅ (target: <80%) │
│ Memory: 72% ✅ (target: <85%) │
│ Restarts: 0 ✅ │
└─────────────────────────────────────────────┘

┌─── Dependencies ───────────────────────────┐
│ OpenSearch: ✅ Healthy (45ms avg) │
│ Redis: ✅ Healthy (2ms avg) │
│ OpenAI API: ✅ Healthy │
└─────────────────────────────────────────────┘

What You've Accomplished

  • Containerized your RecoAgent application
  • Deployed with Docker Compose for local staging
  • Orchestrated with Kubernetes for production
  • Configured auto-scaling based on load
  • Set up comprehensive monitoring and logging
  • Implemented health checks and a rollback strategy
  • Secured with secrets management and network policies

Next Steps