Content Generation Service - Deployment Guide

Overview

This guide covers deploying the Personalized Content Generation service to production.


Prerequisites

Required

  • Python 3.9+
  • OpenAI API key
  • Redis (for caching)

Optional

  • PostgreSQL (persistent storage)
  • Docker & Docker Compose
  • Kubernetes cluster
  • Prometheus + Grafana (monitoring)
  • Load balancer (for scaling)

Installation

1. Install Dependencies

# Clone repository
git clone [repository-url]
cd recoagent

# Install requirements
pip install -r requirements.txt

# Verify quality libraries installed
python -c "import textstat, language_tool_python, detoxify, yake; print('✅ All quality libraries installed')"
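As a quick sanity check of what these libraries measure, the Flesch Reading Ease formula that textstat implements can be sketched in plain Python (the syllable counter below is a rough heuristic, not textstat's own):

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Rough stdlib approximation of textstat.flesch_reading_ease.
    Higher scores mean easier text (roughly 0-121)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0

    def syllables(word: str) -> int:
        # Count contiguous vowel groups as syllables (crude but serviceable).
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (total_syllables / len(words)))

score = flesch_reading_ease("The cat sat on the mat. It was happy.")  # ≈ 108.27
```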

2. Configure Environment

# Copy example env file
cp env.example .env

# Edit .env with your configuration
nano .env

Environment Variables:

# Core
OPENAI_API_KEY=your-api-key-here
CONTENT_GEN_ENVIRONMENT=production

# Optional
CONTENT_GEN_MAX_CONCURRENT_GENERATIONS=10
CONTENT_GEN_ENABLE_QUALITY_SCORING=true
CONTENT_GEN_ENABLE_RATE_LIMITING=true
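The service reads these variables at startup. A minimal sketch of how such settings might be loaded and typed (field names mirror the variables above; the actual settings class in the codebase may differ):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentGenSettings:
    """Illustrative settings holder for the environment variables above."""
    openai_api_key: str
    environment: str = "development"
    max_concurrent_generations: int = 10
    enable_quality_scoring: bool = True
    enable_rate_limiting: bool = True

    @classmethod
    def from_env(cls) -> "ContentGenSettings":
        env = os.environ

        def flag(name: str, default: bool) -> bool:
            return env.get(name, str(default)).lower() in ("1", "true", "yes")

        return cls(
            # Required: raises KeyError early if the key is missing.
            openai_api_key=env["OPENAI_API_KEY"],
            environment=env.get("CONTENT_GEN_ENVIRONMENT", "development"),
            max_concurrent_generations=int(
                env.get("CONTENT_GEN_MAX_CONCURRENT_GENERATIONS", "10")),
            enable_quality_scoring=flag("CONTENT_GEN_ENABLE_QUALITY_SCORING", True),
            enable_rate_limiting=flag("CONTENT_GEN_ENABLE_RATE_LIMITING", True),
        )
```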

Running the Service

Development

# Start development server
uvicorn apps.api.content_generation_api:app --reload --port 8000

# Test health endpoint
curl http://localhost:8000/api/v1/content/health
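For scripted smoke tests (useful again in the Launch checklist below), a small stdlib poller that waits for the health endpoint to come up can replace the manual curl; this is an illustrative helper, not part of the service:

```python
import json
import time
import urllib.error
import urllib.request

def wait_for_health(url: str, timeout_s: float = 30.0,
                    interval_s: float = 1.0) -> dict:
    """Poll a health endpoint until it returns HTTP 200, then return the
    parsed JSON body; raise TimeoutError if it never comes up."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return json.loads(resp.read() or b"{}")
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} not healthy after {timeout_s}s")
        time.sleep(interval_s)
```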

Production

# Start with gunicorn managing uvicorn workers (production ASGI setup)
gunicorn apps.api.content_generation_api:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 60 \
--log-level info

Docker Deployment

Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Download language models
RUN python -m spacy download en_core_web_sm

# Copy application
COPY . .

# Expose port
EXPOSE 8000

# Run application
CMD ["gunicorn", "apps.api.content_generation_api:app", \
"--workers", "4", \
"--worker-class", "uvicorn.workers.UvicornWorker", \
"--bind", "0.0.0.0:8000"]

Docker Compose

version: '3.8'

services:
  content-generation:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CONTENT_GEN_ENVIRONMENT=production
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    restart: always

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    restart: always

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
    restart: always

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    restart: always

Monitoring Setup

Prometheus Configuration

# config/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'content-generation'
    static_configs:
      - targets: ['content-generation:8000']

Grafana Dashboards

Key Metrics to Monitor:

  • Requests per minute
  • Generation duration (p50, p95, p99)
  • Quality scores (avg by content type)
  • Error rate
  • Cost per hour
  • Token usage
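The latency percentiles above (p50, p95, p99) can be computed from raw duration samples with a nearest-rank percentile, sketched here for reference (in practice Prometheus computes these server-side from histograms):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: e.g. p95 of generation durations in seconds."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(0, rank - 1)]

# Hypothetical generation durations (seconds) from a monitoring window.
durations = [0.8, 1.1, 0.9, 3.2, 1.0, 0.7, 1.4, 5.0, 1.2, 0.9]
p50, p95, p99 = (percentile(durations, p) for p in (50, 95, 99))
```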

Production Checklist

Pre-Launch

  • All tests passing (pytest tests/)
  • Load testing completed
  • Configuration reviewed
  • API keys secured
  • Monitoring configured
  • Backup strategy defined
  • Incident response plan ready

Launch

  • Deploy to staging
  • Smoke tests on staging
  • Deploy to production
  • Health check passes
  • Monitoring dashboards active
  • Alert rules configured

Post-Launch

  • Monitor for 24 hours
  • Review error logs
  • Check performance metrics
  • Validate cost tracking
  • User feedback collection

Performance Optimization

Caching

# Enable Redis caching for templates
CONTENT_GEN_ENABLE_CACHING=true
CONTENT_GEN_CACHE_TTL_SECONDS=3600
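The behaviour these settings control can be illustrated with a minimal in-process TTL cache; production uses Redis (where SETEX gives the same expiry semantics), and this stdlib sketch only shows the keying and expiry logic:

```python
import hashlib
import json
import time
from typing import Optional

class TTLCache:
    """In-process sketch of the template cache: entries expire after
    ttl_seconds, mirroring CONTENT_GEN_CACHE_TTL_SECONDS."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    @staticmethod
    def key_for(prompt: str, params: dict) -> str:
        # Deterministic key over prompt + generation parameters.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)
```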

Scaling

Horizontal Scaling:

  • Multiple worker processes (gunicorn --workers 4)
  • Load balancer distribution
  • Stateless design allows easy scaling

Vertical Scaling:

  • Increase worker count
  • Optimize memory per worker
  • Use faster GPUs (if self-hosting models)

Security

API Key Management

  • Store in environment variables
  • Use secret management (AWS Secrets Manager, HashiCorp Vault)
  • Rotate keys regularly

Rate Limiting

# Configure rate limiting
CONTENT_GEN_ENABLE_RATE_LIMITING=true
CONTENT_GEN_RATE_LIMIT_PER_MINUTE=60
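Assuming the per-minute limit means at most N requests per rolling 60-second window per client (an assumption; the service may use fixed windows instead), the enforcement logic looks roughly like:

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowRateLimiter:
    """Sketch of per-client rate limiting matching
    CONTENT_GEN_RATE_LIMIT_PER_MINUTE under a rolling-window reading."""

    def __init__(self, limit_per_minute: int = 60):
        self.limit = limit_per_minute
        self._hits = {}  # client_id -> deque of request timestamps

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self._hits.setdefault(client_id, deque())
        # Drop timestamps older than the 60s window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False  # over the limit: caller should return HTTP 429
        window.append(now)
        return True
```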

Input Validation

  • All inputs validated by Pydantic
  • Content safety checked (detoxify)
  • Compliance validation enabled
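The kind of checks the service's Pydantic models perform can be sketched with a plain dataclass; the field names, allowed content types, and limits here are illustrative, not the real schema:

```python
from dataclasses import dataclass

# Hypothetical set of content types; the service defines its own.
ALLOWED_CONTENT_TYPES = {"email", "blog_post", "product_description"}

@dataclass
class GenerationRequest:
    """Stdlib stand-in for a validated request model."""
    prompt: str
    content_type: str
    max_tokens: int = 512

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty")
        if self.content_type not in ALLOWED_CONTENT_TYPES:
            raise ValueError(f"unknown content_type: {self.content_type!r}")
        if not 1 <= self.max_tokens <= 4096:
            raise ValueError("max_tokens must be between 1 and 4096")
```

Pydantic adds type coercion and structured error reporting on top of this; the point is that malformed input is rejected before any tokens are spent.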

Cost Management

Monitoring Costs

# Enable cost tracking
CONTENT_GEN_ENABLE_COST_TRACKING=true
CONTENT_GEN_MAX_COST_PER_REQUEST=0.10
CONTENT_GEN_DAILY_BUDGET_LIMIT=100.00
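The per-request cap and daily budget above can be enforced with a simple accumulator; the per-1K-token prices below are placeholders (check OpenAI's current pricing page, as rates change):

```python
# Illustrative per-1K-token prices in USD; NOT authoritative.
PRICE_PER_1K = {"gpt-4o": {"input": 0.0025, "output": 0.01}}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost from the token counts in the API response."""
    p = PRICE_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

class DailyBudget:
    """Enforces CONTENT_GEN_MAX_COST_PER_REQUEST and
    CONTENT_GEN_DAILY_BUDGET_LIMIT (reset of spent_today at midnight omitted)."""

    def __init__(self, max_per_request: float = 0.10, daily_limit: float = 100.00):
        self.max_per_request = max_per_request
        self.daily_limit = daily_limit
        self.spent_today = 0.0

    def charge(self, cost: float) -> None:
        if cost > self.max_per_request:
            raise RuntimeError(f"request cost ${cost:.4f} exceeds per-request cap")
        if self.spent_today + cost > self.daily_limit:
            raise RuntimeError("daily budget exhausted")
        self.spent_today += cost
```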

Cost Optimization

  1. Cache results where appropriate
  2. Optimize prompts for token efficiency
  3. Use appropriate models (GPT-4o vs GPT-3.5)
  4. Set max_tokens limits
  5. Monitor daily spending

Troubleshooting

Common Issues

Issue: Generation timeout
Solution: Increase generation_timeout_seconds or optimize prompts

Issue: High error rate
Solution: Check OpenAI API status; verify the API key

Issue: Poor quality scores
Solution: Check that the quality libraries are installed and working

Issue: High memory usage
Solution: Reduce concurrent generations; enable caching


Maintenance

Regular Tasks

Daily:

  • Monitor error rates
  • Check cost spending
  • Review quality scores

Weekly:

  • Review performance metrics
  • Analyze usage patterns
  • Check for optimization opportunities

Monthly:

  • Update dependencies
  • Review and update templates
  • Analyze user feedback
  • Optimize based on data

Support

Health Check

curl http://your-domain.com/api/v1/content/health

Metrics Endpoint

curl http://your-domain.com/metrics

Logs

# View logs
docker logs content-generation

# Follow logs
docker logs -f content-generation

Deployment Guide Version: 1.0
Last Updated: October 9, 2025
Status: Production-Ready