Content Generation Service - Deployment Guide

Overview

This guide covers deploying the Personalized Content Generation service to production.


Prerequisites

Required

  • Python 3.9+
  • OpenAI API key
  • Redis (for caching)

Optional

  • PostgreSQL (persistent storage)
  • Docker & Docker Compose
  • Kubernetes cluster
  • Prometheus + Grafana (monitoring)
  • Load balancer (for scaling)

Installation

1. Install Dependencies

# Clone repository
git clone [repository-url]
cd recoagent

# Install requirements
pip install -r requirements.txt

# Verify quality libraries installed
python -c "import textstat, language_tool_python, detoxify, yake; print('✅ All quality libraries installed')"
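As a quick sanity check of what these libraries measure, the Flesch Reading Ease formula that textstat implements can be sketched in plain Python (the syllable counter below is a rough heuristic, not textstat's own):

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Rough stdlib approximation of textstat.flesch_reading_ease.
    Higher scores mean easier text (roughly 0-121)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0

    def syllables(word: str) -> int:
        # Count contiguous vowel groups as syllables (crude but serviceable).
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (total_syllables / len(words)))

score = flesch_reading_ease("The cat sat on the mat. It was happy.")  # ≈ 108.27
```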

2. Configure Environment

# Copy example env file
cp env.example .env

# Edit .env with your configuration
nano .env

Environment Variables:

# Core
OPENAI_API_KEY=your-api-key-here
CONTENT_GEN_ENVIRONMENT=production

# Optional
CONTENT_GEN_MAX_CONCURRENT_GENERATIONS=10
CONTENT_GEN_ENABLE_QUALITY_SCORING=true
CONTENT_GEN_ENABLE_RATE_LIMITING=true
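The service reads these variables at startup. A minimal sketch of how such settings might be loaded and typed (field names mirror the variables above; the actual settings class in the codebase may differ):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentGenSettings:
    """Illustrative settings holder for the environment variables above."""
    openai_api_key: str
    environment: str = "development"
    max_concurrent_generations: int = 10
    enable_quality_scoring: bool = True
    enable_rate_limiting: bool = True

    @classmethod
    def from_env(cls) -> "ContentGenSettings":
        env = os.environ

        def flag(name: str, default: bool) -> bool:
            return env.get(name, str(default)).lower() in ("1", "true", "yes")

        return cls(
            # Required: raises KeyError early if the key is missing.
            openai_api_key=env["OPENAI_API_KEY"],
            environment=env.get("CONTENT_GEN_ENVIRONMENT", "development"),
            max_concurrent_generations=int(
                env.get("CONTENT_GEN_MAX_CONCURRENT_GENERATIONS", "10")),
            enable_quality_scoring=flag("CONTENT_GEN_ENABLE_QUALITY_SCORING", True),
            enable_rate_limiting=flag("CONTENT_GEN_ENABLE_RATE_LIMITING", True),
        )
```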

Running the Service

Development

# Start development server
uvicorn apps.api.content_generation_api:app --reload --port 8000

# Test health endpoint
curl http://localhost:8000/api/v1/content/health
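For scripted smoke tests (useful again in the Launch checklist below), a small stdlib poller that waits for the health endpoint to come up can replace the manual curl; this is an illustrative helper, not part of the service:

```python
import json
import time
import urllib.error
import urllib.request

def wait_for_health(url: str, timeout_s: float = 30.0,
                    interval_s: float = 1.0) -> dict:
    """Poll a health endpoint until it returns HTTP 200, then return the
    parsed JSON body; raise TimeoutError if it never comes up."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return json.loads(resp.read() or b"{}")
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} not healthy after {timeout_s}s")
        time.sleep(interval_s)
```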

Production

# Start with gunicorn managing uvicorn workers (production ASGI setup)
gunicorn apps.api.content_generation_api:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 60 \
--log-level info

Docker Deployment

Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Download language models
RUN python -m spacy download en_core_web_sm

# Copy application
COPY . .

# Expose port
EXPOSE 8000

# Run application
CMD ["gunicorn", "apps.api.content_generation_api:app", \
"--workers", "4", \
"--worker-class", "uvicorn.workers.UvicornWorker", \
"--bind", "0.0.0.0:8000"]

Docker Compose

version: '3.8'

services:
  content-generation:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CONTENT_GEN_ENVIRONMENT=production
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    restart: always

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    restart: always

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
    restart: always

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    restart: always

Monitoring Setup

Prometheus Configuration

# config/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'content-generation'
    static_configs:
      - targets: ['content-generation:8000']

Grafana Dashboards

Key Metrics to Monitor:

  • Requests per minute
  • Generation duration (p50, p95, p99)
  • Quality scores (avg by content type)
  • Error rate
  • Cost per hour
  • Token usage
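The latency percentiles above (p50, p95, p99) can be computed from raw duration samples with a nearest-rank percentile, sketched here for reference (in practice Prometheus computes these server-side from histograms):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: e.g. p95 of generation durations in seconds."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(0, rank - 1)]

# Hypothetical generation durations (seconds) from a monitoring window.
durations = [0.8, 1.1, 0.9, 3.2, 1.0, 0.7, 1.4, 5.0, 1.2, 0.9]
p50, p95, p99 = (percentile(durations, p) for p in (50, 95, 99))
```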

Production Checklist

Pre-Launch

  • All tests passing (pytest tests/)
  • Load testing completed
  • Configuration reviewed
  • API keys secured
  • Monitoring configured
  • Backup strategy defined
  • Incident response plan ready

Launch

  • Deploy to staging
  • Smoke tests on staging
  • Deploy to production
  • Health check passes
  • Monitoring dashboards active
  • Alert rules configured

Post-Launch

  • Monitor for 24 hours
  • Review error logs
  • Check performance metrics
  • Validate cost tracking
  • User feedback collection

Performance Optimization

Caching

# Enable Redis caching for templates
CONTENT_GEN_ENABLE_CACHING=true
CONTENT_GEN_CACHE_TTL_SECONDS=3600
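The behaviour these settings control can be illustrated with a minimal in-process TTL cache; production uses Redis (where SETEX gives the same expiry semantics), and this stdlib sketch only shows the keying and expiry logic:

```python
import hashlib
import json
import time
from typing import Optional

class TTLCache:
    """In-process sketch of the template cache: entries expire after
    ttl_seconds, mirroring CONTENT_GEN_CACHE_TTL_SECONDS."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    @staticmethod
    def key_for(prompt: str, params: dict) -> str:
        # Deterministic key over prompt + generation parameters.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)
```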

Scaling

Horizontal Scaling:

  • Multiple worker processes (gunicorn --workers 4)
  • Load balancer distribution
  • Stateless design allows easy scaling

Vertical Scaling:

  • Increase worker count
  • Optimize memory per worker
  • Use faster GPUs (if self-hosting models)

Security

API Key Management

  • Store in environment variables
  • Use secret management (AWS Secrets Manager, HashiCorp Vault)
  • Rotate keys regularly

Rate Limiting

# Configure rate limiting
CONTENT_GEN_ENABLE_RATE_LIMITING=true
CONTENT_GEN_RATE_LIMIT_PER_MINUTE=60
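Assuming the per-minute limit means at most N requests per rolling 60-second window per client (an assumption; the service may use fixed windows instead), the enforcement logic looks roughly like:

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowRateLimiter:
    """Sketch of per-client rate limiting matching
    CONTENT_GEN_RATE_LIMIT_PER_MINUTE under a rolling-window reading."""

    def __init__(self, limit_per_minute: int = 60):
        self.limit = limit_per_minute
        self._hits = {}  # client_id -> deque of request timestamps

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self._hits.setdefault(client_id, deque())
        # Drop timestamps older than the 60s window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False  # over the limit: caller should return HTTP 429
        window.append(now)
        return True
```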

Input Validation

  • All inputs validated by Pydantic
  • Content safety checked (detoxify)
  • Compliance validation enabled
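The kind of checks the service's Pydantic models perform can be sketched with a plain dataclass; the field names, allowed content types, and limits here are illustrative, not the real schema:

```python
from dataclasses import dataclass

# Hypothetical set of content types; the service defines its own.
ALLOWED_CONTENT_TYPES = {"email", "blog_post", "product_description"}

@dataclass
class GenerationRequest:
    """Stdlib stand-in for a validated request model."""
    prompt: str
    content_type: str
    max_tokens: int = 512

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty")
        if self.content_type not in ALLOWED_CONTENT_TYPES:
            raise ValueError(f"unknown content_type: {self.content_type!r}")
        if not 1 <= self.max_tokens <= 4096:
            raise ValueError("max_tokens must be between 1 and 4096")
```

Pydantic adds type coercion and structured error reporting on top of this; the point is that malformed input is rejected before any tokens are spent.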

Cost Management

Monitoring Costs

# Enable cost tracking
CONTENT_GEN_ENABLE_COST_TRACKING=true
CONTENT_GEN_MAX_COST_PER_REQUEST=0.10
CONTENT_GEN_DAILY_BUDGET_LIMIT=100.00
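The per-request cap and daily budget above can be enforced with a simple accumulator; the per-1K-token prices below are placeholders (check OpenAI's current pricing page, as rates change):

```python
# Illustrative per-1K-token prices in USD; NOT authoritative.
PRICE_PER_1K = {"gpt-4o": {"input": 0.0025, "output": 0.01}}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost from the token counts in the API response."""
    p = PRICE_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

class DailyBudget:
    """Enforces CONTENT_GEN_MAX_COST_PER_REQUEST and
    CONTENT_GEN_DAILY_BUDGET_LIMIT (reset of spent_today at midnight omitted)."""

    def __init__(self, max_per_request: float = 0.10, daily_limit: float = 100.00):
        self.max_per_request = max_per_request
        self.daily_limit = daily_limit
        self.spent_today = 0.0

    def charge(self, cost: float) -> None:
        if cost > self.max_per_request:
            raise RuntimeError(f"request cost ${cost:.4f} exceeds per-request cap")
        if self.spent_today + cost > self.daily_limit:
            raise RuntimeError("daily budget exhausted")
        self.spent_today += cost
```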

Cost Optimization

  1. Cache results where appropriate
  2. Optimize prompts for token efficiency
  3. Use appropriate models (GPT-4o vs GPT-3.5)
  4. Set max_tokens limits
  5. Monitor daily spending

Troubleshooting

Common Issues

Issue: Generation timeout
Solution: Increase generation_timeout_seconds or optimize prompts

Issue: High error rate
Solution: Check OpenAI API status; verify the API key

Issue: Poor quality scores
Solution: Check that the quality libraries are installed and working

Issue: High memory usage
Solution: Reduce concurrent generations; enable caching


Maintenance

Regular Tasks

Daily:

  • Monitor error rates
  • Check cost spending
  • Review quality scores

Weekly:

  • Review performance metrics
  • Analyze usage patterns
  • Check for optimization opportunities

Monthly:

  • Update dependencies
  • Review and update templates
  • Analyze user feedback
  • Optimize based on data

Support

Health Check

curl http://your-domain.com/api/v1/content/health

Metrics Endpoint

curl http://your-domain.com/metrics

Logs

# View logs
docker logs content-generation

# Follow logs
docker logs -f content-generation

Deployment Guide Version: 1.0
Last Updated: October 9, 2025
Status: Production-Ready