# Production API

**Location:** `apps/api/`
**Tech Stack:** FastAPI, OpenAPI, Docker
**Purpose:** Production-grade RESTful API for deploying RecoAgent at scale
## 🎯 Overview
The Production API is a FastAPI-based REST API designed for enterprise deployments. It provides secure, scalable access to RecoAgent with built-in authentication, rate limiting, and comprehensive observability.
Use this when:
- Deploying to production environments
- Need RESTful endpoints for integration
- Require authentication and access control
- Want OpenAPI/Swagger documentation
- Need to scale horizontally
## ⚡ Quick Start
```bash
# Navigate to the API directory
cd apps/api

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OPENAI_API_KEY="your-key"
export DATABASE_URL="postgresql://..."

# Run the server
uvicorn main:app --reload

# Open the API docs: http://localhost:8000/docs
```
## 📡 Key Endpoints
### Query Agent
```http
POST /api/v1/query
Content-Type: application/json
Authorization: Bearer <token>

{
  "question": "What is hybrid search?",
  "session_id": "user123",
  "include_sources": true
}
```
Response:
```json
{
  "answer": "Hybrid search combines...",
  "sources": [...],
  "confidence": 0.92,
  "latency_ms": 850,
  "cost": 0.012
}
```
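For example, the endpoint can be called from Python with `requests` (a minimal sketch; the base URL, token handling, and timeout are assumptions):

```python
# Minimal client sketch; request/response fields follow the examples above.
import requests

TOKEN = "your-jwt-token"  # obtained from POST /auth/token

resp = requests.post(
    "http://localhost:8000/api/v1/query",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "question": "What is hybrid search?",
        "session_id": "user123",
        "include_sources": True,
    },
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(data["answer"], f'(confidence: {data["confidence"]})')
```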
### Health Check
```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime_seconds": 12345
}
```
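For reference, a minimal FastAPI handler producing this shape might look like the sketch below (illustrative only; the real implementation lives in `apps/api/`, and the uptime tracking shown here is an assumption):

```python
# Sketch of a health endpoint; field names match the response above.
import time
from fastapi import FastAPI

app = FastAPI()
START_TIME = time.time()

@app.get("/health")
def health() -> dict:
    return {
        "status": "healthy",
        "version": "1.0.0",
        "uptime_seconds": int(time.time() - START_TIME),
    }
```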
### Metrics

```http
GET /metrics
```

Returns metrics in the Prometheus exposition format.
## 🔐 Authentication
The API supports multiple authentication methods:
### API Key Authentication

```bash
curl -H "X-API-Key: your-api-key" \
  http://localhost:8000/api/v1/query
```
JWT Bearer Tokensโ
# Get token
POST /auth/token
{
"username": "user",
"password": "pass"
}
# Use token
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/api/v1/query
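Server-side, tokens like this are typically minted with the signing key and expiry from the configuration below. A sketch using PyJWT (the library choice and claim names are assumptions, not the service's confirmed implementation):

```python
# Token-minting sketch; SECRET_KEY, JWT_ALGORITHM, and
# ACCESS_TOKEN_EXPIRE_MINUTES mirror the .env settings below.
from datetime import datetime, timedelta, timezone
import jwt  # PyJWT

SECRET_KEY = "your-secret-key-here"
JWT_ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

def create_access_token(username: str) -> str:
    expires = datetime.now(timezone.utc) + timedelta(
        minutes=ACCESS_TOKEN_EXPIRE_MINUTES
    )
    return jwt.encode(
        {"sub": username, "exp": expires}, SECRET_KEY, algorithm=JWT_ALGORITHM
    )
```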
## ⚙️ Configuration

Create a `.env` file:
```bash
# LLM Configuration
OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4
EMBEDDING_MODEL=text-embedding-ada-002

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=4

# Database
DATABASE_URL=postgresql://user:pass@localhost/recoagent

# Redis (for caching)
REDIS_URL=redis://localhost:6379

# Security
SECRET_KEY=your-secret-key-here
JWT_ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30

# Rate Limiting
RATE_LIMIT_PER_MINUTE=100
RATE_LIMIT_PER_HOUR=1000

# Monitoring
LANGSMITH_API_KEY=your-langsmith-key
ENABLE_METRICS=true
```
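One common way to load these variables into a FastAPI app is a `pydantic-settings` class. The sketch below is an assumption about how the config could be wired, not the project's actual loader:

```python
# Settings sketch; env var names match the .env file above
# (pydantic-settings matches them case-insensitively).
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: str
    llm_model: str = "gpt-4"
    database_url: str
    redis_url: str = "redis://localhost:6379"
    secret_key: str
    access_token_expire_minutes: int = 30
    rate_limit_per_minute: int = 100

settings = Settings()
```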
## 🚀 Production Deployment
### Docker Deployment

```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

```bash
# Build and run
docker build -t recoagent-api .
docker run -p 8000:8000 --env-file .env recoagent-api
```
### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recoagent-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: recoagent-api
  template:
    metadata:
      labels:
        app: recoagent-api
    spec:
      containers:
        - name: api
          image: recoagent-api:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: recoagent-secrets
                  key: openai-api-key
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
```
## 📊 Monitoring & Observability
### Prometheus Metrics

```text
# Available metrics
recoagent_requests_total
recoagent_request_duration_seconds
recoagent_llm_calls_total
recoagent_llm_cost_total
recoagent_errors_total
```
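Metrics like these can be defined with `prometheus_client`. The sketch below reuses the metric names listed above, while the label sets are assumptions about how the service partitions them:

```python
# Counter/histogram sketch using prometheus_client.
from prometheus_client import Counter, Histogram

REQUESTS_TOTAL = Counter(
    "recoagent_requests_total", "Total API requests", ["endpoint", "status"]
)
REQUEST_DURATION = Histogram(
    "recoagent_request_duration_seconds", "Request latency in seconds", ["endpoint"]
)

# Inside a request handler:
# REQUESTS_TOTAL.labels(endpoint="/api/v1/query", status="200").inc()
# with REQUEST_DURATION.labels(endpoint="/api/v1/query").time():
#     ...  # handle the request
```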
### LangSmith Tracing
All requests are automatically traced in LangSmith for debugging.
### Structured Logging
```json
{
  "timestamp": "2024-10-09T10:30:00Z",
  "level": "INFO",
  "request_id": "abc123",
  "endpoint": "/api/v1/query",
  "latency_ms": 850,
  "cost": 0.012,
  "user_id": "user123"
}
```
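A minimal stdlib formatter that emits records in this shape looks like the sketch below (field names copied from the example above; the service may use a dedicated library such as structlog instead):

```python
# JSON log formatter sketch; extra_fields carries request-scoped values
# such as request_id, endpoint, latency_ms, cost, and user_id.
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            **getattr(record, "extra_fields", {}),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
```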
## 🛡️ Security Features

| Feature | Status | Description |
|---|---|---|
| Authentication | ✅ | API keys + JWT tokens |
| Rate Limiting | ✅ | Per-user and global limits |
| Input Validation | ✅ | Pydantic models |
| SQL Injection Prevention | ✅ | Parameterized queries |
| CORS | ✅ | Configurable origins |
| HTTPS | ✅ | TLS/SSL support |
| Secrets Management | ✅ | Environment variables |
| Audit Logging | ✅ | All requests logged |
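As an illustration of the Pydantic-based input validation, a request model for `/api/v1/query` might look like this (field names follow the request example above; the exact constraints are assumptions):

```python
# Request model sketch; FastAPI rejects invalid payloads with a 422
# before they reach the handler.
from pydantic import BaseModel, Field

class QueryRequest(BaseModel):
    question: str = Field(..., min_length=1, max_length=4000)
    session_id: str
    include_sources: bool = True
```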
## 🔥 Performance Tips

### 1. Enable Caching

```bash
# Redis caching for responses
ENABLE_REDIS_CACHE=true
CACHE_TTL_SECONDS=3600
```
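A sketch of what Redis response caching can look like with `redis-py`, keyed on a hash of the question (the key scheme and TTL wiring are assumptions, not the service's confirmed cache design):

```python
# Cache-aside sketch; TTL mirrors CACHE_TTL_SECONDS above.
import hashlib
import json
import redis

r = redis.Redis.from_url("redis://localhost:6379")
TTL = 3600

def cached_answer(question: str, compute):
    key = "query:" + hashlib.sha256(question.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)          # cache hit: skip the LLM call
    result = compute(question)          # cache miss: compute and store
    r.setex(key, TTL, json.dumps(result))
    return result
```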
### 2. Optimize Workers

```bash
# Adjust worker count based on load
uvicorn main:app --workers 4 --host 0.0.0.0
```
### 3. Use Connection Pooling

```bash
# PostgreSQL connection pool
DATABASE_POOL_SIZE=20
DATABASE_MAX_OVERFLOW=10
```
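These settings map directly onto SQLAlchemy's engine options; a sketch (the engine may be constructed elsewhere in the codebase):

```python
# Connection pool sketch; kwargs mirror the env vars above.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:pass@localhost/recoagent",
    pool_size=20,        # DATABASE_POOL_SIZE
    max_overflow=10,     # DATABASE_MAX_OVERFLOW
    pool_pre_ping=True,  # drop dead connections before reuse
)
```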
## 📚 API Documentation

### OpenAPI Docs

Visit http://localhost:8000/docs for the interactive Swagger UI.

### ReDoc

Visit http://localhost:8000/redoc for the alternative ReDoc documentation.
## 🧪 Testing

```bash
# Run tests
pytest tests/

# Load testing
locust -f locustfile.py --host=http://localhost:8000
```
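Individual endpoints can also be exercised with FastAPI's `TestClient`; a minimal sketch (assumes the `app` object in `apps/api/main.py`; adapt to the fixtures in `tests/`):

```python
# Endpoint test sketch; asserts against the /health response shown earlier.
from fastapi.testclient import TestClient

from main import app

client = TestClient(app)

def test_health():
    resp = client.get("/health")
    assert resp.status_code == 200
    assert resp.json()["status"] == "healthy"
```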
## 🐛 Troubleshooting

| Issue | Solution |
|---|---|
| API not starting | Check that port 8000 is free: `lsof -i :8000` |
| 401 Unauthorized | Verify the API key or JWT token |
| 429 Rate Limited | Wait for the window to reset, or raise the rate limits |
| Slow responses | Check LLM API latency; enable caching |
| Memory errors | Increase worker memory limits |
## 🔗 Related Docs
- Deployment Guide
- Authentication Guide
- Caching Guide
- Worker App (for async processing)
Ready to deploy? Check out the Deployment Guide! 🚀