Deployment Guide¶
This guide covers deploying CalcBridge to production environments. CalcBridge is designed for containerized deployments with Docker and can be orchestrated with Docker Compose or Kubernetes.
Prerequisites¶
System Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 8 GB | 16+ GB |
| Disk | 50 GB SSD | 200+ GB SSD |
| Network | 1 Gbps | 10 Gbps |
Software Requirements¶
| Software | Version | Purpose |
|---|---|---|
| Docker | 24.0+ | Container runtime |
| Docker Compose | 2.20+ | Multi-container orchestration |
| PostgreSQL | 16+ | Primary database |
| Valkey/Redis | 8+ | Cache and message broker |
Environment Variables¶
Required Production Variables¶
Security Critical
These values MUST be changed from their defaults before production deployment. Generate strong, unique values with, for example:
python -c "import secrets; print(secrets.token_urlsafe(32))"
# Core Security (REQUIRED - Must change from defaults)
JWT_SECRET_KEY=<secure-random-key-32-chars-minimum>
ENCRYPTION_MASTER_KEY=<secure-random-key-32-chars-minimum>
# Environment
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO
# Database
DATABASE_URL=postgresql+psycopg://user:password@db-host:5432/calcbridge
DATABASE_POOL_SIZE=20
DATABASE_MAX_OVERFLOW=10
DATABASE_SSL_MODE=verify-full
# Cache
VALKEY_HOST=cache-host
VALKEY_PORT=6379
VALKEY_PASSWORD=<secure-cache-password>
VALKEY_SSL=true
# API
API_PREFIX=/api/v1
CORS_ORIGINS=["https://app.yourdomain.com"]
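The requirements above can be checked mechanically before a deployment. The following is a minimal sketch, not part of CalcBridge itself: `check_production_env` and `MIN_SECRET_LENGTH` are hypothetical names, and the rules only mirror what this guide states (32+ character secrets, `ENVIRONMENT=production`, `DEBUG=false`, `verify-full`, TLS for the cache, a JSON list of origins).

```python
import json

MIN_SECRET_LENGTH = 32  # "32-chars-minimum" from the variable listing above


def check_production_env(env):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    for name in ("JWT_SECRET_KEY", "ENCRYPTION_MASTER_KEY"):
        if len(env.get(name, "")) < MIN_SECRET_LENGTH:
            problems.append(f"{name} must be at least {MIN_SECRET_LENGTH} characters")
    # The two keys protect different things, so they should not share a value.
    if env.get("JWT_SECRET_KEY") and env.get("JWT_SECRET_KEY") == env.get("ENCRYPTION_MASTER_KEY"):
        problems.append("JWT_SECRET_KEY and ENCRYPTION_MASTER_KEY must differ")
    if env.get("ENVIRONMENT") != "production":
        problems.append("ENVIRONMENT must be 'production'")
    if env.get("DEBUG", "false").lower() != "false":
        problems.append("DEBUG must be 'false'")
    if env.get("DATABASE_SSL_MODE") != "verify-full":
        problems.append("DATABASE_SSL_MODE should be 'verify-full'")
    if env.get("VALKEY_SSL", "false").lower() != "true":
        problems.append("VALKEY_SSL should be 'true'")
    # CORS_ORIGINS is written as a JSON list in the example above.
    try:
        origins = json.loads(env.get("CORS_ORIGINS", "[]"))
    except json.JSONDecodeError:
        origins = None
    if not isinstance(origins, list) or not origins:
        problems.append("CORS_ORIGINS must be a non-empty JSON list")
    elif any(not isinstance(o, str) or not o.startswith("https://") for o in origins):
        problems.append("CORS_ORIGINS entries should be https:// URLs")
    return problems
```

A deployment script could call `check_production_env(dict(os.environ))` and abort when the returned list is non-empty.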
Complete Environment Reference¶
| Variable | Required | Default | Description |
|---|---|---|---|
| Security | | | |
| JWT_SECRET_KEY | Yes | (insecure) | JWT signing key (32+ chars) |
| ENCRYPTION_MASTER_KEY | Yes | (insecure) | PII encryption key (32+ chars) |
| JWT_ACCESS_TOKEN_EXPIRE_MINUTES | No | 30 | Access token lifetime (minutes) |
| JWT_REFRESH_TOKEN_EXPIRE_DAYS | No | 7 | Refresh token lifetime (days) |
| Environment | | | |
| ENVIRONMENT | Yes | development | development, staging, production |
| DEBUG | No | false | Enable debug mode |
| LOG_LEVEL | No | INFO | Logging level |
| Database | | | |
| DATABASE_URL | Yes | (local) | PostgreSQL connection URL |
| DATABASE_POOL_SIZE | No | 20 | Connection pool size |
| DATABASE_MAX_OVERFLOW | No | 10 | Max overflow connections |
| DATABASE_SSL_MODE | No | prefer | SSL mode (verify-full for prod) |
| DATABASE_APP_ROLE | No | calcbridge_app | Database role for RLS |
| Cache | | | |
| VALKEY_HOST | Yes | localhost | Valkey/Redis host |
| VALKEY_PORT | No | 6379 | Valkey/Redis port |
| VALKEY_PASSWORD | No | (none) | Valkey/Redis password |
| VALKEY_SSL | No | false | Enable SSL (true for prod) |
| Rate Limiting | | | |
| RATE_LIMIT_ENABLED | No | true | Enable rate limiting |
| RATE_LIMIT_TIER_FREE | No | 100 | Free tier requests/min |
| RATE_LIMIT_TIER_ENTERPRISE | No | 10000 | Enterprise requests/min |
| File Storage | | | |
| STORAGE_BACKEND | No | local | local or s3 |
| STORAGE_S3_BUCKET | If S3 | - | S3 bucket name |
| STORAGE_S3_REGION | No | us-east-1 | AWS region |
| Observability | | | |
| OTEL_ENABLED | No | false | Enable OpenTelemetry |
| OTEL_ENDPOINT | If OTEL | - | OTLP collector endpoint |
| SENTRY_DSN | No | - | Sentry error tracking DSN |
Docker Deployment¶
Building the Image¶
# Build production image
docker build -t calcbridge:latest -f Dockerfile --target production .
# Build with specific version tag
docker build -t calcbridge:v1.0.0 -f Dockerfile --target production .
Docker Compose Production¶
Create a production compose file:
docker-compose.prod.yml
version: "3.8"

services:
  api:
    image: calcbridge:latest
    restart: unless-stopped
    environment:
      - ENVIRONMENT=production
      - DEBUG=false
      - DATABASE_URL=postgresql+psycopg://calcbridge:${DB_PASSWORD}@postgres:5432/calcbridge
      # verify-full and VALKEY_SSL=true require TLS to be configured on the
      # postgres and valkey services (certificates are not shown in this file).
      - DATABASE_SSL_MODE=verify-full
      - VALKEY_HOST=valkey
      # The valkey service below is started with --requirepass.
      - VALKEY_PASSWORD=${VALKEY_PASSWORD}
      - VALKEY_SSL=true
      - JWT_SECRET_KEY=${JWT_SECRET_KEY}
      - ENCRYPTION_MASTER_KEY=${ENCRYPTION_MASTER_KEY}
    # Note: a fixed host port cannot be bound by more than one replica on the
    # same host; with replicas > 1, publish the API through the reverse proxy.
    ports:
      - "8000:8000"
    depends_on:
      postgres:
        condition: service_healthy
      valkey:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "2"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M

  celery-worker:
    image: calcbridge:latest
    restart: unless-stopped
    command: >
      celery -A src.workers.celery_app worker
      --loglevel=INFO
      --concurrency=4
      --queues=default,parse,export
    environment:
      - DATABASE_URL=postgresql+psycopg://calcbridge:${DB_PASSWORD}@postgres:5432/calcbridge
      - VALKEY_HOST=valkey
      - VALKEY_PASSWORD=${VALKEY_PASSWORD}
    depends_on:
      postgres:
        condition: service_healthy
      valkey:
        condition: service_healthy
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "4"
          memory: 4G

  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: calcbridge
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: calcbridge
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U calcbridge"]
      interval: 10s
      timeout: 5s
      retries: 5

  valkey:
    image: valkey/valkey:8-alpine
    restart: unless-stopped
    command: >
      valkey-server
      --appendonly yes
      --maxmemory 1gb
      --maxmemory-policy allkeys-lru
      --requirepass ${VALKEY_PASSWORD}
    volumes:
      - valkey_data:/data
    healthcheck:
      test: ["CMD", "valkey-cli", "-a", "${VALKEY_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
  valkey_data:
Deploy with Docker Compose¶
# Create production environment file
cat > .env.prod << EOF
DB_PASSWORD=$(python -c "import secrets; print(secrets.token_urlsafe(24))")
VALKEY_PASSWORD=$(python -c "import secrets; print(secrets.token_urlsafe(24))")
JWT_SECRET_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(32))")
ENCRYPTION_MASTER_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(32))")
EOF
# Deploy
docker compose -f docker-compose.prod.yml --env-file .env.prod up -d
# Verify deployment
docker compose -f docker-compose.prod.yml ps
docker compose -f docker-compose.prod.yml logs -f api
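Watching the logs by hand works for a first deployment; for scripted rollouts it can help to wait for the `/health` endpoint instead. The sketch below is one possible approach using only the standard library. `fetch_status` and `wait_until_healthy` are hypothetical helper names, and the `{"status": "healthy"}` body is taken from the health check table in this guide.

```python
import json
import time
import urllib.request


def fetch_status(url, timeout=5.0):
    """GET the URL and return the 'status' field of its JSON body, or None on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8")).get("status")
    except (OSError, ValueError, AttributeError):
        # Connection refused, HTTP error, invalid JSON, or a non-object JSON body.
        return None


def wait_until_healthy(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns 'healthy'; return False once all attempts are used."""
    for attempt in range(attempts):
        if probe() == "healthy":
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_healthy(lambda: fetch_status("http://localhost:8000/health"))` polls for up to about a minute with the defaults above. The probe is passed in as a callable so the retry loop can be exercised without a running server.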
Database Migrations¶
Apply Migrations¶
Migrations must be applied before the first deployment and after each upgrade:
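# These commands assume the PostgreSQL container is named calcbridge-postgres;
# adjust the name to match the output of "docker compose ps".
# Option 1: Apply a single migration via Docker
docker exec -i calcbridge-postgres psql -U calcbridge -d calcbridge < db/migrations/V001__initial_schema.sql
# Option 2: Apply all migrations in filename order, stopping at the first failure
for migration in db/migrations/V*.sql; do
  echo "Applying: $migration"
  docker exec -i calcbridge-postgres psql -v ON_ERROR_STOP=1 -U calcbridge -d calcbridge < "$migration" || break
done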
# Option 3: Use the migration script
./scripts/apply_migration_docker.sh db/migrations/V019__grant_app_role_metrics_mapping.sql
Migration Best Practices¶
Migration Safety
- Always back up the database before migrations
- Test migrations in staging first
- Use transactions for rollback capability
- Never modify already-applied migrations
# Backup before migration
docker exec calcbridge-postgres pg_dump -U calcbridge calcbridge > backup_$(date +%Y%m%d).sql
# Apply migration
docker exec -i calcbridge-postgres psql -U calcbridge -d calcbridge < db/migrations/V020__new_migration.sql
# Verify migration
docker exec calcbridge-postgres psql -U calcbridge -d calcbridge -c "\dt"
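Because already-applied migrations must never be modified or re-run out of order, it can be useful to compute the pending set explicitly rather than rely on shell glob order. The following sketch assumes the `V<number>__<name>.sql` naming seen in the examples above; `ordered_migrations` and `pending_migrations` are hypothetical helpers, and how the last applied version is recorded is left open because this guide does not specify it.

```python
import re

# Matches names such as V019__grant_app_role_metrics_mapping.sql
MIGRATION_PATTERN = re.compile(r"^V(\d+)__(\w+)\.sql$")


def ordered_migrations(filenames):
    """Return (version, filename) pairs sorted by numeric version.

    Raises ValueError for names that do not match the pattern or for
    two files that claim the same version number.
    """
    seen = {}
    for name in filenames:
        match = MIGRATION_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected migration filename: {name}")
        version = int(match.group(1))
        if version in seen:
            raise ValueError(f"duplicate version {version}: {seen[version]} and {name}")
        seen[version] = name
    return sorted(seen.items())


def pending_migrations(filenames, last_applied_version):
    """Return the filenames whose version is greater than last_applied_version."""
    return [name for version, name in ordered_migrations(filenames) if version > last_applied_version]
```

Sorting on the parsed integer rather than the raw string keeps the order correct even if a file is ever named without zero padding (for example `V2__...` next to `V10__...`).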
SSL/TLS Configuration¶
Database SSL¶
For production, use verify-full SSL mode:
# Environment variables
DATABASE_SSL_MODE=verify-full
DATABASE_SSL_CERT_PATH=/certs/client-cert.pem
DATABASE_SSL_KEY_PATH=/certs/client-key.pem
DATABASE_SSL_CA_PATH=/certs/ca-cert.pem
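With `verify-full`, the client checks both that the server certificate chains to the configured CA and that the host name matches the certificate. How CalcBridge passes these variables to its driver is not documented here; the sketch below only illustrates how they could map onto the standard libpq parameters `sslmode`, `sslcert`, `sslkey`, and `sslrootcert`. The function name `libpq_ssl_params` is hypothetical.

```python
def libpq_ssl_params(env):
    """Map the DATABASE_SSL_* variables onto libpq connection parameters.

    Variables that are unset or empty are skipped, so libpq falls back
    to its own defaults for them.
    """
    mapping = {
        "DATABASE_SSL_MODE": "sslmode",
        "DATABASE_SSL_CERT_PATH": "sslcert",
        "DATABASE_SSL_KEY_PATH": "sslkey",
        "DATABASE_SSL_CA_PATH": "sslrootcert",
    }
    return {param: env[var] for var, param in mapping.items() if env.get(var)}
```

The resulting dictionary could be passed as keyword arguments to a libpq-based driver's connect call, or appended to the connection URL as query parameters.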
Reverse Proxy with TLS¶
Use nginx or Traefik as a reverse proxy with TLS termination:
nginx.conf
upstream calcbridge {
    server api:8000;
}

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    ssl_certificate /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "DENY" always;

    location / {
        proxy_pass http://calcbridge;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Health Checks¶
CalcBridge provides multiple health check endpoints:
| Endpoint | Purpose | Response |
|---|---|---|
| /health | Basic liveness | {"status": "healthy"} |
| /health/live | Kubernetes liveness | {"status": "alive"} |
| /health/ready | Kubernetes readiness | Checks DB + cache |
| /health/detailed | Full system status | All component statuses |
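The difference between liveness and readiness is that a readiness endpoint should fail whenever a dependency it needs is unavailable. As an illustration only (this is not CalcBridge's implementation, and the response fields are assumptions), a readiness handler can be reduced to running one check per dependency and returning 503 if any of them fails:

```python
def readiness(checks):
    """Run each dependency check and return (http_status, body).

    `checks` maps a component name to a zero-argument callable that
    returns True when the component is reachable.
    """
    components = {}
    for name, check in checks.items():
        try:
            components[name] = "ok" if check() else "failed"
        except Exception as exc:  # a raising check counts as a failure
            components[name] = f"error: {exc}"
    ready = all(state == "ok" for state in components.values())
    body = {"status": "ready" if ready else "not ready", "components": components}
    return (200 if ready else 503), body
```

In a real service the callables would run something cheap, such as `SELECT 1` against the database and `PING` against the cache.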
Kubernetes Probes¶
livenessProbe:
  httpGet:
    path: /health/live
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
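These settings determine how quickly Kubernetes reacts to a failure. As a rough upper bound (a back-of-the-envelope estimate, not an exact model of kubelet scheduling), a failure that begins just after a successful probe is acted on after `failureThreshold` further probe periods plus one probe timeout. The helper name below is hypothetical.

```python
def worst_case_detection_seconds(period_seconds, failure_threshold, timeout_seconds):
    """Approximate upper bound on the time between a failure and the probe action."""
    return period_seconds * failure_threshold + timeout_seconds


# Values from the probe configuration above.
liveness = worst_case_detection_seconds(period_seconds=30, failure_threshold=3, timeout_seconds=5)
readiness = worst_case_detection_seconds(period_seconds=10, failure_threshold=3, timeout_seconds=5)
print(liveness, readiness)  # 95 35
```

With these values a hung container is restarted after roughly a minute and a half, while an instance that loses its database or cache is removed from service in about half a minute.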
Monitoring Setup¶
Prometheus Metrics¶
CalcBridge exposes Prometheus metrics at /metrics:
# prometheus.yml scrape config
scrape_configs:
  - job_name: "calcbridge"
    static_configs:
      - targets: ["api:8000"]
    metrics_path: /metrics
    scheme: http
Key Metrics¶
| Metric | Type | Description |
|---|---|---|
| http_requests_total | Counter | Total HTTP requests |
| http_request_duration_seconds | Histogram | Request latency |
| celery_task_duration_seconds | Histogram | Task processing time |
| db_connection_pool_size | Gauge | Database pool usage |
| rate_limit_exceeded_total | Counter | Rate limit violations |
Grafana Dashboards¶
Import the provided dashboards from config/grafana/dashboards/:
- CalcBridge Overview: Request rates, latency, error rates
- Celery Workers: Task throughput, queue depths, worker status
- Database Performance: Connection pool, query latency
Alerting Configuration¶
Alertmanager Integration¶
alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default'

receivers:
  - name: 'default'
    slack_configs:
      - api_url: '${SLACK_WEBHOOK_URL}'
        channel: '#alerts'
        send_resolved: true
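To our knowledge Alertmanager does not expand environment variables such as `${SLACK_WEBHOOK_URL}` inside its configuration file, so the placeholder has to be filled in before the file is loaded (check the documentation for your Alertmanager version, which may also offer file-based secrets). One way to do that, sketched here with the standard library, is to treat the file above as a template; `render_config` is a hypothetical helper name.

```python
from string import Template


def render_config(template_text, env):
    """Replace ${VAR} placeholders with values from env.

    Raises KeyError when a referenced variable is missing, which is
    preferable to silently writing an empty webhook URL. A literal
    dollar sign in the template must be written as $$.
    """
    return Template(template_text).substitute(env)
```

A deployment step could read the template, call `render_config(text, os.environ)`, and write the result to the path Alertmanager is started with.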
Alert Rules¶
alert_rules.yml
groups:
  - name: calcbridge
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High error rate detected
      - alert: SlowResponses
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: API response times are slow
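Note that the `HighErrorRate` threshold is an absolute rate, not a percentage: `rate(...[5m]) > 0.1` means more than 0.1 server errors per second for a given series, or about 30 errors over the five-minute window. The simplified calculation below shows the idea behind `rate()` for two counter samples; the real PromQL function also extrapolates to the window edges, so treat this as an approximation. `per_second_rate` is a hypothetical helper.

```python
def per_second_rate(first, last):
    """Average per-second increase between two (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    # A counter that went down was reset (for example by a restart), so
    # the new value is taken as the increase since the reset.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


# 45 new 5xx responses in 300 seconds is 0.15/s, above the 0.1/s threshold.
print(per_second_rate((0, 100), (300, 145)) > 0.1)  # True
```

If a relative threshold is preferred, the same approach applies to the ratio of the 5xx rate to the total request rate.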
Scaling Guidelines¶
Horizontal Scaling¶
| Component | Scaling Strategy | Notes |
|---|---|---|
| API | Stateless, scale freely | Use load balancer |
| Celery Workers | Scale by queue depth | Monitor memory usage |
| PostgreSQL | Read replicas | Consider PgBouncer |
| Valkey | Cluster mode | For high availability |
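When scaling the API or workers horizontally, the database connection limit is usually the first ceiling that is reached, which is why PgBouncer is suggested above. A quick estimate, assuming each process keeps its own pool sized by `DATABASE_POOL_SIZE` and `DATABASE_MAX_OVERFLOW` (an assumption about the application, not something this guide states), looks like this:

```python
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100  # PostgreSQL's stock default for max_connections


def max_db_connections(processes, pool_size=20, max_overflow=10):
    """Upper bound on connections if every process fills its pool and overflow."""
    return processes * (pool_size + max_overflow)


api = max_db_connections(processes=2)          # two API replicas
workers = max_db_connections(processes=2 * 4)  # two worker replicas x concurrency 4
print(api, workers, api + workers > POSTGRES_DEFAULT_MAX_CONNECTIONS)  # 60 240 True
```

Under these assumptions the example deployment can already exceed a default PostgreSQL configuration, so either lower the pool settings, raise `max_connections`, or place a pooler in front of the database.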
Celery Worker Scaling¶
# Scale workers based on workload
docker compose -f docker-compose.prod.yml up -d --scale celery-worker=4
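# Or use separate workers for different queues
# (assumes a celery-worker-calc service is defined in the compose file;
# the example compose file in this guide does not define one)
docker compose -f docker-compose.prod.yml up -d \
  --scale celery-worker=2 \
  --scale celery-worker-calc=4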
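Scaling by queue depth needs some rule for turning a backlog into a worker count. The sketch below is one simple heuristic, not a CalcBridge feature: the function name, the throughput figure, and the limits are all assumptions to be replaced with measured values.

```python
import math


def workers_needed(queue_depth, tasks_per_worker_per_minute, drain_minutes,
                   min_workers=1, max_workers=8):
    """Workers required to drain the current backlog within drain_minutes, clamped to limits."""
    if tasks_per_worker_per_minute <= 0 or drain_minutes <= 0:
        raise ValueError("throughput and drain time must be positive")
    capacity_per_worker = tasks_per_worker_per_minute * drain_minutes
    wanted = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, wanted))


# 1200 queued tasks, 60 tasks/min per worker, drain within 5 minutes.
print(workers_needed(1200, 60, 5))  # 4
```

The result can be fed into the `--scale celery-worker=N` command shown above. The upper limit matters because each extra worker also consumes memory and database connections.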
Backup and Recovery¶
Database Backup¶
# Create backup
docker exec calcbridge-postgres pg_dump -U calcbridge calcbridge | gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
# Automated daily backup (cron)
0 2 * * * docker exec calcbridge-postgres pg_dump -U calcbridge calcbridge | gzip > /backups/calcbridge_$(date +\%Y\%m\%d).sql.gz
# Restore from backup
gunzip -c backup_20250101.sql.gz | docker exec -i calcbridge-postgres psql -U calcbridge calcbridge
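Because the backup commands pipe `pg_dump` into `gzip`, a failed dump can still leave a valid-looking `.gz` file behind. A cheap sanity check, sketched below, is to confirm that the archive decompresses fully and ends with the trailer comment that plain-format `pg_dump` output normally contains. `backup_looks_complete` is a hypothetical helper, and the check is no substitute for a periodic test restore.

```python
import gzip
import zlib

DUMP_TRAILER = b"PostgreSQL database dump complete"


def backup_looks_complete(path):
    """Return True if the gzip file decompresses cleanly and ends with the pg_dump trailer."""
    tail = b""
    try:
        with gzip.open(path, "rb") as handle:
            while True:
                chunk = handle.read(1024 * 1024)
                if not chunk:
                    break
                # Keep only the last few kilobytes so large dumps stay cheap to check.
                tail = (tail + chunk)[-4096:]
    except (OSError, EOFError, zlib.error):
        # Missing file, not a gzip file, truncated archive, or corrupt data.
        return False
    return DUMP_TRAILER in tail
```

A cron job could run this after each backup and raise an alert when it returns `False`.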
Valkey Backup¶
# Trigger RDB snapshot
docker exec calcbridge-valkey valkey-cli BGSAVE
# Copy backup file
docker cp calcbridge-valkey:/data/dump.rdb ./backups/valkey_$(date +%Y%m%d).rdb
Deployment Checklist¶
Before deploying to production:
- All environment variables set with secure values
- JWT_SECRET_KEY and ENCRYPTION_MASTER_KEY are unique, random, 32+ chars
- ENVIRONMENT=production and DEBUG=false
- Database SSL mode is verify-full
- Valkey SSL is enabled
- Database migrations applied
- Health checks configured and passing
- Monitoring and alerting configured
- Backup strategy implemented
- Load testing completed
- Security scan passed
- TLS certificates installed and valid
- CORS origins correctly configured
- Rate limiting enabled