Checklist before going live
Rotate the setup token
Change ENGRAM_SETUP_TOKEN to a random secret before exposing the server to any network. This token creates tenants, so treat it like a root password.
Use managed Postgres
Replace the Docker Compose Postgres with a managed database that has automated backups and point-in-time recovery. Recommended: Neon, Supabase, or AWS RDS with pgvector enabled.
Set LLM_PROVIDER=none for high throughput
If you don’t need LLM-based contradiction detection, set LLM_PROVIDER=none. This eliminates all external API calls on the POST /v1/memories hot path and reduces P99 latency from ~3s to under 150ms.
Restrict network access
Engram’s API should not be directly internet-accessible in most deployments. Put it behind a reverse proxy (Nginx, Caddy, or a cloud load balancer) and restrict access to the setup endpoint.
Connection pooling
Engram opens one connection per request by default. At high QPS, add PgBouncer in front of Postgres, then point DATABASE_URL at PgBouncer, not Postgres directly.
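
Repointing the URL usually means changing only the host and port while keeping the credentials, database name, and query parameters. The following is a minimal Python sketch of that rewrite. It assumes PgBouncer listens on its default port, 6432; the function name and the host names are hypothetical placeholders, not part of Engram.

```python
from urllib.parse import urlsplit, urlunsplit


def point_at_pgbouncer(database_url: str, pgbouncer_host: str, pgbouncer_port: int = 6432) -> str:
    """Return database_url with its host and port swapped for PgBouncer's.

    Credentials, database name, and query parameters are left untouched.
    The host is a placeholder; 6432 is PgBouncer's default listen port.
    """
    parts = urlsplit(database_url)
    # Everything before the last "@" in netloc is the user:password section.
    userinfo, at_sign, _old_host = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at_sign}{pgbouncer_host}:{pgbouncer_port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Hypothetical direct connection string, for illustration only.
    direct = "postgres://engram:secret@db.internal:5432/engram?sslmode=require"
    print(point_at_pgbouncer(direct, "pgbouncer.internal"))
```

Whatever value results would be set as DATABASE_URL on every Engram instance, so that no instance connects to Postgres directly.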
Reverse proxy with TLS
Caddy (simplest)
Nginx
Environment variables (production)
Health checks
Use the health endpoint for load balancer probes:
- Interval: 10s
- Timeout: 5s
- Unhealthy threshold: 3 consecutive failures
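
With these settings, a probe that exceeds the 5s timeout counts as a failure, and an instance is taken out of rotation only after three failures in a row, so a single slow response does not trigger a failover. The following is a minimal Python sketch of that bookkeeping. HealthTracker is a hypothetical name, the probe request itself is left out, and resetting on a single success is a simplification (many load balancers also apply a separate healthy threshold).

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Track consecutive probe failures against an unhealthy threshold."""

    unhealthy_threshold: int = 3  # consecutive failures before marking unhealthy
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True while the target counts as healthy."""
        if probe_ok:
            # Simplification: one successful probe clears the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.unhealthy_threshold


if __name__ == "__main__":
    tracker = HealthTracker()
    # Two failures, a recovery, then three failures in a row.
    for ok in [False, False, True, False, False, False]:
        print(ok, tracker.record(ok))
```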
Monitoring
Engram exposes a JSON metrics endpoint. Alert on the following metrics:

| Metric | Alert threshold |
|---|---|
| recall_latency_p95 | > 500ms |
| memory_store_errors | > 1% error rate |
| database_connections | > 80% of pool |
| decay_job_last_run | > 2 hours ago |

A /metrics endpoint in Prometheus exposition format is on the roadmap (P4-5).
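
The thresholds in the table above can be checked by a small script that polls the JSON endpoint. The following Python sketch shows only the comparison step. The shape of the JSON payload is an assumption, since it is not documented here: the sketch assumes a flat object keyed by the metric names in the table, with latency in milliseconds, error rate and pool usage as fractions between 0 and 1, and the last decay run as a Unix timestamp in seconds. Adjust the parsing to match the real payload.

```python
import time
from typing import Any, Dict, List, Optional

# Thresholds taken from the table; the units are assumptions (see above).
MAX_RECALL_P95_MS = 500.0
MAX_STORE_ERROR_RATE = 0.01        # 1% error rate
MAX_POOL_USAGE = 0.80              # 80% of the connection pool
MAX_DECAY_AGE_SECONDS = 2 * 60 * 60  # 2 hours


def find_alerts(metrics: Dict[str, Any], now: Optional[float] = None) -> List[str]:
    """Return the names of metrics that exceed their alert threshold."""
    current_time = time.time() if now is None else now
    decay_age = current_time - metrics["decay_job_last_run"]
    checks = [
        ("recall_latency_p95", metrics["recall_latency_p95"] > MAX_RECALL_P95_MS),
        ("memory_store_errors", metrics["memory_store_errors"] > MAX_STORE_ERROR_RATE),
        ("database_connections", metrics["database_connections"] > MAX_POOL_USAGE),
        ("decay_job_last_run", decay_age > MAX_DECAY_AGE_SECONDS),
    ]
    return [name for name, breached in checks if breached]
```

Passing now explicitly keeps the function deterministic, which makes it easy to unit test the staleness check for the decay job.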
Backup and recovery
Neon / Supabase
Both support point-in-time recovery (PITR) out of the box. No additional configuration needed.
Self-managed Postgres
Set up pg_dump on a cron schedule.
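
One way to do this is a small wrapper script that cron invokes. The following Python sketch builds a pg_dump command that writes a timestamped custom-format archive. The function names, the backup directory, and the engram- file prefix are hypothetical; only the pg_dump flags (--format, --file, --dbname) are standard.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def build_pg_dump_command(database_url: str, backup_dir: str, now: datetime) -> List[str]:
    """Build a pg_dump invocation that writes a timestamped custom-format archive."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_dir) / f"engram-{stamp}.dump"
    return ["pg_dump", "--format=custom", f"--file={target}", f"--dbname={database_url}"]


def run_backup(database_url: str, backup_dir: str) -> None:
    """Run pg_dump once.

    A non-zero exit status raises CalledProcessError, so a failed backup
    surfaces as a failed cron job instead of passing silently.
    """
    command = build_pg_dump_command(database_url, backup_dir, datetime.now(timezone.utc))
    subprocess.run(command, check=True)
```

Custom-format archives are restored with pg_restore. Note that a connection string passed on the command line is visible to other local users in the process list, so a .pgpass file is a safer place for the password.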
Scaling
Engram is stateless on the HTTP path; the only shared state is Postgres. To scale horizontally:
- Run multiple server instances behind a load balancer
- Ensure all instances point to the same DATABASE_URL
- Background workers (decay, consolidation) use Postgres advisory locks to prevent duplicate runs, so only one instance runs them at a time, regardless of replica count
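
To illustrate why advisory locks prevent duplicate runs, the following Python sketch shows the general pattern, not Engram's actual worker code. Each instance tries to take the same lock with pg_try_advisory_lock; only the winner runs the job, and the others skip that cycle. The lock key and function name are hypothetical, and cursor stands for any DB-API cursor on a Postgres connection (for example one from psycopg).

```python
from typing import Callable

# Hypothetical lock key; any 64-bit integer shared by all instances would do.
DECAY_JOB_LOCK_KEY = 727001


def run_exclusively(cursor, lock_key: int, job: Callable[[], None]) -> bool:
    """Run job() only if this session wins the advisory lock.

    Returns True if the job ran, False if another session held the lock.
    """
    # pg_try_advisory_lock returns immediately instead of waiting for the lock.
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (lock_key,))
    (acquired,) = cursor.fetchone()
    if not acquired:
        return False
    try:
        job()
    finally:
        # Session-level advisory locks persist until released or the session ends.
        cursor.execute("SELECT pg_advisory_unlock(%s)", (lock_key,))
    return True
```

Because the lock lives in Postgres rather than in any one process, the guarantee holds no matter how many replicas run behind the load balancer. One caveat for this sketch: session-level advisory locks are tied to a single server connection, so they do not work through PgBouncer in transaction pooling mode.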