Observability

What to monitor, how to check health, and what the logs tell you.

Health check

The container runs a built-in health check every 10 seconds:

pgagroal-cli -c /etc/pgagroal/pgagroal.conf ping

Exit code 0 means the pooler daemon is running and accepting management commands. This is what Docker and Kubernetes use to determine container health.
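In Kubernetes, the same command can back a liveness probe. A minimal sketch — the probe timings here are assumptions, not chart defaults:

```yaml
livenessProbe:
  exec:
    # Same command the built-in health check runs
    command: ["pgagroal-cli", "-c", "/etc/pgagroal/pgagroal.conf", "ping"]
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```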

What the health check does not tell you

The ping command checks the pgagroal process, not the PostgreSQL backend. A healthy pooler with an unreachable backend will still pass the health check. This is by design — the pooler should stay running so it can recover when the backend returns. To verify end-to-end connectivity, run an actual query through the pooler.
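A quick end-to-end check, assuming the pooler listens on its default port 2345; the database and user names are placeholders:

```shell
# Connect through the pooler, not directly to PostgreSQL,
# and run a trivial query to prove the full path works.
psql -h localhost -p 2345 -U myuser -d mydb -c "SELECT 1;"
```

A non-zero exit here with a passing health check points at the backend, not the pooler.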

Logging

The container logs to stdout (console mode). All output goes to the container's log stream, which Docker and Kubernetes collect automatically.

# View logs
docker logs pgagroal

# Follow logs
docker logs -f pgagroal

# Kubernetes
kubectl logs -f deployment/pgagroal -n pgagroal

Log levels

Level           When to use
fatal           Only unrecoverable errors. Very quiet.
error           Errors that affect connections while the daemon continues.
warn            Recommended for production. Catches problems without noise.
info            Default. Includes startup messages and connection events.
debug1–debug5   Increasing verbosity. Use temporarily for troubleshooting.
trace           Wire-level detail. Extremely verbose. Never in production.

Set the level via PGAGROAL_LOG_LEVEL=warn. Changing it requires a container restart; in-flight client connections are dropped, but clients simply reconnect through the pool.
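For example, to restart the container at a quieter level (container name and image tag follow the examples elsewhere on this page):

```shell
# Recreate the container with the new log level
docker rm -f pgagroal
docker run -d --name pgagroal \
  -e PGAGROAL_LOG_LEVEL=warn \
  elevarq/pgagroal:1.0.0
```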

What to monitor

Even without a metrics endpoint, you can assess pool health by watching a few key indicators:

Signal                          Healthy                       Problem
Connection errors in app logs   None                          "connection refused" or timeouts suggest pool exhaustion
Client connection latency       < 1 ms to acquire             Consistently > 5 ms means clients are waiting for a free connection
Backend connection count        Well below MAX_CONNECTIONS    At or near the limit means the pool is saturated
Container restarts              Zero                          Frequent restarts indicate OOM or liveness probe failures
pgagroal-cli ping               Exit code 0                   Non-zero means the daemon is down or unresponsive
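One way to watch the backend connection count is to ask PostgreSQL directly. A sketch — the host and user names are placeholders for your setup:

```shell
# Count server-side connections held open by the pooler's database user.
# Compare the result against MAX_CONNECTIONS.
psql -h postgres-host -U postgres -c \
  "SELECT count(*) FROM pg_stat_activity WHERE usename = 'myuser';"
```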

Prometheus metrics

pgagroal can expose a Prometheus-compatible metrics endpoint. In the Helm chart, enable it with metrics.enabled: true and metrics.port: 9187.
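In values.yaml form, the two chart options mentioned above would look like this:

```yaml
metrics:
  enabled: true   # expose the Prometheus endpoint
  port: 9187
```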

# Prometheus scrape config
scrape_configs:
  - job_name: pgagroal
    static_configs:
      - targets: ["pgagroal.pgagroal.svc.cluster.local:9187"]
    scrape_interval: 15s

The metrics endpoint is disabled by default. Enable it only when you have a Prometheus instance ready to scrape it.

Healthy vs unhealthy

Healthy pool

  • pgagroal-cli ping returns 0
  • Connections acquire in < 1 ms
  • Backend connection count is stable and below limit
  • No authentication errors in logs
  • Zero container restarts

Unhealthy pool

  • Applications report "connection refused" or timeouts
  • Backend connection count is at MAX_CONNECTIONS
  • Logs show repeated connection errors or auth failures
  • Container is restarting (check liveness probe logs)
  • Queries succeed but with unusually high latency

See also: Troubleshooting for diagnosing unhealthy pools.
