Observability & Monitoring
Health checks, metrics, logs, and alerts
Health & Readiness
GET /health: basic liveness.
GET /ready: readiness with DB check.
GET /ops: agent ops manifest + runbooks (requires token).
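As a rough illustration, an external probe might poll the two unauthenticated endpoints like this. It is a minimal sketch, not part of the service: the `probe` and `http_status` helpers are hypothetical, and treating any 2xx status as a pass is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Paths come from the endpoint list; /ops is left out because it requires a token.
PROBES = {"liveness": "/health", "readiness": "/ready"}


def probe(base_url: str, get_status: Callable[[str], int]) -> Dict[str, bool]:
    """Return {probe name: passed} for each endpoint.

    get_status is any callable that maps a URL to an HTTP status code.
    Assumption: any 2xx status counts as a pass.
    """
    base = base_url.rstrip("/")
    return {
        name: 200 <= get_status(base + path) < 300
        for name, path in PROBES.items()
    }


def http_status(url: str, timeout: float = 2.0) -> int:
    """Fetch url with the standard library and return its status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise, but still carry a status code.
        return err.code
```

Calling `probe("http://localhost:8080", http_status)` would hit a live instance (the host and port are placeholders). Passing a stub for `get_status` keeps the check testable without a network.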
Metrics
GET /metrics exposes Prometheus-compatible counters and histograms.
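For quick inspection outside Prometheus, the text exposition format can be read with a few lines of Python. This is a simplified sketch: the metric names in `SAMPLE` are made up for illustration, and the parser ignores timestamps and escape sequences inside label values.

```python
from typing import Dict

# Hypothetical scrape output; the real metric names may differ.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 42
process_uptime_seconds 12.5
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each sample (name plus label set, as written) to its value."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            key, *rest = line.split()
        samples[key] = float(rest[0])  # rest[1], if present, is a timestamp
    return samples
```

`parse_metrics(SAMPLE)` yields one entry per sample line, keyed by the metric name and its labels.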
If METRICS_TOKEN is set, provide:
Ops Endpoint
If OPS_TOKEN is set, provide:
POST /ops/alert accepts a generic payload or an Alertmanager payload. For Alertmanager, send the default JSON body.
Generic payload supports runbookTag for auto-tagging ops tasks.
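The two payload shapes can be told apart by their top-level keys. The sketch below is one plausible heuristic, not the service's actual logic: Alertmanager webhook bodies carry a `version` string and an `alerts` list, while the `title` field in the generic payload is hypothetical (only `runbookTag` is documented here).

```python
from typing import Any, Dict


def is_alertmanager_payload(body: Dict[str, Any]) -> bool:
    """Heuristic: Alertmanager webhooks have a "version" and an "alerts" list."""
    return "version" in body and isinstance(body.get("alerts"), list)


def generic_alert(title: str, runbook_tag: str) -> Dict[str, Any]:
    """Build a generic payload.

    "title" is a hypothetical field for illustration; runbookTag is the
    documented key used to auto-tag ops tasks.
    """
    return {"title": title, "runbookTag": runbook_tag}


# Trimmed Alertmanager-style body (real webhooks carry more fields).
ALERTMANAGER_BODY = {
    "version": "4",
    "status": "firing",
    "alerts": [{"status": "firing", "labels": {"alertname": "HighErrorRate"}}],
}
```

With these definitions, `is_alertmanager_payload(ALERTMANAGER_BODY)` is true and `is_alertmanager_payload(generic_alert("DB latency", "db"))` is false.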
GET /ops/features returns current feature flags and the allowlist.
POST /ops/features toggles an allowed flag.
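The allowlist behavior can be pictured with a small in-memory model. This is a sketch under assumptions, not the server implementation: the class name, the response shape of `snapshot`, and the flag names are all hypothetical.

```python
from typing import Dict, Iterable


class FeatureFlags:
    """In-memory flags where only allowlisted names may be toggled."""

    def __init__(self, flags: Dict[str, bool], allowlist: Iterable[str]):
        self._flags = dict(flags)
        self._allowlist = frozenset(allowlist)

    def snapshot(self) -> Dict[str, object]:
        """Roughly what a GET might return: current flags plus the allowlist."""
        return {"flags": dict(self._flags), "allowlist": sorted(self._allowlist)}

    def toggle(self, name: str) -> bool:
        """Flip an allowed flag and return its new value."""
        if name not in self._allowlist:
            raise PermissionError(f"flag {name!r} is not on the allowlist")
        self._flags[name] = not self._flags.get(name, False)
        return self._flags[name]


# Hypothetical flag names, for illustration only.
flags = FeatureFlags(
    {"betaSearch": False, "newBilling": True},
    allowlist=["betaSearch"],
)
```

Here `flags.toggle("betaSearch")` flips the flag, while `flags.toggle("newBilling")` raises `PermissionError` because that flag is not on the allowlist.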
Alerts & SLOs
Sample Prometheus alert rules live in:
deploy/alerts/prometheus.rules.yml
SLO definitions live in:
docs/operations/slo.md
Prometheus Scrape Example
Add a scrape job similar to:
Logs
API logs are structured JSON with request IDs.
Fields:
requestId
method
path
status
durationMs
Use the X-Request-Id or X-Trace-Id header to correlate client-side issues with server logs.
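Given a request ID reported by a client, the matching log entries can be pulled from a JSON-lines stream. This is a minimal sketch: the sample lines are invented, and it assumes one JSON object per line with the fields listed above.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Invented log lines using the documented fields.
LOG_LINES = [
    '{"requestId": "req-1", "method": "GET", "path": "/health", "status": 200, "durationMs": 3}',
    '{"requestId": "req-2", "method": "POST", "path": "/ops/alert", "status": 500, "durationMs": 120}',
    "not json, e.g. a stray startup banner",
]


def find_request(lines: Iterable[str], request_id: str) -> Iterator[Dict[str, Any]]:
    """Yield parsed log entries whose requestId matches request_id."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if isinstance(entry, dict) and entry.get("requestId") == request_id:
            yield entry
```

For example, `list(find_request(LOG_LINES, "req-2"))` returns the single entry for the failed POST, including its status and durationMs.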