Observability & Monitoring

Health checks, metrics, logs, and alerts

Health & Readiness

  • GET /health: basic liveness check.
  • GET /ready: readiness check, including a database check.
  • GET /ops: agent ops manifest and runbooks (requires a token).
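
A monitoring script can combine the two probes listed above. The following is a minimal sketch, assuming both endpoints answer with a 2xx status on success and a non-2xx status (or no response) on failure; this page does not specify status codes, and the helper names `probe` and `summarize` are hypothetical.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 3.0) -> bool:
    """Return True if GET base_url + path answers with a 2xx status."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, DNS failures, and HTTP
        # error statuses (urllib raises HTTPError for 4xx/5xx).
        return False


def summarize(alive: bool, ready: bool) -> str:
    """Map the two probe results to a coarse service state."""
    if not alive:
        return "down"
    return "ready" if ready else "not-ready"
```

For example, `summarize(probe(base, "/health"), probe(base, "/ready"))` yields "not-ready" when the process is up but its readiness check fails.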

Metrics

GET /metrics exposes Prometheus-compatible counters and histograms.

If METRICS_TOKEN is set, send it as a bearer token in the request header:

Authorization: Bearer <METRICS_TOKEN>
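
To check the endpoint by hand, a client can attach the header and read the Prometheus text format. The sketch below uses only the Python standard library; the helper names are hypothetical, and the parser is deliberately simplified (it ignores timestamps and does not validate label syntax).

```python
import urllib.request
from typing import Optional


def metrics_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET /metrics request, adding the bearer token when one is given."""
    req = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def parse_samples(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Split after the closing brace so spaces inside label values survive.
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:
            series, rest = line.split(None, 1)
        samples[series] = float(rest.split()[0])
    return samples
```

Passing the request to `urllib.request.urlopen` and decoding the body gives the text that `parse_samples` expects.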

Ops Endpoint

If OPS_TOKEN is set, send it as a bearer token in the request header:

Authorization: Bearer <OPS_TOKEN>

POST /ops/alert accepts either a generic payload or an Alertmanager payload. For Alertmanager, send the default JSON body unchanged. The generic payload supports a runbookTag field, which is used to auto-tag ops tasks.
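
A generic alert can be posted with any HTTP client. The sketch below builds such a request; only `runbookTag` comes from this page. The other payload fields shown in the usage note, the JSON content type, and the helper name `alert_request` are assumptions for illustration, so check the actual payload schema before relying on them.

```python
import json
import urllib.request
from typing import Optional


def alert_request(base_url: str, payload: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /ops/alert request carrying a JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/ops/alert", data=body, method="POST"
    )
    # Assumption: the endpoint expects a JSON request body.
    req.add_header("Content-Type", "application/json")
    if token:
        # Same bearer scheme as the other /ops endpoints.
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

A call might look like `alert_request(base, {"title": "DB latency high", "runbookTag": "db-latency"}, token)`, where `title` is a hypothetical field and `db-latency` a hypothetical tag value.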

GET /ops/features returns current feature flags and the allowlist. POST /ops/features toggles an allowed flag.

Alerts & SLOs

Sample Prometheus alert rules live in:

  • deploy/alerts/prometheus.rules.yml

SLO definitions live in:

  • docs/operations/slo.md

Prometheus Scrape Example

Add a scrape job similar to:

scrape_configs:
  - job_name: recursiv-api
    metrics_path: /metrics
    scheme: https
    bearer_token: ${METRICS_TOKEN}
    static_configs:
      - targets:
          - api.recursiv.io
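
Prometheus does not expand environment variables in its configuration file, so the ${METRICS_TOKEN} placeholder has to be filled in before the file is loaded. One way to do that, sketched here with Python's standard `string.Template` (the helper name `render_config` is hypothetical):

```python
from string import Template
from typing import Mapping


def render_config(template_text: str, values: Mapping[str, str]) -> str:
    """Replace ${NAME} placeholders with values, failing loudly on gaps.

    Raises KeyError if a placeholder has no value. A literal "$" elsewhere
    in the file (for example "$1" in a relabel rule) must be written as "$$".
    """
    return Template(template_text).substitute(values)
```

Calling `render_config(template_text, os.environ)` at deploy time writes the real token into the rendered file. If the token can contain characters that are special in YAML, quote the placeholder in the template.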

Logs

API logs are structured JSON with request IDs.

Fields:

  • requestId
  • method
  • path
  • status
  • durationMs

Use the X-Request-Id or X-Trace-Id header to correlate client-side issues with server logs.
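
Because each log line is a JSON object, correlation can be done with a few lines of scripting. The sketch below assumes one JSON object per line and uses only the fields listed above; the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def parse_logs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON log line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON output such as a stack trace
        if isinstance(record, dict):
            yield record


def find_request(lines: Iterable[str], request_id: str) -> list[dict]:
    """Return every log record that carries the given requestId."""
    return [r for r in parse_logs(lines) if r.get("requestId") == request_id]


def slow_requests(lines: Iterable[str], threshold_ms: float) -> list[dict]:
    """Return records whose durationMs is at or above the threshold."""
    return [
        r for r in parse_logs(lines)
        if isinstance(r.get("durationMs"), (int, float))
        and r["durationMs"] >= threshold_ms
    ]
```

Given the request ID reported by a client, `find_request(open(log_path), request_id)` returns the matching server-side records, where `log_path` is wherever the API logs are collected.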