Observability & Monitoring

Health & Readiness

GET /health — basic liveness.
GET /ready — readiness with DB check.
GET /ops — agent ops manifest + runbooks (requires token).
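
As an illustration of how a deploy script or external monitor might use the first two endpoints, here is a minimal Python sketch. It assumes both endpoints signal success with HTTP 200, which this document does not state. The `probe` and `http_status` names are hypothetical helpers, not part of the API.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code worth reporting.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def probe(base_url: str, fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Check /health and /ready; a check passes on HTTP 200 (an assumption)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in ("/health", "/ready")}
```

Calling `probe("https://api.recursiv.io")` would return a dict such as `{"/health": True, "/ready": False}` when the process is up but the database check fails. The `fetch` parameter exists so the logic can be exercised without a network.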

Metrics

GET /metrics exposes Prometheus-compatible counters and histograms.

If METRICS_TOKEN is set, requests to /metrics must include this header:

Authorization: Bearer <METRICS_TOKEN>
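
A minimal Python sketch of a client that attaches this header only when a token is configured. The `metrics_request` helper is hypothetical and uses only the standard library.

```python
import urllib.request
from typing import Optional


def metrics_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /metrics, adding the bearer header when a token is given."""
    req = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    if token:
        # Matches the header format shown above.
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

To fetch the metrics, pass the result to `urllib.request.urlopen(req, timeout=5)` and read the response body, for example with the token taken from `os.environ.get("METRICS_TOKEN")`.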

Ops Endpoint

If OPS_TOKEN is set, requests to the ops endpoints must include this header:

Authorization: Bearer <OPS_TOKEN>

POST /ops/alert accepts either a generic payload or an Alertmanager payload. For Alertmanager, send its default JSON body unchanged. The generic payload supports a runbookTag field for auto-tagging ops tasks.
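
The two payload styles might look roughly like the Python sketch below. Only runbookTag is named in this document; the `title` field of the generic payload is a hypothetical placeholder. The Alertmanager body follows the version 4 webhook format that Alertmanager sends by default, filled with made-up sample values. All function names are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def generic_alert(title: str, runbook_tag: str) -> Dict[str, Any]:
    # "title" is a hypothetical field; only runbookTag comes from this document.
    return {"title": title, "runbookTag": runbook_tag}


def alertmanager_alert(alertname: str, severity: str) -> Dict[str, Any]:
    """Sample of the default Alertmanager webhook body (version 4), with made-up values."""
    labels = {"alertname": alertname, "severity": severity}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="%s"}' % alertname,
        "status": "firing",
        "receiver": "recursiv-ops",  # hypothetical receiver name
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": {},
        "externalURL": "http://alertmanager.example.test",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": {"summary": "sample alert"},
                "startsAt": "2024-01-01T00:00:00Z",
                "endsAt": "0001-01-01T00:00:00Z",
                "generatorURL": "http://prometheus.example.test/graph",
                "fingerprint": "0000000000000000",
            }
        ],
    }


def alert_request(base_url: str, payload: Dict[str, Any],
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /ops/alert request; the bearer header is added only if a token is given."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/ops/alert",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

In practice Alertmanager produces its body itself through a webhook receiver; `alertmanager_alert` is mainly useful for testing the endpoint by hand.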

GET /ops/features returns current feature flags and the allowlist. POST /ops/features toggles an allowed flag.
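
A client could check the allowlist locally before attempting a toggle, as in this Python sketch. The request body shape (`flag` and `enabled` keys) is purely an assumption, since this document does not specify it. The sketch also assumes the OPS_TOKEN header applies to this endpoint. `toggle_flag_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Iterable, Optional


def toggle_flag_request(base_url: str, flag: str, enabled: bool,
                        allowlist: Iterable[str],
                        token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /ops/features request, refusing flags outside the allowlist."""
    if flag not in set(allowlist):
        raise ValueError(f"flag {flag!r} is not in the allowlist")
    # Hypothetical body shape; adjust to whatever the endpoint actually expects.
    body = json.dumps({"flag": flag, "enabled": enabled}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/ops/features", data=body, method="POST"
    )
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

The allowlist argument would typically come from a prior GET /ops/features response, so a disallowed flag fails fast on the client instead of being rejected by the server.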

Alerts & SLOs

Sample Prometheus alert rules live in:

deploy/alerts/prometheus.rules.yml

SLO definitions live in:

docs/operations/slo.md

Prometheus Scrape Example

Add a scrape job similar to:

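scrape_configs:
  - job_name: recursiv-api
    metrics_path: /metrics
    scheme: https
    # Prometheus does not expand environment variables in this file;
    # substitute the real METRICS_TOKEN value (or use credentials_file).
    authorization:
      type: Bearer
      credentials: <METRICS_TOKEN>
    static_configs:
      - targets:
          - api.recursiv.io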

Logs

API logs are structured JSON with request IDs.

Fields:

requestId
method
path
status
durationMs

Use the X-Request-Id / X-Trace-Id headers to correlate client-side issues with the matching log entries.
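
As a sketch of that correlation step, the Python snippet below filters JSON log lines by requestId. It assumes one JSON object per line, which this document does not state, and `find_request` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List


def find_request(log_lines: Iterable[str], request_id: str) -> List[Dict[str, Any]]:
    """Return every log entry whose requestId matches the given ID."""
    matches = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON, such as startup banners
        if isinstance(entry, dict) and entry.get("requestId") == request_id:
            matches.append(entry)
    return matches


# Made-up sample lines using the fields listed above.
SAMPLE_LOGS = [
    '{"requestId": "abc", "method": "GET", "path": "/ready", "status": 503, "durationMs": 1200}',
    "server listening",
    '{"requestId": "def", "method": "GET", "path": "/health", "status": 200, "durationMs": 3}',
]

for entry in find_request(SAMPLE_LOGS, "abc"):
    print(entry["method"], entry["path"], entry["status"], entry["durationMs"])
```

Given a request ID reported by a client, the same function can be pointed at an open log file, since file objects iterate line by line.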