Observability

The gateway ships with three observability layers: Prometheus metrics, structured JSON logs, and a deep health endpoint.

Prometheus metrics

Metrics are exposed at GET /metrics in the standard Prometheus text format. Scrape this endpoint with your Prometheus server.
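If you want to inspect the endpoint without a full Prometheus server, the text exposition format is simple enough to parse by hand. The sketch below is a simplified parser (it does not handle label values containing commas or escaped quotes), and the sample payload is illustrative, not real gateway output.

```python
def parse_metrics(text):
    """Yield (name, labels, value) tuples from Prometheus text format."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name_part, value = line.rsplit(" ", 1)
        labels = {}
        if "{" in name_part:
            name, raw = name_part.split("{", 1)
            for pair in raw.rstrip("}").split(","):
                key, val = pair.split("=", 1)
                labels[key] = val.strip('"')
        else:
            name = name_part
        yield name, labels, float(value)

# Illustrative sample, not actual gateway output.
sample = """\
# TYPE ferrogw_requests_total counter
ferrogw_requests_total{provider="openai",model="gpt-4o",status="200"} 1423
ferrogw_cache_hits_total 98
"""
metrics = list(parse_metrics(sample))
```

For production use, prefer an actual Prometheus client library; this is only meant for quick spot checks.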

Available metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| ferrogw_requests_total | Counter | provider, model, status | Total requests processed |
| ferrogw_errors_total | Counter | provider, model, error_type | Total errors by type |
| ferrogw_request_duration_seconds | Histogram | provider, model | End-to-end request latency |
| ferrogw_provider_latency_seconds | Histogram | provider | Provider round-trip latency |
| ferrogw_tokens_total | Counter | provider, model, type | Token usage (type: prompt or completion) |
| ferrogw_cache_hits_total | Counter | — | Response cache hits |
| ferrogw_cache_misses_total | Counter | — | Response cache misses |
| ferrogw_circuit_breaker_state | Gauge | provider | Circuit breaker state (0=closed, 1=open, 2=half-open) |
| ferrogw_plugin_blocks_total | Counter | plugin, reason | Requests blocked by a plugin |
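Because ferrogw_circuit_breaker_state encodes its state numerically, dashboards and scripts usually want the names back. A minimal decoder, assuming only the three documented values:

```python
# Mapping taken from the metric's documented encoding:
# 0=closed, 1=open, 2=half-open. Anything else is unexpected.
CIRCUIT_STATES = {0: "closed", 1: "open", 2: "half-open"}

def breaker_state(value):
    """Translate a gauge sample into a human-readable state name."""
    return CIRCUIT_STATES.get(int(value), "unknown")
```

In Grafana the same mapping can be done declaratively with a value-mapping on the panel instead of code.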

Example Prometheus scrape config

```yaml
scrape_configs:
  - job_name: ferrogw
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: /metrics
    scrape_interval: 15s
```

Useful PromQL queries

```promql
# Request rate by provider
sum by (provider) (rate(ferrogw_requests_total[5m]))

# P99 request latency
histogram_quantile(0.99, rate(ferrogw_request_duration_seconds_bucket[5m]))

# Error rate percentage (aggregated so labels match across the division:
# errors carry error_type while requests carry status)
sum by (provider, model) (rate(ferrogw_errors_total[5m]))
  / sum by (provider, model) (rate(ferrogw_requests_total[5m])) * 100

# Token usage per minute
rate(ferrogw_tokens_total[1m]) * 60

# Cache hit ratio
rate(ferrogw_cache_hits_total[5m]) /
  (rate(ferrogw_cache_hits_total[5m]) + rate(ferrogw_cache_misses_total[5m]))

# Open circuit breakers
ferrogw_circuit_breaker_state == 1
```
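The queries above can also be issued programmatically via the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`). A minimal sketch using only the standard library; the Prometheus address (`localhost:9090`) is an assumption, not something the gateway configures:

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumed Prometheus server address

def instant_query(promql, base_url=PROMETHEUS_URL):
    """Run an instant query and return the result vector."""
    params = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}") as resp:
        return json.load(resp)["data"]["result"]

# Example: P99 request latency, URL-encoded for the query string.
P99 = "histogram_quantile(0.99, rate(ferrogw_request_duration_seconds_bucket[5m]))"
url_params = urllib.parse.urlencode({"query": P99})
```

URL-encoding matters here: PromQL is full of brackets, quotes, and braces that would otherwise corrupt the query string.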

Structured JSON logs

The gateway writes structured JSON to stdout. Each log line includes:

| Field | Description |
| --- | --- |
| time | ISO 8601 timestamp |
| level | debug, info, warn, error |
| trace_id | Per-request UUID for log correlation |
| msg | Log message |
| provider | Provider name (on request/response lines) |
| model | Model ID |
| latency_ms | Provider round-trip latency in milliseconds |
| status | HTTP status code |
| tokens_prompt | Prompt token count |
| tokens_completion | Completion token count |

Example log line:

```json
{
  "time": "2026-03-11T10:23:45Z",
  "level": "info",
  "trace_id": "a3f9b1c2-d4e5-4678-8901-abcdef012345",
  "msg": "request complete",
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "latency_ms": 412,
  "status": 200,
  "tokens_prompt": 312,
  "tokens_completion": 87
}
```
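Since every line carries a trace_id, the full log trail for one request can be reassembled with a simple filter. A sketch, using shortened illustrative log lines rather than real gateway output:

```python
import json

def lines_for_trace(log_lines, trace_id):
    """Return parsed log records matching a given trace_id."""
    records = []
    for line in log_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON output interleaved on stdout
        if rec.get("trace_id") == trace_id:
            records.append(rec)
    return records

# Illustrative sample lines (abbreviated trace IDs for readability).
logs = [
    '{"time":"2026-03-11T10:23:45Z","level":"info","trace_id":"a3f9","msg":"request start"}',
    '{"time":"2026-03-11T10:23:45Z","level":"info","trace_id":"b7c1","msg":"request start"}',
    '{"time":"2026-03-11T10:23:46Z","level":"info","trace_id":"a3f9","msg":"request complete","status":200}',
]
trail = lines_for_trace(logs, "a3f9")
```

The same filter is a one-liner with jq on the command line: `jq 'select(.trace_id == "a3f9")'`.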

Log level

Set the log level with LOG_LEVEL (default: info). Use LOG_FORMAT=text for human-readable output during development.

```shell
export LOG_LEVEL=debug
export LOG_FORMAT=text
```

Health endpoint

GET /health returns a deep health check with per-provider availability and latency:

```json
{
  "status": "ok",
  "providers": {
    "openai": { "healthy": true, "latency_ms": 245 },
    "anthropic": { "healthy": true, "latency_ms": 312 },
    "groq": { "healthy": false, "error": "connection refused" }
  }
}
```

Returns 200 OK if at least one provider is healthy, 503 Service Unavailable if all providers are down.
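For a monitoring probe that mirrors this rule, the aggregation reduces to an any-healthy check. A minimal sketch of the documented behavior, operating on the `providers` object from the response above:

```python
def health_status(providers):
    """Map per-provider health to the documented overall HTTP status:
    200 if at least one provider is healthy, else 503."""
    return 200 if any(p.get("healthy") for p in providers.values()) else 503

# Shape matches the /health response's "providers" object.
providers = {
    "openai": {"healthy": True, "latency_ms": 245},
    "groq": {"healthy": False, "error": "connection refused"},
}
```

Note that a 200 here does not mean every provider is up; alerting on individual providers should use the per-provider fields (or the ferrogw_circuit_breaker_state metric) instead.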

For monitoring in production, see Monitoring.