# Observability
The gateway ships with three observability layers: Prometheus metrics, structured log output, and a deep health endpoint.
## Prometheus metrics
Metrics are exposed at GET /metrics in the standard Prometheus text format. Scrape this endpoint with your Prometheus server.
### Available metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| ferrogw_requests_total | Counter | provider, model, status | Total requests processed |
| ferrogw_errors_total | Counter | provider, model, error_type | Total errors by type |
| ferrogw_request_duration_seconds | Histogram | provider, model | End-to-end request latency |
| ferrogw_provider_latency_seconds | Histogram | provider | Provider round-trip latency |
| ferrogw_tokens_total | Counter | provider, model, type | Token usage (type: `prompt` or `completion`) |
| ferrogw_cache_hits_total | Counter | — | Response cache hits |
| ferrogw_cache_misses_total | Counter | — | Response cache misses |
| ferrogw_circuit_breaker_state | Gauge | provider | Circuit breaker state (0=closed, 1=open, 2=half-open) |
| ferrogw_plugin_blocks_total | Counter | plugin, reason | Requests blocked by a plugin |
### Example Prometheus scrape config

```yaml
scrape_configs:
  - job_name: ferrogw
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: /metrics
    scrape_interval: 15s
```
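Before wiring up Prometheus, it can help to sanity-check the `/metrics` output by hand. Below is a minimal sketch of parsing one line of the Prometheus text exposition format (the sample metric line and label values are illustrative; a real consumer should use an official Prometheus client library rather than this regex, which does not handle escaped quotes or commas inside label values):

```python
import re

# Matches: metric_name{label="value",...} 123.45
# HELP/TYPE comment lines and unsupported syntax yield None.
LINE_RE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+([0-9.eE+-]+)$')

def parse_metric_line(line: str):
    """Parse one exposition line into (name, labels, value), or None."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    name, raw_labels, value = m.groups()
    labels = {}
    if raw_labels:
        for pair in raw_labels.split(","):
            key, val = pair.split("=", 1)
            labels[key] = val.strip('"')
    return name, labels, float(value)
```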
### Useful PromQL queries

```promql
# Request rate by provider
sum by (provider) (rate(ferrogw_requests_total[5m]))

# P99 request latency
histogram_quantile(0.99, sum by (le) (rate(ferrogw_request_duration_seconds_bucket[5m])))

# Error rate percentage
sum(rate(ferrogw_errors_total[5m])) / sum(rate(ferrogw_requests_total[5m])) * 100

# Token usage per minute, by type
sum by (type) (rate(ferrogw_tokens_total[1m])) * 60

# Cache hit ratio
rate(ferrogw_cache_hits_total[5m]) /
  (rate(ferrogw_cache_hits_total[5m]) + rate(ferrogw_cache_misses_total[5m]))

# Open circuit breakers
ferrogw_circuit_breaker_state == 1
```
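Queries like these can also feed Prometheus alerting rules. A sketch of two such rules — the alert names, thresholds, and `for` durations below are illustrative, not shipped defaults:

```yaml
groups:
  - name: ferrogw
    rules:
      - alert: FerrogwHighErrorRate
        expr: |
          sum(rate(ferrogw_errors_total[5m]))
            / sum(rate(ferrogw_requests_total[5m])) * 100 > 5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "ferrogw error rate above 5% for 10 minutes"
      - alert: FerrogwCircuitBreakerOpen
        expr: ferrogw_circuit_breaker_state == 1
        for: 5m
        labels:
          severity: warn
        annotations:
          summary: "Circuit breaker open for provider {{ $labels.provider }}"
```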
## Structured JSON logs
The gateway writes structured JSON to stdout. Each log line includes:
| Field | Description |
|---|---|
| time | ISO 8601 timestamp |
| level | debug, info, warn, error |
| trace_id | Per-request UUID for log correlation |
| msg | Log message |
| provider | Provider name (on request/response lines) |
| model | Model ID |
| latency_ms | Provider round-trip latency in milliseconds |
| status | HTTP status code |
| tokens_prompt | Prompt token count |
| tokens_completion | Completion token count |
Example log line:
```json
{
  "time": "2026-03-11T10:23:45Z",
  "level": "info",
  "trace_id": "a3f9b1c2-d4e5-4678-8901-abcdef012345",
  "msg": "request complete",
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "latency_ms": 412,
  "status": 200,
  "tokens_prompt": 312,
  "tokens_completion": 87
}
```
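Because every line carries a `trace_id`, all records for a single request can be grouped back together when digging through captured logs. A minimal sketch (the grouping helper is illustrative, not part of the gateway):

```python
import json

def group_by_trace(lines):
    """Group JSON log lines by trace_id; skip blank or non-JSON lines."""
    traces = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate any non-JSON output mixed into the stream
        traces.setdefault(record.get("trace_id"), []).append(record)
    return traces
```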
### Log level

Set the log level with the `LOG_LEVEL` environment variable (default: `info`). Use `LOG_FORMAT=text` for human-readable output during development.

```sh
export LOG_LEVEL=debug
export LOG_FORMAT=text
```
## Health endpoint

`GET /health` returns a deep health check with per-provider availability and latency:

```json
{
  "status": "ok",
  "providers": {
    "openai": { "healthy": true, "latency_ms": 245 },
    "anthropic": { "healthy": true, "latency_ms": 312 },
    "groq": { "healthy": false, "error": "connection refused" }
  }
}
```
The endpoint returns `200 OK` if at least one provider is healthy, or `503 Service Unavailable` if all providers are down.
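An external monitor can inspect the body as well as the status code to report exactly which providers are failing. A minimal sketch of that check, operating on the parsed `/health` payload (the helper function is illustrative, not part of the gateway):

```python
def unhealthy_providers(health: dict) -> list:
    """Return the names of providers reporting healthy=false."""
    return [
        name
        for name, info in health.get("providers", {}).items()
        if not info.get("healthy", False)
    ]
```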
For monitoring in production, see Monitoring.