Troubleshooting
This page covers the most common issues encountered when running the Ferro Labs AI Gateway and how to resolve them.
Provider key not picked up at startup
Symptom: The gateway starts but returns 401 Unauthorized for every request to a provider you configured.
Likely cause: The environment variable referenced in config.yaml is not set, misspelled, or not visible to the gateway process.
Fix:
Check whether the variable is available inside the running container:
# Docker
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' ferrogw | grep OPENAI
# Local process
env | grep OPENAI_API_KEY
Verify the variable name in config.yaml matches exactly (including case):
providers:
  - name: openai
    type: openai
    api_key: ${OPENAI_API_KEY}   # must match the env var name exactly
Never hard-code API keys in config.yaml. Always use environment variable references (${VAR}) and inject secrets via your orchestrator or .env file.
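For example, with Docker Compose you can forward the key from the host environment or a .env file in the project directory. A minimal sketch (the service block mirrors the healthcheck example later on this page):
services:
  gateway:
    image: ghcr.io/ferrolabs/ferrogw:latest
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}   # interpolated from the host env or .env at compose time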
Circuit breaker opens immediately
Symptom: After a single failed request the target is excluded from routing and the gateway returns errors or falls through to the next target.
Likely cause: failure_threshold is set to 1, so one failure trips the breaker. Alternatively, the upstream provider is genuinely down.
Fix:
First, check upstream health:
curl -s http://localhost:8080/health | jq .
If the provider is healthy, raise the threshold:
targets:
  - virtual_key: openai
    circuit_breaker:
      failure_threshold: 5   # require 5 consecutive failures before opening
      success_threshold: 2
      timeout: "30s"
Set failure_threshold to at least 3 in production to avoid flapping on transient errors.
Streaming responses truncated
Symptom: Server-sent event (SSE) streams cut off before the model finishes generating. The client receives a partial response.
Likely cause: A reverse proxy or load balancer between the client and the gateway is timing out before the stream completes.
Fix:
If you use nginx in front of the gateway, increase the read timeout:
location /v1/ {
    proxy_pass http://gateway:8080;
    proxy_read_timeout 300s;
    proxy_buffering off;
    proxy_set_header Connection '';
    proxy_http_version 1.1;
    chunked_transfer_encoding off;
}
If the gateway itself is timing the request out, check your target-level timeout configuration. Streaming requests to large models can take 60 seconds or more.
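For example, if your targets carry a per-target timeout, raise it well above the longest expected stream. This is a sketch only; the timeout field name below is an assumption, so match it to the fields already present in your target configuration:
targets:
  - virtual_key: openai
    timeout: "300s"   # assumed field name: give long streaming responses room to finish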
MCP server connection timeout
Symptom: Requests that should trigger tool calls return a plain text response, or the gateway logs mcp connection timeout.
Likely cause: The MCP server URL is wrong, the server is not running, or a firewall is blocking the connection.
Fix:
Check the gateway logs for MCP-related errors:
docker logs ferrogw 2>&1 | grep -i mcp
Test connectivity from the gateway's network:
# From inside the container
docker exec ferrogw curl -sf http://mcp-server:3001/mcp
Verify the URL and timeout in config.yaml:
mcp_servers:
  - name: filesystem
    url: "http://mcp-server:3001/mcp"   # must be reachable from the gateway
    timeout_seconds: 15                 # increase if the server is slow to respond
    max_call_depth: 5
If the MCP server runs in a separate Docker Compose service, make sure both services are on the same Docker network.
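For example, when both services are declared in the same docker-compose.yml they share the default network, so the MCP URL can use the service name as the hostname. A sketch; the MCP server image is a placeholder:
services:
  gateway:
    image: ghcr.io/ferrolabs/ferrogw:latest
    depends_on:
      - mcp-server
  mcp-server:
    image: example/filesystem-mcp:latest   # placeholder image for your MCP server
    ports:
      - "3001:3001"                        # matches the port in the mcp_servers url above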
Rate limiter firing unexpectedly
Symptom: Clients receive 429 Too Many Requests well below expected traffic levels.
Likely cause: The burst value is too low, or the global rate limit is being confused with the per-key limit. The global bucket drains across all clients combined.
Fix:
Review your rate-limit plugin configuration. The global requests_per_second applies to all traffic, while key_rpm applies per API key:
plugins:
  - name: rate-limit
    type: ratelimit
    stage: before_request
    enabled: true
    config:
      requests_per_second: 100   # global: 100 req/s across all clients
      burst: 200                 # allow short bursts up to 200
      key_rpm: 60                # per-key: max 60 requests per minute
If you only want per-key limits and no global cap, omit requests_per_second and burst:
plugins:
  - name: rate-limit
    type: ratelimit
    stage: before_request
    enabled: true
    config:
      key_rpm: 60
Rate checks execute in order: global, then per-key, then per-user. The first exceeded limit triggers the 429.
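To see which limit you are hitting, send a short burst from a single key and count the status codes. This sketch assumes the gateway exposes the OpenAI-compatible /v1/chat/completions route and that $API_KEY is a valid test key:
# send 80 small requests and tally the response codes
for i in $(seq 1 80); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    http://localhost:8080/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1}'
done | sort | uniq -c
Compare the number of successful responses before the first 429 with your key_rpm and burst values to see which limit is firing.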
Config reload not taking effect
Symptom: You edited config.yaml but the gateway behavior has not changed.
Likely cause: The GATEWAY_CONFIG environment variable points to a different file, or the edited file has a YAML syntax error that causes a silent reload failure.
Fix:
Confirm which file the gateway is loading:
echo $GATEWAY_CONFIG
# Should print the path to the config file you edited
Validate the YAML before reloading:
# Quick syntax check (requires yq or python)
yq eval '.' config.yaml > /dev/null && echo "YAML OK" || echo "YAML ERROR"
# Or with Python
python3 -c "import yaml; yaml.safe_load(open('config.yaml'))"
After fixing any syntax issues, restart the gateway:
docker restart ferrogw
High memory usage under load
Symptom: The gateway process memory grows steadily under sustained traffic and eventually OOMs.
Likely cause: Too many concurrent connections, an unbounded response cache, or a memory leak in a plugin.
Fix:
If you use the response-cache plugin, set a maximum entry count:
plugins:
  - name: response-cache
    type: transform
    stage: before_request
    enabled: true
    config:
      max_entries: 5000   # cap the cache to prevent unbounded growth
      ttl_seconds: 300
Profile the gateway with pprof to identify the source of allocations:
# Requires LOG_LEVEL=debug or pprof enabled
curl -s http://localhost:8080/debug/pprof/heap > heap.out
go tool pprof heap.out
In Docker deployments, set memory limits on the container (--memory=2g) so an OOM kills the container instead of the host.
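The Compose equivalent of that flag looks like this (the 2g value is an example; size it to your workload):
services:
  gateway:
    image: ghcr.io/ferrolabs/ferrogw:latest
    deploy:
      resources:
        limits:
          memory: 2g   # container is OOM-killed at 2 GiB instead of exhausting the host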
Docker healthcheck failing
Symptom: Docker reports the gateway container as unhealthy even though it is processing requests.
Likely cause: The healthcheck is hitting the wrong port or path.
Fix:
The gateway exposes its health endpoint at /health on the configured PORT (default 8080). Make sure your docker-compose.yml or Dockerfile healthcheck matches:
services:
  gateway:
    image: ghcr.io/ferrolabs/ferrogw:latest
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s
If you changed the port via the PORT environment variable, update the healthcheck URL to match:
environment:
  PORT: "9090"
healthcheck:
  test: ["CMD", "curl", "-sf", "http://localhost:9090/health"]
Prometheus scrape returning empty
Symptom: Prometheus shows no metrics for the gateway, or curl to the metrics endpoint returns an empty body.
Likely cause: The metrics endpoint is not enabled, or Prometheus is scraping the wrong port.
Fix:
Verify the metrics endpoint is responding:
curl -s http://localhost:8080/metrics | head -20
If you get a 404, confirm that metrics are enabled in your server settings. The gateway exposes Prometheus metrics at GET /metrics by default on the same port as the API.
Check your prometheus.yml targets:
scrape_configs:
  - job_name: ferrogw
    static_configs:
      - targets: ["gateway-host:8080"]   # must match the gateway's PORT
    scrape_interval: 15s
If the gateway runs inside Docker Compose and Prometheus runs in the same stack, use the service name as the host: targets: ["gateway:8080"].
502 Bad Gateway from all providers
Symptom: Every request returns 502 Bad Gateway regardless of the target provider.
Likely cause: All circuit breakers are open because every upstream provider is failing (or was recently failing).
Fix:
Check health to see provider status:
curl -s http://localhost:8080/health | jq .
If the upstream providers have recovered but circuit breakers are still open, they will close automatically after the configured timeout period. You can speed this up by restarting the gateway:
docker restart ferrogw
To prevent all breakers from opening simultaneously, stagger your failure_threshold and timeout values across targets:
targets:
  - virtual_key: openai
    circuit_breaker:
      failure_threshold: 5
      timeout: "30s"
  - virtual_key: anthropic
    circuit_breaker:
      failure_threshold: 3
      timeout: "20s"
  - virtual_key: gemini
    circuit_breaker:
      failure_threshold: 5
      timeout: "45s"
Model not found error
Symptom: The gateway returns an error like model "gpt4o" not found even though you have an OpenAI target configured.
Likely cause: The model name in the request does not match any entry in the built-in catalog or your configured models list. Model names are exact-match (e.g. gpt-4o, not gpt4o).
Fix:
List available models through the gateway:
curl -s http://localhost:8080/v1/models \
-H "Authorization: Bearer $API_KEY" | jq '.data[].id'
If you need to map a custom name to a real model, use model aliases in your config:
model_aliases:
  - alias: "our-default"
    model: "gpt-4o-mini"
    target_key: openai
  - alias: "our-smart"
    model: "claude-sonnet-4-20250514"
    target_key: anthropic
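Clients can then request the alias like any other model name. This example assumes the gateway's OpenAI-compatible chat completions route:
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "our-default", "messages": [{"role": "user", "content": "Hello"}]}'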
Content-based routing not matching
Symptom: Requests that should match a prompt_regex rule are falling through to the default target instead.
Likely cause: The regex pattern is not matching due to case sensitivity, or the pattern has a syntax error that prevented compilation.
Fix:
Regex patterns are compiled at gateway startup. An invalid regex causes a startup error (not a silent failure). Check gateway startup logs for compilation errors.
If the gateway started successfully but routing is wrong, test your regex independently:
# Quick sanity check with grep (PCRE, which behaves like Go's RE2 for simple patterns such as this)
echo "Write a Python function to sort a list" | grep -P '(?i)(code|function|class|def |import )'
Remember that the gateway uses Go regular expressions. Use (?i) at the start for case-insensitive matching:
strategy:
  mode: content-based
  content_conditions:
    - type: prompt_regex
      value: "(?i)(code|function|class|def |import |bug|error|debug)"
      target_key: deepseek
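To confirm a rule is matching, send a prompt that should hit it and check which model answered. In OpenAI-compatible responses the model field normally reflects the model that actually served the request, so a deepseek model name here indicates the rule matched (route name assumed, as above):
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Write a Python function to sort a list"}]}' \
  | jq .model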
A/B test weights not reflecting expected distribution
Symptom: You configured an 80/20 split but after 50 requests you see a 60/40 ratio.
Likely cause: With small sample sizes, random weighted selection naturally deviates from the configured weights. This is expected statistical variance, not a bug.
Fix:
Weight normalization works as follows: weights are relative, so 80 and 20 produce an 80/20 split. Over a large number of requests (1,000+) the actual distribution converges toward the configured ratio.
A few things to check:
- Zero weights: If a variant has weight 0, it is treated as weight 1 and receives roughly equal traffic with other zero-weight variants. This is by design to prevent accidentally silencing a variant.
- Circuit breakers: If the target backing one variant has its circuit breaker open, all traffic goes to the remaining variant.
- Sample size: At 100 requests with an 80/20 split, a 70/30 or 90/10 actual split is within normal variance. Collect at least 1,000 requests before evaluating (see the quick calculation after the example config below).
strategy:
  mode: ab-test
  ab_variants:
    - target_key: openai
      weight: 80
      label: control
    - target_key: anthropic
      weight: 20
      label: challenger
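The sample-size point can be sanity-checked with a couple of lines of arithmetic. This is a plain illustration, not part of the gateway:
# rough 2-sigma spread of a weighted split at different sample sizes
python3 - <<'EOF'
p = 0.8
for n in (50, 100, 1000):
    sd = (n * p * (1 - p)) ** 0.5
    print(f"n={n}: expect {p*100:.0f}% +/- {200*sd/n:.1f} percentage points")
EOF
At 1,000 requests the spread tightens to roughly two and a half percentage points, which is why evaluating earlier is misleading.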
Budget plugin not persisting across restarts
Symptom: After restarting the gateway, all API key spend counters reset to zero.
Likely cause: This is by design. The budget plugin uses an in-memory store that resets on every restart.
Fix:
The open-source budget plugin is intended for session-scoped soft limits and development quotas. If you need durable spend tracking that survives restarts:
- Use Ferro Labs Managed for persistent billing enforcement with database-backed spend tracking.
- As a workaround, export spend data via the /metrics endpoint before restarting and use your monitoring system for budget alerts.
Do not rely on the in-memory budget plugin as your only spend control in production. A restart silently resets all limits. Use it as a safety net alongside durable billing in Ferro Labs Managed.
Request logger not writing to Postgres
Symptom: The request-logger plugin is enabled but no rows appear in the Postgres request_logs table.
Likely cause: The connection string (DSN) is wrong, Postgres is not reachable from the gateway, or the database/table does not exist.
Fix:
Test connectivity from the gateway's environment:
# From inside the Docker container
docker exec ferrogw sh -c \
'pg_isready -h postgres -p 5432 -U ferro || echo "Postgres unreachable"'
# Or test with curl/psql from the host
psql "postgres://ferro:ferro_secret@localhost:5432/ferro_logs" -c "SELECT 1;"
Verify the plugin config matches the actual Postgres host, port, user, and database:
plugins:
  - name: request-logger
    type: postgres-logger
    connection_string: postgres://ferro:ferro_secret@postgres:5432/ferro_logs?sslmode=disable
    log_request_body: true
    log_response_body: false
If you run both the gateway and Postgres in Docker Compose, use the Compose service name (e.g. postgres) as the hostname, not localhost.
Check gateway logs for connection errors:
docker logs ferrogw 2>&1 | grep -i postgres
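Once connectivity and the DSN check out, confirm rows are actually arriving in the table the plugin writes to (request_logs, per the symptom above):
psql "postgres://ferro:ferro_secret@localhost:5432/ferro_logs" \
  -c "SELECT count(*) FROM request_logs;"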