Configuration

v1.0.0 reference

This configuration reference covers Ferro Labs AI Gateway v1.0.0, the first stable release with semver guarantees. All configuration keys documented here are part of the stable API.

The gateway loads configuration from a YAML or JSON file at the path set by GATEWAY_CONFIG.

export GATEWAY_CONFIG=./config.yaml
./ferrogw

Supported extensions: .yaml, .yml, .json.
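As a rough illustration of the loading rule above, the sketch below reads GATEWAY_CONFIG and maps the file extension to a parser format. The helper names (`resolve_config_path`, `config_format`) are hypothetical, and the case-insensitive extension check is an assumption; the gateway's real loader may behave differently.

```python
import os

# Extensions the reference lists as supported, mapped to a parser format.
SUPPORTED_EXTENSIONS = {".yaml": "yaml", ".yml": "yaml", ".json": "json"}


def resolve_config_path(env=None) -> str:
    """Return the path stored in GATEWAY_CONFIG (hypothetical helper)."""
    env = os.environ if env is None else env
    path = env.get("GATEWAY_CONFIG")
    if not path:
        raise ValueError("GATEWAY_CONFIG is not set")
    return path


def config_format(path: str) -> str:
    """Return "yaml" or "json" for a supported path, else raise ValueError."""
    # Assumption: the extension is compared case-insensitively.
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported config extension: {ext!r}")
    return SUPPORTED_EXTENSIONS[ext]
```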

Strategy

The top-level strategy block controls how requests are routed.

strategy:
  mode: fallback # single | fallback | loadbalance | conditional | least-latency | cost-optimized | content-based | ab-test

| Mode | Description |
| --- | --- |
| single | Route every request to the first target. |
| fallback | Try targets in order; retry on failure with exponential backoff. |
| loadbalance | Weighted random distribution across targets. |
| conditional | Rule-based: match model name or model prefix to a target. |
| least-latency | Route to the target with the lowest P50 latency (rolling tracker). |
| cost-optimized | Use the built-in model catalog to estimate prompt cost and pick the cheapest compatible target. |
| content-based | Route based on user message content using substring or regex matching. |
| ab-test | Split traffic across labeled variants by weight for comparison testing. |
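To make the fallback mode more concrete, here is a minimal sketch of "try targets in order, retry with exponential backoff". It is not the gateway's implementation: the function name, the `base_delay` parameter, and the choice to exhaust retries on one target before moving to the next are all assumptions made for illustration.

```python
import time
from typing import Callable, Sequence


class AllTargetsFailed(Exception):
    """Raised when every target has used up its attempts."""


def call_with_fallback(
    targets: Sequence[str],
    send: Callable[[str], str],
    attempts: int = 3,
    base_delay: float = 0.5,  # hypothetical knob, not a documented config key
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each target in order; back off exponentially between retries."""
    last_error = None
    for target in targets:
        for attempt in range(attempts):
            try:
                return send(target)
            except Exception as exc:  # a real gateway would inspect the status code
                last_error = exc
                # Wait base_delay, 2*base_delay, 4*base_delay, ... between
                # retries on the same target; no wait after the final attempt.
                if attempt < attempts - 1:
                    sleep(base_delay * (2 ** attempt))
    raise AllTargetsFailed(str(last_error))
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.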

Conditional rules

strategy:
  mode: conditional
  conditions:
    - key: model
      value: gpt-4o-mini
      target_key: openai
    - key: model_prefix
      value: claude
      target_key: anthropic
    - key: model
      value: gemini-1.5-flash
      target_key: gemini

Rules are evaluated in order. model matches the exact model ID; model_prefix matches any model whose name starts with the given string.
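The evaluation order described above can be sketched as a simple first-match loop. The function name `match_condition` is hypothetical, and returning `None` for unmatched models is a placeholder: this reference does not say what the gateway does when no conditional rule matches.

```python
from typing import Optional


def match_condition(conditions: list, model: str) -> Optional[str]:
    """Return the target_key of the first matching rule, or None."""
    for rule in conditions:
        key, value = rule["key"], rule["value"]
        if key == "model" and model == value:
            return rule["target_key"]  # exact model ID match
        if key == "model_prefix" and model.startswith(value):
            return rule["target_key"]  # prefix match
    # Assumption: the no-match behavior is left to the caller.
    return None


# The same rules as the YAML example, written as Python dicts.
CONDITIONS = [
    {"key": "model", "value": "gpt-4o-mini", "target_key": "openai"},
    {"key": "model_prefix", "value": "claude", "target_key": "anthropic"},
    {"key": "model", "value": "gemini-1.5-flash", "target_key": "gemini"},
]
```

With these rules, `gpt-4o` does not match the first rule, because `model` requires the exact ID `gpt-4o-mini`.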

Content-based routing

strategy:
  mode: content-based
  content_conditions:
    - type: prompt_contains
      value: "translate"
      target_key: deepl-provider
    - type: prompt_regex
      value: "(?i)(code|function|class|def |import )"
      target_key: openai
    - type: prompt_contains
      value: "summarize"
      target_key: anthropic

Three condition types are supported: prompt_contains (case-insensitive substring), prompt_not_contains, and prompt_regex (Go regexp). Regex patterns are compiled at startup; invalid patterns cause a startup error. First match wins; unmatched requests fall back to the first target.
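A minimal sketch of this matching logic follows. It uses Python's `re` module rather than Go's regexp package, so pattern syntax differs at the edges (the example patterns here work in both). The function names are hypothetical, and treating prompt_not_contains as case-insensitive is an assumption carried over from prompt_contains.

```python
import re


def compile_conditions(conditions: list) -> list:
    """Validate and compile conditions up front, mirroring startup-time checks."""
    compiled = []
    for cond in conditions:
        kind, value = cond["type"], cond["value"]
        if kind == "prompt_regex":
            matcher = re.compile(value)  # raises re.error on an invalid pattern
        elif kind in ("prompt_contains", "prompt_not_contains"):
            matcher = value.lower()
        else:
            raise ValueError(f"unknown condition type: {kind}")
        compiled.append((kind, matcher, cond["target_key"]))
    return compiled


def route_by_content(compiled: list, prompt: str, default_target: str) -> str:
    """First match wins; unmatched prompts go to default_target."""
    lowered = prompt.lower()
    for kind, matcher, target in compiled:
        if kind == "prompt_contains" and matcher in lowered:
            return target
        if kind == "prompt_not_contains" and matcher not in lowered:
            return target
        if kind == "prompt_regex" and matcher.search(prompt):
            return target
    return default_target


RULES = compile_conditions([
    {"type": "prompt_contains", "value": "translate", "target_key": "deepl-provider"},
    {"type": "prompt_regex", "value": "(?i)(code|function|class|def |import )", "target_key": "openai"},
    {"type": "prompt_contains", "value": "summarize", "target_key": "anthropic"},
])
```

Because the first match wins, a prompt containing both "translate" and "function" is routed to deepl-provider under these rules.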

A/B test routing

strategy:
  mode: ab-test
  ab_variants:
    - target_key: openai
      weight: 80
      label: control
    - target_key: anthropic
      weight: 20
      label: challenger

Weights are relative. Each request's variant label is recorded in structured logs (as the ab_variant field in gateway.request.completed events), so you can aggregate quality or cost metrics per variant.
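"Relative" means only the ratio between weights matters: 80/20 and 4/1 produce the same split. The sketch below shows one way to draw a variant by weight using the standard library; `pick_variant` is a hypothetical name and this is not necessarily how the gateway samples.

```python
import random


def pick_variant(variants: list, rng=random) -> dict:
    """Pick one variant with probability proportional to its weight."""
    weights = [v["weight"] for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]


VARIANTS = [
    {"target_key": "openai", "weight": 80, "label": "control"},
    {"target_key": "anthropic", "weight": 20, "label": "challenger"},
]

# Seeded generator so repeated runs draw the same sequence.
rng = random.Random(7)
labels = [pick_variant(VARIANTS, rng)["label"] for _ in range(10_000)]
control_share = labels.count("control") / len(labels)
```

Over many requests, `control_share` should settle close to 0.8 for the weights above.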

Targets

Targets are provider references. Each virtual_key must match a registered provider (see Provider configuration).

targets:
  - virtual_key: openai
    weight: 70 # used by loadbalance strategy
    retry:
      attempts: 3
      retry_on_status: [429, 502, 503, 504]
    circuit_breaker:
      failure_threshold: 5 # failures before opening
      success_threshold: 2 # successes before closing
      timeout: "30s" # time in open state before half-open probe

  - virtual_key: anthropic
    weight: 30
    retry:
      attempts: 2
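The three circuit_breaker settings map onto the usual closed / open / half-open state machine. The class below is a sketch of that pattern under stated assumptions, not the gateway's code: it counts consecutive failures, resets the count on any success while closed, and reopens on the first failure in the half-open state.

```python
import time


class CircuitBreaker:
    """Closed -> open after N failures; open -> half-open after timeout;
    half-open -> closed after M successes."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout  # seconds spent open before a probe is allowed
        self.clock = clock
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a request may be sent to the target."""
        if self.state == "open":
            if self.clock() - self._opened_at >= self.timeout:
                self.state = "half-open"
                self._successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "half-open":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # assumption: failures must be consecutive

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._trip()  # assumption: one failed probe reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self._opened_at = self.clock()
        self._failures = 0
```

The injectable `clock` lets a test advance time without sleeping.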

Model aliases

Aliases resolve before routing. They let you use short names and switch backing models without changing client code.

aliases:
  fast: gpt-4o-mini
  smart: claude-3-5-sonnet-20241022
  cheap: gemini-1.5-flash
  code: deepseek-coder

A client that requests model: cheap will receive a response from Gemini 1.5 Flash.
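Alias resolution amounts to a lookup performed before any routing decision. The sketch below assumes a single, non-recursive lookup (an alias pointing at another alias is not followed); the reference does not say whether chained aliases are supported, and `resolve_alias` is a hypothetical name.

```python
ALIASES = {
    "fast": "gpt-4o-mini",
    "smart": "claude-3-5-sonnet-20241022",
    "cheap": "gemini-1.5-flash",
    "code": "deepseek-coder",
}


def resolve_alias(aliases: dict, model: str) -> str:
    """Return the backing model for an alias, or the name unchanged."""
    # Assumption: one lookup only, no chained aliases.
    return aliases.get(model, model)
```

Routing rules then see the resolved name, so a conditional rule on `gemini-1.5-flash` would also apply to a request for `cheap`.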

Plugins

Each plugin entry specifies its name, type, stage, and config map. Set enabled: false to disable without removing the entry.

OSS plugins

plugins:
  - name: word-filter
    type: guardrail
    stage: before_request
    enabled: true
    config:
      blocked_words: ["password", "confidential"]
      case_sensitive: false

  - name: max-token
    type: guardrail
    stage: before_request
    enabled: true
    config:
      max_tokens: 4096
      max_messages: 50

  - name: response-cache
    type: transform
    stage: before_request
    enabled: true
    config:
      max_age: 300
      max_entries: 1000

  - name: request-logger
    type: logging
    stage: before_request
    enabled: true
    config:
      level: info
      persist: true
      backend: sqlite
      dsn: ferrogw-requests.db

  - name: rate-limit
    type: ratelimit
    stage: before_request
    enabled: true
    config:
      requests_per_minute: 120
      burst: 20
      key_rpm: 60 # per API key limit (v0.8.5+)
      user_rpm: 30 # per user ID limit (v0.8.5+)

  - name: budget
    type: guardrail
    stage: before_request # also register at after_request to record costs
    enabled: true
    config:
      spend_limit_usd: 10.00
      store_id: default # instances sharing store_id share spend data

Budget plugin dual registration

The budget plugin should be registered twice: once at before_request (to reject over-limit keys) and once at after_request (to record token costs). Use the same store_id in both entries to share the spend counter.

plugins:
  - name: budget
    stage: before_request
    enabled: true
    config:
      spend_limit_usd: 10.00
      store_id: default
  - name: budget
    stage: after_request
    enabled: true
    config:
      store_id: default
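The reason both entries need the same store_id can be shown with a small model: the before_request instance reads a spend counter that only the after_request instance writes. Everything below (class names, the in-memory registry, rejecting once spend reaches the limit) is an illustrative assumption, not the plugin's actual implementation.

```python
from collections import defaultdict

# Hypothetical in-memory registry: one spend table per store_id.
_STORES = {}


def get_store(store_id: str) -> dict:
    """Return the spend table shared by every plugin using this store_id."""
    return _STORES.setdefault(store_id, defaultdict(float))


class BudgetGate:
    """Stands in for the before_request registration."""

    def __init__(self, store_id: str, spend_limit_usd: float):
        self.store = get_store(store_id)
        self.limit = spend_limit_usd

    def allow(self, api_key: str) -> bool:
        # Assumption: a key is rejected once its spend reaches the limit.
        return self.store[api_key] < self.limit


class CostRecorder:
    """Stands in for the after_request registration."""

    def __init__(self, store_id: str):
        self.store = get_store(store_id)

    def record(self, api_key: str, cost_usd: float) -> None:
        self.store[api_key] += cost_usd


gate = BudgetGate("default", spend_limit_usd=10.00)
recorder = CostRecorder("default")
```

If the recorder used a different store_id, the gate's counter would never increase and the limit would never trigger.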

Ferro Labs Managed plugins

These plugins require a Ferro Labs Managed account:

plugins:
  - name: pii-redact
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: redact # redact | block
      redact_mode: replace_type
      apply_to: input

  - name: secret-scan
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: block

  - name: prompt-shield
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: block
      threshold: 0.90
      apply_to: user_messages

  - name: schema-guard
    type: guardrail
    stage: after_request
    enabled: true
    config:
      apply_to: output
      action: block
      extract_json: true
      schema:
        type: object
        required: [name, confidence]
        properties:
          name:
            type: string
          confidence:
            type: number
            minimum: 0
            maximum: 1

  - name: regex-guard
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: block
      patterns:
        - "(?i)drop\\s+table"
        - "(?i)delete\\s+from"

Ferro Labs Managed feature

The 5 enterprise plugins (pii-redact, secret-scan, prompt-shield, schema-guard, regex-guard) require a Ferro Labs Managed account. Join the waitlist →

See Plugins for all 11 plugins and their full config options.

MCP servers

Configure Model Context Protocol tool servers for agentic tool-calling. Streaming requests are supported as of v1.0.0.

mcp_servers:
  - name: filesystem
    url: "http://localhost:3001/mcp"
    timeout_seconds: 10
    max_call_depth: 3

  - name: database
    url: "https://mcp-db.internal/mcp"
    headers:
      Authorization: "Bearer ${MCP_DB_TOKEN}"
    allowed_tools:
      - query_readonly
      - list_tables
    timeout_seconds: 15
    max_call_depth: 5

When mcp_servers is present, the gateway initialises MCP connections in the background on startup (60-second timeout) and injects available tools into every chat completion request. See MCP integration.
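This reference does not spell out how allowed_tools is applied, so the sketch below is an assumption: it treats the list as an allowlist over whatever tools a server advertises, and treats a missing allowed_tools key as "inject everything". The function name and the discovered tool names are hypothetical.

```python
def injectable_tools(server: dict, discovered_tools: list) -> list:
    """Return the tools from one MCP server that would be injected."""
    allowed = server.get("allowed_tools")
    if allowed is None:
        # Assumption: no allowlist means every discovered tool is exposed.
        return list(discovered_tools)
    allowed_set = set(allowed)
    # Preserve the server's ordering while dropping tools not on the list.
    return [tool for tool in discovered_tools if tool in allowed_set]


DATABASE_SERVER = {
    "name": "database",
    "url": "https://mcp-db.internal/mcp",
    "allowed_tools": ["query_readonly", "list_tables"],
}
FILESYSTEM_SERVER = {"name": "filesystem", "url": "http://localhost:3001/mcp"}
```

Under this reading, a write-capable tool advertised by the database server would never reach the model unless it is added to allowed_tools.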

Complete example

Copy config.example.yaml from the repository for a full example covering all 29 providers, aliases, and all 11 plugins.