Configuration

v1.0.0 reference

This configuration reference covers Ferro Labs AI Gateway v1.0.0 — the first stable release with semver guarantees. All configuration keys documented here are part of the stable API.

The gateway loads its configuration from a YAML or JSON file whose path is set by the GATEWAY_CONFIG environment variable.

export GATEWAY_CONFIG=./config.yaml
./ferrogw

Supported extensions: .yaml, .yml, .json.
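As a pre-flight check, the extension rule above can be sketched as a small helper. This is a hypothetical script that you might run before starting the gateway, not part of ferrogw. The name config_format is an assumption, and so is treating any other extension as an error.

```python
from pathlib import Path

# Extensions listed in this reference, mapped to the parser they imply.
SUPPORTED = {".yaml": "yaml", ".yml": "yaml", ".json": "json"}


def config_format(path):
    """Return "yaml" or "json" for a config path, or raise ValueError."""
    suffix = Path(path).suffix
    if suffix not in SUPPORTED:
        # Assumption: anything outside the documented list is rejected.
        raise ValueError(f"unsupported config extension: {suffix!r}")
    return SUPPORTED[suffix]
```

For example, config_format("./config.yaml") returns "yaml", while a path ending in .toml raises ValueError.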

Strategy

The top-level strategy block controls how requests are routed.

strategy:
  mode: fallback  # single | fallback | loadbalance | conditional | least-latency | cost-optimized | content-based | ab-test

Mode	Description
`single`	Route every request to the first target.
`fallback`	Try targets in order; retry on failure with exponential backoff.
`loadbalance`	Weighted random distribution across targets.
`conditional`	Rule-based: match model name or model prefix to a target.
`least-latency`	Route to the target with the lowest P50 latency (rolling tracker).
`cost-optimized`	Use the built-in model catalog to estimate prompt cost and pick the cheapest compatible target.
`content-based`	Route based on user message content using substring or regex matching.
`ab-test`	Split traffic across labeled variants by weight for comparison testing.
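To make the least-latency row more concrete, here is a minimal Python sketch of a rolling P50 tracker. It illustrates the idea only and is not the gateway's implementation. The class name LatencyTracker, the window size, and the sample values are all assumptions.

```python
from collections import deque
from statistics import median


class LatencyTracker:
    """Keeps the most recent latency samples per target (hypothetical helper)."""

    def __init__(self, window=100):
        self.window = window
        self.samples = {}  # target name -> deque of recent latencies in ms

    def record(self, target, latency_ms):
        # deque(maxlen=...) drops the oldest sample once the window is full.
        self.samples.setdefault(target, deque(maxlen=self.window)).append(latency_ms)

    def p50(self, target):
        return median(self.samples[target])

    def pick(self):
        # Choose the target whose median (P50) latency is lowest.
        return min(self.samples, key=self.p50)


tracker = LatencyTracker(window=3)
for ms in (120, 900, 130):
    tracker.record("openai", ms)
for ms in (200, 210, 220):
    tracker.record("anthropic", ms)
```

Because the median ignores the single 900 ms outlier, tracker.pick() selects openai here. A mean-based tracker would have chosen anthropic.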

Conditional rules

strategy:
  mode: conditional
  conditions:
    - key: model
      value: gpt-4o-mini
      target_key: openai
    - key: model_prefix
      value: claude
      target_key: anthropic
    - key: model
      value: gemini-1.5-flash
      target_key: gemini

Rules are evaluated in order. model matches the exact model ID; model_prefix matches any model whose name starts with the given string.
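The evaluation order described above can be sketched in a few lines of Python. This is an illustration of the matching rules, not the gateway's code. The function name route_conditional and the default_target parameter are assumptions, since this page does not say what happens when no rule matches.

```python
def route_conditional(model, conditions, default_target=None):
    """Return the target_key of the first rule that matches `model`."""
    for rule in conditions:  # rules are checked in the order they are listed
        if rule["key"] == "model" and model == rule["value"]:
            return rule["target_key"]
        if rule["key"] == "model_prefix" and model.startswith(rule["value"]):
            return rule["target_key"]
    # Assumption: the behavior for unmatched models is not documented here.
    return default_target


conditions = [
    {"key": "model", "value": "gpt-4o-mini", "target_key": "openai"},
    {"key": "model_prefix", "value": "claude", "target_key": "anthropic"},
    {"key": "model", "value": "gemini-1.5-flash", "target_key": "gemini"},
]
```

With these rules, claude-3-5-sonnet-20241022 resolves to anthropic through the prefix rule, while gpt-4o matches nothing because the model key requires an exact match.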

Content-based routing

strategy:
  mode: content-based
  content_conditions:
    - type: prompt_contains
      value: "translate"
      target_key: deepl-provider
    - type: prompt_regex
      value: "(?i)(code|function|class|def |import )"
      target_key: openai
    - type: prompt_contains
      value: "summarize"
      target_key: anthropic

There are three condition types: prompt_contains (case-insensitive substring match), prompt_not_contains (matches when the substring is absent), and prompt_regex (Go regexp syntax). Regex patterns are compiled at startup, so an invalid pattern causes a startup error. The first matching condition wins; requests that match no condition fall through to the first target.
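The matching behavior can be approximated with the Python sketch below. It is not the gateway's implementation. Python's re module stands in for Go's regexp package, and the two syntaxes differ in places. The function names are hypothetical, and the case-insensitivity of prompt_not_contains is an assumption.

```python
import re


def compile_conditions(content_conditions):
    """Pre-compile regex rules so a bad pattern fails early (raises re.error)."""
    compiled = []
    for cond in content_conditions:
        pattern = re.compile(cond["value"]) if cond["type"] == "prompt_regex" else None
        compiled.append((cond, pattern))
    return compiled


def route_by_content(prompt, compiled, first_target):
    lowered = prompt.lower()
    for cond, pattern in compiled:
        kind, value = cond["type"], cond["value"].lower()
        if kind == "prompt_contains" and value in lowered:
            return cond["target_key"]
        # Assumption: prompt_not_contains is also case-insensitive.
        if kind == "prompt_not_contains" and value not in lowered:
            return cond["target_key"]
        if kind == "prompt_regex" and pattern.search(prompt):
            return cond["target_key"]
    return first_target  # no rule matched


rules = compile_conditions([
    {"type": "prompt_contains", "value": "translate", "target_key": "deepl-provider"},
    {"type": "prompt_regex", "value": "(?i)(code|function|class|def |import )", "target_key": "openai"},
    {"type": "prompt_contains", "value": "summarize", "target_key": "anthropic"},
])
```

Rule order matters. The prompt "Summarize this code" is routed to openai, because the regex rule is listed before the summarize rule.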

A/B test routing

strategy:
  mode: ab-test
  ab_variants:
    - target_key: openai
      weight: 80
      label: control
    - target_key: anthropic
      weight: 20
      label: challenger

Weights are relative: with the values above, roughly 80% of requests go to openai and 20% to anthropic. Each request is tagged with its variant's label in structured logs (the ab_variant field of gateway.request.completed events), so you can aggregate quality or cost metrics per variant.
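Relative weights mean each variant is chosen with probability weight / sum(weights). The sketch below simulates that using Python's random.choices. It illustrates the arithmetic only, and the gateway's selection code may differ. The function name pick_variant is an assumption.

```python
import random


def pick_variant(ab_variants, rng=random):
    """Pick one variant with probability weight / sum(weights)."""
    weights = [v["weight"] for v in ab_variants]
    return rng.choices(ab_variants, weights=weights, k=1)[0]


ab_variants = [
    {"target_key": "openai", "weight": 80, "label": "control"},
    {"target_key": "anthropic", "weight": 20, "label": "challenger"},
]

rng = random.Random(42)  # seeded only to make the demo repeatable
counts = {"control": 0, "challenger": 0}
for _ in range(10_000):
    counts[pick_variant(ab_variants, rng)["label"]] += 1
```

Over 10,000 simulated requests, counts["control"] lands close to 8,000. Weights of 4 and 1 would produce the same split, since only the ratio matters.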

Targets

Targets are provider references. Each virtual_key must match a registered provider (see Provider configuration).

targets:
  - virtual_key: openai
    weight: 70           # used by loadbalance strategy
    retry:
      attempts: 3
      retry_on_status: [429, 502, 503, 504]
    circuit_breaker:
      failure_threshold: 5   # failures before opening
      success_threshold: 2   # successes before closing
      timeout: "30s"         # time in open state before half-open probe

  - virtual_key: anthropic
    weight: 30
    retry:
      attempts: 2
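The three circuit_breaker settings describe a closed, open, half-open state machine. Here is a minimal sketch of one, written from the comments in the example above. It is not the gateway's implementation. The class, its method names, and the choice to count only consecutive failures are assumptions.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout  # seconds spent open before a half-open probe
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow(self):
        """Return True if a request may be sent to the target."""
        if self.state == "open" and self.clock() - self.opened_at >= self.timeout:
            self.state = "half-open"  # let probe traffic through
            self.successes = 0
        return self.state != "open"

    def record(self, ok):
        """Record the outcome of a request and update the state."""
        if ok:
            if self.state == "half-open":
                self.successes += 1
                if self.successes >= self.success_threshold:
                    self.state, self.failures = "closed", 0
            else:
                self.failures = 0  # assumption: only consecutive failures count
        else:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "open", self.clock()
```

A caller would check allow() before each attempt and call record() with the result. While the breaker is open, a fallback strategy would skip this target and move on to the next one.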

Model aliases

Aliases resolve before routing. They let you use short names and switch backing models without changing client code.

aliases:
  fast: gpt-4o-mini
  smart: claude-3-5-sonnet-20241022
  cheap: gemini-1.5-flash
  code: deepseek-coder

A client that requests model: cheap will receive a response from Gemini 1.5 Flash.
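Conceptually, alias resolution is a dictionary lookup that runs before any routing rule sees the model name. The sketch below assumes a single-level lookup in which unknown names pass through unchanged. Neither detail is confirmed by this page, and resolve_model is a hypothetical name.

```python
aliases = {
    "fast": "gpt-4o-mini",
    "smart": "claude-3-5-sonnet-20241022",
    "cheap": "gemini-1.5-flash",
    "code": "deepseek-coder",
}


def resolve_model(requested, aliases):
    """Map an alias to its backing model; non-alias names are returned as-is."""
    # Assumption: aliases are not chained (one lookup only).
    return aliases.get(requested, requested)
```

Pointing cheap at a different model then takes a one-line config edit, and clients keep sending model: cheap.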

Plugins

Each plugin entry specifies its name, type, stage, and config map. Set enabled: false to disable without removing the entry.

OSS plugins

plugins:
  - name: word-filter
    type: guardrail
    stage: before_request
    enabled: true
    config:
      blocked_words: ["password", "confidential"]
      case_sensitive: false

  - name: max-token
    type: guardrail
    stage: before_request
    enabled: true
    config:
      max_tokens: 4096
      max_messages: 50

  - name: response-cache
    type: transform
    stage: before_request
    enabled: true
    config:
      max_age: 300
      max_entries: 1000

  - name: request-logger
    type: logging
    stage: before_request
    enabled: true
    config:
      level: info
      persist: true
      backend: sqlite
      dsn: ferrogw-requests.db

  - name: rate-limit
    type: ratelimit
    stage: before_request
    enabled: true
    config:
      requests_per_minute: 120
      burst: 20
      key_rpm: 60          # per API key limit (v0.8.5+)
      user_rpm: 30         # per user ID limit (v0.8.5+)

  - name: budget
    type: guardrail
    stage: before_request   # also register at after_request to record costs
    enabled: true
    config:
      spend_limit_usd: 10.00
      store_id: default     # instances sharing store_id share spend data
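One common way to read the requests_per_minute and burst settings in the rate-limit entry above is as a token bucket: burst is the bucket size and requests_per_minute is the refill rate. This page does not state which algorithm the plugin uses, so treat the sketch below as an assumption that illustrates the arithmetic.

```python
class TokenBucket:
    """Token-bucket limiter (assumed model, not the plugin's confirmed algorithm)."""

    def __init__(self, requests_per_minute, burst, now=0.0):
        self.rate = requests_per_minute / 60.0  # tokens added per second
        self.capacity = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(requests_per_minute=120, burst=20)
```

Under this model, 120 requests per minute refills two tokens per second. A client can send 20 requests at once, then about two per second after that.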

Budget plugin dual registration

The budget plugin should be registered twice — once at before_request (to reject over-limit keys) and once at after_request (to record token costs). Use the same store_id in both entries to share the spend counter.

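plugins:
  - name: budget
    type: guardrail
    stage: before_request   # rejects keys that are over the limit
    enabled: true
    config:
      spend_limit_usd: 10.00
      store_id: default
  - name: budget
    type: guardrail
    stage: after_request    # records token costs
    enabled: true
    config:
      store_id: default     # same store_id, so both entries share one counter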
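The reason for the two registrations becomes clearer with a small model of the two stages. The sketch below uses an in-memory dictionary as a stand-in for the spend store. The SpendStore class and the function names are hypothetical, and real cost accounting will be more involved.

```python
class SpendStore:
    """In-memory spend counters keyed by API key (hypothetical stand-in)."""

    def __init__(self):
        self.spent = {}


stores = {}  # store_id -> SpendStore; entries with the same store_id share data


def get_store(store_id):
    return stores.setdefault(store_id, SpendStore())


def before_request(api_key, config):
    """Return True if the key is still under its spend limit."""
    store = get_store(config["store_id"])
    return store.spent.get(api_key, 0.0) < config["spend_limit_usd"]


def after_request(api_key, cost_usd, config):
    """Add the cost of a completed request to the key's running total."""
    store = get_store(config["store_id"])
    store.spent[api_key] = store.spent.get(api_key, 0.0) + cost_usd


before_cfg = {"spend_limit_usd": 10.00, "store_id": "default"}
after_cfg = {"store_id": "default"}
```

If the after_request entry used a different store_id, costs would be recorded in a counter that the before_request check never reads, and the limit would never trigger.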

Ferro Labs Managed plugins

These plugins require a Ferro Labs Managed account:

plugins:
  - name: pii-redact
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: redact       # redact | block
      redact_mode: replace_type
      apply_to: input

  - name: secret-scan
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: block

  - name: prompt-shield
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: block
      threshold: 0.90
      apply_to: user_messages

  - name: schema-guard
    type: guardrail
    stage: after_request
    enabled: true
    config:
      apply_to: output
      action: block
      extract_json: true
      schema:
        type: object
        required: [name, confidence]
        properties:
          name:
            type: string
          confidence:
            type: number
            minimum: 0
            maximum: 1

  - name: regex-guard
    type: guardrail
    stage: before_request
    enabled: true
    config:
      action: block
      patterns:
        - "(?i)drop\\s+table"
        - "(?i)delete\\s+from"

Ferro Labs Managed feature

The five Managed plugins (pii-redact, secret-scan, prompt-shield, schema-guard, regex-guard) require a Ferro Labs Managed account.

See Plugins for all 11 plugins and their full config options.

MCP servers

Configure Model Context Protocol tool servers for agentic tool-calling. Streaming requests are supported as of v1.0.0.

mcp_servers:
  - name: filesystem
    url: "http://localhost:3001/mcp"
    timeout_seconds: 10
    max_call_depth: 3

  - name: database
    url: "https://mcp-db.internal/mcp"
    headers:
      Authorization: "Bearer ${MCP_DB_TOKEN}"
    allowed_tools:
      - query_readonly
      - list_tables
    timeout_seconds: 15
    max_call_depth: 5

When mcp_servers is present, the gateway initialises MCP connections in the background on startup (60-second timeout) and injects available tools into every chat completion request. See MCP integration.
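The allowed_tools list in the database example reads as an allowlist over the tools a server advertises. The sketch below shows that interpretation. It is an assumption: this page does not spell out the filtering rule or what happens when allowed_tools is omitted. The function name visible_tools and the advertised tool names are hypothetical.

```python
def visible_tools(server, advertised):
    """Return the advertised tools that the server entry permits."""
    allowed = server.get("allowed_tools")
    if allowed is None:
        # Assumption: without an allowlist, every advertised tool is exposed.
        return list(advertised)
    return [tool for tool in advertised if tool in allowed]


database = {
    "name": "database",
    "url": "https://mcp-db.internal/mcp",
    "allowed_tools": ["query_readonly", "list_tables"],
}
filesystem = {"name": "filesystem", "url": "http://localhost:3001/mcp"}
```

Under this reading, a write-capable tool advertised by the database server would not be injected into chat completion requests, because it is not in the allowlist.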

Complete example

Copy config.example.yaml from the repository for a full example covering all 29 providers, aliases, and all 11 plugins.

Related pages

Routing policies: all 8 strategies with examples
Plugins: detailed plugin documentation
Use cases: recipe-style configurations
MCP integration: tool server setup
