Plugins

Plugins extend the request pipeline at three lifecycle stages:

before_request — runs before the request is forwarded to the provider
after_request — runs after the provider response is received
on_error — runs when the provider returns an error

Each plugin entry in config.yaml has name, type, stage, enabled, and an optional config map. Disabled plugins (enabled: false) are ignored at runtime.

Plugin	Type	Stage	Open-source
`word-filter`	guardrail	before_request	✅
`max-token`	guardrail	before_request	✅
`response-cache`	transform	before_request	✅
`request-logger`	logging	before_request	✅
`rate-limit`	ratelimit	before_request	✅
`budget`	guardrail	before_request + after_request	✅
`pii-redact`	guardrail	before_request	Ferro Labs Managed only
`secret-scan`	guardrail	before_request	Ferro Labs Managed only
`prompt-shield`	guardrail	before_request	Ferro Labs Managed only
`schema-guard`	guardrail	after_request	Ferro Labs Managed only
`regex-guard`	guardrail	before_request	Ferro Labs Managed only

Guardrail plugins

word-filter

Blocks requests whose messages contain any of the configured words or phrases. Matching ignores case unless case_sensitive is set to true.

- name: word-filter
  type: guardrail
  stage: before_request
  enabled: true
  config:
    blocked_words: ["confidential", "password", "secret"]
    case_sensitive: false
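
The matching rule can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and it assumes plain substring matching, which the plugin may or may not use (it could match whole words instead).

```python
def find_blocked_word(messages, blocked_words, case_sensitive=False):
    """Return the first blocked word found in any message, or None."""
    for text in messages:
        haystack = text if case_sensitive else text.lower()
        for word in blocked_words:
            needle = word if case_sensitive else word.lower()
            # Assumption: substring match, not whole-word match.
            if needle in haystack:
                return word
    return None


blocked = ["confidential", "password", "secret"]
print(find_blocked_word(["What is my PASSWORD?"], blocked))        # password
print(find_blocked_word(["What is my PASSWORD?"], blocked, True))  # None
```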

max-token

Enforces limits on token count, message count, and raw input length before the request reaches the provider.

- name: max-token
  type: guardrail
  stage: before_request
  enabled: true
  config:
    max_tokens: 4096          # maximum output tokens to request
    max_messages: 50          # maximum messages in the conversation
    max_input_length: 20000   # maximum raw characters in user input
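
A rough Python sketch of the three checks follows. The function name and rejection messages are hypothetical, and two details are assumptions rather than documented behavior: that max_input_length is summed across all user messages, and that message content is a plain string.

```python
def check_limits(request, max_tokens=4096, max_messages=50, max_input_length=20000):
    """Return a rejection reason, or None if the request is within limits."""
    # Requested output tokens must not exceed the configured ceiling.
    if request.get("max_tokens", 0) > max_tokens:
        return "max_tokens exceeds limit"

    messages = request.get("messages", [])
    if len(messages) > max_messages:
        return "too many messages"

    # Assumption: raw input length is summed over user messages only.
    user_chars = sum(len(m["content"]) for m in messages if m["role"] == "user")
    if user_chars > max_input_length:
        return "input too long"
    return None
```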

pii-redact

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Detects Personally Identifiable Information (PII) in the request and either redacts it or blocks the request entirely.

- name: pii-redact
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: redact           # redact | block
    redact_mode: replace_type  # replace detected entity with its type label
    apply_to: input          # input | output | both
    entities: []             # empty = detect all entity types

With action: redact, detected PII is replaced in-place before the request is forwarded. With action: block, the entire request is rejected with a 400 error.
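
The difference between the two actions can be illustrated with a toy Python sketch. Everything here is hypothetical: the two regex detectors, the `[EMAIL]`-style label format, and the function name are stand-ins, since the managed plugin's detection method and label format are not described on this page.

```python
import re

# Hypothetical detectors for illustration; not the managed plugin's logic.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def apply_pii_policy(text, action="redact"):
    """Return (allowed, text). allowed=False models the 400 rejection."""
    if not any(rx.search(text) for rx in DETECTORS.values()):
        return True, text
    if action == "block":
        return False, None
    # replace_type: swap each detected entity for its type label.
    for name, rx in DETECTORS.items():
        text = rx.sub(f"[{name}]", text)
    return True, text
```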

secret-scan

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Scans request content for leaked credentials, API keys, and secrets using pattern matching and (optionally) entropy analysis.

- name: secret-scan
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: block            # block | warn
    entropy_check: true      # also flag high-entropy strings
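
Entropy analysis is commonly done with Shannon entropy over the characters of a candidate token. The sketch below shows that general technique; it is not confirmed to be what secret-scan does, and the `min_length` and `threshold` values are illustrative, not the plugin's defaults.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Bits per character of the string's empirical character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_high_entropy(token, min_length=20, threshold=4.0):
    # Illustrative cutoffs: long tokens with near-random character mix.
    return len(token) >= min_length and shannon_entropy(token) >= threshold
```

A string of one repeated character scores 0 bits per character, while a string of 32 distinct characters scores 5, so random-looking keys stand out from ordinary prose.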

prompt-shield

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Scores user messages for prompt injection attempts and blocks requests that exceed a configurable confidence threshold.

- name: prompt-shield
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: block
    threshold: 0.90          # 0.0–1.0; block when the injection score exceeds this value
    apply_to: user_messages

schema-guard

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Validates the model's JSON output against a JSON Schema. Runs after_request. Optionally extracts JSON from a text response before validating.

- name: schema-guard
  type: guardrail
  stage: after_request
  enabled: true
  config:
    apply_to: output
    action: block
    extract_json: true       # attempt to parse JSON from a markdown code block
    schema:
      type: object
      required: [name, confidence]
      properties:
        name:
          type: string
        confidence:
          type: number
          minimum: 0
          maximum: 1
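
To make the extract-then-validate flow concrete, here is a small Python sketch. It is an approximation under stated assumptions: the extraction regex and function names are hypothetical, and `validate` is a hand-written check that mirrors only the example schema above rather than a general JSON Schema validator.

```python
import json
import re

# Matches a fenced block (three backticks), optionally tagged "json".
JSON_BLOCK = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text):
    """Parse JSON from a fenced block if present, else from the whole text."""
    match = JSON_BLOCK.search(text)
    return json.loads(match.group(1) if match else text)


def validate(obj):
    """Return a list of violations of the example schema (empty if valid)."""
    if not isinstance(obj, dict):
        return ["expected an object"]
    errors = [f"missing required property: {k}"
              for k in ("name", "confidence") if k not in obj]
    if "name" in obj and not isinstance(obj["name"], str):
        errors.append("name must be a string")
    if "confidence" in obj:
        conf = obj["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            errors.append("confidence must be a number")
        elif not 0 <= conf <= 1:
            errors.append("confidence must be between 0 and 1")
    return errors
```

With action: block, a non-empty error list would correspond to a rejected response.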

regex-guard

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Blocks or warns on requests matching one or more regular expressions. Useful for custom business rules not covered by other guardrails.

- name: regex-guard
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: block            # block | warn
    rules:
      - pattern: "(?i)(ssn|social security)\\s*:?\\s*\\d{3}-\\d{2}-\\d{4}"
        message: "SSN pattern detected"
      - pattern: "(?i)jailbreak|ignore previous instructions"
        message: "Potential jailbreak attempt"
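
Rules like these are easy to try out before deploying them. The Python sketch below uses the standard `re` module for illustration; the plugin's actual regex dialect is not stated here, so treat compatibility as an assumption. The function name is hypothetical, and it assumes the first matching rule wins.

```python
import re

# In the YAML above, the double-quoted "\\s" and "\\d" unescape to \s and \d.
RULES = [
    (re.compile(r"(?i)(ssn|social security)\s*:?\s*\d{3}-\d{2}-\d{4}"),
     "SSN pattern detected"),
    (re.compile(r"(?i)jailbreak|ignore previous instructions"),
     "Potential jailbreak attempt"),
]


def first_violation(text):
    """Return the message of the first matching rule, or None."""
    for pattern, message in RULES:
        if pattern.search(text):
            return message
    return None


print(first_violation("My SSN: 123-45-6789"))                  # SSN pattern detected
print(first_violation("Please IGNORE previous instructions"))  # Potential jailbreak attempt
```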

Transform plugins

response-cache

Caches exact-match responses in memory. Identical requests (same model + messages) served from cache skip the provider entirely.

- name: response-cache
  type: transform
  stage: before_request
  enabled: true
  config:
    max_age: 300      # seconds before a cache entry expires
    max_entries: 1000 # maximum number of cached responses
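
An exact-match cache of this kind can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the class name, the SHA-256 key derivation, and the oldest-first eviction policy are all assumptions.

```python
import hashlib
import json
import time
from collections import OrderedDict


class ResponseCache:
    def __init__(self, max_age=300, max_entries=1000, clock=time.monotonic):
        self.max_age = max_age
        self.max_entries = max_entries
        self.clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, response)

    @staticmethod
    def key(model, messages):
        # Canonical JSON so identical requests hash to the same key.
        blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get(self, model, messages):
        k = self.key(model, messages)
        entry = self._entries.get(k)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at >= self.max_age:
            del self._entries[k]  # entry expired
            return None
        return response

    def put(self, model, messages, response):
        k = self.key(model, messages)
        self._entries[k] = (self.clock(), response)
        self._entries.move_to_end(k)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # assumption: evict oldest first
```

Because the key covers the full message list, any change to the model or to a single message produces a cache miss.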

Logging plugins

request-logger

Emits structured per-request logs. Optionally persists request and response data to SQLite or Postgres for later querying via the admin API.

- name: request-logger
  type: logging
  stage: before_request
  enabled: true
  config:
    level: info
    persist: true
    backend: sqlite      # sqlite | postgres
    dsn: ferrogw-requests.db   # SQLite path or Postgres DSN

When persist: true, requests are queryable at GET /admin/logs. See Request logging.

Rate limit plugins

rate-limit

Token-bucket rate limiting applied per request. Rejects requests with 429 Too Many Requests when the bucket is empty.

- name: rate-limit
  type: ratelimit
  stage: before_request
  enabled: true
  config:
    requests_per_second: 50
    burst: 100
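
For readers unfamiliar with token buckets, here is a minimal Python sketch of the algorithm. The class name is hypothetical and this is not the gateway's code; it assumes the bucket starts full, refills continuously at requests_per_second, and is capped at burst.

```python
import time


class TokenBucket:
    def __init__(self, requests_per_second=50, burst=100, clock=time.monotonic):
        self.rate = requests_per_second
        self.capacity = burst
        self.tokens = float(burst)  # assumption: bucket starts full
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Consume one token if available; False maps to a 429 response."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With the values above, a client can send up to 100 requests at once, after which throughput settles at 50 requests per second.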

For IP-level rate limiting (HTTP middleware layer), see Rate limiting.

Budget plugins

budget

Tracks cumulative USD spend per API key using an in-memory token-cost model. Must be registered at both before_request (to check the limit) and after_request (to record the cost). The two instances share state via store_id.

Requests without an api_key in request metadata are not subject to budget enforcement.

In-memory only

Spend data is in-memory and resets on gateway restart. Use this for session-scoped soft limits and development quotas. Durable billing enforcement is available in Ferro Labs Managed.

plugins:
  # Check limit before forwarding
  - name: budget
    type: guardrail
    stage: before_request
    enabled: true
    config:
      store_id: "default"         # shared between before/after instances
      spend_limit_usd: 50.0       # max cumulative spend per API key (USD)
      input_per_m_tokens: 3.0     # cost per 1M prompt tokens
      output_per_m_tokens: 15.0   # cost per 1M completion tokens
      max_keys: 10000             # max tracked API keys before eviction

  # Record cost after response
  - name: budget
    type: guardrail
    stage: after_request
    enabled: true
    config:
      store_id: "default"         # must match the before_request instance
      input_per_m_tokens: 3.0
      output_per_m_tokens: 15.0

When the accumulated spend for an API key reaches spend_limit_usd, subsequent requests are rejected with HTTP 429.
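
As a worked example of the cost model, the sketch below computes per-request cost from the two per-million rates and accumulates it per API key. The formula is inferred from the config key names, and the class and function names are hypothetical; key eviction (max_keys) is omitted.

```python
def request_cost_usd(prompt_tokens, completion_tokens,
                     input_per_m_tokens=3.0, output_per_m_tokens=15.0):
    """Cost of one request given per-1M-token rates."""
    return (prompt_tokens * input_per_m_tokens
            + completion_tokens * output_per_m_tokens) / 1_000_000


class BudgetStore:
    """Shared state between the before_request and after_request instances."""

    def __init__(self, spend_limit_usd=50.0):
        self.spend_limit_usd = spend_limit_usd
        self.spend = {}  # api_key -> cumulative USD; lost on restart

    def before_request(self, api_key):
        """Return True if the request may proceed."""
        if api_key is None:
            return True  # no api_key: not subject to enforcement
        return self.spend.get(api_key, 0.0) < self.spend_limit_usd

    def after_request(self, api_key, prompt_tokens, completion_tokens):
        if api_key is not None:
            cost = request_cost_usd(prompt_tokens, completion_tokens)
            self.spend[api_key] = self.spend.get(api_key, 0.0) + cost


print(request_cost_usd(2000, 500))  # 0.0135
```

At these rates, a request with 2,000 prompt tokens and 500 completion tokens costs $0.006 + $0.0075 = $0.0135.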

Plugin execution order

Plugins run in the order they appear in config.yaml. Within a stage, if any plugin sets reject: true, execution stops and an error is returned to the client. If a plugin sets skip: true, that plugin finishes and every subsequent plugin in the stage is skipped; skip does not merely bypass the current plugin. This is how response-cache short-circuits the rest of the before_request stage, and the provider call, on a cache hit.
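
The reject and skip semantics can be modeled with a short loop. This Python sketch is illustrative only: `run_stage`, `make_plugin`, and the dict-of-flags return value are hypothetical, not the gateway's plugin interface.

```python
def run_stage(plugins, ctx):
    """Run one stage's plugins in order; return 'reject', 'skip', or 'continue'."""
    for plugin in plugins:
        result = plugin(ctx)  # hypothetical: each plugin returns a dict of flags
        if result.get("reject"):
            return "reject"   # stop and return an error to the client
        if result.get("skip"):
            return "skip"     # current plugin finished; skip the rest
    return "continue"


def make_plugin(name, log, **flags):
    """Build a stand-in plugin that records its name and returns fixed flags."""
    def plugin(ctx):
        log.append(name)
        return flags
    return plugin


log = []
stage = [
    make_plugin("rate-limit", log),
    make_plugin("response-cache", log, skip=True),  # simulated cache hit
    make_plugin("request-logger", log),             # never runs
]
print(run_stage(stage, {}), log)  # skip ['rate-limit', 'response-cache']
```

Because order follows config.yaml, placing request-logger before response-cache is what ensures cache hits are still logged.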
