Plugins

Plugins extend the request pipeline at three lifecycle stages:

  • before_request: runs before the request is forwarded to the provider
  • after_request: runs after the provider response is received
  • on_error: runs when the provider returns an error

Each plugin entry in config.yaml has name, type, stage, enabled, and an optional config map. Disabled plugins (enabled: false) are ignored at runtime.

| Plugin         | Type      | Stage                          | Open-source             |
|----------------|-----------|--------------------------------|-------------------------|
| word-filter    | guardrail | before_request                 | ✅                      |
| max-token      | guardrail | before_request                 | ✅                      |
| response-cache | transform | before_request                 | ✅                      |
| request-logger | logging   | before_request                 | ✅                      |
| rate-limit     | ratelimit | before_request                 | ✅                      |
| budget         | budget    | before_request + after_request | ✅                      |
| pii-redact     | guardrail | before_request                 | Ferro Labs Managed only |
| secret-scan    | guardrail | before_request                 | Ferro Labs Managed only |
| prompt-shield  | guardrail | before_request                 | Ferro Labs Managed only |
| schema-guard   | guardrail | after_request                  | Ferro Labs Managed only |
| regex-guard    | guardrail | before_request                 | Ferro Labs Managed only |

Guardrail plugins

word-filter

Blocks requests whose messages contain any of the configured words or phrases. Case sensitivity is optional.

- name: word-filter
  type: guardrail
  stage: before_request
  enabled: true
  config:
    blocked_words: ["confidential", "password", "secret"]
    case_sensitive: false
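The check this plugin performs can be pictured with a short Python sketch. It is illustrative only and not the gateway's implementation: the function name `find_blocked_word` is hypothetical, and plain substring matching is an assumption (the text above only says that messages containing a configured word or phrase are blocked).

```python
from typing import Iterable, Optional


def find_blocked_word(
    messages: Iterable[str],
    blocked_words: Iterable[str],
    case_sensitive: bool = False,
) -> Optional[str]:
    """Return the first configured word found in any message, or None.

    Hypothetical helper: substring matching is an assumption.
    """
    words = list(blocked_words)
    for text in messages:
        haystack = text if case_sensitive else text.lower()
        for word in words:
            needle = word if case_sensitive else word.lower()
            if needle and needle in haystack:
                return word
    return None


blocked = ["confidential", "password", "secret"]
# With case_sensitive left at False, "PASSWORD" still matches "password".
hit = find_blocked_word(["What is my PASSWORD?"], blocked)
```

With `case_sensitive=True` the same message would pass, because "PASSWORD" and "password" are then treated as different strings.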

max-token

Enforces limits on token count, message count, and raw input length before the request reaches the provider.

- name: max-token
  type: guardrail
  stage: before_request
  enabled: true
  config:
    max_tokens: 4096         # maximum output tokens to request
    max_messages: 50         # maximum messages in the conversation
    max_input_length: 20000  # maximum raw characters in user input
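As a rough illustration of the three limits, the sketch below checks a request shaped like a chat-completion payload. The request shape (`max_tokens`, `messages` with `role` and `content`), the function name `check_limits`, the rejection strings, and the choice to sum characters across all user messages are assumptions, not documented gateway behaviour.

```python
from typing import Any, Dict, Optional

LIMITS = {"max_tokens": 4096, "max_messages": 50, "max_input_length": 20000}


def check_limits(request: Dict[str, Any], limits: Dict[str, int] = LIMITS) -> Optional[str]:
    """Return a rejection reason, or None when the request is within all limits."""
    messages = request.get("messages", [])
    if request.get("max_tokens", 0) > limits["max_tokens"]:
        return "max_tokens exceeds limit"
    if len(messages) > limits["max_messages"]:
        return "too many messages"
    # Assumption: input length is the total character count of user messages.
    user_chars = sum(
        len(m.get("content", "")) for m in messages if m.get("role") == "user"
    )
    if user_chars > limits["max_input_length"]:
        return "input too long"
    return None
```

Because every check runs before the request is forwarded, an oversized request never consumes provider tokens.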

pii-redact

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Detects Personally Identifiable Information (PII) in the request and either redacts it or blocks the request entirely.

- name: pii-redact
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: redact             # redact | block
    redact_mode: replace_type  # replace detected entity with its type label
    apply_to: input            # input | output | both
    entities: []               # empty = detect all entity types

With action: redact, detected PII is replaced in place before the request is forwarded. With action: block, the entire request is rejected with a 400 error.
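The difference between the two actions can be sketched as follows. This is a toy model, not the managed plugin: the single email detector, the `[EMAIL]` label format, and the function name `apply_pii_policy` are all assumptions made for illustration.

```python
import re
from typing import Tuple

# Assumption: one toy detector; the real plugin detects many entity types.
DETECTORS = {"EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")}


def apply_pii_policy(text: str, action: str = "redact") -> Tuple[int, str]:
    """Return (status, text): 200 with PII replaced, or 400 when blocked."""
    found = False
    redacted = text
    for label, pattern in DETECTORS.items():
        # replace_type mode: swap each detected entity for its type label
        redacted, count = pattern.subn(f"[{label}]", redacted)
        found = found or count > 0
    if found and action == "block":
        return 400, ""  # the whole request is rejected
    return 200, redacted
```

For example, `apply_pii_policy("Mail alice@example.com today")` yields status 200 with the address replaced by its type label, while the same input with `action="block"` yields status 400.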

secret-scan

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Scans request content for leaked credentials, API keys, and secrets using pattern matching and (optionally) entropy analysis.

- name: secret-scan
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: block        # block | warn
    entropy_check: true  # also flag high-entropy strings
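To make "pattern matching plus entropy analysis" concrete, here is a minimal sketch combining one known credential format with a Shannon-entropy test. The single pattern (the AWS access key ID format), the thresholds `min_len` and `min_entropy`, and the function names are assumptions; the managed plugin's actual rules are not documented here.

```python
import math
import re
from collections import Counter

# Assumption: one illustrative credential pattern (AWS access key ID format).
KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, computed from character frequencies."""
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def looks_like_secret(text: str, entropy_check: bool = True,
                      min_len: int = 20, min_entropy: float = 4.0) -> bool:
    """Flag text that matches a known key pattern or contains a random-looking token."""
    if KEY_PATTERN.search(text):
        return True
    if entropy_check:
        # Hypothetical thresholds: long tokens with near-random character spread.
        return any(
            len(tok) >= min_len and shannon_entropy(tok) >= min_entropy
            for tok in text.split()
        )
    return False
```

Pattern matching catches credentials with a recognisable shape; the entropy test is a fallback for random-looking strings that match no known format, at the cost of more false positives.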

prompt-shield

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Scores user messages for prompt injection attempts and blocks requests that exceed a configurable confidence threshold.

- name: prompt-shield
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: block
    threshold: 0.90  # 0.0 to 1.0; requests scoring above this are blocked
    apply_to: user_messages

schema-guard

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Validates the model's JSON output against a JSON Schema. It runs at the after_request stage and can optionally extract JSON from a text response before validating.

- name: schema-guard
  type: guardrail
  stage: after_request
  enabled: true
  config:
    apply_to: output
    action: block
    extract_json: true  # attempt to parse JSON from a markdown code block
    schema:
      type: object
      required: [name, confidence]
      properties:
        name:
          type: string
        confidence:
          type: number
          minimum: 0
          maximum: 1
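The extract-then-validate flow might look roughly like the sketch below. It is an assumption-laden illustration: the extraction regex and both function names are hypothetical, and the validation is a hand-written check of the example schema above rather than a general JSON Schema validator, which a real deployment would use.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example
CODE_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str) -> Optional[Any]:
    """Parse JSON from the first fenced block, else from the whole text."""
    match = CODE_BLOCK.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


def matches_example_schema(doc: Any) -> bool:
    """Hand-written equivalent of the example schema (name: string, confidence: 0..1)."""
    if not isinstance(doc, dict) or "name" not in doc or "confidence" not in doc:
        return False
    conf = doc["confidence"]
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    return isinstance(doc["name"], str) and is_number and 0 <= conf <= 1
```

With action: block, a response for which `extract_json` returns `None` or `matches_example_schema` returns `False` would be the kind of output the plugin rejects.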

regex-guard

Ferro Labs Managed only

This plugin is available in Ferro Labs Managed deployments. It is not included in the open-source gateway.

Blocks or warns on requests matching one or more regular expressions. Useful for custom business rules not covered by other guardrails.

- name: regex-guard
  type: guardrail
  stage: before_request
  enabled: true
  config:
    action: block  # block | warn
    rules:
      - pattern: "(?i)(ssn|social security)\\s*:?\\s*\\d{3}-\\d{2}-\\d{4}"
        message: "SSN pattern detected"
      - pattern: "(?i)jailbreak|ignore previous instructions"
        message: "Potential jailbreak attempt"
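Before deploying rules like these, it can help to try them against sample text. The sketch below does so with Python's `re` module. Note the assumptions: the gateway's regex dialect is not stated here, so behaviour in Python may differ in edge cases, and `first_violation` is a hypothetical helper. The doubled backslashes in the YAML (`\\s`, `\\d`) unescape to single ones, which is what the raw strings below contain.

```python
import re
from typing import Optional

# Patterns copied from the example config, after YAML unescaping.
RULES = [
    (re.compile(r"(?i)(ssn|social security)\s*:?\s*\d{3}-\d{2}-\d{4}"),
     "SSN pattern detected"),
    (re.compile(r"(?i)jailbreak|ignore previous instructions"),
     "Potential jailbreak attempt"),
]


def first_violation(text: str) -> Optional[str]:
    """Return the message of the first rule that matches, or None."""
    for pattern, message in RULES:
        if pattern.search(text):
            return message
    return None
```

One thing such a test makes visible: the first rule only fires when the number is preceded by "ssn" or "social security", so a bare `123-45-6789` passes through unflagged.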

Transform plugins

response-cache

Caches exact-match responses in memory. A request identical to a cached one (same model and messages) is served from the cache and skips the provider entirely.

- name: response-cache
  type: transform
  stage: before_request
  enabled: true
  config:
    max_age: 300       # seconds before a cache entry expires
    max_entries: 1000  # maximum number of cached responses
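An exact-match cache with these two settings can be sketched in a few lines. Everything beyond `max_age` and `max_entries` is an assumption: the class name `ResponseCache`, hashing the model and messages into a key, and evicting the oldest entry first are illustrative choices, not the gateway's documented behaviour.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, List, Optional


class ResponseCache:
    """In-memory exact-match cache with a time-to-live and a size cap."""

    def __init__(self, max_age: float = 300, max_entries: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age, self.max_entries, self.clock = max_age, max_entries, clock
        self._entries = OrderedDict()  # key -> (stored_at, response)

    @staticmethod
    def key(model: str, messages: List[Dict[str, Any]]) -> str:
        # Identical model + messages always hash to the same key.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: List[Dict[str, Any]]) -> Optional[Any]:
        k = self.key(model, messages)
        entry = self._entries.get(k)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.max_age:
            del self._entries[k]  # expired
            return None
        return response

    def put(self, model: str, messages: List[Dict[str, Any]], response: Any) -> None:
        k = self.key(model, messages)
        self._entries.pop(k, None)
        self._entries[k] = (self.clock(), response)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # assumption: evict oldest first
```

Because the key covers the full message list, any change to the conversation (even whitespace) produces a cache miss; that is the trade-off of exact matching.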

Logging plugins

request-logger

Emits structured per-request logs. Optionally persists request and response data to SQLite or Postgres for later querying via the admin API.

- name: request-logger
  type: logging
  stage: before_request
  enabled: true
  config:
    level: info
    persist: true
    backend: sqlite           # sqlite | postgres
    dsn: ferrogw-requests.db  # SQLite path or Postgres DSN

When persist: true, requests are queryable at GET /admin/logs. See Request logging.

Rate limit plugins

rate-limit

Token-bucket rate limiting applied per request. Rejects requests with 429 Too Many Requests when the bucket is empty.

- name: rate-limit
  type: ratelimit
  stage: before_request
  enabled: true
  config:
    requests_per_second: 50
    burst: 100

For IP-level rate limiting (HTTP middleware layer), see Rate limiting.
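For readers unfamiliar with token buckets, the sketch below shows the general technique. It assumes that `requests_per_second` is the refill rate and `burst` is the bucket capacity, which is the conventional reading of these settings but is not confirmed by this page; the class name `TokenBucket` is hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills continuously, spends one token per request."""

    def __init__(self, requests_per_second: float = 50, burst: int = 100,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = requests_per_second, burst, clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond with 429 Too Many Requests
```

Under this reading, the example configuration would admit a burst of up to 100 back-to-back requests and then sustain 50 requests per second once the bucket has drained.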

Budget plugins

budget

Tracks cumulative USD spend per API key using an in-memory token-cost model. Must be registered at both before_request (to check the limit) and after_request (to record the cost). The two instances share state via store_id.

Requests without an api_key in request metadata are not subject to budget enforcement.

In-memory only

Spend data is in-memory and resets on gateway restart. Use this for session-scoped soft limits and development quotas. Durable billing enforcement is available in Ferro Labs Managed.

plugins:
  # Check limit before forwarding
  - name: budget
    type: budget
    stage: before_request
    enabled: true
    config:
      store_id: "default"        # shared between before/after instances
      spend_limit_usd: 50.0      # max cumulative spend per API key (USD)
      input_per_m_tokens: 3.0    # cost per 1M prompt tokens
      output_per_m_tokens: 15.0  # cost per 1M completion tokens
      max_keys: 10000            # max tracked API keys before eviction

  # Record cost after response
  - name: budget
    type: budget
    stage: after_request
    enabled: true
    config:
      store_id: "default"        # must match the before_request instance
      input_per_m_tokens: 3.0
      output_per_m_tokens: 15.0

When the accumulated spend for an API key reaches spend_limit_usd, subsequent requests are rejected with HTTP 429.
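The cost arithmetic and the check/record split can be sketched as below. With the example rates, a response that used 1,000,000 prompt tokens and 200,000 completion tokens costs 1.0 × $3.00 + 0.2 × $15.00 = $6.00. The class `BudgetStore` and its method names are hypothetical, and `max_keys` eviction is left out for brevity.

```python
from typing import Dict, Optional


class BudgetStore:
    """Shared in-memory spend tracker (plays the role of one store_id)."""

    def __init__(self, spend_limit_usd: float = 50.0,
                 input_per_m_tokens: float = 3.0,
                 output_per_m_tokens: float = 15.0) -> None:
        self.spend_limit_usd = spend_limit_usd
        self.input_per_m_tokens = input_per_m_tokens
        self.output_per_m_tokens = output_per_m_tokens
        self.spend: Dict[str, float] = {}

    def check(self, api_key: Optional[str]) -> int:
        """before_request: return 429 once the key has reached its limit."""
        if not api_key:
            return 200  # requests without an api_key are not budget-enforced
        spent = self.spend.get(api_key, 0.0)
        return 429 if spent >= self.spend_limit_usd else 200

    def record(self, api_key: Optional[str],
               prompt_tokens: int, completion_tokens: int) -> float:
        """after_request: add this response's cost to the key's running total."""
        cost = (prompt_tokens * self.input_per_m_tokens
                + completion_tokens * self.output_per_m_tokens) / 1_000_000
        if api_key:
            self.spend[api_key] = self.spend.get(api_key, 0.0) + cost
        return cost
```

Because the limit is checked before the request and the cost is recorded after it, the request that crosses the limit still completes; only later requests for that key are rejected.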

Plugin execution order

Plugins are executed in the order they appear in config.yaml. Within a stage, if any plugin sets reject: true, execution stops and an error is returned to the client. The skip flag causes only the current plugin to be bypassed.
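The ordering rules can be modelled with a small loop. This is a sketch under assumptions: where the reject and skip flags actually live is not specified here, so the `Result` and `Plugin` types and the `run_stage` function are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Result:
    reject: bool = False
    skip: bool = False


@dataclass
class Plugin:
    name: str
    stage: str
    run: Callable[[Dict], Result]
    enabled: bool = True


def run_stage(plugins: List[Plugin], stage: str, ctx: Dict) -> Tuple[bool, List[str]]:
    """Run one stage in config order; return (ok, names of plugins applied)."""
    applied: List[str] = []
    for plugin in plugins:  # list order mirrors the order in config.yaml
        if not plugin.enabled or plugin.stage != stage:
            continue
        result = plugin.run(ctx)
        if result.skip:
            continue  # only this plugin is bypassed; the stage carries on
        applied.append(plugin.name)
        if result.reject:
            return False, applied  # stop: later plugins never run
    return True, applied
```

A practical consequence is that cheap guardrails are best listed first: a rejection from an early plugin spares the work of everything configured after it in the same stage.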