Changelog

Full release notes are also on GitHub Releases.

v1.0.0 — 2026-03-24 — Stable release

v1.0.0 is the first stable release of the Ferro Labs AI Gateway. Starting with this release, the project follows semantic versioning: breaking changes will only occur in major version bumps. The configuration format and OpenAI-compatible API are now part of the stable contract.

What's new in v1.0.0

MCP streaming support — clients can send stream: true when MCP servers are configured. The gateway resolves all tool calls internally and returns the final answer as SSE. Added in v1.0.0-rc.1.
29 providers — 10 providers added since v0.6.5: Cerebras, NVIDIA NIM, Cloudflare Workers AI, Databricks, Novita AI, Qwen (Alibaba), Moonshot AI, SambaNova, DeepInfra, OpenRouter.
8 routing strategies — content-based and A/B test routing (from v0.8.5) are now stable.
Per-key and per-user rate limiting — key_rpm and user_rpm fields in the rate-limit plugin (from v0.8.5).
Budget plugin — per-key spend tracking and enforcement (from v0.8.5).
Published benchmarks — sub-millisecond p99 overhead at 500 RPS, 100% success rate sustained. See benchmarks.

Stability guarantees

The config.yaml schema is stable. Existing configs will continue to work across v1.x releases.
The OpenAI-compatible API (/v1/chat/completions, /v1/embeddings, /v1/models, /v1/images/generations) wire format is stable.
The admin API (/admin/*) is stable. New endpoints may be added but existing ones will not change in breaking ways.
Prometheus metric names and labels are stable.

v0.8.5 — 2026-03-12

Content-based routing strategy

New strategy.mode: content-based selects a provider target based on user-role prompt content
Three condition types: prompt_contains (case-insensitive), prompt_not_contains, and prompt_regex (Go regexp)
Rules evaluated in declaration order — first match wins; unmatched requests fall back to the first target
Regex patterns compiled at startup for zero-cost hot-path matching; invalid patterns surface as a startup error
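
The rule evaluation described above can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the Rule type, Compile, and Select are hypothetical names, and treating prompt_not_contains as case-insensitive is an assumption (the notes only state this for prompt_contains).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical in-memory form of one content-based routing rule.
// Exactly one of the three condition fields is expected to be set.
type Rule struct {
	PromptContains    string
	PromptNotContains string
	PromptRegex       string
	Target            string

	re *regexp.Regexp // filled in once by Compile
}

// Compile pre-compiles every regex so that an invalid pattern fails at
// startup instead of on the request path.
func Compile(rules []Rule) error {
	for i := range rules {
		if rules[i].PromptRegex == "" {
			continue
		}
		re, err := regexp.Compile(rules[i].PromptRegex)
		if err != nil {
			return fmt.Errorf("rule %d: invalid prompt_regex: %w", i, err)
		}
		rules[i].re = re
	}
	return nil
}

// matches evaluates the single condition carried by the rule.
func (r Rule) matches(prompt string) bool {
	lower := strings.ToLower(prompt)
	switch {
	case r.PromptContains != "":
		return strings.Contains(lower, strings.ToLower(r.PromptContains))
	case r.PromptNotContains != "":
		// Assumption: case-insensitive, mirroring prompt_contains.
		return !strings.Contains(lower, strings.ToLower(r.PromptNotContains))
	case r.re != nil:
		return r.re.MatchString(prompt)
	}
	return false
}

// Select returns the target of the first matching rule in declaration order,
// or fallback (the first configured target) when no rule matches.
func Select(rules []Rule, prompt, fallback string) string {
	for _, r := range rules {
		if r.matches(prompt) {
			return r.Target
		}
	}
	return fallback
}

func main() {
	rules := []Rule{
		{PromptContains: "SQL", Target: "code-model"},
		{PromptRegex: `(?i)\btranslate\b`, Target: "translation-model"},
	}
	if err := Compile(rules); err != nil {
		panic(err)
	}
	fmt.Println(Select(rules, "Write a sql query", "default-model"))
	fmt.Println(Select(rules, "Hello there", "default-model"))
}
```

Compiling once up front means the per-request cost is only the match itself, and a typo in a pattern is reported before any traffic is served.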

A/B testing strategy

New strategy.mode: ab-test splits traffic across two or more named variants using weighted random sampling
Each variant carries a label (e.g. "control", "challenger") emitted as the ab_variant structured log field on every routed request
Variants with a weight of zero are treated as weight 1, so a configuration with no weights set distributes traffic equally
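
Weighted sampling with the zero-weight rule above might look like the sketch below. Variant, TotalWeight, and Pick are hypothetical names chosen for illustration; clamping negative weights to 1 as well is an assumption, since the notes only mention zero.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Variant is a hypothetical A/B variant: a label plus a relative weight.
type Variant struct {
	Label  string
	Weight int
}

// effectiveWeight treats a zero weight as 1.
// Assumption: negative weights are clamped the same way.
func effectiveWeight(v Variant) int {
	if v.Weight <= 0 {
		return 1
	}
	return v.Weight
}

// TotalWeight sums the effective weights of all variants.
func TotalWeight(variants []Variant) int {
	total := 0
	for _, v := range variants {
		total += effectiveWeight(v)
	}
	return total
}

// Pick maps a roll in [0, TotalWeight) onto the cumulative weights.
// variants must be non-empty.
func Pick(variants []Variant, roll int) Variant {
	for _, v := range variants {
		w := effectiveWeight(v)
		if roll < w {
			return v
		}
		roll -= w
	}
	return variants[len(variants)-1]
}

func main() {
	variants := []Variant{
		{Label: "control", Weight: 3},
		{Label: "challenger", Weight: 1},
	}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		v := Pick(variants, rand.Intn(TotalWeight(variants)))
		counts[v.Label]++ // the gateway would log this label as ab_variant
	}
	fmt.Println(counts)
}
```

Separating the random roll from Pick keeps the selection logic deterministic and easy to test.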

Per-key and per-user rate limiting

Extended rate-limit plugin with key_rpm (requests per minute per API key) and user_rpm (requests per minute per user ID)
Rate checks execute in order: global → per-key → per-user; a request is rejected at the first limiter it exceeds, with a distinct reason string for each limiter
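
The check order can be illustrated with a small sketch. It uses a fixed one-minute window for simplicity; the plugin's actual algorithm, type names, and reason strings are not documented here, so everything below (Limiter, RateLimits, the reason text) is hypothetical. The sketch is not safe for concurrent use, and a request rejected by a later limiter still consumes quota in the earlier ones.

```go
package main

import (
	"fmt"
	"time"
)

// window is a fixed one-minute counter for a single identity.
type window struct {
	start int64 // Unix seconds at which the window opened
	count int
}

// Limiter is a hypothetical requests-per-minute limiter keyed by identity.
type Limiter struct {
	limit   int // requests per minute; 0 disables the limiter
	windows map[string]*window
}

func NewLimiter(limit int) *Limiter {
	return &Limiter{limit: limit, windows: map[string]*window{}}
}

// Allow records one request for id at time now (Unix seconds) and reports
// whether it fits within the current one-minute window.
func (l *Limiter) Allow(id string, now int64) bool {
	if l.limit == 0 {
		return true
	}
	w, ok := l.windows[id]
	if !ok || now-w.start >= 60 {
		w = &window{start: now}
		l.windows[id] = w
	}
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}

// RateLimits bundles the three limiters in the order they are checked.
type RateLimits struct {
	Global, PerKey, PerUser *Limiter
}

// Check runs global, then per-key, then per-user, and returns the reason
// for the first limiter that is exceeded.
func (r *RateLimits) Check(apiKey, userID string, now int64) (bool, string) {
	if !r.Global.Allow("global", now) {
		return false, "global rate limit exceeded"
	}
	if !r.PerKey.Allow(apiKey, now) {
		return false, "per-key rate limit exceeded"
	}
	if !r.PerUser.Allow(userID, now) {
		return false, "per-user rate limit exceeded"
	}
	return true, ""
}

func main() {
	rl := &RateLimits{Global: NewLimiter(100), PerKey: NewLimiter(2), PerUser: NewLimiter(1)}
	now := time.Now().Unix()
	for i := 0; i < 3; i++ {
		ok, reason := rl.Check("key-a", fmt.Sprintf("user-%d", i), now)
		fmt.Println(ok, reason)
	}
}
```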

Per-key budget controls plugin

New budget plugin tracks cumulative USD spend per API key in an in-memory store
Register at before_request to reject over-limit keys and after_request to record token costs
Two instances sharing the same store_id share accumulated spend data
spend_limit_usd: 0 (or unset) means unlimited — spend is tracked without rejection
Spend data is in-memory; does not survive process restarts
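
A rough sketch of the behaviour listed above, assuming a process-wide registry of stores keyed by store_id. SpendStore, StoreFor, Budget, and the hook method names are hypothetical, and rejecting once spend reaches the limit (rather than only after it is exceeded) is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// SpendStore accumulates USD spend per API key in memory only.
type SpendStore struct {
	mu    sync.Mutex
	spend map[string]float64
}

var (
	storesMu sync.Mutex
	stores   = map[string]*SpendStore{}
)

// StoreFor returns the store registered under storeID, creating it on first
// use, so two plugin instances with the same store_id share totals.
func StoreFor(storeID string) *SpendStore {
	storesMu.Lock()
	defer storesMu.Unlock()
	s, ok := stores[storeID]
	if !ok {
		s = &SpendStore{spend: map[string]float64{}}
		stores[storeID] = s
	}
	return s
}

// Budget is a hypothetical plugin instance bound to one store.
type Budget struct {
	store    *SpendStore
	limitUSD float64 // 0 means unlimited: track spend but never reject
}

func NewBudget(storeID string, limitUSD float64) *Budget {
	return &Budget{store: StoreFor(storeID), limitUSD: limitUSD}
}

// BeforeRequest reports whether apiKey may proceed.
func (b *Budget) BeforeRequest(apiKey string) bool {
	if b.limitUSD == 0 {
		return true
	}
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	// Assumption: a key is rejected once its spend reaches the limit.
	return b.store.spend[apiKey] < b.limitUSD
}

// AfterRequest records the cost of a completed request.
func (b *Budget) AfterRequest(apiKey string, costUSD float64) {
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	b.store.spend[apiKey] += costUSD
}

func main() {
	enforcer := NewBudget("team-a", 1.00)
	tracker := NewBudget("team-a", 0) // same store_id, no limit
	tracker.AfterRequest("key-1", 1.25)
	fmt.Println(enforcer.BeforeRequest("key-1")) // sees the shared spend
	fmt.Println(tracker.BeforeRequest("key-1"))
}
```

Because the map lives in process memory, restarting the process resets every key to zero spend, matching the limitation noted above.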

v0.8.0 — 2026-03-10

MCP integration (Phase 1)

Added mcp_servers configuration block for Model Context Protocol tool servers
Gateway injects available tools into every chat completion request automatically
Full agentic loop: gateway handles all tool_calls rounds internally, returns final text to client
Background MCP initialisation on startup with 60-second timeout; gateway is ready immediately
MCPInitDone() channel on the Gateway struct for callers that need to wait until MCP initialisation has finished
Per-server allowed_tools whitelist for access control
Per-server max_call_depth limit to prevent infinite loops
Environment variable interpolation (${VAR}) in MCP server headers
29 new tests covering MCP lifecycle and agentic loop behaviour
Bug fixes: nil-safe circuit breaker map init, empty config array handling, streaming fix for empty delta content
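
To make the agentic loop, allowed_tools, and max_call_depth items more concrete, here is a simplified sketch. All types and function names (Reply, ToolCall, ModelFunc, ToolFunc, RunLoop) are hypothetical, the conversation is reduced to a slice of strings, and both the interpretation of max_call_depth as a cap on tool-call rounds and the error returned when it is hit are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is a hypothetical tool invocation requested by the model.
type ToolCall struct{ Name, Args string }

// Reply is a hypothetical model response: final text or a set of tool calls.
type Reply struct {
	Text      string
	ToolCalls []ToolCall
}

// ModelFunc sends the conversation so far to the model.
type ModelFunc func(history []string) Reply

// ToolFunc executes one tool call against an MCP server.
type ToolFunc func(call ToolCall) string

var ErrMaxDepth = errors.New("max_call_depth exceeded")

// RunLoop keeps calling the model until it answers without tool calls,
// executing at most maxDepth rounds of tool calls in between.
func RunLoop(model ModelFunc, tool ToolFunc, allowed map[string]bool, maxDepth int, prompt string) (string, error) {
	history := []string{prompt}
	for depth := 0; ; depth++ {
		reply := model(history)
		if len(reply.ToolCalls) == 0 {
			return reply.Text, nil // final answer goes back to the client
		}
		if depth >= maxDepth {
			return "", ErrMaxDepth
		}
		for _, c := range reply.ToolCalls {
			if !allowed[c.Name] {
				// Tools outside the whitelist are never executed.
				history = append(history, "tool "+c.Name+" is not allowed")
				continue
			}
			history = append(history, tool(c))
		}
	}
}

func main() {
	model := func(history []string) Reply {
		if len(history) == 1 {
			return Reply{ToolCalls: []ToolCall{{Name: "get_weather", Args: `{"city":"Oslo"}`}}}
		}
		return Reply{Text: "Answer based on: " + history[len(history)-1]}
	}
	tool := func(c ToolCall) string { return c.Name + " returned 4C" }
	out, err := RunLoop(model, tool, map[string]bool{"get_weather": true}, 3, "Weather in Oslo?")
	fmt.Println(out, err)
}
```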

v0.7.0 — 2026-03-08

Comprehensive regression test suite (50+ end-to-end scenarios)
Fixed: race condition in concurrent provider health checks
Fixed: weight normalisation with single-target load balancer
Fixed: least-latency cold-start selecting excluded targets
Fixed: cost-optimized panic on missing catalog entry
Fixed: admin API pagination off-by-one on last page

v0.6.6 — 2026-03-07

Refactored providers/core subpackage; providers_list.go split for clarity
All Name* constants re-exported from providers top-level package
Dashboard XSS hardening (output encoding on all admin UI fields)
Added CORS origin validation warning on startup for wildcard origins
Removed 19 deprecated provider shim files

v0.6.5 — 2026-03-07 — 5 new providers

xAI (Grok) — XAI_API_KEY
Azure AI Foundry — AZURE_FOUNDRY_API_KEY + AZURE_FOUNDRY_ENDPOINT
Hugging Face — HUGGING_FACE_API_KEY
Google Vertex AI — VERTEX_AI_PROJECT_ID (ADC)
AWS Bedrock (static credentials) — AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY option
Provider subpackage refactor: unified factory pattern across all providers
Total providers: 19 (at time of release); total models in catalog: 2,531

v0.6.1 — 2026-03-06

CI GitHub Actions version bumps
Go dependency refresh (net/http, crypto)

v0.6.0 — 2026-03-06 — 5 new guardrail plugins

pii-redact — detect and redact PII entities before forwarding
secret-scan — block requests containing credentials or high-entropy secrets
prompt-shield — score and block prompt injection attempts
schema-guard — validate model output against JSON Schema (after_request)
regex-guard — block requests matching configurable regex patterns
Total built-in plugins: 11 (5 new + 6 existing)
All new plugins ship disabled (enabled: false) in config.example.yaml

v0.5.0 — 2026-03-03

Streaming cost tracking — token usage counted during streamed responses
Least-latency strategy — P50 rolling latency tracker, routes to fastest provider
Cost-optimized strategy — model catalog cost estimation, routes to cheapest provider
Per-target retry_on_status codes list (customise which HTTP status codes trigger retry)
CLI overhaul using Cobra — ferrogw-cli with admin, models, and keys subcommands
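
The least-latency strategy listed above relies on a rolling P50 per provider. One possible shape for such a tracker is sketched below; Tracker, Fastest, the window size, and the choice to skip providers that have no samples yet are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Tracker keeps the most recent latency samples (milliseconds) for one provider.
type Tracker struct {
	samples []float64
	size    int
}

func NewTracker(size int) *Tracker { return &Tracker{size: size} }

// Observe appends a sample and drops the oldest once the window is full.
func (t *Tracker) Observe(ms float64) {
	t.samples = append(t.samples, ms)
	if len(t.samples) > t.size {
		t.samples = t.samples[1:]
	}
}

// P50 returns the median of the window, or false when there are no samples.
func (t *Tracker) P50() (float64, bool) {
	n := len(t.samples)
	if n == 0 {
		return 0, false
	}
	sorted := append([]float64(nil), t.samples...) // copy before sorting
	sort.Float64s(sorted)
	if n%2 == 1 {
		return sorted[n/2], true
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2, true
}

// Fastest returns the first provider in names with the lowest P50.
// Assumption: providers without samples are skipped.
func Fastest(names []string, trackers map[string]*Tracker) (string, bool) {
	best, bestP50, found := "", 0.0, false
	for _, name := range names {
		t, ok := trackers[name]
		if !ok {
			continue
		}
		p50, ok := t.P50()
		if !ok {
			continue
		}
		if !found || p50 < bestP50 {
			best, bestP50, found = name, p50, true
		}
	}
	return best, found
}

func main() {
	trackers := map[string]*Tracker{"openai": NewTracker(5), "groq": NewTracker(5)}
	for _, ms := range []float64{420, 390, 1500} {
		trackers["openai"].Observe(ms)
	}
	for _, ms := range []float64{210, 250, 230} {
		trackers["groq"].Observe(ms)
	}
	fmt.Println(Fastest([]string{"openai", "groq"}, trackers))
}
```

Using the median rather than the mean keeps a single slow outlier, such as the 1500 ms sample above, from dominating the routing decision.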

v0.4.5 — 2026-02-28

Built-in model catalog with 2,531 model entries (pricing, context window, capabilities)
Cost calculator (models.Calculate()) used by cost-optimized strategy
/v1/models response enriched with catalog metadata (context window, max tokens, cost)
GitHub Actions catalog CI check — fails the build if catalog format is invalid
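
As an illustration of how catalog pricing can feed the cost-optimized strategy, the sketch below estimates request cost from per-token prices and picks the cheapest candidate. It does not reproduce models.Calculate(), whose signature is not shown in these notes; Pricing, EstimateCost, Cheapest, the per-million-token unit, and the prices are all hypothetical.

```go
package main

import "fmt"

// Pricing holds hypothetical per-million-token prices in USD for one model.
type Pricing struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

// EstimateCost returns the USD cost of a request, or false when the model
// has no catalog entry (so callers can skip it instead of failing).
func EstimateCost(catalog map[string]Pricing, model string, inTok, outTok int) (float64, bool) {
	p, ok := catalog[model]
	if !ok {
		return 0, false
	}
	cost := float64(inTok)/1e6*p.InputPerMTok + float64(outTok)/1e6*p.OutputPerMTok
	return cost, true
}

// Cheapest returns the candidate with the lowest estimated cost, ignoring
// candidates that are missing from the catalog.
func Cheapest(catalog map[string]Pricing, candidates []string, inTok, outTok int) (string, bool) {
	best, bestCost, found := "", 0.0, false
	for _, m := range candidates {
		cost, ok := EstimateCost(catalog, m, inTok, outTok)
		if !ok {
			continue
		}
		if !found || cost < bestCost {
			best, bestCost, found = m, cost, true
		}
	}
	return best, found
}

func main() {
	// Illustrative prices only, not real catalog data.
	catalog := map[string]Pricing{
		"large-model": {InputPerMTok: 3.00, OutputPerMTok: 15.00},
		"small-model": {InputPerMTok: 0.25, OutputPerMTok: 1.25},
	}
	model, ok := Cheapest(catalog, []string{"large-model", "small-model", "unlisted"}, 2000, 500)
	fmt.Println(model, ok)
}
```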
