Changelog
Full release notes are also on GitHub Releases.
v1.0.0 - 2026-03-24 - Stable release
v1.0.0 is the first stable release of the Ferro Labs AI Gateway. Starting with this release, the project follows semantic versioning: breaking changes will only occur in major version bumps. The configuration format and OpenAI-compatible API are now part of the stable contract.
What's new in v1.0.0
- MCP streaming support: clients can send `stream: true` when MCP servers are configured. The gateway resolves all tool calls internally and returns the final answer as SSE. Added in v1.0.0-rc.1.
- 29 providers, 10 of them added since v0.6.5: Cerebras, NVIDIA NIM, Cloudflare Workers AI, Databricks, Novita AI, Qwen (Alibaba), Moonshot AI, SambaNova, DeepInfra, OpenRouter.
- 8 routing strategies: content-based and A/B test routing (from v0.8.5) are now stable.
- Per-key and per-user rate limiting: `key_rpm` and `user_rpm` fields in the rate-limit plugin (from v0.8.5).
- Budget plugin: per-key spend tracking and enforcement (from v0.8.5).
- Published benchmarks: sub-millisecond p99 overhead at 500 RPS, 100% success rate sustained. See benchmarks.
Stability guarantees
- The `config.yaml` schema is stable. Existing configs will continue to work across v1.x releases.
- The OpenAI-compatible API (`/v1/chat/completions`, `/v1/embeddings`, `/v1/models`, `/v1/images/generations`) wire format is stable.
- The admin API (`/admin/*`) is stable. New endpoints may be added but existing ones will not change in breaking ways.
- Prometheus metric names and labels are stable.
v0.8.5 - 2026-03-12
Content-based routing strategy
- New `strategy.mode: content-based` selects a provider target based on user-role prompt content
- Three condition types: `prompt_contains` (case-insensitive), `prompt_not_contains`, and `prompt_regex` (Go regexp)
- Rules are evaluated in declaration order and the first match wins; unmatched requests fall back to the first target
- Regex patterns are compiled once at startup, so no compilation happens on the request path; invalid patterns surface as a startup error
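The evaluation order described above can be sketched in a few lines of Go. This is an illustration only, not the gateway's code: the `Rule` type, `Compile`, and `Pick` are hypothetical names, the fallback target is passed in explicitly, and treating `prompt_not_contains` as case-insensitive is an assumption (the notes only state this for `prompt_contains`).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical in-memory form of one routing rule.
// Exactly one condition field is expected to be set per rule.
type Rule struct {
	PromptContains    string
	PromptNotContains string
	PromptRegex       string
	Target            string

	compiled *regexp.Regexp // filled in by Compile
}

// Compile compiles every regex once, so an invalid pattern fails at
// startup instead of on the request path.
func Compile(rules []Rule) error {
	for i := range rules {
		if rules[i].PromptRegex == "" {
			continue
		}
		re, err := regexp.Compile(rules[i].PromptRegex)
		if err != nil {
			return fmt.Errorf("rule %d: invalid prompt_regex: %w", i, err)
		}
		rules[i].compiled = re
	}
	return nil
}

// matches reports whether the rule's single condition holds for the prompt.
func (r Rule) matches(prompt string) bool {
	lower := strings.ToLower(prompt)
	switch {
	case r.PromptContains != "":
		return strings.Contains(lower, strings.ToLower(r.PromptContains))
	case r.PromptNotContains != "":
		// Assumption: case-insensitive, mirroring prompt_contains.
		return !strings.Contains(lower, strings.ToLower(r.PromptNotContains))
	case r.compiled != nil:
		return r.compiled.MatchString(prompt)
	}
	return false
}

// Pick returns the target of the first matching rule in declaration
// order, or fallback (the first target) when nothing matches.
func Pick(rules []Rule, prompt, fallback string) string {
	for _, r := range rules {
		if r.matches(prompt) {
			return r.Target
		}
	}
	return fallback
}

func main() {
	rules := []Rule{
		{PromptContains: "sql", Target: "code-model"},
		{PromptRegex: `^\s*translate\b`, Target: "translation-model"},
	}
	if err := Compile(rules); err != nil {
		panic(err) // startup error for an invalid pattern
	}
	fmt.Println(Pick(rules, "Write a SQL query", "default-model"))
	fmt.Println(Pick(rules, "translate this sentence", "default-model"))
	fmt.Println(Pick(rules, "hello there", "default-model"))
}
```

Because `Compile` runs before any request is served, a malformed pattern stops the process early, which matches the startup-error behaviour listed above.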
A/B testing strategy
- New `strategy.mode: ab-test` splits traffic across two or more named variants using weighted random sampling
- Each variant carries a `label` (e.g. `"control"`, `"challenger"`) emitted as the `ab_variant` structured log field on every routed request
- Variants with weight 0 are treated as weight 1, so leaving every weight at zero gives an equal split
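Weighted random sampling with the zero-weight rule can be sketched as follows. The `Variant` type and function names are hypothetical, and treating negative weights like zero is an assumption made only to keep the sketch safe.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Variant is a hypothetical A/B variant with a label and a weight.
type Variant struct {
	Label  string
	Weight int
}

// effectiveWeight applies the rule from the notes: a zero weight counts
// as 1. Negative weights are handled the same way (an assumption).
func effectiveWeight(v Variant) int {
	if v.Weight <= 0 {
		return 1
	}
	return v.Weight
}

// TotalWeight sums the effective weights of all variants.
func TotalWeight(variants []Variant) int {
	total := 0
	for _, v := range variants {
		total += effectiveWeight(v)
	}
	return total
}

// PickVariant maps a roll in [0, TotalWeight) to a variant by walking
// the cumulative weights in order.
func PickVariant(variants []Variant, roll int) Variant {
	for _, v := range variants {
		w := effectiveWeight(v)
		if roll < w {
			return v
		}
		roll -= w
	}
	return variants[len(variants)-1]
}

func main() {
	variants := []Variant{
		{Label: "control", Weight: 3},
		{Label: "challenger", Weight: 0}, // counted as weight 1
	}
	roll := rand.Intn(TotalWeight(variants))
	v := PickVariant(variants, roll)
	fmt.Printf("ab_variant=%s\n", v.Label)
}
```

Separating the roll from the selection keeps `PickVariant` deterministic, which makes the split easy to verify without touching the random source.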
Per-key and per-user rate limiting
- Extended `rate-limit` plugin with `key_rpm` (requests per minute per API key) and `user_rpm` (requests per minute per user ID)
- Rate checks execute in order: global → per-key → per-user; the request is rejected at the first exceeded limiter with a distinct reason string
Per-key budget controls plugin
- New `budget` plugin tracks cumulative USD spend per API key in an in-memory store
- Register at `before_request` to reject over-limit keys and `after_request` to record token costs
- Two instances sharing the same `store_id` share accumulated spend data
- `spend_limit_usd: 0` (or unset) means unlimited: spend is tracked without rejection
- Spend data is in-memory; does not survive process restarts
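The two-hook design and the shared store can be sketched like this. All names (`Budget`, `NewBudget`, `BeforeRequest`, `AfterRequest`) are hypothetical and do not reflect the plugin's real API; rejecting once spend reaches the limit (rather than only after exceeding it) is also an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// Store holds cumulative spend per API key in memory only.
type Store struct {
	mu    sync.Mutex
	spend map[string]float64
}

var (
	registryMu sync.Mutex
	registry   = map[string]*Store{} // one Store per store_id
)

// storeFor returns the Store for id, creating it on first use, so two
// Budget instances with the same id see the same spend data.
func storeFor(id string) *Store {
	registryMu.Lock()
	defer registryMu.Unlock()
	s, ok := registry[id]
	if !ok {
		s = &Store{spend: map[string]float64{}}
		registry[id] = s
	}
	return s
}

// Budget is a hypothetical plugin instance.
type Budget struct {
	store    *Store
	limitUSD float64 // 0 means unlimited: track but never reject
}

func NewBudget(storeID string, limitUSD float64) *Budget {
	return &Budget{store: storeFor(storeID), limitUSD: limitUSD}
}

// BeforeRequest rejects a key whose recorded spend has reached the limit.
func (b *Budget) BeforeRequest(apiKey string) error {
	if b.limitUSD == 0 {
		return nil
	}
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	if spent := b.store.spend[apiKey]; spent >= b.limitUSD {
		return fmt.Errorf("budget exceeded: spent $%.2f of $%.2f", spent, b.limitUSD)
	}
	return nil
}

// AfterRequest records the cost of a completed request.
func (b *Budget) AfterRequest(apiKey string, costUSD float64) {
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	b.store.spend[apiKey] += costUSD
}

// Spent returns the cumulative spend recorded for apiKey.
func (b *Budget) Spent(apiKey string) float64 {
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	return b.store.spend[apiKey]
}

func main() {
	enforcing := NewBudget("shared", 1.00)
	tracking := NewBudget("shared", 0) // unlimited, same store
	tracking.AfterRequest("key-a", 1.50)
	fmt.Println(enforcing.BeforeRequest("key-a")) // non-nil: over limit
	fmt.Println(tracking.BeforeRequest("key-a"))  // nil: unlimited
}
```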
v0.8.0 - 2026-03-10
MCP integration (Phase 1)
- Added `mcp_servers` configuration block for Model Context Protocol tool servers
- Gateway injects available tools into every chat completion request automatically
- Full agentic loop: gateway handles all `tool_calls` rounds internally, returns final text to client
- Background MCP initialisation on startup with 60-second timeout; gateway is ready immediately
- `MCPInitDone()` channel on `Gateway` struct for sync when needed
- Per-server `allowed_tools` whitelist for access control
- Per-server `max_call_depth` limit to prevent infinite loops
- Environment variable interpolation (`${VAR}`) in MCP server headers
- 29 new tests covering MCP lifecycle and agentic loop behaviour
- Bug fixes: nil-safe circuit breaker map init, empty config array handling, streaming fix for empty delta content
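To illustrate how an agentic loop, a tool whitelist, and a depth limit fit together, here is a minimal sketch. It does not use the gateway's types: `Model`, `Tool`, `Reply`, and `RunLoop` are hypothetical, the model is a plain function, and the exact meaning of `max_call_depth` (here: the number of tool-call rounds allowed) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is one tool invocation requested by the model.
type ToolCall struct {
	Name string
	Args string
}

// Reply is a model response: either final text or a set of tool calls.
type Reply struct {
	Text      string
	ToolCalls []ToolCall
}

// Model and Tool are stand-ins for a provider call and an MCP tool call.
type Model func(history []string) Reply
type Tool func(args string) string

var ErrMaxDepth = errors.New("max_call_depth exceeded")

// RunLoop keeps calling the model and executing the requested tools until
// the model answers with plain text. At most maxDepth rounds of tool calls
// are executed. Tools missing from the allowed map are not executed.
func RunLoop(model Model, allowed map[string]Tool, prompt string, maxDepth int) (string, error) {
	history := []string{"user: " + prompt}
	for depth := 0; depth <= maxDepth; depth++ {
		reply := model(history)
		if len(reply.ToolCalls) == 0 {
			return reply.Text, nil // final answer for the client
		}
		if depth == maxDepth {
			break // another round would exceed the limit
		}
		for _, call := range reply.ToolCalls {
			tool, ok := allowed[call.Name]
			if !ok {
				history = append(history, "tool "+call.Name+": not allowed")
				continue
			}
			history = append(history, "tool "+call.Name+": "+tool(call.Args))
		}
	}
	return "", ErrMaxDepth
}

func main() {
	// A fake model that requests one tool call, then answers.
	model := func(history []string) Reply {
		if len(history) == 1 {
			return Reply{ToolCalls: []ToolCall{{Name: "echo", Args: "42"}}}
		}
		return Reply{Text: "the tool said: " + history[len(history)-1]}
	}
	tools := map[string]Tool{"echo": func(args string) string { return args }}
	fmt.Println(RunLoop(model, tools, "what is the answer?", 3))
}
```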
v0.7.0 - 2026-03-08
- Comprehensive regression test suite (50+ end-to-end scenarios)
- Fixed: race condition in concurrent provider health checks
- Fixed: weight normalisation with single-target load balancer
- Fixed: `least-latency` cold-start selecting excluded targets
- Fixed: `cost-optimized` panic on missing catalog entry
- Fixed: admin API pagination off-by-one on last page
v0.6.6 - 2026-03-07
- Refactored `providers/core` subpackage; `providers_list.go` split for clarity
- All `Name*` constants re-exported from `providers` top-level package
- Dashboard XSS hardening (output encoding on all admin UI fields)
- Added CORS origin validation warning on startup for wildcard origins
- Removed 19 deprecated provider shim files
v0.6.5 - 2026-03-07 - 5 new providers
- xAI (Grok): `XAI_API_KEY`
- Azure AI Foundry: `AZURE_FOUNDRY_API_KEY` + `AZURE_FOUNDRY_ENDPOINT`
- Hugging Face: `HUGGING_FACE_API_KEY`
- Google Vertex AI: `VERTEX_AI_PROJECT_ID` (ADC)
- AWS Bedrock (static credentials): `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` option
- Provider subpackage refactor: unified factory pattern across all providers
- Total providers: 19 (at time of release); total models in catalog: 2,531
v0.6.1 - 2026-03-06
- CI GitHub Actions version bumps
- Go dependency refresh (net/http, crypto)
v0.6.0 - 2026-03-06 - 5 new guardrail plugins
- pii-redact: detect and redact PII entities before forwarding
- secret-scan: block requests containing credentials or high-entropy secrets
- prompt-shield: score and block prompt injection attempts
- schema-guard: validate model output against JSON Schema (after_request)
- regex-guard: block requests matching configurable regex patterns
- Total built-in plugins: 11 (5 new + 6 existing)
- All new plugins ship disabled (`enabled: false`) in `config.example.yaml`
v0.5.0 - 2026-03-03
- Streaming cost tracking: token usage counted during streamed responses
- Least-latency strategy: P50 rolling latency tracker, routes to fastest provider
- Cost-optimized strategy: model catalog cost estimation, routes to cheapest provider
- Per-target `retry_on_status` codes list (customise which HTTP status codes trigger retry)
- CLI overhaul using Cobra: `ferrogw-cli` with `admin`, `models`, and `keys` subcommands
v0.4.5 - 2026-02-28
- Built-in model catalog with 2,531 model entries (pricing, context window, capabilities)
- Cost calculator (`models.Calculate()`) used by `cost-optimized` strategy
- `/v1/models` response enriched with catalog metadata (context window, max tokens, cost)
- GitHub Actions catalog CI check: fails the build if catalog format is invalid