Changelog

Full release notes are also on GitHub Releases.

v1.0.0 - 2026-03-24 - Stable release

v1.0.0 is the first stable release of the Ferro Labs AI Gateway. Starting with this release, the project follows semantic versioning: breaking changes will only occur in major version bumps. The configuration format and OpenAI-compatible API are now part of the stable contract.

What's new in v1.0.0

  • MCP streaming support: clients can send stream: true when MCP servers are configured. The gateway resolves all tool calls internally and returns the final answer as SSE. Added in v1.0.0-rc.1.
  • 29 providers: 10 added since v0.6.5 (Cerebras, NVIDIA NIM, Cloudflare Workers AI, Databricks, Novita AI, Qwen (Alibaba), Moonshot AI, SambaNova, DeepInfra, OpenRouter).
  • 8 routing strategies: content-based and A/B test routing (from v0.8.5) are now stable.
  • Per-key and per-user rate limiting: key_rpm and user_rpm fields in the rate-limit plugin (from v0.8.5).
  • Budget plugin: per-key spend tracking and enforcement (from v0.8.5).
  • Published benchmarks: sub-millisecond p99 overhead at 500 RPS, with a sustained 100% success rate. See benchmarks.
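
As a rough client-side illustration of the MCP streaming item above, the Go sketch below builds a chat completion request with stream: true and reassembles the answer from SSE lines. It assumes the stream follows the OpenAI-style chunk format (data: {...} lines ending with data: [DONE]); the model name, the sample payload, and all helper names are hypothetical and not taken from the gateway.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// chunk holds only the fields of an OpenAI-style streaming chunk read here.
type chunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// requestBody marshals a single-turn request with streaming enabled.
func requestBody(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
		Stream:   true,
	})
}

// collect concatenates delta content from an SSE body until [DONE].
func collect(body string) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // blank separators and other SSE fields are ignored
		}
		data := strings.TrimPrefix(line, "data: ")
		if data == "[DONE]" {
			break
		}
		var c chunk
		if err := json.Unmarshal([]byte(data), &c); err != nil {
			return "", err
		}
		for _, ch := range c.Choices {
			out.WriteString(ch.Delta.Content)
		}
	}
	return out.String(), sc.Err()
}

// sampleStream is made-up SSE data in the assumed format.
const sampleStream = "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}\n\n" +
	"data: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}\n\n" +
	"data: [DONE]\n\n"

func main() {
	body, _ := requestBody("example-model", "hi")
	fmt.Println(string(body))
	text, err := collect(sampleStream)
	fmt.Println(text, err)
}
```

In a real client the body would be POSTed to /v1/chat/completions and the response read line by line; the parsing step stays the same.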

Stability guarantees

  • The config.yaml schema is stable. Existing configs will continue to work across v1.x releases.
  • The OpenAI-compatible API (/v1/chat/completions, /v1/embeddings, /v1/models, /v1/images/generations) wire format is stable.
  • The admin API (/admin/*) is stable. New endpoints may be added but existing ones will not change in breaking ways.
  • Prometheus metric names and labels are stable.

v0.8.5 - 2026-03-12

Content-based routing strategy

  • New strategy.mode: content-based selects a provider target based on user-role prompt content
  • Three condition types: prompt_contains (case-insensitive), prompt_not_contains, and prompt_regex (Go regexp)
  • Rules are evaluated in declaration order; the first match wins, and unmatched requests fall back to the first target
  • Regex patterns compiled at startup for zero-cost hot-path matching; invalid patterns surface as a startup error
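
The matching behaviour described above can be sketched in Go roughly as follows. This is an illustration, not the gateway's code: the Rule type, its field names, and the target names are hypothetical, and it assumes that multiple conditions on one rule must all hold and that prompt_not_contains is also case-insensitive (the notes only state this for prompt_contains).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical in-memory form of one content-based routing rule.
type Rule struct {
	Contains    string         // prompt_contains, matched case-insensitively
	NotContains string         // prompt_not_contains (case handling assumed)
	Regex       *regexp.Regexp // prompt_regex, compiled once at startup
	Target      string
}

// matches reports whether the prompt satisfies every condition set on the rule.
func (r Rule) matches(prompt string) bool {
	lower := strings.ToLower(prompt)
	if r.Contains != "" && !strings.Contains(lower, strings.ToLower(r.Contains)) {
		return false
	}
	if r.NotContains != "" && strings.Contains(lower, strings.ToLower(r.NotContains)) {
		return false
	}
	if r.Regex != nil && !r.Regex.MatchString(prompt) {
		return false
	}
	return true
}

// route returns the target of the first matching rule in declaration order.
// Callers pass the first configured target as the fallback.
func route(rules []Rule, fallback, prompt string) string {
	for _, r := range rules {
		if r.matches(prompt) {
			return r.Target
		}
	}
	return fallback
}

// demoRules builds example rules; MustCompile panics on an invalid pattern,
// which mirrors the "invalid patterns surface as a startup error" behaviour.
func demoRules() []Rule {
	return []Rule{
		{Contains: "translate", Target: "provider-a"},
		{Regex: regexp.MustCompile(`(?i)\bsql\b`), Target: "provider-b"},
	}
}

func main() {
	rules := demoRules()
	fmt.Println(route(rules, "provider-a", "Please TRANSLATE this"))
	fmt.Println(route(rules, "provider-a", "Write a SQL query"))
	fmt.Println(route(rules, "provider-a", "Hello"))
}
```

Compiling each pattern once and storing the *regexp.Regexp on the rule keeps the per-request path to a plain MatchString call.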

A/B testing strategy

  • New strategy.mode: ab-test splits traffic across two or more named variants using weighted random sampling
  • Each variant carries a label (e.g. "control", "challenger") emitted as the ab_variant structured log field on every routed request
  • Zero-weight variants participate with weight 1 (equal distribution)
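
Weighted random sampling over variants might look like the Go sketch below. The Variant type and function names are hypothetical, and the code reads the zero-weight rule as "each zero-weight variant is treated as weight 1", which is one plausible interpretation of the note above.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Variant is a hypothetical A/B variant: a label plus a relative weight.
type Variant struct {
	Label  string
	Weight int
}

// effectiveWeight treats a zero (or negative) weight as 1.
func effectiveWeight(v Variant) int {
	if v.Weight <= 0 {
		return 1
	}
	return v.Weight
}

// totalWeight sums the effective weights of all variants.
func totalWeight(variants []Variant) int {
	total := 0
	for _, v := range variants {
		total += effectiveWeight(v)
	}
	return total
}

// pick maps a roll in [0, totalWeight) onto the cumulative weight ranges.
// The variants slice must be non-empty.
func pick(variants []Variant, roll int) Variant {
	for _, v := range variants {
		w := effectiveWeight(v)
		if roll < w {
			return v
		}
		roll -= w
	}
	return variants[len(variants)-1]
}

func main() {
	variants := []Variant{{"control", 90}, {"challenger", 10}}
	v := pick(variants, rand.Intn(totalWeight(variants)))
	// The chosen label is what would be logged as ab_variant.
	fmt.Println("ab_variant =", v.Label)
}
```

Separating the random roll from the selection keeps pick deterministic, which makes the split easy to unit test.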

Per-key and per-user rate limiting

  • Extended rate-limit plugin with key_rpm (requests per minute per API key) and user_rpm (requests per minute per user ID)
  • Rate checks execute in order: global → per-key → per-user; a request is rejected at the first exceeded limiter with a distinct reason string
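
The check order can be illustrated with a small Go sketch. The counter type is a deliberately simplistic stand-in for a real per-minute limiter (window reset is omitted), and the type names and reason strings are hypothetical rather than the plugin's actual values.

```go
package main

import "fmt"

// counter is a simplified limiter: it counts requests per ID within one
// window and never resets, which is enough to show the check order.
type counter struct {
	limit int // allowed requests per window; 0 disables the limiter
	seen  map[string]int
}

func newCounter(limit int) *counter {
	return &counter{limit: limit, seen: map[string]int{}}
}

// allow records the request and reports whether it is within the limit.
func (c *counter) allow(id string) bool {
	if c.limit == 0 {
		return true
	}
	if c.seen[id] >= c.limit {
		return false
	}
	c.seen[id]++
	return true
}

type limits struct {
	global, perKey, perUser *counter
}

// check returns "" when the request is allowed, otherwise the reason of the
// first limiter that was exceeded: global, then per-key, then per-user.
func (l *limits) check(apiKey, userID string) string {
	if !l.global.allow("*") {
		return "global rate limit exceeded"
	}
	if !l.perKey.allow(apiKey) {
		return "key rate limit exceeded"
	}
	if !l.perUser.allow(userID) {
		return "user rate limit exceeded"
	}
	return ""
}

func main() {
	l := &limits{global: newCounter(100), perKey: newCounter(2), perUser: newCounter(1)}
	fmt.Printf("%q\n", l.check("key-1", "alice"))
	fmt.Printf("%q\n", l.check("key-2", "alice"))
	fmt.Printf("%q\n", l.check("key-1", "bob"))
	fmt.Printf("%q\n", l.check("key-1", "carol"))
}
```

Note that in this sketch a request rejected by a later limiter has already consumed a slot in the earlier ones; whether the plugin behaves the same way is not stated in these notes.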

Per-key budget controls plugin

  • New budget plugin tracks cumulative USD spend per API key in an in-memory store
  • Register at before_request to reject over-limit keys and after_request to record token costs
  • Two instances sharing the same store_id share accumulated spend data
  • spend_limit_usd: 0 (or unset) means unlimited; spend is still tracked, but requests are never rejected
  • Spend data is in-memory; does not survive process restarts
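
A minimal Go sketch of the two hook points and the shared store follows. All type and function names are hypothetical, and rejecting once spend reaches the limit (>=) is an assumption; the notes do not say whether the comparison is inclusive.

```go
package main

import (
	"fmt"
	"sync"
)

// spendStore is a hypothetical in-memory ledger of USD spend per API key.
type spendStore struct {
	mu    sync.Mutex
	spend map[string]float64
}

var (
	registryMu sync.Mutex
	registry   = map[string]*spendStore{} // store_id -> shared ledger
)

// storeFor returns the ledger for a store_id, creating it on first use,
// so two plugin instances with the same store_id see the same totals.
func storeFor(id string) *spendStore {
	registryMu.Lock()
	defer registryMu.Unlock()
	s, ok := registry[id]
	if !ok {
		s = &spendStore{spend: map[string]float64{}}
		registry[id] = s
	}
	return s
}

// Budget is one plugin instance bound to a store and a spend limit.
type Budget struct {
	store    *spendStore
	limitUSD float64 // 0 means unlimited: track only
}

func NewBudget(storeID string, limitUSD float64) *Budget {
	return &Budget{store: storeFor(storeID), limitUSD: limitUSD}
}

// BeforeRequest rejects a key whose recorded spend has reached the limit.
func (b *Budget) BeforeRequest(apiKey string) error {
	if b.limitUSD == 0 {
		return nil
	}
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	if spent := b.store.spend[apiKey]; spent >= b.limitUSD {
		return fmt.Errorf("budget exceeded: $%.2f of $%.2f", spent, b.limitUSD)
	}
	return nil
}

// AfterRequest adds the cost of a completed request to the key's total.
func (b *Budget) AfterRequest(apiKey string, costUSD float64) {
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	b.store.spend[apiKey] += costUSD
}

func main() {
	enforcing := NewBudget("shared", 1.00)
	tracking := NewBudget("shared", 0)
	enforcing.AfterRequest("key-1", 0.60)
	tracking.AfterRequest("key-1", 0.50)
	fmt.Println(enforcing.BeforeRequest("key-1")) // over the 1.00 limit
	fmt.Println(tracking.BeforeRequest("key-1"))  // unlimited instance
}
```

Because the map lives in process memory, restarting the program resets every total, matching the restart caveat above.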

v0.8.0 - 2026-03-10

MCP integration (Phase 1)

  • Added mcp_servers configuration block for Model Context Protocol tool servers
  • Gateway injects available tools into every chat completion request automatically
  • Full agentic loop: gateway handles all tool_calls rounds internally, returns final text to client
  • Background MCP initialisation on startup with 60-second timeout; gateway is ready immediately
  • MCPInitDone() channel on Gateway struct for sync when needed
  • Per-server allowed_tools whitelist for access control
  • Per-server max_call_depth limit to prevent infinite loops
  • Environment variable interpolation (${VAR}) in MCP server headers
  • 29 new tests covering MCP lifecycle and agentic loop behaviour
  • Bug fixes: nil-safe circuit breaker map init, empty config array handling, streaming fix for empty delta content
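
To make the agentic loop, allowed_tools, and max_call_depth items above more concrete, here is a heavily simplified Go sketch. Every type and function in it is hypothetical; in particular, returning an error when the depth limit is hit and feeding a "not permitted" message back to the model for disallowed tools are assumptions, since these notes do not describe that behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall and Reply are hypothetical stand-ins for model output.
type ToolCall struct{ Name, Args string }

type Reply struct {
	Text      string
	ToolCalls []ToolCall
}

type Model func(history []string) Reply
type Tool func(args string) string

var errDepth = errors.New("max_call_depth exceeded")

// runLoop calls the model, executes any requested tools, and repeats until
// the model answers with plain text or maxDepth tool rounds have been used.
func runLoop(model Model, tools map[string]Tool, allowed map[string]bool,
	maxDepth int, prompt string) (string, error) {
	history := []string{prompt}
	for depth := 0; depth <= maxDepth; depth++ {
		reply := model(history)
		if len(reply.ToolCalls) == 0 {
			return reply.Text, nil // final answer for the client
		}
		if depth == maxDepth {
			break // the model wants another round, but none are left
		}
		for _, c := range reply.ToolCalls {
			tool, ok := tools[c.Name]
			if !ok || !allowed[c.Name] {
				history = append(history, "tool "+c.Name+": not permitted")
				continue
			}
			history = append(history, "tool "+c.Name+": "+tool(c.Args))
		}
	}
	return "", errDepth
}

// modelNeeding fakes a model that asks for the "clock" tool until it has
// seen n tool results, then answers.
func modelNeeding(n int) Model {
	return func(history []string) Reply {
		if len(history)-1 >= n {
			return Reply{Text: fmt.Sprintf("done after %d tool rounds", n)}
		}
		return Reply{ToolCalls: []ToolCall{{Name: "clock"}}}
	}
}

func main() {
	tools := map[string]Tool{"clock": func(string) string { return "12:00" }}
	allowed := map[string]bool{"clock": true}
	fmt.Println(runLoop(modelNeeding(2), tools, allowed, 2, "what time is it?"))
	fmt.Println(runLoop(modelNeeding(3), tools, allowed, 2, "what time is it?"))
}
```

The client only ever sees the value returned by runLoop, which corresponds to the "returns final text to client" behaviour described above.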

v0.7.0 - 2026-03-08

  • Comprehensive regression test suite (50+ end-to-end scenarios)
  • Fixed: race condition in concurrent provider health checks
  • Fixed: weight normalisation with single-target load balancer
  • Fixed: least-latency cold-start selecting excluded targets
  • Fixed: cost-optimized panic on missing catalog entry
  • Fixed: admin API pagination off-by-one on last page

v0.6.6 - 2026-03-07

  • Refactored providers/core subpackage; providers_list.go split for clarity
  • All Name* constants re-exported from providers top-level package
  • Dashboard XSS hardening (output encoding on all admin UI fields)
  • Added CORS origin validation warning on startup for wildcard origins
  • Removed 19 deprecated provider shim files

v0.6.5 - 2026-03-07 - 5 new providers

  • xAI (Grok): XAI_API_KEY
  • Azure AI Foundry: AZURE_FOUNDRY_API_KEY + AZURE_FOUNDRY_ENDPOINT
  • Hugging Face: HUGGING_FACE_API_KEY
  • Google Vertex AI: VERTEX_AI_PROJECT_ID (ADC)
  • AWS Bedrock (static credentials): AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY option
  • Provider subpackage refactor: unified factory pattern across all providers
  • Total providers: 19 (at time of release); total models in catalog: 2,531

v0.6.1 - 2026-03-06

  • CI GitHub Actions version bumps
  • Go dependency refresh (net/http, crypto)

v0.6.0 - 2026-03-06 - 5 new guardrail plugins

  • pii-redact: detect and redact PII entities before forwarding
  • secret-scan: block requests containing credentials or high-entropy secrets
  • prompt-shield: score and block prompt injection attempts
  • schema-guard: validate model output against JSON Schema (after_request)
  • regex-guard: block requests matching configurable regex patterns
  • Total built-in plugins: 11 (5 new + 6 existing)
  • All new plugins ship disabled (enabled: false) in config.example.yaml

v0.5.0 - 2026-03-03

  • Streaming cost tracking: token usage is counted during streamed responses
  • Least-latency strategy: a rolling P50 latency tracker routes requests to the fastest provider
  • Cost-optimized strategy: model catalog cost estimation routes requests to the cheapest provider
  • Per-target retry_on_status codes list (customise which HTTP status codes trigger retry)
  • CLI overhaul using Cobra: ferrogw-cli with admin, models, and keys subcommands
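
The least-latency strategy listed above is described as a rolling P50 tracker. A minimal Go sketch of that idea follows, assuming a fixed-size window of recent samples per provider and a fallback to the first candidate when nothing has been measured yet; the names, the window size, and the cold-start rule are all hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// tracker keeps the most recent latency samples per provider.
type tracker struct {
	window  int
	samples map[string][]time.Duration
}

func newTracker(window int) *tracker {
	return &tracker{window: window, samples: map[string][]time.Duration{}}
}

// ms is a small helper for readable durations.
func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }

// record appends a sample and drops the oldest ones beyond the window.
func (t *tracker) record(provider string, d time.Duration) {
	s := append(t.samples[provider], d)
	if len(s) > t.window {
		s = s[len(s)-t.window:]
	}
	t.samples[provider] = s
}

// p50 returns the median of the current window, or false with no samples.
func (t *tracker) p50(provider string) (time.Duration, bool) {
	s := t.samples[provider]
	if len(s) == 0 {
		return 0, false
	}
	sorted := append([]time.Duration(nil), s...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted[len(sorted)/2], true
}

// fastest picks the candidate with the lowest P50; providers without
// samples are skipped, and the first candidate is used if none have data.
func (t *tracker) fastest(candidates []string) string {
	best, found := "", false
	var bestP50 time.Duration
	for _, c := range candidates {
		p, ok := t.p50(c)
		if !ok {
			continue
		}
		if !found || p < bestP50 {
			best, bestP50, found = c, p, true
		}
	}
	if !found && len(candidates) > 0 {
		return candidates[0]
	}
	return best
}

func main() {
	tr := newTracker(3)
	for _, n := range []int{100, 300, 200} {
		tr.record("provider-a", ms(n))
	}
	tr.record("provider-b", ms(150))
	fmt.Println(tr.fastest([]string{"provider-a", "provider-b"}))
}
```

Using the median rather than the mean keeps a single slow outlier from pushing an otherwise fast provider out of rotation.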

v0.4.5 - 2026-02-28

  • Built-in model catalog with 2,531 model entries (pricing, context window, capabilities)
  • Cost calculator (models.Calculate()) used by cost-optimized strategy
  • /v1/models response enriched with catalog metadata (context window, max tokens, cost)
  • GitHub Actions catalog CI check: fails the build if the catalog format is invalid
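
As an illustration of how catalog pricing could drive a cost estimate and a cheapest-provider choice, consider the Go sketch below. It does not reproduce models.Calculate(), whose signature is not given in these notes; the Pricing type, the per-million-token unit, the prices, and the function names are assumptions made for the example.

```go
package main

import "fmt"

// Pricing is a hypothetical catalog entry: USD per one million tokens.
type Pricing struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

// cost estimates the USD cost of one request from its token counts.
func cost(p Pricing, inTok, outTok int) float64 {
	return (float64(inTok)*p.InputPerMTok + float64(outTok)*p.OutputPerMTok) / 1e6
}

// cheapest returns the candidate model with the lowest estimated cost.
// Models missing from the catalog are skipped instead of causing a failure.
func cheapest(catalog map[string]Pricing, candidates []string, inTok, outTok int) (string, bool) {
	best, bestCost, found := "", 0.0, false
	for _, m := range candidates {
		p, ok := catalog[m]
		if !ok {
			continue
		}
		if c := cost(p, inTok, outTok); !found || c < bestCost {
			best, bestCost, found = m, c, true
		}
	}
	return best, found
}

func main() {
	// Made-up prices for illustration only.
	catalog := map[string]Pricing{
		"big-model":   {InputPerMTok: 3, OutputPerMTok: 15},
		"small-model": {InputPerMTok: 0.5, OutputPerMTok: 1.5},
	}
	fmt.Printf("%.4f\n", cost(catalog["big-model"], 1000, 500))
	fmt.Println(cheapest(catalog, []string{"big-model", "unknown", "small-model"}, 1000, 500))
}
```

Returning a found flag lets the caller decide what to do when no candidate has a catalog entry, rather than guessing a price.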