
Ferro Labs AI Gateway

You're calling OpenAI directly. When they go down, your product goes down. When they raise prices, you scramble. When you need to test a new model, you rewrite integration code. When you want observability, you build it yourself.

Ferro Labs AI Gateway is a single Go binary that sits in front of all your LLM traffic. It exposes an OpenAI-compatible API, routes requests across 29 providers and 2,500+ models, enforces safety policies with 11 plugins, and emits production observability, all without changing your existing client code.

Ferro Labs Managed: Managed AI Gateway

Ferro Labs Managed wraps the open-source gateway with multi-tenancy, a dashboard, durable billing, semantic caching, and 5 enterprise security plugins. Join the early access waitlist →

What is it?

Drop the gateway in front of your LLM traffic. Set base_url to the gateway endpoint. That's it: your OpenAI SDK, LangChain, LlamaIndex, or curl commands continue to work unchanged.

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

docker run -d -p 8080:8080 \
-e OPENAI_API_KEY \
-e ANTHROPIC_API_KEY \
ghcr.io/ferro-labs/ai-gateway:latest

Then send requests to http://localhost:8080/v1/chat/completions exactly as you would to OpenAI.
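As a concrete sketch, here is a stdlib-only Python client that speaks the OpenAI chat-completions wire format to the gateway. The model name is an assumption; substitute whichever model your providers expose.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8080"  # assumption: gateway running as started above


def build_request(base_url: str, model: str, messages: list,
                  api_key: str = "unused") -> urllib.request.Request:
    """Build an OpenAI-format chat completion request aimed at the gateway."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Provider keys live on the gateway side, so this can be a placeholder.
            "Authorization": f"Bearer {api_key}",
        },
    )


def chat(base_url: str, model: str, messages: list) -> dict:
    """Send the request and decode the OpenAI-format JSON response."""
    with urllib.request.urlopen(build_request(base_url, model, messages)) as resp:
        return json.load(resp)


# Example (requires a running gateway):
# reply = chat(GATEWAY, "gpt-4o-mini", [{"role": "user", "content": "Hello"}])
# print(reply["choices"][0]["message"]["content"])
```

The same request shape works through the official OpenAI SDK by pointing its base_url at the gateway.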

Key capabilities

  • 29 AI providers: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, DeepSeek, Together, Perplexity, Fireworks, AI21, Azure OpenAI, Azure Foundry, xAI, Ollama, Replicate, AWS Bedrock, Vertex AI, Hugging Face, Cerebras, NVIDIA NIM, Cloudflare Workers AI, Databricks, Novita AI, Qwen, Moonshot AI, SambaNova, DeepInfra, OpenRouter
  • 8 routing strategies: Single, Fallback, Weighted, Conditional, Least-Latency, Cost-Optimized, Content-Based, A/B Test
  • 6 OSS + 5 Ferro Labs Managed plugins: Word filter, max-token, response cache, request logger, rate limit, budget (OSS), plus PII redact, secret scan, prompt shield, schema guard, regex guard (Ferro Labs Managed)
  • MCP integration: Agentic tool-calling loop via Model Context Protocol servers with streaming support (v1.0.0)
  • Observability: Prometheus metrics, structured JSON logs with trace IDs, deep /health per provider
  • Resiliency: Per-target circuit breakers, retry with exponential backoff, per-status-code retry config
  • OpenAI compatible: Chat completions, embeddings, images, and model listing (same wire format)
  • Built in Go: Single binary, zero runtime dependencies, sub-millisecond p99 overhead at 500 RPS. See benchmarks →
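Of the routing strategies, Fallback is the simplest to picture: try targets in order and move to the next when one fails. The gateway's actual implementation is in Go and not shown here; this Python sketch with illustrative names is only meant to convey the control flow.

```python
def route_with_fallback(targets, send):
    """Try each target in order; return the first successful response.

    `targets` is an ordered list of provider identifiers and `send` is a
    callable that raises on failure -- stand-ins for the gateway's real types.
    """
    errors = []
    for target in targets:
        try:
            return send(target)
        except Exception as exc:  # a real gateway matches specific error classes
            errors.append((target, exc))
    raise RuntimeError(f"all targets failed: {errors}")


# Example with fake providers: the first "provider" always fails,
# so the request falls through to the second.
def fake_send(target):
    if target == "openai":
        raise ConnectionError("upstream down")
    return {"served_by": target}


print(route_with_fallback(["openai", "anthropic"], fake_send))  # {'served_by': 'anthropic'}
```

The other strategies vary only in how the candidate order is chosen (weights, latency measurements, cost tables, request content); the circuit breakers and backoff retries from the resiliency row wrap this same loop.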
Why Go?

Go's goroutine scheduler, low-GC overhead, and lack of a GIL make it ideal for a latency-sensitive proxy. The gateway ships as a single static binary under 20MB with zero runtime dependencies. Published benchmarks show sub-millisecond p99 overhead at 500 RPS, while Python-based gateways degrade beyond 200 RPS.

Docs map

Getting started

Guides

Performance

  • Benchmarks: Published Go vs Python gateway performance data

Integrations

Deployment

  • Railway: One-click Railway deploy (SQLite or PostgreSQL)
  • Render: One-click Render deploy with managed PostgreSQL
  • Docker Compose: Production-like deployment
  • Kubernetes: Helm chart and manifests
  • Fly.io: Deploy to Fly.io

Operations & reference

Ferro Labs Managed