# Ferro Labs AI Gateway
You're calling OpenAI directly. When they go down, your product goes down. When they raise prices, you scramble. When you need to test a new model, you rewrite integration code. When you want observability, you build it yourself.
Ferro Labs AI Gateway is a single Go binary that sits in front of all your LLM traffic. It exposes an OpenAI-compatible API, routes requests across 29 providers and 2,500+ models, enforces safety policies with 11 plugins, and emits production observability – all without changing your existing client code.
Ferro Labs Managed wraps the open-source gateway with multi-tenancy, a dashboard, durable billing, semantic caching, and 5 enterprise security plugins. Join the early access waitlist →
## What is it?
Drop the gateway in front of your LLM traffic. Set `base_url` to the gateway endpoint. That's it: your OpenAI SDK, LangChain, LlamaIndex, or curl commands continue to work unchanged.
```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

docker run -d -p 8080:8080 \
  -e OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY \
  ghcr.io/ferro-labs/ai-gateway:latest
```
Then send requests to `http://localhost:8080/v1/chat/completions` exactly as you would to OpenAI.
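A minimal sketch in Python, assuming the gateway from the `docker run` above is listening on `localhost:8080`; the model name `gpt-4o-mini` is an illustrative choice and can be any model a configured provider serves:

```python
import json

# The gateway speaks the OpenAI wire format, so the request body is exactly
# what you would send to api.openai.com; only the URL changes.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "gpt-4o-mini",  # any model exposed by your configured providers
    "messages": [{"role": "user", "content": "Say hello."}],
}
body = json.dumps(payload).encode()

# With the gateway running, send it with any HTTP client, or point an
# OpenAI SDK at base_url="http://localhost:8080/v1" and call it as usual:
#
#   import urllib.request
#   req = urllib.request.Request(
#       GATEWAY_URL, data=body, headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())

print(body.decode())
```

Because the wire format is unchanged, swapping the gateway in or out is a one-line `base_url` change in existing client code.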
## Key capabilities
| Capability | Details |
|---|---|
| 29 AI providers | OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, DeepSeek, Together, Perplexity, Fireworks, AI21, Azure OpenAI, Azure Foundry, xAI, Ollama, Replicate, AWS Bedrock, Vertex AI, Hugging Face, Cerebras, NVIDIA NIM, Cloudflare Workers AI, Databricks, Novita AI, Qwen, Moonshot AI, SambaNova, DeepInfra, OpenRouter |
| 8 routing strategies | Single, Fallback, Weighted, Conditional, Least-Latency, Cost-Optimized, Content-Based, A/B Test |
| 6 OSS + 5 Ferro Labs Managed plugins | Word filter, max-token, response cache, request logger, rate limit, budget (OSS) – plus PII redact, secret scan, prompt shield, schema guard, regex guard (Ferro Labs Managed) |
| MCP integration | Agentic tool-calling loop via Model Context Protocol servers with streaming support (v1.0.0) |
| Observability | Prometheus metrics, structured JSON logs with trace IDs, deep /health per provider |
| Resiliency | Per-target circuit breakers, retry with exponential backoff, per-status-code retry config |
| OpenAI compatible | Chat completions, embeddings, images, and model listing – same wire format |
| Built in Go | Single binary, zero runtime dependencies, sub-millisecond p99 overhead at 500 RPS. See benchmarks → |
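To give a flavor of how routing strategies and plugins come together, here is a hypothetical YAML sketch of a fallback policy guarded by two of the OSS plugins. The key names and file layout are assumptions for illustration, not the authoritative schema; see the Configuration and Plugins pages for the real reference.

```yaml
# Illustrative config sketch only; consult the config reference for exact keys.
routing:
  strategy: fallback            # try targets in order until one succeeds
  targets:
    - provider: openai
      model: gpt-4o-mini
    - provider: anthropic
      model: claude-3-5-haiku   # used when the OpenAI target fails

plugins:
  - name: rate_limit            # OSS plugin: cap request volume
    requests_per_minute: 600
  - name: response_cache        # OSS plugin: serve repeated prompts from cache
    ttl: 300s
```

The fallback strategy shown here pairs naturally with the per-target circuit breakers from the table: when a target's breaker opens, traffic flows to the next target in the list.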
Go's goroutine scheduler, low GC overhead, and lack of a GIL make it ideal for a latency-sensitive proxy. The gateway ships as a single static binary under 20 MB with zero runtime dependencies. Published benchmarks show sub-millisecond p99 overhead at 500 RPS, while Python-based gateways degrade beyond 200 RPS.
## Docs map
### Getting started
- Overview – When and why to use the gateway
- Architecture – Component diagrams and data flow
- Request lifecycle – Step-by-step request flow
- Quickstart – Docker, build from source, first request
- Concepts – Core ideas: routing, plugins, observability, MCP
- Configuration – Full config reference
### Guides
- Providers – All 29 providers and supported capabilities
- Provider configuration – Environment variables per provider
- Authentication – API key configuration
- Routing policies – All 8 routing strategies with examples
- Plugins – All 11 plugins with YAML config
- MCP integration – Model Context Protocol tool servers
- Observability – Metrics, logs, health checks
- Rate limiting – IP-level and request-level limiting
- Admin auth – Admin API scopes and tokens
- Use cases – Recipe-style configurations for common scenarios
- Why Ferro Labs – Comparison with LiteLLM, Portkey, and other gateways
### Performance
- Benchmarks – Published Go vs Python gateway performance data
### Integrations
- Integrations overview – SDKs, frameworks, deployment, and providers
- Python SDK quickstart – Install `ferrolabsai` and send your first request
- Python SDK reference – Full API reference
- Go SDK – Embed the gateway, write custom plugins
- OpenAI-compatible SDKs – Use any OpenAI SDK with zero changes
### Deployment
- Railway – One-click Railway deploy (SQLite or PostgreSQL)
- Render – One-click Render deploy with managed PostgreSQL
- Docker Compose – Production-like deployment
- Kubernetes – Helm chart and manifests
- Fly.io – Deploy to Fly.io
### Operations & reference
- Monitoring – Prometheus queries, alerting, dashboards
- Request logging – Persistent log backends
- Server settings – All environment variables
- API reference – Endpoints, request format, admin API
- Security – Data handling and least-privilege configuration
- Troubleshooting – Common issues and fixes
- FAQ – Common questions
### Ferro Labs Managed
- Ferro Labs Managed overview – Managed multi-tenant AI gateway
- Semantic caching – pgvector-based semantic response cache
- OSS vs Ferro Labs Managed – Feature comparison