# Ferro Labs AI Gateway
Ferro Labs AI Gateway is an open-source, high-performance control plane for AI traffic. It exposes a single OpenAI-compatible API that routes requests across 19 providers and 2,500+ models, enforces safety policies with 11 built-in plugins, and emits full production observability — all without changing your existing client code.
Ready to jump in? Start with the Quickstart.
## What is it?

Drop the gateway in front of your LLM traffic and set `base_url` to the gateway endpoint. That's it — your OpenAI SDK, LangChain, LlamaIndex, or curl commands continue to work unchanged.
```shell
docker run -d -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/ferro-labs/ai-gateway:latest
```
Then send requests to `http://localhost:8080/v1/chat/completions` exactly as you would to OpenAI.
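Because the gateway speaks the OpenAI wire format, any plain HTTP client works. The sketch below builds such a request with the Python standard library; the URL assumes the default local Docker deployment above, and the model name is only an example.

```python
import json
import urllib.request

# Standard OpenAI-style chat completion payload; the gateway accepts the
# same wire format, so no gateway-specific fields are needed.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Say hello"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the gateway running, urllib.request.urlopen(req) returns the usual
# OpenAI-style JSON response (choices, usage, ...).
```

Existing OpenAI SDK clients need only their base URL pointed at the gateway; no other code changes are required.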
## Key capabilities
| Capability | Details |
|---|---|
| 19 AI providers | OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, DeepSeek, Together, Perplexity, Fireworks, AI21, Azure OpenAI, Azure Foundry, xAI, Ollama, Replicate, AWS Bedrock, Vertex AI, Hugging Face |
| 6 routing strategies | Single, Fallback, Weighted, Conditional, Least-Latency, Cost-Optimized |
| 11 built-in plugins | Word filter, max-token, response cache, request logger, rate limit, PII redact, secret scan, prompt shield, schema guard, regex guard |
| MCP integration | Agentic tool-calling loop via Model Context Protocol servers (v0.8.0+) |
| Observability | Prometheus metrics, structured JSON logs with trace IDs, deep /health per provider |
| Resiliency | Per-target circuit breakers, retry with exponential backoff, per-status-code retry config |
| OpenAI compatible | Chat completions, embeddings, images, and model listing — same wire format |
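To give a feel for how routing strategies and plugins from the table above fit together, here is an illustrative YAML sketch of a fallback route. The field names are hypothetical — see the Configuration reference for the real schema.

```yaml
# Illustrative only: field names below are hypothetical placeholders,
# not the gateway's actual configuration keys.
routes:
  - name: chat-default
    strategy: fallback        # try targets in order until one succeeds
    targets:
      - provider: openai
        model: gpt-4o
      - provider: anthropic
        model: claude-3-5-sonnet
    plugins:
      - type: rate-limit      # one of the built-in plugins
        requests_per_minute: 60
```

With a fallback route like this, a failure or open circuit breaker on the first target automatically shifts traffic to the next one.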
## Docs map

### Getting started
- Overview — When and why to use the gateway
- Architecture — Component diagrams and data flow
- Request lifecycle — Step-by-step request flow
- Quickstart — Docker, build from source, first request
- Concepts — Core ideas: routing, plugins, observability, MCP
- Configuration — Full config reference
### Guides
- Providers — Provider table and supported capabilities
- Provider configuration — Environment variables per provider
- Authentication — API key configuration
- Routing policies — All 6 routing strategies with examples
- Plugins — All 11 plugins with YAML config
- MCP integration — Model Context Protocol tool servers
- Observability — Metrics, logs, health checks
- Rate limiting — IP-level and request-level limiting
- Admin auth — Admin API scopes and tokens
### Operations & reference
- Monitoring — Prometheus queries, alerting, dashboards
- Request logging — Persistent log backends
- Server settings — All environment variables
- API reference — Endpoints, request format, admin API
- Security — Data handling and least-privilege configuration
- FAQ — Common questions