# Ferro Labs AI Gateway
Ferro Labs AI Gateway is an open-source, high-performance control plane for AI traffic. It exposes a single OpenAI-compatible API that routes requests across 19 providers and 2,500+ models, enforces safety policies with 11 built-in plugins, and emits full production observability — all without changing your existing client code.
Ready to jump in? Start with the Quickstart.
## What is it?

Drop the gateway in front of your LLM traffic and set `base_url` to the gateway endpoint. That's it — your OpenAI SDK, LangChain, LlamaIndex, or curl commands continue to work unchanged.
```shell
docker run -d -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/ferro-labs/ai-gateway:latest
```
Then send requests to `http://localhost:8080/v1/chat/completions` exactly as you would to OpenAI.
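Because the gateway speaks the OpenAI wire format, any plain HTTP client works. The sketch below builds such a request with the Python standard library; the URL assumes the default local Docker deployment above, and the model name is only an example.

```python
import json
import urllib.request

# Standard OpenAI-style chat completion payload; the gateway accepts the
# same wire format, so no gateway-specific fields are needed.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Say hello"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the gateway running, urllib.request.urlopen(req) returns the usual
# OpenAI-style JSON response (choices, usage, ...).
```

Existing OpenAI SDK clients need only their base URL pointed at the gateway; no other code changes are required.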
## Key capabilities
| Capability | Details |
|---|---|
| 19 AI providers | OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, DeepSeek, Together, Perplexity, Fireworks, AI21, Azure OpenAI, Azure Foundry, xAI, Ollama, Replicate, AWS Bedrock, Vertex AI, Hugging Face |
| 6 routing strategies | Single, Fallback, Weighted, Conditional, Least-Latency, Cost-Optimized |
| 11 built-in plugins | Word filter, max-token, response cache, request logger, rate limit, PII redact, secret scan, prompt shield, schema guard, regex guard |
| MCP integration | Agentic tool-calling loop via Model Context Protocol servers (v0.8.0+) |
| Observability | Prometheus metrics, structured JSON logs with trace IDs, deep /health per provider |
| Resiliency | Per-target circuit breakers, retry with exponential backoff, per-status-code retry config |
| OpenAI compatible | Chat completions, embeddings, images, and model listing — same wire format |
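To give a feel for how routing strategies and plugins from the table above fit together, here is an illustrative YAML sketch of a fallback route. The field names are hypothetical — see the Configuration reference for the real schema.

```yaml
# Illustrative only: field names below are hypothetical placeholders,
# not the gateway's actual configuration keys.
routes:
  - name: chat-default
    strategy: fallback        # try targets in order until one succeeds
    targets:
      - provider: openai
        model: gpt-4o
      - provider: anthropic
        model: claude-3-5-sonnet
    plugins:
      - type: rate-limit      # one of the built-in plugins
        requests_per_minute: 60
```

With a fallback route like this, a failure or open circuit breaker on the first target automatically shifts traffic to the next one.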
## Docs map

### Getting started
- Overview — When and why to use the gateway
- Architecture — Component diagrams and data flow
- Request lifecycle — Step-by-step request flow
- Quickstart — Docker, build from source, first request
- Concepts — Core ideas: routing, plugins, observability, MCP
- Configuration — Full config reference
### Guides
- Providers — Provider table and supported capabilities
- Provider configuration — Environment variables per provider
- Authentication — API key configuration
- Routing policies — All 6 routing strategies with examples
- Plugins — All 11 plugins with YAML config
- MCP integration — Model Context Protocol tool servers
- Observability — Metrics, logs, health checks
- Rate limiting — IP-level and request-level limiting
- Admin auth — Admin API scopes and tokens
### Operations & reference
- Monitoring — Prometheus queries, alerting, dashboards
- Request logging — Persistent log backends
- Server settings — All environment variables
- API reference — Endpoints, request format, admin API
- Security — Data handling and least-privilege configuration
- FAQ — Common questions