
Ferro Labs AI Gateway

You're calling OpenAI directly. When they go down, your product goes down. When they raise prices, you scramble. When you need to test a new model, you rewrite integration code. When you want observability, you build it yourself.

Ferro Labs AI Gateway is a single Go binary that sits in front of all your LLM traffic. It exposes an OpenAI-compatible API, routes requests across 29 providers and 2,500+ models, enforces safety policies with 11 plugins, and emits production observability, all without changing your existing client code.

Ferro Labs Managed: Managed AI Gateway

Ferro Labs Managed wraps the open-source gateway with multi-tenancy, a dashboard, durable billing, semantic caching, and 5 enterprise security plugins. Join the early access waitlist →

What is it?

Drop the gateway in front of your LLM traffic. Set base_url to the gateway endpoint. That's it: your OpenAI SDK, LangChain, LlamaIndex, or curl commands continue to work unchanged.

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

docker run -d -p 8080:8080 \
-e OPENAI_API_KEY \
-e ANTHROPIC_API_KEY \
ghcr.io/ferro-labs/ai-gateway:latest

Then send requests to http://localhost:8080/v1/chat/completions exactly as you would to OpenAI.
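As a concrete sketch, here is a stdlib-only Python client that speaks the OpenAI chat-completions wire format to the gateway. The model name is an assumption; substitute whichever model your providers expose.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8080"  # assumption: gateway running as started above


def build_request(base_url: str, model: str, messages: list,
                  api_key: str = "unused") -> urllib.request.Request:
    """Build an OpenAI-format chat completion request aimed at the gateway."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Provider keys live on the gateway side, so this can be a placeholder.
            "Authorization": f"Bearer {api_key}",
        },
    )


def chat(base_url: str, model: str, messages: list) -> dict:
    """Send the request and decode the OpenAI-format JSON response."""
    with urllib.request.urlopen(build_request(base_url, model, messages)) as resp:
        return json.load(resp)


# Example (requires a running gateway):
# reply = chat(GATEWAY, "gpt-4o-mini", [{"role": "user", "content": "Hello"}])
# print(reply["choices"][0]["message"]["content"])
```

The same request shape works through the official OpenAI SDK by pointing its base_url at the gateway.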

Key capabilities

  • 29 AI providers: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, DeepSeek, Together, Perplexity, Fireworks, AI21, Azure OpenAI, Azure Foundry, xAI, Ollama, Replicate, AWS Bedrock, Vertex AI, Hugging Face, Cerebras, NVIDIA NIM, Cloudflare Workers AI, Databricks, Novita AI, Qwen, Moonshot AI, SambaNova, DeepInfra, OpenRouter
  • 8 routing strategies: Single, Fallback, Weighted, Conditional, Least-Latency, Cost-Optimized, Content-Based, A/B Test
  • 6 OSS + 5 Ferro Labs Managed plugins: Word filter, max-token, response cache, request logger, rate limit, budget (OSS), plus PII redact, secret scan, prompt shield, schema guard, regex guard (Ferro Labs Managed)
  • MCP integration: Agentic tool-calling loop via Model Context Protocol servers with streaming support (v1.0.0)
  • Observability: Prometheus metrics, structured JSON logs with trace IDs, deep /health per provider
  • Resiliency: Per-target circuit breakers, retry with exponential backoff, per-status-code retry config
  • OpenAI compatible: Chat completions, embeddings, images, and model listing (same wire format)
  • Built in Go: Single binary, zero runtime dependencies, sub-millisecond p99 overhead at 500 RPS. See benchmarks →
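Of the routing strategies, Fallback is the simplest to picture: try targets in order and move to the next when one fails. The gateway's actual implementation is in Go and not shown here; this Python sketch with illustrative names is only meant to convey the control flow.

```python
def route_with_fallback(targets, send):
    """Try each target in order; return the first successful response.

    `targets` is an ordered list of provider identifiers and `send` is a
    callable that raises on failure -- stand-ins for the gateway's real types.
    """
    errors = []
    for target in targets:
        try:
            return send(target)
        except Exception as exc:  # a real gateway matches specific error classes
            errors.append((target, exc))
    raise RuntimeError(f"all targets failed: {errors}")


# Example with fake providers: the first "provider" always fails,
# so the request falls through to the second.
def fake_send(target):
    if target == "openai":
        raise ConnectionError("upstream down")
    return {"served_by": target}


print(route_with_fallback(["openai", "anthropic"], fake_send))  # {'served_by': 'anthropic'}
```

The other strategies vary only in how the candidate order is chosen (weights, latency measurements, cost tables, request content); the circuit breakers and backoff retries from the resiliency row wrap this same loop.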
Why Go?

Go's goroutine scheduler, low-GC overhead, and lack of a GIL make it ideal for a latency-sensitive proxy. The gateway ships as a single static binary under 20MB with zero runtime dependencies. Published benchmarks show sub-millisecond p99 overhead at 500 RPS, while Python-based gateways degrade beyond 200 RPS.

Docs map

Getting started

Guides

Performance

  • Benchmarks: Published Go vs Python gateway performance data

Integrations

Deployment

  • Railway: One-click Railway deploy (SQLite or PostgreSQL)
  • Render: One-click Render deploy with managed PostgreSQL
  • Docker Compose: Production-like deployment
  • Kubernetes: Helm chart and manifests
  • Fly.io: Deploy to Fly.io

Operations & reference

Ferro Labs Managed