A high-performance, OpenAI-compatible proxy for 29 providers and 2,500+ models, built in Go for sub-millisecond overhead. 8 routing strategies, 11 safety plugins, circuit breakers, and first-class MCP integration. Self-hosted and production-ready.
Built in Go for low latency and high concurrency. <1ms p99 overhead at 500 RPS in published benchmarks. Designed to be deployed in front of your LLM traffic without changing your existing client code.
One API for OpenAI, Anthropic, Gemini, Mistral, Groq, xAI, Azure, AWS Bedrock, Vertex AI, Cerebras, DeepSeek, and 18 more, all from a single endpoint.
View all 29 providers →
Eight strategies: single, fallback, weighted load balancing, conditional, least-latency, cost-optimized, content-based, and A/B testing.
Routing strategies →
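As a rough illustration of what the weighted strategy does, here is a minimal Python sketch; the provider names and weights are hypothetical, and the gateway's real implementation is in Go, so this shows the idea only:

import random

# Illustrative only: a weighted load balancer picks a backend in
# proportion to its configured weight. Names and weights are made up.
backends = [("openai", 0.6), ("anthropic", 0.3), ("groq", 0.1)]

def pick_backend(pool):
    names, weights = zip(*pool)
    return random.choices(names, weights=weights, k=1)[0]

print(pick_backend(backends))  # "openai" roughly 60% of the time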
6 OSS plugins (word filter, rate limit, budget, cache, logger, max-token) plus 5 Ferro Labs Managed enterprise plugins (PII redact, secret scan, prompt shield, schema guard, regex guard).
Plugin catalogue →
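A plugin that rejects a request surfaces to the client as an ordinary API error. Assuming the rate-limit plugin rejects with a standard HTTP 429, which the OpenAI SDK raises as RateLimitError, a minimal retry sketch looks like:

import time
import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="any")

# Retry with exponential backoff if the rate-limit plugin says no.
for attempt in range(3):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}],
        )
        break
    except openai.RateLimitError:
        time.sleep(2 ** attempt)  # back off, then try again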
First-class Model Context Protocol support with streaming. Connect filesystem, database, and custom tool servers to any model via an agentic loop.
MCP guide →
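To sketch the shape of that agentic loop with the standard OpenAI SDK: the model requests a tool call, the client runs it and feeds the result back, and the loop repeats until the model answers in plain text. The read_file function and its schema below are stand-ins, not part of the gateway; see the MCP guide for how MCP servers are actually wired in.

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="any")

def read_file(path):  # stand-in for an MCP filesystem tool
    with open(path) as f:
        return f.read()

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a local file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

messages = [{"role": "user", "content": "What is in notes.txt?"}]
while True:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:  # no more tool requests: final answer
        print(msg.content)
        break
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": read_file(**args),
        })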
Prometheus metrics at /metrics, structured JSON logs with per-request trace IDs, and deep per-provider health checks at /health.
Observability docs →
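A quick way to wire those two endpoints into a probe or dashboard; the endpoints are as documented above, but the exact response bodies are an assumption here:

import requests

# Deep per-provider health check; see the observability docs
# for the exact status semantics.
health = requests.get("http://localhost:8080/health", timeout=5)
print(health.status_code, health.text)

# Prometheus exposition format: one "name{labels} value" line per sample.
metrics = requests.get("http://localhost:8080/metrics", timeout=5)
for line in metrics.text.splitlines()[:10]:
    print(line)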
Set base_url to the gateway. Your OpenAI SDK, LangChain, LlamaIndex, or custom client works unchanged. No code modifications needed.
Quickstart →
<1ms p99 overhead at 500 RPS. 100% success rate sustained. ~120MB memory. Go-native with zero runtime dependencies.
See benchmarks →
Credentials are registered via environment variables. Enable any provider in seconds: no code changes, no rebuilds.
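For example, a deploy script might export the relevant keys before launching the gateway. The variable names and binary path below are placeholders, not the documented ones; check each provider's setup page for the exact keys:

import os
import subprocess

env = {
    **os.environ,
    # Placeholder key names; the provider docs list the exact ones.
    "OPENAI_API_KEY": "sk-...",
    "ANTHROPIC_API_KEY": "sk-ant-...",
}
subprocess.run(["./gateway"], env=env)  # "./gateway" is a placeholder path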
Ferro Labs Managed wraps the open-source gateway engine with everything teams need to ship AI products.
Point any OpenAI-compatible client at http://localhost:8080 and the gateway handles provider credentials, routing, retries, and observability for you.
Use the same model name to switch providers transparently, or use conditional routing rules to send different models to different backends.
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="any",  # gateway manages provider creds
)
# Route to Anthropic; no SDK changes needed
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
# Use cost-optimized routing; the gateway picks the cheapest provider
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this."}],
)

Open source under Apache 2.0. Self-host in minutes, scale to production, and never depend on a single LLM provider again.