A high-performance, OpenAI-compatible proxy for 19 providers and 2,500+ models. Sub-millisecond routing, 11 built-in safety plugins, circuit breakers, and first-class MCP integration — self-hosted and production-ready.
Built in Go for low latency and high concurrency. Designed to be deployed in front of your LLM traffic without changing your existing client code.
One API for OpenAI, Anthropic, Gemini, Mistral, Groq, xAI, Azure, AWS Bedrock, Vertex AI, Hugging Face, and 9 more — all from a single endpoint.
View all providers →

Six strategies: single, fallback with exponential backoff, weighted load balancing, conditional routing, least-latency, and cost-optimized.
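The fallback strategy can be pictured in a few lines. Everything here is illustrative: call_provider is a stand-in for a real provider call, and the retry counts and delays are made-up values, not the gateway's defaults.

```python
import time

def call_provider(name: str, prompt: str, down: frozenset = frozenset()) -> str:
    # Stand-in for a real provider API call; raises when the provider is "down".
    if name in down:
        raise ConnectionError(f"{name} unavailable")
    return f"{name}: ok"

def fallback_with_backoff(providers, prompt, down=frozenset(),
                          retries=3, base_delay=0.01):
    """Try providers in order; retry each with exponential backoff before falling back."""
    for name in providers:
        delay = base_delay
        for _ in range(retries):
            try:
                return call_provider(name, prompt, down)
            except ConnectionError:
                time.sleep(delay)
                delay *= 2  # exponential backoff: base, 2*base, 4*base, ...
    raise RuntimeError("all providers failed")
```

If the first provider keeps failing after its retry budget, the request silently moves on to the next one, which is what makes the strategy transparent to clients.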
Routing strategies →

PII redaction, secret scanning, prompt shield, word filter, schema guard, regex guard, max-token, response cache, rate limit — 11 plugins total.
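As an illustration of what the redaction-style plugins do, here is a minimal sketch. The patterns and labels are examples only, not the gateway's actual rule set, which is configurable.

```python
import re

# Illustrative patterns; real PII/secret rules are far more extensive.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # OpenAI-style secret
}

def redact(text: str) -> str:
    """Replace matches with a [LABEL] placeholder before the prompt leaves the gateway."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running prompts through a filter like this before they reach any provider is the core idea behind the PII-redaction and secret-scanning plugins.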
Plugin catalogue →

First-class Model Context Protocol support. Connect filesystem, database, and custom tool servers to any model via an agentic loop.
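The agentic loop behind MCP tool use can be sketched roughly as follows. Both call_model and the TOOLS registry are stand-ins: the real gateway does model calls over HTTP and tool discovery over MCP's JSON-RPC transport.

```python
# Hypothetical tool registry; a real MCP server advertises its own tools.
TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def call_model(messages):
    # Stand-in model: requests a tool once, then produces a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "notes.txt"}}
    return {"content": "done"}

def agentic_loop(user_msg: str, max_steps: int = 5) -> str:
    """Alternate between model calls and tool executions until a final answer."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" not in reply:
            return reply["content"]  # model produced a final answer
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool call
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step limit reached")
```

The step limit matters in practice: it stops a model that keeps requesting tools from looping forever.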
MCP guide →

Prometheus metrics at /metrics, structured JSON logs with per-request trace IDs, and deep per-provider health checks at /health.
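A structured log line of the kind described might look like the sketch below. The field names are illustrative, not the gateway's exact schema.

```python
import json
import time
import uuid

def log_request(provider: str, model: str, status: int, latency_ms: float) -> str:
    """Emit one structured JSON log line with a per-request trace ID."""
    record = {
        "ts": time.time(),
        "trace_id": uuid.uuid4().hex,  # correlates log lines across retries and fallbacks
        "provider": provider,
        "model": model,
        "status": status,
        "latency_ms": latency_ms,
    }
    return json.dumps(record)
```

Because every line is a single JSON object with a trace ID, a log pipeline can reconstruct a request's full path through retries and fallbacks.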
Observability docs →

Set base_url to the gateway. Your OpenAI SDK, LangChain, LlamaIndex, or custom client works unchanged. No code modifications needed.
Quickstart →

Credentials are registered via environment variables. Enable any provider in seconds — no code changes, no rebuilds.
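A sketch of how env-based credential discovery could work: a provider counts as enabled when its key is present. The variable names here are assumptions; check the gateway's provider docs for the exact names it reads.

```python
import os

# Illustrative mapping; the gateway supports many more providers.
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}

def enabled_providers(env=os.environ):
    """Return the providers whose credential variable is set and non-empty."""
    return [name for name, var in PROVIDER_ENV.items() if env.get(var)]
```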
Point any OpenAI-compatible client at http://localhost:8080 and the gateway handles provider credentials, routing, retries, and observability for you.
Use the same model name to switch providers transparently, or use conditional routing rules to send different models to different backends.
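Conditional routing can be pictured as ordered prefix rules mapping model names to backends. The rule shape below is a sketch, not the gateway's actual config syntax.

```python
# Hypothetical rule list: first matching prefix wins.
RULES = [
    ("claude-", "anthropic"),
    ("gpt-", "openai"),
    ("gemini-", "vertex-ai"),
]

def route(model: str, default: str = "openai") -> str:
    """Pick a backend for a model name; fall back to a default backend."""
    for prefix, backend in RULES:
        if model.startswith(prefix):
            return backend
    return default
```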
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="any",  # gateway manages provider creds
)
# Route to Anthropic — no SDK changes needed
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
# Use cost-optimized — gateway picks cheapest provider
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this."}],
)
print(response2.choices[0].message.content)

Open source under Apache 2.0. Self-host in minutes, scale to production, and never depend on a single LLM provider again.