A high-performance, OpenAI-compatible proxy for 19 providers and 2,500+ models. Sub-millisecond routing, 11 built-in safety plugins, circuit breakers, and first-class MCP integration — self-hosted and production-ready.
Built in Go for low latency and high concurrency. Designed to be deployed in front of your LLM traffic without changing your existing client code.
One API for OpenAI, Anthropic, Gemini, Mistral, Groq, xAI, Azure, AWS Bedrock, Vertex AI, Hugging Face, and 9 more — all from a single endpoint.
View all providers →

Six strategies: single, fallback with exponential backoff, weighted load balancing, conditional routing, least-latency, and cost-optimized.
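The fallback strategy can be pictured in a few lines. Everything here is illustrative: call_provider is a stand-in for a real provider call, and the retry counts and delays are made-up values, not the gateway's defaults.

```python
import time

def call_provider(name: str, prompt: str, down: frozenset = frozenset()) -> str:
    # Stand-in for a real provider API call; raises when the provider is "down".
    if name in down:
        raise ConnectionError(f"{name} unavailable")
    return f"{name}: ok"

def fallback_with_backoff(providers, prompt, down=frozenset(),
                          retries=3, base_delay=0.01):
    """Try providers in order; retry each with exponential backoff before falling back."""
    for name in providers:
        delay = base_delay
        for _ in range(retries):
            try:
                return call_provider(name, prompt, down)
            except ConnectionError:
                time.sleep(delay)
                delay *= 2  # exponential backoff: base, 2*base, 4*base, ...
    raise RuntimeError("all providers failed")
```

If the first provider keeps failing after its retry budget, the request silently moves on to the next one, which is what makes the strategy transparent to clients.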
Routing strategies →

PII redaction, secret scanning, prompt shield, word filter, schema guard, regex guard, max-token, response cache, rate limit — 11 plugins total.
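As an illustration of what the redaction-style plugins do, here is a minimal sketch. The patterns and labels are examples only, not the gateway's actual rule set, which is configurable.

```python
import re

# Illustrative patterns; real PII/secret rules are far more extensive.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # OpenAI-style secret
}

def redact(text: str) -> str:
    """Replace matches with a [LABEL] placeholder before the prompt leaves the gateway."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running prompts through a filter like this before they reach any provider is the core idea behind the PII-redaction and secret-scanning plugins.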
Plugin catalogue →

First-class Model Context Protocol support. Connect filesystem, database, and custom tool servers to any model via an agentic loop.
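The agentic loop behind MCP tool use can be sketched roughly as follows. Both call_model and the TOOLS registry are stand-ins: the real gateway does model calls over HTTP and tool discovery over MCP's JSON-RPC transport.

```python
# Hypothetical tool registry; a real MCP server advertises its own tools.
TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def call_model(messages):
    # Stand-in model: requests a tool once, then produces a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "notes.txt"}}
    return {"content": "done"}

def agentic_loop(user_msg: str, max_steps: int = 5) -> str:
    """Alternate between model calls and tool executions until a final answer."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" not in reply:
            return reply["content"]  # model produced a final answer
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool call
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step limit reached")
```

The step limit matters in practice: it stops a model that keeps requesting tools from looping forever.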
MCP guide →

Prometheus metrics at /metrics, structured JSON logs with per-request trace IDs, and deep per-provider health checks at /health.
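A structured log line of the kind described might look like the sketch below. The field names are illustrative, not the gateway's exact schema.

```python
import json
import time
import uuid

def log_request(provider: str, model: str, status: int, latency_ms: float) -> str:
    """Emit one structured JSON log line with a per-request trace ID."""
    record = {
        "ts": time.time(),
        "trace_id": uuid.uuid4().hex,  # correlates log lines across retries and fallbacks
        "provider": provider,
        "model": model,
        "status": status,
        "latency_ms": latency_ms,
    }
    return json.dumps(record)
```

Because every line is a single JSON object with a trace ID, a log pipeline can reconstruct a request's full path through retries and fallbacks.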
Observability docs →

Set base_url to the gateway. Your OpenAI SDK, LangChain, LlamaIndex, or custom client works unchanged. No code modifications needed.
Quickstart →

Credentials are registered via environment variables. Enable any provider in seconds — no code changes, no rebuilds.
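A sketch of how env-based credential discovery could work: a provider counts as enabled when its key is present. The variable names here are assumptions; check the gateway's provider docs for the exact names it reads.

```python
import os

# Illustrative mapping; the gateway supports many more providers.
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}

def enabled_providers(env=os.environ):
    """Return the providers whose credential variable is set and non-empty."""
    return [name for name, var in PROVIDER_ENV.items() if env.get(var)]
```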
Point any OpenAI-compatible client at http://localhost:8080 and the gateway handles provider credentials, routing, retries, and observability for you.
Use the same model name to switch providers transparently, or use conditional routing rules to send different models to different backends.
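Conditional routing can be pictured as ordered prefix rules mapping model names to backends. The rule shape below is a sketch, not the gateway's actual config syntax.

```python
# Hypothetical rule list: first matching prefix wins.
RULES = [
    ("claude-", "anthropic"),
    ("gpt-", "openai"),
    ("gemini-", "vertex-ai"),
]

def route(model: str, default: str = "openai") -> str:
    """Pick a backend for a model name; fall back to a default backend."""
    for prefix, backend in RULES:
        if model.startswith(prefix):
            return backend
    return default
```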
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="any",  # gateway manages provider creds
)
# Route to Anthropic — no SDK changes needed
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
# Use cost-optimized — gateway picks cheapest provider
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this."}],
)
print(response2.choices[0].message.content)

Open source under Apache 2.0. Self-host in minutes, scale to production, and never depend on a single LLM provider again.