Skip to main content

Architecture

This page gives a practical architecture view similar to modern AI gateway docs: control points, traffic flow, and scaling boundaries.

High-level architecture

Request path

Core components

API compatibility layer

  • Accepts OpenAI-compatible request/response formats.
  • Lets application code remain stable while backend models/providers change.

Routing engine

  • Supports single, fallback, weighted, and conditional strategies.
  • Separates model selection policy from app logic.

Policy and controls

  • Handles authentication, rate limiting, and plugin-level checks.
  • Applies policy consistently for every provider.

Observability

  • Emits logs for every request and upstream outcome.
  • Exposes metrics for latency, error rate, and provider health.

Deployment recommendations

  • Start with one gateway instance behind a reverse proxy.
  • Move to multiple gateway replicas with shared configuration for HA.
  • Add provider-level fallback before introducing weighted distribution.
  • Use metrics + request logs to tune routing rules over time.