
Routing policies

The gateway implements six routing strategies. Set strategy.mode in config.yaml to choose one.

Single

Always routes to the first target. Best for single-provider setups.

strategy:
  mode: single

targets:
  - virtual_key: openai

Fallback

Tries targets in order. On failure (error or retryable status code), the next target is attempted with exponential backoff. Use this for high-availability setups.

strategy:
  mode: fallback

targets:
  - virtual_key: openai
    retry:
      attempts: 3
      retry_on_status: [429, 502, 503, 504]
  - virtual_key: anthropic
    retry:
      attempts: 2
  - virtual_key: gemini

If all targets fail, the last error is returned to the client.
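The fallback flow can be sketched in a few lines of Python. This is an illustration only, not the gateway's code: call_with_fallback, TargetError, send and base_delay are hypothetical names, and the sketch assumes that backoff is applied between retries on the same target before moving to the next one.

```python
import time
from typing import Callable, List, Optional, Tuple


class TargetError(Exception):
    """Hypothetical error type: a failed call to one target."""

    def __init__(self, target: str, status: Optional[int] = None):
        super().__init__(f"{target} failed (status={status})")
        self.target = target
        self.status = status


def call_with_fallback(
    targets: List[Tuple[str, int]],       # (virtual_key, retry attempts) in priority order
    send: Callable[[str], str],           # performs the request; raises TargetError on failure
    sleep: Callable[[float], None] = time.sleep,
    base_delay: float = 0.5,              # assumed starting delay, not a documented default
) -> str:
    last_error: Optional[TargetError] = None
    for name, attempts in targets:
        for attempt in range(attempts):
            try:
                return send(name)
            except TargetError as err:
                last_error = err
                # Exponential backoff: base, 2x base, 4x base, ...
                # No sleep after the final attempt on a target.
                if attempt < attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    if last_error is None:
        raise ValueError("no targets configured")
    # Every target failed: surface the last error, as described above.
    raise last_error
```

Passing sleep as a parameter keeps the sketch testable without real delays.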

Weighted load balancing

Distributes requests across targets by weight. Weights are relative: weights of 70 and 30 send 70% of requests to the first target and 30% to the second.

strategy:
  mode: loadbalance

targets:
  - virtual_key: openai
    weight: 70
  - virtual_key: anthropic
    weight: 30

Only targets that support the requested model are candidates for selection.
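A minimal sketch of weighted selection with the model filter applied first. The function name pick_weighted and the supported mapping are hypothetical; how the gateway actually determines model support is not described here.

```python
import random
from typing import Dict, Optional, Set


def pick_weighted(
    weights: Dict[str, float],        # virtual_key -> relative weight
    supported: Dict[str, Set[str]],   # virtual_key -> models it can serve (assumed shape)
    model: str,
    rng: random.Random = random.Random(),
) -> Optional[str]:
    # Only targets that support the requested model are candidates.
    candidates = [name for name in weights if model in supported.get(name, set())]
    if not candidates:
        return None
    # random.choices normalises the weights, so 70/30 behaves like 0.7/0.3.
    return rng.choices(candidates, weights=[weights[n] for n in candidates], k=1)[0]
```

Because weights are renormalised over the remaining candidates, filtering out a target shifts its share to the others rather than dropping requests.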

Conditional

Evaluates rules in order. The first matching rule determines the target. Set key to model for an exact match on the model name, or to model_prefix for a prefix match.

strategy:
  mode: conditional
  conditions:
    - key: model
      value: gpt-4o
      target_key: openai
    - key: model
      value: gpt-4o-mini
      target_key: openai
    - key: model_prefix
      value: claude
      target_key: anthropic
    - key: model_prefix
      value: gemini
      target_key: gemini

targets:
  - virtual_key: openai
  - virtual_key: anthropic
  - virtual_key: gemini

If no rule matches, the request falls through to the first target.
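The rule evaluation can be pictured as a first-match loop. The sketch below mirrors the config keys shown above; resolve_target is a hypothetical name, not a gateway function.

```python
from typing import Dict, List


def resolve_target(model: str, conditions: List[Dict[str, str]], targets: List[str]) -> str:
    for rule in conditions:
        # "model" compares the whole model name.
        if rule["key"] == "model" and model == rule["value"]:
            return rule["target_key"]
        # "model_prefix" matches any model name starting with the value.
        if rule["key"] == "model_prefix" and model.startswith(rule["value"]):
            return rule["target_key"]
    # No rule matched: fall through to the first configured target.
    return targets[0]
```

Since the first match wins, more specific rules belong above broader prefix rules.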

Least-latency

Routes to the target with the lowest P50 latency as measured by a rolling latency tracker. On a cold start (no latency data yet) it picks a target randomly.

strategy:
  mode: least-latency

targets:
  - virtual_key: openai
  - virtual_key: groq
  - virtual_key: anthropic

Useful when you have multiple fast providers and want to minimise time-to-first-token automatically.
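One way to picture the rolling tracker is a fixed-size window of recent samples per target, with the median as P50. LatencyTracker, pick_fastest and the window size of 100 are assumptions made for this sketch; the gateway's window and its handling of targets with partial data may differ. Here, targets without samples are skipped once at least one target has data.

```python
import random
from collections import deque
from statistics import median
from typing import Deque, Dict, List, Optional


class LatencyTracker:
    def __init__(self, window: int = 100):   # window size is an assumed value
        self._window = window
        self._samples: Dict[str, Deque[float]] = {}

    def record(self, target: str, seconds: float) -> None:
        # deque(maxlen=...) drops the oldest sample once the window is full.
        self._samples.setdefault(target, deque(maxlen=self._window)).append(seconds)

    def p50(self, target: str) -> Optional[float]:
        samples = self._samples.get(target)
        return median(samples) if samples else None


def pick_fastest(targets: List[str], tracker: LatencyTracker,
                 rng: random.Random = random.Random()) -> str:
    known = {t: tracker.p50(t) for t in targets if tracker.p50(t) is not None}
    if not known:
        # Cold start: no latency data yet, so pick at random.
        return rng.choice(targets)
    return min(known, key=known.get)
```

Using the median rather than the mean keeps a single slow outlier from pushing an otherwise fast target out of first place.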

Cost-optimized

Uses the built-in model catalog (2,500+ entries with pricing data) to estimate the input token cost for each target, then routes to the cheapest compatible provider. Falls back to the first compatible target if cost data is unavailable.

strategy:
  mode: cost-optimized

targets:
  - virtual_key: openai
  - virtual_key: together
  - virtual_key: deepseek
  - virtual_key: gemini

Cost estimation uses the model name in the request and the catalog's input_cost_per_token field. The target with the lowest estimated cost for the matched model wins.
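The selection step might look roughly like the sketch below. Only the input_cost_per_token field name comes from the text above; the catalog shape (a dict keyed by provider and model), the pick_cheapest function and the prices in the usage example are invented for illustration. The sketch also assumes that a target counts as compatible when the catalog has an entry for it and the requested model.

```python
from typing import Dict, List, Optional, Tuple

# Assumed catalog shape: (virtual_key, model) -> {"input_cost_per_token": float or None}
Catalog = Dict[Tuple[str, str], Dict[str, Optional[float]]]


def pick_cheapest(model: str, targets: List[str], catalog: Catalog,
                  prompt_tokens: int) -> Optional[str]:
    compatible = [t for t in targets if (t, model) in catalog]
    if not compatible:
        return None
    # Estimated input cost = prompt tokens x per-token input price.
    estimates = {}
    for t in compatible:
        price = catalog[(t, model)].get("input_cost_per_token")
        if price is not None:
            estimates[t] = prompt_tokens * price
    if not estimates:
        # No pricing data: fall back to the first compatible target.
        return compatible[0]
    return min(estimates, key=estimates.get)


# Usage with placeholder prices (not real pricing data):
catalog: Catalog = {
    ("openai", "example-model"): {"input_cost_per_token": 3e-6},
    ("together", "example-model"): {"input_cost_per_token": 1e-6},
    ("gemini", "example-model"): {"input_cost_per_token": None},
}
choice = pick_cheapest("example-model", ["openai", "together", "deepseek", "gemini"],
                       catalog, prompt_tokens=1000)
```

Because every target sees the same prompt, the ranking depends only on the per-token price; the token count matters only if the estimate is reported or logged.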

Combining strategies with circuit breakers

All strategies respect per-target circuit breakers. A target whose circuit breaker is open is excluded from selection.

targets:
  - virtual_key: openai
    circuit_breaker:
      failure_threshold: 5
      success_threshold: 2
      timeout: "30s"

The circuit breaker opens after failure_threshold consecutive failures and stays open for timeout. It then enters a half-open state, in which it allows one probe request. After success_threshold successes it closes again.
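The closed, open and half-open transitions can be sketched as a small state machine. The CircuitBreaker class below is hypothetical and makes two assumptions the text does not state: a failed probe reopens the circuit, and in the half-open state only one probe is in flight at a time.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, success_threshold: int = 2,
                 timeout: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0          # consecutive failures while closed
        self._successes = 0         # successful probes while half-open
        self._opened_at = 0.0
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        if self.state == "open":
            if self._clock() - self._opened_at < self.timeout:
                return False        # still open: target is excluded from selection
            self.state = "half-open"
            self._successes = 0
            self._probe_in_flight = False
        if self.state == "half-open":
            if self._probe_in_flight:
                return False        # assumption: one probe at a time
            self._probe_in_flight = True
        return True

    def record_success(self) -> None:
        if self.state == "half-open":
            self._probe_in_flight = False
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
        self._failures = 0          # any success resets the consecutive-failure count

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._trip()            # assumption: a failed probe reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0
        self._probe_in_flight = False
```

A routing strategy would call allow_request() when building its candidate list and report the outcome with record_success() or record_failure().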