Rate limiting
The gateway supports rate limiting in three layers.
HTTP middleware (per IP)
Enable per-IP limits with environment variables:
export RATE_LIMIT_RPS=20
export RATE_LIMIT_BURST=40
Requests beyond the limit receive an HTTP 429 response with an OpenAI-style error body.
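As a rough illustration of what an "OpenAI-style" 429 could look like, here is a minimal Go sketch. It assumes the common OpenAI error envelope (`{"error": {"message", "type", "code"}}`). The helper name `rateLimitedResponse` and the specific message, type, and code values are hypothetical placeholders, not the gateway's actual output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// errorDetail and errorBody mirror the general shape of an OpenAI-style
// error envelope. The exact fields the gateway emits are an assumption.
type errorDetail struct {
	Message string `json:"message"`
	Type    string `json:"type"`
	Code    string `json:"code"`
}

type errorBody struct {
	Error errorDetail `json:"error"`
}

// rateLimitedResponse is a hypothetical helper that returns the status code
// and JSON body a rate-limited client might see.
func rateLimitedResponse() (int, string) {
	body := errorBody{Error: errorDetail{
		Message: "Rate limit exceeded", // placeholder text
		Type:    "rate_limit_error",    // placeholder type
		Code:    "rate_limit_exceeded", // placeholder code
	}}
	raw, err := json.Marshal(body)
	if err != nil {
		// Marshalling a struct of plain strings is not expected to fail.
		panic(err)
	}
	return http.StatusTooManyRequests, string(raw)
}

func main() {
	status, body := rateLimitedResponse()
	fmt.Println(status) // 429
	fmt.Println(body)
}
```

Clients written against OpenAI SDKs usually treat 429 as retryable, so matching the envelope lets their existing backoff logic keep working.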
Plugin rate limiting (per request)
Add the rate-limit plugin to apply global token-bucket limits before traffic hits a provider:
plugins:
  - name: rate-limit
    type: ratelimit
    stage: before_request
    enabled: true
    config:
      requests_per_second: 50
      burst: 100
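To make the interaction between `requests_per_second` and `burst` concrete, the following is a minimal token-bucket sketch in Go using the values from the config above. It is not the plugin's implementation: the `TokenBucket` type, its methods, and `burstDemo` are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// TokenBucket is a hypothetical, single-goroutine token bucket used only to
// illustrate how a refill rate and a burst capacity interact.
type TokenBucket struct {
	rate   float64   // tokens added per second (cf. requests_per_second)
	burst  float64   // maximum tokens held (cf. burst)
	tokens float64   // tokens currently available
	last   time.Time // time of the previous refill
}

// NewTokenBucket starts with a full bucket.
func NewTokenBucket(rate, burst float64, now time.Time) *TokenBucket {
	return &TokenBucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// Allow refills the bucket for the time elapsed since the last call, caps it
// at the burst size, then spends one token if one is available.
func (b *TokenBucket) Allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// countAllowed sends n requests at the same instant and counts how many pass.
func countAllowed(b *TokenBucket, n int, now time.Time) int {
	allowed := 0
	for i := 0; i < n; i++ {
		if b.Allow(now) {
			allowed++
		}
	}
	return allowed
}

// burstDemo fires 150 requests at t=0 and another 150 one second later.
func burstDemo() (first, second int) {
	start := time.Unix(0, 0)
	b := NewTokenBucket(50, 100, start)
	first = countAllowed(b, 150, start)
	second = countAllowed(b, 150, start.Add(time.Second))
	return first, second
}

func main() {
	first, second := burstDemo()
	fmt.Println(first)  // 100: the full burst is spent, 50 requests rejected
	fmt.Println(second) // 50: one second of refill at 50 tokens per second
}
```

In this model `burst` bounds how many requests can pass back to back, while `requests_per_second` sets the sustained rate once the bucket has drained.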
Per-key and per-user rate limiting
Extend the rate-limit plugin with key_rpm and user_rpm to enforce per-identity limits in addition to (or instead of) the global bucket:
plugins:
  - name: rate-limit
    type: ratelimit
    stage: before_request
    enabled: true
    config:
      requests_per_second: 100  # global limit (optional)
      burst: 200
      key_rpm: 60               # max 60 req/min per API key
      user_rpm: 30              # max 30 req/min per user ID
Rate checks execute in order: global → per-key → per-user. A request is rejected at the first limit it exceeds, and each limiter reports a distinct reason string, so your logs show which limit was hit.
| Config key | Granularity | Source field |
|---|---|---|
| `requests_per_second` + `burst` | Global (all traffic) | n/a |
| `key_rpm` | Per API key | `pctx.Metadata["api_key"]` |
| `user_rpm` | Per user ID | `Request.User` |
Requests without an API key skip the per-key check, and requests without a user field skip the per-user check. The three limits are independent, so you can configure any combination.
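The ordering and skip rules described above can be sketched as follows. This is an illustrative Go model, not the plugin's code: `countLimiter` is a simplified stand-in that counts requests per identity without any time window, and the type names, `newChain`, and the reason strings (`global_limit`, `key_limit`, `user_limit`) are all hypothetical.

```go
package main

import "fmt"

// countLimiter allows at most max requests per identity. A real limiter
// would reset or refill over time; that is omitted to keep the sketch short.
type countLimiter struct {
	max  int
	seen map[string]int
}

func newCountLimiter(max int) *countLimiter {
	return &countLimiter{max: max, seen: make(map[string]int)}
}

func (l *countLimiter) Allow(id string) bool {
	if l.seen[id] >= l.max {
		return false
	}
	l.seen[id]++
	return true
}

// Chain holds the three limiters. A nil limiter means "not configured".
type Chain struct {
	global, perKey, perUser *countLimiter
}

// newChain builds a chain; a limit of 0 leaves that limiter unconfigured.
func newChain(global, keyLimit, userLimit int) *Chain {
	c := &Chain{}
	if global > 0 {
		c.global = newCountLimiter(global)
	}
	if keyLimit > 0 {
		c.perKey = newCountLimiter(keyLimit)
	}
	if userLimit > 0 {
		c.perUser = newCountLimiter(userLimit)
	}
	return c
}

// Check runs global, then per-key, then per-user, and stops at the first
// limit that is exceeded. Empty identities skip their check.
func (c *Chain) Check(apiKey, user string) (bool, string) {
	if c.global != nil && !c.global.Allow("") {
		return false, "global_limit"
	}
	if c.perKey != nil && apiKey != "" && !c.perKey.Allow(apiKey) {
		return false, "key_limit"
	}
	if c.perUser != nil && user != "" && !c.perUser.Allow(user) {
		return false, "user_limit"
	}
	return true, ""
}

func main() {
	c := newChain(100, 2, 1) // global 100, 2 per key, 1 per user
	fmt.Println(c.Check("k1", "alice")) // true
	fmt.Println(c.Check("k1", "bob"))   // true
	fmt.Println(c.Check("k1", "carol")) // false key_limit
	fmt.Println(c.Check("k2", "alice")) // false user_limit
	fmt.Println(c.Check("", ""))        // true
}
```

Note that in this sketch a request rejected by a later limiter has already consumed capacity from the earlier ones. Whether the gateway behaves the same way is not stated in this document.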