Rate limiting
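The gateway supports rate limiting at two layers: HTTP middleware that limits each client IP, and a plugin that rejects requests before they reach a provider.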
HTTP middleware (per IP)
Enable per-IP limits with environment variables:
export RATE_LIMIT_RPS=20
export RATE_LIMIT_BURST=40
Requests beyond the limit receive an HTTP 429 (Too Many Requests) with an OpenAI-style error response.
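To make the relationship between RATE_LIMIT_RPS and RATE_LIMIT_BURST concrete, here is a minimal sketch of a per-IP token bucket. It assumes the middleware uses token-bucket semantics, which this section does not confirm; the class names `TokenBucket` and `PerIPLimiter` are hypothetical and are not part of the gateway.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Refills at `rate` tokens per second and holds at most `burst` tokens."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.now = now  # injectable clock, useful for tests
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = now()

    def allow(self):
        current = self.now()
        # Add tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would answer with HTTP 429 here


class PerIPLimiter:
    """Keeps one independent bucket per client IP (hypothetical helper)."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.buckets = defaultdict(lambda: TokenBucket(rate, burst, now))

    def allow(self, ip):
        return self.buckets[ip].allow()


# Mirrors RATE_LIMIT_RPS=20 and RATE_LIMIT_BURST=40 from the example above.
limiter = PerIPLimiter(rate=20, burst=40)
if not limiter.allow("203.0.113.7"):
    print("429 Too Many Requests")
```

Under this model, a single IP could send up to 40 requests back to back, after which it is held to a sustained 20 requests per second. Other IPs are unaffected because each has its own bucket.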
Plugin rate limiting (per request)
Add the rate-limit plugin to reject traffic before it hits a provider:
plugins:
  - name: rate-limit
    type: ratelimit
    stage: before_request
    enabled: true
    config:
      requests_per_second: 50
      burst: 100
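The before_request stage means the plugin runs, and can reject, before any provider call is made. The sketch below illustrates that ordering only. The names `Rejected`, `make_limit_plugin`, and `handle` are hypothetical and do not reflect the gateway's actual plugin interface.

```python
class Rejected(Exception):
    """Raised by a before_request plugin to stop the request."""


def make_limit_plugin(allow):
    """Wrap any `allow() -> bool` check (e.g. a rate limiter) as a plugin."""
    def plugin(request):
        if not allow():
            raise Rejected("rate limit exceeded")
    return plugin


def handle(request, before_request_plugins, call_provider):
    # Run every before_request plugin first; any of them may raise Rejected.
    for plugin in before_request_plugins:
        plugin(request)
    # Only reached when no plugin rejected the request.
    return call_provider(request)


# Usage: a limiter that refuses everything, so the provider is never called.
try:
    handle({"prompt": "hi"}, [make_limit_plugin(lambda: False)], lambda r: "response")
except Rejected as exc:
    print(f"rejected before provider call: {exc}")
```

Because the rejection happens ahead of the provider call, a rate-limited request in this model consumes no provider quota.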