---
nav_title: Set up rate limiting for different LLM providers
title: Set up rate limiting for different LLM providers
---

## About AI Rate Limiting with {{site.base_gateway}}

The AI Rate Limiting Advanced plugin enables rate limiting for AI providers used by various LLM (Large Language Model) plugins. This plugin extends the functionality of the Rate Limiting Advanced plugin.

## Prerequisite

You have the AI Proxy plugin configured.
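
If you still need to set it up, the following is a minimal sketch of enabling the AI Proxy plugin on the `example-service` created in the next section, assuming an OpenAI chat model. The model name and the `OPENAI_API_KEY` environment variable are placeholders; substitute your own values.

```sh
# Enable AI Proxy on the service (sketch: provider, model, and key are assumptions)
curl -i -X POST http://localhost:8001/services/example-service/plugins \
  --data 'name=ai-proxy' \
  --data 'config.route_type=llm/v1/chat' \
  --data 'config.auth.header_name=Authorization' \
  --data "config.auth.header_value=Bearer $OPENAI_API_KEY" \
  --data 'config.model.provider=openai' \
  --data 'config.model.name=gpt-4o'
```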

## Add your service and route on Kong

After installing and starting {{site.base_gateway}}, use the Admin API on port 8001 to add a new service and route. In this example, {{site.base_gateway}} will reverse proxy every incoming request with the specified incoming host to the associated upstream URL. You can implement very complex routing mechanisms beyond simple host matching.

```sh
curl -i -X POST \
  --url http://localhost:8001/services/ \
  --data 'name=example-service' \
  --data 'url=http://example.com'
```

```sh
curl -i -X POST \
  --url http://localhost:8001/services/example-service/routes \
  --data 'hosts[]=example.com'
```
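
To confirm that the route matches, you can send a test request through the proxy port (`8000` by default) using the `Host` header configured above. The response comes from the upstream or from the AI Proxy plugin, depending on your configuration.

```sh
# Quick routing check through the proxy port
curl -i http://localhost:8000/ \
  --header 'Host: example.com'
```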

## Add the AI Rate Limiting Advanced plugin to the service

Protect your LLM service with rate limiting. The plugin analyzes query costs and the token counts returned in LLM responses to provide an enterprise-grade rate limiting strategy.

{% if_version lte:3.9.x %}

```sh
curl -i -X POST http://localhost:8001/services/example-service/plugins \
  --data 'name=ai-rate-limiting-advanced' \
  --data 'config.llm_providers[1].name=openai' \
  --data 'config.llm_providers[1].limit=100' \
  --data 'config.llm_providers[1].window_size=3600'
```

{% endif_version %} {% if_version gte:3.10.x %}

```sh
curl -i -X POST http://localhost:8001/services/example-service/plugins \
  --data 'name=ai-rate-limiting-advanced' \
  --data 'config.llm_providers[1].name=openai' \
  --data 'config.llm_providers[1].limit[]=100' \
  --data 'config.llm_providers[1].limit[]=10000' \
  --data 'config.llm_providers[1].window_size[]=60' \
  --data 'config.llm_providers[1].window_size[]=3600'
```

{% endif_version %}
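
Once the plugin is enabled, you can verify the behavior by sending chat requests through the route until the configured token budget for the window is consumed; {{site.base_gateway}} then rejects further requests with an HTTP `429` response. The request body below is a sketch that assumes the AI Proxy plugin is routing to an OpenAI-style chat endpoint.

```sh
# Send a chat request through the gateway. Repeat until the provider's
# token limit for the current window is exhausted; after that, the
# gateway responds with HTTP 429 instead of proxying the request.
curl -i -X POST http://localhost:8000/ \
  --header 'Host: example.com' \
  --header 'Content-Type: application/json' \
  --data '{"messages":[{"role":"user","content":"Hello"}]}'
```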

The AI Rate Limiting Advanced plugin supports three rate limiting strategies. The default strategy estimates query cost by counting the total number of tokens returned in the LLM responses.
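
Because limits are configured per provider, you can add further entries to `config.llm_providers` to give each provider its own budget. The sketch below reuses the same fields shown above (in the pre-3.10 scalar form for brevity); the second provider name and its limits are placeholders, not a recommendation.

```sh
# Sketch: independent budgets for two providers on the same service
curl -i -X POST http://localhost:8001/services/example-service/plugins \
  --data 'name=ai-rate-limiting-advanced' \
  --data 'config.llm_providers[1].name=openai' \
  --data 'config.llm_providers[1].limit=100' \
  --data 'config.llm_providers[1].window_size=3600' \
  --data 'config.llm_providers[2].name=cohere' \
  --data 'config.llm_providers[2].limit=50' \
  --data 'config.llm_providers[2].window_size=3600'
```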