Rate Limits

Rate limits control how many requests you can make to the Kimchi Inference API within a given time period. These limits help maintain service stability and ensure fair access for all users.

Kimchi enforces two types of rate limits on the Inference API:

Requests per minute (RPM) limits the number of API calls you can make each minute.

Tokens per minute (TPM) limits the total number of input and output tokens processed each minute.
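
To make the two counters concrete, here is a minimal client-side sketch that estimates your own RPM and TPM over a rolling 60-second window. The `MinuteWindow` class and its method names are hypothetical and not part of any Kimchi SDK. The server may measure its windows differently, so treat the numbers as an approximation only.

```python
from collections import deque


class MinuteWindow:
    """Client-side estimate of requests and tokens used in the last minute.

    Hypothetical helper for illustration; not part of any Kimchi SDK.
    """

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens) pairs, oldest first

    def record(self, now, input_tokens, output_tokens):
        # TPM counts input and output tokens together.
        self._events.append((now, input_tokens + output_tokens))

    def _expire(self, now):
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()

    def rpm(self, now):
        """Number of requests recorded within the window ending at `now`."""
        self._expire(now)
        return len(self._events)

    def tpm(self, now):
        """Total tokens recorded within the window ending at `now`."""
        self._expire(now)
        return sum(tokens for _, tokens in self._events)
```

In practice you would pass `time.monotonic()` as `now` and call `record` once per completed request.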

If you exceed either limit, the API returns HTTP status code 429 (Too Many Requests). Your application should implement retry logic with exponential backoff to handle rate limit responses gracefully.

Limits by plan

	Starter	Enterprise
Rate limits	Dynamic	Custom
Billing	Pay-as-you-go	Committed usage

Rate limits are dynamic: they adjust based on current system load and available capacity rather than being fixed numbers. As a result, your effective throughput may vary, but the system is designed to maximise what each plan can use at any given moment.

📘
Rate limits exist across all plans for infrastructure protection. See kimchi.dev/pricing for plan details and current rates.

Rate limit responses

When you exceed your rate limit, the API returns HTTP 429 with an error in the response body:

{"error": "minimax-m2.7 model is rate limited until 2026-02-05T15:32:41Z"}

The response includes a Retry-After header indicating how many seconds to wait before retrying:

Retry-After: 5
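
One way to turn a 429 response into a wait time is sketched below. It assumes the header value is a plain number of seconds, as in the example above. It also assumes the error message embeds a UTC timestamp in the format shown earlier, which it uses as a fallback. The function name `seconds_to_wait`, its parameters, and the 1-second default are all hypothetical choices, and `headers` is assumed to be a dict keyed by the exact header name.

```python
import json
import re
from datetime import datetime, timezone

# Matches timestamps such as 2026-02-05T15:32:41Z (format assumed from the example error).
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def seconds_to_wait(headers, body, now=None, default=1.0):
    """Return how long to wait after a 429, in seconds (illustrative helper)."""
    # Prefer the Retry-After header when it holds a number of seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not numeric; fall through to the body.

    # Fall back to the "rate limited until <timestamp>" text in the error body.
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, TypeError, AttributeError):
        message = ""
    match = _TIMESTAMP.search(message) if isinstance(message, str) else None
    if match:
        until = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%SZ")
        until = until.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        return max(0.0, (until - now).total_seconds())

    # Nothing usable in the response; use a conservative default.
    return default
```

Many HTTP clients expose headers through a case-insensitive mapping, in which case the lookup works regardless of how the header name is capitalised.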

📘
If you have multiple providers configured for a model, Kimchi automatically attempts fallback to other available providers before returning a rate limit error.

Best practices

Respect the Retry-After header — When you receive a 429 response, wait the number of seconds specified in the header before retrying.
Implement exponential backoff — In addition to the Retry-After delay, increase wait times progressively for repeated failures.
Batch requests where possible — Combine multiple small prompts into fewer, larger requests to reduce overhead.
Monitor your usage — Track token consumption in the Kimchi console to anticipate when you might approach limits.
Use appropriate model sizes — Smaller models have higher rate limits. Choose the smallest model that meets your quality requirements for each use case.
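
The first two practices can be combined in a small retry wrapper. The sketch below is one possible approach, not an official client. `RateLimited` is a hypothetical exception that your own request function would raise on HTTP 429, carrying the Retry-After value. The attempt count, base delay, and cap are arbitrary defaults.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception raised by your request function on HTTP 429."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, jitter=random.random):
    """Call send() and retry on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # Out of attempts; surface the error to the caller.
            # Backoff doubles on each failure: 1s, 2s, 4s, ... capped at max_delay.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            # Never wait less than the server asked for; add up to 1s of jitter
            # so that many clients do not retry at the same instant.
            delay = max(exc.retry_after, backoff) + jitter()
            sleep(delay)
```

The `sleep` and `jitter` parameters exist only so the behaviour can be tested without real delays. In normal use you would leave them at their defaults.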

Upgrading your plan

When you need higher rate limits, you can upgrade your plan:

Navigate to app.kimchi.dev/settings
Select your desired plan
Click Upgrade and complete the checkout

Rate limit increases take effect immediately after upgrading.