API rate limiting

FOR BUSINESS USERS

All existing integrations are fully supported under normal operating conditions, with limits set well above the thresholds typically encountered, even during high-demand periods such as Black Friday/Cyber Monday (BFCM).

The purpose of rate limiting is not to restrict your regular workflows, but to ensure platform stability and reliability, especially during periods of unusually high traffic. In short, you and your teams can continue using NewStore APIs as you always have.

To provide a stable, reliable, and fair platform for all developers, our APIs implement rate limiting. Rate limits prevent individual applications or tenants from consuming excessive resources, help ensure consistent performance across the platform, and protect against both accidental and malicious traffic spikes.

Rate limits are enforced at two levels:

  • Global limits per tenant, ensuring overall fair usage across the entire platform.

  • Endpoint-specific limits, protecting critical or high-load endpoints.

In addition to the documented tenant and endpoint rate limits, NewStore employs infrastructure-level safeguards to protect the platform against denial-of-service (DoS/DDoS) attacks and other abnormal traffic patterns. These protections are adaptive and dynamic.

They are not intended to limit normal API usage, and legitimate applications should not encounter them under standard operating conditions. If you believe your application has been affected by these protective measures, contact your NewStore support team.

Global limits for each tenant type

A global rate limit applies to the total number of API calls each NewStore tenant can make; the value of the limit depends on the tenant type.

| Rate limit | Production tenants | Development tenants |
| --- | --- | --- |
| Total API calls per tenant | 200 calls per second | 100 calls per second |

Once the rate limit is exceeded, additional requests are rejected with an HTTP 429 Too Many Requests response. The HTTP 429 response contains a Retry-After header instructing the client when it should retry the request.
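
For example, a client can honor this header directly. The following is a minimal sketch in Python using the requests library; the host, path, and credentials are placeholders, and it assumes the Retry-After header carries a delay in seconds:

```python
import time

import requests

BASE_URL = "https://api.example.com"  # placeholder; substitute your NewStore API host
HEADERS = {"Authorization": "Bearer <access_token>"}  # placeholder credentials

def get_with_retry_after(path: str, max_attempts: int = 5) -> requests.Response:
    """Call an endpoint, waiting out HTTP 429 responses via Retry-After."""
    for attempt in range(max_attempts):
        response = requests.get(BASE_URL + path, headers=HEADERS)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Assumes Retry-After is a delay in seconds; fall back to 1 second.
        delay = int(response.headers.get("Retry-After", "1"))
        time.sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")
```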

Endpoint-specific limits

In addition to the global rate limit applied for each tenant type, selected API endpoints have their own limits on the number of calls a given tenant can make to them.

| API Endpoint | Production tenants | Development tenants |
| --- | --- | --- |
| /v0/d/fulfill_order | 20 calls per second | 8 calls per second |
| /v0/orders | 30 calls per second | 12 calls per second |
| /v0/orders/{id} | 100 calls per second | 40 calls per second |
| /v0/stock/insights | 90 calls per second | 36 calls per second |
| /v0/fulfillment_requests/{order_uuid} | 70 calls per second | 28 calls per second |
| /v0/d/orders/{order_uuid}/notes | 80 calls per second | 32 calls per second |

As with the global limits, requests that exceed an endpoint-specific limit are rejected with an HTTP 429 Too Many Requests response containing a Retry-After header.

Legacy token endpoint

If your integration still relies on the deprecated integration-user authentication, the rate limit that has been in place throughout its lifetime continues to apply.

| API Endpoint | All tenants |
| --- | --- |
| /v0/token | 20 calls per minute per user per IP address |

We recommend migrating to the new API client authentication as soon as possible. For details, see the migration guide: Migrating from Token API v0 to API client.

Leaky Bucket rate limiting algorithm

Our APIs use the Leaky Bucket algorithm to enforce rate limits. This method helps smooth out bursts of traffic while ensuring requests are processed at a steady and predictable rate.

How does the Leaky Bucket algorithm work?

The Leaky Bucket algorithm can be visualized as a container with a small hole at the bottom:

  1. Incoming requests fill the bucket.

    Each API request is added to the bucket as it arrives.

  2. The bucket leaks at a fixed rate.

    Requests are processed and leave the bucket at a constant speed, representing the maximum allowed throughput.

  3. Bursts are smoothed out.

    Short bursts of requests are tolerated, as long as the bucket has capacity and they can “leak out” over time.

  4. Overflow is rejected.

    If requests arrive faster than they can be processed, the bucket fills up. Once full, additional requests “overflow” and are rejected with an HTTP 429 Too Many Requests response.

The Leaky Bucket algorithm accommodates short bursts of requests while enforcing a steady, sustainable processing rate. By adhering to rate limits and following best practices such as implementing retry logic and caching, developers can build integrations that are efficient, reliable, and resilient.
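
To make the model concrete, the following is a minimal, illustrative leaky bucket in Python. It is a sketch of the general technique, not NewStore's server-side implementation; the capacity and leak rate are arbitrary example values:

```python
import time

class LeakyBucket:
    """Illustrative leaky bucket: `capacity` bounds the burst size,
    `leak_rate` is the sustained throughput in requests per second."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0  # current "water" in the bucket
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        """True if a request may proceed, False if the bucket would
        overflow (the case in which a server answers with HTTP 429)."""
        now = time.monotonic()
        # Drain the bucket at the fixed leak rate since the last check.
        elapsed = now - self.last_leak
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1  # the incoming request fills the bucket
            return True
        return False  # bucket full: reject

# Example: tolerate bursts of up to 10 requests, sustain 5 requests/second.
bucket = LeakyBucket(capacity=10, leak_rate=5)
print(bucket.allow())  # True while the bucket has capacity
```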

Common causes and handling of rate limiting

Rate limiting can occur under a variety of conditions, but the most common scenario is when a client sends a large number of requests in quick succession. This often happens during activities such as data migrations, bulk imports, or analytical operations. To minimize the risk of being rate limited, you should proactively manage request volumes on the client side.

When rate limiting does occur, integrations should be designed to handle it gracefully:

  • Monitor for 429 responses and implement a retry mechanism.

  • Use exponential backoff with jitter (randomized delay) to avoid a “thundering herd” effect when multiple clients retry at once; a retry sketch follows this list.

  • Control traffic globally rather than only optimizing individual requests. A client-side rate limiting strategy, such as the token bucket algorithm, can help throttle request volume intelligently. Mature, ready-made implementations of token bucket algorithms are available in most programming languages; a minimal sketch appears at the end of this section.
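
The following sketch combines the first two practices in Python: it retries on HTTP 429, prefers the server's Retry-After hint when present, and otherwise applies exponential backoff with full jitter. The function name and parameter values are illustrative:

```python
import random
import time

import requests

def call_with_backoff(url: str, headers: dict, max_attempts: int = 6,
                      base_delay: float = 0.5, max_delay: float = 30.0) -> requests.Response:
    """Retry on HTTP 429 using Retry-After when provided, otherwise
    exponential backoff with "full jitter" (a random wait between 0 and
    an exponentially growing cap) so concurrent clients spread out."""
    for attempt in range(max_attempts):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)  # assumes a delay in seconds
        else:
            cap = min(max_delay, base_delay * (2 ** attempt))
            delay = random.uniform(0, cap)  # jitter avoids a thundering herd
        time.sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")
```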

By implementing these strategies, you can ensure that your application remains resilient, reduces unnecessary retries, and makes efficient use of available rate limits.
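
As an illustration of the client-side throttling mentioned above, here is a minimal token bucket in Python. It is a sketch only; for production use, prefer one of the mature, ready-made implementations referred to earlier. The rate and capacity values are arbitrary examples:

```python
import threading
import time

class TokenBucket:
    """Illustrative client-side throttle: tokens accrue at `rate` per
    second up to `capacity`; each request consumes one token, blocking
    until one is available, so outgoing traffic never exceeds the rate."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self.lock:
                now = time.monotonic()
                refill = (now - self.updated) * self.rate
                self.tokens = min(self.capacity, self.tokens + refill)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)  # sleep outside the lock, then re-check

# Example: stay safely under a 200 calls/second production limit.
throttle = TokenBucket(rate=180, capacity=20)
throttle.acquire()  # call before each outgoing API request
```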