Rate limits
Limits are enforced on two axes simultaneously: per API key and per source IP. The lower of the two applies.
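Every request is checked against both counters, so a client that mirrors them locally should treat the smaller remaining allowance as the binding one. Below is a minimal sketch. effective_remaining is a hypothetical helper, and it assumes you can obtain a remaining count for each axis separately, which the API may not expose directly.

```python
def effective_remaining(key_remaining: int, ip_remaining: int) -> int:
    """Requests left before either axis is exhausted.

    Hypothetical helper: the per-key and per-IP limits are enforced
    independently, so the tighter of the two decides.
    """
    return min(key_remaining, ip_remaining)


# Example: several API keys share one egress IP, so the IP axis binds first.
print(effective_remaining(key_remaining=847, ip_remaining=120))  # 120
```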
Per-plan limits
| Plan | POST /v1/extract | Other endpoints | Concurrent uploads |
|---|---|---|---|
| Free | 50 / month | 100 / minute | 1 |
| Indie | 1,000 / month | 200 / minute | 3 |
| Startup | 25,000 / month | 1,000 / minute | 10 |
| Scale | 250,000 / month | 5,000 / minute | 50 |
| Enterprise | custom | custom | custom |
The extract quota is metered per calendar month in UTC and resets at 00:00 UTC on the first day of each month.
Other endpoints are bucketed in 60-second sliding windows.
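A client can mirror both budgets locally. This sketch assumes the extract quota rolls over at 00:00 UTC on the 1st, and it models the per-minute limit as a log of request timestamps over a 60-second sliding window. next_monthly_reset and SlidingWindowCounter are hypothetical names, not SDK APIs. The server's exact window bookkeeping may differ, for example in how it treats a request landing exactly on the window edge.

```python
from collections import deque
from datetime import datetime, timezone


def next_monthly_reset(now: datetime) -> datetime:
    """Start of the next calendar month in UTC (when the extract quota resets)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


class SlidingWindowCounter:
    """Client-side mirror of a per-minute limit over a sliding window."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._stamps = deque()  # timestamps (seconds) of accepted requests

    def try_acquire(self, now: float) -> bool:
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False


print(next_monthly_reset(datetime(2024, 6, 8, 20, 20, tzinfo=timezone.utc)))
# 2024-07-01 00:00:00+00:00
```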
Response headers
Every response carries:
```http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1717878000
```

X-RateLimit-Reset is the Unix timestamp, in seconds, at which the current window resets.
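Reading these headers into a small structure makes it easy to decide when to pause. The sketch below takes a plain mapping of header names to string values, so it does not depend on any particular HTTP client. RateLimitState and parse_rate_limit_headers are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp, seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive; normalise before lookup.
    lower = {k.lower(): v for k, v in headers.items()}
    return RateLimitState(
        limit=int(lower["x-ratelimit-limit"]),
        remaining=int(lower["x-ratelimit-remaining"]),
        reset_at=int(lower["x-ratelimit-reset"]),
    )


state = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "847",
    "X-RateLimit-Reset": "1717878000",
})
print(state.remaining)  # 847
```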
What 429 looks like
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute limit exceeded. Retry in 12 seconds.",
    "retry_after": 12
  }
}
```

All three SDKs honour Retry-After and back off automatically. You only need to handle it explicitly if you disable retries.
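If you disable SDK retries, the loop you need is short. Below is a minimal sketch. send is a hypothetical zero-argument callable returning (status, headers, body), standing in for whatever HTTP client you use. The sketch reads Retry-After as whole seconds, as in the 429 example. HTTP also permits an HTTP-date in that header; this sketch does not parse that form and falls back to the body's error.retry_after field instead.

```python
import time
from typing import Callable, Mapping, Tuple

# (status, headers, body): keeps the sketch independent of any HTTP library.
Response = Tuple[int, Mapping[str, str], dict]


def retry_delay(headers: Mapping[str, str], body: dict, default: float = 1.0) -> float:
    """Seconds to wait after a 429, preferring the Retry-After header."""
    lower = {k.lower(): v for k, v in headers.items()}
    value = lower.get("retry-after")
    if value is not None and value.strip().isdigit():
        return float(value)
    # Fall back to the JSON body's error.retry_after, then to a default.
    return float(body.get("error", {}).get("retry_after", default))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, headers, body = send()
    attempts = 1
    while status == 429 and attempts < max_attempts:
        sleep(retry_delay(headers, body))
        status, headers, body = send()
        attempts += 1
    return status, headers, body
```

The sleep parameter is injectable so the loop can be tested without waiting.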
Burst tolerance
We use a token bucket with 2× burst capacity. On the Startup plan (1,000/min), the bucket holds 2,000 tokens, which is useful for warming up a batch job. Sustained throughput is still capped at the plan's per-minute limit.
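The sketch below works through the Startup example. The bucket starts full at 2× the per-minute rate and refills continuously at the per-minute rate. TokenBucket is a hypothetical name. It models the documented behaviour from the client side and is not the service's implementation; for instance, the server's refill granularity is not documented.

```python
class TokenBucket:
    """Refills at rate_per_minute and holds up to burst_factor x that rate."""

    def __init__(self, rate_per_minute: float, burst_factor: float = 2.0,
                 now: float = 0.0) -> None:
        self.rate_per_minute = rate_per_minute
        self.capacity = rate_per_minute * burst_factor
        self.tokens = self.capacity  # start full, so a burst is available
        self.last = now

    def try_take(self, now: float, n: float = 1.0) -> bool:
        # Refill for the elapsed seconds, never exceeding capacity.
        refill = (now - self.last) * self.rate_per_minute / 60.0
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


bucket = TokenBucket(rate_per_minute=1000)
print(sum(bucket.try_take(0.0) for _ in range(2500)))   # 2000: the burst
print(sum(bucket.try_take(60.0) for _ in range(1500)))  # 1000: one minute of refill
```

After the initial burst drains, the bucket admits only what refills, which is why sustained throughput stays at the per-minute limit.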
Self-imposed throttling
For batch workloads, prefer a small concurrency limit (p-limit in Node,
asyncio.Semaphore in Python) over hitting 429 and retrying. It reduces
tail latency and is friendlier to your bill.
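A minimal Python sketch of that pattern with asyncio.Semaphore follows. run_bounded is a hypothetical helper, and fake_extract is a placeholder for whatever SDK call you make. Sizing concurrency to your plan's concurrent-upload limit from the table is one reasonable starting point, not a documented recommendation.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(items: Iterable[T], worker: Callable[[T], Awaitable[R]],
                      concurrency: int = 3) -> List[R]:
    """Run worker over items with at most `concurrency` calls in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(item: T) -> R:
        async with sem:
            return await worker(item)

    # gather returns results in input order, regardless of completion order.
    return list(await asyncio.gather(*(guarded(i) for i in items)))


async def fake_extract(doc_id: int) -> str:
    # Hypothetical stand-in for an SDK request.
    await asyncio.sleep(0.01)
    return f"doc-{doc_id}"


if __name__ == "__main__":
    results = asyncio.run(run_bounded(range(10), fake_extract, concurrency=3))
    print(results[:3])  # ['doc-0', 'doc-1', 'doc-2']
```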