Rate limits

Limits are enforced on two axes simultaneously: per API key and per source IP. The lower of the two applies.

Per-plan limits

Plan	`POST /v1/extract`	Other endpoints	Concurrent uploads
Free	50 / month	100 / minute	1
Indie	1,000 / month	200 / minute	3
Startup	25,000 / month	1,000 / minute	10
Scale	250,000 / month	5,000 / minute	50
Enterprise	custom	custom	custom

The extract quota is metered per calendar month in UTC. Other endpoints are bucketed in 60-second sliding windows.
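If you want to show users when their extract quota renews, the calendar-month rule can be sketched as below. This is a minimal illustration, assuming the quota renews at 00:00 UTC on the first day of the next month; `next_monthly_reset` is a hypothetical helper, not part of any SDK.

```python
from datetime import datetime, timezone


def next_monthly_reset(now: datetime) -> datetime:
    """Return the start of the next calendar month in UTC.

    Assumption: the monthly extract quota renews at 00:00 UTC on the 1st.
    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)
```

Call it with `datetime.now(timezone.utc)` to get the upcoming renewal time for the current month.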

Response headers

Every response carries:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1717878000

X-RateLimit-Reset is the Unix timestamp, in seconds, at which the current window resets.
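A client can read these headers to decide whether to pause before the next call. The sketch below assumes the headers arrive as a plain dict of strings and that the reset value is in seconds, as the example value suggests; `seconds_until_reset` and `should_pause` are hypothetical helper names.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds remaining until the window in X-RateLimit-Reset rolls over.

    Assumption: the header holds a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0, reset_at - now)


def should_pause(headers, min_remaining=1):
    """True when fewer than `min_remaining` requests are left in the window."""
    return int(headers["X-RateLimit-Remaining"]) < min_remaining
```

When `should_pause` returns true, sleeping for `seconds_until_reset(headers)` avoids sending a request that would be rejected.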

What 429 looks like

HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json
 
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute limit exceeded. Retry in 12 seconds.",
    "retry_after": 12
  }
}

All three SDKs honour Retry-After and back off automatically. You only need to handle it explicitly if you disable retries.
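If you do disable SDK retries, handling a 429 yourself might look like the sketch below. It is not the SDKs' implementation. It assumes a hypothetical `send` callable that returns `(status, headers, body_text)`, and that Retry-After is given in seconds as in the example above (HTTP also allows a date there, which this sketch does not parse).

```python
import json
import time


def retry_delay(headers, body, default=1.0):
    """Pick a wait time: Retry-After header first, then the JSON body field."""
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return float(value)
        except ValueError:
            pass  # not a number of seconds; fall through to the body
    try:
        return float(json.loads(body)["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        return default  # assumed fallback, not specified by the API


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        sleep(retry_delay(headers, body))
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.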

Burst tolerance

We use a token bucket with 2× burst capacity. For Startup (1,000/min), the bucket holds 2,000 tokens, which is useful for warming up a batch job. Sustained throughput is still capped at the plan limit.
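To make the burst arithmetic concrete, here is a small client-side model of a token bucket. It illustrates the behaviour described above and is not the server's implementation; the class name, the full initial bucket, and the continuous refill are all assumptions.

```python
class TokenBucket:
    """Toy token bucket: capacity is the per-minute rate times a burst factor."""

    def __init__(self, rate_per_minute, burst_multiplier=2):
        self.rate_per_minute = rate_per_minute
        self.capacity = rate_per_minute * burst_multiplier
        self.tokens = float(self.capacity)  # assumption: bucket starts full
        self.last = 0.0

    def allow(self, now):
        """Refill for the time elapsed since the last call, then spend one token."""
        elapsed = now - self.last
        refill = elapsed * self.rate_per_minute / 60.0
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(1000)`, a burst of 2,500 calls at the same instant gets 2,000 through. A minute later only 1,000 tokens have been refilled, so sustained throughput stays at the plan rate.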

Self-imposed throttling

For batch workloads, prefer a small concurrency limit (p-limit in Node, asyncio.Semaphore in Python) over hitting 429 and retrying. It reduces tail latency and is friendlier to your bill.
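In Python, the concurrency cap can be sketched with `asyncio.Semaphore` as follows. `run_limited` and the `worker` coroutine are hypothetical names; `worker` stands in for whatever call you make per item, and the default of 3 simply mirrors the Indie plan's concurrent-upload limit.

```python
import asyncio


async def run_limited(items, worker, max_concurrency=3):
    """Run worker(item) for every item, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # Wait for a free slot before starting the call
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(guarded(item) for item in items))
```

Run it with `asyncio.run(run_limited(files, upload))`, where `upload` is your own coroutine. Setting `max_concurrency` at or below your plan's concurrent-upload limit keeps the batch from tripping the limiter in the first place.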
