Errors & retries
Tomoul's error responses match the OpenAI error schema. The right retry strategy depends on the status code.
Error shape
{
"error": {
"type": "rate_limit_error",
"message": "Rate limit exceeded for model microsoft/phi-4",
"code": "rpm_exceeded",
"param": null
}
}
Status codes
| Code | Meaning | Retry? |
|---|---|---|
400 | Bad request (malformed JSON, invalid param) | No — fix the request. |
401 | Invalid or missing API key | No — fix the key. |
403 | Key revoked or scope insufficient | No. |
404 | Unknown model or endpoint | No. |
409 | Model not available in pinned region | No — change region or remove pin. |
422 | Validation error (e.g. context too long) | No. |
429 | Rate limited | Yes — honour Retry-After. |
499 | Client cancelled | No. |
500 | Internal error | Yes — bounded backoff. |
503 | We don't return this for capacity. If you see it, something is genuinely broken. | Yes — alert your team. |
504 | Upstream timeout | Yes. |
Retry strategy
Exponential backoff with jitter, starting at 250 ms, capped at 8 s, max 5
attempts. Retry only on 429, 500, 502, 503, 504. The official
OpenAI SDK does this for free — if you're using it, you don't need to write
the loop.
Idempotency
Send Idempotency-Key: <uuid> on POST requests to make them safe to retry.
Tomoul deduplicates inside a 24-hour window. The first response wins;
subsequent retries with the same key get the same response, no extra
billing.
curl https://api.tomoul.ai/v1/chat/completions \
-H "Authorization: Bearer $TOMOUL_KEY" \
-H "Idempotency-Key: $(uuidgen)" \
-H "Content-Type: application/json" \
-d '{ "model": "microsoft/phi-4", "messages": [...] }'
Request IDs
Every response has an X-Request-ID header. Quote it when you contact
support — it's the fastest way for us to find your call in our logs.
Stash X-Request-ID alongside every API call in your application logs.
When a customer reports a bad answer six weeks later, you'll be glad you
did.