Authentication
Tomoul uses bearer-token authentication, the same scheme as the OpenAI API. Every request to `api.tomoul.ai/v1` must carry an `Authorization: Bearer <key>` header.
Get a key
Create a key from the console. Each key is scoped to one organization and one set of models. Tomoul shows the full key only once, so copy it right away and store it in a secrets manager.
Tomoul keys start with `tomoul_sk_`. If yours starts with `sk-`, that's an
OpenAI key in the wrong env var.
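The prefix rule above can be checked at startup, before any request goes out. The following is a minimal sketch, not an official client feature: the helper names `check_key` and `load_key` are hypothetical, and the only facts taken from this page are the `tomoul_sk_` and `sk-` prefixes and the `TOMOUL_KEY` variable name.

```python
import os

TOMOUL_PREFIX = "tomoul_sk_"  # prefix stated on this page
OPENAI_PREFIX = "sk-"         # prefix this page identifies as an OpenAI key


def check_key(key: str) -> str:
    """Return the key unchanged if it has the Tomoul prefix, else raise ValueError."""
    if key.startswith(TOMOUL_PREFIX):
        return key
    if key.startswith(OPENAI_PREFIX):
        raise ValueError("this looks like an OpenAI key (sk-...), not a Tomoul key")
    raise ValueError(f"expected a key starting with {TOMOUL_PREFIX!r}")


def load_key(env_var: str = "TOMOUL_KEY") -> str:
    """Read the key from the environment and validate its prefix (hypothetical helper)."""
    key = os.environ.get(env_var, "")
    if not key:
        raise ValueError(f"{env_var} is not set")
    return check_key(key)
```

Failing fast on a wrong prefix tends to give a clearer error than the authentication failure the API would otherwise return.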
Make a request
Set `TOMOUL_KEY` in your environment and call any endpoint with the bearer
header. The example below hits chat completions; the same header works for
embeddings, rerank, and audio.
    curl https://api.tomoul.ai/v1/chat/completions \
      -H "Authorization: Bearer $TOMOUL_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "tomoul/inkubalm-0.4b",
        "messages": [{"role": "user", "content": "Habari"}]
      }'
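For readers not using curl, the same request can be assembled with Python's standard library. This is a sketch under assumptions: `build_request` is a hypothetical helper, the key shown is a placeholder, and the block only builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.tomoul.ai/v1"  # base URL from this page


def build_request(path: str, payload: dict, key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the bearer header."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/chat/completions",
    {
        "model": "tomoul/inkubalm-0.4b",
        "messages": [{"role": "user", "content": "Habari"}],
    },
    key="tomoul_sk_example",  # placeholder, not a real key
)
# To send it for real: urllib.request.urlopen(req)
```

Because the header is built in one place, pointing the helper at another endpoint should only require a different `path` and payload.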
SDK swap from OpenAI
If you already use the OpenAI SDK, only the base URL and key change.
    import os

    from openai import OpenAI

    # Read the key from the TOMOUL_KEY environment variable.
    client = OpenAI(
        api_key=os.environ["TOMOUL_KEY"],
        base_url="https://api.tomoul.ai/v1",
    )

    resp = client.chat.completions.create(
        model="tomoul/phi-4",
        messages=[{"role": "user", "content": "Hi"}],
    )
Rotating keys
You can rotate keys at any time from the console. An old key keeps working for up to 24 hours after you mark it stale, which leaves time to redeploy. If you suspect a compromise, hard-revoke the key with one click.
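When planning a rotation, it can help to compute the latest moment a stale key might still be accepted. The sketch below assumes you record the time you marked the key stale yourself; the function names are hypothetical, and since the window is "up to" 24 hours, the result is an upper bound rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE = timedelta(hours=24)  # upper bound stated on this page; a key may stop earlier


def redeploy_deadline(marked_stale_at: datetime) -> datetime:
    """Latest time a stale key may still work; aim to redeploy well before this."""
    return marked_stale_at + GRACE


def is_past_grace(marked_stale_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the 24-hour window has fully elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= redeploy_deadline(marked_stale_at)
```

Use timezone-aware datetimes throughout, since Python refuses to compare aware and naive values.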
Per-key rate limits
Each key has its own rate-limit envelope, defaulting to your plan's ceiling.
Buckets are per-key, per-region, and per-model: a noisy key on
`eu-helsinki1` doesn't back-pressure another key on `eu-frankfurt1`.
Responses include `X-RateLimit-Remaining` and `X-RateLimit-Reset`. A throttled
request returns a 429 with `Retry-After`, never a 503.
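A client can honor the throttle signal by waiting for the `Retry-After` interval before retrying. The following is a minimal sketch, not Tomoul's recommended client: `retry_delay` and `call_with_retry` are hypothetical names, `send` stands in for whatever function performs the request and returns `(status, headers, body)`, and `Retry-After` is assumed to hold a number of seconds (anything else falls back to a default delay).

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def retry_delay(headers: Mapping[str, str], default: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying, read from Retry-After.

    Assumes the header is a number of seconds and that the mapping uses this
    exact header casing; unparseable or missing values fall back to `default`.
    """
    value = headers.get("Retry-After")
    try:
        return min(cap, max(0.0, float(value)))
    except (TypeError, ValueError):
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns something other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        status, headers, _body = response
        if status != 429 or attempt == max_attempts:
            return response
        sleep(retry_delay(headers))
    return response  # not reached; the loop always returns on its final attempt
```

Since buckets are per-key, per-region, and per-model, a retry loop like this only needs to back off the specific combination that was throttled.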