Pricing & metering
Per-token billing. No subscriptions, no minimums, no hidden egress. You pay for what your requests actually consume.
What we meter
- Chat completions — input tokens and output tokens, priced independently.
- Embeddings — input tokens only.
- Rerank — query tokens + document tokens (combined).
- Audio transcription — audio seconds.
Every response carries a usage object with the exact counts that hit your
invoice — reconcile against your own metering at any time.
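For reconciliation, one approach is to sum the usage objects you receive and compare the totals with your invoice. The sketch below assumes each usage object is a flat mapping of counter names to integers. Only cache_read_tokens is named in this document; prompt_tokens and completion_tokens are hypothetical field names used for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def total_usage(usages: Iterable[Mapping[str, int]]) -> Counter:
    """Sum token counts across the usage objects of many responses."""
    totals: Counter = Counter()
    for usage in usages:
        for field, count in usage.items():
            totals[field] += count
    return totals


# Sample responses with made-up counts. Field names other than
# cache_read_tokens are assumptions, not confirmed API fields.
responses = [
    {"usage": {"prompt_tokens": 1200, "completion_tokens": 300, "cache_read_tokens": 0}},
    {"usage": {"prompt_tokens": 1200, "completion_tokens": 250, "cache_read_tokens": 1000}},
]

totals = total_usage(r["usage"] for r in responses)
print(dict(totals))
```

Storing these totals per billing period gives you a figure to check against the monthly invoice.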
Where to find rates
Live rates are on the pricing page. They are also available programmatically
from GET /v1/models: every entry includes a pricing object with
prompt_per_million, completion_per_million, and
cache_read_per_million.
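As a rough illustration of how those fields could be used, the sketch below estimates the cost of a request from a pricing object. It assumes the values are USD per one million tokens and may arrive as numbers or strings. The rates shown are placeholders, not real prices.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(pricing, prompt_tokens, completion_tokens):
    """Estimate request cost from a pricing object (assumed per-million rates)."""
    prompt_rate = Decimal(str(pricing["prompt_per_million"]))
    completion_rate = Decimal(str(pricing["completion_per_million"]))
    total = prompt_rate * prompt_tokens + completion_rate * completion_tokens
    return total / TOKENS_PER_MILLION


# Placeholder rates for illustration only; read real values from /v1/models.
example_pricing = {"prompt_per_million": "0.50", "completion_per_million": "1.50"}

cost = estimate_cost(example_pricing, prompt_tokens=2_000_000, completion_tokens=500_000)
print(cost)
```

Decimal is used instead of float so that small per-token amounts do not pick up rounding error when summed over many requests.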
Prompt-cache pricing
When you send the same prefix twice within 5 minutes, the second call hits
our prompt cache. Cached tokens are billed at ~10% of normal input rates
and show up as a separate line on your invoice and in
usage.cache_read_tokens.
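To see what the discount means in practice, the sketch below computes the difference between billing a set of tokens at the normal input rate and at the cache-read rate. It assumes both rates come from the pricing object on /v1/models and are expressed per one million tokens. The numbers are placeholders chosen so the cached rate is 10% of the input rate.

```python
from decimal import Decimal


def cache_savings(pricing, cached_tokens):
    """Amount saved when cached_tokens are billed at the cache-read rate."""
    full_rate = Decimal(str(pricing["prompt_per_million"]))
    cached_rate = Decimal(str(pricing["cache_read_per_million"]))
    return (full_rate - cached_rate) * cached_tokens / Decimal(1_000_000)


# Placeholder rates for illustration only, not real prices.
cache_pricing = {"prompt_per_million": "0.50", "cache_read_per_million": "0.05"}

saved = cache_savings(cache_pricing, cached_tokens=1_000_000)
print(saved)
```

Feeding in the summed usage.cache_read_tokens for a billing period gives an estimate of how much the cache reduced that period's input charges.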
Caching is on by default, so you don't opt in. To opt out for a specific
call, send the header X-Tomoul-Cache: no-store.
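A minimal sketch of the per-call opt-out: a helper that builds request headers and adds X-Tomoul-Cache: no-store only when caching should be skipped. The Bearer authorization scheme and the helper name are assumptions for illustration. Only the X-Tomoul-Cache header comes from this page.

```python
def build_headers(api_key, use_cache=True):
    """Build request headers, opting out of the prompt cache when asked."""
    headers = {
        # Assumed auth scheme; check the authentication docs for the real one.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if not use_cache:
        # Documented opt-out header for a single call.
        headers["X-Tomoul-Cache"] = "no-store"
    return headers


default_headers = build_headers("sk-example")
no_cache_headers = build_headers("sk-example", use_cache=False)
print(no_cache_headers["X-Tomoul-Cache"])
```

Pass the resulting dictionary to whichever HTTP client you use. Calls made with the default headers remain eligible for the cache.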
Billing cycles
We bill on the first of each month for the prior month's usage. Auto top-up switches you to a prepaid balance instead, which is useful if your procurement team prefers credits.
Invoices ship as PDF and JSON. The JSON form is available via the
/v1/usage endpoint for programmatic reconciliation.
Currency
Invoices are in USD at launch. Local-currency display (NGN, KES, ZAR, EGP, GHS, EUR) ships with the local-rails wave (Flutterwave, Paystack, M-Pesa). Invoice currency and payment method are independent settings in the console.