Models & deployments

Tomoul hosts open-weights models. Every model is identified by a `provider/model-name` slug — the same string you pass as `model` in any API call.

Model slugs

Tomoul model slugs have the form provider/model-name[-variant]:

baai/bge-m3
tomoul/inkubalm-0.4b
microsoft/phi-4
openai/gpt-oss-20b

The provider segment matches the model's origin (BAAI, Microsoft, Qwen) — except for Tomoul exclusives, which use the tomoul/ prefix. Use GET /v1/models to list every slug live.

The catalog

The Day-1 catalog spans three tiers:

TierExamplesWhy it's here
Embeddings & rerankersbaai/bge-m3, baai/bge-reranker-v2-m3, alibaba/gte-qwen2-1.5bUnder-served on OpenRouter; first-class here.
Trending small/mid LLMsmicrosoft/phi-4, openai/gpt-oss-20b, qwen/qwen3-30b-a3bTop-20-adjacent. Sparse providers.
Tomoul exclusivestomoul/inkubalm-0.4b, partner-hosted African-language models1-of-1 hosting. The moat.

Full live catalog: models page.

Shared vs dedicated

  • Shared. Default. Pay per token. Cold-start handled for you. Best for most workloads.
  • Dedicated (Phase 2). Reserved capacity, hourly billing, your own pod. Use when you need predictable latency or quantization-specific routing.

At launch, every endpoint runs on shared infra. Dedicated is on the roadmap — not yet bookable.

Versions and pinning

Each model has a base slug (microsoft/phi-4) and an optional version pin (microsoft/phi-4@2025-12-01). The base slug always points at our recommended version. Pin a version when you need bit-stable outputs across deploys.

curl https://api.tomoul.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOMOUL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/phi-4@2025-12-01",
    "messages": [...]
  }'

Lifecycle and deprecation

We give 90 days' notice before retiring any version. Deprecations show up in two places:

  • GET /v1/models includes a deprecated_at field on the affected version.
  • Calls to a deprecated model return a Tomoul-Deprecated response header.

Wire the header into your monitoring — by the time you notice the 90-day clock has run out, it's already running.

Last updated 13 May 2026Edit this page on GitHub