Migrating from OpenAI
Three minutes. Two changes. Most OpenAI codebases run against Tomoul after a base-URL swap and a key swap. Here's the exact diff.
The two-line swap
Python — set the base URL, swap the key, ship.
# Before from openai import OpenAI client = OpenAI(api_key=os.environ["OPENAI_API_KEY"]) # After from openai import OpenAI client = OpenAI( api_key=os.environ["TOMOUL_KEY"], base_url="https://api.tomoul.ai/v1", )
Everything else — chat.completions.create, embeddings.create, streaming,
function calling, JSON mode — works unchanged.
Picking a Tomoul model
There is no gpt-4o. Map by use case:
| OpenAI | Tomoul equivalent | Notes |
|---|---|---|
gpt-4o-mini | microsoft/phi-4 | Strong reasoning, 14B, much cheaper. |
gpt-4o | qwen/qwen3-30b-a3b or openai/gpt-oss-120b | Larger context, top-20 quality. |
text-embedding-3-small | baai/bge-m3 | Multilingual, cheaper. |
text-embedding-3-large | intfloat/e5-mistral-7b-instruct | Premium quality. |
whisper-1 | openai/whisper-large-v3 | Same model, in EU. |
The live catalog — with pricing, regions, and capability flags — is at
GET /v1/models.
Behavioural differences
A handful of quirks to budget for:
- No
gpt-prefix. Models useprovider/modelslugs. seedis deterministic only on Tomoul-exclusive models. Third-party models route on best-effort.- Token counts differ. Different tokenizers — plan for ±15% drift versus your current OpenAI bill on the same prompts.
- Rate-limit headers match OpenAI's shape. See Rate limits.
- Streaming is identical — Server-Sent Events,
data: [DONE]terminator.
Run your existing test suite against the Tomoul base URL on a feature branch before flipping production. Most teams find one or two spots that hard-coded an OpenAI-only model name.
What we don't do
A short list to plan migrations around:
- No Assistants / Threads API. Stateful assistants aren't on the roadmap. Build state in your app, or use Files for blob storage.
- No Realtime API. Voice / realtime is a separate roadmap item — not at launch.
- No image generation. Use the audio transcription endpoint or wait for Phase 2.
Everything else — chat, embeddings, function calling, JSON mode, tools — works the same. The full list is in SDKs & clients.