Migrating from OpenAI

Three minutes. Two changes. Most OpenAI codebases run against Tomoul after a base-URL swap and a key swap. Here's the exact diff.

The two-line swap

Python — set the base URL, swap the key, ship.

# Before
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After
from openai import OpenAI
client = OpenAI(
  api_key=os.environ["TOMOUL_KEY"],
  base_url="https://api.tomoul.ai/v1",
)

Everything else — chat.completions.create, embeddings.create, streaming, function calling, JSON mode — works unchanged.

Picking a Tomoul model

There is no gpt-4o. Map by use case:

OpenAI	Tomoul equivalent	Notes
`gpt-4o-mini`	`microsoft/phi-4`	Strong reasoning, 14B, much cheaper.
`gpt-4o`	`qwen/qwen3-30b-a3b` or `openai/gpt-oss-120b`	Larger context, top-20 quality.
`text-embedding-3-small`	`baai/bge-m3`	Multilingual, cheaper.
`text-embedding-3-large`	`intfloat/e5-mistral-7b-instruct`	Premium quality.
`whisper-1`	`openai/whisper-large-v3`	Same model, in EU.

The live catalog — with pricing, regions, and capability flags — is at GET /v1/models.

Behavioural differences

A handful of quirks to budget for:

No gpt- prefix. Models use provider/model slugs.
seed is deterministic only on Tomoul-exclusive models. Third-party models route on best-effort.
Token counts differ. Different tokenizers — plan for ±15% drift versus your current OpenAI bill on the same prompts.
Rate-limit headers match OpenAI's shape. See Rate limits.
Streaming is identical — Server-Sent Events, data: [DONE] terminator.

Heads-up.

Run your existing test suite against the Tomoul base URL on a feature branch before flipping production. Most teams find one or two spots that hard-coded an OpenAI-only model name.

What we don't do

A short list to plan migrations around:

No Assistants / Threads API. Stateful assistants aren't on the roadmap. Build state in your app, or use Files for blob storage.
No Realtime API. Voice / realtime is a separate roadmap item — not at launch.
No image generation. Use the audio transcription endpoint or wait for Phase 2.

Everything else — chat, embeddings, function calling, JSON mode, tools — works the same. The full list is in SDKs & clients.

← Previous

SDKs

Streaming

Last updated 13 May 2026Edit this page on GitHub