The Tomoul CLI
A single static binary. Run open-weights models on your laptop with an OpenAI-compatible API — chat, embeddings, and speech in one process. Same model catalog as `api.tomoul.ai`. Same engine.
Install
brew install tomoul # macOS
scoop install tomoul # Windows
curl -fsSL https://tomoul.ai/install.sh | sh # Linux
Full install matrix: Installation.
The 60-second flow
$ tomoul serve inkubalm-0.4b
⬇ Downloading inkubalm-0.4b (240 MB Q8_K)...
✓ Loaded. Listening on http://127.0.0.1:8080 (OpenAI-compatible)
# In another shell:
$ curl http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"inkubalm-0.4b","messages":[{"role":"user","content":"Habari"}]}'
That's the entire UX target. Every other subcommand exists in service of this one flow.
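The same request can be made from application code. The following is a minimal sketch using only the Python standard library, not an official client. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI chat-completions shape (`choices[0].message.content`), which this page does not show.

```python
import json
import urllib.request

# Address printed by `tomoul serve` in the example above.
BASE_URL = "http://127.0.0.1:8080"


def build_chat_request(prompt, model="inkubalm-0.4b", base_url=BASE_URL):
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the reply text.

    Assumption: the response body follows the OpenAI chat-completions
    format, so the reply lives at choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the server from the previous step still running, `print(chat("Habari"))` should print the model's reply. Splitting request construction from sending keeps the payload easy to inspect without a live server.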
Subcommands
| Command | What it does |
|---|---|
| `tomoul serve <model>` | Resolve → download → load → start an OpenAI-compat server. |
| `tomoul run <model> -i "..."` | One-shot generation, no server. Pipe-friendly. |
| `tomoul pull <model>` | Download + cache. Don't serve. |
| `tomoul models` | List cached and available catalog models. |
| `tomoul rm <model>` | Free disk. |
| `tomoul doctor` | Diagnose GPU, drivers, RAM, network. |
| `tomoul auth login` | Log in to bridge to the cloud. |
| `tomoul serve <model> --cloud` | Same UX, but inference runs on api.tomoul.ai. |
| `tomoul version` | Print the version. |
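Because `tomoul run` is described as pipe-friendly, it can be driven from a script without starting a server. The sketch below is illustrative only: `build_run_command` and `generate` are hypothetical helper names, and it assumes the generated text is written to stdout and that failures produce a non-zero exit code, neither of which this page states.

```python
import subprocess


def build_run_command(model, prompt):
    """Argument list for `tomoul run <model> -i "..."` as listed in the table."""
    return ["tomoul", "run", model, "-i", prompt]


def generate(model, prompt):
    """Run a one-shot generation and return whatever the CLI prints.

    Assumptions: output goes to stdout, and a failed run exits non-zero
    (check=True then raises subprocess.CalledProcessError).
    """
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.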
How it compares to Ollama
Ollama is excellent at one thing — a curated LLM catalog with a great install experience. Tomoul CLI matches Ollama on install and wins on three dimensions Ollama doesn't serve:
| Dimension | Ollama | Tomoul |
|---|---|---|
| Static binary, no runtime | Go runtime | Zig static |
| OpenAI-compat by default | Partial | Full |
| `/v1/audio/transcriptions` (Whisper) | — | ✓ |
| `/v1/embeddings` | Preview | ✓ |
| Browser + iOS/Android (via release artifacts) | — | ✓ |
| LLM catalog breadth | 200+ models | ~10 launch, growing |
| Underserved / African models | — | ✓ |
We don't match Ollama on LLM catalog breadth. We win on three things: chat, embeddings, and speech in one binary; non-server platforms (browser, iOS, Android); and underserved models.
Bridging to the cloud
Add `--cloud` to any `serve` or `run` invocation and the same command routes to `api.tomoul.ai` instead of your local GPU. Your app keeps calling `localhost:8080`; the CLI proxies. See `--cloud` bridge.