The Tomoul CLI

A single static binary. Run open-weights models on your laptop with an OpenAI-compatible API — chat, embeddings, and speech in one process. Same model catalog as `api.tomoul.ai`. Same engine.

Install

brew install tomoul                              # macOS
scoop install tomoul                             # Windows
curl -fsSL https://tomoul.ai/install.sh | sh     # Linux

Full install matrix: Installation.

The 60-second flow

$ tomoul serve inkubalm-0.4b
  Downloading inkubalm-0.4b (240 MB Q8_K)...
  Loaded. Listening on http://127.0.0.1:8080 (OpenAI-compatible)
 
# In another shell:
$ curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"inkubalm-0.4b","messages":[{"role":"user","content":"Habari"}]}'

That's the entire UX target. Every other subcommand exists in service of this one flow.
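For scripting, the curl call above can be wrapped in a small shell function. This is a sketch, not part of the CLI: `json_escape`, `chat_payload`, and `tomoul_chat` are hypothetical helper names. The escaping covers only backslashes and double quotes, so prompts with newlines or control characters need a real JSON encoder. It assumes the server started by `tomoul serve` is listening on 127.0.0.1:8080.

```shell
# Hypothetical helpers (not part of the Tomoul CLI).

# Escape backslashes and double quotes so text can sit inside a JSON string.
# Newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print a chat-completions request body holding one user message.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# POST the body to the local server and print the raw JSON response.
tomoul_chat() {
  curl -fsS http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}
```

With the server running, `tomoul_chat inkubalm-0.4b "Habari"` sends the same request as the curl command above.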

Subcommands

| Command | What it does |
| --- | --- |
| `tomoul serve <model>` | Resolve → download → load → start an OpenAI-compat server. |
| `tomoul run <model> -i "..."` | One-shot generation, no server. Pipe-friendly. |
| `tomoul pull <model>` | Download + cache. Don't serve. |
| `tomoul models` | List cached and available catalog models. |
| `tomoul rm <model>` | Free disk. |
| `tomoul doctor` | Diagnose GPU, drivers, RAM, network. |
| `tomoul auth login` | Log in to bridge to the cloud. |
| `tomoul serve <model> --cloud` | Same UX, but inference runs on api.tomoul.ai. |
| `tomoul version` | Print the version. |
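The subcommands compose in scripts. As a sketch, a batch job might cache a model first and then run a one-shot prompt with no server. `generate_to_file` is a hypothetical helper, and the redirect assumes `tomoul run` writes the generated text to stdout, which "pipe-friendly" suggests but this page does not spell out.

```shell
# Hypothetical helper: cache a model, then run one prompt and save the output.
# Assumes `tomoul run` writes the generated text to stdout.
generate_to_file() {
  model=$1
  prompt=$2
  outfile=$3
  tomoul pull "$model" || return 1                # download + cache, no server
  tomoul run "$model" -i "$prompt" > "$outfile"   # one-shot generation
}
```

Pulling first keeps the download step separate from generation, so a network failure surfaces before any output file is written.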

How it compares to Ollama

Ollama excels at one thing: a curated LLM catalog with a great install experience. Tomoul CLI matches Ollama on install and wins in three areas Ollama doesn't cover:

| Dimension | Ollama | Tomoul |
| --- | --- | --- |
| Static binary, no runtime | Go runtime | Zig static |
| OpenAI-compat by default | Partial | Full |
| `/v1/audio/transcriptions` (Whisper) | No | Yes |
| `/v1/embeddings` | Yes | Preview |
| Browser + iOS/Android (via release artifacts) | No | Yes |
| LLM catalog breadth | 200+ models | ~10 launch, growing |
| Underserved / African models | No | Yes |

We don't match Ollama on LLM catalog breadth. We win on multimodal serving in one binary (chat, embeddings, speech), non-server platforms (browser, iOS, Android), and underserved models.
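The comparison lists `/v1/embeddings` and `/v1/audio/transcriptions` alongside chat. Assuming both follow the OpenAI request shapes (a JSON body with `model` and `input`, and a multipart form with `model` and `file`), calls might look like the sketch below. `tomoul_embed` and `tomoul_transcribe` are hypothetical helpers, and the model is passed as an argument because this page does not name an embedding or speech model.

```shell
# Base URL taken from the `tomoul serve` output; TOMOUL_URL is a hypothetical variable.
TOMOUL_URL=http://127.0.0.1:8080

# POST text to the embeddings endpoint (assumed OpenAI request shape).
# No JSON escaping here: keep the input free of quotes and backslashes.
tomoul_embed() {
  curl -fsS "$TOMOUL_URL/v1/embeddings" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"$1\",\"input\":\"$2\"}"
}

# Upload an audio file as multipart form data (assumed OpenAI request shape).
tomoul_transcribe() {
  curl -fsS "$TOMOUL_URL/v1/audio/transcriptions" \
    -F "model=$1" \
    -F "file=@$2"
}
```

Usage would be `tomoul_embed <model> "Habari"` and `tomoul_transcribe <model> clip.wav`, with `<model>` replaced by a name from `tomoul models`.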

Bridging to the cloud

Add --cloud to any serve or run invocation and the same command routes to api.tomoul.ai instead of your local GPU. Your app keeps calling localhost:8080; the CLI proxies. See --cloud bridge.
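Because clients keep the same URL, the choice between local and cloud inference can be confined to how the server is started. A minimal sketch, where `start_tomoul` and its `local`/`cloud` argument are hypothetical and `tomoul auth login` is assumed to have been run already:

```shell
# Hypothetical wrapper: pick local or cloud inference at start-up.
# Clients call http://127.0.0.1:8080 in both cases.
start_tomoul() {
  model=$1
  mode=${2:-local}
  case $mode in
    cloud) tomoul serve "$model" --cloud ;;   # inference on api.tomoul.ai, proxied by the CLI
    local) tomoul serve "$model" ;;           # inference on this machine
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```

For example, `start_tomoul inkubalm-0.4b cloud` starts the bridged server, and application code pointed at localhost needs no change.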

Last updated 13 May 2026