--cloud bridge
Same CLI, same model slugs, same OpenAI-compat API — but routes to `api.tomoul.ai` instead of your local GPU. The local-to-cloud handoff is one flag.
--cloud is in beta. The exact flag and config shape may shift before GA.
Treat this page as the spec we're targeting, not a frozen contract.
Why use it
You wrote your app against tomoul serve on your laptop. Production needs
more GPU than your laptop has. With --cloud, the same localhost:8080
endpoint your app already calls becomes a proxy to api.tomoul.ai. Zero
code changes in the application.
Setup
tomoul auth login # opens browser, stores key in OS keychain
tomoul serve microsoft/phi-4 --cloud
# Listening on http://127.0.0.1:8080 (cloud bridge → api.tomoul.ai)
Usage
Identical to local serve. Your app keeps calling
http://127.0.0.1:8080/v1/.... The CLI translates requests to
https://api.tomoul.ai/v1/..., attaches your key, and streams the response
back over your local socket.
Auth
Use tomoul auth login (OAuth, stored in OS keychain) or
set TOMOUL_KEY in the environment.
Local fallback
Add --cloud=auto to prefer local when the model is cached and the GPU has
headroom, fall back to cloud otherwise. Useful for dev laptops that flip
between offline and online.
tomoul serve microsoft/phi-4 --cloud=auto
Pure --cloud always routes to the cloud; --cloud=auto is the hybrid.