Audio transcription
`POST /v1/audio/transcriptions` — speech to text. OpenAI-Whisper-compatible. Multilingual, with explicit support for Swahili, Hausa, Yoruba, Amharic.
Endpoint
POST https://api.tomoul.ai/v1/audio/transcriptions
Multipart form upload.
Request body (multipart)
| Field | Type | Notes |
|---|---|---|
file | binary | mp3, wav, flac, m4a, ogg, webm. Max 25 MB. |
model | string | e.g. openai/whisper-large-v3 |
language | string (opt.) | ISO 639-1 code, e.g. sw, ha, yo |
response_format | enum (opt.) | json (default), text, srt, vtt, verbose_json |
temperature | float (opt.) | 0–1 |
prompt | string (opt.) | Vocabulary hints |
curl https://api.tomoul.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $TOMOUL_KEY" \
-F file=@meeting.mp3 \
-F model=openai/whisper-large-v3 \
-F language=sw
Response
Default json:
{ "text": "Habari za asubuhi, karibu Tomoul." }
Models
| Slug | Notes |
|---|---|
openai/whisper-large-v3 | Default. Strong multilingual. |
openai/whisper-large-v3-turbo | 4× faster, slightly lower quality. |
nvidia/parakeet-tdt-1.1b | English-only, fastest. |
Language hints
Whisper auto-detects, but the language hint cuts cost and improves
accuracy on short clips. Day-1 supported codes include en, fr, ar,
pt, sw, ha, yo, am, zu, xh, so.
Output formats
json— default. Just{ "text": "..." }.text— raw text, no JSON wrapping.srt/vtt— subtitle files with timestamps.verbose_json— word-level timestamps and confidence scores.
Last updated 13 May 2026Edit this page on GitHub