Quickstart
Get your first transcription running in under 5 minutes.
Create an API key
Go to the API Keys page in your dashboard and create a new key. Copy it — you'll only see it once.
Your key looks like: sk_live_12ab34cd...
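Because the key is shown only once, one common approach is to keep it out of source code and read it from the environment. The sketch below assumes an environment variable named TRANSCRIBE_API_KEY; that name and the auth_headers helper are hypothetical, chosen for this example rather than defined by the service.

```python
import os


def auth_headers(env_var: str = "TRANSCRIBE_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The variable name is an assumption for this sketch; use whatever
    name fits your deployment.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your API key before running")
    # The key is sent as-is in the Authorization header, matching the
    # request examples in this guide.
    return {"Authorization": api_key}
```

The returned dictionary can be passed as the headers argument of each request in place of a hard-coded key.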
Add balance
Transcription costs $0.20/hour of audio, billed per second. Add funds on the Billing page. Minimum deposit is $5.00 (~25 hours of audio).
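The pricing arithmetic can be sketched as follows. This is plain arithmetic on the quoted rate; how the service rounds individual charges is not stated here, so treat the result as an estimate. The function names are illustrative only.

```python
RATE_CENTS_PER_HOUR = 20  # $0.20 per hour of audio, as quoted above


def estimate_cost_cents(audio_seconds: float) -> float:
    """Estimate the charge in cents for a clip, billed per second."""
    return audio_seconds * RATE_CENTS_PER_HOUR / 3600


def hours_for_deposit(deposit_dollars: float) -> float:
    """Hours of audio that a deposit covers at the quoted rate."""
    return deposit_dollars * 100 / RATE_CENTS_PER_HOUR


print(round(estimate_cost_cents(120), 4))  # a 2-minute clip
print(hours_for_deposit(5.00))             # the minimum deposit
```

A 120-second clip works out to roughly 0.6667 cents, and the $5.00 minimum deposit covers 25 hours.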
Transcribe
Send a URL and we'll download, validate, and transcribe the file. Direct file links and Google Drive, Dropbox, and OneDrive sharing links are supported.
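    import requests

    BASE = "https://api.example.com"  # placeholder; use the base URL from the API Reference
    API_KEY = "sk_live_12ab34cd..."   # your key from the API Keys page

    # Submit a transcription job for a publicly reachable audio URL
    resp = requests.post(
        f"{BASE}/transcribe",
        headers={"Authorization": API_KEY},
        json={"url": "https://example.com/podcast.mp3", "diarize": True},
    )
    job = resp.json()
    print(f"Transcription ID: {job['id']}")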
You'll get back a transcription ID immediately. The file downloads and processes in the background.
Optionally enable speaker diarization with diarize: true, and set webhook_url to receive automatic webhooks. If you know how many speakers are on the recording, pass speaker_count along with diarize; providing it noticeably improves accuracy compared with letting the model auto-detect. Word-level timestamps (align) are on by default; pass align: false to opt out. When diarize is true, alignment is always enabled regardless of what you pass, because speaker assignment requires it.
Pass language when you know it. If you omit it, the model auto-detects, which usually works but can misclassify short clips, code-switched audio, or files that start with music. Passing the ISO code (e.g. "language": "en") is both faster and more accurate. See the language parameter notes for details.
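To show how these options combine, here is a minimal sketch of a helper that assembles the request body. The helper build_transcribe_payload is hypothetical; only the field names (url, language, diarize, speaker_count, align, webhook_url) come from this guide. The check that rejects speaker_count without diarize is a client-side choice made for this sketch, not documented server behavior.

```python
from typing import Optional


def build_transcribe_payload(
    url: str,
    *,
    language: Optional[str] = None,
    diarize: bool = False,
    speaker_count: Optional[int] = None,
    align: Optional[bool] = None,
    webhook_url: Optional[str] = None,
) -> dict:
    """Assemble the JSON body for a transcription request (illustrative)."""
    if speaker_count is not None and not diarize:
        # Guard chosen for this sketch: speaker_count is described as a
        # companion to diarize, so reject it when diarize is off.
        raise ValueError("speaker_count requires diarize=True")

    payload = {"url": url}
    if language is not None:
        payload["language"] = language
    if diarize:
        payload["diarize"] = True
        if speaker_count is not None:
            payload["speaker_count"] = speaker_count
    if align is not None and not diarize:
        # With diarize on, alignment is always enabled, so align is only
        # sent when it can have an effect.
        payload["align"] = align
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload
```

The result can be passed as the json argument of the POST request, for example build_transcribe_payload("https://example.com/podcast.mp3", language="en", diarize=True, speaker_count=2).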
Get the result
Poll until the status is completed or failed. Typical transcriptions complete in under a minute.
    import time

    # Check the job every 5 seconds until it reaches a terminal status
    while True:
        resp = requests.get(
            f"{BASE}/transcribe/{job['id']}",
            headers={"Authorization": API_KEY},
        )
        result = resp.json()
        if result["status"] == "completed":
            print(result["result"]["full_transcript"])
            break
        elif result["status"] == "failed":
            print("Error:", result["error"])
            break
        time.sleep(5)
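The loop above runs until the job reaches a terminal status. For unattended scripts, a bounded variant may be safer. The sketch below is one possible shape, not part of the API: wait_for_transcription is a hypothetical helper, and the 600-second default timeout is an arbitrary assumption. It takes the status request as a callable so it can be exercised without network access.

```python
import time
from typing import Callable


def wait_for_transcription(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job is completed or failed, or time runs out.

    fetch is any zero-argument callable that returns the parsed JSON
    status body, such as a small wrapper around the GET request.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result["status"] in ("completed", "failed"):
            return result
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("transcription did not finish in time")
        sleep(interval)


# Usage with the names from the snippets in this guide:
# result = wait_for_transcription(
#     lambda: requests.get(
#         f"{BASE}/transcribe/{job['id']}",
#         headers={"Authorization": API_KEY},
#     ).json()
# )
```

The caller still inspects result["status"] afterwards to tell a completed job from a failed one.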
The response includes the full transcript, timestamped utterances, word-level timestamps (alignment is on by default), and speaker labels when diarization is enabled:
    {
      "id": "txn-456",
      "status": "completed",
      "audio_duration_seconds": 120,
      "cost_cents": 0.6667,
      "result": {
        "full_transcript": "Hello, thanks for joining the call today...",
        "language": "en",
        "duration_seconds": 120,
        "speakers": ["SPEAKER 1", "SPEAKER 2"],
        "utterances": [
          {
            "start": 0.5,
            "end": 3.2,
            "text": "Hello, thanks for joining the call today.",
            "speaker": "SPEAKER 1",
            "confidence": 0.95,
            "words": [
              { "word": "Hello,", "start": 0.5, "end": 0.9, "confidence": 0.98, "speaker": "SPEAKER 1" },
              { "word": "thanks", "start": 1.0, "end": 1.3, "confidence": 0.97, "speaker": "SPEAKER 1" }
            ]
          }
        ]
      }
    }
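As an illustration of working with this structure, the sketch below turns the utterances into timestamped lines. The helper format_utterances is hypothetical, and it assumes that the speaker field is missing or null when diarization is off, which this guide does not spell out.

```python
def format_utterances(response: dict) -> list:
    """Render each utterance as '[start-end] speaker: text'."""
    lines = []
    for utt in response["result"]["utterances"]:
        # Speaker labels appear only with diarization; .get covers a
        # missing or null field (an assumption about that shape).
        speaker = utt.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{utt['start']:.1f}-{utt['end']:.1f}] {prefix}{utt['text']}")
    return lines


# Trimmed copy of the sample response, keeping only the fields used here
sample = {
    "result": {
        "utterances": [
            {
                "start": 0.5,
                "end": 3.2,
                "text": "Hello, thanks for joining the call today.",
                "speaker": "SPEAKER 1",
            }
        ]
    }
}

for line in format_utterances(sample):
    print(line)  # [0.5-3.2] SPEAKER 1: Hello, thanks for joining the call today.
```

The same loop can descend into each utterance's words list when word-level timing is needed.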
Need to upload your own files?
If you don't have a public URL, you can upload files directly using the file upload flow in the API Reference.