Here’s what happens between your API request and the JSON you get back.

Lifecycle

1

You send a request

POST /v1/calls with a phone number (to), a prompt telling the AI what to do, optional context with background info, and a returns schema describing the data you want out of the call.
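As a rough illustration, the request could be assembled like this in Python. The field names to, prompt, context, and returns come from the description above; the base URL, the bearer-token header, and the choice to pass context as a plain string are placeholders assumed for this sketch, not documented values.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_create_call_request(api_key, to, prompt, context=None, returns=None):
    """Build (but do not send) a POST /v1/calls request."""
    body = {"to": to, "prompt": prompt}
    # context and returns are optional, so only include them when provided.
    if context is not None:
        body["context"] = context
    if returns is not None:
        body["returns"] = returns
    return urllib.request.Request(
        f"{BASE_URL}/v1/calls",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth; check the API reference.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_call_request(
    api_key="YOUR_API_KEY",
    to="+15555550123",
    prompt="Confirm tomorrow's 3pm appointment.",
    context="Patient: Jane Doe. Clinic: Main Street Dental.",
    returns={"confirmed": "boolean"},
)
# To actually place the call, send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect or log before any call is placed.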
2

CallingBox dials the number

The call goes out right away. If the line is busy, the call goes to voicemail, or nobody picks up, CallingBox detects that and updates the status. You’re billed only for time on calls that someone actually answers.
3

The AI has the conversation

Once the person picks up, a real-time voice loop kicks in:
  • Speech-to-text transcribes what the person says
  • The language model generates a response using your prompt and context
  • Text-to-speech speaks the reply
This continues until the goal is reached or the call wraps up on its own.
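The loop above can be pictured with a small Python sketch. This is a conceptual model only, not CallingBox's implementation: the three stages are passed in as plain functions, and run_voice_loop, is_done, and the transcript shape are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List


def run_voice_loop(
    caller_turns: Iterable[str],
    transcribe: Callable[[str], str],
    generate_reply: Callable[[List[Dict[str, str]]], str],
    speak: Callable[[str], None],
    is_done: Callable[[List[Dict[str, str]]], bool],
) -> List[Dict[str, str]]:
    """Run one transcribe -> generate -> speak cycle per caller turn."""
    transcript: List[Dict[str, str]] = []
    for audio in caller_turns:
        # 1. Speech-to-text: turn the caller's audio into text.
        text = transcribe(audio)
        transcript.append({"role": "caller", "text": text})
        # 2. Language model: produce a reply from the conversation so far.
        reply = generate_reply(transcript)
        # 3. Text-to-speech: speak the reply back to the caller.
        speak(reply)
        transcript.append({"role": "ai", "text": reply})
        # Stop once the goal is reached.
        if is_done(transcript):
            break
    return transcript


# Stand-in stages so the sketch runs without any audio or model.
spoken: List[str] = []
transcript = run_voice_loop(
    caller_turns=["hello?", "yes, 3pm works"],
    transcribe=lambda audio: audio,
    generate_reply=lambda t: (
        "Great, you're confirmed."
        if "yes" in t[-1]["text"]
        else "Hi, can you confirm 3pm tomorrow?"
    ),
    speak=spoken.append,
    is_done=lambda t: "you're confirmed" in t[-1]["text"],
)
```

In a real system each stage streams audio rather than handling whole turns, but the ordering of the three steps is the same.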
4

Data gets extracted

After the call ends, if you set a returns schema, CallingBox reads through the transcript and pulls out the fields you asked for. If you asked for {"confirmed": "boolean"}, you’d get back something like {"confirmed": true}.
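On the receiving side, it can help to check that the result matches the schema you sent before acting on it. The sketch below is one way to do that; only the "boolean" type name appears on this page, so the "string" and "number" entries are assumed type names, and mismatched_fields is a hypothetical helper.

```python
# Map schema type names to Python checks.
# Only "boolean" is shown in the docs above; the others are assumptions.
TYPE_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def mismatched_fields(returns, result):
    """Return the schema fields that are missing from result or have the wrong type."""
    bad = []
    for field, type_name in returns.items():
        check = TYPE_CHECKS.get(type_name)
        if field not in result or check is None or not check(result[field]):
            bad.append(field)
    return bad


schema = {"confirmed": "boolean"}
ok = mismatched_fields(schema, {"confirmed": True})
wrong_type = mismatched_fields(schema, {"confirmed": "yes"})
missing = mismatched_fields(schema, {})
```

Here ok is an empty list, while wrong_type and missing both name the confirmed field.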
5

You get the result

Two options:
  • Poll GET /v1/calls/{id} and read the result field
  • If you passed a webhook_url, CallingBox POSTs the full call object to your server when processing finishes
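For the polling option, a minimal Python sketch might look like the following. The HTTP call is injected as fetch_call so the loop can be shown without a network; the "completed" and "in_progress" status names, the polling interval, and the attempt limit are all assumptions for illustration (only no_answer, busy, and failed appear on this page).

```python
import time

# Statuses after which polling can stop.
# Assumption: "completed" marks a finished, answered call.
TERMINAL_STATUSES = {"completed", "no_answer", "busy", "failed"}


def wait_for_call(fetch_call, call_id, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Poll fetch_call(call_id) until the call reaches a terminal status."""
    for _ in range(max_attempts):
        call = fetch_call(call_id)  # stands in for GET /v1/calls/{id}
        if call.get("status") in TERMINAL_STATUSES:
            return call
        sleep(interval_s)
    raise TimeoutError(f"call {call_id} did not finish after {max_attempts} polls")


# Fake responses standing in for the API so the sketch runs offline.
responses = iter([
    {"id": "call_123", "status": "in_progress"},
    {"id": "call_123", "status": "completed", "result": {"confirmed": True}},
])
call = wait_for_call(lambda call_id: next(responses), "call_123", sleep=lambda s: None)
print(call["result"])
```

Polling is simple to set up but adds latency and extra requests; a webhook_url avoids both if your server can accept inbound POSTs.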

When nobody answers

CallingBox detects voicemail, busy signals, and no-answer. The status updates to no_answer, busy, or failed. Voicemail is reported as status: no_answer with answered_by: machine so the dashboard labels it “Voicemail”. By default CallingBox hangs up as soon as voicemail is detected; pass voicemail_action: "leave_message" on the create-call request to keep the line open. In these cases no extraction runs and you are not charged. More detail in Statuses and failures.
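If you want to mirror the dashboard's labeling in your own code, a small Python sketch could look like this. The status and answered_by values come from the paragraph above; outcome_label is a hypothetical helper, and every label other than "Voicemail" is made up for illustration.

```python
def outcome_label(call):
    """Map a call object to a short human-readable label."""
    status = call.get("status")
    # Voicemail arrives as no_answer plus answered_by: machine.
    if status == "no_answer" and call.get("answered_by") == "machine":
        return "Voicemail"
    # Assumption: these label strings are illustrative, not dashboard text.
    labels = {"no_answer": "No answer", "busy": "Busy", "failed": "Failed"}
    return labels.get(status, "Other")


voicemail = outcome_label({"status": "no_answer", "answered_by": "machine"})
unanswered = outcome_label({"status": "no_answer"})
busy = outcome_label({"status": "busy"})
```

Checking answered_by before the plain status matters here, since voicemail and a true no-answer share the same status value.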

Calls

All the request fields and how to use them.

Structured results

Define a returns schema and get typed JSON back.

Delivery

Polling vs webhooks for getting results.