Voice Calls

Voicebip connects your AI agent to real phone calls over Nigerian mobile network operator (MNO) infrastructure. Your agent can receive inbound calls and place outbound calls on provisioned +234 numbers.

Making an Outbound Call

curl -X POST "https://api.voicebip.com/v1/calls" \
  -H "Authorization: Bearer pk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agt_PAEZ_njcfm2kycpjs",
    "to_number": "+2348031234567",
    "from_number": "+2342013500010"
  }'

Response:

{
  "call_id": "call_abc123xyz",
  "status": "initiated",
  "direction": "outbound",
  "from_number": "+2342013500010",
  "to_number": "+2348031234567",
  "started_at": "2026-04-08T15:30:00Z"
}
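
The same request can be issued from JavaScript. The following is a minimal sketch, not an official SDK: the helper names buildOutboundCallRequest and startOutboundCall are hypothetical, and the E.164 check is an assumption based on the number format used in the curl example. Only the endpoint, headers, and body fields come from the example above.

```javascript
const API_BASE = "https://api.voicebip.com/v1";

// Hypothetical helper: builds the fetch() arguments for POST /v1/calls.
function buildOutboundCallRequest(apiKey, { agentId, toNumber, fromNumber }) {
  // Assumption: numbers are sent in E.164 form, as in the curl example.
  const e164 = /^\+[1-9]\d{6,14}$/;
  for (const number of [toNumber, fromNumber]) {
    if (!e164.test(number)) throw new Error(`not an E.164 number: ${number}`);
  }
  return {
    url: `${API_BASE}/calls`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        agent_id: agentId,
        to_number: toNumber,
        from_number: fromNumber,
      }),
    },
  };
}

// Hypothetical wrapper: sends the request and returns the parsed response.
// It is defined but not invoked here, so the sketch has no network side effects.
async function startOutboundCall(apiKey, params, fetchImpl = fetch) {
  const { url, options } = buildOutboundCallRequest(apiKey, params);
  const res = await fetchImpl(url, options);
  if (!res.ok) throw new Error(`call request failed: HTTP ${res.status}`);
  return res.json(); // expected to contain call_id, status, direction, ...
}
```

Separating request construction from sending keeps the validation testable without touching the network.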

Receiving Inbound Calls

When a call arrives at a provisioned +234 number, Voicebip:

  1. Routes the call to the agent assigned to that number (DID routing)
  2. Plays a greeting via TTS
  3. Optionally plays a recording consent prompt (FR-044)
  4. Starts real-time speech-to-text transcription
  5. Feeds the transcript to the AI for a response
  6. Synthesizes the response via TTS and plays it back

The entire conversation loop runs automatically. You receive events via webhooks.

Voice Pipeline

Caller speaks → audio captured → STT transcribes →
AI generates response → TTS synthesizes speech → audio played to caller →
Webhooks delivered

Supported AI Providers

Set ai_provider on your agent to choose the hosted model:

Value    Description
openai   OpenAI (default)
gemini   Google Gemini 2.0 Flash Live (lowest latency)
byom     Your own model via webhook (see BYOM)

List Calls

Retrieve call history for your workspace, optionally filtered by agent:

curl "https://api.voicebip.com/v1/calls?agent_id=agt_PAEZ_njcfm2kycpjs&page_size=20" \
  -H "Authorization: Bearer pk_live_your_key"
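
When building this URL in code, it is safer to let the URL API handle query-string encoding than to concatenate strings. The sketch below uses only the two parameters shown in the curl example (agent_id and page_size); the function name buildListCallsUrl is hypothetical, and any further pagination parameters are not covered here.

```javascript
// Hypothetical helper: builds the List Calls URL from optional filters.
function buildListCallsUrl({ agentId, pageSize } = {}) {
  const url = new URL("https://api.voicebip.com/v1/calls");
  // Only set parameters the caller actually provided.
  if (agentId !== undefined) url.searchParams.set("agent_id", agentId);
  if (pageSize !== undefined) url.searchParams.set("page_size", String(pageSize));
  return url.toString();
}
```

The result can be passed to fetch together with the same Authorization: Bearer header used in the curl command.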

Get a Call

curl "https://api.voicebip.com/v1/calls/call_abc123xyz" \
  -H "Authorization: Bearer pk_live_your_key"

End a Call

Terminate an active call programmatically:

curl -X DELETE "https://api.voicebip.com/v1/calls/call_abc123xyz" \
  -H "Authorization: Bearer pk_live_your_key"

Returns 204 No Content with an empty body.

Real-Time Call Monitoring

WebSocket Stream

Connect to the WebSocket endpoint to receive live call events:

const ws = new WebSocket(
  `wss://api.voicebip.com/v1/calls/${callId}/stream?token=${apiKey}`
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.event_type, data.payload);
  // "call.transcription": real-time transcript
  // "call.completed": call ended
};
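
In practice you will usually want to dispatch on event_type rather than log every message. The following is a small sketch of one way to do that. The createCallEventRouter name is hypothetical, and the only message fields it relies on are event_type and payload, as shown above; the internal shape of payload is not assumed.

```javascript
// Hypothetical helper: routes raw WebSocket messages to per-event handlers.
// Returns true if a handler ran, false if the message was ignored.
function createCallEventRouter(handlers) {
  return function route(rawMessage) {
    let data;
    try {
      data = JSON.parse(rawMessage);
    } catch (err) {
      return false; // not JSON: ignore rather than crash the socket handler
    }
    if (data === null || typeof data !== "object") return false;
    // Use an own-property check so inherited names like "toString" never match.
    if (!Object.prototype.hasOwnProperty.call(handlers, data.event_type)) return false;
    handlers[data.event_type](data.payload);
    return true;
  };
}
```

A router built this way can be attached with ws.onmessage = (event) => route(event.data), where route is created with handlers keyed by "call.transcription" and "call.completed".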

SSE Transcript Stream

GET /v1/calls/{call_id}/transcript/stream

An alternative to WebSocket for environments where persistent bidirectional connections are unavailable or blocked. Uses the browser-native EventSource API (Server-Sent Events): the server pushes frames as the call progresses and the connection closes when the call ends or the client disconnects.

When to use SSE instead of WebSocket:

  • Server-side rendering — SSE works with fetch and readable streams without browser WebSocket support.
  • Proxies or corporate firewalls that block WebSocket upgrades.
  • Streaming from a curl command for quick debugging.
  • The call transcript is read-only and you don’t need to send messages back.

Authentication: pass your API key in the ?token= query parameter or in the Authorization: Bearer header. The ?token= form exists because the browser EventSource constructor does not support custom headers.

const es = new EventSource(
  `https://api.voicebip.com/v1/calls/${callId}/transcript/stream?token=${apiKey}`
);

es.onmessage = (e) => {
  const data = JSON.parse(e.data);
  // data.role: "user" (caller) or "agent"
  // data.text: transcript text for this turn
  // data.is_final: true when the turn is complete
  // data.turn_id: stable ID for this turn (correlates with call.transcription events)
  // data.ts: ISO 8601 UTC timestamp
  console.log(`[${data.role}] ${data.text}`);
};

es.addEventListener("error", (e) => {
  // Server-sent "error" frames carry { "message": "..." } in e.data.
  // They are sent when the NATS subscription could not be established.
  // The browser also fires "error" on connection failures, with no data.
  if (e.data) {
    console.error("stream error", JSON.parse(e.data).message);
  } else {
    console.error("stream connection error");
  }
});

Event format: each frame is a standard SSE data: line followed by a blank line. Normal transcript frames carry no named event: type — they reach onmessage. Only terminal error frames use event: error.

data: {"role":"user","text":"I'd like to check my balance.","is_final":true,"turn_id":"turn_a1b2c3d4","ts":"2026-04-14T09:00:07Z"}

data: {"role":"agent","text":"Your balance is ₦45,000, due April 20th.","is_final":true,"turn_id":"turn_e5f6g7h8","ts":"2026-04-14T09:00:09Z"}
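
Outside the browser, where EventSource may not be available, the frames can be parsed by hand from a fetch response body. The following is a simplified sketch of such a parser, not a full implementation of the SSE specification: it assumes "\n" line endings and JSON payloads, and it handles only the data: and event: fields described above. The name createSseParser is hypothetical.

```javascript
// Hypothetical helper: incremental parser for SSE text chunks.
// Calls onFrame({ event, data }) once per complete frame.
function createSseParser(onFrame) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    let end;
    // A frame is complete once a blank line ("\n\n") has been received.
    while ((end = buffer.indexOf("\n\n")) !== -1) {
      const rawFrame = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      let event = "message"; // default type when no "event:" line is present
      const dataLines = [];
      for (const line of rawFrame.split("\n")) {
        if (line.startsWith("event:")) {
          event = line.slice("event:".length).trim();
        } else if (line.startsWith("data:")) {
          // Strip the single optional space after the colon.
          dataLines.push(line.slice("data:".length).replace(/^ /, ""));
        }
      }
      if (dataLines.length > 0) {
        onFrame({ event, data: JSON.parse(dataLines.join("\n")) });
      }
    }
  };
}
```

Feed it decoded text chunks as they arrive (for example from a TextDecoder over the response stream). Frames with event "message" correspond to onmessage, and frames with event "error" correspond to the named error event.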

Constraints:

  • Read-only — no messages can be sent back through an SSE connection.
  • Returns 404 NOT_FOUND if the call_id is unknown or belongs to a different workspace. Call existence is never leaked across tenants.
  • The stream closes when the call ends or the client disconnects. The EventSource API will attempt to reconnect by default; call es.close() once you receive a call.completed webhook event if you want to suppress reconnects.

Webhook Events

Event               Description
call.initiated      Call started (within 500ms of answer)
call.transcription  Real-time speech transcript (within 500ms of speech end)
call.completed      Call ended with duration, transcript, and billing

Recording Consent

Call recording is managed through the consent flow. When the consent prompt is enabled and a call connects, the platform plays the prompt below. Recording starts only after the caller grants consent.

“This call may be recorded for quality purposes. Press 1 to continue, or press 2 to opt out of recording.”

  • DTMF “1” — recording starts
  • DTMF “2” — call continues without recording
  • No response within 12 seconds — defaults to no recording
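
The three rules above amount to a small decision function. The sketch below is only an illustration of that logic, not the platform's implementation. The function name and the reason strings are hypothetical, and two behaviours are assumptions: a digit other than 1 or 2 results in no recording, and a key press at exactly 12 seconds still counts as a response.

```javascript
const CONSENT_TIMEOUT_SECONDS = 12;

// Hypothetical helper: decides whether to record, given the DTMF digit
// pressed (or null if none) and the seconds elapsed since the prompt ended.
function resolveRecordingConsent(digit, secondsSincePrompt) {
  if (digit === null || secondsSincePrompt > CONSENT_TIMEOUT_SECONDS) {
    return { record: false, reason: "timeout" };
  }
  if (digit === "1") return { record: true, reason: "consented" };
  if (digit === "2") return { record: false, reason: "opted_out" };
  // Assumption: any other key is treated like no consent.
  return { record: false, reason: "unrecognized_digit" };
}
```

Note that every path except an explicit "1" within the timeout leads to no recording, which matches the default described above.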

Latency Targets

Metric                       Target
call.initiated webhook       < 500ms from CHANNEL_ANSWER (P95)
call.transcription delivery  < 500ms from speech end (P95)
TTS playback start (BYOM)    < 800ms from developer webhook response (P95)
Full hosted turn             < 500ms end-to-end (P95)
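
If you record your own latency samples, you can compare them against these targets. The sketch below computes P95 with the nearest-rank method, which is one common definition; how Voicebip computes its own percentiles is not stated here. The metric keys in TARGETS_MS and the function names are hypothetical labels for the four rows above.

```javascript
// Targets in milliseconds, keyed by hypothetical metric names.
const TARGETS_MS = {
  call_initiated_webhook: 500,
  call_transcription_delivery: 500,
  tts_playback_start_byom: 800,
  full_hosted_turn: 500,
};

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// True when the P95 of the samples is strictly below the metric's target.
function meetsTarget(metric, samplesMs) {
  return percentile(samplesMs, 95) < TARGETS_MS[metric];
}
```

With 20 samples, for example, the nearest-rank P95 is the 19th smallest value, so a single slow outlier does not fail the target but two do.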

AI Fallback

If the AI provider is unavailable (rate limit, timeout, error), Voicebip:

  1. Retries once with a 2-second timeout
  2. If retry fails, plays a fallback message: “I’m having trouble right now. Please try again.”
  3. After 5 consecutive failures, the circuit breaker opens for 30 seconds

This ensures the caller never hears silence.
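
If you run your own model behind BYOM, a similar guard on your side can be useful. The following is a generic circuit-breaker sketch using the thresholds described above (5 consecutive failures, 30 seconds open). It is not Voicebip's internal implementation: the class and method names are hypothetical, and it assumes the breaker simply closes again once the open period has elapsed.

```javascript
// Hypothetical circuit breaker with an injectable clock for testing.
class CircuitBreaker {
  constructor({ failureThreshold = 5, openMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
    this.now = now;
    this.consecutiveFailures = 0;
    this.openedAt = null; // timestamp when the breaker opened, or null if closed
  }

  // True while calls to the provider should be skipped.
  isOpen() {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.openMs) {
      // Open period elapsed: close the breaker and start counting afresh.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return false;
    }
    return true;
  }

  recordSuccess() {
    this.consecutiveFailures = 0;
  }

  recordFailure() {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

Before each provider call, check isOpen() and play the fallback message immediately if it returns true; after each call, report the outcome with recordSuccess() or recordFailure().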