Live Streaming (SSE & WebSocket)

Voicebip exposes three live-streaming endpoints. They serve different use cases:

| Endpoint | Transport | Use case |
| --- | --- | --- |
| /v1/calls/{id}/transcript/stream | SSE | Dashboard transcript view; lightweight monitoring tools |
| /v1/conversations/{id}/stream | SSE | Live messaging dashboard; operator handoff UI |
| /v1/calls/{id}/stream | WebSocket | Anything that needs sub-100ms event delivery or full call lifecycle in one socket |

Authentication

All endpoints accept Authorization: Bearer pk_live_xxx as a request header — preferred for server-to-server consumers.

For browser clients that cannot set custom headers, the ?token=pk_live_xxx query parameter is supported on two of the three paths:

| Endpoint | Supports ?token= |
| --- | --- |
| GET /v1/calls/{id}/transcript/stream (SSE) | Yes; required for browser EventSource |
| GET /v1/calls/{id}/stream (WebSocket) | Yes; required for browser WebSocket |
| GET /v1/conversations/{id}/stream (SSE) | No; use the Authorization header |

Query-parameter tokens leak via browser history, Referer headers, and proxy access logs. Use them only on the two streaming paths that require them, and only from browser clients. The gateway limits ?token= support to those paths to prevent URL-embedded key authentication on the broader API surface.
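The rules above can be captured in a small helper that picks the auth mechanism per endpoint. This is a minimal sketch, not part of any Voicebip SDK: `STREAMS`, `buildStreamRequest`, and the `browser` option are hypothetical names introduced for illustration. Only the paths and the `?token=` support matrix come from the tables above.

```javascript
// Hypothetical helper: names and return shape are illustrative only.
const STREAMS = {
  transcript: { scheme: "https", path: (id) => `/v1/calls/${id}/transcript/stream`, allowsQueryToken: true },
  callEvents: { scheme: "wss", path: (id) => `/v1/calls/${id}/stream`, allowsQueryToken: true },
  conversation: { scheme: "https", path: (id) => `/v1/conversations/${id}/stream`, allowsQueryToken: false },
};

function buildStreamRequest(kind, id, apiKey, { browser = false } = {}) {
  const stream = STREAMS[kind];
  if (!stream) throw new Error(`Unknown stream kind: ${kind}`);

  const url = new URL(`${stream.scheme}://api.voicebip.com${stream.path(encodeURIComponent(id))}`);

  // Server-to-server consumers: always prefer the Authorization header.
  if (!browser) {
    return { url: url.toString(), headers: { Authorization: `Bearer ${apiKey}` } };
  }

  // Browser EventSource / WebSocket cannot set custom headers, so fall back
  // to ?token= where the gateway allows it and refuse everywhere else.
  if (!stream.allowsQueryToken) {
    throw new Error(`${kind} stream does not accept ?token=; use the Authorization header from a server`);
  }
  url.searchParams.set("token", apiKey);
  return { url: url.toString(), headers: {} };
}
```

Failing loudly for the conversation stream in browser mode keeps a key from being placed in a URL the gateway would reject anyway.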

Call Transcript (SSE)

Stream the per-turn transcript of a live call as it’s spoken. Each frame is one user or agent turn.

const es = new EventSource(
  `https://api.voicebip.com/v1/calls/${callId}/transcript/stream?token=${apiKey}`
);

es.onmessage = (frame) => {
  const turn = JSON.parse(frame.data);
  // turn.role: "user" | "agent"
  // turn.text: "What time do you open?"
  // turn.is_final: true (partial turns also arrive with is_final: false)
  // turn.turn_id: "trn_xyz"
  // turn.ts: "2026-05-15T14:32:01.123Z"
};

es.addEventListener('error', (frame) => {
  // Server-sent `event: error` frames carry a terminal {error_code, message}.
  // Native connection errors fire the same event name without data,
  // so check for data before parsing.
  if (frame.data) {
    console.error(JSON.parse(frame.data));
    es.close(); // terminal error: stop EventSource from retrying
  } else {
    console.error('Transcript stream connection error');
  }
});

Frame format

| Field | Type | Notes |
| --- | --- | --- |
| role | "user" \| "agent" | Who spoke this turn |
| text | string | Transcript text |
| is_final | boolean | false for partial (in-progress) transcripts; true for finalised turns |
| turn_id | string | Stable ID; partials and the final for the same turn share this |
| ts | string | ISO 8601 UTC timestamp |

The stream closes when the call ends or the client disconnects. Reconnect logic is your responsibility: the platform does not buffer beyond the in-flight turn, so turns emitted while a client is disconnected are not replayed.
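Because partials and the final frame for a turn share one `turn_id`, a consumer can keep a single entry per turn and overwrite it as frames arrive. The sketch below is one way to do that. `applyTurn` is a hypothetical helper, and the guard against a partial arriving after its final is a defensive assumption rather than documented platform behaviour.

```javascript
// Hypothetical helper: returns a new ordered list of turns with `turn` applied.
function applyTurn(turns, turn) {
  const index = turns.findIndex((t) => t.turn_id === turn.turn_id);

  // First frame seen for this turn: append it in arrival order.
  if (index === -1) return [...turns, turn];

  // Assumption: never let a stray partial overwrite a finalised turn.
  if (turns[index].is_final && !turn.is_final) return turns;

  // Otherwise the newer frame replaces the older one in place.
  const next = turns.slice();
  next[index] = turn;
  return next;
}
```

In the `onmessage` handler this would be used as `turns = applyTurn(turns, JSON.parse(frame.data))`, followed by a re-render of the transcript view.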

Conversation Stream (SSE)

Stream messaging activity (SMS or WhatsApp) for a single conversation.

This endpoint requires the Authorization header and is intended for server-to-server consumers (it does not support the ?token= query parameter). Use a server-side SSE client library or a custom fetch-based reader:

// Node.js / server-side example using fetch (streaming)
const response = await fetch(
  `https://api.voicebip.com/v1/conversations/${conversationId}/stream`,
  { headers: { Authorization: `Bearer ${apiKey}` } }
);
if (!response.ok) {
  throw new Error(`Stream request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // A chunk can end mid-line or mid-character, so decode in streaming mode
  // and hold the trailing partial line in `buffer` until the rest arrives.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();
  // Parse SSE frames: each data line is one event
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const event = JSON.parse(line.slice(6));
      // event.event_type: "message.received" | "message.delivered" | "message.read" | "conversation.mode_changed"
      // event.conversation_id: "conv_abc"
      // event.timestamp: "2026-05-15T14:32:01.123Z"
      // event.payload: event-specific fields nested verbatim from the NATS envelope
      console.log(event);
    }
  }
}

The payload shape varies by event_type: message.* events carry { message_id, direction, text, status, ... }, while conversation.mode_changed carries { from_mode, to_mode, actor_user_id }. The dashboard dispatches on event.event_type rather than discriminating by URL path. The same events surface (with different transport guarantees) through the webhook system; see Webhooks.
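Dispatching on `event_type` can be sketched as a single function that branches on the envelope and reads only the payload fields listed above. `describeEvent` is a hypothetical name, and the concrete values a field may hold (for example what `direction` or `from_mode` contain) are not specified here, so the function treats them as opaque strings.

```javascript
// Hypothetical dispatcher: turns a conversation stream event into a log line.
function describeEvent(event) {
  const { event_type: type, payload = {} } = event;

  // message.received, message.delivered and message.read share one payload shape.
  if (type.startsWith("message.")) {
    return `${type}: ${payload.direction} message ${payload.message_id} (${payload.status})`;
  }

  if (type === "conversation.mode_changed") {
    return `mode changed: ${payload.from_mode} -> ${payload.to_mode} by ${payload.actor_user_id}`;
  }

  // Unknown types are reported rather than thrown, so new event types
  // added server-side do not break the consumer.
  return `unhandled event type: ${type}`;
}
```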

Call Event Stream (WebSocket)

Bidirectional channel for any consumer that needs low-latency call events: barge-ins, idle silences, quality degradation, and (eventually) raw audio frames for in-browser monitoring.

const ws = new WebSocket(
  `wss://api.voicebip.com/v1/calls/${callId}/stream?token=${apiKey}`
);

ws.onmessage = (frame) => {
  const event = JSON.parse(frame.data);
  // event.event_type: "call.transcription" | "call.quality_degraded" | "call.completed" | ...
  // event.payload: channel-specific data
};

The platform issues a 101 Switching Protocols upgrade on successful authentication. After the upgrade, frames are JSON envelopes matching the webhook payload format.

call.barge_in events are emitted for phone calls (ESL and SIP transports) but not for WebRTC browser calls. If you are watching a WebRTC call via this stream, barge-in will not appear in the event sequence.

Choosing Between SSE and WebSocket

| Concern | SSE | WebSocket |
| --- | --- | --- |
| Browser support | Built-in (EventSource) | Built-in (WebSocket) |
| Direction | Server → client only | Bidirectional |
| Auto-reconnect | Native | Manual |
| Through corporate proxies | Reliable | Frequently blocked |
| Server-push latency | ~50ms | ~10ms |

If you’re building a dashboard view that only consumes events, choose SSE — fewer moving parts. If you’re building real-time control surfaces (e.g. injecting DTMF mid-call), choose WebSocket.
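Since WebSocket reconnection is manual, a consumer typically wraps the socket in a retry loop. The following is a rough sketch using capped exponential backoff with jitter. `backoffDelay`, `connectWithRetry`, and all option names and default values are assumptions made for illustration, not platform recommendations. It stops retrying once a `call.completed` event has been seen, since the stream closes when the call ends.

```javascript
// Capped exponential backoff with full jitter (hypothetical defaults).
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// `createSocket` must return a fresh WebSocket-like object on every call.
function connectWithRetry(createSocket, onEvent, { maxAttempts = 5, schedule = setTimeout } = {}) {
  let attempt = 0;
  let finished = false;

  const open = () => {
    const ws = createSocket();
    ws.onopen = () => { attempt = 0; }; // a successful connect resets the backoff
    ws.onmessage = (frame) => {
      const event = JSON.parse(frame.data);
      if (event.event_type === "call.completed") finished = true;
      onEvent(event);
    };
    ws.onclose = () => {
      // Do not reconnect after the call has ended or retries are exhausted.
      if (finished || attempt >= maxAttempts) return;
      schedule(open, backoffDelay(attempt));
      attempt += 1;
    };
  };

  open();
}
```

A browser consumer might call `connectWithRetry(() => new WebSocket(url), handleEvent)`. Events emitted while the socket is down are still lost, so this only shortens the gap.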

Operational Notes

  • Connection timeouts. The gateway disables its 10-second write timeout on these three paths only, so long-lived streams work. Other /v1 routes keep the timeout and will kill a streaming connection after 10 seconds, so don't try to repurpose them.
  • Workspace isolation. Streams are RLS-scoped — you can only subscribe to call_id / conversation_id values owned by your workspace. Cross-workspace IDs return 404, not 401, to prevent enumeration.
  • Heartbeats. SSE streams send a comment line every 30 seconds. Most HTTP libraries handle this transparently; if you’re parsing the wire format manually, ignore lines starting with :.
  • Backpressure. Each stream has a bounded in-process frame buffer (64 frames). If your consumer falls behind and the buffer fills, the server drops the next frame rather than blocking the NATS subscriber. Treat the stream as a best-effort live view; for authoritative event delivery use webhooks, which retry independently for 24 hours.
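For consumers that parse the SSE wire format by hand, the heartbeat note above translates into a parser that skips comment lines and tolerates frames split across network chunks. The sketch below is a simplified reader, not a full implementation of the SSE specification: `createSseParser` is a hypothetical name, it handles only `data:` fields and `:` comments, and it assumes lines end in `\n` or `\r\n`.

```javascript
// Hypothetical incremental SSE parser. Call feed() with each decoded text chunk;
// onData receives the joined data payload of every completed event.
function createSseParser(onData) {
  let buffer = "";
  let dataLines = [];

  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // keep the trailing partial line for the next chunk

    for (const line of lines) {
      if (line === "") {
        // A blank line terminates the event.
        if (dataLines.length > 0) onData(dataLines.join("\n"));
        dataLines = [];
      } else if (line.startsWith(":")) {
        continue; // comment line, e.g. the 30-second heartbeat
      } else if (line.startsWith("data:")) {
        // Strip the field name and one optional leading space.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
      // Other fields (event:, id:, retry:) are ignored in this sketch.
    }
  };
}
```

Wiring it to a stream reader looks like `const feed = createSseParser((data) => handle(JSON.parse(data)))`, then `feed(decoder.decode(value, { stream: true }))` for each chunk.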