Live Streaming (SSE & WebSocket) | Voicebip

Voicebip exposes three live-streaming endpoints. They serve different use cases:

| Endpoint | Transport | Use case |
| --- | --- | --- |
| `/v1/calls/{id}/transcript/stream` | SSE | Dashboard transcript view; lightweight monitoring tools |
| `/v1/conversations/{id}/stream` | SSE | Live messaging dashboard; operator handoff UI |
| `/v1/calls/{id}/stream` | WebSocket | Anything that needs sub-100ms event delivery or full call lifecycle in one socket |

Authentication

All endpoints accept Authorization: Bearer pk_live_xxx as a request header — preferred for server-to-server consumers.

For browser clients that cannot set custom headers, the ?token=pk_live_xxx query parameter is supported on two of the three paths:

| Endpoint | Supports `?token=` |
| --- | --- |
| `GET /v1/calls/{id}/transcript/stream` (SSE) | Yes (required for browser `EventSource`) |
| `GET /v1/calls/{id}/stream` (WebSocket) | Yes (required for browser `WebSocket`) |
| `GET /v1/conversations/{id}/stream` (SSE) | No (use the `Authorization` header) |

Query-parameter tokens leak via browser history, Referer headers, and proxy access logs. Use them only on the two streaming paths that require them. The gateway limits ?token= support to those paths to prevent URL-embedded key authentication on the broader API surface.
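
The rules above can be captured in one small helper. This is a minimal sketch only: `buildStreamRequest` and the `STREAMS` table are hypothetical names, not part of any Voicebip SDK, and the host is the one used in the examples on this page.

```javascript
// Hypothetical helper (assumption): maps each stream to its path, URL scheme,
// and whether the gateway accepts ?token= on it, per the table above.
const STREAMS = {
  transcript: { path: (id) => `/v1/calls/${id}/transcript/stream`, scheme: "https", queryToken: true },
  callEvents: { path: (id) => `/v1/calls/${id}/stream`, scheme: "wss", queryToken: true },
  conversation: { path: (id) => `/v1/conversations/${id}/stream`, scheme: "https", queryToken: false },
};

function buildStreamRequest(kind, id, apiKey, { browser = false } = {}) {
  const spec = STREAMS[kind];
  if (!spec) throw new Error(`Unknown stream kind: ${kind}`);
  const base = `${spec.scheme}://api.voicebip.com${spec.path(encodeURIComponent(id))}`;

  // Server-to-server consumers: always prefer the Authorization header.
  if (!browser) {
    return { url: base, headers: { Authorization: `Bearer ${apiKey}` } };
  }
  // Browser clients cannot set headers on EventSource or WebSocket,
  // so they fall back to ?token= where the gateway allows it.
  if (!spec.queryToken) {
    throw new Error(`${kind} stream does not accept ?token=; use the Authorization header`);
  }
  return { url: `${base}?token=${encodeURIComponent(apiKey)}`, headers: {} };
}
```

Failing loudly for the conversation stream in a browser context keeps a key from being placed in a URL that the gateway would reject anyway.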

Call Transcript (SSE)

Stream the per-turn transcript of a live call as it’s spoken. Each frame carries one user or agent turn, either a partial (in-progress) update or the finalised text.

const es = new EventSource(
  `https://api.voicebip.com/v1/calls/${callId}/transcript/stream?token=${apiKey}`
);

es.onmessage = (frame) => {
  const turn = JSON.parse(frame.data);
  // turn.role:     "user" | "agent"
  // turn.text:     "What time do you open?"
  // turn.is_final: true (partial turns also arrive with is_final: false)
  // turn.turn_id:  "trn_xyz"
  // turn.ts:       "2026-05-15T14:32:01.123Z"
};

es.addEventListener('error', (frame) => {
  // Server-sent `event: error` frames carry a terminal {error_code, message}.
  // The browser also fires a data-less `error` event when the connection
  // itself fails, so only parse when a payload is present.
  if (frame.data) {
    console.error(JSON.parse(frame.data));
  } else {
    console.error('Transcript stream connection error');
  }
});

Frame format

| Field | Type | Notes |
| --- | --- | --- |
| `role` | `"user" \| "agent"` | Who spoke this turn |
| `text` | `string` | Transcript text |
| `is_final` | `boolean` | `false` for partial (in-progress) transcripts; `true` for finalised turns |
| `turn_id` | `string` | Stable ID shared by the partials and the final for the same turn |
| `ts` | `string` | ISO 8601 UTC timestamp |
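
Because partials and the final for a turn share one `turn_id`, a consumer can keep a single entry per turn and overwrite it as frames arrive. A minimal sketch, assuming hypothetical helper names (`applyTurn`, `renderTranscript`); the guard against a partial arriving after its final is a defensive assumption, not documented behaviour.

```javascript
// Keeps the latest frame per turn_id in a Map (insertion order = speaking order).
function applyTurn(turns, frame) {
  const existing = turns.get(frame.turn_id);
  // Defensive assumption: never let a late partial overwrite a finalised turn.
  if (existing && existing.is_final && !frame.is_final) return turns;
  turns.set(frame.turn_id, frame);
  return turns;
}

// Renders one line per turn, marking turns that are still in progress.
function renderTranscript(turns) {
  return [...turns.values()]
    .map((t) => `${t.role}: ${t.text}${t.is_final ? "" : " (partial)"}`)
    .join("\n");
}
```

In the `onmessage` handler this would be called as `applyTurn(turns, JSON.parse(frame.data))`, with `turns` created once as `new Map()`.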

The stream closes when the call ends or the client disconnects. Reconnect logic is your responsibility: the platform does not buffer beyond the in-flight turn, so turns spoken while you are disconnected are not replayed.
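
One way to structure client-side reconnects is exponential backoff with a cap. The sketch below is illustrative only: `backoffDelay` and `streamWithRetry` are hypothetical names, the 500 ms base, 30 s cap, and five attempts are arbitrary choices, and how your `open` callback decides that a stream ended for good (rather than dropped) is left to your application.

```javascript
// Delay before retry number `attempt` (0-based): doubles from baseMs, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper. `open` starts one stream and returns a promise that
// resolves to true when the stream ended for good (for example, the call
// finished) and resolves to false, or rejects, when the connection dropped.
async function streamWithRetry(open, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      if (await open()) return true;
    } catch (err) {
      // Treat a rejected promise the same as a dropped connection.
    }
    if (attempt < maxAttempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
  return false; // gave up after maxAttempts
}
```

Since nothing is replayed after a reconnect, a consumer that needs a complete record should not rely on this stream alone.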

Conversation Stream (SSE)

Stream messaging activity (SMS or WhatsApp) for a single conversation.

This endpoint requires the Authorization header and is intended for server-to-server consumers (it does not support the ?token= query parameter). Use a server-side SSE client library or a custom fetch-based reader:

// Node.js / server-side example using fetch (streaming)
const response = await fetch(
  `https://api.voicebip.com/v1/conversations/${conversationId}/stream`,
  { headers: { Authorization: `Bearer ${apiKey}` } }
);
if (!response.ok) {
  throw new Error(`Stream request failed with HTTP ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a trailing partial line between chunks

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  // A chunk can end mid-line, so only complete lines are parsed;
  // the unfinished remainder waits in `buffer` for the next chunk.
  const lines = buffer.split("\n");
  buffer = lines.pop();
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const event = JSON.parse(line.slice(6));
      // event.event_type:      "message.received" | "message.delivered" | "message.read" | "conversation.mode_changed"
      // event.conversation_id: "conv_abc"
      // event.timestamp:       "2026-05-15T14:32:01.123Z"
      // event.payload:         event-specific fields nested verbatim from the NATS envelope
      console.log(event);
    }
  }
}

The payload shape varies by event_type — message.* events carry { message_id, direction, text, status, ... }, while conversation.mode_changed carries { from_mode, to_mode, actor_user_id }. The dashboard dispatches on event.event_type rather than discriminating by URL path. The same events surface (with different transport guarantees) through the webhook system — see Webhooks.
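
Dispatching on `event_type` can be as simple as a `switch`. A minimal sketch, assuming a hypothetical `summarizeConversationEvent` function; it reads only the payload fields named above and falls through safely on event types it does not recognise.

```javascript
// Turns one conversation-stream event into a short log line.
function summarizeConversationEvent(event) {
  const p = event.payload ?? {};
  switch (event.event_type) {
    case "message.received":
    case "message.delivered":
    case "message.read":
      return `${event.event_type} ${p.message_id} (${p.direction}, ${p.status})`;
    case "conversation.mode_changed":
      return `mode ${p.from_mode} -> ${p.to_mode} by ${p.actor_user_id}`;
    default:
      // Unknown types are reported rather than thrown, so new event types
      // do not break an existing consumer.
      return `unhandled ${event.event_type}`;
  }
}
```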

Call Event Stream (WebSocket)

Bidirectional channel for any consumer that needs low-latency call events: barge-ins, idle silences, quality degradation, and (eventually) raw audio frames for in-browser monitoring.

const ws = new WebSocket(
  `wss://api.voicebip.com/v1/calls/${callId}/stream?token=${apiKey}`
);

ws.onmessage = (frame) => {
  const event = JSON.parse(frame.data);
  // event.event_type: "call.transcription" | "call.quality_degraded" | "call.completed" | ...
  // event.payload:    channel-specific data
};

The platform issues a 101 Switching Protocols upgrade after successful authentication. After the upgrade, frames are JSON envelopes matching the webhook payload format.

call.barge_in events are emitted for phone calls (ESL and SIP transports) but not for WebRTC browser calls. If you are watching a WebRTC call via this stream, barge-in will not appear in the event sequence.
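
To follow a call's lifecycle over this socket, a consumer can fold each envelope into a small state object. A minimal sketch, assuming hypothetical names (`initialCallState`, `reduceCallState`); it looks only at `event_type`, because payload fields are not specified here.

```javascript
// Hypothetical per-call summary built from the event stream.
const initialCallState = { transcriptions: 0, bargeIns: 0, degraded: false, completed: false };

// Pure reducer: returns a new state for each event, ignoring unknown types.
function reduceCallState(state, event) {
  switch (event.event_type) {
    case "call.transcription":
      return { ...state, transcriptions: state.transcriptions + 1 };
    case "call.barge_in":
      // Never emitted for WebRTC browser calls, so 0 there is not meaningful.
      return { ...state, bargeIns: state.bargeIns + 1 };
    case "call.quality_degraded":
      return { ...state, degraded: true };
    case "call.completed":
      return { ...state, completed: true };
    default:
      return state;
  }
}
```

Inside `ws.onmessage` this would be used as `state = reduceCallState(state, JSON.parse(frame.data))`, closing the socket once `state.completed` is true.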

Choosing Between SSE and WebSocket

| Concern | SSE | WebSocket |
| --- | --- | --- |
| Browser support | Built-in (`EventSource`) | Built-in (`WebSocket`) |
| Direction | Server → client only | Bidirectional |
| Auto-reconnect | Native | Manual |
| Through corporate proxies | Reliable | Frequently blocked |
| Server-push latency | ~50ms | ~10ms |

If you’re building a dashboard view that only consumes events, choose SSE — fewer moving parts. If you’re building real-time control surfaces (e.g. injecting DTMF mid-call), choose WebSocket.

Operational Notes

Connection limits. The gateway disables its 10-second write timeout on these three paths only, so long-lived streams work. Other /v1 routes will kill SSE connections after 10s — don’t try to repurpose them.
Workspace isolation. Streams are RLS-scoped — you can only subscribe to call_id / conversation_id values owned by your workspace. Cross-workspace IDs return 404, not 401, to prevent enumeration.
Heartbeats. SSE streams send a comment line every 30 seconds. Most HTTP libraries handle this transparently; if you’re parsing the wire format manually, ignore lines starting with :.
Backpressure. Each stream has a bounded in-process frame buffer (64 frames). If your consumer falls behind, the server drops the next frame rather than blocking the NATS subscriber. Treat the stream as best-effort live view — for authoritative event delivery use webhooks, which retry independently for 24 hours.
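
For consumers that parse the SSE wire format by hand, the heartbeat rule above fits into a small incremental parser. This is a sketch under stated assumptions: `createSseParser` is a hypothetical name, it handles only the `data:` and `event:` fields, and it expects `\n` or `\r\n` line endings.

```javascript
// Returns a feed(chunk) function that calls onEvent({ event, data }) once per
// complete SSE frame. Text may arrive split at arbitrary points.
function createSseParser(onEvent) {
  let buffer = "";       // trailing partial line carried between chunks
  let dataLines = [];    // data: lines collected for the current frame
  let eventName = "message";

  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // last element is an incomplete line (or "")

    for (const line of lines) {
      if (line === "") {
        // A blank line ends the frame; dispatch only if it carried data.
        if (dataLines.length > 0) {
          onEvent({ event: eventName, data: dataLines.join("\n") });
        }
        dataLines = [];
        eventName = "message";
      } else if (line.startsWith(":")) {
        continue; // comment line, e.g. the 30-second heartbeat
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice(5).replace(/^ /, ""));
      } else if (line.startsWith("event:")) {
        eventName = line.slice(6).trim();
      }
    }
  };
}
```

Each decoded chunk from a streaming reader can be passed straight to `feed`; heartbeats never reach `onEvent`, and a frame split across two chunks is delivered once, when its closing blank line arrives.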