Live Streaming (SSE & WebSocket)
Voicebip exposes three live-streaming endpoints. They serve different use cases:
Authentication
All endpoints accept Authorization: Bearer pk_live_xxx as a request header — preferred for server-to-server consumers.
For browser clients that cannot set custom headers, the ?token=pk_live_xxx query parameter is supported on two of the three paths:
Query-parameter tokens leak via browser history, Referer headers, and proxy access logs. Use them only on the two streaming paths that support them, and only when the client cannot set request headers. The gateway restricts ?token= support to those paths so that URL-embedded keys cannot authenticate against the rest of the API surface.
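The two authentication styles above can be sketched as small helpers. This is a minimal illustration, not an official client: the function names are hypothetical, and the stream URL is whatever path your integration uses. The redaction helper is included because of the leak risk described above; it replaces the token value before a URL is written to a log.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bearer_headers(api_key: str) -> dict[str, str]:
    """Header-based auth, preferred for server-to-server consumers."""
    return {"Authorization": f"Bearer {api_key}"}


def with_query_token(url: str, api_key: str) -> str:
    """Return `url` with a single ?token= parameter (browser fallback only)."""
    parts = urlsplit(url)
    # Drop any existing token so the key is never duplicated in the URL.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "token"]
    query.append(("token", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


def redact_token(url: str) -> str:
    """Replace the token value so the URL is safe to write to logs."""
    parts = urlsplit(url)
    query = [(k, "REDACTED" if k == "token" else v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling redact_token on any URL before logging it keeps the key out of application logs, although it does nothing about browser history or upstream proxies.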
Call Transcript (SSE)
Stream the per-turn transcript of a live call as it’s spoken. Each frame is one user or agent turn.
Frame format
The stream closes when the call ends or the client disconnects. Reconnect logic is your responsibility — the platform does not buffer beyond the in-flight turn.
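Because reconnecting is left to the client, a consumer typically wraps the stream in a retry loop. The sketch below is one possible shape, assuming a hypothetical open_stream callable that returns an iterable of frames and raises ConnectionError when the connection drops. It treats a clean end of stream as "call ended" and retries with capped exponential backoff otherwise. The delay values are illustrative defaults, not platform recommendations.

```python
import time
from typing import Callable, Iterable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def consume_with_reconnect(
    open_stream: Callable[[], Iterable[object]],  # hypothetical connector
    on_frame: Callable[[object], None],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Deliver frames to `on_frame`, reconnecting after dropped connections."""
    failures = 0
    while True:
        try:
            for frame in open_stream():
                failures = 0  # progress was made, so reset the backoff
                on_frame(frame)
            return  # stream closed cleanly; assumed to mean the call ended
        except ConnectionError:
            if failures >= max_retries:
                raise  # give up and surface the error to the caller
            sleep(backoff_delay(failures))
            failures += 1
```

Turns spoken while the client is disconnected are not replayed after a reconnect, since the platform does not buffer beyond the in-flight turn.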
Conversation Stream (SSE)
Stream messaging activity (SMS or WhatsApp) for a single conversation.
This endpoint requires the Authorization header and is intended for server-to-server consumers (it does not support the ?token= query parameter). Use a server-side SSE client library or a custom fetch-based reader:
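A hand-rolled reader could look roughly like the following. It is a sketch using only the Python standard library, not an official client: the function names are hypothetical, the caller supplies the stream URL, and the parser is a simplified version of the standard SSE wire format (comment lines, event/data/id fields, blank-line dispatch).

```python
import urllib.request
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Group decoded SSE lines into events; a blank line ends each event."""
    event_type: Optional[str] = None
    data: list[str] = []
    last_id: Optional[str] = None
    for line in lines:
        if line == "":
            if data:  # dispatch only when at least one data line was seen
                yield {"event": event_type, "data": "\n".join(data), "id": last_id}
            event_type, data = None, []
            continue
        if line.startswith(":"):
            continue  # comment line, e.g. a heartbeat
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value


def read_conversation_stream(url: str, api_key: str) -> Iterator[dict]:
    """Open the stream with header auth and yield parsed events."""
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {api_key}",
        "Accept": "text/event-stream",
    })
    with urllib.request.urlopen(request) as response:
        lines = (raw.decode("utf-8").rstrip("\r\n") for raw in response)
        yield from iter_sse_events(lines)
```

Each yielded event keeps its data field as a raw string. If the payload is JSON, as the event descriptions here suggest, passing it through json.loads is the natural next step.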
The payload shape varies by event_type — message.* events carry { message_id, direction, text, status, ... }, while conversation.mode_changed carries { from_mode, to_mode, actor_user_id }. The dashboard dispatches on event.event_type rather than discriminating by URL path. The same events surface (with different transport guarantees) through the webhook system — see Webhooks.
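Dispatching on event_type might look like the sketch below. The payload field names come from the description above, but the envelope layout is an assumption: the example expects the fields under a payload key, which may not match the real frames, and the function name is hypothetical.

```python
def describe_event(event: dict) -> str:
    """Render one conversation event as a log line, keyed on event_type."""
    event_type = event.get("event_type", "")
    payload = event.get("payload", {})  # assumed envelope key, not confirmed

    if event_type.startswith("message."):
        # message.* events: { message_id, direction, text, status, ... }
        return "[{}] {} {}: {} ({})".format(
            event_type,
            payload.get("direction"),
            payload.get("message_id"),
            payload.get("text"),
            payload.get("status"),
        )

    if event_type == "conversation.mode_changed":
        # mode change events: { from_mode, to_mode, actor_user_id }
        return "[{}] {} -> {} by {}".format(
            event_type,
            payload.get("from_mode"),
            payload.get("to_mode"),
            payload.get("actor_user_id"),
        )

    # Unknown types are reported rather than raised, so that new event
    # types added later do not break an existing consumer.
    return "[{}] unhandled".format(event_type)
```

Matching message.* by prefix keeps the handler tolerant of message event subtypes it has not seen before.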
Call Event Stream (WebSocket)
Bidirectional channel for any consumer that needs low-latency call events: barge-ins, idle silences, quality degradation, and (eventually) raw audio frames for in-browser monitoring.
The platform responds with 101 Switching Protocols once authentication succeeds. After the upgrade, each frame is a JSON envelope in the same format as the webhook payloads.
call.barge_in events are emitted for phone calls (ESL and SIP transports) but not for WebRTC browser calls. If you are watching a WebRTC call via this stream, barge-in will not appear in the event sequence.
Choosing Between SSE and WebSocket
If you’re building a dashboard view that only consumes events, choose SSE — fewer moving parts. If you’re building real-time control surfaces (e.g. injecting DTMF mid-call), choose WebSocket.
Operational Notes
- Connection limits. The gateway disables its 10-second write timeout on these three paths only, so long-lived streams work. Other /v1 routes will kill SSE connections after 10s, so don’t try to repurpose them.
- Workspace isolation. Streams are RLS-scoped: you can only subscribe to call_id/conversation_id values owned by your workspace. Cross-workspace IDs return 404, not 401, to prevent enumeration.
- Heartbeats. SSE streams send a comment line every 30 seconds. Most HTTP libraries handle this transparently; if you’re parsing the wire format manually, ignore lines starting with a colon (:).
- Backpressure. Each stream has a bounded in-process frame buffer (64 frames). If your consumer falls behind, the server drops the next frame rather than blocking the NATS subscriber. Treat the stream as a best-effort live view. For authoritative event delivery use webhooks, which retry independently for 24 hours.
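The backpressure rule can be pictured with a small model. This is an illustrative sketch of a drop-newest bounded buffer, not the platform's implementation, and the class and method names are hypothetical. When the buffer is full, the incoming frame is discarded and the producer is never blocked.

```python
from collections import deque
from typing import Optional


class DropNewestBuffer:
    """Bounded FIFO that discards incoming frames once it is full."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.dropped = 0  # count of frames discarded so far
        self._frames: deque = deque()

    def offer(self, frame: object) -> bool:
        """Try to enqueue a frame; return False if it was dropped."""
        if len(self._frames) >= self.capacity:
            self.dropped += 1
            return False  # producer carries on without waiting
        self._frames.append(frame)
        return True

    def poll(self) -> Optional[object]:
        """Dequeue the oldest frame, or None when the buffer is empty."""
        return self._frames.popleft() if self._frames else None
```

A slow consumer in this model sees a gap in the sequence rather than a stalled stream, which is why webhooks remain the authoritative delivery path.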