BYOM (Bring Your Own Model)
BYOM lets you drive an agent’s conversation with any LLM or agent framework you choose — OpenAI, Anthropic Claude, Gemini, Llama on Ollama, a LangGraph pipeline, or a deterministic state machine. Voicebip handles telephony, STT, TTS, MNO failover, call control, billing, and HMAC-signed events; your webhook only has to return the next thing the agent should say.
If you want Voicebip to host the model for you, see Agents and
set ai_provider to openai or gemini instead.
Hosted AI vs BYOM
Both modes use the same voice pipeline, the same phone numbers, and the same
event stream. You can switch an agent between modes with a single PATCH —
nothing else in your integration changes.
Configure an Agent for BYOM
Set ai_provider to byom and point webhook_url at the endpoint that will
handle conversation turns.
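As a minimal sketch, the update body for a BYOM agent could be built like this. Only the ai_provider and webhook_url field names come from this guide; the helper name build_byom_config is hypothetical, and the endpoint and authentication to send it with are documented in Agents rather than assumed here.

```python
import json


def build_byom_config(webhook_url: str) -> str:
    """Return a JSON body that points an agent at a BYOM webhook.

    Hypothetical helper: it only assembles the two fields this guide names.
    """
    if not webhook_url:
        # BYOM has no hosted fallback, so an empty URL would fail every turn.
        raise ValueError("webhook_url is required when ai_provider is byom")
    return json.dumps({"ai_provider": "byom", "webhook_url": webhook_url})
```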
Your webhook_url receives everything for that agent — both BYOM turn
requests and lifecycle events (call.initiated, call.transcription,
call.completed). Distinguish them by the presence of an event_type field
in the JSON body: if event_type is present it is a lifecycle event and you
should return {"received": true}; if absent it is a BYOM turn request and
you must return a BYOMVoiceWebhookResponse with a text field. See
Webhooks for the lifecycle event payload shapes.
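The routing rule above can be sketched in a few lines. The function names are illustrative, and the turn reply is a placeholder for whatever your model produces.

```python
import json


def classify_payload(raw_body: bytes) -> str:
    """Return "event" for lifecycle events and "turn" for BYOM turn requests."""
    payload = json.loads(raw_body)
    # Presence of event_type is the only discriminator this guide defines.
    return "event" if "event_type" in payload else "turn"


def respond(raw_body: bytes, next_line: str) -> dict:
    """Build the response body for either kind of request."""
    if classify_payload(raw_body) == "event":
        return {"received": True}
    # Turn request: text is the one required response field.
    return {"text": next_line}
```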
The Turn Contract
On every caller turn, Voicebip POSTs a JSON body to your webhook_url and
expects a JSON response. This is a single request/response — BYOM does not
stream tokens.
Request
Each entry in messages has:
You can pass the array straight into an LLM’s messages parameter after mapping roles.
Response
Return a JSON body with the agent’s next action. Only text is required.
end_call, transfer_to, and dtmf are mutually exclusive with a normal
continue-the-conversation response — pick one per turn.
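One way to enforce that rule before replying is a small builder that rejects more than one action per turn. This is a sketch under assumptions: the value types (a boolean for end_call, strings for transfer_to and dtmf) are guesses, and build_turn_response is a hypothetical helper, not part of any Voicebip SDK.

```python
ACTION_FIELDS = ("end_call", "transfer_to", "dtmf")


def build_turn_response(text: str, **action) -> dict:
    """Return a turn response with text and at most one action field."""
    if not isinstance(text, str) or not text:
        raise ValueError("text is required")
    unknown = set(action) - set(ACTION_FIELDS)
    if unknown:
        raise ValueError(f"unknown action fields: {sorted(unknown)}")
    if len(action) > 1:
        # The guide asks for one action per turn.
        raise ValueError("pick one of end_call, transfer_to, dtmf per turn")
    return {"text": text, **action}
```

For example, build_turn_response("Goodbye!", end_call=True) produces a body that speaks the line and then hangs up, while passing both end_call and transfer_to raises before anything is sent.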
Voice Pipeline
When a call comes in on a BYOM agent:
Your webhook sits squarely on the latency critical path. Voicebip enforces a 5-second hard timeout per request; responses that arrive later are treated as a webhook failure.
Latency Budget
The voice pipeline targets < 800ms P95 from your webhook response to TTS playback start. To stay inside a natural-feeling turn:
- < 300ms is the target for your webhook round-trip end-to-end.
- < 500ms + webhook RTT is the realistic total turn latency callers perceive.
- 5s is the hard timeout. Beyond that, the turn is dropped.
Practical implications:
- Host your webhook in the same region as the call. EU/Africa regions are recommended for Nigerian traffic; US regions add ~200ms baseline.
- Use a warm, persistent HTTP server. Cold-started serverless functions blow the budget on the first turn of every call.
- Stream from your LLM internally and start responding as soon as you have the first complete sentence — don’t wait for the full completion.
- For tool calls that can’t finish in 300ms, return a filler like "One moment while I check that..." and resolve the real answer on the next turn.
See Best Practices for the full latency table.
Conversation History
Voicebip automatically trims messages to the last 20 turns before sending.
This keeps payloads small and LLM context windows efficient. You do not need
to maintain your own session store — pass the array straight through to your LLM
as conversation context, keyed by call_id if you need to correlate with
anything else on your side.
If you need the full transcript, subscribe to the call.completed event — it
carries the complete turn-by-turn history when the call ends.
Errors and Failure Modes
BYOM has no hosted fallback. An agent configured with ai_provider: byom
will only use your webhook — if it’s unreachable or the configuration is
missing, the turn fails and the caller hears the configured failure prompt.
Circuit Breaker
Each workspace has a BYOM circuit breaker. After 5 consecutive failures, or an
error rate above 50% over 30 seconds, the breaker trips to OPEN and new turns
fail fast for 30 seconds before entering HALF-OPEN probe mode. This
protects the voice pipeline from a broken webhook dragging down every call on
the workspace.
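To reason about when the breaker trips, it can help to model the thresholds locally. The sketch below is an illustrative model, not Voicebip's implementation: the minimum sample count for the error-rate rule and the HALF-OPEN behaviour (one success closes, one failure reopens) are assumptions that this guide does not specify.

```python
import time
from collections import deque

FAILURE_STREAK = 5    # consecutive failures that trip the breaker
ERROR_RATE = 0.5      # error-rate threshold over the window
WINDOW_SECONDS = 30
OPEN_SECONDS = 30
MIN_SAMPLES = 10      # assumption: avoids tripping on a single failed turn


class BreakerModel:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._reset()

    def _reset(self):
        self.state = "CLOSED"
        self.streak = 0
        self.window = deque()  # (timestamp, ok) pairs from the last 30 s
        self.opened_at = 0.0

    def _trip(self):
        self.state = "OPEN"
        self.opened_at = self.clock()

    def current_state(self) -> str:
        if self.state == "OPEN" and self.clock() - self.opened_at >= OPEN_SECONDS:
            self.state = "HALF-OPEN"
        return self.state

    def record(self, ok: bool) -> None:
        state = self.current_state()
        if state == "OPEN":
            return  # turns fail fast; nothing reaches the webhook
        if state == "HALF-OPEN":
            # Assumed probe semantics: one result decides the next state.
            if ok:
                self._reset()
            else:
                self._trip()
            return
        now = self.clock()
        self.window.append((now, ok))
        while now - self.window[0][0] > WINDOW_SECONDS:
            self.window.popleft()
        self.streak = 0 if ok else self.streak + 1
        failures = sum(1 for _, good in self.window if not good)
        rate_hit = (len(self.window) >= MIN_SAMPLES
                    and failures / len(self.window) > ERROR_RATE)
        if self.streak >= FAILURE_STREAK or rate_hit:
            self._trip()
```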
Watch the byom.webhook.errors and byom.breaker.state metrics (or the
equivalent dashboard panels) if your agents go quiet.
Security
HMAC Signature Verification
Voicebip signs every BYOM request (voice and messaging) with two headers: X-Voicebip-Signature and X-Voicebip-Timestamp.
The signature covers "{timestamp}.{raw_body}" — the timestamp string, a
literal dot, then the exact raw request body bytes:
Verification checklist:
- Read X-Voicebip-Timestamp and reject if |now - timestamp| > 300 s (replay protection).
- Compute the HMAC over the raw body bytes, before any JSON parsing.
- Compare with hmac.Equal (constant-time), never a plain string compare.
- Reject requests where the signature is absent or wrong.
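The checklist above can be sketched in Python, where hmac.compare_digest plays the role of Go's hmac.Equal. Three details are assumptions rather than facts from this guide: that the algorithm is HMAC-SHA256, that the signature is hex-encoded, and that the timestamp is in Unix seconds. Check the reference examples in Code Examples before relying on them.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # replay window from the checklist


def verify_signature(secret: bytes, timestamp: str, raw_body: bytes,
                     signature: Optional[str],
                     now: Optional[float] = None) -> bool:
    """Return True only for a fresh, correctly signed request."""
    if not signature or not timestamp:
        return False  # absent header: reject
    try:
        sent_at = int(timestamp)  # assumption: Unix seconds
    except ValueError:
        return False
    now = time.time() if now is None else now
    if abs(now - sent_at) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future
    # Sign "{timestamp}.{raw_body}" using the raw bytes, before JSON parsing.
    signed = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(expected.encode(), signature.encode())
```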
Secret Rotation
When you rotate via POST /v1/workspace/signing-secret/rotate, Voicebip
moves the old secret to a 24-hour grace window. During that window, two
signature headers appear on every BYOM request:
- X-Voicebip-Signature: new secret
- X-Voicebip-Signature-Previous: old secret
Accept either as valid during your rollout, then stop accepting the old one after 24 hours. This prevents dropped turns while you deploy the new secret.
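During the grace window, a handler that holds a single secret can check it against both headers. The sketch below assumes HMAC-SHA256 with hex-encoded signatures and a headers dict keyed by the exact header names; matches_any_signature is a hypothetical helper, and the timestamp freshness check is left out for brevity.

```python
import hashlib
import hmac


def matches_any_signature(secret: bytes, headers: dict, raw_body: bytes,
                          accept_previous: bool = True) -> bool:
    """Return True if the secret matches the current or previous signature."""
    timestamp = headers.get("X-Voicebip-Timestamp", "")
    signed = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest().encode()
    names = ["X-Voicebip-Signature"]
    if accept_previous:
        # Only honoured during the 24-hour grace window.
        names.append("X-Voicebip-Signature-Previous")
    for name in names:
        candidate = headers.get(name)
        if candidate is not None and hmac.compare_digest(expected, candidate.encode()):
            return True
    return False
```

Once the new secret is deployed everywhere, calling it with accept_previous=False stops honouring the old secret.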
Other security practices
- Idempotency: Voicebip does not retry BYOM turns (unlike event webhooks), but you may receive duplicate call_id + transcription pairs if a caller repeats themselves. Don’t assume uniqueness.
- Never trust transcription as sanitized input. Treat it as user content and pass it to your LLM through the same guards you’d use for any user text.
- Log call_id and agent_id on every turn so you can correlate with Voicebip’s request IDs when debugging.
Minimal Handler
A working Node.js/Express handler with HMAC verification lives in Code Examples. The shape below is the bare minimum if you already have signature verification wired up.
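For readers working in Python, a comparable bare-minimum shape using only the standard library might look like this. It is a sketch, not the reference handler: the echo reply stands in for your model call, signature verification is assumed to run where the comment indicates, and only the transcription and event_type field names come from this guide.

```python
import json
from http.server import BaseHTTPRequestHandler


def next_reply(payload: dict) -> dict:
    """Decide the response body for one webhook request."""
    if "event_type" in payload:
        return {"received": True}  # lifecycle event, just acknowledge
    heard = payload.get("transcription", "")
    return {"text": f"You said: {heard}"}  # swap in your LLM call here


class ByomHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw_body = self.rfile.read(length)
        # Signature verification over raw_body is assumed to happen here.
        body = json.dumps(next_reply(json.loads(raw_body))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve it: http.server.HTTPServer(("0.0.0.0", 8080), ByomHandler).serve_forever()
```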
Flask and Go versions are also in Code Examples.
Testing Your Webhook
Before wiring a real number, test the turn contract end-to-end:
- Point the agent at an ngrok tunnel, for example webhook_url: "https://abc123.ngrok-free.app/voicebip/byom".
- Use the sandbox: any API key prefixed with pk_test_ routes to sandbox mode. Sandbox calls synthesize transcription events and exercise the full BYOM path without touching real SIP/SMPP or being billed.
- Fire a test lifecycle event with POST /v1/webhooks/test to confirm your event handler works before a live call. See Webhooks → Testing for the full list of supported event_type values and a BYOM-specific end-to-end testing checklist.
- Place a real call to an agent’s number (still in sandbox) and watch your handler logs plus the dashboard’s live transcript view.
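Before any of these steps, it can also be useful to hit your handler with a locally signed turn request. The sketch below builds one with the standard library. The payload is a guess at a minimal turn body using field names mentioned in this guide (call_id, agent_id, transcription, messages), the IDs are made up, and the signing assumes hex-encoded HMAC-SHA256 over "{timestamp}.{raw_body}".

```python
import hashlib
import hmac
import json
import time
import urllib.request
from typing import Optional


def build_signed_turn(url: str, secret: bytes, transcription: str,
                      now: Optional[float] = None) -> urllib.request.Request:
    """Build a POST request that looks like a signed BYOM turn."""
    timestamp = str(int(time.time() if now is None else now))
    raw_body = json.dumps({
        "call_id": "call_test_1",    # made-up ID for local testing
        "agent_id": "agent_test_1",  # made-up ID for local testing
        "transcription": transcription,
        "messages": [],
    }).encode()
    signature = hmac.new(secret, timestamp.encode() + b"." + raw_body,
                         hashlib.sha256).hexdigest()
    return urllib.request.Request(url, data=raw_body, method="POST", headers={
        "Content-Type": "application/json",
        "X-Voicebip-Timestamp": timestamp,
        "X-Voicebip-Signature": signature,
    })
```

Send it with urllib.request.urlopen(request, timeout=5) to mirror the 5-second hard timeout your handler will face in production.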
Gemini Live Mode
When you set ai_provider to gemini, Voicebip uses Gemini 2.0 Flash Live —
a bidirectional audio streaming mode that bypasses the traditional STT → LLM →
TTS pipeline entirely and saves ~400–600ms per turn.
The three pipeline modes
Turn detection and barge-in
In Gemini Live mode, Gemini owns turn detection entirely — it decides when the caller has finished speaking and when the agent should respond.
- Turn sensitivity is adjusted through Google’s Gemini Live session configuration, not through Voicebip agent settings.
- The idle-silence prompt (played after a long caller silence) only applies to the classic STT path and does not run in Gemini Live mode.
- Barge-in behavior is governed by Gemini’s real-time audio processing rather than the Voicebip barge coordinator used on classic STT calls.
SSE transcript stream
In Gemini Live mode, live transcript events may not appear in the SSE transcript
stream (/v1/calls/{id}/transcript/stream). For authoritative call transcripts,
use the call.completed webhook event — it carries the full turn-by-turn history
when the call ends.
Messaging BYOM (SMS + WhatsApp)
The same webhook_url on an agent also handles inbound SMS and WhatsApp
messages. Each inbound message triggers a POST to your webhook with the
conversation history — route on channel to handle SMS and WhatsApp separately.
Messaging Request
Messaging Response
When multiple fields are set, precedence is: end_conversation > escalate >
template_name > reply. An empty {} body is valid — use it to silently
acknowledge and take action on your side without sending a reply.
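The precedence rule can be expressed as a short lookup, which is handy for unit-testing your own responses. The sketch treats a field as set when its value is truthy, which is an assumption, and effective_action is a hypothetical helper name.

```python
from typing import Optional

# Highest precedence first, as listed in this guide.
PRECEDENCE = ("end_conversation", "escalate", "template_name", "reply")


def effective_action(response: dict) -> Optional[str]:
    """Return the field that wins, or None for a silent acknowledgement."""
    for field in PRECEDENCE:
        if response.get(field):
            return field
    return None  # empty {} body: no reply is sent
```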
HMAC signing is identical to voice BYOM — same headers (X-Voicebip-Signature,
X-Voicebip-Timestamp), same algorithm, same rotation grace-period behaviour.
See HMAC Signature Verification above.
Switching Modes Mid-Lifetime
Switching an agent between byom, openai, and gemini is a single PATCH
on ai_provider. New calls use the new mode immediately; in-flight calls keep
the mode they started with. This is the intended migration path if you prototype
on hosted AI and later move to BYOM (or vice versa).