Vox + Gemini EU ⭐ Best
europe-west1 Belgium · Vertex AI · 3 hops · all-EU
~85ms RTT
| Hop | Route | RTT | Loss | Type |
| 1 | RKV → Voximplant Frankfurt | 25ms | <0.01% | SIP/TLS · IS→EU |
| 2 | Vox FRA → Gemini EU BEL | 40ms | <0.01% | WSS BidiGenerate |
| 3 | Gemini EU → CF Worker FRA | 5ms | <0.01% | HTTPS · same city |
| TOTAL | TTFA | 70–100ms | | No TTS overhead · native audio |
Vox + Gemini US (current)
generativelanguage.googleapis.com · Iowa · 4 hops
~160ms RTT
| Hop | Route | RTT | Loss | Type |
| 1 | RKV → Voximplant Frankfurt | 25ms | <0.01% | SIP/TLS |
| 2 | Vox FRA → Gemini Iowa | 100ms | ~0.05% | WSS · transatlantic |
| 3 | Gemini → CF Worker FRA | 40ms | <0.01% | HTTPS US→EU |
| 4 | Audio return → RKV | 20ms | <0.01% | PCM 24→8kHz |
| TOTAL | TTFA | 120–200ms | | No TTS · good latency |
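Hop 4 lists a PCM 24→8kHz conversion on the return path. A minimal sketch of that step, assuming 16-bit mono samples and a plain 3:1 average as the filter. The function name is illustrative and not taken from voximplant-session.ts; a production resampler would normally use a proper low-pass filter.

```typescript
// Downsample 16-bit PCM from 24 kHz to 8 kHz (factor 3).
// Assumption: mono Int16 samples; averaging 3 samples is a crude
// box filter used here only to keep the sketch short.
function downsample24kTo8k(input: Int16Array) {
  const factor = 3;
  // Trailing samples that do not fill a group of 3 are dropped.
  const out = new Int16Array(Math.floor(input.length / factor));
  for (let i = 0; i < out.length; i++) {
    const base = i * factor;
    const avg = (input[base] + input[base + 1] + input[base + 2]) / factor;
    out[i] = Math.round(avg);
  }
  return out;
}

// A 20 ms frame is 480 samples at 24 kHz and 160 samples at 8 kHz.
const frame8k = downsample24kTo8k(new Int16Array(480));
console.log(frame8k.length);
```

Each 20ms frame shrinks from 480 to 160 samples, so the conversion itself adds no buffering beyond the frame already in hand.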
Teams Phone · Azure SBC EU
Direct Routing · northeurope Dublin · 5+ hops
~900ms AI turn
| Hop | Route | RTT | Loss | Type |
| 1 | RKV → Azure SBC northeurope | 35ms | <0.01% | SIP TLS · Dublin |
| 2 | SBC → Teams Transport Relay | 20ms | <0.01% | SRTP/ICE · Azure |
| 3 | Media Processor (bot audio) | 50ms | <0.01% | OPUS 48kHz |
| 4 | Azure STT (Speech) | 200ms | <0.01% | Azure Speech → text |
| 5 | GPT-4o inference | 300ms | <0.01% | LLM turn |
| 6 | Azure TTS → audio | 250ms | <0.01% | Neural TTS |
| 7 | Teams jitter buffer + client | 60ms | ~0.1% | WebRTC OPUS |
| TOTAL | AI turn | ~900ms | | STT+LLM+TTS chain ⚠ |
Teams call quality (PSTN only): excellent. OPUS plus the jitter buffer gives MOS 4.0+ at <0.1% loss. Teams bot/AI turns: ~900ms total, because STT → GPT-4o → Neural TTS run in series (~200 + 300 + 250ms) on top of ~165ms of transport and buffering. This is the same bottleneck as the Twilio path. Gemini Live removes the chain entirely.
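The ~900ms figure can be checked by adding the per-hop values from the tables, since the stages run one after another. A small sketch; the type and function names are illustrative, and the numbers are copied from the hop tables above.

```typescript
// Per-hop latencies in ms, copied from the hop tables.
type Hop = { route: string; ms: number };

const teamsBotTurn: Hop[] = [
  { route: "RKV → Azure SBC northeurope", ms: 35 },
  { route: "SBC → Teams Transport Relay", ms: 20 },
  { route: "Media Processor (bot audio)", ms: 50 },
  { route: "Azure STT (Speech)", ms: 200 },
  { route: "GPT-4o inference", ms: 300 },
  { route: "Azure TTS → audio", ms: 250 },
  { route: "Teams jitter buffer + client", ms: 60 },
];

const voxGeminiEu: Hop[] = [
  { route: "RKV → Voximplant Frankfurt", ms: 25 },
  { route: "Vox FRA → Gemini EU BEL", ms: 40 },
  { route: "Gemini EU → CF Worker FRA", ms: 5 },
];

// Stages in series simply add up; nothing overlaps.
function serialTotalMs(hops: Hop[]): number {
  return hops.reduce((sum, hop) => sum + hop.ms, 0);
}

// Fraction of the total spent in the hops selected by `match`.
function shareOf(hops: Hop[], match: (h: Hop) => boolean): number {
  return serialTotalMs(hops.filter(match)) / serialTotalMs(hops);
}

const teamsTotal = serialTotalMs(teamsBotTurn); // 915
const euTotal = serialTotalMs(voxGeminiEu);     // 70
const aiChainShare = shareOf(teamsBotTurn, (h) => /STT|GPT-4o|TTS/.test(h.route));

console.log(teamsTotal, euTotal, aiChainShare.toFixed(2));
```

The Teams hops sum to 915ms, of which the STT, LLM and TTS stages account for 750ms (about 82%). The three Vox + Gemini EU hops sum to 70ms.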
Twilio PSTN (legacy)
+1-424-622-5842 · US number · 7 hops · TTS bottleneck
~675ms TTFA
| Hop | Route | RTT | Loss | Type |
| 1 | RKV → PSTN → Twilio San Jose | 130ms | 0.05–0.1% | PSTN · FARICE cable |
| 2 | Twilio → Media (Ashburn VA) | 80ms | <0.01% | TwiML · MULAW 8k |
| 3 | Media → Ultravox WebSocket | 30ms | <0.01% | WS joinUrl |
| 4 | Ultravox → CF webhook | 120ms | <0.01% | HTTPS transatlantic |
| 5 | 🔴 TTS HTTP (per utterance) | 400–800ms | <0.01% | OpenAI/GCP TTS |
| TOTAL | TTFA | 450–900ms | | TTS per turn is the killer |
[Charts: Time To First Audio (TTFA) and Per AI-Turn Latency. Vox+Gemini EU and Vox+Gemini US are both labeled "Native audio".]
Feature Matrix
| Feature | EU | Teams | Twilio |
| IS phone number | ✓ | SBC | US# |
| Native audio AI | ✓ | ✗ | ✗ |
| EU data residency | ✓ GDPR | ✓ Azure | ✗ US |
| AI turn latency | ~60ms | ~900ms | >1s |
| Custom MCP tools | 47 tools | Bot SDK | webhooks |
| Icelandic quality | Gemini 2.5 | GPT-4o | GPT-4o mini |
| Conference/PBX | via Vox | Full PBX | Full |
| Est. cost/min | ~$0.015 | ~$0.04 | ~$0.025 |
Packet Loss — 100 RTP packets
Vox+Gemini EU — <0.01%
Vox+Gemini US — ~0.05%
Teams SBC EU — ~0.1% (jitter buffer)
Twilio PSTN — 0.05–0.1%
Gemini Live: lost packet = 20ms artifact. TTS model: lost packet = full re-request (+400ms).
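To put the two penalties side by side, the expected cost over a 100-packet window is packets × loss rate × penalty per loss. A rough sketch under simplifying assumptions: losses are independent, the upper-bound loss rates from the list above are used, and every lost packet on the TTS path triggers the full +400ms re-request. The function name is illustrative.

```typescript
// Expected penalty in ms over a window of packets, assuming independent losses.
function expectedPenaltyMs(packets: number, lossRate: number, penaltyPerLossMs: number): number {
  return packets * lossRate * penaltyPerLossMs;
}

// Vox+Gemini EU: <0.01% loss (0.0001 as upper bound), 20 ms artifact per lost packet.
const euPenalty = expectedPenaltyMs(100, 0.0001, 20);

// Twilio PSTN with per-utterance TTS: up to 0.1% loss, +400 ms re-request per loss.
const twilioPenalty = expectedPenaltyMs(100, 0.001, 400);

console.log(euPenalty.toFixed(2), twilioPenalty.toFixed(2));
```

Under these assumptions the expected cost per 100 packets is about 0.2ms of degraded audio on the EU path and about 40ms of added delay on the TTS path.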
Jitter (ms) — 50 measurements
Vox+Gemini EU: 2–5ms
Vox+Gemini US: 5–15ms
Teams: 8–20ms
Twilio PSTN: 15–40ms
Codec Comparison
| System | Codec | Sample Rate | Bitrate | Loss Resilience |
| Vox+Gemini EU | PCM native | 16/24kHz | ~256kbps | High |
| Teams | OPUS | 48kHz | 6–510kbps | High (FEC) |
| Twilio | MULAW | 8kHz | 64kbps | Medium |
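The PCM and MULAW bitrates follow directly from sample rate × bits per sample × channels. A short worked check, assuming 16-bit mono PCM for the Gemini path and 8-bit mono for MULAW; the function name is illustrative.

```typescript
// Uncompressed bitrate in kbps = sample rate (Hz) × bits per sample × channels / 1000.
function pcmKbps(sampleRateHz: number, bitsPerSample: number, channels = 1): number {
  return (sampleRateHz * bitsPerSample * channels) / 1000;
}

const pcm16k = pcmKbps(16000, 16); // 256 kbps
const pcm24k = pcmKbps(24000, 16); // 384 kbps
const mulaw8k = pcmKbps(8000, 8);  // 64 kbps

console.log(pcm16k, pcm24k, mulaw8k);
```

On these assumptions the ~256kbps in the table corresponds to the 16kHz stream; the 24kHz stream would be 384kbps.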
Switch to EU Gemini — voximplant-session.ts:
// Current (US generativelanguage):
const WS = "wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";
// EU Vertex AI (Belgium), replaces the line above:
const WS = "wss://europe-west1-aiplatform.googleapis.com/ws/google.cloud.aiplatform.v1.LlmBidiService/BidiGenerateContent";
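If the region may change later, the Vertex URL can be built from the region name instead of being hard-coded. A minimal sketch; `vertexLiveUrl` is a hypothetical helper, and the path is the one shown in the snippet above.

```typescript
// Build the Vertex AI Live WebSocket URL for a given region.
// The host pattern and path mirror the EU URL in the snippet above.
function vertexLiveUrl(region: string): string {
  return (
    `wss://${region}-aiplatform.googleapis.com/ws/` +
    "google.cloud.aiplatform.v1.LlmBidiService/BidiGenerateContent"
  );
}

const euUrl = vertexLiveUrl("europe-west1");
console.log(euUrl);
```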
Auth change — Vertex needs OAuth2:
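// Add to VoximplantSession DO:
async getVertexToken(env) {
  const sa = JSON.parse(env.GOOGLE_SERVICE_ACCOUNT);
  // Sign a JWT with sa.private_key (google-auth-library or manual JWT),
  // then exchange it at sa.token_uri for an access token.
  const bearerToken = await exchangeJwtForToken(sa); // hypothetical helper, not defined here
  return bearerToken;
}
// WebSocket header (t = token returned by getVertexToken):
headers: {
  "Authorization": `Bearer ${t}`
}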
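The "manual JWT → access token" route can be sketched with Web Crypto, which avoids pulling in google-auth-library. This is an illustration under assumptions, not the project's implementation: it assumes the service account JSON carries `client_email`, `private_key` (PKCS#8 PEM) and `token_uri`, uses the `cloud-platform` scope, and the helper names are made up for the example.

```typescript
interface ServiceAccount {
  client_email: string;
  private_key: string; // PEM-encoded PKCS#8 key from the service account JSON
  token_uri: string;   // usually https://oauth2.googleapis.com/token
}

// base64url without padding, as used for JWT segments.
function base64url(bytes: Uint8Array): string {
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Strip the PEM armor and decode the base64 body to DER bytes.
function pemToDer(pem: string) {
  const b64 = pem.replace(/-----[^-]+-----/g, "").replace(/\s+/g, "");
  const bin = atob(b64);
  const der = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) der[i] = bin.charCodeAt(i);
  return der;
}

// "<header>.<claims>" part of the JWT, valid for one hour from nowSec.
function jwtSigningInput(sa: ServiceAccount, nowSec: number): string {
  const enc = new TextEncoder();
  const header = { alg: "RS256", typ: "JWT" };
  const claims = {
    iss: sa.client_email,
    scope: "https://www.googleapis.com/auth/cloud-platform",
    aud: sa.token_uri,
    iat: nowSec,
    exp: nowSec + 3600,
  };
  return (
    base64url(enc.encode(JSON.stringify(header))) +
    "." +
    base64url(enc.encode(JSON.stringify(claims)))
  );
}

// Sign the JWT with the service account key and exchange it for an access token.
async function getVertexToken(sa: ServiceAccount): Promise<string> {
  const input = jwtSigningInput(sa, Math.floor(Date.now() / 1000));
  const key = await crypto.subtle.importKey(
    "pkcs8",
    pemToDer(sa.private_key),
    { name: "RSASSA-PKCS1-v1_5", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const sig = await crypto.subtle.sign("RSASSA-PKCS1-v1_5", key, new TextEncoder().encode(input));
  const jwt = input + "." + base64url(new Uint8Array(sig));

  const res = await fetch(sa.token_uri, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "urn:ietf:params:oauth:grant-type:jwt-bearer",
      assertion: jwt,
    }),
  });
  if (!res.ok) throw new Error(`token exchange failed: ${res.status}`);
  const data = (await res.json()) as { access_token: string };
  return data.access_token;
}
```

Access tokens obtained this way expire (the claim above requests one hour), so caching the token in the Durable Object and refreshing it shortly before expiry avoids a token round trip on every call. If the session runs on Cloudflare Workers, note that the standard `WebSocket` constructor takes no headers; the Workers documentation describes opening outbound sockets through `fetch` with an `Upgrade: websocket` header, which is where the `Authorization` header would go.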
Teams Direct Routing — SBC config (comparison):
// Azure northeurope SBC:
// sip.pstnhub.microsoft.com
// 52.114.148.0/24
// Media bypass: ON → 35ms
// Media bypass: OFF → 85ms
// Teams Bot AI pipeline:
// Azure Speech STT: ~200ms
// GPT-4o: ~300ms
// Azure Neural TTS: ~250ms
// Total bot turn: ~750-900ms ❌
// vs Gemini Live EU: ~60ms ✅
Bottom line: Teams Phone is the right choice for enterprise PBX: compliance recording, meetings, full PSTN integration, OPUS quality. For conversational AI response speed, Voximplant + Gemini Live EU is roughly 12–15× faster per AI turn (~60ms vs ~750–900ms) because it has no STT→LLM→TTS chain. The native audio model hears and speaks directly.