Together AI Launches Voice Agent Platform With Sub-700ms Latency
Lawrence Jengar
Mar 13, 2026 01:57
Together AI debuts unified voice agent infrastructure with Deepgram and Cartesia integrations, targeting enterprise deployments with end-to-end latency under 700ms.
Together AI rolled out a unified voice agent platform that keeps speech-to-text, language models, and text-to-speech processing on the same infrastructure cluster. The $3.3 billion AI cloud startup claims the setup delivers end-to-end latency under 700 milliseconds—fast enough for natural conversation flow.
The platform integrates natively with Deepgram for transcription and Cartesia for voice synthesis, both running on Together’s co-located servers rather than bouncing audio across multiple cloud providers.
Why Co-Location Matters for Voice
Most production voice systems stitch together separate vendors for each pipeline stage. Audio hits one provider for transcription, routes to another for the LLM response, then bounces to a third for speech synthesis. Each handoff adds network latency and failure points.
Together’s pitch: keep everything in the same datacenter. The company reports sub-500ms latency in optimal conditions, though the 700ms figure represents their stated ceiling for end-to-end processing.
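The arithmetic behind that pitch is simple: every cross-provider handoff adds a network round trip on top of the processing time. A minimal sketch of the latency budget, with illustrative stage and hop numbers that are assumptions rather than Together AI's published measurements:

```python
# Illustrative latency budget for one voice-agent turn.
# Stage timings and network costs below are assumed for illustration,
# not Together AI's published figures.

STAGES_MS = {"stt": 150, "llm_first_token": 250, "tts_first_audio": 120}

def turn_latency(per_hop_network_ms: float, hops: int) -> float:
    """Time to first audio: per-stage processing plus network handoffs."""
    return sum(STAGES_MS.values()) + per_hop_network_ms * hops

# Multi-vendor pipeline: audio crosses the public network between stages.
multi_vendor = turn_latency(per_hop_network_ms=60, hops=4)  # 760 ms
# Co-located pipeline: intra-datacenter handoffs cost almost nothing.
co_located = turn_latency(per_hop_network_ms=2, hops=4)     # 528 ms
```

With identical model processing times, the co-located pipeline lands comfortably under the 700ms ceiling while the multi-vendor one does not; the only variable that changed is the cost of each handoff.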
“Voice agents live or die by latency, and every network hop between providers is a place where the experience breaks down,” said Abe Pursell, Deepgram’s VP of Partnerships.
Model Flexibility Without the Patchwork
The platform supports Whisper Large v3, Minimax Speech 2.6 Turbo, Rime Arcana, and Kokoro alongside Together’s full LLM catalog. Developers can swap components without rebuilding integrations—useful for teams testing different voice characteristics or transcription accuracy for specific use cases.
Cartesia brings its Sonic-3 and Sonic-2 TTS models to the platform. Deepgram contributes Nova-3 and Nova-3 Multilingual for transcription, Flux for conversational speech-to-text, and Aura-2 for synthesis.
Unlike opaque speech-to-speech systems, Together’s modular approach preserves access to intermediate transcripts and response text. Teams can inspect, modify, and route data mid-stream—a requirement for many enterprise compliance workflows.
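In practice, a modular pipeline like this means each stage hands off plain text that application code can intercept. A minimal sketch of the pattern, assuming hypothetical `transcribe`/`generate`/`synthesize` callables rather than Together AI's actual API:

```python
# Sketch of a modular voice pipeline with an inspection hook between
# stages. All function names here are hypothetical placeholders, not
# Together AI's real interface.
import re
from typing import Callable

def redact_card_numbers(text: str) -> str:
    """Example compliance hook: mask 13-16 digit card-like numbers."""
    return re.sub(r"\b\d{13,16}\b", "[REDACTED]", text)

def run_turn(audio: bytes,
             transcribe: Callable[[bytes], str],
             generate: Callable[[str], str],
             synthesize: Callable[[str], bytes],
             inspect: Callable[[str], str] = lambda t: t) -> bytes:
    transcript = inspect(transcribe(audio))  # intermediate transcript is visible
    reply = inspect(generate(transcript))    # ...and modifiable before synthesis
    return synthesize(reply)
```

An opaque speech-to-speech model offers no equivalent seam: audio goes in and audio comes out, with nowhere to attach a redaction or routing step like the one above.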
Enterprise Requirements and Production Use
The platform targets regulated industries with zero data retention options, SOC 2 Type II certification, HIPAA compliance, and dedicated data residency. Decagon, which runs customer support voice agents handling billing inquiries and technical troubleshooting, already operates on the stack.
Together AI raised $305 million in February 2025 at a $3.3 billion valuation, with reports suggesting the company is now in talks to raise at a $7.5 billion valuation. The company has surpassed 450,000 developers and crossed $100 million in annualized revenue.
The voice platform launch represents Together’s expansion beyond its core LLM inference business into the growing voice AI market, where latency and reliability remain persistent pain points for production deployments.
