Cartesia Sonic Text-to-Speech API

The Sonic text-to-speech API converts text into ultra-low-latency, emotive speech with sub-100ms time-to-first-byte. It supports REST, server-sent events, and WebSocket streaming for real-time voice agents and applications.

The Cartesia Sonic Text-to-Speech API is one of two APIs that Cartesia publishes on the APIs.io network.

The entry is tagged TTS, Streaming, SSE, WebSocket, Real-Time, and Voice. The published artifact set on APIs.io includes API documentation, a getting-started guide, a sign-up page, an API reference, three SDKs (Python, JavaScript, and Go), a GitHub organization, pricing, and authentication documentation.

API entry from apis.yml

aid: cartesia:tts-api
name: Cartesia Sonic Text-to-Speech API
description: The Sonic text-to-speech API converts text into ultra-low-latency, emotive speech with sub-100ms
  time-to-first-byte. It supports REST, server-sent events, and WebSocket streaming for real-time voice
  agents and applications.
humanURL: https://docs.cartesia.ai
baseURL: https://api.cartesia.ai
tags:
- TTS
- Streaming
- SSE
- WebSocket
- Real-Time
- Voice
properties:
- type: Documentation
  url: https://docs.cartesia.ai
- type: GettingStarted
  url: https://docs.cartesia.ai/get-started
- type: SignUp
  url: https://play.cartesia.ai
- type: APIReference
  url: https://docs.cartesia.ai/api-reference
- type: SDK
  url: https://github.com/cartesia-ai/cartesia-python
- type: SDK
  url: https://github.com/cartesia-ai/cartesia-js
- type: SDK
  url: https://github.com/cartesia-ai/cartesia-go
- type: GitHubRepository
  url: https://github.com/cartesia-ai
- type: Pricing
  url: https://cartesia.ai/pricing
- type: Authentication
  url: https://docs.cartesia.ai
features:
- name: Ultra-Low Latency
  description: First audio byte in as little as 90ms for real-time conversational agents.
- name: Multilingual
  description: More than 40 languages covering most major markets.
- name: Emotive Speech
  description: Expressive prosody including laughter and emotion control.
- name: Streaming Outputs
  description: REST, server-sent events, and WebSocket interfaces for streaming audio.
- name: Voice Library
  description: Catalog of prebuilt voices accessible by ID across languages.
- name: Instant Voice Clone
  description: Create a voice from a short reference clip for fast iteration.
- name: Professional Voice Clone
  description: Higher-fidelity voice cloning for production avatars and brands.
- name: Voice Localization
  description: Localize cloned and library voices into target languages.
useCases:
- name: Voice Agents
  description: Build low-latency conversational voice agents for support and sales.
- name: Dubbing and Localization
  description: Dub video and audio into additional languages with voice continuity.
- name: Interactive Characters
  description: Voice game characters, avatars, and interactive narration.
- name: Accessibility
  description: Provide spoken interfaces and read-aloud features for accessibility.
- name: Healthcare and IVR
  description: Power compliant voice experiences in healthcare and IVR systems.
integrations:
- name: LiveKit
- name: Pipecat
- name: Vapi
- name: LangChain
- name: LlamaIndex
- name: Twilio
- name: Daily
- name: Vercel AI SDK
- name: Retell
- name: Bland
authentication:
- type: API Key
  description: API key authentication via the X-API-Key header alongside the Cartesia-Version header.
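
The authentication entry above names two request headers, X-API-Key and Cartesia-Version. As a minimal sketch of how a client might attach them, the Python below builds (but does not send) a JSON POST request against the baseURL from the entry. The helper names cartesia_headers and build_request, the path /example-path, the request body, and the placeholder key and version strings are all assumptions made for illustration; the entry does not list endpoint paths, body fields, or valid version values, so those should be taken from the API reference.

```python
import json
import urllib.request

BASE_URL = "https://api.cartesia.ai"  # baseURL from the entry above


def cartesia_headers(api_key: str, api_version: str) -> dict:
    """Return the two headers named in the authentication entry."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "X-API-Key": api_key,
        "Cartesia-Version": api_version,
    }


def build_request(path: str, payload: dict, api_key: str,
                  api_version: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against BASE_URL."""
    headers = cartesia_headers(api_key, api_version)
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder values only: substitute a real key, version string, path, and body.
request = build_request(
    path="/example-path",          # hypothetical path, not from the entry
    payload={"example": "value"},  # hypothetical body, not from the entry
    api_key="YOUR_API_KEY",
    api_version="YOUR_API_VERSION",
)
```

Keeping header construction in its own function means the same two headers can be reused for whichever REST or streaming endpoint is called. Once real values are filled in, the request object could be passed to urllib.request.urlopen to send it.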