Inworld TTS API
Inworld TTS is a real-time text-to-speech API whose voice models are ranked #1 on the Artificial Analysis Speech Arena. It supports three models: Realtime TTS-2 (100+ languages, natural-language steering), Realtime TTS 1.5 Max (15 languages), and Realtime TTS 1.5 Mini (cost-optimized, sub-120 ms first-token latency). It offers three interfaces: synchronous synthesis, server-streamed synthesis, and a streaming WebSocket. Features include instant and professional voice cloning, voice design from text prompts, custom pronunciation, pause controls, and word/character/phoneme alignment for lipsync. Zero-data-retention and on-premises deployment options are available.
Documentation
Documentation
https://docs.inworld.ai/tts/tts
GettingStarted
https://docs.inworld.ai/quickstart-tts
Documentation
https://docs.inworld.ai/api-reference/ttsAPI/texttospeech/synthesize-speech
Documentation
https://docs.inworld.ai/api-reference/ttsAPI/texttospeech/synthesize-speech-stream
Documentation
https://docs.inworld.ai/api-reference/ttsAPI/texttospeech/synthesize-speech-websocket
Documentation
https://docs.inworld.ai/tts/voice-cloning
Documentation
https://docs.inworld.ai/tts/voice-design
Documentation
https://docs.inworld.ai/tts/on-premises
Specifications
OpenAPI
https://raw.githubusercontent.com/api-evangelist/inworld-ai/refs/heads/main/openapi/inworld-tts-api-openapi.yml
AsyncAPI
https://raw.githubusercontent.com/api-evangelist/inworld-ai/refs/heads/main/asyncapi/inworld-ai-asyncapi.yml
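Example
The synchronous synthesis interface named in the description can be sketched as a single JSON POST. This is a minimal sketch under stated assumptions, not the documented contract: the endpoint URL, the Basic authorization scheme, the request fields (text, voiceId, modelId), and the base64 audioContent response field are all assumptions made for illustration. Confirm each against the synthesize-speech reference and the OpenAPI file listed above before relying on them.

```python
import base64
import json
import urllib.request

# ASSUMPTION: this endpoint is not stated in the listing; verify it in the API reference.
ASSUMED_ENDPOINT = "https://api.inworld.ai/tts/v1/voice"


def build_synthesis_request(api_key, text, voice_id, model_id,
                            endpoint=ASSUMED_ENDPOINT):
    """Build (but do not send) a synchronous synthesis request.

    ASSUMPTION: the field names and the Basic auth scheme are illustrative.
    """
    payload = {"text": text, "voiceId": voice_id, "modelId": model_id}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_audio(response_body):
    """Decode audio bytes from a JSON response body.

    ASSUMPTION: the response carries base64 audio under "audioContent".
    """
    doc = json.loads(response_body)
    return base64.b64decode(doc["audioContent"])
```

To send the request, pass the returned object to urllib.request.urlopen, read the response body, and write the bytes returned by extract_audio to a file. Building the request separately from sending it keeps the payload easy to inspect and test without network access.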