Cartesia Ink Speech-to-Text API

The Ink streaming speech-to-text API transcribes audio in real time with native turn detection tuned for voice agents and conversational systems.

The Cartesia Ink Speech-to-Text API is one of two APIs that Cartesia publishes on the APIs.io network.

Tagged areas include STT, Streaming, Turn Detection, Voice Agents, and WebSocket. The published artifact set on APIs.io includes API documentation, an API reference, SDKs, and pricing.

API entry from apis.yml

aid: cartesia:stt-api
name: Cartesia Ink Speech-to-Text API
description: The Ink streaming speech-to-text API transcribes audio in real time with native turn detection
  tuned for voice agents and conversational systems.
humanURL: https://docs.cartesia.ai
baseURL: https://api.cartesia.ai
tags:
- STT
- Streaming
- Turn Detection
- Voice Agents
- WebSocket
properties:
- type: Documentation
  url: https://docs.cartesia.ai
- type: APIReference
  url: https://docs.cartesia.ai/api-reference
- type: SDK
  url: https://github.com/cartesia-ai/cartesia-python
- type: SDK
  url: https://github.com/cartesia-ai/cartesia-js
- type: Pricing
  url: https://cartesia.ai/pricing
features:
- name: Streaming Transcription
  description: Real-time transcription of audio streams over WebSocket.
- name: Turn Detection
  description: Native turn detection to decide when users finish speaking.
- name: Voice Agent Optimization
  description: Tuned specifically for voice agent loops and barge-in handling.
useCases:
- name: Voice Agent Listening
  description: Provide low-latency listening for voice agent stacks.
- name: Live Captioning
  description: Generate live captions for meetings and broadcasts.
- name: Voice Form Capture
  description: Capture structured input from voice in real time.
integrations:
- name: LiveKit
- name: Pipecat
- name: Daily
- name: Twilio
authentication:
- type: API Key
  description: API key authentication via the X-API-Key header alongside the Cartesia-Version header.
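The authentication entry above names two request headers, X-API-Key and Cartesia-Version. A minimal sketch of assembling them in Python follows. The helper name build_auth_headers is hypothetical and not part of any Cartesia SDK, and the entry does not state a version value, so the caller must supply one from the official documentation.

```python
def build_auth_headers(api_key: str, api_version: str) -> dict[str, str]:
    """Return the two headers named in the authentication entry.

    Hypothetical helper for illustration only. The header names come from
    the apis.yml entry; the version string is NOT given there and must be
    taken from the Cartesia documentation.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if not api_version:
        raise ValueError("api_version must be a non-empty string")
    return {
        "X-API-Key": api_key,             # API key credential
        "Cartesia-Version": api_version,  # API version selector
    }
```

The resulting dictionary can be passed to whichever HTTP or WebSocket client is in use. Reading the key from an environment variable (for example one named CARTESIA_API_KEY, an assumed name) keeps it out of source control.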