Deepgram Text-to-Speech API

The Deepgram Text-to-Speech API converts text into natural-sounding speech using the Aura model family. It supports both single text requests and continuous streaming text-to-speech, delivering sub-200 millisecond latency suitable for real-time voice agents and conversational AI applications. The API offers multiple voice options and is designed for enterprise-grade deployments including voicebots, IVR systems, and interactive voice applications.
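
As a rough illustration of a single text request, the sketch below builds (but does not send) a POST request to the `/v1/speak` endpoint defined in the specification that follows. It uses only the Python standard library. The function name `build_speak_request` is hypothetical, and the `Bearer` authorization scheme is an assumption taken from the specification's `bearerAuth` entry; confirm the scheme against Deepgram's own documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.deepgram.com"  # production server listed in the spec


def build_speak_request(api_key, text, model="aura-asteria-en",
                        encoding="linear16", sample_rate=24000):
    """Build (but do not send) a POST /v1/speak request.

    Hypothetical helper: defaults mirror the defaults declared in the spec.
    """
    if not text:
        # The SpeakRequest schema declares minLength: 1 for `text`.
        raise ValueError("text must be a non-empty string")
    query = urllib.parse.urlencode(
        {"model": model, "encoding": encoding, "sample_rate": sample_rate}
    )
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/speak?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            # Assumption: bearer scheme, as declared under securitySchemes.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the response body is then the binary audio in the requested encoding.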
OpenAPI Specification

openapi: 3.1.0
info:
  title: Deepgram Text-to-Speech API
  description: >-
    The Deepgram Text-to-Speech API converts text into natural-sounding speech
    using the Aura model family. It supports single text requests via a REST
    endpoint, delivering sub-200 millisecond latency suitable for real-time
    voice agents and conversational AI applications. The API offers multiple
    voice options with configurable encoding, sample rate, and container
    format settings for enterprise-grade deployments including voicebots,
    IVR systems, and interactive voice applications.
  version: '1.0'
  contact:
    name: Deepgram Support
    url: https://developers.deepgram.com
  termsOfService: https://deepgram.com/tos
externalDocs:
  description: Deepgram Text-to-Speech Documentation
  url: https://developers.deepgram.com/reference/text-to-speech-api
servers:
  - url: https://api.deepgram.com
    description: Deepgram Production Server
  - url: https://api.eu.deepgram.com
    description: Deepgram EU Server
tags:
  - name: Text-To-Speech
    description: >-
      Convert text into natural-sounding speech audio.
security:
  - bearerAuth: []
paths:
  /v1/speak:
    post:
      operationId: synthesizeSpeech
      summary: Convert text to speech
      description: >-
        Converts text content into natural-sounding speech audio using
        Deepgram's Aura model family. Returns audio data in the specified
        encoding format. Supports multiple voices and configurable audio
        output settings including encoding, sample rate, and container format.
      tags:
        - Text-To-Speech
      parameters:
        - $ref: '#/components/parameters/model'
        - $ref: '#/components/parameters/encoding'
        - $ref: '#/components/parameters/container'
        - $ref: '#/components/parameters/sample_rate'
        - $ref: '#/components/parameters/bit_rate'
        - $ref: '#/components/parameters/callback'
        - $ref: '#/components/parameters/callback_method'
      requestBody:
        description: >-
          Text content to convert to speech.
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SpeakRequest'
      responses:
        '200':
          description: Speech audio generated successfully
          content:
            audio/wav:
              schema:
                type: string
                format: binary
                description: >-
                  Generated speech audio in WAV format.
            audio/mpeg:
              schema:
                type: string
                format: binary
                description: >-
                  Generated speech audio in MP3 format.
            audio/opus:
              schema:
                type: string
                format: binary
                description: >-
                  Generated speech audio in Opus format.
            audio/flac:
              schema:
                type: string
                format: binary
                description: >-
                  Generated speech audio in FLAC format.
            audio/aac:
              schema:
                type: string
                format: binary
                description: >-
                  Generated speech audio in AAC format.
          headers:
            x-request-id:
              schema:
                type: string
              description: >-
                Unique identifier for the request.
            content-type:
              schema:
                type: string
              description: >-
                MIME type of the returned audio.
        '400':
          description: Bad request due to invalid parameters or text content
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '401':
          description: Unauthorized due to missing or invalid API key
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '402':
          description: Insufficient credits
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Deepgram API key passed as a bearer token in the Authorization header.
  parameters:
    model:
      name: model
      in: query
      description: >-
        Voice model to use for speech synthesis. Aura model family voices
        include aura-asteria-en, aura-luna-en, aura-stella-en, aura-athena-en,
        aura-hera-en, aura-orion-en, aura-arcas-en, aura-perseus-en,
        aura-angus-en, aura-orpheus-en, aura-helios-en, and aura-zeus-en.
      schema:
        type: string
        default: aura-asteria-en
    encoding:
      name: encoding
      in: query
      description: >-
        Audio encoding format for the output. Options include linear16,
        mulaw, alaw, mp3, opus, flac, and aac.
      schema:
        type: string
        enum:
          - linear16
          - mulaw
          - alaw
          - mp3
          - opus
          - flac
          - aac
        default: linear16
    container:
      name: container
      in: query
      description: >-
        File format container for the output audio. The default depends on
        the encoding selected; the schema default, wav, corresponds to the
        default linear16 encoding.
      schema:
        type: string
        enum:
          - wav
          - ogg
          - none
        default: wav
    sample_rate:
      name: sample_rate
      in: query
      description: >-
        Sample rate in Hertz for the output audio. Default is 24000. Valid
        values depend on the selected encoding.
      schema:
        type: integer
        default: 24000
    bit_rate:
      name: bit_rate
      in: query
      description: >-
        Bit rate for compressed audio formats such as MP3 and Opus.
      schema:
        type: integer
    callback:
      name: callback
      in: query
      description: >-
        URL to which Deepgram will send the audio when processing is complete.
      schema:
        type: string
        format: uri
    callback_method:
      name: callback_method
      in: query
      description: >-
        HTTP method for the callback request.
      schema:
        type: string
        enum:
          - POST
          - PUT
        default: POST
  schemas:
    SpeakRequest:
      type: object
      required:
        - text
      properties:
        text:
          type: string
          description: >-
            Text content to convert to speech.
          minLength: 1
    Error:
      type: object
      properties:
        err_code:
          type: string
          description: >-
            Error code identifying the type of error.
        err_msg:
          type: string
          description: >-
            Human-readable error message.
        request_id:
          type: string
          description: >-
            Unique identifier for the request that produced the error.
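
The responses defined above can be handled along the following lines. This is a minimal sketch, not an official client: the function name `handle_speak_response` and the file extensions are assumptions for illustration, while the media types and the `err_code`, `err_msg`, and `request_id` fields come from the specification.

```python
import json

# Media types listed for the 200 response, mapped to file extensions.
# The extensions are a naming assumption, not part of the spec.
EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/opus": ".opus",
    "audio/flac": ".flac",
    "audio/aac": ".aac",
}


def handle_speak_response(status, content_type, body):
    """Return (file_extension, audio_bytes) on 200; raise on any other status."""
    if status == 200:
        # Drop any parameters such as "; charset=..." before the lookup.
        media_type = content_type.split(";")[0].strip().lower()
        return EXTENSIONS.get(media_type, ".bin"), body
    # 400, 401, and 402 carry a JSON body following the Error schema.
    try:
        error = json.loads(body)
    except ValueError:
        error = {}
    if not isinstance(error, dict):
        error = {}
    raise RuntimeError(
        f"HTTP {status}: {error.get('err_code', 'UNKNOWN')}: "
        f"{error.get('err_msg', '')} (request_id={error.get('request_id')})"
    )
```

Keeping the `request_id` in the raised message makes it easier to quote the failing request when contacting support.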