Convai Streaming Transcription API

Real-time speech-to-text over WebSocket (wss://transcribe.convai.com/stream). Stream 16-bit PCM, mono, 16 kHz audio and receive incremental transcripts. Used by the Convai plugins to drive low-latency voice input from players.
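As a rough illustration of the audio format above, the sketch below splits raw 16-bit, mono, 16 kHz PCM into fixed-duration chunks suitable for sending as binary WebSocket frames. The 20 ms frame length and the helper names `frame_size_bytes` and `iter_pcm_frames` are assumptions made for this example; the source does not state a required or recommended frame size.

```python
from typing import Iterator

# Audio format taken from the API description.
SAMPLE_RATE_HZ = 16_000  # 16 kHz
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono


def frame_size_bytes(frame_ms: int) -> int:
    """Number of bytes of PCM audio covering `frame_ms` milliseconds."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * frame_ms // 1000


def iter_pcm_frames(pcm: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive chunks of `pcm`, each at most `frame_ms` long.

    The 20 ms default is an assumption for illustration, not a documented
    requirement. The final chunk may be shorter than the others.
    """
    size = frame_size_bytes(frame_ms)
    if size <= 0:
        raise ValueError("frame_ms must be a positive number of milliseconds")
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

At this format one second of audio is 32,000 bytes, so a 20 ms chunk is 640 bytes. Each chunk would be sent as one binary message by whichever WebSocket client is in use.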

Convai Streaming Transcription API is one of 10 APIs that Convai publishes on the APIs.io network, described by a machine-readable OpenAPI specification.

Tagged areas include AI, Speech, Transcription, Streaming, and WebSocket. The published artifact set on APIs.io includes API documentation and an OpenAPI specification.

OpenAPI Specification

convai-streaming-transcription-api-openapi.yml
openapi: 3.1.0
info:
  title: Convai Streaming Transcription API
  version: "1.0"
  description: |
    Real-time speech-to-text over a WebSocket connection. Stream 16-bit PCM,
    mono, 16 kHz audio and receive incremental transcripts. Used by the Convai
    plugins to drive low-latency voice input from players. The endpoint is
    documented here as an OpenAPI definition for discoverability; the live
    transport is wss://transcribe.convai.com/stream.
servers:
- url: https://transcribe.convai.com
security:
- ConvaiApiKey: []
paths:
  /stream:
    get:
      summary: Open Streaming Transcription WebSocket
      operationId: openTranscriptionStream
      tags: [Streaming]
      description: |
        Upgrade to WebSocket. Send 16-bit PCM, mono, 16 kHz audio as binary
        frames. Receive JSON transcript frames with `text`, `is_final`, and
        `confidence`. Authenticate with the `CONVAI-API-KEY` header or the
        `?convai-api-key=` query parameter.
      parameters:
      - in: query
        name: convai-api-key
        schema: { type: string }
        description: Optional alternative to the `CONVAI-API-KEY` header for authentication
      responses:
        '101':
          description: Switching Protocols (WebSocket upgrade)
components:
  securitySchemes:
    ConvaiApiKey:
      type: apiKey
      in: header
      name: CONVAI-API-KEY
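
The definition above names two ways to authenticate and three fields in each transcript frame. The following Python sketch shows how a client might prepare the connection parameters and decode a received frame, using only the standard library. The helper names (`build_connection`, `parse_transcript_frame`, `Transcript`) are hypothetical, and the handling of missing fields is an assumption: the specification lists `text`, `is_final`, and `confidence` but does not say whether every frame carries all three.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Live transport URL as given in the specification's description.
STREAM_URL = "wss://transcribe.convai.com/stream"


def build_connection(api_key: str,
                     use_query_param: bool = False) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for the WebSocket handshake.

    By default the key goes in the CONVAI-API-KEY header; with
    use_query_param=True it goes in the convai-api-key query parameter.
    """
    if use_query_param:
        query = urlencode({"convai-api-key": api_key})
        return f"{STREAM_URL}?{query}", {}
    return STREAM_URL, {"CONVAI-API-KEY": api_key}


@dataclass
class Transcript:
    text: str
    is_final: bool
    confidence: Optional[float]


def parse_transcript_frame(raw: str) -> Transcript:
    """Decode one JSON transcript frame.

    Assumption: missing fields fall back to an empty string, False, and
    None respectively. The real service may behave differently.
    """
    data = json.loads(raw)
    confidence = data.get("confidence")
    return Transcript(
        text=data.get("text", ""),
        is_final=bool(data.get("is_final", False)),
        confidence=float(confidence) if confidence is not None else None,
    )
```

The returned URL and headers can be passed to any WebSocket client that supports custom handshake headers. The query-parameter form is useful where headers cannot be set, such as browser WebSocket clients, though it exposes the key in URLs and logs.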