Cohere Tokenize API

The Cohere Tokenize API splits input text into tokens using the tokenizer associated with a specified model. It returns both the token strings and their corresponding token IDs. This is useful for understanding how text will be processed by Cohere models, estimating token counts for billing purposes, and debugging input formatting issues.
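
As a minimal sketch of a call to this endpoint, the Python snippet below builds a request from the fields in the specification on this page: the server URL, the `/v1/tokenize` path, bearer authentication, and the `text` and `model` body fields. The helper names `build_tokenize_request` and `send_request` are hypothetical, and the length check assumes that the specification's 1 to 65536 character limit counts Unicode code points, as Python's `len` does.

```python
import json
import urllib.request

# Server URL and path as listed in the specification on this page.
API_URL = "https://api.cohere.com/v1/tokenize"
# Mirrors minLength/maxLength of `text` in the TokenizeRequest schema.
MIN_TEXT_LENGTH = 1
MAX_TEXT_LENGTH = 65536


def build_tokenize_request(text: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like TokenizeRequest.

    Hypothetical helper for illustration; not part of any Cohere SDK.
    """
    if not MIN_TEXT_LENGTH <= len(text) <= MAX_TEXT_LENGTH:
        raise ValueError(
            f"text must be {MIN_TEXT_LENGTH}-{MAX_TEXT_LENGTH} characters, got {len(text)}"
        )
    if not model:
        raise ValueError("model is required")

    body = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # bearerAuth security scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A caller would pass a real API key to `build_tokenize_request` and hand the result to `send_request`. Validating the length locally avoids a round trip that would otherwise end in a 400 response.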

OpenAPI Specification

cohere-tokenize-api-openapi.yml
openapi: 3.1.0
info:
  title: Cohere Tokenize API
  description: >-
    The Cohere Tokenize API splits input text into tokens using the
    tokenizer associated with a specified model. It returns both the token
    strings and their corresponding token IDs. This is useful for
    understanding how text will be processed by Cohere models, estimating
    token counts for billing purposes, and debugging input formatting issues.
  version: '1.0'
  contact:
    name: Cohere Support
    url: https://support.cohere.com
  termsOfService: https://cohere.com/terms-of-use
externalDocs:
  description: Cohere Tokenize API Documentation
  url: https://docs.cohere.com/reference/tokenize
servers:
  - url: https://api.cohere.com
    description: Cohere Production Server
tags:
  - name: Tokenize
    description: >-
      Endpoints for splitting text into tokens using byte-pair encoding for
      a specified Cohere model.
security:
  - bearerAuth: []
paths:
  /v1/tokenize:
    post:
      operationId: tokenize
      summary: Tokenize text
      description: >-
        Splits input text into smaller units called tokens using byte-pair
        encoding (BPE) for the specified model. Returns both the token
        strings and their corresponding integer token IDs. The text must be
        between 1 and 65536 characters.
      tags:
        - Tokenize
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TokenizeRequest'
      responses:
        '200':
          description: Successful tokenization response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TokenizeResponse'
        '400':
          description: Bad request due to invalid parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '401':
          description: Unauthorized due to missing or invalid API key
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Bearer authentication using a Cohere API key.
  schemas:
    TokenizeRequest:
      type: object
      required:
        - text
        - model
      properties:
        text:
          type: string
          description: >-
            The string to be tokenized. Must be between 1 and 65536
            characters.
          minLength: 1
          maxLength: 65536
        model:
          type: string
          description: >-
            The name of the model whose tokenizer will be used to tokenize
            the input text.
          example: command-r-plus
    TokenizeResponse:
      type: object
      properties:
        tokens:
          type: array
          description: >-
            An array of integer token IDs resulting from tokenizing the input
            text.
          items:
            type: integer
        token_strings:
          type: array
          description: >-
            An array of token strings, one for each token ID in `tokens`.
          items:
            type: string
        meta:
          type: object
          description: >-
            Metadata about the API request.
          properties:
            api_version:
              type: object
              properties:
                version:
                  type: string
                  description: >-
                    The API version used for the request.
    Error:
      type: object
      properties:
        message:
          type: string
          description: >-
            A human-readable error message describing what went wrong.
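
To illustrate how a client might consume a response shaped like the schemas in this specification, here is a rough Python sketch that pairs each token ID with its string and counts tokens. The function name `pair_tokens` is hypothetical, the sample payload uses made-up IDs rather than real tokenizer output, and the check that `tokens` and `token_strings` have equal length is an assumption drawn from the description of the two fields as corresponding to each other.

```python
from typing import Any


def pair_tokens(payload: dict[str, Any]) -> list[tuple[int, str]]:
    """Pair token IDs with token strings from a TokenizeResponse-like dict.

    Hypothetical helper for illustration; not part of any Cohere SDK.
    """
    # An Error-shaped body carries only a `message` field.
    if "tokens" not in payload and "message" in payload:
        raise RuntimeError(f"API error: {payload['message']}")

    tokens = payload.get("tokens", [])
    token_strings = payload.get("token_strings", [])

    if not all(isinstance(t, int) for t in tokens):
        raise TypeError("tokens must be integers")
    if not all(isinstance(s, str) for s in token_strings):
        raise TypeError("token_strings must be strings")
    # Assumption: the two arrays are parallel, one string per ID.
    if len(tokens) != len(token_strings):
        raise ValueError("tokens and token_strings differ in length")

    return list(zip(tokens, token_strings))


# Illustrative payload only: the IDs and the split are invented.
sample = {
    "tokens": [101, 202],
    "token_strings": ["token", "ize"],
    "meta": {"api_version": {"version": "1"}},
}

pairs = pair_tokens(sample)
token_count = len(pairs)  # usable as a token-count estimate for this input
```

The length of the returned list gives the token count for the input, which is the figure relevant to the token-count estimation use case described at the top of this page.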