NVIDIA NIM Completions API

Legacy OpenAI-compatible text completion endpoint (/v1/completions) for non-chat foundation models served by NIM. Accepts a raw prompt and returns generated text with the same streaming, sampling, and stopping-criterion controls as the chat endpoint.

NVIDIA NIM Completions API is one of 10 APIs that NVIDIA NIM publishes on the APIs.io network, described by a machine-readable OpenAPI specification.

This API exposes 1 machine-runnable capability that can be deployed as REST, MCP, or Agent Skill surfaces via Naftiko.

Tagged areas include AI, Artificial Intelligence, Completions, LLM, and OpenAI Compatible. The published artifact set on APIs.io includes API documentation, an OpenAPI specification, and 1 Naftiko capability spec.

OpenAPI Specification

nvidia-nim-completions-api-openapi.yml Raw ↑
openapi: 3.1.0
info:
  title: NVIDIA NIM Completions API
  description: >
    Legacy OpenAI-compatible text completion endpoint. Accepts a raw prompt
    (instead of a structured messages array) and returns generated text. Same
    streaming, stop, and sampling controls as the chat endpoint.
  version: '2026-05-25'
  contact:
    name: NVIDIA Developer Support
    url: https://forums.developer.nvidia.com/c/ai-data-science/nemo-llm-service/
  license:
    name: NVIDIA AI Enterprise License
    url: https://www.nvidia.com/en-us/data-center/products/ai-enterprise/
servers:
  - url: https://integrate.api.nvidia.com
    description: NVIDIA-hosted NIM endpoint
  - url: http://localhost:8000
    description: Self-hosted NIM container default
security:
  - BearerAuth: []
tags:
  - name: Completions
    description: Legacy text completion operations
paths:
  /v1/completions:
    post:
      summary: Create A Text Completion
      description: Generate a text completion from a raw prompt against a supported NIM-served model.
      operationId: createCompletion
      tags:
        - Completions
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CompletionRequest'
      responses:
        '200':
          description: Completion response (or SSE stream when stream=true).
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/CompletionResponse'
        '400':
          description: Invalid request.
        '401':
          description: Missing or invalid API key.
        '429':
          description: Rate limit exceeded.
components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: nvapi-...
  schemas:
    CompletionRequest:
      type: object
      required: [model, prompt]
      properties:
        model:
          type: string
        prompt:
          oneOf:
            - type: string
            - type: array
              items:
                type: string
        max_tokens:
          type: integer
          default: 1024
        temperature:
          type: number
          default: 0.2
        top_p:
          type: number
          default: 0.7
        n:
          type: integer
          default: 1
        stream:
          type: boolean
          default: false
        stop:
          oneOf:
            - type: string
            - type: array
              items:
                type: string
        seed:
          type: integer
        frequency_penalty:
          type: number
        presence_penalty:
          type: number
        echo:
          type: boolean
        logprobs:
          type: integer
    CompletionResponse:
      type: object
      properties:
        id:
          type: string
        object:
          type: string
          example: text_completion
        created:
          type: integer
        model:
          type: string
        choices:
          type: array
          items:
            type: object
            properties:
              text:
                type: string
              index:
                type: integer
              finish_reason:
                type: string
        usage:
          type: object
          properties:
            prompt_tokens:
              type: integer
            completion_tokens:
              type: integer
            total_tokens:
              type: integer