NVIDIA NIM Health API

Liveness and readiness endpoints exposed by self-hosted NIM containers (/v1/health/live, /v1/health/ready), which back the Kubernetes liveness, readiness, and startup probes, plus a Prometheus /v1/metrics scrape endpoint for GPU utilization, request latency, and queue depth. The NIM Operator uses these endpoints to drive Kubernetes pod lifecycle and HPA scaling.
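A minimal Python sketch of a client-side readiness wait, assuming the container is reachable at the spec's default server URL (http://localhost:8000). It treats a 503 or a refused connection from /v1/health/ready as "model still loading" and returns the parsed HealthStatus body once a 200 arrives. The function names wait_until_ready and http_get, the injectable fetch and sleep parameters, and the timeout defaults are illustrative assumptions, not part of NIM.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical helper type: a fetcher takes a URL and returns (HTTP status, body text).
Fetcher = Callable[[str], Tuple[int, str]]


def http_get(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """GET a URL and return (status, body), including error statuses such as 503."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; for /v1/health/ready a 503 just means "not ready yet".
        return err.code, ""


def wait_until_ready(
    base_url: str = "http://localhost:8000",
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    fetch: Fetcher = http_get,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll /v1/health/ready until it returns 200.

    Returns the parsed HealthStatus body, or None if no 200 arrives within
    roughly timeout_s (attempts are counted, not wall-clock timed).
    """
    attempts = max(1, int(timeout_s // interval_s) + 1)
    url = f"{base_url}/v1/health/ready"
    for attempt in range(attempts):
        try:
            status, body = fetch(url)
        except OSError:
            # Covers refused connections and socket timeouts while the server starts.
            status, body = 0, ""
        if status == 200:
            try:
                return json.loads(body)  # HealthStatus fields: "message", "object"
            except json.JSONDecodeError:
                return {}  # ready, but the body was not the expected JSON
        if attempt < attempts - 1:
            sleep(interval_s)  # wait before the next attempt, not after the last one
    return None
```

Inside Kubernetes the kubelet does this polling itself, driven by the probe configuration; a loop like this mainly helps CI jobs or scripts that start a container and must wait for the model to load before sending requests. Injecting fetch and sleep keeps the logic testable without a live container.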

NVIDIA NIM Health API is one of 10 APIs that NVIDIA NIM publishes on the APIs.io network and is described by a machine-readable OpenAPI specification.

This API exposes one machine-runnable capability that can be deployed as a REST, MCP, or Agent Skill surface via Naftiko.

Tagged areas include Health, Observability, and Kubernetes. The published artifact set on APIs.io includes API documentation, an OpenAPI specification, and one Naftiko capability spec.

OpenAPI Specification

nvidia-nim-health-api-openapi.yml
openapi: 3.1.0
info:
  title: NVIDIA NIM Health API
  description: >
    Liveness, readiness, and metrics endpoints exposed by every self-hosted
    NIM container on port 8000. The NIM Operator uses these for Kubernetes
    probes; Prometheus scrapes /v1/metrics for GPU utilization, request
    latency, queue depth, and per-engine counters.
  version: '2026-05-25'
  contact:
    name: NVIDIA Developer Support
    url: https://forums.developer.nvidia.com/c/ai-data-science/nemo-llm-service/
  license:
    name: NVIDIA AI Enterprise License
    url: https://www.nvidia.com/en-us/data-center/products/ai-enterprise/
servers:
  - url: http://localhost:8000
    description: Self-hosted NIM container default
tags:
  - name: Health
    description: Liveness, readiness, and metrics probes
paths:
  /v1/health/live:
    get:
      summary: Liveness Probe
      description: Returns 200 OK if the container process is alive. Used as the Kubernetes livenessProbe.
      operationId: getLiveness
      tags:
        - Health
      responses:
        '200':
          description: Container is alive.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthStatus'
        '503':
          description: Container is unhealthy and should be restarted.
  /v1/health/ready:
    get:
      summary: Readiness Probe
      description: Returns 200 OK only once the model engine has loaded and the container can accept traffic.
      operationId: getReadiness
      tags:
        - Health
      responses:
        '200':
          description: Ready to serve.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthStatus'
        '503':
          description: Not ready yet (e.g. model still loading).
  /v1/metrics:
    get:
      summary: Prometheus Metrics
      description: Metrics in the Prometheus text exposition format, including GPU utilization, request latency histograms, queue depth, and engine-specific counters.
      operationId: getMetrics
      tags:
        - Health
      responses:
        '200':
          description: Prometheus metrics payload.
          content:
            text/plain:
              schema:
                type: string
components:
  schemas:
    HealthStatus:
      type: object
      properties:
        message:
          type: string
          example: Service is live.
        object:
          type: string
          example: health-response
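The /v1/metrics payload is plain Prometheus text, so a script can read it without a client library. This is a minimal sketch: it parses sample lines into (name, labels, value) tuples and then applies the core Kubernetes HPA rule, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), with the default 0.1 tolerance. The function names are hypothetical, and metric names such as num_requests_waiting are placeholders; check a running container's /v1/metrics output for the real names. The parser is simplified: it does not unescape label values or handle a "}" inside a label value, and the replica calculation omits HPA stabilization windows and min/max replica bounds.

```python
import math
import re
from typing import Dict, List, Tuple

# One sample line: metric_name{label="value",...} value [timestamp_ms]
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+-?\d+)?\s*$"
)
# label_name="value" pairs; escaped quotes are matched but left escaped.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_prometheus_text(payload: str) -> List[Sample]:
    """Parse Prometheus text exposition lines into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for raw in payload.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        try:
            # float() accepts the format's "+Inf", "-Inf", and "NaN" spellings.
            value = float(match.group("value"))
        except ValueError:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, value))
    return samples


def metric_total(samples: List[Sample], name: str) -> float:
    """Sum every sample of one metric across all label sets."""
    return sum(value for metric, _, value in samples if metric == name)


def desired_replicas(current_replicas: int, current_avg: float,
                     target_avg: float, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current * current_avg / target_avg), skipped inside the tolerance band."""
    ratio = current_avg / target_avg
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)
```

For example, two pods reporting a combined queue depth of 5 against a target of 2 per pod give an average of 2.5, a ratio of 1.25, and ceil(2 * 1.25) = 3 replicas.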