NVIDIA NIM

NVIDIA NIM is a set of inference microservices for streamlined AI model deployment, prebuilt and optimized for low-latency, high-throughput inference on NVIDIA-accelerated infrastructure. Includes TensorRT and TensorRT-LLM-backed engines and exposes stable OpenAI-compatible APIs for self-hosted and cloud deployment.

API entry from apis.yml

apis.yml Raw ↑
aid: ai-gateway:nvidia-nim
name: NVIDIA NIM
description: NVIDIA NIM is a set of inference microservices for streamlined AI model deployment, prebuilt
  and optimized for low-latency, high-throughput inference on NVIDIA-accelerated infrastructure. Includes
  TensorRT and TensorRT-LLM-backed engines and exposes stable OpenAI-compatible APIs for self-hosted and
  cloud deployment.
humanURL: https://www.nvidia.com/en-us/ai/
baseURL: https://build.nvidia.com
tags:
- AI Gateway
- Inference
- Self-Hosted
- NVIDIA
- GPU
properties:
- type: Portal
  url: https://build.nvidia.com/
- type: Documentation
  url: https://docs.nvidia.com/nim/
- type: GitHubOrganization
  url: https://github.com/NVIDIA
x-deployment:
- self-host
- cloud
x-license: Proprietary