NVIDIA Triton Inference Server HTTP API

NVIDIA Triton Inference Server is open-source inference serving software that implements the KServe Open Inference Protocol (V2). It supports TensorRT, ONNX, TensorFlow, PyTorch, and Python backends, and provides dynamic batching, model ensembles, a Model Analyzer tool, and GPU/CPU inference. It is widely used in production ML pipelines that require high throughput.
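Because the server implements the KServe Open Inference Protocol (V2), a client mostly needs to build the right URL and JSON body. The sketch below is a minimal illustration using only the Python standard library. The host is the placeholder `baseUrl` from this entry, and the model name `my_model`, tensor name `INPUT0`, shape, and datatype are hypothetical values chosen for the example; a real deployment would take them from the model's configuration.

```python
import json
from urllib.parse import quote

# Placeholder host from this catalog entry; replace with a real server address.
BASE_URL = "https://triton.example.com"


def ready_url(base_url):
    """URL of the V2 server readiness check (GET)."""
    return f"{base_url.rstrip('/')}/v2/health/ready"


def infer_url(base_url, model_name, version=None):
    """URL of the V2 inference endpoint (POST) for a model, optionally pinned to a version."""
    path = f"/v2/models/{quote(model_name, safe='')}"
    if version is not None:
        path += f"/versions/{quote(str(version), safe='')}"
    return f"{base_url.rstrip('/')}{path}/infer"


def infer_body(input_name, shape, datatype, data):
    """JSON request body carrying a single input tensor with its data given as a flat list."""
    payload = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical model and tensor names, for illustration only.
    print(ready_url(BASE_URL))
    print(infer_url(BASE_URL, "my_model", version=1))
    print(infer_body("INPUT0", [1, 4], "FP32", [1.0, 2.0, 3.0, 4.0]))
```

The helpers only construct the request and do not send it, since the host above is a placeholder. To call a live server, the body would be POSTed to the inference URL with a `Content-Type: application/json` header, for example via `urllib.request`.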

API entry from apis.yml

name: NVIDIA Triton Inference Server HTTP API
description: NVIDIA Triton Inference Server is open-source inference serving software that implements
  the KServe Open Inference Protocol (V2). It supports TensorRT, ONNX, TensorFlow, PyTorch, and Python backends,
  and provides dynamic batching, model ensembles, a Model Analyzer tool, and GPU/CPU inference. It is widely used
  in production ML pipelines that require high throughput.
image: https://developer.nvidia.com/favicon.ico
humanUrl: https://developer.nvidia.com/triton-inference-server
baseUrl: https://triton.example.com
tags:
- GPU
- Inference
- Model Serving
- NVIDIA
- Open Source
- TensorRT
- Triton
properties:
- type: Documentation
  url: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/
- type: GitHub
  url: https://github.com/triton-inference-server/server
- type: GettingStarted
  url: https://github.com/triton-inference-server/tutorials
- type: APIReference
  url: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/inference_protocols.html
contact:
- type: GitHub Issues
  url: https://github.com/triton-inference-server/server/issues
- type: Forums
  url: https://forums.developer.nvidia.com/c/ai-data-science/deep-learning/triton-inference-server/