NVIDIA Triton Inference Server HTTP API
NVIDIA Triton Inference Server is open-source inference serving software that implements the KServe Open Inference Protocol (V2) over HTTP/REST and gRPC. It supports TensorRT, ONNX Runtime, TensorFlow, PyTorch, and Python backends, and provides dynamic batching, model ensembles, and inference on both GPUs and CPUs; a companion tool, Model Analyzer, helps choose model configurations. It is widely used in production ML pipelines where high throughput matters.
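A minimal sketch of a V2 inference call over HTTP, using only the Python standard library rather than Triton's own client package. It assumes a server on Triton's default HTTP port (8000) on localhost. The JSON layout (an "inputs" list of tensors with name, shape, datatype, and data, and an "outputs" list in the response) follows the KServe V2 protocol; the helper function names are illustrative, not part of any Triton API.

```python
import json
import math
import urllib.request

# Assumption: Triton is reachable on its default HTTP port (8000) on localhost.
TRITON_URL = "http://localhost:8000"


def build_infer_request(input_name, data, shape, datatype="FP32", output_names=()):
    """Build a KServe V2 inference request body for a single input tensor.

    `data` is a flat list in row-major order; its length must match `shape`.
    """
    if len(data) != math.prod(shape):
        raise ValueError(f"{len(data)} values do not fill shape {list(shape)}")
    body = {
        "inputs": [
            {"name": input_name, "shape": list(shape), "datatype": datatype, "data": list(data)}
        ]
    }
    if output_names:
        # Request only these outputs; without an "outputs" field the server returns all of them.
        body["outputs"] = [{"name": name} for name in output_names]
    return body


def infer(model_name, body, base_url=TRITON_URL, timeout=10.0):
    """POST the body to /v2/models/<model_name>/infer and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def outputs_by_name(response):
    """Map each output tensor name in a V2 response to its flat data list."""
    return {out["name"]: out["data"] for out in response.get("outputs", [])}
```

With a model loaded, outputs_by_name(infer("my_model", build_infer_request("INPUT0", [1.0, 2.0, 3.0, 4.0], shape=[1, 4]))) would return the output tensors; "my_model" and "INPUT0" are hypothetical names to replace with those in the model's configuration. JSON tensor data is convenient for small requests; for large tensors, Triton's binary tensor data extension is more efficient.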
Documentation
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/
Getting Started
https://github.com/triton-inference-server/tutorials
API Reference
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/inference_protocols.html
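Besides inference, the V2 protocol defines health and metadata endpoints. A minimal standard-library sketch that builds those URLs for one model and polls a readiness endpoint; the function names are illustrative, and treating any connection or HTTP error as "not ready" is a simplifying assumption.

```python
import urllib.request


def v2_endpoints(base_url, model_name, version=None):
    """Return the main KServe V2 HTTP endpoint URLs for one model."""
    base = base_url.rstrip("/")
    model = f"{base}/v2/models/{model_name}"
    if version is not None:
        # The version segment is optional; without it the server picks the version per its policy.
        model += f"/versions/{version}"
    return {
        "server_live": f"{base}/v2/health/live",
        "server_ready": f"{base}/v2/health/ready",
        "server_metadata": f"{base}/v2",
        "model_ready": f"{model}/ready",
        "model_metadata": model,
        "infer": f"{model}/infer",
    }


def is_ready(url, timeout=2.0):
    """Return True if a GET on a health endpoint answers HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures, timeouts, and HTTPError (non-2xx), all OSError subclasses.
        return False
```

A deployment script might loop on is_ready(v2_endpoints("http://localhost:8000", "my_model")["model_ready"]) before sending traffic, where "my_model" is a hypothetical model name.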