Scalable Inference Serving

KServe Open Inference Protocol API

KServe implements the Open Inference Protocol (OIP), also known as the KServe V2 Inference Protocol, which provides a standardized REST and gRPC interface for model inference across frameworks. KServe is a distributed generative and predictive AI inference platform for scalable, multi-framework deployment on Kubernetes. It has been a CNCF incubating project since November 2025 and supports TensorFlow, PyTorch, scikit-learn, XGBoost, ONNX, vLLM, and Hugging Face.
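To make the REST side of the protocol concrete, here is a minimal sketch of how a client might assemble a V2 inference request. The `/v2/models/{name}/infer` path and the `inputs` fields (`name`, `shape`, `datatype`, `data`) follow the Open Inference Protocol; the host `http://localhost:8080`, the model name `my-model`, the tensor name `input-0`, and the helper `build_infer_request` are illustrative assumptions, not part of KServe.

```python
import json


def build_infer_request(model_name, input_name, shape, datatype, data,
                        base_url="http://localhost:8080"):
    """Return (url, body) for an Open Inference Protocol REST inference call.

    base_url is a placeholder; a real deployment would use the
    InferenceService's own host name.
    """
    url = f"{base_url}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,      # tensor name expected by the model
                "shape": list(shape),    # tensor dimensions, e.g. [1, 4]
                "datatype": datatype,    # protocol type string, e.g. "FP32"
                "data": list(data),      # tensor contents, flattened row-major
            }
        ]
    }
    return url, body


# Hypothetical model and tensor names, for illustration only.
url, body = build_infer_request(
    "my-model", "input-0", [1, 4], "FP32", [5.1, 3.5, 1.4, 0.2]
)
print(url)
print(json.dumps(body, indent=2))
```

The body would be sent with an HTTP POST and a `Content-Type: application/json` header. The sketch stops short of the network call so that it runs without a live server.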

Documentation

https://kserve.github.io/website/docs/intro

https://kserve.github.io/website/docs/get_started/

Specifications

https://raw.githubusercontent.com/api-evangelist/scalable-inference-serving/main/openapi/kserve-open-inference-protocol-openapi.yml

Other Resources

https://github.com/kserve/kserve

https://github.com/kserve/kserve/releases

https://kserve.github.io/website/latest/reference/swagger-ui/

NaftikoCapability

https://raw.githubusercontent.com/api-evangelist/scalable-inference-serving/refs/heads/main/capabilities/kserve-open-inference-protocol-health.yaml

https://raw.githubusercontent.com/api-evangelist/scalable-inference-serving/refs/heads/main/capabilities/kserve-open-inference-protocol-inference.yaml

https://raw.githubusercontent.com/api-evangelist/scalable-inference-serving/refs/heads/main/capabilities/kserve-open-inference-protocol-metadata.yaml

https://raw.githubusercontent.com/api-evangelist/scalable-inference-serving/refs/heads/main/capabilities/kserve-open-inference-protocol-models.yaml
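The health and metadata capability names above line up with the protocol's read-only GET endpoints. The sketch below lists those REST paths for a given model. The paths come from the Open Inference Protocol itself; the helper name `oip_status_paths` and the model name `my-model` are hypothetical, and the contents of the capability files are not assumed here.

```python
def oip_status_paths(model_name, version=None):
    """Return the protocol's GET paths for health and metadata checks.

    The optional version segment is part of the protocol; when it is
    omitted, the server chooses which model version to use.
    """
    model = f"/v2/models/{model_name}"
    if version is not None:
        model += f"/versions/{version}"
    return {
        "server_live": "/v2/health/live",    # is the server process up?
        "server_ready": "/v2/health/ready",  # can the server take requests?
        "model_ready": f"{model}/ready",     # is this model loaded and ready?
        "server_metadata": "/v2",            # server name, version, extensions
        "model_metadata": model,             # model inputs, outputs, platform
    }


# Hypothetical model name, for illustration only.
paths = oip_status_paths("my-model", version="1")
for label, path in paths.items():
    print(f"{label}: GET {path}")
```

A client would typically poll `model_ready` before sending inference requests, and read `model_metadata` to discover the tensor names, shapes, and datatypes the model expects.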

OpenAPI Specification