BentoML REST API

BentoML is an open-source unified inference platform for deploying and scaling AI models. It auto-generates RESTful APIs from Python service definitions, provides built-in OpenAPI/Swagger documentation, supports adaptive batching, and integrates with KServe for Kubernetes deployment. BentoML 1.0 introduced the Runner abstraction, which parallelizes inference workloads with adaptive batching and lets model inference scale independently of pre- and post-processing.
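To make the "RESTful APIs from Python service definitions" point concrete, here is a minimal client-side sketch using only the Python standard library. It assumes the common BentoML 1.x convention that each service API function is exposed as a POST route named after the function. The route name `classify` and the input row are hypothetical, and the base URL is the placeholder value from the entry. The helper `build_inference_request` is illustrative and not part of BentoML.

```python
import json
import urllib.request

# Assumptions (not stated in the entry): the route name "classify" and the
# input shape are hypothetical; BASE_URL is the placeholder baseUrl from apis.yml.
BASE_URL = "https://api.bentoml.example.com"


def build_inference_request(base_url, api_name, payload):
    """Build (but do not send) a JSON POST request for one service API route."""
    url = f"{base_url.rstrip('/')}/{api_name}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One input row for a hypothetical classifier endpoint.
request = build_inference_request(BASE_URL, "classify", [[5.9, 3.0, 5.1, 1.8]])
print(request.get_method(), request.full_url)
```

Against a real deployment, the request could be sent with `urllib.request.urlopen(request)`. The sketch stops short of that because the base URL here is only a placeholder.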

API entry from apis.yml

name: BentoML REST API
description: BentoML is an open-source unified inference platform for deploying and scaling AI models.
  It auto-generates RESTful APIs from Python service definitions, provides built-in OpenAPI/Swagger documentation,
  supports adaptive batching, and integrates with KServe for Kubernetes deployment. BentoML 1.0 introduced
  the Runner abstraction, which parallelizes inference workloads with adaptive batching and lets model
  inference scale independently of pre- and post-processing.
image: https://www.bentoml.com/favicon.ico
humanUrl: https://www.bentoml.com/
baseUrl: https://api.bentoml.example.com
tags:
- Batching
- Inference
- Model Serving
- Open Source
- Python
- REST API
properties:
- type: Documentation
  url: https://docs.bentoml.com/en/latest/
- type: GitHub
  url: https://github.com/bentoml/BentoML
- type: GettingStarted
  url: https://docs.bentoml.com/en/latest/get-started/quickstart.html
- type: Pricing
  url: https://www.bentoml.com/pricing
- type: APIReference
  url: https://docs.bentoml.com/en/latest/reference/index.html
contact:
- type: Community
  url: https://l.bentoml.com/join-slack
- type: GitHub Issues
  url: https://github.com/bentoml/BentoML/issues