Scalable Inference Serving

Ray Serve REST API

Ray Serve is a scalable model serving library built on Ray, designed for building online inference APIs. It supports composable deployments, autoscaling, HTTP ingress, gRPC, WebSockets, and request batching, and it integrates with any ML framework. The Ray Serve dashboard and REST API manage deployments, replicas, routes, and application status.
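As a rough illustration of reading application status over the REST API, the sketch below fetches the Serve instance details and reduces them to a per-application summary. It is a minimal sketch under stated assumptions, not a definitive client: the dashboard address (`http://localhost:8265`), the endpoint path (`/api/serve/applications/`), and the response field names (`applications`, `deployments`, `replicas`, `status`, `route_prefix`) follow recent Ray documentation and should be checked against the Ray version in use. The helper names `fetch_serve_details` and `summarize_applications` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the Ray dashboard listens on port 8265 on the head node.
DASHBOARD_URL = "http://localhost:8265"
# Assumption: path of the Serve REST endpoint; verify against your Ray version.
APPLICATIONS_PATH = "/api/serve/applications/"


def fetch_serve_details(dashboard_url: str = DASHBOARD_URL, timeout: float = 5.0) -> dict:
    """GET the Serve instance details and return the parsed JSON object."""
    url = dashboard_url.rstrip("/") + APPLICATIONS_PATH
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_applications(details: dict) -> dict:
    """Reduce the payload to status, route, and replica count per deployment.

    Missing keys are tolerated because the field names are assumed, not
    guaranteed, for every Ray release.
    """
    summary = {}
    for app_name, app in details.get("applications", {}).items():
        deployments = {
            dep_name: {
                "status": dep.get("status"),
                "replicas": len(dep.get("replicas", [])),
            }
            for dep_name, dep in app.get("deployments", {}).items()
        }
        summary[app_name] = {
            "status": app.get("status"),
            "route_prefix": app.get("route_prefix"),
            "deployments": deployments,
        }
    return summary
```

Against a running cluster, `summarize_applications(fetch_serve_details())` would return one entry per application. Keeping the parsing separate from the HTTP call lets the summary logic be exercised on a saved payload without a live cluster.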


Documentation

Documentation: https://docs.ray.io/en/latest/serve/index.html

Getting Started: https://docs.ray.io/en/latest/serve/getting_started.html

API Reference: https://docs.ray.io/en/latest/serve/api/index.html

Other Resources

GitHub: https://github.com/ray-project/ray

API entry from apis.yml

name: Ray Serve REST API
description: Ray Serve is a scalable model serving library built on Ray, designed for building online
  inference APIs. Supports composable deployments, autoscaling, HTTP ingress, gRPC, WebSockets, and request
  batching. Integrates with any ML framework. The Ray Serve dashboard and REST API manage deployments,
  replicas, routes, and application status.
image: https://www.ray.io/favicon.ico
humanUrl: https://docs.ray.io/en/latest/serve/index.html
baseUrl: https://ray-serve.example.com
tags:
- Autoscaling
- Inference
- Machine Learning
- Model Serving
- Open Source
- Python
- Ray
properties:
- type: Documentation
  url: https://docs.ray.io/en/latest/serve/index.html
- type: GitHub
  url: https://github.com/ray-project/ray
- type: GettingStarted
  url: https://docs.ray.io/en/latest/serve/getting_started.html
- type: APIReference
  url: https://docs.ray.io/en/latest/serve/api/index.html
contact:
- type: Community
  url: https://discuss.ray.io/
- type: GitHub Issues
  url: https://github.com/ray-project/ray/issues