TrueFoundry Model Serving API

TrueFoundry's Model Serving capability enables deployment and management of LLM and embedding models using backends like vLLM and Triton on Kubernetes infrastructure. It provides APIs for deploying models from a community registry of 1000+ configurations, managing inference endpoints, and controlling autoscaling behavior including scale-to-zero.
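To make the shape of such a deployment concrete, here is a minimal sketch of what a model-serving request payload against the base URL above might look like. The endpoint path, field names, and values are assumptions for illustration only (they are not taken from TrueFoundry's documented schema); the sketch simply builds and prints the spec rather than sending it.

```python
import json

# Base URL from the catalog entry; the path below is a hypothetical example.
BASE_URL = "https://app.truefoundry.com"

# Hypothetical deployment spec illustrating the concepts in the description:
# backend selection (vLLM/Triton) and autoscaling with scale-to-zero.
spec = {
    "name": "llama-3-8b-instruct",   # example model name, not from the source
    "backend": "vllm",               # or "triton"
    "autoscaling": {
        "min_replicas": 0,           # 0 enables scale-to-zero when idle
        "max_replicas": 4,
    },
}

payload = json.dumps(spec, indent=2)
print(payload)

# An actual deployment would POST a spec like this to an authenticated
# endpoint, e.g. (hypothetical path, do not rely on it):
#   requests.post(f"{BASE_URL}/api/.../deployments", json=spec,
#                 headers={"Authorization": "Bearer <token>"})
```

Consult the documentation URL in the entry below for the real deployment schema and endpoints.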

API entry from apis.yml

apis.yml
aid: truefoundry:truefoundry-model-serving-api
name: TrueFoundry Model Serving API
description: TrueFoundry's Model Serving capability enables deployment and management of LLM and embedding
  models using backends like vLLM and Triton on Kubernetes infrastructure. It provides APIs for deploying
  models from a community registry of 1000+ configurations, managing inference endpoints, and controlling
  autoscaling behavior including scale-to-zero.
humanURL: https://www.truefoundry.com/docs/introduction-to-a-service
baseURL: https://app.truefoundry.com
tags:
- Kubernetes
- LLM Inference
- MLOps
- Model Serving
properties:
- type: Documentation
  url: https://www.truefoundry.com/docs/introduction-to-a-service