vLLM OpenAI-Compatible Server

OpenAI-compatible REST API exposed by `vllm serve`. Endpoints include /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/score, /v1/audio/transcriptions, /v1/audio/translations, /v1/realtime (WebSocket), /tokenize, /detokenize, and /generative_scoring. Authentication is handled by the --api-key flag passed at server start; clients can use the official OpenAI Python library unmodified, passing vLLM-specific extensions via extra_body.
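A minimal sketch of what such a request looks like on the wire. The model name, API key, and the `top_k` sampling parameter below are placeholder assumptions, not values from this entry; the request is built but not sent, since a running server is required:

```python
import json
import urllib.request

# Build a chat-completions request against a local vLLM server.
# "my-model" and "my-key" are placeholders for whatever `vllm serve`
# was started with.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    # vLLM-specific extensions ride alongside the standard OpenAI fields;
    # with the official OpenAI Python client they would go in `extra_body`.
    "top_k": 20,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer my-key",  # must match the server's --api-key
    },
    method="POST",
)
# urllib.request.urlopen(req) would submit it to a running server.
```

With the OpenAI client the same call is `client.chat.completions.create(..., extra_body={"top_k": 20})` against `base_url="http://localhost:8000/v1"`.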

API entry from apis.yml

```yaml
aid: vllm:openai-compatible
name: vLLM OpenAI-Compatible Server
description: OpenAI-compatible REST API exposed by `vllm serve`. Endpoints include /v1/chat/completions,
  /v1/completions, /v1/embeddings, /v1/score, /v1/audio/transcriptions, /v1/audio/translations, /v1/realtime
  (WebSocket), /tokenize, /detokenize, and /generative_scoring. Authentication via the --api-key flag
  set on server start; clients can use the official OpenAI Python library unmodified, with vLLM-specific
  extensions passed via extra_body.
humanURL: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
baseURL: http://localhost:8000/v1
tags:
- Chat
- Completions
- Embeddings
- Audio
- Score
- OpenAI-Compatible
properties:
- type: Documentation
  url: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
- type: GitHub
  url: https://github.com/vllm-project/vllm
- type: OpenAICompat
  url: https://platform.openai.com/docs/api-reference
```
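Note that /tokenize and /detokenize are vLLM additions outside the OpenAI surface and, per the endpoint list above, sit at the server root rather than under /v1. A small sketch of their request shapes, with a placeholder model name and example token ids (requests are prepared, not sent):

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # /tokenize and /detokenize are not under /v1


def build(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (without sending) a POST request to a vLLM utility endpoint."""
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Tokenize a prompt; detokenize a list of token ids back to text.
tok = build("/tokenize", {"model": "my-model", "prompt": "Hello, world"})
detok = build("/detokenize", {"model": "my-model", "tokens": [1, 2, 3]})
# A running server answers with the token ids / the reconstructed text.
```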