Text Generation Inference (TGI)

High-performance inference server for large language models with continuous batching, token streaming, tensor parallelism, and OpenAI-compatible chat completions endpoints.

API entry from apis.yml

aid: hugging-face-transformers:text-generation-inference
name: Text Generation Inference (TGI)
description: High-performance inference server for large language models with continuous batching, token
  streaming, tensor parallelism, and OpenAI-compatible chat completions endpoints.
humanURL: https://huggingface.co/docs/text-generation-inference/index
tags:
- Inference Server
- LLM
- Streaming
- Text Generation
properties:
- type: Documentation
  url: https://huggingface.co/docs/text-generation-inference/index
- type: GitHub Repository
  url: https://github.com/huggingface/text-generation-inference
- type: API Reference
  url: https://huggingface.co/docs/text-generation-inference/basic_tutorials/consuming_tgi
- type: Supported Models
  url: https://huggingface.co/docs/text-generation-inference/supported_models
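Because TGI exposes an OpenAI-compatible chat completions endpoint, it can be consumed with a plain HTTP POST. The sketch below builds such a request, assuming a TGI server is already running at `localhost:8080` (a common port in the Docker quick-start); the helper names `build_chat_request` and `send_chat_request` are illustrative, not part of TGI itself.

```python
import json
import urllib.request

# Assumed local TGI endpoint; adjust host/port for your deployment.
TGI_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt, model="tgi", stream=False, max_tokens=128):
    """Build an OpenAI-compatible chat completions payload."""
    return {
        # TGI serves a single model; the field is accepted for compatibility.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, url=TGI_URL):
    """POST the payload to a running TGI server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_chat_request("What is continuous batching?")
print(json.dumps(payload, indent=2))
```

Setting `"stream": true` switches the endpoint to server-sent events, delivering tokens incrementally instead of a single JSON body.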