Lambda Inference API
Lambda Inference API is an OpenAI-compatible REST gateway at https://api.lambda.ai/v1 that serves hosted open-source language models (Llama, DeepSeek, Hermes, Qwen, and others) behind the standard OpenAI Chat Completions surface. Chat completion responses can be streamed as HTTP Server-Sent Events by setting "stream":true on the POST /chat/completions request body; the SSE stream emits chat.completion.chunk events terminated by a data [DONE] sentinel. As of 2026-05-29 Lambda has announced the Inference API is winding down in favor of customer self-hosted deployments on Lambda GPU instances.