Generate a chat completion
This endpoint can be used to generate chat completions from a Large Language Model.
It is a simple proxy forwarding your requests to the desired model.
Any LightOn model is deployed on a vLLM-based image.
Response Types:
- When
stream=false(default): Returns a complete JSON response with all completion choices - When
stream=true: Returns Server-Sent Events (SSE) with incremental completion chunks
Streaming Format:
Each SSE event contains a JSON object with incremental text. The stream ends with data: [DONE].
Documentation Index
Fetch the complete documentation index at: https://docs.lighton.ai/llms.txt
Use this file to discover all available pages before exploring further.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
Request serializer for chat completions endpoint (OpenAI-compatible).
Model to use for generating chat completions, must exist and be configured from the admin
List of messages comprising the conversation so far
Maximum number of tokens to generate
Sampling temperature between 0 and 2
Nucleus sampling parameter
Number of chat completion choices to generate
Whether to stream back partial progress
Up to 4 sequences where the API will stop generating further tokens
Penalty for new tokens based on whether they appear in the text so far
Penalty for new tokens based on their existing frequency in the text
Modify the likelihood of specified tokens appearing in the completion
A unique identifier representing your end-user
List of functions the model may call
Controls how the model responds to function calls
Response
Response serializer for chat completions endpoint results.
Unique identifier for the chat completion
Object type, always 'chat.completion'
Unix timestamp of when the chat completion was created
The model used for generating the chat completion
List of chat completion choices generated by the model
Usage statistics for the chat completion request