> ## Documentation Index
> Fetch the complete documentation index at: https://docs.lighton.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Generate a chat completion

> This endpoint can be used to generate chat completions from a Large Language Model.

It is a simple proxy forwarding your requests to the desired model.

Any LightOn model is deployed on a vLLM-based image.

**Response Types:**
- When `stream=false` **(default)**: Returns a complete JSON response with all completion choices
- When `stream=true`: Returns Server-Sent Events (SSE) with incremental completion chunks

**Streaming Format:**

Each SSE event contains a JSON object with incremental text. The stream ends with `data: [DONE]`.



## OpenAPI

````yaml /api-reference/openapi-v3.yaml post /api/v3/chat/completions
openapi: 3.0.3
info:
  title: Paradigm API
  version: xenial-xerus (v3)
  description: >-
    A versatile and adaptable tool designed to integrate Generative AI into your
    applications
servers:
  - url: https://paradigm.lighton.ai
security: []
tags:
  - name: Agents
    description: Operations about agents
  - name: Threads
    description: Operations about agents conversation threads
  - name: Tools
    description: Operations about native tools
  - name: Models
    description: Operations about AI models
  - name: MCP
    description: Operations about MCP servers
  - name: Sources
    description: Operations about sources used by agents conversation threads
  - name: Artifacts
    description: Operations about artifacts generated by agents conversation threads
  - name: Agent
    description: >-
      Operations about agents (deprecated). Please use the 'Agents' API
      component instead.
  - name: Files
    description: Operations about files
  - name: Files Processing
    description: Operations about files processing
  - name: Tags
    description: Operations about tags
  - name: Workspaces
    description: Operations about workspaces
  - name: Users
    description: Operations about users
  - name: User Groups
    description: Operations about user groups
  - name: Companies
    description: Operations about companies
  - name: SCIM
    description: Operations about SCIM
paths:
  /api/v3/chat/completions:
    post:
      tags:
        - Models
      summary: Generate a chat completion
      description: >-
        This endpoint can be used to generate chat completions from a Large
        Language Model.


        It is a simple proxy forwarding your requests to the desired model.


        Any LightOn model is deployed on a vLLM-based image.


        **Response Types:**

        - When `stream=false` **(default)**: Returns a complete JSON response
        with all completion choices

        - When `stream=true`: Returns Server-Sent Events (SSE) with incremental
        completion chunks


        **Streaming Format:**


        Each SSE event contains a JSON object with incremental text. The stream
        ends with `data: [DONE]`.
      operationId: api_v3_chat_completions_create
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatCompletionsRequest'
            examples:
              LightOnModelExample:
                value:
                  model: alfred-4.2
                  messages:
                    - role: system
                      content: You are a helpful assistant.
                    - role: user
                      content: Hello!
                summary: LightOn model example
              StreamingRequestExample:
                value:
                  model: alfred-4.2
                  messages:
                    - role: system
                      content: You are a helpful assistant.
                    - role: user
                      content: Hello!
                  stream: true
                summary: Streaming request example
          application/x-www-form-urlencoded:
            schema:
              $ref: '#/components/schemas/ChatCompletionsRequest'
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/ChatCompletionsRequest'
        required: true
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ChatCompletionsResponse'
              examples:
                LightOnModelExample:
                  value:
                    model: alfred-4.2
                    messages:
                      - role: system
                        content: You are a helpful assistant.
                      - role: user
                        content: Hello!
                  summary: LightOn model example
                StreamingRequestExample:
                  value:
                    model: alfred-4.2
                    messages:
                      - role: system
                        content: You are a helpful assistant.
                      - role: user
                        content: Hello!
                    stream: true
                  summary: Streaming request example
          description: ''
      security:
        - bearerAuth: []
components:
  schemas:
    ChatCompletionsRequest:
      type: object
      description: Request serializer for chat completions endpoint (OpenAI-compatible).
      properties:
        model:
          type: string
          description: >-
            Model to use for generating chat completions, must exist and be
            configured from the admin
        messages:
          type: array
          items:
            $ref: '#/components/schemas/ChatMessage'
          description: List of messages comprising the conversation so far
        max_tokens:
          type: integer
          description: Maximum number of tokens to generate
        temperature:
          type: number
          format: double
          description: Sampling temperature between 0 and 2
        top_p:
          type: number
          format: double
          description: Nucleus sampling parameter
        'n':
          type: integer
          description: Number of chat completion choices to generate
        stream:
          type: boolean
          description: Whether to stream back partial progress
        stop:
          type: array
          items:
            type: string
          description: Up to 4 sequences where the API will stop generating further tokens
        presence_penalty:
          type: number
          format: double
          description: >-
            Penalty for new tokens based on whether they appear in the text so
            far
        frequency_penalty:
          type: number
          format: double
          description: Penalty for new tokens based on their existing frequency in the text
        logit_bias:
          type: object
          additionalProperties: {}
          description: >-
            Modify the likelihood of specified tokens appearing in the
            completion
        user:
          type: string
          description: A unique identifier representing your end-user
        functions:
          type: array
          items:
            type: object
            additionalProperties: {}
          description: List of functions the model may call
        function_call:
          type: string
          description: Controls how the model responds to function calls
      required:
        - messages
        - model
    ChatCompletionsResponse:
      type: object
      description: Response serializer for chat completions endpoint results.
      properties:
        id:
          type: string
          description: Unique identifier for the chat completion
        object:
          type: string
          description: Object type, always 'chat.completion'
        created:
          type: integer
          description: Unix timestamp of when the chat completion was created
        model:
          type: string
          description: The model used for generating the chat completion
        choices:
          type: array
          items:
            $ref: '#/components/schemas/ChatCompletionChoice'
          description: List of chat completion choices generated by the model
        usage:
          allOf:
            - $ref: '#/components/schemas/CompletionUsage'
          description: Usage statistics for the chat completion request
      required:
        - choices
        - created
        - id
        - model
        - object
    ChatMessage:
      type: object
      description: Serializer for individual chat messages.
      properties:
        role:
          allOf:
            - $ref: '#/components/schemas/ChatMessageRoleEnum'
          description: |-
            The role of the message author

            * `system` - system
            * `user` - user
            * `assistant` - assistant
            * `function` - function
            * `tool` - tool
        content:
          oneOf:
            - type: string
              description: Plain text content
            - type: array
              items:
                type: object
              description: Structured content as a list of objects
          nullable: true
          description: The content of the message
        name:
          type: string
          description: Name of the message author (for function calls)
        function_call:
          type: object
          additionalProperties: {}
          description: Function call details (for assistant messages)
      required:
        - role
    ChatCompletionChoice:
      type: object
      description: Serializer for individual chat completion choices.
      properties:
        index:
          type: integer
          description: The index of this choice in the list of choices
        message:
          allOf:
            - $ref: '#/components/schemas/ChatMessageResponse'
          description: The chat message generated by the model
        finish_reason:
          type: string
          nullable: true
          description: The reason the model stopped generating tokens
      required:
        - index
        - message
    CompletionUsage:
      type: object
      description: Serializer for token usage information.
      properties:
        prompt_tokens:
          type: integer
          description: Number of tokens in the prompt
        completion_tokens:
          type: integer
          description: Number of tokens in the completion
        total_tokens:
          type: integer
          description: Total number of tokens used in the request
      required:
        - completion_tokens
        - prompt_tokens
        - total_tokens
    ChatMessageRoleEnum:
      enum:
        - system
        - user
        - assistant
        - function
        - tool
      type: string
      description: |-
        * `system` - system
        * `user` - user
        * `assistant` - assistant
        * `function` - function
        * `tool` - tool
    ChatMessageResponse:
      type: object
      description: Serializer for chat message in responses.
      properties:
        role:
          type: string
          description: The role of the message author
        content:
          type: string
          nullable: true
          description: The content of the message
        function_call:
          type: object
          additionalProperties: {}
          description: Function call details (if applicable)
      required:
        - role
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Bearer authentication header of the form `Bearer <token>`, where
        `<token>` is your auth token.

````