# Generate a text completion

> This endpoint can be used to generate completions from a Large Language Model.

It acts as a simple proxy that forwards your request to the model you specify.

All LightOn models are deployed on a vLLM-based image.

**Response Types:**
- When `stream=false` **(default)**: Returns a complete JSON response with all completion choices
- When `stream=true`: Returns Server-Sent Events (SSE) with incremental completion chunks
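
As an illustration of the request side, here is a minimal sketch that builds (without sending) a call to this endpoint using only the Python standard library. The helper name `build_completion_request` and the `YOUR_API_TOKEN` placeholder are assumptions made for this example; the URL, the bearer header, and the field names come from the OpenAPI definition on this page.

```python
import json
import urllib.request

# Server URL as listed in the OpenAPI definition on this page.
BASE_URL = "https://paradigm.lighton.ai"


def build_completion_request(token, model, prompt, stream=False, **params):
    """Build (but do not send) a POST request for /api/v2/completions.

    Hypothetical helper: extra keyword arguments (for example max_tokens)
    are copied into the JSON body unchanged.
    """
    payload = {"model": model, "prompt": prompt, "stream": stream, **params}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v2/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# YOUR_API_TOKEN is a placeholder, not a real credential.
request = build_completion_request(
    "YOUR_API_TOKEN", "alfred-4.2", "Hello, ", max_tokens=16
)

# Sending requires network access and a valid token:
# with urllib.request.urlopen(request) as response:
#     body = json.loads(response.read())
```

With `stream` left at `False`, the body read from the response should be a single JSON document; with `stream=True`, it has to be read as an event stream instead.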

**Streaming Format:**

Each SSE event contains a JSON object with incremental text. The stream ends with `data: [DONE]`.
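
The streaming format described above can be consumed with a small line-based parser. The sketch below assumes that each `data:` payload is a JSON object carrying a `choices` array whose items have a `text` field, mirroring the non-streaming schema; that chunk shape, the helper name `iter_completion_text`, and the sample events are assumptions for illustration and are not confirmed by this page.

```python
import json
from typing import Iterable, Iterator


def iter_completion_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the incremental text carried by each SSE `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # Assumed chunk shape: {"choices": [{"index": 0, "text": "..."}]}
        for choice in chunk.get("choices", []):
            yield choice.get("text", "")


# Hypothetical event stream, written by hand for this example.
sample_stream = [
    'data: {"choices": [{"index": 0, "text": "wor"}]}',
    "",
    'data: {"choices": [{"index": 0, "text": "ld"}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_completion_text(sample_stream)))
```

In a real client, `lines` would be the decoded lines of the HTTP response body rather than a hand-written list.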


## OpenAPI

````yaml /api-reference/openapi-v2.yaml post /api/v2/completions
openapi: 3.0.3
info:
  title: Paradigm API
  version: xenial-xerus (v2)
  description: >-
    A versatile and adaptable tool designed to integrate Generative AI into your
    applications
servers:
  - url: https://paradigm.lighton.ai
security: []
tags:
  - name: Models
    description: Operations about AI models
  - name: Files Search
    description: Operations about files search
  - name: Files
    description: Operations about files
  - name: Upload Sessions
    description: Operations about upload sessions
  - name: Workspaces
    description: Operations about workspaces
  - name: Users
    description: Operations about users
  - name: Companies
    description: Operations about companies
  - name: SCIM
    description: Operations about SCIM
  - name: Feedbacks
    description: Operations about feedbacks
  - name: Reporting
    description: Operations about reporting
  - name: Monitoring
    description: Operations about monitoring
  - name: Platform Status
    description: Operations about platform status
paths:
  /api/v2/completions:
    post:
      tags:
        - Models
      summary: Generate a text completion
      description: >-
        This endpoint can be used to generate completions from a Large Language
        Model.


        It acts as a simple proxy that forwards your request to the model you
        specify.


        All LightOn models are deployed on a vLLM-based image.


        **Response Types:**

        - When `stream=false` **(default)**: Returns a complete JSON response
        with all completion choices

        - When `stream=true`: Returns Server-Sent Events (SSE) with incremental
        completion chunks


        **Streaming Format:**


        Each SSE event contains a JSON object with incremental text. The stream
        ends with `data: [DONE]`.
      operationId: api_v2_completions_create
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CompletionsRequest'
            examples:
              LightOnModelExample:
                value:
                  model: alfred-4.2
                  prompt: 'Hello, '
                summary: LightOn model example
              StreamingRequestExample:
                value:
                  model: alfred-4.2
                  prompt: 'Hello, '
                  stream: true
                summary: Streaming request example
          application/x-www-form-urlencoded:
            schema:
              $ref: '#/components/schemas/CompletionsRequest'
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/CompletionsRequest'
        required: true
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/CompletionsResponse'
              examples:
                LightOnModelExample:
                  value:
                    id: cmpl-example
                    object: text_completion
                    created: 1700000000
                    model: alfred-4.2
                    choices:
                      - text: world!
                        index: 0
                        logprobs: null
                        finish_reason: stop
                    usage:
                      prompt_tokens: 3
                      completion_tokens: 2
                      total_tokens: 5
                  summary: LightOn model example (illustrative placeholder values)
          description: Successful response containing the generated completion choices
      security:
        - bearerAuth: []
components:
  schemas:
    CompletionsRequest:
      type: object
      description: Request serializer for completions endpoint (OpenAI-compatible).
      properties:
        model:
          type: string
          description: >-
            Name of the model to use for generating completions. The model must
            exist and be configured in the admin interface
        prompt:
          type: string
          description: The prompt to generate completions for
        max_tokens:
          type: integer
          description: Maximum number of tokens to generate
        temperature:
          type: number
          format: double
          description: Sampling temperature between 0 and 2
        top_p:
          type: number
          format: double
          description: Nucleus sampling parameter
        'n':
          type: integer
          description: Number of completions to generate
        stream:
          type: boolean
          description: Whether to stream back partial progress
        logprobs:
          type: integer
          nullable: true
          description: Include the log probabilities of the `logprobs` most likely tokens
        echo:
          type: boolean
          description: Echo back the prompt in addition to the completion
        stop:
          type: array
          items:
            type: string
          description: Up to 4 sequences where the API will stop generating further tokens
        presence_penalty:
          type: number
          format: double
          description: >-
            Penalty for new tokens based on whether they appear in the text so
            far
        frequency_penalty:
          type: number
          format: double
          description: Penalty for new tokens based on their existing frequency in the text
        best_of:
          type: integer
          description: Generate `best_of` completions server-side and return the best one
        logit_bias:
          type: object
          additionalProperties: {}
          description: >-
            Modify the likelihood of specified tokens appearing in the
            completion
        user:
          type: string
          description: A unique identifier representing your end-user
        suffix:
          type: string
          description: The suffix that comes after a completion of inserted text
      required:
        - model
    CompletionsResponse:
      type: object
      description: Response serializer for completions endpoint results.
      properties:
        id:
          type: string
          description: Unique identifier for the completion
        object:
          type: string
          description: Object type, always 'text_completion'
        created:
          type: integer
          description: Unix timestamp of when the completion was created
        model:
          type: string
          description: The model used for generating the completion
        choices:
          type: array
          items:
            $ref: '#/components/schemas/CompletionChoice'
          description: List of completion choices generated by the model
        usage:
          allOf:
            - $ref: '#/components/schemas/CompletionUsage'
          description: Usage statistics for the completion request
      required:
        - choices
        - created
        - id
        - model
        - object
    CompletionChoice:
      type: object
      description: Serializer for individual completion choices.
      properties:
        text:
          type: string
          description: The generated text completion
        index:
          type: integer
          description: The index of this choice in the list of choices
        logprobs:
          nullable: true
          description: Log probability information for the choice
        finish_reason:
          type: string
          nullable: true
          description: The reason the model stopped generating tokens
      required:
        - index
        - text
    CompletionUsage:
      type: object
      description: Serializer for token usage information.
      properties:
        prompt_tokens:
          type: integer
          description: Number of tokens in the prompt
        completion_tokens:
          type: integer
          description: Number of tokens in the completion
        total_tokens:
          type: integer
          description: Total number of tokens used in the request
      required:
        - completion_tokens
        - prompt_tokens
        - total_tokens
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Bearer authentication header of the form `Bearer <token>`, where
        `<token>` is your auth token.

````
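
The `CompletionsResponse` schema above marks `choices` as required and leaves `usage` optional. A minimal sketch of reading a non-streaming response under those constraints follows; the helper name `summarize_completion` and the sample body (placeholder values only) are assumptions for illustration.

```python
from typing import Any, Dict


def summarize_completion(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect generated texts and token usage from a non-streaming response."""
    # `choices` is required by the schema; sort by `index` for a stable order.
    choices = sorted(response["choices"], key=lambda choice: choice["index"])
    # `usage` is not in the schema's required list, so tolerate its absence.
    usage = response.get("usage") or {}
    return {
        "texts": [choice["text"] for choice in choices],
        "finish_reasons": [choice.get("finish_reason") for choice in choices],
        "total_tokens": usage.get("total_tokens"),
    }


# Hypothetical response body with placeholder values, shaped like
# CompletionsResponse (two choices, as if the request had set n=2).
example_response = {
    "id": "cmpl-example",
    "object": "text_completion",
    "created": 1700000000,
    "model": "alfred-4.2",
    "choices": [
        {"index": 1, "text": "there!", "finish_reason": "stop"},
        {"index": 0, "text": "world!", "finish_reason": "length"},
    ],
    "usage": {"prompt_tokens": 3, "completion_tokens": 4, "total_tokens": 7},
}

summary = summarize_completion(example_response)
print(summary["texts"])
```

Sorting by `index` keeps the output order aligned with the order in which the choices were requested, even if the server returns them out of order.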