# Retrieve document chunks

> Invoke the document retrieval pipeline (embedding + hybrid vector search + reranking) and return matched document chunks.

**Reranking:** applied by default (`skip_rerank=false`). When active, each result includes a `scoring.certainty` field with the reranker confidence score. Set `skip_rerank=true` to return results in raw retrieval order, without a certainty score.
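
As a rough illustration of how a client might use `scoring.certainty`, the sketch below keeps only results whose reranker confidence meets a threshold. The helper name `filter_by_certainty`, the 0.5 threshold, and the sample certainty of 0.30 are assumptions made for this example, not part of the API. Results with no certainty value (for instance when `skip_rerank=true`) are passed through unchanged.

```python
from typing import Any


def filter_by_certainty(
    results: list[dict[str, Any]], min_certainty: float = 0.5
) -> list[dict[str, Any]]:
    """Keep results whose reranker certainty is at least min_certainty.

    Results with no certainty (absent or null, e.g. skip_rerank=true) are kept,
    because there is no reranker score to judge them by.
    """
    kept = []
    for item in results:
        certainty = item.get("scoring", {}).get("certainty")
        if certainty is None or certainty >= min_certainty:
            kept.append(item)
    return kept


# Trimmed-down items shaped like the `results` array of a 200 response.
# The second certainty value is invented for illustration.
reranked = [
    {"chunk": {"id": 1024}, "scoring": {"score": 0.92, "distance": 0.08, "certainty": 0.95}},
    {"chunk": {"id": 2048}, "scoring": {"score": 0.85, "distance": 0.15, "certainty": 0.30}},
]
# Same shape, but without certainty, as returned when reranking is skipped.
raw = [{"chunk": {"id": 1024}, "scoring": {"score": 0.92, "distance": 0.08}}]

print([r["chunk"]["id"] for r in filter_by_certainty(reranked)])
print([r["chunk"]["id"] for r in filter_by_certainty(raw)])
```

Keeping unscored results is a design choice for this sketch; a stricter client could drop them instead.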

**Scoping:** use `workspace_id` and/or `tag_id` to narrow retrieval to specific workspaces or tags, or use `file_id` to target specific files. Combining `file_id` with `workspace_id` or `tag_id` returns a 400. A 403 is returned if any provided filter refers to a resource that does not exist or that the caller is not authorized to access. When no filters are provided, retrieval runs across all authorized documents.
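
The scoping rules can be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `build_retrieve_payload`; it is not part of any official SDK. It rejects the `file_id` plus `workspace_id`/`tag_id` combination locally and omits empty filters so that retrieval runs across all authorized documents.

```python
from typing import Any, Optional, Sequence


def build_retrieve_payload(
    query: str,
    *,
    workspace_id: Optional[Sequence[int]] = None,
    tag_id: Optional[Sequence[int]] = None,
    file_id: Optional[Sequence[int]] = None,
) -> dict[str, Any]:
    """Build a JSON body for POST /api/v3/retrieve with validated scoping filters."""
    if file_id and (workspace_id or tag_id):
        # The API answers 400 for this combination, so fail before sending.
        raise ValueError("file_id cannot be combined with workspace_id or tag_id")

    payload: dict[str, Any] = {"query": query}
    # Leave out empty filters: with no filters, all authorized documents are searched.
    for name, ids in (("workspace_id", workspace_id), ("tag_id", tag_id), ("file_id", file_id)):
        if ids:
            payload[name] = list(ids)
    return payload


print(build_retrieve_payload("quarterly revenue forecast", file_id=[101, 102]))
print(build_retrieve_payload("onboarding process"))
```

A 403 cannot be predicted this way, since only the server knows which resources the caller is authorized to access.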

**Modes:**
- `text` (default): hybrid text search
- `vision`: image-based search
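
For vision mode, a client typically sets `include_image` and then decodes the `corresponding_image.b64_content` field of each result. The sketch below shows one way this could look; the helper names `vision_payload` and `decode_page_image` are hypothetical, and the sample bytes are a stand-in rather than a real page image.

```python
import base64
from typing import Any, Optional


def vision_payload(query: str, top_k: int = 10, top_n: int = 3) -> dict[str, Any]:
    """Request body for vision mode with page images included."""
    return {
        "query": query,
        "mode": "vision",
        "top_k": top_k,
        "top_n": top_n,
        "include_image": True,
    }


def decode_page_image(result: dict[str, Any]) -> Optional[bytes]:
    """Return the decoded page image, or None when it is absent or empty."""
    image = result.get("corresponding_image") or {}
    b64 = image.get("b64_content", "")
    return base64.b64decode(b64) if b64 else None


# Stand-in result item: four arbitrary bytes encoded the way the API encodes images.
fake_result = {
    "corresponding_image": {"b64_content": base64.b64encode(b"\x89PNG").decode("ascii")}
}

print(vision_payload("architecture diagram"))
print(decode_page_image(fake_result))
print(decode_page_image({"corresponding_image": {"b64_content": ""}}))
```

Returning `None` for an empty string mirrors the documented behavior that `b64_content` is empty when the image is unavailable.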


## OpenAPI

````yaml /api-reference/openapi-v3.yaml post /api/v3/retrieve
openapi: 3.0.3
info:
  title: Paradigm API
  version: xenial-xerus (v3)
  description: >-
    A versatile and adaptable tool designed to integrate Generative AI into your
    applications
servers:
  - url: https://paradigm.lighton.ai
security: []
tags:
  - name: Agents
    description: Operations about agents
  - name: Threads
    description: Operations about agent conversation threads
  - name: Tools
    description: Operations about native tools
  - name: Models
    description: Operations about AI models
  - name: MCP
    description: Operations about MCP servers
  - name: Sources
    description: Operations about sources used by agent conversation threads
  - name: Artifacts
    description: Operations about artifacts generated by agent conversation threads
  - name: Agent
    description: >-
      Operations about agents (deprecated). Please use the 'Agents' API
      component instead.
  - name: Files
    description: Operations about files
  - name: Files Processing
    description: Operations about files processing
  - name: Tags
    description: Operations about tags
  - name: Workspaces
    description: Operations about workspaces
  - name: Users
    description: Operations about users
  - name: User Groups
    description: Operations about user groups
  - name: Companies
    description: Operations about companies
  - name: SCIM
    description: Operations about SCIM
paths:
  /api/v3/retrieve:
    post:
      tags:
        - Files Processing
      summary: Retrieve document chunks
      description: >-
        Invoke the document retrieval pipeline (embedding + hybrid vector search
        + reranking) and return matched document chunks.


        **Reranking:** applied by default (`skip_rerank=false`). When active,
        each result includes a `scoring.certainty` field with the reranker
        confidence score. Set `skip_rerank=true` to return results in raw
        retrieval order, without a certainty score.


        **Scoping:** use `workspace_id` and/or `tag_id` to narrow retrieval to
        specific workspaces or tags, or use `file_id` to target specific files.
        Combining `file_id` with `workspace_id` or `tag_id` returns a 400. A
        403 is returned if any provided filter refers to a resource that does
        not exist or that the caller is not authorized to access. When no
        filters are provided, retrieval runs across all authorized documents.


        **Modes:**

        - `text` (default): hybrid text search

        - `vision`: image-based search
      operationId: api_v3_retrieve_create
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/RetrieveRequest'
            examples:
              TextMode—ScopedToWorkspace:
                value:
                  query: authentication system JWT tokens
                  mode: text
                  top_k: 20
                  top_n: 5
                  workspace_id:
                    - 42
                summary: Text mode — scoped to workspace
                description: >-
                  Retrieve chunks from a specific workspace using hybrid text
                  search with reranking.
              TextMode—ScopedToFiles:
                value:
                  query: quarterly revenue forecast
                  mode: text
                  top_k: 10
                  top_n: 3
                  file_id:
                    - 101
                    - 102
                summary: Text mode — scoped to files
                description: Retrieve chunks from specific files only.
              TextMode—AcrossAllDocuments:
                value:
                  query: onboarding process
                  mode: text
                  top_k: 20
                  top_n: 5
                summary: Text mode — across all documents
                description: >-
                  Retrieve chunks across all documents the API key has access
                  to.
              VisionModeWithImage:
                value:
                  query: architecture diagram
                  mode: vision
                  top_k: 10
                  top_n: 3
                  include_image: true
                summary: Vision mode with image
                description: >-
                  Retrieve vision chunks (images/diagrams) using image-based
                  search.
              SkipRerank—RawRetrieval:
                value:
                  query: incident response playbook
                  mode: text
                  top_k: 20
                  top_n: 10
                  skip_rerank: true
                summary: Skip rerank — raw retrieval
                description: >-
                  Bypass the reranker to isolate raw vector + lexical retrieval
                  quality.
          application/x-www-form-urlencoded:
            schema:
              $ref: '#/components/schemas/RetrieveRequest'
            examples:
              TextMode—ScopedToWorkspace:
                value:
                  query: authentication system JWT tokens
                  mode: text
                  top_k: 20
                  top_n: 5
                  workspace_id:
                    - 42
                summary: Text mode — scoped to workspace
                description: >-
                  Retrieve chunks from a specific workspace using hybrid text
                  search with reranking.
              TextMode—ScopedToFiles:
                value:
                  query: quarterly revenue forecast
                  mode: text
                  top_k: 10
                  top_n: 3
                  file_id:
                    - 101
                    - 102
                summary: Text mode — scoped to files
                description: Retrieve chunks from specific files only.
              TextMode—AcrossAllDocuments:
                value:
                  query: onboarding process
                  mode: text
                  top_k: 20
                  top_n: 5
                summary: Text mode — across all documents
                description: >-
                  Retrieve chunks across all documents the API key has access
                  to.
              VisionModeWithImage:
                value:
                  query: architecture diagram
                  mode: vision
                  top_k: 10
                  top_n: 3
                  include_image: true
                summary: Vision mode with image
                description: >-
                  Retrieve vision chunks (images/diagrams) using image-based
                  search.
              SkipRerank—RawRetrieval:
                value:
                  query: incident response playbook
                  mode: text
                  top_k: 20
                  top_n: 10
                  skip_rerank: true
                summary: Skip rerank — raw retrieval
                description: >-
                  Bypass the reranker to isolate raw vector + lexical retrieval
                  quality.
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/RetrieveRequest'
            examples:
              TextMode—ScopedToWorkspace:
                value:
                  query: authentication system JWT tokens
                  mode: text
                  top_k: 20
                  top_n: 5
                  workspace_id:
                    - 42
                summary: Text mode — scoped to workspace
                description: >-
                  Retrieve chunks from a specific workspace using hybrid text
                  search with reranking.
              TextMode—ScopedToFiles:
                value:
                  query: quarterly revenue forecast
                  mode: text
                  top_k: 10
                  top_n: 3
                  file_id:
                    - 101
                    - 102
                summary: Text mode — scoped to files
                description: Retrieve chunks from specific files only.
              TextMode—AcrossAllDocuments:
                value:
                  query: onboarding process
                  mode: text
                  top_k: 20
                  top_n: 5
                summary: Text mode — across all documents
                description: >-
                  Retrieve chunks across all documents the API key has access
                  to.
              VisionModeWithImage:
                value:
                  query: architecture diagram
                  mode: vision
                  top_k: 10
                  top_n: 3
                  include_image: true
                summary: Vision mode with image
                description: >-
                  Retrieve vision chunks (images/diagrams) using image-based
                  search.
              SkipRerank—RawRetrieval:
                value:
                  query: incident response playbook
                  mode: text
                  top_k: 20
                  top_n: 10
                  skip_rerank: true
                summary: Skip rerank — raw retrieval
                description: >-
                  Bypass the reranker to isolate raw vector + lexical retrieval
                  quality.
        required: true
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RetrieveV3Response'
              examples:
                TextModeResult—WithReranking:
                  value:
                    query: JWT authentication
                    retrieve_params:
                      mode: text
                      top_k: 20
                      top_n: 10
                      skip_rerank: false
                      include_image: false
                    scoping_params:
                      file_id: []
                      tag_id: []
                      workspace_id:
                        - 42
                    results:
                      - chunk:
                          id: 1024
                          uuid: 550e8400-e29b-41d4-a716-446655440000
                          content_id: content_id_550e8400-e29b-41d4-a716-446655440000
                          text: >-
                            JWT tokens are signed using RS256 and expire after 1
                            hour.
                          chunk_type: text
                          metadata:
                            pages: '3-4'
                            total_pages: '12'
                          created_at: '2025-11-01T09:00:00Z'
                          updated_at: '2025-11-01T09:00:00Z'
                        scoring:
                          score: 0.92
                          distance: 0.08
                          lexical_score: 0.74
                          certainty: 0.95
                        workspace:
                          id: 42
                          name: Engineering Docs
                          workspace_type: custom
                        document:
                          id: 512
                          name: auth-system.pdf
                          file_type: pdf
                          status: embedded
                          total_pages: 12
                          uploaded_at: '2025-10-30T08:00:00Z'
                          title: Authentication System Design
                          tags:
                            - id: 7
                              name: security
                              auto_assigned: false
                          external_metadata: null
                  summary: Text mode result — with reranking
                  description: >-
                    Retrieval with reranking applied (skip_rerank=false,
                    default). scoring.certainty reflects the reranker
                    confidence.
                TextModeResult—SkipRerank:
                  value:
                    query: JWT authentication
                    retrieve_params:
                      mode: text
                      top_k: 20
                      top_n: 10
                      skip_rerank: true
                      include_image: false
                    scoping_params:
                      file_id: []
                      tag_id: []
                      workspace_id: []
                    results:
                      - chunk:
                          id: 1024
                          uuid: 550e8400-e29b-41d4-a716-446655440000
                          content_id: content_id_550e8400-e29b-41d4-a716-446655440000
                          text: >-
                            JWT tokens are signed using RS256 and expire after 1
                            hour.
                          chunk_type: text
                          metadata:
                            pages: '3'
                            total_pages: '12'
                          created_at: '2025-11-01T09:00:00Z'
                          updated_at: '2025-11-01T09:00:00Z'
                        scoring:
                          score: 0.92
                          distance: 0.08
                          lexical_score: 0.74
                        workspace:
                          id: 42
                          name: Engineering Docs
                          workspace_type: custom
                        document:
                          id: 512
                          name: auth-system.pdf
                          file_type: pdf
                          status: embedded
                          total_pages: 12
                          uploaded_at: '2025-10-30T08:00:00Z'
                          title: Authentication System Design
                          tags:
                            - id: 7
                              name: security
                              auto_assigned: false
                          external_metadata: null
                  summary: Text mode result — skip rerank
                  description: >-
                    Raw retrieval without reranking (skip_rerank=true). No
                    certainty field in scoring.
                VisionModeResult:
                  value:
                    query: architecture diagram
                    retrieve_params:
                      mode: vision
                      top_k: 20
                      top_n: 10
                      skip_rerank: false
                      include_image: true
                    scoping_params:
                      file_id: []
                      tag_id: []
                      workspace_id: []
                    results:
                      - chunk:
                          id: 2048
                          uuid: 661f9511-f3ac-52e5-b827-557766551111
                          metadata:
                            pages: '4'
                            total_pages: '20'
                          created_at: '2025-11-02T10:00:00Z'
                          updated_at: '2025-11-02T10:00:00Z'
                        scoring:
                          score: 0.85
                          distance: 0.15
                          certainty: 0.88
                        workspace:
                          id: 42
                          name: Engineering Docs
                          workspace_type: custom
                        document:
                          id: 513
                          name: infra-overview.pdf
                          file_type: pdf
                          status: embedded
                          total_pages: 20
                          uploaded_at: '2025-10-31T08:00:00Z'
                          title: Infrastructure Overview
                          tags: []
                          external_metadata: null
                        corresponding_image:
                          b64_content: iVBORw0KGgo...
                  summary: Vision mode result
                  description: Vision chunk result with base64-encoded image.
                NoMatchingDocuments:
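                  value:
                    query: onboarding process
                    retrieve_params:
                      mode: text
                      top_k: 20
                      top_n: 5
                      skip_rerank: false
                      include_image: false
                    scoping_params:
                      file_id: []
                      tag_id: []
                      workspace_id: []
                    results: []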
                  summary: No matching documents
                  description: Query returned no results — empty array with HTTP 200.
          description: Chunks retrieved successfully. Empty array if no documents match.
        '400':
          description: >-
            Validation error — missing or invalid fields, or incompatible filter
            combination.
        '401':
          description: Authentication credentials were not provided or are invalid.
        '403':
          description: >-
            A provided filter (workspace_id, file_id, or tag_id) does not exist
            or the user is not authorized to access it.
        '429':
          description: Rate limit exceeded.
        '503':
          description: >-
            Reranker service is unavailable. Retry or set skip_rerank=true to
            bypass it.
      security:
        - bearerAuth: []
components:
  schemas:
    RetrieveRequest:
      type: object
      properties:
        query:
          type: string
          description: Natural-language search query.
        mode:
          allOf:
            - $ref: '#/components/schemas/ModeEnum'
          default: text
          description: >-
            Retrieval pipeline: "text" (hybrid search on DocumentChunk) or
            "vision" (image-based on VisionChunk).


            * `text` - text

            * `vision` - vision
        top_k:
          type: integer
          maximum: 100
          minimum: 1
          default: 20
          description: >-
            Number of chunks retrieved from the vector store before reranking.
            Range: 1–100.
        top_n:
          type: integer
          maximum: 50
          minimum: 1
          default: 10
          description: >-
            Number of chunks returned after filtering. Range: 1–50. Must be ≤
            top_k.
        workspace_id:
          type: array
          items:
            type: integer
          description: Scope retrieval to specific workspace IDs (authorized only).
        file_id:
          type: array
          items:
            type: integer
          description: Scope retrieval to specific file IDs (authorized only).
        tag_id:
          type: array
          items:
            type: integer
          description: >-
            Scope retrieval to documents with any of these tag IDs
            (company-scoped).
        skip_rerank:
          type: boolean
          default: false
          description: Skip reranking. Useful to isolate retrieval quality.
        include_image:
          type: boolean
          default: false
          description: Include base64-encoded page image in each result.
      required:
        - query
    RetrieveV3Response:
      type: object
      properties:
        query:
          type: string
          description: The search query that was executed.
        retrieve_params:
          allOf:
            - $ref: '#/components/schemas/RetrieveParams'
          description: Retrieval parameters used (including defaults).
        scoping_params:
          allOf:
            - $ref: '#/components/schemas/ScopingParams'
          description: Scoping parameters used to narrow retrieval.
        results:
          type: array
          items:
            $ref: '#/components/schemas/RetrieveResultItem'
          description: Retrieved chunks with context
      required:
        - query
        - results
        - retrieve_params
        - scoping_params
    ModeEnum:
      enum:
        - text
        - vision
      type: string
      description: |-
        * `text` - text
        * `vision` - vision
    RetrieveParams:
      type: object
      properties:
        mode:
          type: string
          description: 'Retrieval mode used: "text" or "vision".'
        top_k:
          type: integer
          description: Number of chunks retrieved from the vector store before reranking.
        top_n:
          type: integer
          description: Number of chunks returned after filtering.
        skip_rerank:
          type: boolean
          description: Whether reranking was skipped.
        include_image:
          type: boolean
          description: Whether page images are included.
      required:
        - include_image
        - mode
        - skip_rerank
        - top_k
        - top_n
    ScopingParams:
      type: object
      properties:
        file_id:
          type: array
          items:
            type: integer
          description: File IDs used for scoping.
        tag_id:
          type: array
          items:
            type: integer
          description: Tag IDs used for scoping.
        workspace_id:
          type: array
          items:
            type: integer
          description: Workspace IDs used for scoping.
      required:
        - file_id
        - tag_id
        - workspace_id
    RetrieveResultItem:
      type: object
      properties:
        chunk:
          type: object
          additionalProperties: {}
          description: >-
            Chunk data. Text mode: id, uuid, content_id, text, chunk_type,
            metadata, created_at, updated_at. Vision mode: id, uuid, metadata,
            created_at, updated_at.
        scoring:
          allOf:
            - $ref: '#/components/schemas/RetrieveScoring'
          description: Relevance scores for this chunk
        workspace:
          allOf:
            - $ref: '#/components/schemas/WorkspaceInFileResponseSerializerV3'
          nullable: true
          description: >-
            Workspace the document belongs to. Null if the document has no
            workspace.
        document:
          allOf:
            - $ref: '#/components/schemas/RetrieveDocument'
          description: Source document
        corresponding_image:
          allOf:
            - $ref: '#/components/schemas/RetrieveCorrespondingImage'
          description: Page image (present when include_image=true)
      required:
        - chunk
        - document
        - scoring
        - workspace
    RetrieveScoring:
      type: object
      properties:
        score:
          type: number
          format: double
          description: Overall relevance score
        distance:
          type: number
          format: double
          description: Vector distance from query embedding
        lexical_score:
          type: number
          format: double
          nullable: true
          description: BM25 lexical score (text mode only)
        certainty:
          type: number
          format: double
          nullable: true
          description: Reranker confidence score (present when skip_rerank=false)
      required:
        - distance
        - score
    WorkspaceInFileResponseSerializerV3:
      type: object
      description: Minimal workspace info for file responses.
      properties:
        id:
          type: integer
          description: Workspace ID
        name:
          type: string
          description: Workspace name
        workspace_type:
          type: string
          description: Workspace type (company, personal, or custom)
      required:
        - id
        - name
        - workspace_type
    RetrieveDocument:
      type: object
      properties:
        id:
          type: integer
          readOnly: true
        name:
          type: string
          readOnly: true
          description: Document filename
        file_type:
          type: string
          nullable: true
          maxLength: 30
        status:
          $ref: '#/components/schemas/Status88dEnum'
        total_pages:
          type: integer
          readOnly: true
          description: Total number of pages
        uploaded_at:
          type: string
          format: date-time
          readOnly: true
        title:
          type: string
          nullable: true
          maxLength: 255
        tags:
          type: array
          items:
            $ref: '#/components/schemas/TagItem'
          readOnly: true
          description: Tags associated with the document
        external_metadata:
          allOf:
            - $ref: '#/components/schemas/RetrieveDocumentExternalMetadata'
          nullable: true
          readOnly: true
          description: External metadata, if any
      required:
        - external_metadata
        - id
        - name
        - tags
        - total_pages
        - uploaded_at
    RetrieveCorrespondingImage:
      type: object
      properties:
        b64_content:
          type: string
          description: >-
            Base64-encoded page image. Empty string when the image is
            unavailable.
      required:
        - b64_content
    Status88dEnum:
      enum:
        - pending
        - parsing
        - parsing_failed
        - embedding
        - embedding_failed
        - embedded
        - fail
        - updating
      type: string
      description: |-
        * `pending` - Pending
        * `parsing` - Parsing
        * `parsing_failed` - Parsing Failed
        * `embedding` - Embedding
        * `embedding_failed` - Embedding Failed
        * `embedded` - Embedded
        * `fail` - Fail
        * `updating` - Updating
    TagItem:
      type: object
      description: Serializer for tag items in file list response.
      properties:
        id:
          type: integer
          description: Tag ID
        name:
          type: string
          description: Tag name
        auto_assigned:
          type: boolean
          description: >-
            True if this tag was automatically assigned by the system, False if
            manually assigned by a user
      required:
        - auto_assigned
        - id
        - name
    RetrieveDocumentExternalMetadata:
      type: object
      properties:
        external_id:
          type: string
          description: External document ID
        doc_type:
          type: string
          title: External document type
          maxLength: 255
        additional_metadata:
          nullable: true
      required:
        - external_id
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Bearer authentication header of the form `Bearer <token>`, where
        `<token>` is your auth token.

````
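
The 503 response in the specification suggests retrying or setting `skip_rerank=true` to bypass the reranker. The sketch below shows one possible fallback, under stated assumptions: `RerankerUnavailable`, `send_with_urllib`, and `retrieve_with_fallback` are hypothetical names introduced for this example. The URL combines the server and path from the specification, and the sender is passed in as a callable so the fallback logic can be exercised without network access.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable

RETRIEVE_URL = "https://paradigm.lighton.ai/api/v3/retrieve"

Payload = dict[str, Any]


class RerankerUnavailable(Exception):
    """Signals that the API answered 503 (hypothetical helper exception)."""


def send_with_urllib(payload: Payload, token: str) -> Payload:
    """POST the payload as JSON with a bearer token. Performs a real network call."""
    request = urllib.request.Request(
        RETRIEVE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 503:
            raise RerankerUnavailable() from err
        raise


def retrieve_with_fallback(payload: Payload, send: Callable[[Payload], Payload]) -> Payload:
    """Try the request as given; on 503, retry once with the reranker bypassed."""
    try:
        return send(payload)
    except RerankerUnavailable:
        # Copy the payload so the caller's dict is not modified.
        return send({**payload, "skip_rerank": True})


# Offline demonstration: a stand-in sender that fails while reranking is enabled.
def fake_send(payload: Payload) -> Payload:
    if not payload.get("skip_rerank"):
        raise RerankerUnavailable()
    return {"results": []}


print(retrieve_with_fallback({"query": "incident response playbook"}, fake_send))
```

With a real token, the sender could be supplied as `lambda p: send_with_urllib(p, token)`. Results from the fallback path carry no certainty score, so downstream code should not rely on `scoring.certainty` being present.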