# Preview a seed dataset

> Generate a preview of what a seed dataset would contain (no DB writes).

Returns sample documents with realistic metadata, content type
assignments, and namespaced attribute values. ~50% of documents are
assigned multiple content types (cross-vertical when possible) to
demonstrate multi-classification.

**Response sections:**
- `meta` — content type codes, link to content-types API, generation stats
- `summary` — rolled-up facet counts (content types, workspaces, statuses)
  ready for UI sidebar consumption
- `sample_documents` — individual documents with attribute values
  namespaced by content type path
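
As a rough illustration of the three sections listed above, the following Python sketch splits a decoded response into its parts. Only the top-level keys `meta`, `summary`, and `sample_documents` come from this page; the helper name `split_preview` is hypothetical, and the sketch assumes the response body has already been decoded into a dictionary.

```python
from typing import Any

# Top-level sections named on this page.
EXPECTED_SECTIONS = ("meta", "summary", "sample_documents")


def split_preview(payload: dict[str, Any]) -> tuple[dict, dict, list]:
    """Return (meta, summary, sample_documents), failing loudly if one is missing.

    Hypothetical helper: the page names the sections but not their inner fields,
    so nothing below the top level is inspected here.
    """
    missing = [name for name in EXPECTED_SECTIONS if name not in payload]
    if missing:
        raise KeyError(f"preview response is missing sections: {missing}")
    return payload["meta"], payload["summary"], payload["sample_documents"]
```

Checking for the sections up front keeps later code (sidebar rendering, document listing) from failing on a partial payload.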

**Attribute values** are keyed by content type path to avoid collisions
when a document has multiple classifications:
```json
{
  "legal:contract:nda": {"counterparty": "Acme"},
  "tech:security:threat-model": {"threat_level": "High"}
}
```
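
A minimal Python sketch of how a client might read these namespaced values. The mapping is the example above; the helper `group_by_vertical` is hypothetical, and it assumes that paths are colon-separated and start with the vertical code (as the example suggests, though this page does not state it as a rule).

```python
from collections import defaultdict

# Example mapping copied from the documentation above.
attribute_values = {
    "legal:contract:nda": {"counterparty": "Acme"},
    "tech:security:threat-model": {"threat_level": "High"},
}


def group_by_vertical(values: dict[str, dict]) -> dict[str, dict[str, dict]]:
    """Group namespaced attribute values by the first path segment.

    Assumption: paths are colon-separated and begin with the vertical code.
    """
    grouped: dict[str, dict[str, dict]] = defaultdict(dict)
    for path, attrs in values.items():
        vertical = path.split(":", 1)[0]
        grouped[vertical][path] = attrs
    return dict(grouped)


# Each content type path has its own namespace, so the same attribute name
# could appear under two paths without one overwriting the other.
nda = attribute_values["legal:contract:nda"]
print(nda["counterparty"])
print(sorted(group_by_vertical(attribute_values)))
```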

**Content type definitions** are NOT included in the response.
Use the `content_types_url` in `meta` to fetch the full content type tree
and attribute schemas.
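
This page does not say whether `content_types_url` is absolute or relative, so a client may want to handle both. The sketch below is one way to do that in Python; the helper name `resolve_content_types_url` is hypothetical, and the base URL is the server listed in the OpenAPI spec.

```python
from urllib.parse import urljoin

BASE_URL = "https://paradigm.lighton.ai"  # server URL from the OpenAPI spec


def resolve_content_types_url(meta: dict, base_url: str = BASE_URL) -> str:
    """Return an absolute URL for the content-types API.

    urljoin leaves an absolute `content_types_url` untouched and resolves a
    relative one against `base_url`, so the sketch works in either case.
    """
    return urljoin(base_url, meta["content_types_url"])
```

The resolved URL can then be requested with the same bearer token used for the preview call.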



## OpenAPI

````yaml /api-reference/openapi-v3.yaml get /api/v3/facet/seed/preview
openapi: 3.0.3
info:
  title: Paradigm API
  version: xenial-xerus (v3)
  description: >-
    A versatile and adaptable tool designed to integrate Generative AI into your
    applications
servers:
  - url: https://paradigm.lighton.ai
security: []
tags:
  - name: Agents
    description: Operations about agents
  - name: Threads
    description: Operations about agent conversation threads
  - name: Tools
    description: Operations about native tools
  - name: Models
    description: Operations about AI models
  - name: MCP
    description: Operations about MCP servers
  - name: Sources
    description: Operations about sources used by agent conversation threads
  - name: Artifacts
    description: Operations about artifacts generated by agent conversation threads
  - name: Agent
    description: >-
      Operations about agents (deprecated). Please use the 'Agents' API
      component instead.
  - name: Files
    description: Operations about files
  - name: Files Processing
    description: Operations about files processing
  - name: Tags
    description: Operations about tags
  - name: Workspaces
    description: Operations about workspaces
  - name: Users
    description: Operations about users
  - name: User Groups
    description: Operations about user groups
  - name: Companies
    description: Operations about companies
  - name: SCIM
    description: Operations about SCIM
paths:
  /api/v3/facet/seed/preview:
    get:
      tags:
        - Facets
      summary: Preview a seed dataset
      description: |-
        Generate a preview of what a seed dataset would contain (no DB writes).

        Returns sample documents with realistic metadata, content type
        assignments, and namespaced attribute values. ~50% of documents are
        assigned multiple content types (cross-vertical when possible) to
        demonstrate multi-classification.

        **Response sections:**
        - `meta` — content type codes, link to content-types API, generation stats
        - `summary` — rolled-up facet counts (content types, workspaces, statuses)
          ready for UI sidebar consumption
        - `sample_documents` — individual documents with attribute values
          namespaced by content type path

        **Attribute values** are keyed by content type path to avoid collisions
        when a document has multiple classifications:
        ```json
        {
          "legal:contract:nda": {"counterparty": "Acme"},
          "tech:security:threat-model": {"threat_level": "High"}
        }
        ```

        **Content type definitions** are NOT included in the response.
        Use the `content_types_url` in `meta` to fetch the full content type tree
        and attribute schemas.
      operationId: api_v3_facet_seed_preview_retrieve
      parameters:
        - in: query
          name: content_types
          schema:
            type: string
          description: >-
            Content type code(s) to include in the preview (comma-separated,
            e.g., `?content_types=legal,tech`). Available codes: finance,
            healthcare, legal, manufacturing, patent, sic, tech.
          required: true
          examples:
            SingleVertical:
              value: legal
              summary: Single vertical
              description: Preview with legal documents only
            Multi-vertical:
              value: legal,tech
              summary: Multiple verticals
              description: Preview with cross-vertical document classification
        - in: query
          name: preview_max
          schema:
            type: integer
            minimum: 1
            maximum: 100
            default: 20
          description: Maximum number of sample documents to generate (1–100).
        - in: query
          name: scale
          schema:
            type: string
            enum:
              - enterprise
              - medium
              - private
              - small
            default: small
          description: >-
            Dataset scale controlling workspace layout and document statuses:
            `private` = 1 workspace with all documents embedded, `small` = 3
            workspaces, `medium` = 5 workspaces, `enterprise` = 8 workspaces.
      responses:
        '200':
          description: Preview generated successfully
        '400':
          description: Invalid parameters
        '401':
          description: Authentication required
      security:
        - bearerAuth: []
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Bearer authentication header of the form `Bearer <token>`, where
        `<token>` is your auth token.

````
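
To make the parameters above concrete, here is a hedged Python sketch that builds (but does not send) a request for this endpoint using only the standard library. The server URL, path, parameter names, allowed values, and bearer scheme come from the spec; the helper name `build_preview_request` and the client-side validation are assumptions for illustration, and the server's own validation may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://paradigm.lighton.ai"
PATH = "/api/v3/facet/seed/preview"
AVAILABLE_CODES = {"finance", "healthcare", "legal", "manufacturing", "patent", "sic", "tech"}
# Workspace counts per scale, as listed in the `scale` parameter description.
WORKSPACES_PER_SCALE = {"private": 1, "small": 3, "medium": 5, "enterprise": 8}


def build_preview_request(
    token: str,
    content_types: list[str],
    preview_max: int = 20,
    scale: str = "small",
) -> Request:
    """Build a GET request for the seed preview endpoint (hypothetical helper)."""
    if not content_types:
        raise ValueError("content_types is required")
    unknown = set(content_types) - AVAILABLE_CODES
    if unknown:
        raise ValueError(f"unknown content type codes: {sorted(unknown)}")
    if not 1 <= preview_max <= 100:
        raise ValueError("preview_max must be between 1 and 100")
    if scale not in WORKSPACES_PER_SCALE:
        raise ValueError(f"scale must be one of {sorted(WORKSPACES_PER_SCALE)}")
    query = urlencode(
        {
            "content_types": ",".join(content_types),
            "preview_max": preview_max,
            "scale": scale,
        },
        safe=",",  # keep the comma-separated list unescaped, as in the spec example
    )
    return Request(
        f"{BASE_URL}{PATH}?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

The returned object can be passed to `urllib.request.urlopen`; a `400` or `401` response surfaces as `urllib.error.HTTPError`. Decoding the body with `json.load` assumes the endpoint returns JSON, which this page implies but does not state.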