# Upload Files to Paradigm

> Upload documents to Paradigm using the V3 Files API - simplified one-step process with support for single files and batch uploads.

## Overview

The V3 Files API provides a **simple, one-step upload process** for adding documents to your Paradigm workspace. Upload a file with a single API call, and Paradigm handles the rest - parsing, indexing, and making your documents searchable.

**Key features:**

* **Asynchronous processing** - Files are queued and processed in the background, so uploads return immediately
* **Automatic upload sessions** - Files are automatically grouped into sessions for efficient batch processing
* **Direct tag assignment** - Organize documents by applying tags during upload
* **Flexible configuration** - Override parser selection or use automatic detection (default)
* **Progress tracking** - Monitor processing status via GET endpoints

## Prerequisites

### Required

* **Paradigm API key**: Generate one at `/settings/api-key` in your Paradigm instance
* **Workspace ID**: The ID of the workspace where documents will be stored

### How to Get Your Workspace ID

You can find your workspace ID in several ways:

1. **From the admin panel**: Navigate to your workspace in the admin interface and check the URL or workspace details
2. **From the API**: List the workspaces you have access to with `GET /api/v3/workspaces`

```bash theme={null}
curl $PARADIGM_BASE_URL/api/v3/workspaces \
  -H "Authorization: Bearer $PARADIGM_API_KEY"
```
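If you know a workspace's name but not its ID, you can look it up in that listing. The sketch below assumes the endpoint returns either a plain JSON list or an object with a `results` list (the shape `GET /api/v3/files` uses), and that each entry has `id` and `name` fields like the `workspace` object in upload responses. Check the actual payload on your instance; `find_workspace_id` is a hypothetical helper.

```python theme={null}
from typing import Any, Optional


def find_workspace_id(payload: Any, name: str) -> Optional[int]:
    """Return the ID of the first workspace whose name matches `name` (case-insensitive).

    Assumption: `payload` is the parsed JSON from GET /api/v3/workspaces, either a
    list of workspace objects or a dict holding them under "results".
    """
    workspaces = payload.get("results", []) if isinstance(payload, dict) else payload
    wanted = name.casefold()
    for workspace in workspaces:
        if str(workspace.get("name", "")).casefold() == wanted:
            return workspace.get("id")
    return None  # no workspace with that name is visible to this API key


# Usage with requests, assuming base_url and api_key are set as in the Quick Start:
# payload = requests.get(f"{base_url}/api/v3/workspaces",
#                        headers={"Authorization": f"Bearer {api_key}"}).json()
# workspace_id = find_workspace_id(payload, "My Workspace")
```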

### File Requirements

* **Maximum file size**: 25 MB per file by default (configurable per instance through the `MAX_DOCUMENT_SIZE` setting)
* **Supported formats**: PDF, DOCX, DOC, PPTX, PPT, TXT, MD, Markdown, HTML, XLSX, XLS, CSV, RTF, ODT, ODS, ODP and more
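To catch obvious problems before spending an upload on them, you can run a local pre-flight check. This is a minimal sketch, assuming "25 MB" means 25 × 1024 × 1024 bytes and using only the formats listed above. Because the server accepts more formats than this list, treat an unknown extension as a warning rather than a hard rejection. `preflight_problems` is a hypothetical helper, not part of the API.

```python theme={null}
# Assumption: "25 MB" means 25 * 1024 * 1024 bytes; pass max_bytes explicitly if
# your instance sets MAX_DOCUMENT_SIZE to a different value.
DEFAULT_MAX_BYTES = 25 * 1024 * 1024

# Partial list copied from the supported formats above; the server accepts more.
KNOWN_EXTENSIONS = {
    "pdf", "docx", "doc", "pptx", "ppt", "txt", "md", "markdown", "html",
    "xlsx", "xls", "csv", "rtf", "odt", "ods", "odp",
}


def preflight_problems(filename: str, size_bytes: int, max_bytes: int = DEFAULT_MAX_BYTES) -> list[str]:
    """Return reasons a file may be rejected (an empty list means none were found)."""
    problems = []
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in KNOWN_EXTENSIONS:
        problems.append(f"extension {extension!r} is not in the known list")
    if size_bytes > max_bytes:
        problems.append(f"{size_bytes} bytes exceeds the {max_bytes}-byte limit")
    return problems


# Usage:
# from pathlib import Path
# p = Path("document.pdf")
# print(preflight_problems(p.name, p.stat().st_size))
```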

## Quick Start

The simplest upload requires just a file and workspace ID:

### Python

```python theme={null}
import os

import requests

api_key = os.getenv("PARADIGM_API_KEY")
base_url = os.getenv("PARADIGM_BASE_URL", "https://paradigm.lighton.ai")

# Open the file in a context manager so the handle is closed after the upload
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{base_url}/api/v3/files",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={"workspace_id": 42},
    )

response.raise_for_status()  # Raise on 4xx/5xx instead of printing an error body
print(response.json())
```

### cURL

```bash theme={null}
curl $PARADIGM_BASE_URL/api/v3/files \
  -H "Authorization: Bearer $PARADIGM_API_KEY" \
  -F "file=@document.pdf" \
  -F "workspace_id=42"
```

### Response

```json theme={null}
{
  "id": 12345,
  "filename": "document.pdf",
  "workspace": {"id": 42, "name": "My Workspace", "workspace_type": "custom"},
  "summaries": [],
  "title": "document",
  "extension": "pdf",
  "status": "pending",
  "status_vision": null,
  "created_at": "2025-03-01T10:30:00Z",
  "updated_at": "2025-03-01T10:30:00Z",
  "total_pages": 0,
  "tags": [],
  "created_by": {"id": 1, "first_name": "Jane", "last_name": "Doe", "username": "jdoe"},
  "upload_session_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "external_metadata": null,
  "message": "File queued for processing"
}
```
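Only a few of these fields are needed afterwards: `id` to poll a single file, and `upload_session_uuid` to track a batch. A minimal sketch that keeps just those; `UploadReceipt` is a hypothetical name, not a class from any Paradigm SDK.

```python theme={null}
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UploadReceipt:
    """The fields of an upload response that later status checks rely on."""

    file_id: int
    status: str
    upload_session_uuid: str

    @classmethod
    def from_response(cls, body: dict[str, Any]) -> "UploadReceipt":
        # Ignore the other fields (tags, created_by, ...) returned by the API
        return cls(
            file_id=body["id"],
            status=body["status"],
            upload_session_uuid=body["upload_session_uuid"],
        )


# receipt = UploadReceipt.from_response(response.json())
```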

## Upload Parameters

### Required Parameters

| Parameter      | Type    | Description                                 |
| -------------- | ------- | ------------------------------------------- |
| `file`         | binary  | The file to upload                          |
| `workspace_id` | integer | Workspace where the document will be stored |

### Optional Parameters

| Parameter  | Type              | Description                                                                                                             |
| ---------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `title`    | string            | Custom title for the document (defaults to filename without extension)                                                  |
| `filename` | string            | Override the uploaded filename                                                                                          |
| `parser`   | string            | Specify ingestion pipeline (e.g., "v2.2.1", "v2.1") - defaults to automatic selection                                   |
| `tags`     | array of integers | Tag IDs to assign to the document on upload (tags must belong to your company and you must have permission to use them) |
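When these options come from configuration or user input, it can help to build the form fields in one place and leave unset options out entirely, so the server applies the defaults described in the table (filename-based title, automatic parser selection). This is a sketch under that assumption; `build_upload_data` is a hypothetical helper whose result you pass as `data=` to `requests.post`.

```python theme={null}
from typing import Optional, Sequence, Union

FormValue = Union[int, str, list[int]]


def build_upload_data(
    workspace_id: int,
    title: Optional[str] = None,
    filename: Optional[str] = None,
    parser: Optional[str] = None,
    tags: Optional[Sequence[int]] = None,
) -> dict[str, FormValue]:
    """Build form fields for POST /api/v3/files, omitting options that are not set."""
    data: dict[str, FormValue] = {"workspace_id": workspace_id}
    optional = {"title": title, "filename": filename, "parser": parser}
    data.update({key: value for key, value in optional.items() if value is not None})
    if tags:
        # requests encodes a list as one repeated "tags" form field per ID
        data["tags"] = list(tags)
    return data
```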

### Examples with Optional Parameters

**Custom title and tags:**

```python theme={null}
with open("Q4_report.pdf", "rb") as f:
    response = requests.post(
        f"{base_url}/api/v3/files",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "workspace_id": 42,
            "title": "Q4 Financial Report 2025",
            "tags": [1, 2],  # Tag IDs; requests sends one "tags" form field per ID
        },
    )
```

**Custom parser:**

```python theme={null}
with open("technical_doc.pdf", "rb") as f:
    response = requests.post(
        f"{base_url}/api/v3/files",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "workspace_id": 42,
            "parser": "v2.2.1",  # Override automatic parser selection
        },
    )
```

## Tracking Upload Status

After uploading, files are processed asynchronously. Track their progress using the file ID.

### Check Individual File Status

```python theme={null}
file_id = 12345
response = requests.get(
    f"{base_url}/api/v3/files/{file_id}",
    headers={"Authorization": f"Bearer {api_key}"}
)

status = response.json()["status"]
print(f"Processing status: {status}")
```

### Understanding Status Values

| Status     | Description                                               |
| ---------- | --------------------------------------------------------- |
| `pending`  | File uploaded, waiting to be processed                    |
| `parsing`  | Currently being parsed and processed                      |
| `embedded` | Successfully processed and available for search           |
| `failed`   | Processing failed (check `status_detail` field for error) |
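Since `embedded` and `failed` are the only end states in this table, waiting for a file amounts to polling until one of them appears. A minimal sketch, assuming you pass in `get_status`, a function that fetches the current status (for example with the `GET /api/v3/files/{file_id}` request above). The timeout counts only sleep time, so treat it as approximate. `wait_for_file` is a hypothetical helper.

```python theme={null}
import time
from typing import Callable

TERMINAL_STATUSES = {"embedded", "failed"}


def wait_for_file(
    get_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the file reaches a terminal status; raise TimeoutError otherwise."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"file still {status!r} after {timeout} seconds")
        sleep(interval)
        waited += interval


# Usage, assuming base_url, api_key and file_id from the examples above:
# final = wait_for_file(lambda: requests.get(
#     f"{base_url}/api/v3/files/{file_id}",
#     headers={"Authorization": f"Bearer {api_key}"},
# ).json()["status"])
```

Injecting `sleep` keeps the loop easy to test without real delays.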

## Batch Upload: Multiple Files

When uploading multiple files, Paradigm automatically creates an **upload session** to group your files together. Files are queued and processed asynchronously in the background, allowing you to upload large batches without waiting for processing to complete.

Each file upload returns an `upload_session_uuid` that you can use to track all files in the batch. The upload session handles rate limiting and ensures efficient processing of your documents.

For uploading many files efficiently, use the provided batch upload script or implement your own concurrent upload logic.
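If you implement your own concurrent upload logic, a thread pool is one simple option because each upload is an independent HTTP request. This sketch assumes you supply `upload_one`, a function that sends one file to `POST /api/v3/files` (for instance the Quick Start request wrapped in a function) and returns the parsed JSON. The default of 10 workers mirrors the batch script's default concurrency; it is not a documented server limit. `upload_concurrently` is a hypothetical helper.

```python theme={null}
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def upload_concurrently(
    paths: Iterable[Path],
    upload_one: Callable[[Path], dict],
    max_workers: int = 10,
) -> tuple[dict[Path, dict], dict[Path, Exception]]:
    """Run `upload_one` for each path on a thread pool.

    Returns (responses, errors) keyed by path, so failed files can be retried.
    """
    responses: dict[Path, dict] = {}
    errors: dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                responses[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return responses, errors
```

Each successful response carries an `upload_session_uuid`, which you can use to track the batch.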

### Track All Files in a Batch Upload

After uploading multiple files, you can filter by the `upload_session_uuid` returned in each upload response:

```python theme={null}
upload_session_uuid = "550e8400-e29b-41d4-a716-446655440000"

response = requests.get(
    f"{base_url}/api/v3/files",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"upload_session_uuid": upload_session_uuid}
)

files = response.json()["results"]
for file in files:
    print(f"{file['filename']}: {file['status']}")
```

This is particularly useful when monitoring the progress of batch uploads.
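To turn that listing into a progress summary, count files per status and check whether every file has reached `embedded` or `failed`. A minimal sketch; if the listing endpoint paginates on your instance, `results` may hold only one page, so collect every page before summarizing. `summarize_batch` is a hypothetical helper.

```python theme={null}
from collections import Counter
from typing import Iterable

TERMINAL_STATUSES = {"embedded", "failed"}


def summarize_batch(files: Iterable[dict]) -> tuple[Counter, bool]:
    """Count files per status and report whether every file is finished.

    An empty batch counts as finished, because all() of nothing is True.
    """
    counts = Counter(file["status"] for file in files)
    done = all(status in TERMINAL_STATUSES for status in counts)
    return counts, done


# counts, done = summarize_batch(files)
# print(dict(counts), "finished" if done else "still processing")
```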

### Using the Batch Upload Script

<Card icon="download" href="https://drive.google.com/file/d/1-z8a2GhuZJY0Zibae1L1Pa0aRdyEEF6K/view?usp=sharing" title="batch_upload_v3.py">
  Production-ready async batch upload script with concurrent uploads, progress tracking, and resume capability.
</Card>

**Requirements:**

* Python 3.10 or higher
* [uv](https://docs.astral.sh/uv/) — dependencies are installed automatically when running with `uv run`

**Basic usage:**

```bash theme={null}
uv run batch_upload_v3.py \
  --api-key="your_api_key" \
  --base-url="https://paradigm.lighton.ai" \
  --files-dir="/path/to/documents" \
  --workspace-id=42
```

**With options:**

```bash theme={null}
uv run batch_upload_v3.py \
  --files-dir="/path/to/documents" \
  --workspace-id=42 \
  --batch-size=20 \
  --max-fails=5 \
  --tags=1,2,3 \
  --state-file="upload_state.json"
```

**Upload only specific file types:**

```bash theme={null}
# Only upload PDFs and Word documents
uv run batch_upload_v3.py \
  --files-dir="/path/to/documents" \
  --workspace-id=42 \
  --include-extensions=pdf,docx,doc

# Upload all files except temporary files and system files
uv run batch_upload_v3.py \
  --files-dir="/path/to/documents" \
  --workspace-id=42 \
  --exclude-extensions=tmp,log,.DS_Store,.gitkeep
```

### Script Arguments

| Argument               | Description                                                                 | Default                       |
| ---------------------- | --------------------------------------------------------------------------- | ----------------------------- |
| `--api-key`            | Paradigm API key (or set `PARADIGM_API_KEY` env var)                        | Required                      |
| `--base-url`           | Paradigm instance URL (or set `PARADIGM_BASE_URL` env var)                  | `https://paradigm.lighton.ai` |
| `--files-dir`          | Directory containing files to upload (scans recursively)                    | Required                      |
| `--workspace-id`       | Workspace ID where files will be stored                                     | Required                      |
| `--batch-size`         | Number of concurrent uploads (max: 50)                                      | `10`                          |
| `--max-fails`          | Stop after N failures (must be >= 1)                                        | `1`                           |
| `--tags`               | Comma-separated tag IDs to apply to all files                               | None                          |
| `--state-file`         | JSON file to track progress and enable resume                               | None                          |
| `--include-extensions` | Only upload files with these extensions or filenames (e.g., `pdf,docx,txt`) | None (all files)              |
| `--exclude-extensions` | Skip files with these extensions or filenames (e.g., `tmp,log,.DS_Store`)   | None                          |
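The table does not spell out exactly how `--include-extensions` and `--exclude-extensions` match. If you build your own uploader, the following is a hypothetical re-implementation of that behavior. It assumes a token matches either the file's extension or its full filename, ignoring case and any leading dot, and it may differ from what `batch_upload_v3.py` actually does.

```python theme={null}
from pathlib import Path
from typing import Optional


def _normalize(tokens: Optional[str]) -> set[str]:
    """Turn "pdf, .DS_Store,Docx" into {"pdf", "ds_store", "docx"}."""
    if not tokens:
        return set()
    return {t.strip().lstrip(".").lower() for t in tokens.split(",") if t.strip()}


def should_upload(path: Path, include: Optional[str] = None, exclude: Optional[str] = None) -> bool:
    """Apply include/exclude filters to a file path.

    Note: pathlib gives dotfiles such as ".DS_Store" an empty suffix,
    so those are matched by their filename instead.
    """
    keys = {path.suffix.lstrip(".").lower(), path.name.lstrip(".").lower()}
    keys.discard("")
    included = _normalize(include)
    excluded = _normalize(exclude)
    if included and not keys & included:
        return False
    return not keys & excluded
```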

### Script Features

* **High throughput** - concurrent uploads optimized for speed (default: 10 concurrent, max: 50)
* **Recursive scanning** - automatically finds all files in subdirectories
* **Progress tracking** - real-time progress bar with upload statistics
* **Fail-fast by default** - stops after the first failure (raise the threshold with `--max-fails`)
* **Smart error handling** - automatically skips files with unsupported extensions and files larger than 100 MB; skipped files don't count as failures
* **Resume capability** - use `--state-file` to resume interrupted uploads
* **Bulk tagging** - apply tags to all uploaded files automatically

### Example: Resume After Interruption

If your upload was interrupted, resume using the state file:

```bash theme={null}
# First attempt (interrupted)
uv run batch_upload_v3.py \
  --files-dir="/path/to/documents" \
  --workspace-id=42 \
  --state-file="upload_state.json"

# Resume (skips already uploaded files)
uv run batch_upload_v3.py \
  --files-dir="/path/to/documents" \
  --workspace-id=42 \
  --state-file="upload_state.json"
```
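The state file's format is not documented here. To apply the same resume idea in your own uploader, one approach is to record each successfully uploaded path in a JSON file and skip those paths on the next run. In this sketch, the `{"uploaded": [...]}` layout and all three helper names are assumptions, not necessarily what `batch_upload_v3.py` writes.

```python theme={null}
import json
from pathlib import Path


def load_done(state_file: Path) -> set[str]:
    """Read the set of already-uploaded paths; empty if the state file doesn't exist yet."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text())["uploaded"])


def mark_done(state_file: Path, done: set[str], path: Path) -> None:
    """Record one successful upload, writing a temp file and renaming it over the state file."""
    done.add(str(path))
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(json.dumps({"uploaded": sorted(done)}))
    tmp.replace(state_file)  # an interrupted write never truncates the existing state


def pending_files(paths: list[Path], done: set[str]) -> list[Path]:
    """Keep only the paths not yet recorded as uploaded."""
    return [p for p in paths if str(p) not in done]
```

Call `mark_done` only after the upload request succeeds, so a crash mid-upload leaves that file pending for the next run.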

## Migration from API V2

If you're currently using the V2 upload API, here's what changed:

### What's Different

**V2 required two steps:**

1. Create an upload session: `POST /api/v2/upload-session`
2. Upload files to the session: `POST /api/v2/upload-session/{uuid}`

**V3 is a single step:**

* Just upload directly: `POST /api/v3/files`

### What Was Removed

The following concepts from V2 are no longer needed:

* **Upload session management** - Sessions are created automatically in the background
* **Collection types** - Simply use `workspace_id` instead of `collection_type` and `collection`
* **OCR configuration** - Processing settings are applied automatically (or override with `parser` parameter)
* **Session activation/deactivation** - Handled automatically
* **`purpose` field** - No longer needed

### What's New

V3 adds new capabilities not available in V2:

* **Direct tag assignment** - Use the `tags` parameter to tag documents on upload
* **Simplified status tracking** - Filter files by `upload_session_uuid` to track batch uploads

### Tracking Progress

**V2:** Track session status with `GET /api/v2/upload-session/{uuid}`

**V3:** Filter files by upload session UUID:

```python theme={null}
# Get upload session UUID from upload response
upload_session_uuid = response.json()["upload_session_uuid"]

# Track all files in this batch
files = requests.get(
    f"{base_url}/api/v3/files",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"upload_session_uuid": upload_session_uuid}
).json()["results"]
```

The V3 API is more intuitive and requires less code while maintaining the same reliability and performance as V2.
