Parse a document to Markdown via VLM

curl --request POST \
  --url https://paradigm.lighton.ai/api/v3/ocr \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=<string>'

import requests

url = "https://paradigm.lighton.ai/api/v3/ocr"

# Do not set Content-Type by hand: requests builds the multipart body
# and adds the matching boundary to the header itself.
headers = {"Authorization": "Bearer <token>"}

# "document.pdf" is a placeholder path for the file to upload.
with open("document.pdf", "rb") as f:
    response = requests.post(url, files={"file": f}, headers=headers)

print(response.text)

const form = new FormData();
form.append('file', '<string>');

const options = {
  method: 'POST',
  headers: {Authorization: 'Bearer <token>'},
  body: form
};

fetch('https://paradigm.lighton.ai/api/v3/ocr', options)
  .then(res => res.json())
  .then(res => console.log(res))
  .catch(err => console.error(err));

{
  "model": "LightOnOCR",
  "total_pages": 3,
  "pages_parsed": [
    1,
    2,
    3
  ],
  "processing_time_ms": 4520,
  "enable_antilooping": true,
  "sampling_params": {
    "temperature": 0.2,
    "max_tokens": 5888,
    "repetition_penalty": null
  },
  "pages": [
    {
      "page_number": 1,
      "markdown": "# Invoice\n\n| Item | Qty | Price |\n|---|---|---|\n| Widget A | 10 | $5.00 |"
    },
    {
      "page_number": 2,
      "markdown": "## Terms and Conditions\n\nPayment is due within 30 days..."
    },
    {
      "page_number": 3,
      "markdown": "## Appendix\n\n![Figure 1: Sales chart summary]"
    }
  ]
}

Upload a file for synchronous OCR processing. This endpoint is intended for lightweight, low-volume document parsing and returns results inline in the response.

For large documents, high-throughput workloads, or asynchronous processing, use the /files endpoints, which are optimized for those use cases.

Supported file types: .pdf, .png, .jpg, .jpeg, .pptx, .ppt, .odp, .docx, .odt, .doc, .html

At most 16 pages are processed per request. For longer documents, split the work across multiple sequential calls using the pages parameter (e.g., pages="1-16" for the first call, pages="17-32" for the second).
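The splitting described above can be sketched as a small helper that produces one pages value per call. This is a minimal sketch: the function name page_batches is hypothetical, and it assumes you already know the document's page count (presumably the total_pages field returned by a first call).

```python
def page_batches(total_pages: int, batch_size: int = 16) -> list[str]:
    """Return page-range strings such as "1-16", "17-32" covering 1..total_pages."""
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    if not 1 <= batch_size <= 16:
        raise ValueError("batch_size must be between 1 and 16")
    ranges = []
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        # A batch holding a single page is written as a bare number, e.g. "33".
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


print(page_batches(40))  # ['1-16', '17-32', '33-40']
```

Each returned string can be sent as the pages value of one request, and the per-page results concatenated in order afterwards.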

POST /api/v3/ocr

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

file (string<uri>, required)

The document to parse.

model (string | null)

The technical_name of an enabled parser model. Falls back to the platform default when not set.

pages (string, default: all)

Page range to parse. Formats: "all", "1-5", "1,3,7", "2-4,8". Maximum 16 pages per request. For larger documents, make multiple calls with different page ranges.
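To check a pages value against the 16-page limit before sending it, the listed formats can be expanded client-side. This is a minimal sketch: expand_pages and check_page_limit are hypothetical helper names, and the server's own parsing rules (whitespace, duplicates, ordering) are not documented here, so treat the behavior below as an assumption.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec such as "2-4,8" into [2, 3, 4, 8]. "all" is not handled here."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)


def check_page_limit(spec: str, limit: int = 16) -> None:
    """Raise ValueError if an explicit spec selects more pages than the limit."""
    if spec != "all" and len(expand_pages(spec)) > limit:
        raise ValueError(f"{spec!r} selects more than {limit} pages")


print(expand_pages("2-4,8"))  # [2, 3, 4, 8]
```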

enable_antilooping (boolean, default: true)

When enabled, detects repetitive generation loops and gradually increases the sampling temperature to break out of them.
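To make the idea concrete, here is a toy illustration of loop detection followed by a temperature bump. It is not the service's actual algorithm, which is not documented here: the n-gram check, the step size, and both function names are assumptions chosen for the example. Only the 2.0 ceiling comes from the documented temperature range.

```python
def is_looping(tokens: list[str], ngram: int = 4, repeats: int = 3) -> bool:
    """True if the last `ngram` tokens appear `repeats` times in a row at the end."""
    window = ngram * repeats
    if len(tokens) < window:
        return False
    tail = tokens[-window:]
    pattern = tail[-ngram:]
    # Compare each consecutive n-gram-sized chunk of the tail with the last one.
    return all(tail[i:i + ngram] == pattern for i in range(0, window, ngram))


def next_temperature(current: float, step: float = 0.1, ceiling: float = 2.0) -> float:
    """Raise the temperature by one step without exceeding the ceiling."""
    return min(round(current + step, 2), ceiling)


temperature = 0.2
if is_looping(["| ---", "| ---", "| ---", "| ---"], ngram=1, repeats=4):
    temperature = next_temperature(temperature)
print(temperature)  # 0.3
```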

temperature (number<double> | null, default: 0.2)

Controls the randomness of the model output. Lower values (e.g. 0.1) produce more deterministic results; higher values (e.g. 1.0) increase variety. Required range: 0 <= x <= 2.

max_tokens (integer | null, default: 5000)

Maximum number of tokens the model can generate per page. Higher values allow longer outputs but increase processing time. Required range: 1 <= x <= 16384.

repetition_penalty (number<double> | null, default: 1)

Penalizes repeated tokens to reduce redundant output. A value of 1.0 applies no penalty; higher values (e.g. 1.2) discourage repetition more strongly. Required range: 1 <= x <= 2.
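The documented ranges for the three sampling parameters can be checked client-side before a request is sent. This is a minimal sketch: validate_sampling is a hypothetical helper, and passing the returned dict as extra form fields next to file is an assumption about how the endpoint accepts these values.

```python
# Inclusive ranges taken from the parameter descriptions above.
LIMITS = {
    "temperature": (0, 2),
    "max_tokens": (1, 16384),
    "repetition_penalty": (1, 2),
}


def validate_sampling(**params):
    """Return params unchanged if every value is within its documented range."""
    for name, value in params.items():
        if name not in LIMITS:
            raise KeyError(f"unknown sampling parameter: {name}")
        if value is None:
            continue  # all three parameters are documented as nullable
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
    return params


fields = validate_sampling(temperature=0.2, max_tokens=5000, repetition_penalty=1.2)
print(fields)
```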

Response

Document parsed successfully.

model (string, required)

total_pages (integer, required)

pages_parsed (integer[], required)

processing_time_ms (integer, required)

enable_antilooping (boolean, required)

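sampling_params (object, required)

In the example response, this object contains temperature, max_tokens, and repetition_penalty.

pages (object[], required)

In the example response, each entry contains page_number and markdown.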
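Since the Markdown comes back one page at a time, a common next step is to stitch the pages into a single document. This is a minimal sketch based only on the field names shown in the example response; join_markdown is a hypothetical helper, and the blank-line separator is an arbitrary choice.

```python
def join_markdown(result: dict, separator: str = "\n\n") -> str:
    """Concatenate the per-page Markdown of a parsed response in page order."""
    pages = sorted(result["pages"], key=lambda page: page["page_number"])
    return separator.join(page["markdown"] for page in pages)


# Trimmed-down stand-in for a response body (for example, response.json()).
example = {
    "pages": [
        {"page_number": 2, "markdown": "## Terms and Conditions"},
        {"page_number": 1, "markdown": "# Invoice"},
    ]
}
print(join_markdown(example))
```

Sorting by page_number keeps the output in order even when results from several page-range calls are merged into one list first.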
