Synchronous endpoint that accepts an uploaded document and returns OCR results formatted as Markdown. The request blocks until the VLM finishes processing all requested pages. Rate limits are enforced per user and scale with the underlying VLM capacity. The endpoint returns 503 when the VLM backend is overloaded.
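Because the endpoint can answer 503 under load, a client may want to retry with increasing delays. The following is a minimal sketch, not part of the API: `call_with_backoff`, its parameters, and the exponential delay schedule are assumptions made for illustration, and `send` stands in for whatever function performs the actual HTTP request.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a status other than 503 or attempts run out.

    `send` is a hypothetical callable returning (status_code, body).
    The delay doubles after each 503: base_delay, 2 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break  # success or a non-retryable error: stop retrying
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Since the request blocks until all pages are processed, a generous client-side timeout on the underlying HTTP call is probably more important than a high retry count.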
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
The document to parse.
The technical_name of an enabled parser model. Falls back to the platform default.
Page range to parse. Formats: "all", "1-5", "1,3,7", "2-4,8".
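To make the page range formats concrete, here is a small sketch of how such a string could be expanded into page numbers on the client side, for example to validate input before sending it. The function name `parse_page_range` and the rules that pages are 1-based and that ranges must be ascending are assumptions; the server's own parsing may differ.

```python
from typing import List, Optional


def parse_page_range(spec: str) -> Optional[List[int]]:
    """Expand a page range string into sorted, unique page numbers.

    Returns None for "all", meaning every page is requested.
    Raises ValueError for malformed or non-ascending ranges.
    """
    spec = spec.strip()
    if spec == "all":
        return None
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: pages are numbered from 1 and ranges ascend.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

With this sketch, `"2-4,8"` expands to pages 2, 3, 4 and 8, while `"all"` is passed through as "no restriction".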
When enabled, detects repetitive generation loops and gradually increases the sampling temperature to break out of them.
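The idea can be illustrated with a toy sketch. How the service actually detects loops and how fast it raises the temperature are not documented here, so everything below (the n-gram check, the step size, the ceiling of 2.0 borrowed from the documented temperature range, and both function names) is an assumption chosen for illustration only.

```python
from typing import Sequence


def has_repetition_loop(
    tokens: Sequence[int], ngram: int = 4, min_repeats: int = 3
) -> bool:
    """Return True when the last `ngram` tokens repeat back to back
    at least `min_repeats` times at the end of the sequence."""
    needed = ngram * min_repeats
    if len(tokens) < needed:
        return False
    tail = list(tokens[-needed:])
    pattern = tail[-ngram:]
    # Every consecutive chunk of the tail must equal the final chunk.
    return all(tail[i:i + ngram] == pattern for i in range(0, needed, ngram))


def next_temperature(
    current: float, looping: bool, step: float = 0.1, ceiling: float = 2.0
) -> float:
    """Raise the temperature by `step` while a loop is detected,
    never exceeding `ceiling`; otherwise leave it unchanged."""
    if not looping:
        return current
    return min(ceiling, current + step)
```

Raising the temperature in small steps, rather than jumping straight to a high value, keeps the output close to deterministic unless the loop persists.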
Controls the randomness of the model output. Lower values (e.g. 0.1) produce more deterministic results, higher values (e.g. 1.0) increase variety. Range: 0.0–2.0.
0 <= x <= 2
Maximum number of tokens the model can generate per page. Higher values allow longer outputs but increase processing time. Range: 1–16384.
1 <= x <= 16384
Penalizes repeated tokens to reduce redundant output. A value of 1.0 applies no penalty; higher values (e.g. 1.2) discourage repetition more strongly. Range: 1.0–2.0.
1 <= x <= 2
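The documented ranges can be checked on the client before a request is sent, which avoids a round trip for values the server would reject. This is a sketch under stated assumptions: the field names `temperature`, `max_tokens` and `repetition_penalty` are hypothetical placeholders, since the actual parameter names are not shown in this section, and the requirement that the token limit be an integer is also assumed.

```python
from typing import Dict, Optional, Union

Number = Union[int, float]

# (minimum, maximum) pairs taken from the ranges documented above.
# The keys are hypothetical field names, not confirmed API names.
LIMITS = {
    "temperature": (0.0, 2.0),
    "max_tokens": (1, 16384),
    "repetition_penalty": (1.0, 2.0),
}


def build_sampling_params(
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    repetition_penalty: Optional[float] = None,
) -> Dict[str, Number]:
    """Validate the supplied values and return only those that were set."""
    supplied = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "repetition_penalty": repetition_penalty,
    }
    params: Dict[str, Number] = {}
    for name, value in supplied.items():
        if value is None:
            continue  # omitted: leave it to the server-side default
        if name == "max_tokens" and not isinstance(value, int):
            raise ValueError("max_tokens must be an integer")
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        params[name] = value
    return params
```

Omitted values are left out of the result entirely, so the server-side defaults apply to anything the caller does not set.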