Generate a preview of what a seed dataset would contain (no DB writes).
Returns sample documents with realistic metadata, content type assignments, and namespaced attribute values. Roughly 50% of documents are assigned multiple content types (cross-vertical when possible) to demonstrate multi-classification.
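The multi-classification behaviour described above can be pictured with a minimal sketch. Only the roughly 50% target and the "cross-vertical when possible" rule come from this page; the catalogue of paths, the function name assign_content_types, and the same-vertical fallback are hypothetical and are not the service's actual generator.

```python
import random

# Hypothetical catalogue: content type paths grouped by vertical (first path segment).
CATALOGUE = {
    "legal": ["legal:contract:nda", "legal:contract:msa"],
    "tech": ["tech:security:threat-model", "tech:architecture:adr"],
    "finance": ["finance:report:quarterly"],
}


def assign_content_types(rng, catalogue, multi_ratio=0.5):
    """Return one content type path, plus a second one for roughly multi_ratio of calls."""
    verticals = list(catalogue)
    first_vertical = rng.choice(verticals)
    chosen = [rng.choice(catalogue[first_vertical])]
    if rng.random() >= multi_ratio:
        return chosen  # single classification
    other_verticals = [v for v in verticals if v != first_vertical]
    if other_verticals:
        # Cross-vertical when possible: take the extra type from a different vertical.
        chosen.append(rng.choice(catalogue[rng.choice(other_verticals)]))
    else:
        # Only one vertical requested (hypothetical fallback): use another path in it, if any.
        remaining = [p for p in catalogue[first_vertical] if p not in chosen]
        if remaining:
            chosen.append(rng.choice(remaining))
    return chosen


rng = random.Random(42)  # fixed seed so the sketch is reproducible
docs = [assign_content_types(rng, CATALOGUE) for _ in range(1000)]
multi = sum(len(d) > 1 for d in docs)
print(f"{multi / len(docs):.0%} of documents have multiple content types")
```

With a single requested vertical (for example ?content_types=legal), the sketch can only pair two legal paths, which is one plausible reading of "when possible".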
Response sections:
- meta: content type codes, a link to the content-types API, and generation stats.
- summary: rolled-up facet counts (content types, workspaces, statuses), ready for UI sidebar consumption.
- sample_documents: individual documents with attribute values namespaced by content type path.
Attribute values are keyed by content type path to avoid collisions when a document has multiple classifications:
{
"legal:contract:nda": {"counterparty": "Acme"},
"tech:security:threat-model": {"threat_level": "High"}
}
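A minimal client-side sketch of why the path keys matter: a lookup always names the classification it reads from, so two content types that happen to share an attribute name cannot overwrite each other. The document fields other than the path-keyed attribute map ("workspace", "status", "attributes") and the helper names are hypothetical; check the actual sample_documents shape before relying on them.

```python
from collections import Counter

# Hypothetical sample_documents entries; only the path-keyed attribute map mirrors the example above.
sample_documents = [
    {
        "workspace": "ws-1",
        "status": "embedded",
        "attributes": {
            "legal:contract:nda": {"counterparty": "Acme"},
            "tech:security:threat-model": {"threat_level": "High"},
        },
    },
    {
        "workspace": "ws-2",
        "status": "embedded",
        "attributes": {"legal:contract:nda": {"counterparty": "Globex"}},
    },
]


def get_attribute(doc, content_type_path, name, default=None):
    """Read one attribute scoped to a single classification of the document."""
    return doc.get("attributes", {}).get(content_type_path, {}).get(name, default)


def verticals(doc):
    """Return the distinct top-level verticals (first path segment) a document belongs to."""
    return sorted({path.split(":")[0] for path in doc.get("attributes", {})})


def content_type_counts(docs):
    """Count documents per content type path; a multi-classified document counts once per path."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.get("attributes", {}).keys())
    return counts


print(get_attribute(sample_documents[0], "legal:contract:nda", "counterparty"))  # Acme
print(verticals(sample_documents[0]))  # ['legal', 'tech']
print(content_type_counts(sample_documents))
```

The server already returns rolled-up counts in summary; the counting helper only illustrates that a document with two classifications contributes to two facets.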
Content type definitions are NOT included in the response. Use the content_types_url in meta to fetch the full content type tree and attribute schemas.
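Following the content_types_url link can be sketched with the standard library alone. The meta.content_types_url location comes from this page; the API host, the helper names, and the assumption that the link may be relative (resolved with urljoin) are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urljoin

# Hypothetical base URL; replace with your deployment's API host.
API_BASE = "https://api.example.com"


def content_types_request(preview, token, base_url=API_BASE):
    """Build a GET request for the content type tree and attribute schemas linked in meta."""
    # urljoin leaves an absolute URL unchanged and resolves a relative one against base_url.
    url = urljoin(base_url, preview["meta"]["content_types_url"])
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_content_types(preview, token, base_url=API_BASE):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(content_types_request(preview, token, base_url)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the URL and header logic testable without network access.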
Bearer authentication: send an Authorization header of the form Bearer <token>, where <token> is your auth token.
Content type code(s) to include in the preview, comma-separated (e.g., ?content_types=legal,tech). Available codes: finance, healthcare, legal, manufacturing, patent, sic, tech.
Maximum number of sample documents to generate (1–100).
Dataset scale, controlling the workspace layout and document statuses. Allowed values: private, small, medium, enterprise. private = 1 workspace with all documents embedded; small = 3 workspaces; medium = 5 workspaces; enterprise = 8 workspaces.
Success response: Preview generated successfully.
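A minimal sketch of building a preview request URL while enforcing the documented constraints (known content type codes, 1–100 documents, one of the four scales). Only content_types is named on this page; the query parameter names limit and scale, and the endpoint path used in the example, are hypothetical placeholders.

```python
from urllib.parse import urlencode

VALID_CONTENT_TYPES = {"finance", "healthcare", "legal", "manufacturing", "patent", "sic", "tech"}
# Workspace counts per scale, as documented above.
SCALE_WORKSPACES = {"private": 1, "small": 3, "medium": 5, "enterprise": 8}


def build_preview_url(base_url, content_types, *, limit, scale):
    """Validate inputs and return the preview URL (parameter names limit/scale are hypothetical)."""
    unknown = set(content_types) - VALID_CONTENT_TYPES
    if unknown:
        raise ValueError(f"unknown content type codes: {sorted(unknown)}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if scale not in SCALE_WORKSPACES:
        raise ValueError(f"scale must be one of {sorted(SCALE_WORKSPACES)}")
    # safe="," keeps the comma-separated list readable instead of encoding it as %2C.
    query = urlencode(
        {"content_types": ",".join(content_types), "limit": limit, "scale": scale},
        safe=",",
    )
    return f"{base_url}?{query}"


url = build_preview_url(
    "https://api.example.com/v1/seed/preview",  # hypothetical endpoint path
    ["legal", "tech"],
    limit=20,
    scale="medium",
)
print(url)  # https://api.example.com/v1/seed/preview?content_types=legal,tech&limit=20&scale=medium
```

Validating client-side gives an early, readable error; the server remains the authority on which values it accepts.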