
Document Extract

Extract structured JSON from PDFs and images: an OCR pass followed by schema-guided extraction, in the playground or via /v1/ocr/extract.

Document Extract runs two LLM passes over a PDF or image: an OCR LLM converts the document to Markdown (the same pipeline as Document Parse), then a Structured Output LLM extracts the fields you define in a JSON template. The result is a JSON object shaped like your template — invoice fields, line items, totals — instead of raw text.

Extract data in the UI

Open the playground

In your project, open AI Gateway and switch to the DOCUMENT EXTRACT tab. The PLAYGROUND sub-tab is active by default; CODE shows an equivalent API snippet.

The DOCUMENT EXTRACT playground with OCR LLM and Structured Output LLM selectors, the Output Template editor, and the Extract Data button

Configure the two LLMs

  • OCR LLM — reads the document pages; pick a vision-capable model, its Connection, and settings (temperature, max tokens, reasoning effort where supported).
  • Structured Output LLM — turns the OCR text into JSON; any strong text model works, and it can be a different provider than the OCR LLM.

Define the output template

Edit Output Template (JSON Schema): a JSON object whose keys are the fields you want and whose values name their types, such as "string" or "number". A one-element array describes the shape of each item in a list. The default template extracts receipt data:

{
  "store_name": "string",
  "date": "string",
  "total_amount": "number",
  "items": [{ "name": "string", "quantity": "number", "price": "number" }]
}
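Because the result is meant to mirror the template, a small client-side check can catch missing fields or wrong types before the data reaches downstream code. The sketch below is a hypothetical helper, not part of Dynamiq: `matches_template` and the sample values are made up for illustration, and it assumes only the type names shown in the default template ("string", "number", one-element arrays, nested objects). It is deliberately strict, so adjust it if your documents can legitimately omit fields.

```python
from typing import Any


def matches_template(value: Any, template: Any) -> bool:
    """Return True if `value` has the shape described by `template`."""
    if template == "string":
        return isinstance(value, str)
    if template == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(template, list) and len(template) == 1:
        # A one-element list describes the shape of every item.
        return isinstance(value, list) and all(
            matches_template(item, template[0]) for item in value
        )
    if isinstance(template, dict):
        # Every template key must be present with a matching value.
        return isinstance(value, dict) and all(
            key in value and matches_template(value[key], sub)
            for key, sub in template.items()
        )
    return False  # unknown type name in the template


receipt_template = {
    "store_name": "string",
    "date": "string",
    "total_amount": "number",
    "items": [{"name": "string", "quantity": "number", "price": "number"}],
}

# Illustrative values only, not real output from the service.
sample_result = {
    "store_name": "Corner Market",
    "date": "2026-06-01",
    "total_amount": 12.5,
    "items": [{"name": "Coffee", "quantity": 1, "price": 12.5}],
}

print(matches_template(sample_result, receipt_template))
```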

Upload and extract

Drop a PDF or image into the dropzone and click Extract Data. The extracted JSON appears next to the file preview.

An extracted document showing the file preview beside the structured JSON result matching the template

Call the API

POST https://api.getdynamiq.ai/v1/ocr/extract accepts a multipart/form-data request with two fields:

  • file (file, required): The PDF or image to process.
  • options (string containing JSON, required): JSON-encoded extract options, described below. Invalid JSON returns 422.

The options JSON:

  • ocr_llm (object, required): The vision model for the OCR pass: {type, model, connection_id, temperature, max_tokens}.
  • structured_output_llm (object, required): The model for the extraction pass, with the same shape as ocr_llm.
  • template (string, required): The JSON template as a string (serialize it separately from the options object).
  • stream (boolean, optional): true streams extraction events as Server-Sent Events. Default false.

Example request in Python:
import json
import os

import requests

# Fields to extract; each value names the expected type.
template = {
    "invoice_number": "string",
    "date": "string",
    "total_amount": "number",
    "items": [
        {"description": "string", "quantity": "number", "price": "number"}
    ],
}

# One model configuration is reused for both passes here; the two passes
# can also use different models or providers.
llm = {
    "type": "openai",
    "model": "gpt-4o",
    # os.environ raises KeyError early if the variable is not set.
    "connection_id": os.environ["DYNAMIQ_CONNECTION_ID"],
    "temperature": 0.0,
    "max_tokens": 4096,
}

# Double encoding: serialize the template first, then the options object.
options = json.dumps(
    {
        "ocr_llm": llm,
        "structured_output_llm": llm,
        "template": json.dumps(template),
        "stream": False,
    }
)

# Open the file in a context manager so the handle is closed after upload.
with open("invoice.pdf", "rb") as document:
    response = requests.post(
        "https://api.getdynamiq.ai/v1/ocr/extract",
        headers={"Authorization": f"Bearer {os.environ['DYNAMIQ_ACCESS_KEY']}"},
        files={"file": document},
        data={"options": options},
        timeout=300,  # arbitrary client-side limit; adjust for large documents
    )
response.raise_for_status()

print(json.dumps(response.json()["data"], indent=2))

The same request with curl:

curl https://api.getdynamiq.ai/v1/ocr/extract \
  -H "Authorization: Bearer $DYNAMIQ_ACCESS_KEY" \
  -F "file=@invoice.pdf" \
  -F 'options={
    "ocr_llm": {"type": "openai", "model": "gpt-4o", "connection_id": "'"$DYNAMIQ_CONNECTION_ID"'", "temperature": 0.0, "max_tokens": 4096},
    "structured_output_llm": {"type": "openai", "model": "gpt-4o", "connection_id": "'"$DYNAMIQ_CONNECTION_ID"'", "temperature": 0.0, "max_tokens": 4096},
    "template": "{\"invoice_number\": \"string\", \"date\": \"string\", \"total_amount\": \"number\"}",
    "stream": false
  }'

Both options and the template inside it are JSON strings: serialize the template first, then serialize the options object that contains it (double encoding, as in the Python sample).
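The double encoding is easy to get wrong, so it can help to isolate it in one function and confirm the round trip locally. This is a minimal sketch: `build_options_field` is a hypothetical helper name and "YOUR_CONNECTION_ID" is a placeholder; only the option keys come from the list above.

```python
import json


def build_options_field(template, ocr_llm, structured_output_llm, stream=False):
    """Return the string to send as the `options` form field."""
    options = {
        "ocr_llm": ocr_llm,
        "structured_output_llm": structured_output_llm,
        "template": json.dumps(template),  # first encoding: template -> string
        "stream": stream,
    }
    return json.dumps(options)  # second encoding: options -> string


llm = {
    "type": "openai",
    "model": "gpt-4o",
    "connection_id": "YOUR_CONNECTION_ID",  # placeholder value
    "temperature": 0.0,
    "max_tokens": 4096,
}

field = build_options_field({"invoice_number": "string"}, llm, llm)

# Decoding once yields the options object, in which the template is still
# a string; decoding that string yields the template itself.
decoded = json.loads(field)
print(type(decoded["template"]).__name__)  # str
print(json.loads(decoded["template"]))     # {'invoice_number': 'string'}
```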

Response

{
  "data": {
    "invoice_number": "4812",
    "date": "2026-06-01",
    "total_amount": 98.0,
    "items": [
      { "description": "Pro plan (June)", "quantity": 2, "price": 49.0 }
    ]
  }
}

data is shaped by your template. Under the hood, the structured-output pass instructs the model to return a JSON object under an extracted_data key; the endpoint parses it and returns the contents as data. With "stream": true the response is an SSE stream instead.
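This page does not describe the payload of the streamed events, so the sketch below stops at the generic Server-Sent Events framing: it collects the `data:` lines of each event and yields them as raw strings, leaving interpretation to the caller. `iter_sse_data` is a hypothetical helper and the sample lines are invented to exercise the parser; they are not real output from the endpoint.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # One optional space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded, as in the SSE spec.


# Invented input, only to show the framing.
sample_lines = [
    'data: {"hypothetical": "event"}',
    "",
    ": keep-alive",
    "data: part one",
    "data: part two",
    "",
]

for payload in iter_sse_data(sample_lines):
    print(repr(payload))
```

With requests, one way to feed this function is to pass stream=True to requests.post, set response.encoding = "utf-8", and iterate over response.iter_lines(decode_unicode=True).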

Errors

Status  Cause
422     The options field is not valid JSON or fails validation
400     OCR failed, the extraction LLM returned no output, or its output couldn't be parsed as JSON
401     Missing or invalid credentials
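A client can turn these status codes into actionable messages. The sketch below is one possible mapping, with hypothetical helper names. The retry rule is an assumption rather than documented behavior: a 400 may stem from a transient model failure, while 401 and 422 need a change to the request before trying again.

```python
# Causes copied from the error list above.
EXTRACT_ERROR_HINTS = {
    400: "OCR failed, or the extraction LLM returned no output or unparseable JSON",
    401: "missing or invalid credentials",
    422: "the options field is not valid JSON or fails validation",
}

# Assumption: only 400 might succeed on a retry of the same request.
RETRYABLE_STATUSES = {400}


def describe_extract_error(status_code: int) -> str:
    """Return a readable message for a failed /v1/ocr/extract call."""
    hint = EXTRACT_ERROR_HINTS.get(status_code, "undocumented status")
    return f"HTTP {status_code}: {hint}"


def should_retry(status_code: int) -> bool:
    """Return True if resending the unchanged request might help."""
    return status_code in RETRYABLE_STATUSES


print(describe_extract_error(422))
print(should_retry(401))
```

In practice these helpers would be called with response.status_code before falling back to response.raise_for_status().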
