Docling Synthetic Data Generation

Tools for synthesizing labeled document data from real corpora — useful for fine-tuning layout, table, and reading-order models, and for stress-testing downstream RAG pipelines.

Docling Synthetic Data Generation is one of 16 APIs that Docling publishes on the APIs.io network.

The entry is tagged Synthetic Data, Training, and Documents. Its published properties on APIs.io are documentation and source code, both of which point to the project's GitHub repository.

API entry from apis.yml

aid: docling:docling-sdg
name: Docling Synthetic Data Generation
tags:
- Synthetic Data
- Training
- Documents
humanURL: https://github.com/docling-project/docling-sdg
properties:
- url: https://github.com/docling-project/docling-sdg
  type: Documentation
- url: https://github.com/docling-project/docling-sdg
  type: SourceCode
description: Tools for synthesizing labeled document data from real corpora — useful for fine-tuning layout,
  table, and reading-order models, and for stress-testing downstream RAG pipelines.
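
As a rough illustration of how a consumer might read this entry, the sketch below transcribes the fields above into a plain Python dict and groups the property URLs by their `type`. The helper name `urls_by_type` is hypothetical, and splitting `aid` on the colon into a provider part and an API part is an assumption drawn from this one entry, not a documented rule.

```python
from collections import defaultdict

# The entry above, transcribed by hand into a dict (description omitted).
entry = {
    "aid": "docling:docling-sdg",
    "name": "Docling Synthetic Data Generation",
    "tags": ["Synthetic Data", "Training", "Documents"],
    "humanURL": "https://github.com/docling-project/docling-sdg",
    "properties": [
        {"url": "https://github.com/docling-project/docling-sdg", "type": "Documentation"},
        {"url": "https://github.com/docling-project/docling-sdg", "type": "SourceCode"},
    ],
}


def urls_by_type(api_entry):
    """Hypothetical helper: map each property type to its list of URLs."""
    grouped = defaultdict(list)
    for prop in api_entry.get("properties", []):
        grouped[prop["type"]].append(prop["url"])
    return dict(grouped)


links = urls_by_type(entry)

# Assumption: the aid has the shape "<provider>:<api>".
provider, _, slug = entry["aid"].partition(":")

for kind, urls in links.items():
    print(f"{provider}/{slug} {kind}: {', '.join(urls)}")
```

In this entry both property types resolve to the same repository URL, so a consumer looking for standalone API documentation would end up at the project's GitHub page either way.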