Docling

Process documents using the Docling library for parsing and conversion.

What’s inside

Please refer to the above links for usage and configuration details.

Maven coordinates

<dependency>
    <groupId>org.apache.camel.springboot</groupId>
    <artifactId>camel-docling-starter</artifactId>
</dependency>

Spring Boot Auto-Configuration

The starter supports 54 options, which are listed below.

| Name | Description | Default | Type |
|---|---|---|---|
| camel.component.docling.abort-on-error | Abort processing on error | false | Boolean |
| camel.component.docling.api-key-header | Header name for API key authentication | X-API-Key | String |
| camel.component.docling.async-poll-interval | Polling interval for async conversion status in milliseconds | 2000 | Long |
| camel.component.docling.async-task-ttl | Time-to-live for pending async conversion tasks in milliseconds. Tasks older than this will be evicted from memory to prevent leaks. | 86400000 | Long |
| camel.component.docling.async-timeout | Maximum time to wait for async conversion completion in milliseconds | 300000 | Long |
| camel.component.docling.authentication-scheme | Authentication scheme (BEARER, API_KEY, NONE) | none | AuthenticationScheme |
| camel.component.docling.authentication-token | Authentication token for docling-serve API (Bearer token or API key) | | String |
| camel.component.docling.autowired-enabled | Whether autowiring is enabled. This is used for automatic autowiring of options (the option must be marked as autowired) by looking up the registry to find a single instance of the matching type, which is then configured on the component. This can be used to automatically configure JDBC data sources, JMS connection factories, AWS clients, etc. | true | Boolean |
| camel.component.docling.batch-fail-on-first-error | Fail entire batch on first error (true) or continue processing remaining documents (false) | true | Boolean |
| camel.component.docling.batch-parallelism | Number of parallel threads for batch processing | 4 | Integer |
| camel.component.docling.batch-size | Number of documents to submit per sub-batch. Documents are partitioned into sub-batches of this size and each sub-batch is processed before starting the next one. Within each sub-batch, up to batchParallelism threads are used concurrently. This controls memory usage and back-pressure when processing large document sets. | 10 | Integer |
| camel.component.docling.batch-timeout | Maximum time to wait for batch completion in milliseconds | 300000 | Long |
| camel.component.docling.chunking-include-raw-text | Include raw text in chunk output | false | Boolean |
| camel.component.docling.chunking-max-tokens | Maximum number of tokens per chunk for hybrid chunking | | Integer |
| camel.component.docling.chunking-merge-peers | Whether to merge peer chunks in hybrid chunking | true | Boolean |
| camel.component.docling.chunking-tokenizer | Tokenizer model for hybrid chunking (e.g. sentence-transformers/all-MiniLM-L6-v2) | | String |
| camel.component.docling.chunking-use-markdown-tables | Use markdown format for tables in chunk output | false | Boolean |
| camel.component.docling.configuration | The configuration for the Docling Endpoint. The option is a org.apache.camel.component.docling.DoclingConfiguration type. | | DoclingConfiguration |
| camel.component.docling.content-in-body | Include the content of the output file in the exchange body and delete the output file | false | Boolean |
| camel.component.docling.do-code-enrichment | Enable code enrichment in document processing | false | Boolean |
| camel.component.docling.do-formula-enrichment | Enable formula enrichment in document processing | false | Boolean |
| camel.component.docling.do-ocr | Enable OCR processing in docling-serve API mode. When not set, the server uses its own defaults. Set enableOCR to false to explicitly disable OCR. | false | Boolean |
| camel.component.docling.do-picture-classification | Enable picture classification in document processing | false | Boolean |
| camel.component.docling.do-picture-description | Enable picture description generation in document processing | false | Boolean |
| camel.component.docling.do-table-structure | Enable table structure recognition | false | Boolean |
| camel.component.docling.docling-command | Path to Docling Python executable or command | | String |
| camel.component.docling.docling-serve-url | Docling-serve API URL (e.g., http://localhost:5001) | http://localhost:5001 | String |
| camel.component.docling.document-timeout | Document processing timeout in seconds | | Long |
| camel.component.docling.enable-o-c-r | Enable OCR processing for scanned documents | true | Boolean |
| camel.component.docling.enabled | Whether to enable auto configuration of the docling component. This is enabled by default. | | Boolean |
| camel.component.docling.force-ocr | Force OCR processing even for digital documents | false | Boolean |
| camel.component.docling.image-export-mode | Image export mode for referenced images | | String |
| camel.component.docling.images-scale | Scale factor for exported images | | Double |
| camel.component.docling.include-images | Include images in the conversion output | false | Boolean |
| camel.component.docling.include-layout-info | Show layout information with bounding boxes | false | Boolean |
| camel.component.docling.include-metadata-in-headers | Include metadata in message headers when extracting metadata | true | Boolean |
| camel.component.docling.include-raw-metadata | Include raw metadata as returned by the parser | false | Boolean |
| camel.component.docling.lazy-start-producer | Whether the producer should be started lazily (on the first message). Starting lazily allows the CamelContext and routes to start up in situations where a producer might otherwise fail during startup and cause the route to fail to start. Deferring startup this way lets the failure be handled during message routing by Camel’s routing error handlers. Be aware that creating and starting the producer when the first message is processed may take some time and prolong the total processing time of that message. | false | Boolean |
| camel.component.docling.max-file-size | Maximum file size in bytes for processing | 52428800 | Long |
| camel.component.docling.md-page-break-placeholder | Placeholder string for page breaks in markdown output | | String |
| camel.component.docling.oauth-profile | OAuth profile name for obtaining an access token via the OAuth 2.0 Client Credentials grant. When set, the token is acquired from the configured identity provider and used as authenticationToken. Requires camel-oauth on the classpath. | | String |
| camel.component.docling.ocr-engine | OCR engine to use | | String |
| camel.component.docling.ocr-language | Language code for OCR processing | en | String |
| camel.component.docling.operation | The operation to perform | convert-to-markdown | DoclingOperations |
| camel.component.docling.output-format | Output format for document conversion | markdown | String |
| camel.component.docling.pdf-backend | PDF parsing backend | | String |
| camel.component.docling.pipeline | Processing pipeline to use | | String |
| camel.component.docling.process-timeout | Timeout for Docling process execution in milliseconds | 30000 | Long |
| camel.component.docling.split-batch-results | Split batch results into individual exchanges (one per document) instead of single BatchProcessingResults | false | Boolean |
| camel.component.docling.table-cell-matching | Enable table cell matching post-processing | false | Boolean |
| camel.component.docling.table-mode | Table structure recognition mode | | String |
| camel.component.docling.use-async-mode | Use asynchronous conversion mode (docling-serve API only) | false | Boolean |
| camel.component.docling.use-docling-serve | Use docling-serve API instead of CLI command | false | Boolean |
| camel.component.docling.working-directory | Working directory for Docling execution | | String |
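
As an illustration of how these options might be combined, the following application.properties fragment is a minimal sketch for running against a docling-serve instance with bearer authentication, async conversion, and tuned batch settings. The property names come from the table above. All values are assumptions chosen for illustration rather than recommended settings, and DOCLING_TOKEN is a hypothetical environment variable.

```properties
# Illustrative sketch only: example values, not recommended settings.

# Call a docling-serve instance over HTTP instead of the local CLI command
camel.component.docling.use-docling-serve=true
camel.component.docling.docling-serve-url=http://localhost:5001

# Authenticate with a bearer token
# (DOCLING_TOKEN is a hypothetical environment variable resolved by Spring Boot)
camel.component.docling.authentication-scheme=BEARER
camel.component.docling.authentication-token=${DOCLING_TOKEN}

# Convert asynchronously: poll every 5 seconds, give up after 10 minutes
camel.component.docling.use-async-mode=true
camel.component.docling.async-poll-interval=5000
camel.component.docling.async-timeout=600000

# Batch processing: sub-batches of 20 documents, up to 8 threads per sub-batch,
# and keep processing the remaining documents if one fails
camel.component.docling.batch-size=20
camel.component.docling.batch-parallelism=8
camel.component.docling.batch-fail-on-first-error=false
```

Any option left unset falls back to the default listed for it. When use-docling-serve stays at its default of false, the component runs the CLI command instead, so options such as docling-command, working-directory, and process-timeout are presumably the relevant ones to tune in that mode.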