Docling
Process documents using Docling library for parsing and conversion.
What’s inside
- Docling component, URI syntax: docling:operationId
Please refer to the above links for usage and configuration details.
Maven coordinates
<dependency>
<groupId>org.apache.camel.springboot</groupId>
<artifactId>camel-docling-starter</artifactId>
</dependency>
Spring Boot Auto-Configuration
The starter supports 54 options, which are listed below.
| Name | Description | Default | Type |
|---|---|---|---|
camel.component.docling.abort-on-error | Abort processing on error | false | Boolean |
camel.component.docling.api-key-header | Header name for API key authentication | X-API-Key | String |
camel.component.docling.async-poll-interval | Polling interval for async conversion status in milliseconds | 2000 | Long |
camel.component.docling.async-task-ttl | Time-to-live for pending async conversion tasks in milliseconds. Tasks older than this will be evicted from memory to prevent leaks. | 86400000 | Long |
camel.component.docling.async-timeout | Maximum time to wait for async conversion completion in milliseconds | 300000 | Long |
camel.component.docling.authentication-scheme | Authentication scheme (BEARER, API_KEY, NONE) | none | AuthenticationScheme |
camel.component.docling.authentication-token | Authentication token for docling-serve API (Bearer token or API key) | | String |
camel.component.docling.autowired-enabled | Whether autowiring is enabled. This is used for automatic autowiring options (the option must be marked as autowired) by looking up in the registry to find if there is a single instance of matching type, which then gets configured on the component. This can be used for automatic configuring JDBC data sources, JMS connection factories, AWS Clients, etc. | true | Boolean |
camel.component.docling.batch-fail-on-first-error | Fail entire batch on first error (true) or continue processing remaining documents (false) | true | Boolean |
camel.component.docling.batch-parallelism | Number of parallel threads for batch processing | 4 | Integer |
camel.component.docling.batch-size | Number of documents to submit per sub-batch. Documents are partitioned into sub-batches of this size and each sub-batch is processed before starting the next one. Within each sub-batch, up to batchParallelism threads are used concurrently. This controls memory usage and back-pressure when processing large document sets. | 10 | Integer |
camel.component.docling.batch-timeout | Maximum time to wait for batch completion in milliseconds | 300000 | Long |
camel.component.docling.chunking-include-raw-text | Include raw text in chunk output | false | Boolean |
camel.component.docling.chunking-max-tokens | Maximum number of tokens per chunk for hybrid chunking | | Integer |
camel.component.docling.chunking-merge-peers | Whether to merge peer chunks in hybrid chunking | true | Boolean |
camel.component.docling.chunking-tokenizer | Tokenizer model for hybrid chunking (e.g. sentence-transformers/all-MiniLM-L6-v2) | | String |
camel.component.docling.chunking-use-markdown-tables | Use markdown format for tables in chunk output | false | Boolean |
camel.component.docling.configuration | The configuration for the Docling Endpoint. The option is an org.apache.camel.component.docling.DoclingConfiguration type. | | DoclingConfiguration |
camel.component.docling.content-in-body | Include the content of the output file in the exchange body and delete the output file | false | Boolean |
camel.component.docling.do-code-enrichment | Enable code enrichment in document processing | false | Boolean |
camel.component.docling.do-formula-enrichment | Enable formula enrichment in document processing | false | Boolean |
camel.component.docling.do-ocr | Enable OCR processing in docling-serve API mode. When not set, the server uses its own defaults. Set enableOCR to false to explicitly disable OCR. | false | Boolean |
camel.component.docling.do-picture-classification | Enable picture classification in document processing | false | Boolean |
camel.component.docling.do-picture-description | Enable picture description generation in document processing | false | Boolean |
camel.component.docling.do-table-structure | Enable table structure recognition | false | Boolean |
camel.component.docling.docling-command | Path to Docling Python executable or command | | String |
camel.component.docling.docling-serve-url | Docling-serve API URL (e.g., http://localhost:5001) | | String |
camel.component.docling.document-timeout | Document processing timeout in seconds | | Long |
camel.component.docling.enable-o-c-r | Enable OCR processing for scanned documents | true | Boolean |
camel.component.docling.enabled | Whether to enable auto configuration of the docling component. This is enabled by default. | | Boolean |
camel.component.docling.force-ocr | Force OCR processing even for digital documents | false | Boolean |
camel.component.docling.image-export-mode | Image export mode for referenced images | | String |
camel.component.docling.images-scale | Scale factor for exported images | | Double |
camel.component.docling.include-images | Include images in the conversion output | false | Boolean |
camel.component.docling.include-layout-info | Show layout information with bounding boxes | false | Boolean |
camel.component.docling.include-metadata-in-headers | Include metadata in message headers when extracting metadata | true | Boolean |
camel.component.docling.include-raw-metadata | Include raw metadata as returned by the parser | false | Boolean |
camel.component.docling.lazy-start-producer | Whether the producer should be started lazy (on the first message). By starting lazy you can use this to allow CamelContext and routes to startup in situations where a producer may otherwise fail during starting and cause the route to fail being started. By deferring this startup to be lazy then the startup failure can be handled during routing messages via Camel’s routing error handlers. Beware that when the first message is processed then creating and starting the producer may take a little time and prolong the total processing time of the processing. | false | Boolean |
camel.component.docling.max-file-size | Maximum file size in bytes for processing | 52428800 | Long |
camel.component.docling.md-page-break-placeholder | Placeholder string for page breaks in markdown output | | String |
camel.component.docling.oauth-profile | OAuth profile name for obtaining an access token via the OAuth 2.0 Client Credentials grant. When set, the token is acquired from the configured identity provider and used as authenticationToken. Requires camel-oauth on the classpath. | | String |
camel.component.docling.ocr-engine | OCR engine to use | | String |
camel.component.docling.ocr-language | Language code for OCR processing | en | String |
camel.component.docling.operation | The operation to perform | convert-to-markdown | DoclingOperations |
camel.component.docling.output-format | Output format for document conversion | markdown | String |
camel.component.docling.pdf-backend | PDF parsing backend | | String |
camel.component.docling.pipeline | Processing pipeline to use | | String |
camel.component.docling.process-timeout | Timeout for Docling process execution in milliseconds | 30000 | Long |
camel.component.docling.split-batch-results | Split batch results into individual exchanges (one per document) instead of single BatchProcessingResults | false | Boolean |
camel.component.docling.table-cell-matching | Enable table cell matching post-processing | false | Boolean |
camel.component.docling.table-mode | Table structure recognition mode | | String |
camel.component.docling.use-async-mode | Use asynchronous conversion mode (docling-serve API only) | false | Boolean |
camel.component.docling.use-docling-serve | Use docling-serve API instead of CLI command | false | Boolean |
camel.component.docling.working-directory | Working directory for Docling execution | | String |
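
As an illustration of how several of the options above might be combined, the following application.properties fragment is a minimal sketch for running conversions against a docling-serve instance with asynchronous conversion and batch processing. It uses only property names from the table. The URL is the example from the option description, the BEARER scheme is an assumed choice, and DOCLING_TOKEN is a hypothetical environment variable; adjust all three to your deployment.

```properties
# Use the docling-serve API instead of the CLI command.
camel.component.docling.use-docling-serve=true
# Example URL from the option description; replace with your server.
camel.component.docling.docling-serve-url=http://localhost:5001

# Authentication (assumed setup): bearer token read from a hypothetical
# DOCLING_TOKEN environment variable via a Spring property placeholder.
camel.component.docling.authentication-scheme=BEARER
camel.component.docling.authentication-token=${DOCLING_TOKEN}

# Asynchronous conversion (docling-serve API only): poll every 2000 ms
# and wait at most 300000 ms. Both values are the documented defaults.
camel.component.docling.use-async-mode=true
camel.component.docling.async-poll-interval=2000
camel.component.docling.async-timeout=300000

# Batch processing: sub-batches of 10 documents, up to 4 threads per
# sub-batch, and keep going when a single document fails.
camel.component.docling.batch-size=10
camel.component.docling.batch-parallelism=4
camel.component.docling.batch-fail-on-first-error=false
```

Options left out of the fragment keep the defaults listed in the table. The batch-size and batch-parallelism values shown are those defaults, written out only to make the relationship between sub-batch size and thread count explicit.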