Tika

JVM since 1.0.0 Native since 1.0.0

Parse documents and extract metadata and text using Apache Tika.

What’s inside

Please refer to the above link for usage and configuration details.

Maven coordinates

Add the coordinates to your existing project:

<dependency>
    <groupId>org.apache.camel.quarkus</groupId>
    <artifactId>camel-quarkus-tika</artifactId>
</dependency>

Check the User guide for more information about writing Camel Quarkus applications.

Camel Quarkus limitations

The tikaConfig and tikaConfigUri parameters are not available in the Camel Quarkus Tika extension. Configuration can be changed only via application.properties.

While you can use any of the available Tika parsers in JVM mode, only some of them are supported in native mode; see the Quarkus Tika guide.

PDF and ODF parsers cannot be used in either JVM mode or native mode. To consume PDF files, use the PDF extension instead; this avoids a version conflict between Camel and the Quarkus Tika extension involving the PDFBox dependency.
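
As a sketch, the PDF extension could be added alongside the Tika extension as follows. This assumes the extension is published under the artifactId camel-quarkus-pdf and that its version is managed by the same BOM as the Tika coordinates above:

<dependency>
    <groupId>org.apache.camel.quarkus</groupId>
    <artifactId>camel-quarkus-pdf</artifactId>
</dependency>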

Using the Tika parser without any configuration initializes all available parsers. Because some of them do not work in native mode, the whole execution fails.

To make the Tika parser work in native mode, explicitly select the parsers to initialize with the following properties:

  • quarkus.tika.parsers Comma-separated list of parsers (abbreviations). There are two predefined parsers: pdf and odf.

  • quarkus.tika.parser.* Adds a new parser abbreviation that can be used in the previous property. The value is the fully qualified class name of the parser.

Example of application.properties:

quarkus.tika.parsers = pdf,odf,office
quarkus.tika.parser.office = org.apache.tika.parser.microsoft.OfficeParser

For more information about selecting parsers see the Quarkus Tika guide.

You may need to add the quarkus-awt extension to build the native image. For more information, see the Quarkus Tika guide.
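
A minimal sketch of adding that extension to the project, assuming the standard io.quarkus groupId and a version managed by the Quarkus BOM (check both against your build):

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-awt</artifactId>
</dependency>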