Parquet File

Since Camel 4.0

The ParquetAvro data format is a Camel data format implementation, based on the parquet-avro library, for serializing and deserializing Parquet data. Messages can be unmarshalled to Avro GenericRecords or to plain Java objects (POJOs). Using Camel's routing engine and data transformations, you can then process them, apply custom formatting, and call other Camel components to convert the messages and send them to upstream systems.

Parquet Data Format Options

The Parquet File data format supports 2 options, which are listed below.

Name: compressionCodecName
Default: GZIP
Java Type: String
Description: Compression codec to use when marshalling.

Name: unmarshalType
Java Type: String
Description: Class to use when (un)marshalling. If omitted, Parquet files are converted into Avro GenericRecords when unmarshalling, and input objects are assumed to be GenericRecords when marshalling.
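As a sketch of how the two options above might be set from the Java DSL, the route builder below configures SNAPPY compression and a POJO unmarshal type. It assumes the implementation class is org.apache.camel.dataformat.parquet.avro.ParquetAvroDataFormat and that it exposes setCompressionCodecName(String) and setUnmarshalType(Class<?>) setters (plus matching getters), following Camel's usual option-to-setter naming. It also assumes codec names follow Parquet's CompressionCodecName values such as SNAPPY. The Customer class and all endpoint names are hypothetical.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.parquet.avro.ParquetAvroDataFormat;

// Sketch: configuring both data format options in the Java DSL.
// Assumes setCompressionCodecName(String) and setUnmarshalType(Class<?>) exist
// on ParquetAvroDataFormat, mirroring the option names in the table above.
public class ParquetOptionsRoute extends RouteBuilder {

    // Hypothetical POJO used as the unmarshal type
    public static class Customer {
        public String name;
        public int age;
    }

    // Builds a data format that writes SNAPPY-compressed Parquet
    // and unmarshals rows into Customer objects instead of GenericRecords
    static ParquetAvroDataFormat customerFormat() {
        ParquetAvroDataFormat parquet = new ParquetAvroDataFormat();
        parquet.setCompressionCodecName("SNAPPY"); // overrides the GZIP default
        parquet.setUnmarshalType(Customer.class);  // replaces the GenericRecord default
        return parquet;
    }

    @Override
    public void configure() {
        ParquetAvroDataFormat parquet = customerFormat();
        // Customer objects in, SNAPPY-compressed Parquet out
        from("direct:toParquet").marshal(parquet).to("mock:parquetOut");
        // Parquet in, Customer objects out
        from("direct:fromParquet").unmarshal(parquet).to("mock:customers");
    }
}
```

Sharing one configured instance between the marshal and unmarshal routes keeps the schema source (the Customer class) identical on both sides.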

Unmarshal

The Camel DSL lets you unmarshal Parquet structures, which are usually binary Parquet files.

In this first example, the route unmarshals the Parquet payload and sends the result to a mock endpoint, where the records are available as GenericRecords or POJOs (as a list when the payload contains more than one record).

ParquetAvroDataFormat parquet = new ParquetAvroDataFormat();
from("direct:unmarshal").unmarshal(parquet).to("mock:unmarshal");
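Unmarshalling needs Parquet input, so the following self-contained sketch first marshals two GenericRecords into an in-memory Parquet file and then unmarshals those bytes with the default settings (no unmarshalType). It assumes camel-core and camel-parquet-avro are on the classpath, that the data format accepts a java.util.List of records as marshal input, and that the unmarshalled body is a list of records. The User schema and the direct: endpoint names are hypothetical.

```java
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.parquet.avro.ParquetAvroDataFormat;
import org.apache.camel.impl.DefaultCamelContext;

public class ParquetUnmarshalSketch {

    // Hypothetical schema for the illustration: a record with a name and an age
    static final Schema USER = SchemaBuilder.record("User").namespace("example")
            .fields().requiredString("name").requiredInt("age").endRecord();

    // Writes two GenericRecords as Parquet bytes, then unmarshals them back
    static List<?> roundTrip() throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // No options set: GZIP compression, GenericRecord in and out
                ParquetAvroDataFormat parquet = new ParquetAvroDataFormat();
                from("direct:marshal").marshal(parquet);
                from("direct:unmarshal").unmarshal(parquet);
            }
        });
        context.start();
        try {
            ProducerTemplate template = context.createProducerTemplate();
            List<GenericRecord> users = List.of(
                    new GenericRecordBuilder(USER).set("name", "alice").set("age", 30).build(),
                    new GenericRecordBuilder(USER).set("name", "bob").set("age", 25).build());
            // Step 1: produce an in-memory Parquet file to use as unmarshal input
            byte[] parquetBytes = template.requestBody("direct:marshal", users, byte[].class);
            // Step 2: unmarshal the bytes; the body is expected to be a list of records
            return template.requestBody("direct:unmarshal", parquetBytes, List.class);
        } finally {
            context.stop();
        }
    }

    public static void main(String[] args) throws Exception {
        for (Object row : roundTrip()) {
            GenericRecord user = (GenericRecord) row;
            System.out.println(user.get("name") + " is " + user.get("age"));
        }
    }
}
```

Avro may return string fields as org.apache.avro.util.Utf8 rather than java.lang.String, so compare such values through toString().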

Marshal

Marshalling is the reverse of unmarshalling: when you marshal a GenericRecord or POJO, the data format writes it as Parquet-formatted output, which is then delivered to your producer endpoint.

ParquetAvroDataFormat parquet = new ParquetAvroDataFormat();
from("direct:marshal").marshal(parquet).to("mock:marshal");
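The marshal side can be checked without a consumer endpoint. The sketch below marshals a single GenericRecord with the default GZIP codec and inspects the resulting bytes, relying on the Parquet file layout, in which a file begins and ends with the 4-byte magic PAR1. It assumes camel-core and camel-parquet-avro are on the classpath and that a single record is accepted as the marshal body, as the text above describes. The Order schema and the endpoint name are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.parquet.avro.ParquetAvroDataFormat;
import org.apache.camel.impl.DefaultCamelContext;

public class ParquetMarshalSketch {

    // Marshals one GenericRecord and returns the resulting Parquet bytes
    static byte[] marshalOrder() throws Exception {
        // Hypothetical schema: an order with an id and an amount
        Schema schema = SchemaBuilder.record("Order").namespace("example")
                .fields().requiredString("id").requiredDouble("amount").endRecord();
        GenericRecord order = new GenericRecordBuilder(schema)
                .set("id", "A-1").set("amount", 9.99).build();

        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Default options: GZIP codec, schema taken from the GenericRecord
                from("direct:marshal").marshal(new ParquetAvroDataFormat());
            }
        });
        context.start();
        try {
            return context.createProducerTemplate().requestBody("direct:marshal", order, byte[].class);
        } finally {
            context.stop();
        }
    }

    // A Parquet file starts and ends with the 4-byte magic "PAR1"
    static boolean looksLikeParquet(byte[] bytes) {
        byte[] magic = "PAR1".getBytes(StandardCharsets.US_ASCII);
        return bytes.length >= 8
                && Arrays.equals(Arrays.copyOfRange(bytes, 0, 4), magic)
                && Arrays.equals(Arrays.copyOfRange(bytes, bytes.length - 4, bytes.length), magic);
    }

    public static void main(String[] args) throws Exception {
        byte[] parquet = marshalOrder();
        System.out.println(parquet.length + " bytes, Parquet magic present: " + looksLikeParquet(parquet));
    }
}
```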

Dependencies

To use the parquet-avro data format in your Camel routes, add a dependency on camel-parquet-avro, which implements this data format.

If you use Maven, add the following to your pom.xml, replacing the version number with the latest release (see the download page for the latest versions).

<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-parquet-avro</artifactId>
  <version>x.x.x</version>
  <!-- use the same version as your Camel core version -->
</dependency>

Spring Boot Auto-Configuration

When using parquetAvro with Spring Boot, use the following Maven dependency to get auto-configuration support:

<dependency>
  <groupId>org.apache.camel.springboot</groupId>
  <artifactId>camel-parquet-avro-starter</artifactId>
  <version>x.x.x</version>
  <!-- use the same version as your Camel core version -->
</dependency>

The component supports 3 options, which are listed below.

Name: camel.dataformat.parquet-avro.compression-codec-name
Default: GZIP
Type: String
Description: Compression codec to use when marshalling.

Name: camel.dataformat.parquet-avro.enabled
Type: Boolean
Description: Whether to enable auto-configuration of the parquetAvro data format. This is enabled by default.

Name: camel.dataformat.parquet-avro.unmarshal-type
Type: String
Description: Class to use when (un)marshalling. If omitted, Parquet files are converted into Avro GenericRecords when unmarshalling, and input objects are assumed to be GenericRecords when marshalling.