PGVector
Since Camel 4.19
Only producer is supported
The PGVector Component provides support for interacting with pgvector, the open-source vector similarity search extension for PostgreSQL.
URI format
pgvector:collection[?options]
Where collection represents the table name used to store vectors in the PostgreSQL database.
Configuring the DataSource
A javax.sql.DataSource must be provided. It is recommended to use a connection pooling DataSource (such as HikariCP) for production deployments. The DataSource can be set on the component or endpoint configuration, or autowired from the registry.
Actions
The following actions are supported via the CamelPgVectorAction header:
- CREATE_TABLE - Creates the pgvector extension and a table with the columns id, text_content, metadata, and embedding.
- CREATE_INDEX - Creates an HNSW index on the embedding column for faster approximate nearest neighbor search. The index uses the distance type configured on the endpoint (default: COSINE).
- DROP_TABLE - Drops the table.
- UPSERT - Inserts or updates a vector record. The body must be a List<Float>. Set CamelPgVectorRecordId for the ID (a UUID is generated if it is not set), CamelPgVectorTextContent for the text, and CamelPgVectorMetadata for the metadata.
- DELETE - Deletes the record identified by CamelPgVectorRecordId.
- SIMILARITY_SEARCH - Searches for similar vectors. The body must be a List<Float> query vector. Set CamelPgVectorQueryTopK for the maximum number of results (default 3). Optionally set CamelPgVectorFilter to a SQL WHERE clause to filter results (e.g., text_content LIKE '%hello%'). Returns a List<Map<String, Object>> with the keys id, text_content, metadata, and distance.
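As an illustration of the UPSERT contract described above, the following sketch prepares a message body and headers using only the JDK. The class and method names (UpsertMessageSketch, toBody, upsertHeaders) are hypothetical helpers, not part of the component; only the header names and the List<Float> body type come from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of camel-pgvector.
public class UpsertMessageSketch {

    /** Converts a primitive float array into the List<Float> body that UPSERT expects. */
    public static List<Float> toBody(float[] embedding) {
        List<Float> body = new ArrayList<>(embedding.length);
        for (float value : embedding) {
            body.add(value); // autoboxes float -> Float
        }
        return body;
    }

    /**
     * Collects the UPSERT headers. Null arguments are skipped, so a null
     * recordId leaves ID generation to the component.
     */
    public static Map<String, Object> upsertHeaders(String recordId, String text, String metadataJson) {
        Map<String, Object> headers = new LinkedHashMap<>();
        headers.put("CamelPgVectorAction", "UPSERT");
        if (recordId != null) {
            headers.put("CamelPgVectorRecordId", recordId);
        }
        if (text != null) {
            headers.put("CamelPgVectorTextContent", text);
        }
        if (metadataJson != null) {
            headers.put("CamelPgVectorMetadata", metadataJson);
        }
        return headers;
    }

    public static void main(String[] args) {
        List<Float> body = toBody(new float[] {0.1f, 0.2f, 0.3f});
        Map<String, Object> headers =
                upsertHeaders("doc-1", "hello world", "{\"category\":\"science\"}");
        System.out.println(body.size() + " values, headers: " + headers.keySet());
    }
}
```

With a Camel ProducerTemplate, the two pieces could then be sent in one call, for example template.sendBodyAndHeaders("pgvector:documents", body, headers).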
Defaults
- dimension: 384 (matches common embedding models like all-MiniLM-L6-v2)
- distanceType: COSINE (other options: EUCLIDEAN, INNER_PRODUCT)
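To make the three distance types concrete, here is a small JDK-only sketch of how pgvector defines them. The Distance enum below is an illustrative stand-in, not the component's PgVectorDistanceType. The operators and HNSW operator classes are those documented by pgvector; that the component maps its distance types onto exactly these is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; mirrors pgvector's definitions, not the component's code.
public class DistanceTypeSketch {

    /** Stand-in enum pairing each distance type with pgvector's operator and HNSW operator class. */
    public enum Distance {
        COSINE("<=>", "vector_cosine_ops"),
        EUCLIDEAN("<->", "vector_l2_ops"),
        INNER_PRODUCT("<#>", "vector_ip_ops");

        public final String operator;
        public final String indexOpClass;

        Distance(String operator, String indexOpClass) {
            this.operator = operator;
            this.indexOpClass = indexOpClass;
        }
    }

    /** Computes the distance the way pgvector defines it; smaller means more similar. */
    public static double compute(Distance type, List<Float> a, List<Float> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0, squaredDiff = 0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i);
            double y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
            squaredDiff += (x - y) * (x - y);
        }
        switch (type) {
            case COSINE:
                return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB)); // 1 - cosine similarity
            case EUCLIDEAN:
                return Math.sqrt(squaredDiff); // L2 distance
            case INNER_PRODUCT:
                return -dot; // pgvector's <#> returns the negative inner product
            default:
                throw new IllegalStateException("Unknown distance type: " + type);
        }
    }

    public static void main(String[] args) {
        List<Float> a = Arrays.asList(1f, 2f);
        List<Float> b = Arrays.asList(3f, 4f);
        for (Distance type : Distance.values()) {
            System.out.println(type + " (" + type.operator + "): " + compute(type, a, b));
        }
    }
}
```

Because every operator returns a distance, results ordered ascending put the closest matches first regardless of the type chosen.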
Parameterized Filters
When using the SIMILARITY_SEARCH action, you can filter results using a SQL WHERE clause via the CamelPgVectorFilter header. For safe handling of dynamic values, use parameterized queries with ? placeholders and provide values via the CamelPgVectorFilterParams header:
from("direct:search")
    .setHeader(PgVectorHeaders.ACTION).constant(PgVectorAction.SIMILARITY_SEARCH)
    .setHeader(PgVectorHeaders.FILTER).constant("text_content LIKE ? AND metadata::jsonb->>'category' = ?")
    .setHeader(PgVectorHeaders.FILTER_PARAMS).constant(List.of("%hello%", "science"))
    .to("pgvector:documents");
Security
The CamelPgVectorFilter header value is appended directly as a SQL WHERE clause. When using static, developer-controlled filter expressions this is safe. However, never pass untrusted user input directly as the filter value without using parameterized queries (? placeholders with CamelPgVectorFilterParams), as this could lead to SQL injection.
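The rule above can be sketched with plain JDK code: the filter text is assembled only from fixed, developer-written fragments, while anything that comes from a user goes into the parameter list. The class and method names (SafeFilterSketch, filterHeaders) are hypothetical; the header names and the ? placeholder convention are taken from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of camel-pgvector.
public class SafeFilterSketch {

    /**
     * Builds the filter headers for an optional search term and an optional category.
     * User-supplied values never become part of the SQL text; they are only
     * added to the parameter list, in the same order as the ? placeholders.
     */
    public static Map<String, Object> filterHeaders(String textTerm, String category) {
        List<String> clauses = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        if (textTerm != null) {
            clauses.add("text_content LIKE ?");
            params.add("%" + textTerm + "%");
        }
        if (category != null) {
            clauses.add("metadata::jsonb->>'category' = ?");
            params.add(category);
        }
        Map<String, Object> headers = new LinkedHashMap<>();
        if (!clauses.isEmpty()) {
            headers.put("CamelPgVectorFilter", String.join(" AND ", clauses));
            headers.put("CamelPgVectorFilterParams", params);
        }
        return headers;
    }

    public static void main(String[] args) {
        // A hostile value stays inside the parameter list and never reaches the SQL text.
        Map<String, Object> headers = filterHeaders("x' OR '1'='1", "science");
        System.out.println(headers.get("CamelPgVectorFilter"));
        System.out.println(headers.get("CamelPgVectorFilterParams"));
    }
}
```

The returned map can be merged into the message headers before the exchange reaches the pgvector endpoint.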
OpenAI Integration
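The component works directly with the OpenAI component for embedding generation. The OpenAI embeddings endpoint returns a List<Float>, which is exactly the body format expected by the pgvector UPSERT and SIMILARITY_SEARCH actions. The dimension configured on the pgvector endpoint should match the length of the vectors that the embedding model produces.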
Java
// Index a document
from("direct:index")
    .setVariable("text", body())
    .to("openai:embeddings?embeddingModel=nomic-embed-text")
    .setHeader(PgVectorHeaders.ACTION).constant(PgVectorAction.UPSERT)
    .setHeader(PgVectorHeaders.TEXT_CONTENT).variable("text")
    .to("pgvector:documents");
// Similarity search
from("direct:search")
    .to("openai:embeddings?embeddingModel=nomic-embed-text")
    .setHeader(PgVectorHeaders.ACTION).constant(PgVectorAction.SIMILARITY_SEARCH)
    .setHeader(PgVectorHeaders.QUERY_TOP_K).constant(5)
    .to("pgvector:documents");
YAML
- route:
    from:
      uri: direct:index
      steps:
        - setVariable:
            name: text
            simple: "${body}"
        - to:
            uri: openai:embeddings
            parameters:
              embeddingModel: nomic-embed-text
        - setHeader:
            name: CamelPgVectorAction
            constant: UPSERT
        - setHeader:
            name: CamelPgVectorTextContent
            simple: "${variable.text}"
        - to: pgvector:documents
- route:
    from:
      uri: direct:search
      steps:
        - to:
            uri: openai:embeddings
            parameters:
              embeddingModel: nomic-embed-text
        - setHeader:
            name: CamelPgVectorAction
            constant: SIMILARITY_SEARCH
        - setHeader:
            name: CamelPgVectorQueryTopK
            constant: 5
        - to: pgvector:documents
LangChain4j Integration
This component provides data type transformers for LangChain4j Embeddings integration:
- pgvector:embeddings - Transforms LangChain4j embedding output into a format suitable for the PGVector UPSERT action
- pgvector:rag - Transforms similarity search results into a List<String> for RAG pipelines
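For orientation, the kind of reshaping that pgvector:rag performs can be approximated in plain Java as below. This is a sketch under an assumption: it takes the strings from the text_content key of each result row, which this page does not confirm as the transformer's actual behavior. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative approximation; not the transformer's actual implementation.
public class RagContextSketch {

    /** Extracts the text_content value of each search result, skipping rows without text. */
    public static List<String> toTextList(List<Map<String, Object>> results) {
        List<String> texts = new ArrayList<>();
        for (Map<String, Object> row : results) {
            Object text = row.get("text_content"); // assumed source of the RAG strings
            if (text != null) {
                texts.add(text.toString());
            }
        }
        return texts;
    }

    /** Builds one result row with the keys listed for SIMILARITY_SEARCH. */
    static Map<String, Object> row(String id, String text, double distance) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", id);
        row.put("text_content", text);
        row.put("metadata", null);
        row.put("distance", distance);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> results = Arrays.asList(
                row("1", "Camel routes messages.", 0.12),
                row("2", null, 0.30),
                row("3", "pgvector stores embeddings.", 0.41));
        // Joined text like this is what a RAG pipeline would typically place in a prompt.
        System.out.println(String.join("\n", toTextList(results)));
    }
}
```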
Java
// Store embeddings
from("direct:store")
    .to("langchain4j-embeddings:embed")
    .setHeader(PgVectorHeaders.ACTION).constant(PgVectorAction.UPSERT)
    .transformDataType(new DataType("pgvector:embeddings"))
    .to("pgvector:myCollection");
// Similarity search for RAG
from("direct:search")
    .to("langchain4j-embeddings:embed")
    .transformDataType(new DataType("pgvector:embeddings"))
    .setHeader(PgVectorHeaders.ACTION, constant(PgVectorAction.SIMILARITY_SEARCH))
    .to("pgvector:myCollection")
    .transformDataType(new DataType("pgvector:rag"));
YAML
- route:
    from:
      uri: "direct:store"
      steps:
        - to: "langchain4j-embeddings:embed"
        - setHeader:
            name: CamelPgVectorAction
            constant: UPSERT
        - transform:
            dataType: "pgvector:embeddings"
        - to: "pgvector:myCollection"
- route:
    from:
      uri: "direct:search"
      steps:
        - to: "langchain4j-embeddings:embed"
        - transform:
            dataType: "pgvector:embeddings"
        - setHeader:
            name: CamelPgVectorAction
            constant: SIMILARITY_SEARCH
        - to: "pgvector:myCollection"
        - transform:
            dataType: "pgvector:rag"
Configuring Options
Camel components are configured on two separate levels:
- component level
- endpoint level
Configuring Component Options
At the component level, you set general and shared configurations that the endpoints then inherit. It is the highest configuration level.
For example, a component may have security settings, credentials for authentication, URLs for network connections, and so forth.
Some components have only a few options, and others have many. Because components typically come with commonly used defaults, you often need to configure only a few options on a component, or none at all.
You can configure components using:
- the Component DSL.
- a configuration file (application.properties, *.yaml files, etc.).
- directly in the Java code.
Configuring Endpoint Options
You usually spend more time setting up endpoints because they have many options. These options help you customize what you want the endpoint to do. The options are also categorized into whether the endpoint is used as a consumer (from), as a producer (to), or both.
Configuring endpoints is most often done directly in the endpoint URI as path and query parameters. You can also use the Endpoint DSL and DataFormat DSL as a type-safe way of configuring endpoints and data formats in Java.
A good practice when configuring options is to use Property Placeholders.
Property placeholders provide a few benefits:
- They help prevent using hardcoded URLs, port numbers, sensitive information, and other settings.
- They allow externalizing the configuration from the code.
- They help the code to become more flexible and reusable.
The following two sections list all the options, first for the component and then for the endpoint.
Component Options
The PGVector component supports 6 options, which are listed below.
| Name | Description | Default | Type |
|---|---|---|---|
| configuration | The configuration. | | PgVectorConfiguration |
| dataSource | Autowired. The DataSource to use for connecting to the PostgreSQL database with the pgvector extension. | | DataSource |
| dimension | The dimension of the vectors to store. | 384 | int |
| distanceType | The distance type to use for similarity search. Enum values: COSINE, EUCLIDEAN, INNER_PRODUCT | COSINE | PgVectorDistanceType |
| lazyStartProducer | Whether the producer should be started lazily (on the first message). Starting lazily allows the CamelContext and routes to start up in situations where a producer might otherwise fail during startup and cause the route to fail to start. Because startup is deferred, a startup failure can be handled while routing messages via Camel's routing error handlers. Be aware that creating and starting the producer when the first message is processed may take a little time and prolong the total processing time. | false | boolean |
| autowiredEnabled | Whether autowiring is enabled. This is used for automatic autowiring of options (the option must be marked as autowired) by looking up the registry to find a single instance of a matching type, which then gets configured on the component. This can be used to automatically configure JDBC data sources, JMS connection factories, AWS clients, etc. | true | boolean |
Endpoint Options
The PGVector endpoint is configured using URI syntax:
pgvector:collection
With the following path and query parameters:
Query Parameters (4 parameters)
| Name | Description | Default | Type |
|---|---|---|---|
| dataSource | Autowired. The DataSource to use for connecting to the PostgreSQL database with the pgvector extension. | | DataSource |
| dimension | The dimension of the vectors to store. | 384 | int |
| distanceType | The distance type to use for similarity search. Enum values: COSINE, EUCLIDEAN, INNER_PRODUCT | COSINE | PgVectorDistanceType |
| lazyStartProducer | Whether the producer should be started lazily (on the first message). Starting lazily allows the CamelContext and routes to start up in situations where a producer might otherwise fail during startup and cause the route to fail to start. Because startup is deferred, a startup failure can be handled while routing messages via Camel's routing error handlers. Be aware that creating and starting the producer when the first message is processed may take a little time and prolong the total processing time. | false | boolean |
Message Headers
The PGVector component supports 7 message headers, which are listed below:
| Name | Description | Default | Type |
|---|---|---|---|
| CamelPgVectorAction (producer) | The action to be performed. Enum values: CREATE_TABLE, CREATE_INDEX, DROP_TABLE, UPSERT, DELETE, SIMILARITY_SEARCH | | String |
| CamelPgVectorRecordId (producer) | The id of the vector record. | | String |
| CamelPgVectorQueryTopK (producer) | The maximum number of results to return for similarity search. | 3 | Integer |
| CamelPgVectorTextContent (producer) | The text content to store alongside the vector embedding. | | String |
| CamelPgVectorMetadata (producer) | The metadata associated with the vector record, stored as JSON. | | String |
| CamelPgVectorFilter (producer) | Filter condition for similarity search. Applied as a SQL WHERE clause on the text_content and metadata columns. Supports parameterized queries using ? placeholders with values provided via the CamelPgVectorFilterParams header. WARNING: When not using parameterized queries, the filter value is appended directly as SQL. Never use untrusted input as the filter value without parameterization, as this could lead to SQL injection. | | String |
| CamelPgVectorFilterParams (producer) | Parameter values for parameterized filter queries. Use with ? placeholders in the CamelPgVectorFilter header. Example: filter = "text_content LIKE ? AND metadata::jsonb->>'category' = ?" with filterParams = List.of("%hello%", "science"). | | List |