
Introduction

In the previous blog posts (camel-tensorflow-serving and camel-torchserve), we discussed the recent release of Apache Camel 4.10 LTS, which introduced three new AI model serving components.[1]

Having covered the TorchServe and TensorFlow Serving components in those posts, this one introduces the KServe component and concludes the series.

KServe Component

KServe is a platform for serving AI models on Kubernetes. It defines an API protocol that lets clients perform health checks, retrieve metadata, and run inference on model servers; this KServe API [2] allows you to interact uniformly with any KServe-compliant model server. The Camel KServe component lets a Camel route call such model servers, most importantly to request inference, via the KServe API.

Preparation

Before diving into the sample code for the Camel KServe component, let’s set up the necessary environment.

First, let’s install the Camel CLI if you haven’t installed it yet:


INFO: If JBang is not installed, first install JBang by referring to: https://www.jbang.dev/download/


jbang app install camel@apache/camel

Verify the installation was successful:

$ camel --version
4.10.2 # Or newer

Launching the server with pre-deployed models

Next, let’s set up a KServe-compliant model server on your local machine. To experiment with the KServe component, you’ll need a model server that supports the KServe Open Inference Protocol V2. Several model servers are available, such as OpenVINO and Triton. In this blog post, we will use the Triton Inference Server Docker image. The Triton Inference Server image supports not only amd64 but also arm64, making it accessible for macOS users as well.

Since the KServe API doesn’t include an operation for registering models, we’ll load the model when starting the server. In this blog post, we will use the demo model simple provided in the Triton Inference Server repository; its details are covered later in the Inference section.

Download the entire simple directory from the Triton Inference Server repository and place it within a models directory.
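
After this step, the local layout should look roughly like this (the simple example consists of a config.pbtxt and a version directory 1 containing the model file):

models/
└── simple/
    ├── config.pbtxt
    └── 1/
        └── model.graphdef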


TIP: To download the files under a specific directory of a GitHub repository, you could clone the entire repository, but using VS Code for the Web is often easier: with the repository displayed, press . on your keyboard or change the URL from github.com to github.dev to open the repository directly in VS Code in your browser. Then locate the directory you want to download, right-click it, and select Download from the context menu to save all of its files into a local directory.


Once the simple directory is downloaded and placed under models, start the container from the directory containing the models folder using the following command:

docker run --rm --name triton \
    -p 8000:8000 \
    -p 8001:8001 \
    -p 8002:8002 \
    -v ./models:/models \
    nvcr.io/nvidia/tritonserver:25.02-py3 \
    tritonserver --model-repository=/models

INFO: The Triton Inference Server Docker image nvcr.io/nvidia/tritonserver is quite large (approx. 18.2GB), so pulling the image for the first time might take some time.
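
Once the container is up, you can optionally check that it is ready to accept requests with the KServe V2 REST health endpoint, exposed on the HTTP port 8000 mapped above (a plain HTTP check, independent of Camel):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/v2/health/ready

A 200 response means the server is ready.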


Server and model operations


INFO: If you’re primarily interested in learning how to perform inference with Camel KServe, feel free to skip this section and proceed directly to the Inference section.


Besides inference, the KServe Open Inference Protocol V2 also defines management operations, categorised as follows:

  1. Server Operations
  2. Model Operations

Let’s examine how to invoke each of these operations from a Camel route.
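
For quick reference, these operations map to the following Camel KServe endpoint URIs, each demonstrated in the sections below:

Server operations: kserve:server/ready, kserve:server/live, kserve:server/metadata
Model operations: kserve:model/ready, kserve:model/metadata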

Server readiness check

To check if the server is running and ready from a Camel route, use the following endpoint:

kserve:server/ready

server_ready.java

//DEPS org.apache.camel:camel-bom:4.10.2@pom
//DEPS org.apache.camel:camel-core
//DEPS org.apache.camel:camel-kserve

import org.apache.camel.builder.RouteBuilder;

public class server_ready extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:server-ready?repeatCount=1")
            .to("kserve:server/ready")
            .log("Ready: ${body.ready}");
    }
}

Execute it using the Camel CLI:

camel run server_ready.java

Upon successful execution, you can verify the server readiness status:

Ready: true
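
Note that the endpoint URI doesn’t say where the model server is. By default the component targets localhost:8001, the Triton gRPC port we mapped when starting the container, which is why the examples in this post work without specifying an address. To reach a server running elsewhere, set the target option on the endpoint, for example (remote-host is a placeholder):

kserve:server/ready?target=remote-host:8001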

Server liveness check

To check if the server is live from a Camel route, utilise the following endpoint:

kserve:server/live

server_live.java

//DEPS org.apache.camel:camel-bom:4.10.2@pom
//DEPS org.apache.camel:camel-core
//DEPS org.apache.camel:camel-kserve

import org.apache.camel.builder.RouteBuilder;

public class server_live extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:server-live?repeatCount=1")
            .to("kserve:server/live")
            .log("Live: ${body.live}");
    }
}

Execute it using the Camel CLI:

camel run server_live.java

Upon successful execution, you can verify the server liveness status:

Live: true

Retrieving server metadata

To retrieve server metadata from a Camel route, use the following endpoint:

kserve:server/metadata

server_metadata.java

//DEPS org.apache.camel:camel-bom:4.10.2@pom
//DEPS org.apache.camel:camel-core
//DEPS org.apache.camel:camel-kserve

import org.apache.camel.builder.RouteBuilder;

public class server_metadata extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:server-metadata?repeatCount=1")
            .to("kserve:server/metadata")
            .log("Metadata:\n${body}");
    }
}

Execute it using the Camel CLI:

camel run server_metadata.java

Upon successful execution, you can retrieve the server metadata:

Metadata:
name: "triton"
version: "2.55.0"
extensions: "classification"
extensions: "sequence"
extensions: "model_repository"
extensions: "model_repository(unload_dependents)"
extensions: "schedule_policy"
extensions: "model_configuration"
extensions: "system_shared_memory"
extensions: "cuda_shared_memory"
extensions: "binary_tensor_data"
extensions: "parameters"
extensions: "statistics"
extensions: "trace"
extensions: "logging"

Model readiness check

To check if a specific model is ready for inference, use the following endpoint:

kserve:model/ready?modelName=simple&modelVersion=1

model_ready.java

//DEPS org.apache.camel:camel-bom:4.10.2@pom
//DEPS org.apache.camel:camel-core
//DEPS org.apache.camel:camel-kserve

import org.apache.camel.builder.RouteBuilder;

public class model_ready extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:model-ready?repeatCount=1")
            .to("kserve:model/ready?modelName=simple&modelVersion=1")
            .log("Ready: ${body.ready}");
    }
}

Execute it using the Camel CLI:

camel run model_ready.java

Upon successful execution, you can verify the model readiness status:

Ready: true

Retrieving model metadata

As with TorchServe and TensorFlow Serving, understanding a model’s input and output signatures is crucial for interacting with it effectively. To do this, you need to retrieve the model’s metadata.

Since metadata retrieval is typically a one-time operation, you can inspect the model signatures in JSON format by calling the following REST API (for the simple model):

http://localhost:8000/v2/models/simple/versions/1
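
For example, with the server running you can fetch it with curl; the response contains the same name, datatype, and shape information as the gRPC metadata shown below:

curl -s http://localhost:8000/v2/models/simple/versions/1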

To retrieve model metadata from within a Camel route, use the following endpoint:

kserve:model/metadata?modelName=simple&modelVersion=1

model_metadata.java

//DEPS org.apache.camel:camel-bom:4.10.2@pom
//DEPS org.apache.camel:camel-core
//DEPS org.apache.camel:camel-kserve

import org.apache.camel.builder.RouteBuilder;

public class model_metadata extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:model-metadata?repeatCount=1")
            .to("kserve:model/metadata?modelName=simple&modelVersion=1")
            .log("Metadata:\n${body}");
    }
}

Execute it using the Camel CLI:

camel run model_metadata.java

Upon successful execution, you can retrieve the model metadata:

Metadata:
name: "simple"
versions: "1"
platform: "tensorflow_graphdef"
inputs {
  name: "INPUT0"
  datatype: "INT32"
  shape: -1
  shape: 16
}
inputs {
  name: "INPUT1"
  datatype: "INT32"
  shape: -1
  shape: 16
}
outputs {
  name: "OUTPUT0"
  datatype: "INT32"
  shape: -1
  shape: 16
}
outputs {
  name: "OUTPUT1"
  datatype: "INT32"
  shape: -1
  shape: 16
}

Inference

Let’s perform inference on a model using KServe. Here, we’ll use the simple model to perform a basic calculation.

As observed in the Retrieving model metadata section, the simple model accepts two INT32 lists of size 16 (INPUT0 and INPUT1) as input and returns two INT32 lists of size 16 (OUTPUT0 and OUTPUT1) as output. This model calculates the element-wise sum of INPUT0 and INPUT1, returning the result as OUTPUT0, and calculates the element-wise difference, returning it as OUTPUT1.

In the example code below, we provide the following inputs to the model:

INPUT0  = [1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16]
INPUT1  = [0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]

Consequently, we expect to receive the following outputs:

OUTPUT0 = [1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31]
OUTPUT1 = [1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1]

Calling the simple model

Use the following endpoint for inference:

kserve:infer?modelName=simple&modelVersion=1

infer_simple.java

//DEPS org.apache.camel:camel-bom:4.10.2@pom
//DEPS org.apache.camel:camel-core
//DEPS org.apache.camel:camel-kserve

import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import com.google.protobuf.ByteString;
import inference.GrpcPredictV2.InferTensorContents;
import inference.GrpcPredictV2.ModelInferRequest;
import inference.GrpcPredictV2.ModelInferResponse;

public class infer_simple extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:infer-simple?repeatCount=1")
            .setBody(constant(createRequest()))
            .to("kserve:infer?modelName=simple&modelVersion=1")
            .process(this::postprocess)
            .log("Result[0]: ${body[0]}")
            .log("Result[1]: ${body[1]}");
    }

    // Build a request with two INT32 input tensors of shape (1, 16):
    // INPUT0 = [1, 2, ..., 16] and INPUT1 = [0, 1, ..., 15]
    ModelInferRequest createRequest() {
        var ints0 = IntStream.range(1, 17).boxed().collect(Collectors.toList());
        var content0 = InferTensorContents.newBuilder().addAllIntContents(ints0);
        var input0 = ModelInferRequest.InferInputTensor.newBuilder()
                .setName("INPUT0").setDatatype("INT32").addShape(1).addShape(16)
                .setContents(content0);
        var ints1 = IntStream.range(0, 16).boxed().collect(Collectors.toList());
        var content1 = InferTensorContents.newBuilder().addAllIntContents(ints1);
        var input1 = ModelInferRequest.InferInputTensor.newBuilder()
                .setName("INPUT1").setDatatype("INT32").addShape(1).addShape(16)
                .setContents(content1);
        return ModelInferRequest.newBuilder()
                .addInputs(0, input0).addInputs(1, input1)
                .build();
    }

    // The response carries the output tensors as raw little-endian bytes in
    // rawOutputContents; decode each one back into a list of integers.
    void postprocess(Exchange exchange) {
        var response = exchange.getMessage().getBody(ModelInferResponse.class);
        var outList = response.getRawOutputContentsList().stream()
                .map(ByteString::asReadOnlyByteBuffer)
                .map(buf -> buf.order(ByteOrder.LITTLE_ENDIAN).asIntBuffer())
                .map(buf -> {
                    var ints = new ArrayList<Integer>(buf.remaining());
                    while (buf.hasRemaining()) {
                        ints.add(buf.get());
                    }
                    return ints;
                })
                .collect(Collectors.toList());
        exchange.getMessage().setBody(outList);
    }
}

Execute it using the Camel CLI:

camel run infer_simple.java

Upon successful execution, you should see the following results, which match the expected outputs described earlier:

Result[0]: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31]
Result[1]: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Summary

Concluding our series on AI model serving, this post provided a brief overview of the KServe component, one of the AI model serving components introduced in the latest Camel 4.10 LTS release.

With the addition of the KServe component, alongside TorchServe and TensorFlow Serving, Camel now covers the major AI model servers. This lays the groundwork for building integrations that combine Camel with these model servers.

Furthermore, KServe is emerging as the de facto standard API for model serving within Kubernetes-based MLOps pipelines. This enables you to leverage Camel integrations as the application layer for AI models within AI systems built on MLOps platforms such as Kubeflow.

Explore the possibilities of intelligent integration using Apache Camel AI.

The sample code presented in this blog post is available in the following repository:

https://github.com/megacamelus/camel-ai-examples


  1. The Camel TorchServe component has been available since version 4.9.

  2. KServe Open Inference Protocol V2