Knative Eventing’s KafkaSource is the key component for consuming messages from Kafka topics and forwarding them to Knative services. In this third installment of the Knative series, the focus is on the communication flow between Kafka topics, KafkaSource and Knative services, all while providing an in-depth explanation of how Knative handles the interaction between KafkaSource and Knative services.
When using Knative to process Kafka messages, the inner workings can be complex and non-intuitive. The aim of this blog is to clarify this process by providing a clear and detailed explanation of how the communication flow works within Knative.
For a general overview of Knative, you can refer to the first blog in our Knative series. If you are interested in a more hands-on tutorial for handling Kafka events with Knative, check the Handling Kafka Events with Knative blog.
Overview of KafkaSource communication in Knative
Kafka consumers and virtual replicas
In Knative Eventing, Kafka consumers and “consumers” (virtual replicas) are two separate terms. The
consumers parameter, which can be configured inside the KafkaSource YAML file, is actually the number of “consumers” (virtual replicas). When a KafkaSource resource is created, there are also one or more kafka-source-dispatcher pods created within the specified namespace. Each of these kafka-source-dispatcher pods contains a single Kafka consumer instance and one or more virtual replicas. The Kafka consumer is responsible for polling messages from Kafka topics, while virtual replicas are responsible for sending messages from the KafkaSource to a service.
We want to avoid confusion when discussing these concepts. Therefore, in the rest of this page, we will use the term Kafka consumer when referring to the actual Kafka consumer instance running in the kafka-source-dispatcher pod responsible for polling messages from Kafka topics. On the other hand, we will use the term virtual replica when referring to the abstract representation of throughput that is controlled by the
consumers parameter and responsible for sending messages to the Knative service.
Configuring KafkaSource parameters
Apart from the
consumers parameter, there are two other parameters that can be configured, that control the behaviour of KafkaSource:
capacity parameter is used to limit the number of virtual replicas on each of the kafka-source-dispatcher pods. By default, there can be a maximum of 20 virtual replicas on each pod. If you set the number of virtual replicas in a KafkaSource to a value higher than 20, more dispatcher pods will be created. For example, if you set the
consumers parameter to 25, two pods will be created –
kafka-source-dispatcher-0 with 13 virtual replicas and
kafka-source-dispatcher-1 with 12 virtual replicas.
To configure the capacity, you can use the following command:
kubectl set env deployment/kafka-controller -n knative-eventing POD_CAPACITY=1
This command sets the environment variable
1 for the
kafka-controller deployment in the
knative-eventing namespace. You can check if the configuration was set correctly by running the following command:
kubectl describe deployment kafka-controller -n knative-eventing
This will show the current configuration of the
kafka-controller deployment, including the value of the
POD_CAPACITY environment variable.
rate-limiter feature is used for limiting the rate of messages sent from the KafkaSource to the service and has nothing to do with the throughput of messages when they are being consumed by Kafka consumers. When the
rate-limiter feature is disabled (default behavior), having multiple virtual replicas on the same pod has no effect on throughput. This means that even if you have 10 virtual replicas running on the
kafka-source-dispatcher-0 pod, it would be the same as having 1 virtual replica running on the aforementioned pod. On the other hand, when the
rate-limiter feature is enabled, virtual replicas serve as a multiplier for the rate limiter. For example, if the base rate limit is set to 50 events/s, with 2 virtual replicas, you would get at most 50*2 events/s = 100 events/s. The base rate limit can be configured using the
max.poll.records parameter inside the config-kafka-source-data-plane ConfigMap.
To configure the
dispatcher.rate-limiter, you should update the ConfigMap named config-kafka-features located in the knative-eventing namespace. You can update it using the following command:
kubectl edit configmap config-kafka-features -n knative-eventing
This will open a YAML file where you can set
dispatcher.rate-limiter to disabled or enabled, depending on your needs.
Testing and visualization of KafkaSource behaviour
As already mentioned, KafkaSource has two tasks. The first is consuming messages from Kafka topics using a Kafka consumer, and the second is sending these messages to services using virtual replicas. Both Kafka consumers and virtual replicas are running on KafkaSource dispatcher pods. The number of these pods that will be created depends on how we configure
It is important to ensure that you have enough resources (CPU and memory) in your cluster for all the pods you want. If there are many
consumers and you set the
capacity to 1, your pods might get stuck in “Pending,” which means that they cannot be scheduled onto a node. Therefore, it’s important to consider the resource requirements of your pods and ensure that your cluster has enough resources to accommodate them.
To test and visualize this behaviour, we will use a KafkaSource with 4
consumers consuming from a topic with 4 partitions. Below you can see three diagrams that show the relationship between the KafkaSource consumers, virtual replicas, dispatcher pods and services, based on different configurations of
The first diagram showcases the configuration where
dispatcher.rate-limiter=disabled. In this configuration, each kafka-source-dispatcher pod can have up to 20 virtual replicas running but since the rate-limiter is disabled, only one of them is actually being used.
KafkaSource Setup with
The second diagram showcases the configuration where
dispatcher.rate-limiter=enabled. In this configuration, each kafka-source-dispatcher pod can have up to 20 virtual replicas running and since the rate-limiter is enabled, all the virtual replicas are being used.
KafkaSource Setup with
The third diagram showcases the configuration where
capacity=1. It doesn’t make a difference whether
dispatcher.rate-limiter is disabled or enabled because there is always just one virtual replica in the pod.
The choice of
capacity=1 in this case is intentional, as it allows for parallel processing on the partitions in accordance with Kafka’s design.
KafkaSource Setup with
Inspecting the KafkaSource configuration
For the third case, we will describe a KafkaSource to see where the consumers are running, which in our case are these 4 kafka-source-dispatcher pods:
kubectl describe kafkasource kafka-source-4
Here you can see some configurations that we have already mentioned, such as the Knative service that we will send messages to, the topic from which messages are being consumed, the number of consumers, and the placement of our consumers and virtual replicas.
API Version: sources.knative.dev/v1beta1
API Version: serving.knative.dev/v1
Pod Name: kafka-source-dispatcher-0
Pod Name: kafka-source-dispatcher-1
Pod Name: kafka-source-dispatcher-2
Pod Name: kafka-source-dispatcher-3
In this final blog of our series, we’ve unraveled how KafkaSource communicates with Kafka consumers and virtual replicas within Knative Eventing. This journey uncovers the inner workings, emphasizing the importance of tweaking settings like capacity and rate-limiter for smoother operations. Understanding these inner workings paves the way for more efficient operations and smoother message handling in event-driven systems.
Knative Slack community channel: