Previous blog post from Knative series:
Knative can be used to build serverless applications that respond to events in real-time. It provides a platform for running stateless workloads that can be scaled up or down based on demand. Knative enables developers to focus on writing code without worrying about the underlying infrastructure.
This tutorial demonstrates how to use Knative to trigger a user-defined function written in Go for processing Kafka records, showcasing the autoscaling capabilities of Knative. The tutorial is designed to be as simple as possible, using only two Knative resources:
Service – used to deploy the user-defined function as a Knative Service and automatically scale it up and down based on demand.
KafkaSource – used to trigger the Knative Service every time a new record is produced to the Kafka topic.
To follow this tutorial, you will need the following prerequisites:
The Kubernetes command-line tool – kubectl
Step 1: Deploy Apache Kafka Cluster and create a topic
The Strimzi Kubernetes operator can be used to deploy the Apache Kafka cluster in your Kubernetes cluster. To deploy a Kafka cluster and create a topic, you can follow the steps in the Strimzi quickstart guide. If carried out exactly as written, those steps will result in your Kafka topic having 3 partitions. This demo uses a topic with 6 partitions; to create one, simply set spec.partitions inside your KafkaTopic YAML file to 6. Once completed, you will have a Kafka cluster named my-cluster on your AKS cluster with a topic named my-topic, consisting of 6 partitions.
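The KafkaTopic manifest can look like the following sketch. The namespace and cluster label assume the Strimzi setup used later in this post; the replicas value is a placeholder you should adapt to your cluster:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  namespace: kafka-cluster
  labels:
    # Must match the name of your Strimzi-managed Kafka cluster.
    strimzi.io/cluster: my-cluster
spec:
  # 6 partitions, so the KafkaSource can run 6 parallel consumers.
  partitions: 6
  # Placeholder: increase for production durability.
  replicas: 1
```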
Step 2: Install the KafkaSource controller
The KafkaSource controller reads the messages stored in existing Apache Kafka topics and sends them as CloudEvents over HTTP to its configured
sink. It also preserves the order of the messages stored in the topic partitions. It does this by waiting for a successful response from the
sink before delivering the next message from the same partition.
Install the KafkaSource controller by entering the following command (check for the latest version on the eventing-kafka-broker releases page):
kubectl apply -f https://github.com/knative-sandbox/eventing-kafka-broker/releases/download/knative-v1.9.3/eventing-kafka-controller.yaml
Then, install the
KafkaSource data plane by entering the following command:
kubectl apply -f https://github.com/knative-sandbox/eventing-kafka-broker/releases/download/knative-v1.9.3/eventing-kafka-source.yaml
Step 3: Create a Knative Service
Now that an Apache Kafka cluster with a topic has been deployed, you can create a Knative Service to manage the scaling and deployment of our function. Here’s the example file (event-display.yaml):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: event-display
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: rps
        autoscaling.knative.dev/target: "5"
    spec:
      containers:
        - image: gcr.io/knative-releases/knative.dev/eventing/cmd/event_display
This YAML file creates a Knative Service named
event-display that runs the container image
gcr.io/knative-releases/knative.dev/eventing/cmd/event_display. This image corresponds to a Go function, which simply displays the received events on the console.
The provided Go code sets up an HTTP server using the
https://github.com/cloudevents/sdk-go/v2 library to receive and handle CloudEvents.
Knative does not require the CloudEvents SDK for handling Kafka events; you can also use a plain HTTP handler. When working directly with HTTP, keep in mind that the CloudEvents metadata is in the headers and the actual message data is in the body (the binary content mode is used), and that the HTTP server needs to accept messages at the root path. The CloudEvents SDK, however, simplifies the handling of events by providing a standard way to encode and decode event data in a format that can be easily consumed by different event sources and sinks.
Save this file to your working directory, and run the following command to deploy the service:
kubectl apply -f event-display.yaml
This will create a new Knative service in your cluster.
Inside the Service YAML file, you can change some autoscaling options by configuring annotations under the template metadata.
Knative Serving supports two Autoscaler implementations: the Knative Pod Autoscaler (KPA) and Kubernetes’ Horizontal Pod Autoscaler (HPA). The Autoscaler implementation (KPA or HPA) can be selected using the
class annotation. The default Autoscaler type is KPA.
The main difference between these two types is that KPA supports scale-to-zero functionality, and HPA doesn’t. On the other hand, KPA doesn’t support CPU-based autoscaling, while HPA does.
If you want to use Kubernetes Horizontal Pod Autoscaler (HPA), you must install it after you install Knative Serving.
The metric configuration defines which metric type is watched by the Autoscaler. The default KPA Autoscaler supports the
concurrency and rps (requests-per-second) metrics:
concurrency – measures the number of active requests being processed simultaneously.
rps – measures the rate at which requests are being made to a service.
More about metrics can be found in the Knative autoscaling documentation.
Target configuration provides the Autoscaler with a value that it will try to maintain for the particular metric, in a specific Revision. In the example above, the
rps metric is used with the
target set to 5.
You can also enable or disable scale-to-zero functionality, configure how long the last replica is kept after traffic ends, and set the minimum amount of time that the last pod remains active after the Autoscaler decides to scale pods to zero.
Scale to zero can only be enabled if you are using the Knative Pod Autoscaler (KPA), and it can only be configured globally.
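Globally, these settings live in the config-autoscaler ConfigMap in the knative-serving namespace. The values below are a sketch using the upstream defaults; adjust them to your needs:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # Allow replicas to be scaled down to zero when there is no traffic.
  enable-scale-to-zero: "true"
  # How long the last replica is kept around after traffic ends.
  scale-to-zero-grace-period: "30s"
  # Minimum time the last pod stays active once the Autoscaler
  # decides to scale to zero.
  scale-to-zero-pod-retention-period: "0s"
```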
Upper and lower bounds can also be configured to control the autoscaling behavior. The initial scale that a Revision is scaled to immediately after creation can also be specified. This can be used as a default configuration for all Revisions, or for a specific Revision using an annotation.
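Per Revision, these bounds and the initial scale are set with annotations on the Service template; the values below are illustrative:

```yaml
spec:
  template:
    metadata:
      annotations:
        # Never scale below one replica
        # (effectively disables scale-to-zero for this Revision).
        autoscaling.knative.dev/min-scale: "1"
        # Immediately after creation, start the Revision with three replicas.
        autoscaling.knative.dev/initial-scale: "3"
```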
A more detailed explanation of this can be found in the Knative autoscaling documentation.
There are also some additional autoscaling configurations for KPA, such as scale rates. One of the configurations you can change is
max-scale-down-rate, which defaults to
2.0. This setting determines the maximum ratio of existing to desired pods. For example, with a value of
2.0, the Revision can only scale down from
N to N/2 pods in one evaluation cycle.
Step 4: Create a KafkaSource
The next step is to create a KafkaSource, which acts as the trigger for the Knative Service. Here’s an example file (event-source.yaml):
apiVersion: sources.knative.dev/v1beta1
kind: KafkaSource
metadata:
  name: kafka-source
spec:
  consumers: 6
  bootstrapServers:
    - <kafka-bootstrap-server:9092> # e.g. my-cluster-kafka-bootstrap.kafka-cluster:9092
  topics:
    - my-topic
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display
This YAML file defines a KafkaSource called
kafka-source that consumes records from the chosen Kafka topic and sends them to the Knative Service called
event-display. The best practice is to configure the KafkaSource resource to have as many consumers as your topic has partitions.
The current implementation of KafkaSource does not provide a way to disable message ordering or send batches of messages for faster processing.
Save this file to your working directory and run the following command to deploy the service:
kubectl apply -f event-source.yaml
Step 5: Test the Knative service
To test the Knative service, send some records to the topic you specified in your Kafka cluster. Running the following command will prompt you to produce messages:
kubectl -n kafka-cluster run kafka-producer -i --image=quay.io/strimzi/kafka:0.33.2-kafka-3.4.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic
You can monitor the logs of the Knative service to see the output of the user-defined function, which is in this case the message you have just published. You can verify that the service received the message by opening another terminal and executing the following command:
kubectl logs -f --selector='serving.knative.dev/service=event-display' -c user-container
This will show you the logs of all the pods belonging to the event-display service.
event-display service logs after initially testing the Knative Service
Step 6: Scale the Knative service
To test the autoscaling, you can send a large number of messages to the Kafka topic and monitor the number of replicas of the Knative Service. The Knative Serving component will automatically scale up or down the number of replicas based on the incoming traffic. This can be tested with a simple command which creates some random data and sends it to our topic:
base64 /dev/urandom | head -c 10000 | kubectl -n kafka-cluster run kafka-producer -i --image=quay.io/strimzi/kafka:0.33.2-kafka-3.4.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic
You can view these logs again with the command from the previous step.
event-display service logs after scaling the Knative Service
You can use the following command to continuously monitor the number of available replicas of the application:
kubectl get deploy -w
This will display the current state of all deployments in the cluster, including the
event-display deployment, as well as the number of available replicas, desired replicas and the current status of each replica set.
Note that the deployment name may include a suffix number, such as
event-display-00002-deployment. These suffix numbers represent different replica sets and are created when you update the deployment. Each replica set corresponds to a specific version of the deployment, and the suffix number is used to differentiate them from each other. The latest version of the
event-display deployment should be indicated by the highest suffix number, which in this case is 00004.
The autoscaler is designed to automatically adjust the number of replicas of a service based on the current load and the configured scaling policies. In the specific case of the
event-display deployment, the autoscaler was able to scale up to a maximum of 13 replicas based on the load generated by the Kafka source. However, as the load decreased, the autoscaler began to scale down the number of replicas gradually, according to the configured scaling policies. In this case, the
max-scale-down-rate was set to 2 by default, which means that the autoscaler was allowed to decrease the number of replicas by a maximum factor of 2 in each evaluation cycle. As a result, the number of replicas gradually decreased from 13 to 6, and then to 3 and 1, over the course of several evaluation cycles.
Replica fluctuation during demo
If you don’t set the upper bound using a
max-scale parameter inside the Service YAML file, there is a chance that the number of replicas of your service will scale up to a number greater than the number of partitions of the KafkaSource topic, as in this case.
In this demo, we have configured
kafka-source to have 6 consumers because
my-topic has 6 partitions. In the screenshot above, we can see that the number of replicas of the
event-display service scales up to 13, even though there is no increase in processing speed by adding the extra 7 pods, due to the fact that each message has to wait for a response before the next one can be sent (because the ordering is preserved at the partition level).
This happens because the KafkaSource is entirely abstracted away from the
event-display service, so the service can’t know that the burst traffic is limited to 6 parallel requests at a time. Instead, the service determines that 13 replicas might be a good target based on the incoming traffic and the KPA metrics we configured.
To avoid unnecessary cost of creating more service pods than there are partitions, the person creating the service needs to be aware of the number of partitions and consumers being used in order to properly configure the service YAML. In such cases, the
max-scale parameter should be set to the number of partitions. For example, if the KafkaSource is configured to have 6 consumers because the topic has 6 partitions, the
max-scale parameter should be set to 6.
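For this demo, that means adding one more annotation to the Service template. This is a sketch; the value must match your topic’s partition count:

```yaml
spec:
  template:
    metadata:
      annotations:
        # Cap the replica count at the number of topic partitions (6 here),
        # since extra pods cannot increase ordered, per-partition throughput.
        autoscaling.knative.dev/max-scale: "6"
```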
If the incoming traffic is large enough, this will result in 6 pods being created. These pods are placed behind the load balancer, and they are not directly connected to specific partitions. That’s why each of the pods will receive messages from all partitions of the KafkaSource topic. Messages will be distributed evenly among the pods. In our case, the first pod will have approximately 1/6 of all the messages in the topic and those messages will come from all 6 partitions.
Architecture using KafkaSource and Knative Service
Below, we can see a simple visualization of how the KafkaSource and the Knative Service are connected, and how the KafkaSource references the Service inside a YAML file. The KafkaSource resource is much more complex, and that’s why it is only illustrated here as a black box. The following blog of this series will explain its inner workings and how it sends messages to the Knative Service.
KafkaSource and Knative Service visualization
Additionally, you can see some specific details about your deployment. The last part of the output of the
describe command lists events whose messages indicate scaling up or scaling down:
kubectl describe deploy event-display-00004-deployment
Log details displaying scaling events (up and down) during deployment
In conclusion, this tutorial showcased the power of Knative in orchestrating serverless applications, leveraging its event-driven architecture to enable seamless scaling and efficient handling of Kafka events. By demonstrating the simple configuration of Knative Services and KafkaSources, it highlights the potential for dynamic scaling based on incoming traffic. Moreover, it underlines the importance of careful configuration, especially regarding autoscaling policies and their alignment with resource constraints. This short guide serves as a starting point for deeper explorations into Knative’s capabilities and its integration within event-driven architectures. For further insights into KafkaSource-to-Service communication within Knative, stay tuned for the next blog post in this series.