This folder houses all of the assets necessary to run benchmarks for Apache Kafka. In order to run these benchmarks, you'll need to:
- Create the necessary local artifacts
- Stand up a Kafka cluster on Amazon Web Services (which includes a client host for running the benchmarks)
- SSH into the client host
- Run the benchmarks from the client host
In order to create the local artifacts necessary to run the Kafka benchmarks in AWS, you'll need to have Maven installed. Once Maven's installed, you can create the necessary artifacts with a single Maven command:
$ mvn install
In order to create an Apache Kafka cluster on AWS, you'll need to have the following installed:
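- Terraform
- Ansible
- The terraform-inventory tool (invoked by the Ansible command later in this guide)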
The messaging benchmarks repository:
$ git clone https://github.com/streamlio/messaging-benchmark
$ cd messaging-benchmark/driver-kafka/deploy
In addition, you will need to:
- Create an AWS account (or use an existing account)
- Install the aws CLI tool
- Configure the aws CLI tool
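Before going further, it can save time to confirm that every command-line tool this guide calls is actually on your PATH. The helper below is a minimal sketch; the function name missing_tools is hypothetical, and the tool list in the example comment is simply the set of commands that appear in this guide.

```shell
# Print the name of every listed command that is not on the PATH.
# Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example:
#   missing_tools mvn terraform ansible-playbook terraform-inventory aws ssh-keygen
```

An empty result means all prerequisites were found; any name that is printed still needs to be installed.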
Once those conditions are in place, you'll need to create an SSH key pair: a private key at ~/.ssh/kafka_aws and a public key at ~/.ssh/kafka_aws.pub.
$ ssh-keygen -f ~/.ssh/kafka_aws
When prompted to enter a passphrase, simply hit Enter twice. Then, make sure that the keys have been created:
$ ls ~/.ssh/kafka_aws*
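If you are scripting the setup, the passphrase prompt can be avoided by passing an empty passphrase with ssh-keygen's -N flag. The following is a sketch under that assumption; the helper names keypair_exists and ensure_keypair are hypothetical.

```shell
# Return success only when both halves of the key pair exist.
keypair_exists() {
  [ -f "$1" ] && [ -f "$1.pub" ]
}

# Create the pair with an empty passphrase (-N "") unless it is already there.
# -q suppresses ssh-keygen's progress output.
ensure_keypair() {
  keypair_exists "$1" || ssh-keygen -q -f "$1" -N ""
}

# Example:
#   ensure_keypair "$HOME/.ssh/kafka_aws"
```

Because the helper checks for existing files first, running it a second time leaves an existing key pair untouched.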
With SSH keys in place, you can create the necessary AWS resources with a few Terraform commands. Run them from the root of the repository (skip the cd if you are already in driver-kafka/deploy):
$ cd driver-kafka/deploy
$ terraform init
$ terraform apply
That will install the following EC2 instances (plus some other resources, such as a Virtual Private Cloud (VPC)):
Resource | Description | Count |
---|---|---|
Kafka instances | The VMs on which a Kafka broker will run | 3 |
ZooKeeper instances | The VMs on which a ZooKeeper node will run | 3 |
Client instance | The VM from which the benchmarking suite itself will be run | 1 |
When you run terraform apply, you will be prompted to type yes. Type yes to continue with the installation or anything else to quit.
Once the installation is complete, you will see a confirmation message listing the resources that have been installed.
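For unattended runs, Terraform's -auto-approve flag skips the yes prompt. The sketch below wraps the two Terraform steps in a hypothetical helper named deploy_infra; it assumes you are already in the driver-kafka/deploy directory.

```shell
# Hypothetical helper: initialize and apply without the interactive prompt.
# -auto-approve removes the confirmation step, so use it only when you are
# sure you want the EC2 instances to be created.
deploy_infra() {
  terraform init && terraform apply -auto-approve
}
```

When you have finished benchmarking, terraform destroy, run from the same directory, removes the resources again so that the instances stop accruing charges.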
There's a handful of configurable parameters related to the Terraform deployment that you can alter by modifying the defaults in the terraform.tfvars file.
Variable | Description | Default |
---|---|---|
region | The AWS region in which the Kafka cluster will be deployed | us-west-2 |
public_key_path | The path to the SSH public key that you've generated | ~/.ssh/kafka_aws.pub |
ami | The Amazon Machine Image (AMI) to be used by the cluster's machines | ami-9fa343e7 |
instance_types | The EC2 instance types used by the various components | i3.4xlarge (Kafka brokers), t2.small (ZooKeeper), c4.8xlarge (benchmarking client) |
If you modify the public_key_path, make sure that you point to the appropriate SSH key path when running the Ansible playbook.
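To try different values for a single run without editing terraform.tfvars, Terraform also accepts -var flags on the command line, which take precedence over values in the file. The sketch below is an illustration only: the helper name apply_in_region and the example values are placeholders. AMI IDs are specific to a region, so a different region needs a matching ami value, and other parts of the deployment may need adjusting as well.

```shell
# Hypothetical helper: apply with a region and AMI given on the command line.
#   $1: AWS region
#   $2: an AMI ID that exists in that region (you must look this up yourself)
apply_in_region() {
  terraform apply -var "region=$1" -var "ami=$2"
}

# Example (placeholder values):
#   apply_in_region us-east-1 ami-xxxxxxxx
```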
With the appropriate infrastructure in place, you can install and start the Kafka cluster using Ansible with just one command:
$ ansible-playbook \
--user ec2-user \
--inventory `which terraform-inventory` \
deploy.yaml
If you're using an SSH private key path different from ~/.ssh/kafka_aws, you can specify that path using the --private-key flag, for example --private-key=~/.ssh/my_key.
In the output produced by Terraform, there's a client_ssh_host variable that provides the IP address for the client EC2 host from which benchmarks can be run. You can SSH into that host using this command:
$ ssh -i ~/.ssh/kafka_aws ec2-user@$(terraform output client_ssh_host)
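Newer Terraform releases may print string outputs wrapped in double quotes, which would break the command substitution in the SSH command. The sketch below, using a hypothetical helper named client_host, first tries the -raw flag of terraform output and falls back to stripping quotes for releases that do not support it.

```shell
# Hypothetical helper: print the client host address without surrounding quotes.
# Tries `terraform output -raw` first; if that fails (for example on a release
# without -raw), falls back to the plain output with any quotes removed.
client_host() {
  terraform output -raw client_ssh_host 2>/dev/null \
    || terraform output client_ssh_host | tr -d '"'
}

# Example:
#   ssh -i ~/.ssh/kafka_aws "ec2-user@$(client_host)"
```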
Once you've successfully SSHed into the client host, you can run the benchmarks like this:
$ cd /opt/benchmark
$ sudo bin/benchmark --drivers driver-kafka/kafka.yaml workloads/*.yaml
You can also run specific workloads in the workloads folder. Here's an example:
$ sudo bin/benchmark --drivers driver-kafka/kafka.yaml workloads/1-topic-16-partitions-1kb.yaml
There are multiple Kafka "modes" for which you can run benchmarks. Each mode has its own YAML configuration file in the driver-kafka folder.
Mode | Description | Config file |
---|---|---|
Standard | Kafka with message idempotence disabled (at-least-once semantics) | kafka.yaml |
Exactly once | Kafka with message idempotence enabled ("exactly-once" semantics) | kafka-exactly-once.yaml |
Sync | Kafka with durability enabled (all published messages synced to disk) | kafka-sync.yaml |
The examples above used the "standard" mode as configured in driver-kafka/kafka.yaml. To run all available benchmark workloads in "exactly once" or "sync" mode instead:
# Exactly once
$ sudo bin/benchmark --drivers driver-kafka/kafka-exactly-once.yaml workloads/*.yaml
# Sync
$ sudo bin/benchmark --drivers driver-kafka/kafka-sync.yaml workloads/*.yaml
Here's an example of running a specific benchmarking workload in exactly once mode:
$ sudo bin/benchmark --drivers driver-kafka/kafka-exactly-once.yaml workloads/1-topic-16-partitions-1kb.yaml
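To compare the three modes on the same workload, the individual commands can be wrapped in a loop over the config files from the table. This is a minimal sketch meant to be run from /opt/benchmark on the client host; the helper name run_all_modes and the BENCHMARK override variable are hypothetical and exist only to make the loop easy to dry-run.

```shell
# Hypothetical helper: run one workload against each Kafka mode in turn.
#   $1: path to a workload file, e.g. workloads/1-topic-16-partitions-1kb.yaml
# Set BENCHMARK=echo to print the commands instead of running them.
run_all_modes() {
  workload="$1"
  for mode in kafka kafka-exactly-once kafka-sync; do
    echo "== $mode =="
    ${BENCHMARK:-sudo bin/benchmark} --drivers "driver-kafka/$mode.yaml" "$workload" || return 1
  done
}

# Example:
#   run_all_modes workloads/1-topic-16-partitions-1kb.yaml
```

The loop stops at the first failing run, so a problem in one mode is not buried under the output of the next.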