Apache Kafka benchmarks

This folder houses all of the assets necessary to run benchmarks for Apache Kafka. In order to run these benchmarks, you'll need to:

Create the necessary local artifacts
Stand up a Kafka cluster on Amazon Web Services (which includes a client host for running the benchmarks)
SSH into the client host
Run the benchmarks from the client host

Creating local artifacts

To create the local artifacts needed to run the Kafka benchmarks in AWS, you'll need Maven installed. Once Maven is installed, you can build the artifacts with a single Maven command, run from the root of the messaging-benchmark repository (cloning the repository is covered below):

$ mvn install

Creating a Kafka cluster on Amazon Web Services (AWS) using Terraform and Ansible

In order to create an Apache Kafka cluster on AWS, you'll need to have the following installed:

Terraform
The terraform-inventory plugin for Terraform
Ansible

You'll also need a local copy of the messaging benchmarks repository:

$ git clone https://github.com/streamlio/messaging-benchmark
$ cd messaging-benchmark/driver-kafka/deploy

In addition, you will need to:

Create an AWS account (or use an existing account)
Install the AWS CLI
Configure the AWS CLI with credentials for that account (for example, by running aws configure)
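Before going further, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal POSIX shell sketch; the function name missing_tools is hypothetical, and the executable names it is meant to be called with (terraform, terraform-inventory, ansible-playbook, aws, mvn) are assumptions about how these tools are installed on your machine.

```shell
#!/bin/sh
# Hypothetical helper: prints each named command that is not found on PATH,
# one per line, and returns non-zero if any are missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a runnable command
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, missing_tools terraform terraform-inventory ansible-playbook aws mvn prints nothing and succeeds when everything is installed.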

Once those prerequisites are in place, create an SSH key pair, with the private key at ~/.ssh/kafka_aws and the public key at ~/.ssh/kafka_aws.pub:

$ ssh-keygen -f ~/.ssh/kafka_aws

When prompted to enter a passphrase, simply hit Enter twice. Then, make sure that the keys have been created:

$ ls ~/.ssh/kafka_aws*
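If you'd rather skip the interactive prompts, ssh-keygen's -N option sets the passphrase directly (an empty string means no passphrase, the same as pressing Enter twice). The sketch below, a hypothetical ensure_kafka_key function, generates the key pair only if the private key doesn't already exist. The key type and size (-t rsa -b 4096) are an assumption chosen as a conservative default; the instructions above don't specify one.

```shell
#!/bin/sh
# Hypothetical helper: create an unencrypted SSH key pair at the given path
# (default ~/.ssh/kafka_aws) unless the private key is already there.
ensure_kafka_key() {
  key="${1:-$HOME/.ssh/kafka_aws}"
  if [ -f "$key" ]; then
    echo "Key already exists: $key"
    return 0
  fi
  dir=$(dirname "$key")
  # Create the parent directory with owner-only permissions if it is missing
  [ -d "$dir" ] || mkdir -p -m 700 "$dir"
  # -q: quiet, -N "": empty passphrase, -f: private key path (public key gets .pub)
  ssh-keygen -q -t rsa -b 4096 -N "" -f "$key"
}
```

Running ensure_kafka_key with no arguments creates ~/.ssh/kafka_aws and ~/.ssh/kafka_aws.pub; running it again leaves the existing keys untouched.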

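With the SSH keys in place, you can create the necessary AWS resources using Terraform. Run the following from the root of the messaging-benchmark repository (skip the cd if you're already in driver-kafka/deploy):

$ cd driver-kafka/deploy
$ terraform init
$ terraform apply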

That will create the following EC2 instances (plus other resources, such as a Virtual Private Cloud (VPC)):

Resource	Description	Count
Kafka instances	The VMs on which a Kafka broker will run	3
ZooKeeper instances	The VMs on which a ZooKeeper node will run	3
Client instance	The VM from which the benchmarking suite itself will be run	1

When you run terraform apply, you will be prompted to type yes. Type yes to continue with the installation or anything else to quit.

Once the installation is complete, you will see a confirmation message listing the resources that have been installed.

Variables

Several parameters related to the Terraform deployment can be configured by changing their defaults in the terraform.tfvars file.

Variable	Description	Default
`region`	The AWS region in which the Kafka cluster will be deployed	`us-west-2`
`public_key_path`	The path to the SSH public key that you've generated	`~/.ssh/kafka_aws.pub`
`ami`	The Amazon Machine Image (AMI) used by the cluster's machines (AMI IDs are region-specific, so update this if you change `region`)	`ami-9fa343e7`
`instance_types`	The EC2 instance types used by the various components	`i3.4xlarge` (Kafka brokers), `t2.small` (ZooKeeper), `c4.8xlarge` (benchmarking client)

If you modify public_key_path, make sure you pass the matching private key when running the Ansible playbook.
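One way to confirm that a private key belongs to the public key named in public_key_path is to regenerate the public half with ssh-keygen -y and compare the key type and key data fields, ignoring the trailing comment. A minimal sketch; the function name keys_match is hypothetical, and it assumes an unencrypted private key (ssh-keygen -y prompts for a passphrase otherwise).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the private key ($1) matches the
# public key file ($2). Only the key type and base64 key data are compared,
# so differing comments at the end of the .pub file don't matter.
keys_match() {
  # ssh-keygen -y prints the public key derived from a private key file
  derived=$(ssh-keygen -y -f "$1" 2>/dev/null | awk '{print $1, $2}')
  stored=$(awk '{print $1, $2}' "$2" 2>/dev/null)
  [ -n "$derived" ] && [ "$derived" = "$stored" ]
}
```

For example: keys_match ~/.ssh/my_key ~/.ssh/my_key.pub && echo "keys match".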

Running the Ansible playbook

With the appropriate infrastructure in place, you can install and start the Kafka cluster using Ansible with just one command:

$ ansible-playbook \
  --user ec2-user \
  --inventory `which terraform-inventory` \
  deploy.yaml

If you're using an SSH private key path other than ~/.ssh/kafka_aws, specify it with the --private-key flag, for example --private-key ~/.ssh/my_key.
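To avoid retyping these flags, you could wrap the playbook invocation in a small function that falls back to ~/.ssh/kafka_aws when no key path is given. The function name run_kafka_playbook and the DRY_RUN variable are hypothetical conveniences; with DRY_RUN=1 the function only prints the command it would run. Call it from driver-kafka/deploy, where deploy.yaml lives.

```shell
#!/bin/sh
# Hypothetical wrapper around the ansible-playbook command shown above.
# $1: optional SSH private key path (defaults to ~/.ssh/kafka_aws).
# Set DRY_RUN=1 to print the command instead of running it.
run_kafka_playbook() {
  key="${1:-$HOME/.ssh/kafka_aws}"
  # Resolve the plugin path, as `which terraform-inventory` does above
  inventory=$(command -v terraform-inventory) || {
    echo "terraform-inventory is not on PATH" >&2
    return 1
  }
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "ansible-playbook --user ec2-user --inventory $inventory --private-key $key deploy.yaml"
  else
    ansible-playbook --user ec2-user --inventory "$inventory" \
      --private-key "$key" deploy.yaml
  fi
}
```

For example, run_kafka_playbook ~/.ssh/my_key deploys using a custom key, and DRY_RUN=1 run_kafka_playbook shows the default command without running it.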

SSHing into the client host

Terraform's output includes a client_ssh_host value containing the IP address of the client EC2 host from which the benchmarks are run. You can SSH into that host with this command, run from driver-kafka/deploy so that terraform output can find the Terraform state:

$ ssh -i ~/.ssh/kafka_aws ec2-user@$(terraform output client_ssh_host)
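Depending on your Terraform version, terraform output may print a string value wrapped in double quotes (to our knowledge, releases from 0.14 onward do this unless you pass -raw), and those quotes would end up in the SSH hostname. A defensive sketch that strips them before building the ec2-user@host target; the helper name ssh_target is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn the client_ssh_host output into an SSH target,
# removing any double quotes that Terraform printed around the string.
ssh_target() {
  host=$(printf '%s' "$1" | tr -d '"')
  echo "ec2-user@$host"
}
```

Usage: ssh -i ~/.ssh/kafka_aws "$(ssh_target "$(terraform output client_ssh_host)")". On Terraform versions that support it, terraform output -raw client_ssh_host avoids the quotes in the first place.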

Running the benchmarks from the client host

Once you've successfully SSHed into the client host, you can run the benchmarks like this:

$ cd /opt/benchmark
$ sudo bin/benchmark --drivers driver-kafka/kafka.yaml workloads/*.yaml

You can also run specific workloads in the workloads folder. Here's an example:

$ sudo bin/benchmark --drivers driver-kafka/kafka.yaml workloads/1-topic-16-partitions-1kb.yaml
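If you'd rather run the workloads one at a time, for example to keep a separate log per workload, a loop like the following sketch works. The function name, the LOG_DIR default of ~/benchmark-logs, and the DRY_RUN switch are hypothetical; it assumes you're in /opt/benchmark as above.

```shell
#!/bin/sh
# Hypothetical helper: run each workload file in its own benchmark invocation,
# saving its console output to $LOG_DIR/<workload-name>.log.
# $1: driver config; remaining args: workload files.
# Set DRY_RUN=1 to print the commands instead of running them.
run_workloads_separately() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_workloads_separately <driver.yaml> <workload.yaml>..." >&2
    return 1
  fi
  driver="$1"
  shift
  logdir="${LOG_DIR:-$HOME/benchmark-logs}"
  [ "${DRY_RUN:-0}" = 1 ] || mkdir -p "$logdir"
  for workload in "$@"; do
    # e.g. workloads/1-topic-16-partitions-1kb.yaml -> 1-topic-16-partitions-1kb
    name=$(basename "$workload" .yaml)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "sudo bin/benchmark --drivers $driver $workload > $logdir/$name.log"
    else
      # The redirection runs as you, not root, so logdir must be writable by you
      sudo bin/benchmark --drivers "$driver" "$workload" > "$logdir/$name.log" 2>&1
    fi
  done
}
```

For example: run_workloads_separately driver-kafka/kafka.yaml workloads/*.yaml.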

There are multiple Kafka "modes" for which you can run benchmarks. Each mode has its own YAML configuration file in the driver-kafka folder.

Mode	Description	Config file
Standard	Kafka with message idempotence disabled (at-least-once semantics)	`kafka.yaml`
Exactly once	Kafka with message idempotence enabled ("exactly-once" semantics)	`kafka-exactly-once.yaml`
Sync	Kafka with durability enabled (all published messages synced to disk)	`kafka-sync.yaml`
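If you script your runs, a small lookup from mode name to driver config keeps the table above in one place. A minimal sketch; the function name kafka_driver_config and the mode keywords (standard, exactly-once, sync) are hypothetical labels for the table's three rows.

```shell
#!/bin/sh
# Hypothetical lookup: print the driver config path for a Kafka benchmark mode.
kafka_driver_config() {
  case "$1" in
    standard)     echo "driver-kafka/kafka.yaml" ;;              # at-least-once
    exactly-once) echo "driver-kafka/kafka-exactly-once.yaml" ;; # idempotence enabled
    sync)         echo "driver-kafka/kafka-sync.yaml" ;;         # messages synced to disk
    *)
      echo "unknown mode: $1 (expected standard, exactly-once, or sync)" >&2
      return 1
      ;;
  esac
}
```

For example: sudo bin/benchmark --drivers "$(kafka_driver_config sync)" workloads/*.yaml.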

The examples above used the "standard" mode, as configured in driver-kafka/kafka.yaml. To run all available benchmark workloads in "exactly once" or "sync" mode instead:

# Exactly once
$ sudo bin/benchmark --drivers driver-kafka/kafka-exactly-once.yaml workloads/*.yaml

# Sync
$ sudo bin/benchmark --drivers driver-kafka/kafka-sync.yaml workloads/*.yaml

Here's an example of running a specific benchmarking workload in exactly-once mode:

$ sudo bin/benchmark --drivers driver-kafka/kafka-exactly-once.yaml workloads/1-topic-16-partitions-1kb.yaml
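To compare the three modes on the same workload, you can run that workload once per driver config. A sketch under the same assumptions as the commands above (run from /opt/benchmark); the function name compare_modes and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run one workload against each Kafka mode in turn.
# $1: workload file. Set DRY_RUN=1 to print the commands instead of running them.
compare_modes() {
  workload="$1"
  if [ -z "$workload" ]; then
    echo "usage: compare_modes <workload.yaml>" >&2
    return 1
  fi
  for config in kafka.yaml kafka-exactly-once.yaml kafka-sync.yaml; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "sudo bin/benchmark --drivers driver-kafka/$config $workload"
    else
      # Stop at the first failing run rather than moving on to the next mode
      sudo bin/benchmark --drivers "driver-kafka/$config" "$workload" || return 1
    fi
  done
}
```

For example: compare_modes workloads/1-topic-16-partitions-1kb.yaml.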
