This folder houses all of the assets necessary to run benchmarks for Apache Pulsar. In order to run these benchmarks, you'll need to:
- Create the necessary local artifacts
- Stand up a Pulsar cluster on Amazon Web Services (which includes a client host for running the benchmarks)
- SSH into the client host
- Run the benchmarks from the client host
In order to create the local artifacts necessary to run the Pulsar benchmarks in AWS, you'll need to have Maven installed. Once Maven's installed, you can create the necessary artifacts with a single Maven command:
$ mvn install
In order to create an Apache Pulsar cluster on AWS, you'll need to have the following installed:
- Terraform
- The terraform-inventory plugin for Terraform
- Ansible
- The messaging benchmarks repository:

$ git clone https://github.com/streamlio/messaging-benchmark
$ cd messaging-benchmark
In addition, you will need to:
- Create an AWS account (or use an existing account)
- Install the `aws` CLI tool
- Configure the `aws` CLI tool
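Before going further, it can help to confirm that the tools this guide relies on are actually on your `PATH`. The Bash sketch below assumes the tool names used by the commands in this README (`mvn`, `git`, `terraform`, `terraform-inventory`, `ansible-playbook`, `aws`, `ssh-keygen`). The functions `missing_tools` and `aws_configured` are hypothetical helpers, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints every tool that is NOT on PATH, one per line,
# and returns non-zero if anything is missing. The default list is an
# assumption based on the commands used in this README.
missing_tools() {
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(mvn git terraform terraform-inventory ansible-playbook aws ssh-keygen)
  fi
  local tool status=0
  for tool in "${tools[@]}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical helper: succeeds only if the aws CLI can authenticate.
# `aws sts get-caller-identity` fails when no credentials are configured.
aws_configured() {
  aws sts get-caller-identity >/dev/null 2>&1
}
```

Running `missing_tools && aws_configured && echo ready` prints `ready` only when every listed tool is installed and the `aws` CLI has working credentials.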
Once those conditions are in place, you'll need to create an SSH key pair: a private key at `~/.ssh/pulsar_aws` and a public key at `~/.ssh/pulsar_aws.pub`.
$ ssh-keygen -f ~/.ssh/pulsar_aws
When prompted to enter a passphrase, simply hit Enter twice. Then, make sure that the keys have been created:
$ ls ~/.ssh/pulsar_aws*
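If you'd rather script this step, `ssh-keygen`'s `-N ""` option sets an empty passphrase, which has the same effect as pressing Enter twice. The sketch below only generates a key pair when one isn't already present, and rebuilds a missing public key from the private key with `ssh-keygen -y`. The RSA key type and 4096-bit size are assumptions chosen because they're widely supported, and `ensure_key_pair` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Succeeds if both halves of the key pair exist.
key_pair_exists() {
  [ -f "$1" ] && [ -f "$1.pub" ]
}

# Hypothetical helper: create ~/.ssh/pulsar_aws(.pub) non-interactively.
ensure_key_pair() {
  local key="${1:-$HOME/.ssh/pulsar_aws}"
  if key_pair_exists "$key"; then
    echo "Key pair already present: $key"
    return 0
  fi
  if [ -f "$key" ]; then
    # Private key exists but the .pub file is missing: derive it.
    ssh-keygen -y -f "$key" > "$key.pub"
    return
  fi
  mkdir -p -m 700 "$(dirname "$key")"
  # -N "" means an empty passphrase (same as pressing Enter twice).
  ssh-keygen -t rsa -b 4096 -f "$key" -N ""
}
```

Calling `ensure_key_pair` with no arguments targets the default `~/.ssh/pulsar_aws` path used throughout this guide; it is safe to re-run because an existing pair is left untouched.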
With the SSH keys in place, you can create the necessary AWS resources with Terraform by running these commands from the repository root:
$ cd driver-pulsar/deploy
$ terraform init
$ terraform apply
That will create the following EC2 instances (plus some other resources, such as a Virtual Private Cloud (VPC)):
Resource | Description | Count |
---|---|---|
Pulsar/BookKeeper instances | The VMs on which a Pulsar broker and BookKeeper bookie will run | 3 |
ZooKeeper instances | The VMs on which a ZooKeeper node will run | 3 |
Client instance | The VM from which the benchmarking suite itself will be run | 1 |
When you run `terraform apply`, you will be prompted to type `yes`. Type `yes` to continue with the installation or anything else to quit.
Once the installation is complete, you will see a confirmation message listing the resources that have been installed.
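For scripted or repeated runs, Terraform's `-auto-approve` flag skips the interactive `yes` prompt, and `terraform state list` prints every resource Terraform is now tracking (the instances above plus the VPC and related resources). The wrapper below is a hypothetical convenience, not part of the repository, meant to be run from `driver-pulsar/deploy`. Only use `-auto-approve` once you're confident in the plan, since it applies changes without asking.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `terraform init` and `terraform apply`.
# Run it from driver-pulsar/deploy. Extra arguments (such as
# -var 'name=value' overrides) are passed through to `terraform apply`.
provision_cluster() {
  terraform init -input=false || return 1
  # -auto-approve skips the "type yes" confirmation prompt.
  terraform apply -input=false -auto-approve "$@" || return 1
  # List every resource Terraform created and now tracks in its state.
  terraform state list
}
```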
There are a handful of configurable parameters related to the Terraform deployment that you can alter by modifying the defaults in the `terraform.tfvars` file.
Variable | Description | Default |
---|---|---|
`region` | The AWS region in which the Pulsar cluster will be deployed | `us-west-2` |
`public_key_path` | The path to the SSH public key that you've generated | `~/.ssh/pulsar_aws.pub` |
`ami` | The Amazon Machine Image (AMI) to be used by the cluster's machines | `ami-9fa343e7` |
`instance_types` | The EC2 instance types used by the various components | `i3.4xlarge` (Pulsar brokers and BookKeeper bookies), `t2.small` (ZooKeeper), `c4.8xlarge` (benchmarking client) |
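Rather than editing `terraform.tfvars` in place, you can put overrides in a separate file: Terraform automatically loads any `*.auto.tfvars` files in the working directory after `terraform.tfvars`, so values there take precedence. Keep in mind that AMI IDs are specific to a region, so changing `region` generally means changing `ami` as well. The file name `local.auto.tfvars` and the helper below are hypothetical, and the AMI ID you pass must be one that exists in your chosen region.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write region/AMI overrides next to terraform.tfvars.
# Usage: write_tfvars_override <deploy-dir> <region> <ami-id>
write_tfvars_override() {
  local dir="$1" region="$2" ami="$3"
  if [ -z "$dir" ] || [ -z "$region" ] || [ -z "$ami" ]; then
    echo "usage: write_tfvars_override <dir> <region> <ami-id>" >&2
    return 2
  fi
  # *.auto.tfvars files are loaded automatically, after terraform.tfvars.
  cat > "$dir/local.auto.tfvars" <<EOF
region = "$region"
ami    = "$ami"
EOF
}
```

For a one-off change, `terraform apply -var 'region=...' -var 'ami=...'` achieves the same thing without writing a file.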
If you modify `public_key_path`, make sure that you point Ansible to the matching private key when running the playbook.
With the appropriate infrastructure in place, you can install and start the Pulsar cluster using Ansible with just one command:
$ ansible-playbook \
--user ec2-user \
--inventory `which terraform-inventory` \
deploy.yaml
If you're using an SSH private key path different from `~/.ssh/pulsar_aws`, you can specify that path using the `--private-key` flag, for example `--private-key=~/.ssh/my_key`.
If Ansible keeps asking for the SSH key passphrase, you can add the key to the SSH agent by running `ssh-agent bash` and then `ssh-add ~/.ssh/pulsar_aws`.
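Both options can be wrapped in a small script. The sketch below is a hypothetical helper (not part of the repository) that runs the same playbook with an explicit `--private-key`, defaulting to `~/.ssh/pulsar_aws`; it assumes `deploy.yaml` is in the current directory. The second function starts an agent in the current shell with `eval "$(ssh-agent -s)"`, an alternative to `ssh-agent bash` that doesn't open a subshell, and loads the key once so you aren't prompted repeatedly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: same ansible-playbook invocation as above, with the
# private key passed explicitly. Run from the directory containing deploy.yaml.
deploy_pulsar() {
  local key="${1:-$HOME/.ssh/pulsar_aws}"
  ansible-playbook \
    --user ec2-user \
    --inventory "$(command -v terraform-inventory)" \
    --private-key="$key" \
    deploy.yaml
}

# Hypothetical helper: start an SSH agent in the current shell and add the key,
# so a passphrase (if any) is entered once instead of on every connection.
load_key_into_agent() {
  eval "$(ssh-agent -s)" >/dev/null
  ssh-add "${1:-$HOME/.ssh/pulsar_aws}"
}
```

For example, `load_key_into_agent && deploy_pulsar` loads the default key and then runs the playbook.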
In the output produced by Terraform, there's a `client_ssh_host` variable that provides the IP address for the client EC2 host from which benchmarks can be run. You can SSH into that host using this command:
$ ssh -i ~/.ssh/pulsar_aws ec2-user@$(terraform output client_ssh_host)
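On recent Terraform releases (0.14 and later), `terraform output` prints string values wrapped in double quotes unless you pass `-raw`, which would leave quotes inside the `ec2-user@...` address. The sketch below assumes such a Terraform version (older releases don't have `-raw`); the helper names and the `PULSAR_SSH_KEY` override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; run from driver-pulsar/deploy so that Terraform can
# read the state that contains the client_ssh_host output.
client_host() {
  # -raw prints the bare value, without the quotes Terraform 0.14+ adds.
  terraform output -raw client_ssh_host
}

# SSH into the client host; any extra arguments are run as a remote command.
# PULSAR_SSH_KEY (hypothetical) overrides the default private key path.
ssh_client() {
  local host
  host="$(client_host)" || return 1
  ssh -i "${PULSAR_SSH_KEY:-$HOME/.ssh/pulsar_aws}" "ec2-user@${host}" "$@"
}
```

For example, `ssh_client 'ls /opt/benchmark/workloads'` lists the available workloads without opening an interactive session.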
Once you've successfully SSHed into the client host, you can run all available benchmark workloads like this:
$ cd /opt/benchmark
$ sudo bin/benchmark --drivers driver-pulsar/pulsar.yaml workloads/*.yaml
You can also run specific workloads in the `workloads` folder. Here's an example:
$ sudo bin/benchmark --drivers driver-pulsar/pulsar.yaml workloads/1-topic-16-partitions-1kb.yaml
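Passing several workload files to a single `bin/benchmark` invocation is the simplest option. If you'd rather run them one at a time (for example, to keep each workload's console output separate and record which ones exit with an error), a loop along these lines should work. It is a hypothetical sketch, to be run from `/opt/benchmark` on the client host, and relies only on the `--drivers` usage shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run each workload file in its own bin/benchmark call.
# Usage (from /opt/benchmark): run_workloads <driver-yaml> <workload-yaml>...
run_workloads() {
  local driver="$1"
  shift
  local workload failed=0
  for workload in "$@"; do
    echo "=== ${workload} ==="
    if ! sudo bin/benchmark --drivers "$driver" "$workload"; then
      echo "FAILED: ${workload}" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every workload exited successfully.
  [ "$failed" -eq 0 ]
}
```

For example: `run_workloads driver-pulsar/pulsar.yaml workloads/*.yaml`.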
There are multiple Pulsar "modes" for which you can run benchmarks. Each mode has its own YAML configuration file in the `driver-pulsar` folder.
Mode | Description | Config file |
---|---|---|
Standard | Pulsar with message de-duplication disabled (at-least-once semantics) | pulsar.yaml |
Effectively once | Pulsar with message de-duplication enabled ("effectively-once" semantics) | pulsar-effectively-once.yaml |
The examples above used the "standard" mode as configured in `driver-pulsar/pulsar.yaml`. To run all available benchmark workloads in "effectively once" mode:
$ sudo bin/benchmark --drivers driver-pulsar/pulsar-effectively-once.yaml workloads/*.yaml
Here's an example of running a specific benchmarking workload in effectively once mode:
$ sudo bin/benchmark --drivers driver-pulsar/pulsar-effectively-once.yaml workloads/1-topic-16-partitions-1kb.yaml
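To benchmark both modes back to back with the same set of workloads, you can loop over the two config files listed in the modes table. This is a hypothetical helper, run from `/opt/benchmark`; it stops at the first mode whose run fails.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the given workloads once per Pulsar mode.
# Usage (from /opt/benchmark): run_all_modes workloads/*.yaml
run_all_modes() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_all_modes <workload-yaml>..." >&2
    return 2
  fi
  local driver
  for driver in driver-pulsar/pulsar.yaml driver-pulsar/pulsar-effectively-once.yaml; do
    echo "=== mode config: ${driver} ==="
    sudo bin/benchmark --drivers "$driver" "$@" || return 1
  done
}
```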