This is a rack-aware tool for assigning Kafka partitions to brokers that minimizes data movement. It also includes the ability to inspect the current live brokers in the cluster and the current partition assignment.
Using this tool will greatly simplify operations like decommissioning a broker, adding a new broker, or replacing a broker.
Kafka's built-in algorithm is easy to use and monitor, but it does not take into account existing assignments of partitions to nodes. Instead, the burden is on the operator to either move entire topics across brokers, or come up with a sane way of moving some number of partitions of existing topics. Either approach is extremely disruptive.
This tool minimizes the number of partitions already assigned that need to leave a given node, while ensuring that each broker is responsible for a similar number of partitions. This enables use cases like node replacement, in which we would like to bring up a broker that is responsible for the same data as a misbehaving broker that it is replacing.
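As a rough illustration of what "data movement" means here, the following sketch counts the replicas that would have to be copied to a broker that did not previously hold them. The function name, the dictionary layout, and the topic and broker IDs are all hypothetical; this is not code from the tool itself.

```python
def count_moved_replicas(current, proposed):
    """Count replicas placed on a broker that did not hold that partition before."""
    moved = 0
    for partition, new_replicas in proposed.items():
        old_replicas = set(current.get(partition, []))
        moved += sum(1 for broker in new_replicas if broker not in old_replicas)
    return moved


# Hypothetical cluster: (topic, partition) -> list of broker IDs holding a replica.
current = {("events", 0): [1, 2], ("events", 1): [2, 3], ("events", 2): [3, 1]}

# Broker 3 is replaced by broker 4. A plan that respects existing placement
# only touches the replicas that lived on broker 3.
sticky = {("events", 0): [1, 2], ("events", 1): [2, 4], ("events", 2): [4, 1]}

# A plan computed without regard to existing placement may also move
# replicas that could have stayed where they were.
fresh = {("events", 0): [2, 4], ("events", 1): [4, 1], ("events", 2): [1, 2]}

print(count_moved_replicas(current, sticky))
print(count_moved_replicas(current, fresh))
```

In this made-up example the placement-aware plan copies half as many replicas as the plan computed from scratch.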
This tool uses a strategy that behaves similarly to Apache Helix's auto-rebalancing algorithm. It first assigns as many already-assigned partitions back to nodes as it can (while ensuring that no node is overloaded), and then evenly assigns all other partitions such that every node eventually ends up responsible for roughly the same number of partitions.
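The two passes described above can be sketched roughly as follows. This is a simplified illustration, not the tool's actual implementation: it ignores rack awareness and replica (leader) ordering, and it assumes a per-broker cap equal to the ceiling of the average replica count. All names and IDs are hypothetical.

```python
import math


def sticky_assign(current, live_brokers):
    """Keep replicas in place where possible, then fill gaps on the least-loaded broker."""
    live = sorted(live_brokers)
    total = sum(len(replicas) for replicas in current.values())
    cap = math.ceil(total / len(live))  # assumed per-broker ceiling on replica count
    load = {broker: 0 for broker in live}
    result = {}

    # Pass 1: keep existing replicas on live brokers that still have room.
    for partition in sorted(current):
        kept = []
        for broker in current[partition]:
            if broker in load and load[broker] < cap:
                kept.append(broker)
                load[broker] += 1
        result[partition] = kept

    # Pass 2: fill the remaining replica slots from the least-loaded broker
    # that does not already hold a replica of the partition.
    for partition in sorted(current):
        replicas = result[partition]
        while len(replicas) < len(current[partition]):
            candidates = [b for b in live if b not in replicas]
            if not candidates:
                raise ValueError(f"not enough brokers for {partition}")
            target = min(candidates, key=lambda b: (load[b], b))
            replicas.append(target)
            load[target] += 1
    return result


# Hypothetical example: broker 3 is replaced by broker 4.
before = {("events", 0): [1, 2], ("events", 1): [2, 3], ("events", 2): [3, 1]}
after = sticky_assign(before, live_brokers=[1, 2, 4])
print(after)
```

In this example only the two replicas that lived on broker 3 are reassigned, and both land on broker 4, leaving each broker with two replicas.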
- Download from the "Releases" page
tar xf kafka-assigner-1.1-pkg.tar
cd kafka-assigner-1.1/bin
Requires Java 1.7+
./kafka-assignment-generator.sh [options...] arguments...
--broker_hosts VAL : comma-separated list of broker hostnames (instead of broker IDs)
--broker_hosts_to_remove VAL : comma-separated list of broker hostnames to exclude (instead of broker IDs)
--disable_rack_awareness : set to true to ignore rack configurations
--integer_broker_ids VAL : comma-separated list of Kafka broker IDs (integers)
--mode [PRINT_CURRENT_ASSIGNMENT | PRINT_CURRENT_BROKERS | PRINT_REASSIGNMENT] : the mode to run (PRINT_CURRENT_ASSIGNMENT, PRINT_CURRENT_BROKERS, PRINT_REASSIGNMENT)
--topics VAL : comma-separated list of topics
--zk_string VAL : ZK quorum as comma-separated host:port pairs
./kafka-assignment-generator.sh --zk_string my-zk-host:2181 --mode PRINT_REASSIGNMENT
The output JSON can then be fed into Kafka's reassign partitions command. See the Kafka documentation for instructions.
This mode is useful for decommissioning or replacing a node. The partitions will be assigned to all live hosts, excluding the hosts that are specified.
./kafka-assignment-generator.sh --zk_string my-zk-host:2181 --mode PRINT_REASSIGNMENT --broker_hosts_to_remove misbehaving-host1,misbehaving-host2
The output JSON can then be fed into Kafka's reassign partitions command. See the Kafka documentation for instructions.
Note that in this mode, every host that should own partitions must be specified, including hosts that already own partitions.
./kafka-assignment-generator.sh --zk_string my-zk-host:2181 --mode PRINT_REASSIGNMENT --broker_hosts host1,host2,host3
The output JSON can then be fed into Kafka's reassign partitions command. See the Kafka documentation for instructions.
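Before handing a plan to Kafka, it can be worth sanity-checking it. The sketch below assumes the generated plan uses the JSON layout that Kafka's reassign partitions command accepts (a "version" field and a "partitions" list of topic, partition, and replicas); the sample plan, the helper name, and the broker IDs are hypothetical.

```python
import json


def check_plan(plan_json, allowed_brokers):
    """Return a list of human-readable problems found in a reassignment plan."""
    plan = json.loads(plan_json)
    problems = []
    for entry in plan["partitions"]:
        name = f'{entry["topic"]}-{entry["partition"]}'
        replicas = entry["replicas"]
        # A partition must not have two replicas on the same broker.
        if len(set(replicas)) != len(replicas):
            problems.append(f"{name}: duplicate broker in {replicas}")
        # Every replica should land on a broker we intend to keep.
        stray = [b for b in replicas if b not in allowed_brokers]
        if stray:
            problems.append(f"{name}: unexpected brokers {stray}")
    return problems


# Hypothetical plan in the assumed format.
sample_plan = json.dumps({
    "version": 1,
    "partitions": [
        {"topic": "events", "partition": 0, "replicas": [1, 2]},
        {"topic": "events", "partition": 1, "replicas": [2, 3]},
    ],
})

# Broker 3 is being decommissioned, so it should not appear in the plan.
for problem in check_plan(sample_plan, allowed_brokers={1, 2, 4}):
    print(problem)
```

An empty result means the plan only references the expected brokers and never places two replicas of a partition on the same broker.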
./kafka-assignment-generator.sh --zk_string my-zk-host:2181 --mode PRINT_CURRENT_BROKERS
./kafka-assignment-generator.sh --zk_string my-zk-host:2181 --mode PRINT_CURRENT_ASSIGNMENT
Requires Java 1.7+ and Maven 3.2+
- Clone this repository
mvn install package
- Artifacts are in target/kafka-assigner-pkg
Licensed under the Apache License 2.0.