Releases: anuvu/cruise-control

Netty CVE fixes

19 Sep 22:58

cherrypicked netty fix

Periodic sync to upstream master- May 2023

07 Jun 20:08

Feb 2023 Sync

24 Feb 00:22

Syncing fork to upstream

PartitionReassignmentTimeout

06 Apr 17:58
fix/cruisecontrol: add partition movement timeout to executor

There is an edge case in which a partition leadership re-election occurs after a partition reassignment has been submitted to Kafka but before it has finished. This causes the reassignment to stall until there is another re-election. However, we do see cases where no further re-election is triggered, which leaves the partition reassignment IN_PROGRESS indefinitely and can cause new anomalies to be missed because the executor state stays in INTER_BROKER_REPLICA_ACTION.

Adding a max timeout avoids this state by cancelling such reassignments and retrying them later.
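The timeout described above can be sketched roughly as follows. This is only an illustration and not the fork's executor code: the class `ReassignmentTimeoutTracker`, its method names, and the use of plain strings to identify partitions are all assumptions made for the example. The idea is to record when each reassignment was submitted and to report the ones that have been in progress for at least the maximum timeout, so a caller could cancel them and retry later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cruise Control: tracks in-flight reassignments by submission time.
final class ReassignmentTimeoutTracker {
    private final long maxTimeoutMs;
    // Partition identifier (for example "topic-0") -> time the reassignment was submitted.
    private final Map<String, Long> submittedAtMs = new HashMap<>();

    ReassignmentTimeoutTracker(long maxTimeoutMs) {
        this.maxTimeoutMs = maxTimeoutMs;
    }

    // Record that a reassignment for this partition was submitted at nowMs.
    void onSubmitted(String partition, long nowMs) {
        submittedAtMs.put(partition, nowMs);
    }

    // Forget a reassignment that finished or was cancelled.
    void onCompleted(String partition) {
        submittedAtMs.remove(partition);
    }

    // Return the partitions whose reassignment has been in flight for at least maxTimeoutMs.
    List<String> timedOut(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : submittedAtMs.entrySet()) {
            if (nowMs - entry.getValue() >= maxTimeoutMs) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

A caller would poll `timedOut(...)` periodically, cancel each returned reassignment, call `onCompleted(...)` for it, and resubmit it in a later execution.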

includes minor cleanup

DeleteStaleReassignments

21 Mar 03:30
2.8.10

add option to delete partition reassignments not started by CruiseCon…

LaggingReplicasReassignmentGoal

16 Nov 02:11
10d601f

Sometimes replicas stop fetching or updating their state in ZK for no apparent reason. So far we have seen this happen on rolling restarts, but as the source of this bug is as yet unknown, there could be other scenarios as well.

This is problematic because the replica stays out of the ISR until we manually reassign the partition to a broker (even the same one).

This adds a new LaggingReplicaReassignmentGoal that tracks such replicas and reassigns them to the same brokers once MAX_LAGGING_REPLICA_REASSIGN_MS (default 30 minutes) is reached.

See also: https://lists.apache.org/thread.html/rbfe9557a4dd8604cffce369e76cc74f90ff8f717f934e6e8b5141053%40%3Cusers.kafka.apache.org%3E
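The tracking behaviour described in this release can be sketched as below. This is a minimal illustration and not the goal's actual implementation: the class `LaggingReplicaTracker`, the `update` method, and the string replica identifiers are hypothetical. It assumes the caller periodically passes in the set of replicas currently out of the ISR, and it returns those that have been lagging continuously for at least the configured maximum (30 minutes by default in the release notes).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the fork's goal class: remembers when each replica was first seen lagging.
final class LaggingReplicaTracker {
    private final long maxLagMs;
    // Replica identifier (for example "topic-0@broker-1") -> time it was first seen out of the ISR.
    private final Map<String, Long> firstSeenLaggingMs = new HashMap<>();

    LaggingReplicaTracker(long maxLagMs) {
        this.maxLagMs = maxLagMs;
    }

    // Feed the current set of out-of-ISR replicas; returns the ones due for reassignment, sorted.
    List<String> update(Set<String> outOfIsr, long nowMs) {
        // Replicas that rejoined the ISR are forgotten, so their timer restarts if they lag again.
        firstSeenLaggingMs.keySet().retainAll(outOfIsr);
        List<String> due = new ArrayList<>();
        for (String replica : outOfIsr) {
            long firstSeen = firstSeenLaggingMs.computeIfAbsent(replica, key -> nowMs);
            if (nowMs - firstSeen >= maxLagMs) {
                due.add(replica);
            }
        }
        Collections.sort(due);
        return due;
    }
}
```

Each replica returned by `update(...)` would then be reassigned to the same broker it is already on, which is the manual workaround the notes describe.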

Handle limbo-state partitionsReassignments

30 Jul 02:27
feat: cleanup stuck partitionReassignments

Sometimes, an active partition reassignment goes into a limbo state
due to the destination and source brokers going offline at the same time.
When this happens, there will be a partitionReassignment stuck in Kafka
until it is manually cleared. Because of this, CC stops reacting to any
anomalies, broker failures, etc.

This commit is for detecting and fixing such stuck active partitionReassignments.

Optimize maybeUpdateReplicationFactor

21 Jul 22:54
optimize replica updates

- for an increase, just add new replicas
- for a decrease (by k replicas), remove the last k replicas
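
The two rules above can be illustrated with a small sketch. It is not the fork's maybeUpdateReplicationFactor code: the class `ReplicaListUpdate`, the method `withReplicationFactor`, and the `candidates` list of broker ids (assumed to be in preference order) are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cruise Control: computes a replica list for a target replication factor.
final class ReplicaListUpdate {
    private ReplicaListUpdate() { }

    static List<Integer> withReplicationFactor(List<Integer> current, int targetRf, List<Integer> candidates) {
        if (targetRf < 1) {
            throw new IllegalArgumentException("Replication factor must be at least 1");
        }
        if (targetRf <= current.size()) {
            // Decrease by k: keep the first targetRf replicas, which drops the last k.
            return new ArrayList<>(current.subList(0, targetRf));
        }
        // Increase: keep the existing replicas in place and append new brokers.
        List<Integer> updated = new ArrayList<>(current);
        for (int broker : candidates) {
            if (updated.size() == targetRf) {
                break;
            }
            if (!updated.contains(broker)) {
                updated.add(broker);
            }
        }
        if (updated.size() < targetRf) {
            throw new IllegalStateException("Not enough candidate brokers for replication factor " + targetRf);
        }
        return updated;
    }
}
```

Because existing replicas keep their positions in both cases, the current leader and the already-synced replicas are left alone and only the added or removed replicas cause data movement.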

Sample Store RF fixes

20 Jul 18:31
sample store replication-fixes

- fix sample.store.topic.replication.factor not being set
- add maybeUpdateReplicationFactor to ensure sample.store.topic.replication.factor is honored

Change sample store RF to 3

18 Jul 01:00
change DEFAULT_SAMPLE_STORE_TOPIC_REPLICATION_FACTOR to 3

(to be consistent with ND platform min. replicas policy)
TODO: fix sample.store.topic.replication.factor not being set