Releases · anuvu/cruise-control
Netty CVE fixes
Periodic sync to upstream master - May 2023
Required for the fix in linkedin#1957
Feb 2023 Sync
Syncing fork to upstream
PartitionReassignmentTimeout
fix/cruisecontrol: add partition movement timeout to executor. There is an edge case where, after a partition reassignment has been submitted to Kafka but before it finishes, a partition leadership re-election occurs; this causes the reassignment to stall until another re-election happens. We have also seen cases where no further re-election is ever triggered, leaving the partition reassignment IN_PROGRESS indefinitely and potentially causing new anomalies to be missed because the executor state stays in INTER_BROKER_REPLICA_ACTION. Adding a maximum timeout avoids this state by cancelling such reassignments and retrying them later. Includes minor cleanup. A minimal sketch of the timeout check is shown below.
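The following is only a sketch of the idea, not the actual Executor code; the class, field, and method names (ReassignmentTimeoutSketch, MAX_PARTITION_MOVEMENT_MS, onReassignmentSubmitted) are hypothetical, and a real implementation would cancel through the Kafka admin client rather than print.

```java
import java.util.HashMap;
import java.util.Map;

public class ReassignmentTimeoutSketch {
  // Hypothetical default: maximum time a single partition reassignment may stay IN_PROGRESS.
  private static final long MAX_PARTITION_MOVEMENT_MS = 30 * 60 * 1000L;

  // Partition -> time its reassignment was submitted.
  private final Map<String, Long> inProgressSinceMs = new HashMap<>();

  public void onReassignmentSubmitted(String topicPartition, long nowMs) {
    inProgressSinceMs.put(topicPartition, nowMs);
  }

  // Called periodically by the executor loop: cancel reassignments that have been
  // in progress longer than the timeout so they can be retried later.
  public void maybeCancelStalledReassignments(long nowMs) {
    inProgressSinceMs.entrySet().removeIf(e -> {
      boolean timedOut = nowMs - e.getValue() > MAX_PARTITION_MOVEMENT_MS;
      if (timedOut) {
        cancelReassignment(e.getKey());
      }
      return timedOut;
    });
  }

  private void cancelReassignment(String topicPartition) {
    // Placeholder: the real executor would cancel the reassignment via the admin client.
    System.out.println("Cancelling stalled reassignment for " + topicPartition);
  }
}
```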
DeleteStaleReassignments
2.8.10 add option to delete partition reassignments not started by CruiseCon…
LaggingReplicasReassignmentGoal
Sometimes replicas stop fetching and stop updating their state in ZK for no apparent reason. So far we have only seen this happen during rolling restarts, but since the source of the bug is still unknown, other scenarios are possible as well.
This is problematic because the replica stays out of the ISR until we manually reassign the partition to a broker (even the same one).
This release adds a new LaggingReplicaReassignmentGoal that tracks such replicas and reassigns them to the same brokers once MAX_LAGGING_REPLICA_REASSIGN_MS (default 30 minutes) is reached; a sketch of this tracking logic follows below.
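A rough sketch of the lagging-replica tracking described above, assuming the goal keeps a map from partition to the time it was first observed lagging; the class and method names here are hypothetical, and only MAX_LAGGING_REPLICA_REASSIGN_MS comes from the release note.

```java
import java.util.HashMap;
import java.util.Map;

public class LaggingReplicaTrackerSketch {
  private static final long MAX_LAGGING_REPLICA_REASSIGN_MS = 30 * 60 * 1000L; // 30 minutes

  // Partition -> first time it was observed with a replica out of ISR.
  private final Map<String, Long> laggingSinceMs = new HashMap<>();

  // Called on each metadata refresh with a flag per partition: does it have a lagging replica?
  public void onMetadataRefresh(Map<String, Boolean> partitionHasLaggingReplica, long nowMs) {
    for (Map.Entry<String, Boolean> e : partitionHasLaggingReplica.entrySet()) {
      if (e.getValue()) {
        laggingSinceMs.putIfAbsent(e.getKey(), nowMs);
      } else {
        laggingSinceMs.remove(e.getKey()); // replica caught up, stop tracking it
      }
    }
  }

  // Reassign partitions to their current brokers once they have lagged past the threshold.
  public void maybeReassignLaggingReplicas(long nowMs) {
    laggingSinceMs.entrySet().removeIf(e -> {
      boolean overdue = nowMs - e.getValue() >= MAX_LAGGING_REPLICA_REASSIGN_MS;
      if (overdue) {
        // Placeholder: a real goal would submit a reassignment to the same brokers here.
        System.out.println("Reassigning " + e.getKey() + " to its current brokers");
      }
      return overdue;
    });
  }
}
```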
Handle limbo-state partitionReassignments
feat: clean up stuck partitionReassignments. Sometimes an active partition reassignment goes into a limbo state because the destination and source brokers go offline at the same time. When this happens, the partitionReassignment stays stuck in Kafka until it is manually cleared, and as a result CC stops reacting to any anomalies, broker failures, etc. This commit detects and fixes such stuck active partitionReassignments; a hedged sketch of the approach follows below.
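This is not the Cruise Control code, only an illustration of how such a cleanup could look with the plain Kafka Admin client; the stuck-detection heuristic ("no adding replica lives on an alive broker") and the bootstrap address are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.clients.admin.PartitionReassignment;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;

public class StuckReassignmentCleanerSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumption: local cluster

    try (Admin admin = Admin.create(props)) {
      // Brokers that are currently alive according to the cluster metadata.
      Set<Integer> aliveBrokers = admin.describeCluster().nodes().get().stream()
          .map(Node::id)
          .collect(Collectors.toSet());

      // All reassignments Kafka currently considers active.
      Map<TopicPartition, PartitionReassignment> active =
          admin.listPartitionReassignments().reassignments().get();

      // Heuristic: a reassignment is stuck if none of its adding replicas is on an alive broker.
      Map<TopicPartition, Optional<NewPartitionReassignment>> cancellations = new HashMap<>();
      for (Map.Entry<TopicPartition, PartitionReassignment> e : active.entrySet()) {
        boolean anyAddingAlive =
            e.getValue().addingReplicas().stream().anyMatch(aliveBrokers::contains);
        if (!anyAddingAlive) {
          cancellations.put(e.getKey(), Optional.empty()); // empty value cancels the reassignment
        }
      }

      if (!cancellations.isEmpty()) {
        admin.alterPartitionReassignments(cancellations).all().get();
      }
    }
  }
}
```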
Optimize maybeUpdateReplicationFactor
optimize replica updates:
- for an increase, just add new replicas
- for a decrease (by k replicas), remove the last k replicas
A sketch of this adjustment is shown below.
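A small illustration of the increase/decrease logic; the names here are hypothetical, and the real maybeUpdateReplicationFactor operates on Cruise Control's cluster model rather than a raw replica list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReplicationFactorAdjusterSketch {
  // Adjust a partition's replica list to the target replication factor:
  // increase by appending brokers that do not already host a replica,
  // decrease by dropping the last k replicas, leaving the rest untouched.
  public static List<Integer> adjustReplicas(List<Integer> current, int targetRf,
                                             List<Integer> candidateBrokers) {
    List<Integer> updated = new ArrayList<>(current);
    if (targetRf > updated.size()) {
      for (Integer broker : candidateBrokers) {
        if (updated.size() == targetRf) {
          break;
        }
        if (!updated.contains(broker)) {
          updated.add(broker);
        }
      }
    } else {
      while (updated.size() > targetRf) {
        updated.remove(updated.size() - 1); // drop the last replica, keep the leader at index 0
      }
    }
    return updated;
  }

  public static void main(String[] args) {
    System.out.println(adjustReplicas(Arrays.asList(1, 2), 3, Arrays.asList(1, 2, 3, 4))); // [1, 2, 3]
    System.out.println(adjustReplicas(Arrays.asList(1, 2, 3), 2, Collections.emptyList())); // [1, 2]
  }
}
```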
Sample Store RF fixes
sample store replication fixes:
- fix sample.store.replication.factor not being set
- add maybeUpdateReplicationFactor to ensure sample.store.replication.factor is honored
Change sample store RF to 3
change DEFAULT_SAMPLE_STORE_TOPIC_REPLICATION_FACTOR to 3 (to be consistent with the ND platform minimum-replicas policy). TODO: fix sample.store.topic.replication.factor not being set. A config example follows below.
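For reference, a hedged example of pinning the sample store replication factor explicitly in cruisecontrol.properties; the property name matches the one cited in the note above, and the value is deployment-specific.

```properties
# Replication factor for the Kafka topics that back the sample store.
sample.store.topic.replication.factor=3
```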