Set replication factor for kafka stability #1606
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This change resolves the issue
Failed to get watermark offsets: Local: Unknown partition
. The root cause was related to the Kafka replication configuration. By setting the replicationFactor to 3 (matching the number of Kafka brokers/controllers), this fix ensures consistent behavior when retrieving high watermark offsets. This issue was reported in sentry-kubernetes/charts#1458.Technical Explanation
The issue arises because the replicationFactor was previously set to 1, meaning that each partition only had a single replica. In this configuration, the high watermark offset—a key value in Kafka that indicates the maximum offset successfully replicated to all in-sync replicas (ISRs)—becomes unreliable.
Without sufficient replication, the loss of a single broker or temporary unavailability can result in Kafka being unable to compute or provide the high watermark for affected partitions. This leads to the error:
Failed to get watermark offsets: Local: Unknown partition.
By increasing the replicationFactor from 1 to 3, each partition is replicated across all three brokers/controllers. This ensures that the high watermark offset remains consistently available, even if a broker becomes unavailable or experiences minor instability. Additionally, the increased replication enhances fault tolerance and improves the overall availability of partition data across the cluster.
For more details on how Kafka replication works and the role of the high watermark, refer to the official documentation: Replication in Apache Kafka.