Deflake total sum of full synchronizations #622

Open: wants to merge 1 commit into base: unstable

naglera
Contributor

@naglera naglera commented Jun 10, 2024

Stabilize the test "PSYNC2: total sum of full synchronizations at least 4".

Explanation: During the preceding test, "PSYNC2: generate load while killing replication links", load is generated on the master while the replica's connection is repeatedly killed. On a busy machine, this load can force the replica to perform a full sync, because there is no guarantee that the replica will find the bytes it needs in the COB.
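
To illustrate the failure mode described above, here is a minimal sketch of a bounded buffer that retains only the most recent replication bytes. It is a toy model, not the server's implementation: the `Backlog` class, its `capacity`, and the byte counts are all hypothetical, and the real buffering is more involved. The point is only that a reconnecting replica can resume from its offset if the bytes it missed are still retained, and otherwise has to fall back to a full sync.

```python
from dataclasses import dataclass


@dataclass
class Backlog:
    """Toy model of a bounded buffer holding the most recent replication bytes."""

    capacity: int        # hypothetical buffer size in bytes
    end_offset: int = 0  # replication offset of the next byte to be written

    def write(self, nbytes: int) -> None:
        # New writes advance the offset; older bytes are implicitly dropped
        # once more than `capacity` bytes have been written.
        self.end_offset += nbytes

    @property
    def start_offset(self) -> int:
        # Oldest replication offset that is still retained.
        return max(0, self.end_offset - self.capacity)

    def sync_kind(self, replica_offset: int) -> str:
        # A partial resync is possible only if every byte the replica
        # is missing is still inside the retained window.
        if self.start_offset <= replica_offset <= self.end_offset:
            return "partial"
        return "full"


backlog = Backlog(capacity=1000)
backlog.write(500)
replica_offset = backlog.end_offset  # replica is caught up, then its link is killed

backlog.write(400)                   # light load while the replica is disconnected
light = backlog.sync_kind(replica_offset)

backlog.write(2000)                  # heavy load while the replica is disconnected
heavy = backlog.sync_kind(replica_offset)

print(light, heavy)
```

In this model the outcome depends only on how many bytes are written while the replica is disconnected, which is why a slow or busy machine can change the number of full syncs the test observes.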

Deflake test "PSYNC2: total sum of full synchronizations at least 4"
Explanation: during the prior test "PSYNC2: generate load while killing
replication links", load is generated on the master, and the replica's
connection is killed multiple times. On a busy machine this load can force
the replica to perform a full sync. There is no guarantee that the replicas
will find the necessary bytes in the COB.

Signed-off-by: naglera <[email protected]>
@zuiderkwast
Contributor

Have you seen this fail recently in Daily? If yes, then this fix is good to include in 8.0.

@enjoy-binbin
Member

I haven't seen this test fail in a long, long time. (Oran and I hardened it a long time ago.)

@naglera
Contributor Author

naglera commented Sep 1, 2024

I don't recall whether I saw this specific test fail locally or in a PR GitHub workflow. However, the scenario where the primary no longer has the replication data needed for PSYNC is a valid possibility, especially under high load and when the connection between the master and replica is killed frequently.
Do we have reason to believe that the replica will not stay disconnected for too long in this scenario?
