[gql] Increase EPOCH_DURATION_MS to reduce tests interacting with reconfiguration #20692

Merged on Dec 26, 2024 (3 commits)

wlmyng (Contributor) commented Dec 19, 2024

Description

When we submit a transaction to the test cluster while the nodes are reconfiguring for the new epoch, the transaction will likely time out. This causes several tests to flake frequently.
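
To illustrate why a short epoch makes this likely, here is a back-of-the-envelope sketch in Rust. It assumes a hypothetical fixed-length reconfiguration window at the tail of every epoch; the 2s window used in the note below is illustrative, not measured, and both function names are made up for this example.

```rust
/// Fraction of wall-clock time spent reconfiguring, ASSUMING (hypothetically)
/// that each epoch ends with a fixed-length reconfiguration window during
/// which submitted transactions stall.
fn reconfig_fraction(epoch_duration_ms: u64, reconfig_window_ms: u64) -> f64 {
    assert!(epoch_duration_ms > 0, "epoch duration must be non-zero");
    // The window can never be longer than the epoch itself.
    let window = reconfig_window_ms.min(epoch_duration_ms);
    window as f64 / epoch_duration_ms as f64
}

/// Counts how many of `n` transactions, submitted every `interval_ms`
/// starting at t = 0, land inside a reconfiguration window.
fn stalled_submissions(
    n: u64,
    interval_ms: u64,
    epoch_duration_ms: u64,
    reconfig_window_ms: u64,
) -> u64 {
    assert!(epoch_duration_ms > 0, "epoch duration must be non-zero");
    // Offset within an epoch at which the (assumed) window begins.
    let window_start = epoch_duration_ms.saturating_sub(reconfig_window_ms);
    (0..n)
        .map(|i| i * interval_ms)
        // A submission stalls if its offset within the epoch is in the tail window.
        .filter(|t| t % epoch_duration_ms >= window_start)
        .count() as u64
}
```

Under these assumed numbers, `stalled_submissions(60, 1_000, 10_000, 2_000)` counts 12 of 60 once-per-second submissions inside a window with a 10s epoch, while the same call with a 300s epoch (`300_000`) counts none during the first minute.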

Test plan

Flaky tests pass without multiple retries.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

…sactions to timeout? noticed that fund_address_and_return_gas was frequently timing out, but this is a test function that is used rather frequently elsewhere.
@wlmyng wlmyng force-pushed the gql-epoch-reconfiguration-causing-flakiness-maybe branch from 72916c0 to e9baae9 Compare December 23, 2024 23:44
@wlmyng wlmyng changed the title perhaps its just a matter of the epoch reconfiguring and causing tran… [gql] Increase EPOCH_DURATION_MS to reduce tests interacting with reconfiguration Dec 24, 2024
@wlmyng wlmyng marked this pull request as ready for review December 24, 2024 18:11
stefan-mysten (Contributor) commented:

Thanks @wlmyng. No expert here, but I was wondering if it wouldn't make sense to raise the epoch duration to around 60s, instead of requiring force_reconfiguration.

Also, there might have been a reason as to why the duration was set to 10s. Do you happen to know why?

wlmyng (Contributor, Author) commented Dec 24, 2024

> Thanks @wlmyng. No expert here, but I was wondering if it wouldn't make sense to raise the epoch duration to around 60s, instead of requiring force_reconfiguration.
>
> Also, there might have been a reason as to why the duration was set to 10s. Do you happen to know why?

I've actually bumped the epoch duration to 300s, but any indefinitely long duration would work.

I think the original idea was to keep EPOCH_DURATION_MS short for testing. In practice, though, transactions end up waiting for reconfiguration, which may even happen back to back, and so they time out. Instead, we should set the epoch duration to some indefinitely long amount, and users of TestCluster should advance epochs explicitly with trigger_reconfiguration.

Note that this only affects the TestCluster instantiation in sui-graphql-rpc, and doesn't change the defaults from the test cluster crate. In a follow-up, I plan to expose a way to adjust this per instantiation in the graphql crate.
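
As a rough illustration of a long default with a per-instantiation override, here is a minimal, self-contained sketch. Every name in it (`ClusterConfig`, `SimCluster`, `with_epoch_duration_ms`, `advance_time`, and the toy `trigger_reconfiguration` method) is hypothetical and is not the sui-graphql-rpc or test-cluster API. The toy cluster only models epoch bookkeeping, so it shows the intended behavior rather than an implementation.

```rust
/// Default used by this sketch, mirroring the 300s value mentioned above.
const DEFAULT_EPOCH_DURATION_MS: u64 = 300_000;

/// Hypothetical per-instantiation settings for a test cluster.
#[derive(Debug, Clone, PartialEq)]
struct ClusterConfig {
    epoch_duration_ms: u64,
}

impl Default for ClusterConfig {
    fn default() -> Self {
        Self { epoch_duration_ms: DEFAULT_EPOCH_DURATION_MS }
    }
}

impl ClusterConfig {
    /// Override the epoch duration for a single test that needs it.
    fn with_epoch_duration_ms(mut self, ms: u64) -> Self {
        self.epoch_duration_ms = ms;
        self
    }
}

/// Toy stand-in for a cluster: tracks only the epoch number and a timer.
struct SimCluster {
    config: ClusterConfig,
    epoch: u64,
    elapsed_in_epoch_ms: u64,
}

impl SimCluster {
    fn new(config: ClusterConfig) -> Self {
        assert!(config.epoch_duration_ms > 0, "epoch duration must be non-zero");
        Self { config, epoch: 0, elapsed_in_epoch_ms: 0 }
    }

    /// Simulate time passing; the epoch rolls over whenever the timer expires.
    fn advance_time(&mut self, ms: u64) {
        self.elapsed_in_epoch_ms += ms;
        while self.elapsed_in_epoch_ms >= self.config.epoch_duration_ms {
            self.elapsed_in_epoch_ms -= self.config.epoch_duration_ms;
            self.epoch += 1;
        }
    }

    /// Explicitly move to the next epoch, independent of the timer.
    fn trigger_reconfiguration(&mut self) {
        self.epoch += 1;
        self.elapsed_in_epoch_ms = 0;
    }
}
```

With the default config, a simulated minute of test activity never crosses an epoch boundary, and the epoch changes only when the test asks for it. A test that does want timer-driven epochs can opt in with `ClusterConfig::default().with_epoch_duration_ms(10_000)`.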

And I think it has been set to 10s basically since its inception. I think the last relevant change was 14 months ago, from me (ignoring Brandon's pg refactor).

@wlmyng wlmyng merged commit 0441f76 into main Dec 26, 2024
85 of 90 checks passed
@wlmyng wlmyng deleted the gql-epoch-reconfiguration-causing-flakiness-maybe branch December 26, 2024 17:41