[WIP] Scheduled try-state checks and alerting #1938

Closed
wants to merge 31 commits into from


liamaharon
Contributor

@liamaharon liamaharon commented Oct 19, 2023

Schedule a GitHub Action every day to run try-runtime try-state checks on all runtimes.

This is a 0 -> 1 task to get our try-state checks deployed at all. In the future, it would be better for us to move these checks to dedicated infrastructure that can run them more frequently, perhaps using follow-chain.
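Below is a minimal sketch of the command a scheduled job's script step might assemble for each runtime. It assumes the `try-runtime` CLI (paritytech/try-runtime-cli) with its `--runtime` option, the `execute-block` subcommand, its `--try-state` selector, and the `live --uri` state source. The runtime names, Wasm paths, and RPC endpoints are hypothetical placeholders, and the flags should be checked against `try-runtime --help` for the version in use.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeTarget:
    """One runtime to check. All values used below are hypothetical placeholders."""
    name: str
    wasm_path: str  # Wasm blob built with the `try-runtime` feature enabled
    uri: str        # RPC endpoint of a node on the live chain


TARGETS = [
    RuntimeTarget("polkadot", "runtimes/polkadot_runtime.wasm", "wss://polkadot-rpc.example:443"),
    RuntimeTarget("kusama", "runtimes/kusama_runtime.wasm", "wss://kusama-rpc.example:443"),
]


def try_state_command(target: RuntimeTarget, try_state: str = "all") -> list[str]:
    """Build argv that executes a block fetched from the live chain with try-state checks."""
    return [
        "try-runtime",
        "--runtime", target.wasm_path,
        "execute-block",
        "--try-state", try_state,
        "live",
        "--uri", target.uri,
    ]


if __name__ == "__main__":
    for target in TARGETS:
        # A CI step could run this with subprocess.run(cmd, check=True) and raise an
        # alert when the process exits non-zero (e.g. a failed try-state check).
        print(target.name, shlex.join(try_state_command(target)))
```

Keeping the command builder separate from execution makes it easy for another team to reuse the same job with its own list of runtimes and endpoints.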

Task list

@liamaharon liamaharon added the T10-tests This PR/Issue is related to tests. label Oct 19, 2023
@liamaharon liamaharon requested review from a team as code owners October 19, 2023 01:22
@paritytech-ci paritytech-ci requested a review from a team October 19, 2023 09:54
@bkchr
Member

bkchr commented Oct 20, 2023

What is the purpose of running it every hour?

@liamaharon
Contributor Author

> What is the purpose of running it every hour?

To notify us asap in the case of any storage invariants breaking.

@bkchr
Member

bkchr commented Oct 20, 2023

> > What is the purpose of running it every hour?

> To notify us asap in the case of any storage invariants breaking.

Shouldn't this be some continuous job?

@ggwpez
Member

ggwpez commented Oct 20, 2023

> Shouldn't this be some continuous job?

I think follow-chain should check the storage invariants, or?

@liamaharon
Contributor Author

liamaharon commented Oct 21, 2023

> > Shouldn't this be some continuous job?

> I think follow-chain should check the storage invariants, or?

Ideally, yes we would have dedicated infrastructure that runs follow-chain to check these invariants.

In practice, using follow-chain has some drawbacks:

  • It requires high up-front and ongoing human and capital cost to run and monitor infrastructure for every chain and keep it up to date with the latest runtimes.
  • It couldn't actually run checks 'continuously' anyway. The checks are too expensive to run every block, so they would be executed in round-robin mode, checking a subset of pallets each block.
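To make the round-robin trade-off concrete, here is a simplified Python model of per-block pallet selection. It reflects our reading of FRAME's `TryStateSelect::RoundRobin(n)`, which starts at an offset of the block number modulo the number of pallets and checks `n` pallets cyclically from there; treat the exact semantics as an assumption, and note that the pallet names are made up for illustration.

```python
from itertools import cycle, islice


def round_robin_pallets(pallets: list[str], block_number: int, per_block: int) -> list[str]:
    """Return the pallets whose try-state hooks would run at `block_number`.

    The start offset advances by one pallet per block and the window wraps around,
    so each pallet is checked in `per_block` consecutive blocks out of every
    len(pallets) blocks (assuming per_block <= len(pallets)).
    """
    if not pallets or per_block <= 0:
        return []
    start = block_number % len(pallets)
    return list(islice(cycle(pallets), start, start + per_block))


if __name__ == "__main__":
    pallets = ["Balances", "Staking", "Treasury", "Bounties"]  # hypothetical runtime
    for block in range(5):
        print(block, round_robin_pallets(pallets, block, per_block=2))
```

Under this model a single block never checks the whole runtime unless `per_block` covers every pallet, which is exactly the cost saving that makes per-block checking feasible.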

Internally, follow-chain does the same thing as execute-block. The benefits of using execute-block directly are:

  • The job can be defined as a GitHub Action.
    • Trivial for Fellowship or parachain teams to replicate these checks in their own repos for their own runtimes (far lower initial and ongoing cost than spinning up infra and CI/CD for follow-chain)
    • Trivial for all developers to monitor and maintain the job, rather than requiring another (infrastructure) team to maintain, monitor, and manage access to infra
  • With artifact caching we can run it much more frequently than once an hour, so we wouldn't be notified of issues much later than with follow-chain.
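The notification-latency argument can be checked with back-of-the-envelope arithmetic. This sketch compares the worst case for a job scheduled every `interval_secs` that takes `run_secs` to finish against per-block round-robin checks, using a simplified round-robin model in which the start offset advances one pallet per block (an assumption, not a measured property of follow-chain). Every input is a parameter to fill in with real figures, not a claim about any particular chain.

```python
def scheduled_worst_case_secs(interval_secs: float, run_secs: float) -> float:
    """Worst-case delay for a scheduled job.

    If an invariant breaks just after one run has read chain state, the break is only
    seen by the next run, which starts `interval_secs` later and needs `run_secs` to finish.
    """
    return interval_secs + run_secs


def round_robin_worst_case_secs(pallet_count: int, per_block: int, block_time_secs: float) -> float:
    """Worst-case delay for per-block round-robin checks.

    With the start offset advancing one pallet per block, a given pallet can go
    pallet_count - per_block + 1 blocks between checks (at least one block).
    """
    gap_blocks = max(pallet_count - per_block + 1, 1)
    return gap_blocks * block_time_secs
```

For example, `scheduled_worst_case_secs(3600, 600)` returns 4200, i.e. 70 minutes for an hourly job that takes ten minutes; running the job more often (which artifact caching makes cheaper) shrinks the first term directly.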

For these reasons, I suggested this approach for checking storage invariants to Kian, and he agreed. My backlog is pretty large, so this option is appealing: it lets us quickly ship something that works and, as far as we can tell, has no significant drawbacks compared with dedicated infra running follow-chain (which does have some significant drawbacks). If we've missed something, let us know.

@liamaharon liamaharon changed the title [WIP / DNM] Scheduled try-state checks and alerting [WIP] Scheduled try-state checks and alerting Oct 26, 2023
@liamaharon liamaharon added R0-silent Changes should not be mentioned in any release notes and removed T10-tests This PR/Issue is related to tests. labels Oct 27, 2023
Member

@ggwpez ggwpez left a comment

nice

- cron: "0 0 * * *"

env:
# TODO: Replace this with paritytech/polkadot-sdk 'latest' release URL once 1.3.0 is published
Member

Can do now?

Contributor Author

1.3.0 did not include the GitHub workflow that builds the runtimes, so we need to wait one more release for that to happen automatically. In the meantime, I'm inclined to upload the debug builds there manually so we can use it.

@liamaharon
Contributor Author

Now that we're running full try-state checks in CI, I'm questioning how valuable this cron job will be.

I'm looking into some alternatives that will hopefully provide more value and be more accessible to ecosystem chains, without requiring them to spin up dedicated CI infra.

@bkchr bkchr closed this Jul 17, 2024
Labels
R0-silent Changes should not be mentioned in any release notes

4 participants