Add replay verify on archive related workflows #15272
Merged
66 changes: 66 additions & 0 deletions
.github/workflows/provision-replay-verify-archive-disks.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
name: "provision-replay-verify-archive-disks" | ||
on: | ||
# Allow triggering manually | ||
workflow_dispatch: | ||
inputs: | ||
NETWORK: | ||
required: true | ||
type: choice | ||
description: The network to provision storage for. If not specified, it will provision snapshots for both testnet and mainnet. | ||
options: [testnet, mainnet, all] | ||
default: all | ||
pull_request: | ||
paths: | ||
- ".github/workflows/provision-replay-verify-archive-disks.yaml" | ||
- ".github/workflows/workflow-run-replay-verify-archive-storage-provision.yaml" | ||
schedule: | ||
- cron: "0 22 * * 1,3,5" # This runs every Mon,Wed,Fri | ||
|
||
permissions: | ||
contents: read | ||
id-token: write # required for GCP Workload Identity Federation, which we use to log in to Google Artifact Registry | ||
issues: read | ||
pull-requests: read | ||
|
||
# cancel redundant builds | ||
concurrency: | ||
# cancel redundant builds on PRs (only on PR, not on branches) | ||
group: ${{ github.workflow }}-${{ (github.event_name == 'pull_request' && github.ref) || github.sha }} | ||
cancel-in-progress: true | ||
|
||
jobs: | ||
determine-test-metadata: | ||
runs-on: ubuntu-latest | ||
steps: | ||
# checkout the repo first, so check-aptos-core can use it and cancel the workflow if necessary | ||
- uses: actions/checkout@v4 | ||
- uses: ./.github/actions/check-aptos-core | ||
with: | ||
cancel-workflow: ${{ github.event_name == 'schedule' }} # Cancel the workflow if it is scheduled on a fork | ||
|
||
- name: Debug | ||
run: | | ||
echo "Event name: ${{ github.event_name }}" | ||
echo "Network: ${{ inputs.NETWORK }}" | ||
provision-testnet: | ||
if: | | ||
github.event_name == 'schedule' || | ||
github.event_name == 'push' || | ||
github.event_name == 'workflow_dispatch' && (inputs.NETWORK == 'testnet' || inputs.NETWORK == 'all') | ||
needs: determine-test-metadata | ||
uses: ./.github/workflows/workflow-run-replay-verify-archive-storage-provision.yaml | ||
secrets: inherit | ||
with: | ||
NETWORK: testnet | ||
|
||
provision-mainnet: | ||
if: | | ||
github.event_name == 'schedule' || | ||
github.event_name == 'push' || | ||
github.event_name == 'workflow_dispatch' && (inputs.NETWORK == 'mainnet' || inputs.NETWORK == 'all') | ||
needs: determine-test-metadata | ||
uses: ./.github/workflows/workflow-run-replay-verify-archive-storage-provision.yaml | ||
secrets: inherit | ||
with: | ||
NETWORK: mainnet |
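The job-level `if:` conditions above can be read as a small predicate. The following is a minimal Python sketch of that logic, not part of the PR; `should_run` is a hypothetical name, and it assumes that `&&` binds tighter than `||` in GitHub expressions and that each job is meant to match its own network.

```python
def should_run(event_name: str, job_network: str, selected: str = "all") -> bool:
    """Mirror the job-level `if:` expression for one provisioning job.

    event_name  -- value of github.event_name
    job_network -- network the job provisions ("testnet" or "mainnet")
    selected    -- value of inputs.NETWORK on a manual run
    """
    # Scheduled and push events always run the job.
    if event_name in ("schedule", "push"):
        return True
    # The network filter applies only to manually dispatched runs.
    return event_name == "workflow_dispatch" and selected in (job_network, "all")
```

Under this reading, a `pull_request` event satisfies neither branch, so on a PR only `determine-test-metadata` runs in this workflow.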
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# This defines a workflow to replay transactions on the given chain with the latest aptos node software. | ||
# To trigger it, go to the Actions tab of the repo, click "replay-verify-on-archive", and then "Run workflow". | ||
# | ||
# On PR, a single test case will run. On workflow_dispatch, you may specify the NETWORK to verify. | ||
|
||
name: "replay-verify-on-archive" | ||
on: | ||
# Allow triggering manually | ||
workflow_dispatch: | ||
inputs: | ||
NETWORK: | ||
required: true | ||
type: choice | ||
options: [testnet, mainnet, all] | ||
default: all | ||
description: The chain name to test. If not specified, it will test both testnet and mainnet. | ||
IMAGE_TAG: | ||
required: false | ||
type: string | ||
description: The image tag of the feature branch to test. If not specified, it will use the latest commit on the current branch. | ||
START_VERSION: | ||
required: false | ||
type: string | ||
description: Optional version to start replaying. If not specified, replay-verify will determine the start version itself. | ||
END_VERSION: | ||
required: false | ||
type: string | ||
description: Optional version to end replaying. If not specified, replay-verify will determine the end version itself. | ||
pull_request: | ||
paths: | ||
- ".github/workflows/replay-verify-on-archive.yaml" | ||
- ".github/workflows/workflow-run-replay-verify-on-archive.yaml" | ||
schedule: | ||
- cron: "0 22 * * 0,2,4" # The main branch cadence. This runs every Sun,Tues,Thurs | ||
|
||
permissions: | ||
contents: read | ||
id-token: write # required for GCP Workload Identity Federation, which we use to log in to Google Artifact Registry | ||
issues: read | ||
pull-requests: read | ||
|
||
# cancel redundant builds | ||
concurrency: | ||
# cancel redundant builds on PRs (only on PR, not on branches) | ||
group: ${{ github.workflow }}-${{ (github.event_name == 'pull_request' && github.ref) || github.sha }} | ||
cancel-in-progress: true | ||
|
||
jobs: | ||
determine-test-metadata: | ||
runs-on: ubuntu-latest-32-core | ||
steps: | ||
# checkout the repo first, so check-aptos-core can use it and cancel the workflow if necessary | ||
- uses: actions/checkout@v4 | ||
- uses: ./.github/actions/check-aptos-core | ||
with: | ||
cancel-workflow: ${{ github.event_name == 'schedule' }} # Cancel the workflow if it is scheduled on a fork | ||
|
||
replay-testnet: | ||
if: | | ||
github.event_name == 'schedule' || | ||
github.event_name == 'push' || | ||
github.event_name == 'workflow_dispatch' && (inputs.NETWORK == 'testnet' || inputs.NETWORK == 'all') | ||
needs: determine-test-metadata | ||
uses: ./.github/workflows/workflow-run-replay-verify-on-archive.yaml | ||
secrets: inherit | ||
with: | ||
NETWORK: "testnet" | ||
IMAGE_TAG: ${{ inputs.IMAGE_TAG }} | ||
START_VERSION: ${{ inputs.START_VERSION }} | ||
END_VERSION: ${{ inputs.END_VERSION }} | ||
|
||
replay-mainnet: | ||
if: | | ||
github.event_name == 'schedule' || | ||
github.event_name == 'push' || | ||
github.event_name == 'pull_request' || | ||
github.event_name == 'workflow_dispatch' && (inputs.NETWORK == 'mainnet' || inputs.NETWORK == 'all') | ||
needs: determine-test-metadata | ||
uses: ./.github/workflows/workflow-run-replay-verify-on-archive.yaml | ||
secrets: inherit | ||
with: | ||
NETWORK: "mainnet" | ||
IMAGE_TAG: ${{ inputs.IMAGE_TAG }} | ||
START_VERSION: ${{ inputs.START_VERSION }} | ||
END_VERSION: ${{ inputs.END_VERSION }} |
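The `concurrency.group` expression used in these workflows can be illustrated with a short Python sketch. This is an assumption-labeled paraphrase of the expression, not code from the PR; `concurrency_group` is a hypothetical helper name.

```python
def concurrency_group(workflow: str, event_name: str, ref: str, sha: str) -> str:
    """Paraphrase of `${{ github.workflow }}-${{ (event == 'pull_request' && ref) || sha }}`."""
    # For pull requests the key is the PR ref, so a newer run on the same PR
    # lands in the same group; for every other event the key is the commit SHA.
    key = ref if (event_name == "pull_request" and ref) else sha
    return f"{workflow}-{key}"
```

With `cancel-in-progress: true`, runs that share a group cancel the older one, so in this sketch repeated pushes to one PR collapse into a single active run while scheduled runs are keyed per commit.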
61 changes: 61 additions & 0 deletions
.github/workflows/workflow-run-replay-verify-archive-storage-provision.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
name: "*run archive storage provision workflow" | ||
|
||
on: | ||
# This allows the workflow to be triggered from another workflow | ||
workflow_call: | ||
inputs: | ||
NETWORK: | ||
required: true | ||
type: string | ||
description: The network to provision storage for. | ||
workflow_dispatch: | ||
inputs: | ||
NETWORK: | ||
description: The network to provision storage for. | ||
type: string | ||
required: true | ||
jobs: | ||
provision: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Checkout code | ||
uses: actions/checkout@v4 | ||
with: | ||
ref: ${{ github.event.inputs.BRANCH || 'add_replay_verify_workflow' }} | ||
|
||
# Authenticate to Google Cloud; the project is aptos-ci | ||
- name: Authenticate to Google Cloud | ||
id: auth | ||
uses: "google-github-actions/auth@v2" | ||
with: | ||
workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }} | ||
service_account: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }} | ||
export_environment_variables: false | ||
create_credentials_file: true | ||
|
||
# This is required since we need to switch from aptos-ci to aptos-devinfra-0 | ||
- name: Setup Credentials | ||
run: | | ||
echo "GOOGLE_APPLICATION_CREDENTIALS=${{ steps.auth.outputs.credentials_file_path }}" >> $GITHUB_ENV | ||
echo "CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE=${{ steps.auth.outputs.credentials_file_path }}" >> $GITHUB_ENV | ||
echo "GOOGLE_GHA_CREDS_PATH=${{ steps.auth.outputs.credentials_file_path }}" >> $GITHUB_ENV | ||
echo "CLOUDSDK_AUTH_ACCESS_TOKEN=${{ steps.auth.outputs.access_token }}" >> $GITHUB_ENV | ||
- name: Set up Cloud SDK | ||
uses: "google-github-actions/setup-gcloud@v2" | ||
with: | ||
install_components: "kubectl, gke-gcloud-auth-plugin" | ||
|
||
- name: "Setup GCloud Project" | ||
shell: bash | ||
run: gcloud config set project aptos-devinfra-0 | ||
|
||
- uses: ./.github/actions/python-setup | ||
with: | ||
pyproject_directory: testsuite/replay-verify | ||
|
||
- name: "Provision Storage" | ||
env: | ||
GOOGLE_CLOUD_PROJECT: aptos-devinfra-0 | ||
run: cd testsuite/replay-verify && poetry run python archive_disk_utils.py --network ${{ inputs.NETWORK }} | ||
|
119 changes: 119 additions & 0 deletions
.github/workflows/workflow-run-replay-verify-on-archive.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
name: "*run replay-verify on archive reusable workflow" | ||
|
||
on: | ||
# This allows the workflow to be triggered from another workflow | ||
workflow_call: | ||
inputs: | ||
NETWORK: | ||
required: true | ||
type: string | ||
description: The network to run replay verify on. | ||
IMAGE_TAG: | ||
required: false | ||
type: string | ||
description: The image tag of the feature branch to test. If not specified, it will use the latest commit on the current branch. | ||
START_VERSION: | ||
required: false | ||
type: string | ||
description: Optional version to start replaying. If not specified, replay-verify will determine the start version itself. | ||
END_VERSION: | ||
required: false | ||
type: string | ||
description: Optional version to end replaying. If not specified, replay-verify will determine the end version itself. | ||
|
||
workflow_dispatch: | ||
inputs: | ||
NETWORK: | ||
required: true | ||
type: string | ||
description: The network to run replay verify on. | ||
IMAGE_TAG: | ||
required: false | ||
type: string | ||
description: The image tag of the feature branch to test. If not specified, it will use the latest commit on the current branch. | ||
START_VERSION: | ||
required: false | ||
type: string | ||
description: Optional version to start replaying. If not specified, replay-verify will determine the start version itself. | ||
END_VERSION: | ||
required: false | ||
type: string | ||
description: Optional version to end replaying. If not specified, replay-verify will determine the end version itself. | ||
jobs: | ||
run-replay-verify: | ||
runs-on: ubuntu-latest-32-core | ||
steps: | ||
- name: Checkout code | ||
uses: actions/checkout@v4 | ||
with: | ||
ref: ${{ github.event.inputs.BRANCH || 'add_replay_verify_workflow' }} | ||
|
||
- uses: aptos-labs/aptos-core/.github/actions/docker-setup@main | ||
id: docker-setup | ||
with: | ||
GCP_WORKLOAD_IDENTITY_PROVIDER: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }} | ||
GCP_SERVICE_ACCOUNT_EMAIL: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }} | ||
EXPORT_GCP_PROJECT_VARIABLES: "false" | ||
GIT_CREDENTIALS: ${{ secrets.GIT_CREDENTIALS }} | ||
|
||
# Authenticate to Google Cloud; the project is aptos-ci, with a credentials file generated | ||
- name: Authenticate to Google Cloud | ||
id: auth | ||
uses: "google-github-actions/auth@v2" | ||
with: | ||
workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }} | ||
service_account: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }} | ||
export_environment_variables: false | ||
create_credentials_file: true | ||
|
||
# This is required since we need to switch from aptos-ci to aptos-devinfra-0 | ||
- name: Setup credentials | ||
run: | | ||
echo "GOOGLE_APPLICATION_CREDENTIALS=${{ steps.auth.outputs.credentials_file_path }}" >> $GITHUB_ENV | ||
echo "CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE=${{ steps.auth.outputs.credentials_file_path }}" >> $GITHUB_ENV | ||
echo "GOOGLE_GHA_CREDS_PATH=${{ steps.auth.outputs.credentials_file_path }}" >> $GITHUB_ENV | ||
echo "CLOUDSDK_AUTH_ACCESS_TOKEN=${{ steps.auth.outputs.access_token }}" >> $GITHUB_ENV | ||
- name: Set up Cloud SDK | ||
uses: "google-github-actions/setup-gcloud@v2" | ||
with: | ||
install_components: "kubectl, gke-gcloud-auth-plugin" | ||
|
||
- name: "Setup GCloud project" | ||
shell: bash | ||
run: gcloud config set project aptos-devinfra-0 | ||
|
||
- uses: ./.github/actions/python-setup | ||
with: | ||
pyproject_directory: testsuite/replay-verify | ||
|
||
- name: Schedule replay verify | ||
env: | ||
GOOGLE_CLOUD_PROJECT: aptos-devinfra-0 | ||
run: | | ||
cd testsuite/replay-verify | ||
CMD="poetry run python main.py --network ${{ inputs.NETWORK }}" | ||
if [ -n "${{ inputs.START_VERSION }}" ]; then | ||
CMD="$CMD --start ${{ inputs.START_VERSION }}" | ||
fi | ||
if [ -n "${{ inputs.END_VERSION }}" ]; then | ||
CMD="$CMD --end ${{ inputs.END_VERSION }}" | ||
fi | ||
if [ -n "${{ inputs.IMAGE_TAG }}" ]; then | ||
# Assumption: main.py accepts an --image_tag flag | ||
CMD="$CMD --image_tag ${{ inputs.IMAGE_TAG }}" | ||
fi | ||
eval $CMD | ||
# In case the user manually cancels the step above, we still want to clean up the resources | ||
- name: Post-run cleanup | ||
env: | ||
GOOGLE_CLOUD_PROJECT: aptos-devinfra-0 | ||
if: ${{ always() }} | ||
run: | | ||
cd testsuite/replay-verify | ||
poetry run python main.py --network ${{ inputs.NETWORK }} --cleanup | ||
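The "Schedule replay verify" step builds its command by appending optional flags to a string and running it through `eval`. As a hedged alternative sketch, the same command can be assembled as an argument list, which avoids shell re-parsing of the inputs. `build_replay_command` is a hypothetical helper, and the `--image_tag` flag name is an assumption about `main.py`; `--network`, `--start`, and `--end` are taken from the workflow above.

```python
import shlex
from typing import List, Optional


def build_replay_command(
    network: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
    image_tag: Optional[str] = None,
) -> List[str]:
    """Assemble the replay-verify invocation, adding a flag only when its value is set."""
    argv = ["poetry", "run", "python", "main.py", "--network", network]
    if start:
        argv += ["--start", start]
    if end:
        argv += ["--end", end]
    if image_tag:
        # Assumption: main.py exposes the image tag as --image_tag.
        argv += ["--image_tag", image_tag]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted form for logging; the list itself could be passed
    # to subprocess.run(argv, check=True, cwd="testsuite/replay-verify").
    print(shlex.join(build_replay_command("testnet", start="100", end="200")))
```

Because each value occupies its own list element, an input containing spaces or shell metacharacters stays a single argument.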
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
import os | ||
import sys | ||
|
||
|
||
path = os.path.dirname(__file__) | ||
|
||
if path not in sys.path: | ||
sys.path.append(path) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
apiVersion: v1 | ||
kind: PersistentVolumeClaim | ||
metadata: | ||
annotations: | ||
volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io | ||
name: testnet-archive-claim | ||
labels: | ||
run: some-label | ||
spec: | ||
accessModes: | ||
- ReadOnlyMany | ||
resources: | ||
requests: | ||
storage: 10Ti | ||
storageClassName: ssd-data-xfs | ||
volumeMode: Filesystem | ||
dataSourceRef: | ||
name: testnet-archive | ||
kind: VolumeSnapshot | ||
apiGroup: snapshot.storage.k8s.io |
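The manifest above restores a read-only claim from a `VolumeSnapshot`. A minimal Python sketch of generating the same structure per network follows; `build_archive_pvc` is a hypothetical helper, and the `<network>-archive` / `<network>-archive-claim` naming for networks other than testnet is an assumption extrapolated from the testnet manifest.

```python
import json


def build_archive_pvc(network: str, storage: str = "10Ti", label: str = "some-label") -> dict:
    """Return a PersistentVolumeClaim manifest backed by the network's archive snapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "annotations": {
                "volume.kubernetes.io/storage-provisioner": "pd.csi.storage.gke.io"
            },
            "name": f"{network}-archive-claim",
            "labels": {"run": label},
        },
        "spec": {
            # ReadOnlyMany lets several replay workers mount the same disk.
            "accessModes": ["ReadOnlyMany"],
            "resources": {"requests": {"storage": storage}},
            "storageClassName": "ssd-data-xfs",
            "volumeMode": "Filesystem",
            # The claim is populated from an existing snapshot rather than created empty.
            "dataSourceRef": {
                "name": f"{network}-archive",
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_archive_pvc("testnet"), indent=2))
```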
Is a timeout treated as Ok here rather than as an error?
The timeout here forces the worker to stop replaying and return the existing results immediately. Returning an error would discard all the results, which would be a waste.
Why not use something like tokio::time::timeout here instead of making our own timeout?
If we don't enforce a timeout on a future using tokio (which also has its own caveats), then it seems like this might never actually time out here?
The process will stop because it runs in small batches, and timeout checks occur per batch. I wrote this solution to address the issue of handling some long-running ranges, such as graffio transactions. It's not crucial for me that it stops exactly at the timeout; a few minutes later is acceptable. My main goal is to save the results from whatever has been replayed so that we can have partial results of these transactions.
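The per-batch timeout described in this reply can be sketched as a cooperative deadline check. The PR's actual implementation is Rust and is not shown on this page, so the following Python is only an illustration of the idea; `replay_with_deadline`, `replay_batch`, and the injectable `clock` are hypothetical names.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

B = TypeVar("B")
R = TypeVar("R")


def replay_with_deadline(
    batches: Iterable[B],
    replay_batch: Callable[[B], R],
    timeout_secs: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[R], bool]:
    """Replay batches until done or until the deadline passes.

    Returns (results_so_far, timed_out). The deadline is checked only between
    batches, so a run may overshoot by up to one batch, but partial results
    are always kept instead of being discarded.
    """
    deadline = clock() + timeout_secs
    results: List[R] = []
    for batch in batches:
        if clock() >= deadline:
            return results, True  # stop early, keep what has been replayed
        results.append(replay_batch(batch))
    return results, False
```

Unlike wrapping the whole operation in a hard timeout, this approach never interrupts a batch mid-flight, which is why the stop can arrive somewhat later than the configured limit.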