Forge: mark pods as non-evictable #15554

Merged: sionescu merged 1 commit into main from stelian/forge-evictions on Dec 10, 2024

Conversation

@sionescu (Contributor) commented on Dec 10, 2024

Description

Add an annotation that prevents the Forge runner pod itself, as well as the validator and validator fullnode pods, from being evicted by the cluster autoscaler.
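
As a rough illustration of the kind of change described above: the PR text does not name the annotation, so this sketch assumes the standard Cluster Autoscaler annotation `cluster-autoscaler.kubernetes.io/safe-to-evict` set to `"false"`. The helper `mark_non_evictable` and the pod name are hypothetical and are not taken from the Forge code.

```python
import copy

# Assumption: the standard Cluster Autoscaler annotation key.
# The PR description does not state which key was actually used.
SAFE_TO_EVICT_KEY = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def mark_non_evictable(pod_manifest: dict) -> dict:
    """Return a copy of a pod manifest annotated as not safe to evict."""
    manifest = copy.deepcopy(pod_manifest)
    # Tolerate manifests where metadata or annotations is missing or null.
    metadata = manifest.get("metadata") or {}
    manifest["metadata"] = metadata
    annotations = metadata.get("annotations") or {}
    metadata["annotations"] = annotations
    # Annotation values are strings in Kubernetes, hence "false" and not False.
    annotations[SAFE_TO_EVICT_KEY] = "false"
    return manifest


# Hypothetical pod manifest, for illustration only.
pod = {"metadata": {"name": "example-validator-0"}, "spec": {}}
annotated = mark_non_evictable(pod)
print(annotated["metadata"]["annotations"])
```

The function copies its input rather than mutating it, so the same base manifest could be reused for pods that should remain evictable.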

Test plan

Make sure the Forge tests pass, and check the GitHub Actions (GHA) logs to confirm that the added annotation is being applied.
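
The PR verifies the annotation by reading the GHA logs. As an alternative, a programmatic check might look like the sketch below. It assumes the pod list is available as JSON in the shape produced by `kubectl get pods -o json` (a list object with an `items` array) and that the annotation is the standard Cluster Autoscaler key, which the PR text does not confirm. The function name and the sample pods are hypothetical.

```python
import json
from typing import List

# Assumption: the standard Cluster Autoscaler annotation key.
SAFE_TO_EVICT_KEY = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def pods_missing_annotation(pod_list_json: str) -> List[str]:
    """Return names of pods that are not annotated safe-to-evict="false"."""
    pod_list = json.loads(pod_list_json)
    missing = []
    for pod in pod_list.get("items", []):
        metadata = pod.get("metadata") or {}
        annotations = metadata.get("annotations") or {}
        if annotations.get(SAFE_TO_EVICT_KEY) != "false":
            missing.append(metadata.get("name", "<unnamed>"))
    return missing


# Hand-written sample input with hypothetical pod names.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "example-validator-0",
                      "annotations": {SAFE_TO_EVICT_KEY: "false"}}},
        {"metadata": {"name": "example-fullnode-0"}},
    ]
})
print(pods_missing_annotation(sample))
```

An empty result would indicate that every listed pod carries the annotation.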

@sionescu requested a review from @geekflyer on December 10, 2024 20:46

trunk-io bot commented Dec 10, 2024

⏱️ 2h 56m total CI duration on this PR

| Slowest 15 Jobs | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| execution-performance / single-node-performance | 47m | 🟩🟩🟩 |
| forge-compat-test / forge | 13m | 🟩 |
| rust-move-tests | 13m | 🟩 |
| rust-move-tests | 13m | 🟩 |
| forge-e2e-test / forge | 11m | 🟩 |
| execution-performance / test-target-determinator | 11m | 🟩🟩🟩 |
| test-target-determinator | 11m | 🟩🟩🟩 |
| check | 11m | 🟩🟩🟩 |
| rust-move-tests | 7m | |
| rust-cargo-deny | 7m | 🟩🟩🟩 |
| check-dynamic-deps | 5m | 🟩🟩🟩🟩 |
| rust-doc-tests | 5m | 🟩 |
| rust-doc-tests | 5m | 🟩 |
| rust-doc-tests | 5m | 🟩 |
| fetch-last-released-docker-image-tag | 5m | 🟩🟩🟩 |

🚨 2 jobs on the last run were significantly faster/slower than expected

| Job | Duration | vs 7d avg | Delta |
| --- | --- | --- | --- |
| execution-performance / single-node-performance | 23m | 15m | +52% |
| execution-performance / test-target-determinator | 3m | 4m | -26% |

@sionescu added the CICD:run-e2e-tests label (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR) on Dec 10, 2024
@sionescu force-pushed the stelian/forge-evictions branch from e63f606 to ea8c83e on December 10, 2024 20:55
@sionescu changed the title from "Forge: mark validator pods as non-evictable" to "Forge: mark pods as non-evictable" on Dec 10, 2024


@perryjrandall (Contributor) commented

Please write a test plan


@sionescu (Contributor, Author) commented

I checked the GHA logs: both the runner and the validator pods had the correct annotation. The Forge tests also passed.

@sionescu enabled auto-merge (rebase) on December 10, 2024 22:53

✅ Forge suite compat success on 3c6e693a27339e73520f41030dce8fc9cd504967 ==> ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4 (PR)
1. Check liveness of validators at old version: 3c6e693a27339e73520f41030dce8fc9cd504967
compatibility::simple-validator-upgrade::liveness-check : committed: 16598.55 txn/s, latency: 2052.03 ms, (p50: 2100 ms, p70: 2200, p90: 2400 ms, p99: 2800 ms), latency samples: 533240
2. Upgrading first Validator to new version: ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7065.38 txn/s, latency: 4015.96 ms, (p50: 4600 ms, p70: 4700, p90: 4900 ms, p99: 5000 ms), latency samples: 131380
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7138.93 txn/s, latency: 4573.91 ms, (p50: 4900 ms, p70: 4900, p90: 5100 ms, p99: 5500 ms), latency samples: 243780
3. Upgrading rest of first batch to new version: ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7007.73 txn/s, latency: 4026.20 ms, (p50: 4600 ms, p70: 4800, p90: 5100 ms, p99: 5300 ms), latency samples: 128020
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7350.53 txn/s, latency: 4450.88 ms, (p50: 4800 ms, p70: 4900, p90: 5000 ms, p99: 5300 ms), latency samples: 247680
4. upgrading second batch to new version: ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 13069.16 txn/s, latency: 2107.48 ms, (p50: 2300 ms, p70: 2400, p90: 2500 ms, p99: 2600 ms), latency samples: 226800
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 13576.66 txn/s, latency: 2336.47 ms, (p50: 2400 ms, p70: 2500, p90: 2500 ms, p99: 2600 ms), latency samples: 437160
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4 passed
Test Ok
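
The result lines above follow a regular shape, so the throughput and latency figures can be extracted for side-by-side comparison. The following is a minimal sketch that assumes only the line format visible in this comment. `parse_forge_line` is a hypothetical helper and is not part of Forge.

```python
import re


def parse_forge_line(line: str) -> dict:
    """Extract committed throughput, mean latency, and percentiles from a result line."""
    committed = re.search(r"committed: ([\d.]+) txn/s", line)
    latency = re.search(r"latency: ([\d.]+) ms", line)
    if committed is None or latency is None:
        raise ValueError("line does not look like a Forge result line")
    # Percentiles appear as "p50: 2100 ms" or "p70: 2200" (the unit is sometimes omitted).
    percentiles = {
        f"p{name}": int(value)
        for name, value in re.findall(r"\bp(\d+): (\d+)", line)
    }
    return {
        "committed_tps": float(committed.group(1)),
        "latency_ms": float(latency.group(1)),
        "percentiles_ms": percentiles,
    }


# Two lines copied from the compatibility results above.
baseline = parse_forge_line(
    "compatibility::simple-validator-upgrade::liveness-check : committed: 16598.55 txn/s, "
    "latency: 2052.03 ms, (p50: 2100 ms, p70: 2200, p90: 2400 ms, p99: 2800 ms), "
    "latency samples: 533240"
)
upgrading = parse_forge_line(
    "compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7065.38 txn/s, "
    "latency: 4015.96 ms, (p50: 4600 ms, p70: 4700, p90: 4900 ms, p99: 5000 ms), "
    "latency samples: 131380"
)
# Throughput while the first validator upgrades, relative to the old-version baseline.
print(round(upgrading["committed_tps"] / baseline["committed_tps"], 2))
```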

✅ Forge suite realistic_env_max_load success on ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4

two traffics test: inner traffic : committed: 14664.86 txn/s, latency: 2708.53 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3300 ms), latency samples: 5575960
two traffics test : committed: 100.05 txn/s, latency: 1434.24 ms, (p50: 1400 ms, p70: 1500, p90: 1600 ms, p99: 1800 ms), latency samples: 1760
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.537, avg: 1.481", "ConsensusProposalToOrdered: max: 0.319, avg: 0.293", "ConsensusOrderedToCommit: max: 0.400, avg: 0.388", "ConsensusProposalToCommit: max: 0.691, avg: 0.681"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 1.28s no progress at version 23296 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.63s no progress at version 2174601 (avg 0.63s) [limit 16].
Test Ok

✅ Forge suite framework_upgrade success on 3c6e693a27339e73520f41030dce8fc9cd504967 ==> ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4 (PR)
Upgrade the nodes to version: ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1489.20 txn/s, submitted: 1492.29 txn/s, failed submission: 3.09 txn/s, expired: 3.09 txn/s, latency: 2050.89 ms, (p50: 2100 ms, p70: 2300, p90: 2500 ms, p99: 3900 ms), latency samples: 134980
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1333.95 txn/s, submitted: 1337.22 txn/s, failed submission: 3.28 txn/s, expired: 3.28 txn/s, latency: 2187.50 ms, (p50: 2100 ms, p70: 2400, p90: 3100 ms, p99: 4800 ms), latency samples: 122180
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4 passed
Upgrade the remaining nodes to version: ea8c83e3fed269dcf8296d9e7d9a4f9cbcfe01d4
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1407.27 txn/s, submitted: 1410.11 txn/s, failed submission: 2.84 txn/s, expired: 2.84 txn/s, latency: 2258.21 ms, (p50: 2100 ms, p70: 2400, p90: 3000 ms, p99: 4300 ms), latency samples: 118720
Test Ok

@sionescu merged commit 807a2db into main on Dec 10, 2024
93 checks passed
@sionescu deleted the stelian/forge-evictions branch on December 10, 2024 23:21