Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution backpressure: reduce target 250 -> 200ms #15220

Merged
merged 2 commits into from
Nov 13, 2024

Conversation

bchocho
Copy link
Contributor

@bchocho bchocho commented Nov 7, 2024

Description

Both forge realistic env and mainnet get around 5 blocks/s so a target of 200 ms seems reasonable. Reducing the target should help latency at higher loads.

How Has This Been Tested?

Forge tests, in particular realistic_env_fairness_workload_sweep

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

Copy link

trunk-io bot commented Nov 7, 2024

@bchocho bchocho added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Nov 7, 2024

This comment has been minimized.

This comment has been minimized.

@bchocho bchocho marked this pull request as ready for review November 7, 2024 02:52
@bchocho bchocho requested a review from igor-aptos November 7, 2024 02:52
@bchocho
Copy link
Contributor Author

bchocho commented Nov 7, 2024

@igor-aptos I ran the fairness test. The latency marginally improves, but the TPS for load that previously spent almost 100% of time on execution does go down. I feel it's an OK tradeoff for now, but would like to hear your thoughts as well.
https://github.com/aptos-labs/aptos-core/actions/runs/11714631180/attempts/1

@igor-aptos igor-aptos requested a review from sitalkedia November 7, 2024 03:23
Copy link
Contributor

@igor-aptos igor-aptos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order->commit latency looks quite a bit better (1.2s -> 0.95s), the rest will get lost in the noise.

we should potentially check the impact on throughput to the land blocking one based on history - if that is good - we can proceed..

unrelated, should we be increasing the threshold for landblocking? it is 14k even here, and threshold is 10k

@@ -197,7 +197,7 @@ impl Default for ConsensusConfig {
num_blocks_to_look_at: 12,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

increase this to 20, for less variance. that should still be only 2 seconds with increased blocks/s

@bchocho bchocho force-pushed the brian/reduce-exec-backpressure-target branch from 6c9e857 to ac04787 Compare November 12, 2024 19:10

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

@bchocho
Copy link
Contributor Author

bchocho commented Nov 13, 2024

Here's the latest fairness test run: https://github.com/aptos-labs/aptos-core/actions/runs/11804380571
It took a few retries, but seems to be an unrelated issue with a VFN lagging behind

Copy link
Contributor

@sitalkedia sitalkedia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this pass other forge stable test? Let's verify before landing?

@bchocho
Copy link
Contributor Author

bchocho commented Nov 13, 2024

Does this pass other forge stable test? Let's verify before landing?

Will verify as part of release testing :)

@bchocho bchocho enabled auto-merge (squash) November 13, 2024 20:40

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on ac047874b947adf8ad9fe1565d7a6b1cfdd6e458

two traffics test: inner traffic : committed: 14402.86 txn/s, latency: 2762.65 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3200 ms), latency samples: 5476300
two traffics test : committed: 100.08 txn/s, latency: 1453.65 ms, (p50: 1300 ms, p70: 1400, p90: 1500 ms, p99: 11100 ms), latency samples: 1720
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.073, avg: 1.583", "ConsensusProposalToOrdered: max: 0.327, avg: 0.293", "ConsensusOrderedToCommit: max: 0.367, avg: 0.354", "ConsensusProposalToCommit: max: 0.659, avg: 0.647"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.90s no progress at version 2268721 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.71s no progress at version 2268719 (avg 8.71s) [limit 15].
Test Ok

Copy link
Contributor

✅ Forge suite framework_upgrade success on ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> ac047874b947adf8ad9fe1565d7a6b1cfdd6e458

Compatibility test results for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> ac047874b947adf8ad9fe1565d7a6b1cfdd6e458 (PR)
Upgrade the nodes to version: ac047874b947adf8ad9fe1565d7a6b1cfdd6e458
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1322.23 txn/s, submitted: 1324.05 txn/s, failed submission: 1.82 txn/s, expired: 1.82 txn/s, latency: 2333.33 ms, (p50: 2100 ms, p70: 2400, p90: 3600 ms, p99: 4800 ms), latency samples: 116080
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1438.63 txn/s, submitted: 1442.64 txn/s, failed submission: 4.01 txn/s, expired: 4.01 txn/s, latency: 2088.52 ms, (p50: 2100 ms, p70: 2100, p90: 3000 ms, p99: 4500 ms), latency samples: 129000
5. check swarm health
Compatibility test for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> ac047874b947adf8ad9fe1565d7a6b1cfdd6e458 passed
Upgrade the remaining nodes to version: ac047874b947adf8ad9fe1565d7a6b1cfdd6e458
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1454.14 txn/s, submitted: 1458.04 txn/s, failed submission: 3.90 txn/s, expired: 3.90 txn/s, latency: 2154.30 ms, (p50: 2100 ms, p70: 2400, p90: 3200 ms, p99: 4800 ms), latency samples: 126700
Test Ok

Copy link
Contributor

✅ Forge suite compat success on ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> ac047874b947adf8ad9fe1565d7a6b1cfdd6e458

Compatibility test results for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> ac047874b947adf8ad9fe1565d7a6b1cfdd6e458 (PR)
1. Check liveness of validators at old version: ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70
compatibility::simple-validator-upgrade::liveness-check : committed: 14462.77 txn/s, latency: 2002.25 ms, (p50: 1700 ms, p70: 1800, p90: 2100 ms, p99: 10900 ms), latency samples: 558400
2. Upgrading first Validator to new version: ac047874b947adf8ad9fe1565d7a6b1cfdd6e458
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7392.48 txn/s, latency: 3893.84 ms, (p50: 4300 ms, p70: 4600, p90: 4700 ms, p99: 4700 ms), latency samples: 139800
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6873.96 txn/s, latency: 4602.90 ms, (p50: 4700 ms, p70: 4800, p90: 6600 ms, p99: 7000 ms), latency samples: 226640
3. Upgrading rest of first batch to new version: ac047874b947adf8ad9fe1565d7a6b1cfdd6e458
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7999.27 txn/s, latency: 3576.13 ms, (p50: 3900 ms, p70: 4200, p90: 4300 ms, p99: 4400 ms), latency samples: 146000
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 8005.48 txn/s, latency: 3988.67 ms, (p50: 4200 ms, p70: 4300, p90: 5700 ms, p99: 6000 ms), latency samples: 266300
4. upgrading second batch to new version: ac047874b947adf8ad9fe1565d7a6b1cfdd6e458
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11744.39 txn/s, latency: 2348.36 ms, (p50: 2400 ms, p70: 2600, p90: 3400 ms, p99: 3600 ms), latency samples: 203360
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11154.52 txn/s, latency: 2770.63 ms, (p50: 2500 ms, p70: 2700, p90: 4900 ms, p99: 6500 ms), latency samples: 363360
5. check swarm health
Compatibility test for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> ac047874b947adf8ad9fe1565d7a6b1cfdd6e458 passed
Test Ok

@bchocho bchocho merged commit 88aee93 into main Nov 13, 2024
95 checks passed
@bchocho bchocho deleted the brian/reduce-exec-backpressure-target branch November 13, 2024 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-forge-e2e-perf Run the e2e perf forge only
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants