chore(consensus): reuse db and prevent InsufficientPeers error #2242

asmaastarkware · 2024-07-22T11:33:06Z

This change is

codecov · 2024-07-22T11:40:47Z

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 66.34%. Comparing base (2db883b) to head (647e080).
Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/papyrus_node/src/main.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2242      +/-   ##
==========================================
+ Coverage   66.31%   66.34%   +0.02%     
==========================================
  Files         139      139              
  Lines       18341    18387      +46     
  Branches    18341    18387      +46     
==========================================
+ Hits        12163    12199      +36     
- Misses       4883     4891       +8     
- Partials     1295     1297       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

matan-starkware

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @asmaastarkware)

crates/sequencing/papyrus_consensus/run_consensus.py line 139 at r1 (raw file):

    # 1. The bootnode (validator 1) is started first for network peering.
    # 2. Validators 2+ are started next to join the network through the bootnode.
    # 3. Validator 0, which is the proposer, is started last so the validators don't miss the proposals.

Why did you remove this?

Code quote:

    # Validators are started in a specific order to ensure proper network formation:
    # 1. The bootnode (validator 1) is started first for network peering.
    # 2. Validators 2+ are started next to join the network through the bootnode.
    # 3. Validator 0, which is the proposer, is started last so the validators don't miss the proposals.

crates/sequencing/papyrus_consensus/run_consensus.py line 149 at r1 (raw file):

    return nodes

def acquire_lock(data_dir):

Have you tested this both succeeding and failing?

Code quote:

def acquire_lock(data_dir):

crates/sequencing/papyrus_consensus/run_consensus.py line 165 at r1 (raw file):

    logs_dir = tempfile.mkdtemp()

    if db_dir:

Suggestion:

    if db_dir is not None:

crates/sequencing/papyrus_consensus/run_consensus.py line 166 at r1 (raw file):

    if db_dir:
        assert os.path.exists(db_dir), f"The specified directory {db_dir} does not exist."

listdir triggers FileNotFoundError which I think is descriptive enough.

crates/sequencing/papyrus_consensus/run_consensus.py line 167 at r1 (raw file):

    if db_dir:
        assert os.path.exists(db_dir), f"The specified directory {db_dir} does not exist."
        data_dirs = [d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))]

Just create here as a set.

Suggestion:

        data_dirs = {d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))}

crates/sequencing/papyrus_consensus/run_consensus.py line 172 at r1 (raw file):

        assert (
            len(data_dirs) == num_validators
        ), f"The specified directory {db_dir} must contain exactly {num_validators} validator directories."

This is not needed based on the expected_dirs check below.

crates/sequencing/papyrus_consensus/run_consensus.py line 176 at r1 (raw file):

        # Ensure the directories are named data0, data1, ..., data{num_validators - 1}
        expected_dirs = {f"data{i}" for i in range(num_validators)}
        actual_dirs = set(data_dirs)

Create as set above

crates/sequencing/papyrus_consensus/run_consensus.py line 180 at r1 (raw file):

        assert (
            expected_dirs == actual_dirs
        ), f"The directories in {db_dir} must be named {', '.join(expected_dirs)}."

We want to know that there are enough db folders. We don't care if there are also other folders.

Suggestion:

        assert (
            expected_dirs.issubset(actual_dirs)
        ), f"The directories in {db_dir} must be named {', '.join(expected_dirs)}."

crates/sequencing/papyrus_consensus/run_consensus.py line 187 at r1 (raw file):

    # Acquire lock on the db_dir
    lock_fd = acquire_lock(db_dir)

Using enter and exit to enable a with statement is a more pythonic way. Means you don't need to put the close explicitly here.

Suggestion:

with LockFile(db_dif) as lockfile:

crates/sequencing/papyrus_consensus/run_consensus.py line 209 at r1 (raw file):

    parser.add_argument("--base_layer_node_url", required=True)
    parser.add_argument("--num_validators", type=int, required=True)
    parser.add_argument("--db_dir", required=False, default=None)

Let's add a description:

Directory with existing DBs that this simulation can reuse.

Code quote:

    parser.add_argument("--db_dir", required=False, default=None)

crates/sequencing/papyrus_consensus/src/lib.rs line 116 at r2 (raw file):

{
    // Add a short delay to allow peers to connect and avoid "InsufficientPeers" error
    tokio::time::sleep(std::time::Duration::from_secs(5)).await;

Let's make this part of the consensus config.

Code quote:

    // Add a short delay to allow peers to connect and avoid "InsufficientPeers" error
    tokio::time::sleep(std::time::Duration::from_secs(5)).await;

asmaastarkware

Reviewable status: 0 of 6 files reviewed, 11 unresolved discussions (waiting on @matan-starkware)

crates/sequencing/papyrus_consensus/run_consensus.py line 139 at r1 (raw file):

Previously, matan-starkware wrote…

Why did you remove this?

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 149 at r1 (raw file):

Previously, matan-starkware wrote…

Have you tested this both succeeding and failing?

Yes, I have run simulations with the following cases:

without db_dir.
reuse an existing db_dir with a single simulation.
used an invalid db_dir that either doesn't exist or doesn't have the required data dirs.
ran multiple simulations simultaneously using the same db_dir.

crates/sequencing/papyrus_consensus/run_consensus.py line 165 at r1 (raw file):

    logs_dir = tempfile.mkdtemp()

    if db_dir:

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 166 at r1 (raw file):

Previously, matan-starkware wrote…

listdir triggers FileNotFoundError which I think is descriptive enough.

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 167 at r1 (raw file):

Previously, matan-starkware wrote…

Just create here as a set.

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 172 at r1 (raw file):

Previously, matan-starkware wrote…

This is not needed based on the expected_dirs check below.

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 176 at r1 (raw file):

Previously, matan-starkware wrote…

Create as set above

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 180 at r1 (raw file):

Previously, matan-starkware wrote…

We want to know that there are enough db folders. We don't care if there are also other folders.

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 187 at r1 (raw file):

Previously, matan-starkware wrote…

Using enter and exit to enable a with statement is a more pythonic way. Means you don't need to put the close explicitly here.

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 209 at r1 (raw file):

Previously, matan-starkware wrote…

Let's add a description:

Directory with existing DBs that this simulation can reuse.

Done.

crates/sequencing/papyrus_consensus/src/lib.rs line 116 at r2 (raw file):

Previously, matan-starkware wrote…

Let's make this part of the consensus config.

Done.

matan-starkware

Reviewed 2 of 2 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @asmaastarkware)

crates/sequencing/papyrus_consensus/run_consensus.py line 149 at r1 (raw file):

Previously, asmaastarkware (asmaa-starkware) wrote…

Yes, I have run simulations with the following cases:

without db_dir.

reuse an existing db_dir with a single simulation.

used an invalid db_dir that either doesn't exist or doesn't have the required data dirs.

ran multiple simulations simultaneously using the same db_dir.

Great.

Have you tried running multiple simulations simultaneously correctly (different db_dir)?

crates/sequencing/papyrus_consensus/run_consensus.py line 38 at r3 (raw file):

        command = f"curl -s -X GET http://localhost:{port}/monitoring/metrics | grep -oP 'papyrus_consensus_height \\K\\d+'"
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        # returns the most recently decided height, or None if node is not ready or consensus has not yet reached any height.

This line is too long.

Code quote:

        # returns the most recently decided height, or None if node is not ready or consensus has not yet reached any height.

crates/sequencing/papyrus_consensus/run_consensus.py line 186 at r3 (raw file):

        actual_dirs = {d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))}

        # Ensure the directories are named data0, data1, ..., data{num_validators - 1}

Let's just delete this and the empty line above. I think it's obvious enough from he code, and we can see the else right here where we originally created them.

crates/sequencing/papyrus_consensus/run_consensus.py line 203 at r3 (raw file):

        print(
            f"Output files will be stored in: {logs_dir} and data files will be stored in: {db_dir}"
        )

I think it's easier to read these on 2 separate lines (see how you feel actually running the script).

Suggestion:

        print(f"DB files will be stored in: {db_dir}")
        print(f"Logs will be stored in: {logs_dir}")

crates/sequencing/papyrus_consensus/src/config.rs line 27 at r4 (raw file):

    pub num_validators: u64,
    /// The delay in seconds before starting consensus to prevent InsufficientPeers error.
    pub consensus_delay: u64,

Can you put a Duration in a config?

Suggestion:

Duration

crates/sequencing/papyrus_consensus/src/config.rs line 61 at r4 (raw file):

                &self.consensus_delay,
                "The delay in seconds before starting consensus to prevent 'InsufficientPeers' \
                 error.",

Suggestion:

Delay (seconds) before starting consensus to give time for network peering.

crates/sequencing/papyrus_consensus/src/lib.rs line 109 at r4 (raw file):

    start_height: BlockNumber,
    validator_id: ValidatorId,
    consensus_delay: u64,

Suggestion:

Duration

asmaastarkware

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @matan-starkware)

crates/sequencing/papyrus_consensus/run_consensus.py line 149 at r1 (raw file):

Previously, matan-starkware wrote…

Great.

Have you tried running multiple simulations simultaneously correctly (different db_dir)?

done

crates/sequencing/papyrus_consensus/run_consensus.py line 38 at r3 (raw file):

Previously, matan-starkware wrote…

This line is too long.

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 186 at r3 (raw file):

Previously, matan-starkware wrote…

Let's just delete this and the empty line above. I think it's obvious enough from he code, and we can see the else right here where we originally created them.

Done.

crates/sequencing/papyrus_consensus/run_consensus.py line 203 at r3 (raw file):

Previously, matan-starkware wrote…

I think it's easier to read these on 2 separate lines (see how you feel actually running the script).

Done.

crates/sequencing/papyrus_consensus/src/config.rs line 27 at r4 (raw file):

Previously, matan-starkware wrote…

Can you put a Duration in a config?

Done.

crates/sequencing/papyrus_consensus/src/config.rs line 61 at r4 (raw file):

                &self.consensus_delay,
                "The delay in seconds before starting consensus to prevent 'InsufficientPeers' \
                 error.",

Done.

crates/sequencing/papyrus_consensus/src/lib.rs line 109 at r4 (raw file):

    start_height: BlockNumber,
    validator_id: ValidatorId,
    consensus_delay: u64,

Done.

… error

matan-starkware

Reviewed 6 of 6 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @asmaastarkware)

github-actions · 2024-08-23T08:01:01Z

There hasn't been any activity on this pull request recently, and in order to prioritize active work, it has been marked as stale.
This PR will be closed and locked in 7 days if no further activity occurs.
Thank you for your contributions!

asmaastarkware requested a review from matan-starkware July 22, 2024 11:33

asmaastarkware force-pushed the asmaa/consensus_use_same_db branch from 5f948dd to a76a409 Compare July 23, 2024 06:33

asmaastarkware changed the title ~~chore(consensus): use previous db and prevent InsufficientPeers error~~ chore(consensus): reuse db and prevent InsufficientPeers error Jul 23, 2024

matan-starkware requested changes Jul 23, 2024

View reviewed changes

asmaastarkware force-pushed the asmaa/consensus_use_same_db branch from a76a409 to b97d984 Compare July 23, 2024 10:42

asmaastarkware commented Jul 23, 2024

View reviewed changes

matan-starkware self-requested a review July 23, 2024 10:49

matan-starkware requested changes Jul 23, 2024

View reviewed changes

asmaastarkware force-pushed the asmaa/consensus_use_same_db branch from b97d984 to 1d18575 Compare July 23, 2024 11:59

asmaastarkware commented Jul 23, 2024

View reviewed changes

chore(consensus): reuse DB

faa2d2d

asmaastarkware force-pushed the asmaa/consensus_use_same_db branch from 1d18575 to 3841618 Compare July 23, 2024 12:02

chore(config): add short consensus delay to prevent InsufficientPeers…

647e080

… error

asmaastarkware force-pushed the asmaa/consensus_use_same_db branch from 3841618 to 647e080 Compare July 23, 2024 12:05

matan-starkware approved these changes Jul 23, 2024

View reviewed changes

github-actions bot added the stale No activity for quite some time. label Aug 23, 2024

github-actions bot closed this Aug 31, 2024

github-actions bot locked and limited conversation to collaborators Sep 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(consensus): reuse db and prevent InsufficientPeers error #2242

chore(consensus): reuse db and prevent InsufficientPeers error #2242

asmaastarkware commented Jul 22, 2024 •

edited by spapinistarkware

Loading

codecov bot commented Jul 22, 2024 •

edited

Loading

matan-starkware left a comment

asmaastarkware left a comment

matan-starkware left a comment

asmaastarkware left a comment

matan-starkware left a comment

github-actions bot commented Aug 23, 2024

chore(consensus): reuse db and prevent InsufficientPeers error #2242

chore(consensus): reuse db and prevent InsufficientPeers error #2242

Conversation

asmaastarkware commented Jul 22, 2024 • edited by spapinistarkware Loading

codecov bot commented Jul 22, 2024 • edited Loading

Codecov Report

matan-starkware left a comment

Choose a reason for hiding this comment

asmaastarkware left a comment

Choose a reason for hiding this comment

matan-starkware left a comment

Choose a reason for hiding this comment

asmaastarkware left a comment

Choose a reason for hiding this comment

matan-starkware left a comment

Choose a reason for hiding this comment

github-actions bot commented Aug 23, 2024

asmaastarkware commented Jul 22, 2024 •

edited by spapinistarkware

Loading

codecov bot commented Jul 22, 2024 •

edited

Loading