-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(consensus): reuse db and prevent InsufficientPeers error #2242
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2242 +/- ##
==========================================
+ Coverage 66.31% 66.34% +0.02%
==========================================
Files 139 139
Lines 18341 18387 +46
Branches 18341 18387 +46
==========================================
+ Hits 12163 12199 +36
- Misses 4883 4891 +8
- Partials 1295 1297 +2 ☔ View full report in Codecov by Sentry. |
5f948dd
to
a76a409
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @asmaastarkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 139 at r1 (raw file):
# 1. The bootnode (validator 1) is started first for network peering. # 2. Validators 2+ are started next to join the network through the bootnode. # 3. Validator 0, which is the proposer, is started last so the validators don't miss the proposals.
Why did you remove this?
Code quote:
# Validators are started in a specific order to ensure proper network formation:
# 1. The bootnode (validator 1) is started first for network peering.
# 2. Validators 2+ are started next to join the network through the bootnode.
# 3. Validator 0, which is the proposer, is started last so the validators don't miss the proposals.
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
return nodes def acquire_lock(data_dir):
Have you tested this both succeeding and failing?
Code quote:
def acquire_lock(data_dir):
crates/sequencing/papyrus_consensus/run_consensus.py
line 165 at r1 (raw file):
logs_dir = tempfile.mkdtemp() if db_dir:
Suggestion:
if db_dir is not None:
crates/sequencing/papyrus_consensus/run_consensus.py
line 166 at r1 (raw file):
if db_dir: assert os.path.exists(db_dir), f"The specified directory {db_dir} does not exist."
listdir triggers FileNotFoundError
which I think is descriptive enough.
crates/sequencing/papyrus_consensus/run_consensus.py
line 167 at r1 (raw file):
if db_dir: assert os.path.exists(db_dir), f"The specified directory {db_dir} does not exist." data_dirs = [d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))]
Just create here as a set.
Suggestion:
data_dirs = {d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))}
crates/sequencing/papyrus_consensus/run_consensus.py
line 172 at r1 (raw file):
assert ( len(data_dirs) == num_validators ), f"The specified directory {db_dir} must contain exactly {num_validators} validator directories."
This is not needed based on the expected_dirs
check below.
crates/sequencing/papyrus_consensus/run_consensus.py
line 176 at r1 (raw file):
# Ensure the directories are named data0, data1, ..., data{num_validators - 1} expected_dirs = {f"data{i}" for i in range(num_validators)} actual_dirs = set(data_dirs)
Create as set above
crates/sequencing/papyrus_consensus/run_consensus.py
line 180 at r1 (raw file):
assert ( expected_dirs == actual_dirs ), f"The directories in {db_dir} must be named {', '.join(expected_dirs)}."
We want to know that there are enough db folders. We don't care if there are also other folders.
Suggestion:
assert (
expected_dirs.issubset(actual_dirs)
), f"The directories in {db_dir} must be named {', '.join(expected_dirs)}."
crates/sequencing/papyrus_consensus/run_consensus.py
line 187 at r1 (raw file):
# Acquire lock on the db_dir lock_fd = acquire_lock(db_dir)
Using enter and exit to enable a with statement is a more pythonic way. Means you don't need to put the close explicitly here.
Suggestion:
with LockFile(db_dif) as lockfile:
crates/sequencing/papyrus_consensus/run_consensus.py
line 209 at r1 (raw file):
parser.add_argument("--base_layer_node_url", required=True) parser.add_argument("--num_validators", type=int, required=True) parser.add_argument("--db_dir", required=False, default=None)
Let's add a description:
Directory with existing DBs that this simulation can reuse.
Code quote:
parser.add_argument("--db_dir", required=False, default=None)
crates/sequencing/papyrus_consensus/src/lib.rs
line 116 at r2 (raw file):
{ // Add a short delay to allow peers to connect and avoid "InsufficientPeers" error tokio::time::sleep(std::time::Duration::from_secs(5)).await;
Let's make this part of the consensus config.
Code quote:
// Add a short delay to allow peers to connect and avoid "InsufficientPeers" error
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
a76a409
to
b97d984
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 11 unresolved discussions (waiting on @matan-starkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 139 at r1 (raw file):
Previously, matan-starkware wrote…
Why did you remove this?
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
Previously, matan-starkware wrote…
Have you tested this both succeeding and failing?
Yes, I have run simulations with the following cases:
- without db_dir.
- reuse an existing db_dir with a single simulation.
- used an invalid db_dir that either doesn't exist or doesn't have the required data dirs.
- ran multiple simulations simultaneously using the same db_dir.
crates/sequencing/papyrus_consensus/run_consensus.py
line 165 at r1 (raw file):
logs_dir = tempfile.mkdtemp() if db_dir:
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 166 at r1 (raw file):
Previously, matan-starkware wrote…
listdir triggers
FileNotFoundError
which I think is descriptive enough.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 167 at r1 (raw file):
Previously, matan-starkware wrote…
Just create here as a set.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 172 at r1 (raw file):
Previously, matan-starkware wrote…
This is not needed based on the
expected_dirs
check below.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 176 at r1 (raw file):
Previously, matan-starkware wrote…
Create as set above
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 180 at r1 (raw file):
Previously, matan-starkware wrote…
We want to know that there are enough db folders. We don't care if there are also other folders.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 187 at r1 (raw file):
Previously, matan-starkware wrote…
Using enter and exit to enable a with statement is a more pythonic way. Means you don't need to put the close explicitly here.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 209 at r1 (raw file):
Previously, matan-starkware wrote…
Let's add a description:
Directory with existing DBs that this simulation can reuse.
Done.
crates/sequencing/papyrus_consensus/src/lib.rs
line 116 at r2 (raw file):
Previously, matan-starkware wrote…
Let's make this part of the consensus config.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @asmaastarkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
Previously, asmaastarkware (asmaa-starkware) wrote…
Yes, I have run simulations with the following cases:
- without db_dir.
- reuse an existing db_dir with a single simulation.
- used an invalid db_dir that either doesn't exist or doesn't have the required data dirs.
- ran multiple simulations simultaneously using the same db_dir.
Great.
Have you tried running multiple simulations simultaneously correctly (different db_dir)?
crates/sequencing/papyrus_consensus/run_consensus.py
line 38 at r3 (raw file):
command = f"curl -s -X GET http://localhost:{port}/monitoring/metrics | grep -oP 'papyrus_consensus_height \\K\\d+'" result = subprocess.run(command, shell=True, capture_output=True, text=True) # returns the most recently decided height, or None if node is not ready or consensus has not yet reached any height.
This line is too long.
Code quote:
# returns the most recently decided height, or None if node is not ready or consensus has not yet reached any height.
crates/sequencing/papyrus_consensus/run_consensus.py
line 186 at r3 (raw file):
actual_dirs = {d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))} # Ensure the directories are named data0, data1, ..., data{num_validators - 1}
Let's just delete this and the empty line above. I think it's obvious enough from he code, and we can see the else
right here where we originally created them.
crates/sequencing/papyrus_consensus/run_consensus.py
line 203 at r3 (raw file):
print( f"Output files will be stored in: {logs_dir} and data files will be stored in: {db_dir}" )
I think it's easier to read these on 2 separate lines (see how you feel actually running the script).
Suggestion:
print(f"DB files will be stored in: {db_dir}")
print(f"Logs will be stored in: {logs_dir}")
crates/sequencing/papyrus_consensus/src/config.rs
line 27 at r4 (raw file):
pub num_validators: u64, /// The delay in seconds before starting consensus to prevent InsufficientPeers error. pub consensus_delay: u64,
Can you put a Duration in a config?
Suggestion:
Duration
crates/sequencing/papyrus_consensus/src/config.rs
line 61 at r4 (raw file):
&self.consensus_delay, "The delay in seconds before starting consensus to prevent 'InsufficientPeers' \ error.",
Suggestion:
Delay (seconds) before starting consensus to give time for network peering.
crates/sequencing/papyrus_consensus/src/lib.rs
line 109 at r4 (raw file):
start_height: BlockNumber, validator_id: ValidatorId, consensus_delay: u64,
Suggestion:
Duration
b97d984
to
1d18575
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @matan-starkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
Previously, matan-starkware wrote…
Great.
Have you tried running multiple simulations simultaneously correctly (different db_dir)?
done
crates/sequencing/papyrus_consensus/run_consensus.py
line 38 at r3 (raw file):
Previously, matan-starkware wrote…
This line is too long.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 186 at r3 (raw file):
Previously, matan-starkware wrote…
Let's just delete this and the empty line above. I think it's obvious enough from he code, and we can see the
else
right here where we originally created them.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 203 at r3 (raw file):
Previously, matan-starkware wrote…
I think it's easier to read these on 2 separate lines (see how you feel actually running the script).
Done.
crates/sequencing/papyrus_consensus/src/config.rs
line 27 at r4 (raw file):
Previously, matan-starkware wrote…
Can you put a Duration in a config?
Done.
crates/sequencing/papyrus_consensus/src/config.rs
line 61 at r4 (raw file):
&self.consensus_delay, "The delay in seconds before starting consensus to prevent 'InsufficientPeers' \ error.",
Done.
crates/sequencing/papyrus_consensus/src/lib.rs
line 109 at r4 (raw file):
start_height: BlockNumber, validator_id: ValidatorId, consensus_delay: u64,
Done.
1d18575
to
3841618
Compare
3841618
to
647e080
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @asmaastarkware)
There hasn't been any activity on this pull request recently, and in order to prioritize active work, it has been marked as stale. |
This change is