Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(blockifier_reexecution): serialize reexecution data to file #1762

Merged
merged 4 commits into from
Nov 7, 2024

Conversation

aner-starkware
Copy link
Contributor

No description provided.

@aner-starkware aner-starkware self-assigned this Nov 3, 2024
@reviewable-StarkWare
Copy link

This change is Reviewable

Copy link

github-actions bot commented Nov 3, 2024

Artifacts upload triggered. View details here

Copy link

codecov bot commented Nov 3, 2024

Codecov Report

Attention: Patch coverage is 0% with 97 lines in your changes missing coverage. Please review.

Project coverage is 41.71%. Comparing base (e3165c4) to head (50ad0f4).
Report is 238 commits behind head on main.

Files with missing lines Patch % Lines
crates/blockifier_reexecution/src/main.rs 0.00% 62 Missing ⚠️
...s/blockifier_reexecution/src/state_reader/utils.rs 0.00% 31 Missing ⚠️
crates/blockifier/src/state/cached_state.rs 0.00% 3 Missing ⚠️
...ution/src/state_reader/reexecution_state_reader.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1762      +/-   ##
==========================================
+ Coverage   40.10%   41.71%   +1.60%     
==========================================
  Files          26      197     +171     
  Lines        1895    23369   +21474     
  Branches     1895    23369   +21474     
==========================================
+ Hits          760     9748    +8988     
- Misses       1100    13164   +12064     
- Partials       35      457     +422     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aner-starkware aner-starkware force-pushed the aner/serialize_reexecution_data branch from 431d83b to ba49a4c Compare November 3, 2024 14:16
@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 2a0db1a to 45335d3 Compare November 3, 2024 14:18
Copy link

github-actions bot commented Nov 3, 2024

Artifacts upload triggered. View details here

@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 45335d3 to 57fced1 Compare November 3, 2024 16:36
Copy link

github-actions bot commented Nov 3, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier/src/state/cached_state.rs line 302 at r1 (raw file):

        Ok(self.cache.borrow().initial_reads.clone())
    }
}

can you solve this by injecting more code into the test state reader?
i.e. every get_X should check if this key was read before, and if not, populate self? like you do with compiled classes?

I do not want to touch blockifier code; this is only useful for reexecution

Code quote:

#[cfg(any(feature = "testing", test))]
impl<S: StateReader> CachedState<S> {
    pub fn get_initial_reads(&self) -> StateResult<StateMaps> {
        Ok(self.cache.borrow().initial_reads.clone())
    }
}

crates/blockifier_reexecution/src/main.rs line 104 at r1 (raw file):

        } => {
            let directory_path = directory_path
                .unwrap_or(format!("./crates/blockifier_reexecution/tmp/block_{block_number}/"));
  1. why give a temporary name?
  2. I suggest putting this in a resources dir; like the read_from_json function expects. blockifier has such a directory for artifacts

Code quote:

tmp

crates/blockifier_reexecution/src/main.rs line 121 at r1 (raw file):

            let transactions_next_block = next_block_state_reader.get_all_txs_in_block().unwrap();

            let blockifier_transactions_next_block = &last_block_state_reader

why are the next txs taken from the last block?

Code quote:

let blockifier_transactions_next_block = &last_block_state_reader

crates/blockifier_reexecution/src/main.rs line 156 at r1 (raw file):

                "RPC replies required for reexecuting block {block_number} written to json file."
            );
        }

code duplication with the non-dump command... please extract to a function that accepts the input data + boolean named dump_mode

Code quote:

        } => {
            let directory_path = directory_path
                .unwrap_or(format!("./crates/blockifier_reexecution/tmp/block_{block_number}/"));
            let config = RpcStateReaderConfig {
                url: node_url,
                json_rpc_version: JSON_RPC_VERSION.to_string(),
            };

            let ConsecutiveTestStateReaders { last_block_state_reader, next_block_state_reader } =
                ConsecutiveTestStateReaders::new(BlockNumber(block_number - 1), Some(config), true);

            let block_info_next_block = next_block_state_reader.get_block_info().unwrap();

            let starknet_version = next_block_state_reader.get_starknet_version().unwrap();

            let state_diff_next_block = next_block_state_reader.get_state_diff().unwrap();

            let transactions_next_block = next_block_state_reader.get_all_txs_in_block().unwrap();

            let blockifier_transactions_next_block = &last_block_state_reader
                .api_txs_to_blockifier_txs(transactions_next_block.clone())
                .unwrap();

            let mut transaction_executor = last_block_state_reader
                .get_transaction_executor(
                    next_block_state_reader.get_block_context().unwrap(),
                    None,
                )
                .unwrap();

            transaction_executor.execute_txs(blockifier_transactions_next_block);

            let block_state = transaction_executor.block_state.unwrap();
            let initial_reads = block_state.get_initial_reads().unwrap();

            let contract_class_mapping =
                block_state.state.get_contract_class_mapping_dumper().unwrap();

            let serializable_offline_reexecution_data = SerializableOfflineReexecutionData {
                state_maps: initial_reads.into(),
                block_info_next_block,
                starknet_version,
                transactions_next_block,
                contract_class_mapping,
                state_diff_next_block,
            };

            serializable_offline_reexecution_data
                .write_to_file(&directory_path, "reexecution_data.json")
                .unwrap();

            println!(
                "RPC replies required for reexecuting block {block_number} written to json file."
            );
        }

@aner-starkware aner-starkware force-pushed the aner/serialize_reexecution_data branch from ba49a4c to 3ee1065 Compare November 4, 2024 06:02
@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 57fced1 to 2730e87 Compare November 4, 2024 06:30
Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 6 files reviewed, 4 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)


crates/blockifier_reexecution/src/main.rs line 104 at r1 (raw file):

Previously, dorimedini-starkware wrote…
  1. why give a temporary name?
  2. I suggest putting this in a resources dir; like the read_from_json function expects. blockifier has such a directory for artifacts

Done. It's just a question of the behavior for the default path - should it ben written in a "real" location or a temporary one?


crates/blockifier_reexecution/src/main.rs line 121 at r1 (raw file):

Previously, dorimedini-starkware wrote…

why are the next txs taken from the last block?

See my answer in previous PR.


crates/blockifier_reexecution/src/main.rs line 156 at r1 (raw file):

Previously, dorimedini-starkware wrote…

code duplication with the non-dump command... please extract to a function that accepts the input data + boolean named dump_mode

I added a TODO. I think it's better to delay this refactoring until this PR (when it's ready): https://reviewable.io/reviews/starkware-libs/sequencer/1764.
The reason is that the command there has even more similar code, so this would avoid double refactoring. Also, this is not exactly code duplication, so I think it's better that the function corresponds to the other two commands, which are much more similar.

@aner-starkware aner-starkware force-pushed the aner/serialize_reexecution_data branch 2 times, most recently from 0b22118 to 77ed6ee Compare November 4, 2024 09:44
@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 2730e87 to 8c75ac2 Compare November 4, 2024 09:46
Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 8c75ac2 to 95680cb Compare November 4, 2024 12:19
Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

@aner-starkware
Copy link
Contributor Author

crates/blockifier/src/state/cached_state.rs line 302 at r1 (raw file):

Previously, dorimedini-starkware wrote…

can you solve this by injecting more code into the test state reader?
i.e. every get_X should check if this key was read before, and if not, populate self? like you do with compiled classes?

I do not want to touch blockifier code; this is only useful for reexecution

Switched the flag, as we agreed.

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier/Cargo.toml line 13 at r3 (raw file):

[features]
blockifier_reexecution = []

you are in the blockifier crate: no need for this prefix in the feature name imo

Suggestion:

reexecution

crates/blockifier_reexecution/src/main.rs line 121 at r1 (raw file):

Previously, aner-starkware wrote…

See my answer in previous PR.

still relevant (right?)


crates/blockifier_reexecution/src/main.rs line 156 at r1 (raw file):

Previously, aner-starkware wrote…

I added a TODO. I think it's better to delay this refactoring until this PR (when it's ready): https://reviewable.io/reviews/starkware-libs/sequencer/1764.
The reason is that the command there has even more similar code, so this would avoid double refactoring. Also, this is not exactly code duplication, so I think it's better that the function corresponds to the other two commands, which are much more similar.

I think the other way around:

  1. open a PR to extract common logic to a separate function (replacing most of the Command::RpcTest implementation)
  2. change this current PR to use the shared code
  3. in PR 1764 also use common code

@aner-starkware aner-starkware deleted the branch main November 4, 2024 14:35
@aner-starkware aner-starkware reopened this Nov 4, 2024
@aner-starkware aner-starkware changed the base branch from aner/serialize_reexecution_data to main November 4, 2024 14:36
Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 95680cb to 783af87 Compare November 4, 2024 14:49
@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 2070e7b to f450411 Compare November 7, 2024 09:10
Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 5 files at r9, 2 of 2 files at r12, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @AvivYossef-starkware)

a discussion (no related file):

Previously, aner-starkware wrote…

Is it fixed?

yes, thanks


Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 2 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)

a discussion (no related file):
please rebase


Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

aner-starkware and others added 3 commits November 7, 2024 14:02
* chore(blockifier_reexecution): offline reexecution from file

* chore(blockifier_reexecution): use pre_process_and_create
@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 56af6f9 to 7439c15 Compare November 7, 2024 12:02
Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 2 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)

a discussion (no related file):

Previously, dorimedini-starkware wrote…

please rebase

Done.


Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r14, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,

Suggestion:

        // Directory path to json files. Default:
        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        // TODO: None should indicate using the bucket. Maybe need a commit_id arg.
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,
    },

    // Reexecutes the block from JSON files.
    ReexecuteBlock {
        /// Block number.
        #[clap(long, short = 'b')]
        block_number: u64,

        // Directory path to json files. Default:
        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        // TODO: None should indicate using the bucket. Maybe need a commit_id arg.
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,

crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

            let full_file_path = directory_path.unwrap_or(format!(
                "./crates/blockifier_reexecution/resources/block_{block_number}"
            )) + "/reexecution_data.json";

Suggestion:

            let full_file_path = directory_path.unwrap_or(format!(
                "./crates/blockifier_reexecution/resources/block_{block_number}/reexecution_data.json"
            ));

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,

Why? None just indicates using the default directory; The bucket will be downloaded to this directory using the script.


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

            let full_file_path = directory_path.unwrap_or(format!(
                "./crates/blockifier_reexecution/resources/block_{block_number}"
            )) + "/reexecution_data.json";

It's different functionality - the first way you only need to specify the directory, this way you need to also need to specify the file name

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

Previously, aner-starkware wrote…

Why? None just indicates using the default directory; The bucket will be downloaded to this directory using the script.

what if I have the files already, and I want to point to that location?
I was thinking there would be two (mutually exclusive) options:

  1. provide an explicit path, and the CLI will assume the relevant file already exists there
  2. provide a git commit, and the CLI will fetch the files from the bucket (to /tmp or something)

wdyt?


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, aner-starkware wrote…

It's different functionality - the first way you only need to specify the directory, this way you need to also need to specify the file name

wdym? reexecution_data.json is still part of the string

@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 7439c15 to a37cf42 Compare November 7, 2024 14:24
Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from a37cf42 to 67e3847 Compare November 7, 2024 14:25
Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 2 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

Previously, dorimedini-starkware wrote…

what if I have the files already, and I want to point to that location?
I was thinking there would be two (mutually exclusive) options:

  1. provide an explicit path, and the CLI will assume the relevant file already exists there
  2. provide a git commit, and the CLI will fetch the files from the bucket (to /tmp or something)

wdyt?

Done.


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, dorimedini-starkware wrote…

wdym? reexecution_data.json is still part of the string

In the second option, if you specify the location yourself, you also need to specify the file name.

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r16, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, aner-starkware wrote…

In the second option, if you specify the location yourself, you also need to specify the file name.

ah, yes
if you specify a path, I'm assuming you're not downloading a file - the user can (and should) supply the full path in this case

@aner-starkware
Copy link
Contributor Author

crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, dorimedini-starkware wrote…

ah, yes
if you specify a path, I'm assuming you're not downloading a file - the user can (and should) supply the full path in this case

Done.

#1814)

* chore(blockifier_reexecution): refactor cli to reduce code duplication

* chore(blockifier_reexecution): explicitly state command args

* chore(blockifier_reexecution): separate old block hash

* chore(blockifier_reexecution): move reexecution to seperate function

* test(blockifier_reexecution): reexecution test on blocks
@aner-starkware aner-starkware force-pushed the aner/serialize_block_to_file_poc branch from 67e3847 to 50ad0f4 Compare November 7, 2024 15:25
Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r17, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/state_reader/test_state_reader.rs line 90 at r17 (raw file):

impl SerializableOfflineReexecutionData {
    pub fn write_to_file(&self, full_file_path: &str) -> ReexecutionResult<()> {
        let file_path = full_file_path.rsplit_once('/').expect("Invalid file path.").0;

consider using Path and PathBuf objects, we have examples in code - abstracts this kind of logic (non-blocking)

Code quote:

 let file_path = full_file_path.rsplit_once('/').expect("Invalid file path.").0;

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @AvivYossef-starkware)


crates/blockifier_reexecution/src/state_reader/test_state_reader.rs line 90 at r17 (raw file):

Previously, dorimedini-starkware wrote…

consider using Path and PathBuf objects, we have examples in code - abstracts this kind of logic (non-blocking)

I tried - it's longer and not really any clearer.

@aner-starkware aner-starkware merged commit 9569d8d into main Nov 7, 2024
12 checks passed
@aner-starkware aner-starkware deleted the aner/serialize_block_to_file_poc branch November 7, 2024 16:09
@github-actions github-actions bot locked and limited conversation to collaborators Nov 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants