Report carve dir #1017
Current devenv config has problems on at least Ubuntu 22.04. .envrc.user with the below content enables work without devenv:

    layout_poetry() {
        # create venv if it doesn't exist
        poetry run true

        export VIRTUAL_ENV=$(poetry env info --path)
        export POETRY_ACTIVE=1
        PATH_add "$VIRTUAL_ENV/bin"
    }

    layout_poetry
    export SKIP=nixpkgs-fmt
    export UNBLOB_USE_DEVENV=false
Nice work. I would maybe add some tests where we set the ExtractionConfig to use non-standard carve_suffix and extract_suffix? See if we can spot abnormal behavior that would appear out of assumptions made on the default suffix values.
This removes the burden of carving from the already complex function _extract_chunks and also allows for some better variable names.
Carve directories were hard to explain, as they look like extraction directories and there was no public information to tell them apart. Adding this report makes the purpose of the directory visible.
`_FileTask.carve_dir` was initially used for both extraction and carving. The naming of the directories can now differ, so it is not used anymore apart from an existence check, which would terminate this branch of the extraction. This output directory existence check is now present in both the carving and extraction paths, and the report is renamed to accommodate both types of output directories. `ExtractDirectoryExistsReport` was generalized to `OutputDirectoryExistsReport` instead of introducing yet another `Report` type - `CarveDirectoryExistsReport`.
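A minimal sketch of how one report type can serve both code paths. The dataclass below is a hypothetical stand-in for unblob's `OutputDirectoryExistsReport` (the real class may carry other fields), and `check_output_dir` is a made-up helper name used only for illustration:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class OutputDirectoryExistsReport:
    # Stand-in for illustration only; not the real unblob report class.
    path: Path


def check_output_dir(path: Path) -> Optional[OutputDirectoryExistsReport]:
    """Return a report if the output directory already exists, else None.

    The same check applies whether `path` is a carve directory or an
    extraction directory, so a single report type is enough.
    """
    if path.exists():
        return OutputDirectoryExistsReport(path=path)
    return None
```

Both the carving and the extraction path could call the same helper and stop processing that branch when a report comes back.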
f5f8ba5 to e89ad23
Chunk statistics require dividing by the total chunk size, which can be 0 in certain rare cases. This makes chunk-related output conditional, and not part of the summary. An example command line sequence which leads to a silent failure:

    (echo a; gzip < README.md; echo b) > fw
    unblob fw
    # the next command would silently fail:
    unblob fw
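The zero-total guard can be sketched roughly as follows. This is not unblob's actual summary code; `chunk_percentages` and its input shape (total bytes per handler name) are assumptions made for illustration:

```python
from typing import Dict, Optional


def chunk_percentages(sizes_by_handler: Dict[str, int]) -> Optional[Dict[str, float]]:
    """Return each handler's share of the total chunk size, in percent.

    Returns None when the total is 0, so the caller can skip the
    chunk-related output instead of dividing by zero.
    """
    total = sum(sizes_by_handler.values())
    if total == 0:
        return None
    return {name: size / total * 100 for name, size in sizes_by_handler.items()}
```

A caller would print the chunk section only when the result is not `None`.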
With the separation of carve and extract directories, the output directory becomes dependent on the *content* of the input file: if the file has multiple chunks because it is not covered by a single handler, the output directory will be generated as a *carve* directory, otherwise as an *extract* directory.
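The content dependence can be illustrated with a small sketch. The function name, its parameters, and the flag `covered_by_single_handler` are hypothetical; the only thing taken from the PR is that the carve and extract suffixes may differ:

```python
from pathlib import Path


def output_dir_for(
    extract_root: Path,
    input_path: Path,
    covered_by_single_handler: bool,
    *,
    carve_suffix: str,
    extract_suffix: str,
) -> Path:
    """Pick the output directory name for an input file.

    A file fully covered by one handler gets an extract directory;
    a file that has to be split into chunks gets a carve directory.
    """
    suffix = extract_suffix if covered_by_single_handler else carve_suffix
    return extract_root / (input_path.name + suffix)
```

With `carve_suffix="_carve"` and `extract_suffix="_extract"`, an input named `fw` would map to `fw_carve` when it contains several chunks and to `fw_extract` when a single handler covers it.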
The output path is printed as of the previous commit, so callers no longer need to depend on looking at well-known paths.
e89ad23 to 2767e86
@e3krisztian cool! I'll wait for the tests to land before I do a final review round.
The test files were created with this script:

    # cd tests/files/suffixes

    # clean
    rm -rf chunks_carve/ extractions/ collisions.zip

    # reproduce output
    seq 100 | gzip > 0-160.gzip
    seq 128 | gzip > 160-375.gzip
    dd if=/dev/zero of=375-512.padding bs=1 count=137
    cat 0-160.gzip 160-375.gzip 375-512.padding > chunks

    unblob --carve-suffix _carve chunks
    cp 0-160.gzip chunks_carve/
    echo something else > chunks_carve/0-160.gzip_extract/gzip.uncompressed
    zip collisions.zip chunks chunks_carve/0-160.gzip chunks_carve/0-160.gzip_extract/gzip.uncompressed

    for input in collisions.zip chunks
    do
        unblob -e extractions/defaults/ $input
        unblob --carve-suffix _carve -e extractions/_carve_extract/ $input
        unblob --carve-suffix _c --extract-suffix _e -e extractions/_c_e/ $input
    done
TEST_DATA_PATH = Path(__file__).parent / "files/suffixes"


def _patch(extraction_config: ExtractionConfig, carve_suffix, extract_suffix):
missing type hints for suffixes
expected_output_dir = (
    TEST_DATA_PATH / "extractions" / expected_output_dir_name / carve_dir_name
)
It feels like something's hidden here. How is unblob made to extract into expected_output_dir_name? Shouldn't unblob extract into chunks_c or chunks_carve as top-level? What am I missing?
@e3krisztian two small comments. I think we can rebase and merge when it's cleared.
Reworked #891 to report carve dirs instead of the carved files, as well as support different suffixes for carve and extraction directories.