Report carve dir #1017

Open
wants to merge 12 commits into
base: main

e3krisztian
Contributor

Reworked #891 to report carve dirs instead of the carved files, as well as support different suffixes for carve and extraction directories.

e3krisztian and others added 2 commits November 26, 2024 22:36
Current devenv config has problems on at least Ubuntu 22.04.

.envrc.user with the below content enables work without devenv:

layout_poetry() {
 # create venv if it doesn't exist
 poetry run true

 export VIRTUAL_ENV=$(poetry env info --path)
 export POETRY_ACTIVE=1
 PATH_add "$VIRTUAL_ENV/bin"
}

layout_poetry
export SKIP=nixpkgs-fmt
export UNBLOB_USE_DEVENV=false
@e3krisztian e3krisztian mentioned this pull request Nov 26, 2024
Contributor

@qkaiser qkaiser left a comment

Nice work. I would maybe add some tests where we set the ExtractionConfig to use non-standard carve_suffix and extract_suffix? See if we can spot abnormal behavior arising from assumptions made about the default suffix values.

@qkaiser
Contributor

qkaiser commented Nov 27, 2024

Commits f2cb06e and 8024aed should be next to each other in the history I think. Easier to follow.

This removes the burden of carving from the already complex function
_extract_chunks and also allows for some better variable names.
Carve directories were hard to explain, as they look like extraction
directories and there was no public information to tell them apart.

Adding this report makes the purpose of the directory visible.
`_FileTask.carve_dir` was initially used for both extraction and carving.
The naming of the directories can now differ, so it is no longer used
apart from an existence check, which would terminate this branch of the
extraction. This output directory existence check is now present in both
the carving and extraction paths, and the corresponding report is
renamed to accommodate both types of output directories.

`ExtractDirectoryExistsReport` was generalized to
`OutputDirectoryExistsReport` instead of introducing yet another
`Report` type - `CarveDirectoryExistsReport`.
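
The generalization described above can be sketched roughly as follows. This is a minimal illustration rather than the code from this PR: only the name `OutputDirectoryExistsReport` comes from the commit message, while the `path` field and the `check_output_dir` helper are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class OutputDirectoryExistsReport:
    """One report type shared by carve and extract directories.

    The `path` field is an assumption made for this sketch.
    """

    path: Path


def check_output_dir(path: Path) -> Optional[OutputDirectoryExistsReport]:
    """Hypothetical helper: return a report if the output directory exists.

    The same check can run in both the carving and the extraction path,
    so no separate CarveDirectoryExistsReport is needed.
    """
    if path.exists():
        return OutputDirectoryExistsReport(path=path)
    return None
```

A caller would stop processing that branch when a report is returned, regardless of whether the directory was meant for carving or for extraction.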
@e3krisztian e3krisztian force-pushed the report_carve_dir branch 2 times, most recently from f5f8ba5 to e89ad23 Compare November 27, 2024 16:44
Chunk statistics require dividing by the total chunk size, which can be 0
in certain rare cases. This makes chunk-related output conditional,
and not part of the summary.

An example command line sequence which leads to a silent failure:

    (echo a; gzip < README.md ; echo b) > fw
    unblob fw
    # the next command would silently fail:
    unblob fw
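
The guard against a zero total chunk size could look something like the sketch below. The function name `format_chunk_stats` and the output format are hypothetical and not taken from unblob; the point is only that the percentage lines are produced conditionally.

```python
from typing import Dict, List


def format_chunk_stats(chunk_sizes: Dict[str, int]) -> List[str]:
    """Return one line per chunk type with its share of the total size.

    Hypothetical helper for illustration only.
    """
    total = sum(chunk_sizes.values())
    if total == 0:
        # Nothing was carved or extracted (for example, the output
        # directory already existed), so skip the statistics instead of
        # dividing by zero.
        return []
    return [
        f"{name}: {size} bytes ({size / total:.2%})"
        for name, size in sorted(chunk_sizes.items())
    ]
```

With an empty result the caller simply prints no chunk statistics, while the rest of the summary is still shown.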
With the separation of carve and extract directories, the output
directory becomes dependent on the *content* of the input file:
if the file has multiple chunks because it is not covered by a single
handler, the output directory will be generated as a *carve* directory,
otherwise as an *extract* directory.
The output path is printed as of the previous commit, so callers no
longer need to depend on looking at well-known paths.
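
As a rough illustration of why the output path now depends on the input's content, the selection can be sketched like this. The helper `output_dir_for` and its parameters are hypothetical, and the suffixes are passed in explicitly rather than assuming any defaults.

```python
from pathlib import Path


def output_dir_for(
    input_path: Path,
    extract_root: Path,
    whole_file_handled: bool,
    carve_suffix: str,
    extract_suffix: str,
) -> Path:
    """Pick the output directory name for an input file.

    Hypothetical sketch: a file covered by a single handler gets an
    extract directory, while a file split into multiple chunks gets a
    carve directory.
    """
    suffix = extract_suffix if whole_file_handled else carve_suffix
    return extract_root / (input_path.name + suffix)
```

For an input named `chunks` with suffixes `_carve` and `_extract`, this yields `chunks_carve` when the file is split into several chunks and `chunks_extract` when one handler covers the whole file, which is why a caller cannot predict the path without knowing the content.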
@qkaiser
Contributor

qkaiser commented Nov 27, 2024

@e3krisztian cool! I'll wait for the tests to land before I do a final review round.

The test files were created with this script:

    # cd tests/files/suffixes

    # clean
    rm -rf chunks_carve/ extractions/ collisions.zip

    # reproduce output
    seq 100 | gzip > 0-160.gzip
    seq 128 | gzip > 160-375.gzip
    dd if=/dev/zero of=375-512.padding bs=1 count=137
    cat 0-160.gzip 160-375.gzip 375-512.padding > chunks

    unblob --carve-suffix _carve chunks
    cp 0-160.gzip chunks_carve/
    echo something else > chunks_carve/0-160.gzip_extract/gzip.uncompressed

    zip collisions.zip chunks chunks_carve/0-160.gzip chunks_carve/0-160.gzip_extract/gzip.uncompressed

    for input in collisions.zip chunks
    do
      unblob                                   -e extractions/defaults/ $input
      unblob --carve-suffix _carve       -e extractions/_carve_extract/ $input
      unblob --carve-suffix _c --extract-suffix _e -e extractions/_c_e/ $input
    done
TEST_DATA_PATH = Path(__file__).parent / "files/suffixes"


def _patch(extraction_config: ExtractionConfig, carve_suffix, extract_suffix):

missing type hints for suffixes

Comment on lines +39 to +41
expected_output_dir = (
TEST_DATA_PATH / "extractions" / expected_output_dir_name / carve_dir_name
)

It feels like something's hidden here. How is unblob made to extract into expected_output_dir_name? Shouldn't unblob extract into chunks_c or chunks_carve as top-level? What am I missing?

@qkaiser
Contributor

qkaiser commented Nov 29, 2024

@e3krisztian two small comments. I think we can rebase and merge when it's cleared.

@qkaiser qkaiser added enhancement New feature or request python Pull requests that update Python code labels Nov 29, 2024

2 participants