-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report carve #891
Report carve #891
Conversation
The addition of However the change from I had about the same idea of simplifying the double step extraction (carve then extract chunks). I thought, if a file is not fully recognized by any handler (=has multiple chunks), it should be categorized as "unknown" (or "composite") and handled by a "default" handler, which would recognize and extract (carve) chunks, and could also assign handlers to them. This would be exactly what you wanted - one task for each file. I went so far as to make an experimental refactor to work like this 2 years ago, but it become too big of a change with untidy commits to review, and also probably with some bad decisions and thus was abandoned without much consideration. One of the problems to solve is how to pass the handler between processes to avoid the duplication of the expensive handler selection. With the current solution it is not needed: handler selection and extraction happens in the same process. I do think understanding and reasoning about a flattened extraction process would be easier, but it would be a big work to rewrite the code now. Could you explain why you would like to handle chunk-files by separate (sub-)tasks? Maybe we can come up with a solution for that problem. |
cdcbef8
to
8c88d16
Compare
Thanks for the feedback. I updated the PR to set the default carve_suffix to be Thanks for pointing me at #464 - I can now see how complex that change would be and will avoid going down that path! What I'm trying to accomplish here is described in #878, but the short version is that I want to collect a clean version of each extraction produced by unblob. Specifically I'm trying to recover multiple partitions from firmware and package them up into archives for subsequent analysis. I don't want to have any I've managed to do this using a terrible incantation of |
8c88d16
to
da9138a
Compare
I'm back ! Did anything happen with this @e3krisztian ? |
Welcome back @qkaiser ! I think this carve suffix needs to be configurable from the CLI, and did not got around to it. |
There are 2 features added in this PR (which if I understand properly could each solve the problem of some directories not being reported):
I think it would be cleaner as 2 independent PR-s, or choosing only one of them. I have made an attempt at adding the command line, rebase, do some cleanups, then to look at it again (see https://github.com/onekey-sec/unblob/tree/report_carve), and do some local experiments. Based on the experiment results, I would go with reporting the carves (as the title says), as it is almost there, only 2 tests need to be modified to know about the new report type. Unfortunately the suffix change has bigger problems:
(the above is not strictly what I saw, but it is probably because of some further mixing up of carve and extraction suffix) |
Agree. Let's make 2 independent merge requests.
It's confusing if we don't explain anything to the end user. I would be okay with this state of things if the non-verbose output would display an "output directory" line with the absolute path to where we put the extracted/carved content. We can also provide context in the interactive help and the online documentation.
I would definitely argue that the mess exists now. We could move to a more generic naming like |
carved_to=carved_to, | ||
start_offset=chunk.start_offset, | ||
end_offset=chunk.end_offset, | ||
handler_name=chunk.handler.NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another look at this implementation and its report-output, and I do not see how duplicating the chunk attributes is useful.
This is also storing the path to the carved file, while what was missing is some record for the parent directory, which is created to hold the carved files and their extractions.
Maybe I miss some use-case for the stored attributes, but I am inclined to drop this CarveReport
and introduce instead a CarveDirectoryReport
, with the missing carve directory path.
@AndrewFasano thank you for the initial implementation. |
_carve
by default to fix distinguish between _carve and _extract directory #326I really wanted to change carving to generate a subtask where there's a task for the original file being processed that has subtasks for each carved file. Then each carved file is another task that generates subtasks for the extraction. But I couldn't find a sane way to do this - would love any feedback or help with it if y'all think that's a better way to approach this.