-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dotnet: add support for basic blocks #1326
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
CHANGELOG updated or no update needed, thanks! 😄
@mr-tz / @williballenthin before I dig in here any thoughts as to why tests would only be failing on Python 3.7? |
Nope, but looks like something with pydantic... |
https://github.com/mandiant/capa/blob/master/capa/features/freeze/__init__.py#L49 this line recently changed but i have no idea why it would fail on 3.7. i'll try to reproduce locally in a 3.7 virtualenv and see whats going on. |
@williballenthin looks like tests are passing after my latest commit.....very weird the tests were only failing on Python 3.7. I noticed there was a bug manually which led to the fix but I'd rather this have been picked up by our testing suite. |
definitely agree. I'll still try to reproduce and add tests or more assertions to better catch the scenario.
|
I like the rendering in the screenshot you shared - lots of detail and the
important stuff is highlighted.
Do you think we should also try to include the function name, when
possible? Using tokens is very precise and I don't think this should go
away; however, I wonder if including the function name might give more
context (especially when the names aren't obfuscated).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. A few comments and maybe a larger question if this should maybe life in dncil or dnfile?
bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1] | ||
bb.succs.append(bb_next) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this always hold just based on the next index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my thought process (please check my sanity 🙃):
- We process instructions sequentially resulting in leaders getting processed sequentially
- Each time we encounter a leader we create a new basic block and add the block to our list resulting in basic blocks getting processed sequentially
- Therefore, if a basic block has a fallthrough successor we can assume the successor is the next basic block in our list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense and should go in there as a comment.
What about BBs that do not return (IDA calls them fcb_noret
and fcb_enoret
)? They occur for example on call abort
, RtlFailFast
, or potentially on exception handling.
Another corner case from IDA at the call 0x100304ae
:
ID: 8, Start: 0x1003045e, Last instruction: 0x10030465, Size: 12, Type: fcb_normal
-> ID: 16, Start: 0x100304ae, Last instruction: 0x100304b7, Size: 10, Type: fcb_normal
So we should handle call
and jmp
at the end of BBs. And maybe exception handling could throw us of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree the above should be in a comment, just for clarity.
i'm not sure whether or not .NET supports non-returning functions. its possibly that the runtime still requires the ret
even if its dead code. we should try to figure this out and definitely document the results here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other assumptions we should probably document (and please verify):
- .NET doesn't support tail calls, i.e.
jmp
to other routine - .NET doesn't support shared function chunks, i.e. two different entry points to the same function body
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and, right now i see that we're not doing anything with exceptions. i'm not sure how we'd want to represent this in the CFG. there would be a couple things to consider:
- is there a separate graph for each exception handler? how to refer to them?
- how to link the try/catch region with the exception handler graph?
- the end of a try/catch region probably splits a basic block
we may want to figure out how we need to operate on exception handlers before finalizing the decisions here. so maybe this is not a blocker, but something we add once we know we need it.
I'm honestly torn here. I think displaying un-obfuscated method names is valuable, however, my work has seen few samples where this is the case. More frequently, I see large, obfuscated names that would be a pain to handle (display) correctly. We could take a route simliar to Reading your original message I see you say "when possible" which could be a valid path forward, e.g., we only display method names when we can identify the name has not been obfuscated at all (or past an established threshold). |
Co-authored-by: Moritz <[email protected]>
@mr-tz This came to mind about halfway through writing the code haha. I think the basic block logic likely belongs in dncil. Once we've addressed any issues here I'll move the basic block logic to dncil and update this PR to use the updated dncil APIs. |
|
if idx == 0: | ||
# add #1 | ||
leaders.add(insn.offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could pull this out of the loop so its not re-checked on each iteration
for idx, insn in enumerate(fh.inner.instructions): | ||
if idx == 0: | ||
# add #1 | ||
leaders.add(insn.offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for idx, insn in enumerate(fh.inner.instructions): | |
if idx == 0: | |
# add #1 | |
leaders.add(insn.offset) | |
leaders.add(fh.inner.instructions[0]) | |
for idx, insn in enumerate(fh.inner.instructions): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming there's at least one instruction for each function
leaders.add(insn.offset) | ||
|
||
if any((insn.is_br(), insn.is_cond_br(), insn.is_leave())): | ||
# add #2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# add #2 | |
# add #2: targets of branches are leaders |
if any((insn.is_br(), insn.is_cond_br(), insn.is_leave())): | ||
# add #2 | ||
leaders.add(insn.operand) | ||
# add #3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# add #3 | |
# add #3: fallthrough instructions after conditional branches are leaders |
# calculate basic block leaders where, | ||
# 1. The first instruction of the intermediate code is a leader | ||
# 2. Instructions that are targets of unconditional or conditional jump/goto statements are leaders | ||
# 3. Instructions that immediately follow unconditional or conditional jump/goto statements are considered leaders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3: ... immediately follow unconditional ...
can you convince me of this? i think the instruction must be the target of a branch in order to be a leader. otherwise, its unreferenced code.
fh.ctx["blocks"].append(bb_curr) | ||
continue | ||
|
||
assert bb_curr is not None | ||
bb_curr.instructions.append(insn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fh.ctx["blocks"].append(bb_curr) | |
continue | |
assert bb_curr is not None | |
bb_curr.instructions.append(insn) | |
fh.ctx["blocks"].append(bb_curr) | |
else: | |
assert bb_curr is not None | |
bb_curr.instructions.append(insn) |
optional style to clarify one of two cases happens here
if len(bb.instructions) == 0: | ||
# TODO: consider error? | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(bb.instructions) == 0: | |
# TODO: consider error? | |
continue | |
assert len(bb.instructions) > 0 |
i dont see how this can ever not be the case, given that the bb is initialized with at least one element above. so, we could keep this assertion around, or also remove it.
if len(bb.instructions) == 0: | ||
# TODO: consider error? | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(bb.instructions) == 0: | |
# TODO: consider error? | |
continue | |
assert len(bb.instructions) > 0 |
ditto
try: | ||
bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1] | ||
bb.succs.append(bb_next) | ||
bb_next.preds.append(bb) | ||
except IndexError: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try: | |
bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1] | |
bb.succs.append(bb_next) | |
bb_next.preds.append(bb) | |
except IndexError: | |
continue | |
try: | |
bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1] | |
except IndexError: | |
continue | |
else: | |
bb.succs.append(bb_next) | |
bb_next.preds.append(bb) |
make region that can throw and handle the exception as small as possible for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the IndexError
a programming error? i suppose we'd hit that case at ret
and/or any other instruction at end of function body. can we codify this in any way? not a priority.
I agree with everything you say here. I'd only recommend including the name when its useful and helpful. I do not think we should do any renaming or show any obfuscated names. So, if its reasonably easy to determine if a name is non-obfuscated, then I think we should maybe do that. How to determine if non-obfuscated? I guess ASCII only, something around entropy, not mixing casing or numbers too weirdly, ...? We should probably move this into a separate feature request. |
@mike-hunhoff, do we want to keep this around here or close it? |
We can close - I'll link the corresponding dncil PR back to this for documentation purposes. |
Add support to calculate IL basic blocks, enabling basic block scope and loop detection for .NET files.