dotnet: add support for basic blocks #1326

mike-hunhoff · 2023-02-24T21:46:50Z

Add support to calculate IL basic blocks, enabling basic block scope and loop detection for .NET files.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

CHANGELOG updated or no update needed, thanks! 😄

mike-hunhoff · 2023-02-27T16:36:00Z

@mr-tz / @williballenthin before I dig in here any thoughts as to why tests would only be failing on Python 3.7?

mr-tz · 2023-02-27T18:14:06Z

Nope, but looks like something with pydantic...

ghost · 2023-02-27T18:37:41Z

https://github.com/mandiant/capa/blob/master/capa/features/freeze/__init__.py#L49

this line recently changed but i have no idea why it would fail on 3.7. i'll try to reproduce locally in a 3.7 virtualenv and see whats going on.

mike-hunhoff · 2023-02-27T19:10:58Z

Most recent push improved how we emit basic block matches. Below we see basic block matches where the method token + offset is used for both the matched basic block and matched instruction in the basic block. This allows a user to quickly identify the exact basic block and instruction matched in a given method.

mike-hunhoff · 2023-02-27T19:35:35Z

@williballenthin looks like tests are passing after my latest commit.....very weird the tests were only failing on Python 3.7. I noticed there was a bug manually which led to the fix but I'd rather this have been picked up by our testing suite.

ghost · 2023-02-27T20:22:42Z

definitely agree. I'll still try to reproduce and add tests or more assertions to better catch the scenario.

ghost · 2023-02-27T20:25:02Z

I like the rendering in the screenshot you shared - lots of detail and the important stuff is highlighted. Do you think we should also try to include the function name, when possible? Using tokens is very precise and I don't think this should go away; however, I wonder if including the function name might give more context (especially when the names aren't obfuscated).

mr-tz

Nice work. A few comments and maybe a larger question if this should maybe life in dncil or dnfile?

capa/features/extractors/dnfile/extractor.py

mr-tz · 2023-02-28T09:01:21Z

capa/features/extractors/dnfile/extractor.py

+                    bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1]
+                    bb.succs.append(bb_next)


Does this always hold just based on the next index?

Here is my thought process (please check my sanity 🙃):

We process instructions sequentially resulting in leaders getting processed sequentially

Each time we encounter a leader we create a new basic block and add the block to our list resulting in basic blocks getting processed sequentially

Therefore, if a basic block has a fallthrough successor we can assume the successor is the next basic block in our list

That makes sense and should go in there as a comment.

What about BBs that do not return (IDA calls them fcb_noret and fcb_enoret)? They occur for example on call abort, RtlFailFast, or potentially on exception handling.

Another corner case from IDA at the call 0x100304ae:

ID: 8, Start: 0x1003045e, Last instruction: 0x10030465, Size: 12, Type: fcb_normal -> ID: 16, Start: 0x100304ae, Last instruction: 0x100304b7, Size: 10, Type: fcb_normal

So we should handle call and jmp at the end of BBs. And maybe exception handling could throw us of?

agree the above should be in a comment, just for clarity.

i'm not sure whether or not .NET supports non-returning functions. its possibly that the runtime still requires the ret even if its dead code. we should try to figure this out and definitely document the results here.

other assumptions we should probably document (and please verify):

.NET doesn't support tail calls, i.e. jmp to other routine

.NET doesn't support shared function chunks, i.e. two different entry points to the same function body

and, right now i see that we're not doing anything with exceptions. i'm not sure how we'd want to represent this in the CFG. there would be a couple things to consider:

is there a separate graph for each exception handler? how to refer to them?

how to link the try/catch region with the exception handler graph?

the end of a try/catch region probably splits a basic block

we may want to figure out how we need to operate on exception handlers before finalizing the decisions here. so maybe this is not a blocker, but something we add once we know we need it.

capa/features/extractors/dnfile/extractor.py

mike-hunhoff · 2023-02-28T22:09:24Z

I like the rendering in the screenshot you shared - lots of detail and the important stuff is highlighted. Do you think we should also try to include the function name, when possible? Using tokens is very precise and I don't think this should go away; however, I wonder if including the function name might give more context (especially when the names aren't obfuscated).

I'm honestly torn here. I think displaying un-obfuscated method names is valuable, however, my work has seen few samples where this is the case. More frequently, I see large, obfuscated names that would be a pain to handle (display) correctly.

We could take a route simliar to de4dot by attempting to identify and rename obfuscated method names but I see that potentially confusing users who pivot from capa to a tool like dnSpy.

Reading your original message I see you say "when possible" which could be a valid path forward, e.g., we only display method names when we can identify the name has not been obfuscated at all (or past an established threshold).

Co-authored-by: Moritz <[email protected]>

mike-hunhoff · 2023-02-28T22:23:53Z

Nice work. A few comments and maybe a larger question if this should maybe life in dncil or dnfile?

@mr-tz This came to mind about halfway through writing the code haha. I think the basic block logic likely belongs in dncil. Once we've addressed any issues here I'll move the basic block logic to dncil and update this PR to use the updated dncil APIs.

mike-hunhoff · 2023-03-01T00:58:38Z

Nice work. A few comments and maybe a larger question if this should maybe life in dncil or dnfile?

@mr-tz This came to mind about halfway through writing the code haha. I think the basic block logic likely belongs in dncil. Once we've addressed any issues here I'll move the basic block logic to dncil and update this PR to use the updated dncil APIs.

@mr-tz @williballenthin see mandiant/dncil#55

williballenthin · 2023-03-01T12:20:09Z

capa/features/extractors/dnfile/extractor.py

+                if idx == 0:
+                    # add #1
+                    leaders.add(insn.offset)


could pull this out of the loop so its not re-checked on each iteration

williballenthin · 2023-03-01T12:20:44Z

capa/features/extractors/dnfile/extractor.py

+            for idx, insn in enumerate(fh.inner.instructions):
+                if idx == 0:
+                    # add #1
+                    leaders.add(insn.offset)


Suggested change

for idx, insn in enumerate(fh.inner.instructions):

if idx == 0:

# add #1

leaders.add(insn.offset)

leaders.add(fh.inner.instructions[0])

for idx, insn in enumerate(fh.inner.instructions):

assuming there's at least one instruction for each function

williballenthin · 2023-03-01T12:21:40Z

capa/features/extractors/dnfile/extractor.py

+                    leaders.add(insn.offset)
+
+                if any((insn.is_br(), insn.is_cond_br(), insn.is_leave())):
+                    # add #2


Suggested change

# add #2

# add #2: targets of branches are leaders

williballenthin · 2023-03-01T12:22:39Z

capa/features/extractors/dnfile/extractor.py

+                if any((insn.is_br(), insn.is_cond_br(), insn.is_leave())):
+                    # add #2
+                    leaders.add(insn.operand)
+                    # add #3


Suggested change

# add #3

# add #3: fallthrough instructions after conditional branches are leaders

williballenthin · 2023-03-01T12:24:03Z

capa/features/extractors/dnfile/extractor.py

+            # calculate basic block leaders where,
+            #   1. The first instruction of the intermediate code is a leader
+            #   2. Instructions that are targets of unconditional or conditional jump/goto statements are leaders
+            #   3. Instructions that immediately follow unconditional or conditional jump/goto statements are considered leaders


#3: ... immediately follow unconditional ...

can you convince me of this? i think the instruction must be the target of a branch in order to be a leader. otherwise, its unreferenced code.

williballenthin · 2023-03-01T12:26:16Z

capa/features/extractors/dnfile/extractor.py

+                    fh.ctx["blocks"].append(bb_curr)
+                    continue
+
+                assert bb_curr is not None
+                bb_curr.instructions.append(insn)


Suggested change

fh.ctx["blocks"].append(bb_curr)

continue

assert bb_curr is not None

bb_curr.instructions.append(insn)

fh.ctx["blocks"].append(bb_curr)

else:

assert bb_curr is not None

bb_curr.instructions.append(insn)

optional style to clarify one of two cases happens here

williballenthin · 2023-03-01T12:28:04Z

capa/features/extractors/dnfile/extractor.py

+                if len(bb.instructions) == 0:
+                    # TODO: consider error?
+                    continue


Suggested change

if len(bb.instructions) == 0:

# TODO: consider error?

continue

assert len(bb.instructions) > 0

i dont see how this can ever not be the case, given that the bb is initialized with at least one element above. so, we could keep this assertion around, or also remove it.

williballenthin · 2023-03-01T12:28:41Z

capa/features/extractors/dnfile/extractor.py

+                if len(bb.instructions) == 0:
+                    # TODO: consider error?
+                    continue


Suggested change

if len(bb.instructions) == 0:

# TODO: consider error?

continue

assert len(bb.instructions) > 0

ditto

williballenthin · 2023-03-01T12:42:32Z

capa/features/extractors/dnfile/extractor.py

+                try:
+                    bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1]
+                    bb.succs.append(bb_next)
+                    bb_next.preds.append(bb)
+                except IndexError:
+                    continue


Suggested change

try:

bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1]

bb.succs.append(bb_next)

bb_next.preds.append(bb)

except IndexError:

continue

try:

bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1]

except IndexError:

continue

else:

bb.succs.append(bb_next)

bb_next.preds.append(bb)

make region that can throw and handle the exception as small as possible for clarity

is the IndexError a programming error? i suppose we'd hit that case at ret and/or any other instruction at end of function body. can we codify this in any way? not a priority.

williballenthin · 2023-03-01T12:49:22Z

I'm honestly torn here. I think displaying un-obfuscated method names is valuable, however, my work has seen few samples where this is the case. More frequently, I see large, obfuscated names that would be a pain to handle (display) correctly.

We could take a route simliar to de4dot by attempting to identify and rename obfuscated method names but I see that potentially confusing users who pivot from capa to a tool like dnSpy.

Reading your original message I see you say "when possible" which could be a valid path forward, e.g., we only display method names when we can identify the name has not been obfuscated at all (or past an established threshold).

I agree with everything you say here. I'd only recommend including the name when its useful and helpful. I do not think we should do any renaming or show any obfuscated names. So, if its reasonably easy to determine if a name is non-obfuscated, then I think we should maybe do that.

How to determine if non-obfuscated? I guess ASCII only, something around entropy, not mixing casing or numbers too weirdly, ...?

We should probably move this into a separate feature request.

mr-tz · 2024-03-22T10:15:16Z

@mike-hunhoff, do we want to keep this around here or close it?

mike-hunhoff · 2024-03-22T15:32:36Z

@mike-hunhoff, do we want to keep this around here or close it?

We can close - I'll link the corresponding dncil PR back to this for documentation purposes.

dotnet: add support for basic blocks

969403a

github-actions bot previously requested changes Feb 24, 2023

View reviewed changes

update CHANGELOG

c3778cf

PR changes

e99525a

use function address when emitting instructions

23d076e

mr-tz reviewed Feb 28, 2023

View reviewed changes

Update capa/features/extractors/dnfile/extractor.py

30272d5

Co-authored-by: Moritz <[email protected]>

mike-hunhoff mentioned this pull request Mar 1, 2023

add basic block support mandiant/dncil#55

Draft

williballenthin reviewed Mar 1, 2023

View reviewed changes

williballenthin mentioned this pull request Mar 1, 2023

show .NET function name when its helpful #1340

Open

mike-hunhoff marked this pull request as draft March 3, 2023 19:26

mr-tz added the dont merge Indicate a PR that is still being worked on label Oct 12, 2023

mike-hunhoff closed this Mar 22, 2024

mike-hunhoff mentioned this pull request Apr 10, 2024

add support for basic blocks mandiant/dncil#52

Open

akhilguruprasad22 mentioned this pull request May 1, 2024

Basic Blocks refactoring mandiant/dncil#112

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dotnet: add support for basic blocks #1326

dotnet: add support for basic blocks #1326

mike-hunhoff commented Feb 24, 2023

github-actions bot left a comment

mike-hunhoff commented Feb 27, 2023

mr-tz commented Feb 27, 2023

ghost commented Feb 27, 2023

mike-hunhoff commented Feb 27, 2023

mike-hunhoff commented Feb 27, 2023

ghost commented Feb 27, 2023 via email •

edited by ghost

Loading

ghost commented Feb 27, 2023 via email

mr-tz left a comment

mr-tz Feb 28, 2023

mike-hunhoff Feb 28, 2023

mr-tz Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

mike-hunhoff commented Feb 28, 2023

mike-hunhoff commented Feb 28, 2023 •

edited

Loading

mike-hunhoff commented Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023

williballenthin Mar 1, 2023 •

edited

Loading

williballenthin Mar 1, 2023

williballenthin commented Mar 1, 2023

mr-tz commented Mar 22, 2024

mike-hunhoff commented Mar 22, 2024

		bb_next: DnBasicBlock = fh.ctx["blocks"][idx + 1]
		bb.succs.append(bb_next)

	# add #3
	# add #3: fallthrough instructions after conditional branches are leaders

dotnet: add support for basic blocks #1326

dotnet: add support for basic blocks #1326

Conversation

mike-hunhoff commented Feb 24, 2023

github-actions bot left a comment

Choose a reason for hiding this comment

mike-hunhoff commented Feb 27, 2023

mr-tz commented Feb 27, 2023

ghost commented Feb 27, 2023

mike-hunhoff commented Feb 27, 2023

mike-hunhoff commented Feb 27, 2023

ghost commented Feb 27, 2023 via email • edited by ghost Loading

ghost commented Feb 27, 2023 via email

mr-tz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mike-hunhoff commented Feb 28, 2023

mike-hunhoff commented Feb 28, 2023 • edited Loading

mike-hunhoff commented Mar 1, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

williballenthin Mar 1, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

williballenthin commented Mar 1, 2023

mr-tz commented Mar 22, 2024

mike-hunhoff commented Mar 22, 2024

ghost commented Feb 27, 2023 via email •

edited by ghost

Loading

mike-hunhoff commented Feb 28, 2023 •

edited

Loading

williballenthin Mar 1, 2023 •

edited

Loading