Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency on capstone for PowerPC disassembly #6292

Open
wants to merge 13 commits into
base: dev
Choose a base branch
from

Conversation

cryptwhoa
Copy link

This PR removes the dependency on capstone for disassembling powerpc.

Capstone has worked admirably up until now, but it's nice to have something that you can have a little more control over. Some of its drawbacks:

  • There are bugs in capstone, and at times the existing powerpc architecture plugin has to work around its limitations; see the "LocalDisassembly" stuff for missing opcodes, or there are times when capstone treats something as the wrong signedness of integer.
  • Some design decisions make capstone a little less ergonomic to work with. One thing is that the internal representation of each instruction (the "details" part), at least for powerpc, is designed to make the disassembly text easy to work with, but not so much for program analysis. One example of this is the cmp* instructions in powerpc, where the condition register is omitted from the disassembly text if it's cr0. For program analysis purposes (ie lifting IL), it's cleaner to include the implied cr0 in the internal representation. But with capstone, the first operand has to be checked to see if it's is a condition register or a GPR register to figure out if it's an implied cr0 or a specified condition register.
  • If bugs are found in capstone, fixing them is nontrivial; there's a whole template language to learn.
  • The way that capstone is being used right now, there's 2 mallocs/2 frees for every instruction, it would be nice to do away with this overhead.
  • If we want to add support for a version of powerpc not currently in capstone (like VLE for example), we...just can't, without rewriting this whole thing anyways. It would be nice to add upstream support, but different projects always prioritize different things.

This "homegrown" version addresses those:

  • It handles the bugs found in capstone, and if any are found in the future, fixing them doesn't require a local disassembly hack in the binaryninja architecture.
  • Internal representations prioritize IL lifting, so generally all of the operands, even those that are implied, are always present.
  • It's all written in C/C++. If a binaryninja user finds a bug, they don't have to add hacks around the LocalDisassembly, or learn a template language to fix it upstream. It's even simple to add a fix locally to limp along, because it's all written in C/C++.
  • There's no dynamic memory allocation

The natural question is "ok so this version fixes the bugs in capstone, but how do you know this doesn't just introduce new/different bugs ?" There are only 4 billion possible instructions, so it's really not that bad to just do for (i = 0; i < 0xffffffff; ++i), decode with this homegrown version, and decode with capstone, and compare them. I did that (see capstone_compare_test.c), so that if there are any bugs in this homegrown implementation, at least they're already present, so this is strictly a step forwards with some bugs and not a step backwards with any others, known or unknown.

NOTES:

  • There are a small handful of differences between the homegrown and capstone that still get reported by the script. These are mostly restricted to tlb instructions, which are implementation dependent anyways. There are also a few reported instructions (tsr, tbegin) that I believe to be bugs in capstone, but didn't add special cases for those in the script.
  • The test script was originally developed against an older version of capstone (4.x), the version binary ninja used at the time I started, and was later updated to handle the version of capstone that it uses at the time of this writing (5.0.3). Some of the capstone bugs that the script mentions may be from 4.x and may be irrelevant now in 5.0.3.
  • This only accounts for discrepancies between capstone and the written documentation. If there are errata in the documentation that capstone accounts for, they're not reflected in the homegrown version. Cursory greps for "err" and "hardware" in capstone/suite/synctools/tablegen/PPC didn't reveal anything, so I'm not sure if there are any cases like this.
  • The comparison file (capstone_compare_test.c) doesn't compile out-of-the-box, since there's a namespace collision with the capstone's register enum. See the comments in the build script, build_capstone_compare.sh; it's not too bad, it's just a find/replace in a few files, that has to be reverted back for before building the powerpc binary architecture.
  • This doesn't handle the other variants of powerpc (QPX, paired-store) yet. I figured I'd check to make sure I'm not wasting my time with this before I finish those out.

noone added 13 commits December 23, 2024 12:47
This commit just brings it in and makes sure it builds, but doesn't
change anything in the arch plugin to use it yet.
Decoding instructions don't need these, but it's useful for the
architecture plugin.
We take a divergence from capstone: instead of treating every single
branch pseudo-op as its own distinct instruction ID, we just group all
of the BCx, BCLRx, and BCCTRx in a group, and just change the mnemonic
depending on the value of BI and BO.  This will drastically simplify the
arch plugin for things like getting instruction info and lifting.
Note that this hasn't been tested against capstone 5.0.3 yet: I said
"ooh it looks like binaryninja supports SPE now", added it, then when I
went to test it, it looks like capstone 5.0.3 doesn't support it yet
(`CS_MODE_SPE` isn't yet in the allowed bitmask for powerpc in
the `arch_configs` table in `cs.c`).

There's enough work that I'm leaving it in here for now.
This was getting unwieldly.
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


noone seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@NotNite
Copy link

NotNite commented Jan 2, 2025

I figured I'd check to make sure I'm not wasting my time with this before I finish those out.

A while ago (August), I was informed on Twitter that a Rust-based PPC arch plugin may be coming: https://x.com/vector35/status/1823830788976021840. Unsure if there's any news on that...

@plafosse plafosse requested a review from galenbwill January 22, 2025 14:01
@plafosse
Copy link
Member

@NotNite yes this is the start of that work!

@galenbwill galenbwill self-assigned this Jan 22, 2025
@galenbwill galenbwill added the Arch: PowerPC Issues with the PowerPC architecture plugin label Jan 22, 2025
@galenbwill galenbwill added this to the Gallifrey milestone Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arch: PowerPC Issues with the PowerPC architecture plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants