Interactivity or scripting #20
Comments
I don't have much experience with integrating Python interpreters into C++ projects or the other way around, but the linux.py and ELF.py approach sounds good. It's actually how Mach-O and, I think, PE binary formats are supported. ELF is hardcoded in fcd. I think one could actually get something like "function filtering" to work by delegating some of the entry point discovery from fcd to the scripts. Discover some entry points in the .py scripts, omit stuff like The issue I see with this approach is that some entry points are discovered using Remill via recursive descent disassembly, and I'm not sure whether that would reintroduce some of the entry points the scripts filtered out. Then again, one could also pass a list of filtered entry points from the script to fcd and have fcd omit them as well. |
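The filtering idea in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the `(address, name)` tuple shape, the `ELIDED_NAMES` set, and both function names are hypothetical and are not part of fcd or Remill.

```python
# Hypothetical policy: libc bring-up routines a script might want to hide.
ELIDED_NAMES = {"_start", "__libc_start_main", "__libc_csu_init", "__libc_csu_fini"}


def filter_entry_points(entry_points):
    """Split (address, name) pairs into kept entry points and filtered addresses."""
    kept = []
    filtered = set()
    for address, name in entry_points:
        if name in ELIDED_NAMES:
            filtered.add(address)
        else:
            kept.append((address, name))
    return kept, filtered


def drop_rediscovered(discovered_addresses, filtered):
    """Remove addresses that a later discovery pass found again."""
    return [addr for addr in discovered_addresses if addr not in filtered]
```

The second function models the concern raised above: if recursive descent finds a filtered address again, the decompiler side can consult the same filtered set and drop it.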
To add, I think the scripting approach is strictly better than the interactive one. But that's just my opinion. |
This raises one question for me, which is: should the main binary loading / parsing be done by C++ code? If we made fcd's C++ side cooperate with a Python side, then we could bring in third-party packages like Angr's cle to load in binary images, and have the C++ side actually invoke CLE to do the reading. I envision something like microx, where a class is provided that can be extended, and the extension implements methods for reading virtual memory, etc. This would then generalize to handling actual process memory dumps. |
Yeah, this sounds pretty good. And I think fcd actually has some support for this already, from glancing over |
I can also imagine *.py scripts being very useful in scenarios with packed and / or encrypted executables. |
So maybe something like...

    import sys

    import cle
    import fcd

    # Memory abstraction that will let the decompiler read memory. You could
    # implement Memory here by invoking APIs from cle, Binary Ninja, IDA Pro, etc.
    # You could also provide info to fcd from a McSema-lifted CFG file, which
    # contains rich info.
    class ExecutableMemory(fcd.ExecutableMemory):
        def __init__(self, ld):
            self.ld = ld

        def read(self, addr, num_bytes):
            # do something with self.ld, returning a list, tuple, or bytearray
            raise NotImplementedError

    # Path to the binary is the first command-line argument
    ld = cle.Loader(sys.argv[1])
    memory = ExecutableMemory(ld)
    decomp = fcd.Decompiler(memory)
    decomp.add_entrypoint(0xf00, name="main")

    # Fill in other named entrypoints from ld

    # Maybe bring in Angr's CFGFast to invoke other APIs,
    # e.g. decomp.mark_as_function() or something. Down
    # the line, having the ability to mark indirect xrefs would
    # be nifty.

    # Now lift to bitcode
    bc = decomp.lift()

    # Show me the bitcode!
    bc.dump(address=0xf00)
    bc.dump(name="main")

    # Eventually we could implement the emulator test suite
    # via whatever bc is, e.g. bc.execute(cpu), where cpu is
    # an object of a class implementing methods like
    # read_register and read_memory.
    bc.set_calling_convention(...)
    bc.decompile(address=0xf00)
    bc.decompile(name="main")

|
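The `read` method left as a comment in the proposal above could be backed by something as simple as a list of mapped segments. The sketch below assumes a stand-in `ExecutableMemory` base class, since the proposed `fcd.ExecutableMemory` does not exist yet; `SegmentMemory` and the segment tuple layout are hypothetical names chosen for illustration.

```python
class ExecutableMemory:
    """Stand-in for the proposed (not yet existing) fcd.ExecutableMemory."""

    def read(self, addr, num_bytes):
        raise NotImplementedError


class SegmentMemory(ExecutableMemory):
    """Serves reads from a list of (base_address, data) segments."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda seg: seg[0])

    def read(self, addr, num_bytes):
        for base, data in self.segments:
            offset = addr - base
            # Only satisfy reads that fall entirely inside one segment.
            if 0 <= offset and offset + num_bytes <= len(data):
                return bytes(data[offset:offset + num_bytes])
        raise ValueError(f"unmapped read of {num_bytes} bytes at {addr:#x}")
```

The same interface would presumably work for a process memory dump: only the source of the segments changes, not the decompiler-facing `read` call.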
I think your example looks good, but it's also the reverse of what fcd currently does. Currently fcd uses Python to parse executables. For example...

    import bisect

    import pefile

    # helper globals
    stubs = {}
    sectionStart = []
    sectionInfo = {}

    # fcd interface below (I assume this is what fcd's C++ Executable class requires)
    executableType = "Portable Executable"
    targetTriple = "unknown-unknown-win32"
    entryPoints = []

    def init(data):
        # fill stubs, sectionStart, sectionInfo, ...
        pass

    def getStubTarget(target):
        # returns the target of a stub function (library functions, etc)
        pass

    def mapAddress(address):
        # maps virtual addresses to actual addresses in the binary
        pass

The above script is then passed to fcd via a command-line flag, for example In your example it seems to me that fcd would be more of a library with Python bindings rather than a standalone executable, which I'm not opposed to, but I assume it would be a bit more work. That being said, a C++ library with Python bindings seems to be the way a lot of projects go nowadays, so why do something different. |
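Given the `bisect` import and the `sectionStart` / `sectionInfo` globals in the script outline above, `mapAddress` is presumably a sorted-list lookup. Here is a minimal sketch under that assumption. The section values are made up, the `(file_offset, size)` layout of `sectionInfo` is a guess, and returning `None` for unmapped addresses is also an assumption rather than fcd's documented contract.

```python
import bisect

# Hypothetical section table: sorted virtual start addresses, and for each
# start address a (file_offset, size) pair.
sectionStart = [0x1000, 0x3000]
sectionInfo = {0x1000: (0x400, 0x1800), 0x3000: (0x1C00, 0x200)}


def mapAddress(address):
    """Map a virtual address to a file offset, or None if it is unmapped."""
    # Find the last section whose start is <= address.
    index = bisect.bisect_right(sectionStart, address) - 1
    if index < 0:
        return None
    start = sectionStart[index]
    file_offset, size = sectionInfo[start]
    delta = address - start
    if delta >= size:
        return None  # falls in the gap after this section
    return file_offset + delta
```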
I think library-ifying it is something I could pull together in a reasonable amount of time. It'd be pretty cool to expose fcd to Binary Ninja, for example. |
That I completely agree with. |
This is more of a braindump for a longer-term way of using fcd. When I look at decompiled code, one thing I notice is that there are lots of libc bringup routines (e.g. __libc_start_main) that could probably be elided, but coming up with a good policy for what to elide and when is not straightforward.

Another thing that comes up is how to specify things like headers to fcd in order for it to do a better job with decompilation. Interactivity has come up as an option, and I think that might be how fcd originally did things.

I think that a nice alternative might be a more scripting-oriented approach. It would be similar-ish to interactive, but permit more re-use down the line. For things like libc stuff, you can have a file like linux.py or ELF.py that just "does the right thing" for eliding stuff. Scripting may also enable things like specifying headers.

I'm not sure if scripting should be done via embedding a Python interpreter (this is what PointsTo did, and it worked reasonably well; it would mean a new command-line argument would be something like --script or something), or making fcd into a Python module (a bit harder, might make it easier to integrate with other stuff).

Any thoughts?
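For the embedded-interpreter option, the host side of a `--script` flag would roughly need to load a user file and call whichever hooks it defines. The sketch below models that host logic in Python purely for illustration. A real embedding would go through the CPython C API or a binding layer from the C++ side, and the module name `fcd_user_script` and the functions `load_script` and `call_hook` are hypothetical.

```python
import importlib.util


def load_script(path):
    """Load a user script (e.g. a linux.py) from an arbitrary file path."""
    spec = importlib.util.spec_from_file_location("fcd_user_script", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load script from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def call_hook(module, hook_name, *args, default=None):
    """Call an optional hook if the script defines it, else return default."""
    hook = getattr(module, hook_name, None)
    if callable(hook):
        return hook(*args)
    return default
```

Treating hooks as optional means a script like linux.py only has to define the policies it cares about, such as which names to elide.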