-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fcd does not handle tail calls #34
Comments
Hej Félix, Thanks a lot for working on fcd! I feel very happy that you've decided to do small write-ups on your decompiler development adventures; where you document your progress, ideas, problems, set-backs and major wins at http://zneak.github.io/fcd/ They have been a good source of inspiration! If you haven't had a chance to see it yet, I'd very warmly recommend watching the talk Alessandro Di Federico presented at a LLVM developer conference in late 2016, where he introduced rev.ng which turns machine code into LLVM IR, using Qemu IR as an intermediate step in the translation. Using Qemu is interesting, and it comes with its own set of benefits and drawbacks. What rev.ng performs well at is (sometimes non-trivial) control flow analysis, by propagating constraints on register values (similar to what you do during type analysis). Related to this issue, rev.ng identifies tail calls and functions in a generic way, first by identifying candidate function addresses, and later pruning these candidate function addresses based on a set of constraints (single entry basic block (if I recall correctly), etc). A tail call may then be identified by a call from a function which jumps past other known functions; in other words, by a call to an address which is not part of the address region of the caller function. I've been playing around with some of these concepts the last couple of days, in a toy experiment to lift 32-bit x86 assembly to LLVM IR. Relevant extract from the // isTailCall reports whether the given instruction is a tail call instruction.
func (d *disassembler) isTailCall(funcEntry bin.Address, inst *instruction) bool {
...
if funcEntry <= target && target < funcEnd {
// Target inside function.
return false
} The target address of a JMP instruction is compared against known function start (entry basic block) and function end addresses (the last address of the function that is not part of succeeding data (e.g. switch jump tables) or other functions). I hope this may give some ideas on how tail calls may be identified, and hopefully you may end up incorporating some useful parts and finding yet other ways to identify them. Wish you all the best. Happy reversing! Cheers /u |
Alessandro reached out to me a few months ago, after his talk, so I saw it. :) The problem is not so much the idea as the time and interest that I have to put into this. Fcd's disassembler is very crude and not very good, and that's what's holding up the fix. It's one of the last concepts that have stayed almost identical since the beginning of the project, when I had a three-month deadline to make something that works, and I had to cut a few (many) corners to make that work. Right now, the disassembler does a single pass that does function identification, control flow reconstruction, and lifting, all at once. Because of that, it misses out on a lot of information that could be gathered if each of these tasks were a separate step. It would be much easier to solve tail calls and jump tables with a better disassembler. The other thing is that I'm not sure that I want to get into the disassembler business. It's an extremely difficult problem and some people already do it much better. I'm floating ideas to use disassembly input from radare, Bininja, Hopper, IDA, or any other useful source. Once this problem is solved, identifying tail calls is straightforward, I think. You would just need to check if the destination is a known function's entry point. You could also make assumptions based on the state of the stack frame, since tail calls pop their stack just like it would for a normal return. I don't really like the address range check (because functions don't have to be contiguous, although they usually are in cooperative programs), but rev.ng's logic looks like what I'd like to do for the same task. Thanks for your interest! I love to get feedback on the stuff that I do. |
Understandably so. This is the path that remill and MC-Semantics have taken, using primarily IDA and Binary Ninja for now.
I would agree, for the most part. Identifying tail calls to static addresses is straight forward. Tail calls to dynamic addresses (e.g.
Fair point. The preliminary solution to this problem in
Very happy to. It's great to see so many different approaches to a somewhat tricky problem : ) |
For reference, |
Functions that use the tail call optimization end with a
jmp
instruction instead of aret
statement. Because of that, fcd will inline the target function instead of calling it and returning its result.The text was updated successfully, but these errors were encountered: