Skip to content
This repository has been archived by the owner on Jun 16, 2019. It is now read-only.

Misc fixes #42

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Conversation

nickolas-pohilets
Copy link

@nickolas-pohilets nickolas-pohilets commented Jan 3, 2019

This MR contains assorted bug fixes, bringing fcd+remill to a point where it is able to successfully decompile a simple function using debug build of LLVM-7.0 and latest version of Remill.

LLVM versions pre 7.0 have a bug where removal of the dereferencable attributes triggers an assertion in debug build and reads memory out of bounds in release build, fixed in 9bc0b1080f195636fed019bce979aa72892d6c69.

@CLAassistant
Copy link

CLAassistant commented Jan 3, 2019

CLA assistant check
All committers have signed the CLA.

@pgoodman pgoodman requested a review from surovic January 3, 2019 21:50
fcd/pass_argrec_remill.cpp Show resolved Hide resolved
fcd/pass_argrec_remill.cpp Outdated Show resolved Hide resolved
fcd/pass_argrec_remill.cpp Outdated Show resolved Hide resolved
fcd/pass_argrec_remill.cpp Outdated Show resolved Hide resolved
fcd/pass_argrec_remill.cpp Outdated Show resolved Hide resolved
fcd/pass_argrec_remill.cpp Outdated Show resolved Hide resolved
@surovic
Copy link

surovic commented Jan 4, 2019

First of all, thanks a lot @nickolas-pohilets for the PR! I added some comments, let me know if you have any questions or comments to my comments :)

Also I'd like to let you know that there's a lot of work done on the AST part of fcd in the dev-clang-ast branch of the repo. It's basically a complete rewrite of the AST backend so I personally would not spend too much time on the old AST backend in fcd. The new backend is not at parity with the old one yet though.

@nickolas-pohilets
Copy link
Author

Is there a plan to sync back with zneak/fcd? Should project use coding style of the original project or conform with one of trailofbits?

@surovic
Copy link

surovic commented Jan 5, 2019

Should project use coding style of the original project or conform with one of trailofbits?

I personally use the trailofbits coding standards in all files with a trailofbits license header. In original fcd code I try to adhere to the original coding style. Generally as a rule of thumb -- make your code blend in with the surrounding code.

Is there a plan to sync back with zneak/fcd?

@pgoodman what do you think? I'm inclined to say no, since the divergence is pretty significant and I think that overall priorities have shifted away from fcd as a tool. The functionality will be covered and expanded upon by other tools (McSema with the DynInst frontend) and a tool based on the new AST backend from fcd.

@nickolas-pohilets what are your use-cases for fcd? We could take them into consideration when developing the other tools.

@pgoodman
Copy link
Collaborator

pgoodman commented Jan 5, 2019

@nickolas-pohilets There is no plan to do that. The original scope of the project was to port fcd to use Remill because the lifting approaches shared many similarities. Since then, the scope and direction of the project has changed somewhat.

@pgoodman
Copy link
Collaborator

pgoodman commented Jan 5, 2019

@surovic Returning a container type is usually acceptable because of RVO.

@nickolas-pohilets
Copy link
Author

I want to make a decompiler for Objective-C based on fcd+remill. Compiled Objective-C binaries contain lots of meta-information. As a reference, class-dump can generate headers based on that. Decompiling Objective-C into relatively high-quality source code seems to be a pretty low-hanging fruit.

Few things from the top of my head what will be need:

  • Wider interface of the executable to get ObjC metadata. Probably returning a section by name. Using llvm::object::ObjectFile might be a good idea.
  • Special signature recovery for calls of objc_msgSend.
  • AST able to represent Objective-C features. The one from Clang should do the job.
  • Lots of work in the backend to reconstruct the AST.

@nickolas-pohilets
Copy link
Author

What is the current status of support of LLVM 7.0 in Remill and McSema? I've quickly checked travis scripts in Remill, McSema and cxx-common, and 7.0 version is not mentioned anywhere.

@pgoodman
Copy link
Collaborator

pgoodman commented Jan 7, 2019

@nickolas-pohilets I think it's a matter of building cxx-common for LLVM 7.0. The latest issues we've been having is actually more to do with RTTI, so that we can support DynInst as a frontend.

@nickolas-pohilets
Copy link
Author

nickolas-pohilets commented Jan 9, 2019

So, what is then plan to get these changes merged?

@nickolas-pohilets
Copy link
Author

Sorry, closing by mistake.

@surovic
Copy link

surovic commented Jan 9, 2019

Just about to check out the PR, see if it builds on my machine and merge if they do. Stay tuned!

@surovic
Copy link

surovic commented Jan 9, 2019

Well, unfortunately there's a problem with building against LLVM-4.0 namely due to
a7d6c63 using llvm::Value::deleteValue() which does not exist prior to LLVM-5.0.

@nickolas-pohilets do you think you can fix this?
@pgoodman are we going to keep full LLVM-3.5 to LLVM-7.0 compatibility?

Edit: Another incompatibility is in 8e04ec1, since llvm/Transforms/Utils.h does not exist prior to LLVM-7.0. Take a look fcd/compat/Scalar.h for a hint at how we resolve header compatibility issues like these.

@nickolas-pohilets
Copy link
Author

I propose to drop pre-LLVM 7.0, if possible. As I mentioned in the PR description, there is a bug that got fixed only in LLVM 7.0

@surovic
Copy link

surovic commented Jan 9, 2019

I'm going to put this on hold until we hear from @pgoodman

@nickolas-pohilets
Copy link
Author

I think I can re-write the code to avoid buggy function to make it work with older versions, but I’m not sure if it is worth the effort. In contrast to Remill, FCD currently does not have any clients to keep compatibility with.

@surovic
Copy link

surovic commented Jan 9, 2019

To be honest, I think the fork is currently maintained only because some code might be useful in other projects, like McSema. FCD doing it's own CFG recovery is more of a hindrance than a feature, since lots of other tools provide this functionality and there's only so much developer time available for FCD. IMO the real value of FCD, considering McSema covers LLVM IR generation, is in it's C AST backend.

@nickolas-pohilets
Copy link
Author

Agree. Shall be we then go one step further and turn FCD+Remill into FCD+McSema+Some CFG Recovery?

@pgoodman
Copy link
Collaborator

pgoodman commented Jan 9, 2019

Ideally I want compat with older versions of LLVM. The way I tend to handle that is to add code into https://github.com/trailofbits/remill/tree/master/remill/BC/Compat

@pgoodman
Copy link
Collaborator

pgoodman commented Jan 9, 2019

      auto inst = expression->getAsInstruction();
      auto res = ctx.uncachedExpressionFor(*inst);
      inst->deleteValue();
      return res;

@nickolas-pohilets Can you deal with the leak by attaching inst to a basic block somewhere, then invoke inst->eraseFromParent()?

@surovic
Copy link

surovic commented Jan 9, 2019

The step has already been more or less made 🙂. Rellic is the clang-based C AST backend that lived in FCD's dev-clang-ast branch. It was private for a while, but we figured there's no reason not to make it public at this point.

So the toolchain we are currently looking at is something like McSema+Rellic, where CFG recovery is done by one of McSema's frontends. The main McSema frontend is IDA Pro currently, but there is development being done on a Binary Ninja and the DynInst frontend.

@nickolas-pohilets
Copy link
Author

Then, probably, I should already abandon FCD altogether and start hacking on Rellic. Shall we organise a (video) call some time next week to discuss collaboration? Pls pm me.

@pgoodman
Copy link
Collaborator

pgoodman commented Jan 9, 2019

Sure. What is your username on EH slack?

@nickolas-pohilets
Copy link
Author

@mykola Pokhylets

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants