Go binary analysis to find source packages #2

Closed
pombredanne opened this issue Aug 1, 2023 · 16 comments

@pombredanne
Member

I would like to analyze a Go binary and find which source packages were used to build it, and more. The goal is to recognize a binary as built from Go, extract data from it, map that data back to sources and open source repos, and eventually inject that into the flow to create an SBOM in ScanCode/ScanCode.io.

Go binaries can be ELF, PE or Mach-O files. The initial focus should be on ELFs.
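As a rough illustration of the "recognize a binary as from Go" step, here is a minimal Python sketch. It assumes that checking magic bytes is enough to tell the container format, and that the presence of the Go build info marker (`\xff Go buildinf:`) is an acceptable first-pass heuristic for "this is a Go binary". The function names are hypothetical, and a real tool would parse the pclntab instead, since the marker can be absent in old or tampered binaries.

```python
# Heuristic sketch: container format from magic bytes, Go from the build info marker.
GO_BUILDINFO_MAGIC = b"\xff Go buildinf:"

# Mach-O magics as they appear in file order (32/64-bit, both endiannesses, fat).
# Note: the fat magic 0xCAFEBABE is shared with Java class files.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf",
    b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe",
    b"\xca\xfe\xba\xbe",
}


def detect_format(data: bytes) -> str:
    """Return 'elf', 'pe', 'macho' or 'unknown' based on leading bytes."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:4] in MACHO_MAGICS:
        return "macho"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        pe_offset = int.from_bytes(data[0x3C:0x40], "little")
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "pe"
    return "unknown"


def looks_like_go(data: bytes) -> bool:
    """Heuristic only: True if the Go build info marker occurs anywhere."""
    return GO_BUILDINFO_MAGIC in data


def inspect_file(path):
    """Read a file and report its container format and the Go heuristic."""
    with open(path, "rb") as f:
        data = f.read()
    return {"format": detect_format(data), "is_go": looks_like_go(data)}
```

Calling `inspect_file("/path/to/binary")` returns a small dict that could feed the type detection part of the JSON output.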

To get the details of what's in a binary there are a couple avenues:

  • matching to PurlDB: not in scope for this library for now
  • collecting the standard binary symbols and debug symbols (from ELF, PE or Mach-O): not in scope for this library for now, as this may be handled elsewhere
  • collecting Go-specific strings and symbols such as the pclntab: the focus of this issue

The first step is to determine the list of all third-party Go modules included in a binary. I would like a CLI tool with a UI similar to that of ScanCode Toolkit, python-inspector and nuget-inspector that would:

  • accept a Go binary as an input argument
  • dump results of found Go packages in a JSON output format modelled after the ScanCode toolkit format for the packages section
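To make the JSON output idea concrete, here is a minimal Python sketch that turns a list of (module path, version) pairs into a `packages` list with Package URLs. The field names (`type`, `namespace`, `name`, `version`, `purl`) follow the ScanCode package data model, but the exact output layout is an assumption, not a settled design. Lowercasing the purl namespace and name follows my reading of the purl spec for the `golang` type; the module names below are only illustrative inputs.

```python
import json
from urllib.parse import quote


def go_module_purl(module_path, version=None):
    """Build a pkg:golang Package URL from a Go module path and optional version."""
    # Assumption: purl namespace and name are lowercased for the golang type.
    namespace, _, name = module_path.lower().rpartition("/")
    parts = [quote(seg, safe="") for seg in namespace.split("/") if seg]
    parts.append(quote(name, safe=""))
    purl = "pkg:golang/" + "/".join(parts)
    if version:
        # Percent-encode the version, e.g. "+incompatible" becomes "%2Bincompatible".
        purl += "@" + quote(version, safe="")
    return purl


def packages_section(modules):
    """Return a dict with a ScanCode-like 'packages' list for (path, version) pairs."""
    packages = []
    for path, version in modules:
        namespace, _, name = path.rpartition("/")
        packages.append({
            "type": "golang",
            "namespace": namespace,
            "name": name,
            "version": version,
            "purl": go_module_purl(path, version),
        })
    return {"packages": packages}


if __name__ == "__main__":
    # Illustrative input only; a real run would get this list from the binary.
    demo = [("github.com/spf13/cobra", "v1.7.0"), ("golang.org/x/sys", "v0.10.0")]
    print(json.dumps(packages_section(demo), indent=2))
```

The resulting PURLs are what would later be used as lookup keys in PurlDB and VulnerableCode.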

Some candidate libraries include:

Beyond this for Go strings, see mandiant/flare-floss#845 by @Arker123 and mandiant/flare-floss#807

@williballenthin

I've recently described a few workflows that I use when invoking GoReSym on an unknown Go executable: http://www.williballenthin.com/post/analyzing-go-programs-with-goresym/

Notably:

[image: not preserved]

@pombredanne
Member Author

@williballenthin Thank you ++ for dropping by! 🙇 your insights are super useful... Here the goal would be to integrate goresym as a library in a larger Go executable (or maybe reuse as-is?).

The initial usage would be: given some Go binary found in a codebase that we analyze for origin using a ScanCode.io pipeline, I want to:

  • find all the corresponding open source packages used in the binary
  • determine the version
  • look them up in PurlDB by PURL to collect details of origin and license
  • look them up in VulnerableCode by PURL to determine if there are known vulnerabilities

Later:

  • if there is a known vulnerability, find the fix commit and determine if the vulnerable code may be reachable.

NB: overall this is similar to related "binary" analysis we are working on for Java, JS and ELFs.

@williballenthin

I'm not sure that GoReSym is set up to be used as a library today, but I suspect that @stevemk14ebr (the primary author) might be convinced to add support. If you decide to go this route using GoReSym and need any support or changes, please don't hesitate to open an issue at that repository so we can plan and implement.

@CatalinStratu
Collaborator

Hi @williballenthin, I used https://github.com/goretk/gore but I will try GoReSym as well. If you want, I could also contribute to GoReSym to add the functionality we need.

@CatalinStratu
Collaborator

@pombredanne, we can use https://github.com/goretk/gore and the GetPackages method from this library to implement this functionality

@stevemk14ebr

I will be working on making GoReSym easy to use as a library soon for integration with other tools we use on flare. Until that occurs, if you decide to use it, you could subprocess to it and parse the json output.
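The subprocess approach could look roughly like the Python sketch below. It assumes a `GoReSym` executable is on the PATH, that it prints a single JSON document to stdout when given a binary path, and that the JSON carries a `BuildInfo` object with a `Deps` list of `Path`/`Version` entries. Those key names are assumptions to verify against real GoReSym output, and the helper names are hypothetical.

```python
import json
import subprocess


def run_goresym(binary_path, executable="GoReSym", runner=subprocess.run):
    """Run GoReSym on a binary and return its parsed JSON output.

    `runner` is injectable so the function can be exercised without GoReSym.
    """
    proc = runner(
        [executable, binary_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)


def dependencies(goresym_data):
    """Extract (module path, version) pairs from parsed GoReSym output.

    Assumption: dependencies live under BuildInfo -> Deps, each with
    'Path' and 'Version' keys. Missing keys yield an empty list.
    """
    build_info = goresym_data.get("BuildInfo") or {}
    deps = build_info.get("Deps") or []
    return [(dep.get("Path"), dep.get("Version")) for dep in deps]
```

With GoReSym installed, `dependencies(run_goresym("/path/to/binary"))` would give the list of modules to convert to PURLs.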

An advantage it may have over other projects is its focus on recovery of information even with obfuscated and malformed binaries.

CatalinStratu added a commit to CatalinStratu/go-inspector that referenced this issue Aug 2, 2023
@CatalinStratu
Collaborator

CatalinStratu commented Aug 2, 2023

PR: #3

@pombredanne
Member Author

@CatalinStratu IMHO the next steps are to:

  • Design the JSON output (including some detection of type, Go version, etc.). See what we do in the elf-inspector for a start, as well as the scancode-toolkit output, as there is work to do just to get the Go packages as a proper list of PURLs.

  • next, work on a pipeline to integrate this in ScanCode.io

  • And before all this, IMHO you should work to enable GoReSym to be used as a library (to support @stevemk14ebr work). There are a couple reasons:

  1. I have long used code written by Mandiant folks and @williballenthin in particular and I highly respect what they do. And they care and are responsive when needed.

  2. The AGPL license of gore may not always mesh well with our Apache license.

  3. gore seems to be a flatliner or very low activity project https://github.com/goretk/gore/pulse/monthly while https://github.com/mandiant/GoReSym/pulse/monthly is actively developed
    https://github.com/goretk/redress/pulse/monthly is also a flatliner/low activity project

  4. I do not know the authors of https://github.com/goretk/gore/graphs/contributors and in fact many seem to hide their identity. This does not give me the warm and fuzzy feeling I long for, even though the core gore author has submitted code to Mandiant in "Fix some FPs in ELF files by adding os requirement" (mandiant/capa-rules#454) and even though @stevemk14ebr contributed to https://github.com/goretk/pygore/commits?author=stevemk14ebr

  5. gore is 4+ years old and GoReSym is barely a year+ old but is already as popular or more so

@williballenthin @stevemk14ebr What was your rationale for starting GoReSym when gore already existed?

@stevemk14ebr I reckon you mentioned redress as an inspiration in https://www.mandiant.com/resources/blog/golang-internals-symbol-recovery

@pombredanne
Member Author

@CatalinStratu one thing that may be missing is a proper test suite.

Your initial intuition to collect existing pre-built binaries as test cases is IMHO a good one, but these cannot be in the main git repo here as they would be too big. We could use a git module for these... So IMHO you should start collecting a few test binaries, prebuilt by others and with a well-known origin, source code and license, for testing.

@pombredanne
Member Author

pombredanne commented Aug 5, 2023

@CatalinStratu for tests, there may be several Go pre-built binaries available in Linux distros (Fedora, Alpine and Debian) for container-related things.
And hey! Even GoReSym could be used as a test binary: https://github.com/mandiant/GoReSym/releases/tag/v2.4

The key here is to track carefully the origin and license of ALL these test files, ideally using an ABOUT file for each, see https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/bashlex.py.ABOUT for an example.
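For illustration, a small Python sketch that writes an ABOUT file next to each test binary. It assumes the usual AboutCode fields (`about_resource`, `name`, `version`, `download_url`, `license_expression`); the helper names and the example values are hypothetical, and the license must be filled in from the actual upstream project rather than guessed.

```python
from pathlib import Path


def about_text(fields):
    """Render 'key: value' lines for an ABOUT file, skipping empty values."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items() if value)


def write_about(resource_path, **fields):
    """Write <resource>.ABOUT next to a test binary and return its path.

    `about_resource` and `name` default to the file name of the binary.
    """
    resource = Path(resource_path)
    data = {"about_resource": resource.name, "name": resource.name}
    data.update(fields)
    about_path = resource.with_name(resource.name + ".ABOUT")
    about_path.write_text(about_text(data), encoding="utf-8")
    return about_path


if __name__ == "__main__":
    # Hypothetical example values; set license_expression after checking upstream.
    print(about_text({
        "about_resource": "GoReSym",
        "name": "GoReSym",
        "version": "v2.4",
        "download_url": "https://github.com/mandiant/GoReSym/releases/tag/v2.4",
        "license_expression": None,
    }))
```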

Note that the test suite in an external repo will be also usable by GoReSym.

@stevemk14ebr

Motivations for GoReSym were primarily to enable better support of obfuscated binaries as we see them often, and additionally to base our code off of the Go runtime itself (hence why GoReSym is in Go) which enables faster updates for new runtime versions and more confidence in correctness of parsing.

@CatalinStratu
Collaborator

@stevemk14ebr, could I contribute to GoReSym? I made some improvements on my fork (https://github.com/CatalinStratu/GoReSym). Would you be open to a PR?

@stevemk14ebr

I would love additional contributions. Looking at your commit some of the changes involve renames and removing casts from files I copied from the upstream go source. As I copied these directly from upstream I'm not willing to change those for maintenance reasons, if it's good enough for Go it's good enough for us. I would accept contributions to the modified parts of the runtime (mostly within objfile) or main.

@CatalinStratu
Collaborator

> I would love additional contributions. Looking at your commit some of the changes involve renames and removing casts from files I copied from the upstream go source. As I copied these directly from upstream I'm not willing to change those for maintenance reasons, if it's good enough for Go it's good enough for us. I would accept contributions to the modified parts of the runtime (mostly within objfile) or main.

I opened a PR; I would be grateful for your comments.

@pombredanne
Member Author

@CatalinStratu I provided a bunch of comments there... I am not sure if you have the time to complete this?

@TG1999 See also mandiant/GoReSym#49
IMHO with this we could reuse GoReSym as-is.

@pombredanne
Member Author

At this stage I think I can call this done based on these two PRs:


4 participants