From 2326a87acd1a0db40a1b3c644239f8126a3424f6 Mon Sep 17 00:00:00 2001
From: Joxean
Date: Mon, 10 Dec 2018 13:24:49 +0100
Subject: [PATCH 1/5] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 2958e2f..73d9570 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,10 @@ Basically, the tool does the following:
 
  The tool will be released at some point in October.
 
+## Donations
+
+You can help (or thank) the author of Pigaios by making a donation, if you feel like doing so: [![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=LKGZZNUCZFYG8&source=url)
+
 ## Requirements
 
 This project requires the installation of the CLang's Python bindings, Colorama is required for displaying colours (but is optional) and SciKit Learn is required for the Machine Learning part (which is also optional). You can install in Debian based Linux distros the dependencies with the following command:

From fd9fcc75d782dcd691fc784a41c9017f2c8a8686 Mon Sep 17 00:00:00 2001
From: Joxean
Date: Mon, 10 Dec 2018 13:43:26 +0100
Subject: [PATCH 2/5] Update README.md

---
 README.md | 92 ++++++++++++++++++-------------------------------------
 1 file changed, 29 insertions(+), 63 deletions(-)

diff --git a/README.md b/README.md
index 73d9570..325ad5c 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,14 @@ Pigaios ('πηγαίος', Greek for 'source' as in 'source code') is a tool for
 
 Basically, the tool does the following:
 
- * Parse C source code and get artifacts from the Abstract Syntax Tree (AST) of each function.
+ * Parse C source code and extract features from the Abstract Syntax Tree (AST) of each function.
   * Export the same data extracted from C source codes from IDA databases.
- * Find matches between the artifacts found in C source codes and IDA databases.
+ * Find matches between the features found in C source codes and IDA databases.
  * After an initial set of matches with no false positive is found, find more matches from the callgraph.
  * Rate the matches using both an "expert system" and a "machine learning" based system.
  * Also, import into the IDA database all the required structures and enumerations of a given code base (something not trivial in IDA).
 
- The tool will be released at some point in October.
+The tool was released in October 2018, during the [Hacktivity](https://www.hacktivity.com/) conference.
 
 ## Donations
 
@@ -88,78 +88,44 @@ We will just remove all the lines for the files in "examples/" or "test/". After
 
 ```
 $ srcbindiff.py -export
-[+] CC contrib/testzlib/testzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
+Using a total of 8 thread(s)
+[+] CC examples/gzjoin.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CC examples/fitblk.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CC examples/enough.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CC examples/gzappend.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CC examples/zran.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CC examples/zpipe.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CC examples/gzlog.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CC examples/gun.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+examples/zran.c:402,68: warning: format specifies type 'unsigned long long' but the argument has type 'off_t' (aka 'long')
+[+] CC contrib/testzlib/testzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CXX contrib/iostream/test.cpp -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
 contrib/testzlib/testzlib.c:3,10: fatal: 'windows.h' file not found
-[+] CXX contrib/iostream/test.cpp -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
+[+] CXX contrib/iostream/zfstream.cpp -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
 contrib/iostream/zfstream.h:5,10: fatal: 'fstream.h' file not found
-[+] CXX contrib/iostream/zfstream.cpp -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
 contrib/iostream/zfstream.h:5,10: fatal: 'fstream.h' file not found
-[+] CXX contrib/iostream3/test.cc -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CXX contrib/iostream3/zfstream.cc -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/untgz/untgz.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-contrib/untgz/untgz.c:277,7: warning: implicit declaration of function 'chmod' is invalid in C99
-contrib/untgz/untgz.c:341,7: warning: implicit declaration of function 'mkdir' is invalid in C99
-contrib/untgz/untgz.c:659,11: warning: incompatible pointer types assigning to 'gzFile *' (aka 'struct gzFile_s **') from 'gzFile' (aka 'struct gzFile_s *')
-contrib/untgz/untgz.c:665,18: warning: incompatible pointer types passing 'gzFile *' (aka 'struct gzFile_s **') to parameter of type 'gzFile' (aka 'struct gzFile_s *'); dereference with *
-[+] CC contrib/inflate86/inffas86.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/infback9/infback9.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/infback9/inftree9.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/blast/blast.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CXX contrib/iostream2/zstream_test.cpp -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-contrib/iostream2/zstream.h:27,10: fatal: 'strstream.h' file not found
-[+] CC contrib/minizip/ioapi.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/minizip/miniunz.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-contrib/minizip/miniunz.c:100,13: warning: extra tokens at end of #ifdef directive
-contrib/minizip/miniunz.c:131,11: warning: implicit declaration of function 'mkdir' is invalid in C99
-contrib/minizip/miniunz.c:418,25: warning: passing 'const char *' to parameter of type 'char *' discards qualifiers
-[+] CC contrib/minizip/minizip.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-contrib/minizip/minizip.c:97,13: warning: extra tokens at end of #ifdef directive
-contrib/minizip/minizip.c:411,26: warning: passing 'const char *' to parameter of type 'char *' discards qualifiers
-[+] CC contrib/minizip/unzip.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/minizip/zip.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/minizip/mztools.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/minizip/iowin32.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-contrib/minizip/iowin32.h:14,10: fatal: 'windows.h' file not found
-[+] CC contrib/masmx64/inffas8664.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/puff/puff.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC contrib/puff/pufftest.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC gzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-gzlib.c:252,9: warning: implicit declaration of function 'lseek' is invalid in C99
-[+] CC compress.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC gzread.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-gzread.c:35,15: warning: implicit declaration of function 'read' is invalid in C99
-gzread.c:651,11: warning: implicit declaration of function 'close' is invalid in C99
-[+] CC gzclose.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC crc32.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC uncompr.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC inflate.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC gzwrite.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-gzwrite.c:89,20: warning: implicit declaration of function 'write' is invalid in C99
-gzwrite.c:661,9: warning: implicit declaration of function 'close' is invalid in C99
-[+] CC adler32.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC zutil.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC trees.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC deflate.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC inftrees.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC infback.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] CC inffast.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -Iinclude
-[+] Building the callgraph...
-
-14 warning(s), 0 error(s), 5 fatal error(s)
+[+] CXX contrib/iostream3/test.cc -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+[+] CXX contrib/iostream3/zfstream.cc -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
+(...)
+[+] Building definitions...
+[i] Creating headers definition file zlib-1.2.11-exported.h...
+[+] Building the callgraphs...
+[+] Building the constants table...
+[+] Creating indexes...
 ```
 
-As we can see, it compiled, parsed and generated everything from the source code and the process generated 14 warnings and 5 errors. The errors are because I'm compiling the ZLib source code in Linux and I don't have the windows.h header, for example. We can remove the files that are failing or we can just ignore them as one feature of this project is that it can parse both partial and non compilable source codes. Whatever we decide to do, we will have a SQLite database called "zlib-1.2.11.sqlite" in the same directory where we ran the command. We can open that database with whatever tool that supports SQLite databases, if we want to do so, like its command line tool:
+As we can see, it compiled, parsed and generated everything from the source code and the process generated various warnings and errors. The errors are because I'm compiling the ZLib source code in Linux and I don't have the windows.h header, for example. We can remove the files that are failing or we can just ignore them as one feature of this project is that it can parse both partial and non compilable source codes. Whatever we decide to do, we will have a SQLite database called "zlib-1.2.11.sqlite" in the same directory where we ran the command. We can open that database with whatever tool that supports SQLite databases, if we want to do so, like its command line tool:
 
 ```
 $ sqlite3 zlib-1.2.11.sqlite
 SQLite version 3.11.0 2016-02-15 17:29:24
 Enter ".help" for usage hints.
 sqlite> select name from functions limit 5;
-MyDoMinus64
-myGetRDTSC32
-BeginCountRdtsc
-GetResRdtsc
 BeginCountPerfCounter
+BeginCountRdtsc
+Display64BitsSize
+ExprMatch
+ExprMatch
 ```
 
 ## Importing symbols in IDA

From aaecdcb0661698cae19200be911c86d12b215853 Mon Sep 17 00:00:00 2001
From: Joxean
Date: Mon, 10 Dec 2018 14:02:22 +0100
Subject: [PATCH 3/5] Update README.md

---
 README.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/README.md b/README.md
index 325ad5c..f705185 100644
--- a/README.md
+++ b/README.md
@@ -134,6 +134,23 @@ Once we have a binary opened in IDA that we know is using ZLib we can match func
 
 And that's it! Hopefully, it will make the life of reverse engineers easier and we will have to spend less time doing boring tasks like importing symbols or waste time reverse engineering open source libraries statically compiled in our targets.
 
+## Screenshots
+
+List of matches between a Busybox 1.26.2 PowerPC binary and the 1.28 source code from the GIT repository:
+
+![List of matches between a Busybox 1.26.2 PPC binary and the 1.28 source code from the GIT repository](https://user-images.githubusercontent.com/2945834/49733950-2961f100-fc83-11e8-8a1d-254791382314.png)
+
+Visually diffing the pseudo-code of a function in some ```xmllint``` binary and the source code of libxml2:
+
+![Visually diffing the pseudo-code of a function in some xmllint binary and the source code of libxml2](https://user-images.githubusercontent.com/2945834/49734123-8eb5e200-fc83-11e8-956c-f9b029f331f8.png)
+
+Local types IDA view **before** importing symbols from the matches found between a Busybox 1.26.2 PowerPC binary and the 1.28 source code from the GIT repository:
+
+![image](https://user-images.githubusercontent.com/2945834/49734194-d3da1400-fc83-11e8-8380-91837bb7ca16.png)
+
+And the same view **after** importing symbols:
+![image](https://user-images.githubusercontent.com/2945834/49734286-1d2a6380-fc84-11e8-9560-d2fb054a4c70.png)
+
 ## License
 
 Pigaios is released under the GPL v3 license but commercial licenses for proprietary developments can be purchased. Contact admin AT joxeankoret DOT com for more details.

From 73eac8d318efb9cfe40a3243a94784aee7e88278 Mon Sep 17 00:00:00 2001
From: Joxean
Date: Mon, 10 Dec 2018 14:20:59 +0100
Subject: [PATCH 4/5] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index f705185..434975a 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,8 @@ Basically, the tool does the following:
 
 The tool was released in October 2018, during the [Hacktivity](https://www.hacktivity.com/) conference.
 
+NOTE: If you're looking for a tool for diffing or matching between binaries or if you can properly build binaries, you might want to take a look to [Diaphora](https://github.com/joxeankoret/diaphora).
+
 ## Donations
 
 You can help (or thank) the author of Pigaios by making a donation, if you feel like doing so: [![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=LKGZZNUCZFYG8&source=url)

From d11e116e2d45e4e62f91a676a38a3812489e512f Mon Sep 17 00:00:00 2001
From: Joxean
Date: Fri, 14 Dec 2018 00:50:08 +0100
Subject: [PATCH 5/5] Fix for a rare bug

BUG: IDA might return that the start and end address of a basic block
are 0 (zero).
---
 sourcexp_ida.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sourcexp_ida.py b/sourcexp_ida.py
index c8d7b66..8c2d4d6 100644
--- a/sourcexp_ida.py
+++ b/sourcexp_ida.py
@@ -524,6 +524,8 @@ def do_export(self, f):
     flow = FlowChart(func)
     for block in flow:
       block_ea = block.startEA
+      if block.endEA == 0 or block_ea == BADADDR:
+        continue
 
       # ...and each instruction on each basic block
       for ea in list(Heads(block.startEA, block.endEA)):
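
Review note on PATCH 5/5: the guard skips a basic block when its end address is 0 or its start address is BADADDR, which is slightly different from the commit message (start and end both 0). The sketch below reproduces only that condition outside IDA so it can be exercised on its own. `Block`, `usable_blocks` and the 32-bit `BADADDR` value are stand-ins invented for this illustration; they are not part of sourcexp_ida.py or the IDA API.

```python
from collections import namedtuple

# Assumption: BADADDR as it would appear in a 32-bit database.
BADADDR = 0xFFFFFFFF

# Hypothetical stand-in for the basic-block objects that FlowChart yields.
Block = namedtuple("Block", ["startEA", "endEA"])


def usable_blocks(flow):
    """Yield only the blocks that pass the same test the patch adds."""
    for block in flow:
        block_ea = block.startEA
        # Same condition as the two added lines in do_export().
        if block.endEA == 0 or block_ea == BADADDR:
            continue
        yield block


flow = [
    Block(0x401000, 0x401010),  # normal block
    Block(0, 0),                # the case named in the commit message
    Block(BADADDR, 0x10),       # invalid start address
    Block(0x401010, 0x401020),  # normal block
]
kept = list(usable_blocks(flow))
print([hex(b.startEA) for b in kept])  # ['0x401000', '0x401010']
```

With this condition a block whose start is 0 but whose end is non-zero is still processed, which may or may not be what the author intended.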