Fraunhofer-AISEC · oxisto · Jan 13, 2025 · Dec 4, 2024 · Dec 4, 2024 · Dec 4, 2024
@@ -18,3 +18,5 @@ links to the specifications of the following concepts:
 * [Data Flow Graph (DFG)](./dfg)
 * [Data Flow Graph (DFG) Function Summaries](./dfg-function-summaries.md)
 * [Evaluation Order Graph (EOG)](./eog)
+* [Our inference rules](./inference) which may modify the graph
+* Read about [our overlay graph](./overlays) if you want to encode more information
@@ -0,0 +1,96 @@
+---
+title: "Inference of new nodes"
+linkTitle: "Inference of new nodes"
+no_list: true
+weight: 1
+date: 2025-01-10
+description: >
+    Inference of new nodes
+---
+
+# Inference System
+
+One of the goals of this library is to deal with incomplete code. To this end,
+the library provides various options to create new nodes and include them in the
+resulting graph. The user of the library can configure which of the inference
+options should be enabled. This document provides an overview of the different
+options and their expected behavior. The rules for inferring new nodes are
+implemented in the class
+[`de.fraunhofer.aisec.cpg.passes.inference.Inference`](https://fraunhofer-aisec.github.io/cpg/dokka/main/older/main/cpg-core/de.fraunhofer.aisec.cpg.passes.inference/-inference/index.html)
+and are typically used by various passes.
+
+## Inference of namespace and record declarations
+
+If we encounter a scope, e.g., in a call to a function such as
+`java.lang.Object.toString()`, and we do not have a corresponding `NameScope`
+for the qualified name `java.lang`, we try to infer one. We recursively infer
+namespaces, e.g., `java` as well as `java.lang`, until the scope can be resolved.
+There is one special case: if the name refers to a type, we infer a record
+declaration instead. This is usually the case when a type is
+nested in another type, e.g., `MyClass::MyIterator::next`. If we encounter usage
+of `MyClass::MyIterator` as a type somewhere, we infer a record instead of a
+namespace.
+
+Record declarations are indeed inferred for all (object) types that we
+encounter. The scope of the type or a fully qualified name (if specified) is
+taken into account when creating an inferred `RecordDeclaration`. If the record
+is supposed to exist in a scope / namespace that was "seen" (e.g., it was
+specified as a fully qualified name), but a corresponding `NamespaceDeclaration`
+did not exist, we also try to infer this namespace (see above). 
+
+For example, if we encounter the type `java.lang.String` (and do not find a
+matching declaration), we recursively infer the following nodes:
+
+- `NamespaceDeclaration` for `java` in the `GlobalScope`
+- `NamespaceDeclaration` for `java.lang` in the scope of the inferred `java`
+  namespace
+- `RecordDeclaration` for `java.lang.String` in the scope of the inferred
+  `java.lang` namespace
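+
+The recursion above can be sketched in a few lines of Kotlin. This is only a
+simplified, self-contained model: the classes `InferredNamespace` and
+`InferredRecord` and the function `inferForType` are hypothetical names made up
+for this illustration and are not part of the library's API.
+
+```kotlin
+// Hypothetical, simplified model; these are NOT the library's classes.
+sealed interface Inferred { val fqn: String }
+data class InferredNamespace(override val fqn: String) : Inferred
+data class InferredRecord(override val fqn: String) : Inferred
+
+// Infers all missing parents of a qualified type name, then the record itself.
+// `known` holds the qualified names that already have a declaration.
+fun inferForType(
+    typeName: String,
+    known: MutableSet<String>,
+    delimiter: String = ".",
+): List<Inferred> {
+    val parts = typeName.split(delimiter)
+    val inferred = mutableListOf<Inferred>()
+    var current = ""
+    for ((index, part) in parts.withIndex()) {
+        current = if (current.isEmpty()) part else current + delimiter + part
+        if (current in known) continue
+        // Assumption: every parent is a namespace, only the last part is a record.
+        val node: Inferred =
+            if (index == parts.lastIndex) InferredRecord(current)
+            else InferredNamespace(current)
+        inferred.add(node)
+        known.add(current)
+    }
+    return inferred
+}
+
+fun main() {
+    // Nothing is known yet, so `java`, `java.lang` and `java.lang.String` are inferred.
+    println(inferForType("java.lang.String", mutableSetOf()))
+}
+```
+
+If `java` is already known, the sketch only infers `java.lang` and
+`java.lang.String`.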
+
+It is sometimes impossible to tell whether we should infer a namespace or a
+record as a parent scope, since languages usually support nested records or
+classes. However, we assume that a namespace is far more likely than a nested
+record.
+
+## Inference of function declarations
+
+If we try to resolve a `CallExpression`, where no `FunctionDeclaration` with a
+matching name and signature exists in the CPG, we infer a new
+`FunctionDeclaration`. This may include inferring a receiver (i.e., the base a
+method is invoked on) for object-oriented programming languages. We also infer
+the required parameters for this specific call as well as their types.
+
+The function declaration must be inferred within the scope of a
+`RecordDeclaration`, a `NamespaceDeclaration` or a `TranslationUnitDeclaration`.
+If the function `foo` is inferred within the scope of a `RecordDeclaration`,
+`foo` *may* represent a method but it could also be a static import depending on
+the `LanguageTraits` of the programming language. If we add a
+`MethodDeclaration` to a `RecordDeclaration` which we treated as a "struct", we
+change its `type` to "class".
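+
+A minimal sketch of this idea, assuming a strongly simplified model: all class
+names (`Call`, `InferredFunction`, `RecordSketch`) and the generated parameter
+names (`p0`, `p1`, ...) are hypothetical and only serve this illustration.
+
+```kotlin
+// Hypothetical, simplified model; these are NOT the library's classes.
+data class Call(val name: String, val argumentTypes: List<String>, val baseType: String? = null)
+data class InferredParameter(val name: String, val type: String)
+data class InferredFunction(
+    val name: String,
+    val receiverType: String?,
+    val parameters: List<InferredParameter>,
+)
+
+// Builds a declaration that matches exactly this call: one parameter per argument.
+fun inferFunction(call: Call): InferredFunction =
+    InferredFunction(
+        name = call.name,
+        receiverType = call.baseType,
+        parameters = call.argumentTypes.mapIndexed { i, type -> InferredParameter("p$i", type) },
+    )
+
+// A record that was treated as a "struct" becomes a "class" once it gets a method.
+class RecordSketch(val name: String, var kind: String = "struct") {
+    val methods = mutableListOf<InferredFunction>()
+
+    fun addInferredMethod(function: InferredFunction) {
+        methods.add(function)
+        if (kind == "struct") kind = "class"
+    }
+}
+
+fun main() {
+    val call = Call("substring", listOf("int", "int"), baseType = "java.lang.String")
+    val record = RecordSketch("java.lang.String")
+    record.addInferredMethod(inferFunction(call))
+    println("${record.kind} ${record.name}: ${record.methods}")
+}
+```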
+
+## Inference of variables
+
+While we aim to handle incomplete code, we assume that an analysis is more
+likely to see complete functions with some files or dependencies missing than
+to have all files and dependencies available with a few lines missing in a file.
+Based on this assumption, we infer global variables if we cannot find any
+matching symbol for a reference, but we do NOT infer local variables.
+
+## Inference of return types of functions
+
+This is a rather experimental feature and is therefore disabled by default.
+
+This option can be used to guess the return type of an inferred function
+declaration. We look at how the returned value is used (e.g., whether it is
+assigned to a variable/reference, used as an input to a unary or binary operator,
+or passed as an argument to another function call) and propagate the type found
+there to the return type, if it is known. One interesting case is unary and
+binary operators: they can be overloaded, but we assume that they are more
+likely to operate on numeric values (for `+`, `-`, `*`, `/`, `%`, `++`, `--`)
+and boolean values (for `!`).
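+
+The heuristic can be pictured as follows. This is a rough sketch and not the
+library's implementation: the `Usage` hierarchy, the function
+`guessReturnType` and the placeholder type names `"numeric"` and `"boolean"`
+are assumptions made for this example.
+
+```kotlin
+// Hypothetical model of how the returned value of an inferred function is used.
+sealed interface Usage
+data class AssignedTo(val targetType: String?) : Usage
+data class PassedAsArgument(val parameterType: String?) : Usage
+data class OperatorInput(val operator: String) : Usage
+
+val numericOperators = setOf("+", "-", "*", "/", "%", "++", "--")
+
+// Returns the guessed return type, or null if the usage gives no hint.
+fun guessReturnType(usage: Usage): String? =
+    when (usage) {
+        is AssignedTo -> usage.targetType
+        is PassedAsArgument -> usage.parameterType
+        is OperatorInput ->
+            when (usage.operator) {
+                // Operators may be overloaded, but numeric operands are assumed.
+                in numericOperators -> "numeric"
+                "!" -> "boolean"
+                else -> null
+            }
+    }
+
+fun main() {
+    println(guessReturnType(AssignedTo("java.lang.String")))
+    println(guessReturnType(OperatorInput("++")))
+    println(guessReturnType(OperatorInput("!")))
+}
+```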
+
+## Inference of DFG edges
+
+The library can apply heuristics to infer DFG edges for functions which do not
+have a body (i.e., functions not implemented in the given source code) if there
+is no custom specification for the respective function available. In this case,
+we assume that all parameters flow into the return value.
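+
+As a small illustration of this heuristic, consider the following sketch. The
+types `FunctionStub` and `DfgEdge`, the flag `hasCustomSummary` and the label
+used for the return value are hypothetical and do not mirror the library's
+classes.
+
+```kotlin
+// Hypothetical, simplified model; these are NOT the library's classes.
+data class FunctionStub(val name: String, val parameters: List<String>)
+data class DfgEdge(val from: String, val to: String)
+
+// For a body-less function without a custom specification, every parameter
+// is assumed to flow into the return value.
+fun inferDfg(function: FunctionStub, hasCustomSummary: Boolean): List<DfgEdge> =
+    if (hasCustomSummary) emptyList()
+    else function.parameters.map { DfgEdge(from = it, to = "${function.name}:return") }
+
+fun main() {
+    val memcpy = FunctionStub("memcpy", listOf("dest", "src", "n"))
+    inferDfg(memcpy, hasCustomSummary = false).forEach(::println)
+}
+```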
@@ -0,0 +1,60 @@
+---
+title: "Overlay Graph"
+linkTitle: "Overlay Graph"
+no_list: true
+weight: 1
+date: 2025-01-10
+description: >
+    Overlay Graph
+---
+
+# Overlay Graph
+
+The CPG represents the code of a program as a graph of nodes $N_{CPG}$
+and edges $E$.
+
+Our basic version of the CPG only considers nodes that are part
+of the CPG's immediate representation of the program's AST (we denote
+these nodes as $N_{AST} \subseteq N_{CPG}$).
+
+The edges $E$ represent various graph structures like the abstract
+syntax tree (AST), data flow graph (DFG), the execution order (EOG),
+call graph, and further dependencies among code fragments. Each of
+the edges can have a predefined set of properties which is specified
+by our graph schema.
+
+However, this version of the CPG does not include any information
+about the semantics of the code, nor does it consider expert knowledge about
+certain frameworks or libraries. Yet this information is crucial
+for in-depth semantic analyses. To account for this, we introduce
+the concept of an **Overlay Graph** which allows us to extend the graph
+with expert knowledge or any other information which may not be directly
+visible in the code.
+
+We define an overlay graph as a set of nodes $N_O \subseteq N_{CPG}$,
+where $\forall n_O \in N_O: n_O \not\in N_{AST}$. This means that we add nodes
+which are not part of the CPG's AST. These overlay nodes are marked by
+extending the interface `de.fraunhofer.aisec.cpg.graph.OverlayNode` and
+are connected via an edge to the nodes in $N_{AST}$. The overlay nodes
+may have additional edges and can use all known edge types except for the AST edge.
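+
+The definition can be mirrored in a small, self-contained model. The classes
+below (`SketchNode`, `AstNode`, `OverlaySketch`, `GraphSketch`) and the
+property name `underlyingNode` are assumptions for this sketch and are not the
+library's actual types.
+
+```kotlin
+// Hypothetical, simplified model of the definition above.
+interface SketchNode
+class AstNode(val code: String) : SketchNode
+
+// An overlay node is never an AST node, but it points to the AST node it annotates.
+class OverlaySketch(val label: String, val underlyingNode: AstNode) : SketchNode
+
+class GraphSketch(val nodes: List<SketchNode>) {
+    val astNodes: List<AstNode> get() = nodes.filterIsInstance<AstNode>()
+    val overlayNodes: List<OverlaySketch> get() = nodes.filterIsInstance<OverlaySketch>()
+
+    // Checks that every overlay node is attached to an AST node of this graph.
+    fun isWellFormed(): Boolean = overlayNodes.all { it.underlyingNode in astNodes }
+}
+
+fun main() {
+    val call = AstNode("fopen(\"secret.txt\", \"r\")")
+    val graph = GraphSketch(listOf(call, OverlaySketch("File", call)))
+    println(graph.isWellFormed())
+}
+```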
+
+## Concepts and Operations
+
+One generic extension of the CPG can include **concepts** and
+**operations** for which we provide the two classes
+`de.fraunhofer.aisec.cpg.graph.concepts.Concept` and
+`de.fraunhofer.aisec.cpg.graph.concepts.Operation` which can be extended.
+We will incrementally add some nodes to the library within a dedicated
+module.
+
+Each concept aims to represent a certain "interesting" type of
+behavior or somehow relevant information and can contain multiple
+operations or interesting properties related to the same concept.
+Operations always have to represent some sort of program behavior.
+
+Typically, it makes sense to register custom passes which use the
+information provided by the plain version of the CPG and generate
+new instances of a concept or operation when the pass identifies certain
+patterns. Such a pattern may be a call of a specific function or a sequence
+of function calls; it may consider the values passed as arguments, or it may
+also be a known sequence of operations.
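+
+Such a pass could look roughly like the sketch below. It does not use the
+library's `Concept` and `Operation` classes or its pass infrastructure; the
+types `CallNode`, `FileConcept` and `FileOperation` as well as the matched
+function names are assumptions chosen for this example.
+
+```kotlin
+// Hypothetical, simplified model; these are NOT the library's classes.
+data class CallNode(val functionName: String, val arguments: List<String>)
+data class FileOperation(val kind: String, val underlyingNode: CallNode)
+
+// One concept per file path; it collects all operations on that file.
+class FileConcept(val path: String) {
+    val operations = mutableListOf<FileOperation>()
+}
+
+// Scans all calls and creates concepts/operations for the patterns it knows.
+fun tagFileOperations(calls: List<CallNode>): Collection<FileConcept> {
+    val concepts = mutableMapOf<String, FileConcept>()
+    for (call in calls) {
+        val kind =
+            when (call.functionName) {
+                "fopen" -> "open"
+                "remove" -> "delete"
+                else -> continue // not a pattern this sketch knows about
+            }
+        val path = call.arguments.firstOrNull() ?: continue
+        concepts.getOrPut(path) { FileConcept(path) }.operations.add(FileOperation(kind, call))
+    }
+    return concepts.values
+}
+
+fun main() {
+    val calls = listOf(
+        CallNode("fopen", listOf("a.txt", "r")),
+        CallNode("printf", listOf("hello")),
+        CallNode("remove", listOf("a.txt")),
+    )
+    for (concept in tagFileOperations(calls)) {
+        println("${concept.path}: ${concept.operations.map { it.kind }}")
+    }
+}
+```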
@@ -11,11 +11,13 @@ description: >
 
 # Getting Started
 
-After [installing the library](./installation), it can be used in different ways:
+The CPG can be used in different ways:
 
 * [As a library for Kotlin/Java](./library)
 * [Via an interactive command line interface](./cli)
 * [With custom automated analyses using the Query API](./query)
+* [Via neo4j](./neo4j)
 
-In all these cases, the [Shortcuts](./shortcuts) provide you a convenient way to
+
+In the first three cases, the [Shortcuts](./shortcuts) provide a convenient way to
 quickly explore some of the most relevant information.
@@ -23,10 +23,10 @@ repositories {
 }
 
 dependencies {
-    implementation("de.fraunhofer.aisec:cpg:6.2.1") // Install everything
+    implementation("de.fraunhofer.aisec:cpg:9.0.2") // Install everything
     // OR
-    implementation("de.fraunhofer.aisec:cpg-core:6.2.1") // Only cpg-core
-    implementation("de.fraunhofer.aisec:cpg-language-java:6.2.1") // Only the java language frontend
+    implementation("de.fraunhofer.aisec:cpg-core:9.0.2") // Only cpg-core
+    implementation("de.fraunhofer.aisec:cpg-language-java:9.0.2") // Only the java language frontend
     ...
 }
 ```

@@ -0,0 +1,119 @@
+---
+title: "Using Neo4j for visualization"
+linkTitle: "Using Neo4j for visualization"
+no_list: true
+weight: 2
+date: 2025-01-10
+description: >
+  Using neo4j for visualization (cpg-neo4j)
+---
+
+# Neo4J visualisation tool for the Code Property Graph 
+
+A simple tool to export a *code property graph* to a neo4j database.
+
+## Requirements
+
+The application requires Java 17 or higher.
+
+## Build
+
+Build (and install) a distribution using Gradle:
+
+```
+../gradlew installDist
+```
+
+Please remember to adjust the `gradle.properties` before building the project.
+
+## Usage
+
+```
+./build/install/cpg-neo4j/bin/cpg-neo4j  [--infer-nodes] [--load-includes] [--no-default-passes]
+                    [--no-neo4j] [--no-purge-db] [--print-benchmark]
+                    [--use-unity-build] [--benchmark-json=<benchmarkJson>]
+                    [--custom-pass-list=<customPasses>]
+                    [--export-json=<exportJsonFile>] [--host=<host>]
+                    [--includes-file=<includesFile>]
+                    [--password=<neo4jPassword>] [--port=<port>]
+                    [--save-depth=<depth>] [--top-level=<topLevel>]
+                    [--user=<neo4jUsername>] ([<files>...] | -S=<String=String>
+                    [-S=<String=String>]... |
+                    --json-compilation-database=<jsonCompilationDatabase> |
+                    --list-passes)
+      [<files>...]           The paths to analyze. If module support is
+                               enabled, the paths will be looked at if they
+                               contain modules
+      --benchmark-json=<benchmarkJson>
+                             Save benchmark results to json file
+      --custom-pass-list=<customPasses>
+                             Add custom list of passes (includes
+                               --no-default-passes) which is passed as a
+                               comma-separated list; give either pass name if
+                               pass is in list, or its FQDN (e.g.
+                               --custom-pass-list=DFGPass,CallResolver)
+      --export-json=<exportJsonFile>
+                             Export cpg as json
+      --host=<host>          Set the host of the neo4j Database (default:
+                               localhost).
+      --includes-file=<includesFile>
+                             Load includes from file
+      --infer-nodes          Create inferred nodes for missing declarations
+      --json-compilation-database=<jsonCompilationDatabase>
+                             The path to an optional JSON compilation database
+      --list-passes          Prints the list of available passes
+      --load-includes        Enable TranslationConfiguration option loadIncludes
+      --no-default-passes    Do not register default passes [used for debugging]
+      --no-neo4j             Do not push cpg into neo4j [used for debugging]
+      --no-purge-db          Do not purge neo4j database before pushing the cpg
+      --password=<neo4jPassword>
+                             Neo4j password (default: password)
+      --port=<port>          Set the port of the neo4j Database (default: 7687).
+      --print-benchmark      Print benchmark result as markdown table
+  -S, --softwareComponents=<String=String>
+                             Maps the names of software components to their
+                               respective files. The files are separated by
+                               commas (No whitespace!).
+                             Example: -S App1=./file1.c,./file2.c
+                               -S App2=./Main.java,./Class.java
+      --save-depth=<depth>   Performance optimisation: Limit recursion depth
+                               from neo4j OGM when leaving the AST. -1
+                               (default) means no limit is used.
+      --top-level=<topLevel> Set top level directory of project structure.
+                               Default: Largest common path of all source files
+      --use-unity-build      Enable unity build mode for C++ (requires
+                               --load-includes)
+      --user=<neo4jUsername> Neo4j user name (default: neo4j)
+```
+You can provide a list of paths of arbitrary length that can contain both file paths and directory paths.
+
+## JSON export
+
+It is possible to export the CPG as a JSON file with the `--export-json` option.
+The graph is serialized as a list of nodes and edges:
+```json
+{
+   "nodes": [...],
+   "edges": [...]
+}
+```
+Documentation about the graph schema can be found at:
+[https://fraunhofer-aisec.github.io/cpg/CPG/specs/graph](https://fraunhofer-aisec.github.io/cpg/CPG/specs/graph)
+
+Usage example:
+```
+$ build/install/cpg-neo4j/bin/cpg-neo4j --export-json cpg-export.json --no-neo4j src/test/resources/client.cpp
+```
+
+To export the CPG from a neo4j database, you can use the neo4j `apoc` plugin.
+It also allows exporting only parts of the graph.
+
+## Known issues
+
+- While importing sufficiently large projects with the parameter `--save-depth=-1`,
+  a `java.lang.StackOverflowError` may occur.
+    - This error can be avoided by increasing the stack size with the JVM option `-Xss4m`.
+    - Otherwise, the depth must be limited (e.g., to 3 or 5).
+
+- While pushing a constant value larger than 2^63 - 1, a `java.lang.IllegalArgumentException` occurs.
@@ -60,6 +60,22 @@ all (==> false)
 
 ## Operators of the detailed mode
 
+The starting point of an analysis is typically one operation inspired by predicate
+logic (**allExtended** or **existsExtended**), which works as follows:
+
+- They allow you to specify which type of nodes serve as starting point via
+  a reified type parameter.
+- The first argument is a function/lambda which describes certain pre-filtering
+  requirements for the nodes to check. This can be used to write something like
+  "implies" in the logical sense.
+- The second argument checks the condition which has to hold for all or at least
+  one of these pre-filtered nodes.
+
+Example (the first argument of a call to "foo" must be 2): 
+```
+result.allExtended<CallExpression>({ it.name.localName == "foo" }) { it.arguments[0].intValue eq const(2) }
+```
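+
+To make the role of the two arguments more tangible, the following
+self-contained sketch models the same idea on a plain list. It is not the
+Query API: `CallSketch`, `allSketch` and `existsSketch` are hypothetical names,
+and the sketch returns a plain `Boolean` instead of keeping track of the
+evaluation steps.
+
+```kotlin
+// Hypothetical stand-in for a call node; NOT the library's `CallExpression`.
+data class CallSketch(val name: String, val arguments: List<Int>)
+
+// All nodes of type T which pass the pre-filter `sel` must satisfy the condition.
+// If no node passes the pre-filter, the result is true (like a logical "implies").
+inline fun <reified T> allSketch(
+    nodes: List<Any>,
+    sel: (T) -> Boolean,
+    mustSatisfy: (T) -> Boolean,
+): Boolean = nodes.filterIsInstance<T>().filter(sel).all(mustSatisfy)
+
+// At least one node of type T which passes the pre-filter must satisfy the condition.
+inline fun <reified T> existsSketch(
+    nodes: List<Any>,
+    sel: (T) -> Boolean,
+    mustSatisfy: (T) -> Boolean,
+): Boolean = nodes.filterIsInstance<T>().filter(sel).any(mustSatisfy)
+
+fun main() {
+    val nodes = listOf<Any>(
+        CallSketch("foo", listOf(2)),
+        CallSketch("foo", listOf(3)),
+        CallSketch("bar", listOf(7)),
+        "not a call",
+    )
+    // The first argument of every call to "foo" must be 2.
+    println(allSketch<CallSketch>(nodes, { it.name == "foo" }, { it.arguments[0] == 2 }))
+    // The first argument of at least one call to "foo" must be 2.
+    println(existsSketch<CallSketch>(nodes, { it.name == "foo" }, { it.arguments[0] == 2 }))
+}
+```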
+
 Numerous methods allow to evaluate the queries while keeping track of all the
 steps. Currently, the following operations are supported:
 
@@ -87,6 +103,10 @@ For numeric values:
 **Note:** The detailed mode and its operators require the user to take care of
 the correct order. I.e., the user has to put the brackets!
 
+For a full list of available methods, check the functions and properties listed in
+the Dokka documentation and look for the methods which make use of the `QueryTree`
+[here](https://fraunhofer-aisec.github.io/cpg/dokka/main/cpg-analysis/de.fraunhofer.aisec.cpg.query/index.html).
+
 ## Operators of the less detailed mode
 
 Numerous methods allow to evaluate the queries: