Update documentation site #1882

Merged
merged 26 commits into from
Jan 13, 2025
Changes from 25 commits
2d0765b
Update figure of workflow
KuechA Dec 4, 2024
37afc7d
Add paper of Konrad
KuechA Dec 4, 2024
52dad52
Update index
KuechA Dec 4, 2024
70839df
Update version number in library.md
KuechA Dec 4, 2024
5a17f73
Add neo4j documentation
KuechA Dec 4, 2024
1f68b27
Update the query api documentation page
KuechA Dec 4, 2024
801ab63
More shortcuts documentation
KuechA Dec 4, 2024
8ddf01c
Merge branch 'main' into ak/update-docs20241204
KuechA Dec 4, 2024
76c5ac7
Update description to version 9.0.0
KuechA Jan 10, 2025
3e63b7f
Remove link to installing the CPG in Getting Started page
KuechA Jan 10, 2025
46ad79a
Delete docs/docs/GettingStarted/installation.md
KuechA Jan 10, 2025
a6e6815
Remove installing page from navigation
KuechA Jan 10, 2025
df9d75f
Start summarizing inference rules and assumptions
KuechA Jan 10, 2025
1bd119f
List inference in spec landing page
KuechA Jan 10, 2025
78dad62
List inference md in the navigation
KuechA Jan 10, 2025
9e9f33e
Merge branch 'main' into ak/update-docs20241204
KuechA Jan 10, 2025
4745790
9.0.1
KuechA Jan 10, 2025
557fc1b
Merge branch 'main' into ak/update-docs20241204
KuechA Jan 10, 2025
d721021
Update numbers and date
KuechA Jan 10, 2025
92c8c00
Create overlays.md
KuechA Jan 10, 2025
cd4e848
List overlays in the specs index
KuechA Jan 10, 2025
6d092a8
Update mkdocs.yaml
KuechA Jan 10, 2025
6da5de8
Added inference docs
oxisto Jan 10, 2025
3e15516
Overlay graph
oxisto Jan 10, 2025
032f60a
Inference docs
oxisto Jan 11, 2025
9a16b48
Added MathJax as local asset
oxisto Jan 13, 2025
2 changes: 2 additions & 0 deletions docs/docs/CPG/specs/index.md
@@ -18,3 +18,5 @@ links to the specifications of the following concepts:
* [Data Flow Graph (DFG)](./dfg)
* [Data Flow Graph (DFG) Function Summaries](./dfg-function-summaries.md)
* [Evaluation Order Graph (EOG)](./eog)
* [Our inference rules](./inference) which may modify the graph
* Read about [our overlay graph](./overlays) if you want to encode more information
96 changes: 96 additions & 0 deletions docs/docs/CPG/specs/inference.md
@@ -0,0 +1,96 @@
---
title: "Inference of new nodes"
linkTitle: "Inference of new nodes"
no_list: true
weight: 1
date: 2025-01-10
description: >
Inference of new nodes
---

# Inference System

One of the goals of this library is to deal with incomplete code. In this case,
the library provides various options to create new nodes and include them in the
resulting graph. The user of the library can configure which of the inference
options should be enabled. This document provides an overview of the different
options and their expected behavior. The rules for inferring new nodes are
implemented in the class
[`de.fraunhofer.aisec.cpg.passes.inference.Inference`](https://fraunhofer-aisec.github.io/cpg/dokka/main/older/main/cpg-core/de.fraunhofer.aisec.cpg.passes.inference/-inference/index.html)
and are typically used by various passes.

## Inference of namespace and record declarations

If we encounter a scope, e.g., in a call to a function such as
`java.lang.Object.toString()`, and we do not have a corresponding `NameScope`
for the qualified name `java.lang`, we try to infer one. We recursively infer a
namespace, e.g., `java` as well as `java.lang` until the scope can be resolved.
There is one special check, in case the name refers to a type. In this case we
infer a record declaration instead. This is usually the case when a type is
nested in another type, e.g. `MyClass::MyIterator::next`. If we encounter usage
of `MyClass::MyIterator` as a type somewhere, we infer a record instead of a
namespace.

Record declarations are indeed inferred for all (object) types that we
encounter. The scope of the type or a fully qualified name (if specified) is
taken into account when creating an inferred `RecordDeclaration`. If the record
is supposed to exist in a scope / namespace that was "seen" (e.g., it was
specified as a fully qualified name), but a corresponding `NamespaceDeclaration`
did not exist, we also try to infer this namespace (see above).

For example, if we encounter the type `java.lang.String` (and do not find a
matching declaration), we recursively infer the following nodes:

- `NamespaceDeclaration` for `java` in the `GlobalScope`
- `NamespaceDeclaration` for `java.lang` in the scope of the inferred `java`
namespace
- `RecordDeclaration` for `java.lang.String` in the scope of the inferred
`java.lang` namespace

It is sometimes impossible to tell whether we should infer a namespace or a
record as a parent scope, since languages usually support nested records or
classes. In such cases, we assume that a namespace is far more likely than a
record.
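
The recursive inference described above can be sketched as follows. This is a minimal toy model under stated assumptions, not the implementation in the `Inference` class: the names `ToyInference`, `Inferred` and `Kind` are hypothetical, declarations are keyed by their fully qualified name, and only `.` is treated as a separator.

```kotlin
// Hypothetical toy model, not the CPG API.
enum class Kind { NAMESPACE, RECORD }

data class Inferred(val fqn: String, val kind: Kind, val parent: String?)

class ToyInference(private val knownTypes: Set<String> = emptySet()) {
    // Fully qualified name -> inferred declaration, in creation order.
    val declarations = linkedMapOf<String, Inferred>()

    /** Infers a record for [typeName] and every missing parent scope. */
    fun inferRecord(typeName: String): Inferred {
        val parent = inferParent(typeName)
        return declarations.getOrPut(typeName) { Inferred(typeName, Kind.RECORD, parent) }
    }

    // Recurses to the outermost name first; null stands for the global scope.
    private fun inferParent(fqn: String): String? {
        val parentName = fqn.substringBeforeLast('.', "")
        if (parentName.isEmpty()) return null
        val grandParent = inferParent(parentName)
        // A parent that is also used as a type becomes a record, otherwise a namespace.
        val kind = if (parentName in knownTypes) Kind.RECORD else Kind.NAMESPACE
        declarations.getOrPut(parentName) { Inferred(parentName, kind, grandParent) }
        return parentName
    }
}
```

Calling `ToyInference().inferRecord("java.lang.String")` creates the entries `java`, `java.lang` and `java.lang.String` in that order. Passing a name in `knownTypes` models the special check for names that refer to a type.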

## Inference of function declarations

If we try to resolve a `CallExpression`, where no `FunctionDeclaration` with a
matching name and signature exists in the CPG, we infer a new
`FunctionDeclaration`. This may include inferring a receiver (i.e., the base a
method is invoked on) for object-oriented programming languages. We also infer
the required parameters for this specific call as well as their types.

The function declaration must be inferred within the scope of a
`RecordDeclaration`, a `NamespaceDeclaration` or a `TranslationUnitDeclaration`.
If the function `foo` is inferred within the scope of a `RecordDeclaration`,
`foo` *may* represent a method but it could also be a static import depending on
the `LanguageTraits` of the programming language. If we add a
`MethodDeclaration` to a `RecordDeclaration` which we treated as a "struct", we
change its `type` to "class".
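
A minimal sketch of this rule, assuming a strongly simplified data model: `ToyCall`, `ToyFunction`, `ToyRecord` and `resolveOrInfer` are hypothetical names for illustration and do not mirror the library's classes. Types are plain strings, and a match requires an identical name and parameter list.

```kotlin
// Hypothetical toy model, not the CPG API.
data class ToyCall(
    val name: String,
    val argumentTypes: List<String>,
    val receiverType: String? = null,
)

data class ToyFunction(
    val name: String,
    val parameterTypes: List<String>,
    val receiverType: String?,
    val isInferred: Boolean,
)

class ToyRecord(val name: String, var kind: String = "struct") {
    val methods = mutableListOf<ToyFunction>()
}

/** Returns a matching function or infers a new one from the call site. */
fun resolveOrInfer(
    call: ToyCall,
    known: MutableList<ToyFunction>,
    record: ToyRecord? = null,
): ToyFunction {
    val match = known.firstOrNull {
        it.name == call.name && it.parameterTypes == call.argumentTypes
    }
    if (match != null) return match

    // Parameters and their types are taken from the arguments of this specific call.
    val inferred = ToyFunction(call.name, call.argumentTypes, call.receiverType, isInferred = true)
    known += inferred
    if (record != null) {
        record.methods += inferred
        // Adding a method to something treated as a "struct" turns it into a "class".
        if (record.kind == "struct") record.kind = "class"
    }
    return inferred
}
```

A second call with the same name and argument types resolves to the previously inferred function instead of creating a duplicate.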

## Inference of variables

While we do aim at handling incomplete code, we assume that it is more likely
that we analyze complete functions while some files/dependencies are missing
than that all files/dependencies are available while a few lines within a file
are missing. Based on this assumption, we infer global variables if we cannot
find any matching symbol for a reference, but we do NOT infer local variables.

## Inference of return types of functions

This is a rather experimental feature and is therefore disabled by default.

This option can be used to guess the return type of an inferred function
declaration. We make use of the usage of the returned value (e.g. if it is
assigned to a variable/reference, used as an input to a unary or binary operator
or as an argument to another function call) and propagate this type to the
return type, if it is known. One interesting case is unary and binary operators,
which can be overloaded; we assume that they most likely operate on numeric
values (for `+`, `-`, `*`, `/`, `%`, `++`, `--`) and boolean values (for `!`).
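
The heuristic can be sketched as a function from "how the returned value is used" to a guessed type. This is a toy model under assumptions: `Usage` and its subclasses are hypothetical, and types are represented as plain strings rather than the library's type objects.

```kotlin
// Hypothetical toy model, not the CPG API.
sealed class Usage
data class AssignedTo(val targetType: String?) : Usage()
data class ArgumentOf(val parameterType: String?) : Usage()
data class OperatorInput(val operator: String) : Usage()

val numericOperators = setOf("+", "-", "*", "/", "%", "++", "--")

/** Guesses a return type from the usage, or returns null if nothing is known. */
fun guessReturnType(usage: Usage): String? = when (usage) {
    is AssignedTo -> usage.targetType
    is ArgumentOf -> usage.parameterType
    is OperatorInput -> when (usage.operator) {
        in numericOperators -> "numeric" // operators may be overloaded; this is only a guess
        "!" -> "boolean"
        else -> null
    }
}
```

Returning `null` for unknown operators or unknown target types keeps the guess conservative: no type is propagated unless the usage suggests one.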

## Inference of DFG edges

The library can apply heuristics to infer DFG edges for functions which do not
have a body (i.e., functions not implemented in the given source code) if there
is no custom specification for the respective function available. All parameters
will flow into the return value.
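
As a sketch of this fallback, assuming hypothetical names (`ToyEdge`, `inferDfgEdges`) and parameters identified by plain strings:

```kotlin
// Hypothetical toy model, not the CPG API.
data class ToyEdge(val from: String, val to: String)

/**
 * Returns the DFG edges for a function without a body. [customSummary] stands in
 * for a custom specification; if it is null, every parameter flows into the
 * return value.
 */
fun inferDfgEdges(
    parameters: List<String>,
    customSummary: List<ToyEdge>? = null,
): List<ToyEdge> =
    customSummary ?: parameters.map { ToyEdge(from = it, to = "return") }
```

The heuristic over-approximates: it may report flows that the real function does not have, but it does not drop a flow from a parameter to the return value.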
60 changes: 60 additions & 0 deletions docs/docs/CPG/specs/overlays.md
@@ -0,0 +1,60 @@
---
title: "Overlay Graph"
linkTitle: "Overlay Graph"
no_list: true
weight: 1
date: 2025-01-10
description: >
Overlay Graph
---

# Overlay Graph

The CPG represents the code of a program as a graph of nodes $N_{CPG}$
and edges $E$.

Our basic version of the CPG only considers nodes that are part
of the CPG's immediate representation of the program's AST (we denote
these nodes as $N_{AST} \subseteq N_{CPG}$).

The edges $E$ represent various graph structures like the abstract
syntax tree (AST), data flow graph (DFG), the execution order (EOG),
call graph, and further dependencies among code fragments. Each of
the edges can have a predefined set of properties which is specified
by our graph schema.

However, this version of the CPG does not include any information
about the semantics of the code or consider expert knowledge on
certain frameworks or libraries. This is, however, crucial information
for in-depth semantic analyses. To account for this, we introduce
the concept of an **Overlay Graph** which allows us to extend the graph
with expert knowledge or any other information which may not be directly
visible in the code.

We define an overlay graph as a set of nodes $N_O \subseteq N_{CPG}$,
where $\forall n_O \in N_O: n_O \not\in N_{AST}$. This means, we add nodes
which are not part of the CPG's AST. These overlay nodes are denoted by
extending the interface `de.fraunhofer.aisec.cpg.graph.OverlayNode` and
are connected via an edge to the nodes in $N_{AST}$. The overlay nodes
may have additional edges of any known kind, except for AST edges.
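
The definition can be illustrated with a small sketch. All class names here (`ToyNode`, `ToyAstNode`, `ToyOverlayNode`, `ToyGraph`) are hypothetical and only model the set relations; they are not the library's `OverlayNode` interface.

```kotlin
// Hypothetical toy model, not the CPG API.
open class ToyNode(val name: String)

// Nodes in N_AST: the immediate representation of the program's AST.
class ToyAstNode(name: String) : ToyNode(name)

// Nodes in N_O: a separate class, so an overlay node can never be an AST node.
class ToyOverlayNode(name: String, val underlying: ToyAstNode) : ToyNode(name)

class ToyGraph(val nodes: List<ToyNode>) {
    val astNodes: List<ToyAstNode>
        get() = nodes.filterIsInstance<ToyAstNode>()
    val overlayNodes: List<ToyOverlayNode>
        get() = nodes.filterIsInstance<ToyOverlayNode>()

    /** Checks that every overlay node is connected to an AST node of this graph. */
    fun isWellFormed(): Boolean = overlayNodes.all { it.underlying in astNodes }
}
```

In this sketch, the condition $\forall n_O \in N_O: n_O \not\in N_{AST}$ holds by construction through the class hierarchy, while the connection to $N_{AST}$ is checked at runtime.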

## Concepts and Operations

One generic extension of the CPG can include **concepts** and
**operations** for which we provide the two classes
`de.fraunhofer.aisec.cpg.graph.concepts.Concept` and
`de.fraunhofer.aisec.cpg.graph.concepts.Operation` which can be extended.
We will incrementally add some nodes to the library within a dedicated
module.

Each concept aims to represent a certain "interesting" type of
behavior or somehow relevant information and can contain multiple
operations or interesting properties related to the same concept.
Operations always have to represent some sort of program behavior.

Typically, it makes sense to register custom passes which use the
information provided by the plain version of the CPG and generate
new instances of a concept or operation when the pass identifies certain
patterns. This pattern may be a call of a specific function, a sequence
of functions, it may consider the values passed as arguments, or it may
also be a known sequence of operations.
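
A pattern-matching pass of this kind could look roughly like the sketch below. It is self-contained and deliberately does not extend the library's `Concept` and `Operation` classes: `ToyCallNode`, `ToyConcept`, `ToyOperation` and `tagFileOpens` are hypothetical, and the pattern (a call to a function named `fopen`) is chosen only for illustration.

```kotlin
// Hypothetical toy model, not the CPG API.
data class ToyCallNode(val functionName: String, val arguments: List<String>)

class ToyConcept(val name: String) {
    val operations = mutableListOf<ToyOperation>()
}

class ToyOperation(val name: String, val concept: ToyConcept, val underlying: ToyCallNode)

/** A toy "pass": each call to `fopen` yields an "OpenFile" operation on a "File" concept. */
fun tagFileOpens(calls: List<ToyCallNode>): ToyConcept {
    val file = ToyConcept("File")
    calls
        .filter { it.functionName == "fopen" }
        .forEach { file.operations += ToyOperation("OpenFile", file, it) }
    return file
}
```

Each operation keeps a reference to the call it was derived from, which mirrors the requirement that overlay nodes stay connected to nodes of the plain graph.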
6 changes: 4 additions & 2 deletions docs/docs/GettingStarted/index.md
@@ -11,11 +11,13 @@ description: >

# Getting Started

-After [installing the library](./installation), it can be used in different ways:
+The CPG can be used in different ways:

* [As a library for Kotlin/Java](./library)
* [Via an interactive command line interface](./cli)
* [With custom automated analyses using the Query API](./query)
* [Via neo4j](./neo4j)

-In all these cases, the [Shortcuts](./shortcuts) provide you a convenient way to
+
+In the first three cases, the [Shortcuts](./shortcuts) provide you a convenient way to
quickly explore some of the most relevant information.
29 changes: 0 additions & 29 deletions docs/docs/GettingStarted/installation.md

This file was deleted.

6 changes: 3 additions & 3 deletions docs/docs/GettingStarted/library.md
@@ -23,10 +23,10 @@ repositories {
}

dependencies {
-implementation("de.fraunhofer.aisec:cpg:6.2.1") // Install everything
+implementation("de.fraunhofer.aisec:cpg:9.0.2") // Install everything
// OR
-implementation("de.fraunhofer.aisec:cpg-core:6.2.1") // Only cpg-core
-implementation("de.fraunhofer.aisec:cpg-language-java:6.2.1") // Only the java language frontend
+implementation("de.fraunhofer.aisec:cpg-core:9.0.2") // Only cpg-core
+implementation("de.fraunhofer.aisec:cpg-language-java:9.0.2") // Only the java language frontend
...
}
```
119 changes: 119 additions & 0 deletions docs/docs/GettingStarted/neo4j.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
---
title: "Using neo4j for visualization"
linkTitle: "Using neo4j for visualization"
no_list: true
weight: 2
date: 2025-01-10
description: >
Using neo4j for visualization (cpg-neo4j)
---

# Neo4J visualisation tool for the Code Property Graph

A simple tool to export a *code property graph* to a neo4j database.

## Requirements

The application requires Java 17 or higher.

## Build

Build (and install) a distribution using Gradle

```
../gradlew installDist
```

Please remember to adjust the `gradle.properties` before building the project.

## Usage

```
./build/install/cpg-neo4j/bin/cpg-neo4j [--infer-nodes] [--load-includes] [--no-default-passes]
[--no-neo4j] [--no-purge-db] [--print-benchmark]
[--use-unity-build] [--benchmark-json=<benchmarkJson>]
[--custom-pass-list=<customPasses>]
[--export-json=<exportJsonFile>] [--host=<host>]
[--includes-file=<includesFile>]
[--password=<neo4jPassword>] [--port=<port>]
[--save-depth=<depth>] [--top-level=<topLevel>]
[--user=<neo4jUsername>] ([<files>...] | -S=<String=String>
[-S=<String=String>]... |
--json-compilation-database=<jsonCompilationDatabase> |
--list-passes)
[<files>...] The paths to analyze. If module support is
enabled, the paths will be looked at if they
contain modules
--benchmark-json=<benchmarkJson>
Save benchmark results to json file
--custom-pass-list=<customPasses>
Add custom list of passes (includes
--no-default-passes) which is passed as a
comma-separated list; give either pass name if
pass is in list, or its FQDN (e.g.
--custom-pass-list=DFGPass,CallResolver)
--export-json=<exportJsonFile>
Export cpg as json
--host=<host> Set the host of the neo4j Database (default:
localhost).
--includes-file=<includesFile>
Load includes from file
--infer-nodes Create inferred nodes for missing declarations
--json-compilation-database=<jsonCompilationDatabase>
The path to an optional JSON compilation database
--list-passes Prints the list of available passes
--load-includes Enable TranslationConfiguration option loadIncludes
--no-default-passes Do not register default passes [used for debugging]
--no-neo4j Do not push cpg into neo4j [used for debugging]
--no-purge-db Do not purge neo4j database before pushing the cpg
--password=<neo4jPassword>
Neo4j password (default: password)
--port=<port> Set the port of the neo4j Database (default: 7687).
--print-benchmark Print benchmark result as markdown table
-S, --softwareComponents=<String=String>
Maps the names of software components to their
respective files. The files are separated by
commas (No whitespace!).
Example: -S App1=./file1.c,./file2.c
-S App2=./Main.java,./Class.java
--save-depth=<depth> Performance optimisation: Limit recursion depth
from neo4j OGM when leaving the AST. -1
(default) means no limit is used.
--top-level=<topLevel> Set top level directory of project structure.
Default: Largest common path of all source files
--use-unity-build Enable unity build mode for C++ (requires
--load-includes)
--user=<neo4jUsername> Neo4j user name (default: neo4j)
```
You can provide a list of paths of arbitrary length that can contain both file paths and directory paths.

## JSON export

It is possible to export the CPG as a JSON file with the `--export-json` option.
The graph is serialized as a list of nodes and edges:
```json
{
"nodes": [...],
"edges": [...]
}
```
Documentation about the graph schema can be found at:
[https://fraunhofer-aisec.github.io/cpg/CPG/specs/graph](https://fraunhofer-aisec.github.io/cpg/CPG/specs/graph)

Usage example:
```
$ build/install/cpg-neo4j/bin/cpg-neo4j --export-json cpg-export.json --no-neo4j src/test/resources/client.cpp
```

To export the cpg from a neo4j database, you can use the neo4j `apoc` plugin.
There it's also possible to export only parts of the graph.

## Known issues:

- While importing sufficiently large projects with the parameter <code>--save-depth=-1</code>
a <code>java.lang.StackOverflowError</code> may occur.
- This error could be solved by increasing the stack size with the JavaVM option: <code>-Xss4m</code>
- Otherwise the depth must be limited (e.g. 3 or 5)

- While pushing a constant value larger than 2^63 - 1 a <code>java.lang.IllegalArgumentException</code> occurs.
20 changes: 20 additions & 0 deletions docs/docs/GettingStarted/query.md
@@ -60,6 +60,22 @@ all (==> false)

## Operators of the detailed mode

The starting point of an analysis is typically one operation inspired by predicate
logic (**allExtended** or **existsExtended**). These operations work as follows:

- They allow you to specify which type of nodes serve as starting point via
a reified type parameter.
- The first argument is a function/lambda which describes certain pre-filtering
requirements for the nodes to check. This can be used to write something like
"implies" in the logical sense.
- The second argument checks the condition which has to hold for all or at least
one of these pre-filtered nodes.

Example (the first argument of a call to "foo" must be 2):
```
result.allExtended<CallExpression>({ it.name.localName == "foo" }) { it.arguments[0].intValue eq const(2) }
```
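
The semantics of the two arguments can be illustrated with a self-contained sketch. It is a simplification under assumptions: `toyAll`, `toyExists` and `ToyCall` are hypothetical names, the functions operate on a plain list instead of a translation result, and they return a `Boolean`, whereas the detailed mode keeps track of the evaluation steps.

```kotlin
// Hypothetical toy model of the semantics, not the Query API.
data class ToyCall(val name: String, val arguments: List<Int>)

/** True if every node of type [T] that passes [select] also fulfils [mustSatisfy]. */
inline fun <reified T> List<Any>.toyAll(
    select: (T) -> Boolean,
    mustSatisfy: (T) -> Boolean,
): Boolean = filterIsInstance<T>().filter { select(it) }.all { mustSatisfy(it) }

/** True if at least one node of type [T] that passes [select] fulfils [mustSatisfy]. */
inline fun <reified T> List<Any>.toyExists(
    select: (T) -> Boolean,
    mustSatisfy: (T) -> Boolean,
): Boolean = filterIsInstance<T>().filter { select(it) }.any { mustSatisfy(it) }
```

Because `toyAll` is vacuously true when no node passes the pre-filter, the pair of arguments behaves like a logical implication: "if the node is a call to `foo`, then its first argument is 2".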

Numerous methods allow evaluating the queries while keeping track of all the
steps. Currently, the following operations are supported:

@@ -87,6 +103,10 @@ For numeric values:
**Note:** The detailed mode and its operators require the user to take care of
the correct order, i.e., the user has to set the brackets explicitly!

For a full list of available methods, check the functions and properties listed
in the dokka documentation and look for those which make use of the `QueryTree`
[here](https://fraunhofer-aisec.github.io/cpg/dokka/main/cpg-analysis/de.fraunhofer.aisec.cpg.query/index.html).

## Operators of the less detailed mode

Numerous methods allow evaluating the queries: