-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C-2PO: Thesis About a Weakly-Relational Pointer Analysis #1485
base: master
Are you sure you want to change the base?
C-2PO: Thesis About a Weakly-Relational Pointer Analysis #1485
Conversation
added distinct dummy and return variables; changed typ to exp as additional information for the terms, therefore simplified the function for finding the cil term starting; fixed some bugs;
… return of startState
… possible to dereference it further.
This reverts commit 8173b99.
… of the resulting varinfo
type prop = Equal of ((term * term * Z.t) [@equal tuple3_equal][@compare tuple3_cmp][@hash tuple3_hash]) | ||
| Nequal of ((term * term * Z.t) [@equal tuple3_equal][@compare tuple3_cmp][@hash tuple3_hash]) | ||
| BlNequal of ((term * term) [@equal tuple2_equal][@compare tuple2_cmp][@hash tuple2_hash]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth giving (semantic) names to the types term * term * Z.t
and term * term
above and use them here. That would avoid all the extra attributes here.
(* the hash/compare values should not depend on the type. | ||
But they have to be defined even though they are not used, for some reason.*) | ||
let equal_typ _ _ = true | ||
let hash_typ _ = 0 | ||
let compare_typ _ _ = 0 | ||
let equal_typ a b = Stdlib.compare a b = 0 | ||
let hash_typ x = Hashtbl.hash x | ||
let compare_typ a b = Stdlib.compare a b | ||
|
||
type t = AssignAux of (typ[@compare.ignore][@eq.ignore][@hash.ignore]) | ||
| ReturnAux of (typ[@compare.ignore][@eq.ignore][@hash.ignore]) | ||
type t = AssignAux of typ | ||
| ReturnAux of typ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment outdated? Because they are used now.
And as such, CilType.Typ.t
should be used instead such that correct implementations are used. CIL types are cyclic, so Stdlib.compare
may not terminate.
set: SSet.t; | ||
map: LMap.t; | ||
normal_form: T.v_prop list Lazy.t[@compare.ignore][@eq.ignore][@hash.ignore]; | ||
diseq: Disequalities.t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ppx_deriving does not have any [@compare.ignore]
or [@eq.ignore]
attributes (ppx_compare does, but we don't use that).
So this uses the built-in behavior: always forces the lazy evaluation and compares that. This might've had an unintended and unexpected performance impact.
I only noticed this now due to the weird Lazy.hash
definition above: ppx_deriving_hash doesn't have [@hash.ignore]
but it also doesn't have built-in Lazy.t
support (in the released version 0.1.2 at least), so that's why the compiler complains.
I think explicitly overriding all three for the entire type (in parenthesis!) should work:
diseq: Disequalities.t; | |
normal_form: (T.v_prop list Lazy.t[@compare fun _ _ -> 0][@eq fun _ _ -> true][@hash fun _ -> 0]); |
The Lazy
module extension above shouldn't be necessary then either.
module VarinfoDescription = struct | ||
(**This type is equal to `varinfo`, but the fields are not mutable and they are optional. | ||
Only the name is not optional. *) | ||
type t = { | ||
vname_: string; | ||
vtype_: typ option; | ||
vattr_: attributes option; | ||
vstorage_: storage option; | ||
vglob_: bool option; | ||
vinline_: bool option; | ||
vdecl_: location option; | ||
vinit_: initinfo option; | ||
vaddrof_: bool option; | ||
vreferenced_: bool option; | ||
} | ||
|
||
let from_varinfo (v: varinfo) = | ||
{vname_=v.vname; | ||
vtype_=Some v.vtype; | ||
vattr_=Some v.vattr; | ||
vstorage_=Some v.vstorage; | ||
vglob_=Some v.vglob; | ||
vinline_=Some v.vinline; | ||
vdecl_=Some v.vdecl; | ||
vinit_=Some v.vinit; | ||
vaddrof_=Some v.vaddrof; | ||
vreferenced_=Some v.vreferenced} | ||
|
||
let empty name = | ||
{vname_=name; | ||
vtype_=None; | ||
vattr_=None; | ||
vstorage_=None; | ||
vglob_=None; | ||
vinline_=None; | ||
vdecl_=None; | ||
vinit_=None; | ||
vaddrof_=None; | ||
vreferenced_=None} | ||
|
||
let update_varinfo (v: varinfo) (update: t) = | ||
let open Batteries in | ||
{vname=update.vname_; | ||
vtype=Option.default v.vtype update.vtype_; | ||
vattr=Option.default v.vattr update.vattr_; | ||
vstorage=Option.default v.vstorage update.vstorage_; | ||
vglob=Option.default v.vglob update.vglob_; | ||
vinline=Option.default v.vinline update.vinline_; | ||
vdecl=Option.default v.vdecl update.vdecl_; | ||
vinit=Option.default v.vinit update.vinit_; | ||
vid=v.vid; | ||
vaddrof=Option.default v.vaddrof update.vaddrof_; | ||
vreferenced=Option.default v.vreferenced update.vreferenced_; | ||
vdescr=v.vdescr; | ||
vdescrpure=v.vdescrpure; | ||
vhasdeclinstruction=v.vhasdeclinstruction} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a massive amount of boilerplate for what exactly? The fields of varinfo
are mutable, so the type could just be changed in-place with v.vtype <- t
without introducing copies of the varinfo
with the same vid
but different data.
When varinfo
s are correctly compared with CilType.Varinfo
, then that only depends on vid
anyway, so such mutations shouldn't break any data structures they are in either.
}, | ||
"join_algorithm": { | ||
"title": "ana.c2po.join_algorithm", | ||
"description": "Which join algorithm 'c2po' should use. Values: 'partition' (default): the more efficient version, it uses only the partition to compute the join. Circular equalities are lost during the join; 'partition': a more precise version, uses the automaton to compute the join.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just quickly skimmed over it since I listened to your presentation and it sounded interesting.
The only thing I saw was that in this description you describe the value 'partition'
twice. That is probably a typo and the second one should be 'precise'
.
Also, I am happy that the taintPartialContext analysis from my bachelor's thesis gets another use here
This PR introduces a weakly-relational analysis between pointers (ref: 2-pointer Logic by Seidl et al.).
The analysis can infer must-equalities and must-disequalities between terms which are built from pointer variables, with the addition of constants and dereferencing.
Moreover, it computes pairs of variables that are in distinct memory blocks, for example if they were initialized with two different calls to
malloc
.An example of a property that the analysis can infer is
*(x + 1) = 3 + y
anda != 5 + *(y + 1)
.Abstract states are represented by a congruence closure of terms over the uninterpreted function
*
.Additionally, a list of disequalities and a list of block disequalities between terms is stored.
The disequalities are important during the assignment operation, because when we assign a value to an address on the heap, we need to forget the information we had about all terms thay may alias with this address.
Additionally to the (block) disequalities inferred by this analysis, we exploit the May-Point-To analysis in order to derive additional disequalities between terms.
Using the May-Point-To analysis can be disabled with the option
ana.c2po.askbase
.There are two version of the join and two versions of the equal function that are implemented.
They can be chosen via the options
ana.c2po.precise_join
andana.c2po.normal_form
, respectively.They are disabled by default.
If
precise_join
is enabled, the join is calculated using the quantitative finite automaton. Otherwise, only the partition is considered for the join, which is a bit less precise.If
normal_form
is enabled, the equality of two congruence closures is decided by calculating a normal form, instead ofcomparing the equivalence classes. The normal form is computed lazily and doesn't need to be recomputed when we call
equal
on the same domain element.The implementation of
enter
duplicates the parameter variables of each function in order to infer incombine
which terms are related to the initial parameters.In order to use the May-Point-To analysis also on these duplicated variables, an additional analysis
startState
is added.It remembers the May-Point-To information of the duplicated variables at the beginning of the function.
Here is a more detailed description of the files that are added:
The file
unionFind.ml
contains the code for a quantitative union find and the quantitative finite automata.They will be necessary in order to construct the congruence closure of terms.
The contained modules are:
T
: here, the terms (e.g.,*(x + 64)
) and propositions (e.g.,*(x + 64) = 192 + y
) are defined. There are also function to convert a CIL expression to a term and a term to a CIL expression.The offsets in the terms are expressed in bits, therefore they are equal to the offset of the CIL expression multiplied with the type of the element that the variable points to.
Each term stores the information about the CIL expression that was used to create the term.
This way, it is easier to reconvert a term to a CIL expression and to get information about the type of a term.
Only the terms that are pointers or arrays or structs or 64-bit integers are considered by the analysis.
Arrays and structs are interpreted as pointers, e.g.
a[3]
is interpreted as the term*(a + 3)
.UnionFind
: the union-find data structure is defined with terms as elements.LookupMap
: this map represents the transitions of the quantitative finite automata.Each term
t
is mapped to the terms*(z + t)
or equal terms.The file
congruenceClosure.ml
contains the data structures for the C-2PO Domain, i.e., the congruence closure, the disequalities and the block disequalities:BlDis
represents the block disequalities as a map that maps each term to a set of terms that are definitely in a different block.Disequalities
represents the disequalities as a map from a term to a map from an integer to another term.The module contains functions to compute the closure of the disequalities that are implied by equalities, disequalities or block disequalities.
SSet
: a set of terms that are currently considered by the analysis.MRMap
: maps each equivalence class to a minimal representative. This is necessary for computing the normal form.MayBeEqual
: contains code to check if two terms may point to the same address or if they may overlap. It uses information from the disequalities and from MayPointTo for this purpose.The file
c2poDomain.ml
defines the domain operations.c2poAnalysis.ml
contains the transfer functions for the analysis.duplicateVars.ml
contains functions for duplicating variables, which is used inenter
in the C-2PO analysis and in the StartState analysis.startStateAnalysis.ml
remembers the value of each parameter at the beginning of a function. It answers the query May-Point-To for the duplicated variables and returns the initial value of the original variable.singleThreadedLifter.ml
transforms any analysis into a single threaded analysis by returning top any time the code might be multi-threaded. This can be reused by other analyses in the future.