-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance detection of buffer overflows by introducing variable for length of array/blob #1330
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your pull request! Supporting relational information about array lengths and allocated blocks will be very neat!
I left some first round of remarks that you may have a look at.
a9c9eb7
to
f504431
Compare
src/analyses/base.ml
Outdated
| Q.MayOverflow e -> | ||
IntDomain.local_no_overflow := true; | ||
if M.tracing then M.trace "apron" "exp %a\n" d_exp e; | ||
ignore(query_evalint (Analyses.ask_of_ctx ctx) ctx.global ctx.local e); | ||
!IntDomain.local_no_overflow ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you find an example where the old no overflow handling in the relational domain caused an unsoundness in the analysis results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old no overflow handling didn't cause any unsoundness in the analysis results because the flag was ignored (see comment). However this overflow handling is not sufficient for the analysis, due to apron giving up when there is a cast from signed integer to unsigned integer and vice versa even when enabling --set sem.int.signed_overflow assume_none
. This may work for relational vla length analysis if there are just integers, but for malloc cil cast the expression to unsigned long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to not ignore the old overflow handling, we do get unsoundness in the analysis as one can see in 6b797ad.
// PARAM: --enable ana.int.interval --set ana.activated[+] apron
#include <stdlib.h>
#include <goblint.h>
int main()
{
unsigned int len = rand();
len = len % 10;
for (int i = 0; i -1 < len ; i++)
{
int tmp = i;
}
}
This results in
[Error][Analyzer][Unsound] both branches over condition '(unsigned int )(i - 1) < len' are dead (tests/regression/100-limitation/01-crash.c:9:9-9:37)
[Error][Analyzer][Unsound] Both branches dead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I, unfortunately, found out that the current overflow handling in this PR also suffers from this problem. The queries for EvalInt are cached, which results in the local_no_overflow not being set correctly.
One idea to solve this problem, which don't involve rewriting the whole evalint function would be to disable the cache lookup with another global flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I, unfortunately, found out that the current overflow handling in this PR also suffers from this problem. The queries for EvalInt are cached, which results in the local_no_overflow not being set correctly.
One idea to solve this problem, which don't involve rewriting the whole evalint function would be to disable the cache lookup with another global flag.
Disabling the cache with a global flag seems to work. However, I agree that this solution may not be the prettiest. After having a look at the overflow handling in Linear two var equality #1297 and testing a bit, it seems like their overflow handling may also work with my analysis.
e4de37e
to
b516fd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see there's progress here! I left some first comments, but I have not had the time to go into details yet (especially the changes inside the relational domain and the oob analysis).
Here's two questions I already have though:
- One's conceptual: Have you considered what happens for multi-threaded programs? Initially we thought about having an analysis such as this to demonstrate the usefulness of our analyses of global variables. Do you think this it within reach?
- The other one is perhaps more down to earth: What do you do when you encounter a
malloc
only in one branch? Will you not have issues with the apron environments not matching up after the join?
src/domains/queries.ml
Outdated
@@ -195,6 +199,9 @@ struct | |||
| MustTermAllLoops -> (module MustBool) | |||
| IsEverMultiThreaded -> (module MayBool) | |||
| TmpSpecial _ -> (module ML) | |||
| MayBeOutOfBounds _ -> (module ID) | |||
| MayOverflow _ -> (module MayBool) | |||
| AllocMayBeOutOfBounds _ -> (module VDQ.ProdID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also why ID
above and VDQ.ProdID
here? The meaning of the tuple should ideally be obvious from the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the relational VLA oob check, I don't check if the index can be negative because if the relational domain knows that the index is not negative, the bounds for the array index would be improved anyway in the Valuedomain.
However, this does not work for pointer arithmetic. The relational domain doesn't track the relationship between pointers (yet). Therefore the analysis can't give boundaries for the memory access. In the (newly) added test case 85 14 memOutOfBounds would raise a warning that we may access the array before the allocated memory. With the new pointer_tracking enabled, which tracks relational relationships between pointers, we know that this cannot be the case. We, therefore, return a tuple of ID
FlatBool
to know if we might access before or after the allocated memory.
@@ -249,10 +249,10 @@ struct | |||
let anyq = Queries.Any q in | |||
if M.tracing then M.traceli "query" "query %a\n" Queries.Any.pretty anyq; | |||
let r = match Queries.Hashtbl.find_option querycache anyq with | |||
| Some r -> | |||
| Some r when !lookUpCache -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this cache need to be disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the new no overflow computation in the relational domain, as the old one was unsound and unused. I added a local_no_overflow
flag in the intdomain and checked this flag after an EvalInt query. This is a kind of dirty hack, as I first tried to rewrite the Evalint to include the overflow information, but after adapting the intdomain and trying to adopt the base.ml to include the overflow information. I noticed that this would involve nearly a few thousand lines of code and deemed this not reasonable for this PR.
This works in most cases except for loops, where it seems like the EvalInt query is cached, which results in the EvalInt query not being newly evaluated and the flag not being set correctly. See also: #1330 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very suspicious. We assume that queries are pure, so caching or not caching cannot make an observable difference. If it can, then something is wrong.
We don't recompute until query results stabilize, so in the presence of such impure queries we have no soundness guarantees anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is, because for the no_overflow query uses side effects to evaluate if we have an overflow. It queries the EvalInt query and checks the side effect for an overflow. However, this does not work if the query is cached.
Unfortunately, I don't think solving later issues is within reach. Investigating and fixing bugs in the relational domain and unsoundness issues in memOutOfBounds took quite some time. |
84f806e
to
a59b918
Compare
… of the current evaluation
Before merging this, one should verify whether the handling of non-unique mallocs is sound here, e.g. for a program segment: int n = 1 + (rand() % 5);
for(int i = 0; i < n; i++){
int* a = malloc(i * sizeof(int));
if(i == 0){ b = a;}
}
*b = 4; // Should warn, as size of block pointed to by b is zero. |
How does this ensure that the relational analysis never leaks the introduced dummy variables to other analysis, e.g. by making queries? Note that the relational information for other variables then may also be expressed via the dummy variables. |
The ghost variables are added into the environment with
|
This draft PR enhances the detection of buffer overflows by introducing a variable for the length of array/blob as suggested by @michael-schwarz in #620.
Arrays:
Upon creation of a VLA
vdecl
creates a new ghost variable in the domain, which tracks the symbolic relationships between program variables and the length of the array. Using the query system, the arraydomain can check for out-of-bounds accesses.Malloc:
This works similarly to the VLA case. When upon calling malloc/alloc/calloc the
special
transfer function creates a new ghost variable, which tracks the length of blob size. MemOutOfBounds can then use the query system to ask the relational domain if an OOB access occurred. In addition, I implemented pointer tracking as we would not be able to know the symbolic relationship of the pointer offset otherwise. (see the following example).TODO: