-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add misc debugging helpers and feature to poison dead memory #1241
base: master
Are you sure you want to change the base?
Conversation
We add some misc debugging helpers to help binding implementers in debugging tricky memory corruption bugs. We now can also poison dead memory in each space's `release` method. This feature supersedes "immix_zero_on_release".
The failing style checks are unrelated to the change. Seems like new Rust stable is a bit smarter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside the in-line comments, this PR is missing implementation for poison_on_release
for spaces other than "copy/immix/largeobject" spaces. Favorably the PR should implement the feature for all the spaces, and minimally the PR should at least panick if the feature is enabled and spaces in use do not have an implementation for poison_on_release
.
@@ -188,6 +188,11 @@ impl<VM: VMBinding> CopySpace<VM> { | |||
} | |||
|
|||
pub fn release(&self) { | |||
#[cfg(feature = "poison_on_release")] | |||
for (start, size) in self.pr.iterate_allocated_regions() { | |||
crate::util::memory::set(start, 0xbc, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using an arbitrary value, you can define a mapping between space and its poison value, or use something from the space (such as the SpaceDescriptor index, or the space name).
I would suggest creating a module crate::util::poison_on_release
. Define the poison values there, and define functions to poison a memory region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we need the different values to tell which space it is, especially after Kunshan's change in #1236. But if we do want to use different values, there needs to be a place where we can look at and know what the value means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that might be true. It was helpful to me but that predates when I was using prctl
. I can remove it and replace it with a static pattern of 0xab
or something obviously nonsense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation of #1236 is precise for Map64, but imprecise for Map32 due to chunk reuse between spaces. It is still helpful to let each space have a different poison value.
#[cfg(feature = "poison_on_release")] | ||
crate::util::memory::set_pattern( | ||
line.start(), | ||
(0xdeadbeef ^ line.start().as_usize()) & !0xff, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why poison as a pattern rather than a u8
value as other spaces? What is the pattern here? How do we supposed to use the pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This removes the existing code for immix_zero_on_release
. So immix will not zero lines even if immix_zero_on_release
is enabled.
You should have an assertion to make sure that immix_zero_on_release
and poison_on_release
should not co-exist, as they conflict with each other for Immix space, and you should zero/poison the memory based on which feature is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature supersedes zero on release. The pattern here is helpful since it encodes the line that was released into the pattern. So you can XOR with 0xdeadbeef
to get the exact line that was released and where the object comes from. It was useful to me while debugging ART.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
immix_zero_on_release
was a debugging feature to catch untraced objects. But using 0x0
can actually make it harder to debug (in my and others' experience) since you don't know if it's a null pointer or something that was released early. So previously some of the students and I used an obvious nonsense value like 0xab
as the pattern. This is refinement of the same since it encodes the line into the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think "poison_on_release" is better than "zero_on_release", just remove the immix_zero_on_release
feature, and make sure that poison_on_release
can be a replacement for immix_zero_on_release
. With the current code, immix_zero_on_release
is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you can XOR with 0xdeadbeef to get the exact line that was released and where the object comes from.
I am not following here. Can you get the line by aligning down from the faulty address? Why is XOR needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem isn't the line itself but what object was referring to it. In my case I did not have a debugger available because crashes were happening at Android boot time. So I had to be creative to debug memory corruption errors (hence the dump_ram_around_address
function as well).
So consider the case that a line has been overwritten when an object inside it was live.
Object = 0x12000040
Line = 0x12000000
Encoded pattern = 0xCCADBE00
An object referring to 0x12000040
would maybe load a field at an offset 0x8
. So it loads the address 0x12000048
. If we wrote just 0x0
s into the line then we would get a SIGSEGV but then no way to figure out what address was loaded. But if we wrote 0xCCADBE00
to it then the SIGSEGV output would list say 0xCCADBE00
and then we can XOR with 0xDEADBE00
and get back 0x12000000
which was the line that was overwritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is I guess unnecessary in the general case if you have a debugger, but I definitely found it useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Without the explanation above, I wouldn't know why you encode the value in that way, and how I am supposed to use the encoded value. Those explanation should go into the comments and the docs in the code so others won't have the same question as I did when they encounter the code and when they use this feature.
Besides, I suggest separating it into a different features:
poison_on_release
: this just generally poisons released memory (or dead objects) with an identifiable value.immix_poison_with_line_address_on_release
(or something like this): it poisons the memory with an encoded line address.
You can't implement this for native mark-sweep space since it uses the cell to thread a free-list. You can't exactly do it with mark-compact space as well. You could only do it with a completely released 4MB chunk. And the rest of the spaces don't release. |
Native mark-sweep only uses the first address in a cell as the freelist, and the rest of the cell can be poisoned. Mark compact also releases chunks/pages. The issue is that you need to define what |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think we should keep a way to set the poison bytes to all zero. In some VM runtimes, such as Ruby, there are many NULL checking, and it will panic if it sees a field is NULL. So clearing the dead object to all zero bytes will make the program fail faster in some cases, crashing on one of the NULL-checking assertions. Overwriting the object to other poison values may slip through the assertion and fail somewhere else.
@@ -188,6 +188,11 @@ impl<VM: VMBinding> CopySpace<VM> { | |||
} | |||
|
|||
pub fn release(&self) { | |||
#[cfg(feature = "poison_on_release")] | |||
for (start, size) in self.pr.iterate_allocated_regions() { | |||
crate::util::memory::set(start, 0xbc, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation of #1236 is precise for Map64, but imprecise for Map32 due to chunk reuse between spaces. It is still helpful to let each space have a different poison value.
@@ -105,6 +105,48 @@ pub fn set(start: Address, val: u8, len: usize) { | |||
} | |||
} | |||
|
|||
/// Set a range of memory to the given pattern. Similar to C++'s std::fill. | |||
pub fn set_pattern(start: Address, pattern: usize, len: usize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using a fixed-size integer type (or a type trait) for the pattern. At the call site, we give the pattern as a hexadecimal literal, but here we write memory at the granularity of usize
which is platform-dependent. For example, if pattern
is 0xdeadbeef
, and the platform is x86_64, then the memory will be filled with EF BE AD DE 00 00 00 00 EF BE AD DE 00 00 00 00 EF BE ...
, and I think we don't want the 00 00 00 00
in between.
pub fn set_pattern(start: Address, pattern: usize, len: usize) { | |
pub fn set_pattern(start: Address, pattern: u32, num_bytes: usize) { |
Alternatively
pub fn set_pattern(start: Address, pattern: usize, len: usize) { | |
trait UnsignedIntButNotUsize: num_traits::PrimInt + num_traits::Unsigned + num_traits::ToBytes {} | |
impl UnsignedIntButNotUsize for u8 {} | |
impl UnsignedIntButNotUsize for u16 {} | |
impl UnsignedIntButNotUsize for u32 {} | |
impl UnsignedIntButNotUsize for u64 {} | |
pub fn set_pattern<T: UnsignedIntButNotUsize>(start: Address, pattern: T, num_bytes: usize) { |
Sure. It's often the reverse in Java. You would add |
We add some misc debugging helpers to help binding implementers in debugging tricky memory corruption bugs. We now can also poison dead memory in each space's
release
method. This feature supersedes "immix_zero_on_release".