Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add misc debugging helpers and feature to poison dead memory #1241

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

k-sareen
Copy link
Collaborator

We add some misc debugging helpers to help binding implementers in debugging tricky memory corruption bugs. We now can also poison dead memory in each space's release method. This feature supersedes "immix_zero_on_release".

We add some misc debugging helpers to help binding implementers in
debugging tricky memory corruption bugs. We now can also poison dead
memory in each space's `release` method. This feature supersedes
"immix_zero_on_release".
@k-sareen k-sareen requested a review from qinsoon November 29, 2024 00:04
@k-sareen
Copy link
Collaborator Author

The failing style checks are unrelated to the change. Seems like new Rust stable is a bit smarter.

Copy link
Member

@qinsoon qinsoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside the in-line comments, this PR is missing implementation for poison_on_release for spaces other than "copy/immix/largeobject" spaces. Favorably the PR should implement the feature for all the spaces, and minimally the PR should at least panick if the feature is enabled and spaces in use do not have an implementation for poison_on_release.

@@ -188,6 +188,11 @@ impl<VM: VMBinding> CopySpace<VM> {
}

pub fn release(&self) {
#[cfg(feature = "poison_on_release")]
for (start, size) in self.pr.iterate_allocated_regions() {
crate::util::memory::set(start, 0xbc, size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using an arbitrary value, you can define a mapping between space and its poison value, or use something from the space (such as the SpaceDescriptor index, or the space name).

I would suggest creating a module crate::util::poison_on_release. Define the poison values there, and define functions to poison a memory region.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we need the different values to tell which space it is, especially after Kunshan's change in #1236. But if we do want to use different values, there needs to be a place where we can look at and know what the value means.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that might be true. It was helpful to me but that predates when I was using prctl. I can remove it and replace it with a static pattern of 0xab or something obviously nonsense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation of #1236 is precise for Map64, but imprecise for Map32 due to chunk reuse between spaces. It is still helpful to let each space have a different poison value.

#[cfg(feature = "poison_on_release")]
crate::util::memory::set_pattern(
line.start(),
(0xdeadbeef ^ line.start().as_usize()) & !0xff,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why poison as a pattern rather than a u8 value as other spaces? What is the pattern here? How do we supposed to use the pattern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes the existing code for immix_zero_on_release. So immix will not zero lines even if immix_zero_on_release is enabled.

You should have an assertion to make sure that immix_zero_on_release and poison_on_release should not co-exist, as they conflict with each other for Immix space, and you should zero/poison the memory based on which feature is enabled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature supersedes zero on release. The pattern here is helpful since it encodes the line that was released into the pattern. So you can XOR with 0xdeadbeef to get the exact line that was released and where the object comes from. It was useful to me while debugging ART.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

immix_zero_on_release was a debugging feature to catch untraced objects. But using 0x0 can actually make it harder to debug (in my and others' experience) since you don't know if it's a null pointer or something that was released early. So previously some of the students and I used an obvious nonsense value like 0xab as the pattern. This is refinement of the same since it encodes the line into the pattern.

Copy link
Member

@qinsoon qinsoon Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think "poison_on_release" is better than "zero_on_release", just remove the immix_zero_on_release feature, and make sure that poison_on_release can be a replacement for immix_zero_on_release. With the current code, immix_zero_on_release is broken.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you can XOR with 0xdeadbeef to get the exact line that was released and where the object comes from.

I am not following here. Can you get the line by aligning down from the faulty address? Why is XOR needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem isn't the line itself but what object was referring to it. In my case I did not have a debugger available because crashes were happening at Android boot time. So I had to be creative to debug memory corruption errors (hence the dump_ram_around_address function as well).

So consider the case that a line has been overwritten when an object inside it was live.

Object = 0x12000040
Line = 0x12000000
Encoded pattern = 0xCCADBE00

An object referring to 0x12000040 would maybe load a field at an offset 0x8. So it loads the address 0x12000048. If we wrote just 0x0s into the line then we would get a SIGSEGV but then no way to figure out what address was loaded. But if we wrote 0xCCADBE00 to it then the SIGSEGV output would list say 0xCCADBE00 and then we can XOR with 0xDEADBE00 and get back 0x12000000 which was the line that was overwritten.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is I guess unnecessary in the general case if you have a debugger, but I definitely found it useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Without the explanation above, I wouldn't know why you encode the value in that way, and how I am supposed to use the encoded value. Those explanation should go into the comments and the docs in the code so others won't have the same question as I did when they encounter the code and when they use this feature.

Besides, I suggest separating it into a different features:

  • poison_on_release: this just generally poisons released memory (or dead objects) with an identifiable value.
  • immix_poison_with_line_address_on_release (or something like this): it poisons the memory with an encoded line address.

@k-sareen
Copy link
Collaborator Author

Favorably the PR should implement the feature for all the spaces, and minimally the PR should at least panick if the feature is enabled and spaces in use do not have an implementation for poison_on_release

You can't implement this for native mark-sweep space since it uses the cell to thread a free-list. You can't exactly do it with mark-compact space as well. You could only do it with a completely released 4MB chunk. And the rest of the spaces don't release.

@qinsoon
Copy link
Member

qinsoon commented Nov 29, 2024

Favorably the PR should implement the feature for all the spaces, and minimally the PR should at least panick if the feature is enabled and spaces in use do not have an implementation for poison_on_release

You can't implement this for native mark-sweep space since it uses the cell to thread a free-list. You can't exactly do it with mark-compact space as well. You could only do it with a completely released 4MB chunk. And the rest of the spaces don't release.

Native mark-sweep only uses the first address in a cell as the freelist, and the rest of the cell can be poisoned. Mark compact also releases chunks/pages.

The issue is that you need to define what poison_on_release means, and all the policies should behave consistently to that (or give clear errors messages if a policy does not behave properly). Otherwise when a user turns on the feature, what would he expect?

Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think we should keep a way to set the poison bytes to all zero. In some VM runtimes, such as Ruby, there are many NULL checking, and it will panic if it sees a field is NULL. So clearing the dead object to all zero bytes will make the program fail faster in some cases, crashing on one of the NULL-checking assertions. Overwriting the object to other poison values may slip through the assertion and fail somewhere else.

@@ -188,6 +188,11 @@ impl<VM: VMBinding> CopySpace<VM> {
}

pub fn release(&self) {
#[cfg(feature = "poison_on_release")]
for (start, size) in self.pr.iterate_allocated_regions() {
crate::util::memory::set(start, 0xbc, size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation of #1236 is precise for Map64, but imprecise for Map32 due to chunk reuse between spaces. It is still helpful to let each space have a different poison value.

@@ -105,6 +105,48 @@ pub fn set(start: Address, val: u8, len: usize) {
}
}

/// Set a range of memory to the given pattern. Similar to C++'s std::fill.
pub fn set_pattern(start: Address, pattern: usize, len: usize) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using a fixed-size integer type (or a type trait) for the pattern. At the call site, we give the pattern as a hexadecimal literal, but here we write memory at the granularity of usize which is platform-dependent. For example, if pattern is 0xdeadbeef, and the platform is x86_64, then the memory will be filled with EF BE AD DE 00 00 00 00 EF BE AD DE 00 00 00 00 EF BE ..., and I think we don't want the 00 00 00 00 in between.

Suggested change
pub fn set_pattern(start: Address, pattern: usize, len: usize) {
pub fn set_pattern(start: Address, pattern: u32, num_bytes: usize) {

Alternatively

Suggested change
pub fn set_pattern(start: Address, pattern: usize, len: usize) {
trait UnsignedIntButNotUsize: num_traits::PrimInt + num_traits::Unsigned + num_traits::ToBytes {}
impl UnsignedIntButNotUsize for u8 {}
impl UnsignedIntButNotUsize for u16 {}
impl UnsignedIntButNotUsize for u32 {}
impl UnsignedIntButNotUsize for u64 {}
pub fn set_pattern<T: UnsignedIntButNotUsize>(start: Address, pattern: T, num_bytes: usize) {

@k-sareen
Copy link
Collaborator Author

Overwriting the object to other poison values may slip through the assertion and fail somewhere else.

Sure. It's often the reverse in Java. You would add if (object != null) { /* do stuff */ } so you would actually miss incorrect uses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants