Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parser for /proc/[pid]/maps #34

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

xmartinez
Copy link

Hi!
This adds support for parsing process memory mappings information from /proc/[pid]/maps.

Copy link
Owner

@danburkert danburkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Left some specific feedback, I think the most important thing to nail down is an ergonomic public API.

src/pid/maps.rs Outdated
/// will be mangled by the kernel and decoded by this module as:
///
/// ```rust,ignore
/// MemoryMapping (
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These delimiters should be braces, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch :-) Fixed!

src/pid/maps.rs Outdated
/// MemoryMapping (
/// ...,
/// path: PathBuf::from("/tmp/a\nfile"),
/// is_deleted:true,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add space after colon

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Fixed!

src/pid/maps.rs Outdated
///
/// See `man 5 proc`, `Linux/fs/proc/task_mmu.c`, and `Linux/fs/seq_file.c`.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct MemoryMapping {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about MemoryMap as opposed to MemoryMapping? It's a bit shorter, and matches maps more closely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! It does sound better! I have changed it in the updated pull request. Thanks for the suggestion!

src/pid/maps.rs Outdated
/// Path to the file backing this mapping (for non-anonymous mappings),
/// pseudo-path (such as `[stack]`, `[heap]`, or `[vdso]`) for some special
/// anonymous mappings, or empty path for other anonymous mappings.
pub path: PathBuf,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to parse out the map type into a proper enum, I think putting invalid paths into a PathBuf is a no-no. Maybe could look something like this:

pub enum MapType {
  Anonymous,
  File(pub PathBuf),
  Stack,
  Heap,
  Vdso,
  Vvar,
  Vsyscall,
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I agree that abusing PathBuf for anonymous mappings pseudo-paths is not a very clean API. However, enumerating all possible pseudo-paths might be problematic, as the exact list is not documented, is architecture and config dependent, and might change (new pseudo-paths might be added) in future kernel versions (which would break this API).

I do like the idea of using an enum to distinguish between file-backed and anonymous mappings. What do you think of doing something like this?:

pub struct MemoryMap {
    pub range: ops::Range<usize>,
    pub is_readable: bool,
    pub is_writable: bool,
    pub is_executable: bool,
    pub is_shared: bool,
    pub type: MapType,
}

pub enum MapType {
    /// File-backed memory mapping.
    File {
        /// Offset into the file backing this mapping.
        pub offset: u64,
        /// Device containing the file backing this mapping.
        pub dev: libc::dev_t,
        /// Inode of the file backing this mapping.
        pub inode: libc::ino_t,
        /// Path to the file backing this mapping.
        pub path: PathBuf,
        /// Whether the file backing this mapping has been deleted.
        pub is_deleted: bool,
    },
    /// Anonymous memory mapping.
    Anonymous {
        /// Pseudo-path (such as `[stack]`, `[heap]`, or `[vdso]`) for some special
        /// mappings, or empty path for generic anonymous mappings.
        pub name: Vec<u8>,
    }
}

i.e., using a Vec<u8> for anonymous mappings pseudo-paths?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, enumerating all possible pseudo-paths might be problematic, as the exact list is not documented, is architecture and config dependent, and might change (new pseudo-paths might be added) in future kernel versions (which would break this API).

Yeah, it would be nice if the #[non_exhaustive] attribute was stable. How about adding all of the known/common variants and compromising on an Unknown(String) variant as a fallback for everything else? (I think String is OK here, since the file is guaranteed to be valid ascii, right?)

src/pid/maps.rs Outdated
pub path: PathBuf,
/// Whether the file backing this mapping has been deleted (for
/// non-anonymous mappings).
pub is_deleted: bool,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find in the linux source where (deleted) is added to the file path, could you point that out to me? The manual doesn't mention it either.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (deleted) suffix is part of the dcache interface. Pathnames are generated in /proc files using seq_file_path(), which ends up calling d_path() to generate the path for the corresponding dentry. The latter function is the one appending (deleted) if the entry has been deleted.

I have added a reference to Linux/fs/dcache.c in the MemoryMap documentation. Thanks for pointing this out! (It was indeed not obvious).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks.

/// To decode only escaped newlines (leaving other escaped sequences alone):
///
/// ```rust,ignore
/// let path = unmangled_path(br"a\012\040path", b"\n");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like correct usage of br"", since it doesn't include a delimiter, and why is it ignored?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the issue re: br"". According to the Rust Reference, this is valid syntax for raw string byte literals. What delimiter should be included?

Re: ignore, the unmangle module is not part of the crate public API, so unmangled_path cannot be accessed in doctests. As rustdoc --test tries to run doctests for private items (see rust-lang/rust#30094), the example has to be explicitly ignored.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the issue re: br"". According to the Rust Reference, this is valid syntax for raw string byte literals. What delimiter should be included?

Ah, so my confusion is that this instance isn't using any # delimiters. Apparently that's legal, but it works the same as a non-raw literal, so it can be simplified to b"a\012\040path".

Re: ignore, the unmangle module is not part of the crate public API, so unmangled_path cannot be accessed in doctests. As rustdoc --test tries to run doctests for private items (see rust-lang/rust#30094), the example has to be explicitly ignored.

Interesting, didn't know about that limitation. SGTM.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think br"..." can be simplified to b"..." in this case. The r prefix ensures that the backslash is interpreted literally (otherwise, it combines with the following b0 byte to generate a NUL byte: https://play.rust-lang.org/?gist=2bd4e99cd3a1468149226af57fb48956&version=stable).

src/unmangle.rs Outdated
escaped: &'a [u8],
}

impl<'a> Iterator for UnmangledPath<'a> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterator here is obfuscating the logic, could you just fold it into unmangled_path as a for loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! The code does indeed look more straightforward without the iterator. Thanks for the feedback :-)

- Rename `MemoryMapping` as `MemoryMap`.
- Fix `MemoryMap` example.
- Refer to `Linux/fs/dache.c` in `MemoryMap` documentation (to
  document ` (deleted)` suffix appending).
- unmangle: Remove `UnmangledPath` iterator and inline decoding in
  `unmangled_path()`.
@xmartinez
Copy link
Author

Hi Dan!

Thanks a lot for taking the time to review this patch! (and sorry for the delay in answering, I have been sick for a couple of weeks).

I have incorporated most of your suggested changes in the update to the PR (thanks for the feedback, the code looks much better now).

The only pending point is the MapType {File, Anonymous} enum change to the API (see my comment above). Could you comment on my proposal, please? (I'd rather have your feedback before changing the API).

@danburkert
Copy link
Owner

Sounds good! No worries about the delay, I'm not always the quickest myself :)

- Add `MemoryMapKind` enum to encode mapping type (file-backed,
  anonymous, vdso, ...).
- Move file-backed mapping specific fields to
  `MemoryMapKind::File(FileMap)` enum.
- Add `MemoryMap.file()` convenience method.
@xmartinez
Copy link
Author

Hi Dan!

I finally got around to change the API as per your suggestions. I have added a new MemoryMapKind enum that encodes the mapping type (file-backed, generic anonymous, vDSO, etc.).

I have also moved all fields that only make sense for file-backed mappings to a new FileMap struct, which is used by the MemoryMapKind::File enum variant.

I have added a file() -> Option<&FileMap> method to the MemoryMap struct to make client code that is only interested in file-backed mappings somewhat simpler (it avoids the necessity of using pattern matching in some use cases).

Apart from that, I have also extracted device number parsing onto its own function parse_dev() and the code now uses the device number to determine whether the mapping is file-backed (the previous approach used the inode number, which should also work, but comparing to the null device number will definitely work).

Let me know if there are any other changes that might improve the code. If not, I can rebase and squash the pull request branch (so it can be merged as a single logical commit).

@ice1000
Copy link

ice1000 commented Nov 21, 2018

Any update?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants