improve notes and the code
evanj committed Jan 10, 2023
1 parent e3195bb commit 496bec9
Showing 6 changed files with 281 additions and 71 deletions.
75 changes: 73 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -18,4 +18,5 @@ nix = {version="0", features=["mman"]}
rand = "0"
rand_xoshiro = "0"
lazy_static = "1"
regex = "1"
argh = "0.1.9"
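
`argh` is a small, derive-based command-line argument parser. A hypothetical sketch of how it is typically wired up (the actual flag names are not shown in this diff and live in the demo's main.rs):

```rust
use argh::FromArgs;

/// Huge page demo options (hypothetical; the real flags are defined in main.rs).
#[derive(FromArgs)]
struct Options {
    /// skip the Vec/malloc benchmark and only run the mmap version
    #[argh(switch)]
    skip_vec: bool,
}

fn main() {
    let options: Options = argh::from_env();
    println!("skip_vec={}", options.skip_vec);
}
```
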
4 changes: 3 additions & 1 deletion Makefile
@@ -6,10 +6,12 @@ all: aligned_alloc_demo
cargo check
# https://zhauniarovich.com/post/2021/2021-09-pedantic-clippy/#paranoid-clippy
# -D clippy::restriction is way too "safe"/careful
# -D clippy::pedantic is also probably too safe
# -D clippy::pedantic is also probably too safe: currently allowing things we run into
# -A clippy::option-if-let-else: I stylistically disagree with this
cargo clippy --all-targets --all-features -- \
-D warnings \
-D clippy::nursery \
-A clippy::option-if-let-else \
-D clippy::pedantic \
-A clippy::cast_precision_loss \
-A clippy::cast-sign-loss \
41 changes: 34 additions & 7 deletions README.md
@@ -1,19 +1,46 @@
# Huge Page Demo

This is a demonstration of using huge pages on Linux to get better performance. It allocates a 4 GiB chunk both using a Vec (which will use "regular" malloc), then using mmap to get a 1 GiB-aligned region. It then uses madvise() to mark it for huge pages, then will touch the entire region to fault it in to memory. Finally, it does a random-access benchmark. This is probably the "best case" scenario for huge pages.
This is a demonstration of using huge pages on Linux to get better performance. It allocates a 4 GiB chunk using a Vec (which calls libc's malloc), then allocates another using mmap to get a 2 MiB-aligned region. It then uses `madvise(..., MADV_HUGEPAGE)` to mark the region for huge pages and touches the entire region to fault it in to memory. Finally, it runs a random-access benchmark on each allocation. This is probably the "best case" scenario for huge pages.
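
The allocation path described above looks roughly like the following sketch. This is not the demo's actual code (the demo depends on the `nix` crate and has its own alignment logic); it is a minimal, Linux-only illustration using the `libc` crate, with most error handling omitted:

```rust
use std::ptr;

const HUGE_PAGE_SIZE: usize = 2 << 20; // 2 MiB

/// Allocate `size` bytes at a 2 MiB boundary and ask for transparent huge pages.
unsafe fn alloc_huge(size: usize) -> *mut u8 {
    // Over-allocate by one huge page so a 2 MiB-aligned sub-range always fits;
    // the slack is left mapped for simplicity.
    let p = libc::mmap(
        ptr::null_mut(),
        size + HUGE_PAGE_SIZE,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    assert_ne!(p, libc::MAP_FAILED);

    // Round up to the next 2 MiB boundary, then mark the range for huge pages.
    let aligned = (p as usize + HUGE_PAGE_SIZE - 1) & !(HUGE_PAGE_SIZE - 1);
    libc::madvise(aligned as *mut libc::c_void, size, libc::MADV_HUGEPAGE);

    // Touch every 4 KiB page so the region is faulted in before benchmarking.
    for offset in (0..size).step_by(4096) {
        (aligned as *mut u8).add(offset).write(0);
    }
    aligned as *mut u8
}
```
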

On an "Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz" (AWS m5d.4xlarge), the huge page version in about twice as fast.
On a "11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz", the huge page version is about 2.9X faster. On an older "Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz" (AWS m5d.4xlarge), the huge page version is about 2X faster. This seems to suggest that programs that make random accesses to large amounts of memory will benefit from huge pages.

This will compile and run on non-Linux platforms, but won't use huge pages.
As of 2023-01-10, the Linux kernel only supports a single size of transparent huge pages. The size is reported as `Hugepagesize` in `/proc/meminfo`. On x86_64, this will be 2 MiB. For Arm (aarch64), most recent Linux distributions also default to 4 KiB/2 MiB pages. Red Hat used to use 64 KiB pages, but [RHEL 9 changed it to 4 KiB around 2021-07](https://bugzilla.redhat.com/show_bug.cgi?id=1978730).
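
A small helper (hypothetical, not part of this repo) that reads that value, assuming the `Hugepagesize:    2048 kB` format used by `/proc/meminfo`:

```rust
/// Returns the huge page size in bytes reported by /proc/meminfo, if any.
fn hugepagesize_from_meminfo() -> Option<usize> {
    let meminfo = std::fs::read_to_string("/proc/meminfo").ok()?;
    for line in meminfo.lines() {
        if let Some(rest) = line.strip_prefix("Hugepagesize:") {
            // the value is reported in kB, e.g. "Hugepagesize:    2048 kB"
            let kb: usize = rest.trim().trim_end_matches("kB").trim().parse().ok()?;
            return Some(kb * 1024);
        }
    }
    None
}
```
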

For more details, see [Reliably allocating huge pages in Linux](https://mazzo.li/posts/check-huge-page.html).
When running as root, it is possible to check whether a specific address is backed by a huge page. It is also possible to get the amount of memory allocated as huge pages for a specific range by examining the `AnonHugePages` line in `/proc/self/smaps`. The `thp_` statistics in `/proc/vmstat` can also tell you if this worked: check `thp_fault_alloc` and `thp_fault_fallback` before and after the allocation. See [the Monitoring usage section in the kernel's transhuge.txt for details](https://www.kernel.org/doc/Documentation/vm/transhuge.txt).
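
For example, a sketch (hypothetical helper, not in this repo) that reads one of those counters; call it before and after faulting in the region and compare the values:

```rust
/// Reads a single counter such as "thp_fault_alloc" from /proc/vmstat.
fn vmstat_counter(name: &str) -> Option<u64> {
    let vmstat = std::fs::read_to_string("/proc/vmstat").ok()?;
    for line in vmstat.lines() {
        let mut parts = line.split_whitespace();
        if parts.next() == Some(name) {
            return parts.next()?.parse().ok();
        }
    }
    None
}

// let before = vmstat_counter("thp_fault_alloc");
// ... allocate and touch the region ...
// let after = vmstat_counter("thp_fault_alloc");
// A 4 GiB region backed by 2 MiB pages should add roughly 2048 faults.
```
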

Unfortunately, there appears to be no way to get the *size* of a huge page that was allocated. It is possible to check that a specific address is a huge page, which this program will do if run as root. It is also possible to get the count of the amount of memory allocated for a specific range as huge pages, by examining the `AnonHugePages` line in `/proc/self/smaps`.
This demo compiles and runs on Mac OS X, but won't use huge pages.

For more details, see [Reliably allocating huge pages in Linux](https://mazzo.li/posts/check-huge-page.html), which I more or less copied.

## Malloc/Mmap behaviour

On Ubuntu 20.04.5 with kernel 5.15.0-1023-aws and glibc 2.31-0ubuntu9.9, malloc 4 GiB calls mmap to allocate 4 GiB + 4 KiB, then returns a pointer that is +0x10 (+16) from the pointer actually returned by mmap. Using aligned_alloc calls mmap to allocate 5 GiB + 4 KiB (size + alignment + 1 page?), then returns an aligned pointer. Calling mmap to allocate 4 GiB returns a pointer that is not aligned. E.g. On my system, I get one that is 32 kiB aligned.
## Results

From a system where `/proc/cpuinfo` reports "11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz", using `perf stat -e dTLB-load-misses,iTLB-load-misses,page-faults`:

### Vec

```
200000000 accessses in 6.421793881s; 31143945.7 accesses/sec
199,681,103 dTLB-load-misses
4,316 iTLB-load-misses
1,048,700 page-faults
```

### Huge Page mmap

```
200000000 in 2.193096392s; 91195262.0 accesses/sec
123,624,814 dTLB-load-misses
1,854 iTLB-load-misses
2,196 page-faults
```


## Malloc/Mmap behaviour notes

On Ubuntu 20.04.5 with kernel 5.15.0-1023-aws and glibc 2.31-0ubuntu9.9, `malloc(4 GiB)` calls `mmap` to allocate 4 GiB + 4 KiB, then returns a pointer that is +0x10 (+16) from the pointer actually returned by `mmap`. Using `aligned_alloc` to allocate 4 GiB with a 1 GiB alignment calls `mmap` to allocate 5 GiB + 4 KiB (size + alignment + 1 page?), then returns an aligned pointer. Calling `mmap` to allocate 4 GiB returns a pointer that is usually not aligned. For example, on my system I get one that is 32 KiB aligned. Calling `mmap` repeatedly seems to allocate addresses downward. [This tweet](https://twitter.com/pkhuong/status/1462988088070791173) also suggests that `mmap(MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_HUGETLB)` will return an aligned address, although the mmap man page does not make it clear whether that behavior is guaranteed.
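
A quick probe (not part of the demo) to reproduce these observations from Rust, using `std::alloc` for the aligned allocation; shrink the sizes if 4 GiB is too much for your machine:

```rust
use std::alloc::{alloc, dealloc, Layout};

fn main() {
    let size = 4usize << 30; // 4 GiB
    let align = 1usize << 30; // request 1 GiB alignment

    // Aligned allocation: with glibc this should end up in posix_memalign.
    let layout = Layout::from_size_align(size, align).unwrap();
    let p = unsafe { alloc(layout) };
    assert!(!p.is_null());
    println!("aligned pointer: {:p} ({} low zero bits)", p, (p as usize).trailing_zeros());
    unsafe { dealloc(p, layout) };

    // Plain Vec: goes through malloc with the default alignment.
    let v: Vec<u8> = Vec::with_capacity(size);
    println!("Vec pointer: {:p} ({} low zero bits)", v.as_ptr(), (v.as_ptr() as usize).trailing_zeros());
}
```
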

On Mac OS X 13.1 on an M1 ARM CPU, using mmap to request 4 GiB of memory returns a block that is aligned to a 1 GiB boundary. The same appears to be true for using malloc. I didn't fight to get dtruss to work to see what malloc is actually doing.

44 changes: 39 additions & 5 deletions src/linux_hugepages.rs
@@ -66,9 +66,13 @@ impl PagemapEntry {
    }
}

/// Returns the best guess at the page size for the address pointed at by p.
/// This needs to run as root to work correctly. This function will print
/// detailed debugging output.
pub fn read_page_size(p: usize) -> Result<usize, std::io::Error> {
    const PAGEMAP_PATH: &str = "/proc/self/pagemap";
    const KPAGEFLAGS_PATH: &str = "/proc/kpageflags";

    // KPF_THP https://github.com/torvalds/linux/blob/master/include/uapi/linux/kernel-page-flags.h
    const KPAGEFLAGS_THP_BIT: u64 = 22;
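
    // Layout notes (see Documentation/admin-guide/mm/pagemap.rst in the kernel tree):
    // /proc/self/pagemap holds one little-endian u64 per virtual page; bits 0-54
    // are the physical frame number (PFN) when the page is present.
    // /proc/kpageflags holds one u64 of flags per PFN; bit 22 (KPF_THP) is set
    // for pages that are part of a transparent huge page.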

@@ -100,11 +104,41 @@ pub fn read_page_size(p: usize) -> Result<usize, std::io::Error> {

    let kpageflag_entry = u64::from_le_bytes(entry_bytes);
    if (kpageflag_entry & (1 << KPAGEFLAGS_THP_BIT)) == 0 {
        println!(" kpageflags does not have THP bit set; not a huge page");
        return Ok(page_size);
    }

    println!(" kpageflags THP bit is set: is a huge page!");

    // Read the size of the huge page from /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
    read_hugepage_size()
}

fn read_hugepage_size() -> Result<usize, std::io::Error> {
    const HPAGE_PMD_SIZE_PATH: &str = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size";
    let mut hpage_size_string = std::fs::read_to_string(HPAGE_PMD_SIZE_PATH)?;
    // always terminated by \n
    if hpage_size_string.ends_with('\n') {
        hpage_size_string.pop();
    }

    let hpage_size_result = hpage_size_string.parse::<usize>();
    match hpage_size_result {
        Err(err) => {
            let msg = format!(" failed to parse {HPAGE_PMD_SIZE_PATH}: {err:?}");
            Err(std::io::Error::new(std::io::ErrorKind::Other, msg))
        }
        Ok(hpage_size) => Ok(hpage_size),
    }
}

#[cfg(all(test, target_os = "linux"))]
mod test {
    use super::*;

    #[test]
    fn test_read_hugepage_size() {
        // this is not always true, but true for x86_64 and current aarch64 platforms
        assert_eq!(2048 * 1024, read_hugepage_size().unwrap());
    }
}