Expose alloc_slow. Add a section in user guide about allocation optimization (#967)

This PR exposes `alloc_slow()` to the bindings, adds a few public
methods to allow bindings to implement allocation efficiently without
duplicating mmtk-core code, and adds a section in the user guide to
discuss allocation optimization.

The changes in this PR include:
1. Expose `alloc_slow()` in `memory_manager`.
2. Add `Mutator::allocator()` to allow bindings to get a specific
allocator from an allocator selector. Add `Mutator::allocator_impl()` to
allow bindings to get a typed allocator from a selector.
3. Add `Mutator::get_allocator_base_offset()` to allow bindings to use a
specific allocator without selector (for performance).
4. Add a section in the user guide about allocation optimization. Remove
some unused `SUMMARY.md` in the user guide.
5. Add `Address::as_mut_ref()`.
6. Expose the field for the fastpath bump pointer in some allocators.


Related discussion on Zulip:
https://mmtk.zulipchat.com/#narrow/stream/262679-General/topic/Refilling.20BumpPointer.20using.20AllocatorInfo/near/394142997
qinsoon authored Oct 11, 2023
1 parent 9a676e6 commit 0328b05
Showing 15 changed files with 534 additions and 46 deletions.
3 changes: 3 additions & 0 deletions docs/userguide/src/SUMMARY.md
@@ -31,6 +31,9 @@
- [How to Undertake a Port](portingguide/howto/prefix.md)
    - [NoGC](portingguide/howto/nogc.md)
    - [Next Steps](portingguide/howto/next_steps.md)
- [Performance Tuning](portingguide/perf_tuning/prefix.md)
    - [Link Time Optimization](portingguide/perf_tuning/lto.md)
    - [Optimizing Allocation](portingguide/perf_tuning/alloc.md)

-----------

11 changes: 0 additions & 11 deletions docs/userguide/src/portingguide/SUMMARY.md

This file was deleted.

120 changes: 120 additions & 0 deletions docs/userguide/src/portingguide/perf_tuning/alloc.md
@@ -0,0 +1,120 @@
# Optimizing Allocation

MMTk provides [`alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.alloc.html)
and [`post_alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.post_alloc.html) to allocate a piece of memory and
finalize the memory as an object. Calling them is sufficient for a functional implementation, and we recommend doing
so in the early development of an MMTk integration. However, as allocation is performance critical, runtimes generally
optimize allocation to be as fast as possible, at which point simply invoking `alloc()` and `post_alloc()` becomes inadequate.

The following discusses a few design decisions and optimizations related to allocation. The discussion mainly focuses on `alloc()`.
`post_alloc()` works in a similar way, and the discussion also applies to it.
For concrete examples, refer to the implementation in any of our supported bindings.

Note that some of the optimizations make assumptions about MMTk's internal implementation and may make the code less maintainable.
We recommend adding assertions in the binding code to make sure the assumptions are not broken across versions.

## Efficient access to MMTk mutators

An MMTk mutator context (created by [`bind_mutator()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.bind_mutator.html)) is a thread-local data structure
of type [`Mutator`](https://docs.mmtk.io/api/mmtk/plan/struct.Mutator.html).
MMTk expects the binding to provide efficient access to the mutator structure in its thread-local storage (TLS).
Usually one of the following approaches is used to store MMTk mutators.

### Option 1: Storing the pointer

The `Box<Mutator<VM>>` returned from `mmtk::memory_manager::bind_mutator` is actually a pointer to
a `Mutator<VM>` instance allocated in the Rust heap. It is simple to store it in the TLS.
This approach does not make any assumptions about the internals of an MMTk `Mutator`. However, it requires an extra pointer dereference
when accessing a value in the mutator. This may not sound that bad, but it degrades the performance of
a carefully implemented inlined fastpath allocation sequence, which is normally just a few instructions.
This approach can be a simple start in early development, but we do not recommend it for an efficient implementation.

If the VM is not implemented in Rust,
the binding needs to turn the boxed pointer into a raw pointer before storing it.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_boxed_pointer}}
```
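
As a self-contained illustration of the boxed-pointer approach, the following sketch uses stand-in types rather than the real MMTk API. `Mutator`, `VmThread`, and `bind_mutator_stub` are hypothetical names introduced only for this example; the point is the `Box::into_raw` / `Box::from_raw` round trip and the extra dereference on every access.

```rust
/// Stand-in for `mmtk::Mutator<VM>`; the real type is much larger.
pub struct Mutator {
    pub allocated_bytes: usize,
}

/// Hypothetical per-thread state, laid out so a non-Rust runtime could share it.
#[repr(C)]
pub struct VmThread {
    pub mmtk_mutator: *mut Mutator,
}

/// Stand-in for `bind_mutator()`, which hands back a `Box`.
pub fn bind_mutator_stub() -> Box<Mutator> {
    Box::new(Mutator { allocated_bytes: 0 })
}

/// Turn the box into a raw pointer so it fits in a C-compatible TLS slot.
pub fn attach(thread: &mut VmThread) {
    thread.mmtk_mutator = Box::into_raw(bind_mutator_stub());
}

/// Every access pays one extra pointer dereference.
///
/// # Safety
/// `thread.mmtk_mutator` must have been set by `attach` and not yet detached.
pub unsafe fn record_allocation(thread: &mut VmThread, bytes: usize) -> usize {
    let mutator = unsafe { &mut *thread.mmtk_mutator };
    mutator.allocated_bytes += bytes;
    mutator.allocated_bytes
}

/// Reclaim ownership so the mutator is dropped exactly once.
///
/// # Safety
/// `thread.mmtk_mutator` must have been set by `attach` and not yet detached.
pub unsafe fn detach(thread: &mut VmThread) {
    drop(unsafe { Box::from_raw(thread.mmtk_mutator) });
    thread.mmtk_mutator = std::ptr::null_mut();
}
```

Resetting the slot to null after `detach` is a design choice of this sketch, so that a stale pointer is easier to catch with an assertion.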

### Option 2: Embed the `Mutator` struct

To remove the extra pointer dereference, the binding can embed the `Mutator` type directly into its TLS type.

If the implementation language is not Rust, the developer needs to create a type that has the same layout as `Mutator`. It is recommended to
have an assertion to ensure that the native type has the exact same layout as the Rust type `Mutator`.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_mutator_struct}}
```
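
The layout assertion recommended above could look roughly like the following sketch. All types here are stand-ins: `Mutator` is a made-up three-word struct, `NativeMutatorStorage` represents the opaque mirror a non-Rust runtime would declare, and `mutator_size` / `mutator_align` are hypothetical helpers that native code could call to compare against its own `sizeof` and `alignof`.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for `mmtk::Mutator<VM>`; the fields are invented for this sketch.
pub struct Mutator {
    pub cursor: usize,
    pub limit: usize,
    pub tls: usize,
}

/// Hypothetical mirror of the native declaration: opaque storage that must
/// have the same size and alignment as `Mutator`.
#[repr(C)]
pub struct NativeMutatorStorage {
    pub words: [usize; 3],
}

// Compile-time assertions: the build fails if the two layouts drift apart.
const _: () = assert!(size_of::<Mutator>() == size_of::<NativeMutatorStorage>());
const _: () = assert!(align_of::<Mutator>() == align_of::<NativeMutatorStorage>());

/// Size of the Rust type, for native code to check at startup.
pub extern "C" fn mutator_size() -> usize {
    size_of::<Mutator>()
}

/// Alignment of the Rust type, for native code to check at startup.
pub extern "C" fn mutator_align() -> usize {
    align_of::<Mutator>()
}

/// Hypothetical VM thread with the mutator embedded in place.
pub struct VmThread {
    pub thread_id: usize,
    pub mutator: Mutator,
}

/// A field access at a fixed offset from the thread pointer, with no extra load.
pub fn remaining_bytes(thread: &VmThread) -> usize {
    thread.mutator.limit - thread.mutator.cursor
}
```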

### Option 3: Embed the fastpath struct

The size of `Mutator` is a few hundred bytes, which could be considered too large for TLS in some language implementations.
Embedding `Mutator` also requires duplicating the `Mutator` struct as a native type if the implementation language is not Rust.
When embedding the whole `Mutator` type is undesirable, one can choose to embed only the fastpath struct that is in use.

Unlike the `Mutator` type, the fastpath struct has a C-compatible layout, and it is simple and primitive enough
so it is unlikely to change. For example, MMTk provides [`BumpPointer`](https://docs.mmtk.io/api/mmtk/util/alloc/struct.BumpPointer.html),
which simply includes a `cursor` and a `limit`.

In the following example, we embed one `BumpPointer` struct in the TLS.
The `BumpPointer` is used in the fast path, and carefully synchronized with the allocator in the `Mutator` struct in the slow path.
Note that the `allocate_default` closure in the example below assumes the allocation semantics is `AllocationSemantics::Default`
and its selected allocator uses bump-pointer allocation.
Real-world fast-path implementations for high-performance VMs are usually JIT-compiled, inlined, and specialized for the current plan
and allocation site, so that the allocation semantics of the concrete allocation site (and therefore the selected allocator) is known to the JIT compiler.

For the sake of simplicity, we only store _one_ `BumpPointer` in the TLS in the example.
In MMTk, each plan has multiple allocators, and the allocation semantics are mapped
to those allocators by the GC plan you choose. So a plan uses multiple allocators, and
depending on how many allocation semantics are used by a binding, the binding may use multiple allocators as well.
In practice, a binding may embed multiple fastpath structs, as in the example, for those allocators if it needs
more efficient allocation.

Also for simplicity, the example assumes the default allocator for the plan in use is a bump pointer allocator.
Many plans in MMTk use a bump pointer allocator for their default allocation semantics (`AllocationSemantics::Default`),
including (but not limited to) `NoGC`, `SemiSpace`, `Immix`, and generational plans.
If a plan does not do bump-pointer allocation, we may still implement fast paths, but we need to embed different data structures instead of `BumpPointer`.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_fastpath_struct}}
```
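
To make the fast-path/slow-path split concrete without depending on MMTk, here is a minimal sketch over simulated addresses. The local `BumpPointer` only mirrors the `cursor`/`limit` shape described above, and `SlowPath` is a hypothetical stand-in for calling `alloc_slow()` and then re-synchronizing the bump pointer; it assumes every request fits in one buffer and that alignments are powers of two.

```rust
/// Local stand-in with the same shape as the text describes: a cursor and a limit.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BumpPointer {
    pub cursor: usize,
    pub limit: usize,
}

/// Hypothetical slow path that hands out fixed-size buffers of simulated addresses.
pub struct SlowPath {
    next_chunk: usize,
    chunk_size: usize,
    pub calls: usize,
}

/// Round `addr` up to a multiple of `align` (a power of two).
pub fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

impl SlowPath {
    pub fn new(start: usize, chunk_size: usize) -> Self {
        SlowPath { next_chunk: start, chunk_size, calls: 0 }
    }

    /// Take a fresh buffer, allocate `size` bytes from it, and refill `bp`.
    pub fn alloc_slow(&mut self, bp: &mut BumpPointer, size: usize, align: usize) -> usize {
        self.calls += 1;
        let start = align_up(self.next_chunk, align);
        self.next_chunk = start + self.chunk_size;
        bp.cursor = start + size;
        bp.limit = start + self.chunk_size;
        start
    }
}

/// Fast path: bump the cursor if the object fits, otherwise fall back.
#[inline(always)]
pub fn alloc(bp: &mut BumpPointer, slow: &mut SlowPath, size: usize, align: usize) -> usize {
    let start = align_up(bp.cursor, align);
    let end = start + size;
    if end <= bp.limit {
        bp.cursor = end;
        start
    } else {
        slow.alloc_slow(bp, size, align)
    }
}
```

Starting with `cursor == limit` forces the first allocation through the slow path, which is a convenient way to initialize the embedded struct lazily.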

## Avoid resolving the allocator at run time

For a simple and general API of `alloc()`, MMTk requires `AllocationSemantics` as an argument in an allocation request, and resolves it at run-time.
The following is roughly what `alloc()` does internally.

1. Resolving the allocator
    1. Find the `Allocator` for the required `AllocationSemantics`. It is defined by the plan in use.
    2. Dynamically dispatch the call to [`Allocator::alloc()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#tymethod.alloc).
2. `Allocator::alloc()` executes the allocation fast path.
3. If the fastpath fails, it executes the allocation slow path [`Allocator::alloc_slow()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#method.alloc_slow).
4. The slow path will further attempt to allocate memory, and may trigger a GC.
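
The steps above can be sketched with stand-in types to show where the run-time cost comes from. Nothing here is MMTk's actual code: `Semantics`, `Allocator`, `Bump`, and `Mutator` are simplified, hypothetical versions, and addresses are simulated integers.

```rust
/// Simplified stand-in for `AllocationSemantics`.
#[derive(Clone, Copy)]
pub enum Semantics {
    Default = 0,
    Large = 1,
}

/// Simplified stand-in for the `Allocator` trait.
pub trait Allocator {
    /// Steps 2 and 3: try the fast path, fall back to the slow path.
    fn alloc(&mut self, size: usize) -> usize;
    /// Step 4: acquire more memory (a real slow path may trigger a GC).
    fn alloc_slow(&mut self, size: usize) -> usize;
}

pub struct Bump {
    cursor: usize,
    limit: usize,
}

const BUFFER: usize = 256;

impl Bump {
    pub fn new(base: usize) -> Self {
        Bump { cursor: base, limit: base }
    }
}

impl Allocator for Bump {
    fn alloc(&mut self, size: usize) -> usize {
        if self.cursor + size <= self.limit {
            let result = self.cursor;
            self.cursor += size;
            result
        } else {
            self.alloc_slow(size)
        }
    }

    fn alloc_slow(&mut self, size: usize) -> usize {
        // Pretend a fresh buffer starts right after the old one.
        let start = self.limit;
        self.limit = start + size.max(BUFFER);
        self.cursor = start + size;
        start
    }
}

pub struct Mutator {
    /// Semantics index -> allocator index, as chosen by the plan.
    mapping: [usize; 2],
    allocators: Vec<Box<dyn Allocator>>,
}

impl Mutator {
    pub fn new() -> Self {
        Mutator {
            mapping: [0, 1],
            allocators: vec![Box::new(Bump::new(0x1000)), Box::new(Bump::new(0x10_0000))],
        }
    }

    pub fn alloc(&mut self, size: usize, semantics: Semantics) -> usize {
        let index = self.mapping[semantics as usize]; // step 1.1: table lookup
        self.allocators[index].alloc(size) // step 1.2: dynamic dispatch
    }
}
```

The table lookup and the virtual call happen on every allocation in this sketch, even though the semantics are usually known at the allocation site.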

Resolving to a specific allocator and doing dynamic dispatch is expensive for an allocation.
With build-time or JIT-time knowledge of the object that will be allocated, an MMTk binding can often skip the first step at run time.

If you implement an efficient fastpath allocation on the binding side (like Option 3 above, or by generating allocation code in a JIT, which is discussed next),
that naturally avoids this problem. If you do not want to implement the fastpath allocation, the following is another example of how to avoid resolving the allocator.

Once MMTk is initialized, a binding can get the memory offset for the default allocator, and save it somewhere. When we know an object should be allocated
with the default allocation semantics, we can use the offset to get a reference to the actual allocator (with unsafe code), and allocate with the allocator.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_avoid_resolving_allocator.rs:avoid_resolving_allocator}}
```
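
The underlying pointer arithmetic can be illustrated in isolation. In this sketch, `Mutator` and `BumpAllocator` are small `#[repr(C)]` stand-ins, and `allocator_offset` / `allocator_at` are hypothetical helpers; the offset is measured on a live instance here, which is an assumption of the sketch rather than how MMTk computes it.

```rust
/// Stand-in allocator with a C-compatible layout.
#[repr(C)]
pub struct BumpAllocator {
    pub cursor: usize,
    pub limit: usize,
}

/// Stand-in mutator holding its allocators inline.
#[repr(C)]
pub struct Mutator {
    pub tls: usize,
    pub allocators: [BumpAllocator; 2],
}

/// Compute once, after initialization, the byte offset from the start of the
/// mutator to the allocator at `index`, and cache the result.
pub fn allocator_offset(mutator: &Mutator, index: usize) -> usize {
    let base = mutator as *const Mutator as usize;
    let field = &mutator.allocators[index] as *const BumpAllocator as usize;
    field - base
}

/// Reach the allocator directly from the mutator pointer and a cached offset,
/// skipping any lookup by allocation semantics.
///
/// # Safety
/// `mutator` must point to a live `Mutator`, `offset` must come from
/// `allocator_offset`, and no other reference to that allocator may be in use.
pub unsafe fn allocator_at<'a>(mutator: *mut Mutator, offset: usize) -> &'a mut BumpAllocator {
    unsafe { &mut *((mutator as *mut u8).add(offset) as *mut BumpAllocator) }
}
```

Because the offset depends on the layout of the mutator, a binding that caches it should re-check it (for example with an assertion at startup) whenever the MMTk version changes.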

## Emitting Allocation Sequence in a JIT Compiler

If the language has a JIT compiler, it is generally desirable to generate the code sequence for the allocation fast path, rather
than simply emitting a call instruction to the allocation function. The optimizations discussed above are relevant as well: (1)
the compiler needs to be able to access the mutator, and (2) the compiler needs to be able to resolve to a specific allocator at
JIT time. The actual implementation depends heavily on the compiler implementation.

The following are some examples from our bindings (at the time of writing):
* OpenJDK:
    * <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetAssembler_x86.cpp#L38>
    * <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetC2.cpp#L45>
* JikesRVM: <https://github.com/mmtk/mmtk-jikesrvm/blob/fbfb91adafd9e9b3f45bd6a4b32c845a5d48d20b/jikesrvm/rvm/src/org/jikesrvm/mm/mminterface/MMTkMutatorContext.java#L377>
* Julia: <https://github.com/mmtk/julia/blob/5c406d9bb20d76e2298a6101f171cfac491f651c/src/llvm-final-gc-lowering.cpp#L267>
5 changes: 5 additions & 0 deletions docs/userguide/src/portingguide/perf_tuning/prefix.md
@@ -0,0 +1,5 @@
# Performance Tuning for Bindings

In this section, we discuss how to achieve the best performance with MMTk in a binding implementation.
MMTk is a high performance GC library, but a few key points need to be done correctly
to achieve optimal performance.
20 changes: 0 additions & 20 deletions docs/userguide/src/tutorial/SUMMARY.md

This file was deleted.

21 changes: 21 additions & 0 deletions src/memory_manager.rs
@@ -175,6 +175,27 @@ pub fn alloc<VM: VMBinding>(
    mutator.alloc(size, align, offset, semantics)
}

/// Invoke the allocation slow path. This is only intended for use when a binding implements the fastpath on
/// the binding side. When the binding handles fast path allocation and the fast path fails, it can use this
/// method for slow path allocation. Calling before exhausting fast path allocation buffer will lead to bad
/// performance.
///
/// Arguments:
/// * `mutator`: The mutator to perform this allocation request.
/// * `size`: The number of bytes required for the object.
/// * `align`: Required alignment for the object.
/// * `offset`: Offset associated with the alignment.
/// * `semantics`: The allocation semantic required for the allocation.
pub fn alloc_slow<VM: VMBinding>(
    mutator: &mut Mutator<VM>,
    size: usize,
    align: usize,
    offset: usize,
    semantics: AllocationSemantics,
) -> Address {
    mutator.alloc_slow(size, align, offset, semantics)
}

/// Perform post-allocation actions, usually initializing object metadata. For many allocators none are
/// required. For performance reasons, a VM should implement the post alloc fast-path on their side
/// rather than just calling this function.
96 changes: 96 additions & 0 deletions src/plan/mutator_context.rs
@@ -5,6 +5,7 @@ use crate::plan::global::Plan;
use crate::plan::AllocationSemantics;
use crate::policy::space::Space;
use crate::util::alloc::allocators::{AllocatorSelector, Allocators};
use crate::util::alloc::Allocator;
use crate::util::{Address, ObjectReference};
use crate::util::{VMMutatorThread, VMWorkerThread};
use crate::vm::VMBinding;
@@ -118,6 +119,20 @@ impl<VM: VMBinding> MutatorContext<VM> for Mutator<VM> {
        .alloc(size, align, offset)
    }

    fn alloc_slow(
        &mut self,
        size: usize,
        align: usize,
        offset: usize,
        allocator: AllocationSemantics,
    ) -> Address {
        unsafe {
            self.allocators
                .get_allocator_mut(self.config.allocator_mapping[allocator])
        }
        .alloc_slow(size, align, offset)
    }

    // Note that this method is slow, and we expect VM bindings that care about performance to implement allocation fastpath sequence in their bindings.
    fn post_alloc(
        &mut self,
@@ -169,6 +184,80 @@ impl<VM: VMBinding> Mutator<VM> {
            unsafe { self.allocators.get_allocator_mut(selector) }.on_mutator_destroy();
        }
    }

    /// Get the allocator for the selector.
    ///
    /// # Safety
    /// The selector needs to be valid, and points to an allocator that has been initialized.
    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
    pub unsafe fn allocator(&self, selector: AllocatorSelector) -> &dyn Allocator<VM> {
        self.allocators.get_allocator(selector)
    }

    /// Get the mutable allocator for the selector.
    ///
    /// # Safety
    /// The selector needs to be valid, and points to an allocator that has been initialized.
    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
    pub unsafe fn allocator_mut(&mut self, selector: AllocatorSelector) -> &mut dyn Allocator<VM> {
        self.allocators.get_allocator_mut(selector)
    }

    /// Get the allocator of a concrete type for the selector.
    ///
    /// # Safety
    /// The selector needs to be valid, and points to an allocator that has been initialized.
    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
    pub unsafe fn allocator_impl<T: Allocator<VM>>(&self, selector: AllocatorSelector) -> &T {
        self.allocators.get_typed_allocator(selector)
    }

    /// Get the mutable allocator of a concrete type for the selector.
    ///
    /// # Safety
    /// The selector needs to be valid, and points to an allocator that has been initialized.
    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
    pub unsafe fn allocator_impl_mut<T: Allocator<VM>>(
        &mut self,
        selector: AllocatorSelector,
    ) -> &mut T {
        self.allocators.get_typed_allocator_mut(selector)
    }

    /// Return the base offset from a mutator pointer to the allocator specified by the selector.
    pub fn get_allocator_base_offset(selector: AllocatorSelector) -> usize {
        use crate::util::alloc::*;
        use memoffset::offset_of;
        use std::mem::size_of;
        offset_of!(Mutator<VM>, allocators)
            + match selector {
                AllocatorSelector::BumpPointer(index) => {
                    offset_of!(Allocators<VM>, bump_pointer)
                        + size_of::<BumpAllocator<VM>>() * index as usize
                }
                AllocatorSelector::FreeList(index) => {
                    offset_of!(Allocators<VM>, free_list)
                        + size_of::<FreeListAllocator<VM>>() * index as usize
                }
                AllocatorSelector::Immix(index) => {
                    offset_of!(Allocators<VM>, immix)
                        + size_of::<ImmixAllocator<VM>>() * index as usize
                }
                AllocatorSelector::LargeObject(index) => {
                    offset_of!(Allocators<VM>, large_object)
                        + size_of::<LargeObjectAllocator<VM>>() * index as usize
                }
                AllocatorSelector::Malloc(index) => {
                    offset_of!(Allocators<VM>, malloc)
                        + size_of::<MallocAllocator<VM>>() * index as usize
                }
                AllocatorSelector::MarkCompact(index) => {
                    offset_of!(Allocators<VM>, markcompact)
                        + size_of::<MarkCompactAllocator<VM>>() * index as usize
                }
                AllocatorSelector::None => panic!("Expect a valid AllocatorSelector, found None"),
            }
    }
}

/// Each GC plan should provide their implementation of a MutatorContext. *Note that this trait is no longer needed as we removed
@@ -186,6 +275,13 @@ pub trait MutatorContext<VM: VMBinding>: Send + 'static {
        offset: usize,
        allocator: AllocationSemantics,
    ) -> Address;
    fn alloc_slow(
        &mut self,
        size: usize,
        align: usize,
        offset: usize,
        allocator: AllocationSemantics,
    ) -> Address;
    fn post_alloc(&mut self, refer: ObjectReference, bytes: usize, allocator: AllocationSemantics);
    fn flush_remembered_sets(&mut self) {
        self.barrier().flush();
8 changes: 8 additions & 0 deletions src/util/address.rs
@@ -300,6 +300,14 @@ impl Address {
        &*self.to_mut_ptr()
    }

    /// converts the Address to a mutable Rust reference
    ///
    /// # Safety
    /// The caller must guarantee the address actually points to a Rust object.
    pub unsafe fn as_mut_ref<'a, T>(self) -> &'a mut T {
        &mut *self.to_mut_ptr()
    }

    /// converts the Address to a pointer-sized integer
    pub const fn as_usize(self) -> usize {
        self.0