Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the Store API for great good. #3446

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

matthieu-m
Copy link

@matthieu-m matthieu-m commented Jun 17, 2023

The Store offers a more flexible allocation API, suitable for in-line memory store, shared memory store, compact "pointers", const/static use, and more.

Adoption of this API, and its use in standard collections, would render a number of specialized crates obsolete in full or in part, such as StackFuture.

Rendered

The Store offers a more flexible allocation API, suitable for in-line
memory store, shared memory store, compact "pointers", const/static use,
and more.

Adoption of this API, and its use in standard collections, would render
a number of specialized crates obsolete in full or in part, such as
StackFuture.
@ehuss ehuss added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jun 17, 2023
text/3446-store.md Outdated Show resolved Hide resolved
text/3446-store.md Outdated Show resolved Hide resolved
text/3446-store.md Show resolved Hide resolved
text/3446-store.md Outdated Show resolved Hide resolved
Copy link

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad to see this move forward, and apologize that I wasn't able to drive it like I said I was planning on doing. I'm mostly on board with the current direction, though I have a bunch of notes/thoughts I've put inline.

cc @rust-lang/wg-allocators

text/3446-store.md Show resolved Hide resolved
text/3446-store.md Show resolved Hide resolved
text/3446-store.md Outdated Show resolved Hide resolved
text/3446-store.md Outdated Show resolved Hide resolved
text/3446-store.md Show resolved Hide resolved
- Add a separate `SharingStore` trait -- see future possibilities.

It should be noted that `dyn` usage of `Allocator` and `Store` suffers from the requirement of using unrelated traits as
it is not possible to have a `dyn Allocator + Clone + PartialEq` trait today, though those traits can be implemented for
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple trait vtables are coming "soon," along with vtable upcasting.

I am now thinking we might potentially want a separate trait ErasedStore from the trait Store for the purpose of dynamic storages, to shim ErasedHandle in and potentially to provide is_sharing_with and clone functionality. Making Store deliberately not object safe initially might thus be desirable, until we figure out how dynamic storages "should" work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No opinion \o/

text/3446-store.md Outdated Show resolved Hide resolved
text/3446-store.md Show resolved Hide resolved
Comment on lines +997 to +998
Since those extra capabilities can be brought in by user traits for now, I would favor adopting a wait-and-see approach
here.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not exactly trivial to add post facto, since all storages would need to provide the "potentially limited storage" API for it to be meaningfully useful. But combined with pattern refined types, I think adding to Store something like:

pub trait Store {
    const MAX_STORAGE: usize = isize::MAX as usize;
    const MIN_STORAGE: usize = 0;
}

should get the job done, since you can have e.g.

struct Vec<T, S> {
    handle: S::Handle,
    len: usize @ 0..={S::MAX_STORAGE / size_of::<T>()},
    cap: usize @ {S::MIN_STORAGE / size_of::<T>()}..={S::MAX_STORAGE / size_of::<T>()},
    store: S,
    marker: PhantomData<Box<[T]>>,
}

and that should hopefully theoretically be able to at least optimize for single-valued cap and cap/len which are always less than uNN... though also maybe not, depending on coercions (e.g. (&usize @ PAT) as &usize); pattern restricted types might just give you niches, not size compression. Interim solutions that use other solutions should definitely be left to 3rd party, since that type jujitzu is definitely beyond anything else std has exposed. (The Store trait hierarchy is already close to if not the most involved API in std, even after the massive simplifications v3 gives over v2 and v2 gives over v1.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced that this would the right point of customization, to be honest.

For example, I regularly use u32 for length (and u8 for capacity, as the exponent of a power-of-2) even with the Allocator API not because the Allocator cannot provide more, but because I don't care about supporting more.

In fact, my implementation of InlineVec uses u16 regardless, since I have uses for more than 255, but not use over 64K elements.

Requiring to wrap the store into an adapter that limits the sizes every time you wish to save up some bytes seem more complicated than directly plugging the types you want to use in the first place.

text/3446-store.md Outdated Show resolved Hide resolved
@CAD97

This comment was marked as off-topic.

@CraftSpider
Copy link

I think CAD has already covered most of my thoughts on the API, so I just want to say: I'm very happy to see this initiative moving forwards!

type Handle: Copy;

/// Returns a dangling handle, always invalid.
fn dangling() -> Self::Handle;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to take a Layout or ptr::Alignment in dangling, so that Vec::new can use dangling to get the 0-length slice pointer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question.

My intent was for dangling handles to always be invalid, and therefore to be UB to do anything with it, including resolving it. For example, I hoped that an index could use a giant value, which resolve would be able to detect was out of bounds in Debug.

Needless to say, this doesn't mesh well with Vec::as_ptr, which is then used for 0-length slices. It requires Vec to introduce a branch in as_ptr to choose between resolving or returning a dangling pointer, and that branch may not be well-optimized at all... it's all around sad.

This leaves two choices, as far as I can see:

  1. Alter the invalidity of dangling handle, and state that they must be resolvable, though the resolved pointer itself is then invalid. It would still be invalid to attempt to grow/shrink/deallocate a dangling handle, however.
  2. Require Vec to use a 0-sized allocation, and implementers to handle those.

I believe (1) is clearly the superior alternative here, so I'll go ahead and amend the RFC in that direction in the next batch of fixes, and mention why in the alternatives section.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... I went ahead and tried having dangling take a ptr::Alignment argument, and there was an unforeseen circumstance: it became fallible.

I created a branch on the companion repository which I invite you to look at. The problem I face there is that inline stores cannot provide an allocation with an alignment greater than their own as allocations are always offsets within the store itself. In turn, this makes Vec::new() fallible, or in the companion repository branch it makes LinkedList::new() and SkipList::new() fallible.

This result, in turn, in making dangling fallible since one may (accidentally, I suppose) ask for an alignment greater than what the Store may be able to provide, even in the absence of actual allocation.

I am quite unsure of how to move forward on this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #3446 (comment) @CAD97 raised the possibility of an allocate_dangling method.

It would make the interface more complex, but we could have:

  • Store::invalid: an invalid handle, it should never be resolved, deallocated, grown, nor shrunk.
  • Store::allocate_zero_sized: a specialized allocation method for ZSTs. The handle is valid, and I would argue should be treated like any other handle: it may be resolved, deallocated, grown, etc... (it can't be shrunk). Multiple independent ZST handles may resolve to the same memory address. Unlike CAD97's proposal I'd argue it must be deallocated ultimately.

This would help making LinkedList::new and SkipList::new infallible again, as they could use Store::invalid. It would not, unfortunately, help Vec::new be infallible as Store::allocate_zero_sized would remain fallible due to the alignment argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike CAD97's proposal I'd argue it must be deallocated ultimately.

This is a problem for implementations. It also leads to the situations we see where both sides of the allocation interface end up checking for ZSTs and having to special case them (See my links to the ZST allocation issues with the current API elsewhere in the thread). While that's less detrimental for Stores (Store is probably not going to be frequently, if at all, used via virtual dispatch), it's still a bad sign IMO.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an unforeseen circumstance: dangling became fallible.

Fallibility wasn't unforseen on my end; my allocate_dangling did return Result to allow failure to allocate if the alignment request cannot be fulfilled. It's fine if Vec::new can fail and call handle_alloc_error, imho; the entire collection interface is built around assuming that allocation is essentially infallible. A try_new_in (or whatever solution is used for fallible allocation in collections) would be necessary instead to avoid termination on exhausting the storage.

The ability to skip deallocating an allocate_dangling handle is key to it being any amount of realistically useful. Without that guarantee, it's unambiguously better to just never use the allocator to create zero-sized allocations.

Note that my spitball still allowed zero sized allocations in the standard allocate path, requiring deallocation to prevent leaks, with the expectation that this allocation would be permitted to use the exact same path as non-zero-sized allocation and be wasteful for zero-sized allocation which can conjure a fresh allocation at any nonzero aligned address.

Store is probably not going to be frequently, if at all, used via virtual dispatch

It'd be used dynamically essentially exactly as much as Allocator would, since it's replacing Allocator as the storage generic axis for collections. FWIW, I fully expect dyn GlobalAlloc will always remain the interface used for #[global_allocator], and changing it to be dyn Allocator to be unnecessary and of no real benefit. It makes sense that the API tradeoff calculus would be different for the globally available static allocation interface. (Though automatically bridging from Allocator to GlobalAlloc is both reasonable and desirable.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine if Vec::new can fail and call handle_alloc_error, imho; the entire collection interface is built around assuming that allocation is essentially infallible.

I am coming around to thinking this may be fine.

By defining type InlineVec<T, const N: usize> = Vec<T, InlineSingleStore<[T; N]>>; the type is guaranteed to have an infallible dangling, and thus users of the abstraction will never see a panicking new.

Performance-wise, this would rely on dangling being inlined in new, so the optimizer can see the error path is never taken and optimize out any check/call, but that can be tamed.

Copy link
Member

@thomcc thomcc Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine if Vec::new can fail and call handle_alloc_error, imho; the entire collection interface is built around assuming that allocation is essentially infallible

I'm not sure this is fine, it seems quite unfortunate to me. Note that Vec::new is currently const. Obviously that only applies to the global allocator (or whatever default store is used for Vec), but it seems like quite a drawback to prevent things like Store-based smallvec/arrayvec/etc from having const fn news (which I can say very concretely is something desirable).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it seems like quite a drawback to prevent things like Store-based smallvec/arrayvec/etc from having const fn news.

Note that handle_alloc_error is (unstably) const, so it won't prevent InlineVec::new from being const.

On the other hand, Rust doesn't support marking individual trait functions const for now, so we may need to lobby the compiler team or otherwise find a clever work-around for Store::dangling.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottmcm For your viewing pleasure, let me present StoreVec. As a bonus, built entirely in a const context, because I could.

There's only one glitch preventing StoreVec::new to be const in general: Default is not a #[const_trait] yet, sadly.

text/3446-store.md Outdated Show resolved Hide resolved
text/3446-store.md Show resolved Hide resolved
@thomcc
Copy link
Member

thomcc commented Jun 23, 2023

What does the indirection of having Handle types give us that we don't get from just using raw pointers?

Is this mostly to support remote allocation (e.g. GPU resources?) I'm not sure that this is a great way to support such scenarios, and it seems to significantly complicate the API surface.

@matthieu-m
Copy link
Author

What does the indirection of having Handle types give us that we don't get from just using raw pointers?

Everything! :)

It enables in-line stores, to start with, by which I mean a store which returns pointer in the memory range spanned by self. That is in type ArrayVec<T, N> = Vec<T, InlineSingleStore<[T; N]>>;, the store cannot return NonNull<u8> as the result of the allocation, because as soon as the vector is moved, the NonNull<u8> points into the nether since it's self-referential.

The same self-referential problem appears in InlineBumpStore<[T; N]> which can be used to parameterize a BTreeX for example: it uses indexes, because pointers would be invalidated on moves.

Handles also allow, roughly in order of importance (for me!):

  • static/const: It's unclear whether Rust will one day allow to allocate in a const context and store the result in a static or const variable. It's a non-trivial issue. As soon as we can have const traits, however, Box<dyn Trait, InlineSingleStore<..>> or FxHashMap<K, V, InlineBumpStore<...>> can be const-constructed and stored in const or static => they don't use pointers, they use () and integers respectively. No problem.
  • Pointer compaction: sometimes you can use u32 (or another small type) instead of a 64-bits integral. The JVM does so automatically for heap sizes < 4GB. JS engines/DOM may benefit for the same reasons.
  • Compressed memory, with lazy de-compression on access.
  • Non-traditional memory: shared memory, possibly GPU memory, remote hosts' memory in a distributed system, disk, database, ...

Or in short, yes there's an inconvenience, which I find slight in front of all the Safety comments any way necessary to justify that dereferencing the resolved pointer is safe, but in exchange for the inconvenience many an horizon opens.

I note that all the usecases mentioned here are already part of the Motivation section of this RFC: do you have any suggestion to tweak/rewrite the Motivation section to make it clearer?

@thomcc
Copy link
Member

thomcc commented Jun 23, 2023

Hm, that's fairly compelling, thanks.

@clarfonthey
Copy link
Contributor

I'm cautiously optimistic about this API. I like the way it works, I like the flexibility of it, and I'm hopeful it'll really improve things.

Initially, I was worried about the amount of unsafe code still required to make simple array-based storage work, but then I remembered that MaybeUninit would still require it.

Honestly, I wonder if there's any merit to talking about "levels of unsafety" considering how UB is itself defined to be maximum unsafety regardless. Either way, I wouldn't be upset to see this implemented as-is.

Copy link

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I currently see as blocking concerns (discussed in above reviews, I'll hopefully edit in direct links later):

  • A resolution for dangling/Vec::new getting a handle resolvable to a zero-sized allocation, while still guaranteeing that no storage is actually reserved yet
  • Either a known way to ensure Box retains its retag behavior or signoff that this means Box doesn't cause retags / doesn't get noalias
  • Further investigation showing that the cost of &mut methods outweighs the opportunity cost of requiring all storages with inline state to be !Freeze (and thus making the container using them !Freeze even if it doesn't utilize the shared mutability)
  • Added discussion of what happens with the Allocator trait; integrating it as part of the Store trait hierarchy seems preferable to making it essentially an "impl alias."

@thomcc
Copy link
Member

thomcc commented Jun 26, 2023

One note about the Allocator trait is that &dyn Store<Handle = ...> (assuming something else in the trait doesn't break object safety -- I haven't checked) will have a lot of overhead, given that it would need a virtual method call for each resolve operation. Whereas AllocatorStore<dyn Allocator> or something would be fine.

@matthieu-m
Copy link
Author

(assuming something else in the trait doesn't break object safety -- I haven't checked)

Nothing else does right now. I specifically switched to taking &self in dangling to make it available for dyn Store.

One note about the Allocator trait is that &dyn Store<Handle = ...> ([...]) will have a lot of overhead, given that it would need a virtual method call for each resolve operation. Whereas AllocatorStore<dyn Allocator> or something would be fine.

There would be overhead, yes, however it wouldn't be a "surprise", and therefore it's something that can be coded against.

For example, if you have a Vec<T, &dyn Store<...>>, you can dereference it to a slice, then use the slice directly and avoid any further call to resolve.

Another possibility is to introduce a further trait which eschew calling the store when the handle is self-resolvable (S::Handle: TryInto<NonNull<u8>>), something along the lines of:

trait StoreHandle: Copy {
    default fn resolve<S>(self, store: &Store) -> NonNull<u8>
    where
        S: Store<Handle = Self>,
    {
        store.resolve(self)
    }
}

impl<H> StoreHandle for H
where
     H: TryInto<NonNull<u8>> + Copy,
{
    default fn resolve<S>(self, store: &Store) -> NonNull<u8>
    where
        S: Store<Handle = Self>,
    {
        self.try_into().unwrap_or_else(|| store.resolve(self))
    }
}

@thomcc
Copy link
Member

thomcc commented Jun 26, 2023

For example, if you have a Vec<T, &dyn Store<...>>, you can dereference it to a slice, then use the slice directly and avoid any further call to resolve.

This works for a vector/string but not something like a map. It also only works with read-only interfaces. Note that with allocator you can add things to and remove things from the vector with no cost unless you trigger reallocation.

@CAD97
Copy link

CAD97 commented Jun 26, 2023

This specific reply wanders pretty far off topic, but I don't think the resolve overhead for &dyn Allocator (with trait Allocator: Store<Handle = AllocHandle> + ...) is strictly necessary, and it could be automatically eliminated given two individually desirable other features.

The key feature is (unproposed) final trait methods where default method bodies are currently allowed. If a trait method's default is final, it would not be allowed to be replaced by the concrete impl, and it would be possible for (it to be elided from the vtable and) calls to that method to bypass virtual dispatch, depending on the body of the function (i.e. if the body can be safely dispatched without just resulting in multiple virtual calls in its impl).

Pair that with a default impl<A: Allocator> Store for A which sets Handle = AllocHandle and resolve(&self, handle: AllocHandle) = handle.get() in a final manner, the compiler should be able to inline <&dyn Allocator as Store>::resolve as nonvirtual.

How practical getting those features are, though, as well as if it's desirable to block stabilization of (at least the Allocator part of) the storage API on it for perf reasons is a different question. Especially since I'm positing the availability of a feature that's not even proposed yet.

Even if it can't be done automatically, a wrapping struct AllocStore<A: ?Sized + Allocator>(A) can be written which does the specialization on an opt-in basis.

@clarfonthey
Copy link
Contributor

clarfonthey commented Jun 27, 2023

One note about the Allocator trait is that &dyn Store<Handle = ...> (assuming something else in the trait doesn't break object safety -- I haven't checked) will have a lot of overhead, given that it would need a virtual method call for each resolve operation. Whereas AllocatorStore<dyn Allocator> or something would be fine.

Isn't this the main purpose of StableStore? I would imagine that all traditional memory allocators would implement this, and this means that you can store the pointers without worrying about them becoming invalid, and then there aren't any additional resolve calls.

I would imagine that using dynamic dispatch for something else would be ill-advised precisely for the performance issues you mention. I can't see why something like an array-based Vec would use it, for example.

@thomcc
Copy link
Member

thomcc commented Jun 27, 2023

Right, but I couldn't use that in that way with Vec, could I? At that point, you're back to implementing data structures by hand yourself.

* Changes:

- Rename all XxxStore traits to StoreXxx.

* Motivation:

As proposed by @CAD97, this establishing a naming convention in which:

- StoreXxx are the traits of the API.
- XxxStore are the types implementing the API.
@matthieu-m
Copy link
Author

matthieu-m commented Jul 2, 2023

Right, but I couldn't use that in that way with Vec, could I? At that point, you're back to implementing data structures by hand yourself.

It will depend how efficient you want things to be.

Firstly, if you use dyn Store<Handle = H>, and want efficiency (memory-wise) you can just aim for H to be convertible to/from NonNull<u8> in order to skip the whole resolve step.

Then, it's just a matter of implementing an adapter Store:

struct DynStore<'a, H>(&'a dyn Store<Handle = H>);

unsafe impl<'a, H> const StoreDangling for DynStore<'a, H>
where
     H: Copy + From<NonNull<u8>>,
{
    type Handle = H;

    fn dangling(&self, alignment: Alignment) -> Result<Self::Handle, AllocError> {
        let handle: NonNull<u8> = /* magic */;

        Ok(handle.into())
    }
}

unsafe impl<'a, H> Store for DynStore<'a, H>
where
    H: Copy + From<NonNull<u8>> + Into<NonNull<u8>>,
{
    unsafe fn resolve(&self, handle: H) -> NonNull<u8> { handle.into() }

    fn allocate(&self, layout: Layout) -> Result<Self::Handle, AllocError> { self.0.allocate(layout) }

    // ... forward all other methods ...
}

And that's it! Just parameterize your Vec with DynStore<'a, H> and off you go. No penalties of any kind.

If really you somehow got yourself stuck with a dyn Store<Handle = H> where H cannot be convert to/from NonNull<u8> but the store is StoreStable, then you can create a new handle type which bundles together (H, NonNull<u8>) to achieve the desired effect... trading off memory usage for speed.

If the local increase of memory usage is a problem, it's also possible to use a non-Send store adapter which stores the mapping from handle to pointer (and back) into a thread-local store, then returns the pointer as a handle directly. The mapping only would have to be consulted on allocation/deallocation, so its overhead should be relatively minimal.

Anyway, before I start pulling any more hare-brained schemes, I think I've made it clear that there are many solutions that do NOT involve re-implementing collections: the whole point of the Store API is to avoid having to do so, after all.

* Changes:

- Introduce StoreDangling, as super-trait of Store.
- Make dangling take an alignment, it becomes fallible.

* Motivation:

A separate trait is necessary for `dangling` to be usable in const
contexts even when the entire `Store` implementation cannot be const.

An alignment is necessary for efficient use in `Vec`, which takes
advantage of NonNull::dangling being aligned today.
@matthieu-m
Copy link
Author

What I currently see as blocking concerns (discussed in above reviews, I'll hopefully edit in direct links later):

* [ ]  A resolution for `dangling`/`Vec::new` getting a handle resolvable to a zero-sized allocation, while still guaranteeing that no storage is actually reserved yet

I added the StoreDangling trait, with the "new" fallible dangling method.

The reasoning for a separate trait is laid out in the RFC, but in short, it's not possible to have a const Store implementation for any Allocator, but it is possible to have a const StoreDangling, and from then to have a const Vec::new_in.

Vec::new being const only requires Default to be #[const_trait] which while it will require a separate RFC, certainly seems within reach -- I was actually surprised to realize it wasn't.

* [ ]  Either a known way to ensure `Box` retains its retag behavior or signoff that this means `Box` doesn't cause retags / doesn't get `noalias`

Given that you used Box itself in your demo, may I take it that the lang-item has an effect there? If so, can we not make it so that the lang-item continues effecting its magic when Box stores a handle instead?

@tgross35 tgross35 mentioned this pull request Jul 2, 2023
4 tasks
@tgross35
Copy link
Contributor

tgross35 commented Jul 2, 2023

I don't think I saw anything about being able to use alloc::collections types in a fallible way via this proposal - for example, push still returns () in the example. Is that planned at all?

There is a very interesting experiment going on with the Allocator trait that essentially provides an error type and a way to convert the errors. Basically the below (with some caveats to make it object safe):

/// Defaults to current allocator style of panic on OOM
trait Allocator {
    type Result<T> = T;
    fn convert_result<T, E>(res: Result<T, E>) -> Self::Result<T> {
        match res {
            Ok(val) -> val,
            Err(e) -> handle_alloc_error(e),
        }
    }
}

impl Alloc for FallibleAllocator {
    type Result<T> = Result<T, AllocError>;
    fn map_result .// ..
}

// Resolves to `()` for infallible, `Result<(), AllocError>` for fallible
impl<T, A:Alloc> Vec<T, A> { fn push(val: T) -> A::Result<()>; }

impl<T, A: Alloc> Box<T, A> { fn push(val: T) -> A::Result<Box<T,A>>; }

I think support for some sort of behavior like this may be fairly crucial for this proposal since it is intended to work in a lot of places where allocation can fail, but I am unsure what the best solution is that would be. Or if I might have just missed something in the RFC.

@matthieu-m
Copy link
Author

@CAD97 @thomcc

I believe we've all been busy, and so progress on this RFC has stalled. I think it would be good to establish a TODO list of things to be done, and for me to consolidate this list in the top comment.

The experiment for &mut self failed for any "multi" store, so I don't see the point in trying to pursue it more -- personally -- as having it only for single store seems fairly pointless.

RFC #3490 is fairly interesting, to me, as it means being able to remove StoreDangling, and instead just have Store with a const dangling method. I'm thinking of revising the RFC accordingly, though not the experimental implementation as that's unsupported by the compiler for now.

Is there any other outstanding point you'd like to see clarified and/or investigated?

@mangelats
Copy link

I have a few suggestions and thoughts which I hope may help improving this API.

Error handling

Instead of returning Result<_, AllocError> I would suggest to add an associated type to define what error types a particular store uses. This error must be convertible into AllocErr 1. It can even be defaulted to it with the associated type defaults 2:

#[const_trait]
pub unsafe trait StoreDangling {
    /// Error type for fallible operations
    type Error: Into<AllocError> = AllocError;

    // [...]
}

There is a blanket implementation of Into<T> for T 3 so the default AllocError satisfies this requirement. Adding Error also allows to design store-specific errors which may provide more information for specific cases while keeping a uniform interface. It also enables infallible stores by using the never type (!) 4 (and letting the compiler optimize the error handling mechanisms away).

unsafe impl const StoreDangling for InfallibleExample {
  // No alloc, dealloc, or resize will ever fail in this store
  type Error = !;

  // [...]
}

This solution has a few fallbacks though:

  1. ! should implement Into for all types (including AllocError), but From<!> for T is not yet implemented 5.
  2. It doesn't play nice with the ? operator because it would require From<Self::Error> for AllocError 6 which I wasn't able to define as a working requirement 7.

It's also be possible to have per-method error type but I feel like that makes the trait less focused and harder to use.

Mutability, sharing, and multithreading

The point that stood out the most for me is the usage of &self in methods that mutate values (allocate, deallocate, grow, shrink, and its variants). In some implementations this seems to make sense (eg. when using a GlobalAlloc 8) but in others seems forced (eg. when allocating over memory on the stack).

I feel that the desired mutability restrictions to cover all cases are:

  • Always allow modifying actions when having a single mutable reference-like variable.
  • Allow to modifying actions when having multiple reference-like variable if and only if the storage supports it.
  • Allow to modifying actions from multiple threads if and only if the storage supports it.
  • Avoid unnecessary locks (eg. malloc has its own lock. Forcing to use a mutex is not necessary and probably hurts performance).
  • Allow uses of stores to require support for what they can do.
  • Allow using standard types to upgrade the stores (like Arc<Mutex<S>>, Rc<S>, etc.) 9

The solution that I feel more likely to work is spliting the store into the interface and memory (storefront and back of house? 10 11) with traits that mark capabilities and have requirements 12:

  • MultiRef: It marks that you can use Clone to get another storefront (reference-like).
  • ThreadSafe: Requiring MultiRef, Send and Sync, it marks that it is safe to use in multiple threads.
pub trait MultiRef: Store + Clone {}
pub trait ThreadSafe: MultiRef + Send + Sync {}

The data modifying methods in Store would require a mutable reference, but the ones that support MultiRef, can be cloned with an immutable reference, so Clone is the way of promoting from &self to &mut self 13. Because the storefront shouldn't contain the memory, the cloning should be fast 14.

The separation is useful because it already exists in case of allocating functions like malloc or a GlobalAlloc. In this case, the storefront could be a zero-sized type that simply calls the necessary functions and it implements all the traits (because malloc is thread safe). The trait could also be implemented for std::alloc::System bacuase that's pretty much what it already is 15.

This separation also makes it possible to have multiple storefronts for a single store: a store could have one lock free single-threaded, and one lock-based multithreaded storefronts.

In this suggestion composites would be nice. So the following would be tribial to make:

  • One with an underlying Rc<S> which would add MultiRef.
  • Another one based of Arc<Mutex<S>> for MultiRef and ThreadSafe.

The obvious downside to this is more traits.

Resizing

It's technically always possible to try to resize by allocating + moving data + deallocating (because its specified that growing/shrinking invalidates pointers) but some implementation may not like resizing unless forced to. I can see 3 possible ways of managing it:

  1. Force all stores to define resizing methods (and provide utility functions to avoid reimplementing each time).
  2. Add this as the default implementation.
  3. Move resizing into yet-another-trait (and maybe provide the default implementation there).

Lifetimes of pointed memory

When resolving a handle, the resulting pointer has no lifetime associated. We know for a fact that this may not be true, as it could be invalidated. One of such cases is when the the store is droped 16. This could be checked at compile time by adding lifetimes to the result type when resolving.

Here is a simple example that would limit the lifetime:

struct SliceStore<'a>(&'a mut [u8]);

struct StorePtr<'a = 'static>(NonNull<u8>, PhantomData<&'a mut u8>);

unsafe impl<'a> const Store<'a> for SliceStore<'a> {
  unsafe fn resolve(&self, handle: Self::Handle) -> StorePtr<'a> {
    // [...]
  }
  // [...]
}

I don't really like this exact implementation, but it's useful to show the general idea.

There were some discussions about using something like PointerGuard<T> 17 for a different use case 18.

Footnotes

  1. Or another base error type if it ever changes.

  2. For more info on associated trait types, see RFC 2532 and its tracking issue.

  3. As per the docs: «From is reflexive, which means that From<T> for T is implemented»

  4. Read more about ! in the docs specifically about infallible errors, its RFC 1216, and its tracking issue.

  5. It seems to be low priority see issue 64715.

  6. It seems like using Into instead of From breaks some backwards compatibility. Either way liks are RFC 1859 and issue 84277.

  7. I tried doing so but it makes the compiler overflow. I may be missing something, but I feel this is at the edge of what is meant to be possible.

  8. GlobalAlloc always takes a &self as part of its requirements (docs). I have the feeling that it is for the exact same reason as the one given in this RFC.

  9. This may not be possible with the current trait system. I feel like having a wrapping type would be ok.

  10. Naming is hard. I love that this uses the names of parts of an actual store though.

  11. I left Store with the same name in the example but renaming it to Storefront may be worth it if the concept changes.

  12. I don't like this names but I can't think of better ones right now.

  13. I'm on the fence if Clone is the proper trait. That being said, this is exactly how Rc and Arc work, so it makes sense to keep it this way.

  14. I expect it to either be a ZST or a wraper on a reference/Rc/Arc, the copy of which are as low as they can for their application.

  15. std::alloc::System is a ZST handle to the system allocation functions and it already supports Clone, Send, and Sync.

  16. Which is properly stated in the Store's safety section on the docs.

  17. Comment in this PR.

  18. I find it interesting that the idea of not resolving into an actual pointer but some custom type happened twice for different reasons.

@matthieu-m
Copy link
Author

matthieu-m commented Oct 14, 2023

First of all, @mangelats thank you very much for the detailed feedback!

Error handling

The API of Store has been intentionally kept as close as possible to the API of Allocator. This is done for DRY purposes, and for convenience.

This is not to say that the API of Allocator is the best it could be. In fact @thomcc, a member of the libs-team, published a blog article about 2 months ago called Let's talk about the Allocator trait where he enumerated a whole host of problems/questions with the current trait. It is reasonable to think, thus, that the Allocator trait will evolve before its stabilization.

The intent is for Store to mirror the Allocator trait, and thus I will ruthlessly defer many API changes suggestions -- such as whether allocator should return the size of the allocation -- to Allocator.

I do think your idea of finer-grained error handling, and I think it would be worth suggesting for Allocator. If Allocator adopts it, Store will too.

Resizing

Similar to the above, one of the suggestion made by @thomcc for Allocator is to make grow/shrink be in-place (or fail), and leave it up to the user to make a separate allocation instead.

I would welcome such a change, as it's possible to make a grow-in-place-or-out-of-place out of a grow-in-place + allocate/deallocate but the reverse is not, and thus I see grow-in-place as a more fundamental primitive.

Once again, though, I'll defer the decision to Allocator, and adapt Store based on this.

Lifetimes of pointed memory

This could be checked at compile time by adding lifetimes to the result type when resolving.

Actually... not really.

One of the future extensions proposed is the idea of fungible stores. It should be possible, for example, to design an arena that is held by an Arc and shared across multiple threads by different stores, where the memory allocated by a store on any thread will remain available as long as the arena is alive, and may be deallocated by any other store referring to the same instance of the arena which allocated it. In such a case, the lifetime "high bound" of the allocation is dynamic.

But the converse is also true, the lifetime "low bound" of the allocation is also dynamic. You can keep multiple copies of a pointer, but it takes a single deallocate on any copy to invalidate the allocation and all the pointers.

@CAD97 tried the lifetime approach in their revised storages-api repository. As mentioned in the trade-offs section I deem it a mistake.

I think it is better for Store to provide a raw API:

  • The more "guarantees" we attempt to bake in at the API level, the more likely we are to accidentally prevent the use of the API in an otherwise "legitimate" usecase.
  • Extra compile-time guarantees can be provided by extra layers, as is idiomatic in Rust.

Mutability, sharing, and multithreading

@CAD97 shares your opinion that it would be nice to avoid mandatory internal mutability, and I've investigated it quite a bit... and I don't think it's going to be fruitful.

Store fronts sharing the backing memory are, indeed, extremely common, and the SharingStore possible future extension is one attempt I made at an API which acknowledges this matter of fact, and offers fungibility for allocations accordingly. In a sense, you should consider that all Store are a front first and foremost, and some just happen to also be a back at the same time.

The fundamental problem, I think, is that at the heart of allocators there is a conflict between theory and practice:

  • Theory: Alias analyses reason in terms of "independent" allocations.
  • Practice: the memory allocator has a single huge blob of memory to partition into allocations, and that blob aliases with everything.

Rust only offers a single (blunt) instrument to turn-off aliasing considerations: internal mutability, via UnsafeCell.

As a result, any memory allocator written in Rust which wishes to partition a memory block -- a notable exception thus being single-allocation allocators -- MUST use UnsafeCell.

Let's walk through it with an example, a Store with resolve and resolve_mut methods. The problem is that it should normally be possible to have two concurrently resolved mutable references (to distinct elements), and I can't find a way to make it possible without internal mutability. Not without offending the Stacked Borrows model, at least, and I'm not too convinced about Tree Borrows either.

Let's see it in action with an example:

struct MiniStore<T> {
    //  Invariant: the memory cell at index `i` is allocated if `mask & (1 << i)` is set.
    mask: u8,
    memory: [MaybeUninit<T>; 8],
};

impl<T> Store for MiniStore<T> {
    //  **BOOM**
    fn resolve_mut(&mut self, handle: Self::Handle) -> *mut u8 {
        todo!()
    }
}

The problem that arises in attempting to call resolve_mut is that if you already have an outstanding mutable reference to any of the elements of self.memory, then forming a mutable reference to self creates a mutable reference which overlaps with the former. And this generalizes to any kind of &mut reference, obviously.

There is a handful of cases where you create a store with a single memory allocation -- for Box, Vec, VecDeque, HashMap, ... -- but many collections requires multiple nodes (independent allocations) and an important usecase is the ability to share a single store between multiple collections (arena style).

Since it seems that most allocators/stores -- with the clear exception of the handful of single-allocation ones -- will require internal mutability of the backing memory to protect against accidental aliasing, then it just doesn't seem worth it to complicate the API with additional traits/methods for a handful of cases.

When it comes to allocator, an aliasing barrier (internal mutability) is just the name of the game, and we have to accept it.

@glaebhoerl
Copy link
Contributor

Rust only offers a single (blunt) instrument to turn-off aliasing considerations: internal mutability, via UnsafeCell.
...
... I can't find a way to make it possible without internal mutability. Not without offending the Stacked Borrows model, at least, and I'm not too convinced about Tree Borrows either.

Is the UnsafeAliased RFC #3467 relevant here? If the problem is solely language-level UB (as implied by the reference to Stacked and Tree Borrows), then that seems like it ought to resolve it.

@matthieu-m
Copy link
Author

Rust only offers a single (blunt) instrument to turn-off aliasing considerations: internal mutability, via UnsafeCell.
...
... I can't find a way to make it possible without internal mutability. Not without offending the Stacked Borrows model, at least, and I'm not too convinced about Tree Borrows either.

Is the UnsafeAliased RFC #3467 relevant here? If the problem is solely language-level UB (as implied by the reference to Stacked and Tree Borrows), then that seems like it ought to resolve it.

Thanks for bringing this RFC to my attention.

I won't even pretend to be an expert in the situation, I'd prefer @CAD97 or @RalfJung to review the situation.

Still... I am afraid it actually raises more questions than it answers. Taking back my example from above:

struct MiniStore<T> {
    //  Invariant: the memory cell at index `i` is allocated if `mask & (1 << i)` is set.
    mask: u8,
    memory: [MaybeUninit<T>; 8],
};

impl<T> Store for MiniStore<T> {
    //  **BOOM**?
    fn resolve(&self, handle: Self::Handle) -> *const u8 {
        todo!()
    }

    //  **BOOM**
    fn resolve_mut(&mut self, handle: Self::Handle) -> *mut u8 {
        todo!()
    }
}

The issue is that aliasing occurs between:

  • An active outstanding mutable reference to let x = &mut store.memory[0] (obtained through resolve or resolve_mut).
  • Any reference to &store leads to a BOOM.

From the wording of the UnsafeAliased RFC:

UnsafeAliased could affect aliasing guarantees both on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing &mut UnsafeAliased<T> is cast or coerced to &UnsafeAliased<T>: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more 'powerful' than UnsafeCell in the sense that replacing UnsafeCell by UnsafeAliased would always be correct. (Under the RFC as written, UnsafeCell and UnsafeAliased can be nested to remove aliasing requirements from both shared and mutable references.)

I seem to read -- and please correct me if I'm wrong:

  • That wrapping memory in UnsafeCell prevents any aliasing issue between &store and x, but not between &mut store and x.
  • That wrapping memory in UnsafeAliased prevents any aliasing issue between &mut store and x, but not between &store and x.

If my understanding is correct, then UnsafeAliased is absolutely necessary for &mut store, but it won't remove the need to also use UnsafeCell for &store anyway.

Given that the primary motivation of @CAD97 in having mutable methods to mutate the storage -- and a resolve_mut -- were to eschew the requirement for UnsafeCell which contaminates its parent struct (and their parents, recursively) and inhibits a number of optimizations, I would say that UnsafeAliased is not helping.

It would indeed allow &mut store references, and thus &mut self methods, but:

  • This would only add difficulty in implementing inline stores, now requiring both UnsafeCell and UnsafeAliased.
  • It would not help in implementing out-of-line stores, for which the out-of-line memory is only referred to by pointer, never references (if done correctly).

So, UnsafeAliased makes &mut self methods possible, but &mut self methods seem to only add woes.

I'd be very happy to have understood wrong :)

@RalfJung
Copy link
Member

RalfJung commented Oct 20, 2023

I won't even pretend to be an expert in the situation, I'd prefer @CAD97 or @RalfJung to review the situation.

Sorry, what's the question? I don't have the time to read a 1300 line RFC and a 167 comment discussion, but if there's a concrete self-contained question I can try to help with that. :)


The problem is illustrated in [3446-store/aliasing-and-mutability.rs] on which Stacked Borrows chokes. While Tree
Borrows _does_ accept the program as valid, it is not clear whether LLVM `noalias` would be compatible in such a case
or not, now and in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all we know, if the code is accepted by Tree Borrows, then it is compatible with LLVM noalias. Everything else would be a serious bug in Tree Borrows.


This RFC introduces 5 new traits.

The core of this RFC is the `Store` trait, a super-trait of the `StoreDangling` trait:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says Store is a super-trait of StoreDangling, but the declaration below does the opposite: StoreDangling is a super-trait of Store.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I'll take note and fix it later.

Ideally there would be no StoreDangling at all, but we'll need const methods in traits for that, as dangling really needs to be const if we want Vec::new to be const too for arbitrary traits (and I'd argue we do).

@matthieu-m
Copy link
Author

I won't even pretend to be an expert in the situation, I'd prefer @CAD97 or @RalfJung to review the situation.

Sorry, what's the question? I don't have the time to read a 1300 line RFC and a 167 comment discussion, but if there's a concrete self-contained question I can try to help with that. :)

Thankfully, the question is much more narrow in scope :)

In the comment you were replying to, there's a code snippet:

struct MiniStore<T> {
    //  Invariant: the memory cell at index `i` is allocated if `mask & (1 << i)` is set.
    mask: u8,
    memory: [MaybeUninit<T>; 8],
};

impl<T> Store for MiniStore<T> {
    //  **BOOM**?
    fn resolve(&self, handle: Self::Handle) -> *const u8 { ... }

    //  **BOOM**
    fn resolve_mut(&mut self, handle: Self::Handle) -> *mut u8 { ... }
}

The intent of the code is that MiniStore will carve the single region of its memory field into distinct sub-regions -- as allocators are wont to do.

The allocator/store implementation will take care not to form references to sub-regions, but it cannot help but form references to the memory field: self is accessed by references, and self.memory is (implicitly) a reference.

Yet those sub-regions should be considered "independent" of each others, and of the memory field, for the purpose of alias analysis, and the question is therefore what are the requirements on how memory should be wrapped so that:

  1. It is possible to have a &mut/noalias pointer to a sub-region and a &self to MiniStore: is wrapping memory in UnsafeCell sufficient?
  2. It is possible to have a &mut/noalias pointer to a sub-region and a &mut self to MiniStore: is wrapping memory in your proposed UnsafeAliased sufficient?
  3. If the memory field must be wrapped in both UnsafeCell and UnsafeAliased, is there a restriction on which wrapper should be the outer one and which should be the inner one?

Since the discussion is now veering into &self vs &mut self I'd appreciate if you could answer the above questions so all participants (me first) have a clear view of the guidelines as we discuss the trade-offs.

@RalfJung
Copy link
Member

RalfJung commented Oct 21, 2023

It is possible to have a &mut/noalias pointer to a sub-region and a &self to MiniStore: is wrapping memory in UnsafeCell sufficient?

Yes. That's like the &mut T that you can get out of a RefCell<T>, which aliases with the &RefCell<T>.

It is possible to have a &mut/noalias pointer to a sub-region and a &mut self to MiniStore: is wrapping memory in your proposed UnsafeAliased sufficient?

Just looking at noalias, yes UnsafeAliased will remove the noalias from &mut MiniStore so there's no direct UB. But the actual constraints for soundness are a lot more subtle than that:

Mutable references are still generally expected to be unique even with UnsafeAliased, that's required for soundness of mem::swap. So two &mut UnsafeAliased<T> (or wrappers around them) that are handed to safe code must never alias. Now in your case I assume you are guarding (likely via unsafe preconditions) against using resolve_mut on the same index twice, so the two mutable references we are considering (&mut MiniStore<T> and &mut T) have different type so it's less obvious that things would be unsound... but I am far from confident that there isn't some other form of unsoundness. I don't think Rust can support two &mut aliasing each other. UnsafeAliased is only intended to enable the usecase of an &mut aliasing a *mut.

In particular, the model I've been using in RustBelt to prove soundness of the type system would make the following function sound, and it is intended to remain sound with UnsafeAliased:

fn assume_disjoint<T, U>(x: &mut T, y: &mut U) {
  let xsize = mem::size_of_val(x);
  let ysize = mem::size_of_val(y);
  if size_t == 0 || size_u == 0 { return; }
  let xaddr = x as *mut T as usize;
  let yaddr = y as *mut T as usize;
  // We could assume full disjointness, but checking that isn't actually easy to write, so we just
  // check if y starts within x. Good enough to make the point.
  if (xaddr .. xaddr+xsize).contains(&yaddr) { unsafe { hint::unreachable_unchecked(); } }
}

With your API I could call that function in a way that triggers UB. Now, your API is largely unsafe (if I understood correctly), so this does not immediately imply unsoundness, but it means we have to tread very carefully.

* Changes:

- Move `resolve_mut` higher, and link to Ralf Jung's comment on the
  soundness challenges it presents.
- Reword allocation section.
- Fix mention of super-trait.

* Motivation:

It is now clear that `resolve_mut` is a non-starter, and thus that
interior mutability will be necessary to avoid incorret `noalias`
annotations.

In turn, it makes it pointless to try and use `&mut self` for
allocation et al.
@matthieu-m
Copy link
Author

@CAD97, @thomcc Given the above comment from Ralf, I am tempted to consider than any allocator/store API should mostly stick to &self.

Due to aliasing constraints, all but single-slot allocators/stores -- limited to single-element or single-array based collections -- will be incompatible with having a &mut reference to their backing memory, and will only allow & reference if the backing memory is wrapped into an UnsafeCell.

A counter-argument, for Store, would be to distinguish between single-slot and multi-slot stores, as some previous incarnations did. The benefit would be to avoid having an UnsafeCell in Box and Vec (essentially). It may be worth it, given how prevalent those APIs are, if there are indeed performance benefits to be had, though I am not familiar enough with the code-generation aspect to judge whether performance is affected.

If it is desired to special-case such collections, then I would propose changing the traits around. The current Store would remain as-is, but embed the guarantees of StoreMultiple, and a StoreSingle trait would be introduced:

trait StoreSingle: StoreDangling {
    fn resolve(&self, handle: Self::Handle) -> *const u8;
    fn resolve_mut(&mut self, handle: Self::Handle) -> *mut u8;

    fn allocate(&mut self, layout: Layout) -> Result<(Self::Handle, usize), AllocError>;
    //  All `&mut self` from there.
}

And there would be a default implementation of StoreSingle for any type implementing Store.

Additionally, the marker store traits StoreStable and StorePinning would remove the Store super-trait requirement so as to be usable with StoreSingle as well.

There would be no increase in terms of number of traits (Store & StoreMultiple become Store & StoreSingle), and the number of methods to implement would remain similar (thanks to the default implementation).

Obviously, this would not solve the issue that in-line stores in BTreeXxx collections (and other node-based collections) would still require UnsafeCell, but perhaps the aliasing guarantees are less important then anyway with nodes referencing each others already?

@RalfJung
Copy link
Member

I should slightly revise what I said: really UnsafeAliased can only support aliasing of Pin<&mut T> and other pointers. The pinning part seems quite crucial. After all, if you are aliased, then that means there's other incoming pointers and just moving this value would invalidate them, so you better be pinned.

So there's potentially a non-interior-mutable API that works, but it'd probably involve pinning. Which makes sense I think; if your storage type stores data in-line then there is actual aliasing that needs to be handled, but if it stores data behind a pointer indirection then things become a lot less critical.

Whether it's worth using Pin to avoid interior mutability... I'll leave that decision to others.

A counter-argument, for Store, would be to distinguish between single-slot and multi-slot stores, as some previous incarnations did. The benefit would be to avoid having an UnsafeCell in Box and Vec (essentially). It may be worth it, given how prevalent those APIs are, if there are indeed performance benefits to be had, though I am not familiar enough with the code-generation aspect to judge whether performance is affected.

Right, that seems to be the main point... specifically this is about whether the SmallVec defined this way can get noalias on its shared references, right? The normal Vec would be fine since it'd use a ZST storage type that uses the system allocator and there's no UnsafeCell anywhere?

@matthieu-m
Copy link
Author

Right, that seems to be the main point... specifically this is about whether the SmallVec defined this way can get noalias on its shared references, right? The normal Vec would be fine since it'd use a ZST storage type that uses the system allocator and there's no UnsafeCell anywhere?

Indeed, the issue is for {Inline|Small}{Box|Vec|BTreeXxx|...} collections in which there's necessarily aliasing of the backing memory (when in-line) and the references handed out.

The "multi allocations" collections (BTree{Map|Set}, LinkedList, ...} will necessarily require UnsafeCell: iterating over mutable references to elements requires that having outstanding &mut to elements and a reference to the store (for resolving further elements) be sound.

The "single allocation" collections (Box, Vec, VecDeque, Hash{Map|Set}, ...) could benefit from a special StoreSingle trait with mutable receivers, to avoid UnsafeCell in their in-line stores.

Hopefully I'll get some time next week-end to switch from Store+StoreMultiple to Store+StoreSingle in my experimental repository, and revise the RFC accordingly.

Thank you very much for your clarifications.

@mangelats
Copy link

Hey @matthieu-m!

Thanks for the lightning-fast reply. I took some time to investigate most of the things further, so sorry for the delay.

Error handling & Resizing

Fair enough. I just wanted to give alternatives to things that were talked about. Maybe I'll see if other people proposed similar things and help there :)

Lifetimes of pointed memory

I think I didn't explain myself well enough for this section.

Let's imagine that you retrieve a pointer using a handle from a store. This pointer may be invalidated by any reason outlined in this RFC. We could say that it has a lifetime 'a which starts when retrieved and ends when the pointer is invalidated. If we could presisely define this lifetime, we could use references instead.

IMHO this is not be possible: some stores may share the handle and the same pointer may be retrieved in one thread and invalidated by another. Because of that, it's impossible to know the exact lifetime since even IO could change it. Thus, I believe the guarantees should be the responsibility of the abstractions on top of the store.

But now let's define a lifetime 'b which outlives any possible 'a. One such example is 'static, since 'a will for sure not be valid after the program stops running. But that's not the only option because in some cases we can be more precise. For example you may have an array on the stack as the backing memory for your store. In that case you could make 'b the lifetime of the array 1. In the case of an arena allocator that you proposed, 'b would probably be 'static because it's backing memory probably exists for the entirity of the running program.

We can build a type AccessGuard<'b> with the following properties:

  • Any interaction within 'b's lifetime is unsafe and not guarateed to be valid (like a pointer).
  • Any interaction after 'b is invalid (like a reference).
         'a ─────────┐
         'b ───────────────────┐
    'static ─────────────────────────────┐
            ┌────────┬─────────┬─────────┐
    Pointer │ unsafe   unsafe    unsafe  │
            └────────────────────────────┘
            ┌────────┬─────────┬─────────┐
  Reference │ valid  │ invalid   invalid │
            └────────┴───────────────────┘
            ┌────────┬─────────┬─────────┐
AccessGuard │ unsafe   unsafe  │ invalid │
            └──────────────────┴─────────┘

Note that 'static outlives both 'a and 'b, and 'b outlives 'a. Also, AccessGuard<'static> behaves like a pointer 2. This means that this API only extends the current one while allowing some stores to add precision that may raise errors at compile time.

Mutability, sharing, and multithreading

There are 3 orthogonal problems we are trying to tackle:

  1. Using &self vs &mut self
  2. UB due to aliasing rules
  3. Enable some optimizations with noalias

&self vs &mut self

&self allows to have multiple references to the same object. Adding internal mutability, it allows to modify from multiple references. The alternative I proposed is &mut self + Clone (when supported). This can also be used on top of a store that uses non-mutable references by wraping a reference. Here is a simplified example:

trait Store {
  type Handle;
  unsafe fn retrieve(&mut self, handle: Self::Handle) -> *mut u8;
}

struct BackOfStore {
  // [...]
}
impl BackOfStore {
  pub unsafe fn retrieve(&self, index: usize) -> *mut u8 {
    todo!()
  }
}

#[derive(Clone)]
#[repr(transparent)]
struct Storefront<'a>(&'a BackOfStore);
impl<'a> Store for Storefront<'a> {
  type Handle = usize;
  unsafe fn retrieve(&mut self, handle: Self::Handle) -> *mut u8 {
    self.0.retrieve(handle)
  }
}

You can clone the storefront as much as you want (no back of store is cloned) to aquire the access needed. It also allows for stores that cannot handle being used from multiple references. While the current version also works, it forces interior mutability, even when it shouldn't be necessary because it only works in single-threaded environments.

Aliasing

I'm still not 100% sure I understand the rules as writen, so this may be wrong.

Aliasing rules will need to be followed when dereferencing the resulting pointer from resolve/resolve_mut. This should be left for the abstractions on top to deal with, as it could be the case that it never gets multiple pointers simultaneosly.

It took me a while to understand the problem you talked about. Simply, a store references should not alias the allocated memory is. The clearest example –like you pointed out– of this is an inline store:

struct InlineStore {
  meta: Metadata,
  mem: [_; 8],
}

Using any InlineStore methods means that you are also aliasing mem when you only really only want meta and a pointer to mem. I made a partial example of an inline store (playground link because it's too much code for a comment). It also constains a multi-threaded one which is made thread safe with an Arc<Mutex<S>>. The main point is that you can wrap mem with UnsafeCell (or UnsafeAliased :) and then just work with raw pointers 3.

struct InlineStore<T> {
  meta: Metadata,
  mem: UnsafeCell<[MaybeUninit<T>; 8]>,
}
impl InlineStore<T> {
  pub fn retrieve(&mut self, index: usize) -> *mut u8 {
    let raw_mem = self.mem.get() as *mut T;
    unsafe { raw_mem.add(index) as *mut u8 }
  }
}

After doing this example I'd say that aliasing should be for the implementation to ensure because it highly depends on how the memory is layed out. I'd add a note in the documentation warning about it but nothing else. Also, adding an outer UnsafeCell (or related types) does nothing, since when accessing it, you also access mem; so even when adding a mutex to it, you need another wrapper to break this aliasing.

PD: I have yet to form an opinion about Pin and have to do more reserch on linked containers (multi allocations) interactions with this API.

Footnotes

  1. Since when the array gets deallocated, any pointer will for sure be invalid. Note that they may be invalid before that point.

  2. Technically, the lifetime of the pointer is unbound; but because it won't outlive the running program, treating it like 'static is not wrong.

  3. UnsafeCell has get and UnsafeAliased has get_mut which return a raw mutable pointer to the underlying memory, breaking aliasing.

@matthieu-m
Copy link
Author

Another round of well-articulated feedback, I feel spoiled :)

Lifetimes of pointed memory

I can see the advantage of having a maximum bound, indeed. For example, for a stack-pinned store, it's clear that all pointers will be invalidated when the stack frame containing the store is unwound.

What is less clear is whether this additional safeguard can be built on top of Store, or not.

For example, in the future possibilities, you can see that it's possible to build TypedHandle<T> which allows tracking the type of the handle at compile-time. It's still unsafe to use (lifetimes could go wrong) but it solves one aspect at least.

Ideally, I would prefer if the Store API remained as low-level as possible, and for abstractions to be built on top. The first few iterations tried to impart more static guarantees, and were ultimately judged too unwieldy and limiting, so I'd really like to avoid repeating the same mistake.

Would it be possible to build AccessGuard on top? It seems that at no point in time does it actually borrow the store itself, and that instead its lifetime should be linked to the lifetime bound of the memory within -- like a stack-allocated pool, for example.

&self vs &mut self

I'm afraid I'm still not quite understanding why you insist on separating front and back, and why mutability of the front matters so much to you.

In the end, the core data-structure is always the "back", the actual memory pool with both memory and associated metadata for tracking what is available (or not), and this memory pool will be aliased -- and thus require UnsafeCell -- in all but the most trivial cases (of single-element single-use pools). In this RFC, this memory pool is Store.

I don't understand what the "store front" you propose is supposed to be. It seems that it's either also the back (in-line stores) or just a reference to the back (your example) and never has any (mutable) state of its own.

What role is this "front" supposed to play that a &'a S (S: Store) or &'a dyn Store cannot play? Why is it necessary?

Aliasing

The main point is that you can wrap mem with UnsafeCell (or UnsafeAliased) and then just work with raw pointers

You're close, though a bit off yet as far as I understand from my (very recent) conversation with Ralf Jung on this very thread:

  • &UnsafeCell<Pool> is sound even in the presence of an outstanding &mut T pointing within Pool.
  • &mut UnsafeCell<Pool> is NOT sound in the presence of an outstanding &mut T pointing within Pool.
  • UnsafeAliased was a red herring, it won't help.

Therefore, you cannot, soundly, form a mutable reference to Pool in the presence of an outstanding &mut T, nor to any value containing Pool as a field (even indirectly).

And thus the code you linked is unsound despite the use of UnsafeCell, because it forms a &mut UnsafeCell to MiniBackStore and thus to MiniBackStore::mem (prior to taking a pointer).

(And I'll freely admit I had not quite anticipated that when I launched myself in this quest, I knew aliasing mattered, but had not realized the depth of it... I hope that Ralf comments above and the UnsafeAliased RFC can help you better grasp the issue like they helped me)

Split resolve and resolve_mut

Another issue with resolve_mut -- outside of single-allocation storages -- is that a Store should be able to be shared between various collections (think &'a dyn Store<Handle = X>) and still allow resolving *mut T.

Hence, for most stores and usecases, a fn resolve(&self, handle: Self::Handle) -> *mut u8 is required, which by itself requires interior mutability (of the memory pool) since we want from non-mut to mut.


As noted in my prior comment, I'll try to investigate switching from Store (potentially single allocation) + StoreMultiple: Store to Store (multiple allocations) + StoreSingle (parallel, none being a super-trait of the other) with an eye to having no interior mutability in single-shot stores as used by Box and Vec since those are extremely common collections...

... and at this stage I'm afraid it's about the best we can do.

Whenever a store must make multiple allocations, there must be an UnsafeCell around the memory pool it uses, and it must be possible to go from &UnsafeCell<...> to *mut u8, so that quite limits our options.

@mangelats
Copy link

I'm going to make simpler and more focused comments (probably a single topic) so it doesn't take so long to make them.

Lifetimes of pointed memory

The lifetime depends on the store being used. That means that if we want an abstraction to know this lifetime we need this API to share it publicly. We would like to do something like:

trait Store {
  lifetime 'limit;
}
impl Store for Example {
  lifetime 'limit = 'static;
}

So then we could use AccessGuard<<Example as Store>::'limit> on an abstraction on top. This is obviously not valid Rust. We need to get creative with metaprogramming to achieve a similar result. The example using types to encode lifetimes is quite long so check it out the example on the playground (works in stable).

Is this what you had in mind?

@matthieu-m
Copy link
Author

matthieu-m commented Oct 27, 2023

Is this what you had in mind?

I didn't have much in mind, to be honest.

I would, however, develop it much more in-line with the TypedHandle, either as a wrapper type around the API or with an extension trait.

With that in mind:

pub trait StoreUpperBound<'a>: Store {
    //  # Safety: see `Store::resolve`.
    unsafe fn resolve_bounded(&self, handle: Self::Handle) -> AccessGuard<'a> {
        //  Safety:
        //  - Per pre-conditions.
        AccessGuard::new(unsafe { self.resolve(handle) })
    }
}

impl<'a> StoreUpperBound<'a> ExampleStore<'a> {}

And then anyone wanting to use an AccessGuard can just depend on the new trait.

And if people need to adapt existing implementation, you can simply provide an adapter struct which wraps an existing store and forwards existing traits implementations + implements StoreUpperBound.

I would think that's all that's necessary, no?

@matthieu-m
Copy link
Author

@mangelats I updated the companion repository with a Store + StoreSingle API instead, where StoreSingle can have a "proper" API mutability-wise. I'll need some time to re-work the RFC accordingly though, probably something to be done over the next week-end, so I invite you to check the repo if you want a preview.

* Changes:

- Remove StoreMultiple, blending its guarantees into Store.
- Introduce parallel trait StoreSingle, specifically for single
  allocations.
- Review APIs, examples, guarantee descriptions, and justifications.

* Motivation:

An in-line Store must use UnsafeCell internally, which runs afoul of
LLVM coarse-grained attributes (noalias, readonly, writeonly, etc...).

A specialized StoreSingle can however be used for single-allocations,
powering the most commonly used collections (by a wide margin), as it
side-steps the woes of UnsafeCell.
@matthieu-m
Copy link
Author

@CAD97 I think you will particularly appreciate this new revision, as you were the one most prominently decrying the issues that UnsafeCell leads to with regard to code generation.

Thanks to the discussions with @mangelats (once again, thanks for your feedback!), I got the idea of moving away from a single "do-it-all" Store trait supplemented by a StoreMultiple marker trait towards two distinct traits:

  • Store: same API as before, but with the additional guarantees offered previously by StoreMultiple, that is existing handles are not invalidated when allocating.
  • StoreSingle: mutabily-conscious API, only possible thanks to the restriction that it only allocates one memory block at a time.

While the "duplication" of API bothered me, at first, I think it's fine because:

  1. Users of traits -- casual users, store implementers, collection implementers -- already knew whether a single allocation or multiple allocations were necessary: StoreMultiple was required for the latter. So there's no "additional burden".
  2. The signatures of the methods are nigh identical between the two, with the exceptions being:
    • The fact that StoreSingle has both resolve and resolve_mut.
    • The fact that StoreSingle methods take &mut self instead of & self.
  3. The guarantees of the methods are nigh identical between the two, with the exceptions being:
    • The fact that StoreSingle::resolve pointer is a const, not mut.
    • The fact that any allocation by StoreSingle may invalidate all previous handles (and generally does).

This means there's essentially a single API to learn, it just so happens to be expressed as two different traits.


The one thing that bothers me is that I couldn't manage to provide a default implementation of StoreSingle for all types implementing Store. This would require to use specialization, I expect, but I couldn't it to compile and had to implement them by hand. I would be grateful to anyone who figures it out.

text/3446-store.md Outdated Show resolved Hide resolved
Avoid infinite recursion in `MaxPick::clear`.

Co-authored-by: Jiahao XU <[email protected]>
@FilipAndersson245
Copy link

Any update on this?

@matthieu-m
Copy link
Author

Well, it's embarrassing... but no, not really. And it's mostly because nobody (not even I) has been pushing forward.

Ideally, we would need to move this to an e-RFC. A bunch of the small API questions can be solved a posteriori -- they're unsolved for Allocator already, for the most. And a few of the hairy questions (such as Box integration) would be easier to solve in-tree due to interaction with private language items.

There's one nagging API issue which may be better solved prior to an e-RFC, though, and that's moving away from StoreDangling to a const fn dangling(...) on the Store and StoreSingle traits. Unfortunately, const functions in traits are not a thing, there's a languishing RFC, and no compiler implementation.

Even with const functions in traits, though, the libs team didn't seem super motivated:

  • Allocator has already been languishing for years, I'd expect they're afraid Store would follow.
  • There's not much enthusiasm from the community; in particular, despite the demo crate being available, there's been no feedback about using it at all. It's not clear whether integrating Store in-tree would lead to any more feedback.

If you care to see this move forward, then there's two ways you can help:

  1. Help RFC RFC: const functions in traits #3490 move forward, possibly by driving is compiler implementation.
  2. Try out the Store API and provide feedback, especially if you used this on "exotic" storage (shared memory, compressed memory, GPU memory, ...).

@Amanieu
Copy link
Member

Amanieu commented Jul 2, 2024

I personally am not enthusiastic about the Store API because its main use cases (SmallVec and ArrayVec) would end up worse that the existing custom implementations on crates.io:

  • SmallVec can pack the capacity and length into a single field when using the small layout, which means that its smallest size is 24 bytes (cap + len + ptr for the large layout, len + 16 bytes of data for the small layout). This is not possible with the Store API since both the capacity and length must be kept outside the store.
  • ArrayVec avoids the need to hold the capacity entirely since it is available in the type system. Again, this is not possible to do with a generic Vec that works with any Store.

@matthieu-m
Copy link
Author

I personally am not enthusiastic about the Store API because its main use cases (SmallVec and ArrayVec) would end up worse that the existing custom implementations on crates.io:

Yes, everybody has written those two (and their String variants, incredibly common too). I can't even count how many times I have, and it's actually because I was tired of subtle behavior/feature differences between my variants that I created the first Storage API in C++ (since std::allocator didn't cut it).

VecDeque is only slightly harder to write, and I've written Inline & Small variants of it too. In C++, where it's a real pain.

I haven't seen many versions of BTreeMap or HashMap in Inline & Small variants, though. And I certainly can't remember any with the performance of what we've got in std.

The first advantage of Store is not so much that the super-common Vec can get Inline & Small variant, it's that all collections do. Even the complex ones (BTreeMap, HashMap) or the more esoteric ones (BinaryHeap). Even the 3rd-party ones (IndexedMap, FenwickTree, ...). All.

I personally am not enthusiastic about the Store API because its main use cases (SmallVec and ArrayVec) would end up worse that the existing custom implementations on crates.io:

Specialized collections on crates.io may always have some advantage indeed. They may be leaner (data-wise and code-wise), they may have extra functionality/guarantees, etc...

I wouldn't necessarily say they're better though. They may not be as feature-rich, as optimized, as sound, and regardless they require depending on a 3rd-party which introduces its own risks.

So, yes, if one wishes to save 8 bytes one can use SmallVec and ArrayVec from crates.io.

I'd recommend people don't unless they have a very good reason too: Inline & Small collections -- by virtue of inlining the data -- are fat already. In most cases 8 bytes won't make or break the usecase, and the quality/safety/security assurances you get by using the standard ones are not worth trading for such a small saving.

ArrayVec avoids the need to hold the capacity entirely since it is available in the type system. Again, this is not possible to do with a generic Vec that works with any Store.

Not with the current Vec, but that's just because the current Vec never had to make the provision. Customizable metadata (length & capacity), of which a fixed capacity is one of the easiest implementations, would easily solve this case.

The more pressing issue may be fallibility. An Inline version of Vec will fail to push/extend at some point, which the current signatures don't allow. But once again, fallibility is already an issue today (Linux Kernel) and requires changing the current Vec to accommodate, because it's a short-coming of Vec, not a short-coming of Store or Allocator.

SmallVec can pack the capacity and length into a single field when using the small layout, which means that its smallest size is 24 bytes (cap + len + ptr for the large layout, len + 16 bytes of data for the small layout). This is not possible with the Store API since both the capacity and length must be kept outside the store.

Similarly, I have an InlineString<N> implementation which stores neither length nor capacity, only an array [u8; N], using a NUL byte as delimiter, and otherwise being N bytes long in the absence of such. Heroics could be made to define a CoreString type with abstract metadata to accommodate it. Further heroics would be needed to make CoreString Copy. At least, I haven't figured out how a generic container can be both Copy for some parameters & Drop for others. And overall it may not be worth it. Sometimes a specialized solution is warranted.

Yes, a specialized container can pull heroics. With trade-offs. Whether a user value those heroics or not will depend on their usecases. Most users probably won't care:

  • SmallVec<String, 1> is 32 bytes (optimized) vs 40 bytes (standard), for a +25% overhead.
  • SmallVec<String, 2> is 56 bytes (optimized) vs 64 bytes (standard), for a +15% overhead.
  • SmallVec<String, 4> is 104 bytes (optimized) vs 112 bytes (standard), for a +8% overhead.
  • It's all downhill from here.

If a user is looking for such savings, though, maybe they'll want to use u32 instead of usize for capacity and length even in the general Vec case. It's certainly a trick I've used often. No performance cost and only extremely rare usecases require storing more than 4 billion elements. Or maybe they're fine with only power-of-2 capacities, in which case they only need 6 bits at the top of the usize to store the capacity (on 64-bits platforms), and they still have 58 bits for the actual length, which realistically they'll never exhaust.

Once again, though, are the heroics worth it? For most usecases, no. Rewriting a container from scratch to save up 8 bytes is rarely, if ever, worth it. All the more reasons for a generic Store API.

I personally am not enthusiastic about the Store API because its main use cases (SmallVec and ArrayVec) would end up worse that the existing custom implementations on crates.io:

Are those even the main usecases, though?

I mean, I'm quite a fan of InlineBox<dyn Trait, 256> which gets me no-allocation type-erasure. There's a specialized StackFuture on crates.io, but I don't recall a generic version for any trait, though I may have missed it.

And even if those are the main usecases, I don't see many people attacking the issue of rewriting core data-structures for shared-memory usage (inter-process communication OR persistence across restarts).

I'd love me a HashMap<K, V, S, SharedMemoryStore>.

@Walter-Reactor
Copy link

Walter-Reactor commented Aug 16, 2024

Sorry if this is off topic, but could you elaborate on what problems the current allocator API has with shared memory? We've been using allocator2 with a shared memory allocator for a little while now & haven't hit any issues yet, and this has me concerned we just haven't exercised the API enough.

@matthieu-m
Copy link
Author

Sorry if this is off topic, but could you elaborate on what problems the current allocator API has with shared memory? We've been using allocator2 with a shared memory allocator for a little while now & haven't hit any issues yet, and this has me concerned we just haven't exercised the API enough.

The Allocator API is perfectly suited to allocating shared memory, there is no concern.

The problem of the Allocator API is that it returns a pointer. On modern hardware and OSes, a pointer is an address in virtual memory which is, by default, private to the process, which in turn means said pointer is, by default, meaningless outside the process which allocated it. This means a pointer cannot be shared across processes, by default, which in turns means a pointer should not be written to (or read from) shared memory. Specific precautions can be taken: disabling ASLR may allow sharing pointers to virtual-tables or other static data, shared memory can be mapped at the same address in all processes allowing sharing "inner" pointers, etc... but by default, pointers should not be shared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.