Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for Vec::push_within_capacity #100486

Open
1 of 5 tasks
the8472 opened this issue Aug 13, 2022 · 19 comments
Open
1 of 5 tasks

Tracking Issue for Vec::push_within_capacity #100486

the8472 opened this issue Aug 13, 2022 · 19 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@the8472
Copy link
Member

the8472 commented Aug 13, 2022

Feature gate: #![feature(vec_push_within_capacity)]

This is a tracking issue for Vec::push_within_capacity

It enables pushing into a Vec if there is any capacity left and otherwise returns the value. I.e. it avoids implicit resizing. This can be useful when one wants to explicitly control it (e.g. via try_reserve) or uses a Vec in an unsafe manner where moving the elements would be a problem.

Public API

// alloc::Vec

impl Vec<T, A> {
    pub fn push_within_capacity(&mut self, value: T) -> Result<(), T>;
}

Steps / History

Unresolved Questions

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@the8472 the8472 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Aug 13, 2022
@em-tg
Copy link

em-tg commented Mar 19, 2023

Would it be possible for push_within_capacity to take self by shared reference? If the backing store is never re-allocated then calling this wouldn't invalidate any references to the Vec's contents, I'd think.

@the8472
Copy link
Member Author

the8472 commented Mar 19, 2023

It still needs to bump the length field.

@em-tg
Copy link

em-tg commented Mar 19, 2023

Wouldn't wrapping the length in UnsafeCell work for that? The public api only exposes the len() getter.

@em-tg
Copy link

em-tg commented Mar 20, 2023

Ah wait I think I see the issue. You'd lose Sync by doing that. Sorry, still pretty new to Rust and just thought I'd ask!

@tiby312
Copy link

tiby312 commented Jun 19, 2023

I think this is a great feature. I often find myself just assuming my buffer reserving code is working since it is a hard bug to catch when it's not working given push automatically reserves.

@krtab
Copy link
Contributor

krtab commented Oct 27, 2023

Hi!

I think this could enter its FCP:

@kornelski
Copy link
Contributor

I find it very useful, and have similar helpers in my codebases. It generates much leaner code than the usual push when the optimizer can't be convinced that the capacity is sufficient (e.g. when it's calculated in a non-trivial way, or the vec-appending code isn't all inlined). It has a useful side effect of verifying that a sufficient capacity has been precalculated, and there aren't unexpected reallocations.

However, just push is not enough. I'm missing similar functionality for extend_from_slice_within_capacity.

@elichai
Copy link
Contributor

elichai commented Feb 11, 2024

I hope this isn't completely useful,
But is there some kind of mechanism we can use to allow all these try_ functionality without adding dozens of try_ in all collections, could being generic over the effect of try allow something like this?

@Darksonn
Copy link
Contributor

Darksonn commented May 1, 2024

How about insert_within_capacity? It's very non-trivial to reimplement insert without an allocation codepath.

@the8472
Copy link
Member Author

the8472 commented May 1, 2024

Sounds interesting, though a bit niche (just like this method). If you have a use-case you should file an ACP for it.
Ditto @kornelski

If you several a bunch of methods like this in mind then we could perhaps bundle them to see if there are some overarching things to consider.

@zeroflaw
Copy link

zeroflaw commented May 9, 2024

Now that it's possible to do a try_reserve, I find myself using these extension methods a lot more often. push_within_capacity here gives us an escape hatch to do an unchecked_push.
I included an example of insert_within_capacity and try_insert_within_capacity

// ===== trait Ext =====

pub trait Ext<T> {
    /// Appends an element to the back of a collection without checking
    /// if the collection has enough capacity.
    ///
    /// # Safety
    /// The user must ensure that `self.len() < self.capacity()`.
    ///
    unsafe fn unchecked_push(&mut self, value: T);

    /// Removes the last element from a vector without checking
    /// if the collection is empty.
    ///
    /// # Safety
    /// The user must ensure that `self.len() != 0`.
    ///
    unsafe fn unchecked_pop(&mut self) -> T;

    /// Inserts an element at position index within the vector, 
    /// without checking if the collection has enough capacity, 
    /// or the position index is out of bounds.
    /// all elements after the position index are shifted to the right. 
    /// 
    /// # Safety
    /// The user must ensure that `index <= self.len()` and `self.len() < self.capacity()`
    ///
    unsafe fn unchecked_insert(&mut self, index: usize, value: T);


    /// Inserts an element at position index within the vector, 
    /// if there is sufficient spare capacity, otherwise an error is returned with the element.
    /// 
    /// # Errors
    /// if insufficient capacity, an error is returned with the element
    /// 
    /// # Panics
    /// Panics if index > len.
    /// 
    fn insert_within_capacity(&mut self, index: usize, element: T) -> Result<(), T>;

    /// Inserts an element at position index within the vector, 
    /// if there is sufficient spare capacity, and index is within bounds,
    /// otherwise an error is returned with the element.
    /// 
    /// # Errors
    /// if insufficient capacity, or the index is out of bounds, an error is returned with the element
    /// 
    fn try_insert_within_capacity(&mut self, index: usize, element: T) -> Result<(), T>;
}

// ===== impl Vec =====

impl<T> Ext<T> for Vec<T> {
    unsafe fn unchecked_push(&mut self, value: T) {
        std::hint::assert_unchecked(self.len() < self.capacity());
        self.push_within_capacity(value).unwrap_unchecked();
    }

    unsafe fn unchecked_pop(&mut self) -> T {
        std::hint::assert_unchecked(!self.is_empty());
        self.pop().unwrap_unchecked()
    }
    
    unsafe fn unchecked_insert(&mut self, index: usize, element: T) {
        std::hint::assert_unchecked(index <= self.len());
        std::hint::assert_unchecked(self.len() < self.capacity());
        self.insert(index, element);
    }
    
    fn insert_within_capacity(&mut self, index: usize, element: T) -> Result<(), T> {
        if self.len() == self.capacity() {
            return Err(element);
        }
        unsafe { std::hint::assert_unchecked(self.len() < self.capacity()) };
        self.insert(index, element);
        Ok(())
    }
    
    fn try_insert_within_capacity(&mut self, index: usize, element: T) -> Result<(), T> {
        if self.len() == self.capacity() || index > self.len() {
            return Err(element);
        }
        unsafe { self.unchecked_insert(index, element); }
        Ok(())
    }
}

@kornelski
Copy link
Contributor

There are two ways to implement such functionality, and I'm not sure which way to propose:

  1. Add more methods on Vec with a _within_capacity version. I think this would work well, but I've got a lot of pushback about adding try_ versions of methods. Rust doesn't have a feature like Swift's function naming/overloading to keep all of these functionality tweaks elegantly together.

  2. A reference type like FixedCapacityVecView<'_> that imitates interface of Vec, but without reallocation. This seems like a big addition, especially if it will duplicate most of Vec's interface, but most users will probably only need push and extend.

@GoldsteinE
Copy link
Contributor

This FixedCapacityVecView<'_> is kinda similar to std::io::BorrowedBuf<'_> in both its API and idea.

@kornelski
Copy link
Contributor

Although it's somewhat similar, BorrowedBuf is not a replacement:

  • it only supports u8, no other types.
  • it tracks initialized and filled levels separately, which is unnecessary for Vec. Just an extra field may seem like a small issue, but Vec had issues with missed optimizations and bloated code when LLVM lost track of len <= capacity relationship, and I'm afraid that tracking relationships betwee len, capacity, spare_capacity_slice.len(), filled, and initialized all at once may be hard for LLVM in hot loops. I use push in capacity mainly to reduce branches and code size.
  • it doesn't seem to have a way to integrate with a Vec in safe code. User needs to use vec.set_len() to update it with the filled len. This actually seems like a flaw in BorrowedVec to me.

@the8472
Copy link
Member Author

the8472 commented Jun 5, 2024

I assume that was only intended to point at some similar-not-identical prior art, not suggesting that BorrowedBuf can be reused for the purpose.

Add more methods on Vec with a within_capacity version. I think this would work well, but I've got a lot of pushback about adding try versions of methods.

Yeah, imo this is the biggest obstacle to getting anything done. It's either a lot new awkwardly named methods, view types or some generics/associated type shenanigans.
And some of the APIs require new design because they're not just 1:1 changes. Note how push_within_capacity has a return type, unlike push.

Thinking about design I now realize that there's an alternative, it could take a &mut MaybeUninit<T> and not move the value if the capacity is full. Would that bring any perf benefits, maybe for large values?
And that's just an alternative for a single method. This kind of exploration is why it's often suggested to test these thing as a crate outside of std first.

@mcroomp
Copy link

mcroomp commented Dec 19, 2024

How about something like this:

pub fn needs_to_grow<T>(&self, additional: usize) -> bool {
    additional > self.capacity().wrapping_sub(self.len())
}

it makes it easy to solve both problems... the trick is that the implementation must match exactly to what RawVec does, otherwise the optimizer doesn't realize that the operations are identical.

@kornelski
Copy link
Contributor

I don't think it's important to return elements that couldn't fit in the capacity. This isn't like I/O functions or concurrent queues where you can't predict whether insert will succeed or not. When you want to keep the elements that can't be inserted, it's sufficient to check the capacity before push/extend.

@mcroomp
Copy link

mcroomp commented Dec 20, 2024

I don't think it's important to return elements that couldn't fit in the capacity. This isn't like I/O functions or concurrent queues where you can't predict whether insert will succeed or not. When you want to keep the elements that can't be inserted, it's sufficient to check the capacity before push/extend.

The main problem is the that obvious x.len() + additional <= x.capacity() will not convince the optimizer not to check twice, you have to do additional > self.capacity().wrapping_sub(self.len()) to match the check that RawVec makes, which is why it would probably be better to have a utility method in Vec to avoid taking a dependency on the internal implementation.

@hanna-kruppe
Copy link
Contributor

Returning the value that couldn’t be pushed/inserted/etc. only seems to matter if:

  1. The element type isn’t Copy, and
  2. The code using *_within_capacity can do something useful with the excess value after failing to put it in the Vec, and
  3. It’s important that the capacity is only checked once for runtime efficiency or binary size.

I struggle to imagine a use case with these properties that isn’t isomorphic to “generic try_{push,insert,…} that handles OOM by returning Err”. The other uses I know of only involve Copy types (e.g., T = u8 for byte centric I/O without std::io), or have sufficient capacity as logical invariant (i.e., if the within_capacity call fails, that’s not meaningfully recoverable).

I guess if you use a Vec as a fixed size buffer of non-Copy elements and want to flush the buffer to another stage of processing whenever it gets full? This needs push_within_capacity (or eat the potential cost of a manual capacity check) but not any other methods (that imply a capacity check). A moving extend_from_slice analogue is sometimes useful, but if you want to do better than pushing one element at a time then the technical challenge isn’t “de-duplicate a capacity check” but rather arranging a right-sized memcpy from the source into your buffer (repeatedly until the source is exhausted). This can be done today with spare_capacity_mut without having to assume anything undocumented about Vec.

If any use case for more than push_within_capacity amounts to tackling the “zoo of Vec::try_*” challenge, then I’d rather see just push_within_capacity stabilized on its own merits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests