Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FlattenOk::{fold, rfold} #927

Merged

Conversation

kinto-b
Copy link
Contributor

@kinto-b kinto-b commented May 1, 2024

Relates to #755

$ cargo bench --bench specializations flatten_ok/fold -- --baseline flatten_ok

flatten_ok/fold         time:   [3.0078 µs 3.2944 µs 3.5502 µs]
                        change: [-47.164% -42.833% -38.023%] (p = 0.00 < 0.05)
                        Performance has improved.

Edit: .rfold is an almost identical implementation, so I'll put it through in a moment. Just running benchmarks...

Edit:

$ cargo bench --bench specializations flatten_ok/rfold -- --baseline flatten_ok

flatten_ok/rfold        time:   [2.7442 µs 2.8254 µs 2.9134 µs]
                        change: [-42.924% -39.835% -36.485%] (p = 0.00 < 0.05)
                        Performance has improved.

Copy link

codecov bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 97.22222% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 94.55%. Comparing base (6814180) to head (568021d).
Report is 64 commits behind head on master.

Files Patch % Lines
src/flatten_ok.rs 97.22% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #927      +/-   ##
==========================================
+ Coverage   94.38%   94.55%   +0.16%     
==========================================
  Files          48       48              
  Lines        6665     6926     +261     
==========================================
+ Hits         6291     6549     +258     
- Misses        374      377       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@Philippe-Cholet Philippe-Cholet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can it pass tests without using the fields inner_front and inner_back?! That's troubling!

EDIT: Note that you can use an unique baseline for running benchmarks.
I agree we should make the tests fail before fixing [r]fold as this indicates the tests are a bit buggy.

@Philippe-Cholet Philippe-Cholet added this to the next milestone May 1, 2024
@kinto-b
Copy link
Contributor Author

kinto-b commented May 1, 2024

@Philippe-Cholet I was just looking back over the code wondering that myself! Investigating now...

Edit: Have to step out for dinner. Will look into this some more later on. I feel as though it shouldn't work. The necessary fix seems pretty simple, but I want to make the tests fail with the current code before I change anything

@Philippe-Cholet Philippe-Cholet changed the title Implement FlattenOk::fold Implement FlattenOk::{fold, rfold} May 1, 2024
@kinto-b
Copy link
Contributor Author

kinto-b commented May 1, 2024

Ah so the tests fail to capture what's wrong with this code because they use Option<u8> values for each Ok item in the iterator. We need non-singleton values.

There is a comment there about it being too slow to use Vec<u8>:

// `Option<u8>` because `Vec<u8>` would be very slow!! And we can't give `[u8; 3]`.
fn flatten_ok(v: Vec<Result<Option<u8>, char>>) -> () {

I've tested changing over to Vec<u8> and, though it catches the problem, it takes several minutes for the test to finish once the bug is ironed out.

What's the best approach here do you think?

@kinto-b kinto-b force-pushed the feature/FlattenOk-fold branch from e9707bc to fc5aab4 Compare May 1, 2024 11:36
@Philippe-Cholet
Copy link
Member

Ah I think I wrote this comment. Use Vec<u8> here is really slow, it's not really an option. So if we can't give [u8; 3] instead of Option<u8> then maybe we can write our own limited vector (what a pain) and implement the necessary traits for this to work. I'm gonna investigate this.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented May 1, 2024

So we basically need a light small vector, here is one with max 2 elements. The flatten_ok test pass under 1 second for me.

use quickcheck::Arbitrary;
use rand::Rng;
...
    // remove comment
    fn flatten_ok(v: Vec<Result<SmallIter2<u8>, char>>) -> () {
...

/// Like `VecIntoIter<T>` with maximum 2 elements.
#[derive(Debug, Clone, Default)]
enum SmallIter2<T> {
    #[default]
    Zero,
    One(T),
    Two(T, T),
}

impl<T: Arbitrary> Arbitrary for SmallIter2<T> {
    fn arbitrary<G: quickcheck::Gen>(g: &mut G) -> Self {
        match g.gen_range(0u8, 3) {
            0 => Self::Zero,
            1 => Self::One(T::arbitrary(g)),
            2 => Self::Two(T::arbitrary(g), T::arbitrary(g)),
            _ => unreachable!(),
        }
    }
    // maybe implement shrink too, maybe not
}

impl<T> Iterator for SmallIter2<T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        match std::mem::take(self) {
            Self::Zero => None,
            Self::One(val) => Some(val),
            Self::Two(val, second) => {
                *self = Self::One(second);
                Some(val)
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let len = match self {
            Self::Zero => 0,
            Self::One(_) => 1,
            Self::Two(_, _) => 2,
        };
        (len, Some(len))
    }
}

impl<T> DoubleEndedIterator for SmallIter2<T> {
    fn next_back(&mut self) -> Option<Self::Item> {
        match std::mem::take(self) {
            Self::Zero => None,
            Self::One(val) => Some(val),
            Self::Two(first, val) => {
                *self = Self::One(first);
                Some(val)
            }
        }
    }
}

@scottmcm
Copy link
Contributor

scottmcm commented May 1, 2024

nit: that looks wrong since it never shrinks from One to Zero. EDIT: No, I'm blind and missed the take, nevermind.

(ArrayVec<T, 2> is the classic solution for an iterator like this, but I would also understand not wanting to add a dev-dep just for this.)

@Philippe-Cholet
Copy link
Member

I thought of ArrayVec<T, N> but I would have to wrap it anyway to implement quickcheck::Arbitrary (and therefore Iterators too). I would have added it if we wanted more control than a fixed max-length of 2.

@kinto-b
Copy link
Contributor Author

kinto-b commented May 3, 2024

@Philippe-Cholet Ah neat, thanks for that! Shall I pop that into its own module within tests/ or just stick it in tests/specializations.rs?

@Philippe-Cholet
Copy link
Member

@kinto-b Just add this in "tests/specializations.rs" where it's needed. We can move it later if we need to.

Sidenote: For CI to pass again (because of recent Clippy 1.78), you will need to rebase your branch on master.

kinto-b added 3 commits May 5, 2024 00:25
Previously the test used Option<u8> but the coverage was bad. We cannot use Vec<u8> because it is too slow.
@kinto-b kinto-b force-pushed the feature/FlattenOk-fold branch from fc5aab4 to 568021d Compare May 4, 2024 14:26
Copy link
Member

@Philippe-Cholet Philippe-Cholet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
The specialization test did slow us down. An Option seemed good enough when I wrote it.
I guess the benchmark is up-to-date. -40% is nice enough.

Thanks again!

@Philippe-Cholet Philippe-Cholet added this pull request to the merge queue May 4, 2024
Merged via the queue into rust-itertools:master with commit e53f635 May 4, 2024
13 checks passed
@kinto-b
Copy link
Contributor Author

kinto-b commented May 4, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants