Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FromIter for Ranges #278

Merged
merged 5 commits into from
Nov 14, 2024
Merged

Add FromIter for Ranges #278

merged 5 commits into from
Nov 14, 2024

Conversation

konstin
Copy link
Member

@konstin konstin commented Nov 8, 2024

Add a method to construct ranges from an iterator of arbitrary segments. This allows to .collect() an iterator of tuples of bounds. This is more ergonomic than folding the previous ranges with the next segment each time, and also faster.

Split out from #273
Closes astral-sh#33
Fixes #249

#[test]
fn from_iter_valid(segments in proptest::collection::vec(any::<(Bound<u32>, Bound<u32>)>(), ..30)) {
// We check invariants in the method.
Ranges::from_iter(segments.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could strengthen this test by checking that the result is equivalent to iteratively calling union.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caught a significant oversight!

// following: |------|
//
// final: |------| |------| |------|
segments.insert(insertion_point, segment);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doing insertion sort. Witch is O(n^2), because this insert (or the remove above) is O(n). Shoving all the segments into a heap should get us O(n*log(n)) time, but take O(n) space. I don't know if it actually matters in practice.

If the point is to be practical I'm happy to merge this with a comment about alternative implementations. If the point of this is to implement the algorithmically best solution, we can do better than this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution would be collecting the segments into a sorted datastructure, then building a second ranges without any shifting around: We pay the price of two datastructure for no moving around. This solution works with only a single ranges, inserting or removing elements as we detect overlaps: If the elements are already in order, it's only pushes to a single smallvec (I chose not to reserve, since we could vastly over-reserve). As usual, I'd go with this implementation and change it if benchmarking with users shows that another algorithm would be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment on these two lines explaining that the call is O(n) which makes the algorithm O(n^2) but it should be good enough for now. That way somebody looking at a flame graph that's hot here has some clue that there is a fix we've already thought of.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment here and a more general explanation of the tradeoff above.

Copy link
Member

@Eh2406 Eh2406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits about comments.

// We have three constraints we need to fulfil:
// 1. The segments are sorted, from lowest to highest (through `Ord`): By sorting.
// 2. Each segment contains at least one version (start < end): By `union`.
// 3. There is at least one version between two segments: By `union`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"By union" does not match the implementation. While I'm making nits about comments, let's make this clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch, i forgot to update the comment after changing the implementation.

// following: |------|
//
// final: |------| |------| |------|
segments.insert(insertion_point, segment);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment on these two lines explaining that the call is O(n) which makes the algorithm O(n^2) but it should be good enough for now. That way somebody looking at a flame graph that's hot here has some clue that there is a fix we've already thought of.

Add a method to construct ranges from an iterator of arbitrary segments. This allows to `.collect()` an iterator of tuples of bounds. This is more ergonomic than folding the previous ranges with the next segment each time, and also faster.

Split out from #273
Closes astral-sh#33
Fixes #249
@konstin konstin force-pushed the konsti/dev/from-iter branch from cd57a69 to fe37b91 Compare November 14, 2024 13:18
@konstin konstin added this pull request to the merge queue Nov 14, 2024
Merged via the queue into dev with commit 33f68e4 Nov 14, 2024
6 checks passed
@konstin konstin deleted the konsti/dev/from-iter branch November 14, 2024 13:21
@konstin konstin added the Ranges label Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Range::union_many or similar
2 participants