Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental u64 version bitset #284

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

x-hgg-x
Copy link
Contributor

@x-hgg-x x-hgg-x commented Nov 20, 2024

Part of #274.

This PR adds a experimental module with the new code to facilitate reviewing.

Copy link

codspeed-hq bot commented Nov 20, 2024

CodSpeed Performance Report

Merging #284 will not alter performance

Comparing x-hgg-x:u64-bitset (a9f86b9) with dev (3741e3b)

Summary

✅ 6 untouched benchmarks

@x-hgg-x x-hgg-x force-pushed the u64-bitset branch 2 times, most recently from 37f165a to 2066fa6 Compare November 21, 2024 10:14
@x-hgg-x x-hgg-x changed the title Use u64 for version set type Add experimental u64 version bitset Nov 21, 2024
@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 21, 2024

I kept only the new code and put it in a new module to facilitate reviewing.

@Eh2406
Copy link
Member

Eh2406 commented Nov 21, 2024

Thank you for doing that! With this revision, I'm looking forward to reviewing it :-P.

It seems very useful for this library to provide example implementations of the traits it uses. Like OfflineDependencyProvider, "here is a way to implement it that might be really useful if you care about X, Y, and Z but is not the best choice if you care about A, B, and C". So if (and I'm not sure, because I haven't yet looked at the code) we have a "here is a VersionSet implementation that is very fast if most of your packages have fewer than 63 versions, but may be slower or have confusing error messages for packages that have more" seems worth adding and easy enough to accept.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 21, 2024

"here is a VersionSet implementation that is very fast if most of your packages have fewer than 63 versions, but may be slower or have confusing error messages for packages that have more"

With the latest code for #274, I removed the limit of 63 versions in the OfflineDependencyProvider, and the u64 bitset is always faster regardless of the number of versions. We also can merge or collapse derivation trees containing virtual packages in the report, so that the error messages stay comprehensible.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 21, 2024

In #274 I didn't kept the VersionSet trait since the next non-breaking commits reduces the resolve time by a further 50%, but they depend of the reduced size of VersionSet/Term/VersionIndex, so they are not applicable if the types stay generic.

A solution could be to copy all changed code in the experimental module and keep temporarily the two implementations in parallel to compare them.

@mpizenberg
Copy link
Member

A solution could be to copy all changed code in the experimental module and keep temporarily the two implementations in parallel to compare them.

Sounds like a good compromise to start with.

@konstin
Copy link
Member

konstin commented Nov 26, 2024

In this PR, the speedup is in a separate module, how will that interact with #274? Will the other performance improvements only apply to this new API or also improve existing usage?

Can you describe what assumptions the new representation is making? E.g., do we need to know all versions upfront? When selecting versions, do we need to go through all versions or can we skip some (e.g. prereleases)? How do i convert a complex version type to the bitset? What's the maximum number of versions?

Before we merge, we also need to explain the representation of the bitset in a docstring.

@x-hgg-x x-hgg-x force-pushed the u64-bitset branch 2 times, most recently from 7638eca to cefdf76 Compare November 26, 2024 17:32
@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 27, 2024

In this PR, the speedup is in a separate module, how will that interact with #274? Will the other performance improvements only apply to this new API or also improve existing usage?

I put the changes in a separate module so that I can split the changes in multiple PRs.
Here is my plan:

  • Copy existing files one by one in the experimental module with changes applied, which can be reviewed iteratively.
  • Extend benchs with pubgrub::expermental::resolve() to monitor and compare performance.
  • Apply other optimizations, which are specific to the new code.
  • When the new code is validated, replace the old implementation.

#274 is only used to show the final code, and will be closed once all code has been merged in the experimental module.

Can you describe what assumptions the new representation is making? E.g., do we need to know all versions upfront? When selecting versions, do we need to go through all versions or can we skip some (e.g. prereleases)? How do i convert a complex version type to the bitset? What's the maximum number of versions?

The dependency provider need to know all versions of a dependency in the get_dependencies() method of the parent package. It needs to assign a unique increasing u64 index for each version of a package, which can be implemented by storing all versions in a Vec and returning its index. There is no need for the versions to have any specific ordering, but sorting versions indices by priority will make the choose_version() method easier to implement by always choosing the last (or first) version.

In the get_dependencies() method, the dependency provider will need to list all matching versions of a dependency, get their version index and call PackageVersionWrapper::new_dep() with the list of matching version indices (pre-releases are included if they match the version requirement of the parent). By using the PackageVersionWrapper wrapper, we can support more than 63 versions (and version indices greater than 63), so the maximum number of versions is u64::MAX + 1. Internally the wrapper stores a bitset using Rc<[VersionSet]> containing all matching versions indices, then by using subpackages it dispatches the dependency versions by grouping up to 63 version indices in a VersionSet.

@konstin
Copy link
Member

konstin commented Dec 10, 2024

While #274 does great for the cargo performance numbers, it would be a quite disruptive of an API change for other pubgrub users such as uv and elm, breaking with the Ranges model for versions. It's hard to anticipate what effects that would have on downstream packages (the performance characteristic will be very different, conversion from other version system may have an impact, afaik we're losing the exact operators used), and I'm not sure if the benefit is worth the cost of the migration; Commits that avoid major changes to the API surface (or make the new interface opt-in) are much easier to merge.

If you want, we can also chat directly about how to best integrate these changes, e.g. in zulip or you can reach me at [email protected], or you can meet @Eh2406 and me in the weekly pubgrub office hours, wednesdays at 7pm CET (calendar: https://github.com/rust-lang/calendar?tab=readme-ov-file)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants