Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce string copies in Source #89

Merged
merged 3 commits into from
Sep 29, 2023
Merged

Conversation

goto-bus-stop
Copy link
Contributor

The Source structure stores an owned String for each individual line. This can cause many allocations for large documents. We regularly deal with 65 000 lines or more that only have a handful of diagnostics.

This PR makes two changes:

  1. It separates string storage and line storage, Line{} now only contains indices into the string. Both character and byte indices are calculated in Source::from so the string storage can be sliced efficiently. You now have to go through Source::get_line_text(Line) to get the source text for a line. This commit still copies the text into Source but it does it in one allocation.
  2. It makes Source generic over the string storage. This does add some complexity, as the genericity also requires having an associated string storage type on Cache implementations. But, it lets you use Arc<str> as a storage type to avoid having to copy the whole source text into ariadne, or even &str if you can guarantee the lifetime. The FileCache now no longer copies each file's contents immediately after reading it. The sources() function will use string references if possible. Only users of custom caches should be affected by this change.

With these changes you can use ariadne without copying any strings, the only allocation is for computing line offsets (and for individual diagnostics of course).

Copy link
Owner

@zesterer zesterer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a brilliant change! Just one small suggestion. It's a shame that default associated types aren't stable, since this could have been a semver-compliant change (type Storage = String;).

src/source.rs Outdated Show resolved Hide resolved
@goto-bus-stop
Copy link
Contributor Author

Yeah I started out with type Storage = String and got yelled at 🥲

Defaulting to String makes sense, mostly eases the upgrade path a bit for people currently writing Source in return types or struct fields.

@zesterer
Copy link
Owner

Thanks very much for the PR!

@zesterer zesterer merged commit 170a460 into zesterer:main Sep 29, 2023
@goto-bus-stop goto-bus-stop deleted the line-no-alloc branch September 29, 2023 11:44
@goto-bus-stop
Copy link
Contributor Author

thanks for reviewing so fast! 🤩

@zesterer
Copy link
Owner

Were you looking for a release with this in the near term, or are you ok to wait for a while?

@goto-bus-stop
Copy link
Contributor Author

It's not critical, so please pick as you see fit :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants