-
-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option for a Span to be based on bytes rather than characters #8
Comments
I was using spans with byte-based indexing and came across this issue so I changed to character-based indexing. It actually turned out to be a bit more efficient as well. |
fwiw I created a fork that does this (but removes All it does is switch this line: Line 94 in ccd4651
to: let len = line.chars().fold(0, |mut acc, i| {acc += i.len_utf8(); acc }); I am happy to open a PR that adds a function |
Isn't that simply |
haha TIL |
I'm interested in working on this. @zesterer would you be open to switching entirely to byte indices? Or should both ways be supported? |
Definitely interested! In my view, spans should use byte indices by default, with some built-in way to look up character indices. I'm happy to accept a temporary solution for now: long-term, I think the crate needs more substantial changes anyway. |
Is there a branch which already solves this? |
Not yet, no. I don't currently have the time to work on it, although I wish I did. |
I see in the code it's possible to provide your own |
This has been suggested elsewhere, yes. That's probably something to add to the list for an eventual refactor. |
I'm getting started in a PR to implement this, which approach would you prefer @zesterer ?
The first is of course more general, even allowing users to combine Char and Byte spans, but requires two new structs for Span with and without SourceId. The second is less general, requiring the user to specify it for the whole report. Which is more ergonomic, but limits freedom |
Well, I've finally come to the point where I need to switch it over. I'll implement it as a boolean flag to ReportBuilder. |
Closes zesterer#8 Closes Duplicate issues zesterer#71 and zesterer#57
Any chance that this feature could get into a patch release soon? I'm fine with putting
in my EDIT: Actually, while playing around with this feature, I discovered that the column number of the offset of a report is printed incorrectly when using byte indices. I created a PR to fix this: #113. |
I've just released |
Right now, the error drawer seems to assume that
span.start()
andspan.end()
are character indices rather than byte indices. This might be useful for manually choosing the error position, but most lexers use byte positions instead of character positions.This makes a difference when you use Unicode characters in the source.
This is a language that is lexed using logos, which uses byte indices:
The error gets offset because of some Unicode characters before the error position:
I am not sure what the most idiomatic API for this is, but one way could be to have an optional function in the
Span
trait that looks something likefn uses_bytes() -> bool
, where the default implementation just returnsfalse
. Another way is to add the option to theConfig
struct.The text was updated successfully, but these errors were encountered: