Add option for a Span to be based on bytes rather than characters #8

Spu7Nix · 2021-09-10T19:14:30Z

Right now, the error drawer seems to assume that span.start() and span.end() are character indices rather than byte indices. This might be useful for manually choosing the error position, but most lexers use byte positions instead of character positions.

This makes a difference when you use Unicode characters in the source.
Here is an error reported by a language that is lexed using logos, which uses byte indices:

Error: Index 5 is out of range of array (length 3)
   ╭─[test/test.spwn:3:6]
   │
 3 │ ╭─▶ arr[5] = 1
 4 │ ├─▶ // this should not be in the error
   · │
   · ╰──────────────────────────────────────── Index 5 is out of range of array (length 3)
───╯

The error gets offset because of some Unicode characters before the error position:

// øæåææ
let arr = [1, 2, 3]
arr[5] = 1
// this should not be in the error
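To make the offset concrete, here is a small sketch (not taken from ariadne) that locates `arr[5]` in the sample above and compares its byte offset with its character offset. The helper names `byte_to_char_offset` and `offsets_of` are made up for illustration, and the sketch assumes the accented letters are stored precomposed, so each takes two bytes in UTF-8.

```rust
/// The sample source from this issue; the first line contains five 2-byte characters.
const SRC: &str =
    "// øæåææ\nlet arr = [1, 2, 3]\narr[5] = 1\n// this should not be in the error\n";

/// Convert a byte offset into a character offset by counting the chars before it.
/// Hypothetical helper; panics if `byte` is not on a char boundary.
fn byte_to_char_offset(src: &str, byte: usize) -> usize {
    src[..byte].chars().count()
}

/// Returns (byte offset, char offset) of the first occurrence of `needle`.
fn offsets_of(src: &str, needle: &str) -> Option<(usize, usize)> {
    let byte = src.find(needle)?; // `str::find` returns a byte offset
    Some((byte, byte_to_char_offset(src, byte)))
}
```

For this source, `offsets_of(SRC, "arr[5]")` gives byte offset 34 but character offset 29. A renderer that treats 34 as a character index lands five characters too far to the right, which matches the `3:6` position in the report above.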

I am not sure what the most idiomatic API for this is, but one way could be to have an optional function in the Span trait that looks something like fn uses_bytes() -> bool, where the default implementation just returns false. Another way is to add the option to the Config struct.
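A rough sketch of the first idea, assuming a cut-down `Span` trait rather than ariadne's real one. The `ByteSpan` wrapper and the `char_range` helper are hypothetical, and the method takes `&self` here so that it can be called on a span value:

```rust
use std::ops::Range;

/// A cut-down stand-in for a span trait; NOT ariadne's actual `Span` definition.
trait Span {
    fn start(&self) -> usize;
    fn end(&self) -> usize;
    /// Proposed opt-in: the default `false` keeps character-index behaviour.
    fn uses_bytes(&self) -> bool {
        false
    }
}

/// Character-indexed span: relies on the default `uses_bytes`.
impl Span for Range<usize> {
    fn start(&self) -> usize {
        self.start
    }
    fn end(&self) -> usize {
        self.end
    }
}

/// Hypothetical byte-indexed wrapper that opts in.
struct ByteSpan(Range<usize>);

impl Span for ByteSpan {
    fn start(&self) -> usize {
        self.0.start
    }
    fn end(&self) -> usize {
        self.0.end
    }
    fn uses_bytes(&self) -> bool {
        true
    }
}

/// What a renderer could do: normalise any span to character indices.
/// Panics if a byte span does not fall on char boundaries.
fn char_range<S: Span>(src: &str, span: &S) -> Range<usize> {
    if span.uses_bytes() {
        let to_char = |b: usize| src[..b].chars().count();
        to_char(span.start())..to_char(span.end())
    } else {
        span.start()..span.end()
    }
}
```

Because the default returns `false`, existing `Span` implementations would keep compiling and keep their current meaning.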


danaugrs · 2022-08-19T22:24:26Z

I was using spans with byte-based indexing and came across this issue so I changed to character-based indexing. It actually turned out to be a bit more efficient as well.

brockelmore · 2023-04-19T14:46:56Z

fwiw I created a fork that does this (but removes char based version entirely) here

All it does is switch this line:

ariadne/src/source.rs

Line 94 in ccd4651

let len = line.chars().count();

to:

let len = line.chars().fold(0, |mut acc, i| {acc += i.len_utf8(); acc });

I am happy to open a PR that adds a function from_with_byte_offsets(s: S) -> Self where S: AsRef<str> to Source if that is desirable

Johan-Mi · 2023-05-19T19:24:29Z

> let len = line.chars().fold(0, |mut acc, i| {acc += i.len_utf8(); acc });

Isn't that simply line.len()?

brockelmore · 2023-05-19T20:13:13Z

haha TIL String::len gives the length in bytes! yes you are correct
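For anyone else landing here, a quick sketch of the difference between the three expressions discussed above. The function names are made up for illustration:

```rust
/// Byte length computed the long way, summing the UTF-8 width of each char.
fn byte_len_via_fold(line: &str) -> usize {
    line.chars().fold(0, |acc, c| acc + c.len_utf8())
}

/// Returns (character count, byte length) for a line.
fn char_and_byte_len(line: &str) -> (usize, usize) {
    // `chars().count()` counts Unicode scalar values; `len()` counts bytes.
    (line.chars().count(), line.len())
}
```

For the first line of the sample, `"// øæåææ"`, this gives 8 characters and 13 bytes, and the fold agrees with `line.len()`.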

goto-bus-stop · 2023-08-25T08:20:00Z

I'm interested in working on this. @zesterer would you be open to switching entirely to byte indices? Or should both ways be supported?

zesterer · 2023-08-27T18:27:19Z

> I'm interested in working on this. @zesterer would you be open to switching entirely to byte indices? Or should both ways be supported?

Definitely interested! In my view, spans should use byte indices by default, with some built-in way to look up character indices. I'm happy to accept a temporary solution for now: long-term, I think the crate needs more substantial changes anyway.
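One possible shape for "byte indices with a way to look up character indices", as a sketch only. `LineIndex` and `line_col` are hypothetical names and not ariadne types. The index stores the byte offset at which each line starts and counts characters only within the line that contains the offset:

```rust
/// Byte offsets at which each line starts. Hypothetical helper, not an ariadne type.
struct LineIndex<'a> {
    src: &'a str,
    line_starts: Vec<usize>,
}

impl<'a> LineIndex<'a> {
    fn new(src: &'a str) -> Self {
        let mut line_starts = vec![0];
        // Every '\n' ends a line; the next line starts one byte after it.
        line_starts.extend(src.match_indices('\n').map(|(i, _)| i + 1));
        Self { src, line_starts }
    }

    /// Map a byte offset to a 1-based (line, column), with the column counted
    /// in characters. Returns `None` if the offset is out of range or does not
    /// fall on a char boundary.
    fn line_col(&self, byte: usize) -> Option<(usize, usize)> {
        if !self.src.is_char_boundary(byte) {
            return None;
        }
        // Index of the last line start that is <= `byte`.
        let line = self.line_starts.partition_point(|&start| start <= byte) - 1;
        let col = self.src[self.line_starts[line]..byte].chars().count();
        Some((line + 1, col + 1))
    }
}
```

The line lookup is a binary search over byte offsets, so characters are only counted within a single line rather than from the start of the file.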

VonTum · 2024-02-20T10:29:24Z

Is there a branch which already solves this?

zesterer · 2024-02-20T17:57:48Z

Not yet, no. I don't currently have the time to work on it, although I wish I did.

VonTum · 2024-02-20T20:40:29Z

I see in the code that it's possible to provide your own impl Cache<>. Perhaps Source could be turned into a trait (or, for backward compatibility, Source could implement a new SourceTrait) with differing implementations, so that one can avoid the String allocations for each line and also allow byte indexing.

zesterer · 2024-02-21T21:10:13Z

This has been suggested elsewhere, yes. That's probably something to add to the list for an eventual refactor.

VonTum · 2024-02-22T13:11:00Z

I'm getting started on a PR to implement this. Which approach would you prefer, @zesterer?

1. Extend the Span trait with a fn is_byte_span(&self) -> bool, returning false by default so as not to break existing code, then add a ByteSpan impl Span, or
2. Add a field or template parameter to Report/ReportBuilder that says whether the given spans are byte spans.

The first is of course more general, even allowing users to combine Char and Byte spans, but requires two new structs for Span with and without SourceId.

The second is less general, since the user has to specify it for the whole report. That is more ergonomic, but limits freedom.
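To illustrate the second option, here is a toy builder with a report-wide flag. This is a sketch under assumptions: the `byte_spans` field and the `with_byte_spans`, `with_label`, and `char_labels` methods are invented for the example and are not ariadne's API.

```rust
use std::ops::Range;

/// Toy builder; the field and method names are assumptions, not ariadne's API.
#[derive(Default)]
struct ReportBuilder {
    labels: Vec<Range<usize>>,
    byte_spans: bool,
}

impl ReportBuilder {
    /// Declare that every span added to this report is measured in bytes.
    fn with_byte_spans(mut self, yes: bool) -> Self {
        self.byte_spans = yes;
        self
    }

    fn with_label(mut self, span: Range<usize>) -> Self {
        self.labels.push(span);
        self
    }

    /// Resolve all labels to character ranges against `src`.
    /// Panics if a byte span does not fall on char boundaries.
    fn char_labels(&self, src: &str) -> Vec<Range<usize>> {
        self.labels
            .iter()
            .map(|span| {
                if self.byte_spans {
                    let to_char = |b: usize| src[..b].chars().count();
                    to_char(span.start)..to_char(span.end)
                } else {
                    span.clone()
                }
            })
            .collect()
    }
}
```

With the sample source from the top of this issue, a byte label `34..40` on a builder created with `with_byte_spans(true)` resolves to characters `29..35`, while the same label on a default builder is passed through unchanged.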

VonTum · 2024-02-27T20:03:06Z

Well, I've finally come to the point where I need to switch it over. I'll implement it as a boolean flag to ReportBuilder.

Closes zesterer#8 Closes Duplicate issues zesterer#71 and zesterer#57

Zollerboy1 · 2024-04-25T15:40:08Z

Any chance that this feature could get into a patch release soon? I'm fine with putting

ariadne = { git = "https://github.com/zesterer/ariadne.git", rev = "a061071" }

in my Cargo.toml for now, but it would be much nicer if this could be version 0.4.1 instead.

EDIT:

Actually, while playing around with this feature, I discovered that the column number of the offset of a report is printed incorrectly when using byte indices. I created a PR to fix this: #113.

zesterer · 2024-04-25T19:00:49Z

I've just released 0.4.1, which includes these changes. Thanks @VonTum and @Zollerboy1 for the contributions!

zesterer added the enhancement New feature or request label Sep 23, 2021

bestouff mentioned this issue Oct 12, 2021

API overhaul #12

Open

5 tasks

zesterer mentioned this issue Jan 4, 2023

support byte spans #57

Closed

zesterer mentioned this issue Apr 19, 2023

Allow Spans to be byte offset based instead of char offset based #71

Closed

goto-bus-stop mentioned this issue Aug 30, 2023

Diagnostics API improvements apollographql/apollo-rs#617

Closed

8 tasks

VonTum added a commit to VonTum/ariadne that referenced this issue Feb 29, 2024

Add byte spans

3dfac9a

Closes zesterer#8 Closes Duplicate issues zesterer#71 and zesterer#57

zesterer closed this as completed in a061071 Mar 7, 2024
