Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EdnError rework (new parse impl) #150

Merged
merged 8 commits into from
Jun 7, 2024
Merged

EdnError rework (new parse impl) #150

merged 8 commits into from
Jun 7, 2024

Conversation

Grinkers
Copy link
Collaborator

Resolves #119

Performance

Note this is using the benchmark in main branch. The performance increase should be much better on larger strings, as there is now no clone() in parse. The only allocations are for constructing the EDN struct and for parsing numbers. Note that the walker is now all pointer based. Peeking was a big reason previously for the clones, you can see the new performance characteristics (utf-8 complexity but no allocations) https://godbolt.org/z/zK9dEzrYn

Before

     Running benches/parse.rs (target/release/deps/parse-5de3e714a9fe6fe3)
parse                   time:   [23.316 µs 24.270 µs 25.380 µs]
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
     Running benches/tagged_parse.rs (target/release/deps/tagged_parse-041b63e457dffa70)
tagged_parse            time:   [4.4875 µs 4.5609 µs 4.6395 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

After

     Running benches/parse.rs (target/release/deps/parse-5de3e714a9fe6fe3)
parse                   time:   [4.6219 µs 4.6649 µs 4.7142 µs]
                        change: [-81.813% -80.967% -80.093%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe
     Running benches/tagged_parse.rs (target/release/deps/tagged_parse-041b63e457dffa70)
tagged_parse            time:   [1.6429 µs 1.6504 µs 1.6588 µs]
                      change: [-65.166% -64.416% -63.697%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe

Debug

given

let edn = "#_ ,, #{hello, this will be discarded} #_{so will this} #{this is invalid";
let e = Edn::from_str(edn);
println!("{e:?}");

Before

Err(ParseEdn("None could not be parsed at char count 58"))

After

Err(EdnError { code: UnexpectedEOF, line: Some(1), column: Some(74), index: Some(73) })
Examples/tests can be seen in https://github.com/Grinkers/edn-rs/blob/parse/tests/error_messages.rs

Errors

In this rework, the following strings were also previously being parsed as valid EDN. Fuzzing invalid EDN is still a TODO (separate issue/PR).

":"
"5011227E71367421E12" ; https://github.com/edn-format/edn#symbols
"{ :[0x42] 42 }"
"\\cats"
"{ :foo 42 :foo 43 }"

@Grinkers Grinkers changed the title EdnError rework EdnError rework (new parse impl) Feb 20, 2024
@Grinkers Grinkers force-pushed the parse branch 2 times, most recently from 0a2d384 to 3965bc3 Compare February 23, 2024 19:39
@Grinkers Grinkers marked this pull request as ready for review February 23, 2024 19:43
@Grinkers
Copy link
Collaborator Author

3965bc3
I ended up adding a CustomError fallback for crates like edn-derive, std-only.

@evaporei
Matching PR for compatibility over at (this is a breaking change). Please tell me if you have any better ideas. I believe derive will always rely on std, so it's easy enough to add std stuff without messing with the base layer
evaporei/edn-derive#44

@Grinkers
Copy link
Collaborator Author

Added a new Error for duplicate in sets, matching clojure.

@naomijub
Copy link
Owner

Sorry for the delay, I'm moving to another city and have had limited time. I can take a look as soon as the branch is updated

Copy link

codecov bot commented Mar 21, 2024

Codecov Report

Attention: Patch coverage is 77.03927% with 76 lines in your changes are missing coverage. Please review.

Project coverage is 71.37%. Comparing base (c60d05b) to head (f658a21).

Files Patch % Lines
src/deserialize/parse.rs 81.59% 53 Missing ⚠️
src/deserialize/mod.rs 40.00% 21 Missing ⚠️
src/edn/error.rs 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #150      +/-   ##
==========================================
- Coverage   73.43%   71.37%   -2.06%     
==========================================
  Files           8        9       +1     
  Lines         862      877      +15     
==========================================
- Hits          633      626       -7     
- Misses        229      251      +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Grinkers
Copy link
Collaborator Author

Sorry for the delay, I'm moving to another city and have had limited time. I can take a look as soon as the branch is updated

No worries, I don't think it's too urgent. It would be nice to be merged in so I can tackle the other issues though. Rebased, which is what I assume you meant. No other changes. Tell me if you want this broken up/squashed in a different way.

I think #138 should be a higher priority (github release action -> crates.io publishing). I think we should do a final 0.17 before this PR gets merged too.

@Grinkers
Copy link
Collaborator Author

Code coverage bot seems really bad? It's marking tons of things as untested.

To compare, here's the source based coverage generated by https://github.com/Grinkers/edn-rs/blob/5ad0427304dc9a05276ef2de36013c82e5ab22e8/bb.edn#L38

parse.rs.html.zip

@Grinkers Grinkers mentioned this pull request May 29, 2024
@naomijub naomijub merged commit bc5bc8f into naomijub:main Jun 7, 2024
17 checks passed
Grinkers added a commit to Grinkers/edn-rs that referenced this pull request Jun 11, 2024
This reverts commit bc5bc8f, reversing
changes made to c60d05b.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EdnError relies on an owned and allocated String to convey information
2 participants