-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use xmlbf for XML parsing #67
Conversation
Some hlint cleanups
@wismill it'd be absolutely brilliant if you're able to complete this xmlbf implementation of an RDF/XML parser. It'd complete parsing support for the three parsers, 100% compliant with the w3c RDF parsing unit tests. |
# Conflicts: # bench/MainCriterion.hs
Add support for Algebraic Graphs. Fix robstewart57#59
# Conflicts: # src/Text/RDF/RDF4H/XmlParser.hs
By the way, this PR relies on an unmerged MR for Once we're done I think it would be better to just squash the commits. |
Amazing! Thank you.
Is there a specific GitLab issue raised at https://gitlab.com/k0001/xmlbf/issues for this fix? |
I have fixed the test suite for Turtle. Note that I have introduced a breaking change for the Turtle parser: document URI has precedence over the base URI in the initial state of the parser. It is not clear to me which base URI (as defined in https://www.ietf.org/rfc/rfc3986.txt, section 5.1) is passed to the parser. There are now only 2 tests failing (out of 1208!!) for this package: one that seems to be linked to a possible bug in IRI resolution, the other about XML serialization. Master branch has still 65 tests failing. We are pretty close! I have opened a new issue for |
Ok, from what I can read from this page, the tests in |
@wismill are you suggesting removing just I'd be fine with that. Perhaps we should keep the removal of that directory separate from this PR? If so, either we can remove it after the commits here a squashed into a single commit. Or otherwise, I could remove it in |
This is really excellent! I'm following your efforts on xmlbf at https://gitlab.com/k0001/xmlbf/merge_requests/5 and https://gitlab.com/k0001/xmlbf/merge_requests/9 . Something that occurred to me is that from xmlbf, might we get XML/RDF serialisation for free from a
in |
Yes, exactly! Then we could add tests that |
@robstewart57 do you mind that there are a lot of changes unrelated to the XML parser? I got very enthusiastic ;-) I will contain myself from now. By the way my first PR was accepted for |
I've moved that discussion over to another issue: #68 . Now that https://gitlab.com/k0001/xmlbf/commit/96ac0e1930837a58282b343a06f86055e037f627 has been accepted, once xmlbf 0.6 is uploaded to hackage is this PR ready to be squashed and merged? I will then merge the |
I need to do some cleanup and check basic performance. |
The new XML parser is faster that the XHT-based one, but probably still too slow. $ cabal v2-run rdf4h-bench -- "rdf4h/parsers/xml-"
benchmarking rdf4h/parsers/xml-xmlbf
time 1.475 s (1.388 s .. 1.556 s)
0.999 R² (0.999 R² .. 1.000 R²)
mean 1.442 s (1.397 s .. 1.465 s)
std dev 43.17 ms (2.698 ms .. 52.39 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking rdf4h/parsers/xml-xht
time 1.984 s (1.780 s .. 2.081 s)
0.999 R² (0.997 R² .. 1.000 R²)
mean 2.053 s (2.015 s .. 2.073 s)
std dev 36.46 ms (2.247 ms .. 44.34 ms)
variance introduced by outliers: 19% (moderately inflated) |
You may review and merge this PR. There are still some corner cases in the XML parser (indicated by TODO, FIXME) but I am confident they are unlikely to cause much trouble since the parser passes the W3C test suite. I think we should create an issue to release a new version of |
@robstewart57 Once you merge it to master I will work on #68, #70, #71 and #73. |
This PR has been merged, thanks again! |
Work in progress to add xmlbf backend for RDF/XML parsing.