NoSuchElementException when parsing xoai #66
Removing that particular
Could you create a minimal test case with the failing example (using XML that is not retrieved from the site, so the example stays stable)? Could you also confirm whether the XML is valid OAI-PMH?
Minimal test case below, inserted inline because GitHub won't accept an XML attachment. As to its validity, I can confirm it's generated by a DSpace instance. Do you have an XOAI validator? I will also update the issue title.
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2017-07-31T20:10:46Z</responseDate>
  <request verb="GetRecord" identifier="oai:repository.somewhere.ac.uk:10373/1861"
      metadataPrefix="xoai">http://repository.somewhere.ac.uk/oai/request</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:repository.somewhere.ac.uk:9999/999</identifier>
        <datestamp>2015-02-03T17:41:40Z</datestamp>
        <setSpec>com_10373_3</setSpec>
        <setSpec>col_10373_12</setSpec>
      </header>
      <metadata>
        <metadata xmlns="http://www.lyncode.com/xoai">
          <element name="dc">
            <!-- either this block -->
            <element name="contributor">
              <element name="author">
                <element name="none">
                  <field name="value">Author1, First A.</field>
                  <field name="value">Author2, Second</field>
                  <field name="value">Author3, Third</field>
                </element>
              </element>
            </element>
            <!-- or this following commented block -->
            <!--
            <element name="relation">
              <element name="ispartof">
                <element name="en">
                  <field name="value">Another article 6(4)</field>
                </element>
              </element>
            </element>
            -->
          </element>
        </metadata>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
Asking Google for "OAI validator" turns up quite a few hits. The only one I'm at all familiar with is OVAL: http://oval.base-search.net/
Thank you for that observation; I should have checked myself. I have now checked the "offending" endpoint with http://oval.base-search.net/ and http://validator.oaipmh.com/. The latter reported no error for ListRecords; the former reported "No incremental harvesting (day granularity) of ListRecords", which I think is irrelevant here. Output from a third validator is at http://oanet.cms.hu-berlin.de/validator/pages/validation_dini_results.xhtml?vid=ZUZaM2FscFM2NEpUY2lncHdZYno2QT09. I don't feel qualified to judge whether any of these findings are relevant to the exception at hand.
I believe this is related to more than two levels of nesting.
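To illustrate why nesting depth should not matter, here is a minimal sketch (not the library's MetadataParser) that walks the nested xoai element/field structure with the standard StAX XMLStreamReader and keeps an explicit stack of open element names, so three, four or more levels are handled the same way. The class name XoaiFlattener and the dotted-path output format are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of xoai-serviceprovider: flattens nested xoai
// <element name="..."> / <field name="..."> blocks into dotted paths.
public class XoaiFlattener {

    public static Map<String, List<String>> flatten(String xml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> path = new ArrayList<>(); // names of the <element>s currently open
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {             // never call next() past the end
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String local = reader.getLocalName();
                    if (local.equals("element")) {
                        path.add(reader.getAttributeValue(null, "name")); // one level deeper
                    } else if (local.equals("field")) {
                        String key = String.join(".", path) + "." + reader.getAttributeValue(null, "name");
                        // getElementText() reads the text and leaves the reader on </field>
                        result.computeIfAbsent(key, k -> new ArrayList<>()).add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("element")) {
                    path.remove(path.size() - 1);  // one level back up
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<metadata xmlns=\"http://www.lyncode.com/xoai\">"
                + "<element name=\"dc\"><element name=\"contributor\"><element name=\"author\">"
                + "<element name=\"none\"><field name=\"value\">Author1, First A.</field>"
                + "<field name=\"value\">Author2, Second</field></element></element></element>"
                + "<!-- <element name=\"relation\"/> --></element></metadata>";
        System.out.println(flatten(xml));
        // prints {dc.contributor.author.none.value=[Author1, First A., Author2, Second]}
    }
}
```

Comments are skipped by the loop, so the commented-out relation block in the test case has no effect on the result.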
The problem lies in the underlying XmlReader, which consumes events without first checking that they are the ones being requested. After some hacking, the simplest fix I could find was to check in the MetadataParser that the EOD had not been reached. If others agree, I can add a test case and make a pull request.
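A minimal sketch of the "check before consuming" idea, using the standard StAX XMLEventReader rather than the library's own XmlReader (whose API is not shown in this thread). XMLEventReader.nextEvent() throws NoSuchElementException once the stream is exhausted, so the hypothetical helper advanceTo tests hasNext() before every call and uses peek() to stop at END_DOCUMENT without consuming it. GuardedReader and countFields are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not the library's XmlReader: guards every nextEvent()
// call so the parser never runs past the end of the document.
public class GuardedReader {

    /** Consumes events until a start tag named localName; returns false at end of document. */
    static boolean advanceTo(XMLEventReader reader, String localName) throws XMLStreamException {
        while (reader.hasNext()) {                 // without this, nextEvent() may throw NoSuchElementException
            if (reader.peek().isEndDocument()) {
                return false;                      // leave END_DOCUMENT unconsumed
            }
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(localName)) {
                return true;
            }
        }
        return false;
    }

    /** Counts <field> start tags by calling advanceTo until it reports the end. */
    public static int countFields(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            int count = 0;
            while (advanceTo(reader, "field")) {
                count++;
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<metadata><element name=\"dc\"><element name=\"contributor\">"
                + "<element name=\"author\"><element name=\"none\">"
                + "<field name=\"value\">Author1, First A.</field>"
                + "<field name=\"value\">Author2, Second</field>"
                + "</element></element></element></element></metadata>";
        System.out.println(countFields(xml)); // prints 2
    }
}
```

The final call to advanceTo returns false instead of throwing, which is the behaviour the EOD check above aims for.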
I had a long plane journey, so I rewrote the traversal code underlying MetadataParser, which has a number of problems when parsing xoai:
My revised MetadataParser can be found at cmacdonald@05f67f2. I have my own application code, which I have tested with OAI examples from Pure, DSpace and EPrints. I can write unit tests for xoai-serviceprovider.
Stacktrace as follows:
Minimal reproducible example:
Example record at: view-source:http://repository.abertay.ac.uk/oai/request?verb=GetRecord&metadataPrefix=xoai&identifier=oai:repository.abertay.ac.uk:10373/1861
parseElement is failing when parsing the license element. Example:
<element name="license"><field name="bin">Tk9URTogVGhpcyBpcyB0aGUgZGVmYXVsdCBsaWNlbmNlIHRoYXQgdGhlIFVuaXZlcnNpdHkgb2YgQWJlcnRheSAKRHVuZGVlIHJlcXVpcmVzIGFsbCBzdWJtaXR0ZXJzIHRvIGdyYW50LgoKTk9OLUVYQ0xVU0lWRSBESVNUUklCVVRJT04gTElDRU5DRQoKQnkgYWdyZWVpbmcgYW5kIHN1Ym1pdHRpbmcgdGhpcyBsaWNlbmNlLCB5b3UgKHRoZSBhdXRob3IocyksIApjb3B5cmlnaHQgb3duZXIgb3Igbm9taW5hdGVkIGFnZW50KSBncmFudHMgdG8gVW5pdmVyc2l0eSBvZiBBYmVydGF5IApEdW5kZWUgKFVBRCkgdGhlIG5vbi1leGNsdXNpdmUgcmlnaHQgdG8gcmVwcm9kdWNlLCB0cmFuc2xhdGUgCihhcyBkZWZpbmVkIGJlbG93KSwgYW5kL29yIGRpc3RyaWJ1dGUgeW91ciBzdWJtaXNzaW9uIChpbmNsdWRpbmcgdGhlIAphYnN0cmFjdCkgd29ybGR3aWRlIGluIHByaW50IGFuZCBlbGVjdHJvbmljIGZvcm1hdCBhbmQgaW4gYW55IG1lZGl1bSwgCmluY2x1ZGluZyBidXQgbm90IGxpbWl0ZWQgdG8gYXVkaW8gb3IgdmlkZW8uCgpZb3UgYWdyZWUgdGhhdCBVQUQgbWF5LCB3aXRob3V0IGNoYW5naW5nIHRoZSBjb250ZW50LCB0cmFuc2xhdGUgdGhlCnN1Ym1pc3Npb24gdG8gYW55IG1lZGl1bSBvciBmb3JtYXQgZm9yIHRoZSBwdXJwb3NlIG9mIHByZXNlcnZhdGlvbi4gCllvdSBhbHNvIGFncmVlIHRoYXQgVUFEIG1heSBrZWVwIG1vcmUgdGhhbiBvbmUgY29weSBvZiB0aGlzIApzdWJtaXNzaW9uIGZvciBwdXJwb3NlcyBvZiBzZWN1cml0eSwgYmFjay11cCBhbmQgcHJlc2VydmF0aW9uLgoKWW91IHJlcHJlc2VudCB0aGF0IHRoZSBzdWJtaXNzaW9uIGlzIG9yaWdpbmFsIHdvcmssIGFuZCB0aGF0IHlvdQpoYXZlIHRoZSByaWdodCB0byBncmFudCB0aGUgcmlnaHRzIGNvbnRhaW5lZCBpbiB0aGlzIGxpY2VuY2UuIFlvdSAKYWxzbyByZXByZXNlbnQgdGhhdCB5b3VyIHN1Ym1pc3Npb24gZG9lcyBub3QsIHRvIHRoZSBiZXN0IG9mIHlvdXIgCmtub3dsZWRnZSwgaW5mcmluZ2UgdXBvbiBhbnlvbmUncyBjb3B5cmlnaHQuCgpJZiB0aGUgc3VibWlzc2lvbiBjb250YWlucyBtYXRlcmlhbCBmb3Igd2hpY2ggeW91IG9yIHlvdXIgcHVibGlzaGVyCmRvIG5vdCBob2xkIGNvcHlyaWdodCwgeW91IHJlcHJlc2VudCB0aGF0IHlvdSBoYXZlIG9idGFpbmVkIHRoZQp1bnJlc3RyaWN0ZWQgcGVybWlzc2lvbiBvZiB0aGUgY29weXJpZ2h0IG93bmVyIHRvIGdyYW50IFVBRCB0aGUKcmlnaHRzIHJlcXVpcmVkIGJ5IHRoaXMgbGljZW5jZSwgYW5kIHRoYXQgc3VjaCB0aGlyZC1wYXJ0eSBvd25lZAptYXRlcmlhbCBpcyBjbGVhcmx5IGlkZW50aWZpZWQgYW5kIGFja25vd2xlZGdlZCB3aXRoaW4gdGhlIHRleHQgb3IKY29udGVudCBvZiB0aGUgc3VibWlzc2lvbi4KCklGIFRIRSBTVUJNSVNTSU9OIElTIEJBU0VEIFVQT04gV09SSyBUSEFUIEhBUyBCRUVOIFNQT05TT1JFRCBPUiAKU1VQUE9SVEVEIEJZIEFOIEFHRU5DWSBPUiBPUkdBTklaQVRJT04gT1RIRVIgVEhBTiBVQUQsIFlPVSBSRVBSRVNFTlQgClRIQVQgWU9VIEhBVkUgRlVMRklMTEVEIEFOWSBSSUdIVCBPRiBSRVZJRVcgT1IgT1RIRVIgT0JMSUdBVElPTlMgClJFUVVJUkVEIEJZIFNVQ0ggQ09OVFJBQ1QgT1IgQUdSRUVNRU5ULgoKVUFEIHdpbGwgY2xlYXJseSBpZGVudGlmeSB5b3VyIG5hbWUocykgYXMgdGhlIGF1dGhvcihzKSBvciBvd25lcihzKSAKb2YgdGhlIHN1Ym1pc3Npb24sIGFuZCB3aWxsIG5vdCBtYWtlIGFueSBhbHRlcmF0aW9uLCBvdGhlciB0aGFuIGFzIAphbGxvd2VkIGJ5IHRoaXMgbGljZW5jZSwgdG8geW91ciBzdWJtaXNzaW9uLgo=</field> </element>
The bin field contains the Base64-encoded text of the license.
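A minimal sketch for inspecting such a bin value, assuming the field holds standard Base64 of UTF-8 text (the charset is an assumption). The hypothetical decodeBin helper uses the MIME decoder from java.util.Base64, which ignores line breaks and other non-alphabet characters, so a value that was wrapped across lines still decodes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: decodes the text of an xoai <field name="bin">.
// Assumes the decoded bytes are UTF-8 text.
public class LicenseField {

    public static String decodeBin(String base64Text) {
        // The MIME decoder skips line separators instead of rejecting them.
        byte[] bytes = Base64.getMimeDecoder().decode(base64Text.trim());
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First characters of the license value quoted above.
        System.out.println(decodeBin("Tk9URTog")); // prints "NOTE: "
    }
}
```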
Using v4.2.1-SNAPSHOT, cloned from the Git repository today.
Any ideas?