A small number of PMC files use the xlink namespace prefix without declaring it first. For example, the documents include "xlink:href" where "xlink" hasn't been defined. This breaks the XML parser and gives errors like the one below.
Traceback (most recent call last):
File "/projects/jlever/github/biotext/src/bioconverters/pmcxml.py", line 390, in pmcxml2bioc
for pmc_doc in process_pmc_file(source, tag_handlers=tag_handlers):
File "/projects/jlever/github/biotext/src/bioconverters/pmcxml.py", line 274, in process_pmc_file
for event, elem in etree.iterparse(source, events=("start", "end", "start-ns", "end-ns")):
File "/home/jlever/.linuxbrew/Cellar/python/3.7.3/lib/python3.7/xml/etree/ElementTree.py", line 1222, in iterator
yield from pullparser.read_events()
File "/home/jlever/.linuxbrew/Cellar/python/3.7.3/lib/python3.7/xml/etree/ElementTree.py", line 1297, in read_events
raise event
File "/home/jlever/.linuxbrew/Cellar/python/3.7.3/lib/python3.7/xml/etree/ElementTree.py", line 1269, in feed
self._parser.feed(data)
xml.etree.ElementTree.ParseError: unbound prefix: line 12, column 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/convertPMC.py", line 56, in <module>
for bioc_doc in pmcxml2bioc(io.StringIO(data)):
File "/projects/jlever/github/biotext/src/bioconverters/pmcxml.py", line 450, in pmcxml2bioc
raise RuntimeError("Parsing error in PMC xml file: %s" % source)
RuntimeError: Parsing error in PMC xml file: <_io.StringIO object at 0x7f04d1099c18>
An initial hacky fix was implemented in 63663fe and e30c3e9. It tried to fix href-specific cases. This needs to be explored further, as a new file with a non-href-related case has appeared.
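
One possible, more general workaround is sketched below. It is not what 63663fe or e30c3e9 do. It only illustrates the idea of declaring any known but undeclared prefix on the root element before the text reaches `etree.iterparse`. The helper name `declare_missing_prefixes` and the `KNOWN_NAMESPACES` map are hypothetical. The two URIs are the standard W3C ones for XLink and MathML, and which prefixes actually appear undeclared in PMC files is an assumption.

```python
import re
import xml.etree.ElementTree as etree

# Hypothetical prefix-to-URI map (assumption: these are the prefixes of interest).
KNOWN_NAMESPACES = {
    "xlink": "http://www.w3.org/1999/xlink",
    "mml": "http://www.w3.org/1998/Math/MathML",
}

# First start tag in the text; skips <?xml ...?>, <!DOCTYPE ...> and closing tags.
ROOT_TAG = re.compile(r"<(?![?!/])[^\s>/]+")


def declare_missing_prefixes(xml_text, namespaces=KNOWN_NAMESPACES):
    """Add xmlns declarations to the root element for used but undeclared prefixes."""
    missing = []
    for prefix, uri in namespaces.items():
        # The prefix counts as used if it starts a tag name or an attribute name.
        used = re.search(r"[<\s/]%s:" % re.escape(prefix), xml_text)
        declared = ("xmlns:%s=" % prefix) in xml_text
        if used and not declared:
            missing.append(' xmlns:%s="%s"' % (prefix, uri))

    match = ROOT_TAG.search(xml_text)
    if not missing or match is None:
        return xml_text

    # Insert the declarations right after the root element's tag name.
    return xml_text[:match.end()] + "".join(missing) + xml_text[match.end():]


broken = '<article><ext-link xlink:href="http://example.org">x</ext-link></article>'
fixed = declare_missing_prefixes(broken)
root = etree.fromstring(fixed)  # no longer raises "unbound prefix"
```

This is a text-level patch, so it has limits: a start tag inside a comment that precedes the root element would be mistaken for the root, and prefixes missing from the map are still left unbound.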