Multi-document mode produces events only for first document #42

Closed
display-none opened this issue Feb 8, 2018 · 3 comments

@display-none

When parsing a stream of multiple documents, Woodstox produces events for the first document only: after the END_DOCUMENT event it goes into an end-of-input state and reports that there are no more events. This does not seem to be the intended behaviour.

For example, the following code prints "Document start" only once.

import com.ctc.wstx.api.WstxInputProperties;
import org.codehaus.stax2.XMLEventReader2;
import org.codehaus.stax2.XMLInputFactory2;

import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;

public class Test {

    // Six concatenated documents, each with its own XML declaration.
    private static final String MULTIDOC =
            "<?xml version='1.0'?><root>text</root><!--comment-->\n"
            +"<?xml version='1.0'?><root>text</root><?proc instr?>\n"
            +"<?xml version='1.0'?><root>text</root><!--comment-->"
            +"<?xml version='1.0' encoding='UTF-8'?><root>text</root><!--comment-->"
            +"<?xml version='1.0' standalone='yes'?><root>text</root><!--comment-->"
            +"<?xml version='1.0'?><root>text</root><!--comment-->";

    public static void main(String[] args) throws XMLStreamException {
        new Test().woodstox();
    }

    public void woodstox() throws XMLStreamException {
        XMLInputFactory2 factory = (XMLInputFactory2) XMLInputFactory2.newInstance();
        // Enable Woodstox multi-document parsing mode
        factory.setProperty(WstxInputProperties.P_INPUT_PARSING_MODE, WstxInputProperties.PARSING_MODE_DOCUMENTS);

        XMLEventReader2 xmlEventReader = (XMLEventReader2) factory.createXMLEventReader(new StringReader(MULTIDOC));
        // Expected: "Document start" printed once per document (6 times).
        // Actual: printed once; hasNextEvent() returns false after the first END_DOCUMENT.
        while (xmlEventReader.hasNextEvent()) {
            XMLEvent xmlEvent = xmlEventReader.nextEvent();
            if (xmlEvent.isStartDocument()) {
                System.out.println("Document start");
            }
        }
    }

}

Versions: Woodstox 5.0.3, Stax2 API 4.0.0

@cowtowncoder
Member

That does seem unexpected. I think most testing has been done with token streams (XMLStreamReader); I never use event streams myself (they offer little benefit, just overhead), so it is possible that the interaction between the event layer and multi-document mode is causing problems. For example, the event construction logic may not take multi-document mode into account.

On the plus side, this might be something relatively easy to fix.
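The hypothesis above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Woodstox or Stax2 code: `Token`, `SketchEventReader` and the `multiDocMode` flag are made-up names. It models the token sequence for three back-to-back documents, counts START_DOCUMENT directly at the token level, and then runs the same tokens through an event layer that, like the behaviour reported here, latches into an end-of-input state at the first END_DOCUMENT unless it is told about multi-document mode.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch only: these classes are illustrative, not Woodstox or Stax2 API.
public class MultiDocSketch {

    enum Token { START_DOCUMENT, START_ELEMENT, CHARACTERS, END_ELEMENT, COMMENT, END_DOCUMENT }

    /** Event layer that stops at END_DOCUMENT unless it knows about multi-doc mode. */
    static final class SketchEventReader {
        private final Iterator<Token> tokens;
        private final boolean multiDocMode;
        private boolean endOfInput = false;

        SketchEventReader(Iterator<Token> tokens, boolean multiDocMode) {
            this.tokens = tokens;
            this.multiDocMode = multiDocMode;
        }

        boolean hasNextEvent() {
            return !endOfInput && tokens.hasNext();
        }

        Token nextEvent() {
            Token t = tokens.next();
            // A mode-unaware layer treats END_DOCUMENT as final, even if more documents follow.
            if (t == Token.END_DOCUMENT && !multiDocMode) {
                endOfInput = true;
            }
            return t;
        }
    }

    /** Tokens for three documents shaped like <root>text</root><!--comment-->. */
    static List<Token> threeDocuments() {
        List<Token> all = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            all.add(Token.START_DOCUMENT);
            all.add(Token.START_ELEMENT);
            all.add(Token.CHARACTERS);
            all.add(Token.END_ELEMENT);
            all.add(Token.COMMENT);
            all.add(Token.END_DOCUMENT);
        }
        return all;
    }

    /** Counting directly over the tokens sees every document. */
    static int countTokenLevelStarts() {
        return (int) threeDocuments().stream().filter(t -> t == Token.START_DOCUMENT).count();
    }

    /** Counting through the event layer depends on whether it honours multi-doc mode. */
    static int countEventLevelStarts(boolean multiDocMode) {
        SketchEventReader reader = new SketchEventReader(threeDocuments().iterator(), multiDocMode);
        int starts = 0;
        while (reader.hasNextEvent()) {
            if (reader.nextEvent() == Token.START_DOCUMENT) {
                starts++;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(countTokenLevelStarts());      // prints 3
        System.out.println(countEventLevelStarts(false)); // prints 1
        System.out.println(countEventLevelStarts(true));  // prints 3
    }
}
```

In this model the token level sees all three documents, while the mode-unaware event layer reproduces the single "Document start" from the report.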

cowtowncoder added the active (Issue being actively investigated) label Mar 28, 2018
cowtowncoder added a commit that referenced this issue Mar 30, 2018
@cowtowncoder
Member

I can reproduce this; I added a failing test.

@cowtowncoder
Member

I can see how this occurs, but I'm not sure how easy it would be to resolve. Part of the problem is the use of the Stax2 reference implementation, which can't rely on the parsing mode since it does not know about it.
But I think I can look into this more after releasing the intermediate 5.1.0 version.
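One way to read the point about the reference implementation: a generic event layer that keeps its own end-of-input state would need to know the parsing mode, which the Stax2 RI does not. The hypothetical sketch below (again not Stax2 or Woodstox code; `TokenSource` and `GenericEventReader` are illustrative names) keeps the mode only in the token source and lets the event layer simply defer to the source's `hasNext()` instead of latching end-of-input itself. It shows one possible design, not necessarily how this issue was actually fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: names are illustrative, not Stax2 or Woodstox API.
public class ModeAgnosticSketch {

    enum Token { START_DOCUMENT, START_ELEMENT, END_ELEMENT, END_DOCUMENT }

    /** Stands in for the parser: only this layer knows the parsing mode. */
    static final class TokenSource {
        private final List<Token> tokens;
        private final boolean multiDocMode;
        private int pos = 0;
        private boolean finished = false;

        TokenSource(List<Token> tokens, boolean multiDocMode) {
            this.tokens = tokens;
            this.multiDocMode = multiDocMode;
        }

        boolean hasNext() {
            return !finished && pos < tokens.size();
        }

        Token next() {
            Token t = tokens.get(pos++);
            // In single-document mode the source itself ends after END_DOCUMENT.
            if (t == Token.END_DOCUMENT && !multiDocMode) {
                finished = true;
            }
            return t;
        }
    }

    /** Generic event layer: keeps no end-of-input state and needs no mode flag. */
    static final class GenericEventReader {
        private final TokenSource source;

        GenericEventReader(TokenSource source) {
            this.source = source;
        }

        boolean hasNextEvent() {
            return source.hasNext();
        }

        Token nextEvent() {
            return source.next();
        }
    }

    static int countDocumentStarts(boolean multiDocMode) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            tokens.add(Token.START_DOCUMENT);
            tokens.add(Token.START_ELEMENT);
            tokens.add(Token.END_ELEMENT);
            tokens.add(Token.END_DOCUMENT);
        }
        GenericEventReader reader = new GenericEventReader(new TokenSource(tokens, multiDocMode));
        int starts = 0;
        while (reader.hasNextEvent()) {
            if (reader.nextEvent() == Token.START_DOCUMENT) {
                starts++;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(countDocumentStarts(false)); // prints 1
        System.out.println(countDocumentStarts(true));  // prints 3
    }
}
```

A real parser in single-document mode would report an error on content after the first document rather than silently stopping; the sketch just stops, to keep the comparison simple.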

cowtowncoder added a commit that referenced this issue Aug 22, 2018
cowtowncoder added this to the 5.2.0 milestone Aug 22, 2018
cowtowncoder added bug and removed active (Issue being actively investigated) labels Aug 22, 2018
2 participants