Prevent breaking leading xml tag #142

Open
wants to merge 1 commit into
base: master

ArtyomBaranovskiy
Hello,

I'm using your tool to build a kind of WebGrabber, so I have to handle a wide variety of cases.
One of them is an HTML document with a leading <?...?> tag instead of a DocType.
Any modern browser renders it without errors, so I expect the same behavior from CSQuery.
However, the default output formatter transforms the tag into an HTML comment, which browsers handle incorrectly.

I suggest the following pull request to fix the issue. I'm sorry I had no time to dig deeper into the root cause of why the described tag is parsed as an HTML comment.

Short commit description:

1) Prevent the default OutputFormatter from breaking a leading XML comment

- Some sites specify a leading XML comment tag instead of a doctype, so the XML
  comment tag should not be broken when any document is parsed
- When the comment tag looks like <?...?>, simply render its NodeValue, as
  no further wrapping is required
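
The rule described in the commit message can be sketched roughly as follows. This is a language-neutral illustration in Python, not the actual CsQuery code: `render_comment` is a hypothetical helper, and it assumes the comment node's value already contains the full `<?...?>` text, as the description above implies.

```python
def render_comment(node_value: str) -> str:
    """Render the text of a comment node for HTML output (hypothetical helper)."""
    text = node_value.strip()
    # Assumption: a value that looks like <?...?> already carries its own
    # delimiters, so it is emitted unchanged instead of being wrapped again.
    if len(text) >= 4 and text.startswith("<?") and text.endswith("?>"):
        return node_value
    # Any other comment gets the usual HTML comment delimiters.
    return "<!--" + node_value + "-->"
```

With this rule, a leading `<?xml version="1.0"?>` value is returned untouched, while an ordinary comment is still wrapped in `<!--` and `-->`.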