Update doc
tatsuhiro-t committed Nov 13, 2024
1 parent ad65582 commit 7048357
Showing 1 changed file with 22 additions and 0 deletions.
22 changes: 22 additions & 0 deletions README.rst
@@ -4,3 +4,25 @@ urlparse
urlparse is a URL parser compatible with `http_parser_parse_url()` from
`nodejs/http-parser <https://github.com/nodejs/http-parser>`_, which
has been archived since 2022.

The two functions differ slightly in what they return on failure:
`http_parser_parse_url()` returns a nonzero value, whereas
`urlparse_parse_url()` returns the negative error code
``URLPARSE_ERR_PARSE``.

`http_parser_parse_url()` has historically not followed any standard
such as RFC 3986. The characters allowed in each URL component are:

- scheme: ``A-Za-z``
- userinfo: ``A-Za-z!$%&'()*+,-.:;=_~``
- host: ``a-zA-Z0-9-.``
- IPv6 host: ``A-Fa-f0-9.:``

  - optionally followed by zone info, which starts with ``%`` and can
    contain: ``A-Za-z0-9%-._~``
  - an IPv6 host must be enclosed in ``[`` and ``]``

- port: ``0-9``
- path: ``A-Za-z0-9!"$%&'()*+,-./:;<=>@[\\]^_\`{|}~``
- query: ``A-Za-z0-9!"$%&'()*+,-./:;<=>?@[\\]^_\`{|}~``
- fragment: ``A-Za-z0-9!"#$%&'()*+,-./:;<=>?@[\\]^_\`{|}~``

  - all consecutive ``#`` characters that precede a fragment are
    treated as a single ``#``.
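
As a rough illustration of the list above, the following C sketch
encodes a few of the simpler character classes (scheme, host, IPv6
host, port) and the handling of consecutive ``#`` characters. Every
function name here is hypothetical and chosen for this example only;
none of them is part of the urlparse API, and the sketch does not
claim to mirror how urlparse is implemented internally. It also
assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical helpers for illustration only;
   they are not part of the urlparse API. ASCII is assumed. */

int sketch_is_alpha(char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

int sketch_is_digit(char c) { return c >= '0' && c <= '9'; }

/* scheme: A-Za-z */
int sketch_is_scheme_char(char c) { return sketch_is_alpha(c); }

/* host: a-zA-Z0-9-. */
int sketch_is_host_char(char c) {
  return sketch_is_alpha(c) || sketch_is_digit(c) || c == '-' || c == '.';
}

/* IPv6 host, without the enclosing brackets or zone info: A-Fa-f0-9.: */
int sketch_is_ipv6_host_char(char c) {
  return (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f') ||
         sketch_is_digit(c) || c == '.' || c == ':';
}

/* port: 0-9 */
int sketch_is_port_char(char c) { return sketch_is_digit(c); }

/* Returns 1 if each of the len bytes starting at s satisfies pred,
   and 0 otherwise. An empty span trivially returns 1. */
int sketch_all_chars(const char *s, size_t len, int (*pred)(char)) {
  size_t i;

  assert(s != NULL);

  for (i = 0; i < len; ++i) {
    if (!pred(s[i])) {
      return 0;
    }
  }

  return 1;
}

/* Given a NUL-terminated string p positioned at a '#', skips the whole
   run of consecutive '#' characters and returns a pointer to the first
   character after the run. */
const char *sketch_skip_hashes(const char *p) {
  assert(p != NULL);

  while (*p == '#') {
    ++p;
  }

  return p;
}
```

With these helpers, ``sketch_all_chars("h2c", 3, sketch_is_scheme_char)``
reports failure because a digit is not in the scheme set, and
``sketch_skip_hashes("###frag")`` points at ``frag``, so the run of
``#`` characters acts as a single delimiter.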
