show_map.php should handle &amp; in url #28

Open
johnksv opened this issue Mar 12, 2023 · 4 comments

johnksv commented Mar 12, 2023

Some RSS handlers HTML-escape special characters in links, such as & -> &amp;. If the URLs are then accessed directly without being converted back, doma can't find the map.

Example:
Link from doma rss feed: https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&map=7191
Link after the feed has been processed by w3 rss feed validator ( https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fkartarkiv.nydalen.idrett.no%2Frss.php ): https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191

The latter link results in doma not finding the map, and thus returning "The map has been removed." to the user.
Related code:
https://github.com/matstroeng/doma/blob/master/src/show_map.controller.php#L19

Suggested solution: Doma/PHP should handle &amp; in the URL's query string.

Edit:
I think this must be solved in code, since:

PHP's URL parser does not expect to encounter HTML entities, because they should not be present in URLs; it therefore correctly splits the query string on &, treating the trailing amp; as part of the key.
(source: https://stackoverflow.com/questions/17972654/amp-precedes-get-array-element-parameter-name)
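
The behaviour described in the quote can be sketched as follows. This is an illustration in Python, not Doma's PHP code: `split_query` only approximates how PHP fills `$_GET` (splitting on `&` alone), and `strip_entity_prefix` is a hypothetical workaround that treats a key such as `amp;map` as `map`.

```python
from urllib.parse import unquote_plus


def split_query(query):
    """Split a query string on '&' only, roughly as PHP does for $_GET."""
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)
    return params


def strip_entity_prefix(params):
    """Hypothetical workaround: drop a leading 'amp;' left over from '&amp;'."""
    cleaned = {}
    for key, value in params.items():
        name = key[len("amp;"):] if key.startswith("amp;") else key
        # Keep the first value if both 'map' and 'amp;map' are present.
        cleaned.setdefault(name, value)
    return cleaned


raw = split_query("user=vbj&amp;map=7191")
print(raw)                       # the second key is 'amp;map', not 'map'
print(strip_entity_prefix(raw))  # keys are 'user' and 'map' again
```

A lookup for `map` fails on the first result, which matches the "map has been removed" symptom. Normalising the keys before the lookup is one possible shape for a fix on the server side.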

runerys commented Mar 21, 2023

I think the root problem is that the rss-link uses the print statement - which HTML-encodes the url.
https://github.com/matstroeng/doma/blob/master/src/rss.php#L27

HTML-encoding content is a safety measure, but in this case the Doma service has full control over the URL and can output the raw value in the link element.
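
To make the encoding step concrete, here is a small Python sketch. It assumes that Doma's print helper behaves like a generic HTML escaper; Python's `html.escape` stands in for it here and is not the function Doma actually calls.

```python
from html import escape, unescape

# The link as Doma builds it internally (taken from the example in this issue).
url = "https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&map=7191"

# What an HTML-encoding print step writes into the feed.
encoded = escape(url)
print(encoded)

# What a feed consumer must do before opening the link.
decoded = unescape(encoded)
print(decoded == url)
```

The only character affected in this URL is `&`, so a client that skips the decoding step ends up requesting the `&amp;` variant.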

johnksv commented Mar 21, 2023

That is a good find, @runerys, thanks.
When I investigated this issue I only used the browser. It of course decodes the entities, so I didn't notice this. Fetching the feed with curl gives this output:

$ curl -v https://kartarkiv.nydalen.idrett.no/rss.php
…
<link>https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191</link>
…

Will refactor the issue and PR to fix the root cause.

runerys commented Mar 21, 2023

Yes - and "View source" in the browser reveals the same.

I have to admit that I struggle to find out if the RSS link-element MUST be encoded according to standards. I've searched around a bit, but all examples I find are to blogs with nice folder-like urls.

runerys commented Mar 21, 2023

Unfortunately, I'm wrong. I found some validators, and the link element must be HTML-encoded. The error is in the client application that handles the feed: it does NOT HTML-decode the link before opening it.

So I guess a fix must look more like your original proposal.

You can try validating both urls and direct rss input here: https://validator.w3.org/feed/
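
The same point can be checked locally with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<item>` wrapper and the `link_of` helper are made up for the illustration and are not taken from Doma's feed.

```python
import xml.etree.ElementTree as ET

BASE = "https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj"

# Link element with the ampersand encoded, as the feed currently emits it.
ENCODED = "<item><link>" + BASE + "&amp;map=7191</link></item>"
# Link element with a raw ampersand, as the earlier comment proposed.
RAW = "<item><link>" + BASE + "&map=7191</link></item>"


def link_of(xml_text):
    """Return the decoded link text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).findtext("link")
    except ET.ParseError:
        return None


print(link_of(ENCODED))  # the parser hands back the URL with a plain '&'
print(link_of(RAW))      # a bare '&' is not well-formed XML, so parsing fails
```

A client that reads the link through an XML parser gets the decoded URL for free. The broken links come from clients that copy the element text out of the raw feed source.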
