-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VFS-524] A URI with an IPv6 address can't be parsed out correctly #438
[VFS-524] A URI with an IPv6 address can't be parsed out correctly #438
Conversation
@@ -94,7 +93,7 @@ protected Http5FileObject(final AbstractFileName name, final FS fileSystem, | |||
final FileSystemOptions fileSystemOptions = fileSystem.getFileSystemOptions(); | |||
urlCharset = builder.getUrlCharset(fileSystemOptions); | |||
final String pathEncoded = ((GenericURLFileName) name).getPathQueryEncoded(getUrlCharset()); | |||
internalURI = URIUtils.resolve(fileSystem.getInternalBaseURI(), pathEncoded); | |||
internalURI = fileSystem.getInternalBaseURI().resolve(pathEncoded); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URIUtils.resolve()
from HttpClient5.1 has an issue of escaping host symbols that is not working properly with IPv6 hosts. HttpClient4 and native Java URI.resolve()
don't have this problem.
if (isIPv6Host) { | ||
return ch == ']'; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @fyudanov
Thank you for this PR! 😄
The expression below does not look right to me or the method is misnamed, but the code sure could use some comments 😉
Below, we have the test for the last char of a host name:
return ch == '/' || ch == ';' || ch == '?' || ch == ':' || ch == '@' || ch == '&' || ch == '=' || ch == '+'
|| ch == '$' || ch == ',';
Reading https://datatracker.ietf.org/doc/html/rfc3986#page-49:
host = IP-literal / IPv4address / reg-name
...
reg-name = *( unreserved / pct-encoded / sub-delims )
...
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
For example, shouldn't ~
be in the test. Also the method needs comments if you intend to also test for a separator char with rest of the path like /
and ?
. Or am I missing something?
At least this PR needs more testing with host names and URI that contain all the these special characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the char list from
ch == '/' || ch == ';' || ch == '?' || ch == ':' || ch == '@' || ch == '&' || ch == '=' || ch == '+'
|| ch == '$' || ch == ',';
looks a bit inconsistent with RFC (at first site at least), but it wasn't introduced or changed in this PR, I've just moved it to the separate method.
Not sure, if we want to revisit it in the scope of the current change, because the IPv6 host parsing algorithm doesn't actually depend on this list :)
I could add few more tests with chars like ~
, if it would be sufficient. But to me, this would cover previously existing IPv4 cases rather than the current change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More tests would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've added few new test cases. Those that are commented out are the cases that doesn't work like I would expect according to RFC.
However, I've ensured the behavior on the tests is exactly the same as it was before this PR, so I'm not sure we want to cover missed cases here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @fyudanov
Please see my comment in the code.
https://issues.apache.org/jira/browse/VFS-524
The fix is based on the one, suggested in the Jira ticket, but also includes some corner cases covered and a number of additional unit tests, also tested end-to-end.