Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use own copy of PercentCodec for URI path encoding #1109

Merged
merged 6 commits into from
Jul 31, 2024

Conversation

Xtansia
Copy link
Collaborator

@Xtansia Xtansia commented Jul 25, 2024

Description

Use own copy of PercentCodec for URI path encoding

Adapted from Apache HttpComponents HttpCore v5's https://github.com/apache/httpcomponents-core/blob/e009a923eefe79cf3593efbb0c18a3525ae63669/httpcore5/src/main/java/org/apache/hc/core5/net/PercentCodec.java

Issues Resolved

Fixes #1103

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dblock
Copy link
Member

dblock commented Jul 25, 2024

Can we add passing an optional encoder (without reflection) for backwards compatibility?

@Xtansia Xtansia force-pushed the fix/unify-percent-encoding branch from 59ef0db to 7fcb24d Compare July 30, 2024 02:18
Signed-off-by: Thomas Farr <[email protected]>
@Xtansia Xtansia marked this pull request as ready for review July 30, 2024 02:29
@reta
Copy link
Collaborator

reta commented Jul 30, 2024

Can we add passing an optional encoder (without reflection) for backwards compatibility?

For bwc, we need to make sure 2.x uses HCv4 encoding but 3.x uses HVv5 encoding, we could do that by class lookup.

"Either '" + HTTP_CLIENT5_UTILS_CLASS + "' or '" + HTTP_CLIENT4_UTILS_CLASS + "' is required by not found on classpath"
);
}
public static void setCodec(PercentCodec codec) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it package private for now

Suggested change
public static void setCodec(PercentCodec codec) {
static void setCodec(PercentCodec codec) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to let users call it for backwards compat?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to let users call it for backwards compat?

May be not directly? As part of client transport builder?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta The problem with calling it as part of the transport builder is you're then hiding the static/global nature of it behind an instance creation. It's perfectly reasonable that in my app I construct two different clients using different transports and could unwittingly override the setting. Which ideally it would be completely tied to the transport instance, but that's not backportable to 2.x.

Copy link
Collaborator

@reta reta Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta The problem with calling it as part of the transport builder is you're then hiding the static/global nature of it behind an instance creation.

Thanks @Xtansia , I understand that. Consider the alternative - asking users to call this method directly. Personally, I don't think we are giving the right tool out - I am having hard time even to explain in documentation what are the consequences of letting to change the path encoder arbitrarily, from anywhere, anytime.

It's perfectly reasonable that in my app I construct two different clients using different transports and could unwittingly override the setting.

We have all the instruments to prevent that from happening: fe, use set-once semantics.

Copy link
Member

@dblock dblock Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta The problem with calling it as part of the transport builder is you're then hiding the static/global nature of it behind an instance creation.

It sounds like we all agree that the ultimate API to recommend users is a builder like so.

builder = ApacheHttpClient5TransportBuilder.builder(hosts)
  .encoder(PercentCodec.RFC3986_UNRESERVED)
  .decoder(PercentCodec.RFC3986_UNRESERVED)
  .build()

If we implement this then we don't need a global public setting, do we? It's a convenience that causes a lot of side-effects, maybe we better force users to explicitly set it for every client they create?

If you feel strongly about it, I am fine with both PathEncoder.default(Encoding.HTTP_CLIENT_V5_EQUIV) global default and an env setting, but I don't know why we'd want to add yet another level of indirection or even the global default out of "convenience" for users which will end up actually having side effects in their entire codebase.

I do think that just having a setting org.opensearch.path.encoding is not sufficient, I'd want a programmatic way of calling PathEncoder.default(...) - users reporting the bug they are experiencing originally probably want to add a line of code to their apps, not a setting (they might not have any).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement this then we don't need a global public setting, do we? It's a convenience that causes a lot of side-effects, maybe we better force users to explicitly set it for every client they create?

That would be ideal but the problem comes from usage of the path encoder: it is referenced statically from (literally) every single request class: SimpleEndpoint.pathEncode(...);. That's the showstopper that @Xtansia tries to solve with static PathEncoder configuration .

So if we do follow the builder pattern

builder = ApacheHttpClient5TransportBuilder.builder(hosts)
  .pathEncoder(PercentCodec.RFC3986_UNRESERVED)
  .build()

The pathEncoder would set the static path encoder behind the scene (and I think this is 100% fine). I hardly imagine the case when users would want clients with same JVM / app but different path encoders.

PS: I don't think we need decoder for path but that's just a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand. I am strongly opposed to an instance .pathEncoder that sets a global default. That's definitely not expected! We should either fix the static calls or I'm fine with a global setting.

Copy link
Collaborator

@reta reta Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely not expected! We should either fix the static calls or I'm fine with a global setting.

My mental model breaks for static call (sorry), but global setting (system property) is looking acceptable, thanks @Xtansia

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm good with this.


public static final PercentCodec DEFAULT_CODEC = APACHE_HTTP_CLIENT_5_EQUIV_CODEC;

private static PercentCodec codec;
Copy link
Collaborator

@reta reta Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we leave an option to change the value from anywhere:

Suggested change
private static PercentCodec codec;
private static volatile PercentCodec codec = DEFAULT_CODEC;

private static PercentCodec codec;

public static PercentCodec getCodec() {
if (codec == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just initialize the codec with DEFAULT_CODEC value

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs an UPGRADING.md and a guide I think.

"Either '" + HTTP_CLIENT5_UTILS_CLASS + "' or '" + HTTP_CLIENT4_UTILS_CLASS + "' is required by not found on classpath"
);
}
public static void setCodec(PercentCodec codec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to let users call it for backwards compat?

Xtansia added 2 commits July 31, 2024 18:40
Signed-off-by: Thomas Farr <[email protected]>
Signed-off-by: Thomas Farr <[email protected]>
reta
reta previously approved these changes Jul 31, 2024
@dblock
Copy link
Member

dblock commented Jul 31, 2024

@Xtansia Can you please add to https://github.com/opensearch-project/opensearch-java/blob/main/UPGRADING.md and we can merge?

Signed-off-by: Thomas Farr <[email protected]>
@dblock dblock merged commit 3d8061e into opensearch-project:main Jul 31, 2024
58 checks passed
@dblock
Copy link
Member

dblock commented Jul 31, 2024

Merged, backport accordingly?

@Xtansia Xtansia deleted the fix/unify-percent-encoding branch August 1, 2024 02:47
Xtansia added a commit to Xtansia/opensearch-java that referenced this pull request Aug 1, 2024
…t#1109)

* Use own copy of PercentCodec for URI path encoding

Adapted from Apache HttpComponents HttpCore v5's https://github.com/apache/httpcomponents-core/blob/e009a923eefe79cf3593efbb0c18a3525ae63669/httpcore5/src/main/java/org/apache/hc/core5/net/PercentCodec.java

Signed-off-by: Thomas Farr <[email protected]>

* Refactor PercentCodec a bit

Signed-off-by: Thomas Farr <[email protected]>

* Add change log

Signed-off-by: Thomas Farr <[email protected]>

* Switch to system property

Signed-off-by: Thomas Farr <[email protected]>

* spotless

Signed-off-by: Thomas Farr <[email protected]>

* Add UPGRADING note

Signed-off-by: Thomas Farr <[email protected]>

---------

Signed-off-by: Thomas Farr <[email protected]>
(cherry picked from commit 3d8061e)
dblock pushed a commit that referenced this pull request Aug 1, 2024
)

* Use own copy of PercentCodec for URI path encoding (#1109)

* Use own copy of PercentCodec for URI path encoding

Adapted from Apache HttpComponents HttpCore v5's https://github.com/apache/httpcomponents-core/blob/e009a923eefe79cf3593efbb0c18a3525ae63669/httpcore5/src/main/java/org/apache/hc/core5/net/PercentCodec.java

Signed-off-by: Thomas Farr <[email protected]>

* Refactor PercentCodec a bit

Signed-off-by: Thomas Farr <[email protected]>

* Add change log

Signed-off-by: Thomas Farr <[email protected]>

* Switch to system property

Signed-off-by: Thomas Farr <[email protected]>

* spotless

Signed-off-by: Thomas Farr <[email protected]>

* Add UPGRADING note

Signed-off-by: Thomas Farr <[email protected]>

---------

Signed-off-by: Thomas Farr <[email protected]>
(cherry picked from commit 3d8061e)

* Tweak default encoding behavior to match prior behavior

Signed-off-by: Thomas Farr <[email protected]>

* spotless

Signed-off-by: Thomas Farr <[email protected]>

---------

Signed-off-by: Thomas Farr <[email protected]>
dancristiancecoi pushed a commit to dancristiancecoi/opensearch-java that referenced this pull request Aug 8, 2024
…t#1109)

* Use own copy of PercentCodec for URI path encoding

Adapted from Apache HttpComponents HttpCore v5's https://github.com/apache/httpcomponents-core/blob/e009a923eefe79cf3593efbb0c18a3525ae63669/httpcore5/src/main/java/org/apache/hc/core5/net/PercentCodec.java

Signed-off-by: Thomas Farr <[email protected]>

* Refactor PercentCodec a bit

Signed-off-by: Thomas Farr <[email protected]>

* Add change log

Signed-off-by: Thomas Farr <[email protected]>

* Switch to system property

Signed-off-by: Thomas Farr <[email protected]>

* spotless

Signed-off-by: Thomas Farr <[email protected]>

* Add UPGRADING note

Signed-off-by: Thomas Farr <[email protected]>

---------

Signed-off-by: Thomas Farr <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Default URL path encoding differs between main & 2.x
3 participants